# Interpreting scatter plots

Scatter plots let you explore the relationship between two continuous variables.

Here you can see a scatter plot of average life expectancy (on the y-axis) versus average length of schooling (on the x-axis) for countries around the world. Each point in the plot represents one country. A straight trend line from a linear regression model is shown.

<center><img src="images/02.02.png"  style="width: 400px; height: 300px;"/></center>


<center><img src="images/02.022.jpg"  style="width: 400px; height: 300px;"/></center>
<center><img src="images/02.023.jpg"  style="width: 400px; height: 300px;"/></center>
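A linear regression trend line like the one in this plot is usually fitted by ordinary least squares. The Python sketch below computes that line by hand. The helper name `fit_line` and the six data points are invented for illustration; they are not the country data behind the figure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example points, NOT the country data shown in the plot.
schooling_years = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
life_expectancy = [55.0, 60.5, 63.0, 68.5, 71.0, 75.0]

slope, intercept = fit_line(schooling_years, life_expectancy)
print(f"trend line: life expectancy ~ {intercept:.1f} + {slope:.2f} * years of schooling")
```

The slope is the average change in life expectancy per extra year of schooling along the fitted line; individual points scatter above and below it.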


# Trends with scatter plots

Adding trend lines to a scatter plot can make it easier to articulate the relationship between the two variables.

Here you can see the life expectancy for each country again, this time plotted against Gross National Income (GNI) per capita, a measure of how rich a country is. The x-axis can use either a linear or a logarithmic scale, and either linear or curved trend lines can be added.

Which statement best describes the trend?

<center><img src="images/02.03.jpg"  style="width: 400px; height: 300px;"/></center>


- Life expectancy increases linearly with the logarithm of GNI when GNI is between $1k and $50k.
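The statement above means that, within the 1k to 50k USD range, life expectancy is roughly a + b * log(GNI). A hedged Python sketch of fitting that model: take the logarithm of GNI, keep only points inside the range, then fit an ordinary straight line. The function names, the choice of base-10 logs, and the example values are assumptions made for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_log_trend(gni, life_exp, lo=1_000, hi=50_000):
    """Fit life_exp ~ a + b * log10(gni) using only points with lo <= gni <= hi."""
    pairs = [(math.log10(g), y) for g, y in zip(gni, life_exp) if lo <= g <= hi]
    if len(pairs) < 2:
        raise ValueError("need at least two points inside the GNI range")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    b, a = fit_line(xs, ys)
    return a, b


# Invented points lying exactly on 40 + 8 * log10(GNI) inside the range;
# the 500 and 120_000 points fall outside 1k-50k and are ignored.
gni = [500, 1_000, 5_000, 20_000, 50_000, 120_000]
life = [30.0] + [40 + 8 * math.log10(g) for g in gni[1:5]] + [99.0]
a, b = fit_log_trend(gni, life)
print(f"life expectancy ~ {a:.1f} + {b:.1f} * log10(GNI)")
```

With base-10 logs, `b` is the predicted change in life expectancy for each tenfold increase in GNI per capita, which is why the trend looks straight on a logarithmic x-axis.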

# Interpreting line plots

Line plots are excellent for showing how one continuous variable changes with another when consecutive observations are connected in some way. A common type of line plot has dates or times on the x-axis and a numeric quantity on the y-axis. In this case, "consecutive observations" means values on successive dates, like today and tomorrow. By drawing multiple lines on the same plot, you can compare values across groups.

The following line plot shows the percentage of households in the United States that adopted each of four technologies (automobiles, refrigerators, stoves, and vacuums) from 1930 to 1970.

<center><img src="images/02.05.png"  style="width: 400px; height: 300px;"/></center>


<center><img src="images/02.052.jpg"  style="width: 400px; height: 300px;"/></center>


# Logarithmic scales for line plots

If you have a dataset where the values span several orders of magnitude, it can be easier to view them on a logarithmic scale.

A subset of the COVID-19 coronavirus data is shown in the line plot. You saw in the video that most of the cases in early 2020 occurred in mainland China. You might wonder what is happening in the rest of the world. Here, the six countries with the largest numbers of confirmed cases outside mainland China are shown.

On the linear scale, notice that moving up one grid line in the plot adds 20000. On the logarithmic scale, moving up one grid line in the plot multiplies by 4.
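The grid-line rule can be checked with a little arithmetic. On a logarithmic axis, vertical position is proportional to the logarithm of the value, so equal steps up the axis correspond to equal multiplications. This Python sketch assumes log-scale grid lines at 1, 4, 16, 64, and so on (powers of 4); the real plot's starting value may differ, and the function names are hypothetical.

```python
import math


def linear_ticks(start, step, n):
    """Grid lines on a linear axis: each one ADDS `step`."""
    return [start + i * step for i in range(n)]


def log_ticks(start, factor, n):
    """Grid lines on a logarithmic axis: each one MULTIPLIES by `factor`."""
    return [start * factor ** i for i in range(n)]


def grid_lines_between(low, high, factor=4):
    """How many log-axis grid lines separate two positive values."""
    return math.log(high / low, factor)


print(linear_ticks(0, 20_000, 5))  # spacing described for the linear-scale plot
print(log_ticks(1, 4, 8))          # assumed log-scale grid starting at 1
# Same vertical distance on the log axis, very different absolute gaps:
print(grid_lines_between(100, 400), grid_lines_between(10_000, 40_000))
```

Going from 100 to 400 cases and from 10,000 to 40,000 cases both move up exactly one grid line, which is why small and large countries can be compared on the same logarithmic plot.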

Considering the six countries on the plot, which statement is true?

<center><img src="images/02.061.jpg"  style="width: 400px; height: 300px;"/></center>
<center><img src="images/02.062.jpg"  style="width: 400px; height: 300px;"/></center>


- On Feb 17, Germany had more cumulative confirmed cases of COVID-19 than France.

# Line plots without dates on the x-axis

Although dates and times are the most common type of variable for the x-axis in a line plot, other types of variables are possible.

In the video, you saw data on the ages of juvenile offenders in Switzerland. That data was presented with time on the x-axis and one line for each age. Since that plot wasn't very satisfactory, we'll try again. This time, age is on the x-axis and there is one line for each year. In the plot you can see two separate clusters of lines representing different age profiles for the offenders.

In which year did the change in the age profile of juvenile offenders take place?

<center><img src="images/02.07.jpg"  style="width: 400px; height: 300px;"/></center>


- 2011
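Moving age to the x-axis is a reshaping step: instead of one line per age with year on the x-axis, the data are grouped into one line per year with age on the x-axis. A minimal Python sketch, assuming the data arrive as (year, age, count) records; the function name and the example records are hypothetical.

```python
from collections import defaultdict


def lines_by_year(records):
    """Turn (year, age, count) records into one line per year.

    Returns {year: [(age, count), ...]}, with years in order and each
    line's points sorted by age, so age can go on the x-axis.
    """
    lines = defaultdict(list)
    for year, age, count in records:
        lines[year].append((age, count))
    return {year: sorted(points) for year, points in sorted(lines.items())}


# Hypothetical records, not the Swiss juvenile-offender data.
records = [(2010, 16, 120), (2010, 15, 90), (2011, 15, 70), (2011, 16, 80)]
for year, points in lines_by_year(records).items():
    print(year, points)
```

Each resulting list of (age, count) points becomes one line, so years with a similar age profile end up as lines with a similar shape, forming the clusters seen in the plot.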

# Interpreting bar plots

Bar plots are a great way to see counts of each category in a categorical variable.

The ESPN Top 100 famous athletes dataset has two categorical variables: country and sport.

Explore the plots and determine which statement is false.

<center><img src="images/02.09.png"  style="width: 400px; height: 300px;"/></center>


- USA soccer players made up the largest country/sport combination of famous athletes.
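Bar heights are counts per category, and combinations of two categorical variables can be counted the same way. A small Python sketch using `collections.Counter`; the helper name `count_by` and the athlete records are invented and are not the ESPN data.

```python
from collections import Counter


def count_by(athletes, *fields):
    """Count athletes for each combination of the given fields, most common first."""
    return Counter(tuple(a[f] for f in fields) for a in athletes).most_common()


# Invented records, NOT the ESPN top 100 list.
athletes = [
    {"country": "USA", "sport": "Basketball"},
    {"country": "USA", "sport": "Basketball"},
    {"country": "Spain", "sport": "Soccer"},
    {"country": "USA", "sport": "Golf"},
]
print(count_by(athletes, "sport"))             # bar heights for a bar plot of sport
print(count_by(athletes, "country", "sport"))  # counts per country/sport combination
```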

# Interpreting stacked bar plots

If you care about percentages rather than counts, stacked bar plots are often a good choice.

The dataset for this exercise relates to another question from the Health Survey for England. Adults aged 65 or more were asked how many "activities of daily living" (day-to-day tasks) they needed assistance with.

Which statement is true?

<center><img src="images/02.10.jpg"  style="width: 400px; height: 300px;"/></center>


- The group with the largest percentage of people needing no assistance was men aged 70-74.
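Each stacked bar shows percentages within one group, obtained by dividing every count in the group by that group's total. A Python sketch, assuming the counts arrive as a nested dictionary; the function name, group labels, and numbers are made up for illustration.

```python
def to_percentages(counts_by_group):
    """Convert {group: {category: count}} into {group: {category: percent}}.

    Percentages are computed within each group, so every group's values
    sum to 100, matching one full-height stacked bar.
    """
    result = {}
    for group, counts in counts_by_group.items():
        total = sum(counts.values())
        if total == 0:
            raise ValueError(f"group {group!r} has no observations")
        result[group] = {cat: 100 * n / total for cat, n in counts.items()}
    return result


# Made-up counts of people by number of daily activities needing assistance.
counts = {
    "Men 65-69": {"none": 30, "one": 6, "two or more": 4},
    "Women 65-69": {"none": 24, "one": 10, "two or more": 6},
}
for group, pct in to_percentages(counts).items():
    print(group, {cat: round(p, 1) for cat, p in pct.items()})
```

Because each group is normalized separately, every bar has the same height regardless of how many people are in the group; the group sizes are lost in the conversion.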

# Interpreting dot plots

Dot plots are similar to bar plots in that they show a numeric metric for each category of a categorical variable. They have two advantages over bar plots: you can use a log scale for the metric, and you can display more than one metric per category.

Here is a dot plot of the social media followings of the ESPN 2017 top 100 famous athletes, with one row per athlete. Three metrics are shown for each athlete: the number of followers on Facebook, Instagram, and Twitter. Only athletes in basketball, cricket, soccer, and tennis who had accounts on all three platforms are shown. Within each sport, rows are sorted alphabetically.

<center><img src="images/02.12.png"  style="width: 400px; height: 300px;"/></center>


- Soccer: Cristiano Ronaldo has more Twitter followers than Marcelo Vieira.

# Sorting dot plots

As with box plots and bar plots, how you order the rows in a dot plot affects the kinds of questions that are easy to answer.

Here you can see the Big Mac Index: the price of a McDonald's Big Mac in various countries around the world (as of January 2020). The "Actual price" is the price converted to US dollars. The "GDP adjusted price" applies an additional correction for each country's gross domestic product. Roughly, if people earn less in a country, a Big Mac there will cost more at the adjusted price.

By default, the rows in the dot plot are ordered alphabetically. This makes it easy to look up the price for a specific country, but difficult to answer questions about where the most expensive or least expensive Big Macs can be found. Sorting the rows by price makes those questions easier to answer.
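The two orderings described above amount to sorting the same rows by different keys. A Python sketch; the function name `order_rows` is hypothetical and the country names and prices are placeholders, not values from the Big Mac Index.

```python
def order_rows(prices, by_price=False):
    """Return (country, price) rows for a dot plot.

    Default: alphabetical, which makes looking up one country easy.
    by_price=True: most expensive first, which makes extremes easy to spot.
    """
    if by_price:
        return sorted(prices.items(), key=lambda row: row[1], reverse=True)
    return sorted(prices.items(), key=lambda row: row[0])


# Placeholder countries and prices, not the January 2020 Big Mac Index.
prices = {"Country B": 3.5, "Country A": 5.2, "Country C": 4.1}
print(order_rows(prices))
print(order_rows(prices, by_price=True))
```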

Which statement is true?

<center><img src="images/02.13.png"  style="width: 400px; height: 300px;"/></center>


- Two countries have Big Macs that cost over 100 USD after adjusting for GDP.