* Matplotlib
* Dictionaries & Pandas
* Logic, Control Flow and Filtering
* Loops
* Case Study: Hacker Statistics

**Line plot (1)**

With matplotlib, you can create a bunch of different plots in Python. The most basic plot is the line plot. A general recipe is given here.

```Python
import matplotlib.pyplot as plt
plt.plot(x,y)
plt.show()
```

In the video, you already saw how much the world population has grown over the past years. Will it continue to do so? The World Bank has estimates of the world population for the years 1950 up to 2100. The years are loaded in your workspace as a list called **year**, and the corresponding populations as a list called **pop**.

This course touches on a lot of concepts you may have forgotten, so if you ever need a quick refresher, download the Python for data science Cheat Sheet and keep it handy!

**Instructions**

* **print()** the last item from both the **year** and the **pop** list to see what the predicted population for the year 2100 is. Use two **print()** functions.
* Before you can start, you should import **matplotlib.pyplot** as **plt**. **pyplot** is a sub-package of **matplotlib**, hence the dot.
* Use **plt.plot()** to build a line plot. **year** should be mapped on the horizontal axis, **pop** on the vertical axis. Don't forget to finish off with the **plt.show()** function to actually display the plot.

```Python
# Print the last item from year and pop
print(year[-1])
print(pop[-1])

# Import matplotlib.pyplot as plt
import matplotlib.pyplot as plt

# Make a line plot: year on the x-axis, pop on the y-axis
plt.plot(year, pop)

# Display the plot with plt.show()
plt.show()
```

**Line plot (2): Interpretation**

Have another look at the plot you created in the previous exercise. Based on the plot, in approximately what year will there be more than ten billion human beings on this planet?

**Instructions**

Possible Answers

* 【】2040

* 【X】2060

* 【】2085

* 【】2095
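The same question can also be answered from the lists themselves rather than by eye. The sketch below is only an illustration: the values are made-up, rounded stand-ins for **year** and **pop**, and it assumes the populations are expressed in billions.

```python
# Hypothetical, rounded stand-in values (population assumed to be in billions)
year = [2040, 2050, 2060, 2070]
pop = [9.2, 9.7, 10.2, 10.5]

# Walk through both lists in parallel and stop at the first
# year whose estimate exceeds ten billion
first_year = next(y for y, p in zip(year, pop) if p > 10)
print(first_year)
```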

**Line plot (3)**

Now that you've built your first line plot, let's start working on the data that Professor Hans Rosling used to build his beautiful bubble chart. It was collected in 2007. Two lists are available for you:

* **life_exp**, which contains the life expectancy for each country, and
* **gdp_cap**, which contains the GDP per capita (i.e. per person) for each country, expressed in US dollars.

GDP stands for Gross Domestic Product. It basically represents the size of the economy of a country. Divide this by the population and you get the GDP per capita.

**matplotlib.pyplot** is already imported as **plt**, so you can get started straight away.

**Instructions**

* Print the last item from both the **gdp_cap** list and the **life_exp** list; both values describe Zimbabwe.
* Build a line chart, with **gdp_cap** on the x-axis, and **life_exp** on the y-axis. Does it make sense to plot this data on a line plot?
* Don't forget to finish off with a **plt.show()** command, to actually display the plot.
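A minimal sketch of these steps follows. It assumes small made-up stand-ins for **gdp_cap** and **life_exp**, since the real lists are preloaded in the exercise. The `Agg` backend line is also an assumption, added only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical stand-in values, not the real 2007 data
gdp_cap = [1000, 6000, 500, 30000, 12000]
life_exp = [45, 72, 43, 80, 75]

# Print the last item of each list
print(gdp_cap[-1])
print(life_exp[-1])

# Line chart: gdp_cap on the x-axis, life_exp on the y-axis
lines = plt.plot(gdp_cap, life_exp)

# Display the plot
plt.show()
```

Because the countries are not ordered by GDP, the line connects the points in list order and jumps back and forth, which hints that a line plot is a poor fit for this data.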

**Scatter plot (1)**

When you have a time scale along the horizontal axis, the line plot is your friend. But in many other cases, when you're trying to assess if there's a correlation between two variables, for example, the scatter plot is the better choice. Below is an example of how to build a scatter plot.

```Python
import matplotlib.pyplot as plt
plt.scatter(x,y)
plt.show()
```

Let's continue with the **gdp_cap** versus **life_exp** plot, the GDP and life expectancy data for different countries in 2007. Maybe a scatter plot will be a better alternative?

Again, the **matplotlib.pyplot** package is available as **plt**.

**Instructions**

* Change the line plot that's coded in the script to a scatter plot.
* A correlation will become clear when you display the GDP per capita on a logarithmic scale. Add the line **plt.xscale('log')**.
* Finish off your script with **plt.show()** to display the plot.

```Python
# Change the line plot below to a scatter plot
plt.scatter(gdp_cap, life_exp)

# Put the x-axis on a logarithmic scale
plt.xscale('log')

# Show plot
plt.show()
```

**Scatter plot (2)**

In the previous exercise, you saw that a higher GDP usually corresponds to a higher life expectancy. In other words, there is a positive correlation.
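One way to put a number on such a relationship is the Pearson correlation coefficient. The sketch below is an aside rather than part of the exercise: it uses NumPy's `np.corrcoef` on made-up stand-in values (not the real 2007 data) and takes the logarithm of GDP to mirror the log-scaled x-axis.

```python
import numpy as np

# Hypothetical stand-in values, not the real 2007 data
gdp_cap = [500, 1000, 6000, 12000, 30000]
life_exp = [43, 45, 72, 75, 80]

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the correlation between the two inputs
r = np.corrcoef(np.log10(gdp_cap), life_exp)[0, 1]

# A value above zero indicates a positive correlation
print(r > 0)
```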

Do you think there's a relationship between the population and the life expectancy of a country? The list **life_exp** from the previous exercise is already available. In addition, **pop** is now also available, listing the corresponding populations for the countries in 2007. The populations are in millions of people.

**Instructions**

* Start from scratch: import **matplotlib.pyplot** as **plt**.
* Build a scatter plot, where **pop** is mapped on the horizontal axis, and **life_exp** is mapped on the vertical axis.
* Finish the script with **plt.show()** to actually display the plot. Do you see a correlation?

```Python
# Import package
import matplotlib.pyplot as plt

# Build scatter plot
plt.scatter(pop, life_exp)

# Show plot
plt.show()
```

**Build a histogram (1)**

**life_exp**, the list containing data on the life expectancy for different countries in 2007, is available in your Python shell.

To see how life expectancy in different countries is distributed, let's create a histogram of **life_exp**.

**matplotlib.pyplot** is already available as **plt**.

**Instructions**

* Use **plt.hist()** to create a histogram of the values in **life_exp**. Do not specify the number of bins; matplotlib will set the number of bins to 10 by default for you.
* Add **plt.show()** to actually display the histogram. Can you tell which bin contains the most observations?

```Python
# Create histogram of life_exp data
plt.hist(life_exp)

# Display histogram
plt.show()
```
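To check which bin holds the most observations without reading it off the picture, you can inspect what **plt.hist()** returns: the counts per bin, the bin edges, and the drawn patches. A minimal sketch, assuming made-up stand-in values for **life_exp** (the `Agg` line is likewise an assumption so the sketch runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical stand-in values, not the real 2007 data
life_exp = [43, 45, 52, 58, 63, 68, 71, 72, 73, 74, 75, 78, 79, 80, 82]

# plt.hist() returns the counts per bin, the bin edges, and the patches
counts, edges, patches = plt.hist(life_exp)

# Index of the fullest bin and the interval it covers
fullest = counts.argmax()
print(len(counts))  # number of bins (10 by default)
print(edges[fullest], edges[fullest + 1])

plt.show()
```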

**Build a histogram (2): bins**
    
In the previous exercise, you didn't specify the number of bins. In that case, matplotlib uses 10 bins by default. The number of bins is pretty important. Too few bins will oversimplify reality and won't show you the details. Too many bins will overcomplicate reality and won't show the bigger picture.

To control the number of bins to divide your data into, you can set the **bins** argument.

That's exactly what you'll do in this exercise. You'll be making two plots here. The code in the script already includes **plt.show()** and **plt.clf()** calls: **plt.show()** displays a plot, and **plt.clf()** cleans it up again so you can start afresh.

As before, **life_exp** is available and **matplotlib.pyplot** is imported as **plt**.
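The effect of **plt.clf()** can be seen by counting the axes on the current figure before and after the call. A minimal sketch, assuming made-up stand-in values for **life_exp** and a non-interactive backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical stand-in values, not the real 2007 data
life_exp = [43, 45, 52, 58, 63, 68, 71, 72, 73, 74, 75, 78, 79, 80, 82]

# Drawing a histogram creates one set of axes on the current figure
plt.hist(life_exp, bins=5)
axes_before = len(plt.gcf().axes)

# Display the plot, then clear the current figure
plt.show()
plt.clf()
axes_after = len(plt.gcf().axes)

print(axes_before, axes_after)
```

Without the **plt.clf()** call, a second **plt.hist()** on the same figure would be drawn on top of the first one.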

**Instructions**

* Build a histogram of **life_exp**, with 5 bins. Can you tell which bin contains the most observations?
* Build another histogram of **life_exp**, this time with 20 bins. Is this better?

```Python
# Build histogram with 5 bins
plt.hist(life_exp, bins = 5)

# Show and clean up plot
plt.show()
plt.clf()

# Build histogram with 20 bins
plt.hist(life_exp, bins = 20)

# Show and clean up again
plt.show()
plt.clf()
```

**Build a histogram (3): compare**
    
In the video, you saw population pyramids for the present day and for the future. Because we were using a histogram, it was very easy to make a comparison.

Let's do a similar comparison. **life_exp** contains life expectancy data for different countries in 2007. You also have access to a second list now, **life_exp1950**, containing similar data for 1950. Can you make a histogram for both datasets?

You'll again be making two plots. The **plt.show()** and **plt.clf()** commands to render everything nicely are already included. Also, **matplotlib.pyplot** is imported for you as **plt**.

**Instructions**

* Build a histogram of **life_exp** with 15 bins.
* Build a histogram of **life_exp1950**, also with 15 bins. Is there a big difference with the histogram for the 2007 data?
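One possible way to work through these two steps is sketched below. It assumes made-up stand-in values for **life_exp** and **life_exp1950** (the real lists are preloaded in the exercise) and a non-interactive backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical stand-in values, not the real 2007 and 1950 data
life_exp = [43, 45, 52, 58, 63, 68, 71, 72, 73, 74, 75, 78, 79, 80, 82]
life_exp1950 = [29, 31, 35, 38, 40, 42, 45, 48, 52, 55, 60, 64, 67, 69, 72]

# Build histogram of life_exp with 15 bins
counts_2007, edges_2007, _ = plt.hist(life_exp, bins=15)

# Show and clean up plot
plt.show()
plt.clf()

# Build histogram of life_exp1950, also with 15 bins
counts_1950, edges_1950, _ = plt.hist(life_exp1950, bins=15)

# Show and clean up again
plt.show()
plt.clf()
```

Using the same number of bins for both years keeps the two histograms comparable, although the bin edges still follow each dataset's own minimum and maximum.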