# Python for Researchers
[www.pythonforresearchers.com](http://pythonforresearchers.com)

[@py4res](https://twitter.com/py4res)

## Data handling and visualisation mini-series

1. Getting data direct from the web
2. Plotting data (using Matplotlib)
3. Making your plot look awesome
4. Making your plot interactive (Plotly)
5. Cool Jupyter Notebook functionality

## You are going to make this!

<img src="world_population.png">

## Part 1: Getting data direct from the web

In [1]:
# import pandas
import pandas as pd
URL = "https://en.wikipedia.org/wiki/World_population"

In [4]:
# Read the data into a pandas DataFrame directly from the web.
# read_html returns a list of every table on the page; [8] picks one by position.
df = pd.read_html(URL, header=0, index_col=0)[8]
df

Region,1500,1600,1700,1750,1800,1850,1900,1950,1999,2008,2010,2012,2050,2150
World,585,660,710,791,978,1262,1650,2521,6008,6707,6896,7052,9725,9746
Africa,86,114,106,106,107,111,133,221,783,973,1022,1052,2478,2308
Asia,282,350,411,502,635,809,947,1402,3700,4054,4164,4250,5267,5561
Europe,168,170,178,190,203,276,408,547,675,732,738,740,734,517
Latin America[Note 1],40,20,10,16,24,38,74,167,508,577,590,603,784,912
North America[Note 1],6,3,2,2,7,26,82,172,312,337,345,351,433,398
Oceania,3,3,3,2,2,2,6,13,30,34,37,38,57,51
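
Picking the table with `[8]` depends on the page layout, which can change whenever the article is edited. A minimal sketch of one alternative is to ask `read_html` for tables containing a known piece of text via its `match` argument. The inline `HTML` string and the `find_table` helper are hypothetical stand-ins for illustration, not part of the original notebook, and an HTML parser such as lxml is assumed to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a downloaded page: two tables, only one of interest.
HTML = """
<table>
  <tr><th>Rank</th><th>Country</th></tr>
  <tr><td>1</td><td>India</td></tr>
</table>
<table>
  <tr><th>Region</th><th>1500</th><th>2012</th></tr>
  <tr><td>World</td><td>585</td><td>7052</td></tr>
  <tr><td>Asia</td><td>282</td><td>4250</td></tr>
</table>
"""


def find_table(html, text):
    """Return the first table on the page whose contents match `text`."""
    # match keeps only tables containing text that matches the pattern,
    # so the result no longer depends on the table's position on the page.
    tables = pd.read_html(StringIO(html), match=text, header=0, index_col=0)
    return tables[0]
```

With a parser installed, `find_table(HTML, "Region")` returns the second table with the region names as its index. The same `match` argument can be passed when reading from `URL`.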


In [10]:
# Filter the columns - historical data only
df_filtered = df[df.columns[0:12]]
df_filtered

Region,1500,1600,1700,1750,1800,1850,1900,1950,1999,2008,2010,2012
World,585,660,710,791,978,1262,1650,2521,6008,6707,6896,7052
Africa,86,114,106,106,107,111,133,221,783,973,1022,1052
Asia,282,350,411,502,635,809,947,1402,3700,4054,4164,4250
Europe,168,170,178,190,203,276,408,547,675,732,738,740
Latin America[Note 1],40,20,10,16,24,38,74,167,508,577,590,603
North America[Note 1],6,3,2,2,7,26,82,172,312,337,345,351
Oceania,3,3,3,2,2,2,6,13,30,34,37,38
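
Slicing the first 12 columns works for this table, but it breaks silently if a column is added or removed. A small sketch of selecting by year instead, assuming the column labels are plain year strings such as `"2012"`. The cut-down DataFrame and the `LAST_HISTORICAL_YEAR` name are made up for illustration.

```python
import pandas as pd

# A cut-down copy of the table above; column labels are year strings.
df = pd.DataFrame(
    {
        "1950": [2521, 1402],
        "2012": [7052, 4250],
        "2050": [9725, 5267],
        "2150": [9746, 5561],
    },
    index=["World", "Asia"],
)

LAST_HISTORICAL_YEAR = 2012  # assumption: later columns are projections

# Keep every column whose year is not in the future of the cut-off.
historical = [col for col in df.columns if int(col) <= LAST_HISTORICAL_YEAR]
df_filtered = df[historical].copy()

print(df_filtered.columns.tolist())  # ['1950', '2012']
```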


In [12]:
# Tidy up the index: strip the "[Note 1]" footnote markers from the region names
index = df_filtered.index.values
df_filtered.index = [x.replace("[Note 1]", "") for x in index]
df_filtered

Region,1500,1600,1700,1750,1800,1850,1900,1950,1999,2008,2010,2012
World,585,660,710,791,978,1262,1650,2521,6008,6707,6896,7052
Africa,86,114,106,106,107,111,133,221,783,973,1022,1052
Asia,282,350,411,502,635,809,947,1402,3700,4054,4164,4250
Europe,168,170,178,190,203,276,408,547,675,732,738,740
Latin America,40,20,10,16,24,38,74,167,508,577,590,603
North America,6,3,2,2,7,26,82,172,312,337,345,351
Oceania,3,3,3,2,2,2,6,13,30,34,37,38
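
The `replace` call above only removes the exact text `[Note 1]`. If the page ever uses other footnote numbers, a regular expression can strip them all. This is a sketch on a made-up three-row table, using the `.str` accessor on the index.

```python
import pandas as pd

# Made-up rows that mimic the footnote markers in the scraped table.
df = pd.DataFrame(
    {"2012": [603, 351, 38]},
    index=["Latin America[Note 1]", "North America[Note 2]", "Oceania"],
)

# Remove any "[Note N]" marker, whatever the number, then trim stray spaces.
df.index = df.index.str.replace(r"\[Note \d+\]", "", regex=True).str.strip()

print(df.index.tolist())  # ['Latin America', 'North America', 'Oceania']
```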


In [15]:
# Sort the table by 2012 population, largest first
# (this returns a sorted copy; df_filtered itself is unchanged)
df_filtered.sort_values(by=["2012"], ascending=False)

Region,1500,1600,1700,1750,1800,1850,1900,1950,1999,2008,2010,2012
World,585,660,710,791,978,1262,1650,2521,6008,6707,6896,7052
Asia,282,350,411,502,635,809,947,1402,3700,4054,4164,4250
Africa,86,114,106,106,107,111,133,221,783,973,1022,1052
Europe,168,170,178,190,203,276,408,547,675,732,738,740
Latin America,40,20,10,16,24,38,74,167,508,577,590,603
North America,6,3,2,2,7,26,82,172,312,337,345,351
Oceania,3,3,3,2,2,2,6,13,30,34,37,38
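
`sort_values` returns a new DataFrame rather than changing the original, so the sorted order is lost unless you assign the result to a name. The sketch below does that on a cut-down table, and also drops the `World` total first so that only the regions are ranked. The names `regions` and `ranked` are illustrative.

```python
import pandas as pd

# A cut-down copy of the table above.
df = pd.DataFrame(
    {"1950": [2521, 221, 1402, 547], "2012": [7052, 1052, 4250, 740]},
    index=["World", "Africa", "Asia", "Europe"],
)

# Drop the world total so it does not sit at the top of the ranking.
regions = df.drop(index="World")

# Keep the sorted result by assigning it; df and regions stay as they were.
ranked = regions.sort_values(by="2012", ascending=False)

print(ranked.index.tolist())  # ['Asia', 'Africa', 'Europe']
```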


In [16]:
# Save the table to an Excel spreadsheet
df_filtered.to_excel("world_population.xlsx")
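
Writing `.xlsx` files relies on an optional Excel engine such as openpyxl, and pandas raises an `ImportError` if none is installed. A hedged sketch of a helper that falls back to CSV in that case. `save_table` is a hypothetical name, and the temporary directory is only there to keep the example self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_table(df, stem):
    """Write df to <stem>.xlsx, or to <stem>.csv if no Excel engine is installed."""
    stem = Path(stem)
    try:
        path = stem.with_suffix(".xlsx")
        df.to_excel(path)
    except ImportError:
        # No Excel writer available: fall back to a plain CSV file.
        path = stem.with_suffix(".csv")
        df.to_csv(path)
    return path


df = pd.DataFrame({"2012": [7052, 4250]}, index=["World", "Asia"])
out_dir = Path(tempfile.mkdtemp())
saved = save_table(df, out_dir / "world_population")
print(saved.name)
```

Either file can be opened in a spreadsheet program. Installing openpyxl is the simpler fix if you specifically need the Excel format.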