# World Population Statistics
---

## Purpose of Study
Gather country-wise details of the current world population and estimate the global population in 2050.

## Data Objectives

* Connect to External Resource for Live Population Data
* Calculate Population Growth Rate
* Rank Countries on Population Size, Area and Population Density
* Identify Countries with Declining Population
* List the Top 10 Countries with Highest Population Density
* Group the Final Dataset by Continents
* Export the Final Outcomes to an Excel File with Date Stamp

### Import the Libraries

In [8]:
import pandas as pd
import matplotlib.pyplot as plt
from population import live_data,growth_rate,population_density
import math
from datetime import datetime
import numpy as np
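
The `population` module imported above is a local helper module whose source is not part of this notebook. The sketch below is one hypothetical way `growth_rate` and `population_density` could be written. The signatures and the rounding are assumptions inferred from how the functions are called and from the figures that appear later in the notebook; this is not the author's actual code.

```python
def growth_rate(previous, current):
    """Percentage change from `previous` to `current`, rounded to 2 decimals.

    Assumption: the change is expressed relative to the current-year
    figure, matching the world growth-rate cell later in this notebook.
    """
    return round((1 - previous / current) * 100, 2)


def population_density(population, area):
    """People per square kilometre, rounded to a whole number.

    Assumption: both arguments use the same scale (for example millions
    of people and millions of km²), so the scale cancels out.
    """
    return round(population / area)


# Figures taken from the tables in this notebook (in millions)
print(growth_rate(1308.09, 1340.59))
print(population_density(1439.32, 9.71))
```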

### Read the Countries Data from the CSV File
The file contains the area of each country in square kilometers.

In [30]:
df1 = pd.read_csv("countries.csv")

In [31]:
df1.style.format('{:,}')

ValueError: Cannot specify ',' with 's'.

<pandas.io.formats.style.Styler at 0x115e4e128>

In [29]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 236 entries, 0 to 235
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  236 non-null    int64 
 1   Country     236 non-null    object
 2   Area        236 non-null    object
 3   Continent   236 non-null    object
dtypes: int64(1), object(3)
memory usage: 7.5+ KB


In [32]:
df1.pivot_table(index="Continent",values="Area",aggfunc="sum",margins=True,margins_name="Total")

Unnamed: 0_level_0,Area
Continent,Unnamed: 1_level_1
Africa,29773986.0
Antarctica,14000000.0
Asia,47253650.2
Australia,8488322.6
Europe,5700169.2
North America,23087646.7
South America,17422511.0
Total,145726285.7


In [3]:
df1.head()

Unnamed: 0.1,Unnamed: 0,Country,Area,Continent
0,0,Antarctica,14000000.0,Antarctica
1,1,China,9326410.0,Asia
2,2,United States,9147593.0,North America
3,3,Canada,9093507.0,North America
4,4,Brazil,8358140.0,South America


### Read Live Population Data from a Website
Source: www.worldpopulationreview.com

In [36]:
df2 = pd.read_html(live_data('https://worldpopulationreview.com/countries/median-age').text)

#df2 = pd.read_html(live_data('https://worldpopulationreview.com/countries/#popTable').text)
df2 = df2[0]

In [37]:
#df2.head()
df2.shape

(214, 4)

In [38]:
rename = {"Place":"Country,Other",
         "Median":"Median Age"}
df2.rename(columns=rename,inplace=True)

In [39]:
df2.drop(["Median Male","Median Female"],axis=1,inplace=True)

In [40]:
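# Remove the " years" suffix before converting to float (str.strip would treat its argument as a set of characters)
df2["Median Age"] = df2["Median Age"].apply(lambda x: float(x.replace(' years', '')))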

In [41]:
df2.to_csv("median_age.csv",index=False)

### Merge the Two DataFrames
Merge the two DataFrames on the common column Country. An inner join keeps only the countries present in both DataFrames.

In [6]:
df = pd.merge(df1,df2,how="inner",on="Country")
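
To see which rows an inner merge leaves out, one option is an outer merge with `indicator=True`. The sketch below uses small toy frames whose names and values are invented for illustration; they are not the notebook's data.

```python
import pandas as pd

# Toy frames, invented for illustration only
areas = pd.DataFrame({"Country": ["China", "Canada", "Antarctica"],
                      "Continent": ["Asia", "North America", "Antarctica"]})
pops = pd.DataFrame({"Country": ["China", "Canada", "Brazil"],
                     "Population": [1439, 38, 213]})

# Inner merge keeps only countries present in both frames
inner = pd.merge(areas, pops, how="inner", on="Country")

# Outer merge with indicator=True adds a "_merge" column that says
# whether each row came from the left frame, the right frame, or both
check = pd.merge(areas, pops, how="outer", on="Country", indicator=True)
unmatched = check[check["_merge"] != "both"]

print(inner["Country"].tolist())
print(unmatched[["Country", "_merge"]])
```

Rows marked `left_only` or `right_only` are the ones the inner merge drops, often because of spelling differences in country names.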

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 225 entries, 0 to 224
Data columns (total 10 columns):
Unnamed: 0         225 non-null int64
Country            225 non-null object
Area               225 non-null float64
Continent          225 non-null object
Rank               225 non-null int64
2020 Population    225 non-null int64
2019 Population    225 non-null int64
Growth Rate        225 non-null object
Area (km²)         225 non-null float64
2018 Density       225 non-null object
dtypes: float64(2), int64(4), object(4)
memory usage: 19.3+ KB


In [8]:
df.head()

Unnamed: 0.1,Unnamed: 0,Country,Area,Continent,Rank,2020 Population,2019 Population,Growth Rate,Area (km²),2018 Density
0,1,China,9326410.0,Asia,1,1439323776,1433783686,0.39%,9706961.0,148/km²
1,2,United States,9147593.0,North America,3,331002651,329064917,0.59%,9372610.0,35/km²
2,3,Canada,9093507.0,North America,39,37742154,37411047,0.89%,9984670.0,4/km²
3,4,Brazil,8358140.0,South America,6,212559417,211049527,0.72%,8515767.0,25/km²
4,5,Australia,7682300.0,Australia,55,25499884,25203198,1.18%,7692024.0,3/km²


#### Findings

* Two columns provide area information. Spot checks of the Area column show that it contains inaccurate values.

---
### Drop Duplicate Column
Drop Column "Area"

In [9]:
df.drop("Area",axis=1,inplace=True)

In [10]:
df.head()

Unnamed: 0.1,Unnamed: 0,Country,Continent,Rank,2020 Population,2019 Population,Growth Rate,Area (km²),2018 Density
0,1,China,Asia,1,1439323776,1433783686,0.39%,9706961.0,148/km²
1,2,United States,North America,3,331002651,329064917,0.59%,9372610.0,35/km²
2,3,Canada,North America,39,37742154,37411047,0.89%,9984670.0,4/km²
3,4,Brazil,South America,6,212559417,211049527,0.72%,8515767.0,25/km²
4,5,Australia,Australia,55,25499884,25203198,1.18%,7692024.0,3/km²


---
### Convert Population and Area Figures to Millions

In [11]:
columns = ['2020 Population','2019 Population','Area (km²)']

for col in columns:
    df[col] = df[col].apply(lambda x: round(x/1000000,2))
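
The same conversion can also be written without `apply`, because pandas arithmetic works on whole columns at once. A minimal sketch on a toy frame follows; the two rows reuse the raw figures for China and the United States from the output above, and the frame name `toy` is only for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["China", "United States"],
    "2020 Population": [1439323776, 331002651],
    "2019 Population": [1433783686, 329064917],
    "Area (km²)": [9706961.0, 9372610.0],
})

columns = ["2020 Population", "2019 Population", "Area (km²)"]

# Divide all selected columns at once and round to 2 decimals
toy[columns] = (toy[columns] / 1_000_000).round(2)

print(toy)
```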

In [12]:
df.head()

Unnamed: 0.1,Unnamed: 0,Country,Continent,Rank,2020 Population,2019 Population,Growth Rate,Area (km²),2018 Density
0,1,China,Asia,1,1439.32,1433.78,0.39%,9.71,148/km²
1,2,United States,North America,3,331.0,329.06,0.59%,9.37,35/km²
2,3,Canada,North America,39,37.74,37.41,0.89%,9.98,4/km²
3,4,Brazil,South America,6,212.56,211.05,0.72%,8.52,25/km²
4,5,Australia,Australia,55,25.5,25.2,1.18%,7.69,3/km²


---
### Share of World Population
Calculate Each Country's Share of the World Population

In [13]:
world_population = df["2020 Population"].sum()
f'{world_population/1000} Billion is the current world population'

'7.79426 Billion is the current world population'

In [14]:
df["Population Share %"] = df["2020 Population"].apply(lambda x: round((x/world_population)*100,1))

In [15]:
df.head()

Unnamed: 0.1,Unnamed: 0,Country,Continent,Rank,2020 Population,2019 Population,Growth Rate,Area (km²),2018 Density,Population Share %
0,1,China,Asia,1,1439.32,1433.78,0.39%,9.71,148/km²,18.5
1,2,United States,North America,3,331.0,329.06,0.59%,9.37,35/km²,4.2
2,3,Canada,North America,39,37.74,37.41,0.89%,9.98,4/km²,0.5
3,4,Brazil,South America,6,212.56,211.05,0.72%,8.52,25/km²,2.7
4,5,Australia,Australia,55,25.5,25.2,1.18%,7.69,3/km²,0.3


---
### Rank Countries by Population Size, Area and Population Density
Countries are already ranked by population. The existing density column (2018 Density) is stored as text rather than as a number, so density is recomputed from population and area.

* Rename **Rank** Column to Rank (Population)
* Create a New Column for Rank (Area)
* Create a Column for Population Density
* Create a New Column for Rank (Density)

In [16]:
df.rename(columns={'Rank':"Rank (Population)"},inplace=True)

In [17]:
df["Rank (Area)"] = df['Area (km²)'].rank(method='max',ascending=False)

In [18]:
df["Density (km²)"] = df.apply(lambda x: population_density(x["2020 Population"],x["Area (km²)"]),axis=1)

In [19]:
df["Rank (Density)"] = df['Density (km²)'].rank(method='max',ascending=False)

In [20]:
df.drop("2018 Density",axis=1,inplace=True)

In [21]:
df.head()

Unnamed: 0.1,Unnamed: 0,Country,Continent,Rank (Population),2020 Population,2019 Population,Growth Rate,Area (km²),Population Share %,Rank (Area),Density (km²),Rank (Density)
0,1,China,Asia,1,1439.32,1433.78,0.39%,9.71,18.5,3.0,148,35.0
1,2,United States,North America,3,331.0,329.06,0.59%,9.37,4.2,4.0,35,118.0
2,3,Canada,North America,39,37.74,37.41,0.89%,9.98,0.5,2.0,4,165.0
3,4,Brazil,South America,6,212.56,211.05,0.72%,8.52,2.7,5.0,25,131.0
4,5,Australia,Australia,55,25.5,25.2,1.18%,7.69,0.3,6.0,3,168.0
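
The rank columns above are floats because `Series.rank` returns floating-point ranks. With `method='max'`, tied values all receive the largest rank in their tied group. A minimal sketch on a toy Series (labels and values invented for illustration) shows the effect:

```python
import pandas as pd

# Toy densities, invented for illustration; B and C are tied
density = pd.Series([10, 20, 20, 5], index=["A", "B", "C", "D"])

# Highest value gets the best rank; ties share the largest rank of the group
ranks = density.rank(method="max", ascending=False)

print(ranks)
```

Here B and C both get rank 2.0 and no country gets rank 1.0. If ties should instead share the smallest rank, `method='min'` is the usual alternative.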


---
### Countries with Declining Population

In [22]:
df["Growth Rate"] = df.apply(lambda x: growth_rate(x["2019 Population"],x["2020 Population"]),axis=1)

In [23]:
declining_population = df[df["Growth Rate"] < 0].sort_values("Growth Rate")
declining_population[["Country","Growth Rate"]].head(10)

Unnamed: 0,Country,Growth Rate
167,Puerto Rico,-2.45
122,Lithuania,-1.47
123,Latvia,-1.06
101,Bulgaria,-0.72
80,Romania,-0.62
126,Bosnia and Herzegovina,-0.61
44,Ukraine,-0.59
124,Croatia,-0.49
94,Greece,-0.48
163,Lebanon,-0.44


In [24]:
continent = declining_population.groupby(by="Continent").size().sort_values(ascending=False)

In [25]:
f'{continent.index[0]} is the leading continent, with the most countries ({continent.iloc[0]}) having a declining population'

'Europe is the leading continent, with the most countries (14) having a declining population'

### Top 10 Densely Populated Countries

In [26]:
dense_population = df[df["Rank (Density)"] <= 10].sort_values("Rank (Density)")
dense_population[["Country","Continent","Density (km²)"]]

Unnamed: 0,Country,Continent,Density (km²)
95,Bangladesh,Asia,1098
163,Lebanon,Asia,683
136,Taiwan,Asia,596
105,South Korea,Asia,513
168,Palestine,Asia,510
151,Israel,Asia,433
147,Rwanda,Africa,432
134,Netherlands,Europe,428
5,India,Asia,419
145,Burundi,Africa,396


---
## Estimate World Population in 2050

We calculate population growth by looking at the change in population over time.
Below is the formula for calculating population growth.<br/>
<img src="https://d2jmvrsizmvf4x.cloudfront.net/XL08Gk3TQA2RxAtEWDLv_WorldPopOne550x474JPG.jpg" width="40%" height="40%">

In [27]:
world_population_2019 = df["2019 Population"].sum()
f'{world_population_2019/1000} Billion was the total world population in the Previous Year'

'7.71292 Billion was the total world population in the Previous Year'

In [28]:
population_growth_rate = (1 - (world_population_2019/world_population)) * 100
f'{round(population_growth_rate,2)}% is the current population growth rate'

'1.04% is the current population growth rate'

In [29]:
population_estimate_2050 = round(world_population * math.exp((population_growth_rate/100) * 30))
f'{population_estimate_2050/1000} Billion is the Estimated Population by 2050'

'10.66 Billion is the Estimated Population by 2050'
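
The estimate above follows the continuous-growth model P(t) = P0 · e^(r·t), where P0 is the current population, r is the annual growth rate as a fraction and t is the number of years. The sketch below wraps the same arithmetic in a function; the function name `project_population` is hypothetical, and the rate uses this notebook's definition (the one-year change relative to the current-year total).

```python
import math


def project_population(current, previous, years):
    """Project a population `years` ahead assuming continuous growth.

    The annual rate follows this notebook's definition: the one-year
    change expressed relative to the current-year figure.
    """
    rate = 1 - previous / current  # annual growth rate as a fraction
    return current * math.exp(rate * years)


# World totals in millions, taken from the cells above
estimate = project_population(7794.26, 7712.92, 30)
print(round(estimate))
```

The projection assumes the growth rate stays constant for 30 years, so it is best read as a rough extrapolation.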

### Grouped by Continent

In [30]:
# Select several columns from a groupby with a list (double brackets)
continent = df.groupby(by='Continent')[['2020 Population', '2019 Population', 'Area (km²)']].sum().reset_index()
continent["Growth Rate"] = continent.apply(lambda x: growth_rate(x["2019 Population"],x["2020 Population"]),axis=1)
continent["Density (km²)"] = continent.apply(lambda x: population_density(x["2020 Population"],x["Area (km²)"]),axis=1)
continent["Rank (Population)"] = continent['2020 Population'].rank(method='max',ascending=False)
continent["Rank (Area)"] = continent['Area (km²)'].rank(method='max',ascending=False)
continent["Rank (Density)"] = continent['Density (km²)'].rank(method='max',ascending=False)
continent.head()

Unnamed: 0,Continent,2020 Population,2019 Population,Area (km²),Growth Rate,Density (km²),Rank (Population),Rank (Area),Rank (Density)
0,Africa,1340.59,1308.09,30.32,2.42,44,2.0,2.0,3.0
1,Asia,4787.0,4747.26,49.26,0.83,97,1.0,1.0,2.0
2,Australia,42.7,42.12,8.5,1.36,5,6.0,5.0,6.0
3,Europe,601.53,601.14,5.88,0.06,102,3.0,6.0,1.0
4,North America,591.68,587.13,24.23,0.77,24,4.0,3.0,5.0


---
### Data Export to Excel

In [31]:
today = datetime.now().date() 

with pd.ExcelWriter(f'world_population_{today}.xlsx') as writer:  
    df.to_excel(writer, sheet_name='World_Population')
    declining_population.to_excel(writer, sheet_name='Declining Population')
    dense_population.to_excel(writer, sheet_name='Top 10 - Population Density')
    continent.to_excel(writer, sheet_name='Data by Continent')

In [67]:
c = df.groupby(by=["Continent","Country"],group_keys=True)["2020 Population"].sum()

In [72]:
df[["Continent","Country","2020 Population"]].sort_values(['Continent','2020 Population'],ascending=False).groupby('Continent').head(2)

Unnamed: 0,Continent,Country,2020 Population
3,South America,Brazil,212.56
25,South America,Colombia,50.88
1,North America,United States,331.0
12,North America,Mexico,128.93
61,Europe,Germany,83.78
78,Europe,United Kingdom,67.89
4,Australia,Australia,25.5
53,Australia,Papua New Guinea,8.95
0,Asia,China,1439.32
5,Asia,India,1380.0


## Clustered Bar Chart

In [None]:
import matplotlib.pyplot as plt
import matplotlib.style as mplstyle

# Five most populous countries (2020 Population is in millions)
top5 = df.nlargest(5, "2020 Population")

with mplstyle.context('classic'):
    fig, ax = plt.subplots(figsize=(8,3))
    fig.suptitle('Top 5 Countries by Population', fontsize=12)

    # Horizontal bars: one bar per country
    ax.barh(top5["Country"], top5["2020 Population"],
            label="Population (in millions)",
            height=.6,
            color="c",
            capstyle="round")

    ax.legend(loc="best")