# 1. Data to be used

All data is from the CIA World Factbook (https://www.cia.gov/library/publications/resources/the-world-factbook/)

1. 'emissions' is the carbon dioxide emitted nationally from the consumption of energy, reported in metric tons (Mt); most values are given in millions of Mt
2. 'urban' is the percent of total population living in urban areas
3. 'gdp' is the gross domestic product per capita at purchasing power parity (PPP), in US dollars

Links to the tables of data:

1. emissions = https://www.cia.gov/library/publications/resources/the-world-factbook/fields/274.html
2. urban = https://www.cia.gov/library/publications/resources/the-world-factbook/fields/349.html
3. gdp = https://www.cia.gov/library/publications/resources/the-world-factbook/fields/211.html

# 2. Reading the data

In [7]:
#creating a dataframe of CO2 emissions
import pandas as pd
link1="https://www.cia.gov/library/publications/resources/the-world-factbook/fields/274.html"
emissions=pd.read_html(link1,header=0,flavor='bs4',attrs={'id': 'fieldListing'})[0]
emissions.head()

| | Country | Carbon dioxide emissions from consumption of energy |
|---|---|---|
| 0 | Afghanistan | 9.067 million Mt (2017 est.) |
| 1 | Albania | 4.5 million Mt (2017 est.) |
| 2 | Algeria | 135.9 million Mt (2017 est.) |
| 3 | American Samoa | 361,100 Mt (2017 est.) |
| 4 | Angola | 20.95 million Mt (2017 est.) |


In [8]:
#creating a dataframe of percent urbanization
link2="https://www.cia.gov/library/publications/resources/the-world-factbook/fields/349.html"
urban=pd.read_html(link2,header=0,flavor='bs4',attrs={'id': 'fieldListing'})[0]
urban.head()

| | Country | Urbanization |
|---|---|---|
| 0 | Afghanistan | urban population: 25.5% of total population ... |
| 1 | Albania | urban population: 60.3% of total population ... |
| 2 | Algeria | urban population: 72.6% of total population ... |
| 3 | American Samoa | urban population: 87.2% of total population ... |
| 4 | Andorra | urban population: 88.1% of total population ... |


In [9]:
#creating a dataframe of GDP per capita
link3="https://www.cia.gov/library/publications/resources/the-world-factbook/fields/211.html"
gdp=pd.read_html(link3,header=0,flavor='bs4',attrs={'id': 'fieldListing'})[0]
gdp.shape

(232, 2)
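
Because the three tables are joined on `Country` in the next section, it may be worth checking that this column is a clean, unique key in each frame before merging. The following is a minimal sketch on a hypothetical toy frame (`toy` and its deliberately padded names are illustrative, not scraped data); the same two steps could be applied to `emissions`, `urban`, and `gdp`.

```python
import pandas as pd

# Hypothetical toy frame standing in for one of the scraped tables.
toy = pd.DataFrame({
    "Country": [" Albania", "Algeria ", "Angola"],
    "Urbanization": ["60.3%", "72.6%", "65.5%"],
})

# Normalise the join key: stray whitespace would stop names from matching.
toy["Country"] = toy["Country"].str.strip()

# True when every country appears exactly once, so a one-to-one merge is safe.
key_is_unique = toy["Country"].is_unique
print(toy.shape, key_is_unique)
```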

# 3. Merging data sets

In [10]:
#1st merge. Confirm that country data is lining up. 
join1=pd.merge(emissions,urban,on='Country')
join1.head()

| | Country | Carbon dioxide emissions from consumption of energy | Urbanization |
|---|---|---|---|
| 0 | Afghanistan | 9.067 million Mt (2017 est.) | urban population: 25.5% of total population ... |
| 1 | Albania | 4.5 million Mt (2017 est.) | urban population: 60.3% of total population ... |
| 2 | Algeria | 135.9 million Mt (2017 est.) | urban population: 72.6% of total population ... |
| 3 | American Samoa | 361,100 Mt (2017 est.) | urban population: 87.2% of total population ... |
| 4 | Angola | 20.95 million Mt (2017 est.) | urban population: 65.5% of total population ... |


In [11]:
#pd.merge defaults to an inner join, so the 18 countries that didn't appear in both dataframes were dropped.
join1.shape

(214, 3)
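
To see which countries an inner join drops, one option is an outer merge with `indicator=True`, which adds a `_merge` column marking rows found in only one frame. The sketch below uses hypothetical toy frames (`left`, `right`) built from a few values shown above; it is an illustration, not the notebook's own check.

```python
import pandas as pd

# Hypothetical toy frames; in the notebook these would be `emissions` and `urban`.
left = pd.DataFrame({
    "Country": ["Albania", "Algeria", "Angola"],
    "CO2": ["4.5 million Mt", "135.9 million Mt", "20.95 million Mt"],
})
right = pd.DataFrame({
    "Country": ["Albania", "Algeria", "Andorra", "Angola"],
    "Urban": ["60.3%", "72.6%", "88.1%", "65.5%"],
})

# An outer join keeps every row; `_merge` records where each row came from.
both = pd.merge(left, right, on="Country", how="outer", indicator=True)

# Rows not marked 'both' are the ones an inner join would silently drop.
unmatched = both.loc[both["_merge"] != "both", ["Country", "_merge"]]
print(unmatched)
```

Here Andorra appears only in the right-hand frame, so it is the single row an inner join would discard.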

In [12]:
#2nd merge. Confirm that country data is lining up. 
data=pd.merge(join1,gdp,on='Country')
data.head()

| | Country | Carbon dioxide emissions from consumption of energy | Urbanization | GDP - per capita (PPP) |
|---|---|---|---|---|
| 0 | Afghanistan | 9.067 million Mt (2017 est.) | urban population: 25.5% of total population ... | $2,000 (2017 est.) $2,000 (2016 est.) $2,0... |
| 1 | Albania | 4.5 million Mt (2017 est.) | urban population: 60.3% of total population ... | $12,500 (2017 est.) $12,100 (2016 est.) $1... |
| 2 | Algeria | 135.9 million Mt (2017 est.) | urban population: 72.6% of total population ... | $15,200 (2017 est.) $15,200 (2016 est.) $1... |
| 3 | American Samoa | 361,100 Mt (2017 est.) | urban population: 87.2% of total population ... | $11,200 (2016 est.) $11,300 (2015 est.) $1... |
| 4 | Angola | 20.95 million Mt (2017 est.) | urban population: 65.5% of total population ... | $6,800 (2017 est.) $7,200 (2016 est.) $7,6... |


In [13]:
data.shape

(214, 4)
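
When several frames share the same key, the pairwise merges can also be written as one fold over a list. The sketch below assumes hypothetical toy frames (`a`, `b`, `c`) and adds `validate="one_to_one"`, which makes pandas raise an error if a country is duplicated in either side of a merge. It is an alternative formulation, not the approach used in this notebook.

```python
from functools import reduce

import pandas as pd

# Hypothetical toy frames sharing a 'Country' key.
a = pd.DataFrame({"Country": ["Albania", "Angola"],
                  "CO2": ["4.5 million Mt", "20.95 million Mt"]})
b = pd.DataFrame({"Country": ["Albania", "Angola"],
                  "Urban": ["60.3%", "65.5%"]})
c = pd.DataFrame({"Country": ["Albania", "Algeria", "Angola"],
                  "GDP": ["$12,500", "$15,200", "$6,800"]})

def merge_on_country(left, right):
    # Inner join on the shared key; fail loudly if the key is not unique.
    return pd.merge(left, right, on="Country", validate="one_to_one")

# Fold the list of frames into a single merged frame, left to right.
merged = reduce(merge_on_country, [a, b, c])
print(merged.shape)
```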

# 4. Renaming columns

In [14]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 214 entries, 0 to 213
Data columns (total 4 columns):
Country                                                214 non-null object
Carbon dioxide emissions from consumption of energy    214 non-null object
Urbanization                                           214 non-null object
GDP - per capita (PPP)                                 214 non-null object
dtypes: object(4)
memory usage: 8.4+ KB


In [15]:
data.columns

Index(['Country', 'Carbon dioxide emissions from consumption of energy',
       'Urbanization', 'GDP - per capita (PPP)'],
      dtype='object')

In [16]:
newNames=['Country','CO2 Emissions','Urbanization','GDP Per Capita']

In [17]:
nameChanges=dict(zip(data.columns,newNames))

In [18]:
data.rename(nameChanges,axis=1,inplace=True)

In [19]:
data.head()

| | Country | CO2 Emissions | Urbanization | GDP Per Capita |
|---|---|---|---|---|
| 0 | Afghanistan | 9.067 million Mt (2017 est.) | urban population: 25.5% of total population ... | $2,000 (2017 est.) $2,000 (2016 est.) $2,0... |
| 1 | Albania | 4.5 million Mt (2017 est.) | urban population: 60.3% of total population ... | $12,500 (2017 est.) $12,100 (2016 est.) $1... |
| 2 | Algeria | 135.9 million Mt (2017 est.) | urban population: 72.6% of total population ... | $15,200 (2017 est.) $15,200 (2016 est.) $1... |
| 3 | American Samoa | 361,100 Mt (2017 est.) | urban population: 87.2% of total population ... | $11,200 (2016 est.) $11,300 (2015 est.) $1... |
| 4 | Angola | 20.95 million Mt (2017 est.) | urban population: 65.5% of total population ... | $6,800 (2017 est.) $7,200 (2016 est.) $7,6... |
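
Zipping old and new names depends on the two lists being in the same order. A slightly more defensive variant is to name only the columns that change and let `rename` return a new frame. The sketch below assumes an empty hypothetical frame `toy` with the original headers, and `name_changes` and `renamed` are illustrative names.

```python
import pandas as pd

# Hypothetical empty frame carrying the original column headers.
toy = pd.DataFrame(columns=[
    "Country",
    "Carbon dioxide emissions from consumption of energy",
    "Urbanization",
    "GDP - per capita (PPP)",
])

# Map only the columns that change; unlisted columns keep their names.
name_changes = {
    "Carbon dioxide emissions from consumption of energy": "CO2 Emissions",
    "GDP - per capita (PPP)": "GDP Per Capita",
}

# Without inplace=True, rename returns a new frame and leaves `toy` untouched.
renamed = toy.rename(columns=name_changes)
print(list(renamed.columns))
```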


In [29]:
data.dtypes

Country           object
CO2 Emissions     object
Urbanization      object
GDP Per Capita    object
dtype: object

# 5. Simplifying contents

In [35]:
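#Keeping only the text before the first comma in each entry. Entries without a comma are left unchanged;
#values with a thousands separator (e.g. '361,100 Mt') are cut to '361', so this is not yet a clean number.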
emissionsnumber=[element.split(',')[0] for element in data.iloc[:,1]]

#Making the above list a new column:
data=data.assign(CO2EmissionsNumber=emissionsnumber)

#Checking:
data.head()

| | Country | CO2 Emissions | Urbanization | GDP Per Capita | CO2EmissionsNumber |
|---|---|---|---|---|---|
| 0 | Afghanistan | 9.067 million Mt (2017 est.) | urban population: 25.5% of total population ... | $2,000 (2017 est.) $2,000 (2016 est.) $2,0... | 9.067 million Mt (2017 est.) |
| 1 | Albania | 4.5 million Mt (2017 est.) | urban population: 60.3% of total population ... | $12,500 (2017 est.) $12,100 (2016 est.) $1... | 4.5 million Mt (2017 est.) |
| 2 | Algeria | 135.9 million Mt (2017 est.) | urban population: 72.6% of total population ... | $15,200 (2017 est.) $15,200 (2016 est.) $1... | 135.9 million Mt (2017 est.) |
| 3 | American Samoa | 361,100 Mt (2017 est.) | urban population: 87.2% of total population ... | $11,200 (2016 est.) $11,300 (2015 est.) $1... | 361 |
| 4 | Angola | 20.95 million Mt (2017 est.) | urban population: 65.5% of total population ... | $6,800 (2017 est.) $7,200 (2016 est.) $7,6... | 20.95 million Mt (2017 est.) |
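
The output above mixes two value formats: "9.067 million Mt" and "361,100 Mt". One possible way to get a single numeric column is a small regex-based parser that removes thousands separators and applies the "million" multiplier. The helper `parse_emissions` below is hypothetical and not part of the original notebook. Its "billion" entry is an assumption about formats that may occur further down the table.

```python
import re

# Multipliers for the scale words; 'billion' is an assumption, since only
# 'million' and unscaled values appear in the rows shown above.
_SCALE = {"million": 1e6, "billion": 1e9}

# Leading number (optionally with thousands separators), optional scale word, then 'Mt'.
_PATTERN = re.compile(r"\s*([\d,]*\.?\d+)\s*(million|billion)?\s*Mt")

def parse_emissions(text):
    """Return the emissions in Mt as a float, or None if the text does not match."""
    match = _PATTERN.match(text)
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    return number * _SCALE.get(match.group(2), 1.0)

for raw in ["9.067 million Mt (2017 est.)", "361,100 Mt (2017 est.)"]:
    print(raw, "->", parse_emissions(raw))
```

With a frame like `data`, the helper could be applied as `data['CO2 Emissions'].map(parse_emissions)`. Unparseable entries would come back as `None` and need separate handling.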
