# Climate Change
### Introduction

    We were assigned the Climate Change group. After some discussion, 
    we picked six initial questions, expecting that we would need 
    to drop some of them.
    
    These are the questions that we picked:
        1. How much does forest area impact CO2 emissions?
        2. How does population growth align with CO2 emissions?
        3. Do CO2 emissions impact the mortality rate?
        4. Does renewable energy use impact freshwater withdrawals?
        5. Does population density or growth impact freshwater withdrawals?
        6. Is there a negative correlation between CO2 emissions and renewable energy use?
    
    
    These are the datasets: 
        CO2 emissions (kt)
        https://data.worldbank.org/indicator/EN.ATM.CO2E.KT?view=chart
        
        Annual freshwater withdrawals, total (% of internal resources)
        https://data.worldbank.org/indicator/ER.H2O.FWTL.ZS?view=chart
        
        Mortality rate, under-5 (per 1000 live births)
        https://data.worldbank.org/indicator/SH.DYN.MORT?view=chart
    
        Renewable energy consumption (% of total final energy consumption)
        https://data.worldbank.org/indicator/EG.FEC.RNEW.ZS?view=chart
        
        Population (total) - might need to try a different version of this dataset
        https://data.worldbank.org/indicator/SP.POP.TOTL?view=chart
        
        Forest area (% of land area)
        https://data.worldbank.org/indicator/AG.LND.FRST.ZS?view=chart

### Extraction
    
    We downloaded the zipped CSV files, extracted them, and then renamed 
    and reorganized the files.
    
    We then used pandas to import the data into dataframes. 
     - We had to skip the first 3 rows in each file because they come before 
       the header row and are not relevant to the analysis.
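
To make the row skipping concrete, here is a minimal sketch using a synthetic, in-memory file. It assumes three metadata lines come before the header row and that every line ends with a trailing comma; the metadata text is made up for illustration and only the two Afghanistan values are taken from the CO2 data. Under those assumptions, the trailing comma would also explain the empty `Unnamed: 65` column that shows up in the real files.

```python
import io
import pandas as pd

# Synthetic stand-in for a downloaded file. The three metadata lines are
# hypothetical; the real files may word them differently.
raw = (
    "Data Source,Hypothetical metadata line 1\n"
    "Last Updated Date,Hypothetical metadata line 2\n"
    "Note,Hypothetical metadata line 3\n"
    "Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,\n"
    "Aruba,ABW,CO2 emissions (kt),EN.ATM.CO2E.KT,,,\n"
    "Afghanistan,AFG,CO2 emissions (kt),EN.ATM.CO2E.KT,414.371,491.378,\n"
)

# skiprows=3 drops the three metadata lines, so the fourth line is read as the header.
df = pd.read_csv(io.StringIO(raw), skiprows=3)

# The trailing comma on each line produces one extra, completely empty column.
print(list(df.columns))
print(df.shape)
```

Without `skiprows`, pandas would try to use the first metadata line as the header, so the year columns would not be recognized.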
    

#### Extraction Step 1 (Understand the dataset structure)

In [25]:
import pandas as pd
import os

# Sort the names: os.listdir() returns entries in arbitrary order.
files = sorted(f for f in os.listdir('datasets/') if f.endswith('.csv'))

for file in files:
    df = pd.read_csv(f'datasets/{file}', skiprows = 3)
    print(f'Dataset: {file}')
    print('Columns: ' + ", ".join(df.columns))
    print('Indicators: ' + ", ".join(df['Indicator Name'].unique()))
    print('Countries: ' + str(len(df['Country Name'].unique())))
    print("")

Dataset: co2.csv
Columns: Country Name, Country Code, Indicator Name, Indicator Code, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, Unnamed: 65
Indicators: CO2 emissions (kt)
Countries: 266

Dataset: forest_area.csv
Columns: Country Name, Country Code, Indicator Name, Indicator Code, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, Unnamed: 65
Indicators: Fores

#### Extraction Step 2 (Read in the Datasets)

In [31]:
# Positional indexing assumes `files` is in a fixed, known order (co2, forest_area, ...).
pre_path = 'datasets/'
co2_df = pd.read_csv(pre_path + files[0], skiprows = 3)
forest_area_df = pd.read_csv(pre_path + files[1], skiprows = 3)
freshwater_df = pd.read_csv(pre_path + files[2], skiprows = 3)
mortality_rate_under5_df = pd.read_csv(pre_path + files[3], skiprows = 3)
renewable_consumption_df = pd.read_csv(pre_path + files[4], skiprows = 3)
total_population_df = pd.read_csv(pre_path + files[5], skiprows = 3)
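
Because indexing `files` by position depends on the order of the directory listing, a possible alternative is to key each dataframe by its file name. The sketch below is one way this could look, not the approach used in this notebook; `load_indicator_frames` is a hypothetical helper name, and it assumes every file needs the same `skiprows` value.

```python
from pathlib import Path

import pandas as pd


def load_indicator_frames(folder, skiprows=3):
    """Read every CSV in `folder` into a dict keyed by file name without '.csv'."""
    frames = {}
    # sorted() keeps the iteration order stable across operating systems.
    for path in sorted(Path(folder).glob("*.csv")):
        frames[path.stem] = pd.read_csv(path, skiprows=skiprows)
    return frames
```

With this helper, `frames = load_indicator_frames('datasets/')` followed by `frames['co2']` would pick the CO2 data by name, so adding or renaming a file cannot silently shift which dataset ends up in which variable.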


### Transformations

    The first step was to remove the irrelevant columns (those with "Code" in their name) from the data.

In [33]:
co2_df = co2_df[[_ for _ in co2_df.columns if "Code" not in _]]
forest_area_df = forest_area_df[[_ for _ in forest_area_df.columns if "Code" not in _]]
freshwater_df = freshwater_df[[_ for _ in freshwater_df.columns if "Code" not in _]]
mortality_rate_under5_df = mortality_rate_under5_df[[_ for _ in mortality_rate_under5_df.columns if "Code" not in _]]
renewable_consumption_df = renewable_consumption_df[[_ for _ in renewable_consumption_df.columns if "Code" not in _]]
total_population_df = total_population_df[[_ for _ in total_population_df.columns if "Code" not in _]]
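
The same column filter is repeated for each dataframe, so it could be wrapped in a small helper. The sketch below uses a hypothetical function name, `drop_irrelevant_columns`, and a tiny synthetic frame. As an optional extension that the cell above does not perform, it also drops any `Unnamed` column that is completely empty, on the assumption that such a column carries no information.

```python
import pandas as pd


def drop_irrelevant_columns(df):
    """Return a copy without '...Code' columns and without all-empty 'Unnamed' columns."""
    keep = [
        col for col in df.columns
        if "Code" not in col
        and not (col.startswith("Unnamed") and df[col].isna().all())
    ]
    return df[keep].copy()


# Tiny synthetic frame that mimics the column layout (values are illustrative).
sample = pd.DataFrame({
    "Country Name": ["Aruba", "Angola"],
    "Country Code": ["ABW", "AGO"],
    "Indicator Name": ["CO2 emissions (kt)"] * 2,
    "Indicator Code": ["EN.ATM.CO2E.KT"] * 2,
    "1960": [None, 550.05],
    "Unnamed: 65": [None, None],
})

cleaned = drop_irrelevant_columns(sample)
print(list(cleaned.columns))  # ['Country Name', 'Indicator Name', '1960']
```

Returning a copy avoids pandas warnings about modifying a slice if the cleaned frame is edited later.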

In [34]:
co2_df

,Country Name,Indicator Name,1960,1961,1962,1963,1964,1965,1966,1967,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,Unnamed: 65
0,Aruba,CO2 emissions (kt),,,,,,,,,...,,,,,,,,,,
1,Africa Eastern and Southern,CO2 emissions (kt),118545.901306,123758.90333,128093.897815,132810.33253,144345.352398,155803.780096,157932.257312,165066.04049,...,559333.857277,580510.924989,601860.163983,586385.004029,592299.593959,601323.394691,600351.133333,,,
2,Afghanistan,CO2 emissions (kt),414.371000,491.37800,689.396000,707.73100,839.743000,1008.425000,1092.766000,1283.45000,...,10450.000000,8510.000000,7810.000000,7990.000000,7390.000000,7380.000000,7440.000000,,,
3,Africa Western and Central,CO2 emissions (kt),8760.463000,9376.51900,9710.216000,11540.04900,13985.938000,19827.469000,21246.598000,21239.26400,...,181740.000000,191990.000000,198440.000000,193060.000000,195120.000000,201900.000000,224380.000000,,,
4,Angola,CO2 emissions (kt),550.050000,454.70800,1180.774000,1151.43800,1224.778000,1188.108000,1554.808000,993.75700,...,30250.000000,32820.000000,34630.000000,35160.000000,35410.000000,30840.000000,27340.000000,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
261,Kosovo,CO2 emissions (kt),,,,,,,,,...,,,,,,,,,,
262,"Yemen, Rep.",CO2 emissions (kt),58.672000,73.34000,69.673000,80.67400,99.009000,102.676000,99.009000,102.67600,...,19680.000000,26350.000000,26710.000000,14210.000000,10880.000000,10060.000000,9310.000000,,,
263,South Africa,CO2 emissions (kt),97934.569000,102213.95800,105767.281000,109826.65000,119657.877000,128260.659000,128356.001000,133885.83700,...,426710.000000,436870.000000,447980.000000,424880.000000,425180.000000,435140.000000,433250.000000,,,
264,Zambia,CO2 emissions (kt),,,,,3278.298000,3916.356000,3501.985000,4792.76900,...,4020.000000,4240.000000,4800.000000,5070.000000,5590.000000,6990.000000,7740.000000,,,
