# Introduction to Python Assignment 

Julia Marsh, University of Edinburgh 

### Instructions

Write a script/module/package in Python to do the following:

- Find the countries with the lowest and highest emissions in 1960, and similarly in 2018 (ignoring countries with no data for the given year). Print the names of the countries and their emissions for each year.

- Find and print the mean and standard deviation of the emissions across all countries in 1960 (ignoring countries that have no data for the given year). Do the same for 2018.

- Find the countries with the lowest and highest total emissions summed over all years (ignoring missing years for each country). Print their names and total emissions.

- Sum the total emissions for all countries in each region across all years. Print a list of the regions and their total emissions, ordered by their emissions. Do the same grouping the countries by income group.

- Similarly to problem 4, group the countries by region, then calculate the total emissions of each region over the years 1960-1969 inclusive. Do the same for each decade up to the 2010s (which will be 2010-2018 as there's no data for 2019). Then, for each decade from 1960s to 2010s, print the regions and their total emissions sorted by emissions. Do the same grouping the countries by income group.

Only modules from the Python standard library may be used; they are listed here: https://docs.python.org/3/library/index.html

In [1]:
#libraries 
import csv 
import statistics 
import itertools 

## Task 1 

In [2]:
# open the file with DictReader, which maps each row to a dict keyed by the header row
with open('PerCapitaC02Emissions.csv', newline='') as file:
    reader = csv.DictReader(file)

    # create empty lists to hold countries and emissions for 1960 and 2018
    countries_1960 = []
    emissions_1960 = []
    countries_2018 = []
    emissions_2018 = [] 

    # loop over all rows and append emission and country names 
    for row in reader: 
        # ignore null values 
        if row['1960'] != '':
            # append emission as a float so that max() function works
            emissions_1960.append(float(row['1960']))
            countries_1960.append(row['Country Name'])
        if row['2018'] != '':
            emissions_2018.append(float(row['2018']))
            countries_2018.append(row['Country Name'])

In [3]:
# ---------------------------------------------------------
# sanity check - check lengths of lists are the same 
# ---------------------------------------------------------
print("sanity check")
print("---------------")

print("2018 list lengths:", len(countries_2018), len(emissions_2018))
print("1960 list lengths:", len(countries_1960), len(emissions_1960))

sanity check
---------------
2018 list lengths: 190 190
1960 list lengths: 155 155


In [4]:
max_1960 = max(emissions_1960)                        # use max() to get maximum emissions 
max_1960_index = emissions_1960.index(max_1960)       # locate index of max value 
max_1960_country = countries_1960[max_1960_index]     # find corresponding value in country list 

max_2018 = max(emissions_2018)
max_2018_index = emissions_2018.index(max_2018)
max_2018_country = countries_2018[max_2018_index]

min_1960 = min(emissions_1960)
min_1960_index = emissions_1960.index(min_1960)
min_1960_country = countries_1960[min_1960_index]

min_2018 = min(emissions_2018)
min_2018_index = emissions_2018.index(min_2018)
min_2018_country = countries_2018[min_2018_index]

print("1960:")
print("  max emissions (in tons) was", max_1960_country, "with", max_1960)
print("  min emissions (in tons) was", min_1960_country, "with", min_1960)

print("\n2018:")
print("  max emissions (in tons) was", max_2018_country, "with", max_2018)
print("  min emissions (in tons) was", min_2018_country, "with", min_2018)

1960:
  max emissions (in tons) was Aruba with 204.631696428571
  min emissions (in tons) was Nepal with 0.00798352508545224

2018:
  max emissions (in tons) was Qatar with 32.4156391708326
  min emissions (in tons) was Congo, Dem. Rep. with 0.0261692628875174
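
The parallel-list lookup above can also be written by keeping each country and its value together as a pair, so that `min()` and `max()` return the name directly. A minimal sketch, assuming the same `Country Name` and year column headers; the `extremes` helper and the sample rows are made up for illustration and are not values from the real CSV:

```python
import csv
import io

# Made-up sample rows for illustration only; not values from the real CSV.
SAMPLE = """Country Name,1960,2018
Alpha,1.5,2.0
Beta,,0.5
Gamma,0.2,7.5
"""


def extremes(rows, year):
    """Return ((min_country, min_value), (max_country, max_value)) for a year."""
    # Keep name and value together so no index lookup is needed afterwards.
    pairs = [(row['Country Name'], float(row[year]))
             for row in rows if row[year] != '']
    lowest = min(pairs, key=lambda pair: pair[1])
    highest = max(pairs, key=lambda pair: pair[1])
    return lowest, highest


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(extremes(rows, '1960'))
print(extremes(rows, '2018'))
```

Reading the rows into a list once also means the file does not have to be reopened for later tasks.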


## Task 2 

In [5]:
mean_1960 = statistics.mean(emissions_1960)       # use statistics mean function 
stdev_1960 = statistics.stdev(emissions_1960)     # sample standard deviation (n - 1 denominator)
mean_2018 = statistics.mean(emissions_2018)
stdev_2018 = statistics.stdev(emissions_2018)

print("1960:")
print("  mean:", mean_1960)
print("  stdev:", stdev_1960)

print("\n2018:")
print("  mean:", mean_2018)
print("  stdev:", stdev_2018)

1960:
  mean: 3.396899049423549
  stdev: 16.87293142556571

2018:
  mean: 4.186036456835386
  stdev: 4.841174757182509
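
`statistics.stdev` computes the sample standard deviation (dividing by n - 1), while `statistics.pstdev` computes the population standard deviation (dividing by n). The assignment text does not say which one is wanted, so the choice above is a judgment call. The sketch below shows the difference on a small made-up data set; the numbers are illustrative and unrelated to the emissions file:

```python
import statistics

# Made-up values chosen so that the population variance is exactly 4.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = statistics.stdev(data)        # divides by n - 1
population_sd = statistics.pstdev(data)   # divides by n

print("sample stdev:    ", sample_sd)
print("population stdev:", population_sd)
```

With around 155 to 190 countries the two results differ only slightly. The 1960 standard deviation being several times the mean is consistent with one very large value, such as the 204.63 maximum found in Task 1.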


## Task 3

In [6]:
# Open file using DictReader 
with open('PerCapitaC02Emissions.csv', newline='') as file:
    reader = csv.DictReader(file)
    
    # empty lists for country names and total emissions 
    countries = []
    total_emissions = []
    # variable used to sum 
    sum_val = 0

    for row in reader: 
        
        # append country name 
        countries.append(row['Country Name'])
        
        # take the row's items from index 3 (the 1960 column) through the final column (2018)
        for item in list(itertools.islice(row.items(), 3, None)):    # item has form ('year', value)
            # ignore null values 
            if item[1] != '':                                        # item[1] locates the emission value 
                # sum the emissions for all years as floats 
                sum_val += float(item[1])
        
        # append the total value 
        total_emissions.append(sum_val)
        # reset to zero for next row 
        sum_val = 0
        
total_max = max(total_emissions)                      # use max() function on total_emissions list 
total_max_index = total_emissions.index(total_max)    # locate index of max value 
total_max_country = countries[total_max_index]        # locate corresponding country 

total_min = min(total_emissions)                      # use min() function on total_emissions list 
total_min_index = total_emissions.index(total_min)
total_min_country = countries[total_min_index]

print("Total emissions:")
print("  max (in tons) was", total_max_country, "with", total_max)
print("  min (in tons) was", total_min_country, "with", total_min)

Total emissions:
  max (in tons) was Aruba with 5522.3948655893755
  min (in tons) was South Sudan with 1.0339785252303502
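
The positional slice above depends on the year columns starting at index 3. A sketch of an alternative that selects year columns by name instead, assuming every year header consists only of digits (as `'1960'` and `'2018'` do); `total_emissions_for` and the sample row are hypothetical:

```python
def total_emissions_for(row):
    """Sum all non-empty year columns of one DictReader-style row."""
    return sum(float(value)
               for column, value in row.items()
               if column.isdigit() and value != '')


# Made-up row for illustration; the real file has one column per year.
sample_row = {
    'Country Name': 'Alpha',
    'Region': 'Example Region',
    'IncomeGroup': 'Low income',
    '1960': '1.5',
    '1961': '',      # missing year, skipped
    '1962': '2.5',
}

print(total_emissions_for(sample_row))
```

This keeps working if a metadata column is added or the columns are reordered.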


In [7]:
# ---------------------------------------------------------
# sanity check - check lengths of lists 
# ---------------------------------------------------------
print("sanity check")
print("---------------")

print("list lengths:", len(countries),len(total_emissions))

sum_total = 0
for value in total_emissions:
    sum_total += value
    
print("sum total:", sum_total)

sanity check
---------------
list lengths: 201 201
sum total: 49401.02883769895


## Task 4 

In [8]:
# function to identify the keys from the csv file 
# string argument is a key corresponding to a column name in the csv file 
def findKeys(string):
    
    with open('PerCapitaC02Emissions.csv', newline='') as file:
        reader = csv.DictReader(file)

        all_list = []          # list to hold all keys (including repeats)
        unique_list = []       # list to contain unique keys (remove repeats)

        for row in reader: 
            all_list.append(row[string])

        # remove repeats 
        unique_list = list(set(all_list))
        
        return unique_list

In [9]:
# identify the unique keys in the region column 
regions = findKeys('Region')
print(regions)

['Europe & Central Asia', 'South Asia', 'Sub-Saharan Africa', 'East Asia & Pacific', 'North America', 'Middle East & North Africa', 'Latin America & Caribbean']
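
Because `set` has no defined order, the list printed above can come out in a different order on another run. If a stable order matters, one option is `dict.fromkeys`, which drops repeats while keeping first-seen order (dictionaries preserve insertion order from Python 3.7). A minimal sketch; `unique_in_order` is a hypothetical helper name:

```python
def unique_in_order(values):
    """Return the distinct values in the order they first appear."""
    return list(dict.fromkeys(values))


# Made-up labels for illustration.
labels = ['South', 'North', 'South', 'East', 'North']
print(unique_in_order(labels))
```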


In [10]:
# function to return a dictionary mapping each group (e.g. region or income group) to its list of country names
# arguments: keys_list is a list of unique keys, groupBy_string is the name of the csv column to group by
def createDictionary(keys_list, groupBy_string):
    
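    # create a dictionary from the keys; note that fromkeys() gives every key the SAME list object,
    # which is safe here only because each key is rebound to a new list by update() below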
    dictionary = dict.fromkeys(keys_list, [])
    
    # loop over each key 
    for key in dictionary:

        # open the file as a dictionary 
        file = open('PerCapitaC02Emissions.csv', newline='')
        reader = csv.DictReader(file)

        # create a list to hold country names 
        list1 = []

        # loop over each row in the file
        for row in reader:
            # if the region of that row matches the key append the country name
            if row[groupBy_string] == key: 
                list1.append(row['Country Name'])

        # update the dictionary with list of country names for a given region 
        dictionary.update({key:list1})
        file.close()
        
    return dictionary
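
The function above reads the file once per key. The same grouping can be built in a single pass with `collections.defaultdict`, without needing the list of keys in advance. A sketch under the assumption that the rows have already been read into a list of dictionaries; `group_countries` and the sample rows are hypothetical:

```python
from collections import defaultdict


def group_countries(rows, column):
    """Map each distinct value in `column` to the list of country names."""
    groups = defaultdict(list)
    for row in rows:
        # A missing key is created with an empty list on first access.
        groups[row[column]].append(row['Country Name'])
    return dict(groups)


# Made-up rows for illustration only.
sample_rows = [
    {'Country Name': 'Alpha', 'Region': 'North'},
    {'Country Name': 'Beta', 'Region': 'South'},
    {'Country Name': 'Gamma', 'Region': 'North'},
]

print(group_countries(sample_rows, 'Region'))
```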

In [11]:
# create dictionary that is grouped by region 
dictionary = createDictionary(regions, 'Region')

In [12]:
# function to return the total emissions grouped by the dictionary keys 
# arguments: dictionary, list of countries, list containing total emissions 
def TotalEmissions(Dictionary, CountryList, TotalEmissionsList):
    
    # initialise an empty list to collect [key, total] pairs
    empty_list = []

    for key in Dictionary:                       # loop over keys in the dictionary 

        variable_sum = 0                         # variable for summing over 

        for country in Dictionary[key]:          # loop over countries for a given key 

            # find the index in the countries list and use in total_emissions list 
            index = CountryList.index(country)
            variable_sum += TotalEmissionsList[index]

        # append region (key) and total summed emission 
        empty_list.append([key, variable_sum])   
        
    # sort into ascending order by the total, i.e. index 1 of each [key, total] pair
    empty_list.sort(key=lambda tup: tup[1])

    return empty_list

In [13]:
# return total emissions by region
TotalEmissions(dictionary, countries, total_emissions)

[['South Asia', 219.1976101891688],
 ['Sub-Saharan Africa', 1939.531538668807],
 ['North America', 2233.6767946598297],
 ['East Asia & Pacific', 7094.603055682899],
 ['Latin America & Caribbean', 10537.126663155479],
 ['Middle East & North Africa', 10703.738299857918],
 ['Europe & Central Asia', 16673.15487548485]]
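
`list.index` scans the country list on every lookup and returns the first match only. A sketch of the same aggregation using a country-to-total dictionary instead; in the notebook such a dictionary could be built with `dict(zip(countries, total_emissions))`. The name `totals_by_group` and the sample values are hypothetical:

```python
def totals_by_group(groups, totals_by_country):
    """Return (group, total) pairs sorted by ascending total."""
    totals = [
        (name, sum(totals_by_country[country] for country in members))
        for name, members in groups.items()
    ]
    return sorted(totals, key=lambda pair: pair[1])


# Made-up groups and totals for illustration only.
sample_groups = {'North': ['Alpha', 'Beta'], 'South': ['Gamma']}
sample_totals = {'Alpha': 1.5, 'Beta': 2.5, 'Gamma': 1.0}

print(totals_by_group(sample_groups, sample_totals))
```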

In [14]:
# ---------------------------------------------------------
# sanity check - check total is the same for both parts 
# ---------------------------------------------------------

regions_total_emissions = TotalEmissions(dictionary, countries, total_emissions)
region_total = 0
for element in regions_total_emissions:
    region_total += element[1]

#### Grouping countries by income 

In [15]:
# find keys by income group 
income_groups = findKeys('IncomeGroup')
print(income_groups)

['Lower middle income', 'Low income', 'High income', 'Upper middle income']


In [16]:
# create a dictionary that is grouped by income 
income_dictionary = createDictionary(income_groups, 'IncomeGroup')

In [17]:
# return total emissions by income group 
TotalEmissions(income_dictionary, countries, total_emissions)

[['Low income', 582.7439992171331],
 ['Lower middle income', 2607.924706767708],
 ['Upper middle income', 8450.526479135613],
 ['High income', 37759.8336525785]]

In [18]:
# ---------------------------------------------------------
# sanity check - check total is the same for both parts 
# ---------------------------------------------------------

print("sanity check")
print("---------------")

income_total_emissions = TotalEmissions(income_dictionary, countries, total_emissions)
income_total = 0
for element in income_total_emissions:
    income_total += element[1]
    
print("By region sum total:", region_total)
print("By income sum total:", income_total)

sanity check
---------------
By region sum total: 49401.02883769895
By income sum total: 49401.02883769895
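
The two totals above happen to print identically, but floating-point sums taken in a different order can differ in the last digits. A tolerance-based check is a safer sanity test in general. A minimal sketch using `math.fsum` and `math.isclose`; `totals_agree` and the sample numbers are hypothetical:

```python
import math


def totals_agree(first, second, rel_tol=1e-9):
    """Return True if the two lists of floats sum to (nearly) the same value."""
    return math.isclose(math.fsum(first), math.fsum(second), rel_tol=rel_tol)


# Made-up partial sums for illustration only.
by_region = [0.1, 0.2, 0.3]
by_income = [0.3, 0.3]

print(totals_agree(by_region, by_income))
```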


## Task 5 

Group the countries by region, then calculate each region's total emissions for every decade (1960-69, 1970-79, ...). Print the regions and their totals in ascending order of emissions.


In [19]:
# decade offsets - each value shifts the islice() window along by ten columns
decades = [0, 10, 20, 30, 40, 50]

# total emissions will be appended for each decade
total_emissions_decades = []  # expect list to have length=6*201 as every country is appended once for each decade 
countries_decades = []        # list to keep track of countries - also expect len=6*201
arg_sum = 0                   # variable to sum over 
    
# loop over elements in decade list 
for value in decades:
    
    # open the file as a dictionary 
    file = open('PerCapitaC02Emissions.csv', newline='')
    reader = csv.DictReader(file)
    
    # set the start stop arguments for islice - change for each decade 
    lower = 3 + value
    upper = 13 + value
    
    # loop over each row 
    for row in reader: 
        
        countries_decades.append(row['Country Name'])
        
        # slice according to decade: islice's stop is exclusive, so 3..13 covers indices 3 (1960) to 12 (1969); add 10 for the 1970s, etc.
        for item in list(itertools.islice(row.items(), lower, upper)):
            if item[1] != '':                 # skip missing values
                arg_sum += float(item[1])
        
        # append the total emission for that decade 
        total_emissions_decades.append(arg_sum)
        arg_sum = 0
    
    file.close()
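
The loop above reopens the file for each decade and relies on column positions. A sketch of an alternative that derives the decade from the year header with integer division, so one pass over a row yields all its decade totals. It assumes the year headers are digit strings; `decade_totals` and the sample row are hypothetical:

```python
from collections import defaultdict


def decade_totals(row):
    """Return {decade_start_year: total} for one DictReader-style row."""
    totals = defaultdict(float)
    for column, value in row.items():
        if column.isdigit() and value != '':
            decade = int(column) // 10 * 10   # e.g. 1969 -> 1960, 2018 -> 2010
            totals[decade] += float(value)
    return dict(totals)


# Made-up row for illustration only.
sample_row = {
    'Country Name': 'Alpha',
    '1960': '1.0',
    '1969': '2.0',
    '1970': '',      # missing, skipped
    '1971': '0.5',
    '2018': '4.0',
}

print(decade_totals(sample_row))
```

A decade with no data at all gets no entry here, so callers that want a zero, as the loop above produces, can use `totals.get(decade, 0.0)`.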

In [20]:
# --------------------------------
# sanity check - each list should hold 6 x 201 entries (one per country per decade)
# --------------------------------

print("sanity check")
print("---------------")

print(201*6, len(countries_decades), len(total_emissions_decades))

sanity check
---------------
1206 1206 1206


In [21]:
# slice total_emissions_decades list to get new list for each decade 
total_emissions_60s = total_emissions_decades[0:201]
total_emissions_70s = total_emissions_decades[201:402]
total_emissions_80s = total_emissions_decades[402:603]
total_emissions_90s = total_emissions_decades[603:804]
total_emissions_00s = total_emissions_decades[804:1005]
total_emissions_10s = total_emissions_decades[1005:]
country_names = countries_decades[0:201]

# --------------------------------
# sanity check 
# --------------------------------

print("sanity check")
print("---------------")

print(len(total_emissions_60s),len(total_emissions_70s),len(total_emissions_80s),len(total_emissions_90s),
len(total_emissions_00s),len(total_emissions_10s))

sanity check
---------------
201 201 201 201 201 201
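
The hard-coded slice boundaries above can also be generated from the number of countries. A small sketch; `split_into_chunks` is a hypothetical helper, and in the notebook it would be called with `total_emissions_decades` and a chunk size of 201:

```python
def split_into_chunks(values, size):
    """Split a flat list into consecutive sub-lists of length `size`."""
    return [values[start:start + size]
            for start in range(0, len(values), size)]


# Made-up flat list standing in for six decades of per-country totals.
flat = list(range(6))
print(split_into_chunks(flat, 2))
```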


#### Total emissions grouped by region for each decade 

In [22]:
TotalEmissions(dictionary, country_names, total_emissions_60s)

[['South Asia', 14.006229136608765],
 ['Sub-Saharan Africa', 141.19461467495648],
 ['North America', 336.64037298379844],
 ['East Asia & Pacific', 612.2053854943763],
 ['Middle East & North Africa', 1474.9529534016938],
 ['Europe & Central Asia', 1883.4320464099462],
 ['Latin America & Caribbean', 2556.6363675215066]]

In [24]:
TotalEmissions(dictionary, country_names, total_emissions_70s)

[['South Asia', 13.885339258373193],
 ['Sub-Saharan Africa', 290.2534699809034],
 ['North America', 456.76106769885826],
 ['East Asia & Pacific', 1587.8528696121543],
 ['Middle East & North Africa', 2251.9756653193003],
 ['Europe & Central Asia', 2518.871636256716],
 ['Latin America & Caribbean', 3370.5322618335804]]

In [25]:
TotalEmissions(dictionary, country_names, total_emissions_80s)

[['South Asia', 21.388373625406963],
 ['Sub-Saharan Africa', 317.2505884887202],
 ['North America', 448.1656010532347],
 ['East Asia & Pacific', 1338.581405342321],
 ['Middle East & North Africa', 1722.9032856757535],
 ['Latin America & Caribbean', 2020.5984244073766],
 ['Europe & Central Asia', 2470.6329665252892]]

In [26]:
TotalEmissions(dictionary, country_names, total_emissions_90s)

[['South Asia', 35.20258929288999],
 ['Sub-Saharan Africa', 320.12948173330386],
 ['North America', 349.7013337749195],
 ['Latin America & Caribbean', 759.2621918495278],
 ['East Asia & Pacific', 1126.8641429120423],
 ['Middle East & North Africa', 1662.5503261891874],
 ['Europe & Central Asia', 3582.2017346543735]]

In [27]:
TotalEmissions(dictionary, country_names, total_emissions_00s)

[['South Asia', 56.527282263621046],
 ['North America', 358.82437300255924],
 ['Sub-Saharan Africa', 455.60725730112455],
 ['Latin America & Caribbean', 950.698010809086],
 ['East Asia & Pacific', 1237.18901134212],
 ['Middle East & North Africa', 1937.2744963737198],
 ['Europe & Central Asia', 3422.719897763057]]

In [28]:
TotalEmissions(dictionary, country_names, total_emissions_10s)

[['South Asia', 78.18779661226884],
 ['North America', 283.58404614645974],
 ['Sub-Saharan Africa', 415.0961264897986],
 ['Latin America & Caribbean', 879.3994067344038],
 ['East Asia & Pacific', 1191.910240979883],
 ['Middle East & North Africa', 1654.0815728982632],
 ['Europe & Central Asia', 2795.2965938754687]]

#### Total emissions grouped by income group for each decade

In [29]:
TotalEmissions(income_dictionary, country_names, total_emissions_60s)

[['Low income', 63.82076639324335],
 ['Lower middle income', 170.63442396633405],
 ['Upper middle income', 648.4237161755752],
 ['High income', 6136.189063087734]]

In [30]:
TotalEmissions(income_dictionary, country_names, total_emissions_70s)

[['Low income', 116.8300137598543],
 ['Lower middle income', 297.05628840056335],
 ['Upper middle income', 1091.8619361933079],
 ['High income', 8984.384071606162]]

In [31]:
TotalEmissions(income_dictionary, country_names, total_emissions_80s)

[['Low income', 141.91941105219456],
 ['Lower middle income', 331.1728242683064],
 ['Upper middle income', 1252.159142448615],
 ['High income', 6614.269267348988]]

In [32]:
TotalEmissions(income_dictionary, country_names, total_emissions_90s)

[['Low income', 96.75965360819824],
 ['Lower middle income', 559.4441862032044],
 ['Upper middle income', 1700.5230377723906],
 ['High income', 5479.184922822452]]

In [33]:
TotalEmissions(income_dictionary, country_names, total_emissions_00s)

[['Low income', 94.50833361391513],
 ['Lower middle income', 605.7322751977937],
 ['Upper middle income', 1874.6819788655791],
 ['High income', 5843.917741177997]]

In [34]:
TotalEmissions(income_dictionary, country_names, total_emissions_10s)

[['Low income', 68.90582078972756],
 ['Lower middle income', 643.8847087315065],
 ['Upper middle income', 1882.8766676801413],
 ['High income', 4701.888586535169]]
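
The per-decade results above appear as unlabelled lists, one cell per decade. A sketch of a small formatter that could be called in a loop to print each ranking under a heading; `format_ranking`, the two-decimal rounding and the sample values are assumptions made for illustration:

```python
def format_ranking(title, ranking):
    """Return a titled, one-line-per-group text block for a sorted ranking."""
    lines = [title]
    for name, value in ranking:
        lines.append(f"  {name}: {value:.2f}")
    return "\n".join(lines)


# Made-up ranking for illustration only.
sample_ranking = [('Low', 1.234), ('High', 10.0)]
print(format_ranking('1960s by income group', sample_ranking))
```

In the notebook, the `ranking` argument could be the list returned by `TotalEmissions` for one decade.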