# A. Guidelines
## A1) Your answers
- Only include the **answers to the posed questions** in this notebook, not any additional analysis that you may have performed.
- It is **not necessary to comment** the code or results, as long as they are correct and speak for themselves. However, commenting can sometimes be helpful to clarify why you take a particular approach, or in case you note that your results are not entirely correct.

## A2) Grading
- Correct results can be obtained in many different ways.
- Results may depend on the data cleaning approach or on certain assumptions. Hence, different sets of results may be counted as "correct". I will check your code, not only the output.
- Follow-up errors are not counted as errors. I will check your code, and evaluate to which degree the code is correct.

# B. Challenge

## Problem description
The topic of this challenge is countries' trade statistics. You will need to request trade data from the **Comtrade data extraction API** of the United Nations (UN) and further process, analyze and visualize the data. The following data-related aspects deserve particular mention:

- **API documentation**: The UN Comtrade data extraction API is described in detail here: https://comtrade.un.org/data/doc/api. You will need to carefully study this API and the provided explanations.
- **Interactive exploration**: To better understand the data and the API parameters, you can also use the (1) [data selection interface](https://comtrade.un.org/Data/.) or (2) [run API queries interactively](https://comtrade.un.org/api/swagger/ui/index#!/Data/Data_GetData).
- **No registration**: This is a public API. Hence, you can use it as a guest user, i.e. without registration and authentication (which would cost money).
- **Rate limits**: Different kinds of rate limits apply, especially for guest users. For instance, you may only run one request per second. The limits are precisely described on the web site. Some of your tasks can be solved without special efforts within the given rate limits. However, some of the tasks may require that you explicitly deal with the limits. For instance, you may need to force Python to wait for 1 second before running the next request.
- **Default values**: Note that most of the parameters have a certain default value, as specified in the API documentation. If you don't explicitly define the parameter in your query, the default value is assumed. Throughout the entire notebook, you can ignore (i.e. stick to the default values of) the following parameters:
    - `fmt`: default data format is JSON.
    - `type`: default trade data type is commodity trade (as opposed to services trade)
    - `freq`: default frequency is annual data (as opposed to monthly data)
    - `px`: default classification of commodities is called HS (Harmonised System)
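
The query parameters map directly onto a URL query string. The following is a minimal sketch, assuming a hypothetical helper `build_comtrade_url` (not part of the API or of this notebook), that assembles a request URL with `urllib.parse.urlencode`. The parameters `fmt`, `type`, `freq` and `px` are simply omitted so that the API defaults apply.

```python
from urllib.parse import urlencode

BASE_URL = 'https://comtrade.un.org/api/get'  # endpoint used throughout this notebook


def build_comtrade_url(r, ps, p, rg, cc):
    """Hypothetical helper: assemble a Comtrade query URL from its parameters."""
    # Parameters not listed here (fmt, type, freq, px) fall back to the API defaults.
    params = {'r': r, 'ps': ps, 'p': p, 'rg': rg, 'cc': cc}
    return BASE_URL + '?' + urlencode(params)


url = build_comtrade_url(r=804, ps=2021, p='all', rg=2, cc='TOTAL')
print(url)
```

Building the query string from a dictionary avoids typos in hand-written URLs and makes it easy to vary a single parameter between requests.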

## 1. Ukraine's Exports by Partner Country (10 points)


**Request** the total (cc='TOTAL') exports (rg=2) of reporting country Ukraine (r=804) to all available partner countries/regions (p='all') in the year 2021 (ps=2021), and turn it into a Pandas DataFrame.

In [1]:
import pandas as pd
import json
from urllib.request import urlopen
import urllib.parse
import time
import plotly.express as px

In [2]:
url='https://comtrade.un.org/api/get?r=804&ps=2021&p=all&rg=2&cc=TOTAL'

f = urlopen(url)
j = json.load(f)
df = pd.DataFrame((j['dataset']))
df

Unnamed: 0,pfCode,yr,period,periodDesc,aggrLevel,IsLeaf,rgCode,rgDesc,rtCode,rtTitle,...,qtAltCode,qtAltDesc,TradeQuantity,AltQuantity,NetWeight,GrossWeight,TradeValue,CIFValue,FOBValue,estCode
0,H5,2021,2021,2021,0,0,2,Export,804,Ukraine,...,,,0,,,,65870275510,,,4
1,H5,2021,2021,2021,0,0,2,Export,804,Ukraine,...,,,0,,0.0,,4553747,,,0
2,H5,2021,2021,2021,0,0,2,Export,804,Ukraine,...,,,0,,0.0,,39768416,,,0
3,H5,2021,2021,2021,0,0,2,Export,804,Ukraine,...,,,0,,0.0,,410228000,,,4
4,H5,2021,2021,2021,0,0,2,Export,804,Ukraine,...,,,0,,0.0,,107648,,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193,H5,2021,2021,2021,0,0,2,Export,804,Ukraine,...,,,0,,0.0,,402821126,,,4
194,H5,2021,2021,2021,0,0,2,Export,804,Ukraine,...,,,0,,0.0,,5042855,,,0
195,H5,2021,2021,2021,0,0,2,Export,804,Ukraine,...,,,0,,0.0,,234705137,,,0
196,H5,2021,2021,2021,0,0,2,Export,804,Ukraine,...,,,0,,0.0,,461095,,,0


**Process the DataFrame in the following way:**
- Turn the column names into lower case letters
- Keep only the following columns `yr`,`rtcode`, `rttitle`, `ptcode`, `pttitle`, `cmdcode`, `cmddesce`, `rgcode`, `rgdesc`, `tradevalue`
- Sort the data in descending order of the `tradevalue`


Then display Ukraine's top 10 export partner countries in terms of the `tradevalue`.


In [3]:
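df.columns = df.columns.str.lower()
df = df[['yr', 'rtcode', 'rttitle', 'ptcode', 'pttitle', 'cmdcode', 'cmddesce', 'rgcode', 'rgdesc', 'tradevalue']]
df = df.sort_values('tradevalue', ascending=False)
# exclude the aggregate partner "World" by name rather than by its position in the sorted data
df1 = df[df['pttitle'] != 'World'].head(10).copy()
df1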

Unnamed: 0,yr,rtcode,rttitle,ptcode,pttitle,cmdcode,cmddesce,rgcode,rgdesc,tradevalue
39,2021,804,Ukraine,156,China,TOTAL,All Commodities,2,Export,7992491765
145,2021,804,Ukraine,616,Poland,TOTAL,All Commodities,2,Export,4979134699
182,2021,804,Ukraine,792,Turkey,TOTAL,All Commodities,2,Export,3999617927
150,2021,804,Ukraine,643,Russian Federation,TOTAL,All Commodities,2,Export,3349119521
89,2021,804,Ukraine,381,Italy,TOTAL,All Commodities,2,Export,3240255598
71,2021,804,Ukraine,276,Germany,TOTAL,All Commodities,2,Export,2789957319
161,2021,804,Ukraine,699,India,TOTAL,All Commodities,2,Export,2513804631
126,2021,804,Ukraine,528,Netherlands,TOTAL,All Commodities,2,Export,2128423194
187,2021,804,Ukraine,818,Egypt,TOTAL,All Commodities,2,Export,1909977890
169,2021,804,Ukraine,724,Spain,TOTAL,All Commodities,2,Export,1639830286


**Visualize the data in the following way:**
- Visualize the exports to Ukraine's top 10 partner countries using a bar chart. Make sure that you exclude the partner country "World" from the data. Also make sure that the visualization is easy to read and includes a title and axis labels.

In [4]:
df1['tradevalue'] = df1['tradevalue'] / 1e9

px.bar(df1,
    x='pttitle', 
    y='tradevalue', 
    title='Ukraine Top 10 Export Partners 2021',
    labels={'tradevalue': 'Export Value [billion USD]', 'pttitle': 'Country'}
    )

## 2. Ukraine's Exports by Product (10 points)

**What are Ukraine's 10 most important export products in 2021? To answer the question, retrieve relevant data from the API and visualize it adequately.** 

Specifically:

- We need to agree on a common definition of products. For our purpose, set the classification code to `cc='AG4'`. This aggregation level 4 of the Harmonised System (HS) distinguishes between more than 1200 different products. (Aggregation level 2 would be coarser; aggregation level 6 would be more detailed.)
- Do not distinguish between different partner countries; we are interested in Ukraine's exports to the entire world on aggregate

In [5]:
url='https://comtrade.un.org/api/get?r=804&ps=2021&p=0&rg=2&cc=AG4' 

f = urlopen(url)
j = json.load(f)
df = pd.DataFrame((j['dataset']))
df.columns = df.columns.str.lower()
df

Unnamed: 0,pfcode,yr,period,perioddesc,aggrlevel,isleaf,rgcode,rgdesc,rtcode,rttitle,...,qtaltcode,qtaltdesc,tradequantity,altquantity,netweight,grossweight,tradevalue,cifvalue,fobvalue,estcode
0,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,0,,4.378890e+05,,5903700,,,2
1,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,0,,,,3750267,,,6
2,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,0,,1.337978e+07,,3090679,,,2
3,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,0,,,,13481189,,,6
4,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,0,,4.819185e+07,,86788173,,,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1120,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,1223,,1.223000e+03,,126638,,,0
1121,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,3262,,2.289837e+06,,560932,,,0
1122,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,3069660,,1.969730e+09,,676446031,,,2
1123,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,873,,0.000000e+00,,611820,,,0


In [6]:
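# keep exactly the 10 products with the highest export value
df1 = df.sort_values('tradevalue', ascending=False).head(10).copy()
df1['tradevalue'] = df1['tradevalue'] / 1e9  # convert USD to billion USD
df1['cmddesce'] = df1['cmddesce'].str[:34]   # shorten long product names for the axis labels
df1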

Unnamed: 0,pfcode,yr,period,perioddesc,aggrlevel,isleaf,rgcode,rgdesc,rtcode,rttitle,...,qtaltcode,qtaltdesc,tradequantity,altquantity,netweight,grossweight,tradevalue,cifvalue,fobvalue,estcode
512,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,45063372163,,45063370000.0,,6.810644,,,0
431,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,5161205431,,5161205000.0,,6.310573,,,0
393,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,24539480637,,24539480000.0,,5.854587,,,6
389,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,19394934690,,19394930000.0,,4.722745,,,0
932,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,6800833269,,6800833000.0,,3.888485,,,0
933,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,4619926279,,4619926000.0,,3.436732,,,0
1075,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,77797380,,77797380.0,,1.625403,,,6
927,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,3212719370,,3212719000.0,,1.576713,,,0
407,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,2328268065,,2328268000.0,,1.359008,,,0
481,H5,2021,2021,2021,4,0,2,Export,804,Ukraine,...,,,4345749214,,4345749000.0,,1.275725,,,0


In [7]:
px.bar(df1,
    x='cmddesce',
    y='tradevalue', 
    title="Ukraine's Top 10 Export Products 2021",
    labels={'tradevalue': 'Export Value [billion USD]', 'cmddesce': 'Commodity'}
    )

## 3. Dependency of the World on Ukraine's Exports (10 points)

**Currently, many supermarkets around the globe are running out of products such as "cooking oil". This is said to be a consequence of the war in Ukraine and of the fact that Ukraine supplies a significant fraction of the world's total exports of cooking oil. Your task is to analyze this claim.** 

Specifically: 

- Consider the product `cc=1512` ("Sun-flower seed, safflower or cotton-seed oil and their fractions; whether or not refined, but not chemically modified")
- Retrieve the exports of this product of each individual country x to the entire World
- Calculate what fraction of the total exports is due to country x
- Visualize these fractions for the top 10 exporter countries.
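
The share calculation described in these steps can be sketched on a small made-up table. The helper name `export_shares`, the `share` column and the toy values are assumptions for illustration only and do not come from the Comtrade data.

```python
import pandas as pd

# Made-up export values for illustration only; not real Comtrade data.
toy = pd.DataFrame({
    'rttitle': ['A', 'B', 'C', 'D'],
    'tradevalue': [50, 30, 15, 5],
})


def export_shares(df, top_n=10):
    """Return the top_n exporters' shares of the total plus one row for all others."""
    ranked = df.sort_values('tradevalue', ascending=False)
    # fraction of total exports that is due to each country
    shares = ranked['tradevalue'] / ranked['tradevalue'].sum()
    out = pd.DataFrame({'rttitle': ranked['rttitle'], 'share': shares})
    top = out.head(top_n)
    # combined share of every country outside the top_n
    rest = out['share'].iloc[top_n:].sum()
    other = pd.DataFrame({'rttitle': ['All other countries'], 'share': [rest]})
    return pd.concat([top, other], ignore_index=True)


result = export_shares(toy, top_n=2)
print(result)
```

Keeping a separate row for the remaining countries means the shares still add up to 1, so a pie chart of the result shows each top exporter's share of the world total rather than its share among the top exporters only.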

In [8]:
# Retrieve the exports of this product of each individual country x to the entire World

url = 'https://comtrade.un.org/api/get?r=all&ps=2021&p=0&rg=2&cc=1512'

f = urlopen(url)
j = json.load(f)
df = pd.DataFrame((j['dataset']))
df.columns = df.columns.str.lower()
df = df[['yr', 'rtcode', 'rttitle', 'ptcode', 'pttitle', 'cmdcode', 'cmddesce', 'rgcode', 'rgdesc', 'tradevalue']].sort_values('tradevalue', ascending=False)
df.head()

Unnamed: 0,yr,rtcode,rttitle,ptcode,pttitle,cmdcode,cmddesce,rgcode,rgdesc,tradevalue
75,2021,804,Ukraine,0,World,1512,"Sun-flower seed, safflower or cotton-seed oil ...",2,Export,6310573339
62,2021,643,Russian Federation,0,World,1512,"Sun-flower seed, safflower or cotton-seed oil ...",2,Export,3105016420
74,2021,792,Turkey,0,World,1512,"Sun-flower seed, safflower or cotton-seed oil ...",2,Export,937160152
53,2021,528,Netherlands,0,World,1512,"Sun-flower seed, safflower or cotton-seed oil ...",2,Export,856155347
32,2021,348,Hungary,0,World,1512,"Sun-flower seed, safflower or cotton-seed oil ...",2,Export,702935614


In [9]:
df1 = df[['rttitle', 'tradevalue']].copy()
df1 = df1.sort_values('tradevalue', ascending=False)
# each country's share of total world exports
df1['tradevalue'] = df1['tradevalue'] / df1['tradevalue'].sum()
rest_of_world = df1['tradevalue'][10:].sum()
df1 = df1[:10]

df2 = pd.DataFrame({'rttitle': ['All other countries'], 'tradevalue': [rest_of_world]})

df1 = pd.concat([df1, df2])

fig = px.pie(df1, 
       values='tradevalue', 
       names='rttitle', 
       title='Top 10 Cooking Oil Exporting Countries 2021 (Share of World Exports)',
       labels={'tradevalue': 'Share of world exports', 'rttitle': 'Country'}
       )

fig.show()

## 4. Dependency of the World Function (10 points)

**Create a Python function `world_dependency_plot` that is able to answer the previous question - as well as corresponding questions for different products and different years**.

Specifically,  

- the function should have the following two input parameters: 
    - `cc`: single product code (e.g. 1512)
    - `ps`: single year (e.g. 2021)
- the function should return a visualization as specified in the previous exercise
- the title of the visualization should contain the product name
- the function should include a docstring that details what the function does, and what its inputs and outputs are.

Test your function using `world_dependency_plot(ps=2020, cc=8703)`

In [10]:
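def world_dependency_plot(cc: int, ps: int):
    """Plot each country's share of world exports of one product in one year.

    Requests the exports of product `cc` from all reporting countries to the
    World in year `ps` from the Comtrade API and shows the shares of the top 10
    exporters, plus the combined share of all other countries.

    Args:
        cc (int): classification code / https://comtrade.un.org/Data/cache/classificationHS.json
        ps (int): single year / YYYY

    Returns:
        plotly.graph_objects.Figure: pie chart of the export shares
    """
    url = 'https://comtrade.un.org/api/get?'
    params = {'r': 'all', 'ps': ps, 'p': 0, 'rg': 2, 'cc': cc}
    url = url + urllib.parse.urlencode(params)
    f = urlopen(url)
    j = json.load(f)
    df = pd.DataFrame(j['dataset'])
    df.columns = df.columns.str.lower()

    # share of each reporting country in the total export value
    df1 = df[['rttitle', 'tradevalue']].sort_values('tradevalue', ascending=False)
    df1['tradevalue'] = df1['tradevalue'] / df1['tradevalue'].sum()

    # keep the top 10 and collapse the remaining countries into one slice
    rest_of_world = df1['tradevalue'][10:].sum()
    df2 = pd.DataFrame({'rttitle': ['All other countries'], 'tradevalue': [rest_of_world]})
    df1 = pd.concat([df1[:10], df2])

    # product name: first clause of the commodity description
    product = str(df['cmddesce'].iloc[0]).split(';')[0]
    title = str(ps) + ' Top 10 Global Exporters, Share of World Exports: ' + product

    fig = px.pie(df1,
    values='tradevalue', 
    names='rttitle', 
    title=title,
    labels={'tradevalue': 'Share of world exports', 'rttitle': 'Country'})
    return fig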

In [11]:
world_dependency_plot(ps=2020, cc=8703)

## 5. Requesting data for many years (10 points)

The API poses some limitations:

- The Comtrade API only allows querying data for up to 5 years per API request. 
- Another problem is that you have to provide one string containing comma-separated values. (For instance, you can set `ps='2010,2011,2012,2013,2014'`.) 
- Another problem is that only 1 request per second can be carried out.
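
The first two limitations can be handled together by splitting the years into groups of at most five and joining each group with commas. A minimal sketch, assuming a hypothetical helper `chunk_years` that is not part of the API:

```python
def chunk_years(years, size=5):
    """Hypothetical helper: split years into comma-separated strings of at most `size` years."""
    years = [str(y) for y in years]
    # slice the list into consecutive blocks; the last block may be shorter
    return [','.join(years[i:i + size]) for i in range(0, len(years), size)]


print(chunk_years(range(2010, 2020)))
# ['2010,2011,2012,2013,2014', '2015,2016,2017,2018,2019']
print(chunk_years([2010, 2015, 2020]))
# ['2010,2015,2020']
```

Each resulting string can be passed as the `ps` parameter of one request. Slicing does not pad the last group, so no fill values need to be removed afterwards.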

**Write a Python function `comtrade_many_years` that is able to download and process data for an arbitrary number of years. This implies that you need to split the desired request into multiple requests, the results of which are finally concatenated to one large dataset.**

Specifically, 

- the function must accept a list, tuple or range of years as input (e.g. `[2010, 2015, 2020]` or `range(2000, 2021)`) and no limitations should be placed on the number of requestable years.
- the function must also accept the parameters `r` (reporting country), `p` (partner country), `cc` (classification code) and `rg` (trade regime), but here the standard Comtrade rate limits and formatting rules may apply
- the function should return 1 "long" dataset that contains data for all requested years.
- the function should include a docstring that details what the function does, and what its inputs and outputs are.
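
The limit of one request per second can be respected without delaying the first request by remembering when the previous request was made. The class `Throttle` below is a hypothetical helper written for illustration; its `clock` and `sleep` arguments are assumptions that only make the waiting logic easy to check without real delays.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum interval between successive calls."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=1.0)
for years in ['2010,2011,2012,2013,2014', '2015,2016,2017,2018,2019']:
    throttle.wait()  # returns immediately the first time, waits about 1 s afterwards
    print('would request ps=' + years)
```

Compared with an unconditional `time.sleep(1)`, this only waits for the part of the second that has not already been spent on downloading and processing the previous response.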

Test your function using: `comtrade_many_years(ps=range(2010,2020), r=276, p='all', cc='TOTAL', rg=2)`

In [12]:
from itertools import zip_longest

In [13]:
# copied from: https://docs.python.org/3/library/itertools.html#itertools-recipes

def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')

In [14]:
def comtrade_many_years(ps, r, p, cc, rg):
    """Download Comtrade data for an arbitrary number of years.

    The years are split into chunks of at most 5 (the API limit per request).
    One request is sent per chunk, with a 1-second pause between requests, and
    the results are concatenated.

    Args:
        ps (iterable of int, or int): YYYY or YYYYMM
        r: reporting area / https://comtrade.un.org/Data/cache/reporterAreas.json
        p: partner area / https://comtrade.un.org/Data/cache/partnerAreas.json
        cc: classification code / https://comtrade.un.org/Data/cache/classificationHS.json
        rg: trade flow / https://comtrade.un.org/Data/cache/tradeRegimes.json

    Returns:
        pandas.DataFrame: one "long" dataset with the rows of all requested years
    """
    if isinstance(ps, int):
        ps = [ps]

    url = 'https://comtrade.un.org/api/get?'
    dfs = []
    chunks = grouper(iterable=ps, n=5)

    for i, chunk in enumerate(chunks):
        if i > 0:
            # wait before every request after the first (rate limit: 1 request per second)
            time.sleep(1)

        # drop the fill values (None) that pad the last chunk,
        # convert the years to strings and join them with ','
        years = ','.join(str(year) for year in chunk if year is not None)

        # encode link parameters
        params = {'r': r, 'ps': years, 'p': p, 'rg': rg, 'cc': cc}
        link = url + urllib.parse.urlencode(params)

        # fetch data
        f = urlopen(link)
        j = json.load(f)
        df = pd.DataFrame(j['dataset'])
        dfs.append(df)

    # concatenate into single dataframe
    df = pd.concat(dfs)
    return df

In [15]:
comtrade_many_years(ps=range(2010,2020), r=276, p='all', cc='TOTAL', rg=2)

Unnamed: 0,pfCode,yr,period,periodDesc,aggrLevel,IsLeaf,rgCode,rgDesc,rtCode,rtTitle,...,qtAltCode,qtAltDesc,TradeQuantity,AltQuantity,NetWeight,GrossWeight,TradeValue,CIFValue,FOBValue,estCode
0,H4,2012,2012,2012,0,0,2,Export,276,Germany,...,,,0,,,,1410146320662,,,4
1,H4,2012,2012,2012,0,0,2,Export,276,Germany,...,,,0,,0.0,,380125284,,,4
2,H4,2012,2012,2012,0,0,2,Export,276,Germany,...,,,0,,0.0,,232800352,,,4
3,H4,2012,2012,2012,0,0,2,Export,276,Germany,...,,,0,,0.0,,851111,,,4
4,H4,2012,2012,2012,0,0,2,Export,276,Germany,...,,,0,,0.0,,2451300572,,,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1160,H4,2016,2016,2016,0,0,2,Export,276,Germany,...,,,0,,0.0,,102500,,,0
1161,H4,2016,2016,2016,0,0,2,Export,276,Germany,...,,,0,,0.0,,1298990,,,0
1162,H4,2016,2016,2016,0,0,2,Export,276,Germany,...,,,0,,0.0,,123298365,,,4
1163,H4,2016,2016,2016,0,0,2,Export,276,Germany,...,,,0,,0.0,,61514118,,,4
