First time use

Menu: Terminal -> New Terminal
pip install -r requirements.txt

# Testing the Comtrade API

Site: https://comtradedeveloper.un.org
Methodology guide: https://comtrade.un.org/data/MethodologyGuideforComtradePlus.pdf


API specs: https://comtradedeveloper.un.org/api-details#api=comtrade-v1&operation=get-get



## HS code descriptions

Get the table from https://github.com/datasets/harmonized-system/blob/master/data/harmonized-system.csv

Copy it to the `support` directory.


### Read the table and build the decoding dictionaries

In [121]:
import pandas as pd

# read the table; keep codes as strings so leading zeros ("01") are preserved
hs_codes_df = pd.read_csv('support/harmonized-system.csv', dtype={'hscode': str})
hs_codes_map = dict(zip(hs_codes_df.hscode, hs_codes_df.description))  # dict for decoding
hs_codes_l2 = hs_codes_df[hs_codes_df.level == 2]  # subset of level 2 codes
hs_l2_map = dict(zip(hs_codes_l2.hscode, hs_codes_l2.description))  # dict for decoding level 2 codes
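
As a minimal sketch of how such decoding dictionaries behave, the example below builds the same two maps from a tiny inline sample instead of the real file. The sample rows and the `decode` helper are illustrative assumptions; only the column names `hscode`, `description` and `level` come from the cell above.

```python
import csv
import io

# Illustrative sample with the three columns used above (not the real file).
SAMPLE = """hscode,description,level
01,Live animals (sample text),2
0101,"Horses, asses, mules (sample text)",4
27,Mineral fuels (sample text),2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Full map: every code -> description. Codes stay strings so "01" keeps its zero.
sample_map = {r["hscode"]: r["description"] for r in rows}
# Level-2 subset: chapter codes only.
sample_l2_map = {r["hscode"]: r["description"] for r in rows if r["level"] == "2"}


def decode(code, mapping, default="Not found"):
    """Hypothetical helper: look a code up, falling back to a default."""
    return mapping.get(code, default)


print(decode("01", sample_l2_map))
print(decode("1", sample_map))  # no leading zero, so no match
```

If the codes were parsed as integers, `"01"` would become `1` and lookups by the string code would fail, which is why the sketch keeps them as text.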


### Get the country and region codes

See the list at https://unstats.un.org/wiki/display/comtrade/Country+Code

Available in JSON format at:
 * https://comtrade.un.org/data/cache/reporterAreas.json
 * https://comtrade.un.org/data/cache/partnerAreas.json

Here we download the lists and cache them in `support` to avoid
a network request next time.


In [122]:
import os
import json
import requests
import pandas as pd

fname = 'support/reporter_codes.json'
if os.path.isfile(fname):
    with open(fname) as cached:
        reporter_codes = json.load(cached)
else:
    resp = requests.get("https://comtrade.un.org/data/cache/reporterAreas.json")
    codes = json.loads(resp.content)['results']
    reporter_codes = dict([(a['id'], a['text'])  for a in codes])
    with open(fname, mode="x") as outfile:
        json.dump(reporter_codes,outfile,indent=2)

fname = 'support/partner_codes.json'
if os.path.isfile(fname):
    with open(fname) as cached:
        partner_codes = json.load(cached)
else:
    resp = requests.get("https://comtrade.un.org/data/cache/partnerAreas.json")
    codes = json.loads(resp.content)['results']
    partner_codes = dict([(a['id'], a['text'])  for a in codes])
    with open(fname,mode="x") as outfile:
        json.dump(partner_codes,outfile,indent=2)

# convert codes to int
m49_reporter_codes_map = { int(k):v for (k,v) in reporter_codes.items() if k!='all'}
m49_partner_codes_map = { int(k):v for (k,v) in partner_codes.items() if k!='all'}
m49_reporter_list = ",".join(map(str,m49_reporter_codes_map.keys()))
m49_partner_list = ",".join(map(str,m49_partner_codes_map.keys()))
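
The two download-or-read branches above follow the same pattern, so they could be folded into one helper. The sketch below is one possible shape, not the notebook's own code: `load_or_fetch` and `fake_fetch` are hypothetical names, and the fetch step is injected so the example runs without network access.

```python
import json
import os
import tempfile


def load_or_fetch(fname, fetch):
    """Return the JSON cached in fname, calling fetch() only on a cache miss."""
    if os.path.isfile(fname):
        with open(fname, encoding="utf-8") as cached:
            return json.load(cached)
    data = fetch()
    # mode="x" refuses to overwrite an existing file, as in the cell above
    with open(fname, mode="x", encoding="utf-8") as outfile:
        json.dump(data, outfile, indent=2)
    return data


# Stand-in for the real download (hypothetical data, no network access).
calls = []


def fake_fetch():
    calls.append(1)
    return {"all": "All", "24": "Angola", "620": "Portugal"}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "partner_codes.json")
    first = load_or_fetch(path, fake_fetch)   # cache miss: calls fake_fetch
    second = load_or_fetch(path, fake_fetch)  # cache hit: reads the file

# Same conversion as above: integer keys, dropping the "all" entry.
codes_map = {int(k): v for k, v in second.items() if k != "all"}
print(len(calls), codes_map)
```

Writing each file through a single code path also avoids saving one dictionary under the other's filename.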

The data returned by the API include codes in the `partner2Code` field that are not in the official M49 code list.

Other users report the same problem.

See  https://rstudio-pubs-static.s3.amazonaws.com/92321_70509e47e7f041e68f383253cb85751b.html, which cross-checks the codes found in the data against several versions of the
M49 list and is useful for completing the FAO list.

See also this note from unstats: https://unstats.un.org/wiki/display/comtrade/Reporter+country+codes+and+their+customs+areas

__Codes currently missing:__
* __473__ China-Angola Import 2016 partner2Code.


In [123]:
m49_partner_codes_map.get(473,"Not found")

'Not found'
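
A quick way to list every code that lacks a description, rather than probing them one by one, is a set difference between the codes seen in the data and the keys of the lookup. This is a minimal sketch with made-up inputs; `missing_codes` is a hypothetical helper, not part of the notebook.

```python
def missing_codes(observed, known):
    """Return the sorted codes seen in the data that have no entry in the lookup."""
    return sorted(set(observed) - set(known))


# Hypothetical inputs: a tiny lookup and partner2Code values seen in a response.
known_map = {24: "Angola", 156: "China", 344: "China, Hong Kong SAR"}
observed_partner2 = [24, 473, 344, 473]

gaps = missing_codes(observed_partner2, known_map)
print(gaps)
```

With a real response, `observed` could be `df.partner2Code.unique()` and `known` the partner code map.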

## Getting the data from comtrade.un.org

### General parameters that do not change



In [124]:
m49_angola = 24
m49_brazil = 76
m49_cabo_verde = 132
m49_china = 156
m49_hong_kong = 344
m49_macau = 446
m49_guine_equatorial = 226
m49_guine_bissau = 624
m49_mozambique = 508
m49_portugal = 620
m49_stome_principe = 678
m49_timor = 626

# list of Portuguese-speaking countries (PLP, Países de Língua Portuguesa)
m49_plp = [m49_angola,m49_brazil,m49_cabo_verde,m49_guine_bissau,
            m49_guine_equatorial,m49_mozambique,m49_portugal,
            m49_stome_principe,m49_timor]
m49_plp_list = ",".join(map(str,m49_plp))




### Helper function to access the UN Comtrade API


In [125]:
import json
import requests
import pandas as pd
import warnings


def call_uncomtrade(typeCode: str, freqCode: str,
                    reporterCode: str = '156',
                    partnerCode: str = '024,076,132,226,624,508,620,678,626',
                    partner2Code: str | None = '0',
                    period: str | None = None,
                    clCode: str = "HS",
                    cmdCode: str = "TOTAL",
                    flowCode: str = "M,X",
                    timeout: int | None = 10,
                    echo_url: bool = False
                    )->pd.DataFrame | None:
    """ Makes a request to the UN Comtrade API (public), returns a pandas DataFrame

    Parameters
        typeCode: required, C for commodities, S for services
        freqCode: required, A for annual and M for monthly
        reporterCode: optional, list of M49 codes, default "156" (China)
        partnerCode: optional, list of M49 codes, None for all countries, default PLP codes
        partner2Code: optional, list of M49 codes, None for all countries, 0 for aggregate, default 0
        period: optional, yyyy or yyyymm, default None (all available periods)
        clCode: trade classification: HS, SITC, BEC or EBOPS.
                Available values: HS, SS, B4, B5, EB, EB10, EB02, EBSDMX
        cmdCode: optional, default "TOTAL"
        flowCode: optional, M=import, X=export, more: RX, RM, MIP, XIP, MOP, XOP, MIF, XIF, DX, FM; default "M,X"
        timeout: max wait time in seconds, None to wait indefinitely. Default 10
        echo_url: bool, print the URL of the call, default False

    Returns None (with a warning) when the query yields no rows.
     """

    baseUrl = "https://comtradeapi.un.org/public/v1"

    requestUrl=f"{baseUrl}/preview/{typeCode}/{freqCode}/{clCode}"
    resp = requests.get(requestUrl,
            {
            'reporterCode':reporterCode,
            'period':period,
            'partnerCode':partnerCode,
            'partner2Code':partner2Code,
            'cmdCode':cmdCode,
            'flowCode':flowCode
            },
            timeout=timeout)
    if echo_url:
        print(resp.url)
    results = json.loads(resp.content)['data']
    if len(results) == 0:
        warnings.warn("Query returned no results")
        df = None
    else:
        df = pd.DataFrame(results)

        # Convert the country codes to country names
        df.reporterDesc = df.reporterCode.map(m49_reporter_codes_map)
        df.partnerDesc = df.partnerCode.map(m49_partner_codes_map)
        df.partner2Desc = df.partner2Code.map(m49_partner_codes_map)
        # Convert the HS codes
        df.cmdDesc = df.cmdCode.map(hs_codes_map)
        # Generate a formated version of the value for readability here
        df['primaryValueFormated'] = df.primaryValue.map('{:,}'.format)
        # return the DataFrame
    return df

## Display parameters

Columns to display and row sort order.



In [126]:
# Most useful columns of the result
# choose from
#        'typeCode', 'freqCode', 'refPeriodId', 'refYear', 'refMonth',
#        'period', 'reporterCode', 'reporterISO', 'reporterDesc',
#        'flowCode', 'flowDesc', 'partnerCode', 'partnerISO', 'partnerDesc',
#        'partner2Code', 'partner2ISO', 'partner2Desc',
#        'classificationCode', 'classificationSearchCode',
#        'isOriginalClassification', 'cmdCode', 'cmdDesc', 'aggrLevel',
#        'isLeaf', 'customsCode', 'customsDesc', 'mosCode', 'motCode',
#        'motDesc', 'qtyUnitCode', 'qtyUnitAbbr', 'qty', 'isQtyEstimated',
#        'altQtyUnitCode', 'altQtyUnitAbbr', 'altQty', 'isAltQtyEstimated',
#        'netWgt', 'isNetWgtEstimated', 'grossWgt', 'isGrossWgtEstimated',
#        'cifvalue', 'fobvalue', 'primaryValue', 'legacyEstimationFlag',
#        'isReported', 'isAggregate', 'primaryValueFormated']

cols = ['typeCode','freqCode','reporterDesc','partnerDesc','partner2Code','partner2Desc','refYear','cmdCode','cmdDesc','flowCode','isReported','primaryValueFormated','primaryValue']
sort_order = ['reporterDesc','partnerDesc','refYear','refMonth']

## Usage notes

### More than one row per country pair in 2015, 2016 and 2017

Some years (2015, 2016 and 2017) produce more than one row per _reporter/partner_ pair, each with a different value:
* In those years there is one row for each `partner2Code`, including one for the partner's own code.
* There is an additional row with `partner2Code` equal to zero, holding the aggregate total of the rows with an explicit `partner2Code`.
* Summing all the rows therefore counts the total twice.
  
|    | reporterDesc   | partnerDesc       |   partner2Code | partner2Desc         |   refYear | cmdCode   | flowCode   | primaryValueFormated   |
|---:|:---------------|:------------------|---------------:|:---------------------|----------:|:----------|:-----------|:-----------------------|
|  3 | China          | Equatorial Guinea |            344 | China, Hong Kong SAR |      2015 | TOTAL     | M          | 59.0                   |
|  1 | China          | Equatorial Guinea |             56 | Belgium              |      2015 | TOTAL     | M          | 2,435.0                |
|  2 | China          | Equatorial Guinea |            226 | Equatorial Guinea    |      2015 | TOTAL     | M          | 1,166,493,970.0        |
|  0 | China          | Equatorial Guinea |              0 | nan                  |      2015 | TOTAL     | M          | 1,166,496,464.0        |
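
The figures in the table can be checked directly. The sketch below copies the four values from the table and confirms that the `partner2Code = 0` row equals the sum of the other three, so a naive sum over all rows doubles the total. The variable names are illustrative.

```python
# Values copied from the table above (China imports from Equatorial Guinea, 2015).
rows = [
    {"partner2Code": 344, "primaryValue": 59.0},
    {"partner2Code": 56, "primaryValue": 2_435.0},
    {"partner2Code": 226, "primaryValue": 1_166_493_970.0},
    {"partner2Code": 0, "primaryValue": 1_166_496_464.0},
]

# Sum of the rows with an explicit partner2Code.
breakdown_total = sum(r["primaryValue"] for r in rows if r["partner2Code"] != 0)
# The aggregate row reported by the API.
aggregate = next(r["primaryValue"] for r in rows if r["partner2Code"] == 0)
# What a careless sum over every row would give.
naive_total = sum(r["primaryValue"] for r in rows)

print(breakdown_total == aggregate, naive_total == 2 * aggregate)
```

When the breakdown rows are already in a DataFrame, keeping only the rows where `partner2Code == 0` gives one total per pair.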


To avoid this, call the API with `partner2Code = 0`, so that the results for 2015, 2016 and 2017 exclude
the breakdown. With `partner2Code=None` the additional rows appear.
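
The difference between the two calls is visible in the request URL: a parameter set to `None` is left out of the query string, while `0` is sent explicitly. The sketch below reproduces that with the standard library only. `build_preview_url` is a hypothetical helper that mimics how the URLs echoed in this notebook are formed; it is not the code used by `call_uncomtrade`.

```python
from urllib.parse import urlencode

BASE_URL = "https://comtradeapi.un.org/public/v1"


def build_preview_url(typeCode, freqCode, clCode="HS", **params):
    """Build a preview URL, leaving out parameters whose value is None."""
    query = {k: v for k, v in params.items() if v is not None}
    return f"{BASE_URL}/preview/{typeCode}/{freqCode}/{clCode}?{urlencode(query)}"


# partner2Code=None: the parameter is absent, so the API returns the breakdown
with_none = build_preview_url("C", "A", reporterCode=156, period="2018",
                              partnerCode=620, partner2Code=None,
                              cmdCode="TOTAL", flowCode="M")

# partner2Code=0: only the aggregate row is requested
with_zero = build_preview_url("C", "A", reporterCode=156, period="2016",
                              partnerCode=620, partner2Code=0,
                              cmdCode="TOTAL", flowCode="M")

print(with_none)
print(with_zero)
```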


Example of the results when `partner2Code` is None


In [130]:
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 500)

cols2 = ['reporterDesc','partnerDesc','partner2Code','partner2Desc','refYear','cmdCode','flowCode','primaryValueFormated']
period = "2016"  # if freqCode is M use yyyymm
flow = "M"
partnerCode = m49_guine_bissau
df = call_uncomtrade("C",# C for commodities, S for Services
                     "A",# (freqCode) A for annual and M for monthly
                     flowCode=flow,
                     reporterCode=m49_china,
                     partnerCode=partnerCode,
                     partner2Code=None,
                     cmdCode='TOTAL',
                     period=period,
                     timeout=30, echo_url=True
                     )
result = df.sort_values(['partnerDesc','flowCode'])[cols2]
# print(result.to_markdown())
result


In [128]:
result.to_excel(f"./downloads/partner2_{partnerCode}_{period}.xlsx")

This only happens for the years 2015, 2016 and 2017.

For example, the same call for 2018 returns a single row.


In [102]:
period = "2018"  # if freqCode is M use yyyymm
df = call_uncomtrade("C",# C for commodities, S for Services
                     "A",# (freqCode) A for annual and M for monthly
                     flowCode=flow,
                     reporterCode=m49_china,
                     partnerCode=partnerCode,
                     partner2Code=None,
                     cmdCode='TOTAL',
                     period=period,
                     timeout=None,
                     echo_url=True
                     )
result = df.sort_values(['partnerDesc','flowCode'])[cols2]
# print(result.to_markdown())
result

https://comtradeapi.un.org/public/v1/preview/C/A/HS?reporterCode=156&period=2018&partnerCode=620&cmdCode=TOTAL&flowCode=M


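|    | reporterDesc   | partnerDesc   |   partner2Code | partner2Desc   |   refYear | cmdCode   | flowCode   | primaryValueFormated   |
|---:|:---------------|:--------------|---------------:|:---------------|----------:|:----------|:-----------|:-----------------------|
|  0 | China          | Portugal      |              0 | nan            |      2018 | TOTAL     | M          | 2249055585.0           |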


The `call_uncomtrade` function now sets `partner2Code = 0` when it is not specified,
to avoid this problem.

In this example the `partner2Code` parameter is omitted and the function sets it to zero to obtain the
correct result.

In [103]:
period = "2016"
df = call_uncomtrade("C",# C for commodities, S for Services
                     "A",# (freqCode) A for annual and M for monthly
                     flowCode=flow,
                     reporterCode=m49_china,
                     partnerCode=partnerCode,
                     cmdCode='TOTAL',
                     period=period,
                     timeout=60,
                     echo_url=True
                     )
result = df.sort_values(['partnerDesc','flowCode'])[cols2]
# print(result.to_markdown())
result

https://comtradeapi.un.org/public/v1/preview/C/A/HS?reporterCode=156&period=2016&partnerCode=620&partner2Code=0&cmdCode=TOTAL&flowCode=M


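|    | reporterDesc   | partnerDesc   |   partner2Code | partner2Desc   |   refYear | cmdCode   | flowCode   | primaryValueFormated   |
|---:|:---------------|:--------------|---------------:|:---------------|----------:|:----------|:-----------|:-----------------------|
|  0 | China          | Portugal      |              0 | nan            |      2016 | TOTAL     | M          | 1582818088.0           |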


It does not seem possible to obtain partner2 information
 for years other than 2015-2017.

In [107]:
period = "2018"  # if freqCode is M use yyyymm

df = call_uncomtrade("C",# C for commodities, S for Services
                     "A",# (freqCode) A for annual and M for monthly
                     flowCode=flow,
                     reporterCode=m49_china,
                     partnerCode=partnerCode,
                     partner2Code=m49_hong_kong,
                     cmdCode='TOTAL',
                     period=period,
                     timeout=None,
                     echo_url=True
                     )
result = df.sort_values(['partnerDesc','flowCode'])[cols2]
# print(result.to_markdown())
result

https://comtradeapi.un.org/public/v1/preview/C/A/HS?reporterCode=156&period=2018&partnerCode=620&partner2Code=344&cmdCode=TOTAL&flowCode=M




AttributeError: 'NoneType' object has no attribute 'sort_values'
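
The error above arises because `call_uncomtrade` returns `None` when the query yields no rows, and the cell then calls `sort_values` on it. One way to make such cells robust is a small guard, sketched below with illustrative data; `sorted_view` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def sorted_view(df, by, cols):
    """Return df sorted by `by` and restricted to `cols`, or None if df is None."""
    if df is None:
        return None
    return df.sort_values(by)[cols]


# An empty query result: the guard passes the None through instead of raising.
empty = sorted_view(None, ["partnerDesc", "flowCode"], ["partnerDesc"])

sample = pd.DataFrame({          # illustrative rows, not API output
    "partnerDesc": ["Portugal", "Angola"],
    "flowCode": ["M", "M"],
    "primaryValue": [2.0, 1.0],
})
view = sorted_view(sample, ["partnerDesc", "flowCode"],
                   ["partnerDesc", "primaryValue"])
print(empty, list(view["partnerDesc"]))
```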

## Reproducing the Forum Macau yearbook data

Since 2016 the Forum Macau has included a summary table
of trade between China and
the PLP (Portuguese-speaking countries).

Here we reproduce those tables from the
data currently available in UN Comtrade.

>NOTE: As in the Forum tables, values are
presented in units of 10 thousand USD
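
As a worked example of the unit conversion, assuming `primaryValue` is expressed in US dollars, a value is divided by 10,000 to express it in units of 10 thousand USD. The figure used is the 2016 China-Portugal import value shown in an earlier output; the helper name is hypothetical.

```python
UNIT = 10_000  # one unit = 10 thousand USD


def to_ten_thousand_usd(value_usd):
    """Convert a value in USD to units of 10 thousand USD."""
    return value_usd / UNIT


value = 1_582_818_088.0   # China imports from Portugal, 2016, in USD
converted = to_ten_thousand_usd(value)
formatted = f"{converted:,.2f}"
print(formatted)  # 158,281.81
```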

In [117]:


pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 500)

cols2 = ['reporterDesc','partnerDesc','partner2Code','partner2Desc','refYear','cmdCode','flowCode','primaryValueFormated']
period = "2016,2017,2018,2019,2020,2021"  # if freqCode is M use yyyymm
flow = "M,X"
partnerCode = m49_plp_list
df = call_uncomtrade("C",# C for commodities, S for Services
                     "A",# (freqCode) A for annual and M for monthly
                     flowCode=flow,
                     reporterCode=m49_china,
                     partnerCode=partnerCode,
                     cmdCode='TOTAL',
                     period=period,
                     timeout=30
                     )

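# one row per (year, partner), one column per flow (M = imports, X = exports)
temp=df.pivot(index=['refYear','partnerDesc',],columns='flowCode',values='primaryValue')
temp['C'] = temp['M'] + temp['X']  # total trade = imports + exports
forum_type = temp[['C','X','M']]
forum_type.columns=['Trocas','Exportações','Importações']  # trade, exports, imports
# primaryValue is in USD: div(10) gives units of 10 USD (the output shown here);
# units of 10 thousand USD would need div(10_000)
forum_type.div(10)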

|   refYear | partnerDesc           | Trocas            | Exportações       | Importações       |
|----------:|:----------------------|------------------:|------------------:|------------------:|
|      2016 | Angola                | $1,564,651,570.40 | $168,039,862.40   | $1,396,611,708.00 |
|      2016 | Brazil                | $6,783,120,001.10 | $2,197,615,276.50 | $4,585,504,724.60 |
|      2016 | Cabo Verde            | $4,875,471.90     | $4,867,143.80     | $8,328.10         |
|      2016 | Equatorial Guinea     | $78,010,533.70    | $14,825,383.10    | $63,185,150.60    |
|      2016 | Guinea-Bissau         | $2,139,768.70     | $2,123,646.00     | $16,122.70        |
|      2016 | Mozambique            | $178,791,466.80   | $130,855,779.70   | $47,935,687.10    |
|      2016 | Portugal              | $558,493,761.40   | $400,211,952.60   | $158,281,808.80   |
|      2016 | Sao Tome and Principe | $642,221.30       | $640,449.10       | $1,772.20         |
|      2016 | Timor-Leste           | $16,456,440.60    | $16,427,280.00    | $29,160.60        |
|      2017 | Angola                | $2,295,616,269.20 | $225,745,322.50   | $2,069,870,946.70 |
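
The reshaping step can be seen in isolation on a toy frame. The sketch below uses made-up values with the same column names as the API result and applies the same `pivot`, sum and column selection; it assumes a pandas version that accepts a list for `index` in `pivot`.

```python
import pandas as pd

rows = pd.DataFrame({    # illustrative values, not API output
    "refYear": [2016, 2016, 2016, 2016],
    "partnerDesc": ["Angola", "Angola", "Portugal", "Portugal"],
    "flowCode": ["M", "X", "M", "X"],
    "primaryValue": [300.0, 100.0, 50.0, 150.0],
})

# Long to wide: one row per (year, partner), one column per flow code.
wide = rows.pivot(index=["refYear", "partnerDesc"],
                  columns="flowCode", values="primaryValue")
wide["C"] = wide["M"] + wide["X"]   # total trade = imports + exports
table = wide[["C", "X", "M"]]       # column order used in the Forum tables

print(table)
```

`pivot` raises an error if a (year, partner, flow) combination appears more than once, which is what happens when the `partner2Code` breakdown rows are present.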


### Save to Excel

In [120]:
# primaryValue is in USD: raw values, then units of 10 thousand and 10 million USD
forum_type.to_excel(f"./downloads/quadros_forum_{period.replace(',','_')}.xlsx")
forum_type.div(10_000).to_excel(f"./downloads/quadros_forum_10k_{period.replace(',','_')}.xlsx")
forum_type.div(10_000_000).to_excel(f"./downloads/quadros_forum_10M_{period.replace(',','_')}.xlsx")

## Data coverage: China-PLP imports/exports, available years

In [None]:
import time

pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 500)

flow = "M,X"
for country_code in m49_plp:
    df = call_uncomtrade("C",# C for commodities, S for Services
                     "A",# (freqCode) A for annual and M for monthly
                     flowCode=flow,
                     reporterCode=m49_china,
                     partnerCode=country_code,
                     period=None # period None returns all the available period
                     )
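    country_name = m49_partner_codes_map.get(country_code, country_code)
    if df is None:  # the query returned no rows for this partner
        print(country_name, "no data")
    else:
        print(country_name, df.refYear.unique())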
    time.sleep(1)  # avoid stressing the UN server.
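
The coverage loop can also collect its findings in a dictionary instead of printing them, which makes the result easier to reuse. The sketch below is an assumption-laden outline: the API call is replaced by an injected function (`fake_fetch_years`, hypothetical) so it runs offline, and a partner with no data gets an empty list.

```python
def coverage(fetch_years, codes, names):
    """Map each country name to its sorted years, or [] when nothing came back."""
    report = {}
    for code in codes:
        years = fetch_years(code)   # may return None when there is no data
        report[names.get(code, str(code))] = sorted(set(years)) if years else []
    return report


# Stand-in for the API call (hypothetical data, no network access).
def fake_fetch_years(code):
    return {24: [2017, 2016, 2016], 620: [2018]}.get(code)


report = coverage(fake_fetch_years, [24, 620, 626],
                  {24: "Angola", 620: "Portugal", 626: "Timor-Leste"})
print(report)
```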



## China: top imports from the PLP

In [None]:
rank_filter = 5  # number of top imports to keep
years = "2020,2021"
pco_cols = ['reporterDesc','partnerDesc','refYear','rank','cmdDesc',
            'flowCode','primaryValueFormated']
df = call_uncomtrade("C",# C for commodities, S for Services
                     "A",# (freqCode) A for annual and M for monthly
                     flowCode="M",
                     cmdCode="AG2",
                     reporterCode=m49_china,
                     partnerCode=m49_plp_list,
                     period=years 
                     )

pco = df.sort_values(['partnerDesc','refYear','primaryValue'], ascending=[True,True,False])
pco['rank'] = pco.groupby(['partnerDesc','refYear'])["primaryValue"].rank(method="dense", ascending=False)
pco_top5 = pco[pco['rank'] <= rank_filter]
pco_top5[pco_cols].set_index(['reporterDesc','partnerDesc','refYear'])
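
The ranking step relies on a grouped dense rank. The sketch below shows its behaviour on made-up values, including a tie: with `method="dense"` tied rows share a rank and the next rank is not skipped, so a filter such as `rank <= 2` can keep more than two rows per group.

```python
import pandas as pd

sample = pd.DataFrame({   # illustrative values, not API output
    "partnerDesc": ["Angola"] * 3 + ["Brazil"] * 3,
    "refYear": [2021] * 6,
    "cmdCode": ["27", "71", "25", "12", "26", "27"],
    "primaryValue": [900.0, 50.0, 50.0, 300.0, 200.0, 100.0],
})

# Rank within each (partner, year) group, largest value first.
sample["rank"] = (sample.groupby(["partnerDesc", "refYear"])["primaryValue"]
                        .rank(method="dense", ascending=False))

top2 = sample[sample["rank"] <= 2]
print(top2[["partnerDesc", "cmdCode", "rank"]])
```

If exactly N rows per group are required, `method="first"` breaks ties by order of appearance instead.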

### Save in Excel format


In [None]:
filename_note=years  # change to append to filename
pco_top5.to_excel(f"./downloads/china_plp_import_top5_{filename_note}.xlsx")

## China: top exports to the PLP

In [None]:
rank_filter = 5  # number of top exports to keep
years = "2021"
pco_cols = ['reporterDesc','partnerDesc','refYear','rank','cmdDesc',
            'flowCode','primaryValueFormated']
df = call_uncomtrade("C",# C for commodities, S for Services
                     "A",# (freqCode) A for annual and M for monthly
                     flowCode="X",
                     cmdCode="AG2",
                     reporterCode=m49_china,
                     partnerCode=m49_plp_list,
                     period=years 
                     )

pco = df.sort_values(['partnerDesc','refYear','primaryValue'], ascending=[True,True,False])
pco['rank'] = pco.groupby(['partnerDesc','refYear'])["primaryValue"].rank(method="dense", ascending=False)
pco_top5 = pco[pco['rank'] <= rank_filter]
pco_top5[pco_cols].set_index(['reporterDesc','partnerDesc','refYear'])

### Save in Excel format


In [None]:
filename_note=years  # change to append to filename
pco_top5.to_excel(f"./downloads/china_plp_export_top5_{filename_note}.xlsx")

### China: detail of the top exports from the PLP to China

In two steps:
* get the top level-2 categories for each country
* look up all the subcategories of each one

In [None]:
rank_filter = 5  # number of top imports to keep
years = "2020,2021"
pco_cols = ['reporterDesc','partnerDesc','refYear','rank','cmdDesc',
            'flowCode','primaryValueFormated']
df = call_uncomtrade("C",# C for commodities, S for Services
                     "A",# (freqCode) A for annual and M for monthly
                     flowCode="M",
                     cmdCode="AG2",
                     reporterCode=m49_china,
                     partnerCode=m49_plp_list,
                     period=years 
                     )

pco = df.sort_values(['partnerDesc','refYear','primaryValue'], ascending=[True,True,False])
pco['rank'] = pco.groupby(['partnerDesc','refYear'])["primaryValue"].rank(method="dense", ascending=False)
pco_top5 = pco[pco['rank'] <= rank_filter]
# get the countries
countries = pco_top5.partnerDesc.unique()
country_cmd_top5_codes = dict()
for country in countries:
    l2_codes = pco_top5[pco_top5.partnerDesc == country]['cmdCode'].unique()
    print(country,l2_codes)
    hs_details = []
    for l2_code in l2_codes:
        l2_sub_codes = list(hs_codes_df[hs_codes_df.hscode.str.startswith(l2_code)]['hscode'])
        hs_details = hs_details + l2_sub_codes
    # print(hs_details)
    country_cmd_top5_codes[country] = hs_details.copy()
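
The second step, expanding each 2-digit chapter into its more detailed codes, is a prefix match on the code strings. A minimal sketch with an illustrative code list follows; `expand_chapters` is a hypothetical helper. Unlike the loop above, it returns codes in table order rather than grouped by chapter.

```python
def expand_chapters(all_codes, chapters):
    """Return every code that starts with one of the 2-digit chapter codes."""
    prefixes = tuple(chapters)   # str.startswith accepts a tuple of prefixes
    return [code for code in all_codes if code.startswith(prefixes)]


all_codes = ["27", "2709", "270900", "71", "7102", "01", "0101"]  # illustrative
detail = expand_chapters(all_codes, ["27", "71"])
print(detail)
```

The match only works when the codes are strings, since a numeric column would lose leading zeros and has no `startswith`.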


Example of the relevant codes for the Angola detail

In [None]:
country_cmd_top5_codes['Angola']