# Analysis of traded products

## Create support directories

In [1]:
from pathlib import Path

Path('support').mkdir(parents=True, exist_ok=True)
Path('reports').mkdir(parents=True, exist_ok=True)

## Create the configuration file if it does not exist

The configuration file stores the API key, which lifts
the 500-row limit on query results.


In [2]:
import os

fname = 'config.ini'
content = """
# Configuration file
[comtrade]
# Add API Key. DO NOT SHARE
key = APIKEYHERE
"""
# write the template only when the file is missing, so an existing key is never overwritten
if not os.path.isfile(fname):
    print("Creating configuration file")
    with open(fname, 'w') as f:
        f.write(content)


## China: most important product categories in trade with the PLP

Gets the aggregated totals at HS level 2 (two-digit chapter codes) for trade with the Portuguese-speaking countries (PLP) and lists the top-ranked ones.

For the detailed meaning of each code see: https://www.wcoomd.org/en/topics/nomenclature/instrument-and-tools/hs-nomenclature-2022-edition/hs-nomenclature-2022-edition.aspx
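
As a rough illustration of what listing the top-ranked chapters involves, the sketch below ranks two-digit HS chapters by trade value within each (partner, year) group and keeps the first N. It uses plain Python on a few rows copied from the Angola output further down; the function name `top_chapters` and the dictionary keys are hypothetical and are not part of the `comtrade` helper module. Ties are broken by input order here, whereas the notebook's own ranking uses a dense rank.

```python
from collections import defaultdict


def top_chapters(rows, n):
    """Return the n highest-value rows of each (partner, year) group, with a 1-based rank."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["partner"], row["year"])].append(row)
    ranked = []
    for key in sorted(groups):
        # highest trade value first
        ordered = sorted(groups[key], key=lambda r: r["value"], reverse=True)
        for rank, row in enumerate(ordered[:n], start=1):
            ranked.append({**row, "rank": rank})
    return ranked


# toy rows: China imports from Angola, 1993
rows = [
    {"partner": "Angola", "year": 1993, "code": "84", "value": 839509.0},
    {"partner": "Angola", "year": 1993, "code": "27", "value": 179857111.0},
    {"partner": "Angola", "year": 1993, "code": "48", "value": 1135.0},
    {"partner": "Angola", "year": 1993, "code": "85", "value": 12707.0},
]
top2 = top_chapters(rows, 2)
print([(r["rank"], r["code"]) for r in top2])
```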


## Obtain an API access key

Unrestricted access to UN Comtrade through the API requires an access key;
without one, results are limited to 500 rows.

To obtain the key:
* Register at https://comtradedeveloper.un.org/
* Go to _Products_
* Select "Premium Individual APIs" (https://comtradedeveloper.un.org/product#product=dataapis)
* Choose _Subscribe to "comtrade - v1"_
* Wait for the email with the API key (it takes a few days)
* Copy the key to the place indicated in the `config.ini` file before
  running the rest of the notebook.
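
Before running the rest of the notebook it can help to check that the placeholder in `config.ini` was actually replaced. The sketch below is one possible check using only the standard library; `read_api_key` and `PLACEHOLDER` are hypothetical names introduced for illustration, and the demonstration writes to a temporary directory rather than to the real file.

```python
import configparser
import os
import tempfile

PLACEHOLDER = "APIKEYHERE"  # the placeholder written by the configuration template


def read_api_key(path):
    """Return the key under [comtrade] key, or None if it is missing or still the placeholder."""
    config = configparser.ConfigParser()
    if not config.read(path):  # read() returns the list of files it managed to parse
        return None
    key = config.get("comtrade", "key", fallback=None)
    if key is None or key.strip() in ("", PLACEHOLDER):
        return None
    return key.strip()


# Demonstration on a temporary file, so the real config.ini is not touched
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "config.ini")
    with open(demo, "w") as f:
        f.write("[comtrade]\nkey = APIKEYHERE\n")
    print(read_api_key(demo))                            # placeholder still in place
    print(read_api_key(os.path.join(tmp, "other.ini")))  # file does not exist
```

A `None` result means the notebook would run without a usable key, so results would be capped at 500 rows.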


In [3]:
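import configparser
import os

import comtrade

# Default when config.ini is missing; passing None to comtrade.init is assumed
# to fall back to keyless access (results limited to 500 rows).
APIKEY = None
if os.path.isfile(fname):
    config = configparser.ConfigParser()
    config.read(fname)
    APIKEY = config['comtrade']['key']

comtrade.init(APIKEY)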

### TODO

- [ ] Allow choosing the reporters and partners
- [ ] Make the output compatible with the AICEP report format https://myaicep.portugalexporta.com/mercados-internacionais/cn/china?setorProduto=-1
- [ ] Rows are duplicated in the commodity detail listings, probably because of the customs code or the mode-of-transport (MOT) code.
  

In [4]:
import comtrade
import pandas as pd


rank_filter = 5  # number of top-ranked commodities to keep per flow

year_start = 1993
year_end = 2023
years = comtrade.year_range(year_start,year_end)
reporterCode = comtrade.m49_china
partnerCode = comtrade.m49_plp_list
flowCode = 'M,X'
pco_cols = ['reporterDesc','partnerDesc','refYear','rank','cmdCode','cmdDesc',
            'flowCode','primaryValue']



pco_top5_sorted = comtrade.top_commodities(reporterCode, partnerCode,  years, flowCode,rank_filter, pco_cols)

# save to Excel
# Prepare file name for Excel output
filename_note=f"{years.replace(',','_')}_{flowCode.replace(',','_')}"  # change to append to filename
excel_file_name = f"./reports/china_plp_top5_{filename_note}.xlsx"
excel_file = pd.ExcelWriter(excel_file_name)

# Save data
sheet_name = "comtrade"
pco_top5_sorted.to_excel(excel_file, sheet_name=sheet_name, startrow=2)
# Adjust column width
comtrade.excel_col_autowidth(pco_top5_sorted, excel_file,sheet_name)

# format column primaryValue as currency
comtrade.excel_format_currency(pco_top5_sorted, excel_file,sheet_name, columns=['primaryValue'],width=20)
    
# Write title in first row TBD
excel_file_title = f"China-PLP trade, top {rank_filter} products (M=Imports, X=Exports), {years}, values in USD"
excel_file.sheets[sheet_name].write(0, 0, excel_file_title)
print("Saved to:", excel_file_name)

excel_file.close()
# show
pd.options.display.max_colwidth=100
pd.options.display.float_format = '{:,.2f}'.format
pd.options.display.max_rows = 100
pco_top5_sorted.head(100)

Saved to: ./reports/china_plp_top5_1993_1994_1995_1996_1997_1998_1999_2000_2001_2002_2003_2004_2005_2006_2007_2008_2009_2010_2011_2012_2013_2014_2015_2016_2017_2018_2019_2020_2021_2022_M_X.xlsx


reporterDesc,partnerDesc,flowCode,refYear,rank,cmdCode,cmdDesc,primaryValue
China,Angola,M,1993,1,27,"Mineral fuels, mineral oils and products of their distillation; bituminous substances; mineral w...",179857111.0
China,Angola,M,1993,2,84,"Machinery and mechanical appliances, boilers, nuclear reactors; parts thereof",839509.0
China,Angola,M,1993,3,85,Electrical machinery and equipment and parts thereof; sound recorders and reproducers; televisio...,12707.0
China,Angola,M,1993,4,48,"Paper and paperboard; articles of paper pulp, of paper or paperboard",1135.0
China,Angola,M,1994,1,27,"Mineral fuels, mineral oils and products of their distillation; bituminous substances; mineral w...",47971662.0
China,Angola,M,1994,2,71,"Natural, cultured pearls; precious, semi-precious stones; precious metals, metals clad with prec...",9835.0
China,Angola,M,1994,3,84,"Machinery and mechanical appliances, boilers, nuclear reactors; parts thereof",1136.0
China,Angola,M,1994,4,39,Plastics and articles thereof,1034.0
China,Angola,M,1994,5,34,"Soap, organic surface-active agents; washing, lubricating, polishing or scouring preparations; a...",657.0
China,Angola,M,1995,1,27,"Mineral fuels, mineral oils and products of their distillation; bituminous substances; mineral w...",135763069.0
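
The report file name printed above repeats every year of the period, which makes it very long. One possible way to keep it short, sketched below under the assumption that `years` is a comma-separated string of years (as the file name suggests), is to collapse consecutive years into ranges. `compact_years` is a hypothetical helper and is not part of the `comtrade` module.

```python
def compact_years(years):
    """Collapse a comma-separated list of years into ranges, e.g. '1993,1994,1996' -> '1993-1994_1996'."""
    values = sorted({int(y) for y in years.split(",") if y.strip()})
    if not values:
        return ""
    parts = []
    start = prev = values[0]
    for year in values[1:]:
        if year == prev + 1:
            # still inside a run of consecutive years
            prev = year
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = year
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return "_".join(parts)


years = ",".join(str(y) for y in range(1993, 2023))
print(f"./reports/china_plp_top5_{compact_years(years)}_M_X.xlsx")
```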


## Analysis of changes in the most important products

The next cell lists the most significant changes in the evolution of
trade between China and the Portuguese-speaking countries (PLP).

For each combination of countries and flow (M=imports by China, X=exports from China),
for example:

        China Brazil M

one line per year is shown, for example:

        1993   72,26
                -> 72 Iron and steel
                -> 26 Ores, slag and ash
        1994 ! 15,72
                -> 15 Animal, vegetable or microbial fats and oils and their cleavage products; prepared edible fats; animal or vegetable waxes
        1995 ! 15,26
        1996 ! 15,23
                -> 23 Food industries, residues and wastes thereof; prepared animal fodder
        1997 ≈ 23,15
        1998 ! 26,23
        1999 ! 26,12
                -> 12 Oil seeds and oleaginous fruits; miscellaneous grains, seeds and fruit, industrial or medicinal plants; straw and fodder
        2000 ≈ 12,26
        2001 ≈ 26,12
        2002 ≈ 12,26
        2003 =
        2004 ≈ 26,12
        2005 =

Each year is followed by a sign indicating the change relative to the previous year:
* ! : one of the top products changed
* = : same products as the previous year, in the same order of importance
* ≈ : same products as the previous year, but in a different order of importance


Whenever a new product appears in the series, the description of its code is given.

The listing can take a variable number of top products into account,
normally the top two.
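
The three signs boil down to comparing the current list of top codes with the previous year's list. The following sketch isolates that comparison in a small function; `change_marker` is a hypothetical name used only for illustration, and the series is toy data loosely based on the example above.

```python
def change_marker(previous, current):
    """Return ' ' for the first year, '=' same order, '≈' same codes in another order, '!' otherwise."""
    if not previous:
        return " "
    if current == previous:
        return "="
    if set(current) == set(previous):
        return "≈"
    return "!"


# toy series of top-two chapter codes per year
series = {
    1993: ["72", "26"],
    1994: ["15", "72"],
    1996: ["15", "23"],
    1997: ["23", "15"],
    1998: ["23", "15"],
}
previous = []
for year, codes in series.items():
    print(year, change_marker(previous, codes), ",".join(codes))
    previous = codes
```

Comparing sets first and lists second is what separates a reordering (`≈`) from a genuine change of products (`!`).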

In [5]:
consider_first = 2
for reporter in pco_top5_sorted.index.get_level_values(0).unique():
    for partner in pco_top5_sorted.loc[(reporter,)].index.get_level_values(0).unique(): 
        for flow in pco_top5_sorted.loc[((reporter,partner,))].index.get_level_values(0).unique():
            print(reporter,partner, flow)
            previous_pattern = []
            all_commodities = set() # all commodities for this partner for this flow
            all_commodities_desc = set() # all commodities for this partner for this flow descriptions
            for year in pco_top5_sorted.loc[(reporter,partner,flow,)].index.get_level_values(0).unique():
                print(year, end=' ')
                top5_codes = pco_top5_sorted.loc[(reporter,partner,flow,year)]['cmdCode'].astype(str).values.flatten().tolist()[0:consider_first]
                top5_codes_desc = pco_top5_sorted.loc[(reporter,partner,flow,year)]['cmdDesc'].astype(str).values.flatten().tolist()[0:consider_first]
                top5_primary_value = pco_top5_sorted.loc[(reporter,partner,flow,year)]['primaryValue'].values.flatten().tolist()[0:consider_first]
                top5_table = {code: {"desc":desc, "value":value} for code,desc,value in zip(top5_codes,top5_codes_desc,top5_primary_value)}
                codes_desc = {f"{code} {desc}" for code,desc in zip(top5_codes,top5_codes_desc)}
                # make set of tuples top5_codes and top5_codes_desc
                new_commodities = set(top5_codes).difference(all_commodities)
                all_commodities = all_commodities.union(set(top5_codes))
                all_commodities_desc = all_commodities_desc.union(codes_desc)
                pattern = ','.join(top5_codes)
                if len(previous_pattern) == 0:
                    previous_pattern = top5_codes
                    print(' ', pattern)
                    for cmd in new_commodities:
                        print( "        ->", cmd, f"{top5_table[cmd]['value']:16,.2f}",top5_table[cmd]['desc'])
                elif set(top5_codes) == set(previous_pattern):
                    if top5_codes == previous_pattern:
                        print('=')
                    else:
                        print('≈', pattern)
                else:
                    print('!', pattern)
                    if len(new_commodities) > 0:
                        for cmd in new_commodities:
                            print( "        ->", cmd, f"{top5_table[cmd]['value']:16,.2f}", top5_table[cmd]['desc'])
                       
                previous_pattern = top5_codes
            for cmd in sorted(all_commodities_desc):
                print( "    ",cmd)
            print(80*"-")



China Angola M
1993   27,84
        -> 27   179,857,111.00 Mineral fuels, mineral oils and products of their distillation; bituminous substances; mineral waxes
        -> 84       839,509.00 Machinery and mechanical appliances, boilers, nuclear reactors; parts thereof
1994 ! 27,71
        -> 71         9,835.00 Natural, cultured pearls; precious, semi-precious stones; precious metals, metals clad with precious metal, and articles thereof; imitation jewellery; coin
1995 ! 27,84
1996 ! 27,03
        -> 03       131,488.00 Fish and crustaceans, molluscs and other aquatic invertebrates
1997 ! 27,39
        -> 39       209,792.00 Plastics and articles thereof
1998 ! 27,03
1999 ! 27,84
2000 ! 27,44
        -> 44       671,664.00 Wood and articles of wood; wood charcoal
2001 =
2002 ! 27,03
2003 ! 27,25
        -> 25       232,059.00 Salt; sulphur; earths, stone; plastering materials, lime and cement
2004 ! 27,71
2005 =
2006 =
2007 =
2008 =
2009 =
2010 ! 27,25
2011 ! 27,71
2012 =
2013 =
2014 =

## Detail of the products traded between the PLP and China

In two steps:
* get the most important level 2 categories for each country
* look up all the subcategories of each of those categories

For the detailed meaning of each code see: https://www.wcoomd.org/en/topics/nomenclature/instrument-and-tools/hs-nomenclature-2022-edition/hs-nomenclature-2022-edition.aspx
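
The second step relies on the fact that every HS sub-code starts with the two digits of its chapter. A minimal sketch of that prefix lookup follows, using a short hand-picked list of codes instead of the full HS table; `sub_codes` is a hypothetical helper name.

```python
def sub_codes(all_codes, chapter):
    """Return every code that belongs to the given two-digit chapter, the chapter itself included."""
    return [code for code in all_codes if code.startswith(chapter)]


# small excerpt standing in for the full HS code table
hs_codes = ["44", "4401", "440111", "4403", "440349",
            "71", "7103", "710310", "710391", "710399"]
chapters = ["71", "44"]  # top chapters found in the first step

detail = []
for chapter in chapters:
    detail.extend(sub_codes(hs_codes, chapter))
print(",".join(detail))
```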

In [6]:

import ipywidgets as widgets
from IPython.display import display
import comtrade

rank_filter = 5  # number of top-ranked commodities to keep
years = "2021"
partnerCode = comtrade.m49_angola
flowCode="M"

# select year with widget from list range(2000,2022)

years = widgets.SelectMultiple(
    options=range(2000,2022),
    value=[2021],
    description='Year:',
    disabled=False,
)

# select reporter with widget: China, Macau, Hong Kong and the PLP (Angola, Brazil, Cabo Verde, Guiné Bissau, Equatorial Guinea, Mozambique, Portugal, São Tomé e Príncipe, Timor-Leste)
reporterCodeWidget = widgets.Dropdown(
    options=[("China", comtrade.m49_china),("Macau",comtrade.m49_macau) , ("Hong Kong", comtrade.m49_hong_kong)] + comtrade.PLP_TUPLES_REVERSE ,
    # value=[("Angola",m49_angola)],
    description='Source (reporter):',
    disabled=False,
)

# select partner with widget, from the same list of countries
partnerCodeWidget = widgets.Dropdown(
    options=[("China", comtrade.m49_china),("Macau",comtrade.m49_macau) , ("Hong Kong", comtrade.m49_hong_kong)] + comtrade.PLP_TUPLES_REVERSE ,
    # value=[("Angola",m49_angola)],
    description='Partner:',
    disabled=False,
)

# select flowCode widget from list M=Imports, X=Exports
flowCodeWidget = widgets.Dropdown(
    options=[("Imports", "M"), ("Exports", "X")],
    value="M",
    description='Flow:',
    disabled=False,
)



### Choose year, reporter, partner and flow (import/export) for the analysis

More than one year can be selected.

In [90]:

display(years)
display(reporterCodeWidget)
display(partnerCodeWidget)
display(flowCodeWidget)


SelectMultiple(description='Year:', index=(14,), options=(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008,…

Dropdown(description='Source (reporter):', options=(('China', 156), ('Macau', 446), ('Hong Kong', 344), ('Angol…

Dropdown(description='Partner:', index=7, options=(('China', 156), ('Macau', 446), ('Hong Kong', 344), ('Ango…

Dropdown(description='Flow:', options=(('Imports', 'M'), ('Exports', 'X')), value='M')

After choosing the year, countries and flow, run the next cell.

For the detailed meaning of each code see: https://www.wcoomd.org/en/topics/nomenclature/instrument-and-tools/hs-nomenclature-2022-edition/hs-nomenclature-2022-edition.aspx
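
Because several years can be selected, the selection has to be turned into a single comma-separated string before it is used as the query period. A small sketch of that conversion, with a plain tuple standing in for the widget value; `period_string` is a hypothetical helper, and sorting the years is an addition made here for readability, not something the notebook does.

```python
def period_string(selected_years):
    """Join the selected years into a comma-separated period string, in ascending order."""
    return ",".join(str(year) for year in sorted(selected_years))


selected = (2021, 2014)  # stands in for the value of the multi-select year widget
print(period_string(selected))
```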

In [91]:
import comtrade
from comtrade import COUNTRY_CODES, HS_CODES, HS_CODES_DF, HS_CODES_L2_DF


yearsList = ",".join(list(map(str,years.value)))
partnerCode = partnerCodeWidget.value
reporterCode = reporterCodeWidget.value


# decode country
partner_name = comtrade.COUNTRY_CODES.get(partnerCode)
reporter_name = comtrade.COUNTRY_CODES.get(reporterCode)

flow = flowCodeWidget.value
print(reporter_name,yearsList,flow,partner_name,)

pco_cols = ['reporterDesc','partnerDesc','refYear','rank','cmdDesc',
            'flowCode','primaryValueFormated']

pco_cols_detail = ['reporterDesc','partnerDesc','refYear','cmdCodeAG2','cmdCode','cmdDesc',
            'flowCode','primaryValue', 'isAggregate']

# first we collect the top commodity level 2 codes            
df = comtrade.get_data("C",# C for commodities, S for Services
                     "A",# (freqCode) A for annual and M for monthly
                     flowCode=flow,
                     cmdCode="AG2",
                     reporterCode=reporterCode,
                     partnerCode=partnerCode,
                     period=yearsList,
                     timeout=30,
                     echo_url=True
                     )

pco = df.sort_values(['partnerDesc','refYear','primaryValue'], ascending=[True,True,False])
pco['rank'] = pco.groupby(['partnerDesc','refYear'])["primaryValue"].rank(method="dense", ascending=False)
pco_top5 = pco[pco['rank'] <= rank_filter]
# get the countries
countries = pco_top5.partnerDesc.unique()
country_cmd_top5_codes = dict()

# get the detailed commodity codes for the top of each country
for country in countries:
    l2_codes = pco_top5[pco_top5.partnerDesc == country]['cmdCode'].unique()
    print(country,l2_codes)
    hs_details = []
    for l2_code in l2_codes:
        l2_sub_codes = list(HS_CODES_DF[HS_CODES_DF.hscode.str.startswith(l2_code)]['hscode'])
        hs_details = hs_details + l2_sub_codes
    # print(hs_details)
    country_cmd_top5_codes[country] = hs_details.copy()

# now we fetch the detail
countryDesc = COUNTRY_CODES[partnerCode]
country_cmd_codes = ",".join(country_cmd_top5_codes[countryDesc])
print(country_cmd_codes)

df = comtrade.get_data("C",# C for commodities, S for Services
                     "A",# (freqCode) A for annual and M for monthly
                     flowCode=flow,
                     #cmdCode=country_cmd_codes, # sometimes the detail commodity codes generates errors
                     cmdCode=None, # This gives all the commodities for the country and year 
                     reporterCode=reporterCode,
                     partnerCode=partnerCode,
                     partner2Code=0,
                     period=yearsList,
                     echo_url=True,
                     timeout=60
                     )
                     
df['cmdCodeAG2'] = df.cmdCode.str[0:2]
# filter the detail commodity codes by the top 5
df = df[df['cmdCodeAG2'].isin(country_cmd_top5_codes[countryDesc])]

China 2014 M Mozambique
https://comtradeapi.un.org/data/v1/get//C/A/HS?reporterCode=156&period=2014&partnerCode=508&partner2Code=0&cmdCode=AG2&flowCode=M&customsCode=C00&subscription-key=HIDDEN
Mozambique ['71' '44' '12' '26' '27']
71,7101,710110,710121,710122,7102,710210,710221,710229,710231,710239,7103,710310,710391,710399,7104,710410,710421,710429,710491,710499,7105,710510,710590,7106,710610,710691,710692,7107,710700,7108,710811,710812,710813,710820,7109,710900,7110,711011,711019,711021,711029,711031,711039,711041,711049,7111,711100,7112,711230,711291,711292,711299,7113,711311,711319,711320,7114,711411,711419,711420,7115,711510,711590,7116,711610,711620,7117,711711,711719,711790,7118,711810,711890,44,4401,440111,440112,440121,440122,440131,440132,440139,440141,440149,4402,440210,440220,440290,4403,440311,440312,440321,440322,440323,440324,440325,440326,440341,440342,440349,440391,440393,440394,440395,440396,440397,440398,440399,4404,440410,440420,4405,440500,4406,440611,440612,44069

We have to identify the rows that hold aggregate values ourselves, because
the `isAggregate` column in the data is not reliable.
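
The idea can be sketched independently of pandas: once the rows are sorted by code, a row is an aggregate whenever the next row's code starts with it, and only non-aggregate rows should count towards the chapter total. The example below uses rows copied from the China/Mozambique 2014 output shown further down; `flag_aggregates`, `add_percent` and the dictionary keys are hypothetical names for illustration.

```python
def flag_aggregates(rows):
    """Mark a row as aggregate when the next row's code extends it; rows must be sorted by code."""
    flagged = [dict(row, is_aggregate=False) for row in rows]
    for current, following in zip(flagged, flagged[1:]):
        if following["code"] != current["code"] and following["code"].startswith(current["code"]):
            current["is_aggregate"] = True
    return flagged


def add_percent(rows):
    """Add each row's share of its chapter total, counting only non-aggregate rows in the total."""
    totals = {}
    for row in rows:
        if not row["is_aggregate"]:
            chapter = row["code"][:2]
            totals[chapter] = totals.get(chapter, 0) + row["value"]
    return [dict(row, percent=round(row["value"] / totals[row["code"][:2]] * 100, 2)) for row in rows]


rows = [
    {"code": "71", "value": 1028828072.0},
    {"code": "7103", "value": 1028828072.0},
    {"code": "710310", "value": 2844499.0},
    {"code": "710391", "value": 201736.0},
    {"code": "710399", "value": 1025781837.0},
]
result = add_percent(flag_aggregates(rows))
for row in result:
    print(row["code"], row["is_aggregate"], row["percent"])
```

Excluding the aggregate rows from the total avoids counting the same trade value two or three times, once per level of the code hierarchy.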

In [92]:
# sort so that every parent code is immediately followed by its sub-codes
df.sort_values(['reporterDesc','partnerDesc','refYear','flowCode','cmdCode'], inplace=True)

# flag a row as aggregate when the next row's code extends it (e.g. 71 -> 7103 -> 710310)
lastCode = "---"
lastIndex = 0
for index, row in df.iterrows():
    currentCode = row['cmdCode']
    if currentCode != lastCode and currentCode.startswith(lastCode):
        df.loc[lastIndex,'isAggregate'] = True
    lastCode = currentCode
    lastIndex = index
# aggregate rows count as zero, so each value enters the subtotal only once
df['valueForTotal'] = df['primaryValue']
df.loc[df['isAggregate'] == True,'valueForTotal'] = 0
df['subtotalAG2'] = df.groupby(['partnerCode','refYear','flowCode','cmdCodeAG2'])["valueForTotal"].transform('sum')
df['percentAG2'] = df['primaryValue'] / df['subtotalAG2'] * 100

In [93]:


df[pco_cols_detail+['percentAG2']].sort_values('primaryValue',ascending=False).to_excel(f"./reports/product_detail_{reporter_name}_{yearsList}_{flow}_{partner_name}.xlsx")

pd.options.display.max_rows = 500
# df.sort_values(['partnerDesc','refYear','flowCode','subtotalAG2','cmdCodeAG2','primaryValue'],ascending=[True,True,True,False,True,False])[pco_cols_detail].head(500)
df.sort_values(['reporterDesc','partnerDesc','refYear','subtotalAG2','cmdCode'],ascending=[True,True,True,False,True])[pco_cols_detail+['percentAG2']].head(500)


Unnamed: 0,reporterDesc,partnerDesc,refYear,cmdCodeAG2,cmdCode,cmdDesc,flowCode,primaryValue,isAggregate,percentAG2
95,China,Mozambique,2014,71,71,"Natural, cultured pearls; precious, semi-precious stones; precious metals, metals clad with prec...",M,1028828072.0,True,100.0
96,China,Mozambique,2014,71,7103,"Precious (excluding diamond) and semi-precious stone; worked, graded, not strung, mounted, set; ...",M,1028828072.0,True,100.0
97,China,Mozambique,2014,71,710310,"Stones; precious (other than diamonds) and semi-precious stones, unworked or simply sawn or roug...",M,2844499.0,False,0.28
98,China,Mozambique,2014,71,710391,"Stones; rubies, sapphires and emeralds, worked (other than simply sawn or roughly shaped), not s...",M,201736.0,False,0.02
99,China,Mozambique,2014,71,710399,"Stones; precious (other than diamonds) and semi-precious stones, (other than rubies, sapphires a...",M,1025781837.0,False,99.7
74,China,Mozambique,2014,44,44,Wood and articles of wood; wood charcoal,M,410136287.0,True,100.0
75,China,Mozambique,2014,44,4403,"Wood in the rough, whether or not stripped of bark or sapwood, or roughly squared",M,334708889.0,True,81.61
76,China,Mozambique,2014,44,440349,"Wood, tropical; other than dark red meranti, light red meranti meranti bakau and teak, in the ro...",M,171470.0,False,0.04
77,China,Mozambique,2014,44,440399,"Wood; in the rough, whether or not stripped of bark or sapwood, or roughly squared, untreated, n...",M,334537419.0,False,81.57
78,China,Mozambique,2014,44,4407,"Wood sawn or chipped lengthwise, sliced or peeled, whether or not planed, sanded or finger-joint...",M,75427398.0,True,18.39
