# Top commodities in trade between partners

This notebook collects data on the top commodities traded between partners over a given period.

Partners can be a single country or a group of countries.



## Setup 

See [here](0-comtrade-setup-first.ipynb) for instructions on how to obtain an API Key.
Without an API key, results are limited to a maximum of 500 rows.


In [1]:
import comtradetools

comtradetools.setup()
APIKEY = comtradetools.get_api_key()
comtradetools.init(APIKEY, force_init=False)

INFO:root:Loading HS codes from support/harmonized-system.csv


## Top commodities in trade between partners

Obtains the main commodities traded between partners for a given period, using 2-digit HS codes.

For the meaning of codes see: https://www.wcoomd.org/en/topics/nomenclature/instrument-and-tools/hs-nomenclature-2022-edition/hs-nomenclature-2022-edition.aspx


### TODO

- [ ] Allow choosing the reporters and partners
- [ ] Make the output compatible with the AICEP report format https://myaicep.portugalexporta.com/mercados-internacionais/cn/china?setorProduto=-1
- [ ] Rows are duplicated in the commodity detail listings, probably because of the customs code or the mot code.
  

Define:
* `reporter` - country or countries providing the data
* `partner` - country or countries trading with the reporter
* `years_of_interest` - year or period for which data is requested
* `rank_filter` - number of top commodities to be analyzed
* `flowCode` - `M` for imports, `X` for exports, `M,X` for both
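
As a rough illustration of how `rank_filter` and `flowCode` interact, the sketch below ranks commodities within each partner, year and flow on a small made-up table and keeps only the top entries. The values are invented for the example and `toy_rank_filter` is a hypothetical stand-in for `rank_filter`; only the column names mirror the Comtrade fields used in this notebook.

```python
import pandas as pd

# Toy data: invented values, column names follow the Comtrade fields used here
toy = pd.DataFrame({
    "partnerDesc": ["Angola"] * 6,
    "refYear": [2003] * 6,
    "flowCode": ["M", "M", "M", "X", "X", "X"],
    "cmdCode": ["27", "25", "71", "85", "84", "64"],
    "primaryValue": [900.0, 60.0, 40.0, 500.0, 300.0, 200.0],
})

toy_rank_filter = 2  # hypothetical value, plays the role of rank_filter
group_cols = ["partnerDesc", "refYear", "flowCode"]

# Rank commodities by value inside each partner/year/flow group (1 = largest)
toy["rank"] = (
    toy.groupby(group_cols)["primaryValue"]
    .rank(method="dense", ascending=False)
    .astype(int)
)
# Share of each commodity in the total of its group
toy["share"] = toy["primaryValue"] / toy.groupby(group_cols)["primaryValue"].transform("sum")

# Keep only the top-ranked commodities of each group
toy_tops = toy[toy["rank"] <= toy_rank_filter]
print(toy_tops[["flowCode", "cmdCode", "rank", "share"]])
```

Because the ranking is done per flow, imports and exports each get their own top list even when both are requested with `M,X`.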

In [8]:
import comtradetools

reporterDesc = 'China'
reporterCode = comtradetools.encode_country(reporterDesc)
partnerDesc = 'PLP'
partnerCode = comtradetools.m49_plp_list
year_start = 2003
year_end = 2023
years_of_interest = comtradetools.year_range(year_start,year_end+1)
rank_filter = 5  # number of top commodities to keep
flowCode = 'M,X'


In [5]:
import comtradetools as comtrade
import pandas as pd



pco_cols = ['reporterDesc','partnerDesc','refYear','pyf_rank','cmdCode','cmdDesc',
            'flowCode','primaryValue','pyf_sum','pyf_perc']

periods = comtrade.split_period(years_of_interest,12)
df = pd.DataFrame()
for period in periods:
    print(f"Fetching data for period {period}")

    temp = comtrade.getFinalData(APIKEY,
                                typeCode="C",# C for commodities, S for Services
                                freqCode="A",# (freqCode) A for annual and M for monthly
                                flowCode=flowCode,
                                cmdCode="AG2",
                                reporterCode=reporterCode,
                                partnerCode=partnerCode,
                                period=period,
                                partner2Code=0,
                                clCode="HS",
                                customsCode='C00',
                                motCode=0,
                                includeDesc=True,
                                cache=True,
                                use_alternative=False
                                )
    df = pd.concat([df,temp], ignore_index=True)

# Sort by partner and year, largest values first
pco = df.sort_values(['partnerDesc','refYear','primaryValue'], ascending=[True,True,False])
# Rank each commodity within its partner, year and flow, and compute its share of the group total
pco['pyf_rank'] = pco.groupby(['partnerDesc','refYear','flowCode'])["primaryValue"].rank(method="dense", ascending=False)
pco['pyf_sum'] = pco.groupby(['partnerDesc','refYear','flowCode'])["primaryValue"].transform('sum')
pco['pyf_perc'] = pco['primaryValue'] / pco['pyf_sum']
# convert rank column to int
pco['pyf_rank'] = pco['pyf_rank'].astype(int)

# keep only the top-ranked commodities
pco_tops = pco[pco['pyf_rank'] <= rank_filter]
cmdCodes_tops = pco_tops['cmdCode'].unique()
print(f"Product categories involved in exchanges with top partners:\n {cmdCodes_tops}")




INFO:root:Calling getFinalData for period 2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014
INFO:root:Using cached results for period 2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014


Fetching data for period 2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014
Fetching data for period 2015,2016,2017,2018,2019,2020,2021,2022,2023,2024


INFO:root:Calling getFinalData for period 2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
INFO:root:Using cached results for period 2015,2016,2017,2018,2019,2020,2021,2022,2023,2024


Product categories involved in exchanges with top partners:
 ['27' '85' '64' '87' '52' '84' '25' '71' '39' '20' '90' '40' '73' '44'
 '03' '72' '74' '76' '94' '68' '62' '22' '01' '26' '63' '12' '47' '15'
 '29' '54' '41' '17' '02' '89' '38' '10' '95' '70' '34' '60' '61' '09'
 '16' '69' '99' '92' '49' '82' '48' '65' '97' '33' '30' '56' '08' '04'
 '21' '55' '96' '28' '31' '86' '45' '42' '83' '06' '07' '32' '14' '91'
 '78' '46']


In [6]:
# show
pd.options.display.max_colwidth=100
pd.options.display.max_rows=1600
pd.options.display.float_format = '{:,.2f}'.format

pco_tops_sorted = pco_tops[pco_cols].set_index(['reporterDesc','partnerDesc','flowCode','refYear','pyf_rank']).sort_index()
pco_tops_sorted.head(50).style.format(comtrade.make_format(list(pco_tops_sorted.columns)))


reporterDesc,partnerDesc,flowCode,refYear,pyf_rank,cmdCode,cmdDesc,primaryValue,pyf_sum,pyf_perc
China,Angola,M,2003,1,27,"Mineral fuels, mineral oils and products of their distillation; bituminous substances; mineral waxes","$2,205,654,993","$2,205,934,880",99.987%
China,Angola,M,2003,2,25,"Salt; sulphur; earths, stone; plastering materials, lime and cement","$232,059","$2,205,934,880",0.011%
China,Angola,M,2003,3,71,"Natural, cultured pearls; precious, semi-precious stones; precious metals, metals clad with precious metal, and articles thereof; imitation jewellery; coin","$47,397","$2,205,934,880",0.002%
China,Angola,M,2003,4,39,Plastics and articles thereof,$257,"$2,205,934,880",0.000%
China,Angola,M,2003,5,20,"Preparations of vegetables, fruit, nuts or other parts of plants",$119,"$2,205,934,880",0.000%
China,Angola,M,2004,1,27,"Mineral fuels, mineral oils and products of their distillation; bituminous substances; mineral waxes","$4,716,807,801","$4,717,339,146",99.989%
China,Angola,M,2004,2,71,"Natural, cultured pearls; precious, semi-precious stones; precious metals, metals clad with precious metal, and articles thereof; imitation jewellery; coin","$430,188","$4,717,339,146",0.009%
China,Angola,M,2004,3,25,"Salt; sulphur; earths, stone; plastering materials, lime and cement","$92,383","$4,717,339,146",0.002%
China,Angola,M,2004,4,90,"Optical, photographic, cinematographic, measuring, checking, medical or surgical instruments and apparatus; parts and accessories","$4,080","$4,717,339,146",0.000%
China,Angola,M,2004,5,40,Rubber and articles thereof,"$2,561","$4,717,339,146",0.000%


## Output to Excel file

In [9]:
# Prepare file name for Excel output

real_period = pco_tops_sorted.index.get_level_values('refYear').unique()
start_year=min(real_period)
end_year=max(real_period)
period=f"{start_year}-{end_year}"

filename_note=f"{period}_{flowCode.replace(',','_')}"  # change to append to filename
excel_file_name = f"./reports/{reporterDesc}_{partnerDesc}_tops_{filename_note}.xlsx"
excel_file = pd.ExcelWriter(excel_file_name)
# Save data
sheet_name = "comtrade"
pco_tops_sorted.reset_index().to_excel(excel_file, sheet_name=sheet_name, index=True, startrow=2)
# Adjust column width
comtrade.excel_col_autowidth(pco_tops_sorted, excel_file,sheet_name)

# format column primaryValue as currency
comtrade.excel_format_currency(pco_tops_sorted, excel_file,sheet_name, columns=['primaryValue'],width=20)

# Write title in first row TBD
excel_file_title = f"Commodities {reporterDesc}-{partnerDesc}, top {rank_filter} (M=Import, X=Export), {period} USD"
print(excel_file_title)
excel_file.close()
print("Saved to:",excel_file_name)


Commodities China-PLP, top 5 (M=Import, X=Export), 2003-2022 USD
Saved to: ./reports/China_PLP_tops_2003-2022_M_X.xlsx


## Analysis of changes in the top commodities

The following cell lists the most significant changes in the evolution of trade between China and the Portuguese-speaking countries (PLP).

One block is printed for each combination of country pair and flow (M = imports by China, X = exports from China),
for example:
```
China Brazil M
2003   12,26,72 | 12=1,683,531,130.00 (100%) | 26=1,369,127,929.00 (100%) | 72=745,347,034.00 (100%) | 
        -> 26 Ores, slag and ash
        -> 12 Oil seeds and oleaginous fruits; miscellaneous grains, seeds and fruit, industrial or medicinal plants; straw and fodder
        -> 72 Iron and steel
2015 ! 12,26,27 | 12=16,887,916,146.00  (1003%) | 26=12,653,641,892.00  (924%) | 27=5,297,996,353.00  (new:27) | 
        -> 27 Mineral fuels, mineral oils and products of their distillation; bituminous substances; mineral waxes
2016 = 12,26,27 | 12=15,551,903,256.00 (92%) | 26=13,339,552,748.00 (105%) | 27=6,041,262,180.00 (114%) | 
2017 = 12,26,27 | 12=20,916,408,256.00 (134%) | 26=18,261,973,405.00 (137%) | 27=9,189,149,994.00 (152%) | 
2018 = 12,26,27 | 12=28,843,062,797.00 (138%) | 26=19,307,536,897.00 (106%) | 27=16,484,023,407.00 (179%) | 
12 Oil seeds and oleaginous fruits; miscellaneous grains, seeds and fruit, industrial or medicinal plants; straw and fodder
26 Ores, slag and ash
27 Mineral fuels, mineral oils and products of their distillation; bituminous substances; mineral waxes
72 Iron and steel
```
Each year is followed by a symbol indicating the change relative to the previous year:
* ! : a change in one of the top commodities
* = : same commodities as the previous year, in the same order of importance
* ≈ : same commodities as the previous year, but in a different order of importance


Whenever a new commodity appears in the series, the description of its code and the value involved are shown.

This listing can be produced for a variable number of top commodities,
normally the top two (see the `consider_first` variable).
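
The year markers can be derived by comparing each year's ordered list of top codes with the previous year's list. The sketch below is a minimal, simplified version of that comparison. The function name `classify_change` is hypothetical, and the code lists are copied from the China-Angola import rows of this notebook.

```python
def classify_change(previous, current):
    """Return the marker for a year, given the previous year's top codes.

    previous, current: lists of commodity codes ordered by importance.
    """
    if not previous:
        return " "  # first year in the series, nothing to compare with
    if set(current) != set(previous):
        return "!"  # at least one top commodity changed
    if current == previous:
        return "="  # same commodities, same order
    return "≈"      # same commodities, different order


# Top three import codes per year, China-Angola, as listed in this notebook
series = {
    2003: ["27", "25", "71"],
    2004: ["27", "71", "25"],
    2005: ["27", "71", "73"],
}

previous = []
markers = {}
for year, codes in series.items():
    markers[year] = classify_change(previous, codes)
    previous = codes
    print(year, markers[year], ",".join(codes))
```

Comparing sets first and lists second keeps the two questions separate: whether the membership changed, and whether only the order changed.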

In [6]:
consider_first = 3
for reporter in pco_tops_sorted.index.get_level_values(0).unique():
    for partner in pco_tops_sorted.loc[(reporter,)].index.get_level_values(0).unique(): 

        for flow in pco_tops_sorted.loc[((reporter,partner,))].index.get_level_values(0).unique():
            print(reporter,partner, flow)
            print("Percentages show change of commodity value compared to previous year")
            previous_pattern = []
            all_commodities = set() # all commodities for this partner for this flow
            all_commodities_desc = set() # all commodities for this partner for this flow descriptions
            previous_years_values = dict()
            for year in pco_tops_sorted.loc[(reporter,partner,flow,)].index.get_level_values(0).unique():
                print(year, end=' ')
                top5_codes = pco_tops_sorted.loc[(reporter,partner,flow,year)]['cmdCode'].astype(str).values.flatten().tolist()[0:consider_first]
                top5_codes_desc = pco_tops_sorted.loc[(reporter,partner,flow,year)]['cmdDesc'].astype(str).values.flatten().tolist()[0:consider_first]
                top5_primary_value = pco_tops_sorted.loc[(reporter,partner,flow,year)]['primaryValue'].values.flatten().tolist()[0:consider_first]
                top5_table = {code: {"desc":desc, "value":value, "year": year} for code,desc,value in zip(top5_codes,top5_codes_desc,top5_primary_value)}
                codes_desc = {f"{code} {desc}" for code,desc in zip(top5_codes,top5_codes_desc)}
                # track which codes are new for this partner and flow
                new_commodities = set(top5_codes).difference(all_commodities)
                all_commodities = all_commodities.union(set(top5_codes))
                all_commodities_desc = all_commodities_desc.union(codes_desc)
                pattern = ','.join(top5_codes)
                if len(previous_pattern) == 0:
                    previous_pattern = top5_codes
                    # update previous_years_values with top5_table
                    previous_years_values.update(top5_table)
                    print(' ', pattern, end=' | ')
                    for cmd in top5_codes:
                        print(f"{cmd}={top5_table[cmd]['value']:,.2f} ({top5_table[cmd]['value'] / previous_years_values[cmd]['value']:.0%})", end=' | ')
                    print()
                    for cmd in new_commodities:
                        print( " "*13+"  ->", cmd,top5_table[cmd]['desc'])
                elif set(top5_codes) == set(previous_pattern):
                    if top5_codes == previous_pattern:
                        print('=', pattern, end=' | ')
                    else:
                        print('≈', pattern, end=' | ')
                    for cmd in top5_codes:
                        print(f"{cmd}={top5_table[cmd]['value']:,.2f} ({top5_table[cmd]['value'] / previous_years_values[cmd]['value']:.0%})", end=' | ')
                    print()
                else:
                    print('!', pattern, end=' | ')
                    for cmd in top5_codes:
                        print(f"{cmd}={top5_table[cmd]['value']:,.2f} ", end=' ')
                        if cmd in previous_years_values.keys():
                            print(f"({top5_table[cmd]['value'] / previous_years_values[cmd]['value']:.0%})", end=' | ')
                        else:
                            print(f"(new:{cmd})", end=' | ')
                    print()
                    if len(new_commodities) > 0:
                        for cmd in new_commodities:
                            print( " "*13+"  ->", cmd, top5_table[cmd]['desc'])
                       
                previous_pattern = top5_codes
                previous_years_values.update(top5_table)
            for cmd in sorted(all_commodities_desc):
                print( "    ",cmd)
            print(80*"-")



China Angola M
Percentages show change of commodity value compared to previous year
2003   27,25,71 | 27=2,205,654,993.00 (100%) | 25=232,059.00 (100%) | 71=47,397.00 (100%) | 
               -> 71 Natural, cultured pearls; precious, semi-precious stones; precious metals, metals clad with precious metal, and articles thereof; imitation jewellery; coin
               -> 25 Salt; sulphur; earths, stone; plastering materials, lime and cement
               -> 27 Mineral fuels, mineral oils and products of their distillation; bituminous substances; mineral waxes
2004 ≈ 27,71,25 | 27=4,716,807,801.00 (214%) | 71=430,188.00 (908%) | 25=92,383.00 (40%) | 
2005 ! 27,71,73 | 27=6,575,939,442.00  (139%) | 71=5,645,659.00  (1312%) | 73=120,907.00  (new:73) | 
               -> 73 Iron or steel articles
2006 ! 27,71,25 | 27=10,930,839,396.00  (166%) | 71=2,162,576.00  (38%) | 25=128,801.00  (139%) | 
2007 ! 27,71,44 | 27=12,879,534,443.00  (118%) | 71=8,619,506.00  (399%) | 44=338,307.00  (new:44)


## Detail of the types of products traded between the PLP and China

In two steps:
* get the most important level 2 categories for each country
* look up all the subcategories of each of those categories

For the detailed meaning of each code see: https://www.wcoomd.org/en/topics/nomenclature/instrument-and-tools/hs-nomenclature-2022-edition/hs-nomenclature-2022-edition.aspx
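
The second step amounts to a prefix match on the HS code. The sketch below shows the idea on a hypothetical miniature code list. The notebook itself does the lookup against `HS_CODES_DF` with `str.startswith`; `toy_hs_codes` and `expand_chapters` are names invented for this illustration.

```python
# Hypothetical miniature HS list; the real one is loaded by comtradetools
toy_hs_codes = [
    "25", "2501", "250100",
    "27", "2709", "270900", "2710", "271012",
    "71", "7102",
]


def expand_chapters(chapters, hs_codes):
    """Return every code that belongs to one of the given 2-digit chapters."""
    wanted = set(chapters)
    # The first two digits of an HS code identify its chapter
    return [code for code in hs_codes if code[:2] in wanted]


detail_codes = expand_chapters(["27", "25"], toy_hs_codes)
print(",".join(detail_codes))
```

The result keeps the 2-digit chapter codes themselves, so the same list can later be used to filter rows by chapter.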

In [10]:

import ipywidgets as widgets
from IPython.display import display
import comtradetools

rank_filter = 5  # number of top commodities to keep
years = "2021"
partnerCode = comtrade.m49_angola
flowCode="M"

# select years with a widget, from range(2003,2022)

years = widgets.SelectMultiple(
    options=range(2003,2022),
    value=[2021],
    description='Ano:',
    disabled=False,
)

# select country widget from list Angola, Brazil, Cabo Verde, Guiné Bissau, Equatorial Guinea, Mozambique, Portugal, São Tomé e Príncipe, Timor-Leste
reporterCodeWidget = widgets.Dropdown(
    options=[("China", comtrade.m49_china),("Macau",comtrade.m49_macau) , ("Hong Kong", comtrade.m49_hong_kong)] + comtrade.PLP_TUPLES_REVERSE ,
    # value=[("Angola",m49_angola)],
    description='Fonte (reporter):',
    disabled=False,
)

# select country widget from list Angola, Brazil, Cabo Verde, Guiné Bissau, Equatorial Guinea, Mozambique, Portugal, São Tomé e Príncipe, Timor-Leste
partnerCodeWidget = widgets.Dropdown(
    options=[("China", comtrade.m49_china),("Macau",comtrade.m49_macau) , ("Hong Kong", comtrade.m49_hong_kong)] + comtrade.PLP_TUPLES_REVERSE ,
    # value=[("Angola",m49_angola)],
    description='Parceiro:',
    disabled=False,
)

# select flowCode widget from list M=imports, X=exports
flowCodeWidget = widgets.Dropdown(
    options=[("Importações", "M"), ("Exportações", "X")],
    value="M",
    description='Fluxo:',
    disabled=False, 
)



### Choose year, partner and flow (import/export) for analysis

More than one year can be selected.

In [11]:

display(years)
display(reporterCodeWidget)
display(partnerCodeWidget)
display(flowCodeWidget)


SelectMultiple(description='Ano:', index=(18,), options=(2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011,…

Dropdown(description='Fonte (reporter):', options=(('China', 156), ('Macau', 446), ('Hong Kong', 344), ('Angol…

Dropdown(description='Parceiro:', options=(('China', 156), ('Macau', 446), ('Hong Kong', 344), ('Angola', 24),…

Dropdown(description='Fluxo:', options=(('Importações', 'M'), ('Exportações', 'X')), value='M')

After choosing the year, country and flow, run the following cell.

For the detailed meaning of each code see: https://www.wcoomd.org/en/topics/nomenclature/instrument-and-tools/hs-nomenclature-2022-edition/hs-nomenclature-2022-edition.aspx

In [14]:
import time

import comtradetools as comtrade
from comtradetools import COUNTRY_CODES, HS_CODES, HS_CODES_DF, HS_CODES_L2_DF


years_of_interest = comtradetools.year_range(min(years.value),max(years.value))
partnerCode = partnerCodeWidget.value
reporterCode = reporterCodeWidget.value


# decode country
partner_name = comtrade.decode_country(partnerCode)
reporter_name = comtrade.decode_country(reporterCode)

flow = flowCodeWidget.value
print(reporter_name,years_of_interest,flow,partner_name,)

pco_cols = ['reporterDesc','partnerDesc','refYear','rank','cmdDesc',
            'flowCode','primaryValueFormated']

pco_cols_detail = ['reporterDesc','partnerDesc','refYear','cmdCodeAG2','motDesc','cmdCode','cmdDesc',
            'flowCode','primaryValue', 'isAggregate']

# first we collect the top commodity level 2 codes            
df = comtrade.getFinalData(
                        APIKEY,
                        typeCode="C",# C for commodities, S for Services
                        freqCode="A",# (freqCode) A for annual and M for monthly
                        flowCode=flow,
                        cmdCode="AG2",
                        reporterCode=reporterCode,
                        partnerCode=partnerCode,
                        partner2Code=0,
                        period=years_of_interest,
                        motCode = 0,
                        clCode="HS",
                        customsCode='C00',
                        includeDesc=True,
                        cache=True,
                        use_alternative=False
                     )

pco = df.sort_values(['partnerDesc','refYear','primaryValue'], ascending=[True,True,False])
pco['rank'] = pco.groupby(['partnerDesc','refYear'])["primaryValue"].rank(method="dense", ascending=False)
pco_top5 = pco[pco['rank'] <= rank_filter]
# get the countries
countries = pco_top5.partnerDesc.unique()
country_cmd_top5_codes = dict()

# get the detailed commodity codes for the top of each country
for country in countries:
    l2_codes = pco_top5[pco_top5.partnerDesc == country]['cmdCode'].unique()
    print(country,l2_codes)
    hs_details = []
    for l2_code in l2_codes:
        l2_sub_codes = list(HS_CODES_DF[HS_CODES_DF.hscode.str.startswith(l2_code)]['hscode'])
        hs_details = hs_details + l2_sub_codes
    # print(hs_details)
    country_cmd_top5_codes[country] = hs_details.copy()

# now we fetch the detail
countryDesc = comtrade.decode_country(partnerCode)
country_cmd_codes = ",".join(country_cmd_top5_codes[countryDesc])
print(country_cmd_codes)
print("waiting for server ...")
time.sleep(5)
df = comtrade.getFinalData(
                    APIKEY,
                    typeCode="C",# C for commodities, S for Services
                    freqCode="A",# (freqCode) A for annual and M for monthly
                    flowCode=flow,
                    clCode="HS",
                    #cmdCode=country_cmd_codes, # sometimes the detail commodity codes generates errors
                    cmdCode=None, # This gives all the commodities for the country and year 
                    reporterCode=reporterCode,
                    partnerCode=partnerCode,
                    period=years_of_interest,
                    partner2Code=0,
                    motCode=0,
                    customsCode='C00',
                    includeDesc=True,
                    cache=True,
                    use_alternative=False
                    )
                     
# create a column with the first 2 digits of the commodity code
df['cmdCodeAG2'] = df.cmdCode.str[0:2]
# filter the detail commodity codes by the top 5
df = df[df['cmdCodeAG2'].isin(country_cmd_top5_codes[countryDesc])]



INFO:root:Calling getFinalData for period 2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014
INFO:root:Using cached results for period 2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014
INFO:root:Calling getFinalData for period 2015,2016,2017,2018,2019,2020,2021
INFO:root:Using cached results for period 2015,2016,2017,2018,2019,2020,2021


China 2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021 M Angola
Angola ['27' '25' '71' '39' '20' '90' '40' '73' '44' '03' '72' '74' '76' '94'
 '68' '85' '22' '01' '26']
27,2701,270111,270112,270119,270120,2702,270210,270220,2703,270300,2704,270400,2705,270500,2706,270600,2707,270710,270720,270730,270740,270750,270791,270799,2708,270810,270820,2709,270900,2710,271012,271019,271020,271091,271099,2711,271111,271112,271113,271114,271119,271121,271129,2712,271210,271220,271290,2713,271311,271312,271320,271390,2714,271410,271490,2715,271500,2716,271600,25,2501,250100,2502,250200,2503,250300,2504,250410,250490,2505,250510,250590,2506,250610,250620,2507,250700,2508,250810,250830,250840,250850,250860,250870,2509,250900,2510,251010,251020,2511,251110,251120,2512,251200,2513,251310,251320,2514,251400,2515,251511,251512,251520,2516,251611,251612,251620,251690,2517,251710,251720,251730,251741,251749,2518,251810,251820,2519,251910,251990,2520,252010,2520

INFO:root:Calling getFinalData for period 2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014


Request failed due to timeout
Empty result in getFinalData, retrying in 5 seconds


INFO:root:Calling getFinalData for period 2015,2016,2017,2018,2019,2020,2021


In [8]:
# show more rows
pd.set_option('display.max_rows', 1000)
df[['refYear','cmdCode','partner2Code','partner2Desc','motCode','customsCode','isAggregate','primaryValue']].sort_values(by=['refYear','cmdCode','partner2Code']).head(100)

Unnamed: 0,refYear,cmdCode,partner2Code,partner2Desc,motCode,customsCode,isAggregate,primaryValue
7,2021,39,0,World,0,C00,True,1521359848.0
385,2021,3901,0,World,0,C00,True,5432194.0
3254,2021,390110,0,World,0,C00,False,950258.0
3255,2021,390120,0,World,0,C00,False,533438.0
2358,2021,390130,0,World,0,C00,False,98611.0
3256,2021,390140,0,World,0,C00,False,874290.0
3257,2021,390190,0,World,0,C00,False,2975597.0
2820,2021,3902,0,World,0,C00,True,22158108.0
3258,2021,390210,0,World,0,C00,False,16078109.0
3259,2021,390230,0,World,0,C00,False,4761138.0


We have to identify the rows that hold aggregate values ourselves,
because that column is not reliable in the data.

In the China-Angola imports listing, the `isAggregate` flag appears to be
incorrect up to and including 2017. 2018 is already correct.

See the [bug report](isaggregate_bug.ipynb)
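
One possible workaround, sketched below on invented data, is to infer aggregate rows from the code hierarchy instead of trusting `isAggregate`. A row is treated as an aggregate when another row of the same year has a longer code starting with the same digits. The column name `isAggregateInferred` and the toy values are assumptions made for this illustration, and this is not necessarily the approach taken in the bug report.

```python
import pandas as pd

# Toy data: invented values, one chapter with one heading and two leaf codes
toy = pd.DataFrame({
    "refYear": [2017, 2017, 2017, 2017],
    "cmdCode": ["39", "3901", "390110", "390120"],
    "primaryValue": [150.0, 150.0, 100.0, 50.0],
})

# A code is an aggregate if it is a proper prefix of another code in the same year
flags = pd.Series(False, index=toy.index)
for _, group in toy.groupby("refYear"):
    codes = group["cmdCode"].tolist()
    flags.loc[group.index] = group["cmdCode"].map(
        lambda c: any(o != c and o.startswith(c) for o in codes)
    )
toy["isAggregateInferred"] = flags

# Summing only the leaf rows avoids counting the same trade value several times
leaf_total = toy.loc[~toy["isAggregateInferred"], "primaryValue"].sum()
print(toy)
print("leaf total:", leaf_total)
```

This check only sees the codes present in the data, so a heading reported without any of its subcodes would be kept as a leaf row.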


In [19]:

# Exclude aggregate rows from the totals so that values are not counted twice
df['valueForTotal'] = df['primaryValue']
df.loc[df['isAggregate'] == True,'valueForTotal'] = 0
# Totals per 2-digit chapter and per commodity code
df['subtotalAG2'] = df.groupby(['partnerCode','refYear','flowCode','cmdCodeAG2'])["valueForTotal"].transform('sum')
df['subtotalCmd'] = df.groupby(['partnerCode','refYear','flowCode','cmdCode'])["valueForTotal"].transform('sum')
# Share of each row in its chapter and commodity totals, in percent
df['percentAG2'] = df['primaryValue'] / df['subtotalAG2'] * 100
df['percentCmd'] = df['primaryValue'] / df['subtotalCmd'] * 100

In [20]:
import comtradetools as comtrade

excel_file_name = f"./reports/product_detail_{reporter_name}_{years_of_interest.replace(',','-')}_{flow}_{partner_name}.xlsx"
excel_file = pd.ExcelWriter(excel_file_name, engine='xlsxwriter')

df2 = df[pco_cols_detail+['percentAG2']].sort_values('primaryValue',ascending=False)
df2.to_excel(excel_file, sheet_name='detail', index=True, startrow=2)

comtrade.excel_col_autowidth(df2, excel_file, 'detail')
comtrade.excel_format_currency(df2, excel_file,'detail', columns=['primaryValue'],width=20)
excel_file.close()
print(f"Excel file {excel_file_name} created")


Excel file ./reports/product_detail_China_2021_M_Brazil.xlsx created


In [21]:

pd.options.display.max_rows = 500
# df.sort_values(['partnerDesc','refYear','flowCode','subtotalAG2','cmdCodeAG2','primaryValue'],ascending=[True,True,True,False,True,False])[pco_cols_detail].head(500)
# format columns percentAG2 to percentage 3 decimals
df['percentAG2Form'] = df['percentAG2'].map("{:.3f}%".format)
df['percentCmdForm'] = df['percentCmd'].map("{:.3f}%".format)
df[df['isAggregate'] == False].sort_values(['reporterDesc','partnerDesc','refYear','cmdCode','subtotalAG2','percentAG2'],ascending=[True,True,True,True,False,False])\
    [['reporterDesc','partnerDesc','refYear','partner2Desc','percentCmdForm','valueForTotal','cmdCodeAG2','cmdCode','cmdDesc']+\
    ['customsDesc','motDesc','subtotalCmd','subtotalAG2']].head(500)


Unnamed: 0,reporterDesc,partnerDesc,refYear,partner2Desc,percentCmdForm,valueForTotal,cmdCodeAG2,cmdCode,cmdDesc,customsDesc,motDesc,subtotalCmd,subtotalAG2
2,China,Brazil,2021,World,100.000%,4620523121.0,2,20230,"Meat; of bovine animals, boneless cuts, frozen",All customs procedure codes,All modes of transport,4620523121.0,7799680540.0
4,China,Brazil,2021,World,100.000%,39241333.0,2,20322,"Meat; of swine, hams, shoulders and cuts thereof, with bone in, frozen",All customs procedure codes,All modes of transport,39241333.0,7799680540.0
5,China,Brazil,2021,World,100.000%,1617075329.0,2,20329,"Meat; of swine, n.e.c. in item no. 0203.2, frozen",All customs procedure codes,All modes of transport,1617075329.0,7799680540.0
7,China,Brazil,2021,World,100.000%,2784082.0,2,20500,"Meat; of horses, asses, mules or hinnies, fresh, chilled or frozen",All customs procedure codes,All modes of transport,2784082.0,7799680540.0
9,China,Brazil,2021,World,100.000%,61929222.0,2,20649,"Offal, edible; of swine, (other than livers), frozen",All customs procedure codes,All modes of transport,61929222.0,7799680540.0
11,China,Brazil,2021,World,100.000%,229346.0,2,20712,"Meat and edible offal; of fowls of the species Gallus domesticus, not cut in pieces, frozen",All customs procedure codes,All modes of transport,229346.0,7799680540.0
12,China,Brazil,2021,World,100.000%,1455270569.0,2,20714,"Meat and edible offal; of fowls of the species Gallus domesticus, cuts and offal, frozen",All customs procedure codes,All modes of transport,1455270569.0,7799680540.0
14,China,Brazil,2021,World,100.000%,2627538.0,2,20910,"Fat; pig fat, free of lean meat, not rendered or otherwise extracted, fresh, chilled, frozen, sa...",All customs procedure codes,All modes of transport,2627538.0,7799680540.0
87,China,Brazil,2021,World,100.000%,522.0,12,120110,"Soya beans; seed, whether or not broken",All customs procedure codes,All modes of transport,522.0,33120876740.0
88,China,Brazil,2021,World,100.000%,33120436899.0,12,120190,"Soya beans; other than seed, whether or not broken",All customs procedure codes,All modes of transport,33120436899.0,33120876740.0
