# Bug in isAggregate column?

The value of the `isAggregate` column depends on the values
of the `partner2Code` column (and probably on other detail columns
such as mode of transport and customs code).

If a dataset includes a breakdown by partner2, `isAggregate` is False only
on the rows for a specific partner2. The rows with partner2Code = 0 are
flagged as aggregates, even at the most detailed commodity level.


For example, China's imports from Angola in 2017:

In [6]:

# pip install pandas requests

import json
import requests
import pandas as pd


pd.set_option('display.float_format', lambda x: '%.2f' % x)

url_cn_angola_2017 = 'https://comtradeapi.un.org/public/v1/preview/C/A/HS?reporterCode=156&period=2017&partnerCode=24&flowCode=M&customsCode=C00'
resp = requests.get(url_cn_angola_2017)
results = json.loads(resp.content)['data']
df_2017 = pd.DataFrame(results)
df_2017[['refYear','cmdCode','partner2Code','isAggregate','primaryValue']].sort_values(by='cmdCode').head(20)



index,refYear,cmdCode,partner2Code,isAggregate,primaryValue
0,2017,01,0,True,2678856.0
102,2017,01,24,True,2678856.0
103,2017,0106,24,True,2678856.0
1,2017,0106,0,True,2678856.0
104,2017,010612,24,False,2678856.0
2,2017,010612,0,True,2678856.0
105,2017,20,24,True,16500.0
3,2017,20,0,True,16500.0
106,2017,2009,24,True,16500.0
4,2017,2009,0,True,16500.0


In [7]:
# show only rows where isAggregate is False
df_2017[df_2017['isAggregate'] == False][['refYear','cmdCode','partner2Code','isAggregate','primaryValue']].sort_values(by='cmdCode').head(10)

index,refYear,cmdCode,partner2Code,isAggregate,primaryValue
104,2017,010612,24,False,2678856.0
107,2017,200989,24,False,16500.0
110,2017,220210,24,False,20000.0
111,2017,220299,24,False,3000.0
113,2017,220300,24,False,85122.0
116,2017,250610,24,False,2503564.0
117,2017,250620,24,False,2519548.0
224,2017,251511,380,False,5119.0
119,2017,251512,24,False,3564171.0
120,2017,251520,24,False,32968.0


Grouping by code delivers the expected results


In [8]:
pd.options.display.float_format = '{:,.2f}'.format
df_2017['cmdCodeAG2'] = df_2017.cmdCode.str[0:2]
df_2017[df_2017['isAggregate'] == False].groupby(['refYear','cmdCodeAG2']).agg({'primaryValue':'sum'}).reset_index().sort_values(by=['refYear','cmdCodeAG2'])

index,refYear,cmdCodeAG2,primaryValue
0,2017,01,2678856.0
1,2017,20,16500.0
2,2017,22,108122.0
3,2017,25,20681052.0
4,2017,27,20541590167.0
5,2017,39,711.0
6,2017,44,27138190.0
7,2017,61,87.0
8,2017,62,210.0
9,2017,68,6128.0


If the dataset has no breakdown of the partner2Code, then the `isAggregate`
value is set for lines with partner2Code = 0.
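The two cases described so far can be told apart with a small check. This is a minimal sketch, not part of the Comtrade API: the helper name `has_partner2_breakdown` is hypothetical, and the sample rows are copied from the 2017 and 2018 outputs in this notebook.

```python
import pandas as pd


def has_partner2_breakdown(df: pd.DataFrame) -> bool:
    """Return True if any row reports a specific second partner (partner2Code != 0)."""
    return bool((df["partner2Code"] != 0).any())


# Rows for cmdCode 010612 copied from the 2017 output (with partner2 breakdown).
sample_2017 = pd.DataFrame({
    "cmdCode": ["010612", "010612"],
    "partner2Code": [0, 24],
    "isAggregate": [True, False],
    "primaryValue": [2678856.0, 2678856.0],
})

# Row for cmdCode 010612 copied from the 2018 output (no partner2 breakdown).
sample_2018 = pd.DataFrame({
    "cmdCode": ["010612"],
    "partner2Code": [0],
    "isAggregate": [False],
    "primaryValue": [3175615.0],
})

print(has_partner2_breakdown(sample_2017))
print(has_partner2_breakdown(sample_2018))
```

With a breakdown present, the rows where `isAggregate` is False carry a non-zero partner2Code; without one, they carry partner2Code = 0.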



In [9]:

url_cn_angola_2018 = 'https://comtradeapi.un.org/public/v1/preview//C/A/HS?reporterCode=156&period=2018&partnerCode=24&flowCode=M&customsCode=C00'
resp = requests.get(url_cn_angola_2018)
results = json.loads(resp.content)['data']
df_2018 = pd.DataFrame(results)
df_2018[['cmdCode','partner2Code','partner2Desc','isAggregate','primaryValue']].sort_values(by='cmdCode').head(20)

index,cmdCode,partner2Code,partner2Desc,isAggregate,primaryValue
0,01,0,,True,3175615.0
1,0106,0,,True,3175615.0
2,010612,0,,False,3175615.0
3,03,0,,True,58330.0
4,0303,0,,True,58330.0
5,030389,0,,False,58330.0
6,05,0,,True,19121.0
7,0507,0,,True,19121.0
8,050790,0,,False,19121.0
9,22,0,,True,84459.0


In [11]:
pd.options.display.float_format = '{:,.2f}'.format
df_2018['cmdCodeAG2'] = df_2018.cmdCode.str[0:2]
df_2018[df_2018['isAggregate'] == False].groupby(['refYear','cmdCodeAG2']).agg({'primaryValue':'sum'}).reset_index().sort_values(by=['refYear','cmdCodeAG2'])

index,refYear,cmdCodeAG2,primaryValue
0,2018,01,3175615.0
1,2018,03,58330.0
2,2018,05,19121.0
3,2018,22,84459.0
4,2018,25,21344705.0
5,2018,26,5738316.0
6,2018,27,25755301914.0
7,2018,33,32.0
8,2018,35,34.0
9,2018,39,137.0


Comtrade's response:

“This is not an issue with the data.
 
Some datasets have breakdown of 2ndPartner and some others do not, this will cause that flag to change depending on the original data received. For example:
 
https://comtradeapi.un.org/public/v1/preview/C/A/HS?reporterCode=156&period=2017&partnerCode=24&flowCode=M&customsCode=C00&cmdCode=010612

https://comtradeapi.un.org/public/v1/preview/C/A/HS?reporterCode=156&period=2019&partnerCode=24&flowCode=M&customsCode=C00&cmdCode=010612”
 
Best Regards
Comtrade Team

Note: The examples do not specify partner2Code=0
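To make the difference concrete, the sketch below builds the same request with and without a partner2 filter. It assumes that the preview endpoint accepts `partner2Code` as a query parameter, which this notebook does not confirm; `build_preview_url` is a hypothetical helper, and no request is sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://comtradeapi.un.org/public/v1/preview/C/A/HS"


def build_preview_url(period, partner2_code=None):
    """Build a preview URL for China's imports from Angola (hypothetical helper)."""
    params = {
        "reporterCode": 156,   # China
        "period": period,
        "partnerCode": 24,     # Angola
        "flowCode": "M",       # imports
        "customsCode": "C00",
    }
    if partner2_code is not None:
        # Assumption: the endpoint accepts partner2Code as a filter.
        params["partner2Code"] = partner2_code
    return f"{BASE_URL}?{urlencode(params)}"


url_all_partner2 = build_preview_url(2017)
url_partner2_zero = build_preview_url(2017, partner2_code=0)
print(url_all_partner2)
print(url_partner2_zero)
```

If the filter works as assumed, the second URL would return only the partner2Code = 0 rows, so it is worth checking what `isAggregate` looks like in that response before relying on it.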

In [14]:

# pip install pandas requests
import time
import json
import requests
import pandas as pd


pd.set_option('display.float_format', lambda x: '%.2f' % x)

url_cn_angola_2017 = 'https://comtradeapi.un.org/public/v1/preview/C/A/HS?reporterCode=156&period=2017&partnerCode=24&flowCode=M&customsCode=C00'
resp = requests.get(url_cn_angola_2017)
results = json.loads(resp.content)['data']
df_2017 = pd.DataFrame(results)
print("As of 2023-02-17, the isAggregate column is correct for 2017 and earlier data if partner2Code is not specified, but duplicate rows are returned")
print("Request url: ", url_cn_angola_2017)
print(df_2017[['cmdCode','partner2Code','isAggregate','primaryValue']].sort_values(by='cmdCode').head(120))
print()

time.sleep(5)

# Note: this request uses period=2019, the year from Comtrade's example above
url_cn_angola_2019 = 'https://comtradeapi.un.org/public/v1/preview/C/A/HS?reporterCode=156&period=2019&partnerCode=24&flowCode=M&customsCode=C00'
resp = requests.get(url_cn_angola_2019)
results = json.loads(resp.content)['data']
df_2019 = pd.DataFrame(results)
print("As of 2023-02-18, the isAggregate column is correct for 2018 and later data, and there are no duplicates because partner2 always zero")
print("Request url: ", url_cn_angola_2019)
print(df_2019[['cmdCode','partner2Code','isAggregate','primaryValue']].sort_values(by='cmdCode').head(50))


As of 2023-02-17, the isAggregate column is correct for 2017 and earlier data if partner2Code is not specified, but duplicate rows are returned
Request url:  https://comtradeapi.un.org/public/v1/preview/C/A/HS?reporterCode=156&period=2017&partnerCode=24&flowCode=M&customsCode=C00
    cmdCode  partner2Code  isAggregate  primaryValue
0        01             0         True    2678856.00
102      01            24         True    2678856.00
103    0106            24         True    2678856.00
1      0106             0         True    2678856.00
104  010612            24        False    2678856.00
..      ...           ...          ...           ...
243  440341           702        False      39756.00
42   440341             0         True      93089.00
141  440341            24        False      53333.00
43   440349             0         True   19808496.00
142  440349            24        False   19691856.00

[120 rows x 4 columns]

As of 2023-02-18, the isAggregate column is correct for 20
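The "duplicate rows" in the 2017 output are the partner2Code = 0 total repeated alongside its partner2 breakdown. A quick check on the three rows shown for cmdCode 440341 (values copied from the output above) illustrates this. It is only a sketch on one commodity code, not a proof for the whole dataset.

```python
import pandas as pd

# Rows for cmdCode 440341 copied from the 2017 output above.
rows = pd.DataFrame({
    "cmdCode": ["440341", "440341", "440341"],
    "partner2Code": [702, 0, 24],
    "isAggregate": [False, True, False],
    "primaryValue": [39756.0, 93089.0, 53333.0],
})

# Value reported on the partner2Code = 0 row.
total = rows.loc[rows["partner2Code"] == 0, "primaryValue"].sum()

# Sum of the rows for specific second partners.
breakdown = rows.loc[rows["partner2Code"] != 0, "primaryValue"].sum()

print(total, breakdown, total == breakdown)
```

Summing every row for this code would double count the trade value, which is why filtering on `isAggregate == False` (or on partner2Code = 0, but not both) matters.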

In [46]:
df_2017[df_2017.isAggregate == False][['cmdCode','partner2Code','isAggregate','primaryValue']].sort_values(by='cmdCode').head(50)

index,cmdCode,partner2Code,isAggregate,primaryValue
104,010612,24,False,2678856.0
107,200989,24,False,16500.0
110,220210,24,False,20000.0
111,220299,24,False,3000.0
113,220300,24,False,85122.0
116,250610,24,False,2503564.0
117,250620,24,False,2519548.0
224,251511,380,False,5119.0
119,251512,24,False,3564171.0
120,251520,24,False,32968.0
