# Retrieve World Bank Trade Data

Data comes from the World Integrated Trade Solution (WITS) API: https://wits.worldbank.org/witsapiintro.aspx

Prereqs:
* Run "pip install pandaSDMX" to install the parser that reads the SDMX data returned by a request

In [1]:
from pandasdmx import Request
import json
from pprint import pprint as pp

In [2]:
wits = Request('WBG_WITS')
flow_response = wits.dataflow()
flow_response.write().dataflow.head()

dataflow,name
DF_WITS_Tariff_TRAINS,WITS - UNCTAD TRAINS Tariff Data
DF_WITS_TradeStats_Development,WITS TradeStats Devlopment
DF_WITS_TradeStats_Tariff,WITS TradeStats Tariff
DF_WITS_TradeStats_Trade,WITS TradeStats Trade


In [3]:
trade_stats = wits.dataflow('DF_WITS_TradeStats_Trade')

In [4]:
trade_data_structure = trade_stats.dataflow.DF_WITS_TradeStats_Trade.structure()
trade_data_structure

DataStructureDefinition | TRADESTATS | Country Trade Statistics

In [5]:
trade_data_structure.dimensions.aslist()

[Dimension | FREQ,
 Dimension | REPORTER,
 Dimension | PARTNER,
 TimeDimension | TIME_PERIOD,
 Dimension | PRODUCTCODE,
 Dimension | INDICATOR]

We are only interested in trade reported by the US, so we select the reporter code 'USA'.

In [6]:
reporters = trade_stats.write().codelist.loc['REPORTER']
select_reporter = reporters.loc[['USA']]
select_reporter

,dim_or_attr,name
USA,D,United States


We are looking for specific trade partners: Mexico, Canada, the EU, China, Russia, and the world total. We limit the definition of the EU to the 11 countries that adopted the euro as their currency in 1999. I tried to use the partner code 'EUN', but no data is returned for that partner code.

In [7]:
partners = trade_stats.write().codelist.loc['PARTNER']

#for index, row in partners.iterrows():
#    print(f"'{index}':\t\t'{row['name']}'")

# Countries we are interested in
partner_list = ['MEX','CAN','CHN','RUS']

# Countries which adopted the Euro as their currency in 1999
eu_list = ['AUT','BEL','FIN','FRA','DEU','IRL','ITA','LUX','NLD','PRT','ESP']

# Separate list for the lonely total partner representing the world 'WLD'
world_list = ['WLD']
          
select_partners = partners.loc[partner_list + eu_list + world_list]
select_partners

,dim_or_attr,name
MEX,D,Mexico
CAN,D,Canada
CHN,D,China
RUS,D,Russian Federation
AUT,D,Austria
BEL,D,Belgium
FIN,D,Finland
FRA,D,France
DEU,D,Germany
IRL,D,Ireland


The only product we are interested in is "all products", i.e. the product code 'Total'.

In [8]:
products = trade_stats.write().codelist.loc['PRODUCTCODE']

#for index, row in products.iterrows():
#    print(f"'{index}':\t\t'{row['name']}'")

select_product = products.loc[['Total']]
select_product

,dim_or_attr,name
Total,D,All Products


Only looking for the import and export values. All values for these two indicators are in thousands of USD.

In [9]:
indicators = trade_stats.write().codelist.loc['INDICATOR']

#for index, row in indicators.iterrows():
#    print(f"'{index}':\t\t'{row['name']}'")

select_indicators = indicators.loc[['MPRT-TRD-VL', 'XPRT-TRD-VL']]
select_indicators

,dim_or_attr,name
MPRT-TRD-VL,D,Import Trade Value (US$ Thousand)
XPRT-TRD-VL,D,Export Trade Value (US$ Thousand)
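
Because both indicators are reported in thousands of USD, it is easy to misread the magnitudes later on. A minimal sketch of the unit conversion follows; the helper name `thousands_usd_to_billions` and the input value are made up for illustration and are not part of the WITS API or pandaSDMX.

```python
def thousands_usd_to_billions(value_in_thousands):
    """Hypothetical helper: convert a value in US$ thousand to US$ billion."""
    # thousand USD -> USD (x 1,000), then USD -> billion USD (/ 1,000,000,000)
    return value_in_thousands * 1_000 / 1_000_000_000


# Illustrative value only (not taken from the API): 250,000,000 thousand USD
billions = thousands_usd_to_billions(250_000_000)
print(f"{billions:.1f} billion USD")
```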


The only available time frequency is annual ('A').

In [10]:
trade_stats.write().codelist.loc['FREQ'].head()

,dim_or_attr,name
FREQ,D,Freq
A,D,Annual


In [11]:
freq = 'A'
partners_list = select_partners.index.tolist()
reporter = select_reporter.index.tolist()[0]
product = select_product.index.tolist()[0]
indicator_list = select_indicators.index.tolist()

response = wits.data(resource_id = 'DF_WITS_TradeStats_Trade',
                 key={'FREQ': freq,
                      'REPORTER': reporter,
                      'PARTNER':  partners_list,
                      'PRODUCTCODE': product,
                      'INDICATOR': indicator_list},
                 params = {'startPeriod': '2000',
                           'endPeriod': '2017'},
                 dsd = trade_data_structure)
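
For intuition about what the `key` dict above stands for: in the SDMX REST convention, a data key is written with one slot per dimension, slots separated by `.`, and several values for one dimension joined with `+`. The sketch below renders a dict in that form. The helper `build_sdmx_key` is hypothetical, and it is an assumption that pandaSDMX and the WITS endpoint use exactly this string internally; the time range is passed separately through `startPeriod`/`endPeriod`, so `TIME_PERIOD` is left out of the key.

```python
# Dimension order as listed by the data structure, without TIME_PERIOD
KEY_ORDER = ['FREQ', 'REPORTER', 'PARTNER', 'PRODUCTCODE', 'INDICATOR']


def build_sdmx_key(key, order=KEY_ORDER):
    """Hypothetical helper: render a key dict as an SDMX-style key string."""
    parts = []
    for dim in order:
        value = key.get(dim, '')        # an empty slot acts as a wildcard
        if isinstance(value, (list, tuple)):
            value = '+'.join(value)     # '+' combines several codes (OR)
        parts.append(value)
    return '.'.join(parts)              # '.' separates the dimensions


example_key = build_sdmx_key({
    'FREQ': 'A',
    'REPORTER': 'USA',
    'PARTNER': ['MEX', 'CAN'],
    'PRODUCTCODE': 'Total',
    'INDICATOR': ['MPRT-TRD-VL', 'XPRT-TRD-VL'],
})
print(example_key)  # A.USA.MEX+CAN.Total.MPRT-TRD-VL+XPRT-TRD-VL
```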

In [12]:
trade_df = response.write((s for s in response.data.series))
trade_df.rename(columns={'MPRT-TRD-VL': 'Imports', 'XPRT-TRD-VL': 'Exports'}, inplace=True)

import_columns_to_sum = [ ('A', 'USA', cty, 'Total', 'Imports') for cty in eu_list ]
import_column_eu = ('A', 'USA', 'EU', 'Total', 'Imports')
trade_df[import_column_eu] = trade_df[import_columns_to_sum].sum(axis=1)
trade_df.drop(import_columns_to_sum, axis=1, inplace=True)

export_columns_to_sum = [ ('A', 'USA', cty, 'Total', 'Exports') for cty in eu_list ]
export_column_eu = ('A', 'USA', 'EU', 'Total', 'Exports')
trade_df[export_column_eu] = trade_df[export_columns_to_sum].sum(axis=1)
trade_df.drop(export_columns_to_sum, axis=1, inplace=True)

known_cty_list = partner_list + [ 'EU' ]

known_import_columns_to_sum = [ ('A', 'USA', cty, 'Total', 'Imports') for cty in known_cty_list ]
import_column_other = ('A', 'USA', 'Other', 'Total', 'Imports')
import_column_world = ('A', 'USA', 'WLD', 'Total', 'Imports')
trade_df[import_column_other] = trade_df[import_column_world] - trade_df[known_import_columns_to_sum].sum(axis=1)

known_export_columns_to_sum = [ ('A', 'USA', cty, 'Total', 'Exports') for cty in known_cty_list ]
export_column_other = ('A', 'USA', 'Other', 'Total', 'Exports')
export_column_world = ('A', 'USA', 'WLD', 'Total', 'Exports')
trade_df[export_column_other] = trade_df[export_column_world] - trade_df[known_export_columns_to_sum].sum(axis=1)

trade_df.head()

FREQ,A,A,A,A,A,A,A,A,A,A,A,A,A,A
REPORTER,USA,USA,USA,USA,USA,USA,USA,USA,USA,USA,USA,USA,USA,USA
PARTNER,CAN,CAN,CHN,CHN,MEX,MEX,RUS,RUS,WLD,WLD,EU,EU,Other,Other
PRODUCTCODE,Total,Total,Total,Total,Total,Total,Total,Total,Total,Total,Total,Total,Total,Total
INDICATOR,Imports,Exports,Imports,Exports,Imports,Exports,Imports,Exports,Imports,Exports,Imports,Exports,Imports,Exports
TIME_PERIOD,,,,,,,,,,,,,,
2000,230816100.0,178920000.0,100012900.0,16184680.0,135923100.0,111338600.0,7658130.0,2092380.131,1217933000.0,781830700.0,163543900.0,116362500.0,579978800.0,356932500.0
2001,216234100.0,163421100.0,102267300.0,19181940.0,131334700.0,101295100.0,6264073.0,2716088.46,1140900000.0,729080400.0,165886600.0,111836500.0,518913400.0,330629600.0
2002,212398400.0,160918200.0,133510400.0,22127510.0,136025100.0,97470190.0,7207030.0,2396944.154,1200096000.0,693068300.0,176681100.0,105011000.0,534273800.0,305144500.0
2003,225035300.0,169920200.0,163313900.0,28367720.0,139686700.0,97411740.0,9106465.0,2447218.211,1302834000.0,724736600.0,191704000.0,110898000.0,573987100.0,315691800.0
2004,260161300.0,189878800.0,210500100.0,34427690.0,157881900.0,110731200.0,12618720.0,2960936.609,1525304000.0,814844400.0,214841400.0,124131800.0,669300700.0,352713900.0
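
The aggregation in cell 12 does two things: it collapses the euro-area member columns into one 'EU' column, and it computes 'Other' as the world total minus all explicitly listed partners. The same pattern is sketched below on a toy frame with a reduced two-level column index. All numbers are made up for illustration and are not WITS data.

```python
import pandas as pd

# Made-up values for two euro-area partners, one other partner, and the world
columns = pd.MultiIndex.from_tuples(
    [('DEU', 'Imports'), ('FRA', 'Imports'), ('CAN', 'Imports'), ('WLD', 'Imports')],
    names=['PARTNER', 'INDICATOR'],
)
toy_df = pd.DataFrame(
    [[10.0, 5.0, 20.0, 100.0],
     [12.0, 6.0, 22.0, 110.0]],
    index=['2000', '2001'],
    columns=columns,
)

# Collapse the member columns into a single 'EU' column, then drop the members
eu_columns = [(cty, 'Imports') for cty in ['DEU', 'FRA']]
toy_df[('EU', 'Imports')] = toy_df[eu_columns].sum(axis=1)
toy_df = toy_df.drop(columns=eu_columns)

# 'Other' is the residual: world total minus every explicitly listed partner
known_columns = [(cty, 'Imports') for cty in ['CAN', 'EU']]
toy_df[('Other', 'Imports')] = (
    toy_df[('WLD', 'Imports')] - toy_df[known_columns].sum(axis=1)
)

print(toy_df)
```

A useful sanity check on the real data is that the listed partners plus 'Other' add back up to 'WLD' for every year.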


In [13]:
# The commented-out lines below look like they should work, but they do not:
# rename() does not match column names that are tuples.  I had to do something
# a little more dangerous and explicitly set the columns attribute of the DataFrame.

#new_column_names = { x: f"{x[2]}_{x[4]}" for x in list(trade_df) }
#trade_df.rename(columns=new_column_names, inplace=True)

trade_df.columns = [f"{x[2]}_{x[4]}" for x in trade_df.columns]
trade_df.head()

TIME_PERIOD,CAN_Imports,CAN_Exports,CHN_Imports,CHN_Exports,MEX_Imports,MEX_Exports,RUS_Imports,RUS_Exports,WLD_Imports,WLD_Exports,EU_Imports,EU_Exports,Other_Imports,Other_Exports
2000,230816100.0,178920000.0,100012900.0,16184680.0,135923100.0,111338600.0,7658130.0,2092380.131,1217933000.0,781830700.0,163543900.0,116362500.0,579978800.0,356932500.0
2001,216234100.0,163421100.0,102267300.0,19181940.0,131334700.0,101295100.0,6264073.0,2716088.46,1140900000.0,729080400.0,165886600.0,111836500.0,518913400.0,330629600.0
2002,212398400.0,160918200.0,133510400.0,22127510.0,136025100.0,97470190.0,7207030.0,2396944.154,1200096000.0,693068300.0,176681100.0,105011000.0,534273800.0,305144500.0
2003,225035300.0,169920200.0,163313900.0,28367720.0,139686700.0,97411740.0,9106465.0,2447218.211,1302834000.0,724736600.0,191704000.0,110898000.0,573987100.0,315691800.0
2004,260161300.0,189878800.0,210500100.0,34427690.0,157881900.0,110731200.0,12618720.0,2960936.609,1525304000.0,814844400.0,214841400.0,124131800.0,669300700.0,352713900.0
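
Assigning to `columns` directly works, but it relies on the partner and indicator sitting at positions 2 and 4 of each tuple. A slightly more defensive sketch looks the levels up by name and returns a copy instead of modifying the frame in place. The helper `flatten_columns` is hypothetical, and the toy frame only mimics the five-level column layout shown earlier. As far as I understand, `rename` on MultiIndex columns matches labels within each level rather than whole tuples, which would explain why the tuple-keyed mapping had no effect.

```python
import pandas as pd


def flatten_columns(df, levels=('PARTNER', 'INDICATOR'), sep='_'):
    """Hypothetical helper: return a copy of df with single-level column names."""
    # Pull the requested levels by name, so the result does not depend on
    # their position inside the column tuples
    parts = [df.columns.get_level_values(name).astype(str) for name in levels]
    flat = df.copy()
    flat.columns = [sep.join(labels) for labels in zip(*parts)]
    return flat


# Toy frame with the same level names as the trade data (values are made up)
columns = pd.MultiIndex.from_tuples(
    [('A', 'USA', 'CAN', 'Total', 'Imports'),
     ('A', 'USA', 'CAN', 'Total', 'Exports')],
    names=['FREQ', 'REPORTER', 'PARTNER', 'PRODUCTCODE', 'INDICATOR'],
)
toy_df = pd.DataFrame([[1.0, 2.0]], index=['2000'], columns=columns)

flat_df = flatten_columns(toy_df)
print(list(flat_df.columns))  # ['CAN_Imports', 'CAN_Exports']
```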


Now let's save the data as CSV so that we do not have to make repeated API calls to retrieve it.

In [17]:
trade_csv = trade_df.to_csv()

# now write output to a file
with open('trade_data_file.csv', 'w') as outfile:
    outfile.write(trade_csv)