# Pull Trade Data from the WB WITS API Using Python
![alt text](./img/trade-network.png "Title")

Trade data is a key resource for understanding global economic trends, uncovering market opportunities, and informing policy decisions. However, accessing and analyzing large volumes of trade data efficiently can be daunting.

In this article, we will use Python to retrieve trade data from the World Bank World Integrated Trade Solution (WB WITS) Application Programming Interface (API), which follows the SDMX (Statistical Data and Metadata eXchange) standard. Once the data is in Python, it can be analyzed with the usual tools such as pandas.

# Overview of WB WITS API
The [WB WITS API](https://wits.worldbank.org/witsapiintro.aspx?lang=en) provides access to an extensive range of trade data from countries around the world. It is designed to facilitate trade analysis, policy formulation, and economic research. With the SDMX version of the WB WITS API, users can programmatically retrieve information on global trade flows, tariffs, and trade-related indicators.

The structure of a data API call is shown below:
![alt text](./img/api_call.png "Title")

To pull data from the WB WITS SDMX API, you need at least three parameters:
1. Dataset ID parameter
2. Query parameter
3. Period parameter
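
As a rough sketch of how these three parameters fit together, the snippet below assembles a data URL by joining the pieces with `/`. The helper name `build_data_url` is hypothetical; the endpoint, dataflow ID, key, and period values are the ones used in this article.

```python
def build_data_url(endpoint, dataset_id, query, period):
    """Join the endpoint, the 'data' resource and the three parameters with '/'."""
    return "/".join([endpoint.rstrip("/"), "data", dataset_id, query, period])


url = build_data_url(
    endpoint="http://wits.worldbank.org/API/V1/SDMX/V21/rest",
    dataset_id="DF_WITS_TradeStats_Trade",             # dataset ID parameter
    query="A.IDN.SGP+MYS.25-26_Minerals.XPRT-TRD-VL",  # query parameter
    period="?startperiod=2010&endperiod=2015",         # period parameter
)
print(url)
```

The rest of the article explains where each of these values comes from.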

Let's walk through how to do this in Python.

# Prerequisites
* Make sure Python and Jupyter Notebook are installed on your machine.
* Make sure these Python libraries are installed: `requests`, `pandas`, `numpy`, `xmltodict`.
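
One optional way to check the second prerequisite from inside Python is sketched below. It uses only the standard library; `missing_packages` is a hypothetical helper name, and the `pip` hint assumes you install packages with pip.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]


required = ["requests", "pandas", "numpy", "xmltodict"]
missing = missing_packages(required)
if missing:
    print("Install these first: pip install " + " ".join(missing))
else:
    print("All required libraries are available.")
```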

## Import required libraries

In [4]:
import requests
import pandas as pd
import numpy as np
import xmltodict



## Specify the SDMX REST API endpoint

In [5]:
endpoint = "http://wits.worldbank.org/API/V1/SDMX/V21/rest"

# Dataset ID Parameter
After specifying the endpoint, we need to define which dataset (or dataflow, in SDMX terms) we want to pull. A **dataflow** is a structure through which data is collected and disseminated. It references a DSD (data structure definition, shortened to "**datastructure**" throughout this article), the underlying template to which the data must conform.

In the WB WITS API, four dataflows are available. They can be listed with:

In [6]:
path = '/'.join([endpoint, 'dataflow'])
response = requests.get(path)
if response.status_code == 200:

   # Convert XML Response to Dictionary Object
   response_dict = xmltodict.parse(response.text)['Structure']

   # Drill through specific keys to get the list of dataflows in the dictionary and then normalize it to pandas dataframe.
   dataflows = pd.json_normalize(response_dict['Structures']['Dataflows']['Dataflow'])

   # Only get English version of dataflows' name and description
   dataflows = dataflows[(dataflows['Name.@xml:lang'] == 'en') &
                           (dataflows['Description.@xml:lang'] == 'en')
                       ]
   dataflows = dataflows[['@id', '@agencyID', '@version', '@isFinal', 'Description.#text', 'Structure.Ref.@id']]

   # Rename the dataflow column
   dataflows = dataflows.rename(columns={
       '@id': 'id',
       '@agencyID': 'agencyID',
       '@version': 'version',
       '@isFinal':'isFinal',
       'Description.#text': 'description',
       'Structure.Ref.@id': 'datastructure'
   })

display(dataflows[['id', 'datastructure', 'description']])

| | id | datastructure | description |
|---|---|---|---|
| 0 | DF_WITS_Tariff_TRAINS | TARIFF_TRAINS | Data flow to access WITS - UNCTAD TRAINS Prefe... |
| 1 | DF_WITS_TradeStats_Development | TRADESTATS | Development indicators such as GDP, GNI per ca... |
| 2 | DF_WITS_TradeStats_Tariff | TRADESTATS | Tariff information like number of trade agreem... |
| 3 | DF_WITS_TradeStats_Trade | TRADESTATS | Trade data such as total exports, number or pr... |


Above you can see the dataflows available in the WB WITS API, each with its datastructure and description. For trade data, we will use `DF_WITS_TradeStats_Trade` as the dataflow ID and `TRADESTATS` as the datastructure ID.

In [7]:
dataset_id = 'DF_WITS_TradeStats_Trade'
datastructure_id = 'TRADESTATS'

# Query Parameter
To avoid overloading the server, the API does not allow pulling the entire database in one request, so we have to state exactly which data we need in our query. The query is a **series of keys delimited by periods, which determine the specific data to be retrieved**.

To start building a query, we need to know what the dimensions of the DSD are. Here's how to find out in Python:

In [8]:
path = '/'.join([endpoint, 'datastructure/WBG_WITS', datastructure_id])

# Making API Call
response = requests.get(path)

if response.status_code == 200:
   response_dict = xmltodict.parse(response.text)['Structure']
   datastructure = response_dict['Structures']['DataStructures']['DataStructure']
   datastructure_meta = {
           'id': datastructure['@id'],
           'agencyID': datastructure['@agencyID'],
           'version': datastructure['@version'],
           'isFinal': datastructure['@isFinal'],
           'name': datastructure['Name']['#text']
       }
  
   dimensions = pd.json_normalize(datastructure['DataStructureComponents']['DimensionList']['Dimension'])[
       ['@id', '@position', 'ConceptIdentity.Ref.@maintainableParentID', 'LocalRepresentation.Enumeration.Ref.@id']
   ]
   dimensions = dimensions.rename(columns={
       '@id': 'id',
       '@position':'position',
       'ConceptIdentity.Ref.@maintainableParentID': 'conceptscheme',
       'LocalRepresentation.Enumeration.Ref.@id': 'codelist'
   })
   dimensions['component'] = 'Dimension'
display(dimensions)

| | id | position | conceptscheme | codelist | component |
|---|---|---|---|---|---|
| 0 | FREQ | 1 | TRADESTATS_CONCEPTS | CL_TS_FREQ_WITS | Dimension |
| 1 | REPORTER | 2 | TRADESTATS_CONCEPTS | CL_TS_COUNTRY_WITS | Dimension |
| 2 | PARTNER | 3 | TRADESTATS_CONCEPTS | CL_TS_COUNTRY_WITS | Dimension |
| 3 | PRODUCTCODE | 5 | TRADESTATS_CONCEPTS | CL_TS_PRODUCTCODE_WITS | Dimension |
| 4 | INDICATOR | 6 | TRADESTATS_CONCEPTS | CL_TS_INDICATOR_WITS | Dimension |


Above you can see that the `TRADESTATS` DSD has five dimensions: `FREQ`, `REPORTER`, `PARTNER`, `PRODUCTCODE`, and `INDICATOR`.
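
The keys in the query follow the order of these dimensions. As a small sketch of that ordering, the snippet below sorts the dimensions by `position` and prints the resulting key template. The rows are copied from the table above and deliberately shuffled; `key_template` is a hypothetical helper name. Note that the positions arrive as strings (they are XML attributes), so they are converted to integers before sorting.

```python
# Dimension ids and positions copied from the table above (shuffled on purpose).
dimension_rows = [
    {"id": "INDICATOR", "position": "6"},
    {"id": "FREQ", "position": "1"},
    {"id": "PRODUCTCODE", "position": "5"},
    {"id": "REPORTER", "position": "2"},
    {"id": "PARTNER", "position": "3"},
]


def key_template(rows):
    """Return the dimension ids in position order, joined by dots."""
    ordered = sorted(rows, key=lambda row: int(row["position"]))
    return ".".join(row["id"] for row in ordered)


template = key_template(dimension_rows)
print(template)  # FREQ.REPORTER.PARTNER.PRODUCTCODE.INDICATOR
```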

Next, we need to specify a value for each dimension to get exactly the data we need. The allowed values are defined in each dimension's codelist.

Let's first define our query dictionary, along with a helper function that reads the values from a codelist:

In [9]:
dimension_list = dimensions['id'].values.tolist()
query = {k: None for k in dimension_list}

def getCodelist(codelist):
    codelist_meta = {
                'id':codelist['@id'],
                'agencyID': codelist['@agencyID'],
                'version': codelist['@version'],
                'isFinal': codelist['@isFinal'],
                'name': codelist['Name']['#text']
            }
    codelist_codes = codelist['Code']

    # xmltodict returns a single dict (not a list) when the codelist has only one code
    if not isinstance(codelist_codes, list):
        codelist_codes = [codelist_codes]

    codelist_code = [{'id': code['@id'],
                      'name': code['Name']['#text'],
                      'language': code['Name']['@xml:lang']} for code in codelist_codes]

    return {
        'header': pd.DataFrame([codelist_meta]),
        'codes': pd.DataFrame(codelist_code)
    }

## CL_TS_FREQ_WITS
To start, let's find which values are available for the `FREQ` dimension in the `CL_TS_FREQ_WITS` codelist:

In [10]:
codelist = 'CL_TS_FREQ_WITS'
path = '/'.join([endpoint, 'codelist/WBG_WITS', codelist])

response = requests.get(path)
if response.status_code == 200:
   response_dict = xmltodict.parse(response.text)['Structure']
   codelists = response_dict['Structures']['Codelists']['Codelist']
   codelist = getCodelist(codelists)['codes']

display(codelist)

| | id | name | language |
|---|---|---|---|
| 0 | A | Annual | en |


As the dataframe above shows, code `A` corresponds to "Annual", since WB WITS data is only available at annual frequency.

In [11]:
query['FREQ'] = 'A'

Let's examine the other codelists.
## CL_TS_COUNTRY_WITS
Let's find the codes of the countries we want to retrieve for the `REPORTER` and `PARTNER` dimensions:

In [12]:
codelist = 'CL_TS_COUNTRY_WITS'
path = '/'.join([endpoint, 'codelist/WBG_WITS', codelist])

response = requests.get(path)
if response.status_code == 200:
   response_dict = xmltodict.parse(response.text)['Structure']
   codelists = response_dict['Structures']['Codelists']['Codelist']
   codelist = getCodelist(codelists)['codes']

filtered_codes = codelist[codelist['name'].isin(['Singapore', 'Malaysia', 'Indonesia'])]
display(filtered_codes)

| | id | name | language |
|---|---|---|---|
| 110 | IDN | Indonesia | en |
| 163 | MYS | Malaysia | en |
| 205 | SGP | Singapore | en |


Based on the above, we can set `REPORTER` to `'IDN'` and `PARTNER` to `'SGP+MYS'`. The `+` indicates that we want to retrieve data for both Singapore and Malaysia.

In [13]:
query['REPORTER'] = 'IDN'
query['PARTNER'] = 'SGP+MYS'

## CL_TS_PRODUCTCODE_WITS
`CL_TS_PRODUCTCODE_WITS` corresponds to the `PRODUCTCODE` dimension:

In [14]:
codelist = 'CL_TS_PRODUCTCODE_WITS'
path = '/'.join([endpoint, 'codelist/WBG_WITS', codelist])

response = requests.get(path)
if response.status_code == 200:
   response_dict = xmltodict.parse(response.text)['Structure']
   codelists = response_dict['Structures']['Codelists']['Codelist']
   codelist = getCodelist(codelists)['codes']
display(codelist.head())

| | id | name | language |
|---|---|---|---|
| 0 | 999999 | Not Applicable | en |
| 1 | 01-05_Animal | Animal | en |
| 2 | 06-15_Vegetable | Vegetable | en |
| 3 | 16-24_FoodProd | Food Products | en |
| 4 | 25-26_Minerals | Minerals | en |


We can see that the **Minerals** product code is `25-26_Minerals`. Let's use that code:

In [31]:
query['PRODUCTCODE'] ='25-26_Minerals'

## CL_TS_INDICATOR_WITS
Let's search for indicator codes whose name contains the string "Export" for the `INDICATOR` dimension:

In [32]:
codelist = 'CL_TS_INDICATOR_WITS'
path = '/'.join([endpoint, 'codelist/WBG_WITS', codelist])
response = requests.get(path)
if response.status_code == 200:
   response_dict = xmltodict.parse(response.text)['Structure']
   codelists = response_dict['Structures']['Codelists']['Codelist']
   codelist = getCodelist(codelists)['codes']

display(codelist[codelist['name'].str.contains('Export')])

| | id | name | language |
|---|---|---|---|
| 33 | NMBR-XPRT-HS6-PRDCT | Export No Of traded HS6 digit Products | en |
| 36 | XPRT-PRDCT-SHR | Export Product Share (%) | en |
| 37 | XPRT-PRTNR-SHR | Export Partner Share (%) | en |
| 38 | XPRT-SHR-TTL-PRDCT | Export Share in Total Products (%) | en |
| 39 | XPRT-TRD-VL | Export Trade Value (US$ Thousand) | en |
| 56 | BX-GSR-GNFS-CD | Exports of goods and services (BoP, current US$) | en |
| 59 | BX-GSR-TOTL-CD | Exports of goods, services and primary income ... | en |
| 78 | NE-EXP-GNFS-CD | Exports of goods and services (current US$) | en |
| 79 | NE-EXP-GNFS-KD | Exports of goods and services (constant 2005 US$) | en |
| 80 | NE-EXP-GNFS-KD-ZG | Exports of goods and services (annual % growth) | en |


We can see that `XPRT-TRD-VL` corresponds to **"Export Trade Value (US$ Thousand)"**.
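
The search above is case-sensitive, so a term such as "export" in lower case would find nothing. As a hedged sketch, the same lookup can be written in plain Python so that it ignores case. `find_codes` is a hypothetical helper, and the sample rows are a few entries copied from the output above; with the dataframe from the previous cell, `codelist.to_dict('records')` would give a list of dicts in the same shape.

```python
def find_codes(codes, term):
    """Return the codes whose name contains `term`, ignoring case."""
    term = term.lower()
    return [code for code in codes if term in code["name"].lower()]


# A few rows copied from the output above.
sample_codes = [
    {"id": "XPRT-PRDCT-SHR", "name": "Export Product Share (%)"},
    {"id": "XPRT-TRD-VL", "name": "Export Trade Value (US$ Thousand)"},
    {"id": "NE-EXP-GNFS-CD", "name": "Exports of goods and services (current US$)"},
]

matches = find_codes(sample_codes, "trade value")
print([code["id"] for code in matches])  # ['XPRT-TRD-VL']
```

In pandas itself, passing `case=False` to `str.contains` has a similar effect.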

In [33]:
query['INDICATOR'] = 'XPRT-TRD-VL'

We now have all the keys for our query. Next, we need to join them into one string delimited by dots.

In [34]:
query_string = '.'.join([value for value in query.values()])
query_string

'A.IDN.SGP+MYS.25-26_Minerals.XPRT-TRD-VL'
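
A plain `'.'.join(...)` raises a `TypeError` if any dimension is still `None`, and it expects multi-value keys to be pre-joined with `+`. The sketch below is one possible way to make both cases explicit. `build_key` and `demo_query` are hypothetical names, and the dimension order is the one shown in the DSD output earlier.

```python
def build_key(query, dimension_order):
    """Build the dot-delimited series key, joining multiple codes with '+'."""
    parts = []
    for dim in dimension_order:
        value = query.get(dim)
        if not value:
            raise ValueError(f"No value set for dimension {dim}")
        if isinstance(value, (list, tuple)):
            value = "+".join(value)  # '+' means "any of these codes"
        parts.append(value)
    return ".".join(parts)


dimension_order = ["FREQ", "REPORTER", "PARTNER", "PRODUCTCODE", "INDICATOR"]
demo_query = {
    "FREQ": "A",
    "REPORTER": "IDN",
    "PARTNER": ["SGP", "MYS"],
    "PRODUCTCODE": "25-26_Minerals",
    "INDICATOR": "XPRT-TRD-VL",
}

key = build_key(demo_query, dimension_order)
print(key)  # A.IDN.SGP+MYS.25-26_Minerals.XPRT-TRD-VL
```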

Our query parameter will look like this:
![alt text](./img/key_series.png "Structure of an SDMX API Query Parameter")

Which translates to:

![alt text](./img/key_series_translated.png "Structure of an SDMX API Query Parameter Translation")

# Period Parameter
Now that we have the query parameter for the data dimensions, we need to define the period range of our data. The format of the period parameter looks like this:
![alt text](./img/period_parameter_structure.png "Period Parameter Structure")

Just replace `start_year` and `end_year` with the first and last year you want, for example:

In [35]:
period_parameter = '?startperiod=2010&endperiod=2015'
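
If the years come from variables rather than being typed by hand, a small helper can build the same string and catch a reversed range early. This is only a sketch; `build_period` is a hypothetical name, and the output format simply mirrors the string used above.

```python
def build_period(start_year, end_year):
    """Return the period parameter for an inclusive range of years."""
    start_year, end_year = int(start_year), int(end_year)
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    return f"?startperiod={start_year}&endperiod={end_year}"


print(build_period(2010, 2015))  # ?startperiod=2010&endperiod=2015
```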

# Call the Data API
We now have all the parameters we need, so we can make the API call. In this case, we want to pull Indonesia's annual export value (in thousand USD) of mineral products to Singapore and Malaysia from 2010 to 2015.

In [36]:
path = '/'.join([endpoint, 'data', dataset_id, query_string, period_parameter])
print(path)
response = requests.get(path)
print(response)
def as_list(item):
    # xmltodict returns a single mapping for one child element and a list for
    # several, so wrap single items to always iterate over a list.
    return item if isinstance(item, list) else [item]

if response.status_code == 200:
    response_dict = xmltodict.parse(response.text)
    response_series = response_dict['message:GenericData']['message:DataSet']['generic:Series']

    series_observations = []
    for series in as_list(response_series):
        # Series-level dimensions (FREQ, REPORTER, PARTNER, PRODUCTCODE, INDICATOR)
        series_key_values = {value['@id']: value['@value']
                             for value in as_list(series['generic:SeriesKey']['generic:Value'])}

        for obs in as_list(series['generic:Obs']):
            # Observation-level dimension (TIME_PERIOD)
            obs_dimensions = {value['@id']: value['@value']
                              for value in as_list(obs['generic:ObsDimension'])}
            # Observation attributes (e.g. DATASOURCE)
            obs_attributes = {value['@id']: value['@value']
                              for value in as_list(obs['generic:Attributes']['generic:Value'])}
            obs_value = {'OBS_VALUE': float(obs['generic:ObsValue']['@value'])}

            # One flat row per observation
            series_observations.append(
                {**series_key_values, **obs_dimensions, **obs_attributes, **obs_value})

    data = pd.DataFrame(series_observations)
display(data)

http://wits.worldbank.org/API/V1/SDMX/V21/rest/data/DF_WITS_TradeStats_Trade/A.IDN.SGP+MYS.25-26_Minerals.XPRT-TRD-VL/?startperiod=2010&endperiod=2015
<Response [200]>


| | FREQ | REPORTER | PARTNER | PRODUCTCODE | INDICATOR | TIME_PERIOD | DATASOURCE | OBS_VALUE |
|---|---|---|---|---|---|---|---|---|
| 0 | A | IDN | MYS | 25-26_Minerals | XPRT-TRD-VL | 2010 | WITS-CMT | 18885.831 |
| 1 | A | IDN | MYS | 25-26_Minerals | XPRT-TRD-VL | 2011 | WITS-CMT | 11226.897 |
| 2 | A | IDN | MYS | 25-26_Minerals | XPRT-TRD-VL | 2012 | WITS-CMT | 2348.98 |
| 3 | A | IDN | MYS | 25-26_Minerals | XPRT-TRD-VL | 2013 | WITS-CMT | 5428.528 |
| 4 | A | IDN | MYS | 25-26_Minerals | XPRT-TRD-VL | 2014 | WITS-CMT | 968.519 |
| 5 | A | IDN | MYS | 25-26_Minerals | XPRT-TRD-VL | 2015 | WITS-CMT | 6222.839 |
| 6 | A | IDN | SGP | 25-26_Minerals | XPRT-TRD-VL | 2010 | WITS-CMT | 83954.462 |
| 7 | A | IDN | SGP | 25-26_Minerals | XPRT-TRD-VL | 2011 | WITS-CMT | 46080.804 |
| 8 | A | IDN | SGP | 25-26_Minerals | XPRT-TRD-VL | 2012 | WITS-CMT | 63513.413 |
| 9 | A | IDN | SGP | 25-26_Minerals | XPRT-TRD-VL | 2013 | WITS-CMT | 91051.735 |
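
The result is in "long" format, with one row per partner and year. For side-by-side comparison it can help to reshape it so each year holds one value per partner. The sketch below does this in plain Python on four values copied from the output above, so it runs without the notebook state; the variable names are illustrative.

```python
from collections import defaultdict

# (PARTNER, TIME_PERIOD, OBS_VALUE) copied from the 2010 and 2011 rows above.
rows = [
    ("MYS", "2010", 18885.831),
    ("MYS", "2011", 11226.897),
    ("SGP", "2010", 83954.462),
    ("SGP", "2011", 46080.804),
]

# year -> {partner: value}
pivot = defaultdict(dict)
for partner, year, value in rows:
    pivot[year][partner] = value

for year in sorted(pivot):
    print(year, pivot[year])
```

With the `data` dataframe itself, `data.pivot(index='TIME_PERIOD', columns='PARTNER', values='OBS_VALUE')` should give the equivalent table.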


## Information about the data:
* `FREQ` is the frequency of the data.
* `REPORTER` is the code of the reporting country, which is the exporter in this query. The term is used because the trade data comes from country reports sent to the World Bank.
* `PARTNER` is the country code of the trade counterparty.
* `PRODUCTCODE` is the code of the product traded.
* `INDICATOR` is the indicator that the value measures.
* `TIME_PERIOD` is the time period of the value.
* `DATASOURCE` is additional information on where the value comes from.
* `OBS_VALUE` is the observed value.
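
To show where each of these columns originates in the XML response, here is a minimal sketch that flattens one observation using the standard library's `xml.etree.ElementTree` instead of `xmltodict`. The XML fragment is hand-written for illustration (it is not a captured API response) and assumes the SDMX 2.1 generic data layout that the parsing code above relies on; `flatten` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# SDMX 2.1 namespaces for the 'message' and 'generic' prefixes.
NS = {
    "message": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message",
    "generic": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic",
}

# Hand-written fragment for illustration only.
SAMPLE = """
<message:GenericData
    xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:generic="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic">
  <message:DataSet>
    <generic:Series>
      <generic:SeriesKey>
        <generic:Value id="FREQ" value="A"/>
        <generic:Value id="REPORTER" value="IDN"/>
        <generic:Value id="PARTNER" value="MYS"/>
        <generic:Value id="PRODUCTCODE" value="25-26_Minerals"/>
        <generic:Value id="INDICATOR" value="XPRT-TRD-VL"/>
      </generic:SeriesKey>
      <generic:Obs>
        <generic:ObsDimension id="TIME_PERIOD" value="2010"/>
        <generic:Attributes>
          <generic:Value id="DATASOURCE" value="WITS-CMT"/>
        </generic:Attributes>
        <generic:ObsValue value="18885.831"/>
      </generic:Obs>
    </generic:Series>
  </message:DataSet>
</message:GenericData>
"""


def flatten(xml_text):
    """Return one flat dict per observation."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for series in root.iterfind("message:DataSet/generic:Series", NS):
        # Series key: FREQ, REPORTER, PARTNER, PRODUCTCODE, INDICATOR
        key = {v.get("id"): v.get("value")
               for v in series.iterfind("generic:SeriesKey/generic:Value", NS)}
        for obs in series.iterfind("generic:Obs", NS):
            row = dict(key)
            # Observation dimension: TIME_PERIOD
            for dim in obs.iterfind("generic:ObsDimension", NS):
                row[dim.get("id")] = dim.get("value")
            # Observation attributes: DATASOURCE
            for attr in obs.iterfind("generic:Attributes/generic:Value", NS):
                row[attr.get("id")] = attr.get("value")
            # Observation value: OBS_VALUE
            row["OBS_VALUE"] = float(obs.find("generic:ObsValue", NS).get("value"))
            rows.append(row)
    return rows


rows = flatten(SAMPLE)
print(rows[0])
```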

# Conclusion
Access to trade data is important for understanding global economic trends, making informed policy decisions, and shaping business strategy.

In this article, we retrieved trade data from the World Bank's World Integrated Trade Solution (WITS) Statistical Data and Metadata eXchange (SDMX) API using Python. We listed the available dataflows, inspected the dimensions of the datastructure, looked up codes in the codelists, built the query and period parameters, and parsed the response into a pandas dataframe.

From here, the same steps can be reused with other reporters, partners, products, and indicators. Happy exploring!

## Reference
* [Statistical Data and Metadata eXchange (SDMX)](https://sdmx.org/?page_id=5008)
* [World Bank World Integrated Trade Solution (WB WITS) API User Guide](http://wits.worldbank.org/data/public/WITSAPI_UserGuide.pdf)