# Trade Tools: extracting and cleaning Japanese trade data files

In [1]:
import pandas as pd
from tradefile import TradeFile
from customsgrabber import CustomsGrabber

## Scraping trade data from the customs: the CustomsGrabber object 

As of 2021, the Japanese trade data are open to consultation, but the official website of the Japanese Customs does not provide a dynamic connection to the row-level data: the database cannot be queried for specific conditions (e.g. country and commodity). Instead, the website provides either a closed query interface or a set of files divided by month and by subset of the commodity codes, using either the Principal Commodity (PC) codes or the HS (Harmonized System) codes.

The class CustomsGrabber provides a downloader for the trade data from the Japanese Customs website. The data are saved as a zip archive containing one or more csv files, which are kept exactly as downloaded from the website.
Currently, CustomsGrabber can download one or more years of data along these two dimensions:
- *direction*: 'import' (goods to Japan) or 'export';
- *kind*: 'HS' (the international Harmonized System coding) or 'PC' (Principal Commodity, a summarization of HS codes by categories that the Japanese Government deems useful). For an existing file, the kind is inferred from its columns.

In [2]:
grabber = CustomsGrabber()
grabber.grabRange(from_year=2021, to_year=2021, direction='import', kind='HS') # data from 2021 only





Saving the data as import_HS_2021-2021.zip in ../data/.
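
To check what was downloaded, the archive can be inspected with the standard library. The following is a minimal sketch: the helper `list_csv_members` is hypothetical (it is not part of CustomsGrabber), and the demo builds a small stand-in archive with made-up member names instead of assuming the layout of the real files.

```python
import os
import tempfile
import zipfile


def list_csv_members(zip_path):
    """Return the sorted names of the csv files stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".csv")
        )


# Stand-in archive with made-up member names, for demonstration only.
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "demo_import_HS.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("part_a.csv", "col1,col2\n1,2\n")
    archive.writestr("part_b.csv", "col1,col2\n3,4\n")
    archive.writestr("readme.txt", "not a data file")

csv_members = list_csv_members(demo_zip)
print(csv_members)
```

With a real download, the same helper can be pointed at the path printed by the grabber, e.g. `list_csv_members("../data/import_HS_2021-2021.zip")`.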


## Transforming raw tables into workable data: the TradeFile object

The class TradeFile provides the tools to open and transform a wide-format csv file from the Japanese Customs website.
In these files each month is a column, and the monthly columns may be repeated once per unit of measure (e.g. kilograms, number of units, and thousands of JPY).

TradeFile can open a single csv file, or several files from a zip archive, and merge and normalize the data so that the resulting DataFrame has the following columns:
- The commodity code
- The target country
- The date (month and year) of acquisition
- The unit of measure
- The value or the measure
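
The wide-to-long step can be illustrated with plain pandas. This is a sketch on toy data, not TradeFile's actual implementation: the wide column labels (`KG-2021-01`, `JPY-2021-01`, ...), the `commodity` column name, and all values are assumptions made for the example.

```python
import pandas as pd

# Toy wide table: one column per (unit, month) pair. Labels are hypothetical.
wide = pd.DataFrame(
    {
        "commodity": ["0101", "0102"],
        "country": ["220", "105"],
        "KG-2021-01": [10, 20],
        "JPY-2021-01": [1000, 2000],
        "KG-2021-02": [11, 21],
        "JPY-2021-02": [1100, 2100],
    }
)

# Unpivot every (unit, month) column into rows.
tidy = wide.melt(
    id_vars=["commodity", "country"], var_name="unit_month", value_name="value"
)

# Split the combined label into the unit and the month, then parse the month.
tidy[["unit", "month"]] = tidy["unit_month"].str.split("-", n=1, expand=True)
tidy["date"] = pd.to_datetime(tidy["month"], format="%Y-%m")

# Keep one row per commodity, country, date and unit.
tidy = tidy[["commodity", "country", "date", "unit", "value"]]
print(tidy)
```

Each original row becomes one row per month and unit, which is the shape described by the list above.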

TradeFile can also open text data that have already been normalized. The flag _raw_ is used to indicate whether we are opening a "raw" file or a normalized one.

In [5]:
path = "../data/import_HS_2021-2021.zip"
tool = TradeFile(path, raw=True)



Opening ../data/import_HS_2021-2021.zip...
Loading the file...
Unpivoting the monthly columns. This might take a minute...


tradefile   : INFO     The melting took 0.4032611846923828, the splitting 3.2825939655303955, the cleaning 0.798795223236084 and the date merging time is 2.037729024887085.
tradefile   : INFO     Unpivoting the metrics...


### Merging new data to an existing (text) database

For periodic (say, monthly) updates of the database, it is possible to specify an existing text file or a DataFrame containing an already normalized trade table. The parameters to use are:
- _merge_file_: when the base table is stored in a csv file;
- _merge_df_: when the base DataFrame is already in memory.

The base table is assumed to be already normalized. Merging two raw files is currently not supported with this method, but it is always possible to open a zip archive with two or more raw files in it.

In [2]:
path2 = "../data/import_HS_2021-2021.zip"
# The result is a TradeFile object; the merged table is available under merged_df.data
merged_df = TradeFile(path2, raw=True, merge_file="../data/import_HS_2016-01_2021-04.csv")
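
Conceptually, such an update appends the new rows to the base table and keeps a single row for each key that appears in both. The sketch below shows one way to do this with pandas. It illustrates the general technique under stated assumptions and does not describe how TradeFile merges internally; the `commodity` column name and all values are made up.

```python
import pandas as pd

# Columns that identify one observation (assumed for this example).
key_cols = ["commodity", "country", "date", "unit"]

# Already normalized base table.
base = pd.DataFrame(
    {
        "commodity": ["0101", "0101"],
        "country": ["220", "220"],
        "date": pd.to_datetime(["2021-03-01", "2021-04-01"]),
        "unit": ["JPY", "JPY"],
        "value": [100, 110],
    }
)

# New normalized rows: April is revised, May is new.
update = pd.DataFrame(
    {
        "commodity": ["0101", "0101"],
        "country": ["220", "220"],
        "date": pd.to_datetime(["2021-04-01", "2021-05-01"]),
        "unit": ["JPY", "JPY"],
        "value": [115, 120],
    }
)

# Append, then keep the most recent row for every key.
merged = (
    pd.concat([base, update], ignore_index=True)
    .drop_duplicates(subset=key_cols, keep="last")
    .sort_values(key_cols)
    .reset_index(drop=True)
)
print(merged)
```

Here `keep="last"` lets the update override the base table for overlapping months; `keep="first"` would do the opposite.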

### Accessing the normalized data

Once acquired, the trade DataFrame can be accessed under _.data_. This is a regular pandas DataFrame and can be manipulated as such. Note that _.data_ is currently not protected against modification, so take care to preserve its original structure.

In [8]:
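# Boolean masks for July 2021, values in JPY, and country code 220 (Italy)
is_july_2021 = (tool.data['date'] == "2021-07-01")
is_jpy = (tool.data['unit'] == 'JPY')
is_italy = (tool.data['country'] == '220')
# Select rows and column in one step with .loc instead of chained indexing
tool.data.loc[is_july_2021 & is_jpy & is_italy, 'value'].mean()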

50215654.07262021
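
The usual pandas aggregations apply as well. As a sketch on a toy table that reuses the column names from the cell above (the values and the second country code are invented), the mean value per country and month can be computed with `groupby`. Working on a copy leaves the original table untouched.

```python
import pandas as pd

# Toy stand-in for a normalized trade table; values are invented.
data = pd.DataFrame(
    {
        "country": ["220", "220", "105", "105"],
        "date": pd.to_datetime(["2021-07-01"] * 4),
        "unit": ["JPY", "JPY", "JPY", "KG"],
        "value": [100.0, 300.0, 50.0, 7.0],
    }
)

# Filter on a copy so that the source table is not modified.
jpy = data.loc[data["unit"] == "JPY"].copy()

# Mean value for each (country, date) pair.
mean_by_country = jpy.groupby(["country", "date"])["value"].mean()
print(mean_by_country)
```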

### Saving the data to file

The trade DataFrame can be saved to a csv or zip file through the method _save_to_file_. The method generates a file named _[kind]\_[first date]\_[last date].csv_. If _is_zip_ is set to True, the extension is _.zip_.

In [9]:
tool.save_to_file(path="../data", is_zip=False)

tradefile   : INFO     TradeFile.save_to_file: saved data to ../data/HS_2016-01-01_2021-07-01.csv.
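
The naming pattern can be reproduced as follows. This is a hypothetical helper written to match the file name shown in the log line above, not the method's actual code; `build_file_name` and its parameters are assumptions.

```python
import pandas as pd


def build_file_name(kind, dates, is_zip=False):
    """Build '[kind]_[first date]_[last date].[csv|zip]' from a set of dates."""
    parsed = pd.to_datetime(pd.Series(dates))
    first = parsed.min().strftime("%Y-%m-%d")
    last = parsed.max().strftime("%Y-%m-%d")
    extension = "zip" if is_zip else "csv"
    return f"{kind}_{first}_{last}.{extension}"


# The order of the dates does not matter; only the earliest and latest are used.
file_name = build_file_name("HS", ["2021-07-01", "2016-01-01", "2019-05-01"])
zip_name = build_file_name("HS", ["2021-07-01", "2016-01-01"], is_zip=True)
print(file_name)
print(zip_name)
```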
