In [1]:
import pandas as pd # need this 
import datetime as dt
import numpy as np
import requests, io
import os

pd.set_option('display.max_colwidth', 50)
# set to None (or -1 in older pandas versions) to get it to display everything

This notebook grabs and manipulates the Chinese tariff data reported in the PIIE article by Bown, Jung, and Zhang:

https://www.piie.com/blogs/trade-and-investment-policy-watch/trump-has-gotten-china-lower-its-tariffs-just-toward

This will provide an updated tariff series that starts from the MFN values and then incorporates Chinese retaliation against the US, in addition to the adjustments that China made on other products. Here is a blog post I found helpful in describing some of the issues:

https://www.china-briefing.com/news/new-tariff-export-duty-cuts-china-2019-wide-variety-products-affected/

---
### Step 1

Here we read in the data and do some simple cleaning.

In [27]:
#url = "https://www.piie.com/system/files/documents/bown-jung-zhang-2019-06-12.xlsx"

# Copy of the June 2019 spreadsheet hosted on GitHub (overridden by the local file below)
location = "https://github.com/mwaugh0328/consumption_and_tradewar/raw/master/data/bown-jung-zhang-2019-06-12.xlsx"
# Local copy of the spreadsheet with all the tariff changes,
# from the article by Bown, Jung, and Zhang
location = "./data/bown-12-2019.xlsx"

df_tariffs = pd.read_excel(location, sheet_name = "China tariffs", dtype = {"hs10": str})

In [28]:
df_tariffs.head()

Unnamed: 0,hs10,description,"January 1, 2018\nMFN Tariff Rates","April 2, 2018\nRetaliation to US Section 232 tariffs","May 1, 2018\nChange of MFN tariffs on pharmeceuticals","July 1, 2018\nChange of MFN tariffs on consumer goods, autos, and ITA products","July 6, 2018\nRetaliation to US Section 301 tariffs ($34 billion)","August 23, 2018\nRetaliation to US Section 301 tariffs ($16 billion)","September 24, 2018\nRetaliation to US Section 301 tariffs ($60 billion)","November 1, 2018\nChange of MFN tariffs industry goods","January 1, 2019\nChange of temporary MFN rates for 2019","January 1, 2019\nSuspension of retaliation against US auto and parts (Section 301)","June 1, 2019\nChange of retaliation tariffs on some US products (subset of $60 billion)","September 1, 2019\nChange of retaliation tariffs on some US products (subset of $70 billion)","December 26, 2019\nChina implements product exclusions on less than $1 billion of US exports from $34 billion and $16 billion list"
0,101210010,"Live horses, asses, mules and hinnies: Horses:...",0.0,0,0.0,0.0,0,0,0,0.0,0.0,0,0,0,0
1,101210090,"Live horses, asses, mules and hinnies: Horses:...",0.0,0,0.0,0.0,0,0,0,0.0,0.0,0,0,0,0
2,101290010,"Live horses, asses, mules and hinnies: Horses:...",10.0,0,0.0,0.0,0,0,10,0.0,0.0,0,15,10,0
3,101290090,"Live horses, asses, mules and hinnies: Horses:...",10.0,0,0.0,0.0,0,0,10,0.0,0.0,0,15,10,0
4,101301010,"Live horses, asses, mules and hinnies: Asses: ...",0.0,0,0.0,0.0,0,0,0,0.0,0.0,0,0,0,0


The first thing I want to do is rename the columns so they carry just the dates, which will eventually give a time series of tariffs by product. This is what I do below.

In [35]:
cnames = df_tariffs.columns.tolist()

In [36]:
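# One date per tariff column, in column order. Two columns are dated January 1, 2019;
# the second (auto suspension) is assigned January 2 so the new column names stay unique.
time_dict = [dt.datetime(2018,1,1), dt.datetime(2018,4,2), dt.datetime(2018,5,1), dt.datetime(2018,7,1),
             dt.datetime(2018,7,6), dt.datetime(2018,8,23), dt.datetime(2018,9,24), dt.datetime(2018,11,1),
             dt.datetime(2019,1,1), dt.datetime(2019,1,2), dt.datetime(2019,6,1), dt.datetime(2019,9,1), dt.datetime(2019,12,26)]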

In [37]:
tariff_times = dict(zip(cnames[2:], time_dict)) 

In [38]:
df_tariffs.rename(columns = tariff_times, inplace = True)
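The dates above are typed in by hand. As a rough alternative, here is a minimal sketch that parses the date from the first line of each header instead. The helper names `header_to_date` and `headers_to_dates` are hypothetical (they are not part of the original notebook), and the sketch assumes every header has the form "Month D, YYYY\ndescription". Repeated dates are pushed forward by one day, mirroring the January 2, 2019 convention used above.

```python
import datetime as dt

def header_to_date(header):
    """Parse the date on the first line of a 'date\\ndescription' column header."""
    first_line = header.split("\n")[0].strip()
    return dt.datetime.strptime(first_line, "%B %d, %Y")

def headers_to_dates(headers):
    """Parse every header; bump repeated dates by one day so they stay unique."""
    dates = []
    for header in headers:
        date = header_to_date(header)
        while date in dates:
            date += dt.timedelta(days=1)
        dates.append(date)
    return dates

# Three headers copied from the spreadsheet, including the two January 1, 2019 columns
sample_headers = [
    "January 1, 2018\nMFN Tariff Rates",
    "January 1, 2019\nChange of temporary MFN rates for 2019",
    "January 1, 2019\nSuspension of retaliation against US auto and parts (Section 301)",
]
sample_dates = headers_to_dates(sample_headers)
```

If this were used in place of the hand-typed list, `headers_to_dates(cnames[2:])` would play the role of `time_dict`.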

In [39]:
# This will create the 8- and 6-digit codes

df_tariffs["hs8"] = df_tariffs.hs10.str[0:8]

df_tariffs["hs6"] = df_tariffs.hs10.str[0:6]

In [40]:
df_tariffs.head()

Unnamed: 0,hs10,description,2018-01-01 00:00:00,2018-04-02 00:00:00,2018-05-01 00:00:00,2018-07-01 00:00:00,2018-07-06 00:00:00,2018-08-23 00:00:00,2018-09-24 00:00:00,2018-11-01 00:00:00,2019-01-01 00:00:00,2019-01-02 00:00:00,2019-06-01 00:00:00,2019-09-01 00:00:00,2019-12-26 00:00:00,hs8,hs6
0,101210010,"Live horses, asses, mules and hinnies: Horses:...",0.0,0,0.0,0.0,0,0,0,0.0,0.0,0,0,0,0,1012100,10121
1,101210090,"Live horses, asses, mules and hinnies: Horses:...",0.0,0,0.0,0.0,0,0,0,0.0,0.0,0,0,0,0,1012100,10121
2,101290010,"Live horses, asses, mules and hinnies: Horses:...",10.0,0,0.0,0.0,0,0,10,0.0,0.0,0,15,10,0,1012900,10129
3,101290090,"Live horses, asses, mules and hinnies: Horses:...",10.0,0,0.0,0.0,0,0,10,0.0,0.0,0,15,10,0,1012900,10129
4,101301010,"Live horses, asses, mules and hinnies: Asses: ...",0.0,0,0.0,0.0,0,0,0,0.0,0.0,0,0,0,0,1013010,10130


---
### Step 2

Now I will melt the dataframe to make it long, then group by hs10 code and take the cumulative sum of the "value" column. Each date column holds the change in the tariff on that date, so the running sum gives the level of the tariff rate and how it changes over time.
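To make the melt-then-cumsum idea concrete, here is a small sketch on a toy table. The product codes and numbers are made up for illustration and are not taken from the PIIE file.

```python
import pandas as pd

# Toy wide table: each date column holds the *change* in the tariff on that date
# (hypothetical products and values).
wide = pd.DataFrame({
    "hs10": ["0000000001", "0000000002"],
    "2018-01-01": [10.0, 0.0],
    "2018-07-06": [25.0, 0.0],
    "2019-06-01": [-5.0, 15.0],
})

# Wide -> long: one row per (product, date)
long = wide.melt(id_vars=["hs10"], var_name="time_of_tariff", value_name="value")

# Running sum of the changes within each product gives the tariff level
long["tariff"] = long.groupby("hs10")["value"].cumsum()

result = long.sort_values(["hs10", "time_of_tariff"]).reset_index(drop=True)
```

The running sum relies on the rows for each product appearing in date order, which `melt` preserves when the date columns are already ordered left to right.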

In [41]:
df = df_tariffs.melt(id_vars = ["hs10", "description", "hs6", "hs8"])

df.rename(columns = {"variable": "time_of_tariff"}, inplace = True)

df['tariff'] = df.groupby('hs10')['value'].cumsum()
# This is one of those fancy/amazing pandas things: group by hs10, grab the value, take the cumulative sum.
# So for each hs10 group, this takes the cumulative sum of the tariff value across dates.

Then let's grab one group to see what it looks like...

In [42]:
hs_grp = df.groupby(["hs6", "time_of_tariff"])

hs_grp.get_group(("010129", dt.datetime(2018,1,1)))

Unnamed: 0,hs10,description,hs6,hs8,time_of_tariff,value,tariff
2,101290010,"Live horses, asses, mules and hinnies: Horses:...",10129,1012900,2018-01-01,10.0,10.0
3,101290090,"Live horses, asses, mules and hinnies: Horses:...",10129,1012900,2018-01-01,10.0,10.0


---
### Step 3

Choices. The issue is that the MFN tariffs are at the 10-digit level, but to match them with US exports in a consistent way we need to go to the 6-digit level. My understanding is that product codes are consistent across countries only up to 6 digits (in fact, you can see this when trying to merge US exports with the tariffs at the 10-digit level).

So the solution is the following: try a couple of different aggregations and see if it matters. Note that the tariff retaliation appears to have been set at the 6-digit level (there is no variation in the retaliatory tariffs across products within a 6-digit code). The variation that aggregation misses is in the initial level, which (sometimes) does vary across products within a 6-digit code.

**Updated:** how you aggregate within the groupby does not matter for the final results.
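One way to check how much the choice of aggregation matters is to compute several aggregates side by side and look at the spread between them. The sketch below does this on made-up data; the column names match the notebook, but the codes and tariff values are hypothetical.

```python
import pandas as pd

# Toy long table: two hs6 codes, each with two hs10 products and two dates
# (hypothetical values).
toy = pd.DataFrame({
    "hs6": ["000001"] * 4 + ["000002"] * 4,
    "hs10": ["0000010001", "0000010002"] * 2 + ["0000020001", "0000020002"] * 2,
    "time_of_tariff": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-07-06", "2018-07-06"] * 2
    ),
    "tariff": [10.0, 12.0, 35.0, 37.0, 5.0, 5.0, 30.0, 30.0],
})

# Min, mean, and max tariff within each (hs6, date) cell
by_hs6 = toy.groupby(["hs6", "time_of_tariff"])["tariff"].agg(["min", "mean", "max"])

# A spread of zero means every aggregation gives the same answer for that cell
by_hs6["spread"] = by_hs6["max"] - by_hs6["min"]
```

Applied to the real data, a `spread` that is zero or small almost everywhere would support using any one of these aggregates.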

In [43]:
tariffs_hs6_max = df.groupby(["hs6", "time_of_tariff"]).agg({"tariff": "max"})

In [44]:
tariffs_hs6_max.reset_index(inplace = True)

In [45]:
tariffs_hs6_max.set_index("time_of_tariff", inplace = True)

In [46]:
# Drop the 2019-01-01 observation because the 2019-01-02 one supersedes it.
# The 2019-01-01 column is the MFN reduction and the 2019-01-02 column is the
# suspension on autos, but the tariffs here are cumulative, so the 2019-01-02
# level already includes the 2019-01-01 change.

tariffs_hs6_max.drop(labels=dt.datetime(2019,1,1), axis = 0, inplace = True)

In [47]:
tariffs_hs6_max.reset_index(inplace = True)

In [48]:
tariffs_hs6_max.head(15)

Unnamed: 0,time_of_tariff,hs6,tariff
0,2018-01-01,10121,0.0
1,2018-04-02,10121,0.0
2,2018-05-01,10121,0.0
3,2018-07-01,10121,0.0
4,2018-07-06,10121,0.0
5,2018-08-23,10121,0.0
6,2018-09-24,10121,0.0
7,2018-11-01,10121,0.0
8,2019-01-02,10121,0.0
9,2019-06-01,10121,0.0


In [49]:
location = os.getcwd()

# os.path.join builds the path with the correct separator on any operating system
tariffs_hs6_max.to_csv(os.path.join(location, "data", "new_tariff_list_2020.csv"), index = False)

In [20]:
grp = tariffs_hs6_max.groupby("time_of_tariff")

In [21]:
test = grp.tariff.median()

Here is one of the categories that got hammered, to see how the tariff evolved.

In [50]:
#Meat of swine, fresh, chilled or frozen: Frozen: Other: Other meat of swine, frozen

tariffs_hs6_max[tariffs_hs6_max.hs6== "020329"]

Unnamed: 0,time_of_tariff,hs6,tariff
540,2018-01-01,20329,12.0
541,2018-04-02,20329,37.0
542,2018-05-01,20329,37.0
543,2018-07-01,20329,37.0
544,2018-07-06,20329,62.0
545,2018-08-23,20329,62.0
546,2018-09-24,20329,62.0
547,2018-11-01,20329,62.0
548,2019-01-02,20329,62.0
549,2019-06-01,20329,62.0
