## Overview
**Summary:** The 'Allen-Unger Global Commodity Prices' dataset is a vast collection of commodity prices from the 10th to the 20th century. The 165 unique commodities include Agricultural (e.g., cheese, butter, salt), Energy (e.g., oil, coal, wood), and Industrial (e.g., iron, gold, copper) commodities valued in over 70 currencies and in nearly 200 locations across the globe. 
While there is a considerable breadth of data, 80% of the records are prices from European locations during the 15th – 19th centuries and over 40% of the data relates to grain prices. The authors of the dataset acknowledge that many series are incomplete and that assumptions were made to standardize data across sources. Measures and currencies have been converted into standard values for comparability. 

The dataset can be used to investigate both broad themes and specific events throughout the past 1,000 years. The presence of volatility in commodity prices in particular locations can signify macro events throughout history.


**Question:** How did the British East India Company's (EIC) creation and involvement in the East Indies influence commodity prices in England?


**Time period:** 1600 – 1875


**Background:** While there is a distinct date for the formation of the EIC, other countries had established land and sea trade routes with the East. These foreign commodities came with significant tariffs and supply restrictions. 


**Hypothesis:** Imported commodities, such as tea, pepper, and ginger, will decrease in price and volatility in England as trade is established with the East through the British East India Company.

## Initial environment set up
There is one original CSV file in the original version of the dataset. I added two supplemental CSV files to enhance the existing data (location & commodity categories).   

**The hidden code cells set up the environment, libraries, and functions used throughout the notebook. Click on the "Code" button in the published kernel to reveal the hidden code.** 

In [None]:
## Set up of environment and libraries 

from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt # plotting
import numpy as np # linear algebra
import os # accessing directory structure
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from scipy.stats import linregress as lg
import seaborn as sns 

## Available Files
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

Let's check the files and take a quick look at what the data looks like.

In [None]:
nRowsRead = None
df1 = pd.read_csv('/kaggle/input/allenunger-global-commodity-prices/all_commodities.csv', delimiter=',')
df1.dataframeName = 'all_commodities.csv'
nRow, nCol = df1.shape
print(f'There are {nRow} rows and {nCol} columns in the original CSV file')

nRowsRead = None 
df2 = pd.read_csv('/kaggle/input/addition-labels-commodities/commodities_location_data.csv', delimiter=',')
df2.dataframeName = 'commodities_location_data.csv'
nRow, nCol = df2.shape
print(f'There are {nRow} rows and {nCol} columns in the Location CSV file')

nRowsRead = None 
df3 = pd.read_csv('/kaggle/input/addition-labels-commodities/commodity_categories.csv', delimiter=',')
df3.dataframeName = 'commodity_categories.csv'
nRow, nCol = df3.shape
print(f'There are {nRow} rows and {nCol} columns in the Categories CSV file')

Let's take a look at a sample of the three CSV files in order. 

In [None]:
df1.sample(n=3)

In [None]:
df2.sample(n=5)

In [None]:
df3.sample(n=5)

Next, we'll merge the three files into one data frame, reorganize the columns, rename the columns, and view a sample of the data. From there, we'll understand the type of data in each column. 

In [None]:
df4 = df1.merge(df2, on='Location') ## Merge Location labels
df4 = df4.merge(df3, on='Commodity') ## Merge Commodity labels

## Reorg. columns and print sample
df4 = df4[["Item Year" , "Commodity" , "Primary_Cat" , "Sub_Cat" , "Location" , "Country" , "Continent" , "Original Value" , "Standard Value" , "Original Currency" , "Standard Currency" , "Orignal Measure" , "Standard Measure" , "Variety"]]

#Rename columns
df4.columns = ["Item_Year" , "Commodity" , "Primary_Cat" , "Sub_Cat" , "Location" , "Country" , "Continent" , "Original_Value" , "Standard_Value" , "Original_Currency" , "Standard_Currency" , "Orignal_Measure" , "Standard_Measure" , "Variety"]

df4.sample(n=3)

In [None]:
df4.dtypes
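`merge` defaults to an inner join, so any record whose `Location` or `Commodity` has no match in a label file is dropped without warning. The sketch below uses made-up toy frames (the names `prices` and `locations` and their values are illustrative assumptions, not the real files) to show one way to count unmatched rows with `how='left'` and `indicator=True`.

```python
import pandas as pd

# Toy data, invented for illustration only
prices = pd.DataFrame({
    "Location": ["England", "Paris", "Atlantis"],
    "Standard_Value": [1.0, 2.0, 3.0],
})
locations = pd.DataFrame({
    "Location": ["England", "Paris"],
    "Country": ["England", "France"],
})

# Default inner join: the unmatched "Atlantis" row disappears
inner = prices.merge(locations, on="Location")

# Left join with an indicator column keeps every price row and flags the misses
checked = prices.merge(locations, on="Location", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(f"{len(prices)} price rows, {len(inner)} after inner merge, {len(unmatched)} unmatched")
print(unmatched["Location"].tolist())
```

Running the same check against the real frames would show whether the merged row count differs from the original file.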

In the next section, we'll dive into the data.   

## Reviewing the data
We'll start by viewing the complete list of commodities and locations before summarizing the data to get a better idea of the locations, commodities, and categories available along with our first visualization of the dataset as a whole. 

In [None]:
print ("Unique Commodities")
print ("--------------------------------")
print(df4.Commodity.unique())

In [None]:
print ("Unique Locations")
print ("--------------------------------")
print(df4.Location.unique())

In [None]:
df_top_loc = df4['Location'].value_counts()
print ("Count by Location")
print ("---------------------")
print(df_top_loc.head(10))

In [None]:
df_top_com = df4['Commodity'].value_counts()
print("Count by Commodity")
print ("---------------------")
print(df_top_com.head(10))

In [None]:
df_top_sub = df4['Sub_Cat'].value_counts()
print("Count by Sub-category")
print ("---------------------")
print(df_top_sub.head(10))

In [None]:
time_val = df4['Item_Year'].values

# histplot replaces the deprecated distplot (requires seaborn 0.11+); stat='density' normalizes the bars
sns.histplot(time_val, kde=True, stat='density', color='blue').set_title('Distribution of data by Year (All)', fontsize=14)
plt.xlabel('Item Year', fontsize=12)
plt.ylabel('Density', fontsize=12)


While there is a considerable breadth of data, a majority of the data is from the 15th – 19th centuries and over 40% of the data relates to grain prices.

Now that we have a high-level understanding of the data available let's look at England specifically. In the original dataset, there is a label in the 'Location' column with the name 'England'. If you refer back to the 'Unique Locations' data view, most labels are cities, rather than a country. We also saw earlier that 'England' is the most robust location. There are also a handful of other cities within England in our dataset. Let's review both 'England' and specific cities within England.

In [None]:
df_England0 = df4[df4['Country'].isin(["England"]) & (df4.Location == "England")]
df_England1 = df_England0['Location'].value_counts()
print ("Count by 'England' Location")
print(df_England1.head(10))

print ("\n-------------------------- \n")

df_England2 = df_England0['Sub_Cat'].value_counts()
print("Commodity Sub-category by 'England' Location")
print(df_England2.head(10))

In [None]:
df_hist = df4[df4['Country'].isin(["England"]) & (df4.Location == "England")]

time_val = df_hist['Item_Year'].values

# histplot replaces the deprecated distplot (requires seaborn 0.11+); stat='density' normalizes the bars
sns.histplot(time_val, kde=True, stat='density', color='blue').set_title('Distribution of data by Year (England)', fontsize=14)
plt.xlabel('Item Year', fontsize=12)
plt.ylabel('Density', fontsize=12)

The visual reinforces that the 'England' data is robust, and the data available in the selection set increases over time until the late 19th century. Now, let’s look at the cities within England.

In [None]:
df_England0 = df4[df4['Country'].isin(["England"]) & (df4.Location != "England")]
df_England1 = df_England0['Location'].value_counts()
print ("Count by Location ex. 'England' Locations")
print(df_England1.head(10))

print ("\n---------------------------------------------- \n")

df_England2 = df_England0['Sub_Cat'].value_counts()
print("Commodity Sub-category ex. 'England' Locations")
print(df_England2.head(10))

There are just under 3,000 records, all in the grain category. We'll exclude these locations for the purpose of our analysis. Now that we have an understanding of our data, we can dive into a few data cleansing steps.

## Data cleansing 

Let's create a few dataframes to use throughout the rest of the analysis. We'll be looking at ginger, pepper, and tea.

In [None]:
## Dataframe for three commodities

df_Ginger = df4[(df4.Commodity == "Ginger") & (df4.Location == "England")]
df_Pepper = df4[(df4.Commodity == "Pepper") & (df4.Location == "England")]
df_Tea = df4[(df4.Commodity == "Tea") & (df4.Location == "England")]


df_Ginger.head(5)

We can see from the ‘Item_Year’ column above that we don't have data for sequential years. The lack of sequential data will throw off our analysis. Let’s add in missing years in the series and use the 'interpolate' function to populate these new years with values to form a linear path between actual data points.
 
 
We'll perform this operation for all three selection sets and view the first few rows so that we can see the 'Standard Value' for each. 

In [None]:
# Fill unreported years (Ginger)

commodity = "Ginger" 
data_fill = df_Ginger
start_date = int(data_fill["Item_Year"].min())

# Add a blank row for every missing year, from the first observation through 1899
new_index = pd.Index(np.arange(start_date,1900,1), name="Item_Year")
data_fill = data_fill.set_index("Item_Year").reindex(new_index).reset_index()

# Fill Commodity and Location labels on the new rows
data_fill[['Commodity']] = data_fill[['Commodity']].fillna(value= commodity)
data_fill[['Location']] = data_fill[['Location']].fillna(value="England")

# Interpolate (linear method); assign the result rather than relying on inplace=True
data_fill['Standard_Value'] = data_fill['Standard_Value'].interpolate(method='linear')

# Apply to dataframe and preview
df_Ginger = data_fill
df_Ginger.head(5)

In [None]:
# Fill unreported years (Pepper)

commodity = "Pepper" 
data_fill = df_Pepper
start_date = int(data_fill["Item_Year"].min())

# Add a blank row for every missing year, from the first observation through 1899
new_index = pd.Index(np.arange(start_date,1900,1), name="Item_Year")
data_fill = data_fill.set_index("Item_Year").reindex(new_index).reset_index()

# Fill Commodity and Location labels on the new rows
data_fill[['Commodity']] = data_fill[['Commodity']].fillna(value= commodity)
data_fill[['Location']] = data_fill[['Location']].fillna(value="England")

# Interpolate (linear method); assign the result rather than relying on inplace=True
data_fill['Standard_Value'] = data_fill['Standard_Value'].interpolate(method='linear')

# Apply to dataframe and preview
df_Pepper = data_fill
df_Pepper.head(5)

In [None]:
# Fill unreported years (Tea)

commodity = "Tea" 
data_fill = df_Tea
start_date = int(data_fill["Item_Year"].min())

# Add a blank row for every missing year, from the first observation through 1899
new_index = pd.Index(np.arange(start_date,1900,1), name="Item_Year")
data_fill = data_fill.set_index("Item_Year").reindex(new_index).reset_index()

# Fill Commodity and Location labels on the new rows
data_fill[['Commodity']] = data_fill[['Commodity']].fillna(value= commodity)
data_fill[['Location']] = data_fill[['Location']].fillna(value="England")

# Interpolate (linear method); assign the result rather than relying on inplace=True
data_fill['Standard_Value'] = data_fill['Standard_Value'].interpolate(method='linear')

# Apply to dataframe and preview
df_Tea = data_fill
df_Tea.head(5)
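The three cells above repeat the same steps, so they could be folded into one helper. The sketch below is one possible version, not the notebook's own code: the function name `fill_missing_years`, its parameters, and the toy prices are assumptions made for illustration. Like the cells above, it assumes there is at most one row per year, because `reindex` raises an error on duplicate index labels.

```python
import numpy as np
import pandas as pd


def fill_missing_years(df, commodity, location="England", end_year=1900):
    """Return one row per year, with Standard_Value linearly interpolated."""
    start_year = int(df["Item_Year"].min())
    years = pd.Index(np.arange(start_year, end_year), name="Item_Year")
    # Reindexing on the full year range inserts empty rows for missing years
    filled = df.set_index("Item_Year").reindex(years).reset_index()
    filled["Commodity"] = filled["Commodity"].fillna(commodity)
    filled["Location"] = filled["Location"].fillna(location)
    filled["Standard_Value"] = filled["Standard_Value"].interpolate(method="linear")
    return filled


# Toy data, invented for illustration: prices observed in 1600 and 1604 only
toy = pd.DataFrame({
    "Item_Year": [1600, 1604],
    "Commodity": ["Ginger", "Ginger"],
    "Location": ["England", "England"],
    "Standard_Value": [10.0, 18.0],
})
filled_toy = fill_missing_years(toy, "Ginger", end_year=1606)
print(filled_toy[["Item_Year", "Standard_Value"]])
```

In the toy output, the years 1601 to 1603 receive evenly spaced values between the two observations. These interpolated points are estimates, so any volatility measured across long gaps will be understated.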


Now that we have sequential data, we can compute a few statistics for each dataframe. We'll compute these calculations here and add them to each dataframe. The statistics of the 'Standard Value' include standard deviation (rolling 10- and 20-year), 20-year rolling average, and the coefficient of variation (CoV). Finally, we'll make our last two dataframes to view the data together. 

In [None]:
#stats
df_Ginger['STD_10'] = df_Ginger.Standard_Value.rolling(window=10).std()
df_Pepper['STD_10'] = df_Pepper.Standard_Value.rolling(window=10).std()
df_Tea['STD_10'] = df_Tea.Standard_Value.rolling(window=10).std()

df_Ginger['STD_20'] = df_Ginger.Standard_Value.rolling(window=20).std()
df_Pepper['STD_20'] = df_Pepper.Standard_Value.rolling(window=20).std()
df_Tea['STD_20'] = df_Tea.Standard_Value.rolling(window=20).std()

df_Ginger['Ave_20'] = df_Ginger.Standard_Value.rolling(window=20).mean()
df_Pepper['Ave_20'] = df_Pepper.Standard_Value.rolling(window=20).mean()
df_Tea['Ave_20'] = df_Tea.Standard_Value.rolling(window=20).mean()

df_Ginger['CoV'] = df_Ginger.STD_20 / df_Ginger.Ave_20 
df_Pepper['CoV'] = df_Pepper.STD_20 / df_Pepper.Ave_20
df_Tea['CoV'] = df_Tea.STD_20 / df_Tea.Ave_20

## final dataframes
df_all = pd.concat([df_Ginger, df_Pepper, df_Tea])
df_all = df_all.sort_values(by=['Item_Year'])

df_GP = pd.concat([df_Ginger, df_Pepper])
df_GP = df_GP.sort_values(by=['Item_Year'])
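One detail of `rolling(window=n)` is worth keeping in mind when reading the charts: with default settings, the first `n - 1` rows have no value because the window is not yet full. The sketch below shows this on a made-up five-point series with a 3-year window (the numbers are illustrative assumptions, not dataset values).

```python
import pandas as pd

# Toy price series, invented for illustration
prices = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0])

window = 3
rolling = prices.rolling(window=window)

summary = pd.DataFrame({
    "mean": rolling.mean(),  # NaN until the window holds 3 values
    "std": rolling.std(),    # sample standard deviation (ddof=1)
})
summary["cov"] = summary["std"] / summary["mean"]

print(summary)
```

With a 20-year window, the first 19 years of each series therefore carry no rolling statistics.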

Now that we've cleaned and framed the data, let's dive into the analysis.

## Analysis
For our analysis, we'll review the specific years around the creation of the EIC (1600) for sharp changes in prices, long-term price impacts, and stability in prices.
 
In the 1600s, England was a globally established nation with access to many resources. Trade with Asia centered on spices, textiles, and tea. Additional trade routes were established further east (China, Japan) later in the 18th century. Keep in mind that other countries had established trade routes in the 16th century, but these countries imposed high tariffs on certain commodities. Further, political instability was widespread throughout the 250 years following the creation of the EIC. Local disputes also had a large effect on prices and stability.
 
First, we need to identify commodities that will be affected by the newly established nationalized trade route.

### **Indicative commodities**

* **Ginger**: Data for ginger starts as early as 1265. Ginger was primarily grown in the East Indies in the 1600s. Prices decreased significantly in the first hundred years (1600 - 1700) of the EIC. England saw significantly lower and more stable prices.


* **Pepper**: Pepper had very similar characteristics and price movements to ginger. Pepper was previously available via Portuguese and Dutch traders. However, pepper carried significant foreign taxes. The establishment of a nationalized trade route meant the British could see a reduced price. We’ll see this come out through the data.    


* **Tea**: Data for tea begins in 1673 and is consistent through the 1800s. Tea was primarily sourced from China. The British saw steadily declining tea prices from the beginning of the dataset. The British built up a large trade deficit in the early days of the relationship, as the Chinese desired silver (rather than gold). In the late 1700s, the British used India to grow opium, which was illegal in China. As the Chinese became increasingly addicted to opium, the British exploited this dependence to obtain tea and, eventually, silver. The First Opium War (1839 to 1842) is a macro factor to consider that will also influence prices.
 
The remaining analysis will focus only on indicative commodities. Many other commodities were considered for this analysis, but they proved to be unsuitable or non-indicative. 


Let's look at our three commodities over the whole period first so we can establish a baseline for the price movements.

In [None]:
## formatting for charts
def chart_format (com1, func, start_date, end_date):
    plt.xlabel("Item Year", fontsize=14)
    plt.ylabel("Standard Value", fontsize=14)
    plt.xticks(fontsize=10)
    plt.yticks(fontsize=10)
    plt.title(f'{com1} {func} in England ({start_date} - {end_date})', fontsize=14)
    plt.legend(loc='best')
    
## all commodities, two axis
def all_chart (value, func, start_date, end_date, xmax, ymax):
    df_x = df_GP.pivot(index = 'Item_Year', columns = 'Commodity', values = value)
    ax1 = df_x.plot()

    df_y = df_Tea.pivot(index = 'Item_Year', columns = 'Commodity', values = value)
    ax2 = ax1.twinx()
    ax2.spines['right'].set_position(('axes', 1.0))
    df_y.plot(ax=ax2, color = 'g')


    ax2.set_xlim(start_date,end_date)
    ax1.set_xlim(start_date,end_date)
    ax1.set_ylim(0, xmax)
    ax2.set_ylim(0,ymax)

    # Apply the shared formatting first so the axis-specific labels set below are not overwritten
    chart_format ("Ginger, Pepper, & Tea \n", func, start_date, end_date)

    ax1.legend(loc=2)
    ax1.set_xlabel("Item Year")
    ax1.set_ylabel("Standard Value (Ginger & Pepper)")
    ax2.set_ylabel("Standard Value (Tea)")

In [None]:
all_chart('Standard_Value', 'prices', 1550, 1850, 120, 600)

We can see that ginger and pepper move in a similar fashion and time frame, while tea has a similar appearance, but about 100 years later.

## Charts and individual commodities



### **Ginger** 
Let's explore ginger in the 40 years leading up to the creation of the EIC as well as the first 100 years after its creation.

In [None]:
## Single Line, no trend
def single_line (com1, data_frame, start_date, end_date):
    
    df_D = data_frame
    df_D = df_D[df_D.Item_Year <= end_date]
    df_D = df_D[df_D.Item_Year >= start_date]

    #Title filter
    start_date = int(df_D["Item_Year"].min())
    end_date = int(df_D["Item_Year"].max())
    
    ## Plot
    df_D = df_D.pivot(index = 'Item_Year', columns = 'Commodity', values = 'Standard_Value' )
    df_D.plot()

    chart_format (com1, "prices", start_date, end_date)

In [None]:
single_line ("Ginger", df_Ginger, 1560, 1700)

In [None]:
#Scatter Chart    
def scatter_one_trend (com1, data_frame, start_date, end_date, period):
    
    #Filter
    df_B = data_frame
    df_B = df_B[df_B.Item_Year <= end_date]
    df_B = df_B[df_B.Item_Year >= start_date]

    #Title filter
    start_date = int(df_B["Item_Year"].min())
    end_date = int(df_B["Item_Year"].max())
    
    ## Plot
    x = df_B.Item_Year
    y = df_B.Standard_Value
    plt.scatter(x, y, label = com1)

    #Slope of the least-squares trend line
    fit = lg(x, y)
    m = fit.slope
    b = fit.intercept
    plt.plot(x, m * x + b, color="red", label = f'Slope: {round(m,2)}')


    # Rolling average, kept in its own series so the raw prices are not overwritten
    y_roll = y.rolling(window=period).mean()
    plt.plot(x, y_roll, color="green", label=f'Rolling {period}-Year Ave.')

    chart_format (com1,"prices", start_date, end_date)

In [None]:
scatter_one_trend ("Ginger",df_Ginger, 1560, 1700, 20) 

While ginger prices are declining, we can see the prices began declining just prior to 1600. The notable change is in the low prices in the late 17th century. Let's look at the standard deviation over this period as well to consider the stability of the ginger prices.

In [None]:
def std_chart (com1, data_frame, start_date, end_date, STD):
     
    # Filter
    df_F = data_frame
    df_F = df_F[df_F.Item_Year <= end_date]
    df_F = df_F[df_F.Item_Year >= start_date]
    
    #Title filter
    start_date = int(df_F["Item_Year"].min())
    end_date = int(df_F["Item_Year"].max())
    
    df_F = df_F.pivot(index = 'Item_Year', columns = 'Commodity', values = STD)
    df_F.plot(color = ["g", "gray"], label=f'Rolling 20-Year St. d')
    
    chart_format (com1,"20-year rolling St.d \n", start_date, end_date)         

In [None]:
std_chart("Ginger", df_Ginger, 1580, 1700, "STD_20")

While the standard deviation is quite useful, it doesn't give us the full measure of volatility. As prices decline, the standard deviation will also decline (ceteris paribus). Let's consider the Coefficient of Variation ("CoV").

The CoV is the quotient of the standard deviation and the mean over a period. For the purpose of this analysis, we’ll use the CoV over a 20-year rolling period.
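To see why the CoV is the better measure of relative volatility, consider a price series that simply halves in level while keeping the same proportional swings. The sketch below uses made-up numbers (the series and the helper name `cov` are illustrative assumptions): the standard deviation halves with the price, while the CoV does not move.

```python
import pandas as pd


def cov(series):
    """Coefficient of variation: standard deviation divided by the mean."""
    return series.std() / series.mean()


# Toy prices, invented for illustration
high_prices = pd.Series([100.0, 120.0, 80.0, 110.0, 90.0])
low_prices = high_prices * 0.5  # same proportional swings at half the price level

print(f"St.d: {high_prices.std():.2f} -> {low_prices.std():.2f}")
print(f"CoV:  {cov(high_prices):.4f} -> {cov(low_prices):.4f}")
```

A falling standard deviation alone therefore cannot distinguish cheaper prices from steadier prices, which is the distinction the charts below rely on.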

In [None]:
#CoV Chart
def cov_chart (com1, data_frame, start_date, end_date, period):
    
    # Filter
    df_G = data_frame
    df_G = df_G[df_G.Item_Year <= end_date]
    df_G = df_G[df_G.Item_Year >= start_date]

    #Title filter
    start_date = int(df_G["Item_Year"].min())
    end_date = int(df_G["Item_Year"].max())
    
    ## Rolling CoV = rolling standard deviation / rolling mean
    x = df_G.Item_Year
    rolling = df_G.Standard_Value.rolling(window=period)
    y = rolling.std() / rolling.mean()

    ## Plot the CoV series itself, leaving the price column untouched
    plt.plot(x, y, label=f'Rolling {period}-Year CoV')
    
    ## Formatting 
    chart_format (com1,f'{period}-year rolling CoV \n', start_date, end_date)

In [None]:
cov_chart("Ginger", df_Ginger, 1560, 1700, 20)

As we can see, the CoV decreased over time as England established trade routes. The British were involved in many wars over the years. The wars may have diverted resources away from commerce and caused a spike in the CoV.



### **Pepper**

Next, let's look at pepper prices, standard deviation, and CoV. 

In [None]:
single_line ("Pepper", df_Pepper, 1560, 1700)

In [None]:
scatter_one_trend ("Pepper",df_Pepper, 1560, 1700, 20) 

In [None]:
std_chart("Pepper", df_Pepper, 1580, 1700, "STD_20")

In [None]:
cov_chart("Pepper", df_Pepper, 1560, 1700, 20)

Beginning in 1600, pepper prices showed a steady decline, which continued for nearly 100 years. We can also see a steep decline in the standard deviation, while the CoV continued to oscillate. We’ll draw further conclusions from this shortly, but can primarily attribute the decreased standard deviation to the decline in prices. 

### **Tea**

Finally, let's look at the same charts for tea. 

In [None]:
single_line ("Tea", df_Tea, 1500, 1850)

In [None]:
scatter_one_trend ("Tea", df_Tea, 1670, 1850, 20)

In [None]:
std_chart("Tea", df_Tea, 1670, 1840, "STD_20")

In [None]:
cov_chart("Tea",df_Tea, 1670, 1850, 20)

It's clear that prices and standard deviation dropped significantly during this period. While the price of tea declined over four-fold in the first 100 years, I'd like to focus on the period once the British started producing opium in India to export to China illegally. 

Let's look at the stability in prices leading up to the First Opium War, which began in the late 1830s, on a 10-year rolling basis. 

In [None]:
cov_chart("Tea",df_Tea, 1760, 1850, 10)

The 10-year rolling CoV reached its lowest value in the late 1820s as Chinese citizens became addicted to Britain's steady supply of opium. While we can see the CoV drop in the 1820s, the Opium War caused the CoV to spike leading up to the 1840s. 

Now that we’ve visualized the data, we can begin to draw a conclusion by viewing the commodities in parallel.

## Aggregated views

As we've seen above, ginger, pepper, and tea all saw a reduction in price and standard deviation over the period. The rolling 20-year mean and standard deviation illustrate the declines nicely, showing that prices and the variation in prices were both impacted by the EIC. Let's look at these charts with all three commodities. 

In [None]:
all_chart('Ave_20', '20-year rolling ave.', 1720, 1850, 75, 500)

In [None]:
all_chart('STD_20', '20-year rolling St.d', 1560, 1850, 25, 150)

Ginger and pepper saw their standard deviations fall together. However, the causes may vary. Here's the same chart as above, but on a shorter time period. 

In [None]:
std_chart("Ginger & Pepper", df_GP, 1580, 1700, "STD_20")

Next, we’ll look at the 20-year rolling average price and CoV for ginger and pepper to understand if the reduction of standard deviation was absolute (due primarily to falling prices) or relative (falling prices paired with reduced volatility).

In [None]:
def price_cov (com1, data_frame, start_date, end_date, xmax, ymax):
    
    df_PC = data_frame
    
    df_x = df_PC.pivot(index = 'Item_Year', columns = 'Commodity', values = 'Ave_20')
    ax1 = df_x.plot()

    df_y = df_PC.pivot(index = 'Item_Year', columns = 'Commodity', values = 'CoV')
    ax2 = ax1.twinx()
    ax2.spines['right'].set_position(('axes', 1.0))
    df_y.plot(ax=ax2, color = 'g')


    ax2.set_xlim(start_date,end_date)
    ax1.set_xlim(start_date,end_date)
    ax1.set_ylim(0, xmax)
    ax2.set_ylim(0,ymax)

    chart_format (com1, "20-year rolling average price and CoV \n", start_date, end_date)
       
    ax1.legend(loc=2)
    ax2.legend(['CoV'])
    ax1.set_xlabel("Item Year")
    ax1.set_ylabel("20-year rolling Standard Value")
    ax2.set_ylabel("CoV")

In [None]:
price_cov ("Ginger", df_Ginger, 1580, 1700, 80, .5)

Ginger saw a reduction of both average price and relative volatility as evidenced by the falling CoV. Again, a decreasing standard deviation paired with falling prices only shows a decrease in the absolute volatility of prices. 

Let’s look at the same chart for pepper.

In [None]:
price_cov ("Pepper", df_Pepper, 1580, 1700, 70, .5)

Pepper saw a reduction in standard deviation, but the CoV continued to oscillate. From this, we can draw the conclusion that the lower standard deviation in absolute terms was primarily driven by the falling price and not relative stability in pepper prices. 

Let’s look at the same chart for tea.

In [None]:
price_cov ("Tea", df_Tea, 1680, 1850, 500, .5)

Tea saw a large reduction in price, but the CoV continued to oscillate, much as it did for pepper. Again, we can draw the conclusion that the lower standard deviation in absolute terms was primarily driven by the falling price and not relative stability in tea prices.

Finally, let’s view the CoV for all three commodities over the whole period to understand the relative volatility over the whole period.

In [None]:
all_chart('CoV', '20-year rolling CoV', 1560, 1850, .55, .55)

The above chart shows that the CoV is inconclusive as volatility is still present throughout the period. The relative volatility over the long-term can be attributed to general instability during the period.

## The rise of ginger and pepper

As we noticed in our initial price chart covering the whole period, there was a significant rise in pepper and ginger prices toward the end of the 18th century. As we've learned, opium was a dominant focus of the EIC. Let's review historical evidence in conjunction with the pricing data to determine if the rise in prices in the early 1800s, which is counter to our hypothesis, can be ruled out as counterevidence. Let's look at the ginger and pepper prices leading up to the turn of the 19th century.

In [None]:
single_line ("Ginger & Pepper", df_GP, 1760, 1850)

We can see there was a distinct period around 1790 in which both commodity prices spiked significantly. The dominant opium trade is the likely reason. The EIC took over the opium monopoly in 1757 but needed to popularize the drug in China to drive addiction and thus demand. 

Our initial dataset does not contain opium prices (or quantities), but additional research shows the EIC increased Chinese opium imports from “1,000 chests [1 chest = ~140 lbs.] in 1767 and then to about 10,000 per year between 1820 and 1830. By 1838 the amount had grown to some 40,000 chests imported into China annually” [1]. 
Another source cites demand in light of the Chinese opium ban as, “opium exports from India to China rose from just 75 mt [metric tons] in 1775 to just under 300 mt by 1800 and more than 2,500 mt by 1839” [2].  

Converting these final metrics, we can corroborate that about 5.5 million pounds of opium were imported annually into China in the late 1830s. 
Along with the addictive nature of opium, one source cites that demand was fueled by the EIC’s loss of its monopoly, beginning as early as 1813, causing opium to be even more prevalent in China. [2]
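The conversion behind the figure of roughly 5.5 million pounds can be checked with a few lines of arithmetic. The sketch below takes the quantities from the two quotes above ([1] and [2]) and assumes the approximate 140 lb per chest stated in the first quote and the standard 2,204.62 lb per metric ton; the variable names are illustrative.

```python
# Conversion factors
LB_PER_CHEST = 140           # approximate chest weight given in quote [1]
LB_PER_METRIC_TON = 2204.62  # pounds per metric ton

# Quantities cited for the late 1830s
chests_per_year = 40_000       # source [1], by 1838
metric_tons_per_year = 2_500   # source [2], by 1839 ("more than 2,500 mt")

lb_from_chests = chests_per_year * LB_PER_CHEST
lb_from_tons = metric_tons_per_year * LB_PER_METRIC_TON

print(f"From chests:      {lb_from_chests:,.0f} lb")  # 5,600,000 lb
print(f"From metric tons: {lb_from_tons:,.0f} lb")    # 5,511,550 lb
```

The two independent sources land within about 2% of each other, which supports the rounded figure used in the text.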

Further, historical references cite Bengal, among other regions, as a primary source of opium production. “In Bengal, the land designated for opium growing stretched for 500 miles with more than a million registered farmers growing opium plants for the East India Company in 500,000 acres of prime land" [3]. 

Bengal has a climate that can also be used to grow ginger and pepper. Additionally, there are multiple accounts of the British forcing local farmers into the production of opium around the turn of the century. The reduced supply of ginger and pepper, due to a shift to opium production, was likely the primary reason for ginger and pepper’s price increases. Another contributing factor can be found in the transportation constraints:

> The opium trade became so important that traditional ships were no longer sufficient to bear the volume of the flow. They were superseded in the 1830s by specially designed ‘opium clippers’ which were heavily armed to protect their high-value cargo from pirates (or the Chinese authorities) and much faster than traditional ships [2]

The EIC couldn’t deliver the large quantities of opium to the Chinese quickly enough, and even went as far as to design new ships specifically for opium transport. This immense effort likely diverted transportation resources away from British imports and caused ginger and pepper prices in England to increase during this period. The introduction of opium-specific cargo ships in the 1830s likely freed up previously used ships to resume importing non-opium commodities to England. Together, these historical events offer a plausible explanation for both the increase and the later decrease in prices.

Using this information, we can set aside ginger and pepper as indicative commodities for the period after opium production began to rise in India. 


[1] https://www.britannica.com/topic/opium-trade <br>
[2] https://www.unodc.org/documents/wdr/WDR_2008/WDR2008_100years_drug_control_origins.pdf<br>
[3] https://pdfs.semanticscholar.org/a5c8/58b8a481cd61a4bd6484ba1e4b438f304ee2.pdf


## Quantifying the results 

While the visual representations are compelling, we can quantify the impact by looking at the minimum, maximum, and percentage change for the 20-year rolling 'Standard Value' mean and standard deviations.

In [None]:
#stats function

def stats  (com1, data_frame, start_date, end_date):
    
    df_H = data_frame
    df_H = df_H[df_H.Item_Year <= end_date]
    df_H = df_H[df_H.Item_Year >= start_date]

    #rolling average
    ra_yearx = df_H.loc[df_H['Ave_20'].idxmax()]['Item_Year']
    ra_valuex = round(df_H.loc[df_H['Ave_20'].idxmax()]['Ave_20'],2)
    ra_yearm = df_H.loc[df_H['Ave_20'].idxmin()]['Item_Year']
    ra_valuem = round(df_H.loc[df_H['Ave_20'].idxmin()]['Ave_20'],2)
    
    
    print (f'{com1} \n----------------------------------------------')
    print (f'The maximum 20-year rolling average price for {com1} occurred in {ra_yearx} with a value of {ra_valuex}')                
    print (f'The minimum 20-year rolling average price for {com1} occurred in {ra_yearm} with a value of {ra_valuem}')
    print ("----------------------------------------------------- ")
    
    #st.d 
    st_yearx = df_H.loc[df_H['STD_20'].idxmax()]['Item_Year']
    st_valuex = round(df_H.loc[df_H['STD_20'].idxmax()]['STD_20'],2)
    st_yearm = df_H.loc[df_H['STD_20'].idxmin()]['Item_Year']
    st_valuem = round(df_H.loc[df_H['STD_20'].idxmin()]['STD_20'],2)
                    
    print (f'The maximum standard deviation occurred in {st_yearx} with a value of {st_valuex}')
    print (f'The minimum standard deviation occurred in {st_yearm} with a value of {st_valuem}')
    print ("----------------------------------------------------- ")
    

    #changes 
    print (f'The 20-year rolling average price for {com1} decreased by {int(round(((ra_valuex - ra_valuem)/ra_valuex)*100,0))}% between its high in {ra_yearx} and low in {ra_yearm}') 
    print (f'The 20-year rolling standard deviation of {com1} decreased by {int(round(((st_valuex - st_valuem)/st_valuex)*100,0))}% between its high in {st_yearx} and low in {st_yearm}') 

In [None]:
stats ("Ginger", df_Ginger, 1580, 1700)

In [None]:
stats ("Pepper", df_Pepper, 1580, 1700)

In [None]:
stats ("Tea", df_Tea, 1580, 1850)
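The percentage figures printed above come from a peak-to-trough calculation on the rolling columns. The sketch below reproduces that calculation on a made-up four-row frame (the values and variable names are illustrative assumptions). Note that the calculation compares the maximum with the minimum wherever they fall, so the result reads as a decline only when the high precedes the low, which is worth confirming from the printed years.

```python
import pandas as pd

# Toy rolling averages, invented for illustration
toy = pd.DataFrame({
    "Item_Year": [1600, 1620, 1640, 1660],
    "Ave_20": [50.0, 80.0, 40.0, 20.0],
})

peak = toy.loc[toy["Ave_20"].idxmax()]    # row holding the highest rolling average
trough = toy.loc[toy["Ave_20"].idxmin()]  # row holding the lowest rolling average

pct_decline = (peak["Ave_20"] - trough["Ave_20"]) / peak["Ave_20"] * 100
high_before_low = peak["Item_Year"] < trough["Item_Year"]

print(f"High of {peak['Ave_20']} in {int(peak['Item_Year'])}, "
      f"low of {trough['Ave_20']} in {int(trough['Item_Year'])}")
print(f"Change from high to low: {pct_decline:.0f}% (high precedes low: {high_before_low})")
```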

Each of our indicative commodities showed a significant decrease in price and standard deviation over the period. Ginger and pepper saw these relative highs and lows in the beginning and end of the 17th century, respectively. Tea realized an even more dramatic decline since its inception to the end of the period. 

The decline in the absolute standard deviation was primarily driven by the steep decline in price for all commodities but also supported throughout various sub-periods with lower relative volatility as well. 

## Conclusion
We set out to answer how the British East India Company's creation and involvement in the East Indies influenced commodity prices in England. While England already had access to many global commodities, tariffs and political uncertainty caused prices to be high and volatile.
 
The analysis above has isolated three commodities that were significantly impacted by the EIC. Two spices, ginger and pepper, saw a steep decline in price and increase in stability during the 17th century. This price decline lasted until opium diverted production capacity in India at the turn of the 19th century. 
 
We discovered that tea changed most dramatically, especially as the British dominated the Chinese market by growing opium in India to use in exchange. The price continued to decline but reached peak stability just prior to the First Opium War in the late 1830s.
 
The British were involved in numerous wars from the 17th through the 19th centuries. Regional conflicts caused dramatic price swings despite the monopolistic control. The decline in absolute standard deviation for all three commodities was primarily due to lower prices. 
 
Overall, the creation of the EIC caused certain imports from Asia to become less expensive and more stable in price in England. 