__Problem Overview:__ More than $100B of coffee is traded each year, making it second only to oil as the most sought-after commodity on the planet. Investment firms, coffee farmers, and beverage giants all need accurate coffee price projections to maximize their profits. The objective of this project is to combine disparate datasets into a machine learning model that accurately predicts the price of coffee in US dollars per pound.

__Target Variable & Importance:__ The target variable is the price of coffee in US dollars per pound. Two types of organizations could use this model. First, investment firms could use its predictions for arbitrage: futures contracts whose prices diverge significantly from the model's forecast could signal investment opportunities. Second, corporations that purchase large amounts of coffee could use the model to lock in favorable prices through futures contracts. Ultimately, this would save these firms money and improve their profits.
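As a rough illustration of the arbitrage idea, the sketch below compares market futures prices with model predictions and flags contracts whose gap exceeds a threshold. The function name `flag_mispriced`, the 10% threshold, the buy/sell labels, and the prices in the usage example are all hypothetical; a real strategy would also need to account for transaction costs, margin, and contract expiry.

```python
import pandas as pd


def flag_mispriced(futures: pd.Series, predicted: pd.Series, threshold: float = 0.10) -> pd.DataFrame:
    """Flag contracts whose market price deviates from the model's prediction
    by more than `threshold`, expressed as a fraction of the prediction.

    Hypothetical helper: the threshold and labels are illustrative choices.
    """
    # Align on the shared index (e.g., contract month) so each market price
    # is compared with the prediction for the same contract.
    df = pd.DataFrame({"futures": futures, "predicted": predicted}).dropna()
    df["divergence"] = (df["futures"] - df["predicted"]) / df["predicted"]
    # Positive divergence: market price above the model's estimate (possibly overpriced).
    df["signal"] = "hold"
    df.loc[df["divergence"] > threshold, "signal"] = "sell"
    df.loc[df["divergence"] < -threshold, "signal"] = "buy"
    return df


# Made-up prices in USD/lb, indexed by contract month.
months = ["2019-03", "2019-05", "2019-07"]
futures = pd.Series([1.20, 1.00, 1.50], index=months)
predicted = pd.Series([1.05, 1.02, 1.70], index=months)
result = flag_mispriced(futures, predicted)
# signals: sell, hold, buy
```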

__Features (Explanatory Variables):__ The model features can be broken down into the following five groups:

- _Production:_ The supply of coffee directly affects its price. Overproduction lowers prices, while scarcity raises them.

- _Key Producer Weather Patterns:_ Rain, wind, temperature, and other conditions in the coffee belt directly affect crop yields, i.e., supply. More fruitful harvests will likely lower prices, and poor harvests will likely raise them.

- _Other Commodities:_ Oil, wheat, corn, and other major commodities may move in tandem with coffee. Because investors likely treat these assets similarly, their prices are probably correlated with coffee's.

- _Market Movers:_ The “Big Four” coffee roasting companies (Kraft, P&G, Sara Lee, and Nestle) buy about 50% of the coffee produced worldwide. Their stock prices and variable costs will likely have predictive power.

- _Past Prices:_ Past coffee prices influence future prices. Lagged variables can be created to capture this effect.
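The lagged-price feature described above can be built with pandas' `shift`. This sketch assumes the price data has already been cleaned into `date` and `value` columns (the header names that appear in the Macrotrends file read below) with one row per period, so a lag of `k` rows means `k` periods. The helper name `add_lag_features`, the lag lengths, and the sample prices are illustrative assumptions. The same approach could lag other feature groups, such as weather.

```python
import pandas as pd


def add_lag_features(df: pd.DataFrame, column: str = "value", lags=(1,)) -> pd.DataFrame:
    """Return a date-sorted copy of `df` with lagged copies of `column` appended.

    Hypothetical helper; assumes a 'date' column and one row per period.
    """
    out = df.sort_values("date").reset_index(drop=True)
    for k in lags:
        # shift(k) moves values down k rows, so row t holds the value from row t - k.
        # The first k rows have no history and become NaN.
        out[f"{column}_lag{k}"] = out[column].shift(k)
    return out


# Illustrative (made-up) monthly prices in USD/lb.
prices = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=4, freq="MS"),
    "value": [1.20, 1.25, 1.10, 1.30],
})
lagged = add_lag_features(prices, lags=(1, 2))
# Rows whose lags are NaN are usually dropped before fitting a model.
training_rows = lagged.dropna()
```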

__Goals & Success Metrics:__ The objective of this project is to combine multiple data sources into a functioning model that accurately predicts future coffee prices. Success will be measured by backtesting the model to see whether it can identify mispriced futures contracts and create arbitrage opportunities. Quantitatively, the model's accuracy will be measured by its R² value and benchmarked against other researchers' models.
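A minimal sketch of the quantitative metric, assuming the rows are sorted by date: R² is computed as 1 − SS_res / SS_tot on a test window that lies strictly after the training window, which mimics a backtest and avoids training on future prices. The helper names `r_squared` and `chronological_split` and the 20% test fraction are illustrative assumptions. scikit-learn's `r2_score` computes the same quantity.

```python
import numpy as np


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


def chronological_split(n_rows: int, test_fraction: float = 0.2):
    """Indices for a time-ordered split: every test row comes after every training row.

    Hypothetical helper; assumes rows are sorted oldest first.
    """
    cutoff = int(n_rows * (1 - test_fraction))
    return np.arange(cutoff), np.arange(cutoff, n_rows)


train_idx, test_idx = chronological_split(10)
```

Because coffee prices are strongly autocorrelated, a naive forecast that simply repeats the previous price can already score a high R². Computing that baseline on the same test window gives a useful reference point before comparing against other researchers' models.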

__Model Limitations:__ Geopolitical risk, crop disease, and technological innovation are three critical factors this model does not capture. All three could cause major supply shocks and significant price fluctuations.

Regarding geopolitical risk, many of coffee's primary producers are in the developing world, which exposes production to trade negotiations, political instability, wars, and famine, to name a few. Such events could materially alter prices.

As for crop disease, its historical effect on prices is unclear. However, a future outbreak could sharply reduce harvests and thus raise prices.

On the flip side, improved technology, such as better farm equipment or more resilient coffee varieties, could increase harvest yields and thus lower prices.

__Data Risk:__ Data quality and availability are a major risk for this project. Preliminary research has identified sources for historical production, commodity prices, and coffee prices. However, the availability of public weather data is uncertain due to the current partial government shutdown, and historical stock prices could be difficult to find. The absence of either dataset would likely reduce the model's predictive power.
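Because the features come from separate sources, they must be aligned on a common date before modeling. A quick coverage check then shows how much a missing or partial source (such as weather data) would leave unfilled. The sketch below is a minimal version that assumes each source has been reshaped into `date` and `value` columns. `merge_on_date` and `coverage_report` are hypothetical helper names, and the toy data is made up. An outer join is one choice among several; an inner join would instead keep only the dates that every source covers.

```python
import pandas as pd


def merge_on_date(frames: dict) -> pd.DataFrame:
    """Outer-join several datasets on 'date'.

    `frames` maps a feature name to a DataFrame with 'date' and 'value' columns
    (an assumed, pre-cleaned layout).
    """
    merged = None
    for name, df in frames.items():
        # Rename each source's 'value' column to its feature name before joining.
        part = df[["date", "value"]].rename(columns={"value": name})
        merged = part if merged is None else merged.merge(part, on="date", how="outer")
    return merged.sort_values("date").reset_index(drop=True)


def coverage_report(merged: pd.DataFrame) -> pd.Series:
    """Fraction of dates on which each feature has a non-missing value."""
    return merged.drop(columns="date").notna().mean()


# Toy example: weather data is available for only one of three dates.
coffee = pd.DataFrame({"date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"]),
                       "value": [1.00, 1.10, 1.20]})
weather = pd.DataFrame({"date": pd.to_datetime(["2019-01-01"]), "value": [25.0]})
merged = merge_on_date({"coffee": coffee, "weather": weather})
coverage = coverage_report(merged)
```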


In [9]:
# Read the target-variable dataset (historical coffee prices)

import pandas as pd
from pathlib import Path

data = Path('..', 'Final Project', 'Datasets', 'historic_coffee_prices.csv') 
hist_cp = pd.read_csv(data)

print(hist_cp)




                               Macrotrends Data Download Unnamed: 1
0                                                    NaN        NaN
1               Coffee Prices - 45 Year Historical Chart        NaN
2                                                    NaN        NaN
3      DISCLAIMER AND TERMS OF USE: HISTORICAL DATA I...        NaN
4      FOR INFORMATIONAL PURPOSES - NOT FOR TRADING P...        NaN
5      NEITHER MACROTRENDS LLC NOR ANY OF OUR INFORMA...        NaN
6      FOR ANY DAMAGES RELATING TO YOUR USE OF THE DA...        NaN
7                                                    NaN        NaN
8      ATTRIBUTION: Proper attribution requires clear...        NaN
9      A "dofollow" backlink to the originating page ...        NaN
10                                                   NaN        NaN
11                                                   NaN        NaN
12                                                  date      value
13                                              