# Coffee Market Analysis
## Modeling Notebook

### Matthew Garton - June 2019

**Purpose:** This notebook performs exploratory data analysis on my coffee dataset: it examines distributions and relationships between variables to determine which variables are most useful for predicting coffee prices. I started a new notebook to try a different approach. Rather than starting with the full dataset, I want to separate the predictors into sets (Fundamental Data, Weather Data, Technical Data) and examine the relationships one set at a time. Once I have identified the relationships that have predictive value, I can incorporate them into a model.

**Context**: The ultimate goal of my project is to develop trading signals for coffee futures. I will attempt to build a machine learning model that uses fundamental and technical data to predict the direction of future price changes in coffee futures. My expectation at the outset is that my feature matrix will include weather, GDP, coffee production, and export data for major coffee-producing nations; GDP and coffee import data for major coffee-importing nations; and volume, open interest, and Commitments of Traders data for ICE coffee futures contracts.

This notebook imports a cleaner dataset that I prepared in the Data Wrangling Notebook, called CoffeeDataset. See '../data/' for all of the raw data that I started with, or the links in the Data Wrangling Notebook to get the data directly from the source.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

from statsmodels.regression.linear_model import OLS

pd.options.display.max_columns = 1000
pd.options.display.max_rows = 1000
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

In [2]:
# import the dataset
coffee = pd.read_csv('../data/CoffeeDataset.csv')
coffee['Date'] = pd.to_datetime(coffee['Date'])
coffee.set_index('Date', inplace=True)

In [3]:
df = coffee[['Settle','Production',
             'Disappearance','Exports',
             'Imports','Inventories']].resample('A').last()
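The resampling step above keeps the last available observation in each calendar year rather than an annual average. Here is a minimal sketch of that behavior on made-up data. The `daily` frame and its values are hypothetical, and `groupby` on the index year is swapped in for `resample('A')` so the snippet does not depend on the frequency alias, which newer pandas releases spell `'YE'`. Unlike `resample`, the grouped result is indexed by integer year rather than by year-end date.

```python
import pandas as pd

# Hypothetical daily series spanning two calendar years (values are made up)
idx = pd.to_datetime(['2017-06-30', '2017-12-29', '2018-03-15', '2018-12-31'])
daily = pd.DataFrame({'Settle': [125.0, 126.2, 118.4, 101.85]}, index=idx)

# Keep the last observation within each calendar year
annual = daily.groupby(daily.index.year).last()
print(annual)
```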

In [4]:
df.rename(columns={'Settle':'Price'}, inplace=True)

In [5]:
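def get_forward_returns(df, ranges):
    """Add forward percentage returns of 'Price' over each horizon r (in periods)."""
    for r in ranges:
        # the 1-period return keeps the name 'Return' used below; other horizons get a suffix
        col = 'Return' if r == 1 else 'Return_{}'.format(r)
        df[col] = df['Price'].pct_change(r).shift(-r)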

In [6]:
get_forward_returns(df, [1])
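To make the alignment explicit: `pct_change(r)` computes the return over the previous `r` periods, and `shift(-r)` moves it back so that each row holds the return over the next `r` periods. The small sketch below uses the first three year-end prices that appear in the `df.head()` output further down. The standalone `prices` series is an illustrative stand-in for `df['Price']`.

```python
import pandas as pd

prices = pd.Series(
    [168.85, 94.90, 116.90],
    index=pd.to_datetime(['1994-12-31', '1995-12-31', '1996-12-31']),
    name='Price',
)

r = 1
trailing = prices.pct_change(r)   # return over the previous r periods
forward = trailing.shift(-r)      # same numbers, aligned to the start of the period

print(forward.round(6))
```

The last row has no forward return because the following year's price is not yet known, so it comes out as `NaN`.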

### Regression 1: Using Levels to Predict Price

In [7]:
X = df[['Production', 'Disappearance', 'Exports', 'Imports', 'Inventories']]
y = df['Price']

model = OLS(y,X)
lr = model.fit()
lr.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | Price | R-squared: | 0.944 |
| Model: | OLS | Adj. R-squared: | 0.928 |
| Method: | Least Squares | F-statistic: | 60.6 |
| Date: | Thu, 27 Jun 2019 | Prob (F-statistic): | 1.25e-10 |
| Time: | 16:32:00 | Log-Likelihood: | -112.19 |
| No. Observations: | 23 | AIC: | 234.4 |
| Df Residuals: | 18 | BIC: | 240.0 |
| Df Model: | 5 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Production | 5.1e-06 | 0.001 | 0.005 | 0.996 | -0.002 | 0.002 |
| Disappearance | 0.0003 | 0.001 | 0.245 | 0.809 | -0.003 | 0.003 |
| Exports | -0.0068 | 0.003 | -2.426 | 0.026 | -0.013 | -0.001 |
| Imports | 0.0089 | 0.002 | 3.785 | 0.001 | 0.004 | 0.014 |
| Inventories | -0.0085 | 0.003 | -3.362 | 0.003 | -0.014 | -0.003 |

| | | | |
|---|---|---|---|
| Omnibus: | 2.704 | Durbin-Watson: | 1.98 |
| Prob(Omnibus): | 0.259 | Jarque-Bera (JB): | 1.683 |
| Skew: | 0.661 | Prob(JB): | 0.431 |
| Kurtosis: | 3.09 | Cond. No. | 94.8 |
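One thing to keep in mind when reading this summary: `OLS(y, X)` does not add an intercept on its own, and without a constant column statsmodels reports an uncentered R-squared, which can be high simply because the target's level is far from zero. The following sketch illustrates the effect with plain NumPy on synthetic data. Nothing in it comes from the coffee dataset, and all names are hypothetical. In statsmodels, `add_constant` from `statsmodels.api` is the usual way to include an intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 23
x = rng.normal(100.0, 5.0, size=n)    # synthetic regressor with a large mean
y = rng.normal(100.0, 10.0, size=n)   # synthetic target, independent of x


def fit_ssr(X, y):
    """Return the sum of squared residuals of a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# No intercept: R-squared is measured against the uncentered total sum of squares
ssr_no_const = fit_ssr(x.reshape(-1, 1), y)
r2_uncentered = 1.0 - ssr_no_const / float(y @ y)

# With an intercept: R-squared is measured against variation around the mean
X_const = np.column_stack([np.ones(n), x])
ssr_const = fit_ssr(X_const, y)
r2_centered = 1.0 - ssr_const / float(((y - y.mean()) ** 2).sum())

print(round(r2_uncentered, 3), round(r2_centered, 3))
```

Even though `x` carries no information about `y` here, the uncentered figure comes out close to one, while the centered figure stays low.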


In [8]:
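# previous year's price; shift(1) leaves the first row missing
df['Price_Lag'] = df['Price'].shift(1)

# value used to fill the missing first lag
px_93 = 71.55

# note: this fills every NaN in df, including the last row's forward Return
df.fillna(value=px_93, inplace=True)
df.head()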

| Date | Price | Production | Disappearance | Exports | Imports | Inventories | Return | Price_Lag |
|---|---|---|---|---|---|---|---|---|
| 1994-12-31 | 168.85 | 90646.3016 | 63742.220601 | 70716.497596 | 75024.0 | 14789.0 | -0.437963 | 71.55 |
| 1995-12-31 | 94.9 | 93217.497 | 65593.109721 | 67871.9052 | 72371.0 | 9287.0 | 0.231823 | 168.85 |
| 1996-12-31 | 116.9 | 87056.4742 | 66780.173497 | 77685.14589 | 77854.0 | 7716.0 | 0.389649 | 94.9 |
| 1997-12-31 | 162.45 | 103251.642 | 66524.900726 | 80413.637943 | 81063.0 | 8447.0 | -0.275162 | 116.9 |
| 1998-12-31 | 117.75 | 99666.991 | 67927.357276 | 80265.158973 | 82767.0 | 8204.0 | 0.069214 | 162.45 |
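Note that `DataFrame.fillna(value)` fills every missing cell in the frame, not only the first `Price_Lag`. The forward return of the final year is also missing at this point, so it receives the same fill value. The sketch below contrasts the frame-wide fill with a fill restricted to the lag column. The three-row `toy` frame is hypothetical and only reuses a few prices for illustration.

```python
import pandas as pd

toy = pd.DataFrame({'Price': [168.85, 94.90, 116.90]})
toy['Return'] = toy['Price'].pct_change(1).shift(-1)   # last row is NaN
toy['Price_Lag'] = toy['Price'].shift(1)               # first row is NaN

px_93 = 71.55

# Frame-wide fill: the missing final Return also becomes 71.55
filled_all = toy.fillna(value=px_93)

# Column-specific fill: only the first lagged price is replaced
filled_lag = toy.copy()
filled_lag['Price_Lag'] = filled_lag['Price_Lag'].fillna(px_93)

print(filled_all)
print(filled_lag)
```

With the column-specific fill, the final `Return` stays missing, so that row would need to be dropped before using `Return` as a regression target.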


In [9]:
X = df[['Exports', 'Imports', 'Inventories','Price_Lag']]
y = df['Price']

model = OLS(y,X)

lr = model.fit()
lr.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | Price | R-squared: | 0.944 |
| Model: | OLS | Adj. R-squared: | 0.932 |
| Method: | Least Squares | F-statistic: | 79.76 |
| Date: | Thu, 27 Jun 2019 | Prob (F-statistic): | 1.32e-11 |
| Time: | 16:32:00 | Log-Likelihood: | -112.21 |
| No. Observations: | 23 | AIC: | 232.4 |
| Df Residuals: | 19 | BIC: | 237.0 |
| Df Model: | 4 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Exports | -0.0064 | 0.002 | -2.761 | 0.012 | -0.011 | -0.002 |
| Imports | 0.0088 | 0.003 | 3.357 | 0.003 | 0.003 | 0.014 |
| Inventories | -0.0083 | 0.003 | -2.629 | 0.017 | -0.015 | -0.002 |
| Price_Lag | 0.0335 | 0.219 | 0.153 | 0.880 | -0.425 | 0.492 |

| | | | |
|---|---|---|---|
| Omnibus: | 3.497 | Durbin-Watson: | 1.977 |
| Prob(Omnibus): | 0.174 | Jarque-Bera (JB): | 2.088 |
| Skew: | 0.721 | Prob(JB): | 0.352 |
| Kurtosis: | 3.319 | Cond. No. | 4150.0 |


### Regression 2: Using Levels to Predict Returns

In [10]:
X = df[['Exports', 'Imports', 'Inventories','Price_Lag']]
y = df['Return']

model = OLS(y,X)

lr = model.fit()
lr.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | Return | R-squared: | 0.105 |
| Model: | OLS | Adj. R-squared: | -0.084 |
| Method: | Least Squares | F-statistic: | 0.5566 |
| Date: | Thu, 27 Jun 2019 | Prob (F-statistic): | 0.697 |
| Time: | 16:32:01 | Log-Likelihood: | -93.528 |
| No. Observations: | 23 | AIC: | 195.1 |
| Df Residuals: | 19 | BIC: | 199.6 |
| Df Model: | 4 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Exports | -0.0004 | 0.001 | -0.373 | 0.714 | -0.003 | 0.002 |
| Imports | 0.0006 | 0.001 | 0.491 | 0.629 | -0.002 | 0.003 |
| Inventories | -0.0004 | 0.001 | -0.299 | 0.768 | -0.003 | 0.003 |
| Price_Lag | -0.0731 | 0.097 | -0.752 | 0.461 | -0.277 | 0.130 |

| | | | |
|---|---|---|---|
| Omnibus: | 55.183 | Durbin-Watson: | 1.044 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 333.504 |
| Skew: | 4.23 | Prob(JB): | 3.81e-73 |
| Kurtosis: | 19.626 | Cond. No. | 4150.0 |


### Regression 3: Using Changes to Predict Returns

In [12]:
fundamentals = ['Production', 'Disappearance', 'Exports', 'Imports', 'Inventories']

# year-over-year change in each fundamental; diff() leaves NaN in the first row
for f in fundamentals:
    df['{}_Chg'.format(f)] = df[f].diff()

In [13]:
X = df[['Production', 'Disappearance', 'Exports', 'Imports', 
        'Inventories','Production_Chg', 'Disappearance_Chg', 
        'Exports_Chg', 'Imports_Chg', 'Inventories_Chg','Price_Lag']]
y = df['Return']

model = OLS(y,X)

lr = model.fit()
lr.summary()

MissingDataError: exog contains inf or nans
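The error arises because `diff()` has nothing to subtract from in the first year, so every `*_Chg` column starts with `NaN`, and `OLS` rejects missing values by default. One way to handle this is to drop incomplete rows from the predictors and the target together before fitting. The sketch below does that on a hypothetical four-row frame with a single fundamental, using values rounded from the `df.head()` output earlier in the notebook. Alternatively, `OLS` accepts a `missing='drop'` argument that discards such rows internally.

```python
import pandas as pd

# Hypothetical frame; values are rounded from the df.head() output for illustration
toy = pd.DataFrame(
    {'Exports': [70716.5, 67871.9, 77685.1, 80413.6],
     'Return': [-0.437963, 0.231823, 0.389649, -0.275162]},
    index=pd.to_datetime(['1994-12-31', '1995-12-31', '1996-12-31', '1997-12-31']),
)

toy['Exports_Chg'] = toy['Exports'].diff()   # first row is NaN: nothing to subtract

# Drop incomplete rows once so X and y stay aligned on the same dates
complete = toy[['Exports', 'Exports_Chg', 'Return']].dropna()
X = complete[['Exports', 'Exports_Chg']]
y = complete['Return']

print(X.shape, y.shape)
```

Dropping rows costs one observation here, which matters with only 23 annual data points.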