## Gold Price - Analysis & Forecasting

[![green-arrow-over-gold-bars.jpg](https://i.postimg.cc/XYgHHFT4/green-arrow-over-gold-bars.jpg)](https://postimg.cc/Yvhzj4rs)

### Table of Contents

* [Import libraries & data](#section1)
    * [Import libraries](#section_1.1)
    * [Import data](#section_1.2)
* [Exploratory data analysis](#section2)
    * [Understanding the data](#section2.1)
    * [Visual Analysis](#section2.2)
* [Time series - Forecasting Model](#section3)
    * [Train - test split](#section3.1)
    * [Model 1 - Simple Linear Regression](#section3.2)
    * [Model 2 - Naive Prediction](#section3.3)
    * [Model 3 - Simple Average](#section3.4)
    * [Model 4 - Moving Average](#section3.5)
    * [Model 5 - Simple Exponential smoothing](#section3.6)
    * [Model 6 - Double Exponential smoothing](#section3.7)
    * [Model 7 - Triple Exponential smoothing](#section3.8)
* [Final Model prediction](#section4)

### Import libraries & data <a class="anchor" id="section1"></a>

### Import libraries <a class="anchor" id="section_1.1"></a>

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Import model libraries
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt

# Suppress the warnings
import warnings
warnings.filterwarnings("ignore")

### Import data <a class="anchor" id="section_1.2"></a>

In [None]:
# Import the price data
df = pd.read_csv("../input/gold-prices/monthly_csv.csv")
df.head()

In [None]:
df.shape

### Exploratory Data Analysis <a class="anchor" id="section2"></a>

### Understanding the data <a class="anchor" id="section2.1"></a>

In [None]:
print(f"Gold prices are available from {df['Date'].iloc[0]} to {df['Date'].iloc[-1]}")

In [None]:
# Month-end timestamps from January 1950 to July 2020, one per row of df
date = pd.date_range(start='1/1/1950', end='8/1/2020', freq='M')
date

In [None]:
df['month'] = date
df.drop('Date',axis=1,inplace=True)
df = df.set_index('month')
df.head()

In [None]:
df.plot(figsize=(20,8))
plt.title("Gold price (Monthly) since 1950")
plt.xlabel("Months")
plt.ylabel("Price")
plt.grid();

In [None]:
round(df.describe(),3)

**Inference**

1. The average gold price over the last 70 years is **$416.56**.

2. The price is above **$447.07** (the 75th percentile) only **25% of the time**.

3. The highest gold price on record is **$1840.81**.
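The second inference reads the `75%` row of `describe()`: by definition, about a quarter of the observations lie above that value. A minimal sketch on a small hypothetical series (the numbers below are made up, not gold prices):

```python
import pandas as pd

# Hypothetical prices, used only to illustrate how the 75% row is read
prices = pd.Series([10, 20, 30, 40, 50, 60, 70, 80], name="Price")

q75 = prices.quantile(0.75)            # same value as prices.describe()["75%"]
share_above = (prices > q75).mean()    # fraction of observations above it

print(f"75th percentile: {q75}, share above: {share_above:.0%}")
```

On real data with ties or few observations the share above the 75th percentile is only approximately 25%.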

### Visual Analysis <a class="anchor" id="section2.2"></a>

In [None]:
_, ax = plt.subplots(figsize=(25,8))
sns.boxplot(x = df.index.year,y = df.values[:,0],ax=ax)
plt.title("Gold price (Monthly) since 1950")
plt.xlabel("Year")
plt.ylabel("Price")
plt.xticks(rotation=90)
plt.grid();

In [None]:
_, ax = plt.subplots(figsize=(22,8))
sns.boxplot(x = df.index.month_name(),y = df.values[:,0],ax=ax)
plt.title("Gold price (Monthly) since 1950")
plt.xlabel("Months")
plt.ylabel("Price")
plt.grid();

In [None]:
from statsmodels.graphics.tsaplots import month_plot

fig, ax = plt.subplots(figsize=(22,8))

month_plot(df,ylabel='Gold price',ax=ax)
plt.title("Gold price (Monthly) since 1950")
plt.xlabel("Months")
plt.ylabel("Price")
plt.grid();

In [None]:
# Average gold price per year trend since 1950
df_yearly_mean = df.resample('A').mean()
df_yearly_mean.plot()
plt.title("Average Gold price (Yearly) since 1950")
plt.xlabel("Year")
plt.ylabel("Price")
plt.grid()

In [None]:
# Average gold price per quarter trend since 1950
df_quarterly_mean = df.resample('Q').mean()
df_quarterly_mean.plot()
plt.title("Average Gold price (Quarterly) since 1950")
plt.xlabel("Quarter")
plt.ylabel("Price")
plt.grid()

In [None]:
# Average gold price per decade trend since 1950
df_decade_mean = df.resample('10Y').mean()
df_decade_mean.plot()
plt.title("Average Gold price (Decade) since 1950")
plt.xlabel("Decade")
plt.ylabel("Price")
plt.grid()

### Coefficient of variation analysis

1. The coefficient of variation (CV) is a statistical measure of the relative dispersion of data points in a data series around the mean.
2. In finance, the coefficient of variation allows investors to determine how much volatility, or risk, is assumed in comparison to the amount of return expected from investments.
3. The lower the ratio of the standard deviation to the mean return, the better the risk-return trade-off.
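The definition above amounts to CV = 100 × standard deviation / mean. A minimal sketch, assuming the sample standard deviation (`ddof=1`, which is also the pandas default for `.std()`); the helper name `coefficient_of_variation` and both price lists are hypothetical:

```python
import numpy as np

def coefficient_of_variation(values):
    """Return the sample CV in percent: 100 * std / mean."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()

# Two hypothetical series with the same mean (100) but different spread
calm = [100, 101, 99, 100, 100]
volatile = [100, 130, 70, 120, 80]

cv_calm = coefficient_of_variation(calm)
cv_volatile = coefficient_of_variation(volatile)
print(f"calm: {cv_calm:.2f}%  volatile: {cv_volatile:.2f}%")
```

Because the CV is scaled by the mean, it allows years with very different price levels to be compared on the same footing.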

Let us look at the CV of the gold price for each year.

In [None]:
# Coefficient of variation in price
df_1 = df.groupby(df.index.year).mean().rename(columns={'Price':'Mean'})
df_1 = df_1.merge(df.groupby(df.index.year).std().rename(columns={'Price':'Std'}),left_index=True,right_index=True)
df_1['CoV_pct'] = ((df_1['Std']/df_1['Mean'])*100).round(2)
df_1.head()

In [None]:
# Coefficient of variation of the gold price per year since 1950
fig, ax = plt.subplots(figsize=(15,10))
df_1['CoV_pct'].plot(ax=ax)
plt.title("Coefficient of variation of gold price (Yearly) since 1950")
plt.xlabel("Year")
plt.ylabel("Coefficient of Variation in %")
plt.grid()

**Inference**

1. The CV peaked in 1978 at close to 25%, which would have made gold a highly risky asset at that time.
2. In 2020 the CV is close to 5%, which suggests much lower relative volatility and makes the asset look more viable as an investment.

### Time Series - Forecasting models <a class="anchor" id="section3"></a>

### Train - Test split to build Time series forecasting models <a class="anchor" id="section3.1"></a>

In [None]:
train    =   df[df.index.year <= 2015] 
test     =   df[df.index.year > 2015]

In [None]:
print(train.shape)
print(test.shape)

In [None]:
train['Price'].plot(figsize=(13,5), fontsize=14)
test['Price'].plot(figsize=(13,5), fontsize=14)
plt.grid()
plt.legend(['Training Data','Test Data'])
plt.show()

### Model 1 - Linear Regression <a class="anchor" id="section3.2"></a>

In [None]:
train_time = [i+1 for i in range(len(train))]
test_time = [i+len(train)+1 for i in range(len(test))]
len(train_time), len(test_time)

In [None]:
LR_train = train.copy()
LR_test = test.copy()

In [None]:
LR_train['time'] = train_time
LR_test['time'] = test_time

In [None]:
lr = LinearRegression()
lr.fit(LR_train[['time']],LR_train['Price'].values)

In [None]:
test_predictions_model1         = lr.predict(LR_test[['time']])
LR_test['forecast'] = test_predictions_model1

plt.figure(figsize=(13,6))
plt.plot( train['Price'], label='Train')
plt.plot(test['Price'], label='Test')
plt.plot(LR_test['forecast'], label='Regression On Time_Test Data')
plt.legend(loc='best')
plt.grid();

In [None]:
def mape(actual,pred):
    return round((np.mean(abs(actual-pred)/actual))*100,2)
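As a sanity check on the metric, here is a small worked example. It restates the `mape` helper so the snippet runs on its own; the `actual` and `pred` values are hypothetical. The three relative errors are 10%, 10% and 0%, so the mean is about 6.67%.

```python
import numpy as np

def mape(actual, pred):
    """Mean absolute percentage error in percent, rounded to 2 decimals."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return round(np.mean(np.abs(actual - pred) / actual) * 100, 2)

# Hypothetical values, not taken from the gold price data
actual = [100.0, 200.0, 400.0]
pred = [110.0, 180.0, 400.0]

example_mape = mape(actual, pred)
print(example_mape)
```

MAPE divides by the actual values, so it is only defined when none of them is zero, which holds for prices.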

In [None]:
# Get MAPE of the model

mape_model1_test = mape(test['Price'].values,test_predictions_model1)
print("For RegressionOnTime forecast on the Test Data,  MAPE is %3.3f" %(mape_model1_test),"%")

In [None]:
results = pd.DataFrame({'Test MAPE (%)': [mape_model1_test]},index=['RegressionOnTime'])
results

### Model 2 - Naive prediction <a class="anchor" id="section3.3"></a>

In [None]:
Naive_train = train.copy()
Naive_test = test.copy()

In [None]:
# Naive forecast: repeat the last observed training price for every test period
Naive_test['naive'] = train['Price'].iloc[-1]
Naive_test['naive'].head()

In [None]:
plt.figure(figsize=(12,8))
plt.plot(Naive_train['Price'], label='Train')
plt.plot(test['Price'], label='Test')
plt.plot(Naive_test['naive'], label='Naive Forecast on Test Data')
plt.legend(loc='best')
plt.title("Naive Forecast")
plt.grid();

In [None]:
# Get MAPE of the model

mape_model2_test = mape(test['Price'].values,Naive_test['naive'].values)
print("For Naive forecast on the Test Data,  MAPE is %3.3f" %(mape_model2_test),"%")

In [None]:
resultsDf_2 = pd.DataFrame({'Test MAPE (%)': [mape_model2_test]},index=['NaiveModel'])

results = pd.concat([results, resultsDf_2])
results

### Model 3 - Simple Average <a class="anchor" id="section3.4"></a>

In [None]:
SimpleAvg_train = train.copy()
SimpleAvg_test = test.copy()
SimpleAvg_test['mean_forecast'] = train['Price'].mean()
SimpleAvg_test.head()

In [None]:
plt.figure(figsize=(12,8))
plt.plot(SimpleAvg_train['Price'], label='Train')
plt.plot(SimpleAvg_test['Price'], label='Test')
plt.plot(SimpleAvg_test['mean_forecast'], label='Simple Average on Test Data')
plt.legend(loc='best')
plt.title("Simple Average Forecast")
plt.grid();

In [None]:
## Test Data - MAPE

mape_model3_test = mape(test['Price'].values,SimpleAvg_test['mean_forecast'].values)
print("For Simple Average forecast on the Test Data,  MAPE is %3.3f" %(mape_model3_test),"%")

In [None]:
resultsDf_3 = pd.DataFrame({'Test MAPE (%)': [mape_model3_test]},index=['SimpleAverageModel'])

results = pd.concat([results, resultsDf_3])
results

### Model 4 - Moving Average <a class="anchor" id="section3.5"></a>

In [None]:
Mvg_Avg = df.copy()
Mvg_Avg['Trailing_2'] = Mvg_Avg['Price'].rolling(2).mean()
Mvg_Avg['Trailing_3'] = Mvg_Avg['Price'].rolling(3).mean()
Mvg_Avg['Trailing_5'] = Mvg_Avg['Price'].rolling(5).mean()
Mvg_Avg['Trailing_7'] = Mvg_Avg['Price'].rolling(7).mean()
Mvg_Avg.head()

In [None]:
## Plotting on the whole data

plt.figure(figsize=(16,8))
plt.plot(Mvg_Avg['Price'], label='Price')
plt.plot(Mvg_Avg['Trailing_2'],label='2 Point Moving Average')
plt.plot(Mvg_Avg['Trailing_3'],label='3 Point Moving Average')
plt.plot(Mvg_Avg['Trailing_5'],label = '5 Point Moving Average')
plt.plot(Mvg_Avg['Trailing_7'],label = '7 Point Moving Average')

plt.legend(loc = 'best')
plt.grid();

In [None]:
#Creating train and test set 
trailing_Mvg_Avg_train=Mvg_Avg[Mvg_Avg.index.year <= 2015] 
trailing_Mvg_Avg_test=Mvg_Avg[Mvg_Avg.index.year > 2015]

In [None]:
## Plotting on both the Training and Test data

plt.figure(figsize=(16,8))
plt.plot(trailing_Mvg_Avg_train['Price'], label='Train')
plt.plot(trailing_Mvg_Avg_test['Price'], label='Test')

plt.plot(trailing_Mvg_Avg_train['Trailing_2'],label='2 Point Trailing Moving Average on Training Set')
plt.plot(trailing_Mvg_Avg_train['Trailing_3'],label='3 Point Trailing Moving Average on Training Set')
plt.plot(trailing_Mvg_Avg_train['Trailing_5'],label = '5 Point Trailing Moving Average on Training Set')
plt.plot(trailing_Mvg_Avg_train['Trailing_7'],label = '7 Point Trailing Moving Average on Training Set')

plt.plot(trailing_Mvg_Avg_test['Trailing_2'], label='2 Point Trailing Moving Average on Test Set')
plt.plot(trailing_Mvg_Avg_test['Trailing_3'], label='3 Point Trailing Moving Average on Test Set')
plt.plot(trailing_Mvg_Avg_test['Trailing_5'],label = '5 Point Trailing Moving Average on Test Set')
plt.plot(trailing_Mvg_Avg_test['Trailing_7'],label = '7 Point Trailing Moving Average on Test Set')
plt.legend(loc = 'best')
plt.grid();

In [None]:
## Note: the trailing averages are computed from actual prices (including test-period prices),
## so these values measure smoothing error rather than out-of-sample forecast error

## Test Data - MAPE --> 2 point Trailing MA

mape_model4_test_2 = mape(test['Price'].values,trailing_Mvg_Avg_test['Trailing_2'].values)
print("For 2 point Moving Average Model forecast on the Test Data,  MAPE is %3.3f" %(mape_model4_test_2),"%")

## Test Data - MAPE  --> 3 point Trailing MA

mape_model4_test_3 = mape(test['Price'].values,trailing_Mvg_Avg_test['Trailing_3'].values)
print("For 3 point Moving Average Model forecast on the Test Data,  MAPE is %3.3f" %(mape_model4_test_3),"%")

## Test Data - MAPE --> 5 point Trailing MA

mape_model4_test_5 = mape(test['Price'].values,trailing_Mvg_Avg_test['Trailing_5'].values)
print("For 5 point Moving Average Model forecast on the Test Data,  MAPE is %3.3f" %(mape_model4_test_5),"%")

## Test Data - MAPE  --> 7 point Trailing MA

mape_model4_test_7 = mape(test['Price'].values,trailing_Mvg_Avg_test['Trailing_7'].values)
print("For 7 point Moving Average Model forecast on the Test Data,  MAPE is %3.3f" %(mape_model4_test_7),"%")

In [None]:
resultsDf_4 = pd.DataFrame({'Test MAPE (%)': [mape_model4_test_2,mape_model4_test_3
                                          ,mape_model4_test_5,mape_model4_test_7]}
                           ,index=['2pointTrailingMovingAverage','3pointTrailingMovingAverage'
                                   ,'5pointTrailingMovingAverage','7pointTrailingMovingAverage'])

results = pd.concat([results, resultsDf_4])
results

### Model 5 - Simple Exponential Smoothing <a class="anchor" id="section3.6"></a>

In [None]:
SES_train = train.copy()
SES_test = test.copy()

In [None]:
model_SES = SimpleExpSmoothing(SES_train['Price'])
model_SES_autofit = model_SES.fit(optimized=True)

In [None]:
model_SES_autofit.params

In [None]:
SES_test['predict'] = model_SES_autofit.forecast(steps=len(test))
SES_test.head()

In [None]:
## Plotting on both the Training and Test data

plt.figure(figsize=(16,8))
plt.plot(SES_train['Price'], label='Train')
plt.plot(SES_test['Price'], label='Test')

plt.plot(SES_test['predict'], label='Alpha =0.995 SES predictions on Test Set')

plt.legend(loc='best')
plt.grid()
plt.title('Alpha =0.995 Predictions');

In [None]:
## Test Data

mape_model5_test_1 = mape(SES_test['Price'].values,SES_test['predict'].values)
print("For Alpha =0.995 SES Model forecast on the Test Data, MAPE is %3.3f" %(mape_model5_test_1),"%")

In [None]:
resultsDf_5 = pd.DataFrame({'Test MAPE (%)': [mape_model5_test_1]},index=['Alpha=0.995,SimpleExponentialSmoothing'])

results = pd.concat([results, resultsDf_5])
results

In [None]:
# Grid search over alpha; rows are collected in a list because
# DataFrame.append was removed in pandas 2.0
rows_6 = []

for i in np.arange(0.3,1,0.1):
    model_SES_alpha_i = model_SES.fit(smoothing_level=i,optimized=False,use_brute=True)
    SES_train['predict',i] = model_SES_alpha_i.fittedvalues
    SES_test['predict',i] = model_SES_alpha_i.forecast(steps=len(test))
    
    mape_model5_train_i = mape(SES_train['Price'].values,SES_train['predict',i].values)
    
    mape_model5_test_i = mape(SES_test['Price'].values,SES_test['predict',i].values)
    
    rows_6.append({'Alpha Values':i,'Train MAPE':mape_model5_train_i
                   ,'Test MAPE':mape_model5_test_i})

resultsDf_6 = pd.DataFrame(rows_6, columns=['Alpha Values','Train MAPE','Test MAPE'])

In [None]:
resultsDf_6.sort_values(by=['Test MAPE'],ascending=True)

In [None]:
## Plotting on both the Training and Test data

plt.figure(figsize=(18,9))
plt.plot(SES_train['Price'], label='Train')
plt.plot(SES_test['Price'], label='Test')

plt.plot(SES_test['predict'], label='Alpha =0.995 SES predictions on Test Set')

plt.plot(SES_test['predict', 0.3], label='Alpha =0.3 SES predictions on Test Set')



plt.legend(loc='best')
plt.grid();

In [None]:
resultsDf_6_1 = pd.DataFrame({'Test MAPE (%)': [resultsDf_6.sort_values(by=['Test MAPE'],ascending=True).values[0][2]]}
                           ,index=['Alpha=0.3,SimpleExponentialSmoothing'])

results = pd.concat([results, resultsDf_6_1])
results

### Model 6 - Double Exponential smoothing <a class="anchor" id="section3.7"></a>

In [None]:
DES_train = train.copy()
DES_test = test.copy()

In [None]:
model_DES = Holt(DES_train['Price'])

In [None]:
# Grid search over alpha and beta; rows are collected in a list because
# DataFrame.append was removed in pandas 2.0
rows_7 = []

for i in np.arange(0.3,1.1,0.1):
    for j in np.arange(0.3,1.1,0.1):
        model_DES_alpha_i_j = model_DES.fit(smoothing_level=i,smoothing_trend=j,optimized=False,use_brute=True)
        DES_train['predict',i,j] = model_DES_alpha_i_j.fittedvalues
        DES_test['predict',i,j] = model_DES_alpha_i_j.forecast(steps=len(test))
        
        mape_model6_train = mape(DES_train['Price'].values,DES_train['predict',i,j].values)
        
        mape_model6_test = mape(DES_test['Price'].values,DES_test['predict',i,j].values)
        
        rows_7.append({'Alpha Values':i,'Beta Values':j,'Train MAPE':mape_model6_train
                       ,'Test MAPE':mape_model6_test})

resultsDf_7 = pd.DataFrame(rows_7, columns=['Alpha Values','Beta Values','Train MAPE','Test MAPE'])

In [None]:
resultsDf_7.sort_values(by=['Test MAPE']).head()

In [None]:
## Plotting on both the Training and Test data

plt.figure(figsize=(18,9))
plt.plot(DES_train['Price'], label='Train')
plt.plot(DES_test['Price'], label='Test')

plt.plot(DES_test['predict', 0.3, 0.3], label='Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing predictions on Test Set')


plt.legend(loc='best')
plt.grid();

In [None]:
resultsDf_7_1 = pd.DataFrame({'Test MAPE (%)': [resultsDf_7.sort_values(by=['Test MAPE']).values[0][3]]}
                           ,index=['Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing'])

results = pd.concat([results, resultsDf_7_1])
results

### Model 7 - Triple Exponential Smoothing <a class="anchor" id="section3.8"></a>

In [None]:
TES_train = train.copy()
TES_test = test.copy()

In [None]:
model_TES = ExponentialSmoothing(TES_train['Price'],trend='additive',seasonal='additive',freq='M')
model_TES_autofit = model_TES.fit()

In [None]:
model_TES_autofit.params

In [None]:
## Prediction on the test data

TES_test['auto_predict'] = model_TES_autofit.forecast(steps=len(test))
TES_test.head()

In [None]:
## Plotting on both the Training and Test using autofit

plt.figure(figsize=(18,9))
plt.plot(TES_train['Price'], label='Train')
plt.plot(TES_test['Price'], label='Test')

plt.plot(TES_test['auto_predict'], label='Alpha=0.99,Beta=0.05,Gamma=0.001,TripleExponentialSmoothing predictions on Test Set')


plt.legend(loc='best')
plt.grid();

In [None]:
## Test Data

mape_model6_test_1 = mape(TES_test['Price'].values,TES_test['auto_predict'].values)
print("For A=0.99,B=0.05,G=0.001, Triple ES Model forecast on the Test Data,  MAPE is %3.3f" %(mape_model6_test_1),"%")

In [None]:
resultsDf_8_1 = pd.DataFrame({'Test MAPE (%)': [mape_model6_test_1]}
                           ,index=['Alpha=0.99,Beta=0.05,Gamma=0.001,TripleExponentialSmoothing'])

results = pd.concat([results, resultsDf_8_1])
results

In [None]:
## First we will define an empty dataframe to store our values from the loop

resultsDf_8_2 = pd.DataFrame({'Alpha Values':[],'Beta Values':[],'Gamma Values':[],'Train MAPE':[],'Test MAPE': []})
resultsDf_8_2

In [None]:
# Collect rows in a list (DataFrame.append was removed in pandas 2.0)
rows_8 = []

for i in np.arange(0.3,1.1,0.1):
    for j in np.arange(0.3,1.1,0.1):
        for k in np.arange(0.3,1.1,0.1):
            model_TES_alpha_i_j_k = model_TES.fit(smoothing_level=i,smoothing_trend=j,smoothing_seasonal=k,optimized=False,use_brute=True)
            TES_train['predict',i,j,k] = model_TES_alpha_i_j_k.fittedvalues
            TES_test['predict',i,j,k] = model_TES_alpha_i_j_k.forecast(steps=len(test))
        
            mape_model8_train = mape(TES_train['Price'].values,TES_train['predict',i,j,k].values)
            
            mape_model8_test = mape(TES_test['Price'].values,TES_test['predict',i,j,k].values)
            
            rows_8.append({'Alpha Values':i,'Beta Values':j,'Gamma Values':k,
                           'Train MAPE':mape_model8_train,'Test MAPE':mape_model8_test})

# Fill the results frame defined above, keeping its column order
resultsDf_8_2 = pd.DataFrame(rows_8, columns=resultsDf_8_2.columns)

In [None]:
resultsDf_8_2.sort_values(by=['Test MAPE']).head()

In [None]:
model_TES_alpha_best = model_TES.fit(smoothing_level=0.4,
                                      smoothing_trend=0.3,
                                      smoothing_seasonal=0.6,
                                      optimized=False,
                                      use_brute=True)
TES_train['predict',0.4,0.3,0.6] = model_TES_alpha_best.fittedvalues
TES_test['predict',0.4,0.3,0.6] = model_TES_alpha_best.forecast(steps=len(test))

In [None]:
## Plotting on both the Training and Test data using brute force alpha, beta and gamma determination

plt.figure(figsize=(18,9))
plt.plot(TES_train['Price'], label='Train')
plt.plot(TES_test['Price'], label='Test')

# Alpha, beta and gamma are the values selected from the grid search above
plt.plot(TES_test['predict', 0.4, 0.3, 0.6], label='Alpha=0.4,Beta=0.3,Gamma=0.6,TES predictions on Test Set')


plt.legend(loc='best')
plt.grid();

In [None]:
## Test Data

mape_model7_test = mape(TES_test['Price'].values,TES_test['predict',0.4,0.3,0.6].values)
print("For A=0.4,B=0.3,G=0.6, Triple ES Model forecast on the Test Data,  MAPE is %3.3f" %(mape_model7_test),"%")

In [None]:
resultsDf_9_1 = pd.DataFrame({'Test MAPE (%)': [mape_model7_test]}
                           ,index=['Alpha=0.4,Beta=0.3,Gamma=0.6,TripleExponentialSmoothing'])

results = pd.concat([results, resultsDf_9_1])
results

### Final Model <a class="anchor" id="section4"></a>

In [None]:
final_model =  ExponentialSmoothing(df,
                                  trend='additive',
                                  seasonal='additive').fit(smoothing_level=0.4,
                                                           smoothing_trend=0.3,
                                                           smoothing_seasonal=0.6)

In [None]:
MAPE_final_model = mape(df['Price'].values,final_model.fittedvalues)

print('MAPE:',MAPE_final_model)

In [None]:
# Forecast as many future time stamps as there are rows in the test data
prediction = final_model.forecast(steps=len(test))

In [None]:
# Approximate 95% band around the forecast: prediction +/- 1.96 * residual standard deviation
# (assumes roughly normal residuals with constant variance)
pred_df = pd.DataFrame({'lower_CI':prediction - 1.96*np.std(final_model.resid,ddof=1),
                        'prediction':prediction,
                        'upper_CI': prediction + 1.96*np.std(final_model.resid,ddof=1)})
pred_df.head()
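The band above is the point forecast plus or minus 1.96 residual standard deviations. A minimal sketch of the same arithmetic on simulated data, assuming normally distributed residuals; the `residuals` and `forecast` values are hypothetical and do not come from the fitted model:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
residuals = rng.normal(0, 10, size=500)        # hypothetical in-sample residuals
forecast = pd.Series([100.0, 102.0, 104.0])    # hypothetical point forecasts

# Half-width of the band: 1.96 sample standard deviations of the residuals
half_width = 1.96 * np.std(residuals, ddof=1)

band = pd.DataFrame({"lower_CI": forecast - half_width,
                     "prediction": forecast,
                     "upper_CI": forecast + half_width})

# Share of residuals that fall inside the band (about 95% if they are normal)
coverage = np.mean(np.abs(residuals) <= half_width)
print(band)
print(f"coverage: {coverage:.1%}")
```

The band has the same width at every horizon, whereas forecast uncertainty usually grows the further ahead one predicts, so it is best read as a rough guide rather than an exact interval.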

In [None]:
# plot the forecast along with the confidence band

axis = df.plot(label='Actual', figsize=(15,8))
pred_df['prediction'].plot(ax=axis, label='Forecast', alpha=0.5)
axis.fill_between(pred_df.index, pred_df['lower_CI'], pred_df['upper_CI'], color='k', alpha=.15)
axis.set_xlabel('Year-Months')
axis.set_ylabel('Price')
plt.legend(loc='best')
plt.grid()
plt.show()

### END OF NOTEBOOK