**Importing dependencies and libraries**

Aim: To predict the gold price (GLD) from the other features in the given dataset.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn import metrics

**Exploratory Data Analysis (EDA)**

Gold Price Dataset: https://www.kaggle.com/altruistdelhite04/gold-price-data

In [1]:
gold_df = pd.read_csv(r'../input/gold-price-data/gld_price_data.csv')

In [1]:
gold_df.head() # display the first 5 rows of the DataFrame

In [1]:
gold_df.tail() # display the last 5 rows of the DataFrame

In [1]:
gold_df.shape # number of rows and columns

In [1]:
gold_df.info() # getting some basic information about the data (column types, non-null counts)

In [1]:
gold_df.isnull().sum() # checking the number of missing values

In [1]:
gold_df.describe() # getting the statistical measures of the data
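As a rough illustration of what `describe()` reports for each numeric column, the sketch below recomputes the same eight statistics with Python's standard `statistics` module. It assumes pandas' defaults: the sample standard deviation (ddof=1) and linearly interpolated quartiles, which `statistics.quantiles(..., method="inclusive")` reproduces. The `summarize` helper and the sample values are hypothetical.

```python
import statistics

def summarize(values):
    """Hypothetical helper: recompute the rows DataFrame.describe() shows for one column."""
    data = sorted(values)
    # "inclusive" quantiles use the same linear interpolation as pandas' default
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (ddof=1), like pandas
        "min": data[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": data[-1],
    }

# Illustrative values only, not taken from the gold dataset
print(summarize([1, 2, 3, 4, 5]))
```

Calling `summarize(gold_df['GLD'].tolist())` should give values close to the GLD column of `describe()`, up to floating-point rounding.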

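**Correlation:** how strongly two variables move together, from -1 to 1.
1. Positive Correlation: both variables tend to increase together.
2. Negative Correlation: one variable tends to decrease as the other increases.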
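`DataFrame.corr()` uses the Pearson coefficient by default: the covariance of two columns divided by the product of their standard deviations. A minimal pure-Python sketch of that formula (the `pearson` helper and the toy lists are hypothetical) shows the two cases listed above as the extreme values 1 and -1:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient, the default method of DataFrame.corr()."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

prices = [1.0, 2.0, 3.0, 4.0]
print(pearson(prices, [2.0, 4.0, 6.0, 8.0]))  # 1.0: the two series rise together
print(pearson(prices, [8.0, 6.0, 4.0, 2.0]))  # -1.0: one falls as the other rises
```

Values near 0 indicate little linear relationship; the heatmap below shows this coefficient for every pair of numeric columns.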

In [1]:
correlation = gold_df.drop(columns=['Date']).corr() # Date is non-numeric; pandas 2.x raises an error if it is included

In [1]:
# constructing a heatmap to understand the correlation between columns
plt.figure(figsize = (7,7))
sns.heatmap(correlation, cbar=True, square=True, fmt='.1f',annot=True, annot_kws={'size':8}, cmap='GnBu')

In [1]:
print(correlation['GLD']) # correlation values of GLD

In [1]:
sns.histplot(gold_df['GLD'], color='red', kde=True) # checking the distribution of the GLD price (histogram with a density curve)

**Splitting Target & Features**

In [1]:
X = gold_df.drop(['Date','GLD'], axis=1) # features: every column except the date and the target
Y = gold_df['GLD'] # target: the GLD price
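To see what this split produces, here is a sketch on a hypothetical three-row frame. It assumes the CSV's columns are Date, SPX, GLD, USO, SLV and EUR/USD (check with `gold_df.columns`); the numbers are placeholders, not real prices.

```python
import pandas as pd

# Hypothetical frame mimicking the assumed column layout; values are placeholders
toy_df = pd.DataFrame({
    "Date": ["day 1", "day 2", "day 3"],
    "SPX": [1.0, 2.0, 3.0],
    "GLD": [4.0, 5.0, 6.0],
    "USO": [7.0, 8.0, 9.0],
    "SLV": [1.5, 2.5, 3.5],
    "EUR/USD": [0.1, 0.2, 0.3],
})

X_toy = toy_df.drop(columns=["Date", "GLD"])  # numeric features only
Y_toy = toy_df["GLD"]                         # the target column
print(list(X_toy.columns))
print(X_toy.shape, Y_toy.shape)
```

Date is dropped because it is a string column that the regressor cannot use directly, and GLD is dropped because it is the value being predicted.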

In [1]:
print(X)

In [1]:
print(Y)

**Splitting into Training data and Test Data**

In [1]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=101) # hold out 30% of the rows; random_state makes the split reproducible
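Conceptually, `train_test_split` shuffles the row positions with a seeded generator and reserves a fraction of them for testing. The sketch below mimics that idea with the standard library; the `split_indices` helper is hypothetical, although scikit-learn likewise rounds the test count up (`ceil(test_size * n_samples)`).

```python
import math
import random

def split_indices(n_samples, test_size=0.3, seed=101):
    """Hypothetical sketch of a shuffled train/test split (not scikit-learn's code)."""
    n_test = math.ceil(test_size * n_samples)  # round the test count up
    positions = list(range(n_samples))
    random.Random(seed).shuffle(positions)     # fixed seed: same split on every run
    return positions[n_test:], positions[:n_test]

train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))  # 7 3
```

scikit-learn shuffles with NumPy's generator, so the exact rows chosen differ from this sketch even with the same seed; only the proportions and the reproducibility behave the same way.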

**Model Training**

Random Forest Regressor

In [1]:
regressor = RandomForestRegressor(n_estimators=100, random_state=101) # 100 trees; fixed seed for reproducible results
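A random forest regressor fits many decision trees, each on a bootstrap sample of the training rows, and predicts by averaging the trees' outputs. The sketch below checks that averaging on synthetic data: `X_demo`, `y_demo` and `random_state=101` are assumptions for illustration, not part of the original notebook. It compares the mean of the individual trees in `forest.estimators_` with `forest.predict`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (hypothetical, not the gold dataset)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 4))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + rng.normal(0, 0.5, size=200)

forest = RandomForestRegressor(n_estimators=100, random_state=101)
forest.fit(X_demo, y_demo)

# Predictions of each fitted tree on the first 5 rows: shape (100, 5)
per_tree = np.stack([tree.predict(X_demo[:5]) for tree in forest.estimators_])
manual_mean = per_tree.mean(axis=0)

# The forest's prediction is the average over its trees
print(np.allclose(manual_mean, forest.predict(X_demo[:5])))
```

Averaging many trees trained on different resamples reduces the variance of a single deep tree, which is why `n_estimators` usually trades training time for stability.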

In [1]:
# training the model
regressor.fit(X_train,Y_train)

**Model Evaluation**

In [1]:
# prediction on Test Data
test_data_prediction = regressor.predict(X_test)

In [1]:

# R squared (coefficient of determination): 1.0 is a perfect fit
score_1 = metrics.r2_score(Y_test, test_data_prediction)

# Mean Absolute Error (in GLD price units)
score_2 = metrics.mean_absolute_error(Y_test, test_data_prediction)

# Mean Squared Error
score_3 = metrics.mean_squared_error(Y_test, test_data_prediction)

# Root Mean Squared Error (square root of the MSE, back in GLD price units)
score_4 = np.sqrt(score_3)

print('R squared score : ', score_1)
print('Mean Absolute Error : ', score_2)
print('Mean Squared Error : ', score_3)
print('Root Mean Squared Error : ', score_4)
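The four scores follow directly from the residuals (actual minus predicted). Note that R squared is not an error: it is 1 minus the ratio of unexplained variation to total variation, so higher is better. A minimal pure-Python sketch, using the same definitions as `sklearn.metrics` (the `regression_scores` helper and the numbers are hypothetical):

```python
import math

def regression_scores(y_true, y_pred):
    """Hypothetical helper: R^2, MAE, MSE and RMSE as sklearn.metrics defines them."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r ** 2 for r in residuals)             # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation around the mean
    mse = ss_res / n
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }

# Illustrative numbers only
print(regression_scores([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0]))
```

Here the squared residuals sum to 2 and the total sum of squares is 20, so R squared = 1 - 2/20 = 0.9, MAE = 0.5, MSE = 0.5 and RMSE is about 0.707.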

**Compare the Actual Values and Predicted Values in a Plot**

In [1]:
Y_test = list(Y_test) # drop the shuffled index so values are plotted at positions 0..n-1, like the prediction array
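Converting `Y_test` to a list matters because, after `train_test_split`, the Series keeps its original (now shuffled) index labels, and matplotlib uses a Series' index as the x-values when only y is given. The predictions are a plain NumPy array plotted at 0, 1, 2, ..., so the two lines would not line up otherwise. A small sketch with a hypothetical Series:

```python
import pandas as pd

# Hypothetical test-set target: rows keep their shuffled original index labels
y_test_series = pd.Series([85.0, 120.5, 98.2], index=[812, 17, 1450])

print(list(y_test_series.index))  # labels that would be used as x-values
print(list(y_test_series))        # plain list: plotted at positions 0, 1, 2
# Equivalent pandas idiom that keeps a Series but renumbers it from 0
print(y_test_series.reset_index(drop=True).index.tolist())
```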

In [1]:
plt.plot(Y_test, color='black', label = 'Actual Value')
plt.plot(test_data_prediction, color='orange', label='Predicted Value')
plt.title('Actual Price vs Predicted Price')
plt.xlabel('Test sample index')
plt.ylabel('GLD Price')
plt.legend()
plt.show()

Code written by Dhrumil Vadodaria!!

Connect with me on LinkedIn for future discussions on ML: https://www.linkedin.com/in/dhrumil-v-814932128/

Please provide citation if you use the code!