#### Dates and times are rich data sources for machine learning models. Such datetime variables, however, need some feature engineering before they can be used as numerical inputs.

#### In this notebook, I will show how to build date features for your machine learning models with built-in pandas functions.

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('gold_rate_history.csv')

In [3]:
df.head(5)

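|   | Date | Country | State | Location | Pure Gold (24 k) | Standard Gold (22 K) |
|---|------|---------|-------|----------|------------------|----------------------|
| 0 | 1/2/2006 | India | Tamilnadu | Chennai | 768.0 | 711.0 |
| 1 | 1/3/2006 | India | Tamilnadu | Chennai | 770.5 | 713.0 |
| 2 | 1/4/2006 | India | Tamilnadu | Chennai | 784.5 | 726.0 |
| 3 | 1/5/2006 | India | Tamilnadu | Chennai | 782.5 | 725.0 |
| 4 | 1/6/2006 | India | Tamilnadu | Chennai | 776.0 | 719.0 |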


#### Let us now check the dtype of the Date column

In [4]:
df.Date.head(1)

0    1/2/2006
Name: Date, dtype: object

* The dtype is 'object', i.e. the dates are stored as plain strings, which a machine learning model cannot use directly. Next we convert them into a proper datetime type from which numeric features can be extracted.

#### We will use the built-in pandas function to_datetime to convert the strings into datetime values

In [5]:
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')  # month/day/year strings, e.g. 1/2/2006

#### We will now check the dtype of the Date column again

In [6]:
df.Date.head(2)

0   2006-01-02
1   2006-01-03
Name: Date, dtype: datetime64[ns]

* The dtype has now changed from 'object' to 'datetime64[ns]'.
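The explicit format string matters because a date such as 1/2/2006 is ambiguous: read month-first it is 2 January, read day-first it is 1 February. The sketch below uses a few hypothetical strings rather than the real file to show both readings, and how `errors='coerce'` turns an unparseable value into `NaT` instead of raising an exception.

```python
import pandas as pd

# Hypothetical strings in the same month/day/year layout as the Date column
raw = pd.Series(['1/2/2006', '1/3/2006', 'not a date'])

# Month-first parsing, as in the notebook; unparseable values become NaT
parsed = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')
print(parsed)

# The same string read day-first gives a different date
print(pd.to_datetime('1/2/2006', format='%d/%m/%Y'))  # 2006-02-01 00:00:00
```

Without `errors='coerce'`, a single malformed entry in the Date column would make the whole conversion fail, which is often the safer default when the data is expected to be clean.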

#### Now we will extract the month, day and year into separate columns

In [7]:
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['Year'] = df['Date'].dt.year

In [8]:
df.head()

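|   | Date | Country | State | Location | Pure Gold (24 k) | Standard Gold (22 K) | Month | Day | Year |
|---|------|---------|-------|----------|------------------|----------------------|-------|-----|------|
| 0 | 2006-01-02 | India | Tamilnadu | Chennai | 768.0 | 711.0 | 1 | 2 | 2006 |
| 1 | 2006-01-03 | India | Tamilnadu | Chennai | 770.5 | 713.0 | 1 | 3 | 2006 |
| 2 | 2006-01-04 | India | Tamilnadu | Chennai | 784.5 | 726.0 | 1 | 4 | 2006 |
| 3 | 2006-01-05 | India | Tamilnadu | Chennai | 782.5 | 725.0 | 1 | 5 | 2006 |
| 4 | 2006-01-06 | India | Tamilnadu | Chennai | 776.0 | 719.0 | 1 | 6 | 2006 |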
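The `.dt` accessor exposes more calendar attributes than month, day and year. The sketch below adds a few that are often used for data with weekly or seasonal patterns, on three hypothetical dates rather than the gold-rate file; whether any of them helps for this dataset is not tested here.

```python
import pandas as pd

# Hypothetical sample dates (not taken from the gold-rate file)
dates = pd.DataFrame({'Date': pd.to_datetime(['2006-01-02', '2006-01-31', '2006-12-29'])})

dates['DayOfWeek'] = dates['Date'].dt.dayofweek      # Monday=0 ... Sunday=6
dates['Quarter'] = dates['Date'].dt.quarter          # 1..4
dates['DayOfYear'] = dates['Date'].dt.dayofyear      # 1..365 (366 in leap years)
dates['IsMonthEnd'] = dates['Date'].dt.is_month_end  # True on the last calendar day of a month
print(dates)
```

Like Month, Day and Year, these come out as integer or boolean columns that a model can consume directly.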


### Checking the type of the new columns

In [9]:
df.Day.head()

0    2
1    3
2    4
3    5
4    6
Name: Day, dtype: int64

**The new columns have dtype 'int64', so they can now be fed into a machine learning model for classification or regression.**
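One caveat: as a plain integer, Month places December (12) and January (1) far apart even though they are neighbours in the calendar. A common optional refinement, not used in this notebook, is to map such cyclic features onto a circle with sine and cosine. A minimal sketch on hypothetical month values:

```python
import numpy as np
import pandas as pd

# Hypothetical month values: January, February and December
months = pd.Series([1, 2, 12])

# Place each month on the unit circle (12 months = one full turn)
angle = 2 * np.pi * months / 12
encoded = pd.DataFrame({'Month_sin': np.sin(angle), 'Month_cos': np.cos(angle)})
print(encoded.round(3))
```

With this encoding, December and January end up exactly as close to each other as January and February; the two new columns would replace the single Month column in the feature matrix.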

### We will feed these features into a Linear Regression model to predict the Standard Gold (22 K) rate

In [10]:
X = df[['Pure Gold (24 k)','Month','Day','Year']]
y = df['Standard Gold (22 K)']

In [11]:
from sklearn.model_selection import train_test_split

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
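This split is random, so training and test rows are interleaved in time. That is fine for measuring how well the features explain the 22 K rate, but if the goal is forecasting later rates, a chronological holdout is a closer match. A minimal sketch, assuming a hypothetical toy frame in place of the real data: sort by date and pass `shuffle=False` so the test set is the most recent 40% of rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for the gold-rate data
toy = pd.DataFrame({
    'Date': pd.date_range('2006-01-02', periods=10, freq='D'),
    'value': range(10),
})

# Sort chronologically, then split without shuffling so every
# test date comes after every training date
toy = toy.sort_values('Date')
train, test = train_test_split(toy, test_size=0.4, shuffle=False)

print(train['Date'].max(), test['Date'].min())
```

scikit-learn's `TimeSeriesSplit` extends the same idea to several successive train/test folds.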

In [13]:
from sklearn.linear_model import LinearRegression

In [14]:
lm = LinearRegression()

In [15]:
lm.fit(X_train,y_train)

LinearRegression()

In [16]:
pd.DataFrame(lm.coef_, index=X.columns, columns=['Coeff'])

|   | Coeff |
|---|-------|
| Pure Gold (24 k) | 0.933255 |
| Month | 0.477218 |
| Day | 0.025302 |
| Year | 6.584761 |
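A fitted linear regression predicts by multiplying each feature by its coefficient, summing, and adding the intercept (`lm.intercept_`, which is not printed above). The sketch below checks this on hypothetical toy data generated from a known linear rule, so the recovered coefficients can be compared with the true ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data built from y = 2*x0 + 0.5*x1 + 10 exactly
X_toy = np.array([[1.0, 1.0], [2.0, 0.0], [3.0, 5.0], [4.0, 2.0]])
y_toy = 2 * X_toy[:, 0] + 0.5 * X_toy[:, 1] + 10

model = LinearRegression().fit(X_toy, y_toy)
print(model.coef_, model.intercept_)  # close to [2. 0.5] and 10

# predict() is the weighted sum of the features plus the intercept
manual = X_toy @ model.coef_ + model.intercept_
print(np.allclose(manual, model.predict(X_toy)))  # True
```

Because the features in the notebook are on very different scales (a gold price versus a month number between 1 and 12), the raw coefficient sizes in the table are not directly comparable as measures of importance.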


In [17]:
prediction = lm.predict(X_test)

### Calculating the metrics of our model

In [18]:
from sklearn import metrics

In [22]:
mae = metrics.mean_absolute_error(y_test, prediction)
mse = metrics.mean_squared_error(y_test, prediction)
rmse = np.sqrt(mse)  # RMSE is the square root of the MSE
print("Mean Absolute Error     :", mae)
print("Mean Squared Error      :", mse)
print("Root Mean Squared Error :", rmse)

Mean Absolute Error     : 15.749930052705942
Mean Squared Error      : 378.8976612447136
Root Mean Squared Error : 19.46529376209652
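The three metrics are simple averages over the prediction errors: MAE averages their absolute values, MSE averages their squares, and RMSE is the square root of MSE, which puts it back in the same units as the target column. The sketch below computes them by hand on hypothetical values and checks them against scikit-learn.

```python
import numpy as np
from sklearn import metrics

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.0, 5.0, 4.0])

errors = y_true - y_pred          # [1, 0, -2]
mae = np.mean(np.abs(errors))     # average absolute error
mse = np.mean(errors ** 2)        # average squared error
rmse = np.sqrt(mse)               # square root brings MSE back to target units

print(mae, mse, rmse)
assert np.isclose(mae, metrics.mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, metrics.mean_squared_error(y_true, y_pred))
```

RMSE is never smaller than MAE, and the gap grows when a few errors are much larger than the rest, since squaring weights those misses more heavily.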
