# Goal of the notebook  
My aim is to analyze trends in Dogecoin market prices and to build a deep learning network to predict the future price of Dogecoin.  
My [Github](https://github.com/FancyWhale69/DogeCoin_Analasys_And_Deeplearning).  

Note: you need to run the cells to view the graphs.

## Meaning of terms:  
Price  : The current value at which investors buy/sell.  
Open   : Opening price for the day.  
High   : Highest price during the day.  
Low    : Lowest price during the day.  
Vol.   : Total number of coins traded over a period (day).  
Change : Percentage change in Price from the previous day.
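
As a quick illustration of the Change term, the day-to-day percentage change can be computed from a price series. This is a minimal sketch on made-up prices (the values and the name `prices` are assumptions, not taken from the data set); the data set already ships this information in its `Change %` column.

```python
import pandas as pd

# Hypothetical closing prices for four consecutive days (made-up values)
prices = pd.Series([0.10, 0.11, 0.099, 0.099])

# Percentage change relative to the previous day; the first day has no previous value
change_pct = prices.pct_change() * 100

print(change_pct.round(2).tolist())
```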

## Data-set  
The data set is provided by Tarandeep Singh on [Kaggle](https://www.kaggle.com/tarandeep97/dogecoin-historical-data20172021).

In [None]:
#imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go

# Exploratory Data Analysis

In [None]:
df= pd.read_csv('/kaggle/input/dogecoin-historical-data20172021/Dogecoin Historical Data.csv')

In [None]:
#basic information
df.info()

In [None]:
#EDA
df.describe()

There is little to no deviation in the numeric columns summarized above.

In [None]:
#graph the data
temp= df.sort_index(ascending=False)
# Create traces
fig = go.Figure()
fig.add_trace(go.Scatter(x=temp['Date'], y=temp['Price'],
                    name='Price'))
fig.add_trace(go.Scatter(x=temp['Date'], y=temp['High'],
                    name='High'))
fig.add_trace(go.Scatter(x=temp['Date'], y=temp['Low'],
                    name='Low'))

fig.update_layout(
    title="DogeCoin",
    xaxis_title="days",
    yaxis_title="Value in USD",
    
    )

fig.show()

As shown in the graph above, Price, High, and Low follow nearly the same line, with little or no deviation between them.

In [None]:
#inspect the data before converting the object-type columns to numeric type
df

In [None]:
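#change the missing value '-' to '0.0', then convert all strings to floats
df.loc[1396, 'Vol.']='0.0'
#Volx1M is the volume in millions: 'M' values are kept as-is, any other value is treated as billions (x1000)
df['Volx1M']=df['Vol.'].apply(lambda x : float(x[:-1]) if x[-1] == 'M' else float(x[:-1])*1000 )
#remove thousands separators and the trailing '%'
df['Change %']=df['Change %'].apply(lambda x : float(x.replace(',','')[:-1]))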
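
The conversion above treats every value that does not end in `M` as billions. A more explicit variant is sketched below, assuming the volume strings use `K`, `M`, and `B` suffixes and `-` for missing values. The helper name `volume_in_millions` and the suffix table are hypothetical and not part of the notebook.

```python
# Multipliers that convert each suffix to millions (assumed suffix set)
SUFFIX_TO_MILLIONS = {"K": 1e-3, "M": 1.0, "B": 1e3}

def volume_in_millions(text):
    """Convert a volume string such as '1.5B' to a float in millions."""
    text = text.strip().replace(",", "")
    if text in ("", "-"):
        return 0.0  # treat a missing volume as zero, as the cell above does
    suffix = text[-1].upper()
    if suffix in SUFFIX_TO_MILLIONS:
        return float(text[:-1]) * SUFFIX_TO_MILLIONS[suffix]
    return float(text) / 1e6  # no suffix: assume a raw unit count

print([volume_in_millions(v) for v in ["12.5M", "1.5B", "800K", "-"]])
```

It could be applied to the column with `df['Vol.'].apply(volume_in_millions)`.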

In [None]:
#graph the data

temp= df.sort_index(ascending=False)

# Create traces
fig = go.Figure()
fig.add_trace(go.Scatter(x=temp['Date'], y=(temp['Change %'] - temp['Change %'].min()) / (temp['Change %'].max() - temp['Change %'].min()),
                    name='Change'))
fig.add_trace(go.Scatter(x=temp['Date'], y=(temp['Volx1M'] - temp['Volx1M'].min()) / (temp['Volx1M'].max() - temp['Volx1M'].min()),
                    name='Volume'))
fig.add_trace(go.Scatter(x=temp['Date'], y=(temp['Open'] - temp['Open'].min()) / (temp['Open'].max() - temp['Open'].min()),
                    name='open'))

fig.update_layout(
    title="DogeCoin",
    xaxis_title="Days",
    yaxis_title="",
    
    )

fig.show()

In [None]:
df.describe()

After converting Vol. and Change % to numerical values, we can observe that they have high deviation, which is also supported by the graph above.
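
The graph above rescales each column to the range [0, 1] with min-max normalization so that columns with very different magnitudes can share one axis. The same formula can be wrapped in a small helper, sketched here on made-up numbers (the helper name `min_max` is hypothetical):

```python
import numpy as np

def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)  # constant input: avoid dividing by zero
    return (values - values.min()) / span

scaled = min_max([2.0, 4.0, 10.0])
print(scaled)
```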

In [None]:
df= df.sort_index(ascending=False).reset_index() #sort data chronologically, from 2017 to 2021
df.drop(['index','Date', 'Vol.'], axis=1, inplace=True) #drop columns that are no longer needed ('Vol.' is replaced by 'Volx1M')
df['target']= df['Price'].shift(-1) #create the target column: the next day's price
#save the last point (May 7, 2021), which has no next-day target, then drop it
last_point= df.loc[1434]
df.drop(1434, inplace=True)
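
To see what `shift(-1)` does, here is a minimal sketch on a made-up four-row frame (the name `toy` and its values are assumptions for illustration). Each row's target becomes the next row's price, and the final row is left without a target, which is why the last point is set aside above.

```python
import pandas as pd

# Made-up prices in chronological order
toy = pd.DataFrame({"Price": [1.0, 2.0, 3.0, 4.0]})

# The target of each row is the price of the following row
toy["target"] = toy["Price"].shift(-1)

# The final row has no next-day price, so its target is NaN
last_row = toy.iloc[-1]
toy = toy.iloc[:-1]

print(toy)
```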

In [None]:
df.corr()['Price'].plot(kind='bar') #graph the correlation of each column with Price

The graph shows that the Open, Low, and High attributes are so strongly correlated with Price that they add little useful information for predicting the price; dropping them should therefore not affect the model's performance.
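
As an illustration of how such a correlation check can be read, the sketch below builds a made-up frame in which one column is a scaled copy of `Price` and another is unrelated noise. The column names and values are assumptions for demonstration only, not results from the Dogecoin data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
price = np.linspace(1.0, 2.0, 200)

toy = pd.DataFrame({
    "Price": price,
    "High": price * 1.01,                  # scaled copy of Price, hence redundant
    "Noise": rng.normal(size=price.size),  # unrelated to Price
})

# Pearson correlation of every column with Price
corr_with_price = toy.corr()["Price"]
print(corr_with_price)
```

A column whose correlation with `Price` is close to 1 carries nearly the same information as `Price` itself.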

In [None]:
df.drop(['Open', 'Low', 'High'], axis=1, inplace=True) #drop the unimportant columns

# Splitting data  


In [None]:
#method 1
from sklearn.model_selection import train_test_split
x=df.drop('target', axis=1)
y=df['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
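
`train_test_split` shuffles rows by default, so the test rows are scattered across the whole period. For time-ordered data, a chronological split is a common alternative. The sketch below is an assumption about what a second method could look like, not something the notebook uses, and the helper name `chronological_split` is hypothetical.

```python
import pandas as pd

def chronological_split(frame, target_col="target", test_size=0.2):
    """Split a time-ordered frame so the last `test_size` fraction is the test set."""
    cut = int(len(frame) * (1 - test_size))
    features = frame.drop(columns=[target_col])
    labels = frame[target_col]
    return features.iloc[:cut], features.iloc[cut:], labels.iloc[:cut], labels.iloc[cut:]

# Made-up frame with ten time-ordered rows
toy = pd.DataFrame({"Price": range(10), "target": range(1, 11)})
x_tr, x_te, y_tr, y_te = chronological_split(toy)
print(len(x_tr), len(x_te))
```

Passing `shuffle=False` to `train_test_split` gives a similar ordered split.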

In [None]:
#data normalization
from sklearn.preprocessing import MinMaxScaler
scaler= MinMaxScaler()
x_train= scaler.fit_transform(x_train)
x_test= scaler.transform(x_test)

In [None]:
#reshape data to [samples, timestep, features]
x_train=np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1]))
x_test=np.reshape(x_test, (x_test.shape[0], 1, x_test.shape[1]))
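
Keras LSTM layers expect a 3-D input of shape `[samples, timesteps, features]`. With a single timestep, the reshape above only inserts an axis of length 1, as this small sketch on a made-up array shows (the array names are assumptions for illustration):

```python
import numpy as np

# Made-up 2-D feature matrix: 4 samples, 3 features
features_2d = np.arange(12, dtype=float).reshape(4, 3)

# Insert a timestep axis of length 1 -> [samples, timesteps, features]
features_3d = np.reshape(features_2d, (features_2d.shape[0], 1, features_2d.shape[1]))

print(features_2d.shape, features_3d.shape)
```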

# RNN Model

In [None]:
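#build the model: three stacked LSTM layers, dropout, and a single-output dense layer
from tensorflow import keras
model= keras.Sequential()
#return_sequences=True passes the full sequence on to the next LSTM layer
model.add(keras.layers.LSTM(3, input_shape=(x_train.shape[1], x_train.shape[2]), return_sequences=True))
model.add(keras.layers.LSTM(16, return_sequences=True))
model.add(keras.layers.LSTM(32)) #last LSTM layer returns only its final output
model.add(keras.layers.Dropout(0.2)) #randomly drop 20% of the units during training
model.add(keras.layers.Dense(1)) #one output: the predicted next-day price
model.compile(loss='mean_squared_error', optimizer='adam')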

In [None]:
model.fit(x_train, y_train, epochs=100, validation_data=(x_test, y_test))

# Evaluation  
Since accuracy can't be used for a regression task, mean squared error (MSE) will be used instead. And since we defined our loss function as mean_squared_error, we can use val_loss as our metric (the lower the better).
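
For reference, mean squared error is the average of the squared differences between the targets and the predictions. A minimal sketch on made-up values (the function below is a hand-written illustration, not the Keras implementation):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up targets and predictions
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(mse)
```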

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter( y=model.history.history['loss'],
                    name='loss'))
fig.add_trace(go.Scatter( y=model.history.history['val_loss'],
                    name='val_loss'))

fig.update_layout(
    title="Loss vs Val_Loss",
    xaxis_title="epochs",
    yaxis_title="loss_value",
    
    )

The graph shows that no overfitting occurred during training.

# Graphing Model predictions

In [None]:
pred = model.predict(x_test)
fig = go.Figure()
fig.add_trace(go.Scatter( y=y_test,
                    name='actual_data'))
fig.add_trace(go.Scatter( y=pred.reshape(pred.shape[0]),
                    name='pred_data'))
fig.update_layout(
    title="Price of Dogecoin (Test Data)",
    xaxis_title="index",
    yaxis_title="Price in USD",
    
    )

In [None]:
#predict on the last 20% of the chronologically sorted data (this overwrites x_test and y_test)
test= df[round(len(df)*0.8):]
x_test=test.drop('target', axis=1)
y_test=test['target']
x_test= scaler.transform(x_test)
x_test=np.reshape(x_test, (x_test.shape[0], 1, x_test.shape[1]))
pred= model.predict(x_test)

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter( y=y_test,
                    name='actual_data'))
fig.add_trace(go.Scatter( y=pred.reshape(pred.shape[0]),
                    name='pred_data'))

fig.update_layout(
    title="Price of Dogecoin (Last 20%)",
    xaxis_title="index",
    yaxis_title="Price in USD",
    
    )


In [None]:
#predict on the full data set
x=df.drop('target', axis=1)
y=df['target']

x=scaler.transform(x)
x= x.reshape(x.shape[0], 1, x.shape[1])
pred= model.predict(x)

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter( y=y,
                    name='actual_data'))
fig.add_trace(go.Scatter( y=pred.reshape(pred.shape[0]),
                    name='pred_data'))

fig.update_layout(
    title="Price of Dogecoin (Full Data)",
    xaxis_title="index",
    yaxis_title="Price in USD",
    
    )

Note: the model could not predict the abnormal increase in price at index 955.

# Conclusion  
Given the sudden changes in the data, especially in the last 20%, the model performed quite well.  
In the future, I could use transformers instead of LSTM and see how that affects performance.