## PROOF OF CONCEPT - CRYPTOCURRENCY RATING - MULTICLASSIFICATION PROBLEM.

The aim is to rate the cryptocurrencies in the market today based on the historical training data and their current indicator values. For this we need to have the following.

1. Historical Data - The data which is labelled as A1, A2, A3, B1, B2, B3, C1, C2, C3, and D ; by the domain experts.
2. Multiclassification model : Here, we are using a simple neural network for performing the classification (into one of the 10 categories) of the new data.


# Importing the necessary libraries

In [1]:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
%matplotlib inline

# Uploading the datasets into the notebook.

Here for creating the proof of concept I had took the data of three cryptocurrencies (Bitcoin, Ether and Litecoin). The data have the following values in each rows.

1. Date
2. Open
3. High
4. Low
5. Close
6. Adj Close
7. Volume
8. Name of the coins

In [2]:
#Upload files
from google.colab import files
uploaded=files.upload()

Saving BTC-USD.csv to BTC-USD.csv
Saving ETH-USD.csv to ETH-USD.csv
Saving LTC-USD.csv to LTC-USD.csv


In [4]:
#Store the data into dataframe variable

df_btc=pd.read_csv('BTC-USD.csv')
df_eth=pd.read_csv('ETH-USD.csv')
df_ltc=pd.read_csv('LTC-USD.csv')

In [6]:
df_btc['Name']='BTC'
df_eth['Name']='ETH'
df_ltc['Name']='LTC'

In [7]:
df_btc.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Name
0,2020-02-27,8825.09375,8932.892578,8577.199219,8784.494141,8784.494141,45470200000.0,BTC
1,2020-02-28,8788.728516,8890.456055,8492.932617,8672.455078,8672.455078,44605450000.0,BTC
2,2020-02-29,8671.212891,8775.631836,8599.508789,8599.508789,8599.508789,35792390000.0,BTC
3,2020-03-01,8599.758789,8726.796875,8471.212891,8562.454102,8562.454102,35349160000.0,BTC
4,2020-03-02,8563.264648,8921.308594,8532.630859,8869.669922,8869.669922,42857670000.0,BTC


# Concatenating the three different datasets into a single dataset.

In [15]:
final_df = pd.concat([df_btc, df_eth, df_ltc],ignore_index=True)

In [16]:
final_df.tail()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Name
1096,2021-02-23,208.123413,208.369675,159.684677,176.934921,176.934921,10944700000.0,LTC
1097,2021-02-24,176.966766,189.165466,169.789001,181.378494,181.378494,7205360000.0,LTC
1098,2021-02-25,181.385422,204.743835,176.03215,178.90184,178.90184,7327999000.0,LTC
1099,2021-02-26,179.128525,182.075851,163.248245,170.398148,170.398148,7875998000.0,LTC
1100,2021-02-27,176.9888,179.882904,169.932831,172.907578,172.907578,5623843000.0,LTC


In [73]:
final_df.dropna(inplace=True)

# Creating dummy ratings to rate the coins as A1, A2, A3, B1, B2, B3, C1, C2, C3, and D.



In [74]:

rating_list = ['A1','A2','A3','B1','B2','B3','C1','C2','C3','D']


final_df["Rating"] = np.random.choice(rating_list, size=len(final_df))
print (final_df.tail())

            Date        Open        High  ...        Volume  Name  Rating
1096  2021-02-23  208.123413  208.369675  ...  1.094470e+10   LTC      B2
1097  2021-02-24  176.966766  189.165466  ...  7.205360e+09   LTC      B1
1098  2021-02-25  181.385422  204.743835  ...  7.327999e+09   LTC      B1
1099  2021-02-26  179.128525  182.075851  ...  7.875998e+09   LTC      B1
1100  2021-02-27  176.988800  179.882904  ...  5.623843e+09   LTC      B2

[5 rows x 9 columns]


In [75]:
final_df.columns

Index(['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume', 'Name',
       'Rating'],
      dtype='object')

#Following will split the data into train and test after randomizing it.

In [76]:

from sklearn.model_selection import train_test_split
train, test = train_test_split(final_df, train_size = 0.8)

In [77]:
train.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Name,Rating
448,2020-05-18,207.179779,215.908463,207.10907,214.525055,214.525055,17411570000.0,ETH,C3
979,2020-10-29,55.742558,56.30431,53.460052,54.765854,54.765854,3010404000.0,LTC,C2
163,2020-08-08,11604.553711,11800.064453,11558.431641,11754.045898,11754.045898,17572060000.0,BTC,D
900,2020-08-11,58.288845,59.439777,53.231251,54.386845,54.386845,2754950000.0,LTC,B2
205,2020-09-19,10933.75293,11134.092773,10909.618164,11094.34668,11094.34668,22764200000.0,BTC,C1


In [78]:
test.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Name,Rating
932,2020-09-12,49.057728,51.221138,48.672417,50.817154,50.817154,1588978000.0,LTC,B2
997,2020-11-16,62.279919,71.997322,62.171036,71.326248,71.326248,5089779000.0,LTC,A3
761,2020-03-25,40.479458,40.696136,38.545746,39.151222,39.151222,3133348000.0,LTC,C1
412,2020-04-12,158.232391,164.516953,156.320511,161.142426,161.142426,15123720000.0,ETH,A3
547,2020-08-25,408.071686,408.527924,374.355377,384.001038,384.001038,12428440000.0,ETH,C3


In [79]:
X_train=train[['Open','High','Low','Close','Adj Close','Volume']]
X_test=test[['Open','High','Low','Close','Adj Close','Volume']]
Y_train=train['Rating']
Y_test=test['Rating']

In [81]:
X_train

Unnamed: 0,Open,High,Low,Close,Adj Close,Volume
448,207.179779,215.908463,207.109070,214.525055,214.525055,1.741157e+10
979,55.742558,56.304310,53.460052,54.765854,54.765854,3.010404e+09
163,11604.553711,11800.064453,11558.431641,11754.045898,11754.045898,1.757206e+10
900,58.288845,59.439777,53.231251,54.386845,54.386845,2.754950e+09
205,10933.752930,11134.092773,10909.618164,11094.346680,11094.346680,2.276420e+10
...,...,...,...,...,...,...
341,33533.199219,35896.882813,33489.218750,35510.289063,35510.289063,6.308859e+10
1053,171.094833,171.094833,114.956459,139.252228,139.252228,1.799426e+10
557,384.671631,402.411743,371.636688,388.241150,388.241150,1.674711e+10
771,40.344578,40.833725,39.964409,40.675556,40.675556,3.229458e+09



#Scale the train and test data with Min Max Scaler, so that the data have values between 0 and 100

In [80]:
from sklearn import preprocessing
min_max_scaler=preprocessing.MinMaxScaler(feature_range=(0,100))
scaled_x_train=min_max_scaler.fit_transform(X_train)
scaled_x_test=min_max_scaler.transform(X_test)

In [82]:
scaled_x_train

array([[3.14387814e-01, 3.11801956e-01, 3.26227553e-01, 3.19244163e-01,
        3.19244163e-01, 4.69987362e+00],
       [4.41454699e-02, 3.80216413e-02, 5.01139843e-02, 4.14456259e-02,
        4.14456259e-02, 5.85328922e-01],
       [2.06531990e+01, 2.01829257e+01, 2.07250185e+01, 2.03848303e+01,
        2.03848303e+01, 4.74572735e+00],
       ...,
       [6.31125099e-01, 6.31724250e-01, 6.21890412e-01, 6.21311755e-01,
        6.21311755e-01, 4.51003109e+00],
       [1.66675078e-02, 1.14838505e-02, 2.58617619e-02, 1.69446009e-02,
        1.69446009e-02, 6.47914457e-01],
       [3.16514885e+00, 3.08713840e+00, 2.98009314e+00, 3.04101913e+00,
        3.04101913e+00, 1.08552053e+01]])

# Encode the labels (A1, A2, A3...) which is in string, into numbers ; so that the algorithm can understand

In [85]:

from keras.utils import np_utils
from sklearn.preprocessing import LabelEncoder

In [83]:
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y_train)
encoded_Y = encoder.transform(Y_train)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

In [84]:
dummy_y

array([[0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.]], dtype=float32)

# Create the Neural Network Model and fit it on the preprocessed training data

In [89]:
import tensorflow as tf
from tensorflow import keras


model = keras.Sequential([
    keras.layers.Dense(20, input_shape=(6,), activation='relu'),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(15, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])


model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(scaled_x_train, dummy_y, epochs=100,batch_size=5)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x7f96d5bfb0d0>

# Do the testing on the test data to understand the model performance.

In [102]:
yp = model.predict(X_test)

Lets investigate the prediction for the row 1. From the following we can see that it is predicted as [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] (ie. B1). But in actual it is B2.

In [104]:
print(yp[0])

[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]


Seeing what is the actual rating in the data

In [108]:
Y_test.iloc[0]

'B2'

As we had used dummy data, which is not accurate; the resultant accuracy is really poor. When we have the right data labelled by domain experts, we can train our model more efficiently.

Here we had developed the model using only limited features like Open, Close, Volume, High ,Low,Adj Close. We can use other indicators as following:

* Exchange Trading Information : Returns, Capitalization, Relative Price Change, Parkinson's Volatility.
* Blockchain Information : Median Value, No. of Transactions, New Coins, Total Fees, Median Fees, Active Addresses, Average difficulty, No. of blocks, Block size, No. of payments 