### Project Kayote

The goal is to predict, with reasonable accuracy, the price of bitcoin over the next couple of months.

1. Extract the data from the source __[here](https://api.cryptowat.ch/market)__.
2. Clean up the data.
3. Visualise the trends to get a feel for the data.
4. Scale and normalise the data for better prediction.
5. Use lasso regression to penalise the coefficients, select the useful attributes, and regularise the model.
6. Carry out the prediction.

In [13]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.style as style
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import MinMaxScaler


In [14]:
style.use('ggplot')
%matplotlib inline

### Getting the data

The data we are using comes from a website called Cryptowatch (api.cryptowat.ch), which provides daily updates of the open, high, low and close prices and the volume of coins traded worldwide.

Exchanges that can be used include:

* Luno
* Bitfinex
* Binance
* Bitflyer
* Bitstamp

The list goes on; you can check the rest [here](https://api.cryptowat.ch/exchanges).

In [15]:
import requests

def get_data(coin, exchange='bitfinex', after='2020-05-01'):
    """Fetch hourly OHLC candles for a coin/USD pair and return them as a DataFrame."""
    url = 'https://api.cryptowat.ch/markets/{}/{}usd/ohlc'.format(exchange, coin)
    # 'periods' is the candle length in seconds (3600 = 1 hour);
    # 'after' is the start date as a Unix timestamp in seconds.
    resp = requests.get(url, params={'periods': '3600',
                                     'after': str(int(pd.Timestamp(after).timestamp()))
                                     })
    resp.raise_for_status()  # fail early on HTTP errors
    data = resp.json()
    print(data)
    df = pd.DataFrame(data['result']['3600'], columns=['CloseTime', 'OpenPrice',
                                                       'HighPrice', 'LowPrice',
                                                       'ClosePrice', 'Volume',
                                                       'NA'])
    df['CloseTime'] = pd.to_datetime(df['CloseTime'], unit='s')
    df.set_index('CloseTime', inplace=True)
    return df
    

In [32]:
btc = get_data('btc', 'bitstamp', '2020-02-22')

{'result': {'3600': [[1582329600, 9670.84, 9727.88, 9656.9, 9696.13, 205.11493369, 1990094.456717529], [1582333200, 9696.13, 9722.39, 9673.61, 9694.32, 107.7452436, 1045307.4104110292], [1582336800, 9696.01, 9702.59, 9662.71, 9665.12, 79.59305534, 770299.0879303502], [1582340400, 9666.59, 9676.79, 9645.75, 9656.95, 63.27258411, 611374.3805829801], [1582344000, 9662.17, 9695.69, 9653.99, 9674, 31.97239067, 309636.5242461154], [1582347600, 9674, 9684.89, 9622.2, 9631.32, 76.04820331, 734154.1645717114], [1582351200, 9636.63, 9659.79, 9600, 9621.71, 97.21008356, 935813.1215276737], [1582354800, 9620.93, 9635.14, 9568.51, 9621.18, 210.80243197, 2022519.9381313669], [1582358400, 9613.79, 9633.69, 9594.24, 9609.74, 32.87813844, 316037.6755460505], [1582362000, 9603.04, 9677.32, 9600.58, 9649.94, 21.703729, 209154.3880958681], [1582365600, 9646.91, 9650.97, 9600, 9631.41, 67.00426694, 644781.4033787744], [1582369200, 9623.87, 9659.03, 9623.87, 9642.21, 61.16883218, 589867.0382978432], [158237
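The printed payload nests the candles under `result`, keyed by the period in seconds. As a minimal offline sketch of how that structure maps onto the DataFrame, the block below parses the first two rows copied from the output above. `candles_to_frame` is a hypothetical helper introduced only for illustration; the column names are the ones used in `get_data`.

```python
import pandas as pd

# First two hourly candles copied from the printed response above.
sample = {
    "result": {
        "3600": [
            [1582329600, 9670.84, 9727.88, 9656.9, 9696.13, 205.11493369, 1990094.456717529],
            [1582333200, 9696.13, 9722.39, 9673.61, 9694.32, 107.7452436, 1045307.4104110292],
        ]
    }
}

# Same column names as in get_data.
COLUMNS = ["CloseTime", "OpenPrice", "HighPrice", "LowPrice",
           "ClosePrice", "Volume", "NA"]


def candles_to_frame(payload, period="3600"):
    """Hypothetical helper: turn one period of an OHLC payload into a DataFrame."""
    frame = pd.DataFrame(payload["result"][period], columns=COLUMNS)
    # CloseTime arrives as epoch seconds; convert it and use it as the index.
    frame["CloseTime"] = pd.to_datetime(frame["CloseTime"], unit="s")
    return frame.set_index("CloseTime")


sample_df = candles_to_frame(sample)
print(sample_df)
```

Working from a saved payload like this makes it possible to test the parsing step without calling the API.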

In [35]:
import datetime

datetime.datetime.fromtimestamp(1596786600)

datetime.datetime(2020, 8, 7, 8, 50)
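`datetime.fromtimestamp` without a timezone returns local time, so the result above depends on the machine it ran on. The sketch below, using the same epoch value, shows timezone-aware conversions that give the same answer everywhere, and checks how the `after` date used in `get_data` turns into epoch seconds.

```python
import datetime

import pandas as pd

ts = 1596786600  # epoch seconds, the value converted above

# Naive conversion: the result depends on the machine's local timezone.
local_naive = datetime.datetime.fromtimestamp(ts)

# Timezone-aware conversions: always the same instant, expressed in UTC.
utc_dt = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
utc_ts = pd.Timestamp(ts, unit="s", tz="UTC")

print(utc_dt)  # 2020-08-07 07:50:00+00:00
print(utc_ts)

# pandas treats a naive Timestamp as UTC when converting to epoch seconds,
# which is what get_data relies on for the `after` parameter.
after_epoch = int(pd.Timestamp("2020-02-22").timestamp())
print(after_epoch)  # 1582329600
```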

In [36]:
43200/3600

12.0

In [45]:
btc

| CloseTime | OpenPrice | HighPrice | LowPrice | ClosePrice | Volume | NA |
|---|---|---|---|---|---|---|
| 2020-02-22 00:00:00 | 9670.84 | 9727.88 | 9656.90 | 9696.13 | 205.114934 | 1.990094e+06 |
| 2020-02-22 01:00:00 | 9696.13 | 9722.39 | 9673.61 | 9694.32 | 107.745244 | 1.045307e+06 |
| 2020-02-22 02:00:00 | 9696.01 | 9702.59 | 9662.71 | 9665.12 | 79.593055 | 7.702991e+05 |
| 2020-02-22 03:00:00 | 9666.59 | 9676.79 | 9645.75 | 9656.95 | 63.272584 | 6.113744e+05 |
| 2020-02-22 04:00:00 | 9662.17 | 9695.69 | 9653.99 | 9674.00 | 31.972391 | 3.096365e+05 |
| ... | ... | ... | ... | ... | ... | ... |
| 2020-08-07 03:00:00 | 11831.10 | 11920.00 | 11828.27 | 11866.01 | 175.190742 | 2.081800e+06 |
| 2020-08-07 04:00:00 | 11866.06 | 11899.22 | 11832.23 | 11832.23 | 181.233653 | 2.150788e+06 |
| 2020-08-07 05:00:00 | 11833.41 | 11848.82 | 11778.50 | 11822.52 | 183.269472 | 2.165308e+06 |
| 2020-08-07 06:00:00 | 11822.52 | 11827.47 | 11732.88 | 11789.80 | 242.529369 | 2.858283e+06 |


### Normalising the data before training

In [46]:
scaler = MinMaxScaler()

df = pd.DataFrame(scaler.fit_transform(btc), columns= btc.columns)
df

|  | OpenPrice | HighPrice | LowPrice | ClosePrice | Volume | NA |
|---|---|---|---|---|---|---|
| 0 | 0.704971 | 0.683143 | 0.721758 | 0.708075 | 0.021872 | 0.028043 |
| 1 | 0.708181 | 0.682410 | 0.723835 | 0.707845 | 0.010951 | 0.014251 |
| 2 | 0.708166 | 0.679765 | 0.722481 | 0.704143 | 0.007793 | 0.010236 |
| 3 | 0.704432 | 0.676319 | 0.720373 | 0.703107 | 0.005962 | 0.007916 |
| 4 | 0.703871 | 0.678843 | 0.721397 | 0.705269 | 0.002451 | 0.003511 |
| ... | ... | ... | ... | ... | ... | ... |
| 4007 | 0.979128 | 0.975956 | 0.991645 | 0.983207 | 0.018516 | 0.029382 |
| 4008 | 0.983565 | 0.973181 | 0.992137 | 0.978924 | 0.019194 | 0.030389 |
| 4009 | 0.979422 | 0.966449 | 0.985459 | 0.977693 | 0.019422 | 0.030601 |
| 4010 | 0.978040 | 0.963597 | 0.979789 | 0.973544 | 0.026069 | 0.040717 |
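To make the scaling step concrete, here is a small sketch on made-up numbers (not the notebook's data). It shows that `MinMaxScaler` computes `(x - min) / (max - min)` per column, that passing `index=` keeps the original row labels (the scaled frame above lost its `CloseTime` index), and that `inverse_transform` maps scaled values back to prices.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy values standing in for the real columns (made up for illustration).
toy = pd.DataFrame({"ClosePrice": [9000.0, 9500.0, 10000.0, 12000.0],
                    "Volume": [10.0, 30.0, 20.0, 50.0]})

scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(toy),
                      columns=toy.columns, index=toy.index)

# The same result by hand: (x - min) / (max - min), column by column.
by_hand = (toy - toy.min()) / (toy.max() - toy.min())
print(np.allclose(scaled.to_numpy(), by_hand.to_numpy()))

# Map the scaled values back to the original units.
restored = scaler.inverse_transform(scaled.to_numpy())
print(restored)
```

Keeping the fitted scaler around matters later: predictions made on scaled data are only readable as prices after the inverse transform.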


In [60]:
# Hold out the last `forecast` rows so they can be used to check predictions.

forecast = 30
test = np.array(df.ClosePrice[-forecast:])                   # held-out targets
itest = np.array(df.drop(columns='ClosePrice')[-forecast:])  # held-out features

df = df[:-forecast]
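The hold-out step can be sketched on a toy frame (made-up values) to check that the features and targets cover the same final rows and that the remaining rows exclude them. The names `holdout`, `X_holdout`, `y_holdout` and `remaining` are illustrative, not from the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the scaled data (values are made up).
toy = pd.DataFrame({"OpenPrice": np.arange(10, dtype=float),
                    "Volume": np.arange(10, 20, dtype=float),
                    "ClosePrice": np.arange(20, 30, dtype=float)})

forecast = 3

# Last `forecast` rows: features and target taken from the same slice.
holdout = toy.iloc[-forecast:]
X_holdout = holdout.drop(columns="ClosePrice").to_numpy()
y_holdout = holdout["ClosePrice"].to_numpy()

# Everything before the hold-out is left for training.
remaining = toy.iloc[:-forecast]

print(X_holdout.shape, y_holdout.shape, len(remaining))
```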

> The prediction uses `ClosePrice` as the target variable.

In [56]:
x = np.array(df.drop(columns='ClosePrice'))  # keyword form; positional axis is not accepted by recent pandas
y = np.array(df.ClosePrice)

> The data is split into training and test sets.

In [57]:
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=42)
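`train_test_split` shuffles rows by default, so with hourly prices the test set contains hours that sit between training hours. As an alternative worth comparing (not what this notebook does), the sketch below keeps the rows in time order, first with `shuffle=False` and then with `TimeSeriesSplit`. The arrays are made up.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# Toy data in time order (made up): row i is "hour i".
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Chronological split: the last 20% of rows become the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)
print(y_tr, y_te)

# Expanding-window cross-validation: each test fold lies after its training fold.
tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X))
for train_idx, test_idx in folds:
    print(train_idx, test_idx)
```

A shuffled split tends to give optimistic scores on strongly autocorrelated series, so a chronological score is a useful sanity check on results such as the ridge score reported below.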

### Training the models

In [1]:
# Run the import cell first; in a fresh kernel this cell raises the NameError shown below.
linear = LinearRegression()
linear.fit(X_train, y_train)
linear.score(X_test, y_test)

NameError: name 'LinearRegression' is not defined

In [59]:
ridge = Ridge()
ridge.fit(X_train,y_train)
ridge.score(X_test, y_test)

0.9993079487798706

In [61]:
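# Evaluate on the held-out last `forecast` rows. The model was fitted on five
# feature columns, so ClosePrice is dropped first; passing all six columns
# fails with a matmul shape ValueError.
holdout = pd.DataFrame(scaler.transform(btc), columns=btc.columns)[-forecast:]
X_holdout = np.array(holdout.drop(columns='ClosePrice'))
pred = linear.predict(X_holdout)
linear.score(X_holdout, np.array(holdout.ClosePrice))  # score takes (X, y_true)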
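Two details of the scikit-learn API matter for this evaluation step: `predict` needs the same number of feature columns the model was fitted on, and `score` takes the features and the true targets rather than the predictions. A minimal sketch on synthetic data (made up, five features to mirror this notebook):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data (made up): 5 features, like the feature matrix in this notebook.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X[:150], y[:150])
X_new, y_new = X[150:], y[150:]

pred = model.predict(X_new)                # same 5 columns as in fit
r2_from_score = model.score(X_new, y_new)  # features + true targets
r2_from_preds = r2_score(y_new, pred)      # true targets + predictions
print(r2_from_score, r2_from_preds)

# An extra column (for example the target left in by mistake) is rejected.
try:
    model.predict(np.ones((1, 6)))
    wrong_shape_rejected = False
except ValueError:
    wrong_shape_rejected = True
print(wrong_shape_rejected)
```

Both ways of computing R² agree, so `r2_score` is the one to use when only the predictions are at hand.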

In [None]:
# Comparing the effect of regularisation

def get_weight(model, feature, col_name):
    """
    Return the fitted weights of `model`, one row per feature, sorted by value.

    `feature` is the DataFrame of features the model was trained on and
    `col_name` labels the weight column, so the tables for linear, ridge and
    lasso regression can be compared to see the effect of regularisation.
    """
    weight = pd.Series(model.coef_, feature.columns).sort_values()
    dataframe = pd.DataFrame(weight).reset_index()
    dataframe.columns = ['Features', col_name]
    # Assign the rounded values back; a bare round() call would be discarded.
    dataframe[col_name] = dataframe[col_name].round(3)
    return dataframe
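
One way `get_weight` might be used is to build a side-by-side table of weights for the three models. The sketch below repeats the function (with the rounded column assigned back) so that it runs on its own, and uses synthetic features with made-up names; the `alpha` values are arbitrary choices for illustration, not tuned settings from this project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge


def get_weight(model, feature, col_name):
    """Copy of the notebook's helper: one row per feature with its fitted weight."""
    weight = pd.Series(model.coef_, feature.columns).sort_values()
    dataframe = pd.DataFrame(weight).reset_index()
    dataframe.columns = ["Features", col_name]
    dataframe[col_name] = dataframe[col_name].round(3)
    return dataframe


# Synthetic data (made up): three informative features and one pure-noise column.
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(300, 4)), columns=["a", "b", "c", "noise"])
target = (3 * features["a"] - 2 * features["b"] + 0.5 * features["c"]
          + rng.normal(scale=0.1, size=300))

models = {"Linear": LinearRegression(),
          "Ridge": Ridge(alpha=1.0),
          "Lasso": Lasso(alpha=0.1)}

weights = None
for name, model in models.items():
    model.fit(features, target)
    table = get_weight(model, features, name)
    # Merge on the feature name because each table is sorted by its own weights.
    weights = table if weights is None else weights.merge(table, on="Features")

print(weights)
```

In a table like this, lasso is the model expected to push uninformative weights to exactly zero, while ridge only shrinks them.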