### Bitcoin Price Prediction
The goal of this mini-project is to predict the price of the digital currency Bitcoin for the next 30 days using a Support Vector Machine (SVM).

In [1]:
#import the necessary libraries
import numpy as np
import pandas as pd

df = pd.read_csv("bitcoin.csv")
df.head()

Unnamed: 0,Date,Price
0,5/23/2019,7881.84668
1,5/24/2019,7987.371582
2,5/25/2019,8052.543945
3,5/26/2019,8673.21582
4,5/27/2019,8805.77832


Remove the Date column

In [2]:
df.drop(["Date"],axis=1, inplace=True)

Create a variable holding the number of days 'n' to predict into the future. To build the labels, we shift the Price column up by 30 rows, so that each row is paired with the price 30 days later.

In [3]:
predictionDays = 30
#Create another column shifted 'n' rows up
df["Prediction"] = df[["Price"]].shift(-predictionDays)
df.head()

Unnamed: 0,Price,Prediction
0,7881.84668,10701.69141
1,7987.371582,10855.37109
2,8052.543945,11011.10254
3,8673.21582,11790.91699
4,8805.77832,13016.23145
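
To make the effect of the negative shift concrete, here is a minimal sketch on a toy five-row frame. The frame `toy` and the horizon `n = 2` are made up for illustration and are not part of the Bitcoin data.

```python
import pandas as pd

# Toy prices (hypothetical values, not taken from bitcoin.csv)
toy = pd.DataFrame({"Price": [10.0, 11.0, 12.0, 13.0, 14.0]})
n = 2  # illustrative horizon; the notebook uses 30

# shift(-n) moves values n rows up, so row i holds the price from row i + n
toy["Prediction"] = toy["Price"].shift(-n)
print(toy)
#    Price  Prediction
# 0   10.0        12.0
# 1   11.0        13.0
# 2   12.0        14.0
# 3   13.0         NaN
# 4   14.0         NaN

# Only the first len(toy) - n rows have a label
labeled = toy.iloc[:-n]
```

The last `n` rows have no future price to pair with, which is why they end up as NaN and are set aside later.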


Display the last 5 rows of the new dataset

In [4]:
df.tail()

Unnamed: 0,Price,Prediction
362,9729.038086,NaN
363,9522.981445,NaN
364,9081.761719,NaN
365,9182.577148,NaN
366,9180.045898,NaN


Create the feature array by dropping the Prediction column from the dataframe and removing the last 'n' rows, since these rows have no prediction (label)

In [5]:
X = np.array(df.drop(["Prediction"],axis=1)).reshape(-1,1)
X = X[:-predictionDays,:]
len(X)

337

Create an array containing the labels and remove the last 'n' rows, whose labels are NaN

In [19]:
y = np.array(df["Prediction"])
y = y[:-predictionDays]
y = y.reshape(len(y),)
len(y)

337

Create the training and testing data sets. We will use sklearn's train_test_split function for this purpose

In [20]:
from sklearn.model_selection import train_test_split
#setting fixed seed for reproducibility
np.random.seed(42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

#Assigning the last 'n' observations in the dataframe for which there are no predictions to a separate array
predictionDays_array = np.array(df.drop(["Prediction"],axis=1))[-predictionDays:].reshape(-1,)
len(predictionDays_array)

30
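
Seeding NumPy's global generator makes the split reproducible as long as nothing else draws random numbers before the split runs. An alternative, sketched here on made-up arrays (`X_demo` and `y_demo` are assumptions for illustration), is to pass `random_state` to `train_test_split` directly, which ties the seed to the call itself.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 20 samples with a single feature
X_demo = np.arange(20, dtype=float).reshape(-1, 1)
y_demo = X_demo.ravel() * 2.0

# Two calls with the same random_state produce the same partition
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same_split)        # True
print(split_a[0].shape)  # (16, 1): 80% of the rows go to training
print(split_a[1].shape)  # (4, 1): 20% of the rows go to testing
```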

### Creating a Support Vector Machine model

In [23]:
from sklearn.svm import SVR
# Create (but do not yet fit) the Support Vector Regression model with a radial basis function (RBF) kernel
svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.00001)
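
The cell above only constructs the estimator. A minimal sketch of how such a model could be fitted and then used to forecast the held-back 'n' days might look like the following. It runs on a synthetic random-walk series rather than bitcoin.csv, and the names `prices`, `n_days`, `X_known`, `y_known` and `X_future` are assumptions for illustration. Note that `predict` expects a 2-D array, so a flattened array of recent prices would need `reshape(-1, 1)` first.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic random-walk prices (made up; stands in for the Price column)
rng = np.random.RandomState(0)
prices = 8000.0 + np.cumsum(rng.normal(0, 100, size=120))
n_days = 30  # forecast horizon, matching predictionDays in the notebook

X_all = prices.reshape(-1, 1)   # one feature: the price on a given day
X_known = X_all[:-n_days]       # rows that have a label
y_known = prices[n_days:]       # label: the price n_days later
X_future = X_all[-n_days:]      # last n_days rows, which have no label

# Same hyperparameters as the notebook's model
model = SVR(kernel="rbf", C=1e3, gamma=0.00001)
model.fit(X_known, y_known)

# One forecast per held-back row, i.e. for the n_days after the data ends
forecast = model.predict(X_future)
print(forecast.shape)  # (30,)
```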

### Testing the model using Cross Validation

In [33]:
from sklearn.model_selection import cross_val_score
scores = cross_val_score(svr_rbf, X_train, y_train, cv=10, scoring='neg_mean_squared_error')
svr_rbf_rmse = np.sqrt(-scores)
print("Scores:", svr_rbf_rmse)
print("Mean:", svr_rbf_rmse.mean())
print("Std deviation:", svr_rbf_rmse.std())

Scores: [1332.63343149 1654.75247961 1734.9437256  1035.20060089 1496.60927528
 1384.09653238 1669.5744076  1407.95459374 1386.76291655 1461.55722654]
Mean: 1456.408518968009
Std deviation: 192.13750099062213
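
The sign flip in `np.sqrt(-scores)` is needed because scikit-learn scorers follow a "higher is better" convention, so the mean squared error is reported negated. The sketch below checks this by hand for one fold. It uses made-up data and a `LinearRegression` stand-in rather than the SVR above, purely to keep the example small; all names in it are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Made-up regression data: a noisy straight line
rng = np.random.RandomState(0)
X_toy = rng.uniform(0, 10, size=(50, 1))
y_toy = 3.0 * X_toy.ravel() + rng.normal(0, 1, size=50)

cv = KFold(n_splits=5)  # unshuffled folds, so the splits are deterministic
neg_mse = cross_val_score(LinearRegression(), X_toy, y_toy,
                          cv=cv, scoring="neg_mean_squared_error")
rmse = np.sqrt(-neg_mse)  # negate to recover the MSE, then take the root

# Recompute the RMSE of the first fold by hand
train_idx, test_idx = next(iter(cv.split(X_toy)))
fold_model = LinearRegression().fit(X_toy[train_idx], y_toy[train_idx])
residuals = y_toy[test_idx] - fold_model.predict(X_toy[test_idx])
manual_rmse = np.sqrt(np.mean(residuals ** 2))

print(np.isclose(rmse[0], manual_rmse))  # True
```

Because RMSE is in the same units as the target, the scores reported above can be read directly as a typical error in price units.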
