We’ll use scikit-learn to fit a linear regression. The data contains the closing prices of an ETF and of its five component cryptocurrencies. The goal is to determine what fraction of the ETF's monetary value is stored in each cryptocurrency.

In [2]:
import pandas as pd
import pickle

# Element 0 of the pickled object is the price table (a pandas DataFrame).
with open("test00.in", "rb") as f:
    data = pickle.load(f)[0]

from sklearn.linear_model import LinearRegression
import numpy as np

We start by finding the coefficients c1, ..., c5 that solve equations of the form
y = c1 * X1 + c2 * X2 + c3 * X3 + c4 * X4 + c5 * X5
where y is the ETF price and X1, ..., X5 are the prices of the 5 cryptocurrencies.
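
To make the setup concrete, here is a minimal sketch that recovers known coefficients from an equation of this form by ordinary least squares. The data is synthetic and noise-free, and the names X_demo, y_demo, and c_true are made up for illustration; none of them come from the notebook's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 days of prices for 5 assets (not the notebook's data).
X_demo = rng.uniform(0.5, 2.0, size=(50, 5))
c_true = np.array([3.0, 1.5, 0.5, 2.0, 4.0])  # made-up coefficients
y_demo = X_demo @ c_true                      # noise-free combined price

# Least-squares solution of X_demo @ c = y_demo, with no intercept term.
c_est, residuals, rank, _ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
print(np.round(c_est, 6))
```

Because the synthetic target is an exact linear combination of the columns, the estimated coefficients match the made-up ones up to floating-point error. Real price data will generally not fit this cleanly.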


In [3]:
data

Unnamed: 0,eth,golem,neo,ripple,stellar,etf
0,8.17,0.009868,0.141841,0.006368,0.002481,72.96915
1,8.38,0.009863,0.145642,0.006311,0.002477,74.433065
2,9.73,0.010772,0.140422,0.006386,0.002554,85.206683
3,11.25,0.012581,0.136734,0.00657,0.002735,97.267391
4,10.25,0.01082,0.13107,0.006201,0.002598,88.586585


First, we split the data into input and output arrays. The input array X holds the prices of the 5 cryptocurrencies whose proportions we want to find. The output array y is the price of the ETF, which we know is a linear combination of the 5 cryptocurrency prices.

In [4]:
X = data[["eth", "golem", "neo", "ripple", "stellar"]]
y = data["etf"]
X

Unnamed: 0,eth,golem,neo,ripple,stellar
0,8.17,0.009868,0.141841,0.006368,0.002481
1,8.38,0.009863,0.145642,0.006311,0.002477
2,9.73,0.010772,0.140422,0.006386,0.002554
3,11.25,0.012581,0.136734,0.00657,0.002735
4,10.25,0.01082,0.13107,0.006201,0.002598


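We now create the linear regression model from scikit-learn. The fit method calculates the coefficients of the 5 input variables. Note that LinearRegression also fits an intercept by default (fit_intercept=True), a term that the equation above does not include.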

In [5]:
lr = LinearRegression()
lr.fit(X, y)
c = lr.coef_  # one coefficient per column of X, in column order
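
As a side check on the intercept, the sketch below fits the model twice on synthetic, noise-free data: once with scikit-learn's default settings and once with fit_intercept=False, which matches an equation with no constant term. The names X_demo, y_demo, and c_true are hypothetical and the data is not the notebook's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical noise-free data with known, made-up coefficients.
X_demo = rng.uniform(0.5, 2.0, size=(40, 5))
c_true = np.array([3.0, 1.5, 0.5, 2.0, 4.0])
y_demo = X_demo @ c_true

# Default model: estimates 5 coefficients plus an intercept.
lr_default = LinearRegression().fit(X_demo, y_demo)

# Model without an intercept: estimates only the 5 coefficients.
lr_no_intercept = LinearRegression(fit_intercept=False).fit(X_demo, y_demo)

print("default intercept:", lr_default.intercept_)
print("no-intercept coefficients:", np.round(lr_no_intercept.coef_, 6))
print("R^2 without intercept:", lr_no_intercept.score(X_demo, y_demo))
```

On exact data both fits agree and the default intercept is numerically zero. On real prices the two can differ, so it is worth inspecting lr.intercept_ before interpreting the coefficients.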

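Finally, we calculate the fraction of the monetary value stored in each cryptocurrency: each coefficient is multiplied by the latest price of its cryptocurrency, and the products are normalized so that they sum to 1. The regression fit is not exact, so some fractions come out negative. Despite this, the fractions are still close to the actual proportions.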

In [6]:
# Value attributed to each coin at the most recent prices (last row of X).
w = np.multiply(c, X.iloc[-1])
# Normalize so the fractions sum to 1.
w = w / w.sum()
for val in w:
    print("%.5f" % val)

0.81978
-0.05392
-0.00200
0.26219
-0.02605
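
The normalization in the last cell can be checked by hand on a tiny example. The sketch below uses three made-up coefficients and prices (not values from the notebook) chosen so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical coefficients and latest prices for 3 assets.
coef_demo = np.array([2.0, 20.0, 10.0])
last_price_demo = np.array([10.0, 0.5, 1.0])

# Monetary value attributed to each asset: coefficient times price.
value_demo = coef_demo * last_price_demo   # 20, 10, 10

# Fraction of the total value held in each asset.
frac_demo = value_demo / value_demo.sum()  # 0.5, 0.25, 0.25
for val in frac_demo:
    print("%.5f" % val)
```

The fractions always sum to 1 by construction, which is why a few negative entries force the remaining entries to add up to slightly more than 1.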
