# Baseline model

### Train and Evaluation Requirements
- For EACH of the 500 tickers
- Train data: everything until 2022-06-01 (excluded)
- Predict between 2022-06-01 and 2022-09-01: predictions.append(prediction(ticker, date))
- Model evaluation: Balanced Accuracy (one average measure over all predictions)
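
Balanced accuracy is the mean of the per-class recall, so a model that always predicts "up" scores 0.5 no matter how unbalanced the labels are. The sketch below spells that out in plain Python. The function name `balanced_accuracy` and the toy labels are illustrative only; the notebook itself relies on `metrics.balanced_accuracy_score` from scikit-learn.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Toy labels: 1 = price went up, -1 = price went down or stayed flat.
y_true = [1, 1, 1, 1, -1, -1]
y_pred = [1, 1, 1, -1, -1, 1]
score = balanced_accuracy(y_true, y_pred)                 # (3/4 + 1/2) / 2
always_up = balanced_accuracy(y_true, [1] * len(y_true))  # (1 + 0) / 2
```

A score close to 0.5 therefore means the model is doing little better than a constant guess.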

### Preprocessing
1. Sort data by Date index
2. Use the Closing Price only as data points. Treat all tickers as ONE data category
3. Trim the dataset to 2019-05-01 through 2022-05-01 (inclusive): 3 years of history without overlapping the test set. Potential issue: COVID!
4. Pivot to list tickers as columns
5. Manage NaN: backfill or drop? (Dropping would require separating all tickers into separate DBs)
6. Calculate Delta for each date (> 0: 1, <= 0: -1)
7. Separate into 31-day segments, pivot (30 features + 1 output)
8. Concatenate all 31-day segments (regardless of ticker)
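
The steps above can be tried end to end on a toy table before running them on the full dataset. This is only a sketch under stated assumptions: the tickers `AAA` and `BBB`, the prices, and the 4-day window (3 features + 1 output instead of 30 + 1) are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per (date, ticker), like the raw CSV.
dates = pd.bdate_range("2022-01-03", periods=6)
raw = pd.DataFrame({
    "Date": dates.repeat(2),
    "ticker": ["AAA", "BBB"] * 6,
    "close": [10, 20, 11, 19, 11, np.nan, 12, 21, 13, 22, 12, 23],
}).set_index("Date")

# Steps 1, 2 and 4: sort by date, keep the closing price, one column per ticker.
prices = raw.sort_index().pivot(columns="ticker", values="close")
# Step 5: backfill missing prices with the next available value.
prices = prices.bfill()
# Step 6: day-over-day change; the first row has no previous day and is dropped.
delta = prices.diff().dropna()

# Steps 7 and 8: overlapping segments, stacked regardless of ticker.
window = 4  # 3 features + 1 output in this toy example
rows = [
    delta[ticker].to_numpy()[i:i + window]
    for i in range(len(delta) - window + 1)
    for ticker in delta.columns
]
segments = np.vstack(rows)

X = segments[:, :-1]                        # feature deltas
y = np.where(segments[:, -1] > 0, 1, -1)    # > 0 -> 1, <= 0 -> -1
```

Each row of `segments` mixes tickers freely, which is what "ONE data category" amounts to in practice.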

### Assignment
- 1 Slide
- What source data you use
- What preprocessing you are using
- What Features are you using
- What Model are you using
- The value of Balanced Accuracy

In [35]:
# Import packages
import pandas as pd
import sklearn as sk
import pickle

In [36]:
# Read data
df_raw = pd.read_csv('C:/Users/karen/PycharmProjects/ycng228-project/.data/_SP500_data_all.csv',index_col = 0)
df_raw.index = pd.to_datetime(df_raw.index)
df_raw.index.name = 'Date'

In [37]:
df = df_raw.sort_index(axis = 0)

In [38]:
df.head(5)

Date,open,high,low,close,adjclose,volume,ticker
1970-03-25,6.78125,6.9375,6.78125,6.875,0.261931,68400,ED
1970-03-25,0.341564,0.368313,0.340535,0.349794,0.158033,2041200,MCD
1970-03-25,8.08403,8.434241,8.054846,8.200767,1.285789,382912,IP
1970-03-25,1.753906,1.796875,1.753906,1.789063,0.158143,2720000,XOM
1970-03-25,15.750478,16.108988,15.750478,15.917782,3.698164,1303316,IBM


In [39]:
# Transform Data
df = df[['close','ticker']].rename(columns = {"close" : "Closing Price","ticker" : "Ticker"})

In [40]:
df.head(5)

Date,Closing Price,Ticker
1970-03-25,6.875,ED
1970-03-25,0.349794,MCD
1970-03-25,8.200767,IP
1970-03-25,1.789063,XOM
1970-03-25,15.917782,IBM


In [41]:
df = df.pivot(columns = "Ticker")

In [42]:
df.head(5)

Closing Price by Ticker
Date,A,AAL,AAP,AAPL,ABBV,ABC,ABMD,ABT,ACN,ADBE,...,WYNN,XEL,XOM,XRAY,XYL,YUM,ZBH,ZBRA,ZION,ZTS
1970-03-25,,,,,,,,,,,...,,,1.789063,,,,,,,
1970-03-26,,,,,,,,,,,...,,,1.792969,,,,,,,
1970-03-30,,,,,,,,,,,...,,,1.804688,,,,,,,
1970-03-31,,,,,,,,,,,...,,,1.789063,,,,,,,
1970-04-01,,,,,,,,,,,...,,,1.796875,,,,,,,


In [43]:
# Keep rows after start_date (exclusive) up to end_date (inclusive)

start_date = '2019-03-31'
end_date = '2022-04-01'
mask = (df.index > start_date) & (df.index <= end_date)
df2 = df.loc[mask]

In [44]:
display(df2)

Closing Price by Ticker
Date,A,AAL,AAP,AAPL,ABBV,ABC,ABMD,ABT,ACN,ADBE,...,WYNN,XEL,XOM,XRAY,XYL,YUM,ZBH,ZBRA,ZION,ZTS
2019-04-01,81.559998,32.349998,173.630005,47.810001,80.779999,79.089996,277.760010,79.660004,176.320007,272.170013,...,129.339996,55.570000,81.730003,49.840000,80.559998,100.580002,124.038834,211.910004,46.720001,101.540001
2019-04-02,81.139999,32.990002,173.339996,48.505001,83.070000,74.489998,284.200012,79.620003,175.369995,271.350006,...,135.029999,55.580002,81.379997,50.150002,80.040001,100.180000,123.786407,213.619995,46.930000,102.040001
2019-04-03,81.940002,33.709999,171.690002,48.837502,83.080002,75.050003,284.940002,79.500000,177.190002,271.500000,...,137.630005,55.369999,80.900002,50.349998,79.690002,100.540001,122.796120,215.039993,47.180000,102.120003
2019-04-04,80.830002,33.930000,174.000000,48.922501,82.809998,75.970001,282.869995,78.620003,177.160004,267.890015,...,139.669998,55.279999,82.050003,50.259998,80.000000,100.449997,122.873787,214.009995,47.680000,101.980003
2019-04-05,81.470001,34.060001,176.779999,49.250000,83.449997,77.269997,284.910004,79.000000,178.149994,267.450012,...,140.940002,55.709999,82.489998,50.400002,80.339996,99.959999,123.504852,218.330002,47.570000,102.120003
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-03-28,135.419998,17.299999,210.490005,175.600006,161.970001,154.729996,321.809998,119.989998,330.739990,450.010010,...,80.129997,71.239998,82.809998,49.840000,86.849998,121.190002,125.680000,422.279999,69.730003,189.369995
2022-03-29,138.419998,18.160000,215.449997,178.960007,162.179993,155.039993,331.500000,120.190002,340.690002,466.329987,...,81.750000,71.830002,82.370003,49.730000,89.269997,122.220001,129.580002,438.100006,69.739998,192.279999
2022-03-30,135.460007,18.049999,211.820007,177.770004,163.750000,155.139999,328.510010,120.379997,338.459991,460.059998,...,81.169998,72.320000,83.779999,49.549999,88.099998,120.839996,128.690002,429.609985,67.370003,191.320007
2022-03-31,132.330002,18.250000,206.960007,174.610001,162.110001,154.710007,331.239990,118.360001,337.230011,455.619995,...,79.739998,72.169998,82.589996,49.220001,85.260002,118.529999,127.900002,413.700012,65.559998,188.589996


In [45]:
df_nulls = pd.DataFrame(df2.isnull().sum(axis = 1))

In [46]:
display(df_nulls)
df_nulls.describe()

Date,0
2019-04-01,5
2019-04-02,5
2019-04-03,5
2019-04-04,5
2019-04-05,5
...,...
2022-03-28,0
2022-03-29,0
2022-03-30,0
2022-03-31,0


,0
count,759.0
mean,2.329381
std,1.389584
min,0.0
25%,1.0
50%,2.0
75%,4.0
max,5.0


In [47]:
df3 = df2.bfill()  # backfill missing prices; same effect as fillna(method='bfill'), which recent pandas deprecates

In [48]:
pd.DataFrame(df3.isnull().sum(axis = 1)).describe()

,0
count,759.0
mean,0.0
std,0.0
min,0.0
25%,0.0
50%,0.0
75%,0.0
max,0.0


In [49]:
# Calculate Deltas and reset index
df_delta = df3.diff().dropna(axis='index').reset_index(drop=True)
display(df_delta)
display(pd.DataFrame(df_delta.isnull().sum(axis = 1)))


Closing Price
Ticker,A,AAL,AAP,AAPL,ABBV,ABC,ABMD,ABT,ACN,ADBE,...,WYNN,XEL,XOM,XRAY,XYL,YUM,ZBH,ZBRA,ZION,ZTS
0,-0.419998,0.640003,-0.290009,0.695000,2.290001,-4.599998,6.440002,-0.040001,-0.950012,-0.820007,...,5.690002,0.010002,-0.350006,0.310001,-0.519997,-0.400002,-0.252426,1.709991,0.209999,0.500000
1,0.800003,0.719997,-1.649994,0.332500,0.010002,0.560005,0.739990,-0.120003,1.820007,0.149994,...,2.600006,-0.210003,-0.479996,0.199997,-0.349998,0.360001,-0.990288,1.419998,0.250000,0.080002
2,-1.110001,0.220001,2.309998,0.084999,-0.270004,0.919998,-2.070007,-0.879997,-0.029999,-3.609985,...,2.039993,-0.090000,1.150002,-0.090000,0.309998,-0.090004,0.077667,-1.029999,0.500000,-0.139999
3,0.639999,0.130001,2.779999,0.327499,0.639999,1.299995,2.040009,0.379997,0.989990,-0.440002,...,1.270004,0.430000,0.439995,0.140003,0.339996,-0.489998,0.631065,4.320007,-0.110001,0.139999
4,0.220001,-0.180000,0.809998,0.775002,0.530006,0.850006,-1.959991,-0.480003,0.760010,1.359985,...,3.910004,-0.399998,0.510002,-0.070000,0.990005,-0.370003,1.504860,3.110001,0.049999,0.110001
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
753,-0.290009,0.289999,-5.089996,0.880005,0.639999,0.039993,2.929993,1.040001,3.029999,18.390015,...,-0.139999,0.540001,-2.389999,0.150002,0.159996,1.570000,0.400002,-3.380005,-1.349998,-0.059998
754,3.000000,0.860001,4.959991,3.360001,0.209991,0.309998,9.690002,0.200005,9.950012,16.319977,...,1.620003,0.590004,-0.439995,-0.110001,2.419998,1.029999,3.900002,15.820007,0.009995,2.910004
755,-2.959991,-0.110001,-3.629990,-1.190002,1.570007,0.100006,-2.989990,0.189995,-2.230011,-6.269989,...,-0.580002,0.489998,1.409996,-0.180000,-1.169998,-1.380005,-0.889999,-8.490021,-2.369995,-0.959991
756,-3.130005,0.200001,-4.860001,-3.160004,-1.639999,-0.429993,2.729980,-2.019997,-1.229980,-4.440002,...,-1.430000,-0.150002,-1.190002,-0.329998,-2.839996,-2.309998,-0.790001,-15.909973,-1.810005,-2.730011


,0
0,0
1,0
2,0
3,0
4,0
...,...
753,0
754,0
755,0
756,0


In [50]:
df_delta.describe()

Closing Price
Ticker,A,AAL,AAP,AAPL,ABBV,ABC,ABMD,ABT,ACN,ADBE,...,WYNN,XEL,XOM,XRAY,XYL,YUM,ZBH,ZBRA,ZION,ZTS
count,758.0,758.0,758.0,758.0,758.0,758.0,758.0,758.0,758.0,758.0,...,758.0,758.0,758.0,758.0,758.0,758.0,758.0,758.0,758.0,758.0
mean,0.069142,-0.018615,0.044657,0.166887,0.108047,0.10215,0.080317,0.051346,0.216781,0.245409,...,-0.064261,0.022665,0.001834,-0.000778,0.00748,0.024063,0.00487,0.266253,0.023193,0.118166
std,1.824695,0.731156,3.201791,2.203152,1.56997,1.873819,7.303623,1.709026,4.235277,10.096587,...,3.424484,1.068371,1.220823,1.061956,1.754856,1.657618,2.676062,7.957277,1.190725,2.602694
min,-8.370003,-3.93,-18.440002,-10.519997,-12.75,-9.5,-73.690002,-10.860001,-19.179993,-64.23999,...,-17.719997,-8.479996,-5.829998,-5.279999,-12.529999,-8.629997,-12.766998,-39.389984,-5.949997,-18.440002
25%,-0.75,-0.390001,-1.75,-0.827497,-0.699997,-0.947498,-2.98999,-0.7775,-1.62999,-3.764999,...,-1.9,-0.43,-0.677502,-0.5,-0.887499,-0.757502,-1.194181,-4.007492,-0.639999,-1.040007
50%,0.184998,-0.030001,0.004997,0.147503,0.139999,0.119999,0.439995,0.114998,0.419998,0.764999,...,-0.254997,0.050001,-0.060001,0.050001,0.055,0.044998,-0.009705,0.174995,0.025,0.220001
75%,1.077497,0.3775,1.965,1.156876,0.949997,1.117502,4.110008,0.970001,2.160004,4.960014,...,1.787502,0.517497,0.68,0.52,0.877497,0.810003,1.271841,4.462502,0.700001,1.4175
max,6.310005,4.869999,11.159996,11.110001,6.760002,9.510002,26.529999,8.099998,25.300018,50.5,...,22.169998,6.57,4.870003,6.639999,7.300003,13.140003,16.922318,42.959961,7.630001,12.18


In [51]:
column_names = ['day' + str(i) for i in range(1,31)]
column_names.append('result')
df_all = pd.DataFrame(columns = column_names)
df_all.head(1)

,day1,day2,day3,day4,day5,day6,day7,day8,day9,day10,...,day22,day23,day24,day25,day26,day27,day28,day29,day30,result


In [52]:
# Transform Delta dataframe to 31 day segments
_start = df_delta.index[0]
_end = df_delta.index[-1]
i = _start

while i < _end:
    _tmp = i + 31
    if _tmp > _end:
        break
    mask = (df_delta.index >= i) & (df_delta.index < _tmp)
    data_tmp = pd.DataFrame(df_delta.iloc[mask].transpose().values,columns=column_names)
    df_all = pd.concat([df_all, data_tmp], ignore_index=True)
    i += 1
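
The same segmentation can be done without a Python loop and without growing a DataFrame one `concat` at a time. The sketch below uses NumPy's `sliding_window_view` on a made-up 40 x 3 array that stands in for `df_delta.to_numpy()`; the names `deltas`, `views` and `segments` are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Stand-in for df_delta.to_numpy(): 40 days (rows) x 3 tickers (columns).
deltas = np.arange(40 * 3, dtype=float).reshape(40, 3)

window = 31  # 30 feature days + 1 result day
# Shape (n_windows, n_tickers, window): every run of 31 consecutive days per ticker.
views = sliding_window_view(deltas, window_shape=window, axis=0)
# One row per (window start, ticker), in the same order the loop produces them.
segments = views.reshape(-1, window)
```

For 40 days this gives 10 windows per ticker. The loop above would stop at 9, because it compares `i + 31` against the last index label rather than the number of rows, so the final segment is never emitted.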

In [53]:
display(df_all)

,day1,day2,day3,day4,day5,day6,day7,day8,day9,day10,...,day22,day23,day24,day25,day26,day27,day28,day29,day30,result
0,-0.419998,0.800003,-1.110001,0.639999,0.220001,-0.270004,0.260002,-0.599998,-0.099998,-0.580002,...,0.729996,1.090004,0.059998,-2.680000,-0.059998,0.389999,0.099998,-3.260002,2.170006,-8.370003
1,0.640003,0.719997,0.220001,0.130001,-0.180000,-0.570000,0.709999,0.790001,-0.120003,-0.939999,...,0.040001,0.840000,-0.040001,-0.840000,-0.160000,0.200001,0.040001,-1.850002,0.200001,-0.180000
2,-0.290009,-1.649994,2.309998,2.779999,0.809998,-1.940002,3.150009,2.160004,-0.850006,1.319992,...,2.949997,-3.509995,-1.279999,-1.330002,-2.040009,1.270004,-1.770004,-2.750000,0.380005,0.080002
3,0.695000,0.332500,0.084999,0.327499,0.775002,-0.150002,0.279999,-0.417500,-0.020000,0.090000,...,-0.342503,0.650002,-0.817501,-1.404999,0.009998,-0.544998,-0.885002,-2.864998,0.735001,0.564999
4,2.290001,0.010002,-0.270004,0.639999,0.530006,-1.290001,0.259995,-1.180000,-0.989998,0.250000,...,-0.419998,0.239998,0.550003,-1.310005,0.040001,-0.079994,-0.460007,-0.579994,1.570000,0.339996
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
365676,0.349998,0.130005,2.320000,-1.730003,-2.480003,-0.229996,2.360001,-1.139999,-2.560005,1.980003,...,2.959999,-3.449997,0.129997,-2.709999,3.739998,-0.079994,1.570000,1.029999,-1.380005,-2.309998
365677,0.349510,-2.456306,1.019417,1.893204,-0.378639,0.873779,3.126221,0.475723,0.654564,-1.449997,...,-1.510002,0.270004,-0.559998,0.259995,3.410004,1.129997,0.400002,3.900002,-0.889999,-0.790001
365678,-3.970001,-7.470001,-3.190002,-10.549988,-10.470001,16.039978,2.740021,-4.730011,-7.760010,9.690002,...,17.829987,-7.639984,4.139984,-13.769989,7.459991,-1.709991,-3.380005,15.820007,-8.490021,-15.909973
365679,-0.399994,-2.690002,-0.500000,0.169998,-1.180000,-2.739998,4.019997,-0.099998,-5.949997,3.439995,...,-0.489998,-0.089996,1.769997,-1.830002,0.880005,2.500000,-1.349998,0.009995,-2.369995,-1.810005


In [54]:
# Replace Result column with -1,1
df_train = df_all.iloc[:,:-1].copy()  # copy so the new column is not written into a slice of df_all
df_train.loc[df_all['result'] <= 0, 'result'] = -1
df_train.loc[df_all['result'] > 0, 'result'] = 1

In [55]:
display(df_train)

,day1,day2,day3,day4,day5,day6,day7,day8,day9,day10,...,day22,day23,day24,day25,day26,day27,day28,day29,day30,result
0,-0.419998,0.800003,-1.110001,0.639999,0.220001,-0.270004,0.260002,-0.599998,-0.099998,-0.580002,...,0.729996,1.090004,0.059998,-2.680000,-0.059998,0.389999,0.099998,-3.260002,2.170006,-1.0
1,0.640003,0.719997,0.220001,0.130001,-0.180000,-0.570000,0.709999,0.790001,-0.120003,-0.939999,...,0.040001,0.840000,-0.040001,-0.840000,-0.160000,0.200001,0.040001,-1.850002,0.200001,-1.0
2,-0.290009,-1.649994,2.309998,2.779999,0.809998,-1.940002,3.150009,2.160004,-0.850006,1.319992,...,2.949997,-3.509995,-1.279999,-1.330002,-2.040009,1.270004,-1.770004,-2.750000,0.380005,1.0
3,0.695000,0.332500,0.084999,0.327499,0.775002,-0.150002,0.279999,-0.417500,-0.020000,0.090000,...,-0.342503,0.650002,-0.817501,-1.404999,0.009998,-0.544998,-0.885002,-2.864998,0.735001,1.0
4,2.290001,0.010002,-0.270004,0.639999,0.530006,-1.290001,0.259995,-1.180000,-0.989998,0.250000,...,-0.419998,0.239998,0.550003,-1.310005,0.040001,-0.079994,-0.460007,-0.579994,1.570000,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
365676,0.349998,0.130005,2.320000,-1.730003,-2.480003,-0.229996,2.360001,-1.139999,-2.560005,1.980003,...,2.959999,-3.449997,0.129997,-2.709999,3.739998,-0.079994,1.570000,1.029999,-1.380005,-1.0
365677,0.349510,-2.456306,1.019417,1.893204,-0.378639,0.873779,3.126221,0.475723,0.654564,-1.449997,...,-1.510002,0.270004,-0.559998,0.259995,3.410004,1.129997,0.400002,3.900002,-0.889999,-1.0
365678,-3.970001,-7.470001,-3.190002,-10.549988,-10.470001,16.039978,2.740021,-4.730011,-7.760010,9.690002,...,17.829987,-7.639984,4.139984,-13.769989,7.459991,-1.709991,-3.380005,15.820007,-8.490021,-1.0
365679,-0.399994,-2.690002,-0.500000,0.169998,-1.180000,-2.739998,4.019997,-0.099998,-5.949997,3.439995,...,-0.489998,-0.089996,1.769997,-1.830002,0.880005,2.500000,-1.349998,0.009995,-2.369995,-1.0


In [56]:
from sklearn.model_selection import train_test_split

In [57]:
X,y = df_train.iloc[:,:-1],df_train['result']  # 1-D target, so fit() does not warn about a column vector

In [58]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
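
One caveat about this split: consecutive segments share 30 of their 31 days, so a shuffled split can place near-identical rows on both the train and the test side. A possible alternative, which is not what this notebook does, is to keep the rows in chronological order with `shuffle=False`. The arrays `X_demo` and `y_demo` below are toy stand-ins for `X` and `y`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for X and y: 12 rows already ordered from oldest to newest window.
X_demo = np.arange(24).reshape(12, 2)
y_demo = np.array([1, -1] * 6)

# shuffle=False keeps the original order: the earliest rows train, the latest rows test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, shuffle=False
)
```

Windows that straddle the cut still overlap a little, so this reduces the leakage rather than removing it entirely.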

In [59]:
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

In [60]:
logreg =  LogisticRegression(solver='liblinear')

In [61]:
logreg.fit(X_train,y_train)


LogisticRegression(solver='liblinear')

In [62]:
y_pred = logreg.predict(X_test)

In [63]:
bal_acc = metrics.balanced_accuracy_score(y_test, y_pred)
print(bal_acc)

0.5079600115162983


In [31]:
import joblib

In [78]:
filename = 'mdl.sav'
joblib.dump(logreg, filename)

['mdl.sav']
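
To check that the saved file can be used later, the model can be loaded back with `joblib.load` and compared against the in-memory one. This is a minimal round-trip sketch with a tiny made-up model in a temporary directory; in the notebook the object being saved is `logreg`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny illustrative model; in the notebook this role is played by `logreg`.
X_demo = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y_demo = np.array([-1, -1, 1, 1])
model = LogisticRegression(solver="liblinear").fit(X_demo, y_demo)

# Save and reload inside a temporary directory so no file is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mdl.sav")
    joblib.dump(model, path)
    restored = joblib.load(path)

same_predictions = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
```

It is generally safest to reload such a file with the same scikit-learn version that wrote it.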

### Creating test dataset and testing model

In [64]:
start_date = '2022-04-18'
end_date = '2022-09-01'
mask = (df.index > start_date) & (df.index <= end_date)
df_5 = df.loc[mask]

In [65]:
df_6 = df_5.bfill()  # backfill missing prices; same effect as fillna(method='bfill'), which recent pandas deprecates

In [66]:
pd.DataFrame(df_6.isnull().sum(axis = 1)).describe()

,0
count,95.0
mean,0.0
std,0.0
min,0.0
25%,0.0
50%,0.0
75%,0.0
max,0.0


In [67]:
# Calculate Deltas and reset index
df_7 = df_6.diff().dropna(axis='index').reset_index(drop=True)
# display(df_7)
# display(pd.DataFrame(df_7.isnull().sum(axis = 1)))

In [68]:
# Create test dataframe
column_names = ['day' + str(i) for i in range(1,31)]
column_names.append('result')
df_8 = pd.DataFrame(columns = column_names)
df_8.head(1)

,day1,day2,day3,day4,day5,day6,day7,day8,day9,day10,...,day22,day23,day24,day25,day26,day27,day28,day29,day30,result


In [69]:
# Transform Delta dataframe to 31 day segments
_start = df_7.index[0]
_end = df_7.index[-1]
i = _start

while i < _end:
    _tmp = i + 31
    if _tmp > _end:
        break
    mask = (df_7.index >= i) & (df_7.index < _tmp)
    data_tmp = pd.DataFrame(df_7.iloc[mask].transpose().values,columns=column_names)
    df_8 = pd.concat([df_8, data_tmp], ignore_index=True)
    i += 1

In [70]:
# Replace Result column with -1,1
df_val = df_8.iloc[:,:-1].copy()  # copy so the new column is not written into a slice of df_8
df_val.loc[df_8['result'] <= 0, 'result'] = -1
df_val.loc[df_8['result'] > 0, 'result'] = 1

In [71]:
display(df_val)

,day1,day2,day3,day4,day5,day6,day7,day8,day9,day10,...,day22,day23,day24,day25,day26,day27,day28,day29,day30,result
0,2.839996,-4.079994,-3.970001,0.129997,-3.509995,-0.260002,3.379997,-1.790001,0.300003,2.830002,...,1.610001,2.729996,0.850006,-1.570000,-4.030006,3.470001,6.700005,-2.990005,-4.470001,1.0
1,-0.110001,0.740000,-0.039999,-0.200001,-1.209999,0.199999,0.460001,-0.660000,-0.210001,0.570000,...,-0.139999,-0.490000,0.490000,-1.250000,0.629999,1.110001,0.889999,-0.259998,-0.580000,1.0
2,5.120010,-2.460007,-7.169998,5.479996,-0.429993,1.440002,-15.490005,-9.759995,0.860001,-0.570007,...,0.570007,-14.180008,1.350006,-3.010010,5.080002,5.589996,2.150009,-3.190002,-2.199997,1.0
3,-0.169998,-0.809998,-4.630005,1.090012,-6.080002,-0.229996,7.069992,-5.990005,0.310013,1.519989,...,-3.470001,0.239990,5.520004,-2.750000,0.160004,3.259995,5.860001,-0.800003,-0.129990,1.0
4,0.349991,1.820007,-3.529999,1.309998,-0.120010,1.440002,-1.309998,-9.429993,0.989990,1.730011,...,-0.709991,-0.710007,-2.979996,1.080002,2.850006,-1.389999,-0.570007,-2.630005,-1.349991,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
31684,0.899994,-0.559998,-0.800003,-2.399994,0.789993,2.670006,2.199997,-0.169998,0.199997,-0.610001,...,-1.160004,-1.299995,-1.910004,-0.029999,-0.040001,2.150002,-4.659996,0.049995,-1.430000,1.0
31685,0.760002,0.449997,-1.449997,0.900002,0.119995,3.090004,-0.330002,-0.389999,-0.629997,4.299995,...,-0.120003,-1.289993,-1.870003,-2.059998,0.399994,2.240005,-4.430000,0.110001,-0.250000,-1.0
31686,13.510010,6.729980,-5.470001,-1.309998,1.600006,11.519989,6.010010,13.190002,0.739990,-35.109985,...,1.929993,-13.970001,-12.529999,1.109985,2.350006,10.800018,-21.980011,-0.260010,-5.040009,-1.0
31687,0.670002,-0.220001,-0.290001,1.330002,-1.349998,0.809998,-0.070000,1.070000,-0.619999,-0.830002,...,0.120003,-1.170002,-1.619999,0.189999,0.090000,1.030003,-0.870003,-0.689999,-0.349998,-1.0


In [72]:
# Create inputs
X_val,y_val = df_val.iloc[:,:-1],df_val.iloc[:,30:]

In [73]:
display(X_val)
display(y_val)

,day1,day2,day3,day4,day5,day6,day7,day8,day9,day10,...,day21,day22,day23,day24,day25,day26,day27,day28,day29,day30
0,2.839996,-4.079994,-3.970001,0.129997,-3.509995,-0.260002,3.379997,-1.790001,0.300003,2.830002,...,-2.290001,1.610001,2.729996,0.850006,-1.570000,-4.030006,3.470001,6.700005,-2.990005,-4.470001
1,-0.110001,0.740000,-0.039999,-0.200001,-1.209999,0.199999,0.460001,-0.660000,-0.210001,0.570000,...,-0.930000,-0.139999,-0.490000,0.490000,-1.250000,0.629999,1.110001,0.889999,-0.259998,-0.580000
2,5.120010,-2.460007,-7.169998,5.479996,-0.429993,1.440002,-15.490005,-9.759995,0.860001,-0.570007,...,-20.759995,0.570007,-14.180008,1.350006,-3.010010,5.080002,5.589996,2.150009,-3.190002,-2.199997
3,-0.169998,-0.809998,-4.630005,1.090012,-6.080002,-0.229996,7.069992,-5.990005,0.310013,1.519989,...,-8.419998,-3.470001,0.239990,5.520004,-2.750000,0.160004,3.259995,5.860001,-0.800003,-0.129990
4,0.349991,1.820007,-3.529999,1.309998,-0.120010,1.440002,-1.309998,-9.429993,0.989990,1.730011,...,-2.350006,-0.709991,-0.710007,-2.979996,1.080002,2.850006,-1.389999,-0.570007,-2.630005,-1.349991
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
31684,0.899994,-0.559998,-0.800003,-2.399994,0.789993,2.670006,2.199997,-0.169998,0.199997,-0.610001,...,0.070000,-1.160004,-1.299995,-1.910004,-0.029999,-0.040001,2.150002,-4.659996,0.049995,-1.430000
31685,0.760002,0.449997,-1.449997,0.900002,0.119995,3.090004,-0.330002,-0.389999,-0.629997,4.299995,...,-1.480003,-0.120003,-1.289993,-1.870003,-2.059998,0.399994,2.240005,-4.430000,0.110001,-0.250000
31686,13.510010,6.729980,-5.470001,-1.309998,1.600006,11.519989,6.010010,13.190002,0.739990,-35.109985,...,-4.459991,1.929993,-13.970001,-12.529999,1.109985,2.350006,10.800018,-21.980011,-0.260010,-5.040009
31687,0.670002,-0.220001,-0.290001,1.330002,-1.349998,0.809998,-0.070000,1.070000,-0.619999,-0.830002,...,-0.320004,0.120003,-1.170002,-1.619999,0.189999,0.090000,1.030003,-0.870003,-0.689999,-0.349998


,result
0,1.0
1,1.0
2,1.0
3,1.0
4,1.0
...,...
31684,1.0
31685,-1.0
31686,-1.0
31687,-1.0


In [74]:
# predict
y_val_pred = logreg.predict(X_val)

In [75]:
# Calculate Metric (Balanced Accuracy)
bal_acc_val = metrics.balanced_accuracy_score(y_val, y_val_pred)
print(bal_acc_val)

0.5044401129866302


In [77]:
display(y_val_pred)

array([ 1.,  1.,  1., ..., -1.,  1.,  1.])
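
The requirements ask for one prediction per (ticker, date), but the segments built above drop both keys once the index is reset. One way to keep them, sketched here under assumptions, is to carry the ticker and the target date alongside each window. The helper `windows_with_keys`, the tickers and the deltas are hypothetical, and the window uses 3 features instead of 30. Rows come out grouped by ticker rather than by window start.

```python
import pandas as pd

def windows_with_keys(delta, n_features):
    """Yield (ticker, target_date, features, target_delta) for every window."""
    for ticker in delta.columns:
        values = delta[ticker].to_numpy()
        for end in range(n_features, len(values)):
            yield ticker, delta.index[end], values[end - n_features:end], values[end]

# Hypothetical daily deltas for two tickers, indexed by business day.
dates = pd.bdate_range("2022-06-01", periods=5)
delta = pd.DataFrame(
    {"AAA": [1.0, -1.0, 2.0, 0.5, -0.5], "BBB": [0.0, 1.0, 1.0, -2.0, 3.0]},
    index=dates,
)

# One record per (ticker, date): the features to feed the model and the true label.
records = [
    {"ticker": t, "date": d, "features": f, "label": 1 if target > 0 else -1}
    for t, d, f, target in windows_with_keys(delta, n_features=3)
]
```

With records like these, each model output can be appended together with its ticker and date, and restricted to dates on or after 2022-06-01 before scoring.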