## Install Python Dependencies
This installs the packages and libraries that Python needs in order to download and analyze the data.

# Import Python Packages
The packages below are imported in the first code cell and used to download the PSEi data through the Yahoo Finance API and then analyze it.

### Yahoo Finance API
```
pip install yfinance
```

### Dataframe
```
pip install pandas
```

### Data Reader
```
pip install pandas_datareader
```

### Numerical Python
```
pip install numpy
```

### Plotting Data and Interactive Widgets
```
pip install plotly matplotlib ipywidgets
```

### Machine Learning (Logistic Regression)
```
pip install scikit-learn
```

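### Technical Analysis Library
The `ta-lib` package is a Python wrapper around the TA-Lib C library; depending on the platform, the C library may need to be installed before this command succeeds.
```
pip install ta-lib
```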
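Before running the import cell, it can help to confirm that each package is importable under its *import* name, which differs from the pip name for several of them (`scikit-learn` is imported as `sklearn`, `ta-lib` as `talib`). A minimal sketch using only the standard library; the `PACKAGES` mapping and `missing_packages` helper are hypothetical names introduced here.

```python
import importlib.util

# pip package name -> module name used in the import cell (hypothetical mapping for this notebook)
PACKAGES = {
    "yfinance": "yfinance",
    "pandas": "pandas",
    "pandas_datareader": "pandas_datareader",
    "numpy": "numpy",
    "plotly": "plotly",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "ta-lib": "talib",
    "ipywidgets": "ipywidgets",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names whose top-level module cannot be found (without importing it)."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can be installed with the matching `pip install` command above.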

In [1]:
import yfinance as yf   # Yahoo Finance API
import pandas as pd     # DataFrame
import numpy as np      # Numerical Python
from pandas_datareader import data as pdr # Pandas Data Reader
import talib as ta      # Technical Analysis Library

# Machine learning
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.model_selection import cross_val_score

# Widgets for Dynamic Graphs
import ipywidgets as widgets
from IPython.display import display

### Library Options
This will set the options for the libraries that will be used in this notebook

In [2]:
yf.pdr_override()   # Route pandas_datareader's get_data_yahoo() through yfinance (deprecated in recent yfinance releases; yf.download() fetches the same data directly)
pd.options.plotting.backend = "plotly" # Use plotly as the plotting backend

### Download PSEi Data
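This downloads daily data for the PSEi (Yahoo Finance ticker `PSEI.PS`) through the Yahoo Finance API into a pandas DataFrame.
The requested range is `2000-01-01` to `2023-05-18`; the end date is exclusive, so the last row can be no later than `2023-05-17`.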

In [3]:
df = pdr.get_data_yahoo('PSEI.PS', '2000-01-01', '2023-05-18')
df = df.dropna()
df.head()


| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2000-01-03 | 2143.669922 | 2148.709961 | 2122.98999 | 2141.77002 | 2141.219482 | 0 |
| 2000-01-04 | 2151.550049 | 2158.209961 | 2130.679932 | 2153.179932 | 2152.626465 | 0 |
| 2000-01-05 | 2113.379883 | 2113.379883 | 2070.139893 | 2074.75 | 2074.216553 | 0 |
| 2000-01-06 | 2079.050049 | 2082.810059 | 2066.879883 | 2079.110107 | 2078.575684 | 0 |
| 2000-01-07 | 2079.320068 | 2094.290039 | 2077.649902 | 2094.290039 | 2093.751709 | 0 |


### Output the raw data to CSV

In [4]:
df.to_csv('raw.csv')

# Initialize Variables

### Time Period
The window size, in trading days, used for the rolling mean (`S_<time period>`), the rolling correlation (`Corr`), and the RSI
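To make the window concrete, here is the same pair of rolling operations on a toy series (the prices and `window = 3` are illustrative assumptions, not PSEi data). The correlation needs `window` valid rolling-mean values, so the first `2 * (window - 1)` rows come out as NaN, which is why the later `dropna()` discards rows at the start of the sample.

```python
import numpy as np
import pandas as pd

window = 3  # stands in for timeperiod.value

# Toy closing prices (illustrative only)
close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

sma = close.rolling(window=window).mean()        # rolling mean, like the S_<time period> column
corr = close.rolling(window=window).corr(sma)    # rolling correlation, like the Corr column

print(pd.DataFrame({"Close": close, "S": sma, "Corr": corr}))
```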

### Train Size
The fraction of the data (between 0 and 1) used to train the model; the remaining, later rows form the test set

In [27]:
timeperiod = widgets.IntSlider(description="Time Period", min=2, max=100, step=1, value=10)
trainsize = widgets.FloatSlider(description="Train Size",min=0, max=1, step=0.01, value=0.7)

display(timeperiod, trainsize)

IntSlider(value=10, description='Time Period', min=2)

FloatSlider(value=0.8, description='Train Size', max=1.0, step=0.01)

In [28]:
df['S_' + str(timeperiod.value)] = df['Close'].rolling(window=timeperiod.value).mean() # Rolling mean

df['Corr'] = df['Close'].rolling(window=timeperiod.value).corr(df['S_' + str(timeperiod.value)]) # Correlation between the close price and the rolling mean

df['RSI'] = ta.RSI(np.array(df['Close']), timeperiod=timeperiod.value) # Relative Strength Index

df['Open-Close'] = df['Open'] - df['Close'].shift(1) # The difference between the current day's open and the previous day's close

df['Open-Open'] = df['Open'] - df['Open'].shift(1) # The difference between the current day's open and the previous day's open

df = df.dropna() # Drop the rows left with NaN values by the rolling windows and the shifts
df # Show the DataFrame

| Date | Open | High | Low | Close | Adj Close | Volume | S_10 | Corr | RSI | Open-Close | Open-Open | Predicted_Signal | PSEi_returns | Strategy_Returns |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000-02-22 | 1833.839966 | 1833.839966 | 1778.890015 | 1799.829956 | 1799.367310 | 0 | 1926.203992 | 0.863339 | 21.720313 | 0.000000 | -48.690063 | 1 | -0.018720 | -0.018720 |
| 2000-02-23 | 1799.540039 | 1835.810059 | 1799.540039 | 1828.930054 | 1828.459839 | 0 | 1904.369995 | 0.860742 | 30.785458 | -0.289917 | -34.299927 | 1 | 0.016039 | 0.016039 |
| 2000-02-24 | 1831.420044 | 1835.609985 | 1787.219971 | 1794.810059 | 1794.348633 | 0 | 1881.835999 | 0.883831 | 26.749769 | 2.489990 | 31.880005 | 1 | -0.018832 | -0.018832 |
| 2000-02-25 | 1831.420044 | 1835.609985 | 1787.219971 | 1794.810059 | 1794.348633 | 0 | 1861.596008 | 0.887208 | 26.749769 | 36.609985 | 0.000000 | 1 | 0.000000 | 0.000000 |
| 2000-02-28 | 1753.619995 | 1753.619995 | 1711.719971 | 1720.650024 | 1720.207642 | 0 | 1835.987012 | 0.934438 | 19.788826 | -41.190063 | -77.800049 | 1 | -0.042197 | -0.042197 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-05-11 | 6669.399902 | 6704.529785 | 6657.509766 | 6675.459961 | 6675.459961 | 116400 | 6641.555029 | 0.427915 | 59.636713 | 10.810059 | 36.569824 | 1 | 0.002530 | 0.002530 |
| 2023-05-12 | 6661.600098 | 6668.649902 | 6578.149902 | 6578.149902 | 6578.149902 | 150400 | 6641.002002 | -0.082792 | 47.087081 | -13.859863 | -7.799805 | 1 | -0.014685 | -0.014685 |
| 2023-05-15 | 6582.569824 | 6585.720215 | 6523.149902 | 6523.149902 | 6523.149902 | 78700 | 6630.808984 | -0.315707 | 41.590710 | 4.419922 | -79.030273 | 1 | -0.008396 | -0.008396 |
| 2023-05-16 | 6549.339844 | 6596.339844 | 6549.339844 | 6588.899902 | 6588.899902 | 65700 | 6622.429980 | -0.206046 | 49.431267 | 26.189941 | -33.229980 | 1 | 0.010029 | 0.010029 |
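`ta.RSI` depends on the TA-Lib C library, which can be awkward to install. If it is unavailable, a pure-NumPy RSI with Wilder smoothing can stand in. This is a sketch under the assumption that TA-Lib seeds the averages with a simple mean of the first `period` price changes and then applies Wilder smoothing; compare it against `ta.RSI` on real data before swapping it in. `wilder_rsi` is a hypothetical helper, not part of any library.

```python
import numpy as np


def wilder_rsi(close, period=14):
    """Hypothetical pure-NumPy RSI with Wilder smoothing (a TA-Lib stand-in, not TA-Lib itself)."""
    close = np.asarray(close, dtype=float)
    rsi = np.full(close.shape, np.nan)   # the first `period` values stay NaN (warm-up)
    if close.size <= period:
        return rsi

    delta = np.diff(close)               # delta[i] = close[i + 1] - close[i]
    gains = np.clip(delta, 0.0, None)
    losses = np.clip(-delta, 0.0, None)

    def to_rsi(avg_gain, avg_loss):
        if avg_loss == 0.0:
            # No losses in the window: 100 if prices rose, 50 for a flat window (assumed convention)
            return 100.0 if avg_gain > 0.0 else 50.0
        rs = avg_gain / avg_loss
        return 100.0 - 100.0 / (1.0 + rs)

    # Seed with simple averages of the first `period` changes
    avg_gain = gains[:period].mean()
    avg_loss = losses[:period].mean()
    rsi[period] = to_rsi(avg_gain, avg_loss)

    # Wilder smoothing: new_avg = (old_avg * (period - 1) + current_change) / period
    for i in range(period, delta.size):
        avg_gain = (avg_gain * (period - 1) + gains[i]) / period
        avg_loss = (avg_loss * (period - 1) + losses[i]) / period
        rsi[i + 1] = to_rsi(avg_gain, avg_loss)
    return rsi


print(wilder_rsi([1.0, 2.0, 1.0, 2.0], period=2))
```

In the notebook it would be called as `df['RSI'] = wilder_rsi(df['Close'], timeperiod.value)`.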


### Prepare the data for the model

In [29]:
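x = df.iloc[:,:9] # Features: the first 9 columns (Open, High, Low, Close, Adj Close, Volume, S_<time period>, Corr, RSI); Open-Close and Open-Open are not included
y = np.where(df['Close'].shift(-1) > df['Close'],1,-1) # 1 if the next day's close is higher, -1 otherwise (the last row has no next close and is labelled -1)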
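The target compares each close with the next one through `shift(-1)`. On the final row the next close is NaN, and `NaN > x` evaluates to `False`, so that row is labelled -1 even though its direction is unknown; flat days (equal closes) are labelled -1 as well. A toy sketch of the difference, with an alternative that keeps only rows whose next close exists (`has_next` and `y_clean` are hypothetical names):

```python
import numpy as np
import pandas as pd

close = pd.Series([10.0, 11.0, 10.5, 10.5, 12.0])  # toy closing prices
next_close = close.shift(-1)                          # NaN on the last row

# Labelling as in the notebook: NaN > x is False, so the last row becomes -1
y_notebook = np.where(next_close > close, 1, -1)

# Alternative: label only the rows whose next close is known
has_next = next_close.notna()
y_clean = np.where(next_close[has_next] > close[has_next], 1, -1)

print(y_notebook)
print(y_clean)
```

If this alternative were used in the notebook, the features would need the same filter (for example `x.loc[has_next]`) so that `x` and `y` keep the same length.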

In [30]:
split = int(trainsize.value * len(df)) # Position of the first test row (chronological split, no shuffling)
x_train, x_test, y_train, y_test = x.iloc[:split], x.iloc[split:], y[:split], y[split:] # Earlier rows train the model, later rows test it
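The train-size slider accepts 0 and 1, either of which leaves one of the two sets empty, and scikit-learn then raises an error when fitting or predicting. A hypothetical helper that guards against this while keeping the time order (the test period always follows the training period):

```python
import numpy as np


def chronological_split(x, y, train_size):
    """Hypothetical helper: the first `train_size` fraction trains, the rest tests, order preserved."""
    if not 0 < train_size < 1:
        raise ValueError("train_size must be strictly between 0 and 1")
    split = int(train_size * len(x))
    return x[:split], x[split:], y[:split], y[split:]


# Usage on toy arrays: 10 time-ordered rows, 70% for training
x_demo = np.arange(10).reshape(10, 1)
y_demo = np.arange(10)
x_tr, x_te, y_tr, y_te = chronological_split(x_demo, y_demo, 0.7)
print(len(x_tr), len(x_te))
```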

### Implement Logistic Regression

In [31]:
model = LogisticRegression()  # Initialize the model
model = model.fit(x_train, y_train)  # Fit the model

### Show the model

In [32]:
# Show the coefficients of the model
pd.DataFrame(zip(x.columns, np.transpose(model.coef_)))
# 1st column is the features, 2nd column is the coefficients

| | 0 (feature) | 1 (coefficient) |
|---|---|---|
| 0 | Open | [2.267669299178034e-06] |
| 1 | High | [2.2920127155775336e-06] |
| 2 | Low | [2.26819802646283e-06] |
| 3 | Close | [2.287727248966235e-06] |
| 4 | Adj Close | [2.287139176882533e-06] |
| 5 | Volume | [-3.3721335887791688e-09] |
| 6 | S_10 | [2.304098079965185e-06] |
| 7 | Corr | [6.660079868362076e-11] |
| 8 | RSI | [1.3752387347115249e-08] |
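The coefficients above are all tiny (between roughly 1e-11 and 1e-6 in magnitude), which can happen when raw features on very different scales, such as Volume and Corr, go straight into `LogisticRegression`. One common remedy is to standardize the features inside a pipeline. This is a sketch on synthetic data rather than the PSEi frame; whether it improves accuracy on this dataset is untested, and `scaled_model` is a hypothetical name.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Synthetic features on very different scales (price-like, volume-like, correlation-like)
X = np.column_stack([
    rng.normal(6000, 500, n),
    rng.normal(1e5, 3e4, n),
    rng.uniform(-1, 1, n),
])
# Toy target driven by the third feature plus noise, labelled 1 / -1 like the notebook
y = np.where(X[:, 2] + rng.normal(0, 0.5, n) > 0, 1, -1)

# Standardize each feature to zero mean / unit variance before the logistic regression
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X[:400], y[:400])          # earlier rows train
print("test accuracy:", scaled_model.score(X[400:], y[400:]))
print("coefficients (standardized units):", scaled_model[-1].coef_)
```

Because the coefficients are in standardized units, their magnitudes can be compared across features, which is not possible with the raw-scale coefficients.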


### Predict the price direction

In [33]:
probability = model.predict_proba(x_test) # Class probabilities for each test row, columns ordered as model.classes_ (-1, 1)

In [34]:
probability # 1st column is the probability of the price going down, 2nd column is the probability of the price going up

array([[0.47344444, 0.52655556],
       [0.47347594, 0.52652406],
       [0.47336541, 0.52663459],
       ...,
       [0.47759671, 0.52240329],
       [0.47751342, 0.52248658],
       [0.47741185, 0.52258815]])
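For a binary `LogisticRegression`, the second probability column is the logistic (sigmoid) function applied to the linear score w·x + b, where w is `coef_` and b is `intercept_`, and the column order follows `classes_`. With the default 0.5 threshold, the rows shown above, all just above 0.5, are therefore predicted as 1. A small check of that relationship on synthetic data (`clf` and `p_up` are illustrative names):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)   # toy labels

clf = LogisticRegression().fit(X, y)
print(clf.classes_)                           # column order of predict_proba

z = X @ clf.coef_[0] + clf.intercept_[0]      # linear score w·x + b
p_up = 1.0 / (1.0 + np.exp(-z))               # sigmoid gives P(class 1)
print(np.allclose(p_up, clf.predict_proba(X)[:, 1]))
```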

In [35]:
y_predicted = model.predict(x_test) # Predicted direction for each test row (1 = up, -1 = down)

In [36]:
metrics.confusion_matrix(y_test, y_predicted) # Rows: actual -1, 1; columns: predicted -1, 1

array([[  1, 562],
       [  0, 591]], dtype=int64)

### Show the accuracy of the model

In [37]:
print("Accuracy:", model.score(x_test,y_test)) # Show the accuracy of the model


Accuracy: 0.512998266897747


In [38]:
print(metrics.classification_report(y_test, y_predicted)) # Print the classification report

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| -1 | 1.00 | 0.00 | 0.00 | 563 |
| 1 | 0.51 | 1.00 | 0.68 | 591 |
| accuracy | | | 0.51 | 1154 |
| macro avg | 0.76 | 0.50 | 0.34 | 1154 |
| weighted avg | 0.75 | 0.51 | 0.35 | 1154 |
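The report's figures follow directly from the confusion matrix above. Treating "up" (1) as the positive class, a short worked example with the standard library only:

```python
# Confusion matrix from the notebook: rows = actual (-1, 1), columns = predicted (-1, 1)
cm = [[1, 562],
      [0, 591]]

tn, fp = cm[0]   # actual -1: predicted -1, predicted 1
fn, tp = cm[1]   # actual  1: predicted -1, predicted 1
total = tn + fp + fn + tp

accuracy = (tn + tp) / total          # share of all test days classified correctly
precision_up = tp / (tp + fp)         # of the days predicted "up", share that went up
recall_up = tp / (tp + fn)            # of the days that went up, share predicted "up"
precision_down = tn / (tn + fn)       # of the days predicted "down", share that went down
recall_down = tn / (tn + fp)          # of the days that went down, share predicted "down"

print(f"accuracy={accuracy:.4f}")
print(f"up:   precision={precision_up:.2f} recall={recall_up:.2f}")
print(f"down: precision={precision_down:.2f} recall={recall_down:.2f}")
```

The precision of 1.00 for -1 rests on a single "down" prediction, and the 0.51 accuracy equals the share of "up" days in the test set, so the model does no better than always predicting "up".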

### Show the cross validation score

In [39]:
cross_val = cross_val_score(LogisticRegression(), x, y, scoring='accuracy', cv=10) # 10-fold cross-validation accuracy; with an integer cv, scikit-learn uses StratifiedKFold (no shuffling) for classifiers
cross_val

array([0.50779896, 0.50606586, 0.49046794, 0.50606586, 0.52686308,
       0.50606586, 0.50694444, 0.50868056, 0.50868056, 0.50694444])
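With an integer `cv`, most of the stratified folds above train on days that come after the days they are tested on, which a trading model would never see. `TimeSeriesSplit` keeps every training window strictly before its test window. Swapping it in is a suggestion rather than what the notebook ran; the sketch below uses synthetic, time-ordered data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))                          # synthetic, time-ordered features
y = np.where(X[:, 0] + rng.normal(0, 1, 300) > 0, 1, -1)

tscv = TimeSeriesSplit(n_splits=5)                     # expanding training window
scores = cross_val_score(LogisticRegression(), X, y, scoring="accuracy", cv=tscv)
print(scores)

for train_idx, test_idx in tscv.split(X):
    assert train_idx.max() < test_idx.min()            # every fold trains only on the past
```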

In [41]:
df['Predicted_Signal'] = model.predict(x) # Predicted next-day direction for every row (training and test)
df['PSEi_returns'] = np.log(df['Close'] / df['Close'].shift(1)) # Daily log returns of the PSEi
Cumulative_PSEi_returns = df['PSEi_returns'].iloc[split:].cumsum() # Cumulative PSEi log returns over the test period

df['Strategy_Returns'] = df['PSEi_returns'] * df['Predicted_Signal'].shift(1) # Apply the previous day's signal: long on 1, short on -1
Cumulative_Strategy_returns = df['Strategy_Returns'].iloc[split:].cumsum() # Cumulative strategy log returns over the test period



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
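`PSEi_returns` holds daily log returns, so `cumsum` gives the cumulative log return, and `Strategy_Returns` multiplies each day's return by the previous day's predicted signal (long on 1, short on -1). A toy sketch of that arithmetic; the prices and signals are illustrative assumptions:

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 102.0, 99.0, 101.0])    # toy prices
signal = pd.Series([1, -1, 1, 1])                 # toy predictions made at each close

log_ret = np.log(close / close.shift(1))          # like PSEi_returns (NaN on the first row)
strategy = log_ret * signal.shift(1)              # yesterday's signal applied to today's move

cum_market = log_ret.cumsum()                     # cumsum skips the leading NaN
cum_strategy = strategy.cumsum()

# Log returns add up: the final cumulative value equals log(last price / first price)
print(cum_market.iloc[-1], np.log(close.iloc[-1] / close.iloc[0]))
# Convert a cumulative log return into a simple percentage return
print(f"market: {np.expm1(cum_market.iloc[-1]):.2%}, strategy: {np.expm1(cum_strategy.iloc[-1]):.2%}")
```

Because the values are log returns, the cumulative curves plotted below are in log units; `np.expm1` converts a point on them back to a simple return.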




### Output the data to CSV

In [42]:
df.to_csv('predicted signals.csv') # Output the data to CSV

## Plot the data

In [43]:
df.plot(y=['Close', 'S_' + str(timeperiod.value)], labels={'value':'Value', 'index':'Date', 'variable':'Variables'}) # Plot the close price and the rolling mean

### Plot Cumulative PSEi returns

In [45]:
Cumulative_PSEi_returns.plot(labels={'value':'Cumulative Returns', 'index':'Date', 'variable':'Variables'}) # Plot the cumulative PSEi returns

### Plot the Cumulative Strategy Returns

In [46]:
Cumulative_Strategy_returns.plot(labels={'value':'Cumulative Returns', 'index':'Date', 'variable':'Variables'}) # Plot the cumulative strategy returns

### Plot the PSEi returns and the Strategy returns

In [48]:
Cumulative_PSEi_returns_df = Cumulative_PSEi_returns.to_frame()
Cumulative_Strategy_returns_df = Cumulative_Strategy_returns.to_frame()

cumulative_returns_df = pd.merge(Cumulative_PSEi_returns_df, Cumulative_Strategy_returns_df, left_index=True, right_index=True)
cumulative_returns_df.head()

| Date | PSEi_returns | Strategy_Returns |
|---|---|---|
| 2018-08-28 | 0.010011 | 0.010011 |
| 2018-08-29 | 0.008269 | 0.008269 |
| 2018-08-30 | 0.0111 | 0.0111 |
| 2018-08-31 | 0.011425 | 0.011425 |
| 2018-09-03 | 0.00843 | 0.00843 |


### Plot

In [49]:
cumulative_returns_df.plot(title='PSEi and Strategy Cumulative Returns', labels={'value':'Cumulative Returns', 'index':'Date', 'variable':'Strategy'})

In [51]:
cumulative_returns_df.to_csv('cumulative_returns.csv') # Output the data to CSV