# PSEi Stock Market Data Analysis

## Install Python Dependencies
This installs the Python packages needed to download and analyze the data. The notebook also imports `ipywidgets`, so install it as well if your Jupyter environment does not already provide it.

### Yahoo Finance API
```
pip install yfinance
```

### Dataframe
```
pip install pandas
```

### Data Reader
```
pip install pandas_datareader
```

### Numerical Python
```
pip install numpy
```

### Plotting Data
```
pip install plotly matplotlib
```

### Machine learning for implementation of Logistic Regression
```
pip install scikit-learn
```

### Technical Analysis Library
```
pip install ta-lib
```

## Import Python Packages
Import the dependencies used to download the data through the Yahoo Finance API and to analyze it.

In [None]:
import yfinance as yf   # Yahoo Finance API
import pandas as pd     # DataFrame
import numpy as np      # Numerical Python
from pandas_datareader import data as pdr # Pandas Data Reader
import talib as ta      # Technical Analysis Library

# Machine learning
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.model_selection import cross_val_score

# Widgets for Dynamic Graphs
import ipywidgets as widgets
from IPython.display import display

# Plotting
import plotly.express as px

### Library Options
Set the options for the libraries used in this notebook.

In [None]:
yf.pdr_override()   # Route pandas_datareader's get_data_yahoo() through yfinance
pd.options.plotting.backend = "plotly" # Use plotly as the plotting backend

### Download PSEi Data
Download the data from the Yahoo Finance API and load it into a pandas DataFrame.
The data covers `2000-01-01` to `2023-05-18`.

In [None]:
df = pdr.get_data_yahoo('PSEI.PS', '2000-01-01', '2023-05-18')
df = df.dropna()
df.head()

### Output the raw data to CSV

In [None]:
df.to_csv('raw.csv')
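
To reuse the saved file in a later session without calling the API again, the CSV can be read back with its date column restored as the index. The sketch below uses a small in-memory stand-in for `raw.csv` (the rows are made-up values, not PSEi data) and assumes the index column is named `Date`, as in the output shown later in this notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for raw.csv; the values are made up for illustration.
csv_text = """Date,Open,High,Low,Close,Volume
2023-01-03,100.0,102.0,99.0,101.0,1000
2023-01-04,101.0,103.0,100.0,102.5,1200
2023-01-05,102.5,104.0,101.5,103.0,900
"""

# index_col and parse_dates restore the DatetimeIndex that to_csv wrote out.
raw = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
print(raw.index.dtype)
print(raw.loc[pd.Timestamp("2023-01-04"), "Close"])
```

With the real file, the same call would take the path `'raw.csv'` in place of the `StringIO` object.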

## Initialize Variables

### Time Period
The window size, in trading days, used for the rolling mean, the rolling correlation, and the RSI.
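
As a small illustration of what the window size controls, the sketch below applies a 3-day rolling mean to a made-up price series. The first `window - 1` rows have no full window behind them, so they come back as `NaN`, which is why the notebook calls `dropna()` after building the features. The series and the window value here are assumptions chosen for readability.

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])  # made-up prices
window = 3  # hypothetical window size, smaller than the slider's default of 10

rolling_mean = close.rolling(window=window).mean()
print(rolling_mean)  # the first two entries are NaN
```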

### Train Size
The fraction of the data (between 0 and 1) used to train the model; the remaining rows are used for testing.

In [None]:
timeperiod = widgets.IntSlider(description="Time Period", min=2, max=100, step=1, value=10)
trainsize = widgets.FloatSlider(description="Train Size",min=0, max=1, step=0.01, value=0.8)

display(timeperiod, trainsize)

In [None]:
df['S_' + str(timeperiod.value)] = df['Close'].rolling(window=timeperiod.value).mean() # Rolling mean

df['Corr'] = df['Close'].rolling(window=timeperiod.value).corr(df['S_' + str(timeperiod.value)]) # Correlation between the close price and the rolling mean

df['RSI'] = ta.RSI(np.array(df['Close']), timeperiod=timeperiod.value) # Relative Strength Index

df['Open-Close'] = df['Open'] - df['Close'].shift(1) # The difference between the current day's open and the previous day's close

df['Open-Open'] = df['Open'] - df['Open'].shift(1) # The difference between the current day's open and the previous day's open

df = df.dropna().copy() # Drop the NaN rows; copy so later column assignments do not raise SettingWithCopyWarning
df # Show the dataframe
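
If TA-Lib is not available, the RSI can be approximated with pandas alone. The sketch below is an assumption-laden stand-in, not the notebook's method: it smooths gains and losses with an exponential average using `alpha = 1 / period`, which follows the Wilder-style recursion but seeds the averages differently from TA-Lib, so early values are not guaranteed to match `ta.RSI`. The function name `rsi_sketch` and the synthetic random walk are hypothetical.

```python
import numpy as np
import pandas as pd

def rsi_sketch(close: pd.Series, period: int = 10) -> pd.Series:
    """Approximate RSI using Wilder-style exponential smoothing (alpha = 1/period)."""
    delta = close.diff()
    gain = delta.clip(lower=0)       # upward moves, 0 elsewhere
    loss = (-delta).clip(lower=0)    # downward moves as positive numbers
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 200).cumsum())  # synthetic random walk
rsi = rsi_sketch(close, period=10)
print(rsi.dropna().describe())
```

Values near 100 indicate that recent moves were almost all gains, and values near 0 that they were almost all losses.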

### Prepare the data for the model

In [None]:
x = df.iloc[:,:9] # The first nine columns of the dataframe are used as features
y = np.where(df['Close'].shift(-1) > df['Close'],1,-1) # 1 if the next day's close is higher than today's, -1 otherwise
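
To make the labelling rule concrete, the sketch below applies the same `np.where` expression to five made-up closing prices. Two edge cases are worth noticing: a day followed by an unchanged close is labelled `-1`, and the last row is also labelled `-1` because its next-day close is `NaN` and any comparison with `NaN` is false.

```python
import numpy as np
import pandas as pd

close = pd.Series([10.0, 11.0, 10.5, 10.5, 12.0])  # made-up closing prices
next_close = close.shift(-1)                        # next day's close; NaN on the last row

labels = np.where(next_close > close, 1, -1)
print(labels.tolist())  # [1, -1, -1, 1, -1]
```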

In [None]:
split = int(trainsize.value * len(df)) # Row index where the test set begins
x_train, x_test, y_train, y_test = x[:split], x[split:], y[:split], y[split:] # Split the data into train and test set
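
The split above is chronological rather than random: the earliest rows train the model and the most recent rows test it. A minimal sketch on made-up dates shows the effect, assuming a train fraction of 0.8 (the slider's default). The frame and its values are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2023-01-02", periods=10, freq="B")  # made-up business days
frame = pd.DataFrame({"Close": np.arange(10.0)}, index=dates)

train_fraction = 0.8  # same default as the Train Size slider
split = int(train_fraction * len(frame))

train, test = frame[:split], frame[split:]
print(len(train), len(test))                 # 8 2
print(train.index.max() < test.index.min())  # True
```

Keeping the order intact means the model is never evaluated on days that precede its training data.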

### Implement Logistic Regression

In [None]:
model = LogisticRegression()  # Initialize the model
model = model.fit(x_train, y_train)  # Fit the model
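
The features in this notebook sit on very different scales (index levels in the thousands next to a correlation between -1 and 1), and `LogisticRegression` can be slow to converge or emit a `ConvergenceWarning` in that situation. One common option, shown here only as a sketch on synthetic data and not as the notebook's model, is to standardize the features inside a pipeline. All names and data below are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic features on very different scales
# (hypothetical stand-ins for a price level and a correlation).
X = np.column_stack([rng.normal(7000, 500, 300), rng.uniform(-1, 1, 300)])
y = np.where(X[:, 1] + rng.normal(0, 0.3, 300) > 0, 1, -1)

split = int(0.8 * len(X))
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(X[:split], y[:split])   # the scaler is fitted on the training rows only
accuracy = scaled_model.score(X[split:], y[split:])
print("Test accuracy:", accuracy)
```

Fitting the scaler inside the pipeline keeps test-set statistics out of the training step.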

### Show the model

In [None]:
# Show the coefficients of the model, one row per feature
pd.DataFrame({'Feature': x.columns, 'Coefficient': model.coef_[0]})

### Predict the price direction

In [None]:
probability = model.predict_proba(x_test) # Class probabilities for each row of the test set

In [None]:
probability # 1st column is the probability of the price going down, 2nd column is the probability of the price going up
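
The column order of `predict_proba` follows `model.classes_`, which scikit-learn sorts in ascending order, so with labels `-1` and `1` the "down" column comes first. The sketch below checks this on synthetic data rather than on the notebook's model; the variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))        # synthetic features
y = np.where(X[:, 0] > 0, 1, -1)     # synthetic up/down labels

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X[:3])

print(clf.classes_)                      # column order of predict_proba
up_column = list(clf.classes_).index(1)  # look the column up instead of hard-coding it
print(proba[:, up_column])               # probability of the "up" class for the first three rows
```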

In [None]:
y_predicted = model.predict(x_test) # Predict the direction (1 or -1) for each test row

In [None]:
metrics.confusion_matrix(y_test, y_predicted) # Show the confusion matrix
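
For reading the confusion matrix, it helps to pin the label order explicitly. In the sketch below, which uses five made-up observations, rows are the actual classes and columns are the predicted classes, both ordered as `[-1, 1]`.

```python
from sklearn import metrics

y_true = [1, 1, -1, -1, 1]   # made-up actual directions
y_pred = [1, -1, -1, 1, 1]   # made-up predictions

cm = metrics.confusion_matrix(y_true, y_pred, labels=[-1, 1])
print(cm)
# [[1 1]
#  [1 2]]
```

The diagonal holds the correct predictions (one "down" and two "up"), so the accuracy on this toy example is 3 out of 5.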

### Show the accuracy of the model

In [None]:
print("Accuracy:", model.score(x_test,y_test)) # Show the accuracy of the model


In [None]:
print(metrics.classification_report(y_test, y_predicted)) # Print the classification report (precision, recall, F1 per class)


### Show the cross validation score

In [None]:
cross_val = cross_val_score(LogisticRegression(), x, y, scoring='accuracy', cv=10) # Accuracy of each of the 10 cross-validation folds
cross_val
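
With an integer `cv` and a classifier, `cross_val_score` uses stratified folds without shuffling, so some folds train on later rows and test on earlier ones. For time-ordered data, one alternative worth considering is scikit-learn's `TimeSeriesSplit`, where each fold trains only on rows that precede its test rows. The sketch below runs it on synthetic data; it is an illustration under that assumption, not a replacement for the result above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))                               # synthetic, time-ordered features
y = np.where(X[:, 0] + rng.normal(0, 0.5, 300) > 0, 1, -1)  # synthetic up/down labels

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X):
    # Every training row precedes every test row in each fold.
    assert train_idx.max() < test_idx.min()

scores = cross_val_score(LogisticRegression(), X, y, scoring="accuracy", cv=tscv)
print(scores.mean())
```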

In [103]:
df['Predicted_Signal'] = model.predict(x) # Predict the signal
df['PSEi_returns'] = np.log(df['Close'] / df['Close'].shift(1)) # Calculate the PSEi returns
Cumulative_PSEi_returns = np.cumsum(df[split:]['PSEi_returns']) # Calculate the cumulative PSEi returns

df['Strategy_Returns'] = df['PSEi_returns'] * df['Predicted_Signal'].shift(1) # Calculate the strategy returns
Cumulative_Strategy_returns = np.cumsum(df[split:]['Strategy_Returns']) # Calculate the cumulative strategy returns
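
The return calculation can be followed by hand on a tiny example. The sketch below uses four made-up prices and signals: the signal is shifted by one day so that each day's return is multiplied by the previous day's prediction, and a signal of `-1` flips the sign of the return, which corresponds to a short position. Transaction costs are ignored, as in the cell above.

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0, 108.9])   # made-up prices
signal = pd.Series([1, -1, 1, 1])                 # made-up predictions for the next day

log_returns = np.log(close / close.shift(1))
strategy_returns = log_returns * signal.shift(1)  # yesterday's signal applied to today's return

cumulative_market = log_returns.cumsum()
cumulative_strategy = strategy_returns.cumsum()

# Log returns add up over time; exp() converts the total back to a simple return.
market_simple = np.exp(cumulative_market.iloc[-1]) - 1
strategy_simple = np.exp(cumulative_strategy.iloc[-1]) - 1
print(market_simple, strategy_simple)
```

Here the market's simple return equals `108.9 / 100 - 1`, while the strategy gains on the second day's drop because its signal for that day was `-1`.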



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy






### Output the data to CSV

In [117]:
df.to_csv('predicted signals.csv') # Output the data to CSV

## Plot the data

In [114]:
df.plot(y=['Close', 'S_' + str(timeperiod.value)], labels={'value':'Value', 'index':'Date', 'variable':'Variables'}) # Plot the close price and the rolling mean

### Plot Cumulative PSEi returns

In [111]:
Cumulative_PSEi_returns.plot(labels={'value':'Cumulative Returns', 'index':'Date', 'variable':'Variables'}) # Plot the cumulative PSEi returns

### Plot the Cumulative Strategy Returns

In [110]:
Cumulative_Strategy_returns.plot(labels={'value':'Cumulative Returns', 'index':'Date', 'variable':'Variables'}) # Plot the cumulative strategy returns

### Plot the PSEi returns and the Strategy returns

In [123]:
Cumulative_PSEi_returns_df = Cumulative_PSEi_returns.to_frame()
Cumulative_Strategy_returns_df = Cumulative_Strategy_returns.to_frame()

cumulative_returns_df = pd.merge(Cumulative_PSEi_returns_df, Cumulative_Strategy_returns_df, left_index=True, right_index=True) # Join the two frames on their shared Date index
cumulative_returns_df.head()

| Date | PSEi_returns | Strategy_Returns |
|---|---|---|
| 2018-09-18 | -0.017309 | -0.017309 |
| 2018-09-19 | -0.026285 | -0.026285 |
| 2018-09-20 | -0.038336 | -0.038336 |
| 2018-09-21 | -0.004131 | -0.004131 |
| 2018-09-24 | 0.002701 | 0.002701 |
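
When two series share the same date index, `pd.concat` with `axis=1` is another way to place them side by side, since it aligns on the index without needing a key column. The sketch below uses made-up dates and values; the series names mirror the column names in the table above.

```python
import pandas as pd

dates = pd.date_range("2023-01-02", periods=3, freq="B")  # made-up dates
market = pd.Series([0.01, 0.03, 0.02], index=dates, name="PSEi_returns")
strategy = pd.Series([0.01, -0.01, 0.00], index=dates, name="Strategy_Returns")

# concat with axis=1 aligns the two series on their shared date index.
combined = pd.concat([market, strategy], axis=1)
print(combined)
```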


### Plot

In [124]:
cumulative_returns_df.plot(title='PSEi and Strategy Cumulative Returns', labels={'value':'Cumulative Returns', 'index':'Date', 'variable':'Strategy'})

In [127]:
cumulative_returns_df.to_csv('cumulative_returns.csv') # Output the data to CSV