## Simple stock analysis and naive future price prediction
##### Author: Szymon Pawłowski 
##### Date - 29.01.22r.

This project purpose is to learn Dash and its components by making a simple dashboard with naive future stock price prediction. The data is automatically scraped from the URL site *www.stooq.pl* and the basic informations are scraped from the *Google Finance* site. There is possiblity to choose between the companies from WIG20 Warsaw Stock Exchange. The prediction of the future price is made by naive average model from the last 3 days prices. Furthermore there is implemented an additional RandomForest algorithm for making prediction if the price of stock will go up in the next day or no. 

To create the models, no in-depth EDA or advanced feature engineering were used - the goal is to implement the model under Dash dashboard, not to create great predictive models.

**These results are not credible and do not constitute any proposition or financial advice for any investment and should not be taken into account in making any such investment. This dashboard is designed for learning and entertainment purposes, not as an investment tool.**

In [1]:
#import libraries

import dash
from dash import html
import plotly.graph_objects as go
from dash import dcc
import plotly.express as px
from dash.dependencies import Input, Output
from datetime import date
from dash import dash_table
import dash_bootstrap_components as dbc


import datetime
import pandas as pd
import numpy as np
import requests
import io
from bs4 import BeautifulSoup

from matplotlib import pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.ensemble import RandomForestClassifier

The dash_html_components package is deprecated. Please replace
`import dash_html_components as html` with `from dash import html`
  import dash_html_components as html


In [None]:
#create dash app
app = dash.Dash(__name__, external_stylesheets=[dbc.themes.SIMPLEX])

def zeroadd(txt: int):
    """
    Function adds zero in case of single number
    """
    if len(str(txt))==1:
        txt_new = '0'+str(txt)
    else:
        txt_new = str(txt)
    return txt_new

def get_data(start_date, end_date, stock_value, mode="normal"):
    """
    Function is scraping data from stooq.pl
    In case of mode == "normal" it's scraping for the plot and in other case for prediction in RF
    """
    if mode=="normal":
        date_str = start_date.replace("-","")+'&d2='+end_date.replace("-","")
    else:
        end_date_year = int(end_date[:4])
        start_date_pred = str(end_date_year-1)+end_date[4:]
        date_str = start_date_pred.replace("-","")+"&d2="+end_date.replace("-","")
    
    stock = stock_value
    urlData = requests.get('https://stooq.pl/q/d/l/?s='+stock+'&d1='+date_str+'&i=d').content
    df = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
    df = df.sort_values(by="Data", ascending=False)
    df.columns = ['Data','Opening','Highest','Lowest','Closing','Volume']
    return df

def table_data(string):
    """
    Function is scraping basic data from Google Finance
    """
    tb = pd.read_html('https://www.google.com/finance/quote/'+string.upper()+':WSE')[0]
    tb.columns = ['Information', 'Amount [PLN]', 'Year/year change']
    tb = tb.loc[:1,:].append(tb.loc[4,:]).append(tb.loc[7,:]).reset_index(drop=True)
    return tb

def preprocess_data(data):
    """
    Function does basic preprocessing for plotting data and adapting for RF modelling
    """
    df = data.copy()
    df['Up'] = None
    df.loc[df.Closing > df.Closing.shift(-1), "Up"] = 1
    df.loc[df.Closing <= df.Closing.shift(-1), "Up"] = 0
   
    df.Data = pd.to_datetime(df.Data)
    
    df['Month'] = df.Data.dt.month
    df['Quarter'] = df.Data.dt.quarter
    df['OpeningYesterday'] = df.Opening.shift(-1)
    df['HighestYesterday'] = df.Highest.shift(-1)
    df['LowestYesterday'] = df.Lowest.shift(-1)
    df['ClosingYesterday'] = df.Closing.shift(-1)
    df['VolumeYesterday'] = df.Volume.shift(-1)

    df = df.dropna()
    
    df['LogVolumeYesterday'] = np.log(df.VolumeYesterday)
    df['LogVolume'] = np.log(df.Volume)
    quarters_ohe = pd.get_dummies(df.Quarter)
    quarters_ohe.columns = ['Q'+str(i) for i in range(1,5)]
    months_ohe = pd.get_dummies(df.Month)
    months_ohe.columns = ['M'+str(i) for i in range(1,13)]
    df.drop(['Quarter','Month'], axis=1, inplace=True)
    df = df.join(quarters_ohe).join(months_ohe)
    
    return df

def get_acc_tpr(confusion_matrix):
    """
    Function returns basic information from classification matrix
    """
    tp = confusion_matrix[1,1]
    tn = confusion_matrix[0,0]
    fp = confusion_matrix[0,1]
    fn = confusion_matrix[1,0]
    
    accuracy = round((tp+tn)/(tp+tn+fp+fn),2)
    tpr = round(tp/(tp+fn),2)
    return accuracy, tpr

#initial data
now = datetime.datetime.now()
rmse = 0
prediction = 0
nobs = 0
prob = 0
accuracy = 0
tpr = 0
tb = pd.DataFrame({"Information":['Revenue','Net income','Operating income','Cost of revenue'],
                  "Amount [PLN]": [0,0,0,0],
                  'Year/year change':[0,0,0,0]
                  })
columnstb = [{'name':col, 'id':col} for col in tb.columns]


#defining dbc cards
cards1 = [
    dbc.Card(
        [
            html.H2(f"{prediction:.2f} zł", className="card-title", id='prediction'),
            html.P("Predicted tomorrow price", className="card-text"),
        ],
        body=True,
        color="dark",
        inverse=True,
    ),
    dbc.Card(
        [
            html.H2(f"{prob*100:.2f} %" , className="card-title", id="prob"),
            html.P("Probability of tomorrow price increasing", className="card-text"),
        ],
        body=True,
        color="warning",
        inverse=True,
    ),
    dbc.Card(
        [
            html.H2(f"{tpr*100:.2f} %" , className="card-title", id="tpr"),
            html.P("Sensitivity (TPR)", className="card-text"),
        ],
        body=True,
        color="success",
        inverse=True,
    )]
cards2 = [
    dbc.Card(
        [ 
            html.H2(f"{accuracy*100:.2f} %", className="card-title", id="accuracy"),
            html.P("Accuracy", className="card-text")
        ],
        body=True,
        color="danger",
        inverse=True,
    ),
    dbc.Card(
        [ 
            html.H2(f"{rmse:.2f} zł", className="card-title", id="rmse"),
            html.P("RMSE training naive model", className="card-text")
        ],
        body=True,
        color="primary",
        inverse=True,
    ),
    dbc.Card(
        [
            html.H2(f"{nobs}" , className="card-title", id="nobs"),
            html.P("Number of observations in period", className="card-text"),
        ],
        body=True,
        color="info",
        inverse=True,
    )
    ]
#defining graphs for dashboard
graphs = [
    [
    dbc.Select(
    id="stock",
    options = [
                {'label':'ASSECO', 'value':'acp'},
                {'label':'CDPROJEKT', 'value':'cdr'},
                {'label':'CYFROWY POLSAT', 'value':'cps'},
                {'label':'DINO', 'value':'dnp'},
                {'label':'JSW', 'value':'jsw'},
                {'label':'KGHM', 'value':'kgh'},
                {'label':'LPP', 'value':'lpp'},
                {'label':'LOTOS', 'value':'lts'},
                {'label':'MERCATOR', 'value':'mrc'},
                {'label':'ORANGE', 'value':'opl'},
                {'label':'PGE', 'value':'pge'},
                {'label':'PGNIG', 'value':'pgn'},
                {'label':'PKN ORLEN', 'value':'pkn'},
                {'label':'PZU', 'value':'pzu'},
                {'label':'SANPL', 'value':'spl'},
                {'label':'TAURON', 'value':'tpe'},
                {'label':'PKO BP', 'value':'pko' },
                {'label': 'ALLEGRO', 'value':'ale'},
                {'label': 'PEKAO', 'value':'peo'},
        ],
    ),
    dcc.Graph(id="stock_plot"),
    ]
    ]

#applying dashboard layout
app.layout = html.Div([
         dbc.NavbarSimple(
                brand="Simple analysis and naive prediction of stock market prices",
                 children=[
                        dbc.NavItem(dbc.NavLink("These results are not credible and do not constitute any proposition or financial advice for any investment and should not be taken into account in making any such investment. This dashboard is designed for learning and entertainment purposes, not as an investment tool.", href="#"))
                 ],
                fluid=True,
                expand='xl',
             color='dark',
             dark=True
            ),
    
         html.Hr(),
    
         dcc.DatePickerRange(
            id='my-date-picker-range',
            min_date_allowed=date(now.year-6, now.month, now.day),
            max_date_allowed=date(now.year, now.month, now.day),
            start_date=date(now.year-1, now.month, now.day),
            end_date=date(now.year, now.month, now.day)
         ),
    
         html.Br(),
    
         dbc.Row([dbc.Col(graph) for graph in graphs]),
         
         dbc.Container(dbc.Row([dbc.Col(card) for card in cards1])),
         
         html.Br(),
            
         dbc.Container(dbc.Row([dbc.Col(card) for card in cards2])),
    
         dbc.Container(dbc.Row([
             dbc.Col(dcc.Graph(id="confusion_matrix")),
             dbc.Col(dash_table.DataTable(id = 'tb',columns=columnstb,
                                         style_data={
                                                'color': 'black',
                                                'backgroundColor': 'white'
                                            },
                                            style_data_conditional=[
                                                {
                                                    'if': {'row_index': 'odd'},
                                                    'backgroundColor': 'rgb(115, 115, 115)',
                                                }
                                            ],
                                            style_header={
                                                'backgroundColor': 'rgb(115, 115, 115)',
                                                'color': 'black',
                                                'fontWeight': 'bold'
                                            })
                     ,align="center", width={"size": 6})
                 ])
            )
    ])

@app.callback(
    Output('tb','data'),
    Input('stock','value')
)

def update_table(stock):
    """
    Callback function. Updates table based on selected stock.
    """
    tb = table_data(str(stock))
    columns = [{'name':col, 'id':col} for col in tb.columns]
    data = tb.to_dict(orient='records')
    return data

@app.callback(
    Output('stock_plot', 'figure'),
    Input('stock', 'value'),
    Input('my-date-picker-range', 'start_date'),
    Input('my-date-picker-range', 'end_date'))

def update_graph(stock_value, start_date, end_date):
    """
    Callback function. Updates graph based on selected stock, start date and end date.
    """
    
    df = get_data(stock_value=stock_value, start_date=start_date, end_date=end_date)
                                                                   
    df["Prediction"] = round((df.Closing + df.Closing.shift(-1) + df.Closing.shift(-2))/3,2)
    
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=df.Data, y=df.Closing,
                    mode='lines',
                    name='Stock price'))
    fig.add_trace(go.Scatter(x=df.Data, y=df.Prediction,
                    mode='lines',
                    name='Naive average prediction'))                                                               
    fig.update_xaxes(title='Data')

    fig.update_yaxes(title='Closing')
    
    fig.update_layout(template = 'plotly_white')
    

    return fig

@app.callback(
    Output('rmse', 'children'),
    Output('prediction', 'children'),
    Output('nobs', 'children'),
    Output('prob', 'children'),
    Output('confusion_matrix', 'figure'),
    Output('accuracy','children'),
    Output('tpr','children'),
    Input('stock', 'value'),
    Input('my-date-picker-range', 'start_date'),
    Input('my-date-picker-range', 'end_date'))
def update_card_rmse(stock_value, start_date, end_date):
    """
    Callback function. Updates dbc cards based on selected stock, start date and end date.
    """
    #updating cards
    print(stock_value)
    df = get_data(stock_value=stock_value, start_date=start_date, end_date=end_date)
                                                                   
    df["Prediction"] = round((df.Closing.shift(-1) + df.Closing.shift(-2) + df.Closing.shift(-3))/3,2)
    df = df[df.Prediction.isna() == False]
    rmse = round(mean_squared_error(df.Closing, df.Prediction),2)
    pred = round(df.Closing[:3].mean(),2)
    nobs = len(df)
    
    print("RMSE: ",rmse) 
    print("PRED: ", pred)
    print("NOBS: ", nobs)

    #updating cards connected to the predictions
    df_pred = get_data(stock_value=stock_value, start_date=start_date, end_date=end_date, mode="pred")
    df_pred = preprocess_data(df_pred)
    
    #initial data for RF
    X = df_pred[['OpeningYesterday','HighestYesterday','LowestYesterday','ClosingYesterday','LogVolumeYesterday',
       'Q1','Q2','Q3','Q4','M1','M2','M3','M4','M5','M6','M7','M8','M9','M10','M11','M12']].dropna()
    Y = df_pred.Up.dropna().astype('int')
    X_train, X_test, y_train, y_test = X[int(len(X)*0.2):], X[:int(len(X)*0.2)], Y[int(len(Y)*0.2):], Y[:int(len(Y)*0.2)]
    
    #basic processing data
    numeric_cols = X.select_dtypes('float64').columns
    sc = StandardScaler()
    X_train.loc[:,numeric_cols] = sc.fit_transform(X_train[numeric_cols])
    X_test.loc[:,numeric_cols] = sc.transform(X_test[numeric_cols])
    
    #modelling with RF
    clf = RandomForestClassifier(bootstrap=True, max_depth=80, max_features=2, min_samples_leaf=5, min_samples_split=12, n_estimators=100)
    clf.fit(X_train, y_train)
    
    #prediction
    y_pred = clf.predict(X_test)

    #classification matrix
    cmatrix = confusion_matrix(y_test, y_pred)
    fig = px.imshow(cmatrix, text_auto = True,
                   labels=dict(x="Prediction",y="Reality", color="Sum"),
                   x = ["No up", "Up"],
                   y = ['No up', 'Up'],
                   color_continuous_scale=px.colors.sequential.Cividis_r,
                   template = 'plotly_white')
    
    
    prob = clf.predict_proba(np.array(df_pred[['Opening','Highest','Lowest','Closing','LogVolume','Q1','Q2','Q3','Q4','M1',
                                   'M2','M3','M4','M5','M6','M7','M8','M9','M10','M11','M12']].iloc[0]).reshape(1,-1))[0,1]
    prob = round(prob,2)
    print(prob)
    
    #getting metrics
    accuracy, tpr = get_acc_tpr(cmatrix)
    
    return rmse, pred, nobs, prob, fig, accuracy, tpr

if __name__ == '__main__':
    app.run_server(debug=False, port=8080)

Dash is running on http://127.0.0.1:8080/

 * Serving Flask app '__main__' (lazy loading)
 * Environment: production
[2m   Use a production WSGI server instead.[0m
 * Debug mode: off


 * Running on http://127.0.0.1:8080/ (Press CTRL+C to quit)
127.0.0.1 - - [29/Jan/2022 13:44:21] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [29/Jan/2022 13:44:21] "GET /_dash-dependencies HTTP/1.1" 200 -
127.0.0.1 - - [29/Jan/2022 13:44:21] "GET /_dash-layout HTTP/1.1" 200 -
127.0.0.1 - - [29/Jan/2022 13:44:21] "GET /_dash-component-suites/dash/dcc/async-datepicker.js HTTP/1.1" 200 -
127.0.0.1 - - [29/Jan/2022 13:44:21] "GET /_dash-component-suites/dash/dcc/async-graph.js HTTP/1.1" 200 -
127.0.0.1 - - [29/Jan/2022 13:44:21] "GET /_dash-component-suites/dash/dcc/async-plotlyjs.js HTTP/1.1" 200 -
127.0.0.1 - - [29/Jan/2022 13:44:21] "GET /_dash-component-suites/dash/dash_table/async-highlight.js HTTP/1.1" 200 -
127.0.0.1 - - [29/Jan/2022 13:44:21] "GET /_dash-component-suites/dash/dash_table/async-table.js HTTP/1.1" 200 -


None
Exception on /_dash-update-component [POST]
Traceback (most recent call last):
  File "C:\Users\USER\miniconda3\lib\site-packages\flask\app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\USER\miniconda3\lib\site-packages\flask\app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\USER\miniconda3\lib\site-packages\flask\app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\USER\miniconda3\lib\site-packages\flask\app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "C:\Users\USER\miniconda3\lib\site-packages\dash\dash.py", line 1336, in dispatch
    response.set_data(func(*args, outputs_list=outputs_list))
  File "C:\Users\USER\miniconda3\lib\site-packages\dash\_callback.py", line 151, in add_context
    output_value = func(*func_args, **func_kwargs)  # %% callback invoked %%
  Fil

127.0.0.1 - - [29/Jan/2022 13:44:22] "POST /_dash-update-component HTTP/1.1" 500 -


Exception on /_dash-update-component [POST]
Traceback (most recent call last):
  File "C:\Users\USER\miniconda3\lib\site-packages\flask\app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\USER\miniconda3\lib\site-packages\flask\app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\USER\miniconda3\lib\site-packages\flask\app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\USER\miniconda3\lib\site-packages\flask\app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "C:\Users\USER\miniconda3\lib\site-packages\dash\dash.py", line 1336, in dispatch
    response.set_data(func(*args, outputs_list=outputs_list))
  File "C:\Users\USER\miniconda3\lib\site-packages\dash\_callback.py", line 151, in add_context
    output_value = func(*func_args, **func_kwargs)  # %% callback invoked %%
  File "<i

127.0.0.1 - - [29/Jan/2022 13:44:22] "POST /_dash-update-component HTTP/1.1" 500 -


Exception on /_dash-update-component [POST]
Traceback (most recent call last):
  File "C:\Users\USER\miniconda3\lib\site-packages\flask\app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\USER\miniconda3\lib\site-packages\flask\app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\USER\miniconda3\lib\site-packages\flask\app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\USER\miniconda3\lib\site-packages\flask\app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "C:\Users\USER\miniconda3\lib\site-packages\dash\dash.py", line 1336, in dispatch
    response.set_data(func(*args, outputs_list=outputs_list))
  File "C:\Users\USER\miniconda3\lib\site-packages\dash\_callback.py", line 151, in add_context
    output_value = func(*func_args, **func_kwargs)  # %% callback invoked %%
  File "<i

127.0.0.1 - - [29/Jan/2022 13:44:23] "POST /_dash-update-component HTTP/1.1" 500 -


kgh
RMSE:  27.97
PRED:  143.88
NOBS:  249


127.0.0.1 - - [29/Jan/2022 13:44:24] "POST /_dash-update-component HTTP/1.1" 200 -


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

127.0.0.1 - - [29/Jan/2022 13:44:25] "POST /_dash-update-component HTTP/1.1" 200 -
127.0.0.1 - - [29/Jan/2022 13:44:25] "POST /_dash-update-component HTTP/1.1" 200 -


0.37
pkn


127.0.0.1 - - [29/Jan/2022 13:44:29] "POST /_dash-update-component HTTP/1.1" 200 -


RMSE:  3.09
PRED:  73.35
NOBS:  249




A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

127.0.0.1 - - [29/Jan/2022 13:44:29] "POST /_dash-update-component HTTP/1.1" 200 -
127.0.0.1 - - [29/Jan/2022 13:44:29] "POST /_dash-update-component HTTP/1.1" 200 -


0.33
