<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Predicting-Stock-Data-with-an-LSTM-Network" data-toc-modified-id="Predicting-Stock-Data-with-an-LSTM-Network-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Predicting Stock Data with an LSTM Network</a></span><ul class="toc-item"><li><span><a href="#This-repository-contains" data-toc-modified-id="This-repository-contains-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>This repository contains</a></span></li><li><span><a href="#Using-the-OSEMN-Process" data-toc-modified-id="Using-the-OSEMN-Process-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Using the OSEMN Process</a></span></li><li><span><a href="#Scrubbing-the-data" data-toc-modified-id="Scrubbing-the-data-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Scrubbing the data</a></span></li><li><span><a href="#Visualzations" data-toc-modified-id="Visualzations-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Visualzations</a></span><ul class="toc-item"><li><span><a href="#Comparing-AAPL-price-to-yahoo-finance" data-toc-modified-id="Comparing-AAPL-price-to-yahoo-finance-1.4.1"><span class="toc-item-num">1.4.1&nbsp;&nbsp;</span>Comparing AAPL price to yahoo finance</a></span></li></ul></li><li><span><a href="#Next-Steps:" data-toc-modified-id="Next-Steps:-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Next Steps:</a></span></li></ul></li></ul></div>

# Predicting Stock Data with an LSTM Network

## This repository contains

- A `main.ipynb` Jupyter notebook detailing EDA and the prediction process


- Folder `db` with files `firebase.py` and `database.py` for connecting to SQL and Google Firebase


- Folder `models` with files
  - `script.py`, which uses Keras Tuner to search for the best network parameters, such as `n_hidden_layers`, `batch_size`, and regularizer parameters
  - `create.py` for creating a network with given parameters and testing those parameters
  - `load.py` for loading a network with the best parameters and making predictions
  

- Folder `old` with files
  - `Old main.ipynb`, my original notebook with the full scrubbing process included
  - `Old model_creation.ipynb` (unorganized), where I first started building my time series network and testing different ways to do so
  - `Old Modeling.ipynb` (unorganized), where I tested more aspects of building the network, separated out from `Old main.ipynb`
  
  
- Folder `predictions`, where all predictions are stored so the network's performance can be visualized later
 
 
- `styles/custom.css`, the CSS used to style the Jupyter notebooks
 
 
- Folder `test` with files
  - `dashboard_test.ipynb` with my first tests of Plotly graphs and my scraped data for my <a href="https://sql-viewer.herokuapp.com/">website</a>
  - `Firebase Test.ipynb` for testing Plotly graphs built directly from Firebase API calls
  - `model_scratch_testing.ipynb`, another model testing notebook that the `NetworkCreator` class was built from
  
  
- `Prediction_testing.ipynb`, a notebook for testing predictions for tomorrow, later combined into `main.ipynb` for multi-day prediction
 
 
- `Pull and clean data.ipynb`, my original notebook for pulling the data and replacing all of the cleaned data pickles and the cleaned data in the `stock_cleaned` SQL server
 
 
- `Pull and update data.ipynb`, a later version of pull and clean that cleans and inserts only the new data rather than all of the data. It takes 5 minutes to run, versus almost an hour previously.
 
 
- `presentation.pdf`, a technical presentation of the project

## Using the OSEMN Process
- Obtain the data
- Scrub the data
- Explore the data
- Model the data
- Interpret the data
- <a href="https://machinelearningmastery.com/how-to-work-through-a-problem-like-a-data-scientist/">Reference</a>

## Scrubbing the data

<div class="alert alert-info shadow">
  <strong>Prices</strong>
    <ul>
        <li>Reindex to valid dates from 2019-08-09 onwards, excluding the four days that have very little data: <code>2019-12-09</code>, <code>2020-06-23</code>, <code>2020-06-24</code>, and <code>2020-06-25</code></li>
        <li>Forward interpolate the data with a limit of three days. For example, if 6-25 had a valid price and the four days after it were null, the first three would be filled but not the fourth</li>
        <li>Drop symbols with null values</li>
        <li>Post to <code>stock_cleaned</code> SQL server</li>
        <li>Pickle</li>
    </ul>
    <br></br>
    <strong>Splits</strong>
    <ul>
        <li>Make an <code>apply</code> column equal to <code>num</code>/<code>den</code></li>
    </ul>
    <br></br>
    <strong>Performance</strong>
    <ul>
        <li>Load in performance and clean</li>
        <li>Drop symbols not in price symbols</li>
        <li>Match index to price index</li>
        <li>Fill null ExDividend dates with 1970-01-01, then encode each date as the number of days since then so the column is numeric</li>
        <li>Split the columns into two groups: columns to fill, and columns to fill and then drop the symbol if it still has null values</li>
        <li>Interpolate null values in both groups, then fill the remaining NA values in the columns to fill</li>
        <li>Drop columns with a negative minimum that still have many null values</li>
        <li>Drop symbols that still have null values in the columns with a negative minimum, since filling with 0 would not be adequate</li>
        <li>Add price to performance</li>
        <li>Apply splits</li>
        <li>Separate out penny stocks (stocks where the price is &lt; 1 dollar)</li>
        <li>Post to <code>stock_cleaned</code> SQL server</li>
        <li>Pickle penny and non-penny performances</li>
    </ul>
    <br></br>
    <strong>Company</strong>
    <ul>
        <li>Split out symbols that are in performance symbols</li>
        <li>Fill null text values with <code>unknown</code></li>
        <li>Pickle</li>
    </ul>
    <br></br>
    <strong>Analyst</strong>
    <ul>
        <li>Front fill null values by symbol, then fill the rest with 0</li>
        <li>Map text values to numeric</li>
        <li>Convert all to float</li>
        <li>Post to <code>stock_cleaned</code> SQL server</li>
        <li>Pickle</li>
    </ul>
    <br></br>
    <strong>Combined Company/Analyst/Performance</strong>
    <ul>
        <li>One hot encode Company</li>
        <li>Combine the three dataframes into one</li>
    </ul>
    <p><b>After the process is done, update Firebase for the website with performance and penny performance, and possibly company and analyst if they are added later</b></p>
</div>
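
The Prices steps above can be sketched in pandas roughly as follows. This is a minimal sketch, not the code from the notebooks: the function name `scrub_prices`, the wide layout (a `DatetimeIndex` with one column per symbol), and the choice to treat the dates already present in the data as the "valid dates" are all assumptions made for illustration.

```python
import pandas as pd

# The four days with very little data, as listed in the Prices steps.
SPARSE_DAYS = pd.to_datetime(
    ["2019-12-09", "2020-06-23", "2020-06-24", "2020-06-25"]
)


def scrub_prices(prices: pd.DataFrame, start: str = "2019-08-09") -> pd.DataFrame:
    """Reindex, fill forward up to three rows, and drop incomplete symbols.

    Assumes `prices` has a unique DatetimeIndex and one column per symbol
    (a hypothetical layout, not confirmed by the repository).
    """
    prices = prices.sort_index()
    # Keep dates on or after `start`, minus the sparse days.
    valid_dates = prices.index[prices.index >= pd.Timestamp(start)]
    valid_dates = valid_dates.difference(SPARSE_DAYS)
    prices = prices.reindex(valid_dates)
    # `limit=3` counts consecutive rows, so a gap of four rows keeps one null.
    prices = prices.ffill(limit=3)
    # Any symbol (column) that still has a null value is dropped.
    return prices.dropna(axis="columns")
```

Note that `ffill` carries the last valid price forward rather than interpolating between points, so it is only one possible reading of "forward interpolate".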

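Two of the Performance steps, encoding ExDividend dates as days since 1970-01-01 and separating out penny stocks, might look like the sketch below. The function names and the `price` column name are hypothetical, and the penny split here is done row by row, which may differ from how the notebooks separate symbols.

```python
import pandas as pd

EPOCH = pd.Timestamp("1970-01-01")


def encode_exdividend(dates: pd.Series) -> pd.Series:
    """Fill missing dates with 1970-01-01 and return days since that date."""
    filled = pd.to_datetime(dates).fillna(EPOCH)
    return (filled - EPOCH).dt.days


def split_penny(performance: pd.DataFrame, price_col: str = "price"):
    """Return (penny, non_penny), where penny means a price below 1 dollar.

    `price_col` is an assumed column name, not one taken from the repository.
    """
    is_penny = performance[price_col] < 1
    return performance[is_penny], performance[~is_penny]
```

With this encoding a missing ExDividend date becomes 0, which keeps the column numeric without introducing nulls.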

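For the Analyst and Company steps, a hedged sketch of the per-symbol forward fill and the one hot encoding is shown here. The long layout with a `symbol` column, the function names, and the example column names are assumptions for illustration only.

```python
import pandas as pd


def fill_analyst(analyst: pd.DataFrame, symbol_col: str = "symbol") -> pd.DataFrame:
    """Forward fill within each symbol, then replace remaining nulls with 0."""
    value_cols = [col for col in analyst.columns if col != symbol_col]
    filled = analyst.copy()
    # Grouping by symbol stops one symbol's values leaking into the next.
    filled[value_cols] = analyst.groupby(symbol_col)[value_cols].ffill()
    return filled.fillna(0)


def one_hot_company(company: pd.DataFrame, text_cols: list) -> pd.DataFrame:
    """Fill missing text with 'unknown', then one hot encode those columns."""
    company = company.copy()
    company[text_cols] = company[text_cols].fillna("unknown")
    return pd.get_dummies(company, columns=text_cols)
```

Filling with `unknown` before encoding gives missing text its own indicator column instead of a row of all zeros.
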
## Visualizations

### Comparing AAPL price to Yahoo Finance

## Next Steps:
> * Cluster on absolute correlation? Take `.corr()` across different symbols
> * Iteratively add columns, starting with predicting the future price of one stock from all of the previous prices of other stocks
> * Try with vs. without differencing and/or scaling
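
As a possible starting point for the correlation and differencing ideas, the sketch below computes the absolute correlation between symbols on differenced prices. The function name and the wide layout (one column per symbol) are assumptions, and no clustering step is included.

```python
import pandas as pd


def abs_diff_correlation(prices: pd.DataFrame) -> pd.DataFrame:
    """Absolute pairwise correlation of day-over-day price differences.

    Assumes one column per symbol, ordered by date (hypothetical layout).
    """
    # First-order differencing removes the shared upward or downward trend.
    differenced = prices.diff().dropna()
    # Absolute value treats strong negative and positive correlation alike.
    return differenced.corr().abs()
```

The resulting symmetric matrix could then be passed to a clustering method of choice, for example by using one minus the absolute correlation as a distance.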