# **BASIC NEURAL NETWORK FOR US STOCK ANALYSIS. INTERPRETATION**  
---  

This project is built for learning purposes. Be aware that none of the information and software below should be used as investment advice.  

## A brief background:
As a young computer science student and a [value investor][10] in the stock market, I always wondered if at some point I could do some research to fuse those different worlds in a fresh new project. That's when I bumped into the world of machine learning, then deep learning, and now Neural Networks.

## Project Goals:
We will try building multiple models, including a multi-layered Neural Network that aims to predict a company's average stock price growth over the next fiscal year (a regression task), based on a series of fundamental stock analysis ratios such as the [dividend yield][11].
To keep everyone on track, here is an example:  

In 2019, IBM had an average price of 126\\$ per stock.  
In 2020, IBM had an average price of 114\\$ per stock, with a dividend yield of 0.052.
The aim of the project is to predict the label (the Y): the average stock price growth from 2019 to 2020, 114/126 ≈ 0.9048 (which is actually a decline), using fundamental ratios such as the dividend yield of 0.052 as inputs.
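
As a quick check of the arithmetic above, the sketch below computes the IBM figure both as a price ratio and as a relative change. The variable names are illustrative only and are not part of the project code.

```python
# Average yearly prices from the IBM example above (USD per share)
price_2019 = 126.0
price_2020 = 114.0

# Growth expressed as a ratio: values below 1 mean the price fell
growth_ratio = price_2020 / price_2019

# The same change expressed as a relative difference
relative_change = (price_2020 - price_2019) / price_2019

print(f"ratio: {growth_ratio:.4f}, relative change: {relative_change:.4f}")
```

The two forms differ only by an offset of 1, so either can serve as a regression target as long as it is used consistently.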

As many readers will know, using $X \in [-1, 1]$ rather than $X \in {\Bbb R}$ is, generally speaking, good practice when working with this kind of algorithm, so that's what we will aim for.
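
One simple way to bring an unbounded ratio into $[-1, 1]$ is to clip it to a symmetric bound and then divide by that bound. The sketch below is a minimal illustration of that idea; the helper name `clip_and_scale` and the bounds used in the calls are assumptions made for this example.

```python
def clip_and_scale(value: float, bound: float) -> float:
    """Clip value to [-bound, bound], then divide by bound to land in [-1, 1]."""
    if bound <= 0:
        raise ValueError("bound must be positive")
    clipped = max(-bound, min(bound, value))
    return clipped / bound


print(clip_and_scale(0.5, 4))    # inside the bound: only rescaled
print(clip_and_scale(10.0, 4))   # above the bound: saturates at the top
print(clip_and_scale(-7.0, 2))   # below the bound: saturates at the bottom
```

The trade-off is that every value beyond the bound collapses to the same number, so the bound should be wide enough to keep the typical range of the ratio distinguishable.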

## Tools:
The libraries we will be using in this project are:

1.   **TensorFlow**: a general-purpose Machine Learning tool
2.   **Keras**: an open source deep learning library written in Python
3.   **NumPy**: a useful Python library when working with more complex data/structure operations
4.   **MatPlotLib**: to give some visuals to the project
5.   **Pandas**: an essential library when working with huge DataFrames
6.   **Requests, Json**: for the API requests

## Project structure:
The project is split into 4 main parts, followed by a conclusion and a bibliography:

1.   [**Data extraction**](#extraction): Build our dataframe using a couple of tools from the [AlphaVantage API][12]
2.   [**Data Cleaning and Refactoring**](#cleaning): Refactor and convert the raw data from the API to the actual train and test DataFrames the neural network will be given.
3.  [**Fitting and evaluating the model**](#fitting): Given the train and test DataFrames, training and testing the model to check its accuracy.  
4.  [**Hyperparameter tuning**](#tuning): try looking for the best possible parameters for the Neural Network.
5.  [**Conclusion**](#conclusion): After our model is built and ready,  drawing conclusions from the actual results
6.  [**Bibliography**](#bibliography): resources used building this project


[10]: https://en.wikipedia.org/wiki/Value_investing
[11]: https://en.wikipedia.org/wiki/Dividend_yield
[12]: https://www.alphavantage.co/


In [69]:
# Execute only if you are on Colab
%tensorflow_version 2.x


In [70]:
# Some useful libs
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os
import requests, json, csv
from io import StringIO
from IPython.display import clear_output, display
import shutil
import time
from datetime import datetime
import dateutil

In [71]:
try:
        os.mkdir("./companies/")
except FileExistsError:
        print("File already exists")

if not(os.path.exists("metadata.json")):
        metadata = {
                "last_stock_index": "0"
        }
        with open('metadata.json', 'w') as outfile:
                json.dump(metadata, outfile)

# Delay for each API call
ALPHA_VANTAGE_DELAY = 14

# Minimum number of years of data that a stock needs to have:
MINIMUN_DATA_YEARS = 3

# List of ratios we will be using
# TODO: add insider_shares/total_shares
FINANCIAL_RATIOS_NAMES = ["Contributions_To_Debt", "ROE", "ROA", "Sales_Margin",
        "ROIC", "ROCE", "EV/EBIT", "PBV", "PER", "Dividend_Yield",
        "Treasury_Ratio", "Acid_Test", "RSCD"]

# The different API functions
API_FUNCTIONS = ["TIME_SERIES_MONTHLY_ADJUSTED", "INCOME_STATEMENT",
        "BALANCE_SHEET", "CASH_FLOW"]

File already exists


Since you probably don't want to run the data extraction for weeks, you should skip the first two sections and head to [Fitting and evaluating the model](#fitting)

<a name="extraction"></a>
# 1. DATA EXTRACTION
We will get all the raw data from the Alpha Vantage API.

First, we need the directories we will be using: inside our ./companies directory, we will have one directory per symbol.  

We also need a metadata.json file to keep track of the download progress (the index of the last stock processed).

Finally, we set a delay between API calls, since the API is restricted to ~4 calls/min and 500 calls/day.
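
The two limits mentioned here (a pause between calls and a daily cap) can be tracked with a small helper. This is only a sketch under stated assumptions: the class `CallBudget` and its parameters are hypothetical names invented for this example, and the defaults simply mirror the ~4 calls/min and 500 calls/day figures quoted above.

```python
import time


class CallBudget:
    """Spaces out API calls and stops once a daily budget is used up."""

    def __init__(self, delay_seconds=14, daily_limit=500, sleep=time.sleep):
        self.delay_seconds = delay_seconds
        self.daily_limit = daily_limit
        self.calls_made = 0
        self._sleep = sleep  # injectable so the helper can be tested without waiting

    def wait(self) -> bool:
        """Pause before the next call; return False if the budget is exhausted."""
        if self.calls_made >= self.daily_limit:
            return False
        self._sleep(self.delay_seconds)
        self.calls_made += 1
        return True


# Tiny demonstration with no real waiting and a budget of two calls
budget = CallBudget(delay_seconds=0, daily_limit=2, sleep=lambda seconds: None)
results = [budget.wait(), budget.wait(), budget.wait()]
print(results)
```

Calling `wait()` before each request keeps the pacing logic in one place instead of scattering `time.sleep` calls through the download code.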

Define our API_KEY and the main API URL, then define the functions that get all the data from the API:

In [None]:
API_KEY = "YOUR_API_KEY"  # Placeholder: get your own API key at Alpha Vantage's main site
API_URL = "https://www.alphavantage.co/query?"

def download_symbol_raw(symbol, verbose = True):
    try:
        os.mkdir("./companies/" + symbol)
    except FileExistsError:
        if verbose:
            print("File already exists")

    # Download symbol data:
    data = { "function": "",
        "symbol": symbol,
        "apikey": API_KEY
    }
    # Download each required csv from api:
    for func in API_FUNCTIONS:
        if(verbose):
            print('Downloading ' + func + '...')

        data["function"] = func
        request_content = requests.get(API_URL, data).json()
       
        # Pick the response key that holds the data for this function.
        # API_FUNCTIONS requests the adjusted monthly series, whose JSON key
        # is "Monthly Adjusted Time Series"
        if func == "TIME_SERIES_MONTHLY_ADJUSTED":
            key_word = "Monthly Adjusted Time Series"
        else:
            key_word = "quarterlyReports"

        # If we don't get a response for any function, we will skip this entire stock
        if key_word not in request_content:
            if verbose:
                print("Error getting stock data")

            # Delete stock subfolder
            shutil.rmtree('./companies/' + symbol + '/', ignore_errors=True)
            return False

        # convert to a pd dataframe:
        df = pd.DataFrame.from_dict(request_content[key_word])

        # The time series is keyed by date, so transpose to get one row per date
        if func == "TIME_SERIES_MONTHLY_ADJUSTED":
            df = df.transpose()

        df.to_csv('./companies/' + symbol+ '/' + symbol + '_' + func + '.csv')

    print("Finished!")
    return True

def download_symbols_name_list(verbose = True):
    if(verbose):
        print("Downloading listing stocks on the USA market...")
    data = { "function": "LISTING_STATUS",
        "apikey": API_KEY 
    }
    response = requests.get(API_URL, data)
    # LISTING_STATUS returns plain CSV text, so it can be parsed directly
    df = pd.read_csv(StringIO(response.text)).set_index("symbol")
    # Keep only rows of assetType Stock
    df = df[df['assetType'] == "Stock"]

    df.to_csv('symbols_name_list.csv')

    print("Finished!")


def full_raw_download(verbose=True):
    # Check if file exists to save api call
    if not os.path.exists('./symbols_name_list.csv'):
        download_symbols_name_list(verbose)

    df = pd.read_csv('symbols_name_list.csv')["symbol"]


    with open('metadata.json') as metadata_file:
        last_index = int(json.load(metadata_file)['last_stock_index'])

    for i in range(last_index, len(df)):
        time.sleep(ALPHA_VANTAGE_DELAY*4) # Get ~1 ticker per minute
        # If download_symbol_raw() returns False, we probably can't do more API
        # calls for today. Save a record of the last stock we reached in metadata
        flag = download_symbol_raw(symbol=df[i], verbose=False)
        if not flag:
            print("Enough API calls. Stopping...")
            metadata = {
                "last_stock_index": str(i)
            }
            with open('metadata.json', 'w') as outfile:
                json.dump(metadata, outfile)

            return False

        else:
            print("Last symbol downloaded: " + df[i])

    return True
        


We now call the full_raw_download() function to start the extraction

In [None]:
full_raw_download()
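
The resume behaviour of `full_raw_download()` rests on a small checkpoint stored in metadata.json under the `last_stock_index` key. The sketch below isolates that save/resume round trip in a temporary directory; the helper names `load_last_index` and `save_last_index` are hypothetical and only the file layout is taken from the notebook.

```python
import json
import os
import tempfile


def load_last_index(path):
    """Return the saved stock index, or 0 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return int(json.load(f)["last_stock_index"])


def save_last_index(path, index):
    """Persist the index of the first symbol that still needs downloading."""
    with open(path, "w") as f:
        json.dump({"last_stock_index": str(index)}, f)


# Demonstrate a save/resume round trip in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = os.path.join(tmp, "metadata.json")
    start = load_last_index(checkpoint)      # no file yet
    save_last_index(checkpoint, 42)          # the run stopped at symbol 42
    resumed = load_last_index(checkpoint)    # the next run picks up here

print(start, resumed)
```

Because the index of the failed symbol is what gets saved, the next run retries that symbol rather than skipping it.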

<a name="cleaning"></a>
# 2. Data Cleaning and Refactoring
After downloading the full raw data, we need to convert it into the ratios we will use to fit our model. For this purpose we will define some functions, and do some hard work with pandas DataFrames in order to make the data useful to the model:

In [None]:

# not using this ratio for the moment, since it can be described as a combination of some of the ratios below
def get_Contributions_To_Debt(raw_dfs, i):
    interestExpense = raw_dfs["INCOME_STATEMENT"]["interestExpense"][i]
    netIncome = raw_dfs["INCOME_STATEMENT"]["netIncome"][i]
    commonStockTotalEquity = raw_dfs["BALANCE_SHEET"]["commonStockTotalEquity"][i]
    longTermDebt = raw_dfs["BALANCE_SHEET"]["longTermDebt"][i]
    totalShareholderEquity = raw_dfs["BALANCE_SHEET"]["totalShareholderEquity"][i]

    return (((interestExpense + netIncome)/(commonStockTotalEquity + longTermDebt)) 
        - ((netIncome)/(totalShareholderEquity)))

RETURN_BOUNDS = 4
# not using this ratio either, since it's closely related to other return ratios
def get_ROE(raw_dfs, i):
    netIncome = raw_dfs["INCOME_STATEMENT"]["netIncome"][i]
    totalShareholderEquity = raw_dfs["BALANCE_SHEET"]["totalShareholderEquity"][i]

    return ((netIncome)/(totalShareholderEquity - netIncome))

def get_ROA(raw_dfs, i):
    netIncome = raw_dfs["INCOME_STATEMENT"]["netIncome"][i]
    totalAssets = raw_dfs["BALANCE_SHEET"]["totalAssets"][i]
    if (totalAssets == 0):
        return 0

    value = netIncome / totalAssets

    if(value > RETURN_BOUNDS):
        value = RETURN_BOUNDS
    elif(value < -1*RETURN_BOUNDS):
        value = -1*RETURN_BOUNDS

    return (value/RETURN_BOUNDS)

def get_Sales_Margin(raw_dfs, i):
    operatingIncome = raw_dfs["INCOME_STATEMENT"]["operatingIncome"][i]
    totalRevenue = raw_dfs["INCOME_STATEMENT"]["totalRevenue"][i]
    if (operatingIncome == 0 and totalRevenue == 0):
        return 0
    elif(totalRevenue == 0):
        return -1
    
    value = operatingIncome / totalRevenue

    if(value > RETURN_BOUNDS):
        value = RETURN_BOUNDS
    elif(value < -1*RETURN_BOUNDS):
        value = -1*RETURN_BOUNDS

    return (value/RETURN_BOUNDS)

# not using this ratio either, since it's closely related to ROCE
def get_ROIC(raw_dfs, i):
    netIncomeApplicableToCommonShares = raw_dfs["INCOME_STATEMENT"]["netIncomeApplicableToCommonShares"][i]
    totalRevenue = raw_dfs["INCOME_STATEMENT"]["totalRevenue"][i]
    commonStockTotalEquity = raw_dfs["BALANCE_SHEET"]["commonStockTotalEquity"][i]
    longTermDebt = raw_dfs["BALANCE_SHEET"]["longTermDebt"][i]
    dividendPayout = raw_dfs["CASH_FLOW"]["dividendPayout"][i]

    return ((netIncomeApplicableToCommonShares + dividendPayout) /
        (commonStockTotalEquity + longTermDebt))

def get_ROCE(raw_dfs, i):
    commonStockTotalEquity = raw_dfs["BALANCE_SHEET"]["commonStockTotalEquity"][i]
    longTermDebt = raw_dfs["BALANCE_SHEET"]["longTermDebt"][i]
    ebit = raw_dfs["INCOME_STATEMENT"]["ebit"][i]

    value = ebit / (longTermDebt + commonStockTotalEquity)

    if(value > RETURN_BOUNDS):
        value = RETURN_BOUNDS
    elif(value < -1*RETURN_BOUNDS):
        value = -1*RETURN_BOUNDS

    return (value/RETURN_BOUNDS)

# We will give the inverse ratio to scale it
def get_EVpEBIT(raw_dfs, i):
    ebit = raw_dfs["INCOME_STATEMENT"]["ebit"][i]
    longTermDebt = raw_dfs["BALANCE_SHEET"]["longTermDebt"][i]   
    netIncome = raw_dfs["INCOME_STATEMENT"]["netIncome"][i]
    average_price = raw_dfs["PRICES"]["averagePrice"][i]
    commonStockSharesOutstanding = raw_dfs["BALANCE_SHEET"]["commonStockSharesOutstanding"][i]
    cashAndShortTermInvestments = raw_dfs["BALANCE_SHEET"]["cashAndShortTermInvestments"][i]

    value = (ebit)/(commonStockSharesOutstanding * average_price + longTermDebt - cashAndShortTermInvestments)

    if (abs(value) > 1):
        return 0

    return value
    
# not using this ratio either, since it's closely related to PER
def get_PBV(raw_dfs, i):
    commonStockTotalEquity = raw_dfs["BALANCE_SHEET"]["commonStockTotalEquity"][i]
    netIncome = raw_dfs["INCOME_STATEMENT"]["netIncome"][i]
    commonStockSharesOutstanding = raw_dfs["BALANCE_SHEET"]["commonStockSharesOutstanding"][i]
    average_price = raw_dfs["PRICES"]["averagePrice"][i]

    return((commonStockSharesOutstanding * average_price) / (commonStockTotalEquity))   

# give the inverse of PER ratio to scale it
def get_PER(raw_dfs, i):
    netIncome = raw_dfs["INCOME_STATEMENT"]["netIncome"][i]
    commonStockSharesOutstanding = raw_dfs["BALANCE_SHEET"]["commonStockSharesOutstanding"][i]
    average_price = raw_dfs["PRICES"]["averagePrice"][i]

    value = (netIncome)/(commonStockSharesOutstanding* average_price)
    if (abs(value) > 1):
        return 0

    return value
# TODO: drop the companies whose commonStockSharesOutstanding is 0
def get_Dividend_Yield(raw_dfs, i):
    netIncome = raw_dfs["INCOME_STATEMENT"]["netIncome"][i]
    commonStockSharesOutstanding = raw_dfs["BALANCE_SHEET"]["commonStockSharesOutstanding"][i]
    average_price = raw_dfs["PRICES"]["averagePrice"][i]
    dividendPayout = raw_dfs["CASH_FLOW"]["dividendPayout"][i]

    if(dividendPayout == 0):
        return 0
    else:
        return((dividendPayout) / (commonStockSharesOutstanding* average_price))

# not using this ratio since it can be described as a combination of the next two
def get_Treasury_Ratio(raw_dfs, i):
    cashAndShortTermInvestments = raw_dfs["BALANCE_SHEET"]["cashAndShortTermInvestments"][i]
    totalCurrentLiabilities = raw_dfs["BALANCE_SHEET"]["totalCurrentLiabilities"][i]

    return((cashAndShortTermInvestments)/(totalCurrentLiabilities))

# give the next 2 ratios a range of [-2, 2] and then divide by 2 to get a scaled ratio
LIQUIDITY_BOUND = 2
def get_Acid_Test(raw_dfs, i):
    totalCurrentAssets = raw_dfs["BALANCE_SHEET"]["totalCurrentAssets"][i]
    inventory = raw_dfs["BALANCE_SHEET"]["inventory"][i]
    totalCurrentLiabilities = raw_dfs["BALANCE_SHEET"]["totalCurrentLiabilities"][i]
    if (totalCurrentLiabilities == 0 or totalCurrentAssets == 0):
        return 1

    value = ((totalCurrentAssets - inventory) / totalCurrentLiabilities)

    if(value > LIQUIDITY_BOUND):
        value = LIQUIDITY_BOUND
    elif(value < -1*LIQUIDITY_BOUND):
        value = -1*LIQUIDITY_BOUND

    return (value/LIQUIDITY_BOUND)


def get_RSCD(raw_dfs, i):
    ebit = raw_dfs["INCOME_STATEMENT"]["ebit"][i]
    interestExpense = raw_dfs["INCOME_STATEMENT"]["interestExpense"][i]
    shortTermDebt = raw_dfs["BALANCE_SHEET"]["shortTermDebt"][i]

    if (interestExpense == 0 and shortTermDebt == 0):
        return 1
    
    value = (ebit) / (interestExpense + shortTermDebt)

    if(value > LIQUIDITY_BOUND):
        value = LIQUIDITY_BOUND
    elif(value < -1*LIQUIDITY_BOUND):
        value = -1*LIQUIDITY_BOUND

    return (value/LIQUIDITY_BOUND)

# This is not directly used in the NN, but still useful data to have on the frame
def get_Enterprise_Value(raw_dfs, i):
    average_price = raw_dfs["PRICES"]["averagePrice"][i]
    longTermDebt = raw_dfs["BALANCE_SHEET"]["longTermDebt"][i]   
    netIncome = raw_dfs["INCOME_STATEMENT"]["netIncome"][i]
    commonStockSharesOutstanding = raw_dfs["BALANCE_SHEET"]["commonStockSharesOutstanding"][i]
    cashAndShortTermInvestments = raw_dfs["BALANCE_SHEET"]["cashAndShortTermInvestments"][i]

    return (commonStockSharesOutstanding * average_price + longTermDebt - cashAndShortTermInvestments)

# used to calculate all ratios at once
RATIOS_FUNCTIONS = {
    "Contributions_To_Debt": get_Contributions_To_Debt,
    "ROE": get_ROE,
    "ROA": get_ROA,
    "Sales_Margin": get_Sales_Margin,         
    "ROIC": get_ROIC, 
    "ROCE": get_ROCE, 
    "EV/EBIT": get_EVpEBIT, 
    "PBV": get_PBV, 
    "PER": get_PER, 
    "Dividend_Yield": get_Dividend_Yield,
    "Treasury_Ratio": get_Treasury_Ratio, 
    "Acid_Test": get_Acid_Test, 
    "RSCD": get_RSCD
}

We define a function to refactor our annual data into a dataset the model can be fitted on.

In [None]:
# TODO: finish filling the important values, and check that the required values are present. E.g. if commonStockSharesOutstanding is missing when computing the EV, that whole column has to be dropped

# Returns the new raw_dfs with filled and clean new values. Returns an empty dict if the data is too bad to be used
def fill_raw_data(raw_dfs) -> dict:
    empty = {}
    # interestExpense:
    if (raw_dfs["INCOME_STATEMENT"]["interestExpense"].isna().sum() > 1):
        return empty
    elif(raw_dfs["INCOME_STATEMENT"]["interestExpense"].isna().sum() == 1):
        raw_dfs["INCOME_STATEMENT"]["interestExpense"] = pd.concat([raw_dfs["INCOME_STATEMENT"]["interestExpense"].ffill(), raw_dfs["INCOME_STATEMENT"]["interestExpense"].bfill()]).groupby(level=0).mean()
    
    raw_dfs["INCOME_STATEMENT"]["interestExpense"] = raw_dfs["INCOME_STATEMENT"]["interestExpense"].abs()

    # netIncome:
    if (raw_dfs["INCOME_STATEMENT"]["netIncome"].isna().sum() > 1):
        return empty
    elif(raw_dfs["INCOME_STATEMENT"]["netIncome"].isna().sum() == 1):
        raw_dfs["INCOME_STATEMENT"]["netIncome"] = pd.concat([raw_dfs["INCOME_STATEMENT"]["netIncome"].ffill(), raw_dfs["INCOME_STATEMENT"]["netIncome"].bfill()]).groupby(level=0).mean()

    # commonStockTotalEquity:
    if (raw_dfs["BALANCE_SHEET"]["commonStockTotalEquity"].isna().sum() > 1):
        return empty
    elif(raw_dfs["BALANCE_SHEET"]["commonStockTotalEquity"].isna().sum() == 1):
        raw_dfs["BALANCE_SHEET"]["commonStockTotalEquity"] = pd.concat([raw_dfs["BALANCE_SHEET"]["commonStockTotalEquity"].ffill(), raw_dfs["BALANCE_SHEET"]["commonStockTotalEquity"].bfill()]).groupby(level=0).mean()
    ## If any total equity is negative, we return empty and treat this stock as garbage that will probably go bankrupt
    if((raw_dfs["BALANCE_SHEET"]["commonStockTotalEquity"] < 0).sum() > 0):
        print("This stock might go bankrupt due to a negative totalEquity")
        return empty

    #longTermDebt
    if (raw_dfs["BALANCE_SHEET"]["longTermDebt"].isna().sum() > 2):
        return empty
    else:
        raw_dfs["BALANCE_SHEET"]["longTermDebt"] = raw_dfs["BALANCE_SHEET"]["longTermDebt"].fillna(0)
        
    # totalShareholderEquity:
    if (raw_dfs["BALANCE_SHEET"]["totalShareholderEquity"].isna().sum() > 1):
        return empty
    elif(raw_dfs["BALANCE_SHEET"]["totalShareholderEquity"].isna().sum() == 1):
        raw_dfs["BALANCE_SHEET"]["totalShareholderEquity"] = pd.concat([raw_dfs["BALANCE_SHEET"]["totalShareholderEquity"].ffill(), raw_dfs["BALANCE_SHEET"]["totalShareholderEquity"].bfill()]).groupby(level=0).mean()

    # totalAssets:
    if (raw_dfs["BALANCE_SHEET"]["totalAssets"].isna().sum() > 1):
        return empty
    elif(raw_dfs["BALANCE_SHEET"]["totalAssets"].isna().sum() == 1):
        raw_dfs["BALANCE_SHEET"]["totalAssets"] = pd.concat([raw_dfs["BALANCE_SHEET"]["totalAssets"].ffill(), raw_dfs["BALANCE_SHEET"]["totalAssets"].bfill()]).groupby(level=0).mean()

    # operatingIncome:
    if (raw_dfs["INCOME_STATEMENT"]["operatingIncome"].isna().sum() > 1):
        return empty
    elif(raw_dfs["INCOME_STATEMENT"]["operatingIncome"].isna().sum() == 1):
        raw_dfs["INCOME_STATEMENT"]["operatingIncome"] = pd.concat([raw_dfs["INCOME_STATEMENT"]["operatingIncome"].ffill(), raw_dfs["INCOME_STATEMENT"]["operatingIncome"].bfill()]).groupby(level=0).mean()
    
    # totalRevenue
    if (raw_dfs["INCOME_STATEMENT"]["totalRevenue"].isna().sum() > 2):
        return empty
    else:
        raw_dfs["INCOME_STATEMENT"]["totalRevenue"] = raw_dfs["INCOME_STATEMENT"]["totalRevenue"].fillna(0)

    # netIncomeApplicableToCommonShares:
    if (raw_dfs["INCOME_STATEMENT"]["netIncomeApplicableToCommonShares"].isna().sum() > 1):
        return empty
    elif(raw_dfs["INCOME_STATEMENT"]["netIncomeApplicableToCommonShares"].isna().sum() == 1):
        raw_dfs["INCOME_STATEMENT"]["netIncomeApplicableToCommonShares"] = pd.concat([raw_dfs["INCOME_STATEMENT"]["netIncomeApplicableToCommonShares"].ffill(), raw_dfs["INCOME_STATEMENT"]["netIncomeApplicableToCommonShares"].bfill()]).groupby(level=0).mean()
    
    # dividendPayout
    raw_dfs["CASH_FLOW"]["dividendPayout"] = raw_dfs["CASH_FLOW"]["dividendPayout"].fillna(0).abs()

    # ebit:
    if (raw_dfs["INCOME_STATEMENT"]["ebit"].isna().sum() > 1):
        return empty
    elif(raw_dfs["INCOME_STATEMENT"]["ebit"].isna().sum() == 1):
        raw_dfs["INCOME_STATEMENT"]["ebit"] = pd.concat([raw_dfs["INCOME_STATEMENT"]["ebit"].ffill(), raw_dfs["INCOME_STATEMENT"]["ebit"].bfill()]).groupby(level=0).mean()

    # cashAndShortTermInvestments:
    if (raw_dfs["BALANCE_SHEET"]["cashAndShortTermInvestments"].isna().sum() > 1):
        return empty
    elif(raw_dfs["BALANCE_SHEET"]["cashAndShortTermInvestments"].isna().sum() == 1):
        raw_dfs["BALANCE_SHEET"]["cashAndShortTermInvestments"] = pd.concat([raw_dfs["BALANCE_SHEET"]["cashAndShortTermInvestments"].ffill(), raw_dfs["BALANCE_SHEET"]["cashAndShortTermInvestments"].bfill()]).groupby(level=0).mean()
    
    raw_dfs["BALANCE_SHEET"]["cashAndShortTermInvestments"] = raw_dfs["BALANCE_SHEET"]["cashAndShortTermInvestments"].abs()

    # shortTermDebt
    raw_dfs["BALANCE_SHEET"]["shortTermDebt"] = raw_dfs["BALANCE_SHEET"]["shortTermDebt"].fillna(0).abs()

    # totalCurrentLiabilities
    if (raw_dfs["BALANCE_SHEET"]["totalCurrentLiabilities"].isna().sum() > 1):
        return empty
    elif(raw_dfs["BALANCE_SHEET"]["totalCurrentLiabilities"].isna().sum() == 1):
        raw_dfs["BALANCE_SHEET"]["totalCurrentLiabilities"] = pd.concat([raw_dfs["BALANCE_SHEET"]["totalCurrentLiabilities"].ffill(), raw_dfs["BALANCE_SHEET"]["totalCurrentLiabilities"].bfill()]).groupby(level=0).mean()

    # Take the absolute value unconditionally, as done for interestExpense and cashAndShortTermInvestments
    raw_dfs["BALANCE_SHEET"]["totalCurrentLiabilities"] = raw_dfs["BALANCE_SHEET"]["totalCurrentLiabilities"].abs()
    
    # totalCurrentAssets
    if (raw_dfs["BALANCE_SHEET"]["totalCurrentAssets"].isna().sum() > 1):
        return empty
    elif(raw_dfs["BALANCE_SHEET"]["totalCurrentAssets"].isna().sum() == 1):
        raw_dfs["BALANCE_SHEET"]["totalCurrentAssets"] = pd.concat([raw_dfs["BALANCE_SHEET"]["totalCurrentAssets"].ffill(), raw_dfs["BALANCE_SHEET"]["totalCurrentAssets"].bfill()]).groupby(level=0).mean()

    raw_dfs["BALANCE_SHEET"]["totalCurrentAssets"] = raw_dfs["BALANCE_SHEET"]["totalCurrentAssets"].abs()

    # inventory
    if (raw_dfs["BALANCE_SHEET"]["inventory"].isna().sum() > 1):
        return empty
    elif(raw_dfs["BALANCE_SHEET"]["inventory"].isna().sum() == 1):
        raw_dfs["BALANCE_SHEET"]["inventory"] = pd.concat([raw_dfs["BALANCE_SHEET"]["inventory"].ffill(), raw_dfs["BALANCE_SHEET"]["inventory"].bfill()]).groupby(level=0).mean()

    raw_dfs["BALANCE_SHEET"]["inventory"] = raw_dfs["BALANCE_SHEET"]["inventory"].abs()

    return raw_dfs

# Test the clean df
def test_clean_df(clean_df):
    return clean_df

# REFACTORS ANNUAL DATA FOR A GIVEN COMPANY
def refactor_anual_company_data(symbol, verbose=True):
    # this loop extracts, cleans and arranges all raw dataframes in a dictionary
    raw_dfs = {} 
    # Build a new list so the global API_FUNCTIONS is not modified in place
    dataset_names = ['PRICES'] + API_FUNCTIONS[1:]
    for f in dataset_names:
        file_path = "./companies/" + symbol + "/" + 'Y_' + symbol + '_' + f + ".csv"
        if not(os.path.exists(file_path)):
            if(verbose):
                print(file_path + " does not exist. Download needed?")

            return False
    
        df = pd.read_csv(file_path)

        if(df.empty):
            if(verbose):
                print(f + " is empty")
            
            return False

        if not(f == "PRICES"):    # drop useless column
            df = df.drop(df.columns[0], axis=1)

        df = df.replace(["None", ""], np.nan)

        # Finally convert the rest of the columns to numeric
        selection = [x for x in df.columns if x not in ["Date" ,"fiscalDateEnding", "reportedDate", "reportedCurrency", "Unnamed: 0"]]
        df[selection] = df[selection].apply(pd.to_numeric)

        raw_dfs[f] = df
    

    # The BALANCE_SHEET will be our reference, so if it has fewer than MINIMUN_DATA_YEARS rows we return False
    if(raw_dfs["BALANCE_SHEET"].shape[0] < MINIMUN_DATA_YEARS):
        if (verbose):
            print("BALANCE_SHEET does not have " + str(MINIMUN_DATA_YEARS) + " rows")
        return False

    # Adjust Prices to use same rows as BALANCE_SHEET
    dates = raw_dfs["BALANCE_SHEET"]["year"].to_list()
    raw_dfs["PRICES"] = raw_dfs["PRICES"][raw_dfs["PRICES"]["year"].isin(dates)].reset_index().drop(columns=['index'])

    # Now we do inverse operation: let the dfs match the PRICES
    dates = raw_dfs["PRICES"]["year"].to_list()
    for name, df in raw_dfs.items():
        if (name != "PRICES"):
            raw_dfs[name] = raw_dfs[name][raw_dfs[name]["year"].isin(dates)].reset_index().drop(columns=['index'])

    # Transform some nan values 
    raw_dfs = fill_raw_data(raw_dfs)   
    if(not raw_dfs):
        if(verbose):
            print("The company has alarming results or the data is incorrect. Aborting...")
        
        return False

    # new clean_df where we will insert the desired data
    clean_df = pd.DataFrame()

    # add the date column to new df
    clean_df["year"] = raw_dfs["BALANCE_SHEET"]["year"]

    # add the nextYearGrowth column (as a fraction, e.g. 0.10 means +10%)
    clean_df["nextYearGrowth"] = np.nan
    prices = raw_dfs["PRICES"]["averagePrice"]
    for i in range(clean_df.shape[0] - 1, 0, -1):
        # rows are sorted newest first, so row i - 1 is the year after row i
        clean_df.loc[i, "nextYearGrowth"] = (prices[i - 1] - prices[i]) / prices[i]

    # now check all the df are of the same length
    for k, v in raw_dfs.items():
        if(v.shape[0] != raw_dfs["BALANCE_SHEET"].shape[0]):
            if (verbose):
                print("Dataframe sizes do not match")

            return False

    # Check that all the results were given in USD
    currency_values = raw_dfs["BALANCE_SHEET"]["reportedCurrency"].unique()
    if(len(currency_values) != 1 or currency_values[0] != "USD"):
        if (verbose):
            print("The reported results are not exclusively in USD")

        return False

    # Refactor the cash flow dividendPayout:
    raw_dfs["CASH_FLOW"]["dividendPayout"] = raw_dfs["CASH_FLOW"]["dividendPayout"].fillna(0)

    # add the averagePrice column to the clean DF:
    clean_df['averagePrice'] = raw_dfs["PRICES"]["averagePrice"]

    # fill the new df with empty columns:
    for ratio_name in RATIOS_FUNCTIONS.keys():
        clean_df[ratio_name] = np.nan

    # finally calculate all the ratios and fill the clean_df
    # (.loc avoids chained assignment, which may silently write to a copy)
    for i in range(raw_dfs["BALANCE_SHEET"].shape[0]):
        for ratio_name, ratio_function in RATIOS_FUNCTIONS.items():
            clean_df.loc[i, ratio_name] = ratio_function(raw_dfs, i)

    # Add the EV to the clean frame
    clean_df.insert(loc=1, column='enterpriseValue', value=np.nan)
    for i in range(clean_df.shape[0]):
        clean_df.loc[i, "enterpriseValue"] = get_Enterprise_Value(raw_dfs, i)

    # save it to a csv file
    clean_df.to_csv('./companies/' + symbol + '/Y_' + symbol + '_CLEAN.csv')
    if(verbose):
        display(clean_df)

    return True


In [None]:
refactor_anual_company_data('MSFT')

Unnamed: 0,year,enterpriseValue,nextYearGrowth,averagePrice,Contributions_To_Debt,ROE,ROA,Sales_Margin,ROIC,ROCE,EV/EBIT,PBV,PER,Dividend_Yield,Treasury_Ratio,Acid_Test,RSCD
0,2020,1390179000000.0,,193.782575,-0.039809,0.598206,0.03674,0.092576,0.424021,0.094619,0.03815,18.213426,0.030182,0.010317,1.888079,1.0,1.0
1,2019,927897000000.0,0.488442,130.191542,-0.094683,0.621969,0.034234,0.085342,0.36541,0.07523,0.047083,12.672618,0.039435,0.01388,1.927672,1.0,1.0
2,2018,695737100000.0,0.319859,98.6405,-0.065776,0.250518,0.016005,0.079417,0.204022,0.063559,0.052425,10.632283,0.021883,0.01677,2.287102,1.0,1.0
3,2017,477856100000.0,0.421788,69.3778,-0.13177,0.414222,0.021988,0.062051,0.227316,0.039806,0.048443,7.714984,0.039651,0.02215,2.060858,1.0,0.937738
4,2016,331110500000.0,0.342283,51.686417,-0.067742,0.304317,0.021681,0.059136,0.255174,0.045317,0.059651,5.919322,0.041624,0.027272,1.907778,1.0,0.698063


True
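
The one-gap fill used throughout `fill_raw_data` (forward fill, backward fill, then the mean of both) can be tried in isolation. The sketch below wraps the same pandas expression in a helper; the name `fill_single_gap` and the toy series are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def fill_single_gap(series: pd.Series) -> pd.Series:
    """Fill NaNs with the mean of the nearest valid values before and after."""
    # Stack the forward-filled and backward-filled copies; both share the
    # original index, so grouping by index averages the two candidates.
    stacked = pd.concat([series.ffill(), series.bfill()])
    return stacked.groupby(level=0).mean()


middle_gap = fill_single_gap(pd.Series([1.0, np.nan, 3.0]))
leading_gap = fill_single_gap(pd.Series([np.nan, 2.0, 3.0]))

print(middle_gap.tolist())
print(leading_gap.tolist())
```

A gap in the middle becomes the average of its two neighbours, while a gap at either end falls back to the single neighbour that exists, because the mean ignores the remaining NaN.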

In [None]:
frames = []
for company in os.listdir('./companies'):
    refactor_anual_company_data(company, verbose=False)
    clean_path = './companies/' + company + '/Y_' + company + '_CLEAN.csv'
    if os.path.exists(clean_path):
        temp_df = pd.read_csv(clean_path)
        temp_df.insert(0, 'symbol', company)
        frames.append(temp_df)

# DataFrame.append was removed in pandas 2.0, so concatenate the collected frames instead
dataset = pd.concat(frames, ignore_index=True)

dataset = dataset.drop(columns=['Unnamed: 0']).dropna().reset_index(drop=True)


# The ratios we will actually use in the FNN:
FINAL_RATIOS = ['ROA', 'Sales_Margin', 'ROCE', 'EV/EBIT', 'PER', 'Dividend_Yield', 'Acid_Test', 'RSCD']
dataset = dataset[['symbol', 'year', 'enterpriseValue', 'averagePrice', 'nextYearGrowth'] + FINAL_RATIOS]

# Drop rows with weird values:
dataset = dataset[dataset["Dividend_Yield"].abs() < 1]

dataset.to_csv('./stock_dataset.csv', index=False)

dataset

Unnamed: 0,symbol,year,enterpriseValue,averagePrice,nextYearGrowth,ROA,Sales_Margin,ROCE,EV/EBIT,PER,Dividend_Yield,Acid_Test,RSCD
0,NVDA,2019,1.013647e+11,176.235542,1.295618,0.077885,0.081171,0.529915,0.041592,0.038774,0.003474,1.000000,1.000000
1,NVDA,2018,1.326559e+11,227.357992,-0.224854,0.067765,0.082613,0.435045,0.026052,0.022115,0.002475,1.000000,1.000000
2,NVDA,2017,8.327782e+10,150.586017,0.509821,0.042323,0.070080,0.270917,0.025817,0.018912,0.002963,1.000000,1.000000
3,NVDA,2016,2.526353e+10,56.054792,1.686408,0.020828,0.037275,1.000000,0.031270,0.020322,0.007050,1.000000,0.270548
4,KO,2018,1.951167e+11,42.033892,0.160992,0.019329,0.068276,0.085432,0.047505,0.035605,0.036767,0.476816,0.242517
...,...,...,...,...,...,...,...,...,...,...,...,...,...
379,ALKS,2018,6.781945e+09,45.229167,-0.444256,-0.019084,-0.022647,-0.100281,-0.016445,-0.019775,0.000000,1.000000,-1.000000
380,ALKS,2017,8.249998e+09,54.575000,-0.171248,-0.021971,-0.040930,-0.117205,-0.015911,-0.018792,0.000000,1.000000,-1.000000
381,ALKS,2016,6.533182e+09,44.280833,0.232475,-0.030184,-0.069957,-0.176731,-0.030536,-0.030882,0.000000,1.000000,-1.000000
382,ALKS,2015,9.760872e+09,66.432500,-0.333446,-0.030603,-0.089244,-0.184406,-0.021592,-0.022690,0.000000,1.000000,-1.000000


We finally start with the cool stuff: let's begin by getting our dataset.

<a name="fitting"></a>
# 3. Fitting and evaluating the model:
The previous two sections were dedicated exclusively to getting our data ready for this part. Preparing the data the right way can be crucial to improving our models, or even to making them work at all.

We start by splitting our dataset into the X variable (the features) and our y, the target variable.

In [72]:
original_dataset = pd.read_csv('./stock_dataset.csv')
dataset = original_dataset.copy()

dataset = dataset.iloc[:,4:]
display(dataset)
y = dataset.pop('nextYearGrowth')
X = dataset


Unnamed: 0,nextYearGrowth,ROA,Sales_Margin,ROCE,EV/EBIT,PER,Dividend_Yield,Acid_Test,RSCD
0,1.295618,0.077885,0.081171,0.529915,0.041592,0.038774,0.003474,1.000000,1.000000
1,-0.224854,0.067765,0.082613,0.435045,0.026052,0.022115,0.002475,1.000000,1.000000
2,0.509821,0.042323,0.070080,0.270917,0.025817,0.018912,0.002963,1.000000,1.000000
3,1.686408,0.020828,0.037275,1.000000,0.031270,0.020322,0.007050,1.000000,0.270548
4,0.160992,0.019329,0.068276,0.085432,0.047505,0.035605,0.036767,0.476816,0.242517
...,...,...,...,...,...,...,...,...,...
375,-0.444256,-0.019084,-0.022647,-0.100281,-0.016445,-0.019775,0.000000,1.000000,-1.000000
376,-0.171248,-0.021971,-0.040930,-0.117205,-0.015911,-0.018792,0.000000,1.000000,-1.000000
377,0.232475,-0.030184,-0.069957,-0.176731,-0.030536,-0.030882,0.000000,1.000000,-1.000000
378,-0.333446,-0.030603,-0.089244,-0.184406,-0.021592,-0.022690,0.000000,1.000000,-1.000000


Import the models and their utilities for the next sections:


In [73]:
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import StandardScaler
from tensorflow import keras  # the Keras bundled with TensorFlow, consistent with the imports above


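Split the dataset into a training set and a test set. Even though we tried to use scaled ratios as inputs, we still standardize the data so that all features end up on a comparable scale. The scaler is fitted on the training set only and then applied to both sets, so no information from the test set leaks into training.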

In [74]:

# Split dataset
x_train, x_test, y_train, y_test = train_test_split(    
    X, y, test_size=0.15)

scaler = StandardScaler()

#first we fit the scaler on the training dataset
scaler.fit(x_train)

x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)
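
To make the scaling step less of a black box, here is a minimal NumPy sketch of what standardization does: the mean and standard deviation are estimated on the training rows only and then reused for the test rows. The helper names (`fit_standardizer`, `apply_standardizer`) and the toy data are made up for illustration; this is not the scikit-learn implementation itself.

```python
import numpy as np

def fit_standardizer(train):
    """Estimate per-column mean and standard deviation on the training data only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)   # population standard deviation (ddof=0)
    std[std == 0.0] = 1.0     # leave constant columns unscaled instead of dividing by zero
    return mean, std

def apply_standardizer(data, mean, std):
    """Scale any dataset with statistics that were estimated elsewhere."""
    return (data - mean) / std

# Toy data (hypothetical values, not the stock dataset)
rng = np.random.default_rng(0)
toy_train = rng.uniform(-1.0, 1.0, size=(20, 3))
toy_test = rng.uniform(-1.0, 1.0, size=(5, 3))

mean, std = fit_standardizer(toy_train)
toy_train_scaled = apply_standardizer(toy_train, mean, std)
toy_test_scaled = apply_standardizer(toy_test, mean, std)  # reuses the training statistics

print(toy_train_scaled.mean(axis=0))  # close to 0 for every column
print(toy_train_scaled.std(axis=0))   # close to 1 for every column
```

The test columns will generally not have exactly zero mean and unit variance after scaling, and that is fine: the point is that the test set is transformed the same way the training set was.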

In [75]:
x_train.shape

(323, 8)

We build the model by adding a first Dense layer that receives the 8 input features, followed by the hidden Dense layers, which lead to an output layer with a single unit:

In [77]:
# Build the neural network
model = Sequential()
model.add(keras.layers.Dense(9, input_shape=[x_train.shape[1]])) # First layer: receives the 8 features (no activation, so it is linear)
model.add(Dense(64, activation='relu')) # Hidden 1
model.add(Dense(32, activation='relu')) # Hidden 2
model.add(Dense(8, activation='relu')) # Hidden 3
model.add(Dense(1, activation='linear')) # Output
model.compile(optimizer='adam', loss='mse', metrics=['mse'])
# monitor = EarlyStopping(monitor='val_loss', min_delta=0, 
#                         patience=16, verbose=1, mode='auto', 
#                         restore_best_weights=True)
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_8 (Dense)              (None, 9)                 81        
_________________________________________________________________
dense_9 (Dense)              (None, 64)                640       
_________________________________________________________________
dense_10 (Dense)             (None, 32)                2080      
_________________________________________________________________
dense_11 (Dense)             (None, 8)                 264       
_________________________________________________________________
dense_12 (Dense)             (None, 1)                 9         
Total params: 3,074
Trainable params: 3,074
Non-trainable params: 0
_________________________________________________________________
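
The `Param #` column of the summary can be checked by hand: a Dense layer with `n_inputs` inputs and `n_units` units has `n_inputs * n_units` weights plus `n_units` biases. A short sketch of that arithmetic (the helper name `dense_params` is only for illustration):

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

# 8 input features, followed by the layer widths used in the model above
layer_sizes = [8, 9, 64, 32, 8, 1]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [81, 640, 2080, 264, 9]
print(total)      # 3074
```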


Playing with the number of epochs is one simple way to tune the model a little:

In [78]:
history = model.fit(x_train_scaled, y_train, validation_split=0.2, epochs=64)

Epoch 1/64
...
Epoch 64/64
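
The number of epochs does not have to be picked by hand. The `EarlyStopping` callback that is commented out in the model cell above stops training once the validation loss has not improved for `patience` epochs. Below is a simplified, framework-free sketch of that patience rule. The function name `early_stopping_epoch` and the loss values are made up for illustration, and this is not Keras' actual implementation.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch) as 0-based indices.

    Training stops at the first epoch where the validation loss has failed to
    improve by more than min_delta for `patience` consecutive epochs.
    If that never happens, the last epoch is returned as stop_epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember it and reset the patience counter
            best_loss = loss
            best_epoch = epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses, one per epoch
toy_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6]

print(early_stopping_epoch(toy_losses, patience=3))   # (5, 2)
print(early_stopping_epoch(toy_losses, patience=16))  # (6, 6)
```

With a small `patience` the run stops at epoch 5 and the best weights are those of epoch 2. With `patience=16`, as in the commented-out callback, this short toy run is never interrupted.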


We evaluate our model on the test set we held out when splitting earlier:

In [79]:
mse = model.evaluate(x_test_scaled, y_test)[0]



We will now use a very common metric for regression: the RMSE (root mean squared error). We can obtain it simply by taking the square root of our model's MSE.
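
In formula form, with $n$ test samples, true labels $y_i$ and predictions $\hat{y}_i$ (notation assumed here for illustration):

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}
$$

Because of the square root, the RMSE is expressed in the same units as the label, here the yearly growth ratio.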

In [80]:
rmse = np.sqrt(mse)

rmse

0.33519744308678323

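Our RMSE suggests that our model is actually TERRIBLE at making predictions: the labels are growth ratios, so an RMSE of about 0.34 means the predicted growth is off by something on the order of 34 percentage points. A good RMSE for this project should be around 0.05.

We can still compare predictions against the test set, but from the previous result alone we already know we won't get much further with this model.
The last two columns of the next dataframe show the actual `nextYearGrowth` and the model's prediction side by side: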

In [81]:


prediction = model.predict(x_test_scaled)
p_df = pd.DataFrame(prediction.flatten())
p_df.index = y_test.index

x_df = pd.DataFrame(x_test_scaled)
x_df.index = y_test.index


result_df = pd.concat([x_df, y_test, p_df], axis=1)


result_df

Unnamed: 0,0,1,2,3,4,5,6,7,nextYearGrowth,0.1
174,0.123919,0.303347,0.14834,0.64676,-0.771396,-0.164653,-1.061616,0.88081,-0.160108,0.170177
320,0.474093,0.361273,0.272612,0.533882,0.466489,0.09011,0.142289,0.88081,-0.034408,0.152459
315,0.624366,0.442776,0.174825,0.174081,0.241309,-0.270782,0.020441,0.88081,0.195959,0.039383
240,0.431038,0.380787,0.146078,0.59476,1.195993,-0.399937,-1.029231,0.286067,-0.363322,-0.016624
253,0.27537,0.500616,0.091878,0.391359,0.23902,-0.399937,-2.072112,-0.415622,0.083774,-0.107718
218,0.140681,0.23971,-0.256613,-0.28824,-0.213776,0.025189,0.938438,-1.814503,-0.127801,-0.274409
114,0.64365,0.406261,0.140255,0.199428,0.471923,-0.022531,-0.229718,0.217827,-0.109817,0.052035
216,0.74797,0.472495,1.929948,0.155055,0.185033,-0.399937,0.938438,0.88081,1.016945,0.190938
351,-1.401023,-0.51075,-1.817947,-0.741161,-0.620703,-0.399937,0.938438,-1.814503,0.598279,-0.260374
328,0.27924,0.488134,0.056,-0.116461,0.41697,0.047064,-2.087021,-0.466847,0.264907,-0.058869
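
One way to put these numbers in perspective is to compare the model against a trivial baseline that always predicts the same value. The sketch below does this for the ten rows displayed above only, using the mean of those ten labels as the constant. The helper `rmse` is written here for illustration, and since the constant is computed from the very labels it is scored on, the baseline is somewhat favoured.

```python
import math

# (actual nextYearGrowth, predicted) pairs copied from the ten rows displayed above
rows = [
    (-0.160108, 0.170177),
    (-0.034408, 0.152459),
    (0.195959, 0.039383),
    (-0.363322, -0.016624),
    (0.083774, -0.107718),
    (-0.127801, -0.274409),
    (-0.109817, 0.052035),
    (1.016945, 0.190938),
    (0.598279, -0.260374),
    (0.264907, -0.058869),
]

def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return math.sqrt(sum((y - p) ** 2 for y, p in pairs) / len(pairs))

actual = [y for y, _ in rows]
mean_actual = sum(actual) / len(actual)

model_rmse = rmse(rows)
# Baseline: always predict the mean of these ten labels
baseline_rmse = rmse([(y, mean_actual) for y in actual])

print(f"model RMSE on the displayed rows:    {model_rmse:.3f}")
print(f"constant-mean RMSE on the same rows: {baseline_rmse:.3f}")
```

On these ten rows the constant prediction comes out slightly ahead of the network. Ten rows are far too few to generalize from, but it is consistent with the conclusion that the model has learned very little.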


<a name="tuning"></a>
# 4. HYPERPARAMETER AND MODEL TUNING

Tuning this model makes little sense, since it is flawed at its foundation and there is not much to tune.
Anyway, for learning purposes, we will use the BayesianOptimization tuner from Keras Tuner, a common choice for searching hyperparameters in this kind of regression problem.

In [82]:
!pip install -U keras-tuner

Requirement already up-to-date: keras-tuner in /usr/local/lib/python3.6/dist-packages (1.0.2)


This time we will try to build our network with no scaling:

In [83]:
from sklearn.preprocessing import StandardScaler
from tensorflow.keras import models, layers
from kerastuner import HyperModel, BayesianOptimization
from numpy.random import seed
seed(42)
from sklearn.model_selection import train_test_split
import tensorflow
tensorflow.random.set_seed(42)

In [84]:
original_dataset = pd.read_csv('./stock_dataset.csv')
dataset = original_dataset.copy()

dataset = dataset.iloc[:,4:]
y = dataset.pop('nextYearGrowth')
X = dataset

# Split dataset
x_train, x_test, y_train, y_test = train_test_split(    
    X, y, test_size=0.2)

x_train

Unnamed: 0,ROA,Sales_Margin,ROCE,EV/EBIT,PER,Dividend_Yield,Acid_Test,RSCD
77,-0.249449,-1.000000,-1.000000,-0.025546,-0.023953,0.000000,1.000000,1.000000
220,0.013200,0.013637,0.430001,0.064194,0.042155,0.021040,0.979116,0.867072
297,-0.168848,-0.106096,-0.253118,-0.195660,-0.255929,0.000000,0.183226,-0.469773
113,0.006646,0.038184,0.013676,0.032536,0.025561,0.028099,1.000000,0.525057
203,0.045908,0.046301,1.000000,0.050989,0.030382,0.007617,0.828333,1.000000
...,...,...,...,...,...,...,...,...
71,0.039296,0.037838,0.074154,0.145645,0.290885,0.010626,0.335244,0.883317
106,0.006576,0.022953,0.018755,0.023366,0.023127,0.000000,0.499745,0.541587
270,-0.007487,-1.000000,-1.000000,-0.053473,-0.020281,0.000000,1.000000,1.000000
348,-0.124209,-0.474042,-1.000000,-0.197928,-0.175734,0.000000,1.000000,-1.000000


Building our hypermodel class:

In [85]:
class RegressionHyperModel(HyperModel):
    def __init__(self, input_shape):
        super().__init__()  # let the HyperModel base class initialize itself
        self.input_shape = input_shape

    def build(self, hp):
        model = models.Sequential()
        model.add(
            layers.Dense(
                units=hp.Int('units', 8, 64, 4, default=8),
                activation=hp.Choice(
                    'dense_activation',
                    values=['relu', 'tanh', 'sigmoid'],
                    default='relu'),
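                input_shape=self.input_shape,  # use the shape stored in __init__, not a global variable
                # Caution: with all-zero initial weights, every unit in a layer starts out identical
                kernel_initializer='zeros', bias_initializer='zeros'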
            )
        )
        
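        model.add(
            layers.Dense(
                # Note: 'units' and 'dense_activation' reuse the names from the first layer,
                # so the tuner shares one value per name and both layers get the same setting
                units=hp.Int('units', 16, 64, 4, default=16),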
                activation=hp.Choice(
                    'dense_activation',
                    values=['relu', 'tanh', 'sigmoid'],
                    default='relu'),
                kernel_initializer='zeros', bias_initializer='zeros'
            )
        )
        
        model.add(
            layers.Dropout(
                hp.Float(
                    'dropout',
                    min_value=0.0,
                    max_value=0.1,
                    default=0.005,
                    step=0.01)
            )
        )
        
        model.add(layers.Dense(1, kernel_initializer='zeros', bias_initializer='zeros', activation='linear'))
        
        model.compile(
            optimizer='rmsprop',loss='mse',metrics=['mse']
        )
        
        return model

Now we run the search with the BayesianOptimization tuner (or simply bo).  
Note that this may take some time.

In [86]:
input_shape = (x_train.shape[1],)
hypermodel = RegressionHyperModel(input_shape)

tuner_bo = BayesianOptimization(
            hypermodel,
            objective='mse',  # training MSE; 'val_mse' would rank trials on the validation split instead
            max_trials=10,
            seed=42,
            executions_per_trial=2
        )

tuner_bo.search(x_train, y_train, epochs=64, validation_split=0.2, verbose=0)


INFO:tensorflow:Reloading Oracle from existing project ./untitled_project/oracle.json
INFO:tensorflow:Reloading Tuner from ./untitled_project/tuner0.json
INFO:tensorflow:Oracle triggered exit


Extract the best model from all the attempts

In [87]:
best_model = tuner_bo.get_best_models(num_models=1)[0]
mse_bo = best_model.evaluate(x_test, y_test)[1]  


best_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 60)                540       
_________________________________________________________________
dense_1 (Dense)              (None, 60)                3660      
_________________________________________________________________
dropout (Dropout)            (None, 60)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 61        
Total params: 4,261
Trainable params: 4,261
Non-trainable params: 0
_________________________________________________________________


Check the RMSE of the model chosen by the tuner:

In [88]:
np.sqrt(mse_bo)

0.3801270707264389

As expected, the result is still awful. As said earlier, when a neural network project fails at its foundation, tuning will not make it any better.

<a name="conclusion"></a>
# 5. CONCLUSION
The model didn't achieve the goal of the project, but many lessons can be drawn from it.
1. The first and main reason for the bad predictions is probably that the features are simply not sufficiently correlated with the target. The ratios used in this approach are well-established, but we are only taking into account the financials of the companies we are analysing. Important external factors such as social sentiment or political events are crucial to make this model worth a shot.  
2. The chosen model: a feed-forward neural network might be good enough for this problem, but the fact that our data sits on a timeline matters, and treating it as a time series with a Long Short-Term Memory (LSTM) network could be a better approach. We might try this in a future example.
3. Tuning: model tuning is an advanced topic in machine learning, but once you master it, your models can improve significantly. It is worth spending time learning and reading about this topic.  
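
The first point can be probed directly by measuring how strongly a feature moves together with the label. The sketch below computes the Pearson correlation between `ROA` and `nextYearGrowth` on the nine rows of the dataset displayed in section 3. It only illustrates the calculation: nine rows say nothing reliable about the full dataset, and the helper `pearson` is written here for illustration.

```python
import math

# (ROA, nextYearGrowth) pairs copied from the nine rows displayed in section 3
pairs = [
    (0.077885, 1.295618),
    (0.067765, -0.224854),
    (0.042323, 0.509821),
    (0.020828, 1.686408),
    (0.019329, 0.160992),
    (-0.019084, -0.444256),
    (-0.021971, -0.171248),
    (-0.030184, 0.232475),
    (-0.030603, -0.333446),
]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

roa = [x for x, _ in pairs]
growth = [y for _, y in pairs]

r = pearson(roa, growth)
print(f"Pearson r between ROA and nextYearGrowth (9 rows): {r:.2f}")
```

On the full dataframe, something like `dataset.corr()['nextYearGrowth']` (run before the label is popped) should give the same kind of overview for every feature at once.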

To summarize, this project was a great experience in my career in Data Science. The best thing you can expect from learning is failing, because when you fail you have more errors to learn from.  

Thank you for following along with me through this experience.

<a name="bibliography"></a>
# 6. BIBLIOGRAPHY

+ https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/keras/keras_tuner.ipynb#scrollTo=_leAIdFKAxAD1
+ https://www.freecodecamp.org/
+ https://keras-team.github.io/keras-tuner/
+ https://pandas.pydata.org/docs/
+ https://www.alphavantage.co/documentation/
+ *Introduction to Machine Learning with Python: A Guide for Data Scientists*