# Stock Market Quarterly Investment Recommender

A Data Science Project by Eric J Campbell for The Data Incubator 2019 NYC Fall Cohort

# Motivation

This model aims to help an investor decide which stocks to buy or sell by attempting to predict the outcome of a publicly traded company's quarterly earnings report. One of the key metrics released each quarter is the Earnings Per Share (EPS), which reflects the company's performance. If the EPS exceeds expectations, investors usually invest more in the company, which raises the share price. Conversely, if the EPS falls below expectations, investors are prone to selling the stock, which causes the share price to fall. This model uses a freely available Facebook dataset that tracks the checkins, likes, and "talking about" counts for many companies over time. The checkins are used as a feature to quantify trends in consumer activity at a company's physical storefronts, and are therefore a reflection of company performance. To summarize, by using data leading up to the release of a quarterly earnings report, this model recommends buying or selling a stock depending on its predicted performance, which can both maximize profit and minimize loss while investing.

The data come from multiple sources. The Facebook dataset, which is around 0.5 GB in size, is provided freely by Thinknum. Two financial datasets are obtained using web scraping and available APIs. Reported and expected EPS data and release dates are obtained by scraping Yahoo Finance with the Python Requests library and Beautiful Soup. Stock price, as well as quarterly revenue, profit, loss, and quarter dates, are retrieved using a financial API provided by Intrinio. The Facebook dataset contains both public and private companies and does not include stock information. The Intrinio API is therefore also used to search for similar-sounding companies, and the results are processed with the Fuzzy Wuzzy library to find the best company match and its stock ticker.
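
The name-matching step can be sketched as follows. This is only an illustration: it uses Python's standard-library `difflib` in place of Fuzzy Wuzzy so that it runs without extra packages, the candidate names and tickers stand in for whatever the Intrinio company search might return, and `normalize` and `best_ticker_match` are hypothetical helpers rather than the project's own code.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a company name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def best_ticker_match(facebook_name, candidates, min_score=0.6):
    """Return (ticker, score) for the candidate whose name is most similar.

    candidates: list of (company_name, ticker) pairs, e.g. search results.
    Returns (None, score) when no candidate reaches min_score.
    """
    target = normalize(facebook_name)
    best_ticker, best_score = None, 0.0
    for company_name, ticker in candidates:
        score = SequenceMatcher(None, target, normalize(company_name)).ratio()
        if score > best_score:
            best_ticker, best_score = ticker, score
    if best_score < min_score:
        return None, best_score
    return best_ticker, best_score


# Illustrative search results for the Facebook page name "TEXASROADHOUSE".
candidates = [
    ("Texas Roadhouse Inc", "TXRH"),
    ("Texas Instruments Inc", "TXN"),
    ("Roadrunner Transportation", "RRTS"),
]
ticker, score = best_ticker_match("TEXASROADHOUSE", candidates)
print(ticker, round(score, 2))
```

The `min_score` cutoff is an assumed safeguard: the Facebook dataset also contains private companies, so a weak best match is better treated as "no ticker" than accepted.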

A machine learning model was implemented to predict the surprise EPS. The input dataframe was constructed as follows. Each row is one observation of a company's quarterly results. Six features are used: the average checkins, average likes, and average "talking about" counts for that quarter, the reported EPS, and two engineered features, namely the ratios of total gross profit and of total operating expenses to total revenue for that quarter. Because the likes, checkins, and "talking about" counts vary over several orders of magnitude across companies, a StandardScaler transformer is used to scale the input data to comparable magnitudes. This transformer is placed in a pipeline along with a random forest predictor, and a grid search is performed to find optimal hyperparameters. The accuracy is found to be ~0.95 for the training set and ~0.40 for the test set, a gap that indicates the model overfits the training data.
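
The feature construction and scaling described above can be sketched in plain Python. The observations and field names below are made up for illustration, and `feature_row` and `standardize` are hypothetical helpers; the scaling mirrors what a standard scaler does (zero mean, unit variance per column) but is not the project's pipeline.

```python
from statistics import mean, pstdev

# Made-up quarterly observations; the field names are illustrative only.
quarters = [
    {"checkins": 1200.0, "likes": 3.1e6, "talking_about": 45000.0,
     "reported_eps": 1.10, "revenue": 500.0, "gross_profit": 200.0,
     "operating_expenses": 150.0},
    {"checkins": 80.0, "likes": 2.0e4, "talking_about": 900.0,
     "reported_eps": 0.35, "revenue": 40.0, "gross_profit": 10.0,
     "operating_expenses": 12.0},
    {"checkins": 15000.0, "likes": 9.5e6, "talking_about": 210000.0,
     "reported_eps": 2.40, "revenue": 2000.0, "gross_profit": 900.0,
     "operating_expenses": 500.0},
]


def feature_row(q):
    """Build the six features for one company-quarter."""
    return [
        q["checkins"],
        q["likes"],
        q["talking_about"],
        q["reported_eps"],
        q["gross_profit"] / q["revenue"],        # engineered ratio
        q["operating_expenses"] / q["revenue"],  # engineered ratio
    ]


def standardize(rows):
    """Scale each column to zero mean and unit variance (population std)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


X = [feature_row(q) for q in quarters]
X_scaled = standardize(X)
for row in X_scaled:
    print([round(v, 2) for v in row])
```

In scikit-learn, placing `StandardScaler` inside a `Pipeline` ahead of the random forest means the grid search refits the scaler on each training fold, so no statistics from the validation fold leak into the scaling.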

# Results

A validation of the idea behind this project is presented in the chart below, which shows the stock price for Tesla over the last two years, along with the points where a quarterly report was issued. Both positive EPS results (green dots) and negative EPS results (red dots) can be seen. Most of the time, the EPS report generates a predictable outcome!

In [6]:
plot_tesla_chart()

The final product of the model is shown below. 

By selecting a company, year, and quarter, the model will build a sample portfolio of what stocks should be owned, and what stocks should not be owned. Each row is color-coded to that effect:
1. Dark Green - Strong Buy: The stock is predicted to spike in price with the EPS report release.
2. Light Green - Weak Buy: This stock is predicted to increase slightly with the EPS report release.
3. Pink - Weak Sell: This stock is predicted to decrease slightly with the EPS report release.
4. Red - Strong Sell: This stock is expected to drop with the EPS report release.
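
The four categories above correspond to thresholds on the predicted surprise EPS. The sketch below restates the thresholds used by the table code in the Code section (±10 and 0) as a single function; `recommend` is a hypothetical name introduced here for illustration.

```python
def recommend(predicted_surprise_eps):
    """Map a predicted surprise EPS to (recommendation, row color)."""
    if predicted_surprise_eps >= 10.0:
        return "Strong Buy", "green"
    if predicted_surprise_eps > 0.0:
        return "Weak Buy", "lightgreen"
    if predicted_surprise_eps > -10.0:
        return "Weak Sell", "pink"
    return "Strong Sell", "red"


for value in (25.0, 3.5, 0.0, -4.0, -10.0):
    print(value, recommend(value))
```

Note that a prediction of exactly 0 falls in the Weak Sell band and a prediction of exactly -10 falls in the Strong Sell band.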

The upcoming report release date, predicted surprise EPS, and predictor recommendation are also provided. Note that only a select few companies are included in the result, as they are publicly traded brick-and-mortar businesses where checkin activity is relevant. With access to more data on physical companies, this predictor would be more diversified and therefore safer for an investor.

In [7]:
interactive_table()

Finally, an interactive plot is presented below. Given the year and quarter, it plots the portfolio yield obtained from the ML predictor model and compares it to a naive baseline that holds an ETF tracking the Dow Jones index. The resulting portfolio return for both approaches is then displayed for comparison.
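
The two curves can be sketched as follows, assuming each trade is recorded as a cost and a profit on its report date, as the `Cost` and `Profit` columns in the Code section suggest. The numbers are made up, and `cumulative_yield` and `percent_change` are hypothetical helpers; this sketch accumulates trades in date order.

```python
from datetime import date


def cumulative_yield(trades):
    """Running portfolio yield in percent after each report date.

    trades: list of (report_date, cost, profit) tuples, in any order.
    Yield after a trade = total profit so far * 100 / total cost so far.
    """
    running_cost = 0.0
    running_profit = 0.0
    curve = []
    for report_date, cost, profit in sorted(trades):
        running_cost += cost
        running_profit += profit
        curve.append((report_date, running_profit * 100 / running_cost))
    return curve


def percent_change(prices):
    """Percent change of each price relative to the first price."""
    start = prices[0]
    return [(p - start) * 100 / start for p in prices]


# Made-up numbers for illustration only.
trades = [
    (date(2018, 2, 20), 100.0, 5.0),
    (date(2018, 1, 30), 50.0, -2.0),
    (date(2018, 3, 15), 150.0, 12.0),
]
index_prices = [250.0, 255.0, 245.0, 260.0]

print(cumulative_yield(trades)[-1][1])
print(percent_change(index_prices)[-1])
```

The last point of each curve is the figure that would be reported as the return for the selected period.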

In [8]:
interactive_plot()

# Takeaway and Project Conclusions

In this report, a machine learning model was conceived and built from start to finish using freely available data together with data from APIs and web scraping. Non-trivial analysis was performed on the data, and the results were fed into the model to generate predictions. Finally, an interactive portfolio generator and an investment yield estimator were constructed to provide an investor with a simple guide to investing.

By using this model, an investor could potentially earn large investment yields in short periods of time by taking advantage of the volatility in a stock's share price during a quarterly earnings report release. Though an earnings report is not the sole variable in determining trends in a stock's price, it is by no means unimportant, as earnings are a very powerful metric for determining a company's performance. 

To conclude, although the model's accuracy on the test set was only ~0.40, it could be improved with further refinement and larger amounts of high-quality data. To that end, this model could serve as a test case for an unsupervised real-time model which generates predictions for investors before a company's report release date.

# Code

In [1]:
import pickle
import pandas as pd

with open('EPS_dates_p_tesla.pickle', 'rb') as handle:
    EPS_dates_p = pickle.load(handle)
with open('EPS_dates_n_tesla.pickle', 'rb') as handle:
    EPS_dates_n = pickle.load(handle)
with open('EPS_prices_p_tesla.pickle', 'rb') as handle:
    EPS_prices_p = pickle.load(handle)
with open('EPS_prices_n_tesla.pickle', 'rb') as handle:
    EPS_prices_n = pickle.load(handle)
with open('dates_tesla.pickle', 'rb') as handle:
    dates = pickle.load(handle)
with open('prices_tesla.pickle', 'rb') as handle:
    prices = pickle.load(handle)

X_test_predicted = pd.read_pickle('X_test_predicted.pkl')
with open('quarter_release_date.pickle', 'rb') as handle:
    quarter_release_date = pickle.load(handle)

with open('dowjones_date_quarter_price_dict.pickle', 'rb') as handle:
    dowjones_date_quarter_price_dict = pickle.load(handle)
with open('select_profit.pickle', 'rb') as handle:
    select_profit = pickle.load(handle)
with open('select_cost.pickle', 'rb') as handle:
    select_cost = pickle.load(handle)

In [2]:
def plot_tesla_chart():
    from datetime import datetime
    from bokeh.plotting import figure, output_notebook, show, ColumnDataSource
    from bokeh.transform import factor_cmap
    from bokeh.models import HoverTool
    import pickle
    
#     with open('EPS_dates_p_tesla.pickle', 'rb') as handle:
#         EPS_dates_p = pickle.load(handle)
#     with open('EPS_dates_n_tesla.pickle', 'rb') as handle:
#         EPS_dates_n = pickle.load(handle)
#     with open('EPS_prices_p_tesla.pickle', 'rb') as handle:
#         EPS_prices_p = pickle.load(handle)
#     with open('EPS_prices_n_tesla.pickle', 'rb') as handle:
#         EPS_prices_n = pickle.load(handle)
#     with open('dates_tesla.pickle', 'rb') as handle:
#         dates = pickle.load(handle)
#     with open('prices_tesla.pickle', 'rb') as handle:
#         prices = pickle.load(handle)

    output_notebook()

    p = figure(
       tools=['pan','box_zoom','reset','save'],
       x_range=[datetime(2018, 1, 1).date(), datetime(2019, 11, 30).date()],
        title="Tesla Stock Price with Quarterly Earnings Per Share Releases",
       x_axis_label='Date', y_axis_label='Stock Price',
        x_axis_type="datetime" ,
    )
    p.width = 900
    p.height = 500
    
    # legend_label replaces the deprecated legend keyword (Bokeh >= 1.4).
    p.line(dates, prices, legend_label='Price')
    p.scatter(EPS_dates_p, EPS_prices_p, size=8, color='lime', legend_label='Positive Report')
    p.scatter(EPS_dates_n, EPS_prices_n, size=8, color='red', legend_label='Negative Report')
    p.legend.location = 'top_right'

    show(p)

In [3]:
def interactive_table():
    from IPython.display import display
    from ipywidgets import widgets
    import pandas as pd
    import numpy as np
    import pickle
    
#     X_test_predicted = pd.read_pickle('X_test_predicted.pkl')
#     with open('quarter_release_date.pickle', 'rb') as handle:
#         quarter_release_date = pickle.load(handle)
    
    company_list = ['WALMART',
                        'TEXASROADHOUSE',
                        'DENNYS',
                        'DILLARDS',
                        'BIGLOTS',
                        'PLANETFITNESS',
                        'BIG5SPORTINGGOODS',
                        'LUMBERLIQUIDATORS',
                        'CHIPOTLE',
                        'DOLLARGENERAL',
                        'REDROBIN',
                        'DELTA',
                        'DESTINATIONXL',
                        'WINGSTOP',
                        'SEAWORLD',
                        'MCDONALDSUS',
                        'NORWEGIANCRUISELINE',
                        'ADVANCEAUTOPARTS',
                        'CHILDRENSPLACE',
                        'GUESS',
                        'KROGER',
                        'NORDSTROMRACK',
                        'CAESARSENTERTAINMENTCORP',
                        'ROYALCARIBBEAN',
                        'FOOTLOCKER',
                        'ESTEELAUDERCOMPANIES',
                        'CRACKERBARREL',
                        'AMERICANAIRLINES',
                        'LAZBOY',
                        'NIKE',
                        'MARRIOTTINTERNATIONAL',
                        'AUTONATION',
                        'EXTENDEDSTAYAMERICA',
                        'NATURALGROCERS',
                        'SHAKESHACK',
                        'POTBELLYSANDWICHSHOP',
                        'KOHLS']

    quarter_date_list = ['ALL QUARTERS', 'Jan 1 - Mar 31', 'Apr 1 - June 30', 'Jul 1 - Sep 30', 'Oct 1 - Dec 31']

    def highlight_greaterthan_1(s):
        if s['Predicted Surprise EPS'] >= 10.:
            return ['background-color: green']*4
        elif s['Predicted Surprise EPS'] > 0. and s['Predicted Surprise EPS'] < 10.:
            return ['background-color: lightgreen']*4
        elif s['Predicted Surprise EPS'] <= 0. and s['Predicted Surprise EPS'] > -10.:
            return ['background-color: pink']*4
        else:
            return ['background-color: red']*4

    # Copy the filtered slice so the column assignments below do not write to a view.
    filtered_X_test_predicted = X_test_predicted[X_test_predicted.Company.isin(company_list)].copy()

    buy_or_sell = []
    for eps in filtered_X_test_predicted['Predicted Surprise EPS']:
        if eps >= 10.:
            buy_or_sell.append('Strong Buy')
        elif eps > 0. and eps < 10.:
            buy_or_sell.append('Weak Buy')
        elif eps <= 0. and eps > -10.:
            buy_or_sell.append('Weak Sell')
        else:
            buy_or_sell.append('Strong Sell')
    filtered_X_test_predicted.loc[:,'Recommendation'] = buy_or_sell

    release_date = []
    for row in range(len(filtered_X_test_predicted)):
        df_row = filtered_X_test_predicted.iloc[row, :]
        company = df_row['Company']
        date_range = df_row['Quarter Dates']
        release_date.append(quarter_release_date[company][date_range])
    filtered_X_test_predicted.loc[:,'Report Date'] = release_date

    company_list.sort()
    company_list.insert(0,'ALL COMPANIES')

    dropdown_company = widgets.Dropdown(options = company_list)
    dropdown_year = widgets.Dropdown(options = ['ALL YEARS', '2017', '2018'])
    dropdown_quarter = widgets.Dropdown(options = quarter_date_list)

    output = widgets.Output()

    def date_quarter_mask(dataframe, month_list):
        mask = []
        for index in range(len(dataframe)):
            row = dataframe.iloc[index]
            if row['Report Date'].month in month_list:
                mask.append(True)
            else:
                mask.append(False)
        return mask

    def date_year_mask(dataframe, year):
        mask = []
        for index in range(len(dataframe)):
            row = dataframe.iloc[index]
            if row['Report Date'].year == int(year):
                mask.append(True)
            else:
                mask.append(False)
        return mask

    def quarter_year_mask(dataframe, year, month_list):
        mask = []
        for index in range(len(dataframe)):
            row = dataframe.iloc[index]
            if row['Report Date'].month in month_list and row['Report Date'].year == int(year):
                mask.append(True)
            else:
                mask.append(False)
        return mask

    def common_filtering(company, year, quarter):
        output.clear_output()
        filtered_X_test_predicted.sort_values(by=['Report Date','Company'], inplace=True)

        if (company == 'ALL COMPANIES') and (year == 'ALL YEARS') and (quarter == 'ALL QUARTERS'):
            common_filter = filtered_X_test_predicted.loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]

        elif (company == 'ALL COMPANIES') and (year == 'ALL YEARS') and (quarter != 'ALL QUARTERS'):
            if quarter == 'Jan 1 - Mar 31':
                mask = date_quarter_mask(filtered_X_test_predicted, [1, 2, 3])
                common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]
            elif quarter == 'Apr 1 - June 30':
                mask = date_quarter_mask(filtered_X_test_predicted, [4, 5, 6])
                common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]
            elif quarter == 'Jul 1 - Sep 30':
                mask = date_quarter_mask(filtered_X_test_predicted, [7, 8, 9])
                common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]
            elif quarter == 'Oct 1 - Dec 31':
                mask = date_quarter_mask(filtered_X_test_predicted, [10, 11, 12])
                common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]

        elif (company == 'ALL COMPANIES') and (year != 'ALL YEARS') and (quarter == 'ALL QUARTERS'):
            mask = date_year_mask(filtered_X_test_predicted, year)
            common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]

        elif (company == 'ALL COMPANIES') and (year != 'ALL YEARS') and (quarter != 'ALL QUARTERS'):     
            if quarter == 'Jan 1 - Mar 31':
                mask = quarter_year_mask(filtered_X_test_predicted, year, [1, 2, 3])
                common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]
            elif quarter == 'Apr 1 - June 30':
                mask = quarter_year_mask(filtered_X_test_predicted, year, [4, 5, 6]) 
                common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]
            elif quarter == 'Jul 1 - Sep 30':
                mask = quarter_year_mask(filtered_X_test_predicted, year, [7, 8, 9]) 
                common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]
            elif quarter == 'Oct 1 - Dec 31':
                mask = quarter_year_mask(filtered_X_test_predicted, year, [10, 11, 12]) 
                common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]

        elif (company != 'ALL COMPANIES') and (year == 'ALL YEARS') and (quarter == 'ALL QUARTERS'):
            common_filter = filtered_X_test_predicted[filtered_X_test_predicted.Company == company] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]

        elif (company != 'ALL COMPANIES') and (year == 'ALL YEARS') and (quarter != 'ALL QUARTERS'):
            if quarter == 'Jan 1 - Mar 31':
                tempdf = filtered_X_test_predicted[filtered_X_test_predicted.Company == company]
                mask = date_quarter_mask(tempdf, [1, 2, 3])
                common_filter = tempdf[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]
            elif quarter == 'Apr 1 - June 30':
                tempdf = filtered_X_test_predicted[filtered_X_test_predicted.Company == company]
                mask = date_quarter_mask(tempdf, [4, 5, 6])
                common_filter = tempdf[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]
            elif quarter == 'Jul 1 - Sep 30':
                tempdf = filtered_X_test_predicted[filtered_X_test_predicted.Company == company]
                mask = date_quarter_mask(tempdf, [7, 8, 9])
                common_filter = tempdf[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]
            elif quarter == 'Oct 1 - Dec 31':
                tempdf = filtered_X_test_predicted[filtered_X_test_predicted.Company == company]
                mask = date_quarter_mask(tempdf, [10, 11, 12])
                common_filter = tempdf[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]

        elif (company != 'ALL COMPANIES') and (year != 'ALL YEARS') and (quarter == 'ALL QUARTERS'):
            tempdf = filtered_X_test_predicted[filtered_X_test_predicted.Company == company]
            mask = date_year_mask(tempdf, year)
            common_filter = tempdf[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]

        elif (company != 'ALL COMPANIES') and (year != 'ALL YEARS') and (quarter != 'ALL QUARTERS'):
            tempdf = filtered_X_test_predicted[filtered_X_test_predicted.Company == company]
            if quarter == 'Jan 1 - Mar 31':
                mask = quarter_year_mask(tempdf, year, [1, 2, 3])
                common_filter = tempdf[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]
            elif quarter == 'Apr 1 - June 30':
                mask = quarter_year_mask(tempdf, year, [4, 5, 6]) 
                common_filter = tempdf[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]
            elif quarter == 'Jul 1 - Sep 30':
                mask = quarter_year_mask(tempdf, year, [7, 8, 9]) 
                common_filter = tempdf[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]
            elif quarter == 'Oct 1 - Dec 31':
                mask = quarter_year_mask(tempdf, year, [10, 11, 12]) 
                common_filter = tempdf[mask] \
                        .loc[:, ['Company', 'Report Date', 'Predicted Surprise EPS', 'Recommendation']]

        with output:
            display(common_filter.sort_values(by=['Report Date','Company']) \
                    .style.apply(highlight_greaterthan_1, axis=1))

    def dropdown_company_eventhandler(change):
        common_filtering(change.new, dropdown_year.value, dropdown_quarter.value)
    def dropdown_year_eventhandler(change):
        common_filtering(dropdown_company.value, change.new, dropdown_quarter.value)
    def dropdown_quarter_eventhandler(change):
        common_filtering(dropdown_company.value, dropdown_year.value, change.new)

    dropdown_company.observe(dropdown_company_eventhandler, names='value')
    dropdown_year.observe(dropdown_year_eventhandler, names='value')
    dropdown_quarter.observe(dropdown_quarter_eventhandler, names='value')

    display(dropdown_company)
    display(dropdown_year)
    display(dropdown_quarter)

    display(output)
    pd.options.mode.chained_assignment = None 
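
The eight branches of `common_filtering` above all apply the same three independent tests (company, year, quarter). As a possible simplification, the selection can be written as one predicate. The sketch below works on plain dictionaries rather than the dataframe so that it stays self-contained; `QUARTER_MONTHS` and `matches` are hypothetical names, not part of the notebook.

```python
from datetime import date

# Months covered by each quarter label used in the table dropdown.
QUARTER_MONTHS = {
    'Jan 1 - Mar 31': (1, 2, 3),
    'Apr 1 - June 30': (4, 5, 6),
    'Jul 1 - Sep 30': (7, 8, 9),
    'Oct 1 - Dec 31': (10, 11, 12),
}


def matches(row, company, year, quarter):
    """Return True if a row passes every filter that is not set to 'ALL ...'."""
    if company != 'ALL COMPANIES' and row['Company'] != company:
        return False
    if year != 'ALL YEARS' and row['Report Date'].year != int(year):
        return False
    if quarter != 'ALL QUARTERS' and row['Report Date'].month not in QUARTER_MONTHS[quarter]:
        return False
    return True


# Made-up rows for illustration only.
rows = [
    {'Company': 'NIKE', 'Report Date': date(2018, 3, 22)},
    {'Company': 'NIKE', 'Report Date': date(2017, 9, 26)},
    {'Company': 'KROGER', 'Report Date': date(2018, 3, 8)},
]
selected = [r for r in rows if matches(r, 'NIKE', '2018', 'Jan 1 - Mar 31')]
print(selected)
```

Each dropdown then only narrows the selection when it is set to something other than its "ALL" entry, which keeps the three filters independent of one another.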

In [4]:
def interactive_plot():
    import pickle
    from datetime import datetime
    from IPython.display import display
    from bokeh.io import output_file, show
    from bokeh.layouts import gridplot
    from bokeh.palettes import Viridis3
    from bokeh.plotting import figure, output_notebook
    from collections import OrderedDict
    from bokeh.models import ColumnDataSource, LabelSet, Label
    import pandas as pd
    import numpy as np
    from ipywidgets import widgets
    
#     with open('dowjones_date_quarter_price_dict.pickle', 'rb') as handle:
#         dowjones_date_quarter_price_dict = pickle.load(handle)
#     with open('select_profit.pickle', 'rb') as handle:
#         select_profit = pickle.load(handle)
#     with open('select_cost.pickle', 'rb') as handle:
#         select_cost = pickle.load(handle)
#     X_test_predicted = pd.read_pickle('X_test_predicted.pkl')
#     with open('quarter_release_date.pickle', 'rb') as handle:
#         quarter_release_date = pickle.load(handle)


    company_list = ['WALMART',
                        'TEXASROADHOUSE',
                        'DENNYS',
                        'DILLARDS',
                        'BIGLOTS',
                        'PLANETFITNESS',
                        'BIG5SPORTINGGOODS',
                        'LUMBERLIQUIDATORS',
                        'CHIPOTLE',
                        'DOLLARGENERAL',
                        'REDROBIN',
                        'DELTA',
                        'DESTINATIONXL',
                        'WINGSTOP',
                        'SEAWORLD',
                        'MCDONALDSUS',
                        'NORWEGIANCRUISELINE',
                        'ADVANCEAUTOPARTS',
                        'CHILDRENSPLACE',
                        'GUESS',
                        'KROGER',
                        'NORDSTROMRACK',
                        'CAESARSENTERTAINMENTCORP',
                        'ROYALCARIBBEAN',
                        'FOOTLOCKER',
                        'ESTEELAUDERCOMPANIES',
                        'CRACKERBARREL',
                        'AMERICANAIRLINES',
                        'LAZBOY',
                        'NIKE',
                        'MARRIOTTINTERNATIONAL',
                        'AUTONATION',
                        'EXTENDEDSTAYAMERICA',
                        'NATURALGROCERS',
                        'SHAKESHACK',
                        'POTBELLYSANDWICHSHOP',
                        'KOHLS']

    quarter_date_list = ['ALL QUARTERS', 'Jan 1 - Mar 31', 'Apr 1 - Jun 30', 'Jul 1 - Sep 30', 'Oct 1 - Dec 31']

    def highlight_greaterthan_1(s):
        if s['Predicted Surprise EPS'] >= 10.:
            return ['background-color: green']*4
        elif s['Predicted Surprise EPS'] > 0. and s['Predicted Surprise EPS'] < 10.:
            return ['background-color: lightgreen']*4
        elif s['Predicted Surprise EPS'] <= 0. and s['Predicted Surprise EPS'] > -10.:
            return ['background-color: pink']*4
        else:
            return ['background-color: red']*4

    # Copy the filtered slice so the column assignments below do not write to a view.
    filtered_X_test_predicted = X_test_predicted[X_test_predicted.Company.isin(company_list)].copy()

    buy_or_sell = []
    for eps in filtered_X_test_predicted['Predicted Surprise EPS']:
        if eps >= 10.:
            buy_or_sell.append('Strong Buy')
        elif eps > 0. and eps < 10.:
            buy_or_sell.append('Weak Buy')
        elif eps <= 0. and eps > -10.:
            buy_or_sell.append('Weak Sell')
        else:
            buy_or_sell.append('Strong Sell')
    filtered_X_test_predicted.loc[:,'Recommendation'] = buy_or_sell

    release_date = []
    for row in range(len(filtered_X_test_predicted)):
        df_row = filtered_X_test_predicted.iloc[row, :]
        company = df_row['Company']
        date_range = df_row['Quarter Dates']
        release_date.append(quarter_release_date[company][date_range])
    filtered_X_test_predicted.loc[:,'Report Date'] = release_date

    ###(select_cost, select_profit) = get_profit_per_release(filtered_X_test_predicted)

    filtered_X_test_predicted.loc[:,'Cost'] = select_cost
    filtered_X_test_predicted.loc[:,'Profit'] = select_profit

    company_list.sort()
    company_list.insert(0,'ALL COMPANIES')

    dropdown_year = widgets.Dropdown(options = ['ALL YEARS', '2017', '2018'])
    dropdown_quarter = widgets.Dropdown(options = quarter_date_list)

    output = widgets.Output()

    def date_quarter_mask(dataframe, month_list):
        mask = []
        for index in range(len(dataframe)):
            row = dataframe.iloc[index]
            if row['Report Date'].month in month_list:
                mask.append(True)
            else:
                mask.append(False)
        return mask

    def date_year_mask(dataframe, year):
        mask = []
        for index in range(len(dataframe)):
            row = dataframe.iloc[index]
            if row['Report Date'].year == int(year):
                mask.append(True)
            else:
                mask.append(False)
        return mask

    def quarter_year_mask(dataframe, year, month_list):
        mask = []
        for index in range(len(dataframe)):
            row = dataframe.iloc[index]
            if row['Report Date'].month in month_list and row['Report Date'].year == int(year):
                mask.append(True)
            else:
                mask.append(False)
        return mask

    def get_yield(dataframe):

        running_cost = 0
        running_profit = 0
        investment_yield = {}
        for index in range(len(dataframe)):
            row = dataframe.iloc[index]
            cost = row['Cost']
            profit = row['Profit']
            date = row['Report Date']

            running_cost += cost
            running_profit += profit
            temp_yield = running_profit * 100 / running_cost
            investment_yield[date] = temp_yield

        investment_yield1 = OrderedDict(sorted(investment_yield.items()))

        return investment_yield1

    def common_filtering(year, quarter):
        output.clear_output()
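        # Sort chronologically so get_yield accumulates cost and profit in date order.
        filtered_X_test_predicted.sort_values(by=['Report Date', 'Company'], inplace=True)

        # A quarter selected without a year has no matching branch below, so stop early.
        if (year == 'ALL YEARS') and (quarter != 'ALL QUARTERS'):
            with output:
                print('Select a year to view a single quarter')
            return

        if (year == 'ALL YEARS') and (quarter == 'ALL QUARTERS'):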
            common_filter = filtered_X_test_predicted \
                .loc[:, ['Company', 'Report Date', 'Cost', 'Profit']]
            investment_yield = get_yield(common_filter)

        elif (year != 'ALL YEARS') and (quarter == 'ALL QUARTERS'):
            mask = date_year_mask(filtered_X_test_predicted, year)
            common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date','Cost', 'Profit']]
            investment_yield = get_yield(common_filter)

        elif (year != 'ALL YEARS') and (quarter != 'ALL QUARTERS'):     
            if quarter == 'Jan 1 - Mar 31':
                mask = quarter_year_mask(filtered_X_test_predicted, year, [1, 2, 3])
                common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date','Cost', 'Profit']]
            elif quarter == 'Apr 1 - Jun 30':
                mask = quarter_year_mask(filtered_X_test_predicted, year, [4, 5, 6]) 
                common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date','Cost', 'Profit']]
            elif quarter == 'Jul 1 - Sep 30':
                mask = quarter_year_mask(filtered_X_test_predicted, year, [7, 8, 9]) 
                common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date','Cost', 'Profit']]
            elif quarter == 'Oct 1 - Dec 31':
                mask = quarter_year_mask(filtered_X_test_predicted, year, [10, 11, 12]) 
                common_filter = filtered_X_test_predicted[mask] \
                        .loc[:, ['Company', 'Report Date','Cost', 'Profit']]
            investment_yield = get_yield(common_filter)

        with output:

            if (year != 'ALL YEARS') and (quarter != 'ALL QUARTERS'):
                xdatastr = dowjones_date_quarter_price_dict[(year, quarter)].keys()
                xdata = [datetime.strptime(i, '%Y-%m-%d').date() for i in xdatastr]
                ypricedata = list(dowjones_date_quarter_price_dict[(year, quarter)].values())
                ydata = [(i-ypricedata[-1])*100/ypricedata[-1] for i in ypricedata]

            elif (year != 'ALL YEARS') and (quarter == 'ALL QUARTERS'):

                q1_xdatastr = dowjones_date_quarter_price_dict[(year, 'Jan 1 - Mar 31')].keys()
                q2_xdatastr = dowjones_date_quarter_price_dict[(year, 'Apr 1 - Jun 30')].keys()
                q3_xdatastr = dowjones_date_quarter_price_dict[(year, 'Jul 1 - Sep 30')].keys()
                q4_xdatastr = dowjones_date_quarter_price_dict[(year, 'Oct 1 - Dec 31')].keys()

                q1_xdata = [datetime.strptime(i, '%Y-%m-%d').date() for i in q1_xdatastr]
                q2_xdata = [datetime.strptime(i, '%Y-%m-%d').date() for i in q2_xdatastr]
                q3_xdata = [datetime.strptime(i, '%Y-%m-%d').date() for i in q3_xdatastr]
                q4_xdata = [datetime.strptime(i, '%Y-%m-%d').date() for i in q4_xdatastr]

                q1_ypricedata = list(dowjones_date_quarter_price_dict[(year, 'Jan 1 - Mar 31')].values())
                q2_ypricedata = list(dowjones_date_quarter_price_dict[(year, 'Apr 1 - Jun 30')].values())
                q3_ypricedata = list(dowjones_date_quarter_price_dict[(year, 'Jul 1 - Sep 30')].values())
                q4_ypricedata = list(dowjones_date_quarter_price_dict[(year, 'Oct 1 - Dec 31')].values())

                q1_ydata = [(i-q1_ypricedata[-1])*100/q1_ypricedata[-1] for i in q1_ypricedata]
                q2_ydata = [(i-q1_ypricedata[-1])*100/q1_ypricedata[-1] for i in q2_ypricedata]
                q3_ydata = [(i-q1_ypricedata[-1])*100/q1_ypricedata[-1] for i in q3_ypricedata]
                q4_ydata = [(i-q1_ypricedata[-1])*100/q1_ypricedata[-1] for i in q4_ypricedata]

                xdata = q4_xdata + q3_xdata + q2_xdata + q1_xdata
                ydata = q4_ydata + q3_ydata + q2_ydata + q1_ydata

            elif (year == 'ALL YEARS') and (quarter == 'ALL QUARTERS'):
                xdata = []
                ydata = []
                for y in ['2018', '2017']:
                    q1_xdatastr = dowjones_date_quarter_price_dict[(y, 'Jan 1 - Mar 31')].keys()
                    q2_xdatastr = dowjones_date_quarter_price_dict[(y, 'Apr 1 - Jun 30')].keys()
                    q3_xdatastr = dowjones_date_quarter_price_dict[(y, 'Jul 1 - Sep 30')].keys()
                    q4_xdatastr = dowjones_date_quarter_price_dict[(y, 'Oct 1 - Dec 31')].keys()

                    q1_xdata = [datetime.strptime(i, '%Y-%m-%d').date() for i in q1_xdatastr]
                    q2_xdata = [datetime.strptime(i, '%Y-%m-%d').date() for i in q2_xdatastr]
                    q3_xdata = [datetime.strptime(i, '%Y-%m-%d').date() for i in q3_xdatastr]
                    q4_xdata = [datetime.strptime(i, '%Y-%m-%d').date() for i in q4_xdatastr]

                    q1_ypricedata = list(dowjones_date_quarter_price_dict[(y, 'Jan 1 - Mar 31')].values())

                    fixed_start = list(dowjones_date_quarter_price_dict[('2017', 'Jan 1 - Mar 31')].values())
                    q2_ypricedata = list(dowjones_date_quarter_price_dict[(y, 'Apr 1 - Jun 30')].values())
                    q3_ypricedata = list(dowjones_date_quarter_price_dict[(y, 'Jul 1 - Sep 30')].values())
                    q4_ypricedata = list(dowjones_date_quarter_price_dict[(y, 'Oct 1 - Dec 31')].values())

                    q1_ydata = [(i-fixed_start[-1])*100/fixed_start[-1] for i in q1_ypricedata]
                    q2_ydata = [(i-fixed_start[-1])*100/fixed_start[-1] for i in q2_ypricedata]
                    q3_ydata = [(i-fixed_start[-1])*100/fixed_start[-1] for i in q3_ypricedata]
                    q4_ydata = [(i-fixed_start[-1])*100/fixed_start[-1] for i in q4_ypricedata]

                    xdata += q4_xdata + q3_xdata + q2_xdata + q1_xdata
                    ydata += q4_ydata + q3_ydata + q2_ydata + q1_ydata

            output_notebook()

            p = figure(
               tools=['pan','box_zoom','reset','save'],
                title="Machine Learning and Naive Model Portfolio Growth",
               x_axis_label='Date', y_axis_label='Portfolio Growth (% Change)',
                x_axis_type="datetime" ,
            )
            p.width = 400
            p.height = 400

            # legend_label replaces the deprecated legend keyword (Bokeh >= 1.4).
            p.line(xdata, ydata, color='black', legend_label='Dow Jones Index Fund (Naive)')
            MLxdata = list(investment_yield.keys())
            MLydata = list(investment_yield.values())

            p.line(MLxdata, MLydata, legend_label='ML Predictor Model', color='red')
            p.legend.location = 'top_left'

            p2 = figure()
            p2.width = 400
            p2.height = 400
            try:
                names = ['ML Predictor Investment Return:', 
                                               '{}%'.format(np.round(MLydata[-1], decimals=2)), 
                                               'Naive Investment Return:', 
                                               '{}%'.format(np.round(ydata[0], decimals=2))]
            except IndexError:
                # MLydata is empty when no reports fall inside the selection.
                print('No data for this selection')
                names = ['ML Predictor Investment Return:', 
                                               '0%', 
                                               'Naive Investment Return:', 
                                               '{}%'.format(np.round(ydata[0], decimals=2))]
                
            source = ColumnDataSource(data=dict(height=[250, 200, 100, 50],
                                        weight=[45, 160, 75, 160],
                                        names=names))
            # render_mode is omitted: 'canvas' is the default, and newer Bokeh releases no longer accept the argument.
            labels = LabelSet(x='weight', y='height', text='names', level='glyph',
                  x_offset=5, y_offset=5, source=source,
                             x_units='screen', y_units='screen',text_font_size="15pt")

            p2.add_layout(labels)
            grid = gridplot([[p, p2]])
            show(grid)


    def dropdown_year_eventhandler(change):
        common_filtering(change.new, dropdown_quarter.value)
    def dropdown_quarter_eventhandler(change):
        common_filtering(dropdown_year.value, change.new)

    dropdown_year.observe(dropdown_year_eventhandler, names='value')
    dropdown_quarter.observe(dropdown_quarter_eventhandler, names='value')

    display(dropdown_year)
    display(dropdown_quarter)

    display(output)

In [5]:
pd.options.mode.chained_assignment = None 