# Top 10 Stocks Past 7 Days

In this notebook, I visualize the total gain and loss of all stocks in <a href="https://www.kaggle.com/paultimothymooney/stock-market-data"> this fantastic stock market dataset provided by Kaggle staff</a>. There are some adjustable parameters that I've listed in the section right below this, so if you wish, you can fork this notebook and adjust the parameters to whatever makes you most comfortable.

Currently, all I have in this notebook is some momentum analysis. **Do not simply trade in the stock market because you see some stock in this notebook**. Do your due diligence (research the stocks) and make educated choices. Additionally, momentum (gain/loss over the past few days) is arguably not a great indicator, at least in isolation. Therefore, if there is interest, I will add more charts based on other indicators (some popular ones are MA/EMA (Moving Average/Exponential Moving Average), RSI (Relative Strength Index), BB (Bollinger Bands), etc.).

Finally, if you are interested in using this stock market data to train a neural network and do your own algorithmic trading, feel free to check out <a href="https://www.kaggle.com/ironicninja/stock-market-cluster-analysis"> some work I did in the past</a>. I plan on using this larger dataset to train a more accurate neural network as well, so be on the lookout for that in the future. Please ```upvote``` if you like this notebook :)

<div style="background: #ffcccb">
    Thanks to @ZombieChris for pointing out that I should use <code>Adjusted Close</code> instead of <code>Close</code> for the stock prices.
</div>

<h2> Table of Contents </h2>

<ol>
<li> <a href="https://www.kaggle.com/ironicninja/top-10-stocks-past-7-days#Adjustable-Parameters"> Adjustable Parameters </a> </li>
    <li> <a href="https://www.kaggle.com/ironicninja/top-10-stocks-past-7-days#Actual-Data"> Data (Click this if you just want to look at results) </a> </li>
<li> <a href="https://www.kaggle.com/ironicninja/top-10-stocks-past-7-days#Tutorial"> Tutorial </a> </li>
</ol>

# Adjustable Parameters

Feel free to fork the notebook and play with these parameters! If you want to see all of the data in the visualizations, set the ```th_total``` and ```th_perc``` values very high.

* ```DAYS_BACK (default = 4)``` - Determines how many trading days back you look to calculate the momentum. I suggest keeping this at 4 since it allows you to see the gain/loss of stocks over approximately the past week.
* ```LOOK_AT (default = 10)``` - Determines how many companies you would like to visualize for gain and for loss (2*LOOK_AT companies will be plotted).
* ```th_total (default = 100)``` - Any stock whose absolute momentum is at or above this threshold will not be included in the visualization.
* ```th_perc (default = 500)``` - Any stock whose absolute percentage momentum is at or above this threshold will not be included in the visualization.
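
To make the two thresholds concrete, here is a minimal sketch with made-up numbers. The tickers, values, and the helper `within` are hypothetical and not part of the notebook; the comparison simply mirrors the `>=` check used by `remove_outliers` further down.

```python
# Hypothetical [total, percent] momentum values; not real dataset output.
sample = {
    "AAA": [250.0, 12.0],   # total change too large: hidden from the total chart
    "BBB": [3.5, 900.0],    # percentage change too large: hidden from the relative chart
    "CCC": [-4.2, -8.0],    # within both thresholds: shown in both charts
}

th_total = 100
th_perc = 500

def within(values, index, threshold):
    """Return True when the absolute value at `index` is below the threshold."""
    return abs(values[index]) < threshold

shown_total = [k for k, v in sample.items() if within(v, 0, th_total)]
shown_perc = [k for k, v in sample.items() if within(v, 1, th_perc)]

print(shown_total)  # ['BBB', 'CCC']
print(shown_perc)   # ['AAA', 'CCC']
```

Raising either threshold far enough lets every stock through, which is why very high values show all of the data.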

In [None]:
DAYS_BACK = 4
LOOK_AT = 10
th_total = 100
th_perc = 500

# Actual Data

Interpret these results as you will; the methodology is in the tutorial section. Please pay attention to the information printed in the terminal, as it lists the companies that were removed from the visualizations for being extreme outliers (based on the threshold values set above).

In [None]:
import numpy as np
import pandas as pd
import os
import sys
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as py
import math
import itertools
py.init_notebook_mode(connected=True)

def momentum(df, time):
    org_arr = df['Adjusted Close'] # Current time
    changed_arr = (np.concatenate(([0 for i in range(time)], df['Adjusted Close'])))[:-time] # Looks back in time
    
    new_arr = org_arr-changed_arr
    filler = np.mean(new_arr[time:])
    new_arr[:time] = filler
    
    new_perc_arr = (org_arr-changed_arr)/changed_arr*100
    filler_perc = np.mean(new_perc_arr[time:])
    new_perc_arr[:time] = filler_perc
    
    df['MOM' + str(time)] = new_arr
    df['% MOM' + str(time)] = new_perc_arr
    return df

def convert(df):
    df = momentum(df, DAYS_BACK)
    return df

def remove_outliers(my_dict, index, th):
    del_list = []
    for k, v in my_dict.items():
        if abs(v[index]) >= th:
            print(f"{k.replace('.csv', '')} ({v[index]})")
            del_list.append(k)
        else:
            break
            
    for k in del_list:
        del my_dict[k]
        
    return my_dict

def show_slice(neg_dict, pos_dict, index, title):
    # Take the LOOK_AT largest losses, then re-sort them so the bars run from smallest to largest loss
    total_neg_dict = {k: v for k, v in sorted(dict(itertools.islice(neg_dict.items(), LOOK_AT)).items(), key=lambda item: item[1][index], reverse=True)}
    total_pos_dict = dict(itertools.islice(pos_dict.items(), LOOK_AT))
    total_dict = {**total_pos_dict, **total_neg_dict}

    labels = [k.replace('.csv', '') for k in total_dict.keys()]
    values = [v[index] for v in total_dict.values()]

    fig = go.Figure()

    fig.add_trace(go.Bar(x=labels, y=values))
    fig.update_layout(title={'text': title, 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="%s Gain/Loss" % ("Total" if index == 0 else "Relative"))

    fig.show()

def vis_stock(file, name):
    print(f'{name} Data:')
    stock_list = {}
    for dirname, _, filenames in os.walk(file):
        for filename in filenames:
            try:
                stock_list[filename] = convert(pd.read_csv(os.path.join(dirname, filename)))
            except Exception as e:
                print(filename, e)

    momentum_dict = {}
    empty_companies = []
    for key in stock_list:
        company_df = stock_list[key]
        n = len(company_df)-1

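        # Build the column names from DAYS_BACK so changing the parameter does not break the lookup
        total = company_df[f'MOM{DAYS_BACK}'][n]
        perc = company_df[f'% MOM{DAYS_BACK}'][n]

        if math.isnan(total) or math.isnan(perc):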
            empty_companies.append(key.replace('.csv', ''))
        else:
            momentum_dict[key] = [total, perc]

    print("The following companies do not have momentum values: ")
    s = ""
    for company in empty_companies:
        s += f'{company} '

    print(s)
    print("\n-----------\n")

    print("Negative Total Gain/Loss Removed Companies: ")
    sorted_neg_total = remove_outliers({k: v for k, v in sorted(momentum_dict.items(), key=lambda item: item[1][0])}, 0, th_total)
    print("\nPositive Total Gain/Loss Removed Companies: ")
    sorted_pos_total = remove_outliers({k: v for k, v in sorted(momentum_dict.items(), key=lambda item: item[1][0], reverse=True)}, 0, th_total)

    print("\nNegative Relative Gain/Loss Removed Companies: ")
    sorted_neg_perc = remove_outliers({k: v for k, v in sorted(momentum_dict.items(), key=lambda item: item[1][1])}, 1, th_perc)
    print("\nPositive Relative Gain/Loss Removed Companies: ")
    sorted_pos_perc = remove_outliers({k: v for k, v in sorted(momentum_dict.items(), key=lambda item: item[1][1], reverse=True)}, 1, th_perc)
    
    show_slice(sorted_neg_total, sorted_pos_total, 0, f"Top {LOOK_AT} Gain/Loss for {name} Companies")
    show_slice(sorted_neg_perc, sorted_pos_perc, 1, f"Top {LOOK_AT} Relative Gain/Loss for {name} Companies")

In [None]:
vis_stock("../input/stock-market-data/stock_market_data/nasdaq/csv", "NASDAQ")

In [None]:
vis_stock("../input/stock-market-data/stock_market_data/nyse/csv", "NYSE")

In [None]:
vis_stock("../input/stock-market-data/stock_market_data/sp500/csv", "S&P500")

In [None]:
vis_stock("../input/stock-market-data/stock_market_data/forbes2000/csv", "Forbes2000")

# Tutorial

If you are mainly here for the code, this section repeats the analysis above on the NASDAQ data, with a markdown description of what each code block does.

<h2> Essential Imports </h2>

In [None]:
import numpy as np
import pandas as pd
import os
import sys
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as py
import math
import itertools
py.init_notebook_mode(connected=True)

<h2> Momentum Function </h2>

Simple vectorized implementation of momentum. I include both total (difference) momentum and relative (percentage) momentum in these visualizations.
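
As a cross-check on the definition, the same two quantities can be sketched with pandas' built-in `diff` and `pct_change`. This is an alternative formulation, not the function used in this notebook: the toy prices and the `days_back` variable are assumptions for illustration, and the first `days_back` rows come out as `NaN` here instead of being filled with the mean.

```python
import pandas as pd

# Toy price series; the values are made up for illustration.
prices = pd.DataFrame({"Adjusted Close": [10.0, 11.0, 12.0, 9.0, 15.0, 18.0]})
days_back = 4  # plays the same role as DAYS_BACK

# Total momentum: price today minus the price `days_back` rows earlier.
total = prices["Adjusted Close"].diff(periods=days_back)
# Relative momentum: the same change as a percentage of the earlier price.
relative = prices["Adjusted Close"].pct_change(periods=days_back) * 100

print(total.iloc[-1])     # 18 - 11 = 7.0
print(relative.iloc[-1])  # 7 / 11 * 100, roughly 63.6
```

Only the last row matters for the charts, so the difference in how the first rows are filled does not change the values that get plotted.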

In [None]:
def momentum(df, time):
    org_arr = df['Adjusted Close'] # Current time
    changed_arr = (np.concatenate(([0 for i in range(time)], df['Adjusted Close'])))[:-time] # Looks back in time
    
    new_arr = org_arr-changed_arr
    filler = np.mean(new_arr[time:])
    new_arr[:time] = filler
    
    new_perc_arr = (org_arr-changed_arr)/changed_arr*100
    filler_perc = np.mean(new_perc_arr[time:])
    new_perc_arr[:time] = filler_perc
    
    df['MOM' + str(time)] = new_arr
    df['% MOM' + str(time)] = new_perc_arr
    return df

def convert(df):
    df = momentum(df, DAYS_BACK)
    return df

<h2> Reading in the Data </h2>

Here, I read in all of the csv data in a folder by looping over ```filenames``` with a ```for``` loop. This saves me from having to manually add each csv file to my dictionary.
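
The loop pattern can be tried in isolation on a throwaway folder. The sketch below is only an illustration: the temporary directory, the two ticker names, and the `frames` dictionary are assumptions, and it adds an extension check that the notebook's own loop does not have.

```python
import os
import tempfile

import pandas as pd

# Build a throwaway folder with two tiny CSV files (hypothetical tickers).
folder = tempfile.mkdtemp()
for ticker in ("AAA", "BBB"):
    pd.DataFrame({"Adjusted Close": [1.0, 2.0, 3.0]}).to_csv(
        os.path.join(folder, f"{ticker}.csv"), index=False
    )

# Walk the folder and load every CSV into a dict keyed by filename.
frames = {}
for dirname, _, filenames in os.walk(folder):
    for filename in sorted(filenames):
        if filename.endswith(".csv"):
            frames[filename] = pd.read_csv(os.path.join(dirname, filename))

print(sorted(frames))  # ['AAA.csv', 'BBB.csv']
```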

In [None]:
%%time

nasdaq_list = {}
for dirname, _, filenames in os.walk('../input/stock-market-data/stock_market_data/nasdaq/csv'):
    for filename in filenames:
        try:
            nasdaq_list[filename] = convert(pd.read_csv(os.path.join(dirname, filename)))
        except Exception as e:
            print(filename, e)

<h2> Extracting the Momentum </h2>

I extract the momentum for the most recent day in the dataset and check if it is ```NaN```. If it is, the company is "discarded", and we print to the terminal which companies have been discarded.
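
A small standalone sketch of this step, using two made-up frames. The names `frames`, `kept`, and `discarded` are hypothetical, and `.iloc[-1]` is used here as one way to pick the last row by position; the column names assume the default `DAYS_BACK = 4`.

```python
import math

import pandas as pd

# Two toy frames; the second ends in NaN, as a ticker with missing prices might.
frames = {
    "AAA.csv": pd.DataFrame({"MOM4": [0.5, 1.5], "% MOM4": [2.0, 6.0]}),
    "BBB.csv": pd.DataFrame({"MOM4": [0.5, float("nan")], "% MOM4": [2.0, float("nan")]}),
}

kept, discarded = {}, []
for name, df in frames.items():
    # .iloc[-1] selects the last row by position, regardless of the index labels.
    total = df["MOM4"].iloc[-1]
    perc = df["% MOM4"].iloc[-1]
    if math.isnan(total) or math.isnan(perc):
        discarded.append(name.replace(".csv", ""))
    else:
        kept[name] = [total, perc]

print(list(kept))  # ['AAA.csv']
print(discarded)   # ['BBB']
```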

In [None]:
momentum_dict = {}
empty_companies = []
for key in nasdaq_list:
    company_df = nasdaq_list[key]
    n = len(company_df)-1
    
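    # Build the column names from DAYS_BACK so changing the parameter does not break the lookup
    total = company_df[f'MOM{DAYS_BACK}'][n]
    perc = company_df[f'% MOM{DAYS_BACK}'][n]
    
    if math.isnan(total) or math.isnan(perc):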
        empty_companies.append(key.replace('.csv', ''))
    else:
        momentum_dict[key] = [total, perc]
        
print("The following companies do not have momentum values: ")
s = ""
for company in empty_companies:
    s += f'{company} '
    
print(s)

<h2> Creating Our Sorted Dictionary & Removing Outliers </h2>

I do two things in this code: I sort the dictionary by the respective momentum values, and I remove the outliers using the simple ```remove_outliers``` function. If a company is removed from the visualization, I print it to the terminal. The reason I remove outliers is so that the chart remains readable. If there were a stock with a 5000 dollar gain, the bars for stocks with a 50 dollar gain would be too small to read.
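
The sort-then-trim idea can be sketched on a toy dictionary. The helper `sort_and_trim` and the numbers below are hypothetical; the sketch relies on the same observation as ```remove_outliers```, namely that after sorting, the outliers of interest sit at the front, so trimming can stop at the first value below the threshold.

```python
# Hypothetical [total, percent] momentum values.
momentum = {
    "AAA.csv": [5000.0, 3.0],
    "BBB.csv": [50.0, 10.0],
    "CCC.csv": [-20.0, -4.0],
    "DDD.csv": [-300.0, -60.0],
}
threshold = 100

def sort_and_trim(data, index, threshold, descending):
    """Sort by the value at `index`, then drop leading entries at or above the threshold."""
    ordered = sorted(data.items(), key=lambda item: item[1][index], reverse=descending)
    removed = []
    # Outliers sit at the front of the sorted list, so stop at the first normal value.
    while ordered and abs(ordered[0][1][index]) >= threshold:
        removed.append(ordered.pop(0)[0])
    return dict(ordered), removed

gainers, removed_gainers = sort_and_trim(momentum, 0, threshold, descending=True)
losers, removed_losers = sort_and_trim(momentum, 0, threshold, descending=False)

print(removed_gainers, list(gainers))  # ['AAA.csv'] ['BBB.csv', 'CCC.csv', 'DDD.csv']
print(removed_losers, list(losers))    # ['DDD.csv'] ['CCC.csv', 'BBB.csv', 'AAA.csv']
```

Note that the opposite-sign outlier stays at the tail of each dictionary (for example `AAA.csv` at the end of `losers`). That is harmless as long as only the first few entries of each dictionary are plotted.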

In [None]:
def remove_outliers(my_dict, index, th):
    del_list = []
    for k, v in my_dict.items():
        if abs(v[index]) >= th:
            print(f"{k.replace('.csv', '')} ({v[index]})")
            del_list.append(k)
        else:
            break
            
    for k in del_list:
        del my_dict[k]
        
    return my_dict

print("Negative Total Removed: ")
sorted_neg_total = remove_outliers({k: v for k, v in sorted(momentum_dict.items(), key=lambda item: item[1][0])}, 0, th_total)
print("\nPositive Total Removed: ")
sorted_pos_total = remove_outliers({k: v for k, v in sorted(momentum_dict.items(), key=lambda item: item[1][0], reverse=True)}, 0, th_total)

print("\nNegative Percentage Removed: ")
sorted_neg_perc = remove_outliers({k: v for k, v in sorted(momentum_dict.items(), key=lambda item: item[1][1])}, 1, th_perc)
print("\nPositive Percentage Removed: ")
sorted_pos_perc = remove_outliers({k: v for k, v in sorted(momentum_dict.items(), key=lambda item: item[1][1], reverse=True)}, 1, th_perc)

<h2> Visualizing Our Data </h2>

And the final step is to visualize our data! I used the ```plotly``` library for this because I think interactive graphs are just much more meaningful than the static graphs ```matplotlib``` provides.
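
Before plotting, the bars have to be put in order: the largest gains first, then the losses from smallest to largest. The sketch below shows only that ordering step, without any plotting. The dictionaries are made up and hold a single value per ticker for brevity, and `look_at` stands in for `LOOK_AT`.

```python
import itertools

# Hypothetical pre-sorted dicts: gainers largest first, losers most negative first.
gainers = {"AAA": 9.0, "BBB": 7.0, "CCC": 2.0}
losers = {"ZZZ": -8.0, "YYY": -5.0, "XXX": -1.0}
look_at = 2  # plays the same role as LOOK_AT

# Keep only the first `look_at` entries of each dict.
top_gainers = dict(itertools.islice(gainers.items(), look_at))
top_losers = dict(itertools.islice(losers.items(), look_at))
# Re-sort the losers so the chart runs from largest gain down to largest loss.
top_losers = dict(sorted(top_losers.items(), key=lambda item: item[1], reverse=True))

bars = {**top_gainers, **top_losers}
print(list(bars))           # ['AAA', 'BBB', 'YYY', 'ZZZ']
print(list(bars.values()))  # [9.0, 7.0, -5.0, -8.0]
```

The keys and values of `bars` are what end up as the x labels and bar heights.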

In [None]:
def show_slice(neg_dict, pos_dict, index, title):
    # Take the LOOK_AT largest losses, then re-sort them so the bars run from smallest to largest loss
    total_neg_dict = {k: v for k, v in sorted(dict(itertools.islice(neg_dict.items(), LOOK_AT)).items(), key=lambda item: item[1][index], reverse=True)}
    total_pos_dict = dict(itertools.islice(pos_dict.items(), LOOK_AT))
    total_dict = {**total_pos_dict, **total_neg_dict}

    labels = [k.replace('.csv', '') for k in total_dict.keys()]
    values = [v[index] for v in total_dict.values()]

    fig = go.Figure()

    fig.add_trace(go.Bar(x=labels, y=values))
    fig.update_layout(title={'text': title, 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="%s Gain/Loss" % ("Total" if index == 0 else "Relative"))

    fig.show()

In [None]:
show_slice(sorted_neg_total, sorted_pos_total, 0, f"Top {LOOK_AT} Gain/Loss for NASDAQ Companies")

In [None]:
show_slice(sorted_neg_perc, sorted_pos_perc, 1, f"Top {LOOK_AT} Relative Gain/Loss for NASDAQ Companies")

And that's it! If you found the data in this notebook or the tutorial helpful, please leave an upvote and let me know! Thanks again to the Kaggle staff for providing such a large and in-depth stock market dataset; collecting this data manually would be really hard.