<h1> <center> Predicting Day Trade Return by Deep Learning </center> </h1>

The aim of this project is to predict the likely outcome of a day trade by training a deep learning model on images of historical candlestick charts with financial indicators drawn on them.

- **Data Scraping**: 

	For the S&P 500 stocks listed in `list_stocks` below (about 240 tickers), scraped the historical price data for the last five years.


- **Creating .png images**:

	For every 22-day interval, drew the candlestick chart of the price data with some financial indicators (Bollinger Bands, for now) on it.
	For each image file, created the `day_trade_precentage` feature. It is the percentage return of buying the stock at the Close price of the 22nd day (the last day in the candlestick chart) and selling it at the next day's Close price.
	Discretized the percentage return into N categories. For the three-category case described under Results, returns above +0.5% are labeled 1, returns below -0.5% are labeled -1, and the rest are labeled 0.
	Saved each image file in the directory `images/<label>`, where `<label>` is the image's category.


- **Preparing Data Directory for flow_from_directory**: 

	To use Keras's `flow_from_directory` method, split the data into three directories under `images_separated/`: `train_data`, `validation_data`, and `test_data`. The directory structure is as follows:

    ```text
    images_separated/
        train_data/
            label_1/
                train1_image_1.png
                train1_image_2.png
                ...
            label_2/
                train2_image_1.png
                train2_image_2.png
                ...
            ...
        validation_data/
            label_1/
                validation1_image_1.png
                validation1_image_2.png
                ...
            label_2/
                validation2_image_1.png
                validation2_image_2.png
                ...
            ...
        test_data/
            label_1/
                test1_image_1.png
                test1_image_2.png
                ...
            label_2/
                test2_image_1.png
                test2_image_2.png
                ...
            ...
    ```

- **Train CNN model**: 

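	The CNN has two convolutional layers (88 and 44 filters), each followed by max pooling. These are followed by a flatten layer and a dense output layer with one unit per category. The full Keras summary is printed in the Deep Learning Model section below.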


- **Results**: 

The table below shows the accuracy as a function of the number of categories used to discretize the percentage return (`--`: no result reported):

| Number of categories | 2  | 5  | 10 | 14   |
|----------------------|----|----|----|------|
| Accuracy             | -- | -- | -- | 0.18 |


In [1]:
# Load the required packages
import plotly.graph_objects as go
import pandas as pd
import os
import shutil
import numpy as np
import matplotlib.pyplot as plt
import math 

## Data Scraping

In [2]:
from Scrape_Historical_Data import scrape_historical_data

In [3]:
list_stocks = [
"MSFT", "AAPL", "AMZN", "GOOG", "GOOGL", "FB", "BRK.B", "V", "WMT", "JPM", "PG", 
"MA", "UNH", "INTC", "VZ", "T", "HD", "BAC", "MRK", "DIS", "PFE", "PEP", "CSCO", 
"CMCSA", "ORCL", "NFLX", "XOM", "NVDA", "ADBE", "ABT", "CRM", "NKE", "CVX", "LLY", "COST", 
"WFC", "MCD", "MDT", "BMY", "AMGN", "NEE", "PYPL", "TMO", "PM", "ABBV", "ACN", "CHTR", 
"LMT", "DHR", "UNP", "IBM", "TXN", "HON", "AVGO", "GILD", "C", "BA", "LIN", "UTX", 
"UPS", "SBUX", "MMM", "CVS", "QCOM", "FIS", "AXP", "TMUS", "MDLZ", "MO", "BLK", "LOW", "GE", 
"FISV", "CME", "D", "CI", "INTU", "SYK", "SO", "BDX", "PLD", "CAT", "EL", "SPGI", 
"ISRG", "CCI", "AGN", "TJX", "ADP", "VRTX", "ANTM", "CL", "GS", "AMD", "USB", "ZTS", "NOC", 
"MS", "NOW", "BIIB", "BKNG", "EQIX", "REGN", "CB", "MU", "TGT", "ITW", "ECL", "TFC", 
"ATVI", "CSX", "GPN", "SCHW", "MMC", "PGR", "PNC", "BSX", "KMB", "APD", "DE", "SHW", "AMAT", 
"AEP", "MCO", "EW", "WM", "BAX", "LHX", "NSC", "ILMN", "RTN", "HUM", "WBA", "SPG",  
"GD", "NEM", "DG", "SRE", "LRCX", "EXC", "DLR", "PSA", "ADI", "ROP", "CNC", "LVS", "COP", 
"FDX", "GIS", "KMI", "ADSK", "XEL", "ETN", "GM", "MNST", "ROST", "KHC", "HCA", "SBAC", "BK", 
"MET", "WEC", "ALL", "EMR", "STZ", "EA", "HSY", "ES", "ED", "SYY", "CTSH", "AFL", 
"MAR", "TRV", "COF", "DD", "HRL", "HPQ", "RSG", "EBAY", "INFO", "MSCI", "EQR", "ORLY", "MSI", 
"TROW", "KR", "PSX", "VFC", "AVB", "PEG", "VRSK", "KLAC", "AIG", "MCK", "APH", "A", "AWK", 
"CLX", "PAYX", "WLTW", "DOW", "PRU", "TEL", "BLL", "EOG", "FE", "IQV", "YUM", "PCAR", "F", 
"RMD", "WELL", "K", "VRSN", "EIX", "PPG", "AZO", "JCI", "TWTR", "CMI", "IDXX", "TT", "ZBH", 
"O", "PPL", "ETR", "HLT", "ANSS", "SLB", "DAL", "CTAS", "LUV", "DTE", "XLNX", "SNPS", 
"ADM", "ALXN", "VLO", "AEE", "CERN", "DLTR"
]

In [4]:
for stock_code in list_stocks: 
    scrape_historical_data(stock_code)
print('\nData scraping is complete.')

Historical price data for MSFT is scraped.
Historical price data for AAPL is scraped.
Historical price data for AMZN is scraped.
Historical price data for GOOG is scraped.
Historical price data for GOOGL is scraped.
Historical price data for FB is scraped.
Historical price data for BRK.B is scraped.
Historical price data for V is scraped.
Historical price data for WMT is scraped.
Historical price data for JPM is scraped.
Historical price data for PG is scraped.
Historical price data for MA is scraped.
Historical price data for UNH is scraped.
Historical price data for INTC is scraped.
Historical price data for VZ is scraped.
Historical price data for T is scraped.
Historical price data for HD is scraped.
Historical price data for BAC is scraped.
Historical price data for MRK is scraped.
Historical price data for DIS is scraped.
Historical price data for PFE is scraped.
Historical price data for PEP is scraped.
Historical price data for CSCO is scraped.
Historical price data for CMCSA i

Historical price data for APH is scraped.
Historical price data for A is scraped.
Historical price data for AWK is scraped.
Historical price data for CLX is scraped.
Historical price data for PAYX is scraped.
Historical price data for WLTW is scraped.
Historical price data for DOW is scraped.
Historical price data for PRU is scraped.
Historical price data for TEL is scraped.
Historical price data for BLL is scraped.
Historical price data for EOG is scraped.
Historical price data for FE is scraped.
Historical price data for IQV is scraped.
Historical price data for YUM is scraped.
Historical price data for PCAR is scraped.
Historical price data for F is scraped.
Historical price data for RMD is scraped.
Historical price data for WELL is scraped.
Historical price data for K is scraped.
Historical price data for VRSN is scraped.
Historical price data for EIX is scraped.
Historical price data for PPG is scraped.
Historical price data for AZO is scraped.
Historical price data for JCI is scr
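If `scrape_historical_data` raises an exception for a ticker, the loop in `In [4]` stops at that ticker and the remaining tickers are never scraped. Below is a minimal sketch of a more forgiving loop. It assumes only that the scraper signals failure by raising. `scrape_all` and `fake_scraper` are hypothetical names, and the stand-in scraper exists purely for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def scrape_all(tickers: Iterable[str],
               scrape_fn: Callable[[str], None]) -> Tuple[List[str], List[str]]:
    """Call scrape_fn on every ticker; record failures instead of stopping."""
    succeeded: List[str] = []
    failed: List[str] = []
    for ticker in tickers:
        try:
            scrape_fn(ticker)
        except Exception as exc:  # any error: log it and move on to the next ticker
            print(f"Scraping failed for {ticker}: {exc}")
            failed.append(ticker)
        else:
            succeeded.append(ticker)
    print(f"\nScraped {len(succeeded)} of {len(succeeded) + len(failed)} tickers.")
    return succeeded, failed


# Hypothetical stand-in scraper that fails for one made-up ticker.
def fake_scraper(ticker: str) -> None:
    if ticker == "NOT_A_TICKER":
        raise ValueError("no data returned")


ok, bad = scrape_all(["MSFT", "NOT_A_TICKER", "AAPL"], fake_scraper)
```

In the notebook this would be called as `scrape_all(list_stocks, scrape_historical_data)`. The returned `failed` list shows which tickers need a retry.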

## Preprocessing Data and Creating Image Files

In [5]:
# Import functions I created 
from DataFrame_Preprocessors import cleaner, calculate_return, categorizer 
from Bollinger_Bands import bollinger_bands 
from Image_Creator import image_creator 
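The `bollinger_bands` helper's source is not shown here. As a hedged sketch, the conventional definition uses a 20-day simple moving average of the Close price as the middle band, with upper and lower bands two standard deviations away from it. The function name `add_bollinger_bands` and the `BB_*` column names are hypothetical. The author's window, multiplier, or standard-deviation convention may differ.

```python
import pandas as pd


def add_bollinger_bands(df: pd.DataFrame, window: int = 20, num_std: float = 2.0,
                        price_col: str = "Close") -> pd.DataFrame:
    """Return a copy of df with middle, upper and lower Bollinger Band columns."""
    out = df.copy()
    rolling = out[price_col].rolling(window=window)
    middle = rolling.mean()    # simple moving average of the price
    std = rolling.std(ddof=0)  # population std; pandas' default would be ddof=1
    out["BB_middle"] = middle
    out["BB_upper"] = middle + num_std * std
    out["BB_lower"] = middle - num_std * std
    return out  # the first (window - 1) rows are NaN


prices = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0]})
bands = add_bollinger_bands(prices, window=3)
```

Because the first `window - 1` rows have no bands, they are one candidate for what a cleaning step might drop before charting.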

In [6]:
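time_interval = 22        # trading days per candlestick chart
categories = (-1, 0, 1)   # category labels (see Results for the thresholds)

# One output folder per category: images/-1, images/0, images/1
for sub_dir in categories:
    images_dir = 'images/{}'.format(sub_dir)
    os.makedirs(images_dir, exist_ok=True)

for stock_name in os.listdir('historical_price_data'):
    data_path = 'historical_price_data/' + stock_name

    # Skip files that are empty or nearly empty (failed or blank scrapes)
    if os.stat(data_path).st_size <= 5:
        continue

    stock_price = pd.read_csv(data_path)

    # Skip stocks with fewer than 200 rows of history
    if len(stock_price) < 200:
        continue

    # The project's own helpers: clean, add indicators, add return and label
    stock_price = cleaner(stock_price)
    stock_price = bollinger_bands(stock_price)
    stock_price = calculate_return(stock_price)
    stock_price = categorizer(stock_price)

    # Slide a 22-day window over the series; each window becomes one image
    for start in range(len(stock_price) - time_interval):
        end = start + time_interval
        sub_stock_price = stock_price[start:end]
        # Strip the 4-character file extension (e.g. '.csv') and append the window start
        file_name = '{}_{}'.format(stock_name[:-4], start)

        image_creator(df=sub_stock_price, file_name=file_name)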
        
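`calculate_return` and `categorizer` are the author's helpers and their source is not shown. The following is a minimal sketch of what they are described as doing. It assumes a `Close` column and the ±0.5% thresholds from the Results section. Returns of exactly ±0.5% fall into category 0 here, which is also an assumption. The function and column names (`add_day_trade_return`, `add_category`, `window_label`, `day_trade_return_pct`, `category`) are hypothetical.

```python
import numpy as np
import pandas as pd


def add_day_trade_return(df: pd.DataFrame, price_col: str = "Close") -> pd.DataFrame:
    """Percent return of buying at a day's close and selling at the next day's close."""
    out = df.copy()
    next_close = out[price_col].shift(-1)  # last row gets NaN: there is no next day
    out["day_trade_return_pct"] = (next_close / out[price_col] - 1.0) * 100.0
    return out


def add_category(df: pd.DataFrame, threshold_pct: float = 0.5,
                 return_col: str = "day_trade_return_pct") -> pd.DataFrame:
    """Label returns 1 (> +threshold), -1 (< -threshold) or 0 (in between)."""
    out = df.copy()
    r = out[return_col]
    labels = np.select([r > threshold_pct, r < -threshold_pct], [1, -1], default=0)
    # Rows without a return (NaN) get a NaN label instead of a spurious 0.
    out["category"] = pd.Series(labels, index=out.index).where(r.notna())
    return out


def window_label(df: pd.DataFrame, start: int, length: int = 22) -> float:
    """Label of the window df[start:start + length]: the category of its last row."""
    return df["category"].iloc[start + length - 1]


prices = pd.DataFrame({"Close": [100.0, 101.0, 100.8, 100.0]})
labeled = add_category(add_day_trade_return(prices))
print(labeled["category"].tolist())  # [1.0, 0.0, -1.0, nan]
```

Under these assumptions, each 22-row slice in the loop above would take `window_label` of its last row as its class. This matches the description of buying at the 22nd day's close.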

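`image_creator` is also the author's own helper. The first convolution in the model summary below outputs 64×64 feature maps, which suggests 64×64 input images. One hedged way to render a window at that size with plain matplotlib is sketched here. Wicks are drawn with `vlines` and bodies with `bar`. Any optional band columns (`BB_upper`, `BB_middle`, `BB_lower`, hypothetical names) are drawn as lines. The function name, colors, and `Open`/`High`/`Low`/`Close` column names are assumptions. Matplotlib saves RGBA PNGs; if the network expects a single channel, `flow_from_directory(..., color_mode="grayscale")` converts images on load.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # off-screen rendering, no display needed
import matplotlib.pyplot as plt
import pandas as pd


def render_candlestick_png(window: pd.DataFrame, path: str, size_px: int = 64) -> None:
    """Draw one window of OHLC candlesticks (plus optional bands) as a size_px x size_px PNG."""
    fig, ax = plt.subplots(figsize=(1, 1), dpi=size_px)  # 1 inch at size_px dpi = size_px pixels
    for i, (_, row) in enumerate(window.iterrows()):
        color = "black" if row["Close"] >= row["Open"] else "gray"  # up day vs down day
        ax.vlines(i, row["Low"], row["High"], colors=color, linewidths=0.5)  # wick
        body_bottom = min(row["Open"], row["Close"])
        body_height = abs(row["Close"] - row["Open"])
        ax.bar(i, body_height, bottom=body_bottom, width=0.6, color=color)  # body
    for col in ("BB_upper", "BB_middle", "BB_lower"):  # hypothetical column names
        if col in window.columns:
            ax.plot(range(len(window)), window[col].to_numpy(), color="blue", linewidth=0.5)
    ax.axis("off")
    fig.subplots_adjust(left=0, right=1, bottom=0, top=1)  # let the axes fill the canvas
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    fig.savefig(path, dpi=size_px)
    plt.close(fig)


demo = pd.DataFrame({
    "Open": [10.0, 10.4, 10.2],
    "High": [10.6, 10.7, 10.5],
    "Low": [9.8, 10.1, 9.9],
    "Close": [10.4, 10.2, 10.3],
})
demo_path = os.path.join(tempfile.mkdtemp(), "1", "DEMO_0.png")  # <label>/<file> layout
render_candlestick_png(demo, demo_path)
```

Rendering directly at the model's input size avoids a resize step later, at the cost of very thin wicks at 64 pixels.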
In [7]:
from Train_Test_Directory_Split import train_test_directory_split

In [8]:
# Prepare the data directory for Keras' flow_from_directory method
train_test_directory_split(classes=categories)


Total images in class -1 is 164605
	98763 copied to ../training/-1
	32921 copied to ../validation/-1
	32921 copied to ../testing/-1

Total images in class 0 is 189794
	113876 copied to ../training/0
	37959 copied to ../validation/0
	37959 copied to ../testing/0

Total images in class 1 is 189666
	113799 copied to ../training/1
	37933 copied to ../validation/1
	37934 copied to ../testing/1
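The printed counts match a 60/20/20 split with a specific rounding rule. Training receives floor(60% × n) images per class, and the remainder is halved between validation and testing, with any odd image going to testing. For example, class 1 splits as 113799 + 37933 + 37934 = 189666. The source of `train_test_directory_split` is not shown, so the sketch below is a hypothetical reconstruction of that rule. `split_counts`, `split_class_dir`, and the seeded shuffle are assumptions.

```python
import os
import random
import shutil
from typing import Dict, List, Tuple


def split_counts(n: int, train_pct: int = 60) -> Tuple[int, int, int]:
    """Train gets floor(train_pct% of n); the rest is halved, any odd one goes to test."""
    n_train = n * train_pct // 100  # integer arithmetic avoids float rounding surprises
    rest = n - n_train
    n_val = rest // 2
    return n_train, n_val, rest - n_val


def split_class_dir(src_dir: str, dst_root: str, label: str, seed: int = 0) -> Dict[str, int]:
    """Copy one class's PNGs into dst_root/{training,validation,testing}/label."""
    files: List[str] = sorted(f for f in os.listdir(src_dir) if f.endswith(".png"))
    random.Random(seed).shuffle(files)  # reproducible shuffle before splitting
    n_train, n_val, _ = split_counts(len(files))
    parts = {
        "training": files[:n_train],
        "validation": files[n_train:n_train + n_val],
        "testing": files[n_train + n_val:],
    }
    for split_name, names in parts.items():
        dst = os.path.join(dst_root, split_name, label)
        os.makedirs(dst, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(src_dir, name), dst)
    return {split_name: len(names) for split_name, names in parts.items()}


print(split_counts(189666))  # (113799, 37933, 37934)
```

Note that splitting images at random mixes overlapping windows of the same stock and period across the three sets. A split by date would be the stricter choice for time series.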


## Deep Learning Model

In [9]:
import CNN_Model

Using TensorFlow backend.


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 64, 64, 88)        2288      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 88)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 16, 44)        96844     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 44)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 704)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 3)                 2115      
Total params: 101,247
Trainable params: 101,247
Non-trainable params: 0
________________________________________________
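The summary omits kernel sizes and the input shape, but the parameter counts constrain them. A Conv2D layer has (kh × kw × in_channels + 1) × filters parameters, and a Dense layer has (inputs + 1) × units. The counts are consistent with the following inferred configuration, which does not come from the source of `CNN_Model`:

* 5×5 kernels, with a single-channel 64×64 input for `conv2d_1` (the natural reading of its 25 weights per filter).
* "Same" padding, since 64×64 is preserved.
* 4×4 max pooling (64 → 16 → 4).

```python
def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    """kh * kw * in_channels weights per filter, plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_features: int, units: int) -> int:
    """One weight per input per unit, plus one bias per unit."""
    return (in_features + 1) * units


conv1 = conv2d_params(5, 5, 1, 88)   # assumed 5x5 kernel on a 1-channel input
conv2 = conv2d_params(5, 5, 88, 44)  # 5x5 kernel over conv2d_1's 88 feature maps
flat = 4 * 4 * 44                    # max_pooling2d_2 output, flattened
dense = dense_params(flat, 3)        # one unit per category

print(conv1, conv2, flat, dense, conv1 + conv2 + dense)
# prints: 2288 96844 704 2115 101247
```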

## Results 

Here we classify each 22-day candlestick chart of a stock, with its Bollinger Bands drawn on it, into three categories:

- category  1: percentage return > +0.5%
- category  0: percentage return between -0.5% and +0.5%
- category -1: percentage return < -0.5%

The resulting accuracy for this three-category setup is not reported in this version. In the next version, I will try to improve the network architecture and also report precision and recall.

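Accuracy is easier to judge next to trivial baselines. The sketch below uses the per-class image counts printed by the directory split above. It computes the accuracy of always predicting the most frequent class, and of uniform random guessing (1/K for K classes, regardless of class balance). The class balance of the 14-category run in the summary table is not given, so only its uniform baseline can be computed here.

```python
from typing import Dict

# Per-class image counts copied from the directory-split output above.
all_counts: Dict[int, int] = {-1: 164605, 0: 189794, 1: 189666}
test_counts: Dict[int, int] = {-1: 32921, 0: 37959, 1: 37934}


def majority_baseline(counts: Dict[int, int]) -> float:
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


def uniform_baseline(num_classes: int) -> float:
    """Expected accuracy of guessing uniformly at random among num_classes."""
    return 1.0 / num_classes


print(f"{majority_baseline(all_counts):.3f}")   # 0.349 (always predict class 0)
print(f"{majority_baseline(test_counts):.3f}")  # 0.349 (same, on the test split)
print(f"{uniform_baseline(3):.3f}")             # 0.333
print(f"{uniform_baseline(14):.3f}")            # 0.071
```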