<h1> <center> Predicting Day Trade Return by Deep Learning </center> </h1>

The aim of this project is to predict the outcome of a day trade by training a deep learning model on images of historical candlestick charts with financial indicators drawn on them.

- **Data Scraping**: 

	For 100 stocks listed in the S&P 500 index, scrape the historical price data for the last five years.


- **Creating .png images**:

	- For every 22-day interval, draw the candlestick chart of the data along with some financial indicators (Bollinger Bands for now).
	- For each image file, create the `day_trade_precentage` feature: the percentage return of buying the stock at the Close price of the 22nd day (the last day shown in the candlestick chart) and selling it at the next day's Close price.
	- Discretize the percentage return into N categories. (Open question: how are the categories created?)
	- Save each image file in the directory `images/label`, where `label` is its category.
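The indicator, return, and label steps above can be sketched in pandas as follows. This is only an illustration under stated assumptions, not the code in `Bollinger_Bands` or `DataFrame_Preprocessors`: the function and column names are hypothetical, the 20-day window with 2 standard deviations is the conventional Bollinger Bands default rather than a value confirmed by this project, and equal-frequency (quantile) binning is just one possible answer to the open question of how the categories are created.

```python
import pandas as pd


def add_bollinger_bands(df, window=20, num_std=2.0):
    """Add middle, upper and lower bands computed from the Close column.

    window and num_std are assumed defaults, not values taken from the project.
    """
    out = df.copy()
    rolling = out["Close"].rolling(window)
    out["bb_middle"] = rolling.mean()
    band = num_std * rolling.std(ddof=0)  # population standard deviation
    out["bb_upper"] = out["bb_middle"] + band
    out["bb_lower"] = out["bb_middle"] - band
    return out


def add_next_day_return(df):
    """Percentage return of buying at a row's Close and selling at the next Close.

    The last row has no next day, so its return is NaN.
    """
    out = df.copy()
    out["next_day_return_pct"] = (out["Close"].shift(-1) / out["Close"] - 1.0) * 100.0
    return out


def add_quantile_label(df, n_categories):
    """Discretize the return into equal-frequency bins labeled 0..n_categories-1.

    Quantile binning is an assumption; rows with a NaN return get a NaN label.
    """
    out = df.copy()
    out["label"] = pd.qcut(
        out["next_day_return_pct"], q=n_categories, labels=False, duplicates="drop"
    )
    return out
```

With this layout, the label of a 22-day window is the `label` value on the window's last row, since that row's return compares the 22nd day's Close with the following day's Close.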


- **Preparing Data Directory for flow_from_directory**: 

	To use the `flow_from_directory` method of Keras, split the data into three directories under `images_separated`: `train_data`, `validation_data`, and `test_data`. The directory structure is as follows:

	```text
    images_separated/
		train_data/
			label_1/
				train1_image_1.png
				train1_image_2.png
				...
			label_2/
				train2_image_1.png
				train2_image_2.png
				...
			...
		validation_data/
			label_1/
				validation1_image_1.png
				validation1_image_2.png
				...
			label_2/
				validation2_image_1.png
				validation2_image_2.png
				...
			...
		test_data/
			label_1/
				test1_image_1.png
				test1_image_2.png
				...
			label_2/
				test2_image_1.png
				test2_image_2.png
				...
			...
    ```
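A directory layout like the one above could be produced with a short standard-library script. The sketch below is not the project's `train_test_directory_split`; the function name, the default split fractions, and the random per-label shuffle are all assumptions made for illustration.

```python
import random
import shutil
from pathlib import Path


def split_into_directories(src="images", dst="images_separated",
                           fractions=(0.7, 0.15, 0.15), seed=0):
    """Copy src/<label>/*.png into dst/{train,validation,test}_data/<label>/.

    fractions gives the train and validation shares; the remainder of each
    label's files goes to the test split. Returns the file count per split.
    """
    split_names = ("train_data", "validation_data", "test_data")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    counts = {name: 0 for name in split_names}

    for label_dir in sorted(p for p in Path(src).iterdir() if p.is_dir()):
        files = sorted(label_dir.glob("*.png"))
        rng.shuffle(files)

        n_train = int(len(files) * fractions[0])
        n_val = int(len(files) * fractions[1])
        chunks = (
            files[:n_train],
            files[n_train:n_train + n_val],
            files[n_train + n_val:],
        )

        for name, chunk in zip(split_names, chunks):
            target = Path(dst) / name / label_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for image_path in chunk:
                shutil.copy2(image_path, target / image_path.name)
            counts[name] += len(chunk)

    return counts
```

One design caveat: consecutive chart windows of the same stock overlap in all but one day, so a random split places near-duplicate images in both the training and test sets. Splitting by date or by stock may give a more honest accuracy estimate.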

- **Train CNN model**: 

	The architecture of the CNN model is shown in the model summary printed under Results.
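The model summary printed under Results does not state kernel sizes, but the parameter counts it lists constrain them. The arithmetic below is a sketch with hypothetical helper names: it shows that the reported counts are consistent with 5x5 kernels, 44 filters per convolution, a single-channel input, and 4x4 max pooling. These are inferences from the numbers in the summary, not settings confirmed by `CNN_Model`.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a 2D convolution with one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def pooled_size(size, pool):
    """Spatial size after non-overlapping max pooling with window `pool`."""
    return size // pool


def flattened_size(height, width, channels):
    """Number of features produced by flattening a feature map."""
    return height * width * channels


# First convolution: 5x5 kernel on 1 input channel, 44 filters
print(conv2d_params(5, 5, 1, 44))    # 1144, matches conv2d_1
# Second convolution: 5x5 kernel on 44 input channels, 44 filters
print(conv2d_params(5, 5, 44, 44))   # 48444, matches conv2d_2
# Two 4x4 poolings: 64 -> 16 -> 4
print(pooled_size(pooled_size(64, 4), 4))  # 4
# Flattening a 4x4x44 feature map
print(flattened_size(4, 4, 44))      # 704, matches flatten_1
```

Because the second convolution keeps the 16x16 spatial size with a 5x5 kernel, it presumably uses "same" padding. Whether the first convolution does depends on the input image size, which the summary does not show.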


- **Results**: 

The table below shows how accuracy varies with the number of categories (`num_cat`) used to discretize the percentage return.

|  num_cat |  2  |  5  |  10  |  14  |
|----------|-----|-----|------|------|
| accuracy | --  | --  |  --  | 0.18 |


In [6]:
# Load the required packages
import plotly.graph_objects as go
import pandas as pd
import os
import numpy as np
import matplotlib.pyplot as plt
import math 

# Import functions I created 
from Bollinger_Bands import bollinger_bands
from DataFrame_Preprocessors import cleaner, calculate_return, categorizer 
from Image_Creator import image_creator
from Train_Test_Directory_Split import train_test_directory_split

## Data Scraping

In [7]:
# To be filled 

## Creating Image Data

In [None]:
time_interval = 22
# create a pd dataframe to store the labels for each feature
df_labels = pd.DataFrame(columns = ['feature', 'label'])

if not os.path.exists('images'): 
    os.mkdir('images')

for stock_name in os.listdir('data_folder'):    
    data_path = 'data_folder/' + stock_name 

    # Skip empty or near-empty CSV files
    if os.stat(data_path).st_size <= 5:
        continue

    stock_price = pd.read_csv(data_path)

    # Skip stocks with fewer than 60 rows of price history
    if len(stock_price) < 60:
        continue

    stock_price = cleaner(stock_price)
    stock_price = bollinger_bands(stock_price)
    stock_price = calculate_return(stock_price)
    stock_price = categorizer(stock_price)

    for start in range(len(stock_price) - time_interval):
        end = start + time_interval
        sub_stock_price = stock_price[start: end] 
        file_name = '{}_{}'.format(stock_name[:-4], start)
        
        image_creator(df = sub_stock_price, file_name = file_name)



In [None]:
# Prepare the data directory for the flow_from_directory method
train_test_directory_split()

## Deep Learning Model

In [None]:
import CNN_Model

## Results 

In [1]:
import CNN_Model

Using TensorFlow backend.


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 64, 64, 44)        1144      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 44)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 16, 44)        48444     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 44)          0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 4, 4, 44)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 704)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 14)               