#**Stock Trading Using Deep Q-Learning**


## **Problem Statement**

Prepare an agent by implementing Deep Q-Learning that can perform unsupervised trading in stock trade. The aim of this project is to train an agent that uses Q-learning and neural networks to predict the profit or loss by building a model and implementing it on a dataset that is available for evaluation.


The stock trading index environment provides the agent with a set of actions:<br>
* Buy<br>
* Sell<br>
* Sit

This project has following sections:
* Import libraries 
* Create a DQN agent
* Preprocess the data
* Train and build the model
* Evaluate the model and agent
<br><br>

**Steps to perform**<br>

In the section **create a DQN agent**, create a class called agent where:
* Action size is defined as 3
* Experience replay memory to deque is 1000
* Empty list for stocks that has already been bought
* The agent must possess the following hyperparameters:<br>
  * gamma= 0.95<br>
  * epsilon = 1.0<br>
  * epsilon_final = 0.01<br>
  * epsilon_decay = 0.995<br>


    Note: It is advised to compare the results using different values in hyperparameters.

* Neural network has 3 hidden layers
* Action and experience replay are defined




## **Solution**

### **Dataset **

In [1]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [2]:
!unzip -qq /content/drive/MyDrive/datasets/simplilearn_RL_stock_trading/dataset.zip

In [None]:
!scp /content/dataset/GSPC_Training_Dataset.csv /content/GSPC_Training_Dataset.csv
!scp /content/dataset/GSPC_Evaluation_Dataset.csv /content/GSPC_Evaluation_Dataset.csv

**REINFORCEMENT LIBRARIES**

In [1]:
!pip install python-opengl xvfb
!pip install pyvirtualdisplay
!apt install xvfb -y
!pip install piglet
!pip3 install box2d-py
!pip3 install gym[Box_2D]
!pip install tensorflow==2.3.1 gym keras-rl2

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
[31mERROR: Could not find a version that satisfies the requirement python-opengl (from versions: none)[0m
[31mERROR: No matching distribution found for python-opengl[0m
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pyvirtualdisplay
  Downloading PyVirtualDisplay-3.0-py3-none-any.whl (15 kB)
Installing collected packages: pyvirtualdisplay
Successfully installed pyvirtualdisplay-3.0
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
The following NEW packages will be installed:
  xvfb
0 upgraded, 1 newly installed, 0 to remove and 20 not upgraded.
Need to get 785 kB of archives.
After this operation, 2,271 kB of additional disk space will be used.

### **Import the libraries** 

In [3]:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.models import load_model
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
import numpy as np
import random
from collections import deque

### **Create a DQN agent**

**Use the instruction below to prepare an agent**


In [4]:
# Action space include 3 actions: Buy, Sell, and Sit
#Setting up the experience replay memory to deque with 1000 elements inside it
#Empty list with inventory is created that contains the stocks that were already bought
#Setting up gamma to 0.95, that helps to maximize the current reward over the long-term
#Epsilon parameter determines whether to use a random action or to use the model for the action. 
#In the beginning random actions are encouraged, hence epsilon is set up to 1.0 when the model is not trained.
#And over time the epsilon is reduced to 0.01 in order to decrease the random actions and use the trained model
#We're then set the speed of decreasing epsililon in the epsilon_decay parameter

#Defining our neural network:
#Define the neural network function called _model and it just takes the keyword self
#Define the model with Sequential()
#Define states i.e. the previous n days and stock prices of the days
#Defining 3 hidden layers in this network
#Changing the activation function to relu because mean-squared error is used for the loss




### **Preprocess the stock market data**

**The environment is given**

In [5]:
import math

# prints formatted price
def formatPrice(n):
	return ("-$" if n < 0 else "$") + "{0:.2f}".format(abs(n))

# returns the vector containing stock data from a fixed file
def getStockDataVec(key):
	vec = []
	lines = open("" + key + ".csv", "r").read().splitlines()

	for line in lines[1:]:
		vec.append(float(line.split(",")[4]))

	return vec

# returns the sigmoid
def sigmoid(x):
	return 1 / (1 + math.exp(-x))

# returns an an n-day state representation ending at time t
def getState(data, t, n):
	d = t - n + 1
	block = data[d:t + 1] if d >= 0 else -d * [data[0]] + data[0:t + 1] # pad with t0
	res = []
	for i in range(n - 1):
		res.append(sigmoid(block[i + 1] - block[i]))

	return np.array([res])


In [6]:
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dropout

In [7]:
# changes
class Agent():
    def __init__(self, window_size, is_eval=False, model_name=''):

        self.nS = window_size
        self.nA = 3
        self.memory = deque([], maxlen=1000)
        self.alpha = 0.001
        self.window_size = window_size
        self.gamma = 0.95
        #Explore/Exploit
        self.epsilon = 1
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        # self.model = self.build_model()
        self.loss = []

        self.is_eval = is_eval
        self.model = load_model(model_name) if self.is_eval else self.build_model()
        
    def build_model(self):
        # model = keras.Sequential() 
        # model.add(keras.layers.Dense(24, input_dim=self.window_size, activation='relu')) #[Input] -> Layer 1
        # #   Dense: Densely connected layer https://keras.io/layers/core/
        # #   24: Number of neurons
        # #   input_dim: Number of input variables
        # #   activation: Rectified Linear Unit (relu) ranges >= 0
        # model.add(keras.layers.Dense(24, activation='relu')) #Layer 2 -> 3
        # model.add(keras.layers.Dense(self.nA, activation='linear')) #Layer 3 -> 4
        # # model.add(keras.layers.Dense(self.nA, activation='linear')) #Layer 4 -> [output]
        # #   Size has to match the output (different actions)
        # #   Linear activation on the last layer
        # model.compile(loss='mean_squared_error', #Loss function: Mean Squared Error
        #               optimizer=keras.optimizers.Adam(lr=self.alpha)) #Optimaizer: Adam (Feel free to check other options)

        model = Sequential()
        model.add(Dense(50, input_dim=self.window_size, kernel_initializer="uniform", kernel_regularizer=l2(0.0002), name="LAYER____1"))
        model.add(Activation("elu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.25))
        model.add(Dense(35, kernel_initializer="uniform", kernel_regularizer=l2(0.0002), name="LAYER____2"))
        model.add(Activation("elu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.25))
        model.add(Dense(20, kernel_initializer="uniform", kernel_regularizer=l2(0.0002), name="LAYER____3"))
        model.add(Activation("elu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.25))
        model.add(Dense(self.nA, activation='linear'))

        opt = Adam(lr=0.0001, beta_1=0.5, decay=0.0002)
        model.compile(loss="mean_squared_error", optimizer=opt)

        return model

    def act(self, state):#act
        if np.random.rand() <= self.epsilon:
            return random.randrange(3) #Explore
        action_vals = self.model.predict(state) #Exploit: Use the NN to predict the correct action from this state
        return np.argmax(action_vals[0])      

    def test_action(self, state): #Exploit
        action_vals = self.model.predict(state)
        return np.argmax(action_vals[0])

    def store(self, state, action, reward, nstate, done):
        #Store the experience in memory
        self.memory.append( (state, action, reward, nstate, done) )

    def expReplay(self, batch_size): ## training the neural network 
        #Execute the experience replay
        minibatch = random.sample( self.memory, batch_size ) #Randomly sample from memory

        #Convert to numpy for speed by vectorization
        x = []
        y = []
        np_array = np.array(minibatch)
        st = np.zeros((0, self.nS)) #States
        nst = np.zeros( (0, self.nS) )#Next States
        for i in range(len(np_array)): #Creating the state and next state np arrays
            st = np.append( st, np_array[i,0], axis=0)
            nst = np.append( nst, np_array[i,3], axis=0)
        st_predict = self.model.predict(st) #Here is the speedup! I can predict on the ENTIRE batch
        nst_predict = self.model.predict(nst)
        index = 0
        for state, action, reward, nstate, done in minibatch:
            x.append(state)
            #Predict from state
            nst_action_predict_model = nst_predict[index]
            if done == True: #Terminal: Just assign reward much like {* (not done) - QB[state][action]}
                target = reward
            else:   #Non terminal
                target = reward + self.gamma * np.amax(nst_action_predict_model)
            target_f = st_predict[index]
            target_f[action] = target
            y.append(target_f)
            index += 1
        #Reshape for Keras Fit
        x_reshape = np.array(x).reshape(batch_size,self.nS)
        y_reshape = np.array(y)
        epoch_count = 1 #Epochs is the number or iterations
        hist = self.model.fit(x_reshape, y_reshape, epochs=epoch_count, verbose=0)
        #Graph Losses
        for i in range(epoch_count):
            self.loss.append( hist.history['loss'][i] )
        #Decay Epsilon
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

### **Train and build the model**

In [18]:
import sys

if len(sys.argv) != 4:
	print ("Usage: python train.py [stock] [window] [episodes]")
	exit()


stock_name = input("Enter stock_name, window_size, Episode_count")
#Fill the given information when prompted: 
#Enter stock_name = GSPC_Training_Dataset
#window_size = 10
#Episode_count = 100 or it can be 10 or 20 or 30 and so on.

window_size = input()
episode_count = input()
stock_name = str(stock_name)
window_size = int(window_size)
episode_count = int(episode_count)

agent = Agent(window_size)
data = getStockDataVec(stock_name)
l = len(data) - 1
batch_size = 32

for e in range(episode_count + 1):
	print ("Episode " + str(e) + "/" + str(episode_count))
	state = getState(data, 0, window_size + 1)

	total_profit = 0
	agent.inventory = []

	for t in range(l):
		action = agent.act(state)

		# sit
		next_state = getState(data, t + 1, window_size + 1)
		reward = 0

		if action == 1: # buy
			agent.inventory.append(data[t])
			# print ("Buy: " + formatPrice(data[t]))

		elif action == 2 and len(agent.inventory) > 0: # sell
			bought_price = agent.inventory.pop(0)
			reward = max(data[t] - bought_price, 0)
			total_profit += data[t] - bought_price
			# print ("Sell: " + formatPrice(data[t]) + " | Profit: " + formatPrice(data[t] - bought_price))

		done = True if t == l - 1 else False
		agent.memory.append((state, action, reward, next_state, done))
		state = next_state

		if done:
			print ("--------------------------------")
			print ("-----Episode: {} -----".format(e))
			print ("Total Profit: " + formatPrice(total_profit))
			

		if len(agent.memory) > batch_size:
			agent.expReplay(batch_size)

	# # if e % 10 == 0:
	if e % 10 == 0:
		agent.model.save("model_ep" + str(e))
	# agent.model.save("model_ep" + str(e))
 
#Fill the given information when prompted: 
#Enter stock_name = GSPC_Training_Dataset
#window_size = 10
#Episode_count = 100 or it can be 10 or 20 or 30 and so on. 

Usage: python train.py [stock] [window] [episodes]
Enter stock_name, window_size, Episode_countGSPC_Training_Dataset
10
30
Episode 0/30
--------------------------------
-----Episode: 0 -----
Total Profit: $865.92


Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.


Episode 1/30
--------------------------------
-----Episode: 1 -----
Total Profit: $7141.67
Episode 2/30
--------------------------------
-----Episode: 2 -----
Total Profit: $6871.84
Episode 3/30
--------------------------------
-----Episode: 3 -----
Total Profit: $6425.62
Episode 4/30
--------------------------------
-----Episode: 4 -----
Total Profit: $7306.68
Episode 5/30
--------------------------------
-----Episode: 5 -----
Total Profit: $7161.61
Episode 6/30
--------------------------------
-----Episode: 6 -----
Total Profit: $6990.42
Episode 7/30
--------------------------------
-----Episode: 7 -----
Total Profit: $7420.58
Episode 8/30
--------------------------------
-----Episode: 8 -----
Total Profit: $5741.95
Episode 9/30
--------------------------------
-----Episode: 9 -----
Total Profit: $7061.20
Episode 10/30
--------------------------------
-----Episode: 10 -----
Total Profit: $5241.04
Episode 11/30
--------------------------------
-----Episode: 11 -----
Total Profit: $709

In [1]:
# save all the model to my google drive
!scp -r /content/model_ep* /content/drive/MyDrive/datasets/simplilearn_RL_stock_trading/model_file

### **Evaluate the model and agent**

In [8]:
import sys
from tensorflow.keras.models import load_model


if len(sys.argv) != 3:
	print ("Usage: python evaluate.py [stock] [model]")
	exit()


stock_name = input("Enter Stock_name, Model_name")
model_name = input()
#Note: 
#Fill the given information when prompted: 
#Enter stock_name = GSPC_Evaluation_Dataset
#Model_name = respective model name

model = load_model("" + model_name)
window_size = model.layers[0].input.shape.as_list()[1]

agent = Agent(window_size, True, model_name)
data = getStockDataVec(stock_name)
l = len(data) - 1
batch_size = 32

state = getState(data, 0, window_size + 1)
total_profit = 0
agent.inventory = []

for t in range(l):
	action = agent.act(state)

	# sit
	next_state = getState(data, t + 1, window_size + 1)
	reward = 0

	if action == 1: # buy
		agent.inventory.append(data[t])
		print ("Buy: " + formatPrice(data[t]))

	elif action == 2 and len(agent.inventory) > 0: # sell
		bought_price = agent.inventory.pop(0)
		reward = max(data[t] - bought_price, 0)
		total_profit += data[t] - bought_price
		print ("Sell: " + formatPrice(data[t]) + " | Profit: " + formatPrice(data[t] - bought_price))

	done = True if t == l - 1 else False
	agent.memory.append((state, action, reward, next_state, done))
	state = next_state

	if done:
		print ("--------------------------------")
		# print	("-----Episode: {} -----".format(e))
		print (stock_name + " Total Profit: " + formatPrice(total_profit))

	# if len(agent.memory) > batch_size:
	# 	agent.expReplay(batch_size)

# GSPC_Evaluation_Dataset
# /content/model_ep30

Enter Stock_name, Model_nameGSPC_Evaluation_Dataset
/content/model_ep30
Buy: $1271.87
Buy: $1276.56
Buy: $1273.85
Buy: $1271.50
Buy: $1269.75
Buy: $1285.96
Sell: $1293.24 | Profit: $21.37
Buy: $1295.02
Sell: $1281.92 | Profit: $5.36
Sell: $1280.26 | Profit: $6.41
Buy: $1283.35
Buy: $1291.18
Buy: $1296.63
Sell: $1276.34 | Profit: $4.84
Sell: $1286.12 | Profit: $16.37
Sell: $1307.10 | Profit: $21.14
Buy: $1310.87
Buy: $1319.05
Sell: $1324.57 | Profit: $29.55
Buy: $1320.88
Sell: $1321.87 | Profit: $38.52
Buy: $1329.15
Sell: $1328.01 | Profit: $36.83
Sell: $1343.01 | Profit: $46.38
Sell: $1315.44 | Profit: $4.57
Sell: $1307.40 | Profit: -$11.65
Buy: $1306.10
Buy: $1319.88
Sell: $1327.22 | Profit: $6.34
Sell: $1306.33 | Profit: -$22.82
Sell: $1330.97 | Profit: $24.87
Sell: $1321.15 | Profit: $1.27
Buy: $1310.13
Buy: $1321.82
Buy: $1320.02
Buy: $1295.11
Buy: $1304.28
Sell: $1296.39 | Profit: -$13.74
Buy: $1281.87
Sell: $1256.88 | Profit: -$64.94
Buy: $1298.38
Sell: $1293.77 | Profit: -$26.25

**Note: Run the training section for considerable episodes so that while evaluating the model it can generate significant profit.** 
