#**Stock Trading Using Deep Q-Learning**


## **Problem Statement**

Prepare an agent by implementing Deep Q-Learning that can perform unsupervised trading in stock trade. The aim of this project is to train an agent that uses Q-learning and neural networks to predict the profit or loss by building a model and implementing it on a dataset that is available for evaluation.


The stock trading index environment provides the agent with a set of actions:<br>
* Buy<br>
* Sell<br>
* Sit

This project has following sections:
* Import libraries 
* Create a DQN agent
* Preprocess the data
* Train and build the model
* Evaluate the model and agent
<br><br>

**Steps to perform**<br>

In the section **create a DQN agent**, create a class called agent where:
* Action size is defined as 3
* Experience replay memory to deque is 1000
* Empty list for stocks that has already been bought
* The agent must possess the following hyperparameters:<br>
  * gamma= 0.95<br>
  * epsilon = 1.0<br>
  * epsilon_final = 0.01<br>
  * epsilon_decay = 0.995<br>


    Note: It is advised to compare the results using different values in hyperparameters.

* Neural network has 3 hidden layers
* Action and experience replay are defined




## **Solution**

### **Import the libraries** 

In [3]:
import warnings
warnings.filterwarnings("ignore")

import keras
from keras.models import Sequential
from keras.models import load_model
from keras.layers import Dense
from keras.optimizers import adam_v2
import numpy as np
import random
from collections import deque

### **Create a DQN agent**

**Use the instruction below to prepare an agent**


In [4]:
# Action space include 3 actions: Buy, Sell, and Sit
#Setting up the experience replay memory to deque with 1000 elements inside it
#Empty list with inventory is created that contains the stocks that were already bought
#Setting up gamma to 0.95, that helps to maximize the current reward over the long-term
#Epsilon parameter determines whether to use a random action or to use the model for the action. 
#In the beginning random actions are encouraged, hence epsilon is set up to 1.0 when the model is not trained.
#And over time the epsilon is reduced to 0.01 in order to decrease the random actions and use the trained model
#We're then set the speed of decreasing epsililon in the epsilon_decay parameter

#Defining our neural network:
#Define the neural network function called _model and it just takes the keyword self
#Define the model with Sequential()
#Define states i.e. the previous n days and stock prices of the days
#Defining 3 hidden layers in this network
#Changing the activation function to relu because mean-squared error is used for the loss




### **Preprocess the stock market data**

In [5]:
import math

# prints formatted price
def formatPrice(n):
	return ("-$" if n < 0 else "$") + "{0:.2f}".format(abs(n))

# returns the vector containing stock data from a fixed file
def getStockDataVec(key):
	vec = []
	lines = open("" + key + ".csv", "r").read().splitlines()

	for line in lines[1:]:
		vec.append(float(line.split(",")[4]))

	return vec

# returns the sigmoid
def sigmoid(x):
	return 1 / (1 + math.exp(-x))

# returns an an n-day state representation ending at time t
def getState(data, t, n):
	d = t - n + 1
	block = data[d:t + 1] if d >= 0 else -d * [data[0]] + data[0:t + 1] # pad with t0
	res = []
	for i in range(n - 1):
		res.append(sigmoid(block[i + 1] - block[i]))

	return np.array([res])


In [6]:
class Agent():
    def __init__(self,window_size,is_eval=False,model_name=""):
        self.nS = window_size
        self.nA = 3
        self.memory = deque([],maxlen=1000)
        self.alpha = .001
        self.window_size = window_size
        self.gamma = 0.95
        #Explore/Explot
        self.epsilon = 1
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.loss = []
        
        self.is_eval = is_eval
        self.model = load_model (model_name) if self.is_eval else self.build_model()
    
    def build_model(self):
        model = keras.Sequential()
        model.add(keras.layers.Dense(24,input_dim=self.window_size,activation='relu'))
        model.add(keras.layers.Dense(24,activation='relu'))
        model.add(keras.layers.Dense(self.nA,activation='linear'))
        #model.compile(loss='mean_squared_error',optimizer=adam_v2(learning_rate=self.alpha))
        model.compile(loss="mse", optimizer=adam_v2.Adam(learning_rate=self.alpha))
        return model
    
    def act(self,state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(3) # Explore
        action_vals = self.model.predict(state)
        return np.argmax(action_vals[0])
    
    def test_action(): #Exploit 
        action_vals = self.model.predict(state)
        return np.argmax(action_valls[0])
    
    def store(self,state,action,reward,nstate,done):
        # Store the experience in memory
        self.memory.append((state,action,reward,nstate,done))
    
    def expReplay(self,batch_size):
        # Execute the experience replay
        minibatch = random.sample(self.memory,batch_size)
        
        x=[]
        y=[]
        np_array = np.array(minibatch)
        st = np.zeros((0,self.nS)) # State
        nst  = np.zeros((0,self.nS)) # Next State
        for i in range (len(np_array)):
            st = np.append(st,np_array[i,0],axis=0)
            nst = np.append(st,np_array[i,3],axis=0)
        
        st_predict= self.model.predict(st) # speedup, can do on the Entire batch as well
        nst_predict= self.model.predict(nst)
    
        index = 0
        for state,action,reward,nstate,done in minibatch:
            x.append(state)
            # Predict from state
            nst_action_predict_model = nst_predict[index]
            if done == True:   # Terminal
                target = reward
            else: # Non Terminal
                target = reward + self.gamma * np.amax(nst_action_predict_model)
            
            target_f = st_predict[index]
            target_f[action] = target
            y.append(target_f)
            index +=1
        
        #Reshape the keras fit
        x_reshape = np.array(x).reshape(batch_size,self.nS)
        y_reshape = np.array(y)
        epoch_count = 1
        hist = self.model.fit(x_reshape,y_reshape,epochs=epoch_count,verbose=0)
        # Graph losses
        for i in range(epoch_count):
            self.loss.append( hist.history['loss'][i])
        # decay epsilon
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay 

### **Train and build the model**

In [5]:
import sys
from collections import deque

if len(sys.argv) != 4:
	print ("Usage: python train.py [stock] [window] [episodes]")
	exit()


stock_name = input("Enter stock_name, window_size, Episode_count")
#Fill the given information when prompted: 
#Enter stock_name = GSPC_Training_Dataset /content/GSPC_Evaluation_Dataset
#window_size = 10
#Episode_count = 100 or it can be 10 or 20 or 30 and so on.

window_size = input()
episode_count = input()
stock_name = str(stock_name)
window_size = int(window_size)
episode_count = int(episode_count)

agent = Agent(window_size)
data = getStockDataVec(stock_name)
l = len(data) - 1
batch_size = 32

for e in range(episode_count + 1):
	print ("Episode " + str(e) + "/" + str(episode_count))
	state = getState(data, 0, window_size + 1)

	total_profit = 0
	agent.inventory = []

	for t in range(l):
		action = agent.act(state)

		# sit
		next_state = getState(data, t + 1, window_size + 1)
		reward = 0

		if action == 1: # buy
			agent.inventory.append(data[t])
			#print ("Buy: " + formatPrice(data[t]))

		elif action == 2 and len(agent.inventory) > 0: # sell
			bought_price = agent.inventory.pop(0)
			reward = max(data[t] - bought_price, 0)
			total_profit += data[t] - bought_price
			#print ("Sell: " + formatPrice(data[t]) + " | Profit: " + formatPrice(data[t] - bought_price))

		done = True if t == l - 1 else False
		agent.memory.append((state, action, reward, next_state, done))
		state = next_state

		if done:
			print ("--------------------------------")
			print ("Total Profit: " + formatPrice(total_profit))
			

		if len(agent.memory) > batch_size:
			agent.expReplay(batch_size)

	if e % 10 == 0:
		agent.model.save("model_ep" + str(e))

Usage: python train.py [stock] [window] [episodes]
Enter stock_name, window_size, Episode_count/content/GSPC_Evaluation_Dataset
10
100
Episode 0/100
--------------------------------
Total Profit: -$1130.21
Episode 1/100
--------------------------------
Total Profit: $17.19
Episode 2/100
--------------------------------
Total Profit: -$1.18
Episode 3/100
--------------------------------
Total Profit: $26.03
Episode 4/100
--------------------------------
Total Profit: -$60.15
Episode 5/100
--------------------------------
Total Profit: $74.57
Episode 6/100
--------------------------------
Total Profit: $1.41
Episode 7/100
--------------------------------
Total Profit: $26.34
Episode 8/100
--------------------------------
Total Profit: $38.36
Episode 9/100
--------------------------------
Total Profit: -$87.07
Episode 10/100
--------------------------------
Total Profit: -$223.95
Episode 11/100
--------------------------------
Total Profit: -$19.76
Episode 12/100
-------------------------

### **Evaluate the model and agent**

In [7]:
import sys
from keras.models import load_model


if len(sys.argv) != 3:
	print ("Usage: python evaluate.py [stock] [model]")
	exit()


stock_name = input("Enter Stock_name, Model_name")
model_name = input()
#Note: 
#Fill the given information when prompted: 
#Enter stock_name = GSPC_Evaluation_Dataset /content/GSPC_Evaluation_Dataset
#Model_name = respective model name /content/model_ep90  

model = load_model(model_name)
window_size = model.layers[0].input.shape.as_list()[1]

agent = Agent(window_size, True, model_name)
data = getStockDataVec(stock_name)
l = len(data) - 1
batch_size = 32

state = getState(data, 0, window_size + 1)
total_profit = 0
agent.inventory = []

for t in range(l):
	action = agent.act(state)

	# sit
	next_state = getState(data, t + 1, window_size + 1)
	reward = 0

	if action == 1: # buy
		agent.inventory.append(data[t])
		print ("Buy: " + formatPrice(data[t]))

	elif action == 2 and len(agent.inventory) > 0: # sell
		bought_price = agent.inventory.pop(0)
		reward = max(data[t] - bought_price, 0)
		total_profit += data[t] - bought_price
		print ("Sell: " + formatPrice(data[t]) + " | Profit: " + formatPrice(data[t] - bought_price))

	done = True if t == l - 1 else False
	agent.memory.append((state, action, reward, next_state, done))
	state = next_state

	if done:
		print ("--------------------------------")
		print (stock_name + " Total Profit: " + formatPrice(total_profit))



Enter Stock_name, Model_name GSPC_Evaluation_Dataset
 model_ep70


Buy: $1271.50
Buy: $1269.75
Buy: $1274.48
Sell: $1283.76 | Profit: $12.26
Buy: $1295.02
Buy: $1281.92
Sell: $1280.26 | Profit: $10.51
Sell: $1283.35 | Profit: $8.87
Buy: $1296.63
Sell: $1276.34 | Profit: -$18.68
Buy: $1286.12
Sell: $1307.59 | Profit: $25.67
Sell: $1304.03 | Profit: $7.40
Sell: $1307.10 | Profit: $20.98
Buy: $1319.05
Sell: $1324.57 | Profit: $5.52
Buy: $1320.88
Sell: $1321.87 | Profit: $0.99
Buy: $1332.32
Buy: $1328.01
Buy: $1336.32
Buy: $1340.43
Sell: $1315.44 | Profit: -$16.88
Buy: $1307.40
Buy: $1306.10
Buy: $1319.88
Sell: $1327.22 | Profit: -$0.79
Sell: $1306.33 | Profit: -$29.99
Buy: $1321.15
Buy: $1310.13
Sell: $1321.82 | Profit: -$18.61
Buy: $1320.02
Sell: $1295.11 | Profit: -$12.29
Buy: $1304.28
Sell: $1256.88 | Profit: -$49.22
Buy: $1273.72
Sell: $1298.38 | Profit: -$21.50
Buy: $1293.77
Buy: $1309.66
Buy: $1313.80
Buy: $1319.44
Buy: $1325.83
Sell: $1332.41 | Profit: $11.26
Buy: $1332.87
Sell: $1332.63 | Profit: $22.50
Buy: $1333.51
Sell: $1328.17 | Profit: $8.1

**Note: Run the training section for considerable episodes so that while evaluating the model it can generate significant profit.** 
