**WHAT'S THIS?**

This is a very simple, minimalist model for predicting Brazilian stock close prices from almost 30 years' worth of data, using a basic version of a Long Short-Term Memory (LSTM) neural network.

My dataset was downloaded from here: https://www.kaggle.com/felsal/ibovespa-stocks

The idea of the problem is pretty simple: you feed the model with observations from the past X days and it tells you what it thinks will be the close price of a given stock Y days ahead.

**INPUT**

The input dictionary for this code is mostly self-explanatory: 'features' is what you're going to consider in your prediction, 'target' is what you want to predict, 'size' is the X from above, 'ahead' is Y, 'tick' is your favorite stock identifier, 'ratio' is the fraction of the data held out for testing, and the remaining entries ('layers', 'neurons', 'epochs', 'batch') are the usual network hyperparameters.

In [1]:
inpt = {
	'features' : ['close'],
	'target'   : ['close'],
	'size'     : 10,
	'ahead'    : 1,
	'tick'     : "ITUB4",   #Itaú!
	'ratio'    : 0.20,      #Test size
	'layers'   : 3,
	'neurons'  : [150, 50, 50],  #For each layer
	'epochs'   : 100,
	'batch'    : 128  
}
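
Since 'neurons' is meant to hold one entry per layer, a quick sanity check before training can catch a mismatched dictionary early. The sketch below is only an illustration: `check_inpt` is a hypothetical helper that is not part of the original code, and the rules it enforces are assumptions drawn from how the entries are described above.

```python
def check_inpt(inpt):
    """Return a list of problems found in the input dictionary (empty if none).

    Hypothetical helper; the rules below are assumptions, not part of the original code.
    """
    problems = []
    if len(inpt['neurons']) != inpt['layers']:
        problems.append("'neurons' should have one entry per layer")
    if not 0 < inpt['ratio'] < 1:
        problems.append("'ratio' should be a fraction between 0 and 1")
    if inpt['size'] < 1 or inpt['ahead'] < 1:
        problems.append("'size' and 'ahead' should be positive integers")
    return problems

# Same values as the dictionary above
example = {
    'features': ['close'], 'target': ['close'],
    'size': 10, 'ahead': 1, 'tick': "ITUB4", 'ratio': 0.20,
    'layers': 3, 'neurons': [150, 50, 50], 'epochs': 100, 'batch': 128,
}
print(check_inpt(example))  # prints []
```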

**THE MAIN FUNCTIONS**

There are only two important functions in this code. This is the first one:

In [None]:
#Receives a 2D array (rows = time steps, columns = features) and splits it into overlapping windows of length `size`. Returns the array of windows plus an array with the value of column 0 found `ahead` steps after each window. No scaling happens here.
def windowsize(data, size, ahead):
	lx, ly = [], []
	for i in range(len(data)-size-ahead):
		x=data[i:(i+size), :]
		y=data[i+size+ahead-1, 0].squeeze()
		lx.append(x)
		ly.append(y)
	return np.array(lx), np.array(ly)

It simply divides your whole dataset into many small overlapping windows, as is common practice with recurrent neural networks.
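
To see which values end up in each window, here is a list-based mirror of the same index arithmetic on a toy series. `toy_windows` is a hypothetical name used only for this illustration, and it handles a single feature so that plain Python lists are enough.

```python
def toy_windows(series, size, ahead):
    """List-based mirror of the indexing used in windowsize (single feature only)."""
    xs, ys = [], []
    for i in range(len(series) - size - ahead):
        xs.append(series[i:i + size])             # the `size` past observations
        ys.append(series[i + size + ahead - 1])   # the value `ahead` steps after the window ends
    return xs, ys

series = list(range(10))   # stand-in for ten daily close prices: 0, 1, ..., 9
xs, ys = toy_windows(series, size=3, ahead=2)
for x, y in zip(xs, ys):
    print(x, '->', y)
```

With ten observations, a window of 3 and a horizon of 2, this yields five pairs: the first is `[0, 1, 2] -> 4` and the last is `[4, 5, 6] -> 8`. Note that with this loop bound the final observation (9) is never used as a target.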

The other important function is, of course, the prediction function, which scales the data, splits it into windows, and feeds them to the ready-made LSTM layers from Keras. It then plots the test results and returns the trained model.

In [None]:
#LSTM predictor
def lstmpred(data, inpt):
	exdata = data[data.ticker == inpt['tick']]
	npdata = np.array(exdata[inpt['features']])
	
	#Labels (dates)
	labels = list(map(str, exdata['datetime'].values))
	
	#Scale data
	scaler=mms(feature_range=(-1,1))
	npdata=scaler.fit_transform(npdata)

	#Split data in smaller windows
	nx, ny = windowsize(npdata, inpt['size'], inpt['ahead'])

	#Splitting into train and test sets
	ntrnx, ntstx, ntrny, ntsty = tts([nx, ny], inpt['ratio'])
	
	#Build model
	model = Sequential()
	for i in range(inpt['layers']):
		model.add(LSTM(inpt['neurons'][i], dropout=0.2, return_sequences = True, input_shape = (ntrnx.shape[1], len(inpt['features']))))
	model.add(LSTM(1, dropout=0.2, return_sequences= False))

	#Compile and fit model
	model.compile(optimizer='adam', loss='mean_squared_error')
	model.fit(ntrnx, ntrny, validation_data=(ntstx, ntsty), batch_size=inpt['batch'], epochs=inpt['epochs'])

	#Test
	npreds = model.predict(ntstx)
	preds = scaler.inverse_transform(npreds)
	tsty = scaler.inverse_transform(ntsty.reshape(-1,1))

	plt.figure(figsize=(20,10))
	xaxis = range(len(preds))
	plt.plot(xaxis,preds.squeeze(), color="red")
	plt.plot(xaxis,tsty.squeeze(), color="blue")
	plt.show()
	return model
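
Because the network is trained on scaled prices, its outputs have to be mapped back to price units with `inverse_transform` before plotting. Assuming `mms` is scikit-learn's `MinMaxScaler` (the import is not shown here), the transformation for a single column amounts to the arithmetic sketched below. The function names `scale` and `unscale` are illustrative and not part of the original code.

```python
def scale(values, lo=-1.0, hi=1.0):
    """Map values linearly so that min -> lo and max -> hi (assumes max > min)."""
    vmin, vmax = min(values), max(values)
    scaled = [(v - vmin) / (vmax - vmin) * (hi - lo) + lo for v in values]
    return scaled, (vmin, vmax)

def unscale(scaled, bounds, lo=-1.0, hi=1.0):
    """Invert scale() using the (min, max) pair remembered from fitting."""
    vmin, vmax = bounds
    return [(s - lo) / (hi - lo) * (vmax - vmin) + vmin for s in scaled]

prices = [10.0, 20.0, 30.0]          # made-up prices, for illustration only
scaled, bounds = scale(prices)
restored = unscale(scaled, bounds)
print(scaled)     # prints [-1.0, 0.0, 1.0]
print(restored)   # prints [10.0, 20.0, 30.0]
```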

**TEST RESULTS**

This is how our model performed when trying to predict the close prices of ITUB4 (red: predicted, blue: actual).
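
A plot gives a visual impression, and a numeric error can complement it. The sketch below computes the root-mean-square error and the mean absolute error between two price series. It is an addition rather than part of the original notebook, the helper names are hypothetical, and the numbers are made-up placeholders, not actual ITUB4 results.

```python
import math

def rmse(pred, actual):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

def mae(pred, actual):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

pred = [10.0, 11.0, 12.0]      # placeholder predictions
actual = [10.0, 11.0, 14.0]    # placeholder true prices
print(rmse(pred, actual), mae(pred, actual))
```

Inside `lstmpred`, the same functions could be applied to `preds.squeeze()` and `tsty.squeeze()` right before plotting.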