##Stock price prediction algorithm using a *Long short-term memory* neural network

Using Python, this project develops a program that predicts stock prices with *Machine Learning*, specifically a *Long short-term memory (LSTM)* neural network architecture. It uses Microsoft's (MSFT) historical stock prices, obtained through the *yfinance* library.

In [19]:
# Libraries used
import pandas as pd
import yfinance as yf
from datetime import date, timedelta

In [20]:
# Download the dataset: MSFT daily prices for the last 5000 calendar days
today = date.today()

endDate = today.strftime("%Y-%m-%d")
startDate = today - timedelta(days=5000)
startDateF = startDate.strftime("%Y-%m-%d")

data = yf.download('MSFT', start=startDateF, end=endDate, progress=False)
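
To make the download window concrete, the sketch below repeats the date arithmetic with a fixed run date instead of `date.today()`. The date 2022-05-04 is an assumption: it is not stated in the notebook, but it is consistent with the index range reported by `data.info()`. The names `run_date`, `start_str` and `end_str` are illustrative only.

```python
from datetime import date, timedelta

# Hypothetical fixed run date, used only to make the example reproducible.
run_date = date(2022, 5, 4)

# Same arithmetic as the cell above: go back 5000 calendar days.
end_str = run_date.strftime("%Y-%m-%d")
start_str = (run_date - timedelta(days=5000)).strftime("%Y-%m-%d")

print(start_str, end_str)
```

Because the window is counted in calendar days, the number of rows returned is smaller than 5000: weekends and market holidays have no price data.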

In [21]:
# Inspect the first five rows
data.head()

| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2008-08-25 | 27.610001 | 27.84 | 27.459999 | 27.66 | 20.691088 | 51381300 |
| 2008-08-26 | 27.58 | 27.719999 | 27.17 | 27.27 | 20.399338 | 44774400 |
| 2008-08-27 | 27.34 | 27.790001 | 27.129999 | 27.559999 | 20.616282 | 33975300 |
| 2008-08-28 | 27.610001 | 28.01 | 27.6 | 27.940001 | 20.900537 | 48372600 |
| 2008-08-29 | 27.68 | 27.780001 | 27.290001 | 27.290001 | 20.414305 | 50735500 |


###Exploratory data analysis

In [22]:
# Inspect column dtypes and non-null counts
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3447 entries, 2008-08-25 to 2022-05-03
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Open       3447 non-null   float64
 1   High       3447 non-null   float64
 2   Low        3447 non-null   float64
 3   Close      3447 non-null   float64
 4   Adj Close  3447 non-null   float64
 5   Volume     3447 non-null   int64  
dtypes: float64(5), int64(1)
memory usage: 188.5 KB


In [24]:
# Check for NA values in each column
data.isnull().sum()

Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64

In [25]:
# Descriptive statistics for each column
data.describe()

| | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| count | 3447.0 | 3447.0 | 3447.0 | 3447.0 | 3447.0 | 3447.0 |
| mean | 84.956748 | 85.775982 | 84.09293 | 84.976107 | 80.499319 | 41740880.0 |
| std | 81.167039 | 81.968905 | 80.2826 | 81.166119 | 82.481501 | 25197120.0 |
| min | 15.2 | 15.62 | 14.87 | 15.15 | 11.487971 | 7425600.0 |
| 25% | 28.805 | 29.025001 | 28.51 | 28.785001 | 23.014761 | 24799550.0 |
| 50% | 46.91 | 47.43 | 46.529999 | 47.0 | 41.715446 | 34849700.0 |
| 75% | 108.559998 | 109.530003 | 107.475002 | 108.445 | 104.291428 | 51662450.0 |
| max | 344.619995 | 349.670013 | 342.200012 | 343.109985 | 342.402008 | 319317900.0 |


In [28]:
# Correlation of every column with the target column "Close"
correlation = data.corr()
print(correlation["Close"].sort_values(ascending=False))

Close        1.000000
Adj Close    0.999978
Low          0.999899
High         0.999888
Open         0.999794
Volume      -0.341257
Name: Close, dtype: float64
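
`DataFrame.corr()` computes the Pearson correlation coefficient by default. As a minimal sketch of what that number means, the following computes it by hand on two short toy arrays (illustrative values, not the full MSFT series) and checks the result against NumPy. The helper name `pearson` is hypothetical.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation: covariance divided by the product of the spreads."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return float((a_centered * b_centered).sum() / denominator)

# Toy price-like values, for illustration only.
close_demo = [27.66, 27.27, 27.56, 27.94, 27.29]
low_demo = [27.46, 27.17, 27.13, 27.60, 27.29]

r = pearson(close_demo, low_demo)
reference = float(np.corrcoef(close_demo, low_demo)[0, 1])
print(round(r, 4), round(reference, 4))
```

The near-1 correlations between `Close` and `Open`, `High` and `Low` in the output above are expected, since all four columns are prices of the same stock on the same trading day.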


###*Long short-term memory* neural network

In [56]:
# Prepare the data and split it into training and test sets
# Features: same-day Open, High, Low and Volume; target: Close
x = data[["Open", "High", "Low", "Volume"]]
y = data["Close"]
x = x.to_numpy()
y = y.to_numpy()
y = y.reshape(-1,1)

# Random 80/20 split; rows are shuffled, so chronological order is not preserved
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
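
Because `train_test_split` shuffles rows by default, training and test samples end up interleaved in time. For time-ordered data, a common alternative is a chronological split that keeps the most recent rows for testing. The sketch below shows the idea on placeholder arrays. The function name `chronological_split` is hypothetical, and this is not the split used in this notebook.

```python
import numpy as np

def chronological_split(x, y, test_size=0.2):
    """Keep the first rows for training and the last rows for testing."""
    n_test = int(round(len(x) * test_size))
    split = len(x) - n_test
    return x[:split], x[split:], y[:split], y[split:]

# Placeholder data: 10 time-ordered samples with 4 features each.
x_demo = np.arange(40, dtype=float).reshape(10, 4)
y_demo = np.arange(10, dtype=float).reshape(-1, 1)

x_tr, x_te, y_tr, y_te = chronological_split(x_demo, y_demo, test_size=0.2)
print(x_tr.shape, x_te.shape)
```

Passing `shuffle=False` to scikit-learn's `train_test_split` gives the same kind of ordered split without a custom helper.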

In [67]:
# Build the neural network architecture
from keras.models import Sequential
from keras.layers import Dense, LSTM
model = Sequential()
# input_shape=(4, 1): each sample is read as a sequence of 4 steps (the 4 features) with 1 value per step
model.add(LSTM(128, return_sequences=True, input_shape=(xtrain.shape[1],1)))
model.add(LSTM(64, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))  # single output: the predicted closing price
model.summary()


Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_2 (LSTM)               (None, 4, 128)            66560     
                                                                 
 lstm_3 (LSTM)               (None, 64)                49408     
                                                                 
 dense_1 (Dense)             (None, 25)                1625      
                                                                 
 dense_2 (Dense)             (None, 1)                 26        
                                                                 
Total params: 117,619
Trainable params: 117,619
Non-trainable params: 0
_________________________________________________________________
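
The parameter counts in the summary can be reproduced by hand. A standard LSTM layer has four gate blocks, each with input weights, recurrent weights and a bias vector, so it holds `4 * (units * (input_dim + units) + units)` parameters. A dense layer holds `units * input_dim + units`. The helper names below are illustrative, not part of Keras.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # one weight per input-output pair plus one bias per output
    return units * input_dim + units

counts = [
    lstm_params(128, 1),    # first LSTM: 1 value per time step
    lstm_params(64, 128),   # second LSTM: fed by the 128 outputs of the first
    dense_params(25, 64),
    dense_params(1, 25),
]
total = sum(counts)
print(counts, total)
```

The four values match the `Param #` column of the summary, and their sum matches the reported total of 117,619.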


In [68]:
# Compile with the Adam optimizer and mean squared error loss, then train for 100 epochs
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(xtrain, ytrain, batch_size=1, epochs=100)

Epoch 1/100
...
Epoch 100/100


<keras.callbacks.History at 0x1ad62660040>
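
Two preprocessing steps are often applied before training an LSTM and do not appear explicitly in this notebook: feature scaling (here `Volume` is several orders of magnitude larger than the prices) and a reshape to the `(samples, timesteps, features)` layout that the first LSTM layer declares. The following is a minimal NumPy sketch under those assumptions. It uses placeholder data rather than the real `xtrain` and `xtest`, and every name in it is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_demo(n_rows):
    # Placeholder features: three price-like columns and one volume-like column.
    prices = rng.uniform(15, 350, size=(n_rows, 3))
    volume = rng.uniform(7e6, 3e8, size=n_rows)
    return np.column_stack([prices, volume])

x_train_demo = make_demo(8)
x_test_demo = make_demo(2)

# Fit the min-max statistics on the training split only, to avoid using test data.
col_min = x_train_demo.min(axis=0)
col_max = x_train_demo.max(axis=0)
col_range = np.where(col_max > col_min, col_max - col_min, 1.0)

def to_lstm_input(x):
    """Scale each column with the training statistics, then add a trailing axis."""
    scaled = (x - col_min) / col_range
    return scaled.reshape(scaled.shape[0], scaled.shape[1], 1)

x_train_lstm = to_lstm_input(x_train_demo)
x_test_lstm = to_lstm_input(x_test_demo)
print(x_train_lstm.shape, x_test_lstm.shape)
```

Test values can fall slightly outside the 0 to 1 range, because the scaling is defined by the training split alone.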

In [102]:
# Predict the price for one test sample and compare it with the actual closing price
import numpy as np

xtest_one = np.array([xtest[0]])
print(model.predict(xtest_one))
print(ytest[0])

[[25.751585]]
[25.79999924]
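
A single prediction gives only a rough impression of model quality. As a sketch of how the comparison could be quantified, the code below computes the absolute and relative error for the pair printed above and defines an RMSE helper that could be applied to a whole test set. The helper name `rmse` is illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two arrays of equal length."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# The two values printed in the cell above.
predicted = 25.751585
actual = 25.79999924

abs_error = abs(actual - predicted)
pct_error = 100 * abs_error / actual
print(f"absolute error: {abs_error:.4f}, relative error: {pct_error:.2f}%")
```

To evaluate the model as a whole, one would pass `ytest` and `model.predict(xtest)` to `rmse` instead of a single pair.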
