
# Predicting future Tokyo stock exchange volume using multiple linear regression

* Estimating the trading volume of the Tokyo stock exchange using multiple linear regression with Python's scikit-learn library
* @ricardoricrob
* Prof. Ricardo Roberto de Lima
* E-mail: ricardo.roberto@unipe.edu.br 

Importing the standard libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Reading the dataset

In [2]:
dados = pd.read_csv('Tokyo_Stock.csv')

Displaying the first five rows

In [3]:
dados.head()

,Date,Open,High,Low,Close,Volume,Stock Trading
0,2016-12-30,42120,42330,41700,41830,610000,25628028000
1,2016-12-29,43000,43220,42540,42660,448400,19188227000
2,2016-12-28,43940,43970,43270,43270,339900,14780670000
3,2016-12-27,43140,43700,43140,43620,400100,17427993000
4,2016-12-26,43310,43660,43090,43340,358200,15547803000


Displaying the last five rows

In [4]:
dados.tail()

,Date,Open,High,Low,Close,Volume,Stock Trading
1221,2012-01-11,14360,14750,14280,14590,1043400,15191988000
1222,2012-01-10,13890,14390,13860,14390,952300,13533413000
1223,2012-01-06,13990,14030,13790,13850,765500,10635609000
1224,2012-01-05,13720,13840,13600,13800,511500,7030811000
1225,2012-01-04,14050,14050,13700,13720,559100,7719804000


Displaying information about the dataset

In [5]:
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1226 entries, 0 to 1225
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Date           1226 non-null   object
 1   Open           1226 non-null   int64 
 2   High           1226 non-null   int64 
 3   Low            1226 non-null   int64 
 4   Close          1226 non-null   int64 
 5   Volume         1226 non-null   int64 
 6   Stock Trading  1226 non-null   int64 
dtypes: int64(6), object(1)
memory usage: 67.2+ KB
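
The `info()` output shows `Date` stored as `object`, that is, as plain strings. As a minimal sketch, the dates could also be parsed while reading by passing `parse_dates` to `pd.read_csv`. The inline CSV text below is only a stand-in for `Tokyo_Stock.csv`, built from two of the rows displayed above, and the name `amostra` is hypothetical.

```python
import io

import pandas as pd

# Stand-in for Tokyo_Stock.csv: two rows copied from the preview above.
csv_text = (
    "Date,Open,High,Low,Close,Volume,Stock Trading\n"
    "2016-12-30,42120,42330,41700,41830,610000,25628028000\n"
    "2016-12-29,43000,43220,42540,42660,448400,19188227000\n"
)

# parse_dates converts the listed columns to datetime64 while reading.
amostra = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(amostra.dtypes)
```

With the column already parsed, the later call to `pd.to_datetime` becomes unnecessary, although it is harmless to keep.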


To apply the model, the date must be converted to a plain day count. Here each date becomes its proleptic Gregorian ordinal, in which 0001-01-01 is day 1.

In [6]:
import datetime as ddt

In [7]:
# Parse the date strings, then map each timestamp to its ordinal day number
dados['Date'] = pd.to_datetime(dados['Date'])
dados['Date'] = dados['Date'].map(ddt.datetime.toordinal)

Checking that the conversion succeeded

In [8]:
dados.head()

,Date,Open,High,Low,Close,Volume,Stock Trading
0,736328,42120,42330,41700,41830,610000,25628028000
1,736327,43000,43220,42540,42660,448400,19188227000
2,736326,43940,43970,43270,43270,339900,14780670000
3,736325,43140,43700,43140,43620,400100,17427993000
4,736324,43310,43660,43090,43340,358200,15547803000
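
The converted values can be cross-checked with the standard library alone. The sketch below recomputes the ordinal for the first date in the preview and shows that the mapping is reversible; it is an independent check, not part of the original notebook.

```python
from datetime import date

# 2016-12-30 is the first date in the dataset preview.
ordinal = date(2016, 12, 30).toordinal()
print(ordinal)  # 736328, matching the first converted row

# The mapping is reversible, and consecutive calendar days differ by exactly 1.
print(date.fromordinal(ordinal))                 # 2016-12-30
print(date(2016, 12, 29).toordinal() - ordinal)  # -1
```

Because ordinals count calendar days, gaps from weekends and holidays remain visible as jumps larger than 1 between consecutive rows.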


Defining the variables X and Y, with the date included as a feature

In [9]:
# Features: Date, Open, High, Low, Close; target: Volume
X = dados.drop(['Volume', 'Stock Trading'], axis=1).values
Y = dados['Volume'].values
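
Since `.values` discards the column names, it helps to know which column each position of `X` holds. The sketch below rebuilds a two-row frame from the converted preview (the name `amostra` is hypothetical) and shows that `drop` keeps the remaining columns in their original order.

```python
import pandas as pd

# Two rows copied from the converted preview above (Date already an ordinal).
amostra = pd.DataFrame({
    "Date": [736328, 736327],
    "Open": [42120, 43000],
    "High": [42330, 43220],
    "Low": [41700, 42540],
    "Close": [41830, 42660],
    "Volume": [610000, 448400],
    "Stock Trading": [25628028000, 19188227000],
})

recursos = amostra.drop(["Volume", "Stock Trading"], axis=1)
colunas_X = recursos.columns.tolist()  # order of the columns inside X
X_amostra = recursos.values            # plain NumPy array, names dropped

print(colunas_X)        # ['Date', 'Open', 'High', 'Low', 'Close']
print(X_amostra.shape)  # (2, 5)
```

This order is the one in which the fitted coefficients are reported later.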

Creating the training and test samples (70% training, 30% test)

In [10]:
from sklearn.model_selection import train_test_split

In [11]:
X_treino, X_teste, Y_treino, Y_teste = train_test_split(X, Y, test_size=0.30, shuffle=True, random_state=0)
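
To make the effect of `test_size`, `shuffle`, and `random_state` concrete, the sketch below repeats the same call on placeholder arrays with 1226 rows, the row count reported by `dados.info()`. The arrays `X_demo` and `Y_demo` are assumptions used only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_linhas = 1226  # number of rows reported by dados.info()
X_demo = np.arange(n_linhas).reshape(-1, 1)  # placeholder features
Y_demo = np.arange(n_linhas)                 # placeholder target

X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.30, shuffle=True, random_state=0
)
print(len(X_tr), len(X_te))  # 858 368 (the test size is rounded up)

# Repeating the call with the same random_state reproduces the same partition.
_, X_te_again, _, _ = train_test_split(
    X_demo, Y_demo, test_size=0.30, shuffle=True, random_state=0
)
print(np.array_equal(X_te, X_te_again))  # True
```

Note that shuffling mixes dates from the whole period into both samples, so the test set measures interpolation within 2012 to 2016 rather than forecasting beyond the last date.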

Importing scikit-learn's LinearRegression class to perform multiple linear regression

In [12]:
from sklearn.linear_model import LinearRegression

In [13]:
modelo = LinearRegression()

In [14]:
modelo.fit(X_treino, Y_treino)

LinearRegression()

In [15]:
coeficiente_linear = modelo.intercept_   # intercept of the fitted hyperplane
coeficiente_angular = modelo.coef_       # one slope per feature, in column order

In [16]:
coeficiente_linear

17194517.30781505

In [17]:
coeficiente_angular

array([ -22.43969781,  -14.37289672,  418.1923407 , -500.33020903,
         86.17669071])
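
A fitted linear model predicts with the intercept plus a weighted sum of the features. As a rough sketch, the printed intercept and coefficients can be applied by hand to the first row of the converted preview. The column order is assumed to be the one left after dropping `Volume` and `Stock Trading`, and that row may well belong to the training sample, so this illustrates the formula rather than evaluating the model.

```python
import numpy as np

# Values printed by the notebook above.
intercepto = 17194517.30781505
coeficientes = np.array([-22.43969781, -14.37289672, 418.1923407,
                         -500.33020903, 86.17669071])
colunas = ["Date", "Open", "High", "Low", "Close"]  # assumed column order

# First row of the converted preview (Date as an ordinal).
linha = np.array([736328, 42120, 42330, 41700, 41830])

# Linear model: y_hat = intercept + sum_i coef_i * x_i
volume_previsto = intercepto + coeficientes @ linha

for nome, coef in zip(colunas, coeficientes):
    print(f"{nome:>5}: {coef:+.2f}")
print(f"Predicted volume: {volume_previsto:.0f}")
```

`modelo.predict` performs the same computation for every row of `X_teste` at once.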

In [18]:
Y_pred = modelo.predict(X_teste)

Computing the model's errors

In [19]:
from sklearn.metrics import mean_absolute_error,mean_squared_error

In [20]:
MAE = mean_absolute_error(Y_teste,Y_pred)

In [21]:
MSE = mean_squared_error(Y_teste,Y_pred)

In [22]:
RMSE = np.sqrt(MSE)

In [23]:
print("MAE = {:0.2f}".format(MAE))
print("MSE = {:0.2f}".format(MSE))
print("RMSE = {:0.2f}".format(RMSE))

MAE = 210883.66
MSE = 140319056646.47
RMSE = 374591.85
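
For reference, the three metrics can be written out directly with NumPy. The sketch below uses three made-up values, not data from the notebook, so that each step can be checked by hand.

```python
import numpy as np

# Toy values chosen for easy arithmetic; not taken from the dataset.
y_real = np.array([3.0, 5.0, 2.0])
y_prev = np.array([2.0, 5.0, 4.0])

residuos = y_real - y_prev           # [ 1.  0. -2.]
mae = np.mean(np.abs(residuos))      # (1 + 0 + 2) / 3 = 1.0
mse = np.mean(residuos ** 2)         # (1 + 0 + 4) / 3, about 1.667
rmse = np.sqrt(mse)                  # about 1.291

print(f"MAE = {mae:.3f}, MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```

MAE and RMSE are expressed in the same unit as `Volume`, while MSE is in squared units, which is why its value above is so much larger.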


Modeling without the date variable

In [24]:
X = dados.drop(['Date','Volume','Stock Trading'],axis=1).values
Y = dados['Volume'].values

In [25]:
X_treino, X_teste, Y_treino, Y_teste = train_test_split(X, Y, test_size=0.30, shuffle=True, random_state=0)

In [26]:
modelo2 = LinearRegression()

In [27]:
modelo2.fit(X_treino, Y_treino)

LinearRegression()

In [28]:
coeficiente_linear = modelo2.intercept_
coeficiente_angular = modelo2.coef_

In [29]:
coeficiente_linear

715932.1875608574

In [30]:
coeficiente_angular

array([ -14.99761802,  417.04556126, -500.36142221,   87.27713972])

In [31]:
Y_pred = modelo2.predict(X_teste)

In [32]:
MAE = mean_absolute_error(Y_teste,Y_pred)
MSE = mean_squared_error(Y_teste,Y_pred)
RMSE = np.sqrt(MSE)

In [33]:
print("MAE = {:0.2f}".format(MAE))
print("MSE = {:0.2f}".format(MSE))
print("RMSE = {:0.2f}".format(RMSE))

MAE = 210945.77
MSE = 140458064189.87
RMSE = 374777.35
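
The two sets of error figures are very close. A small sketch, using only the values printed above and a hypothetical helper `variacao_relativa`, expresses the difference as a relative change.

```python
# Error values printed by the notebook for the two models.
mae_com_data, rmse_com_data = 210883.66, 374591.85  # with Date
mae_sem_data, rmse_sem_data = 210945.77, 374777.35  # without Date


def variacao_relativa(antes, depois):
    """Relative change from `antes` to `depois`, as a fraction."""
    return (depois - antes) / antes


var_mae = variacao_relativa(mae_com_data, mae_sem_data)
var_rmse = variacao_relativa(rmse_com_data, rmse_sem_data)

print(f"MAE change without Date:  {var_mae:.4%}")
print(f"RMSE change without Date: {var_rmse:.4%}")
```

Both changes are well under a tenth of a percent, which suggests that, on this particular split, the date adds almost nothing beyond the four price columns.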
