## Regression on tata_dataset with pandas and sklearn
Tata Consumer Products Limited, formerly Tata Global Beverages Limited, is an Indian multinational consumer goods company based in Kolkata, West Bengal, India, and a subsidiary of the Tata Group. This notebook builds and evaluates a linear regression model on the company's stock trading data. The dataset contains the following columns:

1. Trading date (Date)
2. Opening price (Open)
3. Highest price (High)
4. Lowest price (Low)
5. Last traded price (Last)
6. Closing price (Close)
7. Total number of shares traded (Total Trade Quantity)
8. Total traded value, in lakhs of rupees (Turnover (Lacs))




In [2]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

### Read the data

In [3]:
df_train = pd.read_csv("tata_train.csv")
df_test = pd.read_csv("tata_test.csv")

In [4]:
df_train.head()

Unnamed: 0,Date,Open,High,Low,Last,Close,Total Trade Quantity,Turnover (Lacs)
0,2018-09-28,234.05,235.95,230.2,233.5,233.75,3069914,7162.35
1,2018-09-27,234.55,236.8,231.1,233.8,233.25,5082859,11859.95
2,2018-09-26,240.0,240.0,232.5,235.0,234.25,2240909,5248.6
3,2018-09-25,233.3,236.75,232.0,236.25,236.1,2349368,5503.9
4,2018-09-24,233.55,239.2,230.75,234.0,233.3,3423509,7999.55


In [5]:
df_test.head()

Unnamed: 0,Date,Open,High,Low,Last,Close,Total Trade Quantity,Turnover (Lacs)
0,2018-10-24,220.1,221.25,217.05,219.55,219.8,2171956,4771.34
1,2018-10-23,221.1,222.2,214.75,219.55,218.3,1416279,3092.15
2,2018-10-22,229.45,231.6,222.0,223.05,223.25,3529711,8028.37
3,2018-10-19,230.3,232.7,225.5,227.75,227.2,1527904,3490.78
4,2018-10-17,237.7,240.8,229.45,231.3,231.1,2945914,6961.65


### Dataset size

In [6]:
len(df_test), len(df_train) 

(16, 2035)

### Prepare "Date"
Since Date is stored as a date string, it is converted to an integer of the form YYYYMMDD so that the model can use it as a numeric feature.

In [7]:
df_train["Date"] = pd.to_numeric(df_train.Date.str.replace('-',''))

In [8]:
df_test["Date"] = pd.to_numeric(df_test.Date.str.replace('-',''))
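The integer encoding used above keeps dates in order but not evenly spaced: two consecutive trading days on either side of a month boundary differ by far more than 1. A minimal sketch of one alternative, assuming the `Date` strings follow the `YYYY-MM-DD` format seen in the previews, parses them with `pd.to_datetime` and counts days since the earliest date. The frame `sample` and the column name `days` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Small illustrative frame with dates in the same format as the dataset.
sample = pd.DataFrame({"Date": ["2018-09-28", "2018-10-01", "2018-10-02"]})

# Encoding used in the notebook: strip the dashes and cast to integer.
as_int = pd.to_numeric(sample["Date"].str.replace("-", ""))

# Alternative: parse to datetime and count days since the first date.
parsed = pd.to_datetime(sample["Date"], format="%Y-%m-%d")
sample["days"] = (parsed - parsed.min()).dt.days

print(as_int.diff().tolist())          # gaps in the YYYYMMDD encoding
print(sample["days"].diff().tolist())  # gaps in calendar days
```

In the first printout the step from 28 September to 1 October appears as a jump of 73, while the day count reports the actual gap of 3 days. Whether this matters depends on how much weight the model ends up giving to `Date`.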

In [9]:
df_train.head()

Unnamed: 0,Date,Open,High,Low,Last,Close,Total Trade Quantity,Turnover (Lacs)
0,20180928,234.05,235.95,230.2,233.5,233.75,3069914,7162.35
1,20180927,234.55,236.8,231.1,233.8,233.25,5082859,11859.95
2,20180926,240.0,240.0,232.5,235.0,234.25,2240909,5248.6
3,20180925,233.3,236.75,232.0,236.25,236.1,2349368,5503.9
4,20180924,233.55,239.2,230.75,234.0,233.3,3423509,7999.55


In [10]:
df_test.head()

Unnamed: 0,Date,Open,High,Low,Last,Close,Total Trade Quantity,Turnover (Lacs)
0,20181024,220.1,221.25,217.05,219.55,219.8,2171956,4771.34
1,20181023,221.1,222.2,214.75,219.55,218.3,1416279,3092.15
2,20181022,229.45,231.6,222.0,223.05,223.25,3529711,8028.37
3,20181019,230.3,232.7,225.5,227.75,227.2,1527904,3490.78
4,20181017,237.7,240.8,229.45,231.3,231.1,2945914,6961.65


### Choose features and target
Here the opening price ("Open") is chosen as the variable to predict; all remaining columns are used as features.

In [11]:
x_train = df_train.drop(["Open"], axis = 1)
x_test = df_test.drop(["Open"], axis = 1)

In [12]:
x_train.head()

Unnamed: 0,Date,High,Low,Last,Close,Total Trade Quantity,Turnover (Lacs)
0,20180928,235.95,230.2,233.5,233.75,3069914,7162.35
1,20180927,236.8,231.1,233.8,233.25,5082859,11859.95
2,20180926,240.0,232.5,235.0,234.25,2240909,5248.6
3,20180925,236.75,232.0,236.25,236.1,2349368,5503.9
4,20180924,239.2,230.75,234.0,233.3,3423509,7999.55


In [13]:
x_test.head()

Unnamed: 0,Date,High,Low,Last,Close,Total Trade Quantity,Turnover (Lacs)
0,20181024,221.25,217.05,219.55,219.8,2171956,4771.34
1,20181023,222.2,214.75,219.55,218.3,1416279,3092.15
2,20181022,231.6,222.0,223.05,223.25,3529711,8028.37
3,20181019,232.7,225.5,227.75,227.2,1527904,3490.78
4,20181017,240.8,229.45,231.3,231.1,2945914,6961.65


In [14]:
y_train = df_train.loc[:, ['Open']]  # equivalent to y_train = df_train[['Open']]
y_test = df_test.loc[:, ['Open']]
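Selecting with a list of column names, as in the cell above, returns a two-dimensional DataFrame rather than a one-dimensional Series. The sketch below illustrates the difference on a small frame whose values are copied from the first rows of the training preview; the names `df`, `y_frame` and `y_series` are illustrative.

```python
import pandas as pd

# Illustrative frame with values from the first rows of the training preview.
df = pd.DataFrame({
    "Open": [234.05, 234.55, 240.0],
    "Close": [233.75, 233.25, 234.25],
})

y_frame = df.loc[:, ["Open"]]  # list selector -> DataFrame, shape (3, 1)
y_series = df["Open"]          # single label  -> Series, shape (3,)

print(type(y_frame).__name__, y_frame.shape)
print(type(y_series).__name__, y_series.shape)
```

Both forms are accepted by `LinearRegression.fit`. With a two-dimensional target, `predict` returns an array of shape `(n, 1)`; with a one-dimensional target it returns shape `(n,)`.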

In [15]:
y_train.head()

Unnamed: 0,Open
0,234.05
1,234.55
2,240.0
3,233.3
4,233.55


In [16]:
y_test.head()

Unnamed: 0,Open
0,220.1
1,221.1
2,229.45
3,230.3
4,237.7


### Building the model

In [17]:
lin_reg = LinearRegression()
lin_reg.fit(x_train, y_train)

LinearRegression()
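Once fitted, a `LinearRegression` object exposes the learned weights through its `coef_` and `intercept_` attributes. The sketch below shows this on synthetic, noise-free data (an assumption made only so the expected weights are known in advance); the same attributes can be read from `lin_reg` in this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: the target is an exact linear function of two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

model = LinearRegression().fit(X, y)

print(model.coef_)       # close to [2, -1]
print(model.intercept_)  # close to 0.5
```

On the stock data, the magnitudes of the coefficients are not directly comparable across columns, because features such as `Total Trade Quantity` and `High` are on very different scales.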

### Model prediction

In [18]:
y_predict = lin_reg.predict(x_test)
y_predict = pd.DataFrame(y_predict)

In [19]:
y_predict.head()

Unnamed: 0,0
0,218.405405
1,219.273494
2,229.89922
3,231.070176
4,239.090553


In [20]:
y_test.head()

Unnamed: 0,Open
0,220.1
1,221.1
2,229.45
3,230.3
4,237.7
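Placing predictions and actual values in one table can make them easier to compare than two separate previews. A minimal sketch, using only the five values shown in the `y_predict` and `y_test` previews above; the column names `Predicted` and `Error` are illustrative.

```python
import pandas as pd

# First five values copied from the y_test and y_predict previews above.
actual = pd.Series([220.1, 221.1, 229.45, 230.3, 237.7], name="Open")
predicted = pd.Series(
    [218.405405, 219.273494, 229.89922, 231.070176, 239.090553],
    name="Predicted",
)

# Align the two series by index and add the signed error per row.
comparison = pd.concat([actual, predicted], axis=1)
comparison["Error"] = comparison["Predicted"] - comparison["Open"]

print(comparison)
```

In the notebook itself, the same table could be built from `y_test` and `y_predict`, provided both share the same index.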


### Metrics

In [22]:
mean_squared_error(y_test, y_predict)

16.198207317166

In [None]:
r2_score(y_test, y_predict)
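The two metrics follow directly from the residuals: the mean squared error is the average of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below computes both by hand with NumPy and compares them with the sklearn functions. It uses only the five values shown in the previews above, so its numbers will not match the metrics on the full test set.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# First five values copied from the y_test and y_predict previews above.
y_true = np.array([220.1, 221.1, 229.45, 230.3, 237.7])
y_pred = np.array([218.405405, 219.273494, 229.89922, 231.070176, 239.090553])

residuals = y_true - y_pred

mse = np.mean(residuals ** 2)                    # mean squared error
rmse = np.sqrt(mse)                              # same units as the price
ss_res = np.sum(residuals ** 2)                  # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot

print(mse, rmse, r2)
print(mean_squared_error(y_true, y_pred), r2_score(y_true, y_pred))
```

Because the MSE is expressed in squared price units, its square root is often easier to interpret. For the MSE of about 16.198 reported above, the root is about 4.02 price units.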