# Introduction to Linear Regression


This talk introduces how to approach problems for which linear regression is the most suitable tool.


## Requirements
The libraries used are:
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
    

In [None]:
import pandas as pd
import numpy as np  
import matplotlib.pyplot as plt  
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import seaborn as sns

%matplotlib inline
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

## Initial Analysis of the Dataset


In [None]:
datos = pd.read_csv("/kaggle/input/red-wine-quality-cortez-et-al-2009/winequality-red.csv")

datos.shape

In [None]:
datos.describe()

In [None]:
datos.head(9)

In [None]:
sns.pairplot(datos)

In [None]:
datos.corr().style.background_gradient(cmap='coolwarm', axis=None)

### Checking for Null Values

In [None]:
# Count missing values per column
datos.isnull().sum()

## Creating the Training Sets

In [None]:
X = datos[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol']].values
y = datos[['quality']].values

In [None]:
print(f"Shape of X: {X.shape}")
print(f"Shape of y: {y.shape}")

## Training the Model

- Splitting the data into training and test sets
- Creating the model
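
Creating the model here means fitting ordinary least squares, which is what scikit-learn's `LinearRegression` does. The following is a minimal sketch on synthetic data (the arrays `X_demo` and `y_demo` are assumptions for illustration, not part of the wine dataset). It compares the coefficients from `LinearRegression` with those obtained by solving the same least-squares problem directly with `np.linalg.lstsq`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption for illustration): y = 2*x1 - 3*x2 + 5 + noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1] + 5.0 + rng.normal(scale=0.1, size=200)

# Fit with scikit-learn
model = LinearRegression().fit(X_demo, y_demo)

# Solve the same least-squares problem directly;
# the appended column of ones plays the role of the intercept
A = np.column_stack([X_demo, np.ones(len(X_demo))])
solution, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

print("sklearn coefficients:", model.coef_, "intercept:", model.intercept_)
print("lstsq solution:      ", solution)
```

Both approaches should agree up to floating-point error, with the last entry of `solution` matching the intercept.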

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [None]:
regressor = LinearRegression()  
regressor.fit(X_train, y_train)

In [None]:
y_pred = regressor.predict(X_test)

In [None]:
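# The model uses 11 features, so its predictions do not form a single line
# against one feature; plot them as points against 'fixed acidity' instead.
plt.scatter(X_test[:,0], y_test, color='gray', label='actual')
plt.scatter(X_test[:,0], y_pred, color='red', s=10, label='predicted')
plt.legend()
plt.show()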

## Inspecting the Coefficients


In [None]:
coeff_df = pd.DataFrame(regressor.coef_.reshape((11,1)), datos.columns[:-1].values, columns=['Coefficient'])  
coeff_df

### Performance Tests

In [None]:
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))  
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))  
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
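
To make the three metrics above concrete, here is a small sketch that computes them by hand with NumPy and checks them against `sklearn.metrics`. The arrays `y_true_demo` and `y_pred_demo` are hypothetical values chosen only to illustrate the formulas.

```python
import numpy as np
from sklearn import metrics

# Hypothetical true values and predictions (illustration only)
y_true_demo = np.array([5.0, 6.0, 7.0, 5.0])
y_pred_demo = np.array([5.5, 5.0, 7.0, 6.0])

errors = y_true_demo - y_pred_demo

mae = np.mean(np.abs(errors))   # mean of absolute errors
mse = np.mean(errors ** 2)      # mean of squared errors
rmse = np.sqrt(mse)             # square root of the MSE, in the units of y

print("MAE: ", mae, "sklearn:", metrics.mean_absolute_error(y_true_demo, y_pred_demo))
print("MSE: ", mse, "sklearn:", metrics.mean_squared_error(y_true_demo, y_pred_demo))
print("RMSE:", rmse)
```

MAE and RMSE are expressed in the same units as the target, which makes them easier to interpret than MSE.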

## Scaling (Min-Max Normalization)

In [None]:
from sklearn.preprocessing import MinMaxScaler

In [None]:
scaler = MinMaxScaler()
scaled_df = scaler.fit_transform(datos)
scaled_df = pd.DataFrame(scaled_df, columns=datos.columns)

scaled_df.describe()

In [None]:
X = scaled_df[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol']].values
y = scaled_df[['quality']].values

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [None]:
regressor = LinearRegression()  
regressor.fit(X_train, y_train)

In [None]:
y_pred = regressor.predict(X_test)

In [None]:
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))  
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))  
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
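
The errors printed after scaling are not directly comparable with the earlier ones, because the target was rescaled to [0, 1] along with the features. The sketch below uses synthetic data (the columns `a`, `b`, `target` and the helper `holdout_mae` are hypothetical) to suggest that, for ordinary least squares with an intercept, min-max scaling only changes the units of the error: multiplying the scaled MAE by the range of the original target should recover the unscaled MAE.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the wine data (hypothetical columns)
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=300), "b": rng.uniform(0, 10, size=300)})
df["target"] = 3 + 0.5 * df["a"] + 0.2 * df["b"] + rng.normal(scale=0.3, size=300)

def holdout_mae(frame):
    """Fit on an 80/20 split and return the MAE on the held-out part."""
    X = frame[["a", "b"]].values
    y = frame["target"].values
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = LinearRegression().fit(X_tr, y_tr)
    return mean_absolute_error(y_te, model.predict(X_te))

# Scale every column, target included, as done in the cells above
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

mae_raw = holdout_mae(df)
mae_scaled = holdout_mae(scaled)
target_range = df["target"].max() - df["target"].min()

print("MAE, original units:      ", mae_raw)
print("MAE, scaled target:       ", mae_scaled)
print("scaled MAE * target range:", mae_scaled * target_range)
```

Under these assumptions, a smaller error after scaling reflects the change of units rather than a better model.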

# Exercise

Predict how much coffee a person drinks from the number of hours they spend in front of the computer.
- [dataset](https://www.kaggle.com/devready/coffee-and-code)



# References

- [A beginner’s guide to Linear Regression](https://towardsdatascience.com/a-beginners-guide-to-linear-regression-in-python-with-scikit-learn-83a8f7ae2b4f)
- [Linear Regression using Python](https://medium.com/analytics-vidhya/linear-regression-using-python-ce21aa90ade6)