# Millas por galón
Se está haciendo una investigación en la industria automotriz para mejorar el rendimiento de los motores a combustible. Para esto, se recolectó un set de datos que contiene el rendimiento (millas por galón) de distintos modelos de automóviles, así como también de los factores que se piensa influyan.

El objetivo de este trabajo es descubrir cuál es la variable que tiene mayor incidencia en el rendimiento de un motor a combustible. Las variables son las siguientes:
- Horse power
- Weight
- Acceleration

## Instrucciones
- Cree un notebook ordenado, documentado y reproducibble con su análisis
- Haga un breve análisis exploratorio de los datos, es un problema lineal?
- Elabore tres modelos regresivos simples, uno para cada variable a estudiar
- Calcular las métricas de error para cada modelo
- Calcular el coeficiente de determinación
- Hacer análisis de residuales
- Cuál es el factor que mayor incidencia tiene en el rendimiento?, por qué?

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

In [2]:
df = pd.read_csv('millas-por-galon.csv')

In [3]:
df.head()

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model year,origin,mpg,car name
0,discrete,continuous,continuous,continuous,continuous,discrete,discrete,continuous,string
1,,,,,,,,class,meta
2,8,307.0,130.0,3504.0,12.0,70,1,18.0,chevrolet chevelle malibu
3,8,350.0,165.0,3693.0,11.5,70,1,15.0,buick skylark 320
4,8,318.0,150.0,3436.0,11.0,70,1,18.0,plymouth satellite


# Preparación de datos

In [9]:
#df = df.iloc[2:]
df = df.drop([0, 1])

In [10]:
df.head()

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model year,origin,mpg,car name
2,8,307.0,130.0,3504.0,12.0,70,1,18.0,chevrolet chevelle malibu
3,8,350.0,165.0,3693.0,11.5,70,1,15.0,buick skylark 320
4,8,318.0,150.0,3436.0,11.0,70,1,18.0,plymouth satellite
5,8,304.0,150.0,3433.0,12.0,70,1,16.0,amc rebel sst
6,8,302.0,140.0,3449.0,10.5,70,1,17.0,ford torino


In [11]:
df.dtypes

cylinders       str
displacement    str
horsepower      str
weight          str
acceleration    str
model year      str
origin          str
mpg             str
car name        str
dtype: object

In [27]:
df['mpg'] = df['mpg'].astype('float')
df['horsepower'] = df['horsepower'].astype('float')
df['weight'] = df['weight'].astype('float')
df['acceleration'] = df['acceleration'].astype('float')

In [28]:
df.dtypes

cylinders           str
displacement        str
horsepower      float64
weight          float64
acceleration    float64
model year          str
origin              str
mpg             float64
car name            str
dtype: object

In [29]:
df

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model year,origin,mpg,car name
2,8,307.0,130.0,3504.0,12.0,70,1,18.0,chevrolet chevelle malibu
3,8,350.0,165.0,3693.0,11.5,70,1,15.0,buick skylark 320
4,8,318.0,150.0,3436.0,11.0,70,1,18.0,plymouth satellite
5,8,304.0,150.0,3433.0,12.0,70,1,16.0,amc rebel sst
6,8,302.0,140.0,3449.0,10.5,70,1,17.0,ford torino
...,...,...,...,...,...,...,...,...,...
395,4,140.0,86.0,2790.0,15.6,82,1,27.0,ford mustang gl
396,4,97.0,52.0,2130.0,24.6,82,2,44.0,vw pickup
397,4,135.0,84.0,2295.0,11.6,82,1,32.0,dodge rampage
398,4,120.0,79.0,2625.0,18.6,82,1,28.0,ford ranger


In [16]:
df.isnull().sum()

cylinders       0
displacement    0
horsepower      6
weight          0
acceleration    0
model year      0
origin          0
mpg             0
car name        0
dtype: int64

In [24]:
df.dropna()

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model year,origin,mpg,car name


# Regresión lineal simple

In [17]:
df

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model year,origin,mpg,car name
2,8,307.0,130.0,3504.0,12.0,70,1,18.0,chevrolet chevelle malibu
3,8,350.0,165.0,3693.0,11.5,70,1,15.0,buick skylark 320
4,8,318.0,150.0,3436.0,11.0,70,1,18.0,plymouth satellite
5,8,304.0,150.0,3433.0,12.0,70,1,16.0,amc rebel sst
6,8,302.0,140.0,3449.0,10.5,70,1,17.0,ford torino
...,...,...,...,...,...,...,...,...,...
395,4,140.0,86.0,2790.0,15.6,82,1,27.0,ford mustang gl
396,4,97.0,52.0,2130.0,24.6,82,2,44.0,vw pickup
397,4,135.0,84.0,2295.0,11.6,82,1,32.0,dodge rampage
398,4,120.0,79.0,2625.0,18.6,82,1,28.0,ford ranger


In [19]:
df.info()

<class 'pandas.DataFrame'>
RangeIndex: 398 entries, 2 to 399
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   cylinders     398 non-null    str  
 1   displacement  398 non-null    str  
 2   horsepower    392 non-null    str  
 3   weight        398 non-null    str  
 4   acceleration  398 non-null    str  
 5   model year    398 non-null    str  
 6   origin        398 non-null    str  
 7   mpg           398 non-null    str  
 8   car name      398 non-null    str  
dtypes: str(9)
memory usage: 28.1 KB
