# Regression Models: SVR

### Importing libraries and functions

Importing the libraries:

In [0]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler



Defining helper functions:

In [0]:
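# Scaling helper: standardizes each column to zero mean and unit variance.
# Note: a new scaler is fitted on whatever data is passed in.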
def feature_scaling(data):
    sc = StandardScaler()
    return sc.fit_transform(data)

### Exploring and cleaning the **data**

Loading the dataset for this study. The task is to predict the average fuel economy of cars, given in the *mpg* column (miles per gallon). In other words, we want to predict how economical each car model is from attributes such as number of cylinders, weight, and horsepower.
Source: [UCI](https://archive.ics.uci.edu/ml/datasets/Auto+MPG)

In [0]:
df = pd.read_csv('https://raw.githubusercontent.com/intelligentagents/aprendizagem-supervisionada/master/data/cars.csv', sep = ";")

Exploring the dataset:

In [6]:
# Inspecting column types and non-null counts
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 8 columns):
mpg             398 non-null float64
cylinders       398 non-null int64
displacement    398 non-null float64
horsepower      392 non-null float64
weight          398 non-null int64
acceleration    398 non-null float64
model year      398 non-null int64
origin          398 non-null int64
dtypes: float64(4), int64(4)
memory usage: 25.0 KB


In [7]:
# Summary statistics for the numeric columns of the dataset
df.describe()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin |
|---|---|---|---|---|---|---|---|---|
| count | 398.0 | 398.0 | 398.0 | 392.0 | 398.0 | 398.0 | 398.0 | 398.0 |
| mean | 23.514573 | 5.454774 | 193.425879 | 104.469388 | 2970.424623 | 15.56809 | 76.01005 | 1.572864 |
| std | 7.815984 | 1.701004 | 104.269838 | 38.49116 | 846.841774 | 2.757689 | 3.697627 | 0.802055 |
| min | 9.0 | 3.0 | 68.0 | 46.0 | 1613.0 | 8.0 | 70.0 | 1.0 |
| 25% | 17.5 | 4.0 | 104.25 | 75.0 | 2223.75 | 13.825 | 73.0 | 1.0 |
| 50% | 23.0 | 4.0 | 148.5 | 93.5 | 2803.5 | 15.5 | 76.0 | 1.0 |
| 75% | 29.0 | 8.0 | 262.0 | 126.0 | 3608.0 | 17.175 | 79.0 | 2.0 |
| max | 46.6 | 8.0 | 455.0 | 230.0 | 5140.0 | 24.8 | 82.0 | 3.0 |


Viewing the first rows of the dataset:

In [8]:
df.head(10)

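| | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin |
|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | 1 |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | 1 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | 1 |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | 1 |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | 1 |
| 5 | 15.0 | 8 | 429.0 | 198.0 | 4341 | 10.0 | 70 | 1 |
| 6 | 14.0 | 8 | 454.0 | 220.0 | 4354 | 9.0 | 70 | 1 |
| 7 | 14.0 | 8 | 440.0 | 215.0 | 4312 | 8.5 | 70 | 1 |
| 8 | 14.0 | 8 | 455.0 | 225.0 | 4425 | 10.0 | 70 | 1 |
| 9 | 15.0 | 8 | 390.0 | 190.0 | 3850 | 8.5 | 70 | 1 |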


Some rows have null values in the *horsepower* attribute:

In [9]:
df[df.isnull().values.any(axis=1)]

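| | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin |
|---|---|---|---|---|---|---|---|---|
| 32 | 25.0 | 4 | 98.0 | NaN | 2046 | 19.0 | 71 | 1 |
| 126 | 21.0 | 6 | 200.0 | NaN | 2875 | 17.0 | 74 | 1 |
| 330 | 40.9 | 4 | 85.0 | NaN | 1835 | 17.3 | 80 | 2 |
| 336 | 23.6 | 4 | 140.0 | NaN | 2905 | 14.3 | 80 | 1 |
| 354 | 34.5 | 4 | 100.0 | NaN | 2320 | 15.8 | 81 | 2 |
| 374 | 23.0 | 4 | 151.0 | NaN | 3035 | 20.5 | 82 | 1 |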
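
The filter used in the cell above builds a boolean mask that is true for every row containing at least one null. A minimal sketch on a made-up frame (the values are illustrative and not taken from the dataset) shows the same mask, plus a per-column null count that can be handy as a quick check:

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only; the values are made up.
toy = pd.DataFrame({
    "mpg": [25.0, 21.0, 18.0],
    "horsepower": [np.nan, 110.0, np.nan],
})

row_has_null = toy.isnull().any(axis=1)  # True for rows with at least one null
rows_with_nulls = toy[row_has_null]      # keep only those rows
nulls_per_column = toy.isnull().sum()    # number of nulls in each column

print(rows_with_nulls)
print(nulls_per_column)
```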


Filling the null (NA) numeric values with the median of each column:

In [10]:
df = df.fillna(df.median())

# Showing some of the rows that had null values, selected by index:
df.iloc[[32,126,374],]

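| | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin |
|---|---|---|---|---|---|---|---|---|
| 32 | 25.0 | 4 | 98.0 | 93.5 | 2046 | 19.0 | 71 | 1 |
| 126 | 21.0 | 6 | 200.0 | 93.5 | 2875 | 17.0 | 74 | 1 |
| 374 | 23.0 | 4 | 151.0 | 93.5 | 3035 | 20.5 | 82 | 1 |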
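
To make the imputation step concrete, here is a minimal sketch on a made-up frame (the values are illustrative, not the dataset's). It assumes a pandas version in which `DataFrame.median` accepts `numeric_only=True`, which is useful when a frame also contains non-numeric columns:

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only; the values are made up.
toy = pd.DataFrame({
    "horsepower": [130.0, np.nan, 150.0, 95.0],
    "weight": [3504, 2046, 3436, 2372],
})

# Median of each numeric column, ignoring nulls.
medians = toy.median(numeric_only=True)

# Each null is replaced by the median of its own column.
filled = toy.fillna(medians)

print(medians)
print(filled)
```

Columns without nulls, such as `weight` here, are left unchanged.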


Defining the independent variables (features) and the dependent variable (target):

In [0]:
X = df.iloc[:,1:8]
y = df.iloc[:,0]
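
The positional slices above take the columns at positions 1 to 7 as features (the end of an `iloc` slice is exclusive) and the column at position 0, `mpg`, as the target. A name-based selection is an equivalent alternative that does not depend on column order. The sketch below uses a made-up three-column frame to show that both forms select the same data:

```python
import pandas as pd

# Toy frame for illustration only; the values are made up.
toy = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "cylinders": [8, 8],
    "weight": [3504, 3693],
})

# Positional selection, as in the notebook.
X_by_position = toy.iloc[:, 1:3]
y_by_position = toy.iloc[:, 0]

# Name-based selection: everything except the target, then the target.
X_by_name = toy.drop(columns="mpg")
y_by_name = toy["mpg"]

print(X_by_position.equals(X_by_name), y_by_position.equals(y_by_name))
```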

Creating the training and test subsets:

In [0]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
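
As a quick sanity check of what `test_size = 0.2` means for 398 rows, the sketch below splits placeholder arrays of the same shape as the features and target (the array contents are arbitrary stand-ins, not the real data). Fixing `random_state` makes the split reproducible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the notebook's X and y.
X_demo = np.arange(398 * 7).reshape(398, 7)
y_demo = np.arange(398)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(X_tr.shape, X_te.shape)

# The same random_state yields the same partition on a second call.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(np.array_equal(X_te, X_te2))
```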

Standardizing the features:

In [0]:
X_train = feature_scaling(X_train)
X_test = feature_scaling(X_test)
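
Because `feature_scaling` fits a new scaler on every call, the test set above is standardized with its own mean and standard deviation rather than with those of the training set. A commonly recommended alternative is to fit the scaler on the training data only and reuse it for the test data. The sketch below illustrates this on randomly generated toy data; the variable names are hypothetical and not part of the notebook:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy data standing in for one feature column (an assumption, not the real data).
X_train_toy = rng.normal(loc=3000, scale=800, size=(80, 1))
X_test_toy = rng.normal(loc=3000, scale=800, size=(20, 1))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_toy)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test_toy)        # reuse the training mean/std

# Training data ends up with mean ~0 and std ~1; test data is only close to that.
print(X_train_scaled.mean(), X_train_scaled.std())
print(X_test_scaled.mean(), X_test_scaled.std())
```

With this approach both subsets are mapped by the same transformation, so a given raw value is represented identically in training and in testing.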

### Model training and validation

Instantiating and training the regression model on the training set:

In [17]:
regressor = SVR(kernel = 'rbf')
regressor.fit(X_train, y_train)



SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
    gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
    tol=0.001, verbose=False)
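
The printed parameters show `gamma='auto_deprecated'`, which in that scikit-learn version behaves like `gamma='auto'`, that is, 1 / n_features regardless of how large the feature values are. An RBF kernel with such a `gamma` is very sensitive to feature scale. The sketch below, on synthetic data that only loosely imitates weight-like and acceleration-like columns (an assumption for illustration, not the Auto MPG data), compares the same SVR on raw features and inside a pipeline that standardizes them first:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic data: two features on very different scales.
X = np.column_stack([
    rng.uniform(1500, 5000, size=300),  # weight-like feature
    rng.uniform(8, 25, size=300),       # acceleration-like feature
])
y = 45 - 0.007 * X[:, 0] + rng.normal(0, 1, size=300)

X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

# gamma="auto" is 1 / n_features, independent of the feature scale.
raw_svr = SVR(kernel="rbf", gamma="auto").fit(X_train, y_train)

# Same model, with standardization learned from the training data only.
scaled_svr = make_pipeline(
    StandardScaler(), SVR(kernel="rbf", gamma="auto")
).fit(X_train, y_train)

raw_pred = raw_svr.predict(X_test)
scaled_pred = scaled_svr.predict(X_test)

print("spread of raw predictions:   ", np.ptp(raw_pred))
print("spread of scaled predictions:", np.ptp(scaled_pred))
print("R2 on raw features:   ", raw_svr.score(X_test, y_test))
print("R2 on scaled features:", scaled_svr.score(X_test, y_test))
```

On unscaled features the kernel value between most pairs of points is effectively zero, so the predictions collapse towards a single constant. Whether this is what happened in any particular run cannot be determined from the parameters alone.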

Evaluating the model with the R² metric:

In [18]:
regressor.score(X_test, y_test)

-0.0019111566923788459
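
For regressors, `score` returns the coefficient of determination, R² = 1 - SS_res / SS_tot. A model that always predicts the mean of the target scores 0 and anything worse scores below 0, so a value this close to zero means the model explains almost none of the variance in the test set. A small sketch with made-up target values illustrates the baseline:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up target values for illustration only.
y_true = np.array([18.0, 15.0, 25.0, 40.9, 23.6])

# Constant prediction equal to the mean of the targets: R² is approximately 0.
mean_baseline = np.full_like(y_true, y_true.mean())
r2_mean = r2_score(y_true, mean_baseline)

# Constant prediction that misses the mean: R² becomes negative.
shifted_baseline = np.full_like(y_true, y_true.mean() - 1.0)
r2_shifted = r2_score(y_true, shifted_baseline)

print(r2_mean, r2_shifted)
```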

Predicting the results for the test set:

In [20]:
y_pred = regressor.predict(X_test)

y_pred

array([22.79503532, 22.79503532, 22.79503532, 22.79503532, 22.79503532,
       22.79503532, 22.83445115, 22.79503532, 22.79503532, 22.79503532,
       22.79503532, 23.00760286, 22.79503532, 22.79503532, 22.79501322,
       22.79503532, 22.79503532, 22.79528741, 22.79503532, 22.79505077,
       22.79503532, 22.79503532, 22.79503532, 22.79503532, 22.79504414,
       22.79503532, 22.79503702, 22.79503532, 22.79503532, 22.79503532,
       22.79503532, 22.79503532, 22.79503532, 22.79504294, 22.79503532,
       22.79503532, 22.55674981, 22.79480426, 22.79503532, 22.79503532,
       22.79503532, 22.79503532, 22.79503532, 22.79503532, 22.79503533,
       22.79503532, 22.79503532, 22.79503532, 22.79503532, 22.79503532,
       22.79503662, 22.79503532, 22.79503532, 22.79503532, 22.79503532,
       22.79503532, 22.79503532, 22.79503527, 22.79503532, 22.79503532,
       22.79503532, 22.79503532, 22.79503532, 22.79503532, 22.79503532,
       22.79503532, 22.79503532, 22.79503532, 22.79503532, 22.79