# Task 1: Supervised Learning

**Author**: Matheus Jericó Palhares <br>
**LinkedIn**: https://linkedin.com/in/matheusjerico <br>
**Github**: https://github.com/matheusjerico

### 2) Task 2: implement the function “predict_KNN(pontos, ponto)”, which receives the training set and the point whose class will be predicted; here, however, you will perform a regression. Use only age and chol as features, with thalach as the regression target. This task must be carried out on both datasets provided.

- For task 2: a scatter plot showing the training points and the points being predicted, with an intuitive color code that distinguishes predicted points from training points, plus the mean total error.
- Compare your results with those obtained from sklearn.neighbors.KNeighborsClassifier and sklearn.neighbors.KNeighborsRegressor. Your results and sklearn's should be equal.
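
The function named in the task statement can be sketched in plain Python as shown here. This is only a minimal illustration: the `k` parameter and its default are assumptions (the stated signature takes only `pontos` and `ponto`), and the sample rows are the first five rows of heart.csv displayed further down, reduced to (age, chol, thalach).

```python
import math

def predict_KNN(pontos, ponto, k=3):
    """Predict thalach for `ponto` = (age, chol).

    `pontos` is a sequence of (age, chol, thalach) rows.
    `k` is a hypothetical extra parameter, not part of the task's signature.
    """
    # Sort training rows by Euclidean distance to the query point
    ordered = sorted(pontos, key=lambda row: math.dist(row[:2], ponto))
    neighbors = ordered[:k]
    # Regression: average the targets of the k nearest neighbors
    return sum(row[2] for row in neighbors) / len(neighbors)

pontos = [(63, 233, 150), (37, 250, 187), (41, 204, 172),
          (56, 236, 178), (57, 354, 163)]
print(predict_KNN(pontos, (60, 235), k=2))  # mean of 150 and 178 -> 164.0
```

For classification the only change would be replacing the mean with a majority vote over the neighbors' labels.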

### Libraries

In [55]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
import operator
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
from mpl_toolkits.mplot3d import Axes3D

## Task 1

### 1. Loading the data

In [4]:
dataset = pd.read_csv("./Dataset/heart.csv")
dataset.head()

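|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |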


In [5]:
dataset = dataset[['age', 'chol', 'thalach']]

In [6]:
dataset.head()

|   | age | chol | thalach |
|---|---|---|---|
| 0 | 63 | 233 | 150 |
| 1 | 37 | 250 | 187 |
| 2 | 41 | 204 | 172 |
| 3 | 56 | 236 | 178 |
| 4 | 57 | 354 | 163 |


In [7]:
dataset.rename(columns={'thalach':'target'}, inplace = True)

### 3. Creating the KNN class

In [62]:
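class kNN():
    def __init__(self, x, y, k, weighted=False):
        self.x = x
        self.y = y
        self.k = k
        self.weighted = weighted  # was accepted but never stored or used

    def euclidean_distance(self, x1, y1, x2, y2):
        return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

    def gaussian(self, dist, sigma=1):
        # Gaussian kernel used to weight a neighbor by its distance
        result = 1./(math.sqrt(2. * math.pi) * sigma) * math.exp(-dist ** 2 / (2 * sigma **2))
        return result
    
    def predict(self, test_set):
        predictions = []
        for i, j in test_set:
            # Distance from the query point (i, j) to every training point
            distances = []
            for idx, (l, m) in enumerate(self.x):
                dist = self.euclidean_distance(i, j, l, m)
                distances.append((self.y[idx], dist))
            distances.sort(key=operator.itemgetter(1))
            v = 0
            total_weight = 0
            # Use a separate loop variable so the query coordinate i is not shadowed
            for n in range(self.k):
                # Uniform weights unless weighted=True was requested
                weight = self.gaussian(distances[n][1]) if self.weighted else 1.0
                v += weight * distances[n][0]
                total_weight += weight
            # Append exactly one prediction per query point (outside the neighbor loop)
            if total_weight > 0:
                predictions.append(v / total_weight)
            else:
                # All Gaussian weights underflowed to 0: fall back to the plain mean
                predictions.append(sum(d[0] for d in distances[:self.k]) / self.k)
        return predictions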
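
The class computes a Gaussian weight for each neighbor with `sigma=1` by default. Because age and chol are unscaled, neighbor distances can easily be tens of units, and at that range the kernel value is effectively zero. The sketch below reproduces the same kernel as a standalone function to illustrate this; the distances and the alternative `sigma=10.0` are arbitrary values chosen for illustration, not tuned settings.

```python
import math

def gaussian(dist, sigma=1.0):
    # Same formula as kNN.gaussian, written as a free function
    return 1.0 / (math.sqrt(2.0 * math.pi) * sigma) * math.exp(-dist ** 2 / (2 * sigma ** 2))

for dist in (0.0, 1.0, 5.0, 40.0):
    # Compare the default bandwidth with a wider, illustrative one
    print(dist, gaussian(dist), gaussian(dist, sigma=10.0))
```

With `sigma=1` the weight at distance 40 underflows to exactly `0.0`, so a weighted average over such neighbors would divide by zero unless the features are scaled or `sigma` is enlarged.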


### 4. Splitting the data

In [63]:
def train_test_split(dataset, test_size=0.3, random_state=0):
    np.random.seed(random_state)
    _dataset = np.array(dataset)
    np.random.shuffle(_dataset)
    
    threshold = int(_dataset.shape[0] * test_size)
    X_test = _dataset[:threshold, :-1]
    Y_test = _dataset[:threshold, -1]
    X_train = _dataset[threshold:, :-1]
    Y_train = _dataset[threshold:, -1]
    
    return X_train, X_test, Y_train, Y_test


In [64]:
X_train, X_test, y_train, y_test = train_test_split(dataset, test_size=0.3, random_state=7)

In [65]:
X_train.shape

(213, 2)
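
The shape above follows from the split arithmetic in `train_test_split`. As a quick check, assuming heart.csv has 303 rows (an assumption, since the row count is not shown in this notebook):

```python
n_rows = 303      # assumption: number of rows in heart.csv
test_size = 0.3

# Same rule as train_test_split: the first `threshold` shuffled rows form the test set
threshold = int(n_rows * test_size)
n_train = n_rows - threshold

print(threshold, n_train)  # 90 213
```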

### 4. Visualizing the data

In [76]:
# def plotScatter(dataset):
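
The plotting function above is left as a commented stub. A minimal sketch of the scatter plot requested in the task could look like the following. The name `plot_scatter`, its parameters, and the styling choices are all hypothetical, not the author's implementation: fill color encodes thalach, and predicted points are drawn as red-edged triangles to set them apart from training points.

```python
import matplotlib.pyplot as plt

def plot_scatter(X_train, y_train, X_test, y_pred, ax=None):
    """Plot training points (circles) and predicted points (red-edged triangles).

    Fill color encodes thalach: the true value for training points and the
    predicted value for test points.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 6))
    # Shared color scale so both groups are directly comparable
    values = list(y_train) + list(y_pred)
    vmin, vmax = min(values), max(values)
    train = ax.scatter([p[0] for p in X_train], [p[1] for p in X_train],
                       c=list(y_train), cmap="viridis", vmin=vmin, vmax=vmax,
                       marker="o", alpha=0.6, label="training points")
    ax.scatter([p[0] for p in X_test], [p[1] for p in X_test],
               c=list(y_pred), cmap="viridis", vmin=vmin, vmax=vmax,
               marker="^", edgecolors="red", linewidths=1.2,
               label="predicted points")
    ax.set_xlabel("age")
    ax.set_ylabel("chol")
    ax.figure.colorbar(train, ax=ax, label="thalach")
    ax.legend()
    return ax

# Possible usage with the notebook's variables:
# ax = plot_scatter(X_train, y_train, X_test, pred_test)
```

The mean error could be added to the figure with `ax.set_title(...)` once it has been computed.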

### 5. Comparing models

In [72]:
classifier = kNN(X_train, y_train, k = 1)
pred_test = classifier.predict(X_test)

test_error = mean_squared_error(y_test, pred_test)
print("Test error: {}".format(test_error))

Test error: 1072.1888888888889


In [74]:
knn_sklearn = KNeighborsRegressor(n_neighbors=1)
knn_sklearn.fit(X_train, y_train)
preds = knn_sklearn.predict(X_test)

test_error = mean_squared_error(y_test, preds)
print("Test error: {}".format(test_error))

Test error: 1072.0555555555557
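
The two test errors are close but not identical. One possible explanation, which this notebook does not verify, is tie-breaking: age and chol are integers, so a query point can be exactly equidistant from two training points with different targets, and with k = 1 the prediction then depends on which one is picked. The toy example below uses made-up points and a hypothetical helper `nearest_target` to illustrate the effect.

```python
import math

def nearest_target(train, query):
    # train: list of ((age, chol), thalach); min() keeps the first of any tied minima
    best = min(train, key=lambda item: math.dist(item[0], query))
    return best[1]

# Two made-up training points, both exactly 5 units from the query
train = [((50, 200), 150), ((50, 210), 170)]
query = (50, 205)

print(nearest_target(train, query))        # 150: the first tied point wins
print(nearest_target(train[::-1], query))  # 170: same data, different order
```

Comparing `pred_test` and `preds` element by element would show whether the differing test points are in fact such ties.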
