 ![Spark Logo](http://spark-mooc.github.io/web-assets/images/ta_Spark-logo-small.png)
 
 # Welcome to Apache Spark with Python

> Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
http://spark.apache.org/


In this notebook, we build a k-Nearest Neighbors (k-NN) classifier with scikit-learn. The algorithm has two phases:

1. Training phase:
   - Extract feature vectors from a randomly selected subset of training examples.

2. Testing / prediction phase:
   - Find the k nearest neighbors of each new observation.
   - Classify the test or new observation according to a decision rule, such as a majority vote among its neighbors.
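
To make the two phases concrete, here is a minimal pure-Python sketch of k-NN with a majority-vote decision rule. It is not how scikit-learn implements the classifier; the function name `knn_predict` and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" in k-NN amounts to storing the examples;
    # the real work happens when a query arrives.
    distances = [
        (math.dist(point, query), label)  # Euclidean distance
        for point, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Decision rule: the most common label among the k neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters
toy_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
toy_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(toy_X, toy_y, (1.1, 1.0), k=3))  # prints 0
print(knn_predict(toy_X, toy_y, (5.1, 5.0), k=3))  # prints 1
```

Sorting every distance is fine for a toy example; for larger datasets, libraries typically use tree-based or other indexed searches instead.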


In [13]:
import numpy as np
from sklearn import neighbors, datasets

> Load the Iris dataset

In [19]:
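k = 3  # number of neighbors to consult
iris = datasets.load_iris()

# Keep only the first two features (sepal length and sepal width, in cm)
X = iris.data[:, :2]
y = iris.target  # class labels: 0 = setosa, 1 = versicolor, 2 = virginica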

> Build the model

In [20]:
knn = neighbors.KNeighborsClassifier(
    n_neighbors=k, 
    weights='distance')
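
With `weights='distance'`, closer neighbors count for more than distant ones: each neighbor's vote is weighted by the inverse of its distance to the query point. The sketch below illustrates that idea in plain Python. The helper `weighted_vote` and its inputs are hypothetical, and this is an illustration of the weighting scheme rather than scikit-learn's actual code.

```python
from collections import defaultdict

def weighted_vote(neighbors):
    """Pick a label from (distance, label) pairs using inverse-distance weights."""
    scores = defaultdict(float)
    # A neighbor at distance 0 would get infinite weight, so exact matches
    # are handled separately and decide the vote on their own.
    exact_matches = [label for dist, label in neighbors if dist == 0.0]
    if exact_matches:
        for label in exact_matches:
            scores[label] += 1.0
    else:
        for dist, label in neighbors:
            scores[label] += 1.0 / dist
    return max(scores, key=scores.get)

# One close neighbor of class 0 outweighs two farther neighbors of class 1:
# class 0 scores 1/0.5 = 2.0, class 1 scores 1/2.0 + 1/2.5 = 0.9.
print(weighted_vote([(0.5, 0), (2.0, 1), (2.5, 1)]))  # prints 0
```

A plain majority vote over the same three neighbors would pick class 1, which is the practical difference between uniform and distance weighting.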

In [21]:
knn.fit(X, y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=3, p=2,
           weights='distance')

In [22]:
X

array([[ 5.1,  3.5],
       [ 4.9,  3. ],
       [ 4.7,  3.2],
       [ 4.6,  3.1],
       [ 5. ,  3.6],
       [ 5.4,  3.9],
       [ 4.6,  3.4],
       [ 5. ,  3.4],
       [ 4.4,  2.9],
       [ 4.9,  3.1],
       [ 5.4,  3.7],
       [ 4.8,  3.4],
       [ 4.8,  3. ],
       [ 4.3,  3. ],
       [ 5.8,  4. ],
       [ 5.7,  4.4],
       [ 5.4,  3.9],
       [ 5.1,  3.5],
       [ 5.7,  3.8],
       [ 5.1,  3.8],
       [ 5.4,  3.4],
       [ 5.1,  3.7],
       [ 4.6,  3.6],
       [ 5.1,  3.3],
       [ 4.8,  3.4],
       [ 5. ,  3. ],
       [ 5. ,  3.4],
       [ 5.2,  3.5],
       [ 5.2,  3.4],
       [ 4.7,  3.2],
       [ 4.8,  3.1],
       [ 5.4,  3.4],
       [ 5.2,  4.1],
       [ 5.5,  4.2],
       [ 4.9,  3.1],
       [ 5. ,  3.2],
       [ 5.5,  3.5],
       [ 4.9,  3.1],
       [ 4.4,  3. ],
       [ 5.1,  3.4],
       [ 5. ,  3.5],
       [ 4.5,  2.3],
       [ 4.4,  3.2],
       [ 5. ,  3.5],
       [ 5.1,  3.8],
       [ 4.8,  3. ],
       [ 5.1,  3.8],
       [ 4.6,

In [25]:
pred_1 = knn.predict([[6,2.5]])
print(pred_1)

[1]


In [40]:
# Note: reassigning k does not change the model fitted above;
# knn still uses n_neighbors=3.
k = 15

pred_2 = knn.predict([[0,0]])
print(pred_2)

[0]
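
The number of neighbors is fixed when `KNeighborsClassifier` is constructed, so changing the variable `k` afterwards has no effect on an existing classifier. One way to predict with 15 neighbors is to build and fit a second classifier, sketched below. The name `knn_15` is introduced here for illustration, and the data is reloaded so that the snippet runs on its own.

```python
from sklearn import neighbors, datasets

# Same data preparation as earlier in the notebook
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# Fit a separate classifier so the 3-neighbor model stays untouched
knn_15 = neighbors.KNeighborsClassifier(n_neighbors=15, weights='distance')
knn_15.fit(X, y)

print(knn_15.n_neighbors)            # the value this model will use
print(knn_15.predict([[0, 0]]))      # prediction from the 15-neighbor model
```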


In [41]:
pred_3 = knn.predict([[8,5]])
print(pred_3)

[2]


In [42]:
pred_4 = knn.predict([[2.5,2.5]])
print(pred_4)

[0]


In [43]:
preds = knn.predict([[0,0],[8,5],[2.5,2.5]])
print(preds)

[0 2 0]
