### Importing scikit-learn datasets
- The `sklearn.datasets` module includes utilities to load datasets, including methods to load and fetch popular reference datasets. It also features some artificial data generators.

In [1]:
from sklearn import datasets

- This package also features helpers to fetch larger datasets commonly used by the machine learning community to benchmark algorithms on data that comes from the ‘real world’. Here we use **load_iris**, one of the small datasets bundled with the library.

- Almost all of these functions can also return just a tuple containing the **data (features, `X`)** and the **target (labels, `y`)** when the `return_X_y` parameter is set to `True`.

- In cell [2] below, `X, y = datasets.load_iris(return_X_y=True)` returns a **tuple** containing the data (features) and the target (labels) of the **iris** dataset. The only difference is that this time we're not loading the full dataset object with its metadata, as we did in episode #2.
- **Note:** In Python, the elements of a tuple or list can be assigned to several variables in a single statement. This is called **sequence unpacking**; in our specific case, **tuple unpacking**.

In [2]:
X, y = datasets.load_iris(return_X_y=True)
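
To make the difference between the two call styles concrete, here is a minimal sketch that loads the dataset both ways and checks that the arrays match. The variable names `iris` and `same` are only illustrative.

```python
from sklearn import datasets

# Default call: a Bunch object holding data, target, and metadata.
iris = datasets.load_iris()

# return_X_y=True: only the (data, target) tuple, unpacked into two names.
X, y = datasets.load_iris(return_X_y=True)

print(type(iris).__name__)  # Bunch
print(X.shape, y.shape)     # (150, 4) (150,)

# Both calls expose the same underlying arrays.
same = bool((X == iris.data).all() and (y == iris.target).all())
print(same)                 # True
```

The tuple form is handy when only the arrays are needed; the default form keeps extras such as `feature_names` and `target_names` within reach.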

- In cell [3], no manual work is needed this time to split our dataset into training and testing subsets.
- By importing the `train_test_split` function from `sklearn.model_selection`, we can quickly split our data into four (4) shuffled subsets. `train_test_split()` takes the **data (features)** and **target (labels)** arrays. If neither `test_size` nor `train_size` is defined, `test_size` defaults to 0.25 (25%). In our case we set it to 0.5 (50%), so **50% of the 150 records, i.e. 75 rows,** go to each subset.
- `train_test_split()` **returns a list of length `2 * len(arrays)`**. Since we provided two (2) arrays, `X` and `y`, four (4) arrays are returned: `X_train, X_test, y_train, y_test`, with 50% of the rows for training and 50% for testing.

In [3]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
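
As a rough illustration of how `test_size` drives the subset sizes, the sketch below runs the split once with the default and once with `0.5`. Note that `random_state=0` is an assumption added here only to make the shuffle repeatable; the cell above does not set it, so its rows will differ from run to run. The shortened variable names are also just for this example.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# Default: test_size falls back to 0.25 when neither size is given.
X_tr_d, X_te_d, y_tr_d, y_te_d = train_test_split(X, y, random_state=0)
print(len(X_tr_d), len(X_te_d))  # 112 38

# Explicit 50/50 split, as used in this episode.
parts = train_test_split(X, y, test_size=0.5, random_state=0)
print(len(parts))                # 4, i.e. 2 * number of input arrays
X_tr, X_te, y_tr, y_te = parts
print(len(X_tr), len(X_te))      # 75 75
```

With 150 rows, 25% is 37.5, which scikit-learn rounds up for the test subset, hence 38 test rows and 112 training rows.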

- It's always good practice to inspect a dataset before starting to analyze it.
- In cell [4], I'm just printing the shapes of the subsets and one sample from each to check that all is fine.
- The **Iris** dataset has four (4) feature columns (sepal length, sepal width, petal length, petal width) and one array of labels giving the species of each flower (Setosa, Versicolor, or Virginica), encoded as the integers 0, 1, and 2. That's why the feature subsets are 75 rows by 4 columns, while the label subsets are just 75 values.
- **X_train** - Training data (features);
- **y_train** - Training labels (target);
- **X_test** - Test data (features);
- **y_test** - Test labels (target), used to check how accurate our classifier is by comparing them with its predictions for **X_test**.

In [4]:
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, "\n")

print(X_train[0], y_train[0], X_test[0], y_test[0])

(75, 4) (75,) (75, 4) (75,) 

[6.1 2.8 4.7 1.2] 1 [5.7 2.8 4.1 1.3] 1
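
Beyond shapes, it can help to see which column is which and how many samples of each species ended up in the training subset. The following is a small sketch of one way to do that; `random_state=0` and the variable names are assumptions made for this example, so the counts will not necessarily match the split above.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0
)

# Column names for the four features.
print(iris.feature_names)

# Number of training samples per species (labels are the integers 0, 1, 2).
counts = np.bincount(y_tr, minlength=3)
for name, count in zip(iris.target_names, counts):
    print(name, count)
```

Because the split is shuffled but not stratified, the three counts are usually close to 25 each without being exactly equal.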


- [sklearn.neighbors](https://scikit-learn.org/stable/modules/neighbors.html?highlight=knn%20classification#nearest-neighbors) provides functionality for unsupervised and supervised neighbors-based learning methods. Unsupervised nearest neighbors is the foundation of many other learning methods, notably manifold learning and spectral clustering. Supervised neighbors-based learning comes in two flavors: classification for data with discrete labels, and regression for data with continuous labels.
- We'll use the **KNeighborsClassifier** for this exercise. Our labels are discrete **(three flower species)**, so we're dealing with a classification problem.

In [5]:
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier()

In [6]:
clf.fit(X_train, y_train)

KNeighborsClassifier()

In [7]:
pred = clf.predict(X_test)
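
To give a feel for what `predict` does with the default settings (5 neighbors, uniform weights), here is a hedged sketch that looks up each test point's nearest training points with `kneighbors` and takes a majority vote by hand. It is an illustration of the idea, not the library's internal implementation; `random_state=0` and the names `knn`, `manual_pred`, and `agreement` are assumptions for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

knn = KNeighborsClassifier()  # n_neighbors defaults to 5
knn.fit(X_tr, y_tr)

# For each test row, the indices (into X_tr) of its 5 nearest training rows.
_, neighbor_idx = knn.kneighbors(X_te)
neighbor_labels = y_tr[neighbor_idx]  # shape (75, 5)

# Majority vote per row; ties go to the smallest label.
manual_pred = np.array(
    [np.bincount(row, minlength=3).argmax() for row in neighbor_labels]
)

# Fraction of test rows where the hand-rolled vote matches knn.predict.
agreement = (manual_pred == knn.predict(X_te)).mean()
print(agreement)
```

"Fitting" here mostly means storing the training data in a structure that makes neighbor lookups fast; the actual decision happens at prediction time.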

In [8]:
clf.score(X_test, y_test) * 100

94.66666666666667

In [9]:
from sklearn.metrics import accuracy_score

In [10]:
accuracy_score(y_test, pred) * 100

94.66666666666667
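
Cells [8] and [10] print the same number because, for a classifier, `score` computes the mean accuracy on the given test data. The sketch below checks that equivalence and also computes the fraction of correct predictions directly. As before, `random_state=0` and the variable names are assumptions for this example, so the value will differ from the one printed above.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

knn = KNeighborsClassifier().fit(X_tr, y_tr)
y_pred = knn.predict(X_te)

via_score = knn.score(X_te, y_te)          # predicts internally, then scores
via_metric = accuracy_score(y_te, y_pred)  # signature is (y_true, y_pred)
via_numpy = np.mean(y_pred == y_te)        # fraction of correct predictions

print(via_score, via_metric, via_numpy)
```

`score` is the shorter route when only the accuracy is needed; keeping the predictions around, as in cell [7], lets them be reused with other metrics.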