# Week 1 - Scikit-learn (sklearn) datasets tutorial

scikit-learn (imported as `sklearn`) is a popular machine learning library for Python, commonly used together with NumPy and SciPy.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Loading data

We will frequently load our datasets into 2D arrays, either NumPy arrays or SciPy sparse matrices. The shape of these arrays is usually `(n_samples, n_features)`.

* `n_samples` is the number of examples in our dataset, e.g. the number of rows in a CSV file
* `n_features` is the number of variables describing each example in our dataset, e.g. the number of columns in a CSV file

This 2D matrix is often called the *feature matrix*.
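
As a minimal sketch of this layout, the toy array below holds three samples with two features each. The values and the names `X`, `first_sample` and `second_feature` are made up for illustration and do not come from any dataset.

```python
import numpy as np

# Hypothetical toy data: 3 samples (rows), 2 features (columns)
X = np.array([
    [5.1, 3.5],
    [4.9, 3.0],
    [6.2, 2.9],
])

n_samples, n_features = X.shape
print(n_samples, n_features)

# Row i is one sample; column j is one feature across all samples
first_sample = X[0]
second_feature = X[:, 1]
```

Indexing a row returns everything known about one example, while indexing a column returns one variable for every example.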

#### The digits dataset

This dataset consists of 8x8 grayscale images of handwritten digits (0 to 9), each with an associated label.

In [None]:
from sklearn import datasets

In [None]:
# Load in the `digits` data
digits = datasets.load_digits()

In [None]:
print(type(digits))
digits.keys()
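
The object returned by `load_digits` is a `Bunch`, which can be read either like a dictionary or through attributes. The sketch below illustrates both access styles, plus the `return_X_y=True` option for when only the feature matrix and labels are needed. The names `same_object`, `X` and `y` are illustrative.

```python
from sklearn import datasets

digits = datasets.load_digits()

# A Bunch behaves like a dict whose keys are also available as attributes
same_object = digits["data"] is digits.data
print(same_object)

# return_X_y=True skips the Bunch and returns only (data, target)
X, y = datasets.load_digits(return_X_y=True)
print(X.shape, y.shape)
```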

In [None]:
# Only the last expression in a cell is displayed, so print both types
print(type(digits.data))
print(type(digits.images))

In [None]:
digits.data.shape  # (n_samples, n_features): one row per image, one column per pixel

In [None]:
digits.images.shape  # (n_samples, 8, 8): the same pixels kept as 8x8 images

In [None]:
# Flattening each 8x8 image gives the corresponding 64-value row of `data`
print(np.array_equal(digits.images.reshape(len(digits.images), -1), digits.data))
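
To see what this flattening does without loading the dataset, here is a small sketch on a made-up stack of two 2x3 "images". The arrays `images` and `flat` are hypothetical stand-ins for `digits.images` and `digits.data`.

```python
import numpy as np

# Hypothetical stack of 2 tiny "images", each 2x3 pixels
images = np.arange(12).reshape(2, 2, 3)

# -1 lets NumPy infer the flattened length (2 * 3 = 6)
flat = images.reshape(len(images), -1)
print(flat.shape)

# Row-major flattening: each image's rows are laid end to end
print(flat[0])

# Reshaping back recovers the original stack
restored = flat.reshape(images.shape)
```

Passing `-1` for the last dimension avoids hard-coding the sample and pixel counts, so the same line works for images of any size.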

In [None]:
# Display a random digit from the dataset
i = np.random.choice(len(digits.data))
plt.figure(1, figsize=(3, 3))
plt.imshow(digits.images[i], cmap=plt.cm.gray_r, interpolation='nearest')
print(f"Digit: {digits.target[i]}")
plt.show()

#### The iris dataset

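This dataset consists of measurements of three species of iris flowers: 150 samples in total, each described by four features (sepal length, sepal width, petal length and petal width, in cm).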

Setosa:<br>
<img src="images/iris_setosa.jpg" width=300 height=300 align=left>

Versicolor:<br>
<img src="images/iris_versicolor.jpg" width=300 height=300 align=left>

Virginica:<br>
<img src="images/iris_virginica.jpg" width=300 height=300 align=left>

In [None]:
iris = datasets.load_iris()

In [None]:
iris.keys()

In [None]:
iris.data.shape

In [None]:
iris.feature_names

In [None]:
iris.target_names

In [None]:
iris.target.shape

In [None]:
iris.target

In [None]:
print(iris.target_names)
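
The integer labels in `iris.target` are indices into `iris.target_names`, so the two can be combined to get a readable species name for every sample. The sketch below shows one way to do this, along with a per-class count. The names `species` and `counts` are illustrative.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Index the array of names with the integer labels to get one name per sample
species = iris.target_names[iris.target]
print(species[:3])

# Count how many samples fall in each class
counts = np.bincount(iris.target)
print(dict(zip(iris.target_names, counts)))
```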