# Scikit-learn and Machine Learning Basics

## Using the Iris Dataset
- 150 observations (individual flowers)
- 4 features: sepal length, sepal width, petal length, petal width
- Response variable: the iris species (3 classes)

### Load Data

```python
# Load the iris dataset from the datasets module (load_iris is a function)
from sklearn.datasets import load_iris
```

```python
# Assign the return value of load_iris() to a variable (it returns a "Bunch" object)
iris = load_iris()
type(iris)
```

```
sklearn.utils.Bunch
```
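A `Bunch` is a dictionary subclass whose keys can also be read as attributes, so `iris.data` and `iris["data"]` refer to the same array. A minimal check:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Bunch subclasses dict: key access and attribute access return the same object
same_object = iris["data"] is iris.data
print(same_object)              # True
print(isinstance(iris, dict))   # True
```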

### Data Attributes

```python
# List the attributes (keys) of the iris "Bunch" object
iris.keys()
```

```
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])
```

```python
# Look at the data attribute (first 5 rows)
# Each row is one observation (flower); each column is one feature (measurement)
iris.data[0:5]
```

```
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2]])
```
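Because rows are observations and columns are features, indexing a row and indexing a column give arrays of different lengths. A short sketch:

```python
from sklearn.datasets import load_iris

iris = load_iris()

first_observation = iris.data[0]   # one row: all 4 measurements of the first flower
sepal_lengths = iris.data[:, 0]    # one column: sepal length of all 150 flowers

print(first_observation.shape)  # (4,)
print(sepal_lengths.shape)      # (150,)
```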

```python
# Look at the names of the four features
iris.feature_names
```

```
['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']
```
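The order of `feature_names` matches the column order of `iris.data`, so the two can be zipped together to label a single observation. A minimal sketch:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Pair each feature name with the corresponding value in the first row
first_row = dict(zip(iris.feature_names, iris.data[0]))
for name, value in first_row.items():
    print(f"{name}: {value}")
```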

```python
# Look at the target (integers representing the species)
iris.target
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
```
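The target array is sorted by species, and each species appears the same number of times. Counting the labels with NumPy's `bincount` confirms the classes are balanced:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Count how many observations carry each integer label (0, 1, 2)
counts = np.bincount(iris.target)
print(counts)  # [50 50 50]
```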

```python
# Look at the target names (0 = setosa, 1 = versicolor, 2 = virginica)
iris.target_names
```

```
array(['setosa', 'versicolor', 'virginica'],
      dtype='<U10')
```
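Since each integer label is a position in `target_names`, indexing `target_names` with the whole `target` array converts every label to its species name. A short sketch:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Integer-array indexing: label 0 -> target_names[0], label 1 -> target_names[1], ...
species = iris.target_names[iris.target]
print(species[0], species[50], species[-1])  # setosa versicolor virginica
```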

## Scikit-learn Requirements
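1. Features and response should be separate objects
2. Features and response should be numeric
3. Features and response should be NumPy arrays
4. Features and response should have specific shapes: features as a 2-D array of shape (n_samples, n_features), response as a 1-D array of shape (n_samples,)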
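The four requirements listed above can be turned into explicit checks. The helper below, `check_sklearn_inputs`, is hypothetical (it is not part of scikit-learn); it is a minimal sketch assuming plain NumPy inputs:

```python
import numpy as np
from sklearn.datasets import load_iris


def check_sklearn_inputs(X, y):
    """Hypothetical helper: assert the four requirements on X and y."""
    # Requirement 1: separate objects (a weak proxy: X and y are not the same object)
    assert X is not y, "features and response must be separate objects"
    # Requirement 3: NumPy arrays (checked before requirement 2 so .dtype exists)
    assert isinstance(X, np.ndarray) and isinstance(y, np.ndarray), "use NumPy arrays"
    # Requirement 2: numeric dtypes
    assert np.issubdtype(X.dtype, np.number), "features must be numeric"
    assert np.issubdtype(y.dtype, np.number), "response must be numeric"
    # Requirement 4: X is 2-D (n_samples, n_features), y is 1-D (n_samples,)
    assert X.ndim == 2 and y.ndim == 1, "X must be 2-D and y must be 1-D"
    assert X.shape[0] == y.shape[0], "one response value per observation"
    return True


iris = load_iris()
print(check_sklearn_inputs(iris.data, iris.target))  # True
```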

```python
# Features and response should be separate NumPy array objects
print(type(iris.data))
print(type(iris.target))
```

```
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
```


```python
# Features and response should have specific shapes
print(iris.data.shape)    # 150 observations, 4 features
print(iris.target.shape)  # 150 observations
```

```
(150, 4)
(150,)
```
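The 2-D requirement for features matters when only one feature is used: selecting a single column gives a 1-D array, which must be reshaped into a one-column matrix. A minimal sketch using petal length as the example column:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Selecting one column yields a 1-D array, which is not a valid feature matrix
petal_length = iris.data[:, 2]
print(petal_length.shape)  # (150,)

# reshape(-1, 1) turns it into a 2-D matrix with 150 rows and 1 column
X_single = petal_length.reshape(-1, 1)
print(X_single.shape)  # (150, 1)
```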


## How to Store Data
- Store the features in `X` (a matrix)
- Store the response in `y` (a vector)

```python
# Feature matrix, shape (150, 4)
X = iris.data
# Response vector, shape (150,)
y = iris.target
```
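Stored this way, `X` and `y` can be passed directly to a scikit-learn estimator. A minimal sketch, assuming `KNeighborsClassifier` as an arbitrary example model and a made-up flower measurement `[3, 5, 4, 2]` as the new observation (neither comes from the steps above):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data    # feature matrix, shape (150, 4)
y = iris.target  # response vector, shape (150,)

# fit() takes the 2-D feature matrix and the 1-D response vector
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)

# predict() also expects a 2-D input: one row per new observation (hypothetical values)
new_observation = [[3, 5, 4, 2]]
prediction = knn.predict(new_observation)
print(iris.target_names[prediction])
```

Wrapping the new observation in an outer list keeps it 2-D, matching the shape requirement for features.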