![](images/logo.png)

# Machine Learning - Day 1

- Representation
- Splitting your data into features and labels
- Preprocessing with NumPy and pandas
- Building a machine learning model

---

![](images/Picture1.png)

---

## 1. Representation

![](images/Picture2.png)

#### Iris Dataset
![](pandas/images/iris.jpg)

The dataset consists of 50 samples from each of three species of Iris (setosa, virginica and versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. Based on the combination of these four features, we want to develop an ML model that distinguishes the species from one another.

#### Attribute Information
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
    - Iris Setosa
    - Iris Versicolour
    - Iris Virginica
    
##### Note: The idea of finding a representation for real-world objects is widely used in many other paradigms of computer science, for example <span style="color:blue">Object-Oriented Programming</span>.
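To make the note concrete, here is a minimal sketch (not part of the original material) of one iris flower represented two ways: as a Python object in the OOP sense, and as the numeric feature vector an ML model actually consumes. The class name `IrisSample` and the method `to_features` are hypothetical, chosen only for illustration; the values are the first row of the iris data listed further down.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class IrisSample:
    """Hypothetical OOP representation of one iris flower (measurements in cm)."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str

    def to_features(self) -> np.ndarray:
        # the four measurements form the feature vector; species is the label
        return np.array([self.sepal_length, self.sepal_width,
                         self.petal_length, self.petal_width])


flower = IrisSample(5.1, 3.5, 1.4, 0.2, "setosa")
print(flower.to_features())  # [5.1 3.5 1.4 0.2]
print(flower.species)        # setosa
```

The object keeps everything we know about the flower, while the model only ever sees the four numbers.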

### Loading the Dataset

```python
import seaborn as sns

iris = sns.load_dataset('iris')
iris.head()
```

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |

```python
iris.shape
```

```
(150, 5)
```

## 2. Features and Labels

Divide your dataset into two parts. The pair goes by several equivalent names:
- features and labels
- X and y
- input and output
- data and output
- independent and dependent variables
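A minimal pandas sketch of the split, assuming a DataFrame with the same columns as the seaborn `iris` data. To stay self-contained it rebuilds only the five rows shown in the `iris.head()` output above instead of downloading the full dataset.

```python
import pandas as pd

# the five rows displayed by iris.head(); the full dataset has 150 rows
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width":  [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width":  [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# features (X): every column except the label
X = iris.drop(columns="species")
# label (y): the column we want to predict
y = iris["species"]

print(X.shape)  # (5, 4)
print(y.shape)  # (5,)
```

The same two lines work unchanged on the full 150-row dataset, giving shapes `(150, 4)` and `(150,)`.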

## 3. Preprocessing

You (as a data scientist or machine learning engineer) will spend most of your time preprocessing data. The algorithms used in Kaggle competitions are already implemented in libraries that everyone is free to use, so what usually differentiates the winners from other competitors is the **preprocessing**.

### <span style="color:blue">NumPy</span>

##### 0. Introduction
##### 1. Creating NumPy arrays
##### 2. Methods and Attributes
##### 3. Indexing and Slicing
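The topics above can be previewed in one short sketch. It uses standard NumPy calls (`np.array`, `np.arange`, `reshape`, `sum`, `mean`) and attributes (`shape`, `ndim`); the specific values are arbitrary.

```python
import numpy as np

# 1. creating arrays: from a Python list, or from a range of numbers
a = np.array([1, 2, 3, 4, 5, 6])
b = np.arange(0, 12)           # 0, 1, ..., 11

# 2. methods and attributes
grid = b.reshape(3, 4)         # method: the same 12 values arranged as 3 rows x 4 columns
print(grid.shape)              # attribute: (3, 4)
print(grid.ndim)               # attribute: 2
print(a.sum(), a.mean())       # methods: 21 3.5

# 3. indexing and slicing: [row, column], with start:stop slices (stop excluded)
print(grid[1, 2])              # 6
print(grid[:, 0])              # first column: [0 4 8]
print(grid[0:2, 1:3])          # rows 0-1, columns 1-2
```

Note the difference: attributes such as `shape` are read without parentheses, while methods such as `reshape()` are called.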

#### <span style="color:orange">Quiz time</span>

You will be given a matrix and asked to reproduce each of the outputs below using indexing and slicing:

```python
import numpy as np

m = np.arange(1, 26, 1).reshape(5, 5)
m
```

```
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])
```

```python
m[3, 4]
```

```
20
```

```python
m[0]
```

```
array([1, 2, 3, 4, 5])
```

```python
m[4]
```

```
array([21, 22, 23, 24, 25])
```

```python
m[0:3, 1:2]
```

```
array([[ 2],
       [ 7],
       [12]])
```

```python
m[2:, 1:]
```

```
array([[12, 13, 14, 15],
       [17, 18, 19, 20],
       [22, 23, 24, 25]])
```

### <span style="color:blue">Pandas</span>

##### 0. Introduction to pandas
##### 1. How pandas is similar to NumPy
##### 2. How a pandas DataFrame is built on top of NumPy
##### 3. The same indexing techniques apply; pandas adds dual referencing (by label and by position)
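A minimal sketch of the pandas points above, using a small made-up DataFrame: the values underneath are a NumPy array (`to_numpy()`), a column accepts NumPy functions, and "dual referencing" is shown as label-based `.loc` versus position-based `.iloc`, which is our reading of the term rather than a definition from the original notes.

```python
import numpy as np
import pandas as pd

# a small DataFrame with a custom string index (toy values)
df = pd.DataFrame(
    {"length": [5.1, 4.9, 4.7], "width": [3.5, 3.0, 3.2]},
    index=["a", "b", "c"],
)

# the values underneath are a NumPy array
values = df.to_numpy()
print(type(values))              # <class 'numpy.ndarray'>
print(values.shape)              # (3, 2)

# a column accepts NumPy functions just like an array
print(np.mean(df["length"]))

# dual referencing: by label with .loc, by integer position with .iloc
print(df.loc["b", "width"])      # 3.0
print(df.iloc[1, 1])             # 3.0, the same cell
print(df.iloc[0:2, 0].tolist())  # [5.1, 4.9]
```

Slicing with `.iloc` excludes the stop position exactly like NumPy, whereas label slices with `.loc` include the stop label.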

#### <span style="color:orange">Quiz time</span>
1. Split the iris dataset into X and y.
2. Split the Boston dataset into X and y.

### <span style="color:blue">Broadcasting</span>

##### 1. Arithmetic Operators
##### 2. Comparison Operators
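A short sketch of both kinds of broadcasting on a toy 2x3 array. Outputs are printed via `.tolist()` so they read as plain Python lists.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# arithmetic with a scalar: the scalar is broadcast to every element
print((m * 10).tolist())       # [[10, 20, 30], [40, 50, 60]]

# arithmetic with a 1-D row: shape (3,) is stretched across both rows of shape (2, 3)
row = np.array([100, 200, 300])
print((m + row).tolist())      # [[101, 202, 303], [104, 205, 306]]

# comparison operators broadcast too and return a boolean array
mask = m > 3
print(mask.tolist())           # [[False, False, False], [True, True, True]]

# boolean indexing keeps only the positions where the mask is True
print(m[mask].tolist())        # [4, 5, 6]
```

Broadcasting only works when trailing dimensions match or are 1; adding an array of shape `(2,)` to `m` would raise a `ValueError`.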

#### <span style="color:orange">Quiz time</span>
1. Normalizing features
2. Standardizing features
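One possible approach, assuming "normalizing" means min-max scaling each column to $[0, 1]$, $x' = (x - \min) / (\max - \min)$, and "standardizing" means the z-score $z = (x - \mu) / \sigma$ per column. Both rely on broadcasting: the per-column statistics have shape `(n_features,)` and are stretched across every row. The feature matrix is made up.

```python
import numpy as np

# toy feature matrix: 4 samples (rows) x 2 features (columns)
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [4.0, 800.0]])

# min-max normalization, computed per column (axis=0)
col_min = X.min(axis=0)                 # shape (2,)
col_max = X.max(axis=0)                 # shape (2,)
X_norm = (X - col_min) / (col_max - col_min)

# standardization (z-score), per column; np.std uses the population std (ddof=0)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm)
print(X_std)
```

A column with constant values would make `max - min` (and the standard deviation) zero, so real code usually guards against that case.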

### <span style="color:blue">Arithmetic Operations between NumPy Arrays</span>

#### <span style="color:orange">Quiz time</span>
1. Creating new features
2. Combining multiple columns
3. Fare per head / price per room
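A minimal sketch of these ideas with plain NumPy arrays. The housing columns `price`, `bedrooms` and `other_rooms` are hypothetical, with made-up values, not taken from any dataset in these notes. Operations between equal-length arrays are element-wise, so each row gets its own result.

```python
import numpy as np

# hypothetical housing data (made-up values), one entry per house
price = np.array([300000, 450000, 240000])
bedrooms = np.array([2, 3, 1])
other_rooms = np.array([2, 3, 2])

# combining multiple columns: element-wise addition
rooms = bedrooms + other_rooms

# creating a new feature from a ratio: price per room
price_per_room = price / rooms

print(rooms.tolist())           # [4, 6, 3]
print(price_per_room.tolist())  # [75000.0, 75000.0, 80000.0]
```

Fare per head follows the same pattern: divide a fare array by an array counting the people who share that fare.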

### <span style="color:blue">Universal Functions</span>

NumPy comes with many [universal array functions](http://docs.scipy.org/doc/numpy/reference/ufuncs.html) (ufuncs): mathematical operations such as square root, absolute value or addition that are applied element by element across a whole array in a single call.
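A few standard ufuncs on a small array. Each returns a new array of the same shape, computed element by element.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0])

print(isinstance(np.sqrt, np.ufunc))    # True
print(np.sqrt(x).tolist())              # [1.0, 2.0, 3.0, 4.0]
print(np.add(x, 1).tolist())            # [2.0, 5.0, 10.0, 17.0], same as x + 1
print(np.maximum(x, 5).tolist())        # [5.0, 5.0, 9.0, 16.0]
print(np.abs(np.array([-2, 3, -5])).tolist())  # [2, 3, 5]
```

Operators such as `+` and `*` on arrays call the corresponding ufuncs (`np.add`, `np.multiply`) under the hood.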

#### <span style="color:orange">Quiz time</span>
1. Implementing the distance formula
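One way to approach the quiz, assuming "distance formula" means the Euclidean distance $d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$. The helper name `euclidean_distance` is hypothetical; it chains the ufuncs `np.subtract`, `np.square` and `np.sqrt` with `np.sum`, and `np.linalg.norm` is used only as a cross-check.

```python
import numpy as np


def euclidean_distance(p, q):
    """Hypothetical helper: Euclidean distance between two points given as 1-D arrays."""
    diff = np.subtract(p, q)                 # element-wise p_i - q_i
    return np.sqrt(np.sum(np.square(diff)))  # sqrt of the sum of squared differences


p = np.array([0.0, 0.0])
q = np.array([3.0, 4.0])
print(euclidean_distance(p, q))   # 5.0
print(np.linalg.norm(p - q))      # 5.0, same result
```

The same function works unchanged for points with any number of features, such as two rows of the iris feature matrix.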

## 4. Building a Machine Learning Model

### <span style="color:blue">Scikit-learn</span>

##### 0. Introduction to scikit-learn (`sklearn`)
##### 1. How to build a model

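```python
# three regression estimators available in scikit-learn
from sklearn.neighbors import KNeighborsRegressor  # k-nearest neighbours regression
from sklearn.linear_model import LinearRegression  # ordinary least squares
from sklearn.svm import SVR                        # support vector regression
```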
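A minimal sketch of the usual scikit-learn workflow with the three estimators above: create the model, `fit` it on features `X` and labels `y`, then `predict` for new input. The toy data (y = 2x + 1) is made up, and the hyperparameters (`n_neighbors=2`, default `SVR`) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# toy data: one feature, label follows y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)  # shape (10, 1): samples x features
y = 2 * X.ravel() + 1                          # shape (10,)

models = {
    "linear": LinearRegression(),
    "knn": KNeighborsRegressor(n_neighbors=2),
    "svr": SVR(),
}

X_new = np.array([[4.5]])  # an input the models have not seen
for name, model in models.items():
    model.fit(X, y)                    # 1. learn from features and labels
    prediction = model.predict(X_new)  # 2. predict for the new input
    print(name, prediction)
```

Because every estimator exposes the same `fit`/`predict` interface, swapping one algorithm for another is a one-line change.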

### <span style="color:blue">Supervised & Unsupervised Learning</span>

#### Supervised Learning

The computer is presented with example **inputs** and their **desired outputs**, given by a "teacher", and the goal is to **learn a general rule that maps inputs to outputs**. Once training is complete, the algorithm will apply what was learned to new data.
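A minimal sketch of the supervised setting: the label array `y` plays the role of the "teacher", the model learns from a training split, and it is then applied to held-out samples it has never seen. The data is synthetic (two made-up clusters), and `KNeighborsClassifier` is just one convenient choice of supervised model.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# synthetic inputs: class 0 scattered around (0, 0), class 1 around (5, 5)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(5, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)  # desired outputs given by the "teacher"

# keep 20% of the samples aside as "new data"
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)             # learn the input -> output mapping
accuracy = model.score(X_test, y_test)  # apply it to unseen data
print(f"test accuracy: {accuracy:.2f}")
```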

#### Unsupervised Learning

**No labels** are given to the learning algorithm, leaving it on its own to **find structure in its input**. Once trained, the algorithm can use the structure it has found to interpret new data. Many unsupervised methods benefit from large amounts of data, which is one reason they have become more widely used in the age of big data.
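A minimal sketch of the unsupervised setting using k-means, one common clustering algorithm chosen here for illustration. The points are synthetic, drawn around two made-up centres, and no labels are passed to `fit`; the algorithm groups the points on its own and can then assign new points to those groups.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# synthetic, unlabelled points around two made-up centres
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(5, 1, size=(50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)   # only X is given: no y

print(np.bincount(cluster_ids))       # how many points ended up in each cluster

# interpret new data: assign unseen points to the discovered clusters
new_ids = kmeans.predict(np.array([[0.2, -0.1], [5.1, 4.8]]))
print(new_ids)
```

The cluster numbers themselves are arbitrary: k-means finds groups, but it cannot know what the groups mean.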

![](images/Picture3.png)

### <span style="color:blue">Regression & Classification</span>

#### Regression

Regression is the task of predicting a **continuous quantity**.

#### Classification

Classification is the task of predicting a **discrete class label**.
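To contrast the two tasks, here is a minimal sketch that fits one regressor and one classifier on the same input. The "hours studied" data is hypothetical with made-up values; the label is "pass" when the score is at least 60. `LinearRegression` and `KNeighborsClassifier` are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# hypothetical input: hours studied (made-up values)
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# regression target: a continuous exam score
score = np.array([52.0, 58.0, 61.0, 67.0, 73.0, 78.0])
# classification target: a discrete label ("pass" when score >= 60)
passed = np.array(["fail", "fail", "pass", "pass", "pass", "pass"])

reg = LinearRegression().fit(hours, score)
clf = KNeighborsClassifier(n_neighbors=3).fit(hours, passed)

print(reg.predict([[3.5]]))  # a real number on a continuous scale
print(clf.predict([[3.5]]))  # one of the known labels: 'fail' or 'pass'
```

The regressor can return any value in between the training scores, while the classifier can only ever return a label it saw during training.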

#### <span style="color:orange">Quiz time</span>
1. Supervised or Unsupervised?
2. Classification or Regression?