# Introduction to Machine Learning

Following the book __An Introduction to Statistical Learning__ by __Gareth James__ et al.

- Machine learning is a method of data analysis that automates analytical model building.
- Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.
- What is machine learning used for?
    - Fraud detection
    - Web search results
    - Real-time ads on web pages
    - Credit scoring and next-best offers
    - Prediction of equipment failures
    - New pricing models
    - Network intrusion detection
    - Recommendation engines
    - Customer segmentation
    - Text sentiment analysis
    - Predicting customer churn
    - Pattern and image recognition
    - Email spam filtering
    - Financial modeling

***

![title](images/i1.png)

***

There are three main types of __Machine Learning algorithms__:

- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning

- Supervised Learning:
    * We have labeled data and we are trying to predict a label based on known features.

- Unsupervised Learning:
    * We have unlabeled data and we are trying to group similar data points together based on their features.

- Reinforcement Learning:
    * The algorithm learns which actions to perform from experience.

***

- __Supervised learning__ algorithms are trained using __labeled__ examples, such as an input where the desired output is known.
<br><br>
- For example, a piece of equipment could have data points labeled either "F" (failed) or "R" (runs).
<br><br>
- The learning algorithm receives a set of inputs along with the corresponding correct outputs, and then the algorithm learns by comparing its actual output with correct outputs to find errors.
<br><br>
- It then modifies the model accordingly.
<br><br>
- Through methods like classification, regression, prediction and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data.
<br><br>
- Supervised learning is commonly used in applications where historical data predicts future likely events.
<br><br>
- For example, it can anticipate when credit card transactions are likely to be fraudulent or which insurance customer is likely to file a claim.
<br><br>
- Or it can attempt to predict the price of a house based on different features for houses for which we have historical price data.
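
As a minimal sketch of the labeled-example idea above, the snippet below fits a classifier on a handful of made-up equipment readings labeled "F" or "R". The feature values, the variable names and the choice of `DecisionTreeClassifier` are assumptions for illustration only.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical readings: [temperature, vibration] for six machines.
X = [[70, 0.2], [72, 0.3], [68, 0.1], [95, 0.9], [98, 1.1], [93, 0.8]]
# Known outcomes for those machines: "R" (runs) or "F" (failed).
y = ["R", "R", "R", "F", "F", "F"]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)  # learn from inputs paired with their correct outputs

# Predict labels for two new, unlabeled machines.
new_machines = [[71, 0.25], [96, 1.0]]
predicted = clf.predict(new_machines)
print(predicted)
```

The first new machine resembles the running ones and the second resembles the failed ones, so the model labels them "R" and "F" respectively.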

***

- __Unsupervised learning__ is used on data that has no historical labels.
<br><br>
- The system is not told the 'right answer'. The algorithm must figure out what is being shown.
<br><br>
- The goal is to explore the data and find some structure within.
<br><br>
- For example, it can find the main attributes that separate customer segments from each other.
<br><br>
- Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition.
<br><br>
- These algorithms are also used to segment text topics, recommend items and identify data outliers.
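
A small sketch of k-means clustering, one of the techniques named above, might look like this. The points are made up, and `n_clusters=2` is an assumption about the structure we expect to find; no labels are passed to the algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: two visibly separate groups of 2-D points.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.7]])

# n_init and random_state are set explicitly so the result is reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)  # note: no y is passed

print(cluster_ids)
```

The cluster numbers themselves are arbitrary; what matters is that the first three points share one id and the last three share the other.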

***

- __Reinforcement learning__ is often used for robotics, gaming and navigation.
<br><br>
- With reinforcement learning, the algorithm discovers through trial and error which actions yield the greatest rewards.
<br><br>
- This type of learning has three primary components: the agent (the learner or the decision maker), the environment (everything the agent interacts with) and actions (what the agent can do).
<br><br>
- The objective is for the agent to choose actions that maximize the expected reward over a given amount of time.
<br><br>
- The agent will reach the goal much faster by following a good policy, that is, a good rule for choosing an action in each situation.
<br><br>
- So the goal in reinforcement learning is to learn the best policy.
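
The trial-and-error idea can be sketched with a deliberately simplified stand-in: an epsilon-greedy agent on a three-action "bandit" problem. This is not a full reinforcement learning setup (there are no states, so the policy reduces to which action to prefer), and the reward values, `epsilon` and all names below are assumptions for illustration.

```python
import random

random.seed(0)

# Hypothetical environment: three actions with hidden average rewards.
TRUE_MEANS = [0.2, 0.5, 0.8]

def pull(action):
    """Environment: return a noisy reward for the chosen action."""
    return random.gauss(TRUE_MEANS[action], 0.1)

estimates = [0.0, 0.0, 0.0]  # agent's running estimate of each action's reward
counts = [0, 0, 0]           # how often each action has been tried
epsilon = 0.1                # fraction of steps spent exploring

for step in range(2000):
    if random.random() < epsilon:
        action = random.randrange(3)                        # explore
    else:
        action = max(range(3), key=lambda a: estimates[a])  # exploit
    reward = pull(action)
    counts[action] += 1
    # Incremental mean update of the estimate for this action.
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = max(range(3), key=lambda a: estimates[a])
print(best_action, [round(e, 2) for e in estimates])
```

After enough trials the agent's estimates approach the hidden averages, and it ends up preferring the action with the highest expected reward.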

# Machine Learning with Python

- I am going to be using the __scikit-learn__ package.
<br><br>
- It's one of the most popular machine learning packages for Python and has many algorithms built in.
<br><br>
- Note that you might need to install it using:
    - __conda install scikit-learn__ (if you have the Anaconda distribution)
    <br><br>
    or
    <br><br>
    - __pip install scikit-learn__

***

Before getting into the ML world and the scikit-learn package, let's go over and review the machine learning process.
<br><br>
- The machine learning process starts off with our data.
<br><br>
- Somehow we need to acquire data, and then the next step is to clean the data and to format the data so that the machine learning model can accept it.
<br><br>
- Before we actually give it to the model, however, we split the data into a __training set__ and a __test set__.
<br><br>
- We train our model on the training set, then test it on the test set, iterating on the model and tuning its parameters until it's ready to deploy.
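
The steps above can be sketched end to end as follows. The data here is synthetic (a noisy line with one missing value injected on purpose), and every name and number is an assumption chosen only to make the sketch runnable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# 1. Acquire: synthetic data standing in for a real source (y = 3x + 2 plus noise).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)
X[5, 0] = np.nan  # simulate a missing value

# 2. Clean: drop rows with missing features.
keep = ~np.isnan(X).any(axis=1)
X, y = X[keep], y[keep]

# 3. Split into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 4. Train on the training set, then 5. evaluate on the held-out test set.
model = LinearRegression().fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"R^2 on the test set: {test_r2:.3f}")
```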

***

In [1]:
# Every algorithm is exposed in scikit-learn via an "Estimator".

# First, we need to import the model class, for example:
from sklearn.linear_model import LinearRegression

In [2]:
# Here the "LinearRegression" is the "Estimator".

__Estimator parameters:__ All the parameters of an estimator can be set when it is instantiated, and have suitable default values.

In [3]:
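# Note: the `normalize` parameter was removed in scikit-learn 1.2, so this line
# needs an older release; on newer versions, omit it (e.g. LinearRegression()).
model = LinearRegression(normalize=True)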
print(model)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=True)
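
Parameters can also be inspected and changed after the estimator has been created. A short sketch using `get_params()` and `set_params()`, with `fit_intercept` picked arbitrarily as the parameter to change:

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression(fit_intercept=False)  # override one default

# get_params() returns a dict of every parameter and its current value.
params = model.get_params()
print(params["fit_intercept"])

# set_params() changes parameters on the existing object.
model.set_params(fit_intercept=True)
print(model.get_params()["fit_intercept"])
```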


Once we have our model created with our parameters, it is time to fit our model on some data!
<br><br>
Remember that we should split our data into a training set and a test set.

In [4]:
import numpy as np
from sklearn.model_selection import train_test_split
x, y = np.arange(10).reshape(5,2), range(5)
x

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [5]:
list(y)

[0, 1, 2, 3, 4]

In [6]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

In [7]:
x_train

array([[8, 9],
       [0, 1],
       [6, 7]])

In [8]:
y_train

[4, 0, 3]

In [9]:
x_test

array([[2, 3],
       [4, 5]])

In [10]:
y_test

[1, 2]
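
The split shown above is random, so rerunning the cell can put different rows in each set. One way to make it reproducible is to pass `random_state`; the value 42 below is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

x, y = np.arange(10).reshape(5, 2), list(range(5))

# Same seed -> same shuffle, so both calls return identical splits.
split_a = train_test_split(x, y, test_size=0.3, random_state=42)
split_b = train_test_split(x, y, test_size=0.3, random_state=42)

x_train, x_test, y_train, y_test = split_a
print(x_train.shape, x_test.shape)
```

With five rows and `test_size=0.3`, the test set is rounded up to two rows and the remaining three go to training.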

Now that we have split the data, we can train/fit our model on the training data.
<br><br>
This is done through the __model.fit()__ method.

In [11]:
model.fit(x_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=True)

Now the model has been fit and trained on the training data.
<br><br>
The model is ready to predict labels or values on the test set.
<br><br>
We get predicted values using the __predict()__ method.

In [12]:
predictions = model.predict(x_test)

We can then evaluate our model by comparing our predictions to the correct values.
<br><br>
The evaluation method depends on the kind of machine learning problem we are solving (e.g. regression, classification or clustering).
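
As a rough illustration of how evaluation differs by problem type, the sketch below scores a regression and a classification result with functions from `sklearn.metrics`. The true and predicted values are made up for the example.

```python
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error

# Regression: compare predicted numbers with the true numbers.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]
mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0 + 1) / 3
mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0 + 1) / 3

# Classification: compare predicted labels with the true labels.
labels_true = ["F", "R", "R", "F"]
labels_pred = ["F", "R", "F", "F"]
acc = accuracy_score(labels_true, labels_pred)  # 3 of 4 correct

print(mae, mse, acc)
```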

***

Scikit-learn really strives to have a uniform interface across all methods, and we will see examples of these below.
<br><br>
Given a scikit-learn __estimator__ object named model, the following methods are available:
<br><br>
- Available in __all Estimators:__
    - __model.fit()__ : fit training data
    <br><br>
    - For supervised learning applications, this accepts two arguments: the data X and the labels y (e.g. model.fit(X, y)).
    <br><br>
    - For unsupervised learning applications, this accepts only a single argument, the data X (e.g. model.fit(X)).
    <br><br>
- Available in __supervised estimators:__
    - __model.predict()__ : given a trained model, predict the label of a new set of data. This method accepts one argument, the new data X_new (e.g. model.predict(X_new)), and returns the predicted label for each object in the array.
    <br><br>
    - __model.predict_proba()__ : for classification problems, some estimators also provide this method, which returns the probability that a new observation has each categorical label. In this case, the label with the highest probability is returned by model.predict().
    <br><br>
    - __model.score()__ : for classification or regression problems, most estimators implement a score method. For classifiers the default score is the mean accuracy, between 0 and 1. For regressors it is the R² coefficient, which is at most 1 and can be negative for a very poor fit. In both cases a larger score indicates a better fit.
    <br><br>
- Available in __unsupervised estimators:__
    - __model.predict()__ : predict cluster labels for new data, in clustering algorithms that support it.
    <br><br>
    - __model.transform()__ : given an unsupervised model, transform new data into the new basis. This also accepts one argument X_new, and returns the new representation of the data based on the unsupervised model.
    <br><br>
    - __model.fit_transform()__ : some estimators implement this method, which more efficiently performs a fit and a transform on the same input data.
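
The shared method names listed above can be sketched with one unsupervised and one supervised estimator. `PCA` and `LogisticRegression` are chosen here only for illustration, and the toy data is made up.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0, 1.2], [2.0, 1.9], [3.0, 3.1],
              [10.0, 9.8], [11.0, 11.1], [12.0, 12.2]])
y = np.array([0, 0, 0, 1, 1, 1])

# Unsupervised estimator: fit(X) takes no labels.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)                       # fit and transform in one call
X_new_reduced = pca.transform(np.array([[6.0, 6.0]]))  # reuse the fitted basis

# Supervised estimator: fit(X, y) takes data and labels.
clf = LogisticRegression()
clf.fit(X_reduced, y)
labels = clf.predict(X_reduced)       # one predicted label per row
proba = clf.predict_proba(X_reduced)  # one probability per class, per row
accuracy = clf.score(X_reduced, y)    # mean accuracy for a classifier

print(X_reduced.shape, X_new_reduced.shape, proba.shape, accuracy)
```

Because both objects follow the same interface, swapping in a different estimator usually means changing only the import and the constructor line.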

![title](images/ml_map.png)

Refer to the chart above when deciding which estimator to use:
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html