# Introduction

Data is not random; it has structure, e.g., customer behaviour.

We need a “big theory” to extract that structure from data for:

- Understanding the process
- Making predictions for the future


### Machine learning 

- is the art of programming computers to optimize a performance criterion using example data or past experience.
- There is no need to “learn” to calculate payroll: a fixed, known procedure already solves it.

### Learning is used when:

- Human expertise does not exist (navigating on Mars)
- Humans are unable to explain their expertise (speech recognition)
- Solution changes in time (routing on a computer network)
- Solution needs to be adapted to particular cases (user biometrics)


### `Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.`


# What is ML? 

**`Machine learning (ML)`** is a collection of algorithms and techniques used to design systems that learn from data. **`ML`** algorithms have a strong mathematical and statistical basis, but they do not by themselves take domain knowledge into account. **`ML`** draws on the following disciplines:

- Scientific computing
- Mathematics
- Statistics


# Types of **`ML`**

## Machine learning algorithms fall into two broad categories:

### - Supervised learning algorithms are trained with labeled data, that is, data composed of examples of the desired answers.

**Example:** A model that identifies fraudulent credit card use would be trained on a dataset with labeled data points of known fraudulent and valid charges. Most machine learning in practice is supervised.

### - Unsupervised learning algorithms are used on data with no labels, and the goal is to find relationships in the data.

**Example:** You might want to find groupings of customer demographics with similar buying habits.



# Supervised Learning

## Regression

Example: House Price Prediction

In **`supervised learning`**, a labeled dataset is used: each data point has been tagged with a label that gives it informative meaning. A model learned from the labeled data can then predict the label of new, unlabeled data.

For example, a dataset may contain a series of records with the following fields, which record the sizes of various houses and the prices for which they were sold: House Size, Price Sold.

In this very simple example, Price Sold is the label. When plotted on a chart, this dataset can help you predict the price of a house that is yet to be sold. Predicting a price for the house is a regression problem.
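The house-price example can be made concrete with a least-squares line fit. This is a minimal sketch, assuming a handful of made-up (size, price) records and plain ordinary least squares; the text does not prescribe a particular regression method, and `fit_line` is a hypothetical helper name:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled records: House Size (m²) -> Price Sold (the label)
sizes = [50, 70, 90, 110, 130]
prices = [150_000, 200_000, 250_000, 300_000, 350_000]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 100 + intercept  # price estimate for an unsold 100 m² house
print(f"Predicted price for 100 m²: {predicted:.0f}")
```

The prediction is a continuous number, which is what makes this a regression rather than a classification problem.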


## Classification

### Classification uses supervised induction to analyze historical data stored in a database and automatically generate a model that can predict future behaviour

Suppose that you have a dataset containing the following fields: Tumor Size, Age, Malignant

The Malignant field is a label indicating whether a tumor is cancerous. When you plot the dataset on a chart, you may be able to separate it into two distinct groups, one containing the cancerous tumors and the other containing the benign tumors.

Using this grouping, you can now **`predict`** whether a new tumor is cancerous or not. This type of problem is known as a **`classification problem`**.


Data is divided into two classes: 
- **`Cancerous`**
- **`Non-cancerous`**
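One simple way to turn the tumor grouping into predictions is k-nearest neighbours, a classifier chosen here only for illustration (the text does not name an algorithm). The records, the label encoding (1 = cancerous, 0 = non-cancerous), and the choice `k=3` are hypothetical:

```python
import math
from collections import Counter

# Hypothetical labeled records: (tumor size in cm, age in years) -> 1 (cancerous) or 0 (non-cancerous)
training = [
    ((1.0, 30), 0), ((1.5, 35), 0), ((2.0, 40), 0),
    ((4.5, 60), 1), ((5.0, 65), 1), ((5.5, 70), 1),
]

def knn_predict(point, data, k=3):
    """Label `point` by majority vote among its k nearest training examples."""
    by_distance = sorted(data, key=lambda item: math.dist(point, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((4.8, 62), training))  # 1 (cancerous)
print(knn_predict((1.2, 33), training))  # 0 (non-cancerous)
```

The features are left unscaled for brevity, so age (in years) dominates the distance; in practice features would usually be standardized first.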


## Classification: Applications

Also known as pattern recognition.

- Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style.
- Character recognition: Different handwriting styles.
- Speech recognition: Temporal dependency. 
- Medical diagnosis: From symptoms to illnesses.
- Biometrics: Recognition/ authentication using physical and/or behavioral characteristics: Face, iris, signature, etc.
- Outlier/novelty detection: Finding instances that do not obey the general rule and are exceptions.


## Classification: Use Cases

- **Manufacturing:** Predictive maintenance and condition monitoring
- **Retail:** Upselling and cross-channel marketing
- **Healthcare and life sciences:** Disease identification and risk stratification
- **Travel and hospitality:** Dynamic pricing
- **Financial services:** Risk analytics and regulation
- **Energy:** Energy demand and supply optimization 


# Machine learning model

### A statistical representation of a real-world process based on data.
- **`Training data:`** existing data to learn from
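- **`Training a model:`** building a model from the training data; depending on the model and the amount of data, this can take anywhere from a fraction of a second to weeks

The data must be split into **`train`** and **`test`** sets, so that the model is evaluated on data it did not see during training.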

### In `Supervised Learning`

- The training data is **`labeled`** 

# `ML` Model Steps:

- Extract features: Choosing features and manipulating the dataset 
- Split dataset: Train and test dataset 
- Train model: Input train dataset into a machine learning model 
- Evaluate: If the desired performance isn’t reached, tune the model and repeat the training step


For example, for a clothing shop that wants to predict which products will be bought:

1) Extract the features for each product in the shop, including brand, number of times viewed, cost, colour, and clothing type.

2) Split the dataset into 2/3 and 1/3 for the train and test datasets respectively.

3) Train the model using the train dataset and a logistic regression model.

4) Evaluate the percentage of products in the test dataset that were correctly predicted as bought or not bought.
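The four steps above can be sketched end to end in plain Python. This is a sketch under loud assumptions: the product records and the rule that decides whether a product was bought are invented, only the numeric features (times viewed, cost) are used, and logistic regression is trained with plain batch gradient descent rather than any particular library:

```python
import math
import random

rng = random.Random(42)

# Hypothetical product records; the "bought" rule is made up for illustration.
products = []
for _ in range(300):
    views = rng.randint(0, 100)
    cost = rng.randint(10, 100)
    products.append({"views": views, "cost": cost, "bought": 1 if views > cost else 0})

# Step 1: extract features (numeric fields only, scaled to roughly [0, 1]).
def extract_features(product):
    return [product["views"] / 100, product["cost"] / 100]

# Step 2: shuffle, then split 2/3 train and 1/3 test.
rng.shuffle(products)
cut = len(products) * 2 // 3
train, test = products[:cut], products[cut:]

# Step 3: train logistic regression with batch gradient descent on the log loss.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_logistic(rows, lr=0.5, epochs=1000):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for row in rows:
            x = extract_features(row)
            error = sigmoid(weights[0] * x[0] + weights[1] * x[1] + bias) - row["bought"]
            grad_w[0] += error * x[0]
            grad_w[1] += error * x[1]
            grad_b += error
        n = len(rows)
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

weights, bias = train_logistic(train)

# Step 4: evaluate the fraction of test products whose label is predicted correctly.
def predict(row):
    x = extract_features(row)
    return 1 if sigmoid(weights[0] * x[0] + weights[1] * x[1] + bias) >= 0.5 else 0

accuracy = sum(predict(row) == row["bought"] for row in test) / len(test)
print(f"Test accuracy: {accuracy:.2%}")
```

Categorical features such as brand, colour, and clothing type would first need to be encoded (for example one-hot) before a logistic regression could use them; that step is omitted here.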


# Regression vs Classification

- **`Regression:`** the label is a continuous variable, e.g., price prediction


- **`Classification:`** the label is a discrete variable, e.g., predicting the type of residence


# Unsupervised Learning

In reality, data doesn't always come with labels, and labeling it requires manual labor.

### Labels are unknown 

- **No labels:** model is unsupervised and finds its own patterns

### Useful for:

- Anomaly Detection
- Clustering (dividing data into groups)
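As a small illustration of unsupervised anomaly detection, a z-score rule flags values that lie far from the mean without using any labels. The amounts, the helper name, and the threshold of 2 standard deviations are assumptions for this sketch:

```python
import statistics

def zscore_anomalies(values, threshold=2.0):
    """Return values more than `threshold` population standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:  # all values identical: nothing stands out
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical daily transaction amounts with one unusual spike (no labels needed).
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 250]
print(zscore_anomalies(amounts))  # [250]
```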


In unsupervised learning, the dataset used is not labeled. An easy way to visualize unlabeled data is to consider the dataset containing the waist size and leg length of a group of people:

- Waist Size, Leg Length (**`Clustering`**) 

- Using **`unsupervised learning`**, your job is to try to find a pattern in the dataset. You may plot the dataset in a chart.

- You can then use some **`clustering algorithms`** to find the patterns in the dataset. The end result might be the discovery of three distinct clusters in the data.
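A minimal sketch of one common clustering algorithm, k-means (Lloyd's algorithm), on hypothetical waist-size and leg-length measurements. The data, `k=3`, and the deterministic farthest-point initialisation are assumptions; library implementations commonly use randomized k-means++ initialisation instead:

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's k-means: alternate assigning points to the nearest centroid and recomputing centroids."""
    # Farthest-point initialisation: start from the first point, then repeatedly
    # add the point that is farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its members; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled measurements: (waist size cm, leg length cm)
people = [
    (60, 70), (62, 72), (61, 71),
    (80, 85), (82, 84), (81, 86),
    (100, 78), (102, 80), (101, 79),
]
centroids, clusters = kmeans(people, k=3)
for centroid, members in zip(centroids, clusters):
    print(centroid, len(members))
```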


# Tuning of `ML` Models

- Every machine learning model has **`parameters`** that must be tuned properly to ensure optimal learning
- For example, in a simple linear regression two **`parameters`** must be tuned properly.
- That is, when fitting a line to a scatter of data, they are the slope and intercept of the **`linear model`**.
- These two **`parameters`** are tuned by defining a cost function (or loss function), such as the mean squared error, and choosing the values that minimize it.
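To make the cost function concrete, here is a sketch that assumes mean squared error as the cost and plain gradient descent as the tuning procedure; it recovers the slope and intercept of a line from made-up points generated by y = 2x + 1. The learning rate and step count are assumed values:

```python
def mse(slope, intercept, xs, ys):
    """Mean squared error cost of the line y = slope * x + intercept."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, lr=0.01, steps=5000):
    """Tune slope and intercept by repeatedly stepping against the MSE gradient."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to each parameter.
        d_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        d_intercept = 2 / n * sum(errors)
        slope -= lr * d_slope
        intercept -= lr * d_intercept
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # made-up points on y = 2x + 1
slope, intercept = gradient_descent(xs, ys)
print(round(slope, 3), round(intercept, 3), round(mse(slope, intercept, xs, ys), 6))
```

For a straight line the minimum can also be found in closed form (ordinary least squares); gradient descent is shown because the same idea carries over to models without a closed-form solution.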


# Reinforcement Learning

**`Reinforcement learning (RL)`** is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize the notion of cumulative reward. 

Reinforcement learning is one of **`three basic machine learning paradigms`**, alongside **`supervised learning`** and **`unsupervised learning`**.

- Learning a policy: A sequence of outputs
- No supervised output but delayed reward
- Credit assignment problem
- Game playing
- Robot in a maze
- Multiple agents, partial observability, ...
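To illustrate delayed reward and credit assignment, here is a sketch of tabular Q-learning (one classic RL algorithm, chosen for illustration) for a hypothetical robot in a five-cell corridor, a one-dimensional stand-in for a maze. The reward is given only on reaching the exit, and the learning rate `alpha`, discount `gamma`, and exploration rate `epsilon` are assumed values:

```python
import random

# Hypothetical corridor: states 0..4, the robot starts at 0 and the exit is state 4.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or right
GOAL = N_STATES - 1

def step(state, action):
    """Environment: move, stay inside the corridor, reward 1 only on reaching the exit."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def choose_action(q, state, rng, epsilon):
    """Epsilon-greedy: usually exploit current estimates (ties broken randomly), sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, rng, epsilon)
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate towards reward + discounted value of the best next action.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # learned move in each non-exit state
```

Because the only reward arrives at the exit, the discount factor `gamma` is what propagates credit back to the earlier moves.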


# 3 types of `ML`: 

### **`Supervised Learning, Unsupervised Learning and Reinforcement Learning`**


# Deep Learning

**`Deep learning`** is a **subset** of machine learning in artificial intelligence whose networks are capable of `learning unsupervised` from data that is unstructured or unlabeled, although deep networks are also widely trained on labeled data. It is also known as **`deep neural learning`** or **`deep neural network`**.


- Neural Networks -> Basic unit: neurons (nodes)
- Special area of Machine Learning
- Requires more data
- Works best when the inputs are images or text
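A sketch of the basic unit and how units stack into layers, assuming sigmoid activations and hand-picked weights. In a real network the weights are learned from data, which is not shown here, and the helper names are hypothetical:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))

def dense_layer(inputs, layer_weights, layer_biases):
    """A layer is several neurons reading the same inputs; deep networks stack many layers."""
    return [neuron(inputs, w, b) for w, b in zip(layer_weights, layer_biases)]

# Hypothetical weights for a tiny 2-input -> 3-hidden -> 1-output network.
hidden = dense_layer([0.5, -1.0], [[0.1, 0.4], [-0.3, 0.2], [0.8, -0.5]], [0.0, 0.1, -0.2])
output = dense_layer(hidden, [[0.3, -0.6, 0.9]], [0.05])
print(output)
```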


# Cross-Validation

To estimate generalization error, we need data unseen during training. We split the data as follows:

- Training set (50%)
- Validation set (25%)
- Test (publication) set (25%)
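A sketch of the 50/25/25 split above, assuming the rows can be shuffled freely (no time ordering to preserve); the function name and seed are hypothetical:

```python
import random

def train_val_test_split(data, seed=0):
    """Shuffle, then split into 50% training, 25% validation and 25% test."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) // 2
    n_val = len(rows) // 4
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 50 25 25
```

The validation set is used to compare and tune models; the test set is touched only once, for the final estimate of generalization error.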


**`Cross-validation`** is a **resampling** procedure used to evaluate machine learning models **on a limited data** sample.

**`Cross-validation`** is used to assess the predictive performance of models and to judge how they perform outside the sample, on a new dataset also known as test data.

That is, to use a **limited sample** in order to estimate how the model is expected to perform in general when used to make predictions on data not used during the training of the model.
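A sketch of k-fold cross-validation, one common resampling scheme: the data is shuffled into `k` folds, each fold is held out once as test data while a model is fitted on the rest, and the held-out errors are averaged. The two models (a mean baseline and a least-squares line), the data, and all helper names are hypothetical:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them round-robin into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, k=5):
    """Mean squared error averaged over k folds; every point is held out exactly once."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)

# Hypothetical model 1: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)

def predict_mean(model, x):
    return model

# Hypothetical model 2: least-squares line y = slope * x + intercept.
def fit_line(train_x, train_y):
    n = len(train_x)
    mean_x, mean_y = sum(train_x) / n, sum(train_y) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / sum(
        (x - mean_x) ** 2 for x in train_x
    )
    return slope, mean_y - slope * mean_x

def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept

xs = list(range(20))
ys = [2 * x + 1 for x in xs]
print("mean baseline CV error:", cross_validate(xs, ys, fit_mean, predict_mean))
print("line model CV error:", cross_validate(xs, ys, fit_line, predict_line))
```

Every data point is used for both training and testing, just never in the same round, which is what lets a limited sample yield an estimate of out-of-sample performance.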

