import utils
import warnings

utils.set_css_style('style.css')
warnings.filterwarnings('ignore')

# 1. AI vs Machine Learning vs Deep Learning

## AI is…

Simply put, AI is anything capable of mimicking human behavior. From the simplest applications (say, a talking doll or an automated telemarketing call) to more robust algorithms such as deep neural networks, they’re all trying to mimic human behavior.

Today, AI is a term being applied broadly in the technology world to describe solutions that can learn on their own. These algorithms are capable of looking at vast amounts of data and finding trends in it, trends that unveil insights, insights that would be extremely hard for a human to find. However, AI algorithms can’t think like you and me. They are trained to perform very specialized tasks, whereas the human brain is a pretty generic thinking system.

## Machine learning

Now we know that anything capable of mimicking human behavior is called AI. If we narrow this down to the algorithms that can “think” and provide an answer or decision, we’re talking about a subset of AI called “machine learning.” Machine learning algorithms apply statistical methods to identify patterns in past data, often records of human behavior, and make decisions based on them. They’re good at prediction: whether someone will default on a requested loan, what your next online purchase will be (so that several products can be offered as a bundle), or whether a transaction is fraudulent. Their predictions generally improve as they acquire more data. However, even as they get better at predicting, they only explore data through the feature extraction we programmed; that is, they only look at data in the way we told them to. They don’t adapt on their own to look at data in a different way.

## Deep learning

Going a step narrower, we can look at the class of algorithms that can learn on their own — the “deep learning” algorithms. Deep learning essentially means that, when exposed to different situations or patterns of data, these algorithms adapt. That’s right, they can adapt on their own, uncovering features in data that we never specifically programmed them to find, and therefore we say they learn on their own. This behavior is what people are often describing when they talk about AI these days.

<div class="item">
    <img src="figures/ai_ml_dl.png" alt="regression-ai_ml_dl" width="600px"/>
</div>

# 2. What is Machine Learning?

Two definitions of Machine Learning are offered. Arthur Samuel described it as: "the field of study that gives computers the ability to learn without being explicitly programmed." This is an older, informal definition.

Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Example: playing checkers.

- E = the experience of playing many games of checkers.
- T = the task of playing checkers.
- P = the probability that the program will win the next game.

In general, any machine learning problem can be assigned to one of two broad classifications: supervised learning and unsupervised learning.



# 3. Three components of machine learning

The only goal of machine learning is to predict results based on incoming data. That's it. All ML tasks can be represented this way, or it's not an ML problem from the beginning.

The greater the variety in the samples you have, the easier it is to find relevant patterns and predict the result. Therefore, we need three components to teach the machine:

### Data 

Want to detect spam? Get samples of spam messages. Want to forecast stocks? Find the price history. Want to find out user preferences? Parse their activities on Facebook. The more diverse the data, the better the result. Tens of thousands of rows is the bare minimum for the desperate ones.

It's extremely tough to assemble a good collection of data (usually called a dataset). Datasets are so important that companies may even reveal their algorithms, but rarely their datasets.

### Features 

Also known as parameters or variables. These could be car mileage, a user's gender, a stock price, or word frequency in a text. In other words, these are the factors for a machine to look at.

When data is stored in tables it's simple: features are the column names. But what are they if you have 100 GB of cat pictures? We cannot consider each pixel as a feature. That's why selecting the right features usually takes way longer than all the other ML parts. That's also the main source of errors.
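As a rough illustration of turning raw data into features, here is a minimal sketch that converts a text message into word-frequency counts, one of the feature types mentioned above. The vocabulary and the sample message are made up for the example; a real project would choose them from its own data.

```python
from collections import Counter
import re


def word_frequency_features(text, vocabulary):
    """Turn raw text into a fixed-length list of word counts.

    `vocabulary` is a hypothetical, hand-picked list of words; each
    position in the returned list is one feature.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [counts[word] for word in vocabulary]


# Illustrative vocabulary and message (assumptions, not from a real dataset)
vocabulary = ["free", "win", "meeting", "invoice"]
message = "Win a FREE prize! Reply now to win."

features = word_frequency_features(message, vocabulary)
print(dict(zip(vocabulary, features)))
```

Each message becomes a row of numbers with the same columns, which is the tabular form most algorithms expect.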

### Algorithms 

The most obvious part. Any problem can be solved in different ways. The method you choose affects the precision, performance, and size of the final model. There is one important nuance though: if the data is crappy, even the best algorithm won't help. This is sometimes referred to as "garbage in, garbage out".


# 4. Supervised Learning

Classical machine learning is often divided into two categories – Supervised and Unsupervised Learning.

In the first case, the machine has a "supervisor" or a "teacher" who gives the machine all the answers, like whether it's a cat in the picture or a dog. The teacher has already divided (labeled) the data into cats and dogs, and the machine is using these examples to learn. One by one. Dog by cat.

Unsupervised learning means the machine is left on its own with a pile of animal photos and a task to find out who's who. Data is not labeled, there's no teacher, the machine is trying to find any patterns on its own. We'll talk about these methods below.

Clearly, the machine will learn faster with a teacher, so supervised learning is more commonly used in real-life tasks.

Supervised learning problems are categorized into "regression" and "classification" problems. In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

## Classification

<img src="figures/classification.jpg" alt="classification" align="right" width="250px"> 

Today this is used for:
    
- Spam filtering
- Language detection
- Search for similar documents
- Sentiment analysis
- Recognition of handwritten characters and numbers
- Fraud detection

Popular algorithms: Naive Bayes, Decision Tree, Logistic Regression, K-Nearest Neighbours, Support Vector Machine
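To make one of these algorithms concrete, here is a minimal sketch of K-Nearest Neighbours written with only the standard library. The two features (number of links and number of exclamation marks in a message) and the toy data are assumptions chosen for illustration; this is not a tuned or production implementation.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points (Euclidean distance)."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: (number of links, number of exclamation marks)
train_points = [(0, 0), (1, 0), (0, 1), (5, 6), (6, 5), (7, 7)]
train_labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(train_points, train_labels, (6, 6)))
print(knn_predict(train_points, train_labels, (1, 1)))
```

The labels supplied with the training points play the role of the "teacher": the algorithm never decides on its own what spam looks like.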


## Regression

<img src="figures/regression.jpg" alt="regression" align="right" width="250px"> 

Today this is used for:

- Stock price forecasts
- Demand and sales volume analysis
- Medical diagnosis
- Any number-time correlations

Popular algorithms are Linear and Polynomial regressions.


## Examples:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem. We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.
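The house example can be sketched in a few lines. The code fits a straight line to made-up sizes and prices (regression), then reuses the prediction to produce a discrete label relative to an asking price (classification). The data, the `asking_price` value, and the choice to derive the label from the regression output are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house size in square metres, price in thousands
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]
slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Regression framing: the output is a continuous number."""
    return slope * size + intercept


def sells_above_asking(size, asking_price):
    """Classification framing: the output is one of two discrete labels."""
    if predict_price(size) > asking_price:
        return "above asking"
    return "at or below asking"


print(predict_price(100))
print(sells_above_asking(100, asking_price=280))
```

The same input (house size) serves both problems; only the type of output changes.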

<div class="item">
    <img src="figures/regression-vs-classification.jpeg" alt="regression-classification" width="500px"/>
    <span class="caption">Regression vs Classification (<a href="https://medium.com/@ali_88273/regression-vs-classification-87c224350d69">Source</a>) </span>
</div>

Given a picture of a person, we have to predict their age from the picture. Age is a continuous output, so this is a regression problem.

Given a patient with a tumor, we have to predict whether the tumor is malignant or benign. The output is one of two discrete categories, so this is a classification problem.


To recap, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the "outcome variable") and one or more independent variables (often called "predictors", "covariates", or "features").



<div class="item">
    <img src="figures/regression-vs-classification-2.png" alt="regression-classification" width="600px"/>
    <span class="caption">Regression Vs Classification (<a href="https://medium.com/@ghsusan7/what-is-data-science-it-depends-ea6f4ce5659b">Source</a>) </span>
</div>



# 5. Unsupervised Learning

Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.

These methods are called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.

We can derive this structure by clustering the data based on relationships among the variables in the data. With unsupervised learning there is no feedback based on the prediction results.

<div class="item">
    <img src="figures/supervised_vs_unsupervised.png" alt="supervised_vs_unsupervised" width="600px"/>
    <span class="caption">Supervised Vs Unsupervised (<a href="https://medium.com/@ghsusan7/what-is-data-science-it-depends-ea6f4ce5659b">Source</a>) </span>
</div>

##  Clustering

<img src="figures/clustering.jpg" alt="clustering" align="right" width="250px"> 

Clustering, like regression, describes the class of problem and the class of methods.

Clustering methods are typically organized by modeling approach, such as centroid-based and hierarchical. All methods are concerned with using the inherent structures in the data to best organize the data into groups of maximum commonality.

The most popular clustering algorithms are:

- k-Means
- k-Medians
- Expectation Maximisation (EM)
- Hierarchical Clustering

### Example:

Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
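As a sketch of how a centroid-based method groups unlabeled data, here is a plain k-means loop using only the standard library. The 2-D points and the starting centroids are assumptions chosen so the result is easy to follow; real implementations usually pick starting centroids at random and stop once assignments no longer change.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(
                    sum(coords) / len(cluster) for coords in zip(*cluster)
                )
    # `clusters` holds the assignment made in the last iteration
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, initial_centroids=[(0, 0), (10, 10)])

print(centroids)
print(clusters)
```

No labels are passed in anywhere: the groups come purely from the distances between points.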

## Dimensionality Reduction

<img src="figures/dimensional-reduction.jpg" alt="dimensional-reduction" align="right" width="250px"> 

Like clustering methods, dimensionality reduction methods seek and exploit the inherent structure in the data in an unsupervised manner, but in this case in order to summarize or describe the data using less information.

This can be useful for visualizing high-dimensional data or for simplifying data that is then passed to a supervised learning method. Many of these methods can be adapted for use in classification and regression.

The most popular algorithms are:

- Principal Component Analysis (PCA)
- Singular Value Decomposition (SVD)
- Latent Dirichlet Allocation (LDA)
- Latent Semantic Analysis (LSA, pLSA, GLSA)
- t-SNE (for visualization)
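To give a feel for what PCA does, here is a minimal sketch that finds only the first principal component, using power iteration on the covariance matrix, and then describes each 2-D point with a single number. The data is made up, and power iteration is just one simple way to obtain the component; library implementations typically use SVD instead.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return the column means and the direction of greatest variance,
    found by power iteration on the covariance matrix of the centred data."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Assumes the start vector is not orthogonal to the principal direction
    vector = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[a][b] * vector[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    return means, vector


def project(rows, means, vector):
    """Reduce each row to a single number: its coordinate along `vector`."""
    return [
        sum((row[j] - means[j]) * vector[j] for j in range(len(vector)))
        for row in rows
    ]


# Hypothetical 2-D data lying close to the line y = x
rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
means, direction = first_principal_component(rows)
projections = project(rows, means, direction)

print(direction)
print(projections)
```

Because the points lie almost on a line, one coordinate along that line preserves nearly all of the variation in the original two columns.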

## Association

<img src="figures/association.jpg" alt="association" align="right" width="250px"> 

Association rule learning methods extract rules that best explain observed relationships between variables in data.

These rules can discover important and commercially useful associations in large multidimensional datasets that can be exploited by an organization.

Nowadays it is used:

- To forecast sales and discounts
- To analyze goods bought together
- To place the products on the shelves
- To analyze web surfing patterns

The most popular association rule learning algorithms are:

- Apriori algorithm
- Eclat algorithm
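The two quantities these algorithms are built around, support and confidence, can be sketched for item pairs with the standard library. This is a simplified illustration on made-up shopping baskets, not a full Apriori or Eclat implementation: it only considers pairs and counts them by brute force.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Find item pairs bought together often, with rule confidence.

    support(A, B)      = share of transactions containing both A and B
    confidence(A -> B) = support(A, B) / support(A)
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical shopping baskets
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
]

rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence, such as "milk -> butter" in this toy data, is the kind of association that could inform which goods are placed together.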
