## AI vs ML vs DL
### Artificial Intelligence (AI)
A branch of computer science dealing with the simulation of intelligent behavior in computers. In layman's terms, **AI** is a decision-making algorithm that mimics the human mind in order to solve complex problems.

![ai](imgs/ai.png)

### Machine Learning (ML)

**Machine Learning** is the field of study that gives computers the ability to learn without being explicitly programmed. It is the study and construction of programs that are not explicitly programmed, but instead learn patterns as they are exposed to more data over time. The goal is for the machine to learn from data on a given task and to improve its performance on that task through experience.

A computer program is said to learn from **experience E** with respect to some class of **tasks T** and **performance measure P**, if its performance at tasks in T, as measured by P, improves with experience E. Consider, for example, playing checkers, where:
* **E** = the experience of playing many games of checkers.
* **T** = the task of playing checkers.
* **P** = the probability that the program will win the next game.
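
The same E/T/P framing can be sketched on a much simpler toy task than checkers. In this hypothetical setup (the data, the `sample` helper, and the midpoint learner are all assumptions made for illustration), **T** is classifying a number in $[0, 1)$ as class 0 or 1, **E** is the number of labeled training examples, and **P** is accuracy on held-out data:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n_per_class):
    """Draw n points per class: class 0 lies in [0, 0.5), class 1 in [0.5, 1)."""
    x0 = rng.uniform(0.0, 0.5, n_per_class)
    x1 = rng.uniform(0.5, 1.0, n_per_class)
    x = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])
    return x, y

def learn_threshold(x, y):
    """Toy learner: put the decision threshold midway between the class means."""
    return (x[y == 0].mean() + x[y == 1].mean()) / 2.0

# Held-out data used to measure the performance P
x_test, y_test = sample(5000)

# E = number of training examples per class, P = held-out accuracy
accuracy_by_experience = {}
for n in (1, 10, 100, 1000):
    x_train, y_train = sample(n)
    threshold = learn_threshold(x_train, y_train)
    predictions = (x_test >= threshold).astype(int)
    accuracy_by_experience[n] = float((predictions == y_test).mean())

print(accuracy_by_experience)
```

Accuracy tends to rise as the number of training examples grows, although a single random draw is not guaranteed to improve at every step.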

In **supervised learning**, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output. Supervised learning problems are categorized into *regression* problems (predicting a continuous value) and *classification* problems (predicting a discrete category).
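
As a minimal sketch of the two problem types, assuming scikit-learn is available and using small made-up arrays (not data from these notes):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: the known outputs are continuous values (here exactly y = 2x).
X_reg = np.array([[1.0], [2.0], [3.0], [4.0]])
y_reg = np.array([2.0, 4.0, 6.0, 8.0])
reg = LinearRegression().fit(X_reg, y_reg)
reg_pred = reg.predict(np.array([[5.0]]))

# Classification: the known outputs are discrete labels (0 or 1).
X_clf = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y_clf = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X_clf, y_clf)
clf_pred = clf.predict(np.array([[0.5], [9.5]]))

print(reg_pred, clf_pred)
```

In both cases the model is fitted on inputs paired with known correct outputs, which is what makes the problem supervised.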

**Unsupervised learning** allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables. We can derive this structure by *clustering* the data based on relationships among the variables in the data. With unsupervised learning there is no feedback based on the prediction results.
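
A minimal clustering sketch, assuming scikit-learn's `KMeans` and a tiny made-up data set with two visually obvious groups. Note that no labels are passed to `fit`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Six unlabeled points forming two well-separated groups (hypothetical data)
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
])

# Ask for two clusters; the algorithm only sees X, never a target
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

print(labels)
```

The cluster ids themselves (0 or 1) are arbitrary; only the grouping is meaningful.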

### Deep Learning (DL)
Deep learning is a subset of machine learning in which multilayered neural networks learn from vast amounts of data.


### Differentiating Artificial Intelligence from Machine Learning
|AI|ML|
|---|---|
|The aim is to increase the chance of success, not accuracy.|The aim is to increase accuracy, without regard to success.|
|AI seeks the optimal solution.|ML settles on a solution, whether or not it is optimal.|
|AI leads to intelligence or wisdom.|ML leads to knowledge.|
|AI deals with the broader problem of automating a system. This automation can draw on any field, such as image processing, cognitive science, neural networks, or machine learning.|ML deals with enabling a machine to learn from its external environment. This environment can include sensors, electronic components, external storage devices, and many other devices.|
|AI deals with making machines, systems, and other devices smart by enabling them to think and perform tasks as humans generally do.|What ML does depends on the user input or query: the system checks whether the answer is already in its knowledge base. If it is, the system returns the result for that query; if it is not, the machine learns from the user input and extends its knowledge base to give a better answer to the end user.|

Examples of **Artificial Intelligence** include:
* Voice assistants, such as Siri
* Recommendation systems, such as Netflix
* Self-driving cars
* Drones that fly over fields and capture footage used to optimize crop yield
* Google Search
* Surfacing algorithms, such as those employed by Twitter and Facebook, that decide what content to show you in your feed

Examples of **Machine Learning** include:
* Predicting whether a given credit card transaction is fraudulent or not, given transaction details
* Predicting whether an email is spam or not, given the email sender, subject, and body
* Predicting the diagnosis of a particular piece of medical imaging
* Predicting the present and future location of pedestrians, cars, and other stationary/moving objects in a video feed (such as those used by self-driving cars)

### Machine Learning Workflow
The machine learning workflow consists of:

* **Problem statement**: What problem are we trying to solve?
* **Data collection**: What data do we need to solve it?
* **Data exploration and preprocessing**: How should we clean our data so our model can use it?
* **Modeling**: What model should we build to solve our problem?
* **Validation**: Did we solve the problem?
* **Decision Making and Deployment**: Should we communicate the results to stakeholders or put the model into production?
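
The steps above can be sketched end to end. This is only an illustration under stated assumptions: it uses the iris data set bundled with scikit-learn as a stand-in problem, and a scaled logistic regression as one arbitrary choice of model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Problem statement and data collection:
# predict the iris species from four flower measurements.
X, y = load_iris(return_X_y=True)

# Data exploration and preprocessing:
# hold out a test set; feature scaling is done inside the pipeline below.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Modeling: fit the scaler and the classifier on the training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Validation: measure accuracy on data the model has not seen.
test_accuracy = model.score(X_test, y_test)

# Decision making and deployment: the fitted pipeline can score new rows.
new_prediction = model.predict(X_test[:1])

print(test_accuracy, new_prediction)
```

Wrapping the scaler and the classifier in one pipeline keeps preprocessing and modeling together, so the same transformations are applied at training and prediction time.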

### Common Data Terminology for Machine Learning
* **target**: category or value we are trying to predict
* **observation**: an example or single data point within the data (usually a point or row in dataset)
* **features**: explanatory variables used for prediction for each observation (usually a column)
* **label**: the value of the target for a single data point (output variable being predicted)
* **algorithms**: computer programs that estimate models based on available data
* **model**: hypothesized relationship between the features and the target
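
These terms map naturally onto a table of data. The sketch below uses a small hypothetical housing table (the column names and values are invented for illustration) to show where each term lives in a pandas `DataFrame`:

```python
import pandas as pd

# Hypothetical data: each row is one observation (one house)
data = pd.DataFrame({
    "sqft": [850, 1200, 1500],
    "bedrooms": [2, 3, 3],
    "price": [150000, 210000, 260000],
})

features = data[["sqft", "bedrooms"]]  # explanatory columns
target = data["price"]                 # the column we want to predict

observation = features.iloc[0]         # a single row of feature values
label = target.iloc[0]                 # the target value for that row

print(observation.to_dict(), label)
```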

### Transforming Data

Models used in machine learning workflows often make assumptions about the data. For example, a linear regression model assumes a linear relationship between the features and the target (outcome) variable. An example of a linear model relating the (feature) variables $x_1$ and $x_2$ to the target (label) variable $y$ is:

$$y_\beta(x)=\beta_0+\beta_1x_1+\beta_2x_2$$

where $\beta=(\beta_0,\beta_1,\beta_2)$ represents the model's parameters.

Predictions from linear regression models assume that the **residuals are normally distributed**. However, the raw data (the features) and the target are often **skewed** (asymmetric, with a long tail on one side), so we transform the data.

**Log transformations** can be a useful way to find a linear relationship when the underlying raw data may not actually have a linear relationship. 
```python
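# Useful transformation functions
from numpy import log, log1p    # log1p(x) computes log(1 + x), so it is defined at x = 0
from scipy.stats import boxcox  # Box-Cox power transform (requires strictly positive data)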
```
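
As a rough sketch of how these functions reduce skew, the example below draws synthetic right-skewed data (a log-normal sample, which is an assumption made purely for illustration) and compares the sample skewness before and after each transformation:

```python
import numpy as np
from scipy.stats import boxcox, skew

rng = np.random.default_rng(0)

# Synthetic right-skewed, strictly positive data
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

x_log = np.log(x)         # natural log; requires x > 0
x_log1p = np.log1p(x)     # log(1 + x); also works when x contains zeros
x_boxcox, lam = boxcox(x)  # Box-Cox also returns the fitted lambda

skew_raw = skew(x)
skew_log = skew(x_log)
skew_log1p = skew(x_log1p)
skew_boxcox = skew(x_boxcox)

print(skew_raw, skew_log, skew_log1p, skew_boxcox, lam)
```

A skewness close to 0 indicates a roughly symmetric distribution, so smaller absolute values after the transformation mean the skew has been reduced.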

The resulting algorithm is still a linear regression, since the outcome is still a linear combination of the (transformed) features: $y_\beta(x)=\beta_0+\beta_1\log(x_1)$. The linear regression now involves a linear combination of our new features, one of which is $\log(x_1)$ rather than $x_1$ itself.
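
A minimal sketch of this idea, assuming scikit-learn and noise-free made-up data generated from $y = 1 + 2\log(x_1)$: an ordinary linear regression fitted on the log-transformed feature recovers both parameters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data with an exact logarithmic relationship
x1 = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 1.0 + 2.0 * np.log(x1)

# Transform the feature, then fit a plain linear regression on it
X_log = np.log(x1).reshape(-1, 1)
model = LinearRegression().fit(X_log, y)

beta_0 = model.intercept_
beta_1 = model.coef_[0]

print(beta_0, beta_1)
```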

Similarly, we can estimate higher-order relationships by adding polynomial features and fitting a linear model:

$$y_\beta(x)=\beta_0+\beta_1x+\beta_2x^2$$

Again, we are changing our features but keeping a linear model: the model remains linear in the parameters $\beta$, with the feature $x$ expanded into polynomial terms (here $x$ and $x^2$; a cubic term $x^3$ could be added in the same way).
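
One way to sketch this, assuming scikit-learn's `PolynomialFeatures` and noise-free made-up data generated from $y = 1 + 2x + 3x^2$:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Made-up data with an exact quadratic relationship
x = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 1.0 + 2.0 * x[:, 0] + 3.0 * x[:, 0] ** 2

# Expand the single feature x into the two features [x, x^2]
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(x)

# The model is still an ordinary linear regression on the new features
model = LinearRegression().fit(X_poly, y)

print(model.intercept_, model.coef_)
```

Setting `include_bias=False` leaves the constant term to `LinearRegression`, which fits its own intercept.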

### Variable Selection
This involves choosing the set of features to include in the model. Variables must often be transformed before they can be included in models. In addition to log and polynomial transformations, this can involve:
* **Encoding**: converting non-numeric features to numeric features
* **Scaling**: converting the scale of numeric data so they are comparable.

The appropriate method of encoding and scaling depends on the type of feature. **Feature Encoding** is often applied to categorical features, of which there are two primary types:
* Nominal: categorical variables that take values in unordered categories (e.g., red, blue, green, True, False)
* Ordinal: categorical variables that take values in ordered categories (e.g., low, medium, high)

There are several common approaches to encoding variables:
* **Binary Encoding**: converts a variable to either 0 or 1 and is suitable for variables that take two possible values (e.g., True, False)
* **One-hot Encoding**: converts a variable that takes multiple values into binary (0, 1) variables, one for each category. This creates several new variables.
* **Ordinal Encoding**: converts ordered categories to numerical values, usually by creating one variable that takes a distinct integer value for each category (e.g., 0, 1, 2, 3, ...).
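
The three approaches can be sketched on a small hypothetical table (the column names and values are invented for illustration), using pandas and scikit-learn:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data with one binary, one nominal, and one ordinal column
df = pd.DataFrame({
    "is_member": [True, False, True],
    "color": ["red", "blue", "green"],
    "size": ["low", "high", "medium"],
})

# Binary encoding: True/False becomes 1/0
binary = df["is_member"].astype(int)

# One-hot encoding: one new 0/1 column per category
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Ordinal encoding: pass the category order explicitly so that
# low < medium < high maps to 0 < 1 < 2
encoder = OrdinalEncoder(categories=[["low", "medium", "high"]])
ordinal = encoder.fit_transform(df[["size"]]).ravel()

print(binary.tolist())
print(one_hot)
print(ordinal)
```

Passing `categories` explicitly matters for ordinal data: without it, the encoder orders the categories alphabetically, which would not preserve the intended ranking here.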

**Feature Scaling** involves adjusting a variable's scale. This allows comparison of variables with different scales, as different continuous (numeric) features often have different scales. Some common approaches to scaling features are:
* **Standard Scaling**: rescales features to have zero mean and unit variance (by subtracting the mean and dividing by the standard deviation)
* **Min-max Scaling**: converts variables to continuous variables in the [0, 1] interval by mapping minimum values to 0 and maximum values to 1. This type of scaling is sensitive to outliers.
* **Robust Scaling**: is similar to min-max scaling, but is based on the interquartile range instead of the minimum and maximum: the interquartile range is mapped to an interval of unit width, so the variable itself can take values outside that interval. This makes it less sensitive to outliers.
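
A short sketch comparing the three scalers on a made-up feature with one outlier, assuming scikit-learn's default settings (in particular, `RobustScaler` by default subtracts the median and divides by the interquartile range):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# One made-up feature; the value 100.0 is a deliberate outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

x_std = StandardScaler().fit_transform(x)     # zero mean, unit variance
x_minmax = MinMaxScaler().fit_transform(x)    # minimum -> 0, maximum -> 1
x_robust = RobustScaler().fit_transform(x)    # (x - median) / IQR

print(x_std.ravel())
print(x_minmax.ravel())
print(x_robust.ravel())
```

With min-max scaling the outlier squeezes the four ordinary values into a narrow band near 0, while robust scaling keeps them evenly spread and leaves only the outlier far from the rest.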

```python
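# Scalers for numeric features
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Functions for encoding categorical variables
# (LabelEncoder and LabelBinarizer are intended for target labels; OneHotEncoder is for features)
from sklearn.preprocessing import LabelEncoder, LabelBinarizer, OneHotEncoder
from pandas import get_dummies  # one-hot encoding for DataFrames and Series

# DictVectorizer turns feature dicts into arrays (one-hot encoding string values);
# OrdinalEncoder maps ordered categories to integers
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import OrdinalEncoder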
```

### Describe the differences between statistical tools that are used for prediction and machine learning tools that are used for classification. Examples are required.

* First, statistical models generally try to explain the world in terms of relationships based on causality. In contrast, machine learning only tries to mimic the world rather than explain it.
* The statistical modeling approach has a fully specified model of the world that only needs to be estimated, and relations between observables are deduced from it. In the machine learning approach, relations between observables are instead induced from the data, because there is no a priori model of the world and the focus is on predictive power.
* Furthermore, statistical models typically deal with small data, with up to hundreds of attributes and up to thousands of examples. Machine learning, on the other hand, sometimes deals with data that has hundreds of thousands of attributes and hundreds of millions of examples.
* Because of these differences, scalability is normally not a major concern for statistical models and approaches, but it sometimes becomes critical in machine learning applications.
* Finally, statistical modeling is based on a probabilistic approach, whereas some machine learning methods, including support vector machines, neural nets, and some clustering methods, are non-probabilistic.
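
The last point can be illustrated with a small sketch, assuming scikit-learn and made-up one-dimensional data. Logistic regression (a probabilistic model) returns class probabilities, while a support vector machine returns signed scores relative to its separating boundary:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Made-up, clearly separable data
X = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 1, 1, 1])
X_new = np.array([[1.5], [9.0]])

# Probabilistic: each row holds P(class 0) and P(class 1), summing to 1
log_reg = LogisticRegression().fit(X, y)
probabilities = log_reg.predict_proba(X_new)

# Non-probabilistic: signed scores, negative for class 0 and positive for class 1
svm = SVC(kernel="linear").fit(X, y)
margins = svm.decision_function(X_new)

print(probabilities)
print(margins)
```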

![ml_vs_stats.png](imgs/ml_vs_stats.png)

**Regression** is one of the most commonly used machine learning techniques. Predicting stock returns, markets, portfolios, credit losses, and many other problems amount to different sorts of regression. The other major class of supervised learning algorithms, **classification**, is also widely used, for example in loan default models, credit rating prediction, credit card fraud detection, and anti-money laundering. In unsupervised learning, the applications of **clustering** are more or less obvious: problems such as segmentation of stocks, credit card holders, or institutional clients are all classical cases for clustering methods. Representation learning methods such as factor models or **Principal Component Analysis** are also machine learning models. Other examples of machine learning approaches include regime change detection and methods for imputing missing data.
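
As a rough sketch of representation learning with Principal Component Analysis, the example below builds synthetic data in which three observed variables are driven by one hidden factor plus a little noise (the factor, loadings, and noise level are all assumptions made for illustration), then checks how much variance the first component captures:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# One hidden factor drives three observed variables (synthetic data)
factor = rng.normal(size=200)
loadings = np.array([1.0, 0.8, -0.5])
noise = 0.05 * rng.normal(size=(200, 3))
X = factor[:, None] * loadings + noise  # shape (200, 3)

# Reduce the three observed variables to two principal components
pca = PCA(n_components=2).fit(X)
explained = pca.explained_variance_ratio_
X_reduced = pca.transform(X)

print(explained)
```

Because the data was generated from a single factor, almost all of the variance ends up in the first component.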