## What are Machine Learning and Supervised Learning?
### =====================================================================================

## Machine Learning
- Machine learning is teaching computers to learn from data so that they can make decisions or predictions without being explicitly programmed for each task.
- Instead of executing a fixed sequence of instructions, the computer learns to `identify patterns` in the data.

### Scikit-Learn
- Scikit-Learn, also known as `sklearn`, is a general-purpose machine learning library for Python.
- Its versatility makes it a strong default choice for implementing machine learning solutions.
- It is well suited to beginners, as it offers a `high-level interface` for many tasks.
- This lets us practice the entire machine learning workflow and understand the big picture.
### -----------------------------------------------------------------------------------------------------------------------------------------------------

## Supervised learning
- Supervised Learning is the machine learning task of learning a function that `maps an input` to an `output` based on sample `input-output` pairs.
- It infers a function from labeled training data consisting of a set of training examples.
- Each example is a pair consisting of an input object and a desired output value.
- A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used to map new examples to outputs.

### Linear Regression & Logistic Regression 
There are two popular types of basic machine learning algorithms in Supervised Learning: 
- `Linear Regression` 
- `Logistic Regression`
### -----------------------------------------------------------------------------------------------------------------------------------------------------

### What is Linear Regression?
**Regression predicts `CONTINUOUS` value outputs while Classification predicts `DISCRETE` outputs.** 
- Predicting the price of a car in dollars is a regression problem.
- Predicting whether a received email is valid or spam is a classification problem.

#### Linear Regression Theory

- The term `"Linearity"` in algebra refers to a linear relationship between two or more variables.
- If we draw this relationship between two variables, we get a straight line.
- The `regression technique` finds the `linear` relationship between `x (input) and y (output)`, hence the name Linear Regression.
- Its task is to predict the value of a dependent variable (y) from a given independent variable (x).

**If we plot `independent variable (x)` on x-axis and `dependent variable (y)` on y-axis,<br>
linear regression gives us a straight line that best fits the data points.**

Such a straight line is plotted below:

![linreg1.png](attachment:linreg1.png)

### Math behind Linear Regression:
If y denotes the dependent variable that we want to predict, and x denotes the independent variable used to predict y,<br> 
the mathematical relationship between them can be written as the equation below, which is a straight line. 

![ymxc.png](attachment:ymxc.png)

Where `c` is the intercept and `m` is the slope of the line. 

So the linear regression algorithm gives us the optimal values for the intercept and the slope (in two dimensions).<br> 
The y and x values remain the same, since they come from the data and cannot be changed.<br> 
The values that we can control are the intercept (c) and the slope (m).<br> 
Many different straight lines are possible, depending on the values of the intercept and slope.<br> 

`Conceptually, the linear regression algorithm compares the many possible lines through the data points and returns the line that results in the least error`.

When we have `n independent variables`, the equation can be written as

![ymxcn.png](attachment:ymxcn.png)
 
Here `c` denotes the y-intercept (the point where the line cuts the y-axis) and `m` denotes the slope for the independent variable x. 

![slope.png](attachment:slope.png)


## y = mx + c

- Slope m = 2/3 
- y Intercept c = 1
- We read `m` off the graph as the slope (2/3) and `c` as 1 (the line cuts the y-axis at 1).

The linear equation representing this line is:

## y = (2/3)x + 1

The graph above shows a line whose equation is `y = (2/3)x + 1`. 

Once we have an equation relating the dependent variable to the independent variables,<br> 
we can predict the dependent variable by substituting values for the independent variables.
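
As a small illustration of this substitution, the sketch below plugs a few x values into `y = (2/3)x + 1`. The function name `predict` and the sample inputs are chosen here for illustration only.

```python
# Line from the graph above: slope m = 2/3, intercept c = 1.
M = 2 / 3
C = 1.0


def predict(x, m=M, c=C):
    """Return y = m*x + c for a single input value x."""
    return m * x + c


# Substitute a few example inputs (illustrative values only).
for x in (0, 3, 6):
    print(f"x = {x} -> y = {predict(x):.2f}")
```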

Our objective is to find the values of `m` and `c` that minimize the difference between the actual and predicted values: yₐ (actual) and yᵢ (predicted).

Once we have the best values of these two parameters, we have the line of best fit, which can be used to predict values of y from values of x.

To minimize the difference between yₐ and yᵢ, we use the Least Square method.

### Least Square Method
- The Least Square method `helps in finding the line of best fit`.
- The values of m (slope) and c (intercept) are chosen so that the<br> `sum of squared differences between yₐ (actual) and yᵢ (predicted) is minimized`.
![lsm.png](attachment:lsm.png)
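
For a single input variable, the minimizing `m` and `c` have a well-known closed form: the slope is the sum of products of the deviations of x and y from their means, divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below is one way to write that in plain Python. The helper names and the sample points are made up for illustration, and scikit-learn's `LinearRegression` would normally be used instead.

```python
def fit_line(xs, ys):
    """Return (m, c) that minimize the sum of squared errors for y = m*x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations of x.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    m = s_xy / s_xx
    c = y_mean - m * x_mean
    return m, c


def sum_squared_error(xs, ys, m, c):
    """Sum of squared differences between actual y and predicted m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Made-up sample points scattered around y = (2/3)x + 1.
xs = [0, 3, 6, 9]
ys = [1.2, 2.9, 5.1, 6.8]

m, c = fit_line(xs, ys)
print(f"m = {m:.3f}, c = {c:.3f}")
print(f"squared error of fitted line: {sum_squared_error(xs, ys, m, c):.4f}")
print(f"squared error of y = x:       {sum_squared_error(xs, ys, 1.0, 0.0):.4f}")
```

Any other choice of slope and intercept, such as the `y = x` line in the last print, gives a squared error at least as large as the fitted line's.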


### Variance
- In linear regression, variance is a `measure of how far the observed values differ from the average of the predicted values`.
- It is the `difference from the mean of the predicted values`; the goal is for the model to leave as little of this variance unexplained as possible.
- How much of it the model explains is quantified by the `R2 score`.

### R2 Score
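- The R2 score usually lies between 0 and 100%; it can be negative when a model predicts worse than simply using the mean of the observed values.
- If the value is `100%`, the model explains all of the variation in y: in the one-variable case, the `two variables are perfectly correlated`, with no unexplained variance at all.
- A `low value` shows a `low level of correlation`, meaning the regression model fits the data poorly.
- The R2 score is closely related to the MSE: it compares the model's squared errors with those of always predicting the mean.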
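
A common way to compute the score is one minus the ratio of the model's squared error to the squared error of always predicting the mean. The sketch below assumes that definition. The function name `r2` and the sample values are illustrative, and scikit-learn provides its own `r2_score` in `sklearn.metrics`.

```python
def r2(y_actual, y_pred):
    """R2 = 1 - (squared error of the model) / (squared error of the mean)."""
    mean_actual = sum(y_actual) / len(y_actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_actual, y_pred))
    ss_tot = sum((a - mean_actual) ** 2 for a in y_actual)
    return 1 - ss_res / ss_tot


y_actual = [1, 3, 5]  # made-up observed values

print(r2(y_actual, [1, 3, 5]))  # predictions match the observations exactly
print(r2(y_actual, [3, 3, 3]))  # always predicting the mean of the observations
print(r2(y_actual, [5, 3, 1]))  # predictions worse than predicting the mean
```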

### Mean Square Error(MSE)
- Mean Square Error (MSE) is the average of the squared errors.
- The `larger the number, the larger the error`.

- Error refers to the difference between the observed values y1, y2, y3 and the corresponding predicted values pred(y1), pred(y2), pred(y3).
- The differences are squared so that negative and positive values do not cancel each other out.
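
The sketch below computes the MSE for a few made-up values and contrasts it with a plain average of the errors, to show why squaring matters. The function names are illustrative, and scikit-learn offers `mean_squared_error` in `sklearn.metrics`.

```python
def mean_error(y_actual, y_pred):
    """Plain average of the errors; positive and negative errors can cancel."""
    return sum(a - p for a, p in zip(y_actual, y_pred)) / len(y_actual)


def mse(y_actual, y_pred):
    """Mean Square Error: average of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(y_actual, y_pred)) / len(y_actual)


# Made-up values: one prediction is too high by 1, the other too low by 1.
y_actual = [1, 3]
y_pred = [2, 2]

print("mean error:", mean_error(y_actual, y_pred))  # the two errors cancel
print("MSE:", mse(y_actual, y_pred))                # squaring keeps both errors
```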

### ---------------------------------------------------------------------------------------------------------------------------------------------------------------

### What is Logistic Regression?

Logistic regression is an algorithm for `classification problems`: identifying which of a set of categories a new observation belongs to, based on training data.

Real-life examples of classification problems include:
- Categorize `email as spam or genuine` 
- Categorize `tumor as malignant or benign` 
- Categorize `transaction as fraudulent or genuine` 

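The answers to the problems above are in `categorical form, i.e. Yes or No`, hence they are two-class classification problems.

Logistic regression can be seen as an adaptation of linear regression to a categorical target variable: a linear combination of the inputs is passed through the sigmoid function to produce a probability.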

### Classification

- Classification techniques are an essential part of machine learning and data mining applications. 
- It is often claimed that roughly `70% of problems in Data Science are classification problems`.
- Logistic regression is one of the most common and useful methods for solving binary classification problems.

### Linear Regression Vs. Logistic Regression
- `Linear` regression gives a `continuous output`, but `logistic` regression provides a `discrete output`.
- Examples of continuous output are house prices and stock prices.
- An example of discrete output is predicting whether a patient has cancer or not.

- Linear regression is estimated using `Ordinary Least Squares (OLS)`.
- Logistic regression is estimated using the `Maximum Likelihood Estimation (MLE)` approach.
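
For binary labels, maximizing the likelihood is equivalent to minimizing the average negative log-likelihood, often called log loss. The sketch below only evaluates that quantity for given predicted probabilities. It does not perform the fitting itself, and the function name and sample values are made up for illustration.

```python
import math


def negative_log_likelihood(y_actual, p_pred):
    """Average negative log-likelihood (log loss) for labels that are 0 or 1.

    Each p in p_pred is the predicted probability of class 1 and must lie
    strictly between 0 and 1.
    """
    total = 0.0
    for y, p in zip(y_actual, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_actual)


labels = [1, 0, 1]  # made-up true classes

# Confident, correct probabilities give a lower loss than hesitant ones.
print(negative_log_likelihood(labels, [0.9, 0.1, 0.8]))
print(negative_log_likelihood(labels, [0.6, 0.4, 0.5]))
```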

![linear_vs_logistic_regression.png](attachment:linear_vs_logistic_regression.png)

### Sigmoid Function

The sigmoid function, also called the logistic function, gives an 'S' shaped curve that takes any real-valued number and maps it to a value between 0 and 1. 
As the input goes to positive infinity, the predicted y approaches 1, and as the input goes to negative infinity, the predicted y approaches 0. 
If the output of the sigmoid function is more than 0.5, we can classify the outcome as 1 or YES, and if it is less than 0.5, we can classify it as 0 or NO. 

`For example:` 

If the output is 0.75, we can interpret it as a probability: 

There is a 75 percent chance that the patient has cancer.
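
A minimal sketch of the sigmoid and the 0.5 threshold follows, using the standard formula `1 / (1 + e^(-z))`. The function names are illustrative, and treating an output of exactly 0.5 as class 0 is an assumption made here, since the text does not say how to handle that case.

```python
import math


def sigmoid(z):
    """Map any real number z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form for negative z that avoids overflow in exp().
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(probability, threshold=0.5):
    """Return 1 (YES) above the threshold, otherwise 0 (NO).

    Assumption: a probability exactly equal to the threshold is treated as 0.
    """
    return 1 if probability > threshold else 0


for z in (-5, 0, 5):
    p = sigmoid(z)
    print(f"z = {z:>2} -> sigmoid = {p:.4f} -> class {classify(p)}")
```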

![sigmoid.png](attachment:sigmoid.png)

![sigmoid2.png](attachment:sigmoid2.png)

### Types of Logistic Regression:

`Binary Logistic Regression:`<br>
Target variable has only two possible outcomes such as Spam or Not Spam, Cancer or No Cancer.
    
`Multinomial Logistic Regression:` <br>
Target variable has three or more nominal categories, such as predicting the type of cheese.
    
`Ordinal Logistic Regression:` <br>
Target variable has three or more ordinal categories such as Amazon product rating from 1 to 5.

### ---------------------------------------------------------------------------------------------------------------------------------------------------------------

## What are Overfitting and Underfitting in Machine Learning?

Machine Learning models have one main purpose: to generalize well. 

`Generalization is the model’s ability to give sensible outputs to sets of input that it has never seen before.`

### Overfitting and Underfitting 
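Overfitting and Underfitting refer to deficiencies that a model's performance might suffer from. An overfit model follows the training data too closely, including its noise, and performs poorly on new data; an underfit model is too simple to capture the underlying pattern, so it performs poorly even on the training data.<br><br>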
**A model that generalizes well is a model that is neither underfit nor overfit.**
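
The contrast can be sketched on a tiny made-up dataset by comparing the error on training data with the error on unseen test data. The three toy models below are assumptions chosen for illustration: a constant predictor (underfit), a model that memorizes the nearest training point (overfit), and a least-squares line.

```python
def mse(y_actual, y_pred):
    """Mean Square Error: average of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(y_actual, y_pred)) / len(y_actual)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + c."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    m = s_xy / s_xx
    return m, y_mean - m * x_mean


# Made-up data: the true pattern is y = 2x; training labels carry +1/-1 noise.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]

mean_y = sum(train_y) / len(train_y)
m, c = fit_line(train_x, train_y)


def underfit_model(x):
    """Too simple: ignores x and always predicts the training mean."""
    return mean_y


def overfit_model(x):
    """Memorizes the training set: returns the label of the nearest point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def line_model(x):
    """Straight line fitted by least squares."""
    return m * x + c


results = {}
for name, model in [("underfit", underfit_model),
                    ("overfit", overfit_model),
                    ("line", line_model)]:
    train_err = mse(train_y, [model(x) for x in train_x])
    test_err = mse(test_y, [model(x) for x in test_x])
    results[name] = (train_err, test_err)
    print(f"{name:>8}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

On this data the memorizing model has zero training error but a larger test error than the line, while the constant model has a large error on both sets.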

### Sample Dataset

![model.png](attachment:model.png)

### Overfitting the Machine Learning Model
![overfit.png](attachment:overfit.png)

### Underfitting the Machine Learning Model
![underfit.png](attachment:underfit.png)

### Generalizing the Machine Learning Model
![le.png](attachment:le.png)