# Algorithms in Supervised Learning

Supervised learning is the type of machine learning in which machines are trained on well-"labelled" training data and, based on that data, predict the output. Labelled data means that each input is already tagged with the correct output.

Below we introduce several supervised learning models that are used for classification tasks, regression tasks, or both.

## Linear Regression

**<font color='red'>Formal Definition</font>**<br>
Linear Regression is the basic algorithm for regression tasks. 
Given the data $X$, a matrix of $m$ observations and $n$ features, and the target vector $y$, the algorithm builds a linear model of the following form:<br>
<h3>$ h_\theta = \theta_0 + \theta_1 *x_1 + \theta_2 *x_2 + ... + \theta_n *x_n$</h3>
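As a minimal sketch using NumPy, the hypothesis above can be evaluated for all observations at once by prepending a column of ones, so that $\theta_0$ acts as the intercept. The function name `predict_linear` and the numbers are illustrative only, not taken from the notebook's data.

```python
import numpy as np

def predict_linear(X, theta):
    """Evaluate h_theta(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n for every row of X.

    X     : (m, n) array with m observations and n features
    theta : (n + 1,) array, where theta[0] is the intercept theta_0
    """
    # Prepend a column of ones so that theta_0 is multiplied by 1
    X_b = np.column_stack([np.ones(X.shape[0]), X])
    return X_b @ theta

# Hypothetical example: 3 observations, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
theta = np.array([0.5, 1.0, 2.0])  # theta_0, theta_1, theta_2
print(predict_linear(X, theta))    # [ 5.5 11.5 17.5]
```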

Linear Regression finds the parameters ($\theta_0$ to $\theta_n$) of the model that predicts the target values.<br>
To find the best model parameters, we use a **cost function**.<br> 
The cost function helps us find the values of $\theta_0$ to $\theta_n$ that give the best-fit line for the data points. Since we want the best values for these parameters, we turn this search into a minimization problem: we minimize the error between the predicted values and the actual values (for more information about the cost function, see the end of this summary).
![image.png](attachment:image.png)
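The Cost Function section of this summary is not written yet, so this sketch assumes the common mean-squared-error cost $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$ and minimizes it with batch gradient descent. The learning rate, the number of iterations, and the synthetic data are arbitrary choices for illustration; scikit-learn's `LinearRegression`, used in the code example below, solves the same least-squares problem directly instead of iterating.

```python
import numpy as np

def cost(X_b, y, theta):
    """Mean squared error cost J(theta) = 1/(2m) * sum((h_theta(x) - y)^2)."""
    m = len(y)
    residuals = X_b @ theta - y
    return (residuals @ residuals) / (2 * m)

def gradient_descent(X, y, lr=0.1, n_iters=5000):
    """Minimize J(theta) with batch gradient descent; returns (theta, final cost)."""
    X_b = np.column_stack([np.ones(X.shape[0]), X])  # intercept column for theta_0
    theta = np.zeros(X_b.shape[1])
    m = len(y)
    for _ in range(n_iters):
        gradient = X_b.T @ (X_b @ theta - y) / m  # partial derivatives dJ/dtheta
        theta -= lr * gradient                     # step against the gradient
    return theta, cost(X_b, y, theta)

# Synthetic data generated from y = 1 + 2x, so the best fit is theta = [1, 2]
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 1 + 2 * X[:, 0]
theta, final_cost = gradient_descent(X, y)
print(theta, final_cost)
```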


**<font color='purple'>Alternative Definition</font>**<br>
Linear Regression is a statistical supervised learning technique that predicts a quantitative variable by modelling it as a linear function of one or more independent features.


**Links and sources:**
1. https://medium.com/analytics-vidhya/understanding-the-linear-regression-808c1f6941c0
2. https://towardsdatascience.com/linear-regression-explained-1b36f97b7572
3. https://towardsdatascience.com/introduction-to-machine-learning-algorithms-linear-regression-14c4e325882a

**<font color='Green'>Code Example</font>**<br>
In the code example we use a dataset containing SAT (Scholastic Aptitude Test) scores and GPA values, and we check whether the GPA can be predicted from the SAT score. 

More information on Kaggle: https://www.kaggle.com/kerneler/starter-1-01-simple-linear-regression-1dfef774-1

**Import Packages**

```python
import numpy as np
import pandas as pd
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
```

**Explore the data**

```python
data = pd.read_csv('Simple linear regression.csv')
data.info()
display(data.head(10))  # display() is available inside Jupyter/IPython
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 84 entries, 0 to 83
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   SAT     84 non-null     int64  
 1   GPA     84 non-null     float64
dtypes: float64(1), int64(1)
memory usage: 1.4 KB
```


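|   | SAT  | GPA  |
|---|------|------|
| 0 | 1714 | 2.40 |
| 1 | 1664 | 2.52 |
| 2 | 1760 | 2.54 |
| 3 | 1685 | 2.74 |
| 4 | 1693 | 2.83 |
| 5 | 1670 | 2.91 |
| 6 | 1764 | 3.00 |
| 7 | 1764 | 3.00 |
| 8 | 1792 | 3.01 |
| 9 | 1850 | 3.01 |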


**Create a Linear Regression Model**

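```python
# split the data into the feature (SAT) and the target (GPA)
features = data[['SAT']]
target = data['GPA']

# split the data into a training set and a test set (70% / 30%)
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.3, random_state=42)

# create and train the model
model = LinearRegression()
model.fit(features_train, target_train)

# predict GPA values for the test features
predictions = model.predict(features_test)

# check the mean squared error on the test set
print(mean_squared_error(target_test, predictions))

# plot the test points together with the fitted regression line
fig = px.scatter(x=features_test['SAT'], y=target_test, labels={'x': 'SAT', 'y': 'GPA'})
fig.add_scatter(x=features_test['SAT'], y=predictions, mode='lines', name='prediction')
fig.show()
```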


## K-Nearest Neighbors

K-Nearest Neighbors is a non-linear learning algorithm that is used for both classification and regression tasks. It differs from other learning algorithms in that it does not build a model during training; instead, it keeps all the training data and bases each decision on it. <br>
When we get a new observation, we find its $k$ nearest neighbors, i.e. the $k$ training observations with the most similar feature values.<br>

![image.png](attachment:image.png)
**Source** : https://www.analyticsvidhya.com/blog/2018/08/k-nearest-neighbor-introduction-regression-python/

We calculate the distance between two observations $x$ and $x'$ with $n$ features using the Euclidean metric (there are also other ways to measure distance, such as the Manhattan distance):<br>
<h3>$d(x,x')=\sqrt{(x_1-x'_1)^2 + (x_2-x'_2)^2 + ... + (x_n-x'_n)^2}$</h3><br>
After collecting the $k$ points with the lowest distance from the new observation into a group called $A$, we assign a class or a value to the observation in one of the following ways:<br>
<font color="purple"><b>Classification</b></font> - the observation is assigned the class that appears most frequently among the neighbors in $A$ (a majority vote, i.e. the class with the highest estimated probability). <br>
<font color="purple"><b>Regression</b></font> - the observation gets the value given by the following equation:<br>
     <h3>$\hat{y} = \frac{1}{k} \sum \limits _{j \in A} y^{(j)} $</h3>
     where $y^{(j)}$ is the target value of neighbor $j$ in $A$.<br>
In other words, the prediction equals the mean target value of the neighbors.
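A from-scratch sketch of the procedure above, assuming the data is given as NumPy arrays: it computes the Euclidean distance to every training point, keeps the $k$ closest as $A$, and then takes either a majority vote or the mean. The function names and the toy data are hypothetical. With an odd $k$ and two classes a vote cannot tie; in general `Counter.most_common` breaks ties in favor of the class encountered first, which here is the class of the nearest tied neighbor.

```python
import numpy as np
from collections import Counter

def euclidean(x, x_prime):
    """d(x, x') = sqrt(sum_i (x_i - x'_i)^2)"""
    return np.sqrt(np.sum((x - x_prime) ** 2))

def k_nearest(X_train, x_new, k):
    """Indices of the k training points closest to x_new (the group A), nearest first."""
    distances = np.array([euclidean(x, x_new) for x in X_train])
    return np.argsort(distances)[:k]

def knn_classify(X_train, y_train, x_new, k=3):
    A = k_nearest(X_train, x_new, k)
    # Majority vote: the most frequent class among the k neighbors
    return Counter(y_train[A]).most_common(1)[0][0]

def knn_regress(X_train, y_train, x_new, k=3):
    A = k_nearest(X_train, x_new, k)
    # y_hat = (1/k) * sum of the neighbors' target values
    return y_train[A].mean()

# Hypothetical toy data: 6 observations with 2 features each
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0],
                    [5.0, 7.0], [3.5, 5.0], [4.5, 5.0]])
labels = np.array([0, 0, 1, 1, 1, 1])              # classes for classification
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # targets for regression
x_new = np.array([1.2, 1.4])

print(knn_classify(X_train, labels, x_new))  # 0 (neighbors have classes 0, 0, 1)
print(knn_regress(X_train, values, x_new))   # 2.0 (mean of 1.0, 2.0, 3.0)
```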

**<font color='Green'>Code Example</font>**<br>
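As a hedged sketch, here is scikit-learn's `KNeighborsClassifier` applied to the Iris dataset that ships with scikit-learn. The dataset, $k = 5$, and the 70/30 split are illustrative choices, not taken from the original. For regression tasks, `KNeighborsRegressor` from the same module predicts the mean of the neighbors' target values by default.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# n_neighbors is k; the default metric (Minkowski with p=2) is the Euclidean distance
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)  # "training" essentially stores the data

predictions = knn.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"accuracy: {accuracy:.3f}")
```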

## Logistic Regression

Logistic Regression is used for classification tasks. In its basic form, the algorithm works in two steps:<br> 
First, we compute a linear combination of the features, as in linear regression:<br>
<h3>$ z = \theta_0 + \theta_1 *x_1 + \theta_2 *x_2 + ... + \theta_n *x_n$</h3>

After calculating $z$, we apply a function called the sigmoid to obtain the model $h_\theta$:<br> 
<h3>$h_\theta = g(z)=\frac{1}{1+e^{-z}}$</h3>
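A minimal NumPy sketch of the two steps: the linear combination $z$ followed by the sigmoid $g(z)$. The names `sigmoid` and `h_logistic` are illustrative. Note that $g(0) = 0.5$ and that $g$ approaches 0 and 1 for large negative and large positive $z$.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h_logistic(X, theta):
    """Step 1: z = theta_0 + theta_1*x_1 + ... ; step 2: h_theta = g(z)."""
    X_b = np.column_stack([np.ones(X.shape[0]), X])  # intercept column for theta_0
    z = X_b @ theta
    return sigmoid(z)

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approximately [0.0067 0.5 0.9933]
```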

![image-2.png](attachment:image-2.png)
**source:** https://medium.com/@toprak.mhmt/activation-functions-for-deep-learning-13d8b9b20e

We repeatedly update $\theta_0$ to $\theta_n$ and apply these two equations (for example with gradient descent) until we find parameter values for which the cost is small (using the cost function described later). Once we have found them, we have our model. $\theta_0$ to $\theta_n$ are called weights because they determine how much influence each feature has on the prediction. 
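To make the iterative search concrete, here is a gradient-descent sketch. Because the cost function section is not written yet, it assumes the usual binary cross-entropy (log-loss) cost for logistic regression; the learning rate, the iteration count, and the toy data are arbitrary. With all weights at zero, $h_\theta = 0.5$ for every observation, so the loss starts at $\ln 2$ and then decreases as the weights are updated.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(X_b, y, theta):
    """Binary cross-entropy: -mean(y*log(h) + (1-y)*log(1-h))."""
    h = sigmoid(X_b @ theta)
    eps = 1e-12  # guards against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def fit_logistic(X, y, lr=0.1, n_iters=2000):
    X_b = np.column_stack([np.ones(X.shape[0]), X])  # intercept column for theta_0
    theta = np.zeros(X_b.shape[1])
    history = [log_loss(X_b, y, theta)]
    for _ in range(n_iters):
        h = sigmoid(X_b @ theta)             # linear combination, then sigmoid
        gradient = X_b.T @ (h - y) / len(y)  # gradient of the cross-entropy cost
        theta -= lr * gradient
        history.append(log_loss(X_b, y, theta))
    return theta, history

# Hypothetical 1-D data that is not perfectly separable
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])
theta, history = fit_logistic(X, y)
print(theta, history[0], history[-1])
```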

**Logistic Regression Flow**
![image.png](attachment:image.png)
**source and other information:** https://medium.com/analytics-vidhya/logistic-regression-b35d2801a29c

The model predicts a probability (a value between 0 and 1) from the feature values using the sigmoid function. If the value is close to 1 we classify the observation as 1, and if it is close to 0 we predict 0; a common choice is to predict 1 when the probability is at least 0.5.
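With scikit-learn, `predict_proba` returns the sigmoid output for each class, and thresholding the probability of class 1 at 0.5 reproduces `predict` for a binary problem. The toy data and the 0.5 threshold are assumptions for illustration; note that `LogisticRegression` applies L2 regularization by default.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 1-D training data
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

X_new = np.array([[0.2], [3.8]])
proba = model.predict_proba(X_new)[:, 1]  # P(y = 1 | x), the sigmoid output
labels = (proba >= 0.5).astype(int)       # predict 1 when the probability is at least 0.5
print(proba, labels, model.predict(X_new))
```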

## Ridge Regression

## Lasso Regression

Lasso regression is a type of linear regression that uses **<font color="orange">shrinkage</font>**.<br> 
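Shrinkage means that estimates are pulled (shrunk) toward a central point, such as the mean. In Lasso, the coefficients $\theta_1$ to $\theta_n$ are shrunk toward zero by adding a penalty on their absolute values (an L1 penalty) to the cost function, so some of them can become exactly zero.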
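A hedged sketch with scikit-learn's `Lasso`, where `alpha` controls the strength of the L1 penalty. On synthetic data in which only the first two of five features influence the target (an assumption of this example), Lasso shrinks the coefficients relative to ordinary least squares and sets the irrelevant ones to zero; `alpha=0.1` is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually influence the target
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)  # alpha sets the strength of the L1 penalty

print("OLS:  ", np.round(ols.coef_, 3))
print("Lasso:", np.round(lasso.coef_, 3))
```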

## Elastic Net

## Decision Tree

## Random Forest

# Cost Function

# Model Evaluation 