# List of Common Machine Learning Algorithms

Here is the list of commonly used machine learning algorithms. These algorithms can be applied to almost any data problem:

* Linear Regression
* Logistic Regression
* Decision Tree
* SVM
* Naive Bayes
* kNN
* K-Means
* Random Forest
* Dimensionality Reduction Algorithms
* Gradient Boosting algorithms
* GBM
* XGBoost
* LightGBM
* CatBoost

# Linear Regression

It is used to estimate real values (cost of houses, number of calls, total sales etc.) based on continuous variable(s). Here, we establish relationship between independent and dependent variables by fitting a best line. This best fit line is known as regression line and represented by a linear equation Y= a *X + b.

The best way to understand linear regression is to relive this experience of childhood. Let us say, you ask a child in fifth grade to arrange people in his class by increasing order of weight, without asking them their weights! What do you think the child will do? He / she would likely look (visually analyze) at the height and build of people and arrange them using a combination of these visible parameters. This is linear regression in real life! The child has actually figured out that height and build would be correlated to the weight by a relationship, which looks like the equation above.

In this equation:
* Y – Dependent Variable
* a – Slope
* X – Independent variable
* b – Intercept

![image.png](attachment:image.png)

Linear Regression is of mainly two types: Simple Linear Regression and Multiple Linear Regression. Simple Linear Regression is characterized by one independent variable. And, Multiple Linear Regression(as the name suggests) is characterized by multiple (more than 1) independent variables. While finding best fit line, you can fit a polynomial or curvilinear regression. And these are known as polynomial or curvilinear regression.

In [96]:
#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import linear_model
import pandas as pd
import numpy as np
import math
import sklearn.metrics as metrics

#Load Train and Test datasets
data=pd.read_csv('./data/ozone.csv')
print(data.head())

#drop rows with na
#data = data.dropna(axis=0, how='any') #how='all'

#Data fill NaN with average
for column in data:
    average = data[column].mean()
    median = data[column].median()
    data  = data.fillna(average)


#Shuffle
data = data.sample(frac=1)
data_shape = data.shape
data_rows = data_shape[0]
partition_value = 0.7

train_data = data.head(math.ceil(data_rows*partition_value))
test_data = data.tail(math.trunc(data_rows*(1-partition_value)))
#print(train_data)
#print(test_data)

x_train = train_data.drop(['Ozone'], axis=1)
y_train = train_data['Ozone']

x_test = test_data.drop(['Ozone'], axis=1)
y_test = test_data['Ozone']


linear = linear_model.LinearRegression()




# Train the model using the training sets and check score
linear.fit(x_train, y_train)
linear.score(x_train, y_train)

#Equation coefficient and Intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)

#Predict Output
predicted= linear.predict(x_test)

mean_absolute_error = metrics.mean_absolute_error(y_test.tolist(),predicted)


print(mean_absolute_error)

   Ozone  Solar.R  Wind  Temp  Month  Day
0   41.0    190.0   7.4    67      5    1
1   36.0    118.0   8.0    72      5    2
2   12.0    149.0  12.6    74      5    3
3   18.0    313.0  11.5    62      5    4
4    NaN      NaN  14.3    56      5    5
Coefficient: 
 [ 0.05790517 -3.13945801  1.1513679  -1.22999843  0.12200915]
Intercept: 
 -19.442921359136086
14.126180283269726
