# Machine Learning Algorithms

Examples of machine learning algorithms based on this Medium post: https://towardsdatascience.com/10-machine-learning-algorithms-you-need-to-know-77fb0055fe0

## Imports

In [11]:
from sklearn import linear_model, svm, datasets
from sklearn.neighbors import KNeighborsClassifier

## Data Preparation

In [7]:
# Load the dataset
digits = datasets.load_digits()

## Data Manipulation

### Linear Regression

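This algorithm fits a linear function (a line, or a hyperplane when there are several features) to the training data and predicts a continuous value. Because the digit labels are treated as plain numbers here, the prediction is a real number rather than a class.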

In [8]:
# Create the Linear Regression model
clf = linear_model.LinearRegression()

# Set training set
x, y = digits.data[:-1], digits.target[:-1]

# Train model
clf.fit(x, y)

# Predict
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)

[ 8.86342983]
8


### Support Vector Machine (SVM)

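This algorithm classifies data by finding the boundary (a line in two dimensions, a hyperplane in general) that separates the classes with the largest possible margin. With a kernel such as the RBF kernel that `SVC` uses by default, the boundary can be nonlinear in the original feature space.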

In [10]:
# Create the support vector classifier
clf = svm.SVC(gamma=0.001, C=100)

# Set training set
x, y = digits.data[:-1], digits.target[:-1]

# Train model
clf.fit(x, y)

# Predict
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)

[8]
8


### K-Nearest Neighbors (KNN)

This algorithm classifies an unknown data point by the labels of its k nearest neighbors (the closest data points in the training set), typically by majority vote. Closeness is measured with a distance function such as Euclidean distance.

In [12]:
# Create the KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=6)

# Set training set
x, y = digits.data[:-1], digits.target[:-1]

# Train the model
clf.fit(x, y)

# Predict
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)

[8]
8


### Logistic Regression

This algorithm predicts a discrete outcome by estimating the probability that a data point belongs to each class. It passes a linear combination of the features through the sigmoid (logistic) function, which maps it to a probability between 0 and 1.

In [13]:
# Implement Logistic Regression
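
The cell above is still a placeholder. A minimal sketch of what it could contain follows, using the same hold-out-the-last-sample setup as the earlier cells. The feature scaling step and `max_iter=1000` are assumptions added here to help the solver converge; they are not taken from the original post.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the dataset
digits = datasets.load_digits()

# Train on every sample except the last one
x, y = digits.data[:-1], digits.target[:-1]

# Scale the features, then fit the logistic regression classifier
# (scaling and max_iter=1000 are assumptions, chosen to help convergence)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(x, y)

# Predict the held-out sample
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)
```

Unlike Linear Regression, the prediction here is one of the ten digit classes rather than a real number.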

### Decision Tree

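This algorithm classifies data by repeatedly splitting it on chosen independent variables (features), so that each split produces groups that are more homogeneous than their parent.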

In [14]:
# Implement Decision Tree
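
The cell above is still a placeholder. One possible way to fill it in is sketched below with scikit-learn's `DecisionTreeClassifier`, using the same setup as the earlier cells. `random_state=0` is an assumption added only to make the result repeatable.

```python
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

# Load the dataset
digits = datasets.load_digits()

# Train on every sample except the last one
x, y = digits.data[:-1], digits.target[:-1]

# Create and train the decision tree (random_state is an assumption for repeatability)
clf = DecisionTreeClassifier(random_state=0)
clf.fit(x, y)

# Predict the held-out sample
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)

# Depth of the learned tree, i.e. the longest chain of splits
print(clf.get_depth())
```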

### K-Means

This algorithm groups data around k centroids. Each data point is assigned to its nearest centroid, and each centroid is then moved to the mean of the points assigned to it. The two steps repeat until the assignments stop changing.

In [15]:
# Implement K-Means
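
The cell above is still a placeholder. A minimal sketch with scikit-learn's `KMeans` follows. Choosing `n_clusters=10` (one cluster per digit) is an assumption, as are `n_init=10` and `random_state=0`. K-Means is unsupervised, so the labels are not used for training, and the cluster index it returns is arbitrary and does not correspond to the digit value.

```python
from sklearn import datasets
from sklearn.cluster import KMeans

# Load the dataset
digits = datasets.load_digits()

# Fit 10 centroids on every sample except the last one (labels are not used)
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
kmeans.fit(digits.data[:-1])

# Assign the held-out sample to its nearest centroid
cluster = kmeans.predict([digits.data[-1]])

print(cluster)

# One centroid per cluster, each with one value per feature
print(kmeans.cluster_centers_.shape)
```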

### Random Forest

This algorithm combines a collection of Decision Trees. Each tree predicts a class, and the class with the most votes becomes the final prediction.

In [16]:
# Implement Random Forest
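
The cell above is still a placeholder. A sketch using scikit-learn's `RandomForestClassifier` is shown below, with the same setup as the earlier cells. The number of trees (`n_estimators=100`) and `random_state=0` are assumptions.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Load the dataset
digits = datasets.load_digits()

# Train on every sample except the last one
x, y = digits.data[:-1], digits.target[:-1]

# Create and train a forest of 100 decision trees (both settings are assumptions)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(x, y)

# Predict the held-out sample
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)

# Number of individual trees in the fitted forest
print(len(clf.estimators_))
```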

### Naive Bayes

This algorithm classifies data using Bayes' theorem and assumes that all features are independent of each other given the class.

In [17]:
# Implement Naive Bayes
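
The cell above is still a placeholder. A minimal sketch follows, using the same setup as the earlier cells. Picking the Gaussian variant (`GaussianNB`), which models each feature with a normal distribution per class, is an assumption; scikit-learn offers other Naive Bayes variants as well.

```python
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

# Load the dataset
digits = datasets.load_digits()

# Train on every sample except the last one
x, y = digits.data[:-1], digits.target[:-1]

# Create and train the Gaussian Naive Bayes classifier
clf = GaussianNB()
clf.fit(x, y)

# Predict the held-out sample
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)
```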

### Dimensionality Reduction Algorithms

These algorithms reduce the number of features used by a model, keeping those that matter most for describing the dependent outcome. Feature importance can be estimated with other algorithms such as Random Forest or Decision Trees.

In [18]:
# Implement Dimensionality Reduction Algorithms
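
The cell above is still a placeholder. The description mentions Random Forest as a way to rank features, so the sketch below uses a forest's feature importances with scikit-learn's `SelectFromModel` to drop the less important pixels. The `threshold="median"` setting (keep features whose importance is at least the median) and the forest settings are assumptions.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Load the dataset
digits = datasets.load_digits()
x, y = digits.data, digits.target

# Rank features with a random forest and keep those whose importance
# is at least the median importance (threshold is an assumption)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
selector = SelectFromModel(forest, threshold="median")
x_reduced = selector.fit_transform(x, y)

# Shape before and after feature selection
print(x.shape)
print(x_reduced.shape)
```

The reduced matrix `x_reduced` can then be passed to any of the classifiers above in place of the full data.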

### Gradient Boosting Algorithms

These algorithms combine many weak learners into a more accurate one, with each new learner trained to correct the errors of those before it. Two popular implementations are XGBoost, which supports both linear and tree-based learners, and LightGBM, which uses tree-based learners.

In [19]:
# Implement Gradient Boosting Algorithms
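
The cell above is still a placeholder. XGBoost and LightGBM are separate packages with their own APIs, so to stay within the library already imported in this notebook, the sketch below swaps in scikit-learn's own `GradientBoostingClassifier`. It illustrates the same boosting idea but is not either of the two libraries named above. `n_estimators=50` and `random_state=0` are assumptions.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier

# Load the dataset
digits = datasets.load_digits()

# Train on every sample except the last one
x, y = digits.data[:-1], digits.target[:-1]

# Create and train the boosted ensemble of shallow trees
# (stand-in for XGBoost/LightGBM; settings are assumptions)
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(x, y)

# Predict the held-out sample
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)
```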