# Explanation
*The purpose of this file is to document what I've learnt about the inner workings of support vector machines (SVMs): what they are, how they work, how to optimize them, the different functions involved, and so on. The aim is to gain a deep, confident grasp of SVMs. This research connects to the Titanic model I'm building.*

# Table of Contents:
- [Classification VS Regression](#ClassVSReg)
- [What is a Support Vector Machine?](#svm-intro)
- [SVM Basic Code Explanation](#svm-intro-code)

<a name="ClassVSReg"></a>
# Classification VS Regression

*Classification and regression are both types of supervised learning used in machine learning, where the goal is to train a model on labeled data to make predictions. The key difference lies in the type of output they produce: classification predicts discrete categories or class labels (such as 'spam' or 'not spam'), while regression predicts continuous numerical values (such as house prices or temperatures). Despite this difference, both approaches follow the same underlying process - learning a mapping from input features to an output - and use similar algorithms that are adapted to suit either categorical or continuous prediction tasks.*

<p align="center">
  <img src="classification.png" alt="Classification" width="300"/>
  <img src="regression.png" alt="Regression" width="300"/>
</p>

### Classification:
Example:
- A list of students in a class can be categorised by gender.
- A dataset of images of hand-drawn digits can be classified by the digit (0-9) each image shows.

Common Usage:
- Medical diagnostics
- Identifying spam vs non-spam
- Identifying whether a file is malicious
- Image recognition
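To make "discrete categories" concrete, here is a minimal sketch of the hand-drawn digits example, assuming scikit-learn is installed (it is not otherwise used in this file). It uses scikit-learn's bundled `load_digits` dataset (8x8 images of hand-written digits) and an `SVC` classifier with default settings; the 25% test split and `random_state=0` are arbitrary choices.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 8x8 images of hand-drawn digits, each flattened to 64 pixel features
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = SVC()  # default settings (RBF kernel)
clf.fit(X_train, y_train)

predictions = clf.predict(X_test[:5])
print(predictions)  # discrete class labels, each one of 0-9

accuracy = clf.score(X_test, y_test)  # fraction of test images labelled correctly
print(f"accuracy: {accuracy:.3f}")
```

Every prediction is one of the ten class labels, never a value in between.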

### Regression:
Example:
- Predicting the price of a house based on its features (size, location, number of rooms).
- Estimating a person's weight based on height and age.

Common Usage:
- Forecasting stock prices or sales revenue
- Predicting temperature or rainfall levels
- Estimating delivery time or traffic flow
- Modeling population growth over time
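For contrast, a sketch of regression with an SVM, again assuming scikit-learn. `SVR` is scikit-learn's support vector regressor; the toy data (`y = 2x + 1`) is made up for illustration, and `C=100`, `epsilon=0.1` are illustrative settings rather than tuned values.

```python
import numpy as np
from sklearn.svm import SVR

# Made-up toy data: ten points lying exactly on y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)  # one feature per sample
y = 2 * X.ravel() + 1

# Linear kernel; errors smaller than epsilon are ignored during fitting
reg = SVR(kernel="linear", C=100, epsilon=0.1)
reg.fit(X, y)

pred = reg.predict([[4.5]])
print(pred)  # a continuous value near 10.0
```

Unlike the classifier, the output is a continuous number rather than a class label.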


<a name="svm-intro"></a>
# What is a Support Vector Machine?

*A Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression tasks. In this project, the focus will be on **classification**, not regression.*

### How It Works:

SVM works by representing each data point as a point in a multi-dimensional space, where each feature is one axis (or dimension).  
- For example: a dataset with two features like **height** and **weight** can be visualized in a 2D space.  
- A dataset with three features would be 3D, and so on.
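In code, this usually means a 2D array where each row is one data point and each column is one feature, so the number of columns is the dimension of the feature space. A minimal NumPy sketch; the height/weight values and the extra `ages` column are made up for illustration.

```python
import numpy as np

# Made-up data: each row is one person, each column one feature
# Columns: height (cm), weight (kg) -> 2 features, so a 2D feature space
X = np.array([
    [170.0, 65.0],
    [182.0, 80.0],
    [158.0, 52.0],
])
n_samples, n_features = X.shape
print(n_samples, n_features)  # 3 2

# Adding a third feature (hypothetical ages) moves every point into 3D space
ages = np.array([[25.0], [40.0], [31.0]])
X3 = np.hstack([X, ages])
print(X3.shape)  # (3, 3)
```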

The SVM algorithm then attempts to find the **best hyperplane** that separates the different classes of data. A hyperplane is a decision boundary:
- In **2D**, it’s a line (1D hyperplane).
- In **3D**, it’s a plane (2D hyperplane).
- In general, the hyperplane is always **one dimension less** than the feature space.
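A hyperplane in an n-dimensional feature space can be written as `w · x + b = 0`, where `w` is a weight vector (one weight per feature) and `b` is a bias. A linear SVM labels a point by which side of the hyperplane it falls on, i.e. by the sign of `w · x + b`. The sketch below uses hand-picked `w` and `b` (hypothetical, not learned from data); in 2D this hyperplane is the line x1 = x2, a 1D hyperplane.

```python
import numpy as np

# Hypothetical hyperplane parameters for a 2D feature space (not learned)
w = np.array([1.0, -1.0])
b = 0.0

def decision(x):
    """Signed score w.x + b: zero on the hyperplane, sign gives the side."""
    return float(np.dot(w, x) + b)

def predict(x):
    # +1 on one side of the hyperplane, -1 on the other
    # (points exactly on the hyperplane are assigned +1 here by convention)
    return 1 if decision(x) >= 0 else -1

print(predict([3.0, 1.0]))  # score 2.0  -> 1
print(predict([1.0, 3.0]))  # score -2.0 -> -1
```

Training an SVM amounts to choosing `w` and `b`; the question of where to put the hyperplane is the question of which `w` and `b` to pick.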

The key question is: **where should the hyperplane be placed?**

SVM answers this by choosing the hyperplane that **maximizes the margin**: the distance between the hyperplane and the **nearest data points** from each class (called **support vectors**). Maximizing this gap leaves as much room as possible on either side of the boundary, which tends to improve generalization to new data. In the figure below, the margin is the distance between the hyperplane and the dotted line.
<p align="center">
  <img src="margin.png" alt="margin" width="300"/>
</p>
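Assuming scikit-learn, a linear `SVC` exposes the learned hyperplane through `coef_` (the weights `w`) and `intercept_` (the bias `b`), and the support vectors through `support_vectors_`. For a linear SVM, the distance from the hyperplane to the nearest points is 1/‖w‖, so the full gap between the two dotted lines is 2/‖w‖. The four toy points are made up so that the best boundary is the vertical line x = 1, and a very large `C` is used to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, perfectly separable toy data: class 0 at x = 0, class 1 at x = 2
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A very large C heavily penalises violations, approximating a hard margin
clf = SVC(kernel="linear", C=1000)
clf.fit(X, y)

w = clf.coef_[0]       # normal vector of the hyperplane
b = clf.intercept_[0]  # bias term
norm_w = np.linalg.norm(w)

distance_to_nearest = 1 / norm_w  # hyperplane to one dotted line
full_width = 2 / norm_w           # between the two dotted lines

print(clf.support_vectors_)            # the points that define the margin
print(distance_to_nearest, full_width)  # roughly 1.0 and 2.0
```

The numbers should match the geometry: each class sits one unit away from the line x = 1.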

### Types of Margins
*Hard Margin: No misclassifications allowed; assumes data is perfectly separable*

*Soft Margin: Allows some misclassification; better for real-world noisy data*
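In scikit-learn's `SVC` (again an assumption, as this file does not name a library for SVMs), the trade-off between the two is controlled by the parameter `C`: a large `C` heavily penalises margin violations and behaves close to a hard margin, while a small `C` gives a softer, wider margin. The made-up data below places one class-0 point very close to class 1, so the two settings should produce very different margin widths.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up data: the third class-0 point sits close to the class-1 points
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.8, 0.5], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 0, 1, 1])

def margin_width(C):
    """Fit a linear SVC with the given C and return the full margin 2/||w||."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    return 2 / np.linalg.norm(clf.coef_[0])

hard_like = margin_width(1000)  # large C: squeezes the margin to avoid violations
soft = margin_width(0.1)        # small C: wider margin, tolerates violations
print(hard_like, soft)
```

The hard-like margin is squeezed (to about 0.2 here) by the awkward point, while the soft margin can be much wider because that point is allowed to fall inside it.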


<a name="svm-intro-code"></a>
# SVM Basic Code Explanation:

The first step in building any AI model is loading and exploring the data. This is easy to do with Python libraries such as pandas, Matplotlib, NumPy and seaborn.

Pandas is a fast and powerful data analysis tool, and my preferred way to load data from a CSV file. It can be installed from the console with pip, Python's package installer:

```bash
pip install pandas
```

To import the pandas library and load the data, we can use these lines of code:

```python
import pandas as pd

# read_csv is the pandas function for loading a CSV file into a DataFrame
df = pd.read_csv('titanic/train.csv')
```

We can analyse this DataFrame (df) with pandas methods such as head(), describe() and info().

**head()** shows the first 5 rows of the DataFrame by default; head(n) shows the first n rows.

**describe()** summarises each numeric column with its count, mean, standard deviation (std), minimum (min), 25th percentile (25%), 50th percentile or median (50%), 75th percentile (75%) and maximum (max).

**info()** prints each column's index (#), label (Column), number of non-null values (Non-Null Count) and data type (Dtype). It also reports the number of rows and columns.

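```python
# In a notebook only the last expression in a cell is displayed,
# so print() is needed to show head() and describe() as well.
print(df.head())
print(df.describe())
df.info()  # info() prints its summary directly and returns None
```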
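Since `titanic/train.csv` may not be at hand, here is a self-contained sketch of the same three methods on a tiny DataFrame. The column names mimic the Titanic data, but every value is made up; one `Age` is missing (`NaN`) to show that `describe()` and `info()` count only non-null values.

```python
import numpy as np
import pandas as pd

# Hypothetical mini DataFrame with Titanic-style column names (values made up)
df_demo = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Sex": ["male", "female", "female", "male"],
})

first_two = df_demo.head(2)   # first 2 rows only
print(first_two)

summary = df_demo.describe()  # numeric columns only by default, so "Sex" is skipped
print(summary)                # the Age count is 3 because the NaN is excluded

df_demo.info()                # prints dtypes and non-null counts itself
```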