# Module 1: Introduction to Scikit-Learn

## Section 4: Unsupervised Learning Algorithms

### Part 1: One-Class SVM

In this part, we will explore One-Class Support Vector Machines (One-Class SVM), a popular algorithm used for outlier detection and novelty detection. One-Class SVM is particularly useful when dealing with imbalanced datasets or detecting anomalies in unlabeled data. Let's dive in!

### 1.1 Understanding One-Class SVM

One-Class SVM is a variant of Support Vector Machines (SVM) that learns a boundary around the majority of the data points, considering them as "inliers," while identifying outliers as data points lying outside the boundary. One-Class SVM is an unsupervised algorithm that does not require any labeled data for training.

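The key idea behind One-Class SVM is to map the data into a high-dimensional feature space through a kernel and find the hyperplane that separates most of the data points from the origin with the largest possible margin. In the original input space, this hyperplane corresponds to a (typically nonlinear) decision boundary that encloses the bulk of the data, and data points falling outside it are considered outliers.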
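
For reference, the standard formulation of this idea (due to Schölkopf et al.; it is not spelled out elsewhere in this module) can be written as follows. Here $\phi$ is the kernel feature map, $n$ the number of training points, $\xi_i$ slack variables, $\rho$ the offset of the hyperplane, and $\nu$ the parameter exposed by Scikit-Learn as `nu`:

$$
\min_{w,\,\xi,\,\rho}\; \frac{1}{2}\lVert w\rVert^2 + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{subject to}\quad
w\cdot\phi(x_i) \ge \rho - \xi_i,\;\; \xi_i \ge 0 .
$$

A new point $x$ is then labeled by $f(x) = \operatorname{sgn}\big(w\cdot\phi(x) - \rho\big)$, which gives $+1$ for inliers and $-1$ for outliers.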

### 1.2 Training and Evaluation

To apply One-Class SVM for novelty detection, we ideally train on a dataset consisting only (or almost only) of normal instances. During training, the algorithm finds the support vectors that define the decision boundary, maximizing the margin between the data and the origin in feature space while allowing a small fraction of training points (controlled by `nu`) to fall outside the boundary.

Once trained, we can use the One-Class SVM model to detect outliers or anomalies in new, unseen data points. The model assigns a score to each data point: its signed distance to the decision boundary. In Scikit-Learn, `decision_function` returns positive scores for inliers and negative scores for outliers, and `predict` returns `+1` for inliers and `-1` for outliers.
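
A minimal sketch of scoring and labeling with Scikit-Learn, assuming synthetic toy data (the names `X_train` and `X_new` and all values are illustrative, not part of the module). In Scikit-Learn, `decision_function` is positive inside the learned boundary and negative outside it, and `predict` reports the sign as `+1` or `-1`:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical toy data: one cluster for training, plus two query points
rng = np.random.default_rng(42)
X_train = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
X_new = np.array([[0.0, 0.0],    # near the center of the training cluster
                  [8.0, 8.0]])   # far away from all training data

model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# Signed score: positive inside the learned boundary, negative outside
scores = model.decision_function(X_new)

# Hard labels: +1 for inliers, -1 for outliers
labels = model.predict(X_new)

for point, score, label in zip(X_new, scores, labels):
    print(point, round(float(score), 3), label)
```

Thresholding `decision_function` at a value other than 0 is one way to make the detector stricter or more lenient than the default `predict`.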

Scikit-Learn provides the `OneClassSVM` class in `sklearn.svm` for performing One-Class SVM. Here's an example of how to use it:

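```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in data (an assumption for illustration):
# X holds normal training points, X_test holds new points to check
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X_test = rng.normal(size=(20, 2))

# Create an instance of the OneClassSVM model
# nu is an upper bound on the fraction of training errors and a
# lower bound on the fraction of support vectors; it must lie in (0, 1]
nu = 0.1
one_class_svm = OneClassSVM(nu=nu)

# Fit the model to the data
one_class_svm.fit(X)

# Predict on new, unseen data points: +1 = inlier, -1 = outlier
y_pred = one_class_svm.predict(X_test)

# Evaluate the model's performance (if applicable)
# - One-Class SVM is an unsupervised technique, and evaluation depends on the specific task and dataset
```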
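
When a labeled test set happens to be available, evaluation can use ordinary classification metrics. The sketch below assumes synthetic data in which the anomalies are deliberately placed far from the normal cluster, so the numbers it prints say nothing about real-world performance; all variable names are illustrative:

```python
import numpy as np
from sklearn.metrics import classification_report, recall_score
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical data: normal points cluster near the origin,
# anomalies are drawn from a region far from that cluster
X_train = 0.5 * rng.normal(size=(200, 2))
X_test_normal = 0.5 * rng.normal(size=(50, 2))
X_test_anomalies = rng.uniform(low=3.0, high=6.0, size=(10, 2))

X_test = np.vstack([X_test_normal, X_test_anomalies])
# Ground truth in the same convention as predict(): +1 inlier, -1 outlier
y_true = np.concatenate([np.ones(50, dtype=int), -np.ones(10, dtype=int)])

# Train on normal data only, then label the mixed test set
model = OneClassSVM(nu=0.1, gamma="scale").fit(X_train)
y_pred = model.predict(X_test)

# Per-class precision and recall; the "outlier" row describes the anomalies
print(classification_report(y_true, y_pred, labels=[-1, 1],
                            target_names=["outlier", "inlier"]))

# Share of true anomalies that the model flagged
outlier_recall = recall_score(y_true, y_pred, pos_label=-1)
print("outlier recall:", outlier_recall)
```

Because outliers are usually rare, precision and recall on the `-1` class tend to be more informative than overall accuracy.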

### 1.3 Choosing Parameters

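One-Class SVM has several important parameters that need to be set appropriately. The `nu` parameter, a value in (0, 1], is an upper bound on the fraction of training points treated as outliers and a lower bound on the fraction of support vectors; it can be set from prior knowledge of the expected outlier rate or tuned on the dataset. The `kernel` parameter (RBF by default) and, for the RBF kernel, `gamma` control the shape of the boundary: larger `gamma` values produce a tighter, more irregular boundary around the training points.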
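
To see how `nu` behaves in practice, the following sketch fits the model with a few values and reports what fraction of the training set ends up labeled as outliers. The data is synthetic and the chosen `nu` values are arbitrary; the flagged fraction should track `nu` only approximately:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))  # hypothetical training data

flagged = {}
for nu in (0.05, 0.2, 0.5):
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X)
    # Fraction of training points the fitted model labels as outliers
    flagged[nu] = float(np.mean(model.predict(X) == -1))
    # Fraction of training points retained as support vectors
    sv_fraction = len(model.support_) / len(X)
    print(f"nu={nu:.2f}  flagged={flagged[nu]:.2f}  support vectors={sv_fraction:.2f}")
```

The same loop can be repeated over `gamma` to observe how the boundary tightens or loosens for a fixed `nu`.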

### 1.4 Handling Imbalanced Datasets

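One-Class SVM is particularly useful when dealing with imbalanced datasets, where the majority class dominates the data and examples of the rare class are too few to train a conventional classifier. Because it models only the normal class, it allows us to focus on detecting the outliers or anomalies, even in the absence of labeled outlier data.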

### 1.5 Applications of One-Class SVM

One-Class SVM has various applications, including:

- Anomaly detection: One-Class SVM can be used to detect outliers or anomalies in unlabeled data.
- Novelty detection: Trained on clean, normal data, One-Class SVM can flag new observations that differ from what it has seen.
- Fraud detection: One-Class SVM can be used to detect fraudulent transactions or activities.

### 1.6 Summary

One-Class Support Vector Machines (One-Class SVM) is a powerful algorithm for outlier detection and novelty detection. It identifies outliers by learning a decision boundary around the majority of the data points. Scikit-Learn provides the `OneClassSVM` class to implement it easily. Understanding the concepts, training, and parameter tuning is crucial for using One-Class SVM effectively in practice.

In the next part, we will explore other algorithms for unsupervised learning.

Feel free to practice implementing One-Class SVM using Scikit-Learn. Experiment with different values of `nu` and `gamma`, decision thresholds, and evaluation techniques to gain a deeper understanding of the algorithm and its performance.