# 6. Unsupervised Learning

### Clustering

![Clustering in machine learning](https://static.javatpoint.com/tutorial/machine-learning/images/clustering-in-machine-learning2.png)

<b>Clustering</b> is a machine learning technique that divides a dataset into groups (also known as clusters) based on patterns within the data. The goal is to split the data so that the points within each group are more similar to one another than to points in other groups. This technique is useful for a variety of purposes, such as data compression, anomaly detection, and summarizing data.

There are many different clustering algorithms available, and the choice of which one to use depends on the characteristics of the data and the goals of the analysis. Some common types of clustering algorithms include:

1. K-Means Clustering: This is one of the most popular clustering algorithms. It partitions the data into K clusters by repeatedly assigning each data point to the nearest centroid (the mean of a cluster) and then recomputing the centroids.

2. Hierarchical Clustering: This type of clustering algorithm builds a hierarchy of clusters, where each cluster is nested within another cluster. There are two main types of hierarchical clustering: Agglomerative and Divisive. Agglomerative clustering starts with each data point as its own cluster, and then merges the closest pairs of clusters until there is only one cluster left. Divisive clustering, on the other hand, starts with all data points in one cluster and then splits the cluster into smaller and smaller clusters.

3. DBSCAN: This stands for Density-Based Spatial Clustering of Applications with Noise. It is a density-based clustering algorithm that works by identifying "dense" clusters of points in the data and marking points that are not part of a dense cluster as noise.

### 6.1 K-Means clustering

<b>K-Means clustering</b> is used to divide a dataset into K clusters based on the similarity of the data points. The goal of the algorithm is to minimize the within-cluster sum of squares, which is the sum of the squared distances between the data points and the centroid (mean) of their cluster.
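
Written out, the objective looks as follows. The notation is introduced here for illustration and is not part of the original text: $C_k$ is the set of points assigned to cluster $k$, $\mu_k$ is the centroid of that cluster, and $x_i$ is a data point.

$$
\mathrm{WCSS} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x_i \in C_k} x_i
$$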

The algorithm starts with a set of K randomly chosen initial centroids, and then alternates between two steps: assigning each data point to its closest centroid, and updating each centroid to the mean of the points assigned to it. This process continues until the centroids stop changing or a predetermined number of iterations is reached.
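
The assign-and-update loop described above can be sketched in plain NumPy. This is an illustrative sketch rather than a production implementation: the function name `kmeans`, the toy data, and the choice of drawing the initial centroids from the data points themselves are assumptions made here.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Basic K-Means on an array of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: find the nearest centroid for every point.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        # A centroid that received no points keeps its previous position.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two small groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

Because the result depends on the initial centroids, it is common to run the algorithm several times with different seeds and keep the run with the lowest within-cluster sum of squares.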

Here is an example of K-Means clustering in Python:
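
A minimal sketch using scikit-learn's `KMeans`. The synthetic dataset from `make_blobs` and the parameter values (`n_clusters=3`, `n_init=10`, `random_state=42`) are illustrative assumptions, not part of the original text.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data drawn around three centers (an assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# n_init=10 runs the algorithm from 10 different initializations
# and keeps the run with the lowest within-cluster sum of squares.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # one centroid per cluster
print(kmeans.inertia_)          # within-cluster sum of squares
print(labels[:10])              # cluster index of the first 10 points
```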

### 6.2 Hierarchical clustering

<b>Hierarchical clustering</b> is used to group data points into a hierarchy of clusters. There are two main types of hierarchical clustering: Agglomerative and Divisive.

* Agglomerative Clustering: This type of hierarchical clustering starts with each data point as its own cluster, and then repeatedly merges the closest pair of clusters until only one cluster is left. How "closest" is measured depends on the linkage criterion, such as single, complete, average, or Ward linkage. The main advantage of agglomerative clustering is that it is relatively simple to implement and understand.

* Divisive Clustering: This type of hierarchical clustering starts with all data points in one cluster and then repeatedly splits clusters into smaller ones until each data point is in its own cluster. Divisive clustering is usually more computationally expensive than agglomerative clustering, because a cluster of n points can be split in two in 2^(n-1) - 1 different ways and practical implementations have to rely on heuristics. It can still be useful when only the first few top-level splits are of interest.

Here is an example of agglomerative clustering in Python using the sklearn library:
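
A minimal sketch using `AgglomerativeClustering`. The synthetic data, the choice of three clusters, and the Ward linkage are assumptions made for illustration.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Synthetic 2-D data drawn around three centers (an assumption for illustration).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

# Ward linkage merges the pair of clusters whose union gives the
# smallest increase in within-cluster variance.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

print(labels[:10])  # cluster index of the first 10 points
```

Setting `n_clusters=3` stops the merging once three clusters remain, which amounts to cutting the hierarchy at that level.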

### 6.3 DBSCAN 

<b>DBSCAN</b> (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that identifies "dense" clusters of points in a dataset and marks points that are not part of a dense cluster as noise. It works by finding core points, which are points that have at least a minimum number of neighbors within a given radius, and expanding each cluster to include every point that can be reached from a core point through a chain of neighboring core points. Points that cannot be reached from any core point are labeled as noise.

One of the main advantages of DBSCAN is that it does not require the user to specify the number of clusters in advance. Instead, the number of clusters follows from the density of the data and from two parameters that the user does have to choose: the neighborhood radius and the minimum number of points required to form a dense region.

Here is an example of DBSCAN in Python using the sklearn library:
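
A minimal sketch using `DBSCAN` on the two-moons toy dataset. The dataset and the parameter values (`eps=0.2`, `min_samples=5`) are illustrative assumptions and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half circles, a shape that K-Means handles poorly.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps is the neighborhood radius; min_samples is the number of points
# required within that radius for a point to count as a core point.
db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)

# scikit-learn marks noise points with the label -1.
n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("clusters:", n_clusters, "noise points:", n_noise)
```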

### 6.4 Dimensionality Reduction

<b>Dimensionality Reduction</b> is a technique used to reduce the number of features (dimensions) in a dataset while retaining as much of the information as possible. It is often used in conjunction with clustering algorithms to make the clustering process more efficient and to improve the interpretability of the results.

There are many different dimensionality reduction techniques available, including Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and t-SNE (t-Distributed Stochastic Neighbor Embedding).

<b>Principal Component Analysis (PCA)</b>:

Principal Component Analysis (PCA) is a popular dimensionality reduction technique that is used to project high-dimensional data onto a lower-dimensional space while retaining as much of the original variance as possible. It does this by finding the directions in which the data varies the most and using these directions as the new axes of the lower-dimensional space.

There are several techniques for performing PCA, including:

1. Standard PCA: This is the most common technique for performing PCA. It involves calculating the covariance matrix of the data, finding the eigenvectors and eigenvalues of the covariance matrix, and then selecting the top k eigenvectors (where k is the number of dimensions in the lower-dimensional space) to form the projection matrix.

2. Kernel PCA: This technique is used when the structure of the data is nonlinear, so that no linear projection captures it well. It uses a kernel function to implicitly map the data into a higher-dimensional feature space in which that structure becomes linear, and performs PCA in that space without ever computing the mapping explicitly (the kernel trick).

3. Incremental PCA: This technique is used when the data is too large to fit in memory, and it allows you to perform PCA in a streaming fashion by processing the data in small batches.
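
The two variants above can be sketched with scikit-learn's `KernelPCA` and `IncrementalPCA`. The concentric-circles dataset, the RBF kernel with `gamma=10`, and the split into four batches are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import IncrementalPCA, KernelPCA

# Two concentric circles: nonlinear structure that a linear projection cannot untangle.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Kernel PCA with an RBF kernel (gamma=10 is an illustrative value).
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

# Incremental PCA: update the model one batch at a time with partial_fit,
# so the full dataset never has to be in memory at once.
ipca = IncrementalPCA(n_components=2)
for batch in np.array_split(X, 4):
    ipca.partial_fit(batch)
X_ipca = ipca.transform(X)

print(X_kpca.shape, X_ipca.shape)
```

In a real out-of-core setting the batches would be read from disk or a stream instead of being sliced from an in-memory array.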

<b>Steps involved in PCA are:</b>

1. Standardize the data: PCA is sensitive to the scaling of the data, so it is important to standardize the data before applying PCA. This can be done by subtracting the mean from each feature and dividing by the standard deviation.

2. Calculate the covariance matrix: The next step is to calculate the covariance matrix of the standardized data. The covariance matrix is a square matrix that gives the covariance between all pairs of features in the data.

3. Calculate the eigenvectors and eigenvalues of the covariance matrix: The eigenvectors give the directions of the new axes (the principal components), and each eigenvalue is the variance of the data along its eigenvector. The eigenvectors with the largest eigenvalues are therefore the directions in which the data varies the most.

4. Sort the eigenvectors by the eigenvalues: The eigenvectors should be sorted in decreasing order of the eigenvalues, as the eigenvectors corresponding to the largest eigenvalues are the ones that capture the most variance in the data.

5. Select the top k eigenvectors: The top k eigenvectors (where k is the number of dimensions in the lower-dimensional space) are used to form the projection matrix.

6. Project the data onto the lower-dimensional space: The projection matrix is used to transform the data onto the lower-dimensional space.
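
The six steps above can be sketched directly in NumPy. The function name `pca` and the random test data are assumptions made for illustration; `np.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

def pca(X, k):
    """Project X of shape (n_samples, n_features) onto its top k principal components."""
    # 1. Standardize each feature to zero mean and unit standard deviation.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the features (columns).
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigen-decomposition of the symmetric covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort the eigenvectors by decreasing eigenvalue.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 5. The top k eigenvectors form the projection matrix.
    W = eigenvectors[:, :k]
    # 6. Project the standardized data onto the lower-dimensional space.
    return X_std @ W, eigenvalues

# Hypothetical data: 200 samples, 5 features, two of them strongly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]

X_reduced, eigenvalues = pca(X, k=2)
print(X_reduced.shape)
print(eigenvalues / eigenvalues.sum())  # share of variance along each component
```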

Here's a simple example of how standard PCA can be implemented in Python using the popular scikit-learn library:
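
A minimal sketch, assuming the Iris dataset bundled with scikit-learn as the input and two output dimensions. Both choices are made here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# X: 150 samples with 4 features, standardized because PCA is sensitive to scale.
X = StandardScaler().fit_transform(load_iris().data)

# Keep the two directions of largest variance.
pca = PCA(n_components=2)
pca.fit(X)
X_transformed = pca.transform(X)

print(X_transformed.shape)            # (n_samples, n_components)
print(pca.explained_variance_ratio_)  # share of variance kept by each component
```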

In the above example, `X` is the high-dimensional data and `X_transformed` is the transformed data in the lower-dimensional space. The `fit` method learns the principal components, which are the eigenvectors of the data's covariance matrix (scikit-learn typically obtains them through a singular value decomposition of the centered data), and the `transform` method projects the data onto the lower-dimensional space using these components.