# Unsupervised Learning

## Introduction to Unsupervised Learning:
Unsupervised learning is a branch of machine learning where the goal is to discover patterns, structures, or relationships in data without any labeled target outputs. Supervised learning trains a model to map inputs to known labels. Unsupervised learning works only with the inputs and tries to uncover the structure inherent in them.

## Clustering: Grouping Similar Data Points:
**Clustering** is a common unsupervised learning technique that groups similar data points together based on their features. The goal is to partition the data into clusters so that points within the same cluster are more similar to each other than to points in other clusters.
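K-Means, the algorithm used below, makes "similar" concrete as "close in Euclidean distance to the same centroid." The following is a minimal sketch of the classic Lloyd iteration, assuming a fixed number of clusters `k`. It alternates an assignment step and a centroid-update step until the centroids stop moving. The function name `kmeans` and the toy array `X` are hypothetical and exist only for illustration. scikit-learn's `KMeans` adds k-means++ initialization and multiple restarts on top of this basic loop.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal (hypothetical) Lloyd's algorithm: alternate assignment and update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of unlabeled 2-D points (made-up data)
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

No labels are passed in. The grouping comes entirely from the geometry of the points.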

In [None]:
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv("data/classification_data.csv")

In [None]:
# Standardize the features (zero mean, unit variance) so no single feature dominates the distance computations
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
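Standardization matters for K-Means because the algorithm compares raw Euclidean distances. A feature measured in the hundreds would otherwise swamp one measured in single digits. The sketch below shows what `StandardScaler` computes: each column minus its mean, divided by its population standard deviation (`ddof=0`). It uses a small made-up matrix `X` rather than the document's CSV.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature matrix: rows are samples, columns are features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Manual standardization: subtract column means, divide by population std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual, scaled))
print(scaled.mean(axis=0), scaled.std(axis=0))
```

After scaling, both columns have mean 0 and standard deviation 1. This gives them equal weight in distance-based methods.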

In [None]:
# Perform K-Means clustering (n_init=10 runs 10 initializations and keeps the best)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
data['cluster'] = kmeans.fit_predict(scaled_data)

In [None]:
# Visualize the clusters in a 2D plot, coloring each point by its cluster label
plt.scatter(data['feature1'], data['feature2'], c=data['cluster'], cmap='rainbow')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('K-Means Clustering')
plt.show()
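K-Means chooses its clusters by minimizing the within-cluster sum of squared distances from each point to its assigned centroid. scikit-learn exposes this value as `inertia_`. The sketch below recomputes that value by hand to make the objective explicit. It uses synthetic data from `make_blobs` as a stand-in for the document's CSV, so the numbers say nothing about that dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with three blobs (not the document's dataset)
X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Inertia: sum of squared distances from each point to the centroid of its cluster
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = np.sum((X - assigned_centers) ** 2)

print(manual_inertia, km.inertia_)
```

A lower inertia means tighter clusters. Inertia always decreases as `n_clusters` grows, so on its own it cannot pick the number of clusters.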

## Dimensionality Reduction: Reducing Features While Retaining Information:
Dimensionality reduction is another core unsupervised learning technique. It reduces the number of features (dimensions) in a dataset while preserving as much of its essential information as possible. For PCA, that information is the variance of the data. This is useful for simplifying complex datasets, speeding up computations, visualizing high-dimensional data in 2D, and reducing the risk of overfitting.
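Principal Component Analysis (PCA), used below, projects the data onto orthogonal directions of maximal variance. A minimal sketch of that idea follows. It centers the data and takes the top right-singular vectors from NumPy's SVD, then checks the result against scikit-learn's `PCA`. The synthetic data (two strongly correlated features plus one independent feature) is made up for illustration. Principal components are defined only up to sign, so the comparison aligns signs first.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, feature 2 is roughly 2x feature 1, feature 3 is independent noise
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               2 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

# PCA by hand: center the data, then take the top right-singular vectors
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt[:2]                  # top-2 principal directions (rows)
projected = X_centered @ components.T

# Compare with scikit-learn, flipping signs so each component points the same way
sk_projected = PCA(n_components=2, svd_solver='full').fit_transform(X)
signs = np.sign(np.sum(projected * sk_projected, axis=0))
print(np.allclose(projected * signs, sk_projected))
```

The two correlated features collapse almost entirely onto the first component. Three features become two with little loss because of this redundancy.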

In [None]:
from sklearn.decomposition import PCA

In [None]:
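# Standardize the original features (drop the 'cluster' label added by K-Means so it is not treated as a feature)
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
data_scaled = scaler.fit_transform(data.drop(columns='cluster', errors='ignore'))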

In [None]:
# Apply PCA for dimensionality reduction, keeping the top 2 principal components
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data_scaled)

In [None]:
# Visualize the data projected onto the first two principal components
plt.scatter(data_pca[:, 0], data_pca[:, 1])
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA Dimensionality Reduction')
plt.show()
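The amount of information a 2D projection retains is commonly measured by the share of total variance each principal component explains. scikit-learn exposes this as `explained_variance_ratio_`. The sketch below shows the check on the Iris dataset bundled with scikit-learn, used here only as a stand-in. The same attribute is available on the `pca` object fitted above.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Iris is a stand-in dataset bundled with scikit-learn (4 numeric features)
X = load_iris().data
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(X_scaled)
ratios = pca.explained_variance_ratio_   # fraction of total variance captured by each component
total = ratios.sum()                      # fraction retained by the 2D projection

print(ratios, total)
```

If `total` is low, the 2D scatter plot may hide structure that lives in the discarded dimensions.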