**DBSCAN**

**Full Form:**
DBSCAN = Density-Based Spatial Clustering of Applications with Noise

**Definition:**
DBSCAN is an unsupervised clustering algorithm that groups points into clusters based on how densely they are packed, and automatically labels points in low-density regions as outliers (noise).

**Core Idea:**
Points that lie close to many other points (dense regions) form clusters, and points that are isolated (sparse regions) are treated as noise. Unlike k-means, the number of clusters does not need to be specified in advance.
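Both ideas show up when you cluster a small synthetic dataset with scikit-learn's DBSCAN without passing a cluster count. The sketch below is illustrative only. It assumes three Gaussian blobs from `make_blobs` plus one hand-placed far-away point. The blob centers, `eps=1.0` and `min_samples=5` are example choices, not values from this document.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Illustrative data: three well-separated blobs plus one isolated point at (20, 20)
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [10, 0]],
                  cluster_std=0.5, random_state=0)
X = np.vstack([X, [[20.0, 20.0]]])

# No number of clusters is passed in, only the density parameters
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

# -1 is the noise label, so it is not counted as a cluster
n_clusters = len(set(labels) - {-1})
print("clusters found:", n_clusters)
print("label of the isolated point:", labels[-1])  # -1 means noise
```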

**Main Parameters:**

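* **eps (ε):** maximum distance between two points for one to count as a neighbor of the other
* **min_samples:** minimum number of points within distance eps of a point (in scikit-learn, the point itself is included) for that point to be a core point, i.e. to lie in a dense region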
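The parameter definitions above can be checked directly. The following sketch uses a hypothetical five-point 1-D dataset. It counts each point's eps-neighborhood with `NearestNeighbors.radius_neighbors`, where each query point finds itself. It then confirms that the points reaching `min_samples` are exactly those DBSCAN reports in `core_sample_indices_`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Hypothetical 1-D data: four evenly spaced points and one far-away point
X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
eps, min_samples = 1.5, 3

# Indices of all samples within eps of each query point (the point itself included)
nn = NearestNeighbors(radius=eps).fit(X)
neighborhoods = nn.radius_neighbors(X, return_distance=False)
counts = np.array([len(nb) for nb in neighborhoods])
print(counts)  # [2 3 3 2 1]

# Points whose neighborhood size reaches min_samples are core points
manual_core = np.flatnonzero(counts >= min_samples)
db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
print(manual_core, db.core_sample_indices_)  # [1 2] [1 2]
```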

**Types of Points:**

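* **Core point:** has at least min_samples points (itself included) within distance eps
* **Border point:** has fewer than min_samples neighbors but lies within eps of a core point
* **Noise point:** neither a core point nor a border point; treated as an outlier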
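To make the three point types concrete, here is a minimal from-scratch sketch in NumPy. It is not scikit-learn's implementation and assumes Euclidean distance and a brute-force O(n²) distance matrix, which suits small data only. The function name `dbscan_sketch` is hypothetical. The sketch follows the classic expand-from-core-points procedure:

- Core points expand a cluster.
- Border points join the first cluster that reaches them but do not expand it.
- Points never reached stay noise.

```python
import numpy as np

NOISE = -1

def dbscan_sketch(X, eps, min_samples):
    """Return (labels, is_core); labels use -1 for noise, like scikit-learn."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Brute-force pairwise Euclidean distances, shape (n, n)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # eps-neighborhood of every point, the point itself included
    neighborhoods = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    # Core point: at least min_samples points within eps
    is_core = np.array([len(nb) >= min_samples for nb in neighborhoods])

    labels = np.full(n, NOISE)
    cluster_id = 0
    for i in range(n):
        # A new cluster can only start from an unlabeled core point
        if labels[i] != NOISE or not is_core[i]:
            continue
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border point: joins the cluster but does not expand it
            for q in neighborhoods[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster_id
                    stack.append(q)
        cluster_id += 1
    return labels, is_core


X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
labels, is_core = dbscan_sketch(X, eps=1.5, min_samples=3)
border = ~is_core & (labels != NOISE)
noise = labels == NOISE
print(labels)  # [ 0  0  0  0 -1]
print(np.flatnonzero(is_core), np.flatnonzero(border), np.flatnonzero(noise))  # [1 2] [0 3] [4]
```

In this toy run, points 1 and 2 are core, points 0 and 3 are border points attached to them, and point 4 is noise.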

**Advantages:**

* Determines the number of clusters automatically
* Labels outliers explicitly as noise instead of forcing them into a cluster
* Finds clusters of arbitrary (non-spherical) shape
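The non-spherical-shape advantage shows up on scikit-learn's two-moons toy data. This sketch assumes `make_moons` with `noise=0.05` and `eps=0.2`, `min_samples=5`, which are illustrative values and not tuned. It compares DBSCAN with k-means (k=2) using the adjusted Rand index against the true moon labels. K-means assumes compact, roughly spherical clusters, so it tends to cut the moons with a straight boundary.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: dense, but not spherical
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Adjusted Rand index: 1.0 = perfect agreement with the true moons, about 0 = random
ari_db = adjusted_rand_score(y_true, db_labels)
ari_km = adjusted_rand_score(y_true, km_labels)
print(f"DBSCAN ARI:  {ari_db:.2f}")
print(f"k-means ARI: {ari_km:.2f}")
```

Note that the adjusted Rand index treats any DBSCAN noise points (label -1) as one extra group.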

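**Limitations:**

* Sensitive to the choice of eps (and, to a lesser degree, min_samples)
* Struggles when clusters have very different densities, because a single global eps cannot suit all of them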
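The eps sensitivity listed above is easy to reproduce on a hypothetical 1-D toy dataset with two tight groups and one distant point. With `min_samples=2` held fixed, the outcome changes with eps:

- A very small eps labels everything as noise.
- A moderate eps recovers the two groups.
- A larger eps merges them and eventually absorbs the outlier.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: groups {0, 1, 2} and {5, 6, 7}, plus an outlier at 20
X = np.array([[0], [1], [2], [5], [6], [7], [20]], dtype=float)

results = {}
for eps in [0.5, 1.5, 3.5, 13.5]:
    labels = DBSCAN(eps=eps, min_samples=2).fit_predict(X)
    n_clusters = len(set(labels) - {-1})
    n_noise = int(np.sum(labels == -1))
    results[eps] = (n_clusters, n_noise)
    print(f"eps={eps}: clusters={n_clusters}, noise={n_noise}")
```

The same mechanism explains the varying-density limitation. One global eps cannot be small enough to separate a dense cluster from its neighbors and, at the same time, large enough to connect the points of a sparse cluster.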

**One-line interview answer:**
DBSCAN is a density-based clustering algorithm that groups dense points and marks isolated points as noise.


```python
from sklearn.datasets import load_iris
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt


# Load the Iris dataset (150 samples, 4 features)
iris = load_iris()
X = iris.data

# Standardize the features: eps is a distance threshold, so all features should share one scale
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Create the DBSCAN model
dbscan = DBSCAN(eps=0.5, min_samples=5)

# Fit the model; cluster labels are stored in dbscan.labels_ (noise points get -1)
dbscan.fit(X_scaled)

# Plot two of the original (unscaled) features, colored by cluster label
plt.scatter(X[:, 0], X[:, 2], c=dbscan.labels_)
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[2])
plt.title('DBSCAN Clustering')
plt.show()
```
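After fitting, the result is stored in `dbscan.labels_`, with -1 marking noise. Below is a small sketch for summarizing it. It repeats the Iris setup above so it runs on its own. No specific counts are claimed here, since they depend on eps and min_samples.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Same setup as above: standardized Iris features, eps=0.5, min_samples=5
X_scaled = StandardScaler().fit_transform(load_iris().data)
labels = DBSCAN(eps=0.5, min_samples=5).fit(X_scaled).labels_

# -1 is the noise label, so it is excluded from the cluster count
n_clusters = len(set(labels) - {-1})
n_noise = int(np.sum(labels == -1))
print(f"Estimated number of clusters: {n_clusters}")
print(f"Noise points: {n_noise} of {len(labels)}")

# Size of each cluster (noise excluded)
cluster_ids, sizes = np.unique(labels[labels != -1], return_counts=True)
print(dict(zip(cluster_ids.tolist(), sizes.tolist())))
```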

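# **Choosing Hyperparameters for DBSCAN**

A common way to choose eps is the k-distance plot. Compute each point's distance to its k-th nearest neighbor with k = min_samples, sort these distances in ascending order, and set eps near the "elbow" where the curve starts to rise steeply.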

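```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Load the Iris data (seaborn downloads this dataset on first use)
df = sns.load_dataset('iris')

# Standardize the features so that distances are comparable across features
features = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']].values
features = StandardScaler().fit_transform(features)

# Find the k nearest neighbors of every point, with k = min_samples.
# Querying the training points returns each point itself as its first neighbor (distance 0),
# which matches how scikit-learn's DBSCAN counts min_samples.
k = 5
neigh = NearestNeighbors(n_neighbors=k)
nbrs = neigh.fit(features)
distances, indices = nbrs.kneighbors(features)

# Keep the distance to the k-th neighbor (last column), then sort ascending
k_distances = np.sort(distances[:, -1])

# Plot the k-distance curve; a good eps lies near the elbow where the curve bends upward
plt.plot(k_distances)
plt.xlabel('Points (sorted by k-distance)')
plt.ylabel(f'Distance to {k}-th neighbor (self included)')
plt.title('K-distance Plot')
plt.show()
```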
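Reading the elbow off the k-distance plot above by eye is subjective. One simple automatic heuristic is sketched below as an assumption, not as a scikit-learn API or the method used in this notebook. It rescales the sorted k-distance curve to [0, 1] on both axes and picks the point that lies farthest below the diagonal joining its endpoints, which is similar in spirit to the Kneedle method. The helper name `estimate_eps` is hypothetical. The sketch loads Iris from scikit-learn instead of seaborn so it runs offline.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def estimate_eps(X, min_samples):
    """Heuristic eps: the elbow of the sorted k-distance curve (k = min_samples)."""
    nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
    dist, _ = nn.kneighbors(X)
    k_dist = np.sort(dist[:, -1])  # distance to the k-th neighbor, self included

    # Rescale both axes to [0, 1] so the elbow does not depend on units
    x = np.linspace(0.0, 1.0, len(k_dist))
    span = k_dist[-1] - k_dist[0]
    if span == 0:
        return float(k_dist[0])  # flat curve: no elbow to find
    y = (k_dist - k_dist[0]) / span

    # For a rising, convex curve the elbow is where it sits farthest below the diagonal y = x
    return float(k_dist[np.argmax(x - y)])


X_scaled = StandardScaler().fit_transform(load_iris().data)
eps = estimate_eps(X_scaled, min_samples=5)
labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X_scaled)
print(f"estimated eps: {eps:.3f}")
print("clusters:", len(set(labels) - {-1}), "noise:", int(np.sum(labels == -1)))
```

Treat the estimate as a starting point. Checking the resulting number of clusters and noise points, and trying nearby eps values, is still advisable.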