**DBSCAN**

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) finds regions of high density that
are separated from one another by regions of low density. If at least *min_samples* data points (in scikit-learn, the point itself is counted)
lie within a distance *eps* of a given data point, that data point is classified as a core sample.
DBSCAN puts core samples that lie within *eps* of one another into the same cluster; non-core points within *eps* of a core sample join its cluster, and all remaining points are labeled as noise.\
*Density at a point P*: Number of points within a circle of radius *eps (ϵ)* around point P.\
*Dense Region*: For each point in the cluster, the circle of radius ϵ contains at least *min_samples* points.
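
A minimal sketch of the core-sample rule, assuming scikit-learn's `DBSCAN` estimator and a small made-up 2-D dataset (the points and the values `eps=0.3`, `min_samples=3` are illustrative choices, not values from this text). It fits DBSCAN, then recomputes the core samples by brute-force neighbour counting to check them against the definition; scikit-learn labels noise points `-1`.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative data: two tight groups of four points and one isolated point
X = np.array([
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [9.0, 0.0],
])
eps, min_samples = 0.3, 3  # hypothetical parameter choices

db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
print(db.labels_)  # one cluster label per point; -1 marks noise

# Recompute the core samples by hand: count each point's neighbours within eps
# (the point itself included, as scikit-learn does), then apply min_samples
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
neighbour_counts = (dists <= eps).sum(axis=1)
manual_core = np.flatnonzero(neighbour_counts >= min_samples)
print(manual_core)              # indices of core samples, computed by hand
print(db.core_sample_indices_)  # indices reported by DBSCAN
```

The eight grouped points each have four neighbours within `eps` and become core samples of two clusters, while the isolated point has only itself and ends up as noise.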

<div>
<img src="dbscan.png" width="500"/>
</div>

### Clustering Metrics

Reference: https://gemfury.com/stream/python:scikit-learn/-/content/metrics/cluster/supervised.py

**homogeneity_score**:
A clustering result satisfies homogeneity if all of its clusters contain only data points which are members of a single class.
- Perfect labelings are homogeneous
```python
from sklearn.metrics import homogeneity_score
print(homogeneity_score([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```
- Non-perfect labelings that further split classes into more clusters can be perfectly homogeneous
```python
print("%.6f" % homogeneity_score([0, 0, 1, 1], [0, 1, 2, 3]))  # 1.000000
```
- Clusters that include samples from different classes do not make for a homogeneous labeling
```python
print("%.6f" % homogeneity_score([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.000000
```
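To see where these values come from, the sketch below recomputes homogeneity from its entropy definition, $h = 1 - H(C \mid K) / H(C)$, where $C$ is the true class and $K$ the predicted cluster (with $h = 1$ when there is only one class). The helpers `entropy`, `conditional_entropy` and `homogeneity` are hypothetical names for illustration, not scikit-learn API; their results are compared with `homogeneity_score` on the three examples above.

```python
import math
from collections import Counter

from sklearn.metrics import homogeneity_score


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """H(A | B): entropy of labels `a` inside each group of `b`, weighted by group size."""
    n = len(a)
    b_counts = Counter(b)
    joint = Counter(zip(a, b))
    return -sum(c / n * math.log(c / b_counts[bj]) for (_, bj), c in joint.items())


def homogeneity(labels_true, labels_pred):
    """Hypothetical helper: h = 1 - H(C | K) / H(C)."""
    h_c = entropy(labels_true)
    if h_c == 0:
        return 1.0  # a single class is trivially homogeneous
    return 1.0 - conditional_entropy(labels_true, labels_pred) / h_c


examples = [
    ([0, 0, 1, 1], [1, 1, 0, 0]),
    ([0, 0, 1, 1], [0, 1, 2, 3]),
    ([0, 0, 1, 1], [0, 1, 0, 1]),
]
for labels_true, labels_pred in examples:
    print(homogeneity(labels_true, labels_pred), homogeneity_score(labels_true, labels_pred))
```

Splitting a class over several pure clusters leaves $H(C \mid K) = 0$, which is why the second example is still perfectly homogeneous.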
***

**completeness_score**:
A clustering result satisfies completeness if all the data points that are members of a given class are elements of the same cluster.
- Perfect labelings are complete
- Non-perfect labelings that assign all members of each class to the same cluster are still complete
```python
from sklearn.metrics import completeness_score
print(completeness_score([0, 0, 1, 1], [0, 0, 0, 0]))  # 1.0
print(completeness_score([0, 1, 2, 3], [0, 0, 1, 1]))  # 0.999... (1.0 up to floating-point rounding)
```
- If the members of a class are split across different clusters, the assignment cannot be complete
```python
print(completeness_score([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```
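Completeness is the mirror image of homogeneity: swapping the roles of classes and clusters turns one into the other, so `completeness_score(a, b)` should equal `homogeneity_score(b, a)` up to floating-point rounding. A quick check on the examples above plus one arbitrary extra pair:

```python
import math

from sklearn.metrics import completeness_score, homogeneity_score

pairs = [
    ([0, 0, 1, 1], [0, 0, 0, 0]),
    ([0, 1, 2, 3], [0, 0, 1, 1]),
    ([0, 0, 1, 1], [0, 1, 0, 1]),
    ([0, 0, 1, 2, 2], [1, 1, 0, 0, 2]),  # arbitrary extra pair for illustration
]
for labels_true, labels_pred in pairs:
    c = completeness_score(labels_true, labels_pred)
    h_swapped = homogeneity_score(labels_pred, labels_true)  # arguments swapped
    print(f"{c:.6f} {h_swapped:.6f}", math.isclose(c, h_swapped, abs_tol=1e-9))
```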
***

**mutual_info_score**:
Mutual Information (MI) between two clusterings: a measure of the similarity between two labelings of the same data. Where $|U_i|$ is the number of samples in cluster $U_i$, $|V_j|$ is the number of samples in cluster $V_j$ and $N$ is the total number of samples, the Mutual Information between clusterings $U$ and $V$ is given as:

<center> 
    $MI(U,V)=\sum_{i=1}^{|U|} \sum_{j=1}^{|V|} \frac{|U_i\cap V_j|}{N}\log\frac{N|U_i \cap V_j|}{|U_i||V_j|}$ 
</center>

This metric is furthermore symmetric: switching ``labels_true`` with ``labels_pred`` returns the same score value. This can be useful to measure the agreement of two independent label assignment strategies on the same dataset when the real ground truth is not known.
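
The double sum can be evaluated directly from label counts. The sketch below uses a hypothetical `mutual_info` helper written with NumPy; it skips empty intersections (taking $0 \log 0 = 0$), uses the natural logarithm as scikit-learn's `mutual_info_score` does, and checks both the value and the symmetry on an arbitrary pair of labelings.

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def mutual_info(labels_u, labels_v):
    """Hypothetical helper: MI(U, V) in nats, evaluated straight from the double sum."""
    u, v = np.asarray(labels_u), np.asarray(labels_v)
    n = len(u)
    mi = 0.0
    for ui in np.unique(u):
        in_ui = u == ui                   # members of cluster U_i
        for vj in np.unique(v):
            in_vj = v == vj               # members of cluster V_j
            n_ij = np.sum(in_ui & in_vj)  # |U_i ∩ V_j|
            if n_ij == 0:
                continue                  # 0 * log(0) is taken as 0
            mi += n_ij / n * np.log(n * n_ij / (in_ui.sum() * in_vj.sum()))
    return float(mi)


a = [0, 0, 1, 1, 2, 2]
b = [0, 0, 1, 2, 2, 2]
print(mutual_info(a, b))
print(mutual_info_score(a, b), mutual_info_score(b, a))  # same value both ways
```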
    
***

**contingency_matrix:**
Matrix $C$ such that $C_{i,j}$ is the number of samples that are in true class $i$ and in predicted cluster $j$.
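
For example, with scikit-learn's `sklearn.metrics.cluster.contingency_matrix` (rows are true classes, columns are predicted clusters, both in sorted label order); the labelings here are made up for illustration:

```python
from sklearn.metrics.cluster import contingency_matrix

labels_true = [0, 0, 1, 1, 2]
labels_pred = [0, 0, 0, 1, 1]

C = contingency_matrix(labels_true, labels_pred)
print(C)
# [[2 0]
#  [1 1]
#  [0 1]]
print(C.sum(axis=1), C.sum(axis=0))  # class sizes, cluster sizes: [2 2 1] [3 2]
```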

***

**v_measure_score**:
Weighted harmonic mean of homogeneity and completeness, controlled by `beta` (default 1.0, which gives the plain harmonic mean):
<center>
$v = \frac{(1 + \beta) \cdot \text{homogeneity} \cdot \text{completeness}}{\beta \cdot \text{homogeneity} + \text{completeness}}$
</center>

- Perfect labelings are both homogeneous and complete, hence have score 1.0
```python
from sklearn.metrics import v_measure_score
print(v_measure_score([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

- Labelings that assign all members of each class to the same cluster are complete but not homogeneous, hence penalized
```python
print("%.6f" % v_measure_score([0, 0, 1, 2], [0, 0, 1, 1]))  # 0.800000
print("%.6f" % v_measure_score([0, 1, 2, 3], [0, 0, 1, 1]))  # 0.666667
```

- Labelings that have pure clusters whose members all come from the same class are homogeneous, but unnecessary splits harm completeness and thus penalize the V-measure as well
```python
print("%.6f" % v_measure_score([0, 0, 1, 1], [0, 1, 2, 3]))  # 0.666667
```

- If the members of a class are completely split across different clusters, the assignment is totally incomplete, hence the V-measure is zero
```python
print("%.6f" % v_measure_score([0, 0, 0, 0], [0, 1, 2, 3]))  # 0.000000
```

- Clusters that include samples from entirely different classes completely destroy the homogeneity of the labeling, hence the V-measure is zero
```python
print("%.6f" % v_measure_score([0, 0, 1, 1], [0, 0, 0, 0]))  # 0.000000
```
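The formula can be checked against scikit-learn by combining `homogeneity_score` and `completeness_score` by hand. This sketch assumes a scikit-learn version whose `v_measure_score` accepts the `beta` keyword (recent releases do); `v_measure` is a hypothetical helper, not library API. With `beta > 1` completeness is weighted more strongly, with `beta < 1` homogeneity is.

```python
import math

from sklearn.metrics import completeness_score, homogeneity_score, v_measure_score


def v_measure(h, c, beta=1.0):
    """Hypothetical helper: weighted harmonic mean of homogeneity h and completeness c."""
    if h + c == 0:
        return 0.0
    return (1 + beta) * h * c / (beta * h + c)


labels_true, labels_pred = [0, 0, 1, 2], [0, 0, 1, 1]  # complete but not homogeneous
h = homogeneity_score(labels_true, labels_pred)   # ≈ 2/3
c = completeness_score(labels_true, labels_pred)  # ≈ 1.0
for beta in (0.5, 1.0, 2.0):
    by_hand = v_measure(h, c, beta)
    library = v_measure_score(labels_true, labels_pred, beta=beta)
    print(beta, round(by_hand, 4), round(library, 4))
```

Here completeness is 1 and homogeneity 2/3, so raising `beta` raises the score: roughly 0.75, 0.8 and 0.857 for `beta` = 0.5, 1 and 2.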

***

**adjusted_rand_score**:
The Rand Index (RI) computes a similarity measure between two clusterings by considering all pairs of samples and counting the pairs that are assigned to the same cluster, or to different clusters, in both the predicted and the true clustering.
The raw RI score is then “adjusted for chance” into the ARI score using the following scheme:
<center>
$\text{ARI} = \frac{\text{RI} - \mathbb{E}[\text{RI}]}{\max(\text{RI}) - \mathbb{E}[\text{RI}]}$
</center>

- Labelings that assign all members of each class to the same cluster are complete but not always pure, hence penalized
```python
from sklearn.metrics import adjusted_rand_score
print("%.6f" % adjusted_rand_score([0, 0, 1, 2], [0, 0, 1, 1]))  # 0.571429
```
- ARI is symmetric, so labelings that have pure clusters whose members come from the same class, but with unnecessary splits, are penalized as well
```python
print("%.6f" % adjusted_rand_score([0, 0, 1, 1], [0, 0, 1, 2]))  # 0.571429
```
- If the members of a class are completely split across different clusters, the assignment is totally incomplete, hence the ARI drops to 0.0, the chance level
```python
print(adjusted_rand_score([0, 0, 0, 0], [0, 1, 2, 3]))  # 0.0
```
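A from-scratch sketch of both quantities, assuming plain Python lists of labels. `rand_index` follows the pair-counting description literally; `adjusted_rand` uses the equivalent contingency-table form of the adjustment (Hubert and Arabie), in which the index is $\sum_{ij}\binom{n_{ij}}{2}$, its expected value is $\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}/\binom{n}{2}$ and its maximum is $\tfrac{1}{2}\left(\sum_i\binom{a_i}{2}+\sum_j\binom{b_j}{2}\right)$, with $n_{ij}$ the contingency counts, $a_i$ the class sizes and $b_j$ the cluster sizes. Both helper names are hypothetical; the results are compared with `adjusted_rand_score` on the examples above.

```python
import math
from collections import Counter
from itertools import combinations

from sklearn.metrics import adjusted_rand_score


def rand_index(labels_true, labels_pred):
    """Hypothetical helper: fraction of sample pairs on which both labelings agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def adjusted_rand(labels_true, labels_pred):
    """Hypothetical helper: ARI from pair counts in the contingency table."""
    n_pairs = math.comb(len(labels_true), 2)
    index = sum(math.comb(n, 2) for n in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(math.comb(n, 2) for n in Counter(labels_true).values())
    sum_b = sum(math.comb(n, 2) for n in Counter(labels_pred).values())
    expected = sum_a * sum_b / n_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # only when both labelings are all-singletons or both a single cluster
    return (index - expected) / (max_index - expected)


examples = [
    ([0, 0, 1, 2], [0, 0, 1, 1]),
    ([0, 0, 1, 1], [0, 0, 1, 2]),
    ([0, 0, 0, 0], [0, 1, 2, 3]),
]
for labels_true, labels_pred in examples:
    print(
        round(rand_index(labels_true, labels_pred), 3),
        round(adjusted_rand(labels_true, labels_pred), 3),
        round(adjusted_rand_score(labels_true, labels_pred), 3),
    )
```

The raw RI of the first pair is 5/6, yet the ARI is only 4/7 ≈ 0.571 once the agreement expected by chance is subtracted.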