# DBSCAN Method for Outlier Detection
**Author:** Juan A. Monleón de la Lluvia  
**Date:** 29-08-2023  

## Description
This Jupyter notebook identifies outliers in data sets from proton-induced reaction experiments. It covers each step from data preparation to outlier detection with the DBSCAN method, with code and short explanations along the way.

In [None]:
from EXFOR_ProtonReactions_UtilityFunctions import *
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN
import numpy as np
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="sklearn")
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 12)


## Data Import and Cleaning

In [None]:
path = r'D:\OneDrive\ETSII\MASTER\TFM\Scripts\exfortables\EXFOR_ProtonReactions_Classified_Group_6.csv'
df = pd.read_csv(path)
df = clean_dataframe(df)
df

In [None]:
# Save the IDs and drop them from the dataframe
ids = df['X4_ID']
df_without_ids = df.drop(columns=['X4_ID'])

## Implementation of DBSCAN Method

In [None]:
# Scaling the Data
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df_without_ids)

In [None]:
# Applying DBSCAN Clustering
# With no arguments, scikit-learn uses eps=0.5 and min_samples=5.
clustering = DBSCAN().fit(df_scaled)
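
The cell above relies on scikit-learn's default `eps=0.5` and `min_samples=5`. A common heuristic for checking whether `eps` suits a given data set is the sorted k-distance curve. The following is a minimal sketch on synthetic data; it is not part of the original analysis, and the array `X` and the chosen percentiles are illustrative assumptions only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Synthetic, standardized stand-in for df_scaled (hypothetical data).
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(300, 3)))

min_samples = 5  # DBSCAN default; counts the point itself

# Querying the training data returns each point as its own nearest
# neighbor (distance 0), so the last column is the distance to the
# min_samples-th point of the neighborhood, the point itself included.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)
k_distances = np.sort(distances[:, -1])

# Candidate eps values are often read off near the "knee" of this curve;
# a few upper percentiles give a rough numerical impression.
print(np.percentile(k_distances, [50, 90, 95]))
```

If most k-distances lie well below or well above 0.5, the default `eps` may flag too few or too many points as noise, and the parameter is worth setting explicitly.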

In [None]:
# Identifying Outliers
# DBSCAN labels noise points as -1; flatnonzero returns their row positions as a 1-D array
outlier_positions = np.flatnonzero(clustering.labels_ == -1)

In [None]:
# Reverting the Scaling
df_descaled = pd.DataFrame(scaler.inverse_transform(df_scaled), columns=df_without_ids.columns)
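
Because `inverse_transform` undoes `fit_transform`, the descaled frame should match the unscaled features up to floating-point error. A small sketch on synthetic data (the array and variable names are hypothetical) illustrates the round trip:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix standing in for df_without_ids.
rng = np.random.default_rng(0)
original = rng.normal(loc=50.0, scale=10.0, size=(100, 4))

demo_scaler = StandardScaler()
standardized = demo_scaler.fit_transform(original)      # zero mean, unit variance per column
restored = demo_scaler.inverse_transform(standardized)  # back to the original units

round_trip_ok = np.allclose(restored, original)
print("Columns have zero mean:", np.allclose(standardized.mean(axis=0), 0.0))
print("Round trip recovers the input:", round_trip_ok)
```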

In [None]:
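# Adding the IDs and extracting the outliers
# Assign by position: df_descaled has a fresh 0..n-1 index, which may differ
# from the index of df if clean_dataframe dropped rows without resetting it.
df_descaled['X4_ID'] = ids.to_numpy()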
outliers_df = df_descaled.iloc[outlier_positions]
print('Percentage of outliers: {:.2f}%'.format(len(outliers_df)/len(df)*100))
outliers_df
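
The steps above can be tried end to end on synthetic data. The sketch below is illustrative only: the feature columns, the ID strings, and the three far-away points are all made up, and DBSCAN runs with its defaults as in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned table: one dense cloud plus three
# far-away points. Column names and IDs are hypothetical.
rng = np.random.default_rng(0)
cloud = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
far = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -10.0]])
demo = pd.DataFrame(np.vstack([cloud, far]), columns=["feature_a", "feature_b"])
demo["X4_ID"] = [f"EXP-{i}" for i in range(200)] + ["FAR-1", "FAR-2", "FAR-3"]

# Same steps as in the notebook: split off the IDs, scale, cluster.
demo_ids = demo["X4_ID"]
features = demo.drop(columns=["X4_ID"])
scaled = StandardScaler().fit_transform(features)
labels = DBSCAN().fit(scaled).labels_  # defaults: eps=0.5, min_samples=5

# Label -1 marks noise points, which are treated as outliers here.
is_outlier = labels == -1
demo_outliers = features[is_outlier].assign(X4_ID=demo_ids[is_outlier])
print(f"Percentage of outliers: {is_outlier.mean() * 100:.2f}%")
print(demo_outliers)
```

Unlike the cells above, this sketch selects rows from the unscaled frame with a boolean mask instead of inverse-transforming the scaled array. The two approaches give the same rows; the mask simply avoids the round trip. Points at the edge of the cloud may also be labeled as noise under the default parameters, so the reported percentage can exceed the three planted points.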

## Visual Representation and Verification of Outliers

For the visual representations, the whole data set needs to be loaded into memory. This is done here with the `read_experiments_from_binary` function; the `read_experiments_from_txt` function could be used instead. Both are available in the `EXFOR_ProtonReactions_UtilityFunctions.py` file.

In [None]:
experiments = read_experiments_from_binary('EXFOR_ProtonReactions_Database.bin')

In [None]:
plot_outliers(outliers_df, experiments)