# Local Outlier Factor (LOF) Method for Outlier Detection
**Author:** Juan A. Monleón de la Lluvia  
**Date:** 29-08-2023  

## Description
This Jupyter Notebook identifies outliers in data sets from proton-induced reaction experiments. It covers each step from data preparation to outlier detection with the LOF method, followed by a visual check of the flagged points. Code examples and short explanations accompany each step.

In [None]:
from EXFOR_ProtonReactions_UtilityFunctions import *
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import LocalOutlierFactor
import numpy as np
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="sklearn")
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 12)

## Data Import and Cleaning

In [None]:
df = pd.read_csv(r'D:\OneDrive\ETSII\MASTER\TFM\Scripts\exfortables\by_data\group_6.csv')
df = clean_dataframe(df)
df

In [None]:
# Save the IDs and drop them from the dataframe
X4_ID = df['X4_ID']
df_without_id = df.drop(columns=['X4_ID'])

## Implementation of LOF Method

In [None]:
# Scaling the Data
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df_without_id)
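
LOF is distance-based, so a column with much larger magnitudes than the others would otherwise dominate the neighbor search. The following is a minimal sketch of what `StandardScaler` does, using a toy frame whose column names and values are hypothetical and not taken from the data set: each column ends up with zero mean and unit variance, and `inverse_transform` recovers the original values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical columns, chosen only to show very different magnitudes
toy = pd.DataFrame({
    "energy": [1.0e6, 2.0e6, 3.0e6, 4.0e6],
    "cross_section": [0.10, 0.20, 0.15, 0.30],
})

toy_scaler = StandardScaler()
toy_scaled = toy_scaler.fit_transform(toy)  # NumPy array, one column per feature

# After scaling, every column has mean ~0 and (population) standard deviation ~1
col_means = toy_scaled.mean(axis=0)
col_stds = toy_scaled.std(axis=0)

# The transformation is reversible, which is what the descaling step relies on
toy_restored = toy_scaler.inverse_transform(toy_scaled)

print(col_means, col_stds)
```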

In [None]:
# Applying the LOF algorithm
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
outliers = lof.fit_predict(df_scaled)
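
To make the output of `fit_predict` easier to interpret, here is a small sketch on synthetic data (a dense cluster plus one deliberately distant point, not the experimental data). It uses the same `n_neighbors` and `contamination` values as above. The labels are `-1` for outliers and `1` for inliers, and the fitted attribute `negative_outlier_factor_` holds the underlying scores, where more negative means more anomalous.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the scaled features: 200 clustered points and one far away
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
far_point = np.array([[12.0, 12.0]])
X = StandardScaler().fit_transform(np.vstack([inliers, far_point]))

lof_demo = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
labels = lof_demo.fit_predict(X)             # -1 = outlier, 1 = inlier
scores = lof_demo.negative_outlier_factor_   # more negative = more anomalous

n_flagged = int((labels == -1).sum())
most_anomalous = int(np.argmin(scores))      # row index of the lowest score

print("flagged points:", n_flagged)
print("most anomalous row:", most_anomalous)
```

Note that `contamination` fixes the fraction of points flagged rather than testing each point against an absolute threshold, so some points are labeled as outliers even in a clean data set. Inspecting the scores directly can help judge how extreme the flagged points really are.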

In [None]:
# Identifying Outliers
is_outlier = outliers == -1

In [None]:
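# Reverting the scaling; reuse the original index so that the X4_ID Series
# assigned in the next cell aligns row by row even if cleaning left index gaps
df_descaled = pd.DataFrame(
    scaler.inverse_transform(df_scaled),
    columns=df_without_id.columns,
    index=df_without_id.index,
)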

In [None]:
# Adding the IDs and the outlier flag
df_descaled['X4_ID'] = X4_ID
df_descaled['is_outlier'] = is_outlier
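
Assigning a pandas Series as a new column aligns on index labels, not on row position. This matters here if `clean_dataframe` drops rows without resetting the index, which is an assumption, since its behavior is not shown in this notebook. The toy sketch below (hypothetical data) illustrates the difference between rebuilding a frame with a default index and rebuilding it with the original index.

```python
import numpy as np
import pandas as pd

# Toy frame whose index has a gap, as if row 1 had been dropped during cleaning
original = pd.DataFrame(
    {"id": ["a", "b", "c"], "value": [1.0, 2.0, 3.0]},
    index=[0, 2, 3],
)
ids = original["id"]
values = original[["value"]].to_numpy()

# Rebuilding from a NumPy array without index= yields a fresh 0..n-1 index,
# so the Series assignment matches labels 0 and 2 only and leaves a NaN at 1
rebuilt_default = pd.DataFrame(values, columns=["value"])
rebuilt_default["id"] = ids

# Reusing the original index keeps every row matched to its own ID
rebuilt_aligned = pd.DataFrame(values, columns=["value"], index=original.index)
rebuilt_aligned["id"] = ids

print(rebuilt_default)
print(rebuilt_aligned)
```

A NumPy array such as `is_outlier` carries no index, so it is always assigned by position.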

In [None]:
# Keep only the rows flagged as outliers and drop the helper column
outliers_df = df_descaled[df_descaled['is_outlier']].drop(columns='is_outlier')
print('Percentage of outliers: {:.2f}%'.format(len(outliers_df)/len(df)*100))
outliers_df

## Visual Representation and Verification of Outliers

For the visual representation, the whole data set needs to be loaded into memory. This is done here with the `read_experiments_from_binary` function, although the `read_experiments_from_txt` function could be used instead. Both are available in the `EXFOR_ProtonReactions_UtilityFunctions.py` file.

In [None]:
experiments = read_experiments_from_binary('EXFOR_ProtonReactions_Database.bin')

In [None]:
plot_outliers(outliers_df, experiments, ylog=True)