# One-Class Support Vector Machine (SVM) Method for Outlier Detection
**Author:** Juan A. Monleón de la Lluvia  
**Date:** 29-08-2023  

## Description
This Jupyter Notebook identifies outliers in data sets from proton-induced reaction experiments. It covers the steps from data preparation to outlier detection with a One-Class Support Vector Machine (One-Class SVM), and provides code and explanations for each step.

In [None]:
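# Import only the helpers this notebook uses instead of a wildcard import
from EXFOR_ProtonReactions_UtilityFunctions import clean_dataframe, read_experiments_from_binary, plot_outliers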
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM
import numpy as np
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="sklearn")
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 12)

## Data Import and Cleaning

In [None]:
path = r'D:\OneDrive\ETSII\MASTER\TFM\Scripts\exfortables\EXFOR_ProtonReactions_Classified_Group_2.csv'
df = pd.read_csv(path)
df = clean_dataframe(df)
df

In [None]:
# Save the IDs and drop them from the dataframe
x4_id_column = df['X4_ID'].copy()
df_without_id = df.drop('X4_ID', axis=1)

## Implementation of the SVM Method

In [None]:
# Scaling the Data
scaler = StandardScaler()
scaled_df = scaler.fit_transform(df_without_id)

In [None]:
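# Applying OneClassSVM
# nu is an upper bound on the fraction of training errors, so only a very small share of rows should be flagged
ocsvm = OneClassSVM(kernel='rbf', nu=0.001)
ocsvm.fit(scaled_df)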

In [None]:
# Identifying Outliers
# predict returns +1 for inliers and -1 for outliers
pred = ocsvm.predict(scaled_df)
outliers = (pred == -1)
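
As a rough illustration of what the `-1` labels mean, the sketch below fits `OneClassSVM` on synthetic two-dimensional data (not the EXFOR data) and reports the fraction of flagged training rows for a few values of `nu`. scikit-learn documents `nu` as an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. The helper name `flagged_fraction`, the random data, and the chosen `nu` values are assumptions made for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the scaled feature matrix (illustrative data only)
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(500, 2)))


def flagged_fraction(X, nu):
    """Fit a One-Class SVM and return (model, fraction of training rows labelled -1)."""
    model = OneClassSVM(kernel='rbf', nu=nu).fit(X)
    pred = model.predict(X)  # +1 for inliers, -1 for outliers
    return model, float(np.mean(pred == -1))


fractions = {}
for nu in (0.01, 0.05, 0.2):
    model, fractions[nu] = flagged_fraction(X, nu)
    print('nu = {:.2f} -> {:.1f}% of training rows flagged'.format(nu, fractions[nu] * 100))

# A point far from all training data gets a negative decision score and the label -1
far_point = np.array([[50.0, 50.0]])
far_label = model.predict(far_point)[0]
far_score = model.decision_function(far_point)[0]
print('Far point label:', far_label, 'score:', far_score)
```

The sign of `decision_function` carries the same information as `predict`, and its magnitude can be used to rank the flagged rows from most to least anomalous.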

In [None]:
# Reverting the Scaling
descaled_df = scaler.inverse_transform(scaled_df)
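
Since `inverse_transform` undoes `fit_transform`, the descaled array should match the original features up to floating-point rounding. A minimal check on made-up numbers (the array `X` is an assumption for illustration, not notebook data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix with columns on very different scales
X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 900.0], [4.0, 100.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)           # zero mean, unit variance per column
X_back = scaler.inverse_transform(X_scaled)  # back to the original units

round_trip_ok = bool(np.allclose(X, X_back))
print('Scaled column means:', X_scaled.mean(axis=0))
print('Round trip recovers the input:', round_trip_ok)
```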

In [None]:
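# Adding the IDs and extracting the outliers
# Reuse the original index so the IDs stay aligned row by row even if clean_dataframe dropped rows
result_df = pd.DataFrame(descaled_df, columns=df_without_id.columns, index=df_without_id.index)
result_df['X4_ID'] = x4_id_column
result_df['outliers'] = outliers
# Keep only the flagged rows and drop the helper flag column
outliers_df = result_df[result_df['outliers']].drop(columns='outliers')
print('Percentage of outliers: {:.2f}%'.format(len(outliers_df) / len(df) * 100))
outliers_df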
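
One detail worth checking when reattaching the IDs: assigning a pandas Series to a DataFrame column aligns on index labels, not on row position. This only matters if the cleaned dataframe has gaps in its index, which depends on how `clean_dataframe` behaves (not shown here, so treat that as an assumption). The toy data below is invented to illustrate the effect.

```python
import numpy as np
import pandas as pd

# Invented data: row 1 has a missing value and is dropped during cleaning
df = pd.DataFrame({'X4_ID': ['A', 'B', 'C', 'D'],
                   'value': [1.0, np.nan, 3.0, 4.0]})
cleaned = df.dropna()  # remaining index labels: 0, 2, 3

ids = cleaned['X4_ID'].copy()
features = cleaned.drop('X4_ID', axis=1)
arr = features.to_numpy()  # plain array, index labels are lost

# Fresh RangeIndex (0, 1, 2): the IDs are matched by label, so they shift
misaligned = pd.DataFrame(arr, columns=features.columns)
misaligned['X4_ID'] = ids

# Same index as the cleaned data: every row gets its own ID back
aligned = pd.DataFrame(arr, columns=features.columns, index=features.index)
aligned['X4_ID'] = ids

print(misaligned)
print(aligned)
```

Passing `index=features.index` when rebuilding the dataframe (or calling `reset_index(drop=True)` right after cleaning) keeps each row paired with its original ID.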

## Visual Representation and Verification of Outliers

For the visual representations, the whole data set needs to be loaded into memory. This is done with the `read_experiments_from_binary` function; the `read_experiments_from_txt` function is an alternative. Both are available in the `EXFOR_ProtonReactions_UtilityFunctions.py` file.

In [None]:
experiments = read_experiments_from_binary('EXFOR_ProtonReactions_Database.bin')

In [None]:
plot_outliers(outliers_df, experiments)