# Data Preprocessing for Proton-Induced Experiments
**Author:** Juan A. Monleón de la Lluvia  
**Date:** 29-08-2023  

## Description
This Jupyter notebook is a guide to preprocessing proton-induced experiments. 
It covers reading experiments from EXFORTABLES, saving and reloading them in binary and text formats, 
inspecting, plotting, and filtering them, and exporting classified data as `.csv` files 
for machine learning.


In [None]:
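# Bring the notebook's helper functions (readers, writers, filters, plotting) into scope
from EXFOR_ProtonReactions_UtilityFunctions import *
import pandas as pd
# Keep DataFrame previews compact: show at most 7 columns and 12 rows
pd.set_option('display.max_columns', 7)
pd.set_option('display.max_rows', 12)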

## Reading Proton Experiments from EXFORTABLES

This section assumes that the experiments are organized according to the EXFORTABLES format. The required path should point to the directory containing proton-induced experiments, which are identified by the prefix 'p' in EXFORTABLES.

In [None]:
# Adjust this path to the 'p' directory of your local EXFORTABLES copy
path = r'D:\OneDrive\ETSII\MASTER\TFM\Documentacion EXFOR\exfortables\p'
experiments = read_proton_experiments_from_exfortables(path)
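
Before calling the reader, it can be worth confirming that the directory exists and is not empty. The following is a minimal sketch using only the standard library. `summarize_directory` is a hypothetical helper, not part of `EXFOR_ProtonReactions_UtilityFunctions`, and it assumes nothing about how EXFORTABLES names or nests its files; it only counts entries recursively.

```python
from pathlib import Path


def summarize_directory(root):
    """Return (number of subdirectories, number of files) found under root.

    Hypothetical sanity-check helper; not part of the notebook's utility module.
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"Not a directory: {root}")
    n_dirs = 0
    n_files = 0
    for entry in root.rglob("*"):  # walk the whole tree below root
        if entry.is_dir():
            n_dirs += 1
        elif entry.is_file():
            n_files += 1
    return n_dirs, n_files
```

Calling `summarize_directory(path)` and getting `(0, 0)` or a `FileNotFoundError` suggests that `path` does not point at the expected directory.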

## Saving Experiments to a Single File (Binary and Text Formats)

In [None]:
write_experiments_to_binary(experiments, 'EXFOR_ProtonReactions_Database.bin')
write_experiments_to_txt(experiments, 'EXFOR_ProtonReactions_Database.txt')
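
How the utility functions serialize the experiments is not shown in this notebook. As a rough illustration of a binary round trip, the sketch below assumes a `pickle`-based format, which is an assumption and not a description of `write_experiments_to_binary`. The helper names `dump_binary` and `load_binary` and the dictionary records are hypothetical stand-ins.

```python
import pickle


def dump_binary(objects, filename):
    """Serialize a list of Python objects to a single binary file (assumed pickle format)."""
    with open(filename, "wb") as fh:
        pickle.dump(objects, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_binary(filename):
    """Load the list of objects written by dump_binary."""
    with open(filename, "rb") as fh:
        return pickle.load(fh)


# Made-up records standing in for experiment objects
demo_records = [
    {"title": "demo experiment A", "MT": 42, "points": [1.0, 2.0]},
    {"title": "demo experiment B", "MT": 5, "points": [3.0]},
]
```

A pickle file should only be loaded if it comes from a trusted source, because unpickling can execute arbitrary code.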

## File Transfer and Reading Capabilities

The generated files can be copied to and read on any computer, whether or not EXFORTABLES is available there.

In practice, it is enough to read the binary file, which loads faster than the text file.


In [None]:
exp_from_bin = read_experiments_from_binary('EXFOR_ProtonReactions_Database.bin')
exp_from_txt = read_experiments_from_txt('EXFOR_ProtonReactions_Database.txt')
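
To check the difference in loading time on a given machine, a small timing wrapper is enough. This is a sketch; `time_call` is a hypothetical helper and not part of the utility module.

```python
import time


def time_call(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, `time_call(read_experiments_from_binary, 'EXFOR_ProtonReactions_Database.bin')` returns the loaded list together with the time the call took, and the same call with `read_experiments_from_txt` gives the figure to compare against.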

## Accessing Experiment Data

To inspect the raw data of a specific experiment, index into the list and pass the element to Python's `print` function.


In [None]:
print(exp_from_bin[3])

### Displaying the Experiment's Results DataFrame

For a more structured view, you can display just the results DataFrame for an experiment.


In [None]:
exp_from_bin[3].data

### Converting an Experiment to a DataFrame

To represent all the information related to an experiment as a DataFrame, you can use the `to_dataframe()` method.


In [None]:
exp_from_bin[3].to_dataframe()

## Plotting Experiment Results

### Basic Plotting

You can visualize the experiment's results through a plot, with an option to employ a logarithmic scale on the y-axis.


In [None]:
exp_from_bin[0].plot(ylog=True)

### Plotting with Error Bars

If error information is available for the experiment, error bars will be included in the plot.


In [None]:
exp_from_bin[2135].plot(ylog=True)
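
The behavior described above (error bars only when uncertainties exist, optional logarithmic y-axis) can be sketched with matplotlib as follows. This is an illustration under assumptions, not the implementation of the experiment's `plot` method; `plot_with_errors` and its parameters are hypothetical.

```python
import matplotlib.pyplot as plt


def plot_with_errors(x, y, dy=None, ylog=False):
    """Plot y against x, adding error bars only when uncertainties dy are given."""
    fig, ax = plt.subplots()
    if dy is None:
        ax.plot(x, y, "o")  # markers only, no uncertainties available
    else:
        ax.errorbar(x, y, yerr=dy, fmt="o", capsize=3)
    if ylog:
        ax.set_yscale("log")  # logarithmic y-axis
    return ax
```

For example, `plot_with_errors([1, 2, 3], [10, 100, 1000], dy=[1, 10, 100], ylog=True)` draws three points with symmetric error bars on a logarithmic y-axis.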

## Analyzing and Filtering Experiments

### Checking Unique Attribute Values

You can identify unique values for a specific attribute across all experiments using the `get_unique_values` function. For instance, to get unique MT values:


In [None]:
MT_values = get_unique_values(exp_from_bin, 'MT')
print(MT_values)
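
Conceptually, collecting unique values amounts to building a set over one attribute. The sketch below shows the idea on stand-in objects. `unique_attribute_values` is a hypothetical function, and the notebook's `get_unique_values` may differ in ordering or in how it treats missing attributes.

```python
from types import SimpleNamespace


def unique_attribute_values(objects, attribute):
    """Return the sorted distinct values of `attribute` across `objects`.

    Objects that lack the attribute are skipped.
    """
    values = {getattr(obj, attribute) for obj in objects if hasattr(obj, attribute)}
    return sorted(values)


# Stand-in objects; the MT numbers are made up for the demonstration
demo = [SimpleNamespace(MT=42), SimpleNamespace(MT=5), SimpleNamespace(MT=42)]
unique_demo = unique_attribute_values(demo, "MT")
```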

### Filtering Experiments by Attribute

To filter experiments based on the value of a particular attribute, you can use `filter_experiments`.


In [None]:
MT42_exp = filter_experiments(exp_from_bin, 'MT', 42)
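
Filtering of this kind can be pictured as a list comprehension over the experiments. The following is a minimal sketch on stand-in objects; `filter_by_attribute` is hypothetical and only approximates what `filter_experiments` is described as doing.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel so that objects without the attribute never match


def filter_by_attribute(objects, attribute, value):
    """Return the objects whose `attribute` equals `value`."""
    return [obj for obj in objects if getattr(obj, attribute, _MISSING) == value]


# Stand-in objects with made-up values
demo = [
    SimpleNamespace(name="a", MT=42),
    SimpleNamespace(name="b", MT=5),
    SimpleNamespace(name="c", MT=42),
]
selected = filter_by_attribute(demo, "MT", 42)
```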

### Batch Plotting

You can plot multiple experiments in one view, which is useful for comparative analysis.


In [None]:
plot_experiments(MT42_exp)

## Classifying Experiments and Preparing for Machine Learning

### Classifying by Data Type

This step classifies the experiments by data type and, after applying one-hot encoding to the attributes, writes the result to `.csv` files. Note that it can be time-consuming.


In [None]:
df = classify_experiments_by_data(exp_from_bin)
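
For readers unfamiliar with one-hot encoding, the sketch below shows the transformation on a tiny DataFrame using `pd.get_dummies`. The column names and values are made up, and whether `classify_experiments_by_data` uses this pandas function internally is not known from this notebook.

```python
import pandas as pd

# Made-up example: one categorical column and one numeric column
demo = pd.DataFrame({
    "target": ["Fe", "Cu", "Fe"],
    "energy": [10.0, 12.5, 15.0],
})

# Replace the categorical column with one 0/1 indicator column per category
encoded = pd.get_dummies(demo, columns=["target"], dtype=int)
```

Each distinct category becomes its own indicator column (`target_Cu`, `target_Fe`), so the number of columns grows with the number of distinct values.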

### Importing Classified Data

Once the `.csv` files are ready, they can be imported into another notebook for machine learning tasks.


In [None]:
df = pd.read_csv('EXFOR_ProtonReactions_Classified_Group_1.csv')
df

## Handling Missing Values in 'final_A'

### Segregating DataFrames

For Group_4, the attribute 'final_A' may contain `NaN` values. For machine learning, it is recommended to split the rows with and without `NaN` into separate DataFrames.


In [None]:
df = pd.read_csv('EXFOR_ProtonReactions_Classified_Group_4.csv')
# Split the DataFrame in two: rows where 'final_A' is NaN and rows where it is not
df_nan = df[df['final_A'].isna()]
df_not_nan = df[df['final_A'].notna()]
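
A quick way to confirm that such a split loses no rows is to check that the two parts add up to the original. The sketch below does this on a small made-up DataFrame; only the column name `final_A` comes from the notebook.

```python
import numpy as np
import pandas as pd

# Made-up data with missing values in 'final_A'
demo = pd.DataFrame({
    "final_A": [56.0, np.nan, 63.0, np.nan],
    "value": [1, 2, 3, 4],
})

mask = demo["final_A"].isna()
demo_nan = demo[mask]        # rows where 'final_A' is missing
demo_not_nan = demo[~mask]   # rows where 'final_A' is present

# The two parts must partition the original rows
assert len(demo_nan) + len(demo_not_nan) == len(demo)
```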

### Saving Segregated DataFrames

After segregation, the DataFrames can be saved into separate `.csv` files for subsequent machine learning tasks.


In [None]:
# Save the two DataFrames in separate csv files
df_nan.to_csv('EXFOR_ProtonReactions_Classified_Group_4_1.csv', index=False)
df_not_nan.to_csv('EXFOR_ProtonReactions_Classified_Group_4_2.csv', index=False)

In [None]:
# Reload the subset without NaN values to check the exported file
df = pd.read_csv('EXFOR_ProtonReactions_Classified_Group_4_2.csv')
df