# Outliers
This notebook finds and replaces the outliers in a set of data files *(folder structure specified below)*. <br> Outliers are detected with boundaries derived from the first and third quartiles (Q1 and Q3) of each electrode column in each file: with IQR = Q3 - Q1, any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is treated as an outlier. Each outlier is replaced with the closest boundary, for example: <br><br>
If the boundaries are (2, 10) and the value is **-1**, it is replaced with 2 (the closest boundary).
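
The replacement rule above can be sketched with `np.clip`. The boundaries come from the example; the other sample values (5 and 12) are made up for illustration.

```python
import numpy as np

# Boundaries from the example above; the sample values are illustrative only.
lower_bound, upper_bound = 2, 10
values = np.array([-1, 5, 12])

# Values below the lower boundary become 2, values above the upper boundary
# become 10, and values already inside the boundaries are left unchanged.
clipped = np.clip(values, lower_bound, upper_bound)
print(clipped)
```

Here -1 becomes 2, 5 stays as it is, and 12 becomes 10.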

In [1]:
import os
import pandas as pd
import numpy as np
import math

## Folder structure
Both the source and destination paths must be structured as follows (subfolder names must match exactly, without spaces):
- Folder path
    - 7Hz
    - 9Hz
    - 11Hz
    - 13Hz
    - Baseline
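
Since the destination has to contain the same subfolders before any file is written, one way to prepare it is sketched below. The helper `ensure_sub_folders` is hypothetical (it is not part of the original notebook), the subfolder names are the ones the code in this notebook uses (`7Hz`, `9Hz`, and so on), and the sketch runs against a temporary directory rather than the real destination path.

```python
import os
import tempfile

# Subfolder names as used by the labeling code in this notebook.
SUB_FOLDERS = ['7Hz', '9Hz', '11Hz', '13Hz', 'Baseline']

def ensure_sub_folders(root):
    """Hypothetical helper: create any missing class subfolder under root."""
    for name in SUB_FOLDERS:
        # exist_ok=True makes the call safe when the folder already exists.
        os.makedirs(os.path.join(root, name), exist_ok=True)

# Demonstrated on a temporary directory instead of the real destination.
with tempfile.TemporaryDirectory() as tmp_root:
    ensure_sub_folders(tmp_root)
    created = sorted(os.listdir(tmp_root))

print(created)
```

Calling the same helper on the destination path before running the main loop would avoid write errors caused by missing subfolders.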

## File paths
Both the source and destination paths have to be specified. <br>
- ***Source path*** (SRC_PATH) contains the files whose outliers will be replaced.
- ***Destination path*** (DEST_PATH) will contain the files with the outliers replaced.

In [3]:
SRC_PATH = "C:/EEG_Embedded_Systems/DatosProcesados_V2/RawData"
DEST_PATH = "C:/EEG_Embedded_Systems/DatosProcesados_V2/RawData_Outliers"

In [4]:
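def detect_outlier(data):
    """Return the (lower, upper) outlier boundaries based on the IQR rule."""
    # np.percentile does not need sorted input.
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    # Values outside 1.5 * IQR from the quartiles are treated as outliers.
    lower_bound = q1 - (1.5 * iqr)
    upper_bound = q3 + (1.5 * iqr)
    return lower_bound, upper_bound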
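
A small worked example may help to see what the boundaries look like. The sample values below are made up, and the function is repeated in compact form so the sketch runs on its own.

```python
import numpy as np

def detect_outlier(data):
    # Same IQR rule as the notebook's function, repeated for self-containment.
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Illustrative data: eight regular values and one obvious outlier.
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100])

# With NumPy's default linear interpolation, Q1 = 3 and Q3 = 7, so IQR = 4
# and the boundaries are 3 - 6 = -3 and 7 + 6 = 13.
lower_bound, upper_bound = detect_outlier(sample)

# Only the value 100 lies outside (-3, 13); it is replaced with 13.
clipped = np.clip(sample, lower_bound, upper_bound)
print(lower_bound, upper_bound)
print(clipped)
```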

In [5]:
def labeling(sub_folder, num_rows):
    # Map each class subfolder to its integer label.
    label_map = {'7Hz': 1, '9Hz': 2, '11Hz': 3, '13Hz': 4, 'Baseline': 5}
    if sub_folder not in label_map:
        raise ValueError(f"Unknown subfolder: {sub_folder}")
    label = label_map[sub_folder]
    # One label per row, returned as a single-column DataFrame.
    labels = pd.DataFrame(np.full(shape=(num_rows, 1), fill_value=label, dtype=int))
    return labels

In [6]:
for folder in os.listdir(SRC_PATH):
    print(folder)
    src_folder = os.path.join(SRC_PATH, folder)
    files = [e for e in os.listdir(src_folder) if e.endswith('.csv')]
    for index, file_ in enumerate(files):
        src_file = os.path.join(src_folder, file_)
        df = pd.read_csv(src_file)
        df_new = pd.DataFrame()
        for electrode in ['EEG.O1', 'EEG.O2']:
            data = np.array(df[electrode])
            # Boundaries are computed per electrode and per file.
            lower_bound, upper_bound = detect_outlier(data)
            # Replace each outlier with the closest boundary.
            df_new[electrode] = df[electrode].clip(lower=lower_bound, upper=upper_bound)
        df_new['MarkerValueInt'] = labeling(folder, df.shape[0])
        # Write to the matching subfolder of the destination path.
        dest_file = os.path.join(DEST_PATH, folder, 'muestra' + str(index) + '.csv')
        df_new.to_csv(dest_file, index=False)

11Hz
13Hz
7Hz
9Hz
Baseline
