<a href="https://colab.research.google.com/github/pathtosfion/BCI_Model/blob/main/Data_Prep_EEG_DIVIDICUS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **This notebook performs the following:**


- *Loads the CSV file into a pandas DataFrame.*

- *Drops the*  **'other_vote'** *column.*

- *Drops rows where* **'other'** *is the expert consensus.*

- *Defines a function* **'calculate_trust_score'** *to calculate the trust score for each expert's vote.*
- *Applies this function to the DataFrame to create a new* **'trust_level'** *column.*
- *Drops the* 'expert_consensus' *and vote columns as they are no longer needed.*
- *Saves the processed DataFrame to a new CSV file.*

### **Please ensure that the CSV file is in the same directory as your Jupyter notebook or provide the correct path to the file. Also, adjust the column names and vote categories as needed for your specific dataset.**
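
One way to make the path requirement explicit is to resolve the file location up front and fail early with a readable message. This is a minimal sketch, assuming the file is called `train.csv` as in the cells below; `resolve_csv` is a hypothetical helper and not part of the original notebook.

```python
from pathlib import Path


def resolve_csv(path="train.csv"):
    """Return an absolute Path to the CSV, or raise if the file is missing."""
    csv_path = Path(path).expanduser().resolve()
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"CSV not found at {csv_path}; pass the correct path to the file."
        )
    return csv_path
```

The returned path can be passed directly to `pd.read_csv`, for example `pd.read_csv(resolve_csv('train.csv'))`.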

In [None]:
import pandas as pd
import numpy as np

## **Load the CSV file**

In [None]:
df = pd.read_csv('train.csv')

## **Display the first few rows of the DataFrame to understand the data structure**

In [None]:
print(df.head())

## **Drop the** *'other_vote'* **column**

In [None]:
df.drop('other_vote', axis=1, inplace=True)

## **Drop rows where** *'other'* **is the expert consensus**

In [None]:
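# Keep only rows whose consensus is not 'other' (case-insensitive).
# .copy() makes df an independent frame, so adding 'trust_level' later
# does not trigger pandas' SettingWithCopyWarning.
df = df[df['expert_consensus'].str.lower() != 'other'].copy()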
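
The effect of the filter can be checked on a toy frame. This is only an illustrative sketch: the rows below are made up, and only the `expert_consensus` column name comes from the notebook.

```python
import pandas as pd

# Toy rows (hypothetical values) with mixed capitalization of 'other'.
toy = pd.DataFrame({"expert_consensus": ["Seizure", "Other", "other", "GPD"]})

# Lower-casing first makes the comparison case-insensitive;
# .copy() returns an independent frame rather than a slice of `toy`.
kept = toy[toy["expert_consensus"].str.lower() != "other"].copy()
```

Note that a missing (`NaN`) consensus value would compare as "not equal" and therefore be kept by this filter.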


## **Define the expert categories**
 - *and their corresponding vote columns*

In [None]:
expert_categories = ['seizure', 'lpd', 'irpd', 'grda', 'gpd']
vote_columns = [f'{expert}_vote' for expert in expert_categories]
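
Because every name in `expert_categories` is expected to have a matching `<name>_vote` column, it may help to verify that before computing anything. A small sketch, assuming that naming pattern; `missing_vote_columns` is a hypothetical helper, and the toy frame is for illustration only.

```python
import pandas as pd


def missing_vote_columns(frame, categories):
    """Return the '<category>_vote' column names that are absent from `frame`."""
    expected = [f"{name}_vote" for name in categories]
    return [col for col in expected if col not in frame.columns]


# Toy frame with made-up columns, not the real dataset.
toy = pd.DataFrame({"seizure_vote": [3], "gpd_vote": [0]})
missing = missing_vote_columns(toy, ["seizure", "lpd", "gpd"])
```

On the real data, a non-empty result from `missing_vote_columns(df, expert_categories)` would indicate a category name that needs adjusting.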

## **To express a trust level per row, each category's vote count is divided by 19 to normalize it and then weighted by the logarithm of that same count, so categories with more votes contribute more. The weighted values are summed to give one trust score per row. Rows where 'other' is the expert consensus were dropped earlier, as they are not needed for the trust level calculation.**

## **Function to calculate the trust score**
- *for each expert's vote*

In [None]:
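MAX_VOTES = 19  # divisor used for normalization; adjust if your data differs


def calculate_trust_score(row, expert):
    """Return the weighted, normalized vote for one category in one row."""
    vote_count = row[f'{expert}_vote']
    # Scale by MAX_VOTES (the result stays within 0 to 1 only if no count exceeds it)
    normalized_vote = vote_count / MAX_VOTES
    # Weight by log(1 + votes); log1p(0) == 0, so zero votes contribute nothing
    weight = np.log1p(vote_count)
    return normalized_vote * weight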
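
For a single vote count `v`, the function above evaluates `(v / 19) * ln(v + 1)`. The sketch below repeats that arithmetic on bare numbers to show how the score grows with the count; `trust_score` and its `max_votes` parameter are hypothetical names used only for this illustration.

```python
import numpy as np


def trust_score(vote_count, max_votes=19):
    # Same arithmetic as calculate_trust_score, applied to a plain number.
    return (vote_count / max_votes) * np.log(vote_count + 1)


# Scores for a few example vote counts.
scores = {v: float(trust_score(v)) for v in (0, 1, 3, 19)}
```

Zero votes give a score of zero, and the score rises faster than linearly with the count. At 19 votes the score equals `ln(20)`, so the weighted value is not itself confined to the 0 to 1 range.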


## **Calculate the trust score for each expert's vote**
- *and sum them to get the total trust level*

In [None]:
df['trust_level'] = df.apply(lambda row: sum(calculate_trust_score(row, expert) for expert in expert_categories), axis=1)

## **Drop the expert_consensus and vote columns**
 - *as they are no longer needed*


In [None]:
columns_to_drop = ['expert_consensus'] + vote_columns
df.drop(columns_to_drop, axis=1, inplace=True)

## **Display the processed DataFrame**

In [None]:
print(df.head())

## **Save the processed data to a new CSV file**

In [None]:
df.to_csv('processed_train.csv', index=False)
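
A quick round trip can confirm what `index=False` does: the row index is not written, so reading the file back yields only the data columns. A minimal sketch on a toy frame; the column `row_id`, the values, and the file name `processed_toy.csv` are made up for illustration.

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame({"row_id": [1, 2], "trust_level": [0.2, 1.1]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "processed_toy.csv")
    toy.to_csv(out_path, index=False)   # no index column is written
    reloaded = pd.read_csv(out_path)    # read back while the file still exists
```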