# UC San Diego: Neural Data Science
## Final Project Title (change this to your project's title)

## Permissions

Place an `X` in the appropriate bracket below to specify whether you would like your group's project to be made available to the public. (Note that student names will be included, but PIDs will be scraped from any groups who include their PIDs.)

* [  ] YES - make available
* [  ] NO - keep private

# Names

- Black Panther
- Black Widow
- Hulk
- Iron Man
- Thor
- Vision
- Wanda

# Overview

* Write a clear, 3-4 sentence summary of what you did and why.

<a id='research_question'></a>
# Research Question

* One sentence that describes the question you address in your project. Make sure what you’re measuring (variables) to answer your question is clear!


<a id='background'></a>

## Background & Prior Work

* In 2-3 paragraphs, describe the motivation behind your question. What’s the big picture, and why is it interesting? Are there published papers addressing aspects of your question? You should cite at least three primary references here. You are welcome to replicate published papers using publicly available data, just cite them and explain why!

References (include links):
- 1)
- 2)

# Hypothesis


*Fill in your hypotheses here*

# Dataset(s)

*Fill in your dataset information here*

(Copy this information for each dataset)
- Dataset Name:
- Link to the dataset:
- Number of observations:

1-2 sentences describing each dataset. 

If you plan to use multiple datasets, add 1-2 sentences about how you plan to combine these datasets.

# Data Wrangling

* Our data wrangling uses the MNE, Pandas, SciPy, and Librosa libraries. MNE reads the EEG recordings, which are stored as `.edf` files. Pandas loads the tabular data: here, the events file for each run, which gives the onset time of each song within its EEG recording, the song type, and the song file. We then load each song's `.wav` file with `scipy.io.wavfile.read` and use Librosa to extract the audio features we need. Because of how the original dataset is formatted, we wrapped all of the wrangling, cleaning, and processing code in a nested for loop so that these steps run efficiently. The snippets below show the wrangling steps for one randomly selected run, outside of that loop.
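
The nested loop mentioned above might be organized roughly as follows. This is a minimal sketch, assuming the directory layout implied by the file paths loaded in this notebook (`sub-02/eeg/sub-02_task-run3_eeg.edf`). The helper name `iter_run_paths` and the subject and run values are hypothetical and are not taken from the dataset documentation.

```python
from pathlib import Path


def iter_run_paths(root, subjects, runs):
    """Yield (subject, run, edf_path, events_path) for every subject/run pair.

    Assumes a layout like <root>/sub-02/eeg/sub-02_task-run3_eeg.edf.
    """
    root = Path(root)
    for subject in subjects:      # outer loop: one pass per participant
        sub_id = f"sub-{subject:02d}"
        eeg_dir = root / sub_id / "eeg"
        for run in runs:          # inner loop: one pass per recording run
            stem = f"{sub_id}_task-run{run}"
            yield (subject, run,
                   eeg_dir / f"{stem}_eeg.edf",
                   eeg_dir / f"{stem}_events.tsv")


# Hypothetical subject and run values, for illustration only.
for subject, run, edf_path, events_path in iter_run_paths(".", subjects=[2], runs=[3]):
    print(edf_path.as_posix(), events_path.as_posix())
```

Keeping the path construction in a generator separates "which files exist" from "what to do with each file", so the loading and feature-extraction steps can be tested on a single run before being applied to all of them.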

## Loading in EEG Data

In [8]:
import os
import warnings

import librosa
import librosa.display
import matplotlib.pyplot as plt
import mne
import numpy as np
import pandas as pd
import scipy
from scipy.io import wavfile

# Silence library warnings before loading the recording
warnings.filterwarnings('ignore')

# Load the raw EEG recording for subject 2, run 3
b = mne.io.read_raw_edf('./sub-02/eeg/sub-02_task-run3_eeg.edf')
b

Extracting EDF parameters from /Users/rakesh/Downloads/COGS138/COGS138_Music/sub-02/eeg/sub-02_task-run3_eeg.edf...
EDF file detected
Setting channel info structure...
Creating raw.info structure...


| Field | Value |
|---|---|
| Measurement date | Unknown |
| Experimenter | Unknown |
| Digitized points | Not available |
| Good channels | 37 EEG |
| Bad channels | |
| EOG channels | Not available |
| ECG channels | Not available |
| Sampling frequency | 1000.00 Hz |
| Highpass | 0.00 Hz |
| Lowpass | 500.00 Hz |


## Loading in the song onsets, types, and files into a Pandas DataFrame

In [4]:
# Load the events file for this run (tab-separated)
init_df = pd.read_csv('./sub-02/eeg/sub-02_task-run3_events.tsv', sep='\t')
# Stimulus file names; a song's position in this list is used to look up its file
wav_files = [
    'hvha1.wav', 'hvha10.wav', 'hvha11.wav', 'hvha12.wav', 'hvha2.wav', 'hvha3.wav', 'hvha4.wav', 'hvha5.wav', 'hvha6.wav', 'hvha7.wav', 'hvha8.wav', 'hvha9.wav',
    'hvla1.wav', 'hvla10.wav', 'hvla11.wav', 'hvla12.wav', 'hvla2.wav', 'hvla3.wav', 'hvla4.wav', 'hvla5.wav', 'hvla6.wav', 'hvla7.wav', 'hvla8.wav', 'hvla9.wav',
    'hvna1.wav', 'hvna10.wav', 'hvna11.wav', 'hvna12.wav', 'hvna2.wav', 'hvna3.wav', 'hvna4.wav', 'hvna5.wav', 'hvna6.wav', 'hvna7.wav', 'hvna8.wav', 'hvna9.wav',
    'lvha1.wav', 'lvha10.wav', 'lvha11.wav', 'lvha12.wav', 'lvha2.wav', 'lvha3.wav', 'lvha4.wav', 'lvha5.wav', 'lvha6.wav', 'lvha7.wav', 'lvha8.wav', 'lvha9.wav',
    'lvla1.wav', 'lvla10.wav', 'lvla11.wav', 'lvla12.wav', 'lvla2.wav', 'lvla3.wav', 'lvla4.wav', 'lvla5.wav', 'lvla6.wav', 'lvla7.wav', 'lvla8.wav', 'lvla9.wav',
    'lvna1.wav', 'lvna10.wav', 'lvna11.wav', 'lvna12.wav', 'lvna2.wav', 'lvna3.wav', 'lvna4.wav', 'lvna5.wav', 'lvna6.wav', 'lvna7.wav', 'lvna8.wav', 'lvna9.wav',
    'nvha1.wav', 'nvha10.wav', 'nvha11.wav', 'nvha12.wav', 'nvha2.wav', 'nvha3.wav', 'nvha4.wav', 'nvha5.wav', 'nvha6.wav', 'nvha7.wav', 'nvha8.wav', 'nvha9.wav',
    'nvla1.wav', 'nvla10.wav', 'nvla11.wav', 'nvla12.wav', 'nvla2.wav', 'nvla3.wav', 'nvla4.wav', 'nvla5.wav', 'nvla6.wav', 'nvla7.wav', 'nvla8.wav', 'nvla9.wav',
    'nvna1.wav', 'nvna10.wav', 'nvna11.wav', 'nvna12.wav', 'nvna2.wav', 'nvna3.wav', 'nvna4.wav', 'nvna5.wav', 'nvna6.wav', 'nvna7.wav', 'nvna8.wav', 'nvna9.wav',
]
song_types = ['LVLA', 'NVLA', 'HVLA', 'LVNA', 'NVNA', 'HVNA', 'LVHA', 'NVHA', 'HVHA']
# Rows are read in pairs: even rows give the onset and song type,
# odd rows give the song number (offset by 100)
df = pd.DataFrame({
    'onset': init_df['onset'].iloc[::2].values,
    'song_type': init_df['trial_type'].iloc[::2].values,
    'song_value': init_df['trial_type'].iloc[1::2].values,
})
df['song_value'] = df['song_value'] - 100
# Look up each song's file name and category label (both lists are 1-indexed here)
list_of_songs = [wav_files[i - 1] for i in df['song_value']]
df['song_files'] = list_of_songs
df['song_longName'] = [song_types[i - 1] for i in df['song_type']]
df

| | onset | song_type | song_value | song_files | song_longName |
|---|---|---|---|---|---|
| 0 | 9.274 | 4 | 105 | nvna6.wav | LVNA |
| 1 | 57.172 | 4 | 81 | nvha6.wav | LVNA |
| 2 | 108.175 | 3 | 62 | lvna10.wav | HVLA |
| 3 | 163.586 | 6 | 92 | nvla5.wav | HVNA |
| 4 | 214.362 | 7 | 44 | lvha5.wav | LVHA |
| 5 | 263.112 | 6 | 5 | hvha2.wav | HVNA |
| 6 | 322.994 | 5 | 6 | hvha3.wav | NVNA |
| 7 | 373.027 | 9 | 8 | hvha5.wav | HVHA |
| 8 | 436.314 | 3 | 99 | nvna11.wav | HVLA |
| 9 | 487.55 | 2 | 18 | hvla3.wav | NVLA |
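
To line each song up with the EEG trace, the onset times (in seconds) have to be turned into sample indices. The following is a minimal sketch, assuming the onsets are measured from the start of the recording and using the 1000 Hz sampling frequency reported for this run. The helper name `onsets_to_samples` is hypothetical.

```python
import numpy as np


def onsets_to_samples(onsets_sec, sfreq):
    """Convert onset times in seconds to the nearest integer sample index."""
    onsets_sec = np.asarray(onsets_sec, dtype=float)
    # Round before casting so floating-point error cannot shift an index by one
    return np.round(onsets_sec * sfreq).astype(int)


# First three onsets from the events table above, at 1000 Hz.
samples = onsets_to_samples([9.274, 57.172, 108.175], sfreq=1000.0)
print(samples)
```

Rounding rather than truncating matters here because a product such as `108.175 * 1000` is not guaranteed to be represented exactly in floating point.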


## Accessing the .wav files and relevant MFCC data for each one

In [7]:
wav_mfcc_data = []
warnings.filterwarnings('ignore')
for song_file in list_of_songs:
    # wavfile.read returns the sample rate and the audio samples
    rate, wav_file = wavfile.read('./stimuli/' + song_file)
    # Transpose so any channel axis comes first, and cast to float for Librosa
    wav_array = np.array(wav_file.T, dtype=float)
    # tempo_vals = librosa.feature.tempogram(y=wav_array, sr=1000)
    # Find largest coefficient for dct/mfcc
    # NOTE: sr is hard-coded to 1000 here rather than taken from `rate`
    mfcc_vals = librosa.feature.mfcc(y=wav_array, sr=1000)
    wav_mfcc_data.append(mfcc_vals)
df['MFCC'] = pd.Series(wav_mfcc_data)
df

| | onset | song_type | song_value | song_files | song_longName | MFCC |
|---|---|---|---|---|---|---|
| 0 | 9.274 | 4 | 105 | nvna6.wav | LVNA | [[[1687.9974894669529, 1687.9974894669529, 168... |
| 1 | 57.172 | 4 | 81 | nvha6.wav | LVNA | [[[594.5762101242709, 594.5762101242709, 594.5... |
| 2 | 108.175 | 3 | 62 | lvna10.wav | HVLA | [[[1687.5354002213599, 1687.5354002213599, 168... |
| 3 | 163.586 | 6 | 92 | nvla5.wav | HVNA | [[[605.0384535866119, 605.0384535866119, 605.0... |
| 4 | 214.362 | 7 | 44 | lvha5.wav | LVHA | [[[581.8458843476286, 581.8458843476286, 581.8... |
| 5 | 263.112 | 6 | 5 | hvha2.wav | HVNA | [[[574.7667146407146, 574.7667146407146, 574.7... |
| 6 | 322.994 | 5 | 6 | hvha3.wav | NVNA | [[[585.8629369343998, 585.8629369343998, 585.8... |
| 7 | 373.027 | 9 | 8 | hvha5.wav | HVHA | [[[742.9234046007232, 766.0213869782094, 766.4... |
| 8 | 436.314 | 3 | 99 | nvna11.wav | HVLA | [[[1666.3546467678034, 1666.3546467678034, 166... |
| 9 | 487.55 | 2 | 18 | hvla3.wav | NVLA | [[[589.5659364546844, 589.5659364546844, 589.5... |
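
The `MFCC` entries above are nested three levels deep, which is what one would expect if the `.wav` stimuli have more than one channel. If a single MFCC matrix per song is wanted, one option is to average the channels before feature extraction. This is a sketch of that option, not the approach used in this notebook. `to_mono` is a hypothetical helper, and it assumes the array layout returned by `scipy.io.wavfile.read`, which is `(n_samples, n_channels)` for multi-channel files.

```python
import numpy as np


def to_mono(wav_data):
    """Average the channels of an (n_samples, n_channels) array into a 1-D float signal."""
    wav_data = np.asarray(wav_data, dtype=float)
    if wav_data.ndim == 1:        # already mono, nothing to average
        return wav_data
    return wav_data.mean(axis=1)  # average across the channel axis


# Toy stereo signal: 4 samples, 2 channels.
stereo = np.array([[0, 2], [4, 4], [-2, 2], [10, 0]], dtype=np.int16)
mono = to_mono(stereo)
print(mono.shape)
```

Casting to float before averaging avoids integer overflow and truncation when the file stores 16-bit samples.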


# Data Cleaning

* Describe your data cleaning steps here.

In [2]:
## YOUR CODE HERE
## FEEL FREE TO ADD MULTIPLE CELLS PER SECTION

# Data Visualization

* This is a good place for some relevant visualizations related to any exploratory data analyses (EDA) you did after the basic cleaning.

# Data Analysis & Results

* Include cells that describe the steps in your data analysis.
* You'll likely also have some visualizations here as well.

In [1]:
## YOUR CODE HERE
## FEEL FREE TO ADD MULTIPLE CELLS PER SECTION

# Conclusion & Discussion

* Discuss your results and how they address your experimental question(s).
* Discuss the limitations of your analyses.
* You can also discuss future directions you'd like to pursue.