# MJF-1B: Parkinson's Freezing of Gait 
Link to the Kaggle competition dataset and description:
- https://www.kaggle.com/competitions/tlvmc-parkinsons-freezing-gait-prediction/data
## Objective:
- Detect the start and stop of each freezing episode, and the occurrence in each series of three types of freezing of gait (FOG) events:
  - Start Hesitation
  - Turn
  - Walking

## File and Field Description
- train/: Folder containing the training-set data series, split across three subfolders: tdcsfog/, defog/, and notype/.
- Series in the notype folder are from the defog dataset but lack event-type annotations.
- The fields present in these series vary by folder.
  - Time: An integer timestep. Series from the tdcsfog dataset are recorded at 128Hz (128 timesteps per second), while series from the defog and daily series are recorded at 100Hz (100 timesteps per second).
  - AccV, AccML, and AccAP: Acceleration from a lower-back sensor on three axes: V - vertical, ML - mediolateral, AP - anteroposterior. Data is in units of m/s^2 for tdcsfog/ and g for defog/ and notype/.
  - StartHesitation, Turn, Walking: Indicator variables for the occurrence of each of the event types.
  - Event: Indicator variable for the occurrence of any FOG-type event. Present only in the notype series, which lack type-level annotations.
  - Valid: During video annotation, it was sometimes hard for the annotator to decide whether the subject had an akinetic (i.e., essentially no movement) FOG or had stopped voluntarily. Only event annotations where Valid is true should be considered unambiguous.
  - Task: Series were only annotated where this value is true. Portions marked false should be considered unannotated.
    
- Note that the Valid and Task fields are only present in the defog dataset. They are not relevant for the tdcsfog data.
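
Because the sampling rate and the acceleration units differ between folders, it can help to put the series on a common scale before comparing them. The following is a minimal sketch, not part of the original notebook: the helper name `harmonize`, the `TimeSec` column, and the use of standard gravity (9.80665 m/s^2) as the conversion factor are all assumptions made for illustration.

```python
import pandas as pd

# Sampling rates as stated in the field description
SAMPLE_RATE_HZ = {"tdcsfog": 128, "defog": 100, "notype": 100}
# Assumed conversion factor: standard gravity in m/s^2
G_TO_MS2 = 9.80665
ACC_COLS = ["AccV", "AccML", "AccAP"]

def harmonize(df, source):
    """Return a copy with a TimeSec column and acceleration in m/s^2.

    `source` is one of 'tdcsfog', 'defog', or 'notype'.
    """
    out = df.copy()
    # Convert integer timesteps to seconds using the folder's sampling rate
    out["TimeSec"] = out["Time"] / SAMPLE_RATE_HZ[source]
    # defog/ and notype/ are recorded in g; tdcsfog/ is already in m/s^2
    if source in ("defog", "notype"):
        out[ACC_COLS] = out[ACC_COLS] * G_TO_MS2
    return out

# Tiny hypothetical series in defog-style units
toy = pd.DataFrame({
    "Time": [0, 100, 200],
    "AccV": [-1.0, -1.0, -1.0],
    "AccML": [0.0, 0.0, 0.0],
    "AccAP": [0.0, 0.0, 0.0],
})
toy_h = harmonize(toy, "defog")
print(toy_h)
```

The function returns a copy rather than modifying its input, so the raw values stay available for inspection.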

In [9]:
# Import necessary Python libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [12]:
# List all relevant files, folders, and subfolders 
all_files = os.listdir('../input/tlvmc-parkinsons-freezing-gait-prediction')
print('All competition datasets:')
print(all_files)

train_files = os.listdir('../input/tlvmc-parkinsons-freezing-gait-prediction/train')
print('\nFolders in train:')
print(train_files)

defog_path = '../input/tlvmc-parkinsons-freezing-gait-prediction/train/defog'
defog_files = os.listdir(defog_path)
print('\nFiles in defog:')
print(f'{defog_files[:10]}... plus {len(defog_files)-10} more remaining csv files')

tdcsfog_path = '../input/tlvmc-parkinsons-freezing-gait-prediction/train/tdcsfog'
tdcsfog_files = os.listdir(tdcsfog_path)
print('\nFiles in tdcsfog:')
print(f'{tdcsfog_files[:10]}... plus {len(tdcsfog_files)-10} more remaining csv files')


All competition datasets:
['sample_submission.csv', 'unlabeled', 'subjects.csv', 'tasks.csv', 'defog_metadata.csv', 'daily_metadata.csv', 'test', 'events.csv', 'tdcsfog_metadata.csv', 'train']

Folders in train:
['defog', 'tdcsfog', 'notype']

Files in defog:
['be9d33541d.csv', '4c3aa8ea6e.csv', '18e7abc37e.csv', '6a20935af5.csv', 'e642d9ea5f.csv', '3f3b08f78d.csv', '68e7e02a47.csv', 'f17eacf7d8.csv', '3f970065e5.csv', '7030643376.csv']... plus 81 more remaining csv files

Files in tdcsfog:
['a171e61840.csv', '4171ea3a0c.csv', '0f985a8440.csv', '5d320ade20.csv', 'ae8c67086b.csv', 'b7214cbf21.csv', 'e18fcafee8.csv', '79568b8e25.csv', 'feba449e1a.csv', '7ebad45aec.csv']... plus 823 more remaining csv files


In [15]:
# Function to read all csv files in folder
def read_dataset(files, folder):
    for fname in files:
        yield pd.read_csv(os.path.join(folder, fname))

# Combine CSVs in defog into one dataframe 
df_defog = pd.concat(read_dataset(defog_files, defog_path), ignore_index=True)
print("\nCombined defog shape:", df_defog.shape)

# Combine CSVs in tdcsfog into one dataframe
df_tdcsfog = pd.concat(read_dataset(tdcsfog_files, tdcsfog_path), ignore_index=True)
print("\nCombined tdcsfog shape:", df_tdcsfog.shape)


Combined defog shape: (13525702, 9)

Combined tdcsfog shape: (7062672, 7)
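
Concatenating every CSV into a single dataframe discards the information about which file each row came from, and `Time` restarts at 0 in each file. If per-series analysis is needed later, one option is to tag each row with its source file. The sketch below is an illustration only: `read_dataset_with_id` and the `Id` column are hypothetical names, and the demo writes two small temporary CSVs instead of reading the competition data.

```python
import os
import tempfile
import pandas as pd

def read_dataset_with_id(files, folder):
    """Yield one DataFrame per CSV, tagged with the file stem as a series Id."""
    for fname in files:
        df = pd.read_csv(os.path.join(folder, fname))
        # Hypothetical 'Id' column: file name without the .csv extension
        df["Id"] = os.path.splitext(fname)[0]
        yield df

# Demo on two small temporary files (stand-ins for the real series)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("aaa.csv", "bbb.csv"):
        pd.DataFrame({"Time": [0, 1], "AccV": [0.0, 0.1]}).to_csv(
            os.path.join(tmp, name), index=False
        )
    demo = pd.concat(
        read_dataset_with_id(["aaa.csv", "bbb.csv"], tmp), ignore_index=True
    )

print(demo)
```

On the full dataset, converting `Id` to a categorical dtype after concatenation (`demo["Id"].astype("category")`) should reduce the memory cost of the extra column.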


In [14]:
# Preview of defog data
df_defog.head(10)

Unnamed: 0,Time,AccV,AccML,AccAP,StartHesitation,Turn,Walking,Valid,Task
0,0,-1.002697,0.022371,0.068304,0,0,0,False,False
1,1,-1.002641,0.019173,0.066162,0,0,0,False,False
2,2,-0.99982,0.019142,0.067536,0,0,0,False,False
3,3,-0.998023,0.018378,0.068409,0,0,0,False,False
4,4,-0.998359,0.016726,0.066448,0,0,0,False,False
5,5,-1.002969,0.016203,0.065118,0,0,0,False,False
6,6,-1.010631,0.014523,0.062518,0,0,0,False,False
7,7,-1.015932,0.014735,0.056944,0,0,0,False,False
8,8,-1.016709,0.020147,0.054147,0,0,0,False,False
9,9,-1.016231,0.022617,0.055152,0,0,0,False,False
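
In the preview above, `Valid` and `Task` are both False for the first rows, which by the field description means those rows are unannotated. One way to restrict the defog data to rows that were annotated and unambiguous is a boolean mask, sketched below on a small made-up frame. The function name `annotated_unambiguous` is hypothetical, and whether to drop or merely flag such rows is a modelling choice this notebook does not settle.

```python
import pandas as pd

def annotated_unambiguous(df):
    """Keep rows that were annotated (Task) and unambiguous (Valid)."""
    mask = df["Task"] & df["Valid"]
    return df.loc[mask]

# Made-up rows for illustration; real defog files have the same two flags
toy = pd.DataFrame({
    "Time": [0, 1, 2, 3],
    "Turn": [0, 1, 1, 0],
    "Valid": [False, True, False, True],
    "Task": [False, True, True, True],
})
kept = annotated_unambiguous(toy)
print(kept)
```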


In [16]:
# Preview of tdcsfog data
df_tdcsfog.head(10)

Unnamed: 0,Time,AccV,AccML,AccAP,StartHesitation,Turn,Walking
0,0,-9.66589,0.04255,0.184744,0,0,0
1,1,-9.672969,0.049217,0.184644,0,0,0
2,2,-9.67026,0.03362,0.19379,0,0,0
3,3,-9.673356,0.035159,0.184369,0,0,0
4,4,-9.671458,0.043913,0.197814,0,0,0
5,5,-9.66862,0.033002,0.207051,0,0,0
6,6,-9.66781,0.03376,0.205178,0,0,0
7,7,-9.668493,0.037686,0.209401,0,0,0
8,8,-9.66434,0.035,0.195657,0,0,0
9,9,-9.677481,0.024179,0.184481,0,0,0
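
The `StartHesitation`, `Turn`, and `Walking` columns are per-timestep indicators, while the objective is stated in terms of the start and stop of each episode. A minimal sketch of recovering episode boundaries from one indicator column follows. The helper `find_episodes` is a hypothetical name, and the example uses a short made-up series. On the real data it would need to be applied to one series at a time, since the combined dataframes place different recordings back to back.

```python
import numpy as np
import pandas as pd

def find_episodes(indicator):
    """Return (start, stop) index pairs for each run of 1s; stop is exclusive."""
    x = np.asarray(indicator, dtype=int)
    # Pad with zeros so runs touching either end still produce a transition
    padded = np.concatenate(([0], x, [0]))
    change = np.diff(padded)
    starts = np.flatnonzero(change == 1)   # 0 -> 1 transitions
    stops = np.flatnonzero(change == -1)   # 1 -> 0 transitions
    return list(zip(starts.tolist(), stops.tolist()))

# Made-up indicator series for illustration
turn = pd.Series([0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
episodes = find_episodes(turn)
print(episodes)  # [(1, 3), (5, 8), (9, 10)]
```

Dividing the indices by the sampling rate (128 for tdcsfog, 100 for defog) converts the boundaries to seconds.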
