# Introduction

The goal of this competition is to **detect freezing of gait (FOG)**, a debilitating symptom that afflicts many people **with Parkinson’s disease**. The task is to **develop a machine learning model trained on data collected from a wearable 3D lower back sensor** to better understand **when and why FOG episodes occur**.

# Import Libraries

In [1]:
!ls ../input/tsflex/ts_flex
!pip install tsflex --no-index --find-links=file:///kaggle/input/tsflex/ts_flex 

colorama-0.4.6-py2.py3-none-any.whl
dill-0.3.6-py3-none-any.whl
multiprocess-0.70.14-py38-none-any.whl
numpy-1.24.3-cp38-cp38-win_amd64.whl
pandas-1.5.3-cp38-cp38-win_amd64.whl
python_dateutil-2.8.2-py2.py3-none-any.whl
pytz-2023.3-py2.py3-none-any.whl
six-1.16.0-py2.py3-none-any.whl
tqdm-4.65.0-py3-none-any.whl
tsflex-0.3.0-py3-none-any.whl
Looking in links: file:///kaggle/input/tsflex/ts_flex
Processing /kaggle/input/tsflex/ts_flex/tsflex-0.3.0-py3-none-any.whl
Installing collected packages: tsflex
Successfully installed tsflex-0.3.0

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
from imblearn.over_sampling import SMOTE
import scipy.stats as ss
from tsflex.features import MultipleFeatureDescriptors, FeatureCollection, FeatureDescriptor
from tsflex.features.utils import make_robust
import warnings




In addition, the metadata files identify each series: **tdcsfog_metadata.csv** by **a unique Subject, Visit, Test, and Medication condition**, and **defog_metadata.csv** by a unique Subject, Visit, and Medication condition. The cell below loads the defog metadata.

In [3]:
# defog metadata file
defog_metadata = pd.read_csv("/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/defog_metadata.csv")
defog_metadata.head(5)

Unnamed: 0,Id,Subject,Visit,Medication
0,02ab235146,e1f62e,2,on
1,02ea782681,ae2d35,2,on
2,06414383cf,8c1f5e,2,off
3,092b4c1819,2874c5,1,off
4,0a900ed8a2,0e3d49,2,on


# Initialize the extraction pipeline

**Pipeline hyperparameters**

In [4]:
# Method of feature extraction: "whole_dataset" concatenates all data and runs feature extraction on the result
# (this assumes that all data is chronologically ordered to some extent);
# "individual_files" performs feature extraction file by file and collects the results.
method = "whole_dataset"

# The window label decides which timestamp the results for a window are tied to. E.g. "middle" means that the features generated by a window are added to the row
# containing the timestamp in the middle of the window
window_label = "middle"

# The windows array decides the size of the window that will slide over the data
windows = [120]

# The strides array decides with what size steps the window is going to slide over the data
strides = [1] 

# The columns that the features will be extracted from
series_names = ["AccV", "AccML", "AccAP"]

print("\nSettings~ \nWindow_size: " + str(windows[0]) +"\nWindow_label: " + str(window_label) + "\nMethod: " + str(method) + "\nSeries_names: " + str(series_names))


Settings~ 
Window_size: 120
Window_label: middle
Method: whole_dataset
Series_names: ['AccV', 'AccML', 'AccAP']
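
To make the window, stride, and window label settings concrete, here is a minimal NumPy sketch that is independent of tsflex. It slides a window over a toy signal and reports the index each window would be tied to. The helper name `window_positions` is hypothetical, and the half-open `[start, start + window)` convention is an assumption made for illustration; tsflex's own boundary handling may differ.

```python
import numpy as np

def window_positions(n_samples, window, stride, label="middle"):
    """Return (start, tied_index) pairs for half-open windows [start, start + window).

    Hypothetical helper for illustration only; it is not part of tsflex.
    """
    starts = np.arange(0, n_samples - window + 1, stride)
    # "begin" ties to the first sample, "middle" to the centre,
    # "end" to the exclusive end of the window.
    offset = {"begin": 0, "middle": window // 2, "end": window}[label]
    return [(int(s), int(s + offset)) for s in starts]

# Toy example: 10 samples, window of 4, stride of 2.
pairs = window_positions(n_samples=10, window=4, stride=2, label="middle")
print(pairs)
```

With a stride of 1, as in the settings above, almost every input row gets its own window, which is why the feature table ends up nearly as long as the raw data.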


In [18]:
# Tsflex feature collection pipeline

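# The custom functions used to extract features from each window
# "slope" is the relative change between the first and last sample of the window (0 if the first sample is 0)
def slope(x): return (x[-1] - x[0]) / x[0] if x[0] else 0
# Mean absolute difference between consecutive samples
def abs_diff_mean(x): return np.mean(np.abs(x[1:] - x[:-1])) if len(x) > 1 else 0
# Standard deviation of the differences between consecutive samples
def diff_std(x): return np.std(x[1:] - x[:-1]) if len(x) > 1 else 0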



# funcs = [make_robust(f) for f in [np.median, np.min,np.var, np.max, np.std, np.mean]]
funcs = [make_robust(f) for f in [np.min,np.var, np.max, np.std, np.mean, slope, ss.skew, ss.kurtosis, abs_diff_mean, diff_std, np.sum]]

fc_train = FeatureCollection(
    MultipleFeatureDescriptors(
          functions=funcs,
          series_names=series_names,
          windows=windows,
          strides=strides[0],
    )
)

# # Specifically for the dependent variables
# npmean = make_robust(np.mean)

# fc_train.add(FeatureDescriptor(npmean, "StartHesitation", windows[0], strides[0]))
# fc_train.add(FeatureDescriptor(npmean, "Walking", windows[0], strides[0]))
# fc_train.add(FeatureDescriptor(npmean, "Turn", windows[0], strides[0]))
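
As a quick check of what the three custom feature functions compute, the sketch below applies them to a small toy array. The function bodies are copied from the cell above; the array `x` is made up for illustration. Note that `slope` here is the relative change between the first and last sample, not a fitted regression slope.

```python
import numpy as np

def slope(x): return (x[-1] - x[0]) / x[0] if x[0] else 0
def abs_diff_mean(x): return np.mean(np.abs(x[1:] - x[:-1])) if len(x) > 1 else 0
def diff_std(x): return np.std(x[1:] - x[:-1]) if len(x) > 1 else 0

# Toy window of four samples (illustrative values only).
x = np.array([2.0, 4.0, 3.0, 6.0])

print("slope:", slope(x))                  # (6 - 2) / 2
print("abs_diff_mean:", abs_diff_mean(x))  # mean of |2|, |-1|, |3|
print("diff_std:", diff_std(x))            # std of the differences [2, -1, 3]
```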

In [22]:
def extract_features(method = "whole_dataset", df_list = None, window_label = "middle", fc = None, test = False):
    """
    method: Method of feature extraction (perform on a file-per-file basis or on the entire set)
    df_list: The lists that contain the tdcsfog and defog dataframes
    window_label: The id that the extracted features are tied to for every window
    fc: The feature extraction pipeline
    test: Whether this is the test phase or not."""
    
    defog_tot = pd.DataFrame()
    tdcsfog_tot = pd.DataFrame()

    if df_list is None or len(df_list) <= 1:
        raise ValueError("Failed to pass the entire dataset")
        
    # First all the files will be concatenated and subsequently tsflex will perform feature extraction
    if method == "whole_dataset":
        
        # Collect all files within the defog and tdcsfog folders and concatenate those
        for defog in df_list[0]: 
            defog_tot = pd.concat([defog_tot,defog],ignore_index = True)
        for tdcsfog in df_list[1]:
            tdcsfog_tot = pd.concat([tdcsfog_tot,tdcsfog],ignore_index = True)

        # Concatenate the total tdcsfog and defog sets and continue under the name defog_tot
        defog_tot = pd.concat([tdcsfog_tot, defog_tot],ignore_index = True)
        
        # Reset the index for the feature collection
        defog_tot = defog_tot.reset_index(drop = True)
        defog_tot["Time"] = list(defog_tot.index.values)

        df_feats = fc.calculate(data=[defog_tot], window_idx=window_label, approve_sparsity=True, return_df=True, show_progress = True)
        index = np.ones(len(df_feats), dtype=int)

        
        if window_label == 'middle':
                # Repeat the first and last row window/2 times to generate a set of the same size as the input
                index[[0, -1]] = windows[0] / 2 + 1
        if window_label == 'end':
                # Repeat the first row window times to generate a set of the same size as the input
                index[[0]] = windows[0] + 1
        if window_label == 'begin':
                # Repeat the last row window times to generate a set of the same size as the input
                index[[-1]] = windows[0] + 1
        
        df_feats = df_feats.iloc[np.arange(len(df_feats)).repeat(index)].reset_index(drop = True)

        # The input and output should have the same length and ID's
        assert(len(defog_tot) == len(df_feats))
        
        if(test):
            df_feats['Id'] = defog_tot['Id']
            defog_tot = df_feats.merge(defog_tot.drop(columns = ["Time"]), on = "Id")
        else: 
            defog_tot = df_feats.join(defog_tot.drop(columns = ["Time"]))
        
        return defog_tot

    # Ts_flex performs feature extraction per file and concatenates outputs
    if method == "individual_files":

        for idx, defog in enumerate(df_list[0]): 
            
            defog = defog.reset_index(drop = True)
            
            df_feats = fc.calculate(data=[defog], window_idx=window_label, approve_sparsity=True, return_df=True)
            index = np.ones(len(df_feats), dtype=int)


            if window_label == 'middle':
                # Repeat the first and last row window/2 times to generate a set of the same size as the input
                index[[0, -1]] = windows[0] / 2 + 1
            if window_label == 'end':
                # Repeat the first row window times to generate a set of the same size as the input
                index[[0]] = windows[0] + 1
            if window_label == 'begin':
                # Repeat the last row window times to generate a set of the same size as the input
                index[[-1]] = windows[0] + 1

            df_feats = df_feats.iloc[np.arange(len(df_feats)).repeat(index)].reset_index(drop = True)
            
            assert(len(df_feats) == len(defog))
            
            if test:  
                df_feats['Id'] = defog['Id']
                df_feats = df_feats.merge(defog.drop(columns = ["Time"]), on = "Id")
            else:
                df_feats = df_feats.join(defog.drop(columns = ["Time"]))
                
            defog_tot = pd.concat([defog_tot,df_feats],ignore_index = True)

        for idx, tdcsfog in enumerate(df_list[1]): 
            
            tdcsfog = tdcsfog.reset_index(drop = True)
                
            df_feats = fc.calculate(data=[tdcsfog], window_idx=window_label, approve_sparsity=True, return_df=True)
            index = np.ones(len(df_feats), dtype=int)

            if window_label == 'middle':
                # Repeat the first and last row window/2 times to generate a set of the same size as the input
                index[[0, -1]] = windows[0] / 2 + 1
            if window_label == 'end':
                # Repeat the first row window times to generate a set of the same size as the input
                index[[0]] = windows[0] + 1
            if window_label == 'begin':
                # Repeat the last row window times to generate a set of the same size as the input
                index[[-1]] = windows[0] + 1

            df_feats = df_feats.iloc[np.arange(len(df_feats)).repeat(index)].reset_index(drop = True)
            
            assert(len(df_feats) == len(tdcsfog))
            
            if test:
                df_feats['Id'] = tdcsfog['Id']
                df_feats = df_feats.merge(tdcsfog.drop(columns = ["Time"]), on = "Id")
            else: 
                df_feats = df_feats.join(tdcsfog.drop(columns = ["Time"]))
                
            tdcsfog_tot = pd.concat([tdcsfog_tot,df_feats],ignore_index = True)
            
        defog_tot = pd.concat([tdcsfog_tot, defog_tot], ignore_index = True) 
               
        return defog_tot
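
The padding step inside `extract_features` can be hard to follow, so here is a small stand-alone sketch of it for the "middle" window label. It assumes, as the function's length assertion implies, that the extractor returns `n_samples - window` rows; the toy sizes and the `feat` column are made up for illustration.

```python
import numpy as np
import pandas as pd

window = 4       # toy window length (the notebook uses 120)
n_samples = 10   # toy input length

# Assumption: the extractor returned n_samples - window rows, one per window.
feats = pd.DataFrame({"feat": np.arange(n_samples - window)})

# Repeat counts: 1 for every row, window / 2 + 1 for the first and last row.
counts = np.ones(len(feats), dtype=int)
counts[[0, -1]] = window // 2 + 1

# Repeating the edge rows pads the feature table back to the input length.
padded = feats.iloc[np.arange(len(feats)).repeat(counts)].reset_index(drop=True)
print(padded["feat"].tolist())
```

The first and last windows are simply duplicated for the rows at the edges that have no full window centred on them, so the features there are approximations.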


# Collect the Defog and Tdcsfog train files

In [20]:
# Set the directory path to the folder containing the CSV files.
tdcsfog_path = '/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/train/tdcsfog'

# Initialize an empty list to store the dataframes.
tdcsfog_list = []


# Loop through each file in the directory and read it into a dataframe.
for file_name in os.listdir(tdcsfog_path):
    if file_name.endswith('.csv'):
        file_path = os.path.join(tdcsfog_path, file_name)
        file = pd.read_csv(file_path)
        file.Time = file.Time # / (len(file) - 1)
        tdcsfog_list.append(file)

In [21]:
# Set the directory path to the folder containing the CSV files.
defog_path = '/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/train/defog'

# Initialize an empty list to store the dataframes.
defog_list = []

# Loop through each file in the directory and read it into a dataframe.
for file_name in os.listdir(defog_path):
    if file_name.endswith('.csv'):
        file_path = os.path.join(defog_path, file_name)
        file = pd.read_csv(file_path)
        file.Time = file.Time # / (len(file) - 1)
        file = file[(file['Task'] == 1) & (file['Valid'] == 1)]
        file = file.drop(columns = ['Task','Valid'])
        defog_list.append(file)
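
The defog loader keeps only rows where both `Task` and `Valid` are set, then drops those two columns so that the frame has the same columns as the tdcsfog files. A minimal sketch of that filter on a made-up frame (the values are illustrative, and the flags are assumed to be boolean or 0/1):

```python
import pandas as pd

# Toy defog-like frame; values are invented for illustration.
toy = pd.DataFrame({
    "Time": [0, 1, 2, 3],
    "AccV": [-1.0, -0.9, -1.1, -1.0],
    "Valid": [True, True, False, True],
    "Task": [True, False, True, True],
})

# Keep rows that are both inside a task and marked valid, then drop the flags.
kept = toy[(toy["Task"] == 1) & (toy["Valid"] == 1)].drop(columns=["Task", "Valid"])
print(kept)
```

Note that the surviving rows keep their original index labels, which is why the extraction function resets the index before computing features.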

# Set the training hyperparameters


In [23]:
# Input training data
df_list_train = [defog_list,tdcsfog_list]

# Switch for train and test modes (mainly because of IDs)
test = False

# Extract features from the training data

In [24]:
train_features = extract_features(method = method, df_list = df_list_train, window_label = window_label, fc = fc_train, test = False)

  0%|          | 0/33 [00:00<?, ?it/s]

The extracted feature table is large, so it helps to reduce its memory usage. Reference: [Reducing DataFrame memory size by ~65%](https://www.kaggle.com/code/arjanso/reducing-dataframe-memory-size-by-65)

In [None]:
def reduce_memory_usage(df):
    
    start_mem = df.memory_usage().sum() / 1024 ** 2
    print('Memory usage of dataframe is {:.2f} MB'.format(start_mem))
    
    for col in df.columns:
        col_type = df[col].dtype.name
        if ((col_type != 'datetime64[ns]') & (col_type != 'category')):
            if (col_type != 'object'):
                c_min = df[col].min()
                c_max = df[col].max()

                if str(col_type)[:3] == 'int':
                    if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                        df[col] = df[col].astype(np.int8)
                    elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                        df[col] = df[col].astype(np.int16)
                    elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                        df[col] = df[col].astype(np.int32)
                    elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                        df[col] = df[col].astype(np.int64)

                else:
                    if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                        df[col] = df[col].astype(np.float16)
                    elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                        df[col] = df[col].astype(np.float32)
                    else:
                        pass
            else:
                df[col] = df[col].astype('category')
    mem_usg = df.memory_usage().sum() / 1024 ** 2 
    print("Memory usage became: ",mem_usg," MB")
    
    return df

In [None]:
# train_features = reduce_memory_usage(train_features)
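
For a sense of how much downcasting saves, the sketch below shrinks a small made-up frame. It is not the function above: it swaps in `pd.to_numeric(..., downcast="integer")` for the integer column and casts the float column to `float32` rather than `float16`, an assumption made here to limit precision loss on sensor-like values.

```python
import numpy as np
import pandas as pd

# Toy frame with the default 64-bit dtypes.
df = pd.DataFrame({
    "small_int": np.arange(100, dtype=np.int64),
    "signal": np.linspace(-1.0, 1.0, 100),  # float64
})
before = df.memory_usage(deep=True).sum()

small = df.copy()
# Pick the smallest integer dtype that can hold the values.
small["small_int"] = pd.to_numeric(small["small_int"], downcast="integer")
# Halve the float width; float16 would save more but loses precision quickly.
small["signal"] = small["signal"].astype(np.float32)
after = small.memory_usage(deep=True).sum()

print(f"{before} bytes -> {after} bytes")
print(small.dtypes)
```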

In [None]:
train_features.describe()

# Create Dataset

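First, we need to **split the data into input features (the windowed features extracted above plus the raw "AccV", "AccML", and "AccAP" signals) and target variables (i.e. "StartHesitation", "Turn", and "Walking")**. The "Time" column was dropped during feature extraction, so it is not an input. We can select the feature columns by position using the .iloc method.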

In [None]:
# Use smote to create synthetic data

# smote = SMOTE(random_state = 4, k_neighbors=100)
# X_syn, y_syn = smote.fit_resample(X_merged, merged['label'])


# Create Synthetic dataset
# syn = pd.concat([X_syn,y_syn.to_frame(name = "label")], axis=1)
# syn["Turn"], syn["Walking"], syn["StartHesitation"] = (syn["label"] == 1).astype(int), (syn["label"] == 2).astype(int), (syn["label"] == 3).astype(int)

# tot = pd.concat([merged,syn])
# tot = tot.sort_values("Time",ignore_index = True)

# Normalize time
# tot["Time"] = tot["Time"] / (len(tot) - 1)

In [25]:
# data = np.array([train_features['Walking__mean__w='+ str(windows[0])], train_features['StartHesitation__mean__w=' + str(windows[0])], train_features['Turn__mean__w='+ str(windows[0])]])

# labels = np.argmax(data, axis = 0)
# sums = np.sum(data, axis = 0)
# labels = np.where(sums == 0 , 5, labels)

y1 = train_features['StartHesitation']  # target variable for StartHesitation
y2 = train_features['Turn']  # target variable for Turn
y3 = train_features['Walking']  # target variable for Walking

train_features = train_features.drop(columns = ["StartHesitation","Turn","Walking"])

# Change this by hand if you want to try more features
X_tot = pd.concat([train_features.iloc[:, :(len(funcs) * len(series_names))],train_features.iloc[:, -len(series_names):]], axis = 1, ignore_index = False)
print(X_tot.columns)

# train_features['Walking'] = np.where(labels == 0 , 1, 0)
# train_features['StartHesitation'] = np.where(labels == 1 , 1, 0)
# train_features['Turn'] = np.where(labels == 2 , 1, 0)


# y1 = train_features['StartHesitation']  # target variable for StartHesitation
# y2 = train_features['Turn']  # target variable for Turn
# y3 = train_features['Walking']  # target variable for Walking


Index(['AccAP__abs_diff_mean__w=120', 'AccAP__amax__w=120',
       'AccAP__amin__w=120', 'AccAP__diff_std__w=120',
       'AccAP__kurtosis__w=120', 'AccAP__mean__w=120', 'AccAP__skew__w=120',
       'AccAP__slope__w=120', 'AccAP__std__w=120', 'AccAP__sum__w=120',
       'AccAP__var__w=120', 'AccML__abs_diff_mean__w=120',
       'AccML__amax__w=120', 'AccML__amin__w=120', 'AccML__diff_std__w=120',
       'AccML__kurtosis__w=120', 'AccML__mean__w=120', 'AccML__skew__w=120',
       'AccML__slope__w=120', 'AccML__std__w=120', 'AccML__sum__w=120',
       'AccML__var__w=120', 'AccV__abs_diff_mean__w=120', 'AccV__amax__w=120',
       'AccV__amin__w=120', 'AccV__diff_std__w=120', 'AccV__kurtosis__w=120',
       'AccV__mean__w=120', 'AccV__skew__w=120', 'AccV__slope__w=120',
       'AccV__std__w=120', 'AccV__sum__w=120', 'AccV__var__w=120', 'AccV',
       'AccML', 'AccAP'],
      dtype='object')

Most of the target values are 0, so each target is heavily imbalanced. For each target we therefore **create a balanced dataset with equal numbers of 0 and 1 samples** by randomly undersampling the 0 class.

In [26]:
# Find the positions where y1 equals 1.
y1_ones = np.where(y1 == 1)[0]

# Randomly choose as many samples with y1 == 0 as there are with y1 == 1.
num1_ones = (y1 == 1).sum()
np.random.seed(42)
y1_zeros = np.random.choice(np.where(y1 == 0)[0], size = num1_ones, replace = False)

# Combine the positions of y1 == 0 and y1 == 1.
y1_balanced_idxs = np.sort(np.concatenate([y1_zeros, y1_ones]))

# Use the balanced indices to get the corresponding rows of X and y1.
X1_balanced = X_tot.iloc[y1_balanced_idxs, :]
y1_balanced = y1.iloc[y1_balanced_idxs]

In [27]:
# Find the positions where y2 equals 1.
y2_ones = np.where(y2 == 1)[0]

# Randomly choose as many samples with y2 == 0 as there are with y2 == 1.
num2_ones = (y2 == 1).sum()
np.random.seed(42)
y2_zeros = np.random.choice(np.where(y2 == 0)[0], size = num2_ones, replace = False)

# Combine the positions of y2 == 0 and y2 == 1.
y2_balanced_idxs = np.sort(np.concatenate([y2_zeros, y2_ones]))

# Use the balanced indices to get the corresponding rows of X and y2.
X2_balanced = X_tot.iloc[y2_balanced_idxs, :]
y2_balanced = y2.iloc[y2_balanced_idxs]

In [28]:
# Find the positions where y3 equals 1.
y3_ones = np.where(y3 == 1)[0]

# Randomly choose as many samples with y3 == 0 as there are with y3 == 1.
num3_ones = (y3 == 1).sum()
np.random.seed(42)
y3_zeros = np.random.choice(np.where(y3 == 0)[0], size = num3_ones, replace = False)

# Combine the positions of y3 == 0 and y3 == 1.
y3_balanced_idxs = np.sort(np.concatenate([y3_zeros, y3_ones]))

# Use the balanced indices to get the corresponding rows of X and y3.
X3_balanced = X_tot.iloc[y3_balanced_idxs, :]
y3_balanced = y3.iloc[y3_balanced_idxs]
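
The three balancing cells follow the same pattern, so they could be folded into one helper. The sketch below is one possible version, not the notebook's code: the name `undersample_balanced` is hypothetical, it assumes a binary target with at least as many 0s as 1s, and it uses `np.random.default_rng` instead of `np.random.seed`, so it will not select the same rows as the cells above.

```python
import numpy as np
import pandas as pd

def undersample_balanced(X, y, seed=42):
    """Keep every row with y == 1 plus an equally sized random sample of rows with y == 0.

    Hypothetical helper; assumes y is binary and has at least as many 0s as 1s.
    """
    rng = np.random.default_rng(seed)
    values = y.to_numpy()
    ones = np.flatnonzero(values == 1)
    zeros = np.flatnonzero(values == 0)
    kept_zeros = rng.choice(zeros, size=len(ones), replace=False)
    # Sort so the balanced rows stay in their original order.
    idx = np.sort(np.concatenate([kept_zeros, ones]))
    return X.iloc[idx], y.iloc[idx]

# Toy data: two positives among ten rows.
X_toy = pd.DataFrame({"f": np.arange(10.0)})
y_toy = pd.Series([0, 0, 0, 1, 0, 0, 1, 0, 0, 0])
X_bal, y_bal = undersample_balanced(X_toy, y_toy)
print(y_bal.value_counts().to_dict())
```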

Next, we can **split the data into training and testing sets using the train_test_split function from scikit-learn**.

In [None]:
from sklearn.model_selection import train_test_split

X1_train, X1_test, y1_train, y1_test = train_test_split(X1_balanced, y1_balanced, test_size = 0.2, random_state = 42)
X2_train, X2_test, y2_train, y2_test = train_test_split(X2_balanced, y2_balanced, test_size = 0.2, random_state = 42)
X3_train, X3_test, y3_train, y3_test = train_test_split(X3_balanced, y3_balanced, test_size = 0.2, random_state = 42)

Then, we **standardize the independent variables**.

In [None]:
from sklearn.preprocessing import StandardScaler

# Standardize the independent variables.
scaler1 = StandardScaler()
X1_train = scaler1.fit_transform(X1_train)
X1_test = scaler1.transform(X1_test)

scaler2 = StandardScaler()
X2_train = scaler2.fit_transform(X2_train)
X2_test = scaler2.transform(X2_test)

scaler3 = StandardScaler()
X3_train = scaler3.fit_transform(X3_train)
X3_test = scaler3.transform(X3_test)
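
The key point in the cell above is that each scaler is fitted on the training split only and then reused on the test split. A minimal NumPy sketch of the same arithmetic, on made-up numbers, shows what that means; `StandardScaler` uses the population standard deviation, which is NumPy's default `ddof=0`.

```python
import numpy as np

# Toy train and test matrices (two features); values are illustrative.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

# Statistics come from the training data only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # ddof=0, matching StandardScaler

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std  # reuse the training statistics

print(X_train_scaled)
print(X_test_scaled)
```

Because the test row is scaled with the training mean and standard deviation, its second feature lands well outside the training range instead of being re-centred to zero.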

# Create, Train, and Evaluate Model

Finally, we can **create and train three separate models**, one for each target variable, using a suitable algorithm.
       
### This time we use **Random Forest Regressor instead of the Logistic Regression model**.

**For a Logistic Regression model, please see [PD FOG Prediction Baseline by Logistic Regression](https://www.kaggle.com/code/gokifujiya/pd-fog-prediction-baseline-by-logistic-regression).**

In [None]:
#from sklearn.linear_model import LogisticRegression
from sklearn import ensemble

# Create three separate logistic regression models.
#model1 = LogisticRegression()
#model2 = LogisticRegression()
#model3 = LogisticRegression()

# Create three separate Random Forest Regressor models.
model1 = ensemble.RandomForestRegressor(n_estimators = 100, max_depth = 7, n_jobs = -1, random_state = 42)
model2 = ensemble.RandomForestRegressor(n_estimators = 100, max_depth = 7, n_jobs = -1, random_state = 42)
model3 = ensemble.RandomForestRegressor(n_estimators = 100, max_depth = 7, n_jobs = -1, random_state = 42)

# Train the models on the training data.
model1.fit(X1_train, y1_train)
model2.fit(X2_train, y2_train)
model3.fit(X3_train, y3_train)

# Evaluate the models on the test data.
print('R2 for StartHesitation:', model1.score(X1_test, y1_test))
print('R2 for Turn:', model2.score(X2_test, y2_test))
print('R2 for Walking:', model3.score(X3_test, y3_test))
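
For a regressor, `.score` reports the coefficient of determination, R². As a reminder of what that number means, here is a small hand-rolled version on made-up values; the helper name `r2_manual` is hypothetical.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy binary targets with probability-like predictions.
y_true = [0, 0, 1, 1]
y_pred = [0.1, 0.2, 0.8, 0.9]
print(r2_manual(y_true, y_pred))

# Always predicting the mean of y_true gives R^2 = 0.
print(r2_manual(y_true, [0.5, 0.5, 0.5, 0.5]))
```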

# Recreate Dataset and Training

**For submission, we retrain on the full balanced datasets without a train/test split, so that the models see as much data as possible, which should give a higher score.**

In [None]:
from sklearn.preprocessing import StandardScaler

# Standardize the independent variables.
scaler1 = StandardScaler()
X1_balanced = scaler1.fit_transform(X1_balanced)

scaler2 = StandardScaler()
X2_balanced = scaler2.fit_transform(X2_balanced)

scaler3 = StandardScaler()
X3_balanced = scaler3.fit_transform(X3_balanced)

In [None]:
#from sklearn.linear_model import LogisticRegression
from sklearn import ensemble

# Create three separate logistic regression models.
#model1 = LogisticRegression()
#model2 = LogisticRegression()
#model3 = LogisticRegression()

# Create three separate Random Forest Regressor models.
model1 = ensemble.RandomForestRegressor(n_estimators = 100, max_depth = 7, n_jobs = -1, random_state = 42)
model2 = ensemble.RandomForestRegressor(n_estimators = 100, max_depth = 7, n_jobs = -1, random_state = 42)
model3 = ensemble.RandomForestRegressor(n_estimators = 100, max_depth = 7, n_jobs = -1, random_state = 42)

# Train the models on the training data.
model1.fit(X1_balanced, y1_balanced)
model2.fit(X2_balanced, y2_balanced)
model3.fit(X3_balanced, y3_balanced)

# Create Test Dataset

In [None]:
# Set the directory path to the folder containing the CSV files.
tdcsfog_test_path = '/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/test/tdcsfog'

# Initialize an empty list to store the dataframes.
tdcsfog_test_list = []

# Loop through each file in the directory and read it into a dataframe.
for file_name in os.listdir(tdcsfog_test_path):
    if file_name.endswith('.csv'):
        file_path = os.path.join(tdcsfog_test_path, file_name)
        file = pd.read_csv(file_path)
        file['Id'] = file_name[:-4] + '_' + file['Time'].apply(str)
        file.Time = file.Time 
        tdcsfog_test_list.append(file)

In [None]:
# Set the directory path to the folder containing the CSV files.
defog_test_path = '/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/test/defog'

# Initialize an empty list to store the dataframes.
defog_test_list = []

# Loop through each file in the directory and read it into a dataframe.
for file_name in os.listdir(defog_test_path):
    if file_name.endswith('.csv'):
        file_path = os.path.join(defog_test_path, file_name)
        file = pd.read_csv(file_path)
        file['Id'] = file_name[:-4] + '_' + file['Time'].apply(str)
        file.Time = file.Time
        defog_test_list.append(file)
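
Each test row gets an `Id` made of the file name without its `.csv` extension, an underscore, and the row's `Time` value. A tiny sketch of that construction, using a hypothetical file name and made-up readings:

```python
import pandas as pd

file_name = "abc123.csv"  # hypothetical file name
toy = pd.DataFrame({"Time": [0, 1, 2], "AccV": [-9.5, -9.6, -9.4]})

# Strip the 4-character ".csv" suffix and append the time step.
toy["Id"] = file_name[:-4] + "_" + toy["Time"].astype(str)
print(toy["Id"].tolist())
```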

In [None]:
def slope(x): return (x[-1] - x[0]) / x[0] if x[0] else 0
def abs_diff_mean(x): return np.mean(np.abs(x[1:] - x[:-1])) if len(x) > 1 else 0
def diff_std(x): return np.std(x[1:] - x[:-1]) if len(x) > 1 else 0



funcs = [make_robust(f) for f in [np.min,np.var, np.max, np.std, np.mean, slope, ss.skew, ss.kurtosis, abs_diff_mean, diff_std, np.sum,]]

fc_test = FeatureCollection(
    MultipleFeatureDescriptors(
          functions=funcs,
          series_names=["AccV", "AccML", "AccAP"],
          windows=windows,
          strides=strides[0],
    )
)


# Set the testing hyperparameters

In [None]:
# Input testing data
df_list_test = [defog_test_list,tdcsfog_test_list]

# Switch for train and test modes (mainly because of IDs)
test = True

In [None]:
test_features = extract_features(method = method, df_list = df_list_test, window_label = window_label, fc = fc_test, test = test)

In [None]:
# test_features = reduce_memory_usage(test_features)

In [None]:
test_features

# Inference

In [None]:
# Separate the dataset for the independent variables.
# Change by hand
test_X = pd.concat([test_features.iloc[:, :(len(funcs) * len(series_names))],test_features.iloc[:, -len(series_names):]], axis = 1, ignore_index = False)

print(test_X)
# Standardize the independent variables with the scalers fitted on the training data,
# so that each model sees features on the same scale it was trained on.
test_X1 = scaler1.transform(test_X)
test_X2 = scaler2.transform(test_X)
test_X3 = scaler3.transform(test_X)

# Get the predictions for the three models on the test data.
pred_y1 = model1.predict(test_X1)
pred_y2 = model2.predict(test_X2)
pred_y3 = model3.predict(test_X3)

test_features['StartHesitation'] = pred_y1 # target variable for StartHesitation
test_features['Turn'] = pred_y2 # target variable for Turn
test_features['Walking'] = pred_y3 # target variable for Walking

print(test_features)

# Submission

In [None]:
submission = test_features.loc[:,['Id','StartHesitation','Turn','Walking']].fillna(0.0)
submission

In [None]:
submission.to_csv("submission.csv", index = False)
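
Before writing the file, a few cheap checks can catch a malformed submission. The sketch below runs them on a made-up frame; the checks themselves (expected columns, no missing values, unique Ids) are assumptions about what a valid submission should look like, based on the columns selected above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for test_features; values are invented for illustration.
toy = pd.DataFrame({
    "Id": ["abc123_0", "abc123_1"],
    "StartHesitation": [0.1, np.nan],
    "Turn": [0.7, 0.2],
    "Walking": [0.0, 0.3],
    "AccV": [-9.5, -9.6],
})

cols = ["Id", "StartHesitation", "Turn", "Walking"]
sub = toy.loc[:, cols].fillna(0.0)

# Sanity checks before saving.
assert list(sub.columns) == cols
assert not sub.isna().any().any()
assert sub["Id"].is_unique
print(sub)
```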

# Save, Load, and Use Model

To save the trained Random Forest models, we can use the joblib library. It is imported directly, because the old sklearn.externals.joblib module has been removed from scikit-learn. joblib.dump() saves each model to a file in the current working directory. **To load the saved models later**, we can use the joblib.load() function.

In [None]:
import joblib

# Save the model to disk.
joblib.dump(model1, 'model1.joblib')
joblib.dump(model2, 'model2.joblib')
joblib.dump(model3, 'model3.joblib')

# Load the saved models from disk.
model1_loaded = joblib.load('model1.joblib')
model2_loaded = joblib.load('model2.joblib')
model3_loaded = joblib.load('model3.joblib')

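# Use the loaded models to make predictions on the test data,
# scaling the features with the scalers fitted on the training data.
test_inputs = test_features[X_tot.columns]
y1_pred_loaded = model1_loaded.predict(scaler1.transform(test_inputs))
y2_pred_loaded = model2_loaded.predict(scaler2.transform(test_inputs))
y3_pred_loaded = model3_loaded.predict(scaler3.transform(test_inputs))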

# Conclusion

It is possible that **more features or more advanced machine learning algorithms** could improve the accuracy of the models. Additionally, it may be useful to **investigate other factors** that contribute to the occurrence of freezing of gait events, such as cognitive or environmental factors.

I am a medical doctor working on **artificial intelligence (AI) for medicine**. AI is now widely used in the medical field, in particular for the following tasks: **image classification, object detection, semantic segmentation, GANs, text classification, etc**. **If you are interested in AI for medicine, please see my other notebooks.**