## Loading Motion Sensor Data into a Pandas DataFrame
The following Python code imports necessary libraries and combines motion sensor data from multiple CSV files, storing it in a structured Pandas DataFrame. Subject-specific information, such as age, gender, height, and weight, is added to the dataset for comprehensive analysis.

* Imports the necessary libraries, including 'os' for file operations, 'numpy' as 'np' for numerical operations, and 'pandas' as 'pd' for data handling.
* Specifies the path to the subject data file and the directory containing motion sensor data.
* Defines two functions:
    * 'get_all_dataset_paths', which recursively walks through the specified directory and collects paths to all CSV files.
    * 'load_whole_dataframe_from_paths', which reads and combines motion sensor data from these paths into a single Pandas DataFrame. It also enriches the data with subject information from the subject data file.
* Loads the subject data from the CSV file 'data_subjects_info.csv' into a Pandas DataFrame.
* Calls 'get_all_dataset_paths' to obtain a list of paths to all CSV files in the specified directory.
* Calls 'load_whole_dataframe_from_paths' to create a comprehensive DataFrame containing motion sensor data, with additional subject information.

This step turns the raw per-trial CSV files into a single table, so that the preprocessing and modeling steps below can work on one DataFrame.

In [None]:
import os
import numpy as np
import pandas as pd

# Change only the following two lines
subject_data_file = 'data_subjects_info.csv'
data_dir = 'E:/motion-sense-master/data/A_DeviceMotion_data'

# Work from the parent of data_dir so the relative path in subject_data_file resolves
os.chdir(data_dir)
os.chdir(os.pardir)

def get_all_dataset_paths(input_dir) -> list:
    """Recursively collect the paths of all CSV files under input_dir."""
    input_files = []
    for dirs, subdirs, files in os.walk(input_dir):
        for file in files:
            if file.endswith('.csv'):
                input_files.append(os.path.join(dirs, file))
    return input_files

def load_whole_dataframe_from_paths(paths, meta) -> pd.DataFrame:
    """Read every CSV in paths and tag its rows with trial and subject metadata."""
    frames = []

    for p in paths:
        p = p.replace("\\", '/')
        # Paths are expected to end in '<category>_<session>/<prefix>_<subject>.csv'
        c_dir, c_file = p.split('/')[-2], p.split('/')[-1]

        c_cat, c_ses = c_dir.split('_')[-2], c_dir.split('_')[-1]
        c_sub = c_file.split('_')[-1].split('.')[-2]

        # Row i of meta is assumed to describe subject i + 1
        row = int(c_sub) - 1

        tdf = pd.read_csv(p, encoding="utf-8")
        tdf = tdf.assign(
            subject_id=int(c_sub),
            session_id=int(c_ses),
            category=str(c_cat),
            age=int(meta.age[row]),
            gender=int(meta.gender[row]),
            height=int(meta.height[row]),
            weight=int(meta.weight[row]),
        )

        frames.append(tdf)
        print(p, c_cat, c_sub)

    # Concatenate once at the end instead of growing the DataFrame inside the loop
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

subject_data_frame = pd.read_csv(subject_data_file, encoding="utf-8")
all_dataset_paths = get_all_dataset_paths(data_dir)
data_frame = load_whole_dataframe_from_paths(all_dataset_paths, subject_data_frame)
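
The path handling inside 'load_whole_dataframe_from_paths' can be tried in isolation. The following is a minimal sketch, assuming each file sits in a directory named '<category>_<session>' and is itself named '<prefix>_<subject>.csv', which is the pattern the loader relies on. The helper name 'parse_trial_path' and the example paths are hypothetical and only serve as illustration.

```python
def parse_trial_path(path):
    """Return (category, session_id, subject_id) parsed from a trial CSV path.

    Hypothetical helper mirroring the string splitting used by the loader.
    """
    parts = path.replace("\\", "/").split("/")
    c_dir, c_file = parts[-2], parts[-1]
    category, session = c_dir.split("_")[-2], c_dir.split("_")[-1]
    subject = c_file.split("_")[-1].split(".")[-2]
    return category, int(session), int(subject)


# Made-up paths that follow the assumed naming pattern
print(parse_trial_path("A_DeviceMotion_data/dws_1/sub_12.csv"))
print(parse_trial_path("E:\\data\\std_6\\sub_3.csv"))
```

Checking a few paths this way before loading everything makes it easier to spot files whose names do not follow the expected pattern.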

## Full DataFrame at a glance
The full raw DataFrame looks like this:

In [None]:
data_frame

## Data Preprocessing: Removing Unnecessary Columns
In this Python code, a copy of the original DataFrame 'data_frame' is created. Subsequently, several columns ('Unnamed: 0', 'subject_id', 'session_id', 'age', 'gender', 'height', and 'weight') are removed from the copied DataFrame 'df' to streamline the dataset for further analysis.

In [None]:
df = data_frame.copy()  # work on a copy so the original DataFrame stays intact
df = df.drop(columns=['Unnamed: 0', 'subject_id', 'session_id',
                      'age', 'gender', 'height', 'weight'])
df

## Encoding Categorical Data for Machine Learning
The following Python code uses the 'LabelEncoder' from the scikit-learn library to transform the 'category' column in the DataFrame 'df' into numerical codes. These codes are stored in a new 'code' column, and the original 'category' column is then removed from the DataFrame, preparing the data for machine learning tasks.

In [None]:
from sklearn.preprocessing import LabelEncoder

lEncoder = LabelEncoder()
# fit_transform learns the category-to-code mapping and applies it in one step
df['code'] = lEncoder.fit_transform(df.category)
df.drop('category', axis=1, inplace=True)
df
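
To see which code stands for which category, the fitted encoder can be inspected. The sketch below uses a small made-up list of labels rather than the real 'category' column. 'LabelEncoder' assigns codes in sorted order of the class names, and 'inverse_transform' maps codes back to names.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up category labels, for illustration only
toy_labels = ['sit', 'dws', 'jog', 'dws']

encoder = LabelEncoder()
codes = encoder.fit_transform(toy_labels)

# classes_ holds the class names in sorted order; the position is the code
mapping = {name: code for code, name in enumerate(encoder.classes_)}
decoded = list(encoder.inverse_transform(codes))

print(codes.tolist())
print(mapping)
print(decoded)
```

Keeping the fitted encoder around is useful later, when predicted codes need to be reported as category names.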

## Visualizing Categorical Data Distribution
We use Seaborn and Matplotlib to create a count plot of the numerical codes in the 'code' column of the DataFrame 'df'. The plot shows how many samples belong to each category.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Pass the DataFrame by keyword so the call also works on older Seaborn versions
sns.countplot(data=df, x='code')
plt.show()
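
The same distribution can be read off as numbers with 'value_counts'. This is a minimal sketch on a made-up 'code' column, not the real dataset.

```python
import pandas as pd

# Made-up codes, for illustration only
toy = pd.DataFrame({'code': [0, 0, 1, 2, 2, 2]})

# Number of rows per code, ordered by code
counts = toy['code'].value_counts().sort_index()
# Fraction of rows per code
shares = counts / counts.sum()

print(counts)
print(shares)
```

Exact counts make class imbalance easier to quantify than reading bar heights from the plot.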

## Splitting Data for Machine Learning
The following code uses the 'train_test_split' function from scikit-learn to divide the dataset into training and testing sets. It separates the input features ('x_columns') from the target variable ('y_columns') and holds out 20% of the rows as the test set. Because 'shuffle=False' is set, the row order is preserved and the last 20% of the rows form the test set. An assertion checks that the training features and labels have the same length.

In [None]:
from sklearn.model_selection import train_test_split

x_columns = df.iloc[:, 0:12]   # the 12 sensor feature columns
y_columns = df.iloc[:, 12:13]  # the 'code' label column

trainx, testx, trainy, testy = train_test_split(x_columns, y_columns, test_size=0.2, shuffle=False)
assert len(trainx) == len(trainy)
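
The effect of 'shuffle=False' is easiest to see on a tiny example. In this sketch the data is made up: ten rows numbered 0 to 9, so the split boundaries are visible directly.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples whose values equal their row position
x = np.arange(10).reshape(-1, 1)
y = np.arange(10)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, shuffle=False
)

print(y_train)  # the first 80% of the rows, in order
print(y_test)   # the last 20% of the rows, in order
```

Keeping the order matters here because the windows built in the next step are only meaningful on contiguous time series rows.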

## Sequencing Data for Temporal Analysis
We define a sequence generator function that creates sequences of input features and corresponding target labels from the training and testing data. These sequences have a window length of 150 with a stride of 10. The mode of target labels within each sequence is calculated to represent the label for that sequence. This prepares the data for temporal analysis tasks.

In [None]:
from scipy.stats import mode

WINDOW_LENGTH = 150
STRIDE_LENGTH = 10
NUM_CLASSES = 6
NUM_FEATURES = 12
BATCH_SIZE = 100
EPOCHS_SIZE = 10

def sequence_generator(x, y, length, stride):
    seq_x = []
    seq_y = []
    data_length = len(x)

    for i in range(0, data_length - length + 1, stride):
        input_sequence = x.iloc[i : i + length]
        target_sequence = y.iloc[i : i + length]
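        # axis=None flattens the window; np.ravel makes the result indexable
        # whether SciPy returns a scalar or a one-element array
        target_mode = np.ravel(mode(target_sequence.values, axis=None)[0])[0]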
        seq_x.append(input_sequence)
        seq_y.append(target_mode)
    return np.array(seq_x), np.array(seq_y)

tx, ty = sequence_generator(trainx, trainy, WINDOW_LENGTH, STRIDE_LENGTH)
vx, vy = sequence_generator(testx, testy, WINDOW_LENGTH, STRIDE_LENGTH)
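
With window length L and stride S, a series of n rows yields floor((n - L) / S) + 1 windows. The sketch below checks this on made-up data using plain NumPy arrays. The function name 'sliding_windows' is hypothetical, and the majority label is computed with 'np.bincount', which assumes non-negative integer labels and at least one full window. It is an alternative to the SciPy 'mode' call above, not a drop-in replacement for the DataFrame-based generator.

```python
import numpy as np

def sliding_windows(x, y, length, stride):
    """Cut x into overlapping windows and label each with its most frequent y."""
    starts = range(0, len(x) - length + 1, stride)
    seq_x = np.stack([x[s:s + length] for s in starts])
    # bincount counts each integer label; argmax picks the most frequent one
    seq_y = np.array([np.bincount(y[s:s + length]).argmax() for s in starts])
    return seq_x, seq_y


# Made-up data: 1000 rows, 12 features, first half labeled 0, second half 1
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 12))
y = np.repeat([0, 1], 500)

wx, wy = sliding_windows(x, y, length=150, stride=10)
print(wx.shape, wy.shape)
```

Each window has shape (time steps, features), so the stacked result is (windows, time steps, features).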

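## Training a Time Series Classifier with tsai

The windowed training and test sequences are concatenated into a single array, and 'splits' records which indices belong to the training set and which to the validation set. A 'TSClassifier' with the 'InceptionTimePlus' architecture is then trained for 10 epochs with the one-cycle policy and exported to 'mv_clf.pkl'.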

In [None]:
import numpy as np
import pandas as pd
from tsai.basics import *

# Stack training and test windows into one array. tsai expects X shaped
# (samples, variables, time steps), so the last two axes are swapped.
X = np.concatenate([tx, vx]).transpose(0, 2, 1)
y = np.concatenate([ty, vy])
print(y.shape, X.shape, y[0])

# The first len(ty) indices are the training split, the rest the validation split
splits = [list(range(len(ty))), list(range(len(ty), len(y)))]

tfms = [None, TSClassification()]
batch_tfms = TSStandardize(by_sample=True)
mv_clf = TSClassifier(X, y, splits=splits, path='models', arch="InceptionTimePlus", tfms=tfms, batch_tfms=batch_tfms, metrics=accuracy, cbs=ShowGraph())
mv_clf.fit_one_cycle(10, 1e-2)
mv_clf.export("mv_clf.pkl")
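
The array layout is worth checking before training. The tsai documentation describes its input as a 3D array of samples by variables by time steps, while the sequence generator above produces samples by time steps by features. The sketch below uses made-up zero arrays (names ending in '_toy' are hypothetical) to show the axis swap and the index-based splits, without importing tsai.

```python
import numpy as np

# Made-up stand-ins for the windowed training and test arrays:
# (windows, time steps, features)
tx_toy = np.zeros((8, 150, 12))
vx_toy = np.zeros((2, 150, 12))

# Concatenate, then move features ahead of time steps
X_toy = np.concatenate([tx_toy, vx_toy]).transpose(0, 2, 1)

# Training indices come first, validation indices follow
splits_toy = (
    list(range(len(tx_toy))),
    list(range(len(tx_toy), len(X_toy))),
)

print(X_toy.shape)
print(splits_toy)
```

If the axes are left unswapped, the model would treat the 150 time steps as variables and the 12 features as the sequence, which is probably not what is intended.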


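## Training a MiniRocket Classifier with tsai

As an alternative model, a 'MiniRocketClassifier' is fitted on the training windows. Its accuracy on the test windows is printed together with the fitting time.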

In [None]:
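from tsai.models.MINIROCKET import *

# Reuse the windowed arrays from the sequencing step, reordered to
# (samples, variables, time steps)
xt, yt = tx.transpose(0, 2, 1), ty
xv, yv = vx.transpose(0, 2, 1), vy

model = MiniRocketClassifier()
timer.start(False)  # timer is provided by the earlier `from tsai.basics import *`
model.fit(xt, yt)
t = timer.stop()
print(f'valid accuracy    : {model.score(xv, yv):.3%} time: {t}')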