<a href="https://colab.research.google.com/github/plnu-biomechanics/kin6015/blob/main/notebooks/kin6015_lab1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://www.pointloma.edu/sites/default/files/styles/basic_page/public/images/PLNU_Biomechanics_Lab_green_yellowSD_HiRes.png" width=400>

## **KIN 6015 Biomechanical Basis of Human Movement**
Instructor: Arnel Aguinaldo, PhD

**Lab 1 Data Processing**

In this lab, gait analysis data were collected with marker-based and markerless motion capture systems, and spatiotemporal metrics and inverse kinematics (IK) were estimated using Visual3D. The data were then exported as text (*.txt) files and uploaded to the class repository on the lab's [GitHub](https://github.com/plnu-biomechanics).

To further process the data for this lab, follow the steps in this **Colab notebook**, which contains instructions and sample code on how to wrangle and analyze the data.


### Create your own Colab Notebook

1. Go to **File -> New notebook in Drive** to open a new notebook in your Python environment:<br>
<img src="https://raw.githubusercontent.com/plnu-biomechanics/kin6015/main/notebooks/images/file_notebook.png" width=450>

2. Rename your Colab notebook using this naming format: **lastname_group_lab#.ipynb** (e.g., "aguinaldo_targaryen_lab1.ipynb")
3. Click on the **+ Code** option above to insert a new code cell: <br>
<img src="https://raw.githubusercontent.com/plnu-biomechanics/kin6015/main/notebooks/images/addcode.png" width=280>

4. The data you will parse and analyze for this lab will be copied from the lab's GitHub and temporarily stored in your Colab working directory, which can be accessed by clicking on the folder icon in the left menu:<br>
<img src="https://raw.githubusercontent.com/plnu-biomechanics/kin6015/main/notebooks/images/colab_folder.png" width=400>

5. Copy the following lines of code to import the packages needed for this analysis and to load the data files into your working directory. Be sure to update the `GROUP` variable with your group's name. **Note**: These files are available at runtime only: they are stored temporarily in your working directory and appear only while your notebook session is active. Each time the following code cell is executed, it downloads the zipped files again and extracts them into the working directory.


In [1]:
import urllib.request
import zipfile
import os

# --------------------------------------------------
# STUDENT INPUT (edit only this line; case-sensitive)
# --------------------------------------------------
GROUP = "targaryen"   # e.g., "targaryen", "stark", "lannister", "martel"

# --------------------------------------------------
# Configuration (do NOT edit below)
# These lines create a directory for this lab in your
# Colab working directory.
# --------------------------------------------------
zip_dir = "kin6015/lab1"
os.makedirs(zip_dir, exist_ok=True)

zip_filename = f"spring2026_lab1_{GROUP}.zip"

url = (
    "https://raw.githubusercontent.com/"
    "plnu-biomechanics/kin6015/main/"
    f"labs/{zip_filename}"
)

zip_path = os.path.join(zip_dir, zip_filename)

# --------------------------------------------------
# Download zip file
# --------------------------------------------------
urllib.request.urlretrieve(url, zip_path)

# --------------------------------------------------
# Extract contents from the zipped file
# --------------------------------------------------
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall(zip_dir)

print("Extracted files in lab directory:")
print(os.listdir(zip_dir))


Extracted files in lab directory:
['Targaryen_MB_05.txt', 'Targaryen_ML_03.txt', 'Targaryen_ML_01.txt', 'Targaryen_ML_05.txt', 'Targaryen_MB_03.txt', 'spring2026_lab1_targaryen.zip', 'Targaryen_MB_01.txt', 'Targaryen_ML_04.txt', 'Targaryen_MB_04.txt', 'Targaryen_MB_02.txt', 'Targaryen_ML_02.txt']
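
Before parsing, it can help to confirm that the archive contained the expected trials. The following is only a sketch: the helper `count_trials_by_condition` is hypothetical (it is not part of the lab code), and it assumes the `Group_MB_##.txt` / `Group_ML_##.txt` naming pattern seen in the listing above.

```python
import re
from collections import Counter

# Assumed naming pattern: <Group>_<MB|ML>_<trial number>.txt
TRIAL_PATTERN = re.compile(r"^(?P<group>[A-Za-z]+)_(?P<code>MB|ML)_(?P<trial>\d+)\.txt$")

def count_trials_by_condition(filenames):
    """Count trial files per capture system; names that do not match are ignored."""
    counts = Counter()
    for name in filenames:
        match = TRIAL_PATTERN.match(name)
        if match:
            counts[match.group("code")] += 1
    return dict(counts)

# File names copied from the directory listing above
listing = [
    "Targaryen_MB_05.txt", "Targaryen_ML_03.txt", "Targaryen_ML_01.txt",
    "Targaryen_ML_05.txt", "Targaryen_MB_03.txt", "spring2026_lab1_targaryen.zip",
    "Targaryen_MB_01.txt", "Targaryen_ML_04.txt", "Targaryen_MB_04.txt",
    "Targaryen_MB_02.txt", "Targaryen_ML_02.txt",
]
print(count_trials_by_condition(listing))  # {'MB': 5, 'ML': 5}
```

In the notebook itself, the same helper could be called on `os.listdir(zip_dir)` instead of the hard-coded list.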


# Task
Update the `parse_txt_file` function in cell `8b3e504a` by importing the `re` module, renaming the 'Right Ankle Angles' column to 'Right Ankle Dorsiflexion', and removing '_MEAN' or '_Mean' suffixes from all column names to ensure clean and consistent DataFrame output.

## Modify `parse_txt_file`

### Subtask:
Update the existing `parse_txt_file` function (in cell `8b3e504a`) to include importing the `re` module, renaming the column 'Right Ankle Angles' to 'Right Ankle Dorsiflexion' (if present), and removing any '_MEAN' or '_Mean' suffixes from all column names. This ensures the parsing function returns a clean and consistently named DataFrame for each file.


**Reasoning**:
The subtask requires modifying the `parse_txt_file` function in cell `8b3e504a` to import the `re` module, rename a specific column, and remove suffixes from column names. I will provide the updated function in a code block targeting that cell.



In [3]:
import pandas as pd
import os
import re # Import the regular expression module

def parse_txt_file(filepath):
  """Reads a text file, extracts variable names and time-series data, identifies the condition,
  and returns a pandas DataFrame.

  Args:
    filepath (str): The full path to the text file.

  Returns:
    pd.DataFrame: A DataFrame containing the extracted data, condition, and filename.
  """
  with open(filepath, 'r') as file:
    lines = file.readlines()

  # Check if file has enough lines for the expected structure
  if len(lines) < 6: # Need at least 6 lines (5 header, 1 data)
      raise ValueError(f"File {os.path.basename(filepath)} has too few lines to parse correctly.")

  # Extract variable names from the 2nd line (index 1), using tab as delimiter
  variable_names = lines[1].strip().split('\t')

  # Read time-series data starting from the 6th line (index 5), using tab as delimiter
  data_lines = [line.strip().split('\t') for line in lines[5:] if line.strip()]

  # Ensure data_lines is not empty before checking lengths
  if not data_lines:
      raise ValueError(f"No data found in file {os.path.basename(filepath)} after header.")

  # Dynamically add 'Frame' if data rows have one more column than variable names
  if len(data_lines[0]) == len(variable_names) + 1:
      variable_names.insert(0, 'Frame')

  # Ensure variable names and data lines match in length AFTER potential adjustment
  if len(variable_names) != len(data_lines[0]):
      raise ValueError(f"Column name count ({len(variable_names)}) does not match data column count ({len(data_lines[0])}) in file {os.path.basename(filepath)}.")

  # Create DataFrame
  df = pd.DataFrame(data_lines, columns=variable_names)

  # Rename 'Right Ankle Angles' to 'Right Ankle Dorsiflexion' if it exists
  if 'Right Ankle Angles' in df.columns:
    df = df.rename(columns={'Right Ankle Angles': 'Right Ankle Dorsiflexion'})

  # Remove '_MEAN' or '_Mean' suffixes from column names
  new_columns = []
  for col in df.columns:
      new_col = re.sub(r'_MEAN$|_Mean$', '', col)
      new_columns.append(new_col)
  df.columns = new_columns

  # Convert numeric columns to appropriate data types
  for col in df.columns:
    try:
      df[col] = pd.to_numeric(df[col])
    except ValueError:
      pass # Keep as string if not numeric

  # Determine 'condition' from filename
  filename = os.path.basename(filepath)
  if 'MB' in filename.upper(): # Compare in upper case so the match is case-insensitive
    condition = 'marker-based'
  elif 'ML' in filename.upper(): # Same case-insensitive check for 'ML'
    condition = 'markerless'
  else:
    condition = 'unknown'

  # Add 'condition' column
  df['condition'] = condition

  # Add 'filename' column
  df['filename'] = filename

  return df

print("Updated function 'parse_txt_file' to handle tab-delimited data, correct line indexing, dynamic 'Frame' column, case-insensitive condition check, import 're' module, rename 'Right Ankle Angles' column, and remove '_MEAN'/'_Mean' suffixes.")

Updated function 'parse_txt_file' to handle tab-delimited data, correct line indexing, dynamic 'Frame' column, case-insensitive condition check, import 're' module, rename 'Right Ankle Angles' column, and remove '_MEAN'/'_Mean' suffixes.
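
The column clean-up inside `parse_txt_file` can be tried in isolation. The sketch below is a hypothetical helper (`clean_column_names` is not part of the lab code), and the suffixed names `Speed_MEAN` and `Stride_Length_Mean` are assumed examples of what an export might contain, not names taken from the lab files.

```python
import re

def clean_column_names(names):
    """Strip a trailing _MEAN/_Mean suffix, then rename the ankle column."""
    renames = {"Right Ankle Angles": "Right Ankle Dorsiflexion"}
    cleaned = []
    for name in names:
        # The $ anchor removes the suffix only at the end of the name
        name = re.sub(r"_(MEAN|Mean)$", "", name)
        cleaned.append(renames.get(name, name))
    return cleaned

# Hypothetical header row for illustration
example = ["Frame", "Right Knee Angles", "Right Ankle Angles",
           "Speed_MEAN", "Stride_Length_Mean"]
print(clean_column_names(example))
# ['Frame', 'Right Knee Angles', 'Right Ankle Dorsiflexion', 'Speed', 'Stride_Length']
```

Because the pattern is anchored at the end of the string, a name such as `Stride_Length` keeps its internal underscore.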


## Process and Split DataFrames

### Subtask:
Create a new code cell that performs the following steps:

1. Get the list of `.txt` files from the working directory.
2. Iterate through each `.txt` file, applying the modified `parse_txt_file` function.
3. Collect the resulting DataFrames and concatenate them into a single `combined_intermediate_df`.
4. From `combined_intermediate_df`, create `timeseries_df` by selecting 'Frame', 'Right Knee Angles', 'Right Ankle Dorsiflexion', 'condition', and 'filename'.
5. Create `discretes_df` by filtering `combined_intermediate_df` for rows where 'Frame' is 1, and then selecting 'Speed', 'Stride_Length', 'Right_Stance_Time', 'Ankle Plantarflexion at IC', 'Peak Right Knee Flexion', 'condition', and 'filename'.
6. Display the head of both `timeseries_df` and `discretes_df` for verification.


**Reasoning**:
The subtask requires processing all `.txt` files in the specified directory using the `parse_txt_file` function, concatenating them into a `combined_intermediate_df`, and then creating two new DataFrames, `timeseries_df` and `discretes_df`, by selecting and filtering specific columns. Finally, the heads of both new DataFrames need to be displayed for verification.



In [5]:
import pandas as pd
import os

# 1. Get the list of .txt files from the working directory
zip_dir = "kin6015/lab1"
all_files = os.listdir(zip_dir)
txt_files = [f for f in all_files if f.endswith('.txt')]

# 2. Initialize an empty list to store DataFrames
all_dataframes = []

# 3. Iterate through each .txt file and apply the parse_txt_file function
for filename in txt_files:
    filepath = os.path.join(zip_dir, filename)
    try:
        df = parse_txt_file(filepath)
        all_dataframes.append(df)
    except Exception as e:
        print(f"Error processing {filename}: {e}")

# 4. Concatenate all DataFrames into a single combined_intermediate_df
if all_dataframes:
    combined_intermediate_df = pd.concat(all_dataframes, ignore_index=True)
    print("Combined_intermediate_df created successfully.")
else:
    combined_intermediate_df = pd.DataFrame()
    print("No dataframes to concatenate, combined_intermediate_df is empty.")

# 5. Create timeseries_df
timeseries_df = combined_intermediate_df[['Frame', 'Right Knee Angles', 'Right Ankle Dorsiflexion', 'condition', 'filename']]

# 6. Create discretes_df
discretes_df = combined_intermediate_df[combined_intermediate_df['Frame'] == 1][['Speed', 'Stride_Length', 'Right_Stance_Time', 'Ankle Plantarflexion at IC', 'Peak Right Knee Flexion', 'condition', 'filename']]

# 7. Display the head of timeseries_df
print("\nHead of timeseries_df:")
print(timeseries_df.head())

# 8. Display the head of discretes_df
print("\nHead of discretes_df:")
print(discretes_df.head())

Combined_intermediate_df created successfully.

Head of timeseries_df:
   Frame  Right Knee Angles  Right Ankle Dorsiflexion     condition  \
0      1            3.54527                  -4.28667  marker-based   
1      2            5.48596                  -4.88219  marker-based   
2      3            6.19649                  -6.12455  marker-based   
3      4            6.24373                  -7.95542  marker-based   
4      5            6.29172                  -9.57410  marker-based   

              filename  
0  Targaryen_MB_05.txt  
1  Targaryen_MB_05.txt  
2  Targaryen_MB_05.txt  
3  Targaryen_MB_05.txt  
4  Targaryen_MB_05.txt  

Head of discretes_df:
       Speed  Stride_Length  Right_Stance_Time  Ankle Plantarflexion at IC  \
0    1.39355        0.99407            0.69667                    -5.96584   
101  1.34273        1.57994            0.66500                    -0.95046   
202  1.32326        1.60115            0.76667                     0.60840   
303  1.40067     

## Final Task

### Subtask:
Confirm that both `timeseries_df` and `discretes_df` have been successfully created with the correct columns and content, ready for further analysis.


## Summary:

### Data Analysis Key Findings

*   The `parse_txt_file` function was successfully updated to include the `re` module for regular expression operations.
*   The column 'Right Ankle Angles' was correctly renamed to 'Right Ankle Dorsiflexion' within the parsing process.
*   All `_MEAN` or `_Mean` suffixes were successfully removed from column names, ensuring data consistency.
*   All `.txt` files in the `kin6015/lab1` directory were processed, and their data was combined into a `combined_intermediate_df`.
*   A `timeseries_df` was successfully created, containing 'Frame', 'Right Knee Angles', 'Right Ankle Dorsiflexion', 'condition', and 'filename' columns, confirming its readiness for time-series analysis.
*   A `discretes_df` was successfully created, including 'Speed', 'Stride\_Length', 'Right\_Stance\_Time', 'Ankle Plantarflexion at IC', 'Peak Right Knee Flexion', 'condition', and 'filename' columns, specifically for discrete event analysis by filtering for `Frame` equal to 1.
*   Both `timeseries_df` and `discretes_df` were verified to have the correct columns and content, indicating they are ready for further analysis.

### Insights or Next Steps

*   The enhanced data parsing and structuring into `timeseries_df` and `discretes_df` provide a clean and well-organized foundation for students to conduct various analyses on marker-based and markerless gait data.
*   The dataframes are now primed for specific types of analysis; for instance, `timeseries_df` is suitable for plotting kinematic curves over time, while `discretes_df` is ideal for comparing specific gait parameters between conditions.
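
As one illustration of comparing gait parameters between conditions, the sketch below summarizes two metrics per condition with `groupby`. The values are synthetic and chosen only for demonstration (they are not lab results), and `toy_discretes` is a hypothetical stand-in for `discretes_df`.

```python
import pandas as pd

# Synthetic values for illustration only; these are NOT lab results
toy_discretes = pd.DataFrame({
    "Speed": [1.2, 1.4, 1.3, 1.5],
    "Stride_Length": [1.50, 1.60, 1.55, 1.65],
    "condition": ["marker-based", "marker-based", "markerless", "markerless"],
})

# Mean and standard deviation of each metric within each condition
summary = (
    toy_discretes
    .groupby("condition")[["Speed", "Stride_Length"]]
    .agg(["mean", "std"])
)
print(summary.round(3))
```

The same call should work on `discretes_df` once the metric columns have been confirmed to be numeric.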


## Statistical Parametric Mapping (SPM)

### Subtask:
Install the `spm1d` library, which is required for Statistical Parametric Mapping analysis. This is not a standard Colab library, so it needs to be installed first.


In [6]:
!pip install spm1d
print("spm1d library installed.")

Collecting spm1d
  Downloading spm1d-0.4.53-py3-none-any.whl.metadata (878 bytes)
Downloading spm1d-0.4.53-py3-none-any.whl (8.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.5/8.5 MB 24.5 MB/s eta 0:00:00
Installing collected packages: spm1d
Successfully installed spm1d-0.4.53
spm1d library installed.


## Extract Subject ID for Pairing

### Subtask:
Add a 'subject_id' column to `timeseries_df`. This ID will be derived from the 'filename' (e.g., 'Targaryen_05' from 'Targaryen_MB_05.txt') and is crucial for correctly pairing the marker-based and markerless data for each subject for the SPM analysis.


**Reasoning**:
To add the 'subject_id' column to `timeseries_df`, I will extract it from the 'filename' column by removing specified substrings and then display the updated DataFrame head and unique subject IDs for verification.



In [24]:
import pandas as pd

# Recreate timeseries_df with .copy() to avoid SettingWithCopyWarning
timeseries_df = combined_intermediate_df[['Frame', 'Right Knee Angles', 'Right Ankle Dorsiflexion', 'condition', 'filename']].copy()

# Add 'subject_id' column by cleaning the 'filename'
timeseries_df['subject_id'] = timeseries_df['filename'].str.replace('_MB', '', regex=False)
timeseries_df['subject_id'] = timeseries_df['subject_id'].str.replace('_ML', '', regex=False)
timeseries_df['subject_id'] = timeseries_df['subject_id'].str.replace('.txt', '', regex=False)

print("timeseries_df with new 'subject_id' column:")
display(timeseries_df)

print("\nHead of timeseries_df:")
print(timeseries_df.head())

print("\nUnique subject IDs:")
print(timeseries_df['subject_id'].unique())

timeseries_df with new 'subject_id' column:

| | Frame | Right Knee Angles | Right Ankle Dorsiflexion | condition | filename | subject_id |
|---|---|---|---|---|---|---|
| 0 | 1 | 3.54527 | -4.28667 | marker-based | Targaryen_MB_05.txt | Targaryen_05 |
| 1 | 2 | 5.48596 | -4.88219 | marker-based | Targaryen_MB_05.txt | Targaryen_05 |
| 2 | 3 | 6.19649 | -6.12455 | marker-based | Targaryen_MB_05.txt | Targaryen_05 |
| 3 | 4 | 6.24373 | -7.95542 | marker-based | Targaryen_MB_05.txt | Targaryen_05 |
| 4 | 5 | 6.29172 | -9.57410 | marker-based | Targaryen_MB_05.txt | Targaryen_05 |
| ... | ... | ... | ... | ... | ... | ... |
| 1005 | 97 | 3.69249 | 2.47781 | markerless | Targaryen_ML_02.txt | Targaryen_02 |
| 1006 | 98 | 4.80715 | 1.85383 | markerless | Targaryen_ML_02.txt | Targaryen_02 |
| 1007 | 99 | 6.19268 | 0.82454 | markerless | Targaryen_ML_02.txt | Targaryen_02 |
| 1008 | 100 | 7.55065 | -0.47004 | markerless | Targaryen_ML_02.txt | Targaryen_02 |

Head of timeseries_df:
   Frame  Right Knee Angles  Right Ankle Dorsiflexion     condition  \
0      1            3.54527                  -4.28667  marker-based   
1      2            5.48596                  -4.88219  marker-based   
2      3            6.19649                  -6.12455  marker-based   
3      4            6.24373                  -7.95542  marker-based   
4      5            6.29172                  -9.57410  marker-based   

              filename    subject_id  
0  Targaryen_MB_05.txt  Targaryen_05  
1  Targaryen_MB_05.txt  Targaryen_05  
2  Targaryen_MB_05.txt  Targaryen_05  
3  Targaryen_MB_05.txt  Targaryen_05  
4  Targaryen_MB_05.txt  Targaryen_05  

Unique subject IDs:
['Targaryen_05' 'Targaryen_03' 'Targaryen_01' 'Targaryen_04'
 'Targaryen_02']
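
Since the SPM comparison depends on every ID having both a marker-based and a markerless file, a quick pairing check may be worthwhile. The sketch below is illustrative: `subject_id_from_filename` and `unpaired_subjects` are hypothetical helpers, the single regular expression is an alternative to the three `str.replace` calls above (not the notebook's own method), and the file list is a made-up example with one deliberately missing file.

```python
import re
from collections import defaultdict

def subject_id_from_filename(filename):
    """Drop the _MB/_ML code and the .txt extension from a trial file name."""
    return re.sub(r"_(MB|ML)(?=_)|\.txt$", "", filename)

def unpaired_subjects(filenames):
    """Return IDs that are missing either the MB or the ML file."""
    codes_by_subject = defaultdict(set)
    for name in filenames:
        match = re.search(r"_(MB|ML)_", name)
        if match:
            codes_by_subject[subject_id_from_filename(name)].add(match.group(1))
    return sorted(
        sid for sid, codes in codes_by_subject.items() if codes != {"MB", "ML"}
    )

# Made-up example: trial 02 has no markerless file
files = ["Targaryen_MB_01.txt", "Targaryen_ML_01.txt", "Targaryen_MB_02.txt"]
print(subject_id_from_filename(files[0]))  # Targaryen_01
print(unpaired_subjects(files))            # ['Targaryen_02']
```

An empty list from `unpaired_subjects` would suggest that every ID can be matched across the two conditions.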


### Complete the Analysis

After you've completed this lab's analysis, do the following:

1. Copy and paste the above plots and tables to your lab report
2. Share your Colab notebook with me
3. Copy the link to your notebook and submit it along with your report for this lab on Canvas