<a href="https://colab.research.google.com/github/plnu-biomechanics/kin6015/blob/main/notebooks/kin6015_lab1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://www.pointloma.edu/sites/default/files/styles/basic_page/public/images/PLNU_Biomechanics_Lab_green_yellowSD_HiRes.png" width=400>

## **KIN 6015 Biomechanical Basis of Human Movement**
Instructor: Arnel Aguinaldo, PhD

**Lab 1 Data Processing**

In this lab, gait analysis data were collected with marker-based and markerless motion capture systems, and spatiotemporal metrics and inverse kinematics (IK) were estimated using Visual3D. The data were then exported as text (*.txt) files and uploaded to the class repository in the lab's [GitHub](https://github.com/plnu-biomechanics).

To further process the data for this lab, follow the steps in this **Colab notebook**, which contains instructions and sample code on how to wrangle and analyze the data.


### Create your own Colab Notebook

1. Go to **File -> New notebook in Drive** to open a new notebook in your Python environment:<br>
<img src="https://raw.githubusercontent.com/plnu-biomechanics/kin6015/main/notebooks/images/file_notebook.png" width=450>

2. Rename your Colab notebook using this naming format: **lastname_group_lab#.ipynb** (e.g., "aguinaldo_targaryen_lab1.ipynb")
3. Click on the **+ Code** option above to insert a new code cell: <br>
<img src="https://raw.githubusercontent.com/plnu-biomechanics/kin6015/main/notebooks/images/addcode.png" width=280>

4. The data you will parse and analyze for this lab will be copied from the lab's GitHub and temporarily stored in your Colab working directory, which can be accessed by clicking on the folder icon in the left menu:<br>
<img src="https://raw.githubusercontent.com/plnu-biomechanics/kin6015/main/notebooks/images/colab_folder.png" width=400>

5. Copy the following lines of code to import the packages needed for this analysis and to load the data files into your working directory. Be sure to update the `GROUP` variable with your group's name. **Note**: These files are "runtime" access only: they are stored temporarily in your working directory and are available only while your notebook session is active. Re-running the following code cell downloads and extracts the zipped files into the working directory again.


In [1]:
import urllib.request
import zipfile
import os

# --------------------------------------------------
# STUDENT INPUT (edit only this line; case-sensitive)
# --------------------------------------------------
GROUP = "targaryen"   # e.g., "targaryen", "stark", "lannister", "martel"

# --------------------------------------------------
# Configuration (do NOT edit below)
# These lines create a directory for this lab in your
# Colab working directory.
# --------------------------------------------------
zip_dir = "kin6015/lab1"
os.makedirs(zip_dir, exist_ok=True)

zip_filename = f"spring2026_lab1_{GROUP}.zip"

url = (
    "https://raw.githubusercontent.com/"
    "plnu-biomechanics/kin6015/main/"
    f"labs/{zip_filename}"
)

zip_path = os.path.join(zip_dir, zip_filename)

# --------------------------------------------------
# Download zip file
# --------------------------------------------------
urllib.request.urlretrieve(url, zip_path)

# --------------------------------------------------
# Extract contents from the zipped file
# --------------------------------------------------
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall(zip_dir)

print("Extracted files in lab directory:")
print(os.listdir(zip_dir))


Extracted files in lab directory:
['Targaryen_MB_05.txt', 'Targaryen_ML_03.txt', 'Targaryen_ML_01.txt', 'Targaryen_ML_05.txt', 'Targaryen_MB_03.txt', 'spring2026_lab1_targaryen.zip', 'Targaryen_MB_01.txt', 'Targaryen_ML_04.txt', 'Targaryen_MB_04.txt', 'Targaryen_MB_02.txt', 'Targaryen_ML_02.txt']
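
If `GROUP` is misspelled, no zip file with that name exists in the repository and the download step fails with an HTTP error. The following is a minimal sketch of a more defensive download step, not part of the lab code: the helper names `build_zip_url` and `download_group_zip` are hypothetical, and the URL pattern is copied from the cell above.

```python
import os
import urllib.error
import urllib.request

# URL prefix copied from the download cell above
BASE_URL = "https://raw.githubusercontent.com/plnu-biomechanics/kin6015/main/labs"


def build_zip_url(group):
    """Return the download URL for a group's zip file (hypothetical helper)."""
    return f"{BASE_URL}/spring2026_lab1_{group}.zip"


def download_group_zip(group, dest_dir="kin6015/lab1"):
    """Download the group's zip file and return its local path (hypothetical helper)."""
    os.makedirs(dest_dir, exist_ok=True)
    url = build_zip_url(group)
    zip_path = os.path.join(dest_dir, os.path.basename(url))
    try:
        urllib.request.urlretrieve(url, zip_path)
    except urllib.error.HTTPError as err:
        # A 404 here usually means the group name does not match a file in the repository
        raise FileNotFoundError(
            f"Could not download {url} (HTTP {err.code}). Check the spelling of GROUP."
        ) from err
    return zip_path
```

Calling `download_group_zip(GROUP)` would then either return the path to the downloaded zip file or stop with a message that points at the group name.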


# Gemini Task
#### Prompt:
Create a single pandas DataFrame named `combined_df` by iterating through all `.txt` files in the `kin6015/lab1` directory, applying the `parse_txt_file` function to each, and then concatenating the results. Finally, display the head of `combined_df` to verify its structure and content.

## Parse All Files and Combine Data

### Subtask:
Iterate through all '.txt' files located in the 'kin6015/lab1' directory. Apply the updated `parse_txt_file` function to each file. Collect all the resulting DataFrames into a list, and then concatenate them into a single master DataFrame named `combined_df`. This step will re-attempt to process all files with the corrected parsing logic.


In [42]:
import pandas as pd
import os

def parse_txt_file(filepath):
  """Reads a text file, extracts variable names and time-series data, identifies the condition,
  and returns a pandas DataFrame.

  Args:
    filepath (str): The full path to the text file.

  Returns:
    pd.DataFrame: A DataFrame containing the extracted data, condition, and filename.
  """
  with open(filepath, 'r') as file:
    lines = file.readlines()

  # Check if file has enough lines for the expected structure
  if len(lines) < 6: # Need at least 6 lines (5 header, 1 data)
      raise ValueError(f"File {os.path.basename(filepath)} has too few lines to parse correctly.")

  # Extract variable names from the 2nd line (index 1), using tab as delimiter
  variable_names = lines[1].strip().split('\t')

  # Read time-series data starting from the 6th line (index 5), using tab as delimiter
  data_lines = [line.strip().split('\t') for line in lines[5:] if line.strip()]

  # Ensure data_lines is not empty before checking lengths
  if not data_lines:
      raise ValueError(f"No data found in file {os.path.basename(filepath)} after header.")

  # Dynamically add 'Frame' if data rows have one more column than variable names
  if len(data_lines[0]) == len(variable_names) + 1:
      variable_names.insert(0, 'Frame')

  # Ensure variable names and data lines match in length AFTER potential adjustment
  if len(variable_names) != len(data_lines[0]):
      raise ValueError(f"Column name count ({len(variable_names)}) does not match data column count ({len(data_lines[0])}) in file {os.path.basename(filepath)}.")

  # Create DataFrame
  df = pd.DataFrame(data_lines, columns=variable_names)

  # Convert numeric columns to appropriate data types
  for col in df.columns:
    try:
      df[col] = pd.to_numeric(df[col])
    except ValueError:
      pass # Keep as string if not numeric

  # Determine 'condition' from filename
  filename = os.path.basename(filepath)
  if '_MB_' in filename.upper(): # Match the '_MB_' token regardless of case, so group names containing 'MB' are not misclassified
    condition = 'marker-based'
  elif '_ML_' in filename.upper(): # Match the '_ML_' token regardless of case
    condition = 'markerless'
  else:
    condition = 'unknown'

  # Add 'condition' column
  df['condition'] = condition

  # Add 'filename' column
  df['filename'] = filename

  return df

print("Updated function 'parse_txt_file' to handle tab-delimited data, correct line indexing, a dynamic 'Frame' column, and a case-insensitive condition check.")

Updated function 'parse_txt_file' to handle tab-delimited data, correct line indexing, a dynamic 'Frame' column, and a case-insensitive condition check.
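
The function above assumes a specific file layout: variable names on the second line, data starting on the sixth line, tab-delimited fields, and a leading frame number on each data row. The sketch below illustrates that layout with a small synthetic file (made-up values and placeholder header rows, not real lab data) and reads it with `pd.read_csv` instead of manual line splitting. It is an alternative shown for illustration, not the parser used in this lab.

```python
import io

import pandas as pd

# Synthetic stand-in for an exported file (NOT real lab data).
# Assumed layout: 5 header rows that each begin with an empty first field,
# followed by data rows whose first field is the frame number.
sample_text = (
    "\tplaceholder_1\tplaceholder_1\n"
    "\tRight Knee Angles\tSpeed\n"
    "\tplaceholder_3\tplaceholder_3\n"
    "\tplaceholder_4\tplaceholder_4\n"
    "\tplaceholder_5\tplaceholder_5\n"
    "1\t3.5\t1.39\n"
    "2\t5.4\t\n"
)

# Variable names come from the 2nd line; prepend 'Frame' for the leading column
lines = sample_text.splitlines()
variable_names = ["Frame"] + lines[1].strip().split("\t")

# Skip the 5 header rows and let pandas infer numeric types
sample_df = pd.read_csv(
    io.StringIO(sample_text),
    sep="\t",
    skiprows=5,
    header=None,
    names=variable_names,
)
print(sample_df)
```

In this sketch the empty `Speed` field on the second data row is read as `NaN`, the same pattern that discrete metrics show in the combined output.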


**Reasoning**:
The `parse_txt_file` function has been updated to handle the 'Frame' column and tab-delimited data. Now, I need to re-run the code to iterate through all `.txt` files in the `kin6015/lab1` directory, apply the updated `parse_txt_file` function to each, collect the resulting DataFrames, and then concatenate them into a single `combined_df`. Finally, the following code displays the head of `combined_df` to verify the structure and content, as per the subtask instructions.



In [43]:
import re

all_dataframes = []

# List the lab directory and keep only the .txt data files (this skips the zip file)
file_list = os.listdir(zip_dir)
text_files = [f for f in file_list if f.endswith('.txt')]

for filename in text_files:
    filepath = os.path.join(zip_dir, filename)
    try:
        df = parse_txt_file(filepath)
        all_dataframes.append(df)
    except Exception as e:
        print(f"Error processing file {filename}: {e}")

# Concatenate all DataFrames into a single DataFrame
if all_dataframes:
    combined_df = pd.concat(all_dataframes, ignore_index=True)
    # Clean up the variable names by removing
    # '_MEAN' or '_Mean' from column names
    combined_df.columns = [re.sub(r'_(MEAN|Mean)', '', col) for col in combined_df.columns]
    print("Combined DataFrame created successfully. Displaying head:")
    print(combined_df.head())
else:
    combined_df = pd.DataFrame()
    print("No dataframes were parsed successfully to combine.")



Combined DataFrame created successfully. Displaying head:
   Frame  Right Knee Angles  Right Ankle Dorsiflexion    Speed  Stride_Length  \
0      1            3.54527                  -4.28667  1.39355        0.99407   
1      2            5.48596                  -4.88219      NaN            NaN   
2      3            6.19649                  -6.12455      NaN            NaN   
3      4            6.24373                  -7.95542      NaN            NaN   
4      5            6.29172                  -9.57410      NaN            NaN   

   Right_Stance_Time  Ankle Plantarflexion at IC  Peak Right Knee Flexion  \
0            0.69667                    -5.96584                 67.92663   
1                NaN                         NaN                      NaN   
2                NaN                         NaN                      NaN   
3                NaN                         NaN                      NaN   
4                NaN                         NaN                      
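
In the output above, discrete metrics such as `Speed` and `Stride_Length` appear to hold a value only on the first row of each file and are `NaN` on the remaining frames. The sketch below shows one way such metrics might be summarized by condition. It uses a toy DataFrame with made-up values rather than `combined_df`, and the choice of mean and count as summary statistics is an assumption.

```python
import pandas as pd

# Toy data (made-up values): each file has one Speed value, then NaN on later frames
toy_df = pd.DataFrame({
    "filename": ["A_MB_01.txt", "A_MB_01.txt", "A_ML_01.txt", "A_ML_01.txt",
                 "A_MB_02.txt", "A_MB_02.txt", "A_ML_02.txt", "A_ML_02.txt"],
    "condition": ["marker-based", "marker-based", "markerless", "markerless",
                  "marker-based", "marker-based", "markerless", "markerless"],
    "Speed": [1.40, None, 1.30, None, 1.20, None, 1.10, None],
})

# Keep only the rows that carry a discrete value (one per file in this toy data)
discrete = toy_df.dropna(subset=["Speed"])

# Summarize the metric for each motion capture condition
summary = discrete.groupby("condition")["Speed"].agg(["mean", "count"])
print(summary)
```

The same pattern could be applied to `combined_df` by swapping in the metric of interest, provided each file contributes exactly one non-missing value for it.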

### Complete the Analysis

After you've completed this lab's analysis, do the following:

1. Copy and paste the above plots and tables into your lab report
2. Share your Colab notebook with me
3. Copy the link to your notebook and submit it along with your report for this lab on Canvas

## Install SPM1D Library

### Subtask:
Install the `spm1d` library, which is required for Statistical Parametric Mapping analysis. This is not a standard Colab library, so it needs to be installed first.


**Reasoning**:
The subtask requires installing the `spm1d` library, which can be done using `!pip install` in a code cell.



In [27]:
!pip install spm1d
print("spm1d library installed.")

Collecting spm1d
  Downloading spm1d-0.4.53-py3-none-any.whl.metadata (878 bytes)
Downloading spm1d-0.4.53-py3-none-any.whl (8.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.5/8.5 MB 57.0 MB/s eta 0:00:00
Installing collected packages: spm1d
Successfully installed spm1d-0.4.53
spm1d library installed.


In [28]:
import re
import numpy as np

# 1. Extract a unique subject/trial identifier from the 'filename' column
def extract_subject_id(filename):
    match = re.match(r'^(.*?)_(MB|ML)_(\d+)\.txt$', filename)
    if match:
        return f"{match.group(1)}_{match.group(3)}"
    return None

combined_df['subject_id'] = combined_df['filename'].apply(extract_subject_id)

print("Added 'subject_id' column to combined_df.")
print(combined_df[['filename', 'subject_id']].head())

Added 'subject_id' column to combined_df.
              filename    subject_id
0  Targaryen_MB_05.txt  Targaryen_05
1  Targaryen_MB_05.txt  Targaryen_05
2  Targaryen_MB_05.txt  Targaryen_05
3  Targaryen_MB_05.txt  Targaryen_05
4  Targaryen_MB_05.txt  Targaryen_05
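
Because `subject_id` is the same for the marker-based and markerless versions of a trial, it can be used to line up the two conditions before a paired comparison. The following is a minimal sketch on toy data with made-up values. The helper `to_waveform_matrix` is hypothetical, and the sketch assumes every trial has the same number of frames (for example, after time normalization).

```python
import pandas as pd

# Toy data (made-up values): 2 trials x 2 conditions x 3 frames
rows = []
for trial, offset in [("01", 0.0), ("02", 1.0)]:
    for cond, shift in [("marker-based", 0.0), ("markerless", 0.5)]:
        for frame in (1, 2, 3):
            rows.append({
                "subject_id": f"Targaryen_{trial}",
                "condition": cond,
                "Frame": frame,
                "Right Knee Angles": 10.0 * frame + offset + shift,
            })
toy_df = pd.DataFrame(rows)


def to_waveform_matrix(df, condition, variable):
    """Return a (trials x frames) table for one condition (hypothetical helper)."""
    subset = df[df["condition"] == condition]
    wide = subset.pivot(index="subject_id", columns="Frame", values=variable)
    return wide.sort_index()


mb = to_waveform_matrix(toy_df, "marker-based", "Right Knee Angles")
ml = to_waveform_matrix(toy_df, "markerless", "Right Knee Angles")

# Rows must refer to the same trials, in the same order, for a paired analysis
assert list(mb.index) == list(ml.index)

Y_mb = mb.to_numpy()
Y_ml = ml.to_numpy()
print(Y_mb.shape, Y_ml.shape)
```

Arrays shaped like `Y_mb` and `Y_ml` (one row per trial, one column per frame) are the kind of input a paired waveform test such as `spm1d.stats.ttest_paired` expects.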
