# Dataset Structure 
 - 43 participants, 3 sessions (days), 17 gesture classes (16 gestures + REST), 7 trials per gesture
 - Each .mat file contains the EMG recordings for one participant in one session
 - GRABMyo is a well-regarded dataset, so I assume no (or negligible) clipping/saturation; I could not find the saturation point of the recording device to verify this.
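
The no-clipping assumption above can be sanity-checked without knowing the device's saturation point. The sketch below is a heuristic only: it measures, per channel, what fraction of samples sit at that channel's own peak magnitude. A clean signal touches its peak once or twice, while a clipped one stays there for many samples. The function name `fraction_at_extreme` and the tolerance are illustrative choices, and the demo uses synthetic data rather than GRABMyo recordings.

```python
import numpy as np


def fraction_at_extreme(trial, rel_tol=1e-6):
    """Per-channel fraction of samples at (or within rel_tol of) the channel's peak |value|.

    trial: array of shape (n_samples, n_channels).
    A large fraction suggests the signal is pinned against a ceiling (clipping).
    """
    trial = np.asarray(trial, dtype=float)
    peak = np.abs(trial).max(axis=0)                    # one peak per channel
    at_peak = np.abs(trial) >= peak * (1.0 - rel_tol)   # broadcast over samples
    return at_peak.mean(axis=0)


# Synthetic demo: channel 0 is clean noise, channel 1 is deliberately clipped at +/-1.
rng = np.random.default_rng(0)
demo = rng.standard_normal((1000, 2))
demo[:, 1] = np.clip(demo[:, 1], -1.0, 1.0)

fractions = fraction_at_extreme(demo)
print(fractions)  # channel 0 should be tiny, channel 1 should be large
```

On real data the same function could be applied to each cell of `DATA_FOREARM` and `DATA_WRIST`; a threshold for "suspicious" would have to be chosen by inspection.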

 ### Gesture List

| Gesture | Description                        | Gesture | Description          |
|---------|------------------------------------|---------|----------------------|
| LP      | Lateral prehension                 | IFE     | Index finger extension |
| TA      | Thumb adduction                    | TE      | Thumb extension      |
| TLFO    | Thumb and little finger opposition | WF      | Wrist flexion        |
| TIFO    | Thumb and index finger opposition  | WE      | Wrist extension      |
| TLFE    | Thumb and little finger extension  | FP      | Forearm pronation    |
| TIFE    | Thumb and index finger extension   | FS      | Forearm supination   |
| IMFE    | Index and middle finger extension  | HO      | Hand open            |
| LFE     | Little finger extension            | HC      | Hand close           |
- The order of the 16 gestures was randomized, and a resting (REST) trial was collected after all 16 gestures had been performed once.
- A ten-second rest period was provided between trials.


 ### Hardware Specifications 
 - sEMG signals were bandpass filtered between 10 Hz and 500 Hz using a fourth-order Butterworth filter
 - A 60 Hz notch filter was applied to remove powerline noise
 - Gain: 500
 - Number of channels: 32 (28 used: 16 forearm + 12 wrist electrodes)
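
For reference, here is a software approximation of the filtering described in this list. It is a sketch under assumptions, not the acquisition pipeline itself: the function name `emg_filter`, the notch quality factor `Q = 30`, and the zero-phase (forward-backward) application are my choices. Note also that SciPy's `butter(4, ..., btype="bandpass")` produces a filter of order 8, so "fourth-order" may not correspond exactly. The demo signal is synthetic.

```python
import numpy as np
from scipy import signal

FS = 2048  # sampling rate in Hz


def emg_filter(x, fs=FS, band=(10.0, 500.0), notch_hz=60.0, notch_q=30.0):
    """Band-pass then notch-filter x along axis 0 (x has shape samples x channels)."""
    # Butterworth band-pass in second-order sections for numerical stability.
    sos = signal.butter(4, band, btype="bandpass", fs=fs, output="sos")
    y = signal.sosfiltfilt(sos, x, axis=0)
    # Narrow IIR notch at the powerline frequency (Q is an assumed value).
    b, a = signal.iirnotch(notch_hz, notch_q, fs=fs)
    return signal.filtfilt(b, a, y, axis=0)


def tone_amplitude(sig, freq, fs=FS):
    """Amplitude of the FFT bin closest to freq for a 1-D signal."""
    spectrum = np.abs(np.fft.rfft(sig)) * 2.0 / len(sig)
    return spectrum[int(round(freq * len(sig) / fs))]


# Synthetic 5 s test signal: a 100 Hz "EMG" tone plus 60 Hz powerline interference.
t = np.arange(5 * FS) / FS
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 60 * t)
y = emg_filter(x[:, None])[:, 0]

# Measure on the middle 3 s to stay clear of filter edge transients.
mid = slice(FS, 4 * FS)
amp_60 = tone_amplitude(y[mid], 60)
amp_100 = tone_amplitude(y[mid], 100)
print(f"60 Hz: {amp_60:.4f}, 100 Hz: {amp_100:.4f}")
```

Since the recordings were already filtered at acquisition, re-applying this to the .mat data should change little; it is mainly useful for checking the spectrum or for processing raw signals from another source.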

In [7]:
from natsort import natsorted
import numpy as np
import scipy.io as sio
from pathlib import Path


# List available .mat files
data_path = Path("/Users/jasonwang/Desktop/Wearable/EMG_Project/grabmyo_1.1.0/OutputBM/Session1_converted")
mat_files = natsorted(list(data_path.glob("*.mat"))) # use natsorted to sort files as 1, 2, 3, ... instead of 1, 10, 2, 20, 3,...

print(f"\nFound {len(mat_files)} .mat files:")
for file in mat_files[:5]:
    print(f"  - {file.name}")

# Load and inspect a sample .mat file
if mat_files:
    sample_file = mat_files[0]
    print(f"\nExamining file: {sample_file.name}")
    data = sio.loadmat(sample_file) # load .mat file with scipy

    print("\nVariables in .mat file:")
    for k, v in data.items():   # k for key and v for value
        if not k.startswith('__'):
            print(f"  - {k}: {type(v).__name__} {getattr(v, 'shape', '')}")

    # Explore EMG data arrays
    for k, v in data.items():
        if not k.startswith('__'):
            if isinstance(v, np.ndarray):
                print(f"\n{k} details:")
                print(f"  - Shape: {v.shape}")
                print(f"  - Data type: {v.dtype}")

                if v.dtype == 'object':
                    if v.size > 0:
                        first_element = v.flat[0]
                        print(f"  - First element type: {type(first_element)}")
                        if hasattr(first_element, 'shape'):
                            print(f"  - Shape: {first_element.shape}")
                            print(f"  - Data type: {first_element.dtype}")
                            if hasattr(first_element, 'min'):
                                print(f"  - Min/Max: {first_element.min():.3f} / {first_element.max():.3f}")

                        print("  Shapes of first 3 elements:")
                        for i in range(min(3, v.size)):
                            element = v.flat[i]
                            if hasattr(element, 'shape'):
                                print(f"  - Element {i}: {element.shape}")
                else:
                    print(f"  - Min/Max: {v.min():.3f} / {v.max():.3f}")
                    print(f"  - Sample values: {v.flat[:5]}")



Found 43 .mat files:
  - session1_participant1.mat
  - session1_participant2.mat
  - session1_participant3.mat
  - session1_participant4.mat
  - session1_participant5.mat

Examining file: session1_participant1.mat

Variables in .mat file:
  - DATA_FOREARM: ndarray (7, 17)
  - DATA_WRIST: ndarray (7, 17)

DATA_FOREARM details:
  - Shape: (7, 17)
  - Data type: object
  - First element type: <class 'numpy.ndarray'>
  - Shape: (10240, 16)
  - Data type: float64
  - Min/Max: -0.626 / 0.538
  Shapes of first 3 elements:
  - Element 0: (10240, 16)
  - Element 1: (10240, 16)
  - Element 2: (10240, 16)

DATA_WRIST details:
  - Shape: (7, 17)
  - Data type: object
  - First element type: <class 'numpy.ndarray'>
  - Shape: (10240, 12)
  - Data type: float64
  - Min/Max: -0.316 / 0.252
  Shapes of first 3 elements:
  - Element 0: (10240, 12)
  - Element 1: (10240, 12)
  - Element 2: (10240, 12)
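
The output above shows that each variable is a (7, 17) object array whose cells are 2-D float arrays of identical shape. Given 7 trials and 17 gesture classes, the rows are presumably trials and the columns gestures. Which column corresponds to which gesture is not confirmed here. A minimal sketch of turning such a cell array into one dense array follows. `stack_cells` is a hypothetical helper, and the demo uses a small synthetic stand-in (8 samples per cell instead of 10240) to keep memory use low.

```python
import numpy as np


def stack_cells(cells):
    """Stack a (trials, gestures) object array of equal-shape 2-D arrays.

    Returns an array of shape (trials, gestures, n_samples, n_channels).
    """
    n_trials, n_gestures = cells.shape
    return np.stack(
        [np.stack([cells[t, g] for g in range(n_gestures)]) for t in range(n_trials)]
    )


# Synthetic stand-in with the same layout as DATA_FOREARM, but only 8 samples per cell.
rng = np.random.default_rng(0)
demo = np.empty((7, 17), dtype=object)
for t in range(7):
    for g in range(17):
        demo[t, g] = rng.standard_normal((8, 16))

stacked = stack_cells(demo)
print(stacked.shape)  # (7, 17, 8, 16)
```

With the real file, `stack_cells(data["DATA_FOREARM"])` would give shape (7, 17, 10240, 16), so a single trial is `stacked[trial, gesture]`.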


In [9]:
# Quick check for NaNs and Infs in all EMG arrays
for k, v in data.items():
    if not k.startswith('__') and isinstance(v, np.ndarray):
        arrs = v.flat if v.dtype == 'object' else [v]
        nan = inf = total = 0
        for arr in arrs:
            if isinstance(arr, np.ndarray):
                nan += np.isnan(arr).sum()
                inf += np.isinf(arr).sum()
                total += arr.size
        print(f"{k}: NaN={nan}, Inf={inf}, Clean={nan==0 and inf==0}, Total={total}")


DATA_FOREARM: NaN=0, Inf=0, Clean=True, Total=19496960
DATA_WRIST: NaN=0, Inf=0, Clean=True, Total=14622720


## Data Room:
- sampling_rate = 2048 Hz
- recording_duration = 5 s (10240 samples per trial)
- n_channels = 16 (forearm) and 12 (wrist)
- n_gestures = 17 (the 16 gestures listed above + 'REST')
- trials_per_gesture = 7
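
These constants can be cross-checked against the shapes and element totals printed earlier. The short sketch below only does the arithmetic; the variable names mirror the list above.

```python
sampling_rate = 2048        # Hz
recording_duration = 5      # seconds per trial
n_gestures = 17             # 16 gestures + REST
trials_per_gesture = 7
n_channels = {"DATA_FOREARM": 16, "DATA_WRIST": 12}

# Samples in one trial: matches the first dimension of each cell, (10240, n_channels).
samples_per_trial = sampling_rate * recording_duration

# Total scalar values per variable: matches the "Total=" figures from the NaN/Inf check.
expected_total = {
    name: trials_per_gesture * n_gestures * samples_per_trial * channels
    for name, channels in n_channels.items()
}

print(samples_per_trial)  # 10240
print(expected_total)     # {'DATA_FOREARM': 19496960, 'DATA_WRIST': 14622720}
```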