# Data Preprocessing and Spectrogram Image Generation

# EMG and Angle Data Preprocessing and STFT Image Generation

This script preprocesses EMG (electromyography) and Angle data from text files: it removes outliers, denoises both signals, and generates combined Short-Time Fourier Transform (STFT) images that show the two signals side by side.

## Data Loading and Preprocessing

### Loading Data
- **Function**: `load_data(file_path)`
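- **Purpose**: Load the first two whitespace-separated columns of a text file as EMG and Angle data, skipping the header lines at the top of the file (the code passes `skiprows=5`) and dropping rows with missing values.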
- **Parameters**: `file_path` - Path to the text file.
- **Returns**: DataFrame with columns 'EMG' and 'Angle'.
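
The loading step can be tried without a real recording. The sketch below mirrors the `pandas.read_csv` call from the code cell (`skiprows=5`, whitespace separator, first two columns) on an in-memory file. The header lines and numeric values are made up for illustration and do not reflect the actual dataset layout.

```python
import io
import pandas as pd

# Hypothetical file contents: five header lines followed by whitespace-separated rows.
sample_text = "\n".join([
    "header 1", "header 2", "header 3", "header 4", "header 5",
    "0.12   10.5",
    "0.15   10.7",
    "0.11   10.9",
])

# Same arguments as the script's load_data, reading from memory instead of disk.
df = pd.read_csv(io.StringIO(sample_text), sep=r"\s+", header=None,
                 skiprows=5, usecols=[0, 1])
df.columns = ["EMG", "Angle"]
print(df)
```

If the real files carry a different number of header lines, `skiprows` has to be adjusted accordingly.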

### Removing Outliers
- **Function**: `remove_outliers(df, threshold=3)`
- **Purpose**: Remove outliers using the Z-score method: a row is dropped if the absolute Z-score of any column reaches the threshold.
- **Parameters**: 
  - `df` - Input DataFrame.
  - `threshold` - Z-score threshold for outlier detection (default is 3).
- **Returns**: DataFrame without outliers.
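
As a minimal sketch of the Z-score rule, the snippet below applies the same filter to a small synthetic DataFrame. The values are invented for illustration: 19 ordinary rows plus one extreme EMG value, which is the only row expected to be removed.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Synthetic data for illustration only: bounded EMG values plus one extreme reading.
emg = np.append(np.sin(np.arange(19)), 50.0)
angle = np.linspace(20.0, 40.0, 20)
demo = pd.DataFrame({"EMG": emg, "Angle": angle})

# Column-wise Z-scores; a row is kept only if every column satisfies |z| < threshold.
threshold = 3
z = np.abs(zscore(demo.to_numpy()))
cleaned = demo[(z < threshold).all(axis=1)]

print(len(demo), "rows before,", len(cleaned), "rows after")
```

Note that the filtered DataFrame keeps its original index labels, so later positional slicing should use `.iloc` rather than `.loc`.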

### Denoising Signal
- **Function**: `denoise_signal(signal, sigma=1)`
- **Purpose**: Apply a Gaussian filter to smooth the signal.
- **Parameters**: 
  - `signal` - Input signal array.
  - `sigma` - Standard deviation for Gaussian kernel (default is 1).
- **Returns**: Denoised signal array.
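
To illustrate what the Gaussian filter does, the sketch below smooths a synthetic signal (a slow sine wave with alternating high-frequency noise, both chosen purely for demonstration) and compares the error against the clean sine wave before and after filtering.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Synthetic signal for illustration: slow sine wave plus sample-to-sample alternating noise.
n = np.arange(200)
clean = np.sin(2 * np.pi * n / 100)
noisy = clean + 0.3 * (-1.0) ** n

# Same call as the script's denoise_signal with its default sigma.
smoothed = gaussian_filter1d(noisy, sigma=1)

# Mean squared error relative to the clean sine wave, before and after smoothing.
error_before = np.mean((noisy - clean) ** 2)
error_after = np.mean((smoothed - clean) ** 2)
print(error_before, error_after)
```

A larger `sigma` smooths more aggressively but also attenuates genuine fast changes in the signal.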


## Data Organization
- **Variables**: 
  - `data_dir` - Directory containing the data files.
  - `subject_ids` - Range of subject IDs (1 to 14).
  - `exercises` - List of exercises ['sitting', 'standing', 'gait'].
  - `data` - Dictionary to store preprocessed data for each subject and exercise.

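The script iterates through each subject and exercise and loads every file it finds (missing files are reported and skipped). It then removes outliers, denoises both signals, and stores the preprocessed DataFrame in the `data` dictionary under the key `(subject, exercise)`.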

## Generating Combined STFT Images

### Function: `sample_and_create_combined_stft(df, output_directory, subject_number, exercise)`
- **Purpose**: Generate and save combined STFT images for given data.
- **Parameters**: 
  - `df` - Preprocessed DataFrame.
  - `output_directory` - Directory to save the STFT images.
  - `subject_number` - Subject ID.
  - `exercise` - Exercise name.

**Steps**:
1. **Configuration**: Define sensor columns, sample size, number of samples, and number of images.
2. **STFT Calculation**: 
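   - Use the same sampling frequency (`fs`) and STFT window length (`nperseg`) for every image.
   - Compute the STFT of the first EMG segment to fix the magnitude (color scale) limits; the frequency and time axis limits follow from `fs` and the sample size.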
3. **Image Generation**:
   - For each image, randomly select a start index within the valid range.
   - Compute STFT for the selected segment for both EMG and Angle signals.
   - Plot the two STFT magnitudes side by side with consistent scaling and save the figure at 100 DPI.
   - Reload the saved image, re-render it on a 400x400 pixel canvas, and overwrite the file.

**Output**: Combined STFT images are saved in exercise-specific subfolders within the output directory.
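
The STFT calculation step can be sketched in isolation. The snippet below uses the same `fs`, `nperseg`, and segment length as the script, but on a synthetic sinusoid (an assumption made for illustration, not real EMG data), and inspects the shape of the result and the dominant frequency bin.

```python
import numpy as np
from scipy import signal

fs = 1.0           # sampling frequency used by the script (axes are in cycles per sample)
nperseg = 256      # STFT window length used by the script
sample_size = 750  # segment length used by the script

# Synthetic segment for illustration: a sinusoid at 0.1 cycles per sample.
n = np.arange(sample_size)
segment = np.sin(2 * np.pi * 0.1 * n)

# f: frequency bins, t: frame times, Zxx: complex STFT of shape (len(f), len(t)).
f, t, Zxx = signal.stft(segment, fs=fs, nperseg=nperseg)
magnitude = np.abs(Zxx)

# Frequency bin with the largest magnitude averaged over all time frames.
peak_frequency = f[np.argmax(magnitude.mean(axis=1))]
print(magnitude.shape, peak_frequency)
```

The frequency axis always has `nperseg // 2 + 1` bins from 0 to `fs / 2`, which is why the script can fix the frequency limits before plotting any image.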

### Execution
The script iterates through each subject and exercise in the `data` dictionary, generating and saving the combined STFT images using the `sample_and_create_combined_stft` function.

## Output Directory
- **Variable**: `output_directory`
- **Purpose**: Specify the directory to save the generated STFT images.

Each generated image shows the EMG and Angle STFT magnitudes side by side for one randomly chosen segment of a subject's recording.

**Note**: Ensure that the specified directories and file paths are correct and accessible before running the script.


In [None]:
import os
import pandas as pd
import numpy as np
import random
from scipy import signal
from scipy.ndimage import gaussian_filter1d
import matplotlib.pyplot as plt
from scipy.stats import zscore

# Function to load data from a text file, skipping the header lines at the top
def load_data(file_path):
    # Raw string avoids an invalid-escape warning for the whitespace regex
    data = pd.read_csv(file_path, sep=r"\s+", header=None, skiprows=5, usecols=[0, 1])
    data.columns = ['EMG', 'Angle']
    data.dropna(inplace=True, axis=0)
    return data

# Function to remove outliers using Z-score
def remove_outliers(df, threshold=3):
    z_scores = np.abs(zscore(df))
    return df[(z_scores < threshold).all(axis=1)]

# Function to apply low-pass filter for denoising
def denoise_signal(signal, sigma=1):
    return gaussian_filter1d(signal, sigma=sigma)


# Example of organizing data
data_dir = r"C:\Users\user\OneDrive\Documents\nitw\spectogram Work\Original EMG datasets"
subject_ids = range(1, 15)  # Assuming subjects are labeled from 1 to 14
exercises = ['sitting', 'standing', 'gait']
data = {}

for subject in subject_ids:
    for exercise in exercises:
        file_name = f"{subject}{exercise}.txt"
        file_path = os.path.join(data_dir, file_name)
        if os.path.exists(file_path):
            data[(subject, exercise)] = load_data(file_path)
        else:
            print(f"File {file_name} not found.")

for subject in subject_ids:
    for exercise in exercises:
        if (subject, exercise) in data:
            df = data[(subject, exercise)]
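            # Copy the filtered rows so the column assignments below do not trigger SettingWithCopyWarning
            df = remove_outliers(df).copy()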
            df['EMG'] = denoise_signal(df['EMG'])
            df['Angle'] = denoise_signal(df['Angle'])
            data[(subject, exercise)] = df

# Function to generate and save combined STFT images
def sample_and_create_combined_stft(df, output_directory, subject_number, exercise):
    sensor_columns = ["EMG", "Angle"]
    sample_size = 750
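    num_samples = df.shape[0] // sample_size  # Number of non-overlapping segments available
    num_images = 100
    if num_samples == 0:
        # Too few rows for even one segment; random.randint below would raise ValueError
        print(f"Skipping subject {subject_number}, {exercise}: fewer than {sample_size} rows.")
        return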

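    # Use the same STFT settings and axis/color limits for every image
    fs = 1.0       # Sampling frequency; with 1.0 the axes are in cycles/sample and samples
    nperseg = 256  # STFT window length in samples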
    _, _, Zxx = signal.stft(df[sensor_columns[0]].iloc[:sample_size].values, fs=fs, nperseg=nperseg)
    freq_axis_limits = (0, fs / 2)
    time_axis_limits = (0, sample_size / fs)
    magnitude_limits = (0, np.max(np.abs(Zxx)))

    for img_index in range(num_images):
        plt.figure(figsize=(8, 4))  # Wide figure to hold the two side-by-side subplots

        # Randomly select a start index within the valid range
        start_index = random.randint(0, num_samples - 1) * sample_size
        end_index = start_index + sample_size

        for i, sensor in enumerate(sensor_columns):
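            # Positional slicing: the index has gaps after dropna/outlier removal, so .loc could return fewer rows
            segment = df[sensor].iloc[start_index:end_index].values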
            f, t, Zxx = signal.stft(segment, fs=fs, nperseg=nperseg)

            # Plot STFT with consistent scaling
            plt.subplot(1, 2, i + 1)
            plt.pcolormesh(t, f, np.abs(Zxx), vmin=magnitude_limits[0], vmax=magnitude_limits[1])  # Magnitude of STFT
            plt.axis('off')  # Turn off axis labels and ticks
            plt.title(sensor)
            plt.ylim(freq_axis_limits)
            plt.xlim(time_axis_limits)

        plt.tight_layout()

        # Save combined STFT images in time-frequency format in activity-wise subfolders
        activity_output_directory = os.path.join(output_directory, f"Activity_{exercise}")
        os.makedirs(activity_output_directory, exist_ok=True)

        stft_image_path = os.path.join(activity_output_directory, f"STFT_Sub{subject_number}_{exercise}_combined_{img_index + 1}.png")
        plt.savefig(stft_image_path, bbox_inches='tight', pad_inches=0, transparent=True, dpi=100)  # Raise dpi for a higher-resolution intermediate image
        plt.close()

        # Re-render the saved image on a 4x4 inch canvas at 100 DPI (400x400 pixels)
        img = plt.imread(stft_image_path)
        fig = plt.figure(figsize=(4, 4))
        ax = fig.add_axes([0, 0, 1, 1])  # Axes fill the whole figure, so no margins remain
        ax.imshow(img, aspect='auto')  # Stretch the image to the square canvas
        ax.axis('off')
        # No bbox_inches='tight' here: cropping would change the 400x400 output size
        fig.savefig(stft_image_path, transparent=True, dpi=100)
        plt.close(fig)

        print(f"Saved combined STFT image: {stft_image_path}")

# Output directory for STFT images
output_directory = r"C:\Users\user\OneDrive\Documents\nitw\spectogram Work\Spectograms"

# Generate combined STFT images for each subject and exercise
for (subject, exercise), df in data.items():
    sample_and_create_combined_stft(df, output_directory, subject, exercise)
