# Data Preprocessing & Conversion to Time-Domain Images

### load_data(file_path)

**Purpose:** To load data from a whitespace-separated text file, skipping the first 4 lines and keeping only the first two columns.

**Parameters:** 
- `file_path` (string) - the path to the text file.

**Steps:**
1. Use `pd.read_csv()` to read the file with whitespace as the separator (`sep=r"\s+"`), no header row (`header=None`), the first 4 lines skipped (`skiprows=4`), and only the first two columns kept (`usecols=[0, 1]`).
2. Assign column names `VM` and `FM` to the data.
3. Remove any rows with missing values using `dropna()`.

**Returns:** A pandas DataFrame containing the columns `VM` and `FM`.
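
The loading step can be tried without the real recordings. The sketch below feeds a small in-memory file through the same `read_csv` options; `io.StringIO` stands in for a file path, and the file contents (four placeholder preamble lines, then three whitespace-separated numeric columns, one of them containing a `NaN`) are hypothetical.

```python
import io

import pandas as pd

def load_data(file_path):
    # Whitespace-separated, no header row, skip 4 preamble lines, keep 2 columns
    data = pd.read_csv(file_path, sep=r"\s+", header=None, skiprows=4, usecols=[0, 1])
    data.columns = ["VM", "FM"]
    # Drop rows with missing values (here, the row with a "NaN" token)
    return data.dropna()

# Hypothetical file contents: 4 placeholder lines, then three numeric columns
sample = io.StringIO(
    "header line 1\n"
    "header line 2\n"
    "header line 3\n"
    "header line 4\n"
    "0.001 12.5 7\n"
    "0.002 13.0 7\n"
    "0.004 NaN 7\n"
    "0.003 13.4 7\n"
)
df = load_data(sample)
print(df)
```

The third column is ignored by `usecols`, and the row with the missing value is removed, leaving three rows.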

---

### remove_outliers(data, threshold=3)

**Purpose:** To remove outliers from the data using the Z-score method.

**Parameters:** 
- `data` (DataFrame) - the input data.
- `threshold` (int) - the Z-score threshold for defining outliers (default is 3).

**Steps:**
1. Compute the absolute Z-score of every value, column by column: `|x - mean| / std`.
2. Keep only the rows in which every column's absolute Z-score is below the threshold; rows with any value at or above it are dropped.

**Returns:** A DataFrame with outliers removed.
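
A minimal sketch of the Z-score filter on synthetic data: a hypothetical 200-row DataFrame of standard-normal noise with one injected spike, which the filter should drop. Note that pandas' `std()` is the sample standard deviation (`ddof=1`).

```python
import numpy as np
import pandas as pd

def remove_outliers(data, threshold=3):
    # Absolute Z-score of each value, computed per column
    z_scores = np.abs((data - data.mean()) / data.std())
    # Keep only rows where every column is below the threshold
    return data[(z_scores < threshold).all(axis=1)]

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((200, 2)), columns=["VM", "FM"])
spike_index = 50
df.loc[spike_index, "VM"] = 50.0  # injected artificial outlier

cleaned = remove_outliers(df)
print(len(df), "->", len(cleaned), "rows")
```

The spike also inflates the `VM` standard deviation, which is one reason a single large artifact can mask smaller ones in the same column.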

---

### wavelet_denoising(data, wavelet='db4', level=1)

**Purpose:** To apply wavelet denoising to the data.

**Parameters:**
- `data` (DataFrame) - the input data.
- `wavelet` (string) - the type of wavelet to use (default is 'db4').
- `level` (int) - the level of decomposition (default is 1).

**Steps:**
1. Initialize an empty DataFrame for denoised data.
2. For each column in the input data:
   - Compute wavelet decomposition using `pywt.wavedec()`.
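   - Estimate the noise level as `sigma = median(|d1|) / 0.6745`, where `d1` are the finest-scale detail coefficients.
   - Compute the universal threshold `uthresh = sigma * sqrt(2 * ln(N))`, where `N` is the signal length.
   - Apply soft thresholding to all detail coefficients, leaving the approximation coefficients unchanged.
   - Reconstruct the signal from the thresholded coefficients using `pywt.waverec()`.
   - Trim the reconstruction to the original length (it can be one sample longer) and add it to the output DataFrame.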

**Returns:** A DataFrame with denoised signals.
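
The per-column loop reduces to the function below, applied here to a synthetic noisy 5 Hz sine sampled at 1000 Hz (a hypothetical test signal). `denoise_signal` is a hypothetical single-column helper that follows the steps above, taking `sigma` from the finest detail coefficients `coeffs[-1]`; the final comparison checks that the error against the clean sine drops.

```python
import numpy as np
import pywt

def denoise_signal(x, wavelet="db4", level=1):
    x = np.asarray(x, dtype=float)
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # Noise level from the finest detail coefficients; dividing by 0.6745
    # converts the median absolute value to a Gaussian standard deviation
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    # Universal threshold: sigma * sqrt(2 ln N)
    uthresh = sigma * np.sqrt(2 * np.log(len(x)))
    # Soft-threshold every detail level; coeffs[0] (approximation) is kept
    coeffs[1:] = [pywt.threshold(c, value=uthresh, mode="soft") for c in coeffs[1:]]
    # waverec may return one extra sample, so trim to the input length
    return pywt.waverec(coeffs, wavelet)[: len(x)]

fs = 1000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 5 * t)
rng = np.random.default_rng(1)
noisy = clean + 0.3 * rng.standard_normal(len(t))

denoised = denoise_signal(noisy)
rmse_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rmse_denoised = np.sqrt(np.mean((denoised - clean) ** 2))
print(f"RMSE noisy: {rmse_noisy:.3f}, denoised: {rmse_denoised:.3f}")
```

With `level=1` only the top half of the spectrum is thresholded, so noise below half the Nyquist frequency remains; a larger `level` thresholds more bands.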

---

### apply_filters(data, fs=1000)

**Purpose:** To band-pass filter each signal (a combined high-pass and low-pass) with zero-phase Butterworth filters.

**Parameters:**
- `data` (DataFrame) - the input data.
- `fs` (int) - the sampling frequency (default is 1000 Hz).

**Steps:**
1. Calculate the Nyquist frequency (`nyquist`).
2. Define filter parameters for EMG (`VM`) and angle (`FM`):
   - For EMG: low cutoff at 0.5 Hz, high cutoff at 50 Hz.
   - For angle: low cutoff at 0.1 Hz, high cutoff at 20 Hz.
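3. Design first-order Butterworth band-pass filters for both signals using `butter()`, with the cutoffs normalized by the Nyquist frequency.
4. Apply each filter forward and backward with `filtfilt()`, so the output has no phase shift.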

**Returns:** A DataFrame with filtered signals.
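
To see what the EMG band (0.5 to 50 Hz, first order) does, the sketch below filters a hypothetical test signal made of a 10 Hz and a 200 Hz sine and measures each component's amplitude with an FFT. `bandpass` and `amplitude` are hypothetical helpers. Because `filtfilt` applies the filter twice, the effective magnitude response is the square of the single-pass response.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, low_hz, high_hz, fs=1000, order=1):
    nyquist = 0.5 * fs
    b, a = butter(order, [low_hz / nyquist, high_hz / nyquist], btype="band")
    # Forward-backward filtering: zero phase, squared magnitude response
    return filtfilt(b, a, x)

def amplitude(sig, freq, fs):
    # Amplitude of a sinusoid that falls exactly on an FFT bin
    spectrum = np.fft.rfft(sig)
    k = int(round(freq * len(sig) / fs))
    return 2 * np.abs(spectrum[k]) / len(sig)

fs = 1000
t = np.arange(4 * fs) / fs  # 4 s, so 10 Hz and 200 Hz land on exact bins
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 200 * t)
y = bandpass(x, 0.5, 50, fs=fs)

gain_10 = amplitude(y, 10, fs) / amplitude(x, 10, fs)
gain_200 = amplitude(y, 200, fs) / amplitude(x, 200, fs)
print(f"10 Hz gain: {gain_10:.2f}, 200 Hz gain: {gain_200:.3f}")
```

A first-order design rolls off gently, so components just outside the band are attenuated rather than removed; a higher `order` gives a sharper cutoff.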

---

### segment_data(data, window_size=800, num_segments=100)

**Purpose:** To extract fixed-length windows at random start positions (windows may overlap).

**Parameters:**
- `data` (DataFrame) - the input data.
- `window_size` (int) - the length of each segment in samples (default is 800, i.e. 0.8 s at the 1000 Hz sampling rate assumed by `apply_filters`).
- `num_segments` (int) - the number of segments to extract (default is 100).

**Steps:**
1. Initialize an empty list for segments.
2. For each segment:
   - Randomly select a starting index between 0 and `len(data) - window_size`, inclusive.
   - Extract a segment of the specified window size.
   - Append the segment to the list.

**Returns:** A list of NumPy arrays of shape `(window_size, 2)`, one per segment.
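
A reproducible variant, assuming a seeded generator is wanted: the `seed` parameter is an addition and not part of the original function. `Generator.integers` excludes its upper bound, hence the `+ 1`, which matches the inclusive range of `random.randint`. The toy DataFrame is hypothetical.

```python
import numpy as np
import pandas as pd

def segment_data(data, window_size=800, num_segments=100, seed=None):
    if len(data) < window_size:
        raise ValueError(f"Need at least {window_size} samples, got {len(data)}")
    rng = np.random.default_rng(seed)
    # Upper bound is exclusive, so +1 allows the last full window
    starts = rng.integers(0, len(data) - window_size + 1, size=num_segments)
    values = data.to_numpy()
    return [values[s:s + window_size] for s in starts]

# Hypothetical toy data: 1000 samples, two channels
df = pd.DataFrame({"VM": np.arange(1000.0), "FM": -np.arange(1000.0)})
segments = segment_data(df, window_size=800, num_segments=5, seed=42)
print([seg.shape for seg in segments])
```

Calling it again with the same `seed` returns the same windows, which makes the generated image set repeatable.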

---

### Main Processing Loop

**Purpose:** To load, preprocess, and segment data for all subjects and exercises.

**Steps:**
1. Define data directory, subject IDs, and exercise names.
2. Initialize a dictionary to store segments for each exercise.
3. Loop through each subject and exercise. If the file `{subject}{exercise}.txt` exists (for example `1sitting.txt`), process it; otherwise print a "not found" message:
   - Load the data using `load_data()`.
   - Remove outliers using `remove_outliers()`.
   - Apply wavelet denoising using `wavelet_denoising()`.
   - Apply filters using `apply_filters()`.
   - Extract 100 random segments using `segment_data()`.
   - Append these segments to the exercise's list in `all_segments`.
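
The loop structure can be tested on its own by passing the processing chain in as a function. In this sketch `collect_segments` and `fake_process` are hypothetical: the stub stands in for the load, clean, filter, and segment chain, and the files are empty placeholders in a temporary directory.

```python
import os
import tempfile

def collect_segments(data_dir, subject_ids, exercises, process):
    all_segments = {exercise: [] for exercise in exercises}
    missing = []
    for subject in subject_ids:
        for exercise in exercises:
            file_name = f"{subject}{exercise}.txt"
            file_path = os.path.join(data_dir, file_name)
            if os.path.exists(file_path):
                all_segments[exercise].extend(process(file_path))
            else:
                missing.append(file_name)
    return all_segments, missing

data_dir = tempfile.mkdtemp()
for name in ["1sitting.txt", "1standing.txt", "2gait.txt"]:
    with open(os.path.join(data_dir, name), "w"):
        pass  # empty placeholder file

def fake_process(path):
    # Hypothetical stand-in: pretend each file yields two segments
    return [os.path.basename(path)] * 2

all_segments, missing = collect_segments(
    data_dir, range(1, 3), ["sitting", "standing", "gait"], fake_process
)
print(missing)
```

Returning the missing file names instead of only printing them makes it easy to check coverage before balancing.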

---

### Balancing Segments

**Purpose:** To balance the number of segments across different classes.

**Steps:**
1. Determine the minimum number of segments among all exercises.
2. Truncate each exercise's list to that many segments, keeping the first ones.

**Returns:** A dictionary with balanced segments for each exercise.
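
The balancing rule as a small function, with one optional addition that is not in the original: shuffling each list before truncation. Plain truncation keeps the earliest segments, which come from the subjects processed first, so a seeded shuffle spreads the kept segments across subjects. `balance_segments` and the toy dictionary are hypothetical.

```python
import random

def balance_segments(all_segments, shuffle=False, seed=None):
    min_segments = min(len(segs) for segs in all_segments.values())
    rng = random.Random(seed)
    balanced = {}
    for exercise, segs in all_segments.items():
        segs = list(segs)  # copy so the input lists are not reordered
        if shuffle:
            rng.shuffle(segs)
        balanced[exercise] = segs[:min_segments]
    return balanced

toy = {"sitting": list(range(5)), "standing": list(range(3)), "gait": list(range(4))}
balanced = balance_segments(toy)
shuffled = balance_segments(toy, shuffle=True, seed=0)
print({k: len(v) for k, v in balanced.items()})
```

With `shuffle=False` the result matches the original truncation behavior.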

---

### Generating and Saving Images

**Purpose:** To generate and save images of segmented data for each exercise.

**Steps:**
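1. Define the label mapping for exercises (`sitting`: 0, `standing`: 1, `gait`: 2).
2. Create a subdirectory under `time_domain` for each exercise.
3. Loop through each exercise and its segments:
   - Extract the VM and FM signals.
   - Plot VM (top) and FM (bottom) as two stacked subplots with fixed y-limits.
   - Save the plot as a 200 × 200 pixel PNG (2 × 2 inches at 100 DPI).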

**Result:** Each image is saved as `time_domain/<exercise>/<exercise>_<i>.png`.
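
A minimal sketch of rendering one segment to a 200 × 200 pixel PNG, using the object-oriented Matplotlib API and the non-interactive `Agg` backend. `save_segment_image` and `png_size` are hypothetical helpers, and the segment is synthetic. The figure is created at its final 2 × 2 inch size so that `tight_layout()` works with the real dimensions; the pixel size is read back from the PNG header (width and height are the big-endian integers at bytes 16 to 23).

```python
import os
import struct
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for batch rendering
import matplotlib.pyplot as plt
import numpy as np

def save_segment_image(segment, path, vm_ylim=(-0.02, 0.02), fm_ylim=(-70, 70)):
    fig, (ax_vm, ax_fm) = plt.subplots(2, 1, figsize=(2, 2))
    ax_vm.plot(segment[:, 0], color="blue")
    ax_vm.set_ylim(*vm_ylim)
    ax_vm.grid(True)
    ax_fm.plot(segment[:, 1], color="red")
    ax_fm.set_ylim(*fm_ylim)
    ax_fm.grid(True)
    fig.tight_layout()
    fig.savefig(path, dpi=100)  # 2 in x 100 DPI = 200 px per side
    plt.close(fig)  # free the figure's memory

def png_size(path):
    with open(path, "rb") as f:
        header = f.read(24)
    width, height = struct.unpack(">II", header[16:24])
    return width, height

rng = np.random.default_rng(0)
segment = np.column_stack([
    0.01 * rng.standard_normal(800),              # synthetic EMG-like channel
    50 * np.sin(np.linspace(0, 4 * np.pi, 800)),  # synthetic angle channel
])
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "sitting_0.png")
save_segment_image(segment, path)
print(png_size(path))
```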


```python
import os
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pywt
from scipy.signal import butter, filtfilt

# Function to load data from a text file and skip the first 4 lines
def load_data(file_path):
    # Raw string avoids an invalid-escape warning for "\s"
    data = pd.read_csv(file_path, sep=r"\s+", header=None, skiprows=4, usecols=[0, 1])
    data.columns = ['VM', 'FM']
    data.dropna(inplace=True)
    return data

# Function to remove outliers using Z-score
def remove_outliers(data, threshold=3):
    z_scores = np.abs((data - data.mean()) / data.std())
    # Keep rows where every column's |Z| is below the threshold
    return data[(z_scores < threshold).all(axis=1)]

# Function to apply wavelet denoising
def wavelet_denoising(data, wavelet='db4', level=1):
    denoised_data = pd.DataFrame()
    for column in data.columns:
        coeffs = pywt.wavedec(data[column].to_numpy(), wavelet, level=level)
        # Noise estimate from the finest detail coefficients (coeffs[-1])
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745
        # Universal threshold
        uthresh = sigma * np.sqrt(2 * np.log(len(data)))
        # Soft-threshold the detail coefficients; the approximation is kept
        coeffs[1:] = [pywt.threshold(c, value=uthresh, mode='soft') for c in coeffs[1:]]
        denoised_signal = pywt.waverec(coeffs, wavelet)
        # waverec can return one extra sample, so trim to the input length
        denoised_data[column] = denoised_signal[:len(data)]
    return denoised_data

# Function to apply first-order Butterworth band-pass filters (zero-phase)
def apply_filters(data, fs=1000):
    data = data.copy()  # avoid modifying the caller's DataFrame
    nyquist = 0.5 * fs

    # EMG (VM) filter parameters: 0.5-50 Hz
    emg_low_cutoff = 0.5 / nyquist
    emg_high_cutoff = 50 / nyquist
    b_emg, a_emg = butter(1, [emg_low_cutoff, emg_high_cutoff], btype='band')

    # Angle (FM) filter parameters: 0.1-20 Hz
    angle_low_cutoff = 0.1 / nyquist
    angle_high_cutoff = 20 / nyquist
    b_angle, a_angle = butter(1, [angle_low_cutoff, angle_high_cutoff], btype='band')

    # filtfilt runs each filter forward and backward (no phase shift)
    data['VM'] = filtfilt(b_emg, a_emg, data['VM'])
    data['FM'] = filtfilt(b_angle, a_angle, data['FM'])
    return data

# Function to randomly segment the data into windows
def segment_data(data, window_size=800, num_segments=100):
    if len(data) < window_size:
        raise ValueError(f"Need at least {window_size} samples, got {len(data)}")
    segments = []
    for _ in range(num_segments):
        # randint is inclusive, so the last full window can be chosen
        start = random.randint(0, len(data) - window_size)
        segment = data.iloc[start:start + window_size]
        segments.append(segment.values)
    return segments

# Load and preprocess the data
data_dir = r"D:\jhansi_emg\internship_2024_may\Original EMG datasets"
subject_ids = range(1, 12)  # Assuming subjects are labeled from 1 to 11
exercises = ['sitting', 'standing', 'gait']
all_segments = {exercise: [] for exercise in exercises}
num_segments_per_class = 100  # Number of segments extracted per file

for subject in subject_ids:
    for exercise in exercises:
        file_name = f"{subject}{exercise}.txt"
        file_path = os.path.join(data_dir, file_name)
        if os.path.exists(file_path):
            data = load_data(file_path)
            data = remove_outliers(data)
            data = wavelet_denoising(data)
            data = apply_filters(data)
            segments = segment_data(data, num_segments=num_segments_per_class)
            all_segments[exercise].extend(segments)
        else:
            print(f"File {file_name} not found.")

# Balance the number of segments per class
min_segments = min(len(all_segments[exercise]) for exercise in exercises)
balanced_segments = {exercise: segments[:min_segments] for exercise, segments in all_segments.items()}

# Define label mapping (not used in this cell; the class is encoded by the subfolder name)
label_mapping = {'sitting': 0, 'standing': 1, 'gait': 2}

# Create directories to save images for each class
save_dir = 'time_domain'
for exercise in exercises:
    os.makedirs(os.path.join(save_dir, exercise), exist_ok=True)

# Generate and save images for each signal block
for exercise, segments in balanced_segments.items():
    for i, segment in enumerate(segments):
        # Extract VM and FM signals
        vm_signal = segment[:, 0]
        fm_signal = segment[:, 1]

        # Create the figure at its final size: 2 x 2 inches at 100 DPI = 200 x 200 pixels
        plt.figure(figsize=(2, 2))

        # Plot VM signal
        plt.subplot(2, 1, 1)
        plt.plot(vm_signal, color='blue')
        # plt.title(f'VM Signal - {exercise}')
        plt.grid(True)
        plt.ylim(-0.02, 0.02)  # Adjust these limits based on the range of your EMG data

        # Plot FM signal
        plt.subplot(2, 1, 2)
        plt.plot(fm_signal, color='red')
        # plt.title(f'FM Signal - {exercise}')
        plt.grid(True)
        plt.ylim(-70, 70)  # Adjust these limits based on the range of your angle data

        # Lay out at the final size to prevent overlap
        plt.tight_layout()

        # Save the plot as an image
        image_name = f"{exercise}_{i}.png"
        image_path = os.path.join(save_dir, exercise, image_name)
        plt.savefig(image_path, dpi=100)  # DPI of 100 for 200x200 pixels
        plt.close()  # Close the figure to free up memory

print("All images saved successfully.")
```

```
All images saved successfully.
```
