## Overview of the Code Below

The code below implements a framework for augmenting sensor (time-series) data. It applies several augmentation techniques to produce additional training samples for machine learning models. It consists of two main components: the `SensorAugmentation` class and the `process_sensor_data` function.

### 1. `SensorAugmentation` Class

#### Purpose
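The `SensorAugmentation` class encapsulates several techniques for augmenting sensor data: adding Gaussian or uniform noise, jittering, scaling, multiplying by smooth random curves (magnitude warping), and permuting segments of the data. Every method expects a 2-D array with time steps as rows and sensor channels as columns.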
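
As a rough illustration of the noise and scaling techniques, the sketch below applies jitter (additive Gaussian noise) and scaling (one random factor per channel) to a small synthetic array, then clips each column to its original minimum and maximum, mirroring the clipping step the class performs. The array shape, `sigma` values, and seed are illustrative assumptions, not values from the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator so the example is reproducible

# Synthetic sensor recording: 100 time steps x 3 channels (e.g. accelerometer x/y/z)
data = rng.normal(size=(100, 3))

# Per-channel bounds of the original signal, used for clipping
min_vals = data.min(axis=0)
max_vals = data.max(axis=0)

# Jitter: add small zero-mean Gaussian noise to every sample, then clip per channel
jittered = np.clip(data + rng.normal(0.0, 0.05, size=data.shape), min_vals, max_vals)

# Scaling: draw one factor per channel and broadcast it over all time steps
factors = rng.normal(1.0, 0.1, size=(1, data.shape[1]))
scaled = np.clip(data * factors, min_vals, max_vals)

print(jittered.shape, scaled.shape)
```

Broadcasting a `(1, n_channels)` factor array gives the same result as the explicit `np.matmul(np.ones(...), scalingFactor)` used in `DA_Scaling`. Because of the clipping step, values that scaling pushes past the original extremes are flattened to those extremes rather than enlarged.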

#### Methods

- **`__init__(self, config)`**: Initializes the class with a configuration dictionary containing parameters for each augmentation technique.
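- **`add_gaussian_noise(self, data, mean=0, std=0.01)`**: Adds Gaussian noise to the data, then clips each column to its original minimum and maximum.
- **`add_uniform_noise(self, data, low=-0.01, high=0.01)`**: Adds uniform noise to the data, then clips each column to its original range.
- **`DA_Jitter(self, data, sigma=0.05)`**: Applies jittering by adding zero-mean Gaussian noise with standard deviation `sigma`, then clips each column to its original range.
- **`DA_Scaling(self, data, sigma=0.1)`**: Multiplies each channel (column) by its own random factor drawn from a normal distribution with mean 1 and standard deviation `sigma`, then clips to the original range.
- **`GenerateRandomCurves(self, data, sigma=0.2, knot=4)`**: Multiplies each channel by a smooth random curve (magnitude warping). The curve is a cubic spline through `knot + 2` evenly spaced values drawn around 1 with standard deviation `sigma`; the result is clipped to the original range.
- **`DA_Permutation(self, data, nPerm=3, minSegLength=10, noise_factor=0.1)`**: Splits the data into `nPerm` segments of at least `minSegLength` samples, shuffles their order, adds Gaussian noise with standard deviation `noise_factor` to each segment, and clips the result to the original range.
- **`augment_data(self, data, techniques)`**: Runs every entry of `techniques` whose name matches a method of the class and whose `enabled` flag is `True`, passing the remaining entries as keyword arguments. Returns a dictionary keyed by technique name.
- **`save_augmented_data(self, augmented_data, base_filename, original_df, output_path)`**: Saves each augmented array to `<base_filename>_<technique>.csv` in `output_path`, reusing the column names of `original_df`.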
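
The two less obvious techniques can be sketched in isolation. Magnitude warping, as in `GenerateRandomCurves`, multiplies each channel by a smooth curve that passes through `knot + 2` normally distributed values centred on 1. Permutation, as in `DA_Permutation`, cuts the series into contiguous segments and concatenates them in shuffled order. The function names `magnitude_warp` and `permute_segments` are hypothetical, and for simplicity this permutation uses near-equal segments from `np.array_split` instead of the random cut points and per-segment noise the class uses.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def magnitude_warp(data, sigma=0.2, knot=4, rng=None):
    """Multiply each channel by a smooth random curve centred on 1 (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    n, channels = data.shape
    t = np.arange(n)
    knots_x = np.linspace(0, n - 1, knot + 2)  # evenly spaced knot positions, endpoints included
    curves = np.empty((n, channels))
    for c in range(channels):
        knots_y = rng.normal(1.0, sigma, size=knot + 2)  # random gain at each knot
        curves[:, c] = CubicSpline(knots_x, knots_y)(t)  # smooth interpolation between knots
    return data * curves


def permute_segments(data, n_perm=3, rng=None):
    """Split the series into n_perm contiguous segments and shuffle their order (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    segments = np.array_split(data, n_perm, axis=0)  # near-equal contiguous chunks along time
    order = rng.permutation(n_perm)
    return np.concatenate([segments[i] for i in order], axis=0)


rng = np.random.default_rng(42)
signal = np.sin(np.linspace(0, 4 * np.pi, 60)).reshape(-1, 1)  # one-channel toy signal
warped = magnitude_warp(signal, rng=rng)
permuted = permute_segments(signal, rng=rng)
print(warped.shape, permuted.shape)
```

A seeded `Generator` is passed in so runs are reproducible; the class itself draws from the global `np.random` state.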

### 2. `process_sensor_data` Function

#### Purpose
The `process_sensor_data` function orchestrates the entire process of loading sensor data, applying augmentation techniques, and saving the augmented data.

#### Workflow

1. **Configuration Lookup**: Reads the augmentation settings from `config_data["data_augmentation"]["sensor_augmentation"]` and checks that the sensor data file (the `script_path` entry) exists.
2. **Loading Data**: Reads sensor data from the specified CSV file into a DataFrame.
3. **Data Augmentation**: Creates an instance of `SensorAugmentation` and applies the enabled augmentation techniques to the data.
4. **Saving Augmented Data**: Saves the augmented data to CSV files in the specified output directory, then reads the saved files back and prints their first rows and shapes as a check.
5. **Error Handling**: Catches and reports errors such as a missing file, insufficient permissions, an empty data file, or a malformed CSV.
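
A minimal, self-contained sketch of this workflow follows. It assumes the nested configuration layout that `process_sensor_data` reads (`data_augmentation` → `sensor_augmentation` → `script_path`, `output_path`, `techniques`), with each technique entry carrying an `enabled` flag plus its keyword arguments. To run on its own, it writes a small hypothetical CSV (`walk.csv`) to a temporary directory and uses stand-in `jitter` and `scale` functions instead of the `SensorAugmentation` class; file names, column names, and values are illustrative.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def jitter(data, sigma=0.05):
    """Stand-in augmentation: additive Gaussian noise (hypothetical helper)."""
    return data + np.random.normal(0.0, sigma, size=data.shape)


def scale(data, sigma=0.1):
    """Stand-in augmentation: one random gain per channel (hypothetical helper)."""
    return data * np.random.normal(1.0, sigma, size=(1, data.shape[1]))


TECHNIQUES = {"DA_Jitter": jitter, "DA_Scaling": scale}  # technique name -> function

workdir = tempfile.mkdtemp()
config = {
    "data_augmentation": {
        "sensor_augmentation": {
            "script_path": os.path.join(workdir, "walk.csv"),  # input CSV (hypothetical)
            "output_path": os.path.join(workdir, "augmented"),
            "techniques": {
                "DA_Jitter": {"enabled": True, "sigma": 0.05},
                "DA_Scaling": {"enabled": False, "sigma": 0.1},
            },
        }
    }
}

# Create a tiny input file so the sketch runs end to end
pd.DataFrame(np.random.rand(20, 3), columns=["acc_x", "acc_y", "acc_z"]).to_csv(
    config["data_augmentation"]["sensor_augmentation"]["script_path"], index=False)

# 1. Read settings (a missing key raises KeyError)
sensor_cfg = config["data_augmentation"]["sensor_augmentation"]
# 2. Load data as a float array
df = pd.read_csv(sensor_cfg["script_path"])
data = df.to_numpy(dtype=float)
# 3. Apply only the enabled techniques, passing the remaining entries as kwargs
results = {}
for name, params in sensor_cfg["techniques"].items():
    if params.get("enabled", False):
        kwargs = {k: v for k, v in params.items() if k != "enabled"}
        results[name] = TECHNIQUES[name](data, **kwargs)
# 4. Save one CSV per technique, reusing the original column names
os.makedirs(sensor_cfg["output_path"], exist_ok=True)
base = os.path.splitext(os.path.basename(sensor_cfg["script_path"]))[0]
for name, arr in results.items():
    pd.DataFrame(arr, columns=df.columns).to_csv(
        os.path.join(sensor_cfg["output_path"], f"{base}_{name}.csv"), index=False)

print(sorted(os.listdir(sensor_cfg["output_path"])))
```

Because `DA_Scaling` is disabled in this configuration, only the jittered file is written.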

### Error Handling
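The `process_sensor_data` function catches the following exceptions and prints a message instead of re-raising them:
- **PermissionError**: A file or directory could not be accessed; the message asks the user to check the path and permissions.
- **FileNotFoundError**: The sensor data file does not exist at the configured path.
- **EmptyDataError** (`pd.errors.EmptyDataError`): The data file is empty.
- **ParserError** (`pd.errors.ParserError`): The data file could not be parsed as CSV.
- **Other exceptions**: Any remaining error, such as a missing configuration key, is caught by a generic `except Exception` handler.

Errors raised while reading back individual augmented files are handled separately inside the checking loop, so one unreadable file does not stop the others from being checked.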
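
To see which of these exceptions `pd.read_csv` actually raises, the short sketch below writes an empty file and a malformed file to a temporary directory and catches the pandas errors by type. The file names and contents are illustrative assumptions.

```python
import os
import tempfile

import pandas as pd

tmp = tempfile.mkdtemp()
empty_path = os.path.join(tmp, "empty.csv")
bad_path = os.path.join(tmp, "bad.csv")

open(empty_path, "w").close()  # zero-byte file
with open(bad_path, "w") as f:
    f.write("a,b\n1,2\n3,4,5\n")  # third line has one field too many

caught = {}
for path in (empty_path, bad_path):
    try:
        pd.read_csv(path)
    except pd.errors.EmptyDataError as e:
        caught[os.path.basename(path)] = "EmptyDataError"
        print(f"EmptyDataError: {e}")
    except pd.errors.ParserError as e:
        caught[os.path.basename(path)] = "ParserError"
        print(f"ParserError: {e}")

print(caught)
```

A missing input file never reaches `read_csv` in `process_sensor_data`: the function checks `os.path.exists` first and raises `FileNotFoundError` itself, which its own handler then reports.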

### Summary
This framework provides a configurable way to augment sensor data for machine learning models. The `SensorAugmentation` class implements several augmentation techniques, each enabled and parameterized through the configuration, while the `process_sensor_data` function handles loading, augmenting, and saving the data and reports errors instead of crashing.

import pandas as pd
import numpy as np
import os
from scipy.interpolate import CubicSpline

# References:
# https://github.com/terryum/Data-Augmentation-For-Wearable-Sensor-Data/blob/master/Example_DataAugmentation_TimeseriesData.py

class SensorAugmentation:
    """
    A class to perform various sensor data augmentation techniques.

    Args:
        config (dict): Configuration for sensor data augmentation techniques.
    """

    def __init__(self, config):
        self.config = config

    def add_gaussian_noise(self, data, mean=0, std=0.01):
        """
        Add Gaussian noise to the data.

        Args:
            data (ndarray): Input data.
            mean (float, optional): Mean of the Gaussian noise. Default is 0.
            std (float, optional): Standard deviation of the Gaussian noise. Default is 0.01.

        Returns:
            ndarray: Data with added Gaussian noise.
        """
        noise = np.random.normal(mean, std, data.shape)
        min_vals = np.min(data, axis=0)
        max_vals = np.max(data, axis=0)
        noisy_X = data + noise
        noisy_X = np.clip(noisy_X, min_vals, max_vals)  # Clip values within the range of original data
        return noisy_X

    def add_uniform_noise(self, data, low=-0.01, high=0.01):
        """
        Add uniform noise to the data.

        Args:
            data (ndarray): Input data.
            low (float, optional): Lower bound of the uniform noise. Default is -0.01.
            high (float, optional): Upper bound of the uniform noise. Default is 0.01.

        Returns:
            ndarray: Data with added uniform noise.
        """
        noise = np.random.uniform(low, high, data.shape)
        min_vals = np.min(data, axis=0)
        max_vals = np.max(data, axis=0)
        noisy_X = data + noise
        noisy_X = np.clip(noisy_X, min_vals, max_vals)
        return noisy_X

    def DA_Jitter(self, data, sigma=0.05):
        """
        Apply jittering to the data by adding random Gaussian noise.

        Args:
            data (ndarray): Input data.
            sigma (float, optional): Standard deviation of the Gaussian noise. Default is 0.05.

        Returns:
            ndarray: Jittered data.
        """
        myNoise = np.random.normal(loc=0, scale=sigma, size=data.shape)
        min_vals = np.min(data, axis=0)
        max_vals = np.max(data, axis=0)
        noisy_X = data + myNoise
        noisy_X = np.clip(noisy_X, min_vals, max_vals)
        return noisy_X

    def DA_Scaling(self, data, sigma=0.1):
        """
        Apply scaling to the data by multiplying with random scaling factors.

        Args:
            data (ndarray): Input data.
            sigma (float, optional): Standard deviation of the scaling factors. Default is 0.1.

        Returns:
            ndarray: Scaled data.
        """
        scalingFactor = np.random.normal(loc=1.0, scale=sigma, size=(1, data.shape[1]))
        myNoise = np.matmul(np.ones((data.shape[0], 1)), scalingFactor)
        min_vals = np.min(data, axis=0)
        max_vals = np.max(data, axis=0)
        noisy_X = data * myNoise
        noisy_X = np.clip(noisy_X, min_vals, max_vals)
        return noisy_X

    def GenerateRandomCurves(self, data, sigma=0.2, knot=4):
        """
        Multiply each channel by a smooth random curve (magnitude warping).

        Args:
            data (ndarray): Input data.
            sigma (float, optional): Standard deviation of the random knot values around 1. Default is 0.2.
            knot (int, optional): Number of interior knots for the cubic spline. Default is 4.

        Returns:
            ndarray: Data multiplied by the random curves, clipped to the original range.
        """
        x_range = np.arange(data.shape[0])
        curves = np.zeros(data.shape, dtype=float)  # float buffer; zeros_like would truncate curves for integer data
        for i in range(data.shape[1]):
            xx = np.linspace(0, data.shape[0] - 1, knot + 2)
            yy = np.random.normal(loc=1.0, scale=sigma, size=knot + 2)
            cs = CubicSpline(xx, yy)
            curves[:, i] = cs(x_range)
        min_vals = np.min(data, axis=0)
        max_vals = np.max(data, axis=0)
        noisy_X = data * curves
        noisy_X = np.clip(noisy_X, min_vals, max_vals)
        return noisy_X

    def DA_Permutation(self, data, nPerm=3, minSegLength=10, noise_factor=0.1):
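        """
        Split the data into segments, shuffle their order, and add noise to each segment.

        Args:
            data (ndarray): Input data.
            nPerm (int, optional): Number of segments to permute. Default is 3.
            minSegLength (int, optional): Minimum segment length. Default is 10.
            noise_factor (float, optional): Standard deviation of the Gaussian noise added to each segment. Default is 0.1.

        Returns:
            ndarray: Data with permuted segments, clipped to the original range.

        Raises:
            ValueError: If the data has too few samples for nPerm segments of minSegLength.
        """
        n_samples = data.shape[0]
        if n_samples <= nPerm * minSegLength:
            # Guard: with too few samples the cut-point loop below can error out or never terminate
            raise ValueError(
                f"DA_Permutation needs more than nPerm * minSegLength = {nPerm * minSegLength} "
                f"samples, got {n_samples}"
            )

        X_new = np.zeros(data.shape, dtype=float)  # float buffer so added noise is never truncated
        idx = np.random.permutation(nPerm)  # new order in which the segments are concatenated

        # Draw random cut points until every segment is at least minSegLength long
        while True:
            segs = np.zeros(nPerm + 1, dtype=int)
            segs[1:-1] = np.sort(np.random.randint(minSegLength, n_samples - minSegLength, nPerm - 1))
            segs[-1] = n_samples
            if np.min(segs[1:] - segs[:-1]) >= minSegLength:
                break

        pp = 0  # write position in X_new
        for ii in range(nPerm):
            start_idx = segs[idx[ii]]
            end_idx = segs[idx[ii] + 1]  # segs has nPerm + 1 entries, so this index is always valid
            x_temp = data[start_idx:end_idx, :].astype(float)  # astype returns a float copy
            x_temp += np.random.normal(0, noise_factor, x_temp.shape)  # per-segment noise
            X_new[pp:pp + len(x_temp), :] = x_temp
            pp += len(x_temp)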

        # Ensure values stay within the original value range
        min_vals = np.min(data, axis=0)
        max_vals = np.max(data, axis=0)
        X_new = np.clip(X_new, min_vals, max_vals)

        return X_new

    def augment_data(self, data, techniques):
        """
        Apply specified augmentation techniques to the data.

        Args:
            data (ndarray): Input data.
            techniques (dict): Dictionary of augmentation techniques and their parameters.

        Returns:
            dict: Dictionary of augmented data with technique names as keys.
        """
        augmented_data = {}
        for technique, params in techniques.items():
            if hasattr(self, technique) and params.get('enabled', False):
                # Remove 'enabled' key before passing to the function
                params = {k: v for k, v in params.items() if k != 'enabled'}
                augmented_data[technique] = getattr(self, technique)(data, **params)
        return augmented_data

    def save_augmented_data(self, augmented_data, base_filename, original_df, output_path):
        """
        Save each augmented array to its own CSV file.

        Args:
            augmented_data (dict): Dictionary of augmented data with technique names as keys.
            base_filename (str): Base filename for the output files; each file is named
                "<base_filename>_<technique>.csv".
            original_df (DataFrame): Original data, whose column names are reused for the output files.
            output_path (str): Path to the directory where the augmented files will be saved.
        """
        os.makedirs(output_path, exist_ok=True)
        for technique, data in augmented_data.items():
            df = pd.DataFrame(data, columns=original_df.columns)
            filename = os.path.join(output_path, f"{base_filename}_{technique}.csv")
            df.to_csv(filename, index=False)


def process_sensor_data(config_data):
    """
    Process sensor data according to the configuration.

    Loads the CSV at config_data["data_augmentation"]["sensor_augmentation"]["script_path"],
    applies the enabled techniques, and writes one CSV per technique to "output_path".

    Args:
        config_data (dict): Configuration data containing paths and settings for sensor data augmentation.

    Returns:
        None

    Note:
        Errors are not propagated. PermissionError, FileNotFoundError, pd.errors.EmptyDataError,
        pd.errors.ParserError, and any other exception are caught and reported with print().
    """
    try:
        sensor_config = config_data["data_augmentation"]["sensor_augmentation"]
        data_path = sensor_config["script_path"]

        if not os.path.exists(data_path):
            raise FileNotFoundError(f"Sensor data file not found at {data_path}")

        print(f"Loading sensor data from {data_path}...")
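        original_df = pd.read_csv(data_path)
        data = original_df.to_numpy(dtype=float)  # assumes all columns are numeric; cast so noise is not truncated
        print("Data loaded successfully. Sample data:")
        print(original_df.head())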
        augmenter = SensorAugmentation(sensor_config["techniques"])
        augmented_data = augmenter.augment_data(data, sensor_config["techniques"])
        output_path = sensor_config["output_path"]
        os.makedirs(output_path, exist_ok=True)

        base_filename = os.path.splitext(os.path.basename(data_path))[0]
        augmenter.save_augmented_data(augmented_data, base_filename, original_df, output_path)

        print("Augmentation completed. Checking augmented files...")

        # Check only the files written in this run, not every CSV already in output_path
        for technique in augmented_data:
            file = f"{base_filename}_{technique}.csv"
            file_path = os.path.join(output_path, file)
            try:
                augmented_df = pd.read_csv(file_path)
                print(f"Augmented Data from {file}:")
                print(augmented_df.head())
                print(f"Shape of augmented data from {file}: {augmented_df.shape}")
            except UnicodeDecodeError as e:
                print(f"UnicodeDecodeError: {e}. Could not read the file {file_path}. It might be encoded in a non-UTF-8 format.")
            except Exception as e:
                print(f"An error occurred while reading the file {file_path}: {e}")

    except PermissionError as e:
        print(f"PermissionError: {e}. Ensure the file path is correct and you have the required permissions.")
    except FileNotFoundError as e:
        print(f"FileNotFoundError: {e}. Ensure the sensor data file exists at the specified path.")
    except pd.errors.EmptyDataError as e:
        print(f"EmptyDataError: {e}. The data file is empty.")
    except pd.errors.ParserError as e:
        print(f"ParserError: {e}. Error parsing the data file.")
    except Exception as e:
        print(f"An error occurred: {e}")
