**Data Preprocessing**: Handling missing values and standardizing the features.

**Feature Extraction:**

- Frequency-domain features using CZT.

- Amplitude envelope using Hilbert Transform.

- Statistical features such as the cumulative sum and rolling mean/standard deviation.

- Signal segmentation to focus on smaller time periods.

**Cross-validation**: Generating cross-validation splits for model evaluation.

**SVM Training**: The processed features are fed into an SVM classifier.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
file_path = '/content/drive/MyDrive/data/CowScreeningDB.zip'


In [None]:
import zipfile

with zipfile.ZipFile(file_path, 'r') as zip_ref:
    zip_ref.extractall('/content/')

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
import os
import pandas as pd

data_list = []
# Folder names follow "<cow ID>_<illness degree>"; the suffix is the class label
files = ["01299_5", "05309_5", "05252_5", "05317_4", "05363_4", "01184_4",
         "00904_3", "05148_3", "05176_3", "00808_2", "05347_2", "05156_2",
         "00749_1", "05144_1", "05160_1"]

for filename in files:
    label = filename.split('_')[-1]
    filepath = os.path.join("/content/CowScreeningDB", filename)
    print(filepath)

    # Only the first two files of each cow are used. os.listdir() returns entries
    # in arbitrary order, so this selection is not guaranteed to be reproducible.
    for i in os.listdir(filepath)[:2]:
        file_path = os.path.join(filepath, i)
        print(file_path)

        # Read each line as a single column (no delimiter split yet)
        df_raw = pd.read_csv(file_path, header=None)

        # Split that column on whitespace into one column per value
        split_df = df_raw[0].str.split(expand=True)

        # Convert everything to numeric; non-numeric tokens (presumably each
        # file's header line) become NaN
        split_df = split_df.apply(pd.to_numeric, errors='coerce')

        # Assign column names, trimmed to the actual column count
        split_df.columns = [
            'Time(s)', 'Acceleration_x', 'Acceleration_y', 'Acceleration_z',
            'Gravity_x', 'Gravity_y', 'Gravity_z',
            'Rotation_x', 'Rotation_y', 'Rotation_z',
            'Roll', 'Pitch', 'Yaw'
        ][:split_df.shape[1]]

        # Add label column
        split_df['label'] = int(label)

        # Append cleaned DataFrame
        data_list.append(split_df)

# Merge all recordings
final_df = pd.concat(data_list, ignore_index=True)
# Drop the first row (the first file's header, now NaN in every sensor column);
# the other files' header rows remain as NaN rows and are removed later by dropna()
final_df = final_df[1:]
# Display sample
print(final_df.head())

/content/CowScreeningDB/01299_5
/content/CowScreeningDB/01299_5/Illnessdegree_5_Leg_rearright_Acquisitiondata_07_05_2022_Acquisitiontime_05_10_49.csv
/content/CowScreeningDB/01299_5/Illnessdegree_5_Leg_rearright_Acquisitiondata_07_05_2022_Acquisitiontime_04_37_53.csv
/content/CowScreeningDB/05309_5
/content/CowScreeningDB/05309_5/Illnessdegree_5_Leg_frontright_Acquisitiondata_17_05_2022_Acquisitiontime_08_42_32.csv
/content/CowScreeningDB/05309_5/Illnessdegree_5_Leg_frontright_Acquisitiondata_17_05_2022_Acquisitiontime_17_23_22.csv
/content/CowScreeningDB/05252_5
/content/CowScreeningDB/05252_5/Illnessdegree_5_Leg_frontleft_Acquisitiondata_06_05_2022_Acquisitiontime_14_56_20.csv
/content/CowScreeningDB/05252_5/Illnessdegree_5_Leg_frontleft_Acquisitiondata_07_05_2022_Acquisitiontime_01_25_10.csv
/content/CowScreeningDB/05317_4
/content/CowScreeningDB/05317_4/Illnessdegree_4_Leg_frontright_Acquisitiondata_07_05_2022_Acquisitiontime_02_53_28.csv
/content/CowScreeningDB/05317_4/Illnessdegr
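The loading cell above keeps only the cow-level label, so later steps cannot tell where one recording ends and the next begins. A minimal sketch, assuming each file is whitespace-separated with a single header line (which is what the NaN rows dropped later suggest), that also tags every row with a hypothetical `recording` identifier; `load_recording` is an illustrative helper, not part of the original notebook:

```python
import io
import pandas as pd

# Column names as used in the loading cell above.
COLUMNS = [
    'Time(s)', 'Acceleration_x', 'Acceleration_y', 'Acceleration_z',
    'Gravity_x', 'Gravity_y', 'Gravity_z',
    'Rotation_x', 'Rotation_y', 'Rotation_z',
    'Roll', 'Pitch', 'Yaw',
]


def load_recording(source, label, recording_id):
    """Read one whitespace-separated recording and tag it with its label and ID.

    Assumes (hypothetically) that the first line of each file is a header.
    """
    df = pd.read_csv(source, sep=r"\s+", header=None, skiprows=1)
    df = df.apply(pd.to_numeric, errors="coerce")
    df.columns = COLUMNS[:df.shape[1]]
    df["label"] = int(label)
    df["recording"] = recording_id  # hypothetical grouping key for later steps
    return df


# Tiny in-memory example standing in for a real CSV file.
sample = io.StringIO(
    "header line\n"
    "0.0 0.1 0.2 0.3 0 0 0 0 0 0 0 0 0\n"
    "0.01 0.1 0.2 0.3 0 0 0 0 0 0 0 0 0\n"
)
rec = load_recording(sample, label="5", recording_id="01299_5/file0")
print(rec[["Time(s)", "label", "recording"]])
```

With a `recording` column in place, cumulative sums, rolling windows and train/test splits can all be computed per recording instead of across file boundaries.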

In [None]:
final_df.dtypes

Unnamed: 0,0
Time(s),float64
Acceleration_x,float64
Acceleration_y,float64
Acceleration_z,float64
Gravity_x,float64
Gravity_y,float64
Gravity_z,float64
Rotation_x,float64
Rotation_y,float64
Rotation_z,float64


In [None]:
final_df.head(10)

Unnamed: 0,Time(s),Acceleration_x,Acceleration_y,Acceleration_z,Gravity_x,Gravity_y,Gravity_z,Rotation_x,Rotation_y,Rotation_z,Roll,Pitch,Yaw,label
1,0.0,0.008691,0.003719,-0.000487,-0.874902,0.14276,-0.46278,-0.001225,-0.001603,-0.001539,-1.084258,-0.143249,0.614978,5
2,0.009944,0.009274,0.001218,0.000508,-0.874905,0.142741,-0.462781,0.000242,-0.001332,-0.002133,-1.084257,-0.14323,0.614972,5
3,0.019954,0.008038,0.002492,0.000591,-0.874921,0.14272,-0.462758,-0.002171,-0.004789,-0.00171,-1.084286,-0.143209,0.614965,5
4,0.029943,0.006504,0.001978,0.001205,-0.874943,0.142716,-0.462717,0.001613,-0.004523,3.5e-05,-1.084332,-0.143205,0.614954,5
5,0.039894,0.006936,0.003268,-0.000136,-0.874963,0.142714,-0.462679,-0.001569,-0.004751,-0.000652,-1.084376,-0.143203,0.614955,5
6,0.04988,0.006317,0.002073,0.000839,-0.874984,0.1427,-0.462644,-0.000708,-0.003974,-0.001729,-1.084418,-0.143189,0.614933,5
7,0.060056,0.007474,0.003592,0.002079,-0.875012,0.142697,-0.462592,0.000118,-0.006841,-0.001559,-1.084477,-0.143186,0.614917,5
8,0.07004,0.007883,0.002659,0.000492,-0.87504,0.142681,-0.462544,-0.000563,-0.006704,-0.001588,-1.084533,-0.14317,0.614907,5
9,0.079849,0.007001,0.000814,0.000184,-0.875074,0.142669,-0.462484,-0.001097,-0.008351,-0.002482,-1.084603,-0.143157,0.614885,5
10,0.089772,0.00656,0.002344,-6.4e-05,-0.875106,0.142658,-0.462427,-0.001765,-0.006429,-0.001182,-1.084669,-0.143146,0.614861,5


### 1. **Handling Missing Values**
- **Why we need it**: Missing data can lead to inaccurate models because most machine learning algorithms (including SVM) cannot handle `NaN` or missing values. By replacing missing values or removing them, we ensure the dataset is complete and usable.
- **How it helps**: A clean dataset enables the model to learn from all available data points, increasing its ability to generalize and make predictions effectively.
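The two ways of completing the dataset mentioned above (removing or replacing missing values) can be contrasted on a toy frame. This is a sketch only: the `recording` column is a hypothetical grouping key, and linear interpolation is one possible replacement strategy, not the one this notebook uses (it drops rows).

```python
import numpy as np
import pandas as pd

# Toy frame with one gap; 'recording' is a hypothetical grouping column.
df = pd.DataFrame({
    "recording": ["a", "a", "a", "b", "b"],
    "Acceleration_x": [1.0, np.nan, 3.0, 10.0, 20.0],
})

# Option 1: drop incomplete rows (what this notebook does).
dropped = df.dropna()

# Option 2: fill gaps by linear interpolation within each recording,
# so filled values never borrow samples from a neighbouring recording.
filled = df.copy()
filled["Acceleration_x"] = (
    df.groupby("recording")["Acceleration_x"]
      .transform(lambda s: s.interpolate(method="linear", limit_direction="both"))
)
print(len(dropped), filled["Acceleration_x"].tolist())
```

Dropping is simplest when only a few rows are affected (29 here); interpolation keeps the time axis evenly spaced, which matters for windowed and frequency-domain features.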

### 2. **Normalization/Standardization**
- **Why we need it**: Sensor data (e.g., acceleration, gravity, rotation) can have different units and ranges. For instance, the acceleration might have values between -10 and 10, while the rotation angles could range from 0 to 360 degrees.
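- **How it helps**: Standardization (z-scoring) rescales each feature to mean 0 and standard deviation 1; min-max normalization, by contrast, maps each feature to a fixed range such as [0, 1]. SVMs are distance-based (both the RBF kernel and the margin depend on Euclidean distances), so without scaling, features with larger numeric ranges would dominate the learning process.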
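The z-score transform described above is $z = (x - \mu) / \sigma$, applied column by column. A short NumPy sketch on two hypothetical features with very different ranges; using the population standard deviation (`ddof=0`) matches what scikit-learn's `StandardScaler` computes:

```python
import numpy as np

# Two hypothetical features on very different scales (e.g. acceleration vs. an angle in degrees).
X = np.array([[0.01, 10.0],
              [0.02, 180.0],
              [0.03, 350.0]])

# z-score standardization, column by column.
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=0)  # population std, as StandardScaler uses
Z = (X - mean) / std

print(Z.mean(axis=0), Z.std(axis=0))
```

After the transform both columns have mean 0 and standard deviation 1, so neither dominates a distance computation.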

### 3. **Cumulative Sum**
- **Why we need it**: The cumulative sum is the running total of a signal over time. It is not a cumulative distribution function (CDF), which gives the probability that a random variable is at most a given value. In sensor data, the cumulative sum can capture long-term trends or drifts that are not visible in individual readings.
- **How it helps**: It summarizes the overall behavior of the signal over time. For example, the running sum of acceleration samples multiplied by the sampling interval approximates the change in velocity since the start of a recording, which can help separate states such as standing and walking or flag anomalies. For this to be meaningful, the sum has to restart at the beginning of each recording.
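A per-recording cumulative sum, and its rectangle-rule reading as a velocity change, can be sketched as follows. The `recording` column and the toy values are hypothetical; `dt = 0.01` s is the approximate sampling interval visible in the `Time(s)` column.

```python
import pandas as pd

# Hypothetical toy data: two recordings sampled every ~0.01 s.
df = pd.DataFrame({
    "recording": ["a", "a", "a", "b", "b"],
    "Acceleration_x": [1.0, 2.0, 3.0, 5.0, 5.0],
})
dt = 0.01  # approximate sampling interval seen in the Time(s) column

# Cumulative sum restarted at the start of each recording.
df["Cumsum_Acceleration_x"] = df.groupby("recording")["Acceleration_x"].cumsum()

# Scaled by dt, the running sum is a rectangle-rule estimate of the
# velocity change since the recording started (not displacement).
df["Velocity_change_x"] = df["Cumsum_Acceleration_x"] * dt
print(df)
```

Grouping before `cumsum()` is the key design choice: without it, the running total of one recording carries over into the next.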

### 4. **Rolling Statistics (Mean, Standard Deviation, etc.)**
- **Why we need it**: Sensor data can fluctuate over time. A single reading at one point in time might not be very informative; we need to understand how the sensor behaves over a window of time.
- **How it helps**: Rolling statistics like the mean or standard deviation within a moving window provide a summary of the data’s local behavior, smoothing out noise. For example, the rolling mean of acceleration can help identify periods of high or low movement, which can be useful for classifying different activities or detecting outliers.
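Rolling statistics that stay within one recording can be sketched with `groupby(...).transform(...)`. The data and the `recording` column are hypothetical, and `window_size = 3` is chosen only to keep the example small (the notebook uses 5).

```python
import pandas as pd

df = pd.DataFrame({
    "recording": ["a"] * 4 + ["b"] * 4,  # hypothetical recording IDs
    "Acceleration_x": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})
window_size = 3

grouped = df.groupby("recording")["Acceleration_x"]
# transform() keeps the original index, so results line up row by row,
# and each window only sees samples from its own recording.
df["Rolling_Mean_Acceleration_x"] = grouped.transform(lambda s: s.rolling(window_size).mean())
df["Rolling_Std_Acceleration_x"] = grouped.transform(lambda s: s.rolling(window_size).std())
print(df)
```

The first `window_size - 1` rows of every recording are NaN, rather than only the first rows of the whole concatenated frame; pandas' rolling `std()` uses the sample standard deviation (`ddof=1`) by default.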

### 5. **Frequency Domain Features (e.g., using CZT)**
- **Why we need it**: Some behaviors in the data might only be apparent in the frequency domain, especially if the signal has periodic components (like vibrations or oscillations).
- **How it helps**: Transforming the signal to the frequency domain (e.g., using the chirp Z-transform or the Fourier transform) allows us to extract frequency-based features that capture periodic patterns in the signal. Unlike a plain FFT, the CZT can evaluate the spectrum on an arbitrary, finely spaced frequency band, which makes it convenient for zooming into a narrow range of interest. For example, detecting certain vibration frequencies might help identify a specific motion or activity (e.g., walking or running in an animal).
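A zoomed spectrum with `scipy.signal.czt` (available in SciPy 1.8 and later) might look like the sketch below. The 1.5 Hz "stride" component, the noise level and the 0.5 to 3.0 Hz band are hypothetical choices; `fs = 100` Hz matches the roughly 0.01 s spacing in the `Time(s)` column.

```python
import numpy as np
from scipy.signal import czt

fs = 100.0                     # ~0.01 s sampling interval, as in Time(s)
t = np.arange(1000) / fs       # 10 s of samples
# Hypothetical signal: a 1.5 Hz "stride" component plus a little noise.
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 1.5 * t) + 0.1 * rng.standard_normal(t.size)

# Zoom into 0.5-3.0 Hz with m evenly spaced bins (both ends included).
f1, f2, m = 0.5, 3.0, 251
a = np.exp(2j * np.pi * f1 / fs)                      # starting point on the unit circle
w = np.exp(-2j * np.pi * (f2 - f1) / ((m - 1) * fs))  # ratio between successive points
spectrum = np.abs(czt(x, m=m, w=w, a=a))
freqs = np.linspace(f1, f2, m)

peak_freq = freqs[np.argmax(spectrum)]
print(f"dominant frequency ~ {peak_freq:.2f} Hz")
```

The CZT samples the spectrum more densely (0.01 Hz here) than a 1000-point FFT (0.1 Hz bins), but the true frequency resolution is still limited to about 1 / (signal duration).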

### 6. **Hilbert Transform and Envelope**
- **Why we need it**: Many sensor signals have modulated amplitudes, meaning their power or intensity varies over time. The magnitude of the analytic signal, obtained with the Hilbert transform, gives the signal's envelope, which reveals these slow variations.
- **How it helps**: The envelope highlights amplitude variations in the signal, which can be important for detecting events like peaks or transitions. For example, in motion sensors, the envelope of a signal could represent the overall intensity of movement, which can be used to distinguish between different types of motion.
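Envelope extraction with `scipy.signal.hilbert` can be checked on a hypothetical amplitude-modulated signal whose true envelope is known. The 10 Hz carrier and 0.2 Hz modulation are illustrative values only.

```python
import numpy as np
from scipy.signal import hilbert

fs = 100.0
t = np.arange(1000) / fs
# Hypothetical amplitude-modulated signal: a 10 Hz carrier whose amplitude
# swells and fades once every 5 s.
envelope_true = 1.0 + 0.5 * np.sin(2 * np.pi * 0.2 * t)
x = envelope_true * np.sin(2 * np.pi * 10 * t)

analytic = hilbert(x)        # analytic signal: x + j * H{x}
envelope = np.abs(analytic)  # instantaneous amplitude

print(np.max(np.abs(envelope - envelope_true)))
```

Here the recovered envelope matches the true one almost exactly because every frequency component completes a whole number of cycles in the window; on real recordings, expect distortion near the window edges.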

### 7. **Segmentation of Signal**
- **Why we need it**: Sensor data is often continuous, but the important events or patterns may only occur over specific time intervals. If we analyze the whole signal at once, we might miss out on local patterns or transitions between different states.
- **How it helps**: By segmenting the signal into smaller windows, we can analyze different parts of the signal individually. For example, if we are classifying walking patterns, segmenting the signal can help us focus on individual steps instead of the whole walking sequence, leading to more accurate classifications.
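Segmentation into fixed-length, possibly overlapping windows can be sketched with NumPy's `sliding_window_view` (NumPy 1.20 or later). `segment` is a hypothetical helper; at about 100 Hz, a 2 s window would be 200 samples, and `step=100` would give 50% overlap. It should be applied to one recording at a time so that no window spans two files.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment(signal, window, step):
    """Split a 1-D signal into windows of `window` samples, starting every `step` samples.

    Returns an array of shape (n_windows, window); a trailing partial window is dropped.
    """
    windows = sliding_window_view(signal, window)  # one row per start position
    return windows[::step]                          # keep every `step`-th start


x = np.arange(10.0)
segs = segment(x, window=4, step=2)
print(segs)
```

`sliding_window_view` returns a read-only view, so no data is copied until the windows are used to compute features.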

### 8. **Feature Combination**
- **Why we need it**: Different features capture different aspects of the data. For instance, acceleration features capture movement, rotation features capture orientation, and the envelope captures amplitude variations.
- **How it helps**: Combining various features allows the model to use complementary information to make more accurate predictions. A model using only one type of feature (e.g., acceleration) might miss out on important patterns in other features (e.g., orientation or amplitude), reducing its accuracy.
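One way to combine complementary features is to compute a small vector per window and sensor axis and concatenate them into a single classifier row. The sketch below is hypothetical: the chosen features (mean, standard deviation, mean envelope, dominant FFT frequency) and the two synthetic axes stand in for real acceleration and orientation channels.

```python
import numpy as np
from scipy.signal import hilbert


def window_features(window, fs):
    """Hypothetical per-window feature vector for one sensor axis."""
    envelope = np.abs(hilbert(window))
    spectrum = np.abs(np.fft.rfft(window - window.mean()))
    freqs = np.fft.rfftfreq(window.size, d=1 / fs)
    return np.array([
        window.mean(),               # level
        window.std(),                # variability
        envelope.mean(),             # average amplitude
        freqs[np.argmax(spectrum)],  # dominant frequency (Hz)
    ])


def combine(axes, fs):
    """Concatenate the per-axis feature vectors into one row for the classifier."""
    return np.concatenate([window_features(w, fs) for w in axes])


fs = 100.0
t = np.arange(200) / fs                   # one 2 s window
acc_x = np.sin(2 * np.pi * 1.5 * t)       # hypothetical acceleration axis
roll = 0.2 * np.sin(2 * np.pi * 0.5 * t)  # hypothetical orientation axis
row = combine([acc_x, roll], fs)
print(row)
```

Each window becomes one row of `4 * n_axes` features, so the classifier sees movement, amplitude and periodicity information side by side.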


In [None]:
final_df.isnull().sum()

Unnamed: 0,0
Time(s),29
Acceleration_x,29
Acceleration_y,29
Acceleration_z,29
Gravity_x,29
Gravity_y,29
Gravity_z,29
Rotation_x,29
Rotation_y,29
Rotation_z,29


In [None]:
final_df = final_df.dropna()

In [None]:
final_df.isnull().sum()

Unnamed: 0,0
Time(s),0
Acceleration_x,0
Acceleration_y,0
Acceleration_z,0
Gravity_x,0
Gravity_y,0
Gravity_z,0
Rotation_x,0
Rotation_y,0
Rotation_z,0


In [None]:
len(final_df)

270000

In [None]:
final_df

Unnamed: 0,Time(s),Acceleration_x,Acceleration_y,Acceleration_z,Gravity_x,Gravity_y,Gravity_z,Rotation_x,Rotation_y,Rotation_z,Roll,Pitch,Yaw,label
1,0.000000,0.008691,0.003719,-0.000487,-0.874902,0.142760,-0.462780,-0.001225,-0.001603,-0.001539,-1.084258,-0.143249,0.614978,5
2,0.009944,0.009274,0.001218,0.000508,-0.874905,0.142741,-0.462781,0.000242,-0.001332,-0.002133,-1.084257,-0.143230,0.614972,5
3,0.019954,0.008038,0.002492,0.000591,-0.874921,0.142720,-0.462758,-0.002171,-0.004789,-0.001710,-1.084286,-0.143209,0.614965,5
4,0.029943,0.006504,0.001978,0.001205,-0.874943,0.142716,-0.462717,0.001613,-0.004523,0.000035,-1.084332,-0.143205,0.614954,5
5,0.039894,0.006936,0.003268,-0.000136,-0.874963,0.142714,-0.462679,-0.001569,-0.004751,-0.000652,-1.084376,-0.143203,0.614955,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
270025,89.844569,0.002567,-0.003900,-0.001870,0.885998,0.270272,-0.376776,-0.024448,-0.030900,0.003440,1.168709,-0.273676,0.112907,1
270026,89.854485,-0.000964,-0.005870,-0.003063,0.885912,0.270321,-0.376943,-0.023343,-0.026883,0.006407,1.168515,-0.273726,0.113155,1
270027,89.864513,-0.001781,-0.004516,-0.004039,0.885845,0.270335,-0.377091,-0.016368,-0.024198,0.006369,1.168346,-0.273741,0.113382,1
270028,89.874468,-0.001410,-0.006253,-0.001991,0.885795,0.270348,-0.377201,-0.016920,-0.017132,0.005285,1.168220,-0.273754,0.113585,1


In [None]:

label = final_df['label']
label.head()

Unnamed: 0,label
1,5
2,5
3,5
4,5
5,5


In [None]:

import joblib

# Features to standardize (Time(s) and label are excluded)
features = ['Acceleration_x', 'Acceleration_y', 'Acceleration_z',
            'Gravity_x', 'Gravity_y', 'Gravity_z',
            'Rotation_x', 'Rotation_y', 'Rotation_z',
            'Roll', 'Pitch', 'Yaw']

# Initialize the scaler
scaler = StandardScaler()

# Fit and transform the features.
# Note: fitting on all rows before the train/test split lets test-set statistics
# leak into the scaling; fitting on the training split only avoids this.
scaled = scaler.fit_transform(final_df[features])
# final_df now holds only the 12 standardized feature columns
# (Time(s) and label are dropped; the labels were saved in `label` above)
final_df = pd.DataFrame(scaled, columns=features, index=final_df.index)

# Save the fitted scaler so the same transform can be reused at inference time
scaler_filename = "/content/drive/MyDrive/data/Scalar_lamness.pkl"
joblib.dump(scaler, scaler_filename)
print(f"Scaler saved to {scaler_filename}")


Scaler saved to /content/drive/MyDrive/data/Scalar_lamness.pkl


In [None]:
final_df.head()

Unnamed: 0,Acceleration_x,Acceleration_y,Acceleration_z,Gravity_x,Gravity_y,Gravity_z,Rotation_x,Rotation_y,Rotation_z,Roll,Pitch,Yaw
1,0.043078,0.052828,0.049773,-1.927959,1.43325,-1.473999,-0.008723,-0.00719,-0.007021,-0.94712,-1.388635,0.705866
2,0.045528,0.03529,0.054416,-1.927965,1.433212,-1.474002,-0.000393,-0.00621,-0.008587,-0.947119,-1.388606,0.705861
3,0.040335,0.044224,0.054804,-1.927995,1.43317,-1.473929,-0.014094,-0.018712,-0.007472,-0.947137,-1.388575,0.705856
4,0.033889,0.04062,0.057669,-1.928035,1.433162,-1.4738,0.007391,-0.01775,-0.002871,-0.947165,-1.388569,0.705849
5,0.035704,0.049665,0.051411,-1.928073,1.433158,-1.47368,-0.010676,-0.018575,-0.004682,-0.947192,-1.388566,0.705849


In [None]:
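# Note: cumsum() runs over the concatenated recordings of all cows, so these columns
# can act as a proxy for row position, and hence for the recording and its label.
# Computing the sum per recording would restart it for every file.
for feature in features:
    final_df[f'Cumsum_{feature}'] = final_df[feature].cumsum()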


In [None]:
final_df.head()

Unnamed: 0,Acceleration_x,Acceleration_y,Acceleration_z,Gravity_x,Gravity_y,Gravity_z,Rotation_x,Rotation_y,Rotation_z,Roll,...,Cumsum_Acceleration_z,Cumsum_Gravity_x,Cumsum_Gravity_y,Cumsum_Gravity_z,Cumsum_Rotation_x,Cumsum_Rotation_y,Cumsum_Rotation_z,Cumsum_Roll,Cumsum_Pitch,Cumsum_Yaw
1,0.043078,0.052828,0.049773,-1.927959,1.43325,-1.473999,-0.008723,-0.00719,-0.007021,-0.94712,...,0.049773,-1.927959,1.43325,-1.473999,-0.008723,-0.00719,-0.007021,-0.94712,-1.388635,0.705866
2,0.045528,0.03529,0.054416,-1.927965,1.433212,-1.474002,-0.000393,-0.00621,-0.008587,-0.947119,...,0.104189,-3.855924,2.866462,-2.948001,-0.009116,-0.0134,-0.015608,-1.894239,-2.777241,1.411727
3,0.040335,0.044224,0.054804,-1.927995,1.43317,-1.473929,-0.014094,-0.018712,-0.007472,-0.947137,...,0.158993,-5.783919,4.299633,-4.421931,-0.02321,-0.032113,-0.02308,-2.841375,-4.165816,2.117583
4,0.033889,0.04062,0.057669,-1.928035,1.433162,-1.4738,0.007391,-0.01775,-0.002871,-0.947165,...,0.216662,-7.711954,5.732795,-5.895731,-0.015819,-0.049863,-0.025951,-3.78854,-5.554385,2.823432
5,0.035704,0.049665,0.051411,-1.928073,1.433158,-1.47368,-0.010676,-0.018575,-0.004682,-0.947192,...,0.268073,-9.640027,7.165953,-7.369411,-0.026495,-0.068438,-0.030633,-4.735732,-6.942951,3.529281


In [None]:
final_df.isnull().sum()

Unnamed: 0,0
Acceleration_x,0
Acceleration_y,0
Acceleration_z,0
Gravity_x,0
Gravity_y,0
Gravity_z,0
Rotation_x,0
Rotation_y,0
Rotation_z,0
Roll,0


In [None]:
len(final_df)

270000

In [None]:
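window_size = 5

# Note: these windows run over the concatenated frame, so windows near a file
# boundary mix samples from two recordings; only the first window_size - 1 rows
# of the whole frame get NaN.
for feature in features:
    final_df[f'Rolling_Mean_{feature}'] = final_df[feature].rolling(window=window_size).mean()
    final_df[f'Rolling_Std_{feature}'] = final_df[feature].rolling(window=window_size).std()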

In [None]:
final_df.head()

Unnamed: 0,Acceleration_x,Acceleration_y,Acceleration_z,Gravity_x,Gravity_y,Gravity_z,Rotation_x,Rotation_y,Rotation_z,Roll,...,Rolling_Mean_Rotation_y,Rolling_Std_Rotation_y,Rolling_Mean_Rotation_z,Rolling_Std_Rotation_z,Rolling_Mean_Roll,Rolling_Std_Roll,Rolling_Mean_Pitch,Rolling_Std_Pitch,Rolling_Mean_Yaw,Rolling_Std_Yaw
1,0.043078,0.052828,0.049773,-1.927959,1.43325,-1.473999,-0.008723,-0.00719,-0.007021,-0.94712,...,,,,,,,,,,
2,0.045528,0.03529,0.054416,-1.927965,1.433212,-1.474002,-0.000393,-0.00621,-0.008587,-0.947119,...,,,,,,,,,,
3,0.040335,0.044224,0.054804,-1.927995,1.43317,-1.473929,-0.014094,-0.018712,-0.007472,-0.947137,...,,,,,,,,,,
4,0.033889,0.04062,0.057669,-1.928035,1.433162,-1.4738,0.007391,-0.01775,-0.002871,-0.947165,...,,,,,,,,,,
5,0.035704,0.049665,0.051411,-1.928073,1.433158,-1.47368,-0.010676,-0.018575,-0.004682,-0.947192,...,-0.013688,0.006399,-0.006127,0.00231,-0.947146,3.2e-05,-1.38859,3e-05,0.705856,7e-06


In [None]:
# Re-attach the labels; pd.concat aligns the two objects on their shared index
concat_df = pd.concat([final_df, label], axis=1)

In [None]:
concat_df.head()

Unnamed: 0,Acceleration_x,Acceleration_y,Acceleration_z,Gravity_x,Gravity_y,Gravity_z,Rotation_x,Rotation_y,Rotation_z,Roll,...,Rolling_Std_Rotation_y,Rolling_Mean_Rotation_z,Rolling_Std_Rotation_z,Rolling_Mean_Roll,Rolling_Std_Roll,Rolling_Mean_Pitch,Rolling_Std_Pitch,Rolling_Mean_Yaw,Rolling_Std_Yaw,label
1,0.043078,0.052828,0.049773,-1.927959,1.43325,-1.473999,-0.008723,-0.00719,-0.007021,-0.94712,...,,,,,,,,,,5
2,0.045528,0.03529,0.054416,-1.927965,1.433212,-1.474002,-0.000393,-0.00621,-0.008587,-0.947119,...,,,,,,,,,,5
3,0.040335,0.044224,0.054804,-1.927995,1.43317,-1.473929,-0.014094,-0.018712,-0.007472,-0.947137,...,,,,,,,,,,5
4,0.033889,0.04062,0.057669,-1.928035,1.433162,-1.4738,0.007391,-0.01775,-0.002871,-0.947165,...,,,,,,,,,,5
5,0.035704,0.049665,0.051411,-1.928073,1.433158,-1.47368,-0.010676,-0.018575,-0.004682,-0.947192,...,0.006399,-0.006127,0.00231,-0.947146,3.2e-05,-1.38859,3e-05,0.705856,7e-06,5


In [None]:
concat_df.isnull().sum()

Unnamed: 0,0
Acceleration_x,0
Acceleration_y,0
Acceleration_z,0
Gravity_x,0
Gravity_y,0
Gravity_z,0
Rotation_x,0
Rotation_y,0
Rotation_z,0
Roll,0



In [None]:
concat_df = concat_df.dropna()

In [None]:
concat_df.head()

Unnamed: 0,Acceleration_x,Acceleration_y,Acceleration_z,Gravity_x,Gravity_y,Gravity_z,Rotation_x,Rotation_y,Rotation_z,Roll,...,Rolling_Std_Rotation_y,Rolling_Mean_Rotation_z,Rolling_Std_Rotation_z,Rolling_Mean_Roll,Rolling_Std_Roll,Rolling_Mean_Pitch,Rolling_Std_Pitch,Rolling_Mean_Yaw,Rolling_Std_Yaw,label
5,0.035704,0.049665,0.051411,-1.928073,1.433158,-1.47368,-0.010676,-0.018575,-0.004682,-0.947192,...,0.006399,-0.006127,0.00231,-0.947146,3.2e-05,-1.38859,3e-05,0.705856,7e-06,5
6,0.033104,0.041286,0.055961,-1.928112,1.43313,-1.473569,-0.005787,-0.015765,-0.007522,-0.947218,...,0.005272,-0.006227,0.002369,-0.947166,4e-05,-1.388572,2.2e-05,0.70585,1e-05,5
7,0.037965,0.051937,0.061748,-1.928164,1.433124,-1.473405,-0.001097,-0.026134,-0.007074,-0.947254,...,0.003951,-0.005924,0.00207,-0.947193,4.5e-05,-1.388559,1.5e-05,0.705842,1.4e-05,5
8,0.039683,0.045395,0.054342,-1.928216,1.433093,-1.473253,-0.004964,-0.025638,-0.00715,-0.947288,...,0.004782,-0.00586,0.002014,-0.947223,4.9e-05,-1.388548,2.1e-05,0.705834,1.5e-05,5
9,0.035978,0.032457,0.052904,-1.928279,1.433069,-1.473064,-0.007996,-0.031595,-0.009508,-0.947331,...,0.006346,-0.007187,0.001717,-0.947257,5.5e-05,-1.388533,2.7e-05,0.705824,1.9e-05,5


In [None]:
concat_df.columns

Index(['Acceleration_x', 'Acceleration_y', 'Acceleration_z', 'Gravity_x',
       'Gravity_y', 'Gravity_z', 'Rotation_x', 'Rotation_y', 'Rotation_z',
       'Roll', 'Pitch', 'Yaw', 'Cumsum_Acceleration_x',
       'Cumsum_Acceleration_y', 'Cumsum_Acceleration_z', 'Cumsum_Gravity_x',
       'Cumsum_Gravity_y', 'Cumsum_Gravity_z', 'Cumsum_Rotation_x',
       'Cumsum_Rotation_y', 'Cumsum_Rotation_z', 'Cumsum_Roll', 'Cumsum_Pitch',
       'Cumsum_Yaw', 'Rolling_Mean_Acceleration_x',
       'Rolling_Std_Acceleration_x', 'Rolling_Mean_Acceleration_y',
       'Rolling_Std_Acceleration_y', 'Rolling_Mean_Acceleration_z',
       'Rolling_Std_Acceleration_z', 'Rolling_Mean_Gravity_x',
       'Rolling_Std_Gravity_x', 'Rolling_Mean_Gravity_y',
       'Rolling_Std_Gravity_y', 'Rolling_Mean_Gravity_z',
       'Rolling_Std_Gravity_z', 'Rolling_Mean_Rotation_x',
       'Rolling_Std_Rotation_x', 'Rolling_Mean_Rotation_y',
       'Rolling_Std_Rotation_y', 'Rolling_Mean_Rotation_z',
       'Rolling_Std_Rota

In [None]:
# Shuffle all rows (this reassigns concat_df; it is not an in-place operation)
concat_df = concat_df.sample(frac=1, random_state=42).reset_index(drop=True)

In [None]:
len(concat_df)

269996

In [None]:
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

In [None]:
concat_df.isnull().sum()

Unnamed: 0,0
Acceleration_x,0
Acceleration_y,0
Acceleration_z,0
Gravity_x,0
Gravity_y,0
Gravity_z,0
Rotation_x,0
Rotation_y,0
Rotation_z,0
Roll,0


In [None]:
concat_df.dropna(inplace=True)

In [None]:
concat_df.isnull().sum()

Unnamed: 0,0
Acceleration_x,0
Acceleration_y,0
Acceleration_z,0
Gravity_x,0
Gravity_y,0
Gravity_z,0
Rotation_x,0
Rotation_y,0
Rotation_z,0
Roll,0


In [None]:
X = concat_df.drop(columns=['label'])

In [None]:
y = concat_df['label']
y.head()

Unnamed: 0,label
0,5
1,2
2,3
3,3
4,1


In [None]:
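# Row-level random split: neighbouring ~10 ms samples from the same recording
# can land in both train and test, which may make the test accuracy optimistic.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)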

In [None]:
len(X_train)

215996

In [None]:
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

In [None]:
# Define the models you want to evaluate
models = {
    'Logistic Regression': LogisticRegression(max_iter=3000),
    'SVM': SVC(),
    'Decision Tree': DecisionTreeClassifier(),
    'Random Forest': RandomForestClassifier()
}

# Create a KFold object with the desired number of folds (e.g., 5)
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

In [None]:
best_model = None
best_model_name = None
best_score = 0

for model_name, model in models.items():
    # 5-fold cross-validation accuracy on the training split
    scores = cross_val_score(model, X_train, y_train, cv=kfold, scoring='accuracy')

    # Print the average score for the current model
    avg_score = scores.mean()
    print(f'{model_name}: Average Accuracy = {avg_score:.4f}')

    # Update best model if current model has higher accuracy.
    # cross_val_score fits clones, so best_model itself is still unfitted.
    if avg_score > best_score:
        best_score = avg_score
        best_model = model
        best_model_name = model_name

print(f'\nBest Model: {best_model_name} with Accuracy = {best_score:.4f}')

Logistic Regression: Average Accuracy = 1.0000
SVM: Average Accuracy = 0.9977
Decision Tree: Average Accuracy = 1.0000
Random Forest: Average Accuracy = 1.0000

Best Model: Random Forest with Accuracy = 1.0000
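The near-perfect scores above come from row-level random splits: consecutive samples (about 10 ms apart) from the same recording can appear in both training and test folds, and the cumulative-sum columns carry row-position information, so these accuracies may be optimistic. A minimal sketch of a stricter setup on hypothetical synthetic data, keeping whole recordings together with scikit-learn's `GroupKFold` and fitting the scaler inside a `Pipeline` so each fold is standardized using its training rows only:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
# Hypothetical stand-in data: 6 recordings x 50 rows x 4 features,
# with the label shared by every row of a recording.
n_rec, per_rec = 6, 50
groups = np.repeat(np.arange(n_rec), per_rec)   # recording ID for each row
y = np.repeat([1, 1, 1, 5, 5, 5], per_rec)      # one label per recording
X = rng.standard_normal((n_rec * per_rec, 4)) + y[:, None] * 0.5

# The scaler sits inside the pipeline, so it is refit on each training fold.
model = make_pipeline(StandardScaler(), SVC())
cv = GroupKFold(n_splits=3)  # no recording appears in both train and test folds
scores = cross_val_score(model, X, y, groups=groups, cv=cv, scoring="accuracy")
print(scores.mean())
```

Applying this to the real data would require a per-row recording (or cow) identifier, which the loading cell in this notebook does not keep.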


In [None]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [None]:
X_train.columns

Index(['Acceleration_x', 'Acceleration_y', 'Acceleration_z', 'Gravity_x',
       'Gravity_y', 'Gravity_z', 'Rotation_x', 'Rotation_y', 'Rotation_z',
       'Roll', 'Pitch', 'Yaw', 'Cumsum_Acceleration_x',
       'Cumsum_Acceleration_y', 'Cumsum_Acceleration_z', 'Cumsum_Gravity_x',
       'Cumsum_Gravity_y', 'Cumsum_Gravity_z', 'Cumsum_Rotation_x',
       'Cumsum_Rotation_y', 'Cumsum_Rotation_z', 'Cumsum_Roll', 'Cumsum_Pitch',
       'Cumsum_Yaw', 'Rolling_Mean_Acceleration_x',
       'Rolling_Std_Acceleration_x', 'Rolling_Mean_Acceleration_y',
       'Rolling_Std_Acceleration_y', 'Rolling_Mean_Acceleration_z',
       'Rolling_Std_Acceleration_z', 'Rolling_Mean_Gravity_x',
       'Rolling_Std_Gravity_x', 'Rolling_Mean_Gravity_y',
       'Rolling_Std_Gravity_y', 'Rolling_Mean_Gravity_z',
       'Rolling_Std_Gravity_z', 'Rolling_Mean_Rotation_x',
       'Rolling_Std_Rotation_x', 'Rolling_Mean_Rotation_y',
       'Rolling_Std_Rotation_y', 'Rolling_Mean_Rotation_z',
       'Rolling_Std_Rota

In [None]:
clf = SVC()
clf.fit(X_train, y_train)

In [None]:
y_pred = clf.predict(X_test)

In [None]:
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Accuracy: 0.9979

Classification Report:
              precision    recall  f1-score   support

           1       1.00      1.00      1.00     10863
           2       1.00      0.99      1.00     10797
           3       1.00      1.00      1.00     10680
           4       1.00      1.00      1.00     10730
           5       1.00      1.00      1.00     10930

    accuracy                           1.00     54000
   macro avg       1.00      1.00      1.00     54000
weighted avg       1.00      1.00      1.00     54000


Confusion Matrix:
[[10863     0     0     0     0]
 [   26 10722    49     0     0]
 [    0     0 10665    15     0]
 [    0     0     0 10712    18]
 [    8     0     0     0 10922]]
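The summary figures in the report can be recomputed from the confusion matrix printed above (rows are true classes 1 to 5, columns are predicted classes). Accuracy is the trace divided by the total, recall is computed row-wise and precision column-wise:

```python
import numpy as np

# Confusion matrix printed above (rows = true class 1..5, columns = predicted).
cm = np.array([
    [10863,     0,     0,     0,     0],
    [   26, 10722,    49,     0,     0],
    [    0,     0, 10665,    15,     0],
    [    0,     0,     0, 10712,    18],
    [    8,     0,     0,     0, 10922],
])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)     # per true class (row-wise)
precision = np.diag(cm) / cm.sum(axis=0)  # per predicted class (column-wise)
print(f"accuracy = {accuracy:.4f}")
print("recall    =", np.round(recall, 4))
print("precision =", np.round(precision, 4))
```

For example, accuracy is 53884 / 54000, about 0.9979, and the recall for class 2 is 10722 / 10797, about 0.993, which the report rounds to 0.99.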


In [None]:
import pickle
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Specify the file path in your Google Drive
file_path = '/content/drive/MyDrive/trained_modelnew.pkl'  # Change to your desired path

# Save the model to the specified path
with open(file_path, 'wb') as file:
    pickle.dump(clf, file)

print(f"Model saved to: {file_path}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Model saved to: /content/drive/MyDrive/trained_modelnew.pkl
