**About the dataset**

This dataset contains accelerometer and gyroscope readings (x, y, and z axes for each
sensor) recorded while participants performed a variety of exercises at different
intensities. It is well suited to analyzing movement patterns, developing activity
recognition models, and training machine learning algorithms for fitness and health
monitoring.
* ep (ms): Timestamp with millisecond precision giving the recording time of each sample (stored as a date-time string such as 2019-01-11 15:08:05.200).
* Acc_x: X-axis acceleration value from the fitness tracker.
* Acc_y: Y-axis acceleration value from the fitness tracker.
* Acc_z: Z-axis acceleration value from the fitness tracker.
* Gyro_x: X-axis rotational velocity (gyroscope) reading.
* Gyro_y: Y-axis rotational velocity (gyroscope) reading.
* Gyro_z: Z-axis rotational velocity (gyroscope) reading.
* ID: Identifier for the individual performing the exercise.
* Label: Type of exercise or movement (e.g., bench press, overhead press).
* Category: Intensity of the exercise (e.g., heavy, medium).
* Set: Set number or batch identifier for the recorded session.

*import dependencies*

In [1]:
import pandas as pd
import numpy as np
from collections import Counter

*Creating a pandas object*

In [41]:
# s = pd.Series([1, 4, -1, np.nan, .5, 1])

*1- Write a function to load the dataset.*

In [2]:
def load_data(chemin):
    # Read the semicolon-separated CSV; return None if loading fails.
    try:
        data = pd.read_csv(chemin, sep=';')
        return data
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

In [3]:
chemin_data = '..//DATA//DatasetExos.csv'
data = load_data(chemin_data)
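
A loader of this kind can be sanity-checked without the real file by handing `pd.read_csv` an in-memory, semicolon-separated sample. The sketch below is only an illustration: the name `sample_csv` and its two rows are invented and are not taken from the dataset.

```python
import io
import pandas as pd

# Hypothetical two-row sample mimicking a semicolon-separated export;
# the values are invented for illustration only.
sample_csv = "Acc_x;Acc_y;Label\n0.01;0.97;bench\n-0.02;0.95;bench\n"

# read_csv accepts any file-like object, so StringIO stands in for a file path.
sample = pd.read_csv(io.StringIO(sample_csv), sep=";")

print(sample.shape)
print(sample.columns.tolist())
```

If the separator were wrong (for example the default comma), the whole header would be read as a single column, which is a quick symptom to look for.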

In [4]:
data.head(10)

Unnamed: 0,ep (ms),Acc_x,Acc_y,Acc_z,Gyro_x,Gyro_y,Gyro_z,ID,Label,Category,Set
0,2019-01-11 15:08:05.200,0.0135,0.977,-0.071,-2.094.366.723,257.720.316,0.9388000000000002,B,bench,heavy,30.0
1,2019-01-11 15:08:05.400,-0.0014999999999999,0.9705,-0.0794999999999999,-16.826,-0.8904,21.708,B,bench,heavy,30.0
2,2019-01-11 15:08:05.600,0.0013333333333333,0.9716666666666668,-0.0643333333333333,526.942.212,-0.2559999999999999,-14.146,B,bench,heavy,30.0
3,2019-01-11 15:08:05.800,-0.024,0.957,-0.0735,8.061,-45.244,-2.073,B,bench,heavy,30.0
4,2019-01-11 15:08:06.000,-0.0279999999999999,0.9576666666666666,-0.115,2.439,-15.486,-36.098,B,bench,heavy,30.0
5,2019-01-11 15:08:06.200,-0.026,0.965,-0.118,0.4634000000000002,52.194,-64.636,B,bench,heavy,30.0
6,2019-01-11 15:08:06.400,-0.0486666666666666,0.79,-0.1453333333333333,21.695,81.708,1.582.030.845,B,bench,heavy,30.0
7,2019-01-11 15:08:06.600,-0.1699999999999999,0.8995,-0.25,175.246,15.976,-175.854,B,bench,heavy,30.0
8,2019-01-11 15:08:06.800,-0.2226666666666666,0.907,-0.2043333333333333,-72.318,-13.536,-0.40260000000000007,B,bench,heavy,30.0
9,2019-01-11 15:08:07.000,-0.2045,0.93,-0.149,-28.683,-335.699.969,205.732,B,bench,heavy,30.0
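
The `ep (ms)` values in the preview are date-time strings, so they can be parsed with `pd.to_datetime` to check the sampling interval. This is a minimal sketch using the first three timestamps shown above; the variable names are illustrative.

```python
import pandas as pd

# First three timestamps copied from the preview above.
stamps = pd.Series([
    "2019-01-11 15:08:05.200",
    "2019-01-11 15:08:05.400",
    "2019-01-11 15:08:05.600",
])

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(stamps, errors="coerce")

# Difference between consecutive samples, expressed in milliseconds.
step_ms = parsed.diff().dropna().dt.total_seconds() * 1000
print(step_ms.tolist())
```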


*2- Write a function to display basic information about the dataset.*

In [None]:

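def display_info(data):
    # DataFrame.info() prints its report directly and returns None,
    # so call it on its own instead of wrapping it in print().
    print("Data information:")
    data.info()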
    

In [8]:
   
def display_dataset_info(data):
    # Number of rows and columns
    print("Dataset shape:", data.shape)
    
    # Column names
    print("\nColumn names:")
    print(data.columns.tolist())
    
    # Data types
    print("\nData types:")
    print(data.dtypes)
    
    # Display summary statistics for numerical columns
    print("\nSummary statistics:")
    print(data.describe())
    
    # Display missing values for each column
    print("\nMissing values:")
    print(data.isnull().sum())



In [9]:
display_dataset_info(data)

Dataset shape: (9009, 11)

Column names:
['ep (ms)', 'Acc_x', 'Acc_y', 'Acc_z', 'Gyro_x', 'Gyro_y', 'Gyro_z', 'ID', 'Label', 'Category', 'Set']

Data types:
ep (ms)      object
Acc_x        object
Acc_y        object
Acc_z        object
Gyro_x       object
Gyro_y       object
Gyro_z       object
ID           object
Label        object
Category     object
Set         float64
dtype: object

Summary statistics:
               Set
count  9003.000000
mean     46.105520
std      34.108085
min     -10.000000
25%      23.000000
50%      47.000000
75%      70.000000
max    2000.000000

Missing values:
ep (ms)     11
Acc_x        3
Acc_y        2
Acc_z        3
Gyro_x       1
Gyro_y       0
Gyro_z       1
ID           2
Label        0
Category     4
Set          6
dtype: int64
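
The `object` dtype on the sensor columns suggests that some cells cannot be parsed as numbers (the preview shows values such as `-2.094.366.723`). One possible way to handle this is `pd.to_numeric(..., errors='coerce')`, then counting how many non-missing cells were turned into NaN. The small frame `demo` below is hypothetical.

```python
import pandas as pd

# Hypothetical column: one malformed value, two valid ones, one missing.
demo = pd.DataFrame({
    "Gyro_x": ["-2.094.366.723", "-16.826", None, "8.061"],
})

# Unparseable strings become NaN instead of raising an error.
converted = pd.to_numeric(demo["Gyro_x"], errors="coerce")

# Cells that were present in the raw data but could not be parsed.
newly_invalid = int(converted.isna().sum() - demo["Gyro_x"].isna().sum())

print(converted.dtype)
print(newly_invalid)
```

Comparing missing-value counts before and after the conversion gives an estimate of how many cells are malformed rather than genuinely empty.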


*3- Write a function to calculate the central tendencies of an attribute.*

* *median function*

In [14]:
def median(data):
    # Drop NaN values, then sort
    cleaned_data = data.dropna().sort_values()
    n = len(cleaned_data)
    
    if n == 0:
        return None  # Nothing left after dropping NaN values
    
    if n % 2 == 0:
        Q2 = (cleaned_data.iloc[n // 2 - 1] + cleaned_data.iloc[n // 2]) / 2
    else:
        Q2 = cleaned_data.iloc[n // 2]
    return Q2


In [15]:
median(pd.to_numeric(data['Acc_x'], errors='coerce'))

0.016
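
A hand-written median can be cross-checked against pandas' built-in `Series.median()`, which also skips NaN by default. The sketch below repeats the same steps (drop NaN, sort, take the middle) on a hypothetical sample so that it runs on its own.

```python
import numpy as np
import pandas as pd

values = pd.Series([3.0, np.nan, 1.0, 4.0, 2.0])  # hypothetical sample

# Manual median: drop NaN, sort, average the two middle values when n is even.
cleaned = values.dropna().sort_values()
n = len(cleaned)
if n % 2 == 0:
    manual = (cleaned.iloc[n // 2 - 1] + cleaned.iloc[n // 2]) / 2
else:
    manual = cleaned.iloc[n // 2]

# Both approaches should agree.
print(manual, values.median())
```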

* *mode function*

In [16]:
def mode(data):
    # Drop NaN values
    cleaned_data = data.dropna()
    if len(cleaned_data) == 0:
        return None
    
    # Count how many times each value occurs (value_counts sorts by descending count)
    occurence = cleaned_data.value_counts()
    # The first entry therefore holds the highest occurrence count
    max_count = occurence.iloc[0]
    # Keep every value whose count equals max_count; .index returns the values themselves
    mode_list = occurence[occurence == max_count].index.tolist()
    
    return mode_list

In [17]:
mode(data['Acc_z'])

['-0.12', '-0.125']
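
The modes above are strings because `Acc_z` still has the `object` dtype at this point. With text values, two spellings of the same number count as different entries, so converting to numeric first can change the result. A minimal sketch on a hypothetical sample, using pandas' own `Series.mode()` for brevity:

```python
import pandas as pd

# Hypothetical raw column: "-0.12" and "-0.120" are the same number written twice.
raw = pd.Series(["-0.12", "-0.120", "-0.125", "-0.125"])

# As strings, only "-0.125" occurs twice.
string_modes = raw.mode().tolist()

# As numbers, -0.12 also occurs twice, so there are two modes.
numeric = pd.to_numeric(raw, errors="coerce")
numeric_modes = numeric.mode().tolist()

print(string_modes)
print(numeric_modes)
```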

* *quartiles function*

In [18]:
def quartiles(data):
    # Drop NaN values, then sort
    cleaned_data = data.dropna().sort_values()
  
    Q0 = cleaned_data.min()
    Q4 = cleaned_data.max()
    Q2 = median(cleaned_data)
    # Q1 and Q3 are the observed values at 25% and 75% of the sorted length (no interpolation)
    Q1 = cleaned_data.iloc[int(len(cleaned_data) * 0.25)]
    Q3 = cleaned_data.iloc[int(len(cleaned_data) * 0.75)]

    return Q0 , Q1 , Q2 , Q3 , Q4
    

In [19]:
quartiles(pd.to_numeric(data['Acc_x'], errors='coerce'))

(-0.8380000000000001, -0.1115, 0.016, 0.1176666666666666, 10.255)
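
The function above picks Q1 and Q3 as single observed values by position, whereas pandas' `Series.quantile` interpolates linearly between neighbours by default. On large samples the two usually land close together, but on small ones they can differ noticeably. The sketch below illustrates this on a hypothetical eight-value sample.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])  # hypothetical sample
ordered = values.sort_values()
n = len(ordered)

# Index-based pick, as in the quartiles function above.
q1_index = ordered.iloc[int(n * 0.25)]
q3_index = ordered.iloc[int(n * 0.75)]

# pandas default: linear interpolation between neighbouring values.
q1_linear = values.quantile(0.25)
q3_linear = values.quantile(0.75)

print(q1_index, q3_index)
print(q1_linear, q3_linear)
```

Neither convention is wrong; several quartile definitions are in common use, so it helps to state which one a report relies on.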

In [20]:
import pandas as pd

def calculate_central_tendencies(data, column_name):
    if column_name not in data.columns:
        print(f"Column '{column_name}' does not exist in the dataset.")
        return
    
    # Convert to numeric in place; unparseable entries become NaN
    data[column_name] = pd.to_numeric(data[column_name], errors='coerce')
    
    # Calculate mean
    mean_value = data[column_name].mean()
    
    # Calculate median
    median_value = median(data[column_name])
    
    # Calculate mode
    mode_value = mode(data[column_name])

    # mode() returns a list of values (or None when the column has no valid data);
    # unwrap it only when there is a single mode
    if mode_value is not None and len(mode_value) == 1:
        mode_value = mode_value[0]

    # Display results
    print(f"Central tendencies for '{column_name}':")
    print(f"Mean: {mean_value}")
    print(f"Median: {median_value}")
    print(f"Mode: {mode_value}")



In [37]:
calculate_central_tendencies(data, 'Acc_x')  # Replace 'Acc_x' with the column you want to analyze

Central tendencies for 'Acc_x':
Mean: 0.04590467712621936
Median: 0.016
Mode: 0.078


*4- Write a function to calculate the quartiles (Q0, Q1, Q2, Q3, Q4) of an attribute.*

In [69]:
# Convert numeric columns (as defined earlier)
# for column in ['Acc_x', 'Acc_y', 'Acc_z', 'Gyro_x', 'Gyro_y', 'Gyro_z']:
#     data[column] = pd.to_numeric(data[column], errors='coerce')

In [23]:
def calculate_quartiles(data, column_name):
    if column_name not in data.columns:
        print(f"Column '{column_name}' does not exist in the dataset.")
        return

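    # Convert to numeric first so that sorting and the median operate on numbers, not strings
    values = pd.to_numeric(data[column_name], errors='coerce')
    Q0, Q1, Q2, Q3, Q4 = quartiles(values)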
    print(f"Quartiles for '{column_name}':")
    print(f"Q0 (Min): {Q0}")
    print(f"Q1 (25th percentile): {Q1}")
    print(f"Q2 (Median): {Q2}")
    print(f"Q3 (75th percentile): {Q3}")
    print(f"Q4 (Max): {Q4}")



In [24]:
calculate_quartiles(data, 'Acc_x')

Quartiles for 'Acc_x':
Q0 (Min): -0.8380000000000001
Q1 (25th percentile): -0.1115
Q2 (Median): 0.016
Q3 (75th percentile): 0.1176666666666666
Q4 (Max): 10.255


*5- Write a function to display the number and percentage of missing values for an attribute.*

In [25]:
def display_missing_values(data, column_name):
    if column_name not in data.columns:
        print(f"Column '{column_name}' does not exist in the dataset.")
        return

    missing_count = data[column_name].isnull().sum()
    total_count = data[column_name].shape[0]
    missing_percentage = (missing_count / total_count) * 100
    
    print(f"Missing values in '{column_name}':")
    print(f"Count: {missing_count}")
    print(f"Percentage: {missing_percentage:.2f}%")

In [26]:
display_missing_values(data, 'Acc_x')

Missing values in 'Acc_x':
Count: 22
Percentage: 0.24%
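
The same count and percentage can be computed for every column at once, since the mean of a boolean mask is the fraction of `True` values. A minimal sketch on a hypothetical frame `demo`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing number and two missing labels.
demo = pd.DataFrame({
    "Acc_x": [0.1, np.nan, 0.3, 0.4],
    "Label": ["bench", "bench", None, None],
})

# isnull() gives a boolean mask; sum() counts True, mean() gives the fraction.
summary = pd.DataFrame({
    "missing": demo.isnull().sum(),
    "percent": (demo.isnull().mean() * 100).round(2),
})

print(summary)
```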


*6- Write a function to display the number of unique values for an attribute.*

In [48]:
def display_unique_values(data, column_name):
    
    if column_name not in data.columns:
        print(f"Column '{column_name}' does not exist in the dataset.")
        return
    # Show the Python type of the first entry (positional access, independent of index labels)
    print(type(data[column_name].iloc[0]))
    unique_count = data[column_name].nunique()
    print(f"Unique values in '{column_name}': {unique_count}")

In [49]:
display_unique_values(data, 'ID')  # Change 'ID' to any column you want to check unique values for

<class 'str'>
Unique values in 'ID': 6
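
`nunique()` ignores missing values by default, which matters for a column like `ID` that has a few gaps. The sketch below, on a hypothetical series of identifiers, shows the effect of `dropna=False` and how `value_counts()` lists the frequency of each value.

```python
import pandas as pd

ids = pd.Series(["A", "B", "B", None, "C", "A", "B"])  # hypothetical IDs

# Missing values are excluded unless dropna=False is passed.
unique_without_missing = ids.nunique()
unique_with_missing = ids.nunique(dropna=False)

# Frequency of each non-missing value, most frequent first.
frequencies = ids.value_counts().to_dict()

print(unique_without_missing, unique_with_missing)
print(frequencies)
```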
