## Dataset Description

**Name:** IoT-Based Environmental Dataset

**Source:** [Kaggle - IoT-Based Environmental Dataset](https://www.kaggle.com/datasets/ziya07/iot-based-environmental-dataset?resource=download)

**Summary:**  
This dataset provides detailed environmental and mental health data collected from a university setting using IoT sensors. It includes environmental metrics such as temperature, humidity, air quality, noise, lighting, and crowd density, as well as student-reported mental health indicators like stress level, sleep hours, mood score, and mental health status. The goal is to analyze how environmental conditions may influence students' well-being.

**Structure:**  
- Each row represents a 15-minute interval of environmental readings (e.g., temperature, noise, air quality) from various campus locations.
- The dataset contains 1000 rows and 12 columns.
- The dataset contains the following columns:

| Column Name   | Description                                      |
|---------------|--------------------------------------------------|
| timestamp | Time at which the environmental reading was captured |
| location_id | Identifier of the location where sensors are deployed |
| temperature_celsius | Ambient temperature in degrees Celsius |
| humidity_percent | Relative humidity percentage |
| air_quality_index | Air quality index reading |
| noise_level_db | Noise level in decibels |
| lighting_lux | Illumination intensity in lux |
| crowd_density | Number of people in the area |
| stress_level | Modeled student stress score (0–100), derived from environmental variables |
| sleep_hours | Estimated sleep duration in hours, based on stress levels |
| mood_score | Modeled emotional score ranging from -3 (very negative) to +3 (very positive) |
| mental_health_status | Categorical indicator (0 = Normal, 1 = Mild Risk, 2 = At Risk) |

**Provenance:**  
Compiled and published by Ziya on Kaggle. Last updated in 2025.

**License:**  
Check the Kaggle page for licensing details; the dataset is typically available for educational and non-commercial use.

**Note:**  
- The location_id column refers to the specific IoT sensor or monitored area within the university environment.
- The dataset was likely compiled from various environmental sensors and self-reported student responses, then structured into a CSV file.
- mental_health_status is a simplified three-level label (0, 1, 2) and may not capture the full complexity of a student's psychological condition.

**Potential Implications and Biases:**
- Since the data involves self-reported mental health metrics, responses may be subject to personal bias, underreporting, or overestimation.
- Sensor accuracy and calibration may affect the consistency and precision of environmental measurements (e.g., noise or air quality).
- The dataset is limited to a university population and may not generalize to broader demographic or institutional contexts.
- Environmental conditions are highly dynamic, and snapshots in time may not fully capture long-term exposure or effects.




In [106]:
import pandas as pd
import numpy as np
mental_health_df = pd.read_csv('university_mental_health_iot_dataset.csv')
mental_health_df.shape

(1000, 12)

**The pandas `shape` attribute confirms that the dataset contains 1000 observations (rows) and 12 columns.**
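
The 15-minute interval described under Structure can be checked in the same way as the shape. The following is a minimal sketch, assuming the `timestamp` column parses with `pd.to_datetime`; `sample_df` and its values are made-up stand-in rows, not data from the CSV.

```python
import pandas as pd

# Hypothetical stand-in rows; in the notebook this would be mental_health_df.
sample_df = pd.DataFrame({
    "timestamp": ["2024-05-01 08:00:00", "2024-05-01 08:15:00",
                  "2024-05-01 08:30:00", "2024-05-01 08:45:00"],
    "location_id": [101, 102, 103, 101],
    "temperature_celsius": [24.1, 25.3, 23.8, 24.6],
})

# Parse the timestamps and measure the gap between consecutive rows.
timestamps = pd.to_datetime(sample_df["timestamp"])
gaps = timestamps.diff().dropna()

n_rows, n_cols = sample_df.shape
all_15_min = bool((gaps == pd.Timedelta(minutes=15)).all())
print(n_rows, n_cols, all_15_min)
```

If the real data were to interleave several locations at the same timestamp, the gaps would need to be computed per `location_id` instead.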

## Target Research Questions

**1. How do environmental factors such as temperature, humidity, air quality, noise, lighting, and crowd density relate to student stress levels?**

EDA: What are the correlations between each environmental factor and the reported stress_level in the dataset?
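
One way this EDA step might look is a Pearson correlation of each environmental column against `stress_level`. This is only a sketch: `sample_df` holds invented values, and only three of the environmental columns are included to keep it short.

```python
import pandas as pd

# Hypothetical stand-in rows; replace with mental_health_df in the notebook.
sample_df = pd.DataFrame({
    "temperature_celsius": [22.0, 24.5, 27.1, 29.3, 31.0],
    "noise_level_db": [40.2, 48.9, 55.0, 63.4, 70.1],
    "lighting_lux": [500.0, 420.0, 380.0, 300.0, 250.0],
    "stress_level": [20, 35, 50, 65, 80],
})

environmental_cols = ["temperature_celsius", "noise_level_db", "lighting_lux"]

# Pearson correlation of each environmental column with stress_level,
# sorted from most positive to most negative.
stress_corr = (
    sample_df[environmental_cols]
    .corrwith(sample_df["stress_level"])
    .sort_values(ascending=False)
)
print(stress_corr)
```

Because the table describes `stress_level` as derived from environmental variables, strong correlations in the real data may partly reflect how the score was modeled.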

**2. Does sleep duration moderate the relationship between environmental stressors and student mood?**

EDA: How does mood_score vary with sleep_hours among students exposed to high vs. low environmental stressors (e.g., high noise or poor air quality)?
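
A rough first look at moderation could compare the slope of `mood_score` on `sleep_hours` between a high-noise and a low-noise group. The sketch below assumes a median split on `noise_level_db`, which is just one possible definition of "high" and "low" exposure; the data and the `sleep_mood_slope` helper are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows; replace with mental_health_df in the notebook.
sample_df = pd.DataFrame({
    "noise_level_db": [42.0, 45.0, 47.0, 44.0, 68.0, 72.0, 70.0, 75.0],
    "sleep_hours":    [5.0, 6.0, 7.0, 8.0, 5.0, 6.0, 7.0, 8.0],
    "mood_score":     [1.0, 1.5, 2.0, 2.5, -2.0, -1.0, 0.5, 2.0],
})

# Assumption: "high" noise means above the median noise level.
noise_cutoff = sample_df["noise_level_db"].median()
sample_df["noise_group"] = np.where(
    sample_df["noise_level_db"] > noise_cutoff, "high", "low"
)

def sleep_mood_slope(group):
    """Least-squares slope of mood_score on sleep_hours within one group."""
    return group["mood_score"].cov(group["sleep_hours"]) / group["sleep_hours"].var()

# A noticeably different slope between groups would hint at moderation.
slopes = {
    name: sleep_mood_slope(group)
    for name, group in sample_df.groupby("noise_group")
}
print(slopes)
```

A formal test would fit a regression with an interaction term; the group-wise slopes are only an exploratory check.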

**3. Are there specific classroom conditions associated with poorer mental health status among students?**

EDA: What are the average values of temperature_celsius, air_quality_index, and noise_level_db in locations/times where mental_health_status indicates a problem (e.g., status = 1 or 2) compared to times/places where status = 0?
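
The comparison in this question maps naturally onto a `groupby` on `mental_health_status`. A minimal sketch with invented values follows; it assumes the column names `temperature_celsius`, `air_quality_index`, and `noise_level_db` are the ones present in the CSV.

```python
import pandas as pd

# Hypothetical stand-in rows; replace with mental_health_df in the notebook.
sample_df = pd.DataFrame({
    "temperature_celsius": [22.0, 23.0, 28.0, 29.0, 31.0, 32.0],
    "air_quality_index": [40, 50, 90, 100, 140, 160],
    "noise_level_db": [45.0, 47.0, 60.0, 62.0, 72.0, 74.0],
    "mental_health_status": [0, 0, 1, 1, 2, 2],
})

condition_cols = ["temperature_celsius", "air_quality_index", "noise_level_db"]

# Mean of each condition per status level (0 = Normal, 1 = Mild Risk, 2 = At Risk).
status_means = sample_df.groupby("mental_health_status")[condition_cols].mean()
print(status_means)
```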

## Data Preprocessing

The columns **temperature_celsius**, **humidity_percent**, **noise_level_db**, **lighting_lux**, **sleep_hours**, and **mood_score** store values with inconsistent numbers of decimal places, a data integrity issue that affects numerical precision and standardization. Rounding each of these columns to a uniform number of decimal places makes the values consistent and easier to compare during analysis.

The function **'min_decimal_places'** identifies the smallest number of decimal places found among the values of a column. Every value in that column is then rounded to this number of places, so the column ends up with consistent decimal formatting.

In [107]:
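def min_decimal_places(series):
    # Ignore missing values so 'nan' is never parsed as a number string
    series = series.dropna()
    # Count the digits after the decimal point in each value's string form
    decimals = series.astype(str).apply(
        lambda x: len(x.split('.')[-1]) if '.' in x else 0
    )
    # Smallest count in the column, as a plain int for use with round()
    return int(decimals.min())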


The function **'implement_min_decimal'** performs the cleaning: it rounds every value in a column to the number of decimal places returned by 'min_decimal_places'. Values with more decimal places than this minimum are rounded off, while values that already match it are left unchanged.

In [108]:
def implement_min_decimal(column):
    cleaned_column = column.round(min_decimal_places(column))
    return cleaned_column
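
To make the behavior of the two functions concrete, here is a small sketch on a made-up series. The functions are repeated so the block runs on its own, and the `readings` values are illustrative, not taken from the dataset.

```python
import pandas as pd

def min_decimal_places(series):
    series = series.dropna()
    decimals = series.astype(str).apply(
        lambda x: len(x.split('.')[-1]) if '.' in x else 0
    )
    return decimals.min()

def implement_min_decimal(column):
    return column.round(min_decimal_places(column))

# Made-up readings with 3, 1, and 2 decimal places, plus one missing value.
readings = pd.Series([21.456, 22.1, 23.74, None])

places = min_decimal_places(readings)      # smallest decimal count in the series
cleaned = implement_min_decimal(readings)  # every value rounded to that count

print(places)
print(cleaned.tolist())
```

Note the trade-off: rounding to the minimum discards precision from values that were recorded with more decimal places (21.456 becomes 21.5 here), and missing values stay missing.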

**Applying the functions to the columns**

In [109]:
# Brute-force method: call implement_min_decimal (which uses min_decimal_places) on each column individually
mental_health_df['temperature_celsius'] = implement_min_decimal(mental_health_df['temperature_celsius'])

mental_health_df['humidity_percent'] = implement_min_decimal(mental_health_df['humidity_percent'])

mental_health_df['noise_level_db'] = implement_min_decimal(mental_health_df['noise_level_db'])

mental_health_df['lighting_lux'] = implement_min_decimal(mental_health_df['lighting_lux'])

mental_health_df['sleep_hours'] = implement_min_decimal(mental_health_df['sleep_hours'])

mental_health_df['mood_score'] = implement_min_decimal(mental_health_df['mood_score'])

mental_health_df['mental_health_status'] = implement_min_decimal(mental_health_df['mental_health_status'])

# Verify the cleaned values of one column
print(mental_health_df['mood_score'].head(20))


0     2.3
1     1.7
2     2.9
3     0.0
4     3.0
5     1.6
6     2.0
7     2.5
8     1.0
9     1.3
10   -0.2
11    0.6
12    3.0
13    1.4
14    0.0
15    1.0
16    0.1
17    1.2
18    2.4
19    0.2
Name: mood_score, dtype: float64


In [110]:
# Dynamic method: apply implement_min_decimal (which uses min_decimal_places) to all listed columns at once
columns = [
    'temperature_celsius', 'humidity_percent', 'noise_level_db',
    'lighting_lux', 'sleep_hours', 'mood_score', 'mental_health_status'
]

mental_health_df[columns] = mental_health_df[columns].apply(implement_min_decimal)

# Verify the cleaned values of one column
print(mental_health_df['mood_score'].head(20))

0     2.3
1     1.7
2     2.9
3     0.0
4     3.0
5     1.6
6     2.0
7     2.5
8     1.0
9     1.3
10   -0.2
11    0.6
12    3.0
13    1.4
14    0.0
15    1.0
16    0.1
17    1.2
18    2.4
19    0.2
Name: mood_score, dtype: float64
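
Printing the first 20 rows checks a single column by eye. A programmatic check over every cleaned column could look like the sketch below; `decimal_place_counts` is a hypothetical helper, and `cleaned_df` is stand-in data in which `sleep_hours` deliberately keeps one unrounded value to show a failing case.

```python
import pandas as pd

def decimal_place_counts(series):
    """Hypothetical helper: digits after the decimal point for each value."""
    as_text = series.dropna().astype(str)
    return as_text.apply(lambda x: len(x.split('.')[-1]) if '.' in x else 0)

# Stand-in data: mood_score is uniformly rounded, sleep_hours is not.
cleaned_df = pd.DataFrame({
    "mood_score": [2.3, 1.7, 2.9, 0.0, 3.0],
    "sleep_hours": [6.5, 7.0, 5.25, 8.0, 6.1],
})

# A column is uniform when all of its values share one decimal-place count.
is_uniform = {
    col: bool(decimal_place_counts(cleaned_df[col]).nunique() == 1)
    for col in cleaned_df.columns
}
print(is_uniform)
```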
