# Smart Building

Scenario: You are a data scientist monitoring environmental conditions in a smart building. You have collected sensor data (temperature, humidity, pressure) over 24 hours, with one reading per minute (1,440 readings per sensor). Your goal is to process this raw data to compute key statistics and identify anomalies.


## Sub-tasks: 

### 1. Data Generation: 

In [None]:
import pandas as pd
import numpy as np

* Create a NumPy array for time_in_minutes from 0 to 1439 (24 hours * 60 minutes).

In [10]:
time_index = pd.date_range(start='2025-07-02 00:00', periods=1440, freq='min')
df = pd.DataFrame(index=time_index)
df['minute_of_day'] = np.arange(1440)
# print(df)
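
The cell above stores the minute counter as a DataFrame column on a datetime index. For comparison, a minimal NumPy-only sketch of the `time_in_minutes` array the task describes could look like this (the variable name is taken from the task text, not from the notebook code):

```python
import numpy as np

# One entry per minute of the day: 0, 1, ..., 1439 (24 hours * 60 minutes)
time_in_minutes = np.arange(24 * 60)

print(time_in_minutes.shape)                    # (1440,)
print(time_in_minutes[0], time_in_minutes[-1])  # 0 1439
```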

* Generate synthetic temperature data: a base temperature (e.g., 22°C) with some random fluctuations (use np.random.normal).

In [38]:
np.random.seed(0)
df['temperature'] = 22 + np.random.normal(0, 1.5, size=len(df))

* Generate synthetic humidity data: a base humidity (e.g., 55%) with random fluctuations, ensuring values stay realistic (0-100%).

In [None]:
df['humidity'] = np.clip(55 + np.random.normal(0, 5, size=len(df)), 0, 100)
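
To illustrate what `np.clip` does here, the sketch below applies it to a few hand-picked values (the `raw` array is hypothetical and not part of the notebook data). With a mean of 55 and a standard deviation of 5, the bounds 0 and 100 sit 11 and 9 standard deviations away, so the clip acts as a safety net that is very unlikely to change any generated value.

```python
import numpy as np

# Hypothetical raw humidity values, two of them outside the valid 0-100 % range
raw = np.array([-3.0, 42.5, 55.0, 101.2])

# Values below 0 become 0, values above 100 become 100, the rest are unchanged
clipped = np.clip(raw, 0, 100)

print(clipped.tolist())  # [0.0, 42.5, 55.0, 100.0]
```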

* Generate synthetic pressure data: a base pressure (e.g., 1012 hPa) with random fluctuations (use np.random.normal).

In [1]:
df['pressure'] = 1012 + np.random.normal(0, 2, size=len(df))


* Combine these into a single 2D NumPy array where each row represents a minute and columns are [time, temperature, humidity, pressure].

In [18]:
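# Extract each column as a 1-D NumPy array
time_arr = df['minute_of_day'].to_numpy()
temp_arr = df['temperature'].to_numpy()
hum_arr = df['humidity'].to_numpy()
pres_arr = df['pressure'].to_numpy()
# Stack as columns: [time, temperature, humidity, pressure], shape (1440, 4)
data_array = np.column_stack((time_arr, temp_arr, hum_arr, pres_arr))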
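
A quick sanity check on the stacked array can catch shape or dtype surprises early. The sketch below rebuilds comparable data so it runs on its own. It assumes NumPy's `default_rng` generator rather than the notebook's `np.random.seed`, so the random values will differ from the ones above.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: Generator API, not the notebook's seed call
n = 24 * 60

time_arr = np.arange(n)
temp_arr = 22 + rng.normal(0, 1.5, size=n)
hum_arr = np.clip(55 + rng.normal(0, 5, size=n), 0, 100)
pres_arr = 1012 + rng.normal(0, 2, size=n)

data_array = np.column_stack((time_arr, temp_arr, hum_arr, pres_arr))

print(data_array.shape)  # (1440, 4): one row per minute, one column per field
print(data_array.dtype)  # float64: the integer time column is upcast to float
```

Because a NumPy array holds a single dtype, the integer minute counter is stored as floats once it is stacked with the sensor columns.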

### 2. Basic Statistics: 

* Calculate the average, minimum, maximum temperature, humidity, and pressure for the entire 24-hour period.

In [20]:
stats = df[['temperature','humidity','pressure']].agg(['mean','min','max'])
print("Overall Statistics:")
print(stats)

Overall Statistics:
      temperature   humidity     pressure
mean    21.966459  54.872714  1011.929317
min     17.430785  39.415717  1004.519799
max     26.756462  69.645481  1019.603320


* Find the standard deviation for each of these parameters.

In [22]:
stats = df[['temperature','humidity','pressure']].agg(['std'])
print("standard deviation:")
print(stats)

standard deviation:
     temperature  humidity  pressure
std     1.473817  4.801297  2.012628
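
One detail worth knowing when comparing these numbers with a NumPy computation on `data_array`: pandas computes the sample standard deviation (`ddof=1`) by default, while `np.std` defaults to the population version (`ddof=0`). A small sketch with hypothetical values shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical readings, chosen only to make the difference visible
values = np.array([21.0, 22.0, 23.0, 24.0])

sample_std = pd.Series(values).std()  # pandas default: ddof=1
population_std = values.std()         # NumPy default: ddof=0
matched_std = values.std(ddof=1)      # NumPy with ddof=1 matches pandas

print(sample_std, population_std, matched_std)
```

With 1,440 readings the two definitions differ only slightly, but passing `ddof=1` explicitly keeps NumPy and pandas results consistent.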


### 3. Hourly Averages: 

* Reshape the data (or use slicing/aggregation) to calculate the average temperature, humidity, and pressure for each hour of the day. You should end up with 24 average values for each parameter.

In [24]:
# Select the sensor columns first, then average each one-hour bin
hourly = df[['temperature', 'humidity', 'pressure']].resample('h').mean()
print("\nHourly Averages:")
print(hourly)


Hourly Averages:
                     temperature   humidity     pressure
2025-07-02 00:00:00    22.115001  55.128445  1011.792983
2025-07-02 01:00:00    22.282834  55.253724  1011.740002
2025-07-02 02:00:00    22.021916  55.156534  1011.921764
2025-07-02 03:00:00    21.866334  55.039510  1011.887922
2025-07-02 04:00:00    21.920387  55.582867  1011.559820
2025-07-02 05:00:00    21.630648  54.441835  1012.001850
2025-07-02 06:00:00    21.948784  54.342268  1012.125766
2025-07-02 07:00:00    21.839964  54.714879  1012.009754
2025-07-02 08:00:00    21.871787  55.699440  1011.793921
2025-07-02 09:00:00    21.678087  54.829470  1012.067992
2025-07-02 10:00:00    21.809653  55.386401  1012.322285
2025-07-02 11:00:00    21.898273  54.083941  1011.996547
2025-07-02 12:00:00    21.719477  54.649354  1012.313464
2025-07-02 13:00:00    22.128185  54.774070  1011.813392
2025-07-02 14:00:00    21.727541  54.679070  1011.809301
2025-07-02 15:00:00    22.234686  54.697144  1011.697761
2025-07-02 16
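
The task also mentions reshaping as an alternative to `resample`. A minimal NumPy sketch of that approach, shown for temperature only and assuming freshly generated data from `default_rng` (so the values differ from the output above):

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: Generator API, not the notebook's seed call
temperature = 22 + rng.normal(0, 1.5, size=24 * 60)

# 1440 readings -> 24 rows (hours) x 60 columns (minutes within the hour)
by_hour = temperature.reshape(24, 60)

# Averaging across each row gives one value per hour
hourly_temp = by_hour.mean(axis=1)

print(hourly_temp.shape)  # (24,)
```

This works because the readings are evenly spaced with no gaps. With missing or irregular timestamps, the time-based `resample` approach is the safer choice.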

### 4. Anomaly Detection (Simple): 

* Identify and count how many minutes the temperature exceeded a certain threshold (e.g., 25°C).

In [33]:
threshold = 25
exceed_count = (df['temperature'] > threshold).sum()
print(f"Minutes > {threshold}°C: {exceed_count} Minutes")

Minutes > 25°C: 32 Minutes
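
Beyond counting, it can be useful to know which minutes exceeded the threshold. The sketch below uses a small hypothetical array (not the notebook data) to show the boolean-mask pattern and the effect of the strict `>` comparison:

```python
import numpy as np

# Hypothetical temperatures for five consecutive minutes
temps = np.array([22.1, 25.4, 24.9, 26.0, 25.0])
threshold = 25

mask = temps > threshold               # True where the threshold is exceeded
exceed_count = int(mask.sum())         # each True counts as 1
exceed_minutes = np.flatnonzero(mask)  # positions (minutes) of the True entries

print(exceed_count)             # 2
print(exceed_minutes.tolist())  # [1, 3]
```

The reading of exactly 25.0 is not counted because the comparison is strict. Use `>=` if the threshold itself should count as an exceedance.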


* Find the time (in minutes) when the minimum temperature occurred.

In [36]:
min_temp_time = df['temperature'].idxmin()
print(f"Time of minimum temperature: {min_temp_time}")

Time of minimum temperature: 2025-07-02 09:49:00
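
Since the task asks for the time in minutes, the timestamp returned by `idxmin()` can be converted to a minute-of-day value. A minimal sketch, using the timestamp from the output above:

```python
import pandas as pd

# Timestamp copied from the notebook output above
min_temp_time = pd.Timestamp('2025-07-02 09:49:00')

# Minutes elapsed since midnight
minute_of_day = min_temp_time.hour * 60 + min_temp_time.minute

print(minute_of_day)  # 589
```

In the notebook itself, `df.loc[min_temp_time, 'minute_of_day']` should give the same value, since that column counts minutes from the start of the day.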


### 5. Data Export (Optional): 

* Save the combined 2D array to a .csv file using NumPy's saving functions.

In [37]:
# Note: this uses pandas rather than NumPy; the timestamp index is written as the first column
df.to_csv('sensor_data.csv')
print("\nSaved to sensor_data.csv")


Saved to sensor_data.csv
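
The task asks for NumPy's saving functions, whereas the cell above uses pandas. A sketch of the NumPy route with `np.savetxt` follows. The file name `sensor_data_numpy.csv`, the temporary-directory location, and the regenerated data (via `default_rng`) are assumptions made so the example runs on its own.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)  # assumption: stand-in data, not the notebook's values
n = 24 * 60
data_array = np.column_stack((
    np.arange(n),
    22 + rng.normal(0, 1.5, size=n),
    np.clip(55 + rng.normal(0, 5, size=n), 0, 100),
    1012 + rng.normal(0, 2, size=n),
))

# Hypothetical output path in the system temp directory
path = os.path.join(tempfile.gettempdir(), 'sensor_data_numpy.csv')

np.savetxt(
    path,
    data_array,
    delimiter=',',
    header='time,temperature,humidity,pressure',
    comments='',   # write the header without the default '# ' prefix
    fmt='%.6f',    # six decimal places for every column
)

# Read the file back to confirm the round trip
loaded = np.loadtxt(path, delimiter=',', skiprows=1)
print(loaded.shape)  # (1440, 4)
```

Unlike `df.to_csv`, this writes only the numeric array, so the timestamps are not stored and the time column holds the minute counter.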
