
# AEROSAUR Smart Mode Machine Learning

This notebook simulates **AEROSAUR's AI-powered Smart Mode**, which automatically adjusts the purifier’s **fan speed** based on environmental conditions and user habits.

---

### Overview

We trained a **Random Forest Classifier**, an ensemble Machine Learning algorithm that combines the predictions of many decision trees.  
In this notebook it is trained on simulated usage data and learns how **time**, **air quality**, and **purifier usage** relate to a recommended **fan speed**.

**Why Random Forest?**
- Handles **non-linear relationships** (useful for environmental data that fluctuates)
- **Reduces overfitting** compared with a single decision tree (by combining many decision trees)
- Provides **feature importance** insights and typically **high accuracy** on tabular data
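
The overfitting point can be checked with a small experiment. The sketch below is illustrative only: it builds its own synthetic dataset (not the AEROSAUR data), reassigns roughly 10% of the labels at random to mimic noise, and compares a single unpruned decision tree with a 100-tree forest. Every name and parameter value in it is an assumption chosen for the illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical data: column 0 is an AQI-like reading, column 1 is pure noise.
X_demo = rng.uniform(0, 200, size=(600, 2))
y_demo = np.digitize(X_demo[:, 0], bins=[50, 100, 150])  # classes 0..3

# Reassign roughly 10% of the labels to a random class to mimic noisy labels.
flip = rng.random(len(y_demo)) < 0.10
y_demo[flip] = rng.integers(0, 4, size=flip.sum())

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Compare accuracy on the training split and on the held-out split.
scores = {
    "tree_train": tree.score(X_tr, y_tr),
    "tree_test": tree.score(X_te, y_te),
    "forest_train": forest.score(X_tr, y_tr),
    "forest_test": forest.score(X_te, y_te),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

A fully grown tree memorizes its training split, noisy labels included, so the number to watch is the held-out score. The forest usually does better there, although it reduces overfitting rather than eliminating it.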


In [1]:

!pip install pandas numpy scikit-learn matplotlib joblib




In [2]:

import pandas as pd
import numpy as np

np.random.seed(42)
n = 1000

data = {
    'time': np.random.randint(0, 24, n),
    'aqi': np.random.randint(0, 201, n),
    'isPurifierOn': np.random.choice([0, 1], n)
}

df = pd.DataFrame(data)

# Define realistic behavior patterns for fan speed
def determine_fan_speed(row):
    if row['isPurifierOn'] == 0:
        return 0  # If purifier is OFF, fan must be OFF
    elif row['aqi'] > 150:
        return 3  # High
    elif row['aqi'] > 100:
        return 2  # Medium
    elif row['aqi'] > 50:
        return 1  # Low
    else:
        return 0  # Clean air

df['fanSpeed'] = df.apply(determine_fan_speed, axis=1)

df.head(10)


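|   | time | aqi | isPurifierOn | fanSpeed |
|---|------|-----|--------------|----------|
| 0 | 6    | 46  | 0            | 0        |
| 1 | 19   | 0   | 1            | 0        |
| 2 | 14   | 89  | 0            | 0        |
| 3 | 10   | 141 | 1            | 2        |
| 4 | 7    | 63  | 1            | 1        |
| 5 | 20   | 37  | 1            | 0        |
| 6 | 6    | 36  | 1            | 0        |
| 7 | 18   | 125 | 0            | 0        |
| 8 | 22   | 138 | 1            | 2        |
| 9 | 10   | 99  | 0            | 0        |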


In [3]:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

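# Simulate sensor noise on the AQI reading (standard deviation of 2 AQI points)
noise_std = 2
df['aqi_noisy'] = df['aqi'] + np.random.normal(0, noise_std, size=len(df))
# Previous row's noisy reading; the first row falls back to its own value
df['aqi_prev'] = df['aqi_noisy'].shift(1).fillna(df['aqi_noisy'])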

# input
X = df[['time', 'aqi_noisy', 'aqi_prev', 'isPurifierOn']]

# output
y = df['fanSpeed']

# 80% for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)

print(f"Model Accuracy: {acc*100:.2f}%\n")
print(classification_report(y_test, y_pred))

Model Accuracy: 98.50%

              precision    recall  f1-score   support

           0       1.00      0.99      1.00       121
           1       0.97      0.97      0.97        29
           2       0.93      1.00      0.97        28
           3       1.00      0.95      0.98        22

    accuracy                           0.98       200
   macro avg       0.97      0.98      0.98       200
weighted avg       0.99      0.98      0.99       200




### Algorithm Explanation

**Random Forest Classifier**
- Combines the predictions of many smaller Decision Trees to make one accurate result.
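- Each tree is trained on a random sample of the rows and considers a random subset of the features (time, noisy AQI, previous AQI, purifier state) at each split, then “votes” on what the fan speed should be.
- The final fan speed prediction is the class the trees favor overall (scikit-learn averages each tree's class probabilities, which usually matches a **majority vote**).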

This makes it:
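- **Robust** against moderately noisy sensor readings, because the errors of individual trees tend to average out.
- Able to **learn user patterns** when it is retrained on logged usage data.
- **Accurate** within the range of conditions seen during training; readings outside that range (here, AQI above 200) are treated like the nearest values it has seen.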
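
The voting step can be made concrete. In scikit-learn, `RandomForestClassifier` combines its trees by averaging their class probabilities rather than counting hard votes, and the sketch below reproduces that by hand. It uses a small synthetic dataset and its own `forest` object, both assumptions for illustration, not the notebook's `model`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Hypothetical stand-in data: one AQI-like feature and one noise feature.
X_demo = rng.uniform(0, 200, size=(300, 2))
y_demo = np.digitize(X_demo[:, 0], bins=[50, 100, 150])  # fan speed 0..3

forest = RandomForestClassifier(n_estimators=25, random_state=1).fit(X_demo, y_demo)

X_new = np.array([[180.0, 10.0], [75.0, 120.0]])

# Each fitted tree lives in forest.estimators_; collect its class probabilities.
per_tree = np.stack([tree.predict_proba(X_new) for tree in forest.estimators_])

# Average across trees, then pick the most probable class for each row.
averaged = per_tree.mean(axis=0)
manual_prediction = forest.classes_[averaged.argmax(axis=1)]

print("Averaged probabilities:\n", averaged.round(2))
print("Manual prediction:", manual_prediction)
print("forest.predict:   ", forest.predict(X_new))
```

The last two printed lines should agree, since the manual average is the same computation the forest performs internally.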


In [4]:

# scenario: 9 PM, AQI 180, purifier ON
import pandas as pd

# The model was fit on ['time', 'aqi_noisy', 'aqi_prev', 'isPurifierOn'],
# so the sample must use the same column names in the same order.
# The previous reading is assumed equal to the current one here.
sample = pd.DataFrame([[21, 180, 180, 1]], columns=['time', 'aqi_noisy', 'aqi_prev', 'isPurifierOn'])
prediction = model.predict(sample)
print("Predicted Fan Speed:", prediction[0])  # 0=Off, 1=Low, 2=Medium, 3=High


In [None]:

import matplotlib.pyplot as plt

plt.bar(X.columns, model.feature_importances_)
plt.title('Feature Importance in Fan Speed Prediction')
plt.ylabel('Importance')
plt.show()



import joblib
joblib.dump(model, 'aerosaur_fan_model_v2.joblib')
print("Model saved successfully as aerosaur_fan_model_v2.joblib")


## Integrating the AI Model with the IoT Purifier System

After training the Random Forest model, AEROSAUR can use it to **automatically adjust the purifier’s fan speed in real time** based on live sensor readings.

### System Integration Flow
1. **ESP32 Microcontroller** reads live environmental data:
   - AQI (from MQ135 / PMS5003 sensor)
   - Temperature & humidity (from DHT22; read by the device but not used by the current model)
   - Current time
   - Whether the purifier is ON or OFF
2. The ESP32 sends these values to the **cloud server or web backend** via Wi-Fi (e.g., REST API or MQTT).
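3. The **backend loads the trained model (`aerosaur_fan_model_v2.joblib`)**, builds the feature row the model was trained on (`time`, `aqi_noisy`, `aqi_prev`, `isPurifierOn`), and predicts the optimal fan speed.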
4. The **predicted fan speed** is then sent back to the ESP32.
5. The ESP32 adjusts the **fan’s PWM (Pulse Width Modulation)** signal accordingly to change the fan’s speed in real time.
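
Steps 2 to 4 can be sketched as a single backend function. This is a minimal sketch under stated assumptions, not the project's actual backend: the JSON field names (`hour`, `aqi`, `isPurifierOn`), the `handle_reading` helper, and the `ThresholdStandIn` class are all hypothetical. The stand-in mirrors the notebook's threshold rule so the example runs without the saved model file.

```python
import json

import pandas as pd

# Column order the notebook's model was fit on.
FEATURE_ORDER = ["time", "aqi_noisy", "aqi_prev", "isPurifierOn"]


def handle_reading(raw_json, model, last_aqi=None):
    """Turn one device message into (fan_speed, aqi); aqi is the next call's last_aqi."""
    reading = json.loads(raw_json)
    aqi = float(reading["aqi"])
    # With no history yet, fall back to the current reading,
    # as the notebook does for the first row of aqi_prev.
    prev = aqi if last_aqi is None else last_aqi
    row = pd.DataFrame(
        [[int(reading["hour"]), aqi, prev, int(reading["isPurifierOn"])]],
        columns=FEATURE_ORDER,
    )
    return int(model.predict(row)[0]), aqi


class ThresholdStandIn:
    """Hypothetical stand-in for the trained model (same rule as determine_fan_speed)."""

    def predict(self, frame):
        speeds = []
        for _, r in frame.iterrows():
            if r["isPurifierOn"] == 0:
                speeds.append(0)
            else:
                level = r["aqi_noisy"]
                speeds.append(int(level > 50) + int(level > 100) + int(level > 150))
        return speeds


message = '{"hour": 21, "aqi": 180, "isPurifierOn": 1}'
speed, last_aqi = handle_reading(message, ThresholdStandIn())
print("Fan speed to send back:", speed)  # prints 3 for this message
```

In a real deployment the stand-in would be replaced by `joblib.load('aerosaur_fan_model_v2.joblib')`, and the function would sit behind whichever transport the device uses (REST or MQTT).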

### Example Pseudocode for Backend Prediction
```python
import joblib
import pandas as pd

# Load the pre-trained model
model = joblib.load('aerosaur_fan_model_v2.joblib')

# Example data from IoT device. The model was trained on four features
# (time, aqi_noisy, aqi_prev, isPurifierOn), so temperature and humidity
# are not passed in. The previous AQI is assumed equal to the current one.
incoming_data = pd.DataFrame(
    [[21, 180, 180, 1]],
    columns=['time', 'aqi_noisy', 'aqi_prev', 'isPurifierOn'],
)

predicted_speed = model.predict(incoming_data)[0]
print("Recommended Fan Speed:", predicted_speed)
```

On the ESP32 side, the returned speed level is mapped to a PWM duty cycle:

```cpp
int fanSpeed = getPredictionFromServer(); // 0=Off, 1=Low, 2=Medium, 3=High

if (fanSpeed == 0) analogWrite(FAN_PIN, 0);
else if (fanSpeed == 1) analogWrite(FAN_PIN, 85);
else if (fanSpeed == 2) analogWrite(FAN_PIN, 170);
else if (fanSpeed == 3) analogWrite(FAN_PIN, 255);
```

In [None]:
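import time

import numpy as np
import pandas as pd

# Simulate AQI values changing through the day
# (note: the training data only covers AQI 0-200)
prev_aqi = None
for hour in range(8, 23, 2):
    aqi = np.random.randint(50, 300)
    isPurifierOn = 1
    if prev_aqi is None:
        prev_aqi = aqi  # first reading has no history
    # Use the same feature names the model was fit on
    sample = pd.DataFrame([[hour, aqi, prev_aqi, isPurifierOn]],
                          columns=['time', 'aqi_noisy', 'aqi_prev', 'isPurifierOn'])
    speed = model.predict(sample)[0]
    print(f"Time: {hour}:00 | AQI: {aqi} | Predicted Fan Speed: {speed}")
    prev_aqi = aqi
    time.sleep(1)  # pause for realism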

In [None]:
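import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

aqi_values = np.arange(0, 300, 10)
predicted_speeds = []

for aqi in aqi_values:
    # Noon, purifier ON, previous reading assumed equal to the current one.
    # Column names must match those used when the model was fit.
    sample = pd.DataFrame([[12, aqi, aqi, 1]],
                          columns=['time', 'aqi_noisy', 'aqi_prev', 'isPurifierOn'])
    predicted_speeds.append(model.predict(sample)[0])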

plt.plot(aqi_values, predicted_speeds, marker='o')
plt.title("AI Reaction to Changing AQI Levels")
plt.xlabel("AQI (Air Quality Index)")
plt.ylabel("Predicted Fan Speed")
plt.grid(True)
plt.show()