# VoltVault Machine Learning Model

This notebook demonstrates a simple machine learning model that classifies the condition of a battery room from data collected by several sensors.

The model is trained using the dataset from [Kaggle - Environmental Sensor Data](https://www.kaggle.com/datasets/garystafford/environmental-sensor-data-132k).

In the preprocessing step, the data is cleaned and the relevant features are extracted. A new column is then added to the dataset to label the condition of the battery room according to predefined thresholds.

### Preprocessing

1. **Import pandas library**: The pandas library is imported to handle data manipulation and analysis.
   
2. **Load the dataset**: The dataset is loaded from a CSV file named `iot_telemetry_data.csv` into a DataFrame named `data`.

3. **Clean the dataset**: Unnecessary columns (`'ts'`, `'device'`, `'light'`, `'lpg'`, `'motion'`, `'smoke'`) are dropped from the DataFrame to focus on relevant data.

4. **Add 'habitable' column**: A new column named 'habitable' is added to the cleaned DataFrame based on specific conditions:
   - Temperature (`'temp'`) must be less than 21.
   - Humidity (`'humidity'`) must be less than 80.
   - Carbon monoxide (`'co'`) level must be less than 0.009.
   If all conditions are met, the environment is considered habitable and the column is set to 1; otherwise, it is set to 0.

5. **Count habitable elements**: The number of elements that are considered habitable (1) and not habitable (0) is counted.

6. **Display results**: The first five rows of the cleaned dataset and the count of habitable and not habitable elements are displayed.


In [6]:
import pandas as pd

# Load the dataset
file_path = 'iot_telemetry_data.csv'
data = pd.read_csv(file_path)

# Clean the dataset: drop unnecessary columns
data_cleaned = data.drop(columns=['ts', 'device', 'light', 'lpg', 'motion', 'smoke'])

# Add a column 'habitable' based on the given conditions
conditions = (data_cleaned['temp'] < 21) & (data_cleaned['humidity'] < 80) & (data_cleaned['co'] < 0.009)
data_cleaned['habitable'] = conditions.astype(int)

# Count the number of elements with 'habitable' set to 1 and 0
habitable_count = data_cleaned['habitable'].value_counts()

# Display the cleaned dataset and the count
data_cleaned.head(), habitable_count

(         co   humidity       temp  habitable
 0  0.004956  51.000000  22.700000          0
 1  0.002840  76.000000  19.700001          1
 2  0.004976  50.900000  22.600000          0
 3  0.004403  76.800003  27.000000          0
 4  0.004967  50.900000  22.600000          0,
 habitable
 0    294739
 1    110445
 Name: count, dtype: int64)
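
The counts above show that the target is imbalanced: only about 27% of the rows are labelled habitable. As a minimal sketch, the two totals printed above can be turned into class shares; the names `counts` and `class_share` are illustrative and not part of the notebook.

```python
import pandas as pd

# Totals copied from the value_counts() output above
counts = pd.Series({0: 294739, 1: 110445}, name="count")

# Share of each class in the full dataset
class_share = counts / counts.sum()
print(class_share.round(3))
```

Keeping these shares in mind helps when reading the per-class `support` values in the classification report later on.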

### Training

1. **Import necessary libraries**: Import the required modules from scikit-learn:
   - `train_test_split` from `sklearn.model_selection` for splitting the dataset into training and testing sets.
   - `RandomForestClassifier` from `sklearn.ensemble` for training the model.
   - `classification_report` and `accuracy_score` from `sklearn.metrics` for evaluating the model.

2. **Prepare the data for training**:
   - `features` contains the columns `['temp', 'humidity', 'co']` from `data_cleaned`, which are used as input variables for the model.
   - `target` contains the `habitable` column from `data_cleaned`, which is the output variable the model will predict.

3. **Split the data into training and testing sets**:
   - The `train_test_split` function splits the data into a training set (`X_train` and `y_train`) and a testing set (`X_test` and `y_test`).
   - `test_size=0.2` indicates that 20% of the data will be used for testing, and the remaining 80% for training.
   - `random_state=42` ensures reproducibility of the split.

4. **Train the model using a RandomForestClassifier**:
   - `rf_classifier` is instantiated with `n_estimators=100` (indicating 100 trees in the forest) and `random_state=42` for reproducibility.
   - The `fit` method is called on `rf_classifier` with `X_train` and `y_train` to train the model.

5. **Make predictions on the test set**:
   - The `predict` method is called on `rf_classifier` with `X_test` to generate predictions (`predictions`).

6. **Evaluate the model**:
   - `accuracy_score` is used to calculate the accuracy of the model by comparing `y_test` with `predictions`.
   - `classification_report` is used to generate a detailed report on the model's performance, including precision, recall, F1-score, and support for each class.

7. **Print the results**:
   - The accuracy score is printed.
   - The classification report is printed, providing a comprehensive evaluation of the model's performance.

In [9]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score

# Prepare the data for training
features = data_cleaned[['temp', 'humidity', 'co']]
target = data_cleaned['habitable']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

# Train the model with a RandomForestClassifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)

# Predictions on the test set
predictions = rf_classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, predictions)
report = classification_report(y_test, predictions)

print('Accuracy: ', accuracy)
print(report)

Accuracy:  1.0
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     58959
           1       1.00      1.00      1.00     22078

    accuracy                           1.00     81037
   macro avg       1.00      1.00      1.00     81037
weighted avg       1.00      1.00      1.00     81037
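
A perfect score is expected here: `habitable` is computed directly from `temp`, `humidity`, and `co`, the same three columns used as features, so the classifier only has to recover three thresholds. The sketch below illustrates this on synthetic data. The value ranges, the sample size, the variable names, and the use of a single `DecisionTreeClassifier` instead of a forest are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 5000

# Synthetic sensor readings; the ranges are illustrative assumptions
temp = rng.uniform(15.0, 30.0, n)
humidity = rng.uniform(40.0, 100.0, n)
co = rng.uniform(0.0, 0.015, n)

X = np.column_stack([temp, humidity, co])
# Same labelling rule as in the preprocessing step
y = ((temp < 21) & (humidity < 80) & (co < 0.009)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# A single tree is enough to approximate three axis-aligned thresholds
tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
rule_accuracy = accuracy_score(y_te, tree.predict(X_te))
print('Accuracy on rule-labelled synthetic data:', rule_accuracy)
```

Because the label is a deterministic function of the inputs, the score measures how well the thresholds were recovered, not how the model would behave on independently collected labels.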



### Exporting the Model to C Code

The following Python code uses the `emlearn` library to convert the trained `RandomForestClassifier` into C code, so that the model can be deployed on the nRF52840 microcontroller.


In [10]:
import emlearn

path = 'machine_learning.h'

cmodel = emlearn.convert(rf_classifier, method='inline')

cmodel.save(file=path, name='machine_learning')

print('Wrote model to', path)

Wrote model to machine_learning.h
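
With `method='inline'`, emlearn writes each tree as nested `if`/`else` statements in C instead of a data table that is walked at run time. The Python function below is a hand-written illustration of that shape, using the labelling thresholds from the preprocessing step. It is not the generated code: the name `predict_habitable` is hypothetical, and the thresholds learned by the forest will generally differ.

```python
def predict_habitable(temp: float, humidity: float, co: float) -> int:
    """Hand-written stand-in for one inlined decision tree (hypothetical)."""
    if temp < 21:
        if humidity < 80:
            if co < 0.009:
                return 1  # all three conditions met
            else:
                return 0
        else:
            return 0
    else:
        return 0


# Rows taken from the data_cleaned.head() output in the preprocessing section
print(predict_habitable(22.7, 51.0, 0.004956))       # 0 (temperature too high)
print(predict_habitable(19.700001, 76.0, 0.002840))  # 1
```

Code of this shape needs no dynamic memory and only a few comparisons per prediction, which suits a small microcontroller target.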
