<a href="https://colab.research.google.com/github/mariam138/ease-app/blob/main/ml_Ease.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Machine Learning - EASE App

> Please note that this is sample code and is not meant to be run, because there is no user data yet. It is intended to showcase how a model could be trained and deployed.



### Collect User Behaviour Data in the Backend

✅ Example data structure (JSON event sent from the Ease app):

```json
{
  "user_id": "123",
  "event": "gym",
  "day": "Thursday",
  "hour": 20,
  "wash_type": "delicate",
  "suggestion_accepted": true
}

```

Store these entries in a database so they can later be exported for model training.
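A minimal sketch of the storage step, assuming a local SQLite database accessed through Python's built-in `sqlite3` module. The table name `wash_events` and the helper `store_event` are hypothetical; a real backend would more likely write to whatever database it already uses.

```python
import json
import sqlite3

# Hypothetical table mirroring the example JSON event above
SCHEMA = """
CREATE TABLE IF NOT EXISTS wash_events (
    user_id TEXT NOT NULL,
    event TEXT NOT NULL,
    day TEXT NOT NULL,
    hour INTEGER NOT NULL CHECK (hour BETWEEN 0 AND 23),
    wash_type TEXT NOT NULL,
    suggestion_accepted INTEGER NOT NULL  -- SQLite has no BOOLEAN type; store 0/1
)
"""


def store_event(conn: sqlite3.Connection, payload: str) -> None:
    """Parse one JSON event sent by the app and insert it as a row."""
    record = json.loads(payload)
    conn.execute(
        "INSERT INTO wash_events VALUES (?, ?, ?, ?, ?, ?)",
        (
            record["user_id"],
            record["event"],
            record["day"],
            int(record["hour"]),
            record["wash_type"],
            int(bool(record["suggestion_accepted"])),
        ),
    )
    conn.commit()


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; a file path would be used in practice
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    store_event(conn, '{"user_id": "123", "event": "gym", "day": "Thursday", '
                      '"hour": 20, "wash_type": "delicate", "suggestion_accepted": true}')
    print(conn.execute("SELECT * FROM wash_events").fetchall())
```

The `CHECK` constraint rejects impossible hours at write time, so bad events never reach the training export.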


### Train ML Model
We can train and compare several models. The target (`suggestion_accepted`) is binary, so this is a classification problem; we train and test the following candidates, using hyperparameter tuning and cross-validation:

✅ Decision Trees, Random Forest

✅ K-Nearest Neighbours (KNN)

✅ Neural Networks
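The comparison described above could look like the following scikit-learn sketch. It runs on synthetic data from `make_classification` (there is no real user data yet), and the hyperparameter grids are illustrative assumptions, not tuned values. KNN here is scikit-learn's supervised `KNeighborsClassifier` and the neural network is a small `MLPClassifier`; both are wrapped with a `StandardScaler` because they are sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary stand-in for the encoded wash data (14 features, like FEATURE_COLUMNS)
X, y = make_classification(n_samples=300, n_features=14, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Candidate models with small, illustrative hyperparameter grids
CANDIDATES = {
    "decision_tree": (DecisionTreeClassifier(random_state=42),
                      {"max_depth": [3, 5, None]}),
    "random_forest": (RandomForestClassifier(random_state=42),
                      {"n_estimators": [50, 100], "max_depth": [3, 5, 10]}),
    "knn": (make_pipeline(StandardScaler(), KNeighborsClassifier()),
            {"kneighborsclassifier__n_neighbors": [3, 5, 9]}),
    "neural_net": (make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=42)),
                   {"mlpclassifier__hidden_layer_sizes": [(16,), (32, 16)]}),
}


def compare_models(X_train, y_train, cv=5):
    """Tune each candidate with k-fold GridSearchCV and return the fitted searches by name."""
    searches = {}
    for name, (estimator, grid) in CANDIDATES.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
        search.fit(X_train, y_train)
        searches[name] = search
    return searches


searches = compare_models(X_train, y_train)
for name, search in searches.items():
    print(f"{name}: CV accuracy={search.best_score_:.3f}, params={search.best_params_}")

# Pick the winner on cross-validated score, then check it once on the held-out test set
best_name = max(searches, key=lambda n: searches[n].best_score_)
print("Best model:", best_name, "test accuracy:", searches[best_name].score(X_test, y_test))
```

Selecting on the cross-validated `best_score_` and touching the test set only once keeps the final test accuracy an honest estimate of performance on unseen users.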

```python
# Example ML modelling (Random Forest)
import os

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Extract the dataset from the database (exported here as a CSV)
df = pd.read_csv("user_wash_data.csv")

# Data processing: drop the user identifier and one-hot encode categorical variables
df_encoded = pd.get_dummies(
    df.drop(columns="user_id"), columns=["event", "day", "wash_type"], dtype=int
)
X = df_encoded.drop(columns="suggestion_accepted")
y = df_encoded["suggestion_accepted"]

# Train/test split (stratified so both classes appear in each split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Random Forest with hyperparameter tuning (5-fold cross-validation)
rf_params = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, 10],
}
rf = GridSearchCV(RandomForestClassifier(random_state=42), rf_params, cv=5)
rf.fit(X_train, y_train)

# Evaluate the tuned model on the held-out test set
y_pred = rf.predict(X_test)
print("Best Parameters:", rf.best_params_)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

# *Save the best estimator to the path the deployment code loads
os.makedirs("model", exist_ok=True)
joblib.dump(rf.best_estimator_, "model/random_forest_model.joblib")
```

\* In practice, several models should each be tuned, cross-validated and compared, and only the best-performing one saved.
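One way to keep the saved model and the API's expected feature columns in sync is to persist both together. This is a sketch under assumptions: the bundle dictionary with keys `model` and `feature_columns` and the helpers `save_bundle`/`load_bundle` are hypothetical, and the tiny inline DataFrame only mimics the shape of the example JSON events.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny illustrative dataset shaped like the example events (not real user data)
raw = pd.DataFrame({
    "event": ["gym", "work", "rest", "gym"],
    "day": ["Thursday", "Monday", "Sunday", "Friday"],
    "hour": [20, 9, 14, 19],
    "wash_type": ["delicate", "normal", "spin", "delicate"],
    "suggestion_accepted": [True, False, True, True],
})
X = pd.get_dummies(raw.drop(columns="suggestion_accepted"),
                   columns=["event", "day", "wash_type"], dtype=int)
y = raw["suggestion_accepted"]
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)


def save_bundle(model, feature_columns, path):
    """Persist the fitted model together with the exact training column order."""
    joblib.dump({"model": model, "feature_columns": list(feature_columns)}, path)


def load_bundle(path):
    """Return (model, feature_columns) from a bundle written by save_bundle."""
    bundle = joblib.load(path)
    return bundle["model"], bundle["feature_columns"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "random_forest_model.joblib")
    save_bundle(model, X.columns, path)
    loaded_model, columns = load_bundle(path)

    # At serving time, align incoming one-hot features to the stored column order
    sample = pd.DataFrame([{"hour": 20, "event_gym": 1, "day_Thursday": 1, "wash_type_delicate": 1}])
    sample = sample.reindex(columns=columns, fill_value=0)
    print(columns)
    print(loaded_model.predict(sample))
```

With this design the column list no longer has to be copied by hand into the serving code; the trade-off is that the API would then load the bundle instead of a bare model.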

### 4. Model Deployment
Once modelling is complete and the candidate models have been compared and cross-validated, the best model is deployed back into the app. Here it is served through a FastAPI `/predict` endpoint.
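The `/predict` endpoint below expects one-hot fields rather than the raw event JSON. A minimal sketch of that conversion on the app/backend side, assuming the category values implied by `FEATURE_COLUMNS` (gym/rest/work, the seven weekdays, delicate/normal/spin); the helper `to_predict_payload` is hypothetical.

```python
import json

# Category values assumed from the feature columns used by the /predict endpoint
EVENTS = ("gym", "rest", "work")
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
WASH_TYPES = ("delicate", "normal", "spin")


def to_predict_payload(event: dict) -> dict:
    """Turn a raw app event into the one-hot request body for /predict."""
    hour = int(event["hour"])
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    payload = {"hour": hour}
    # Emit every known column explicitly: 1 for the matching category, 0 otherwise
    for key, values in (("event", EVENTS), ("day", DAYS), ("wash_type", WASH_TYPES)):
        if event[key] not in values:
            raise ValueError(f"unknown {key}: {event[key]!r}")
        for value in values:
            payload[f"{key}_{value}"] = int(event[key] == value)
    return payload


raw_event = {"user_id": "123", "event": "gym", "day": "Thursday", "hour": 20,
             "wash_type": "delicate", "suggestion_accepted": True}
print(json.dumps(to_predict_payload(raw_event)))
```

Raising on an unknown category is a deliberate choice: the endpoint would otherwise fill the missing one-hot columns with 0 and silently score an "all-zero" event.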

```python
from fastapi import FastAPI
from pydantic import BaseModel
import pandas as pd
import joblib

# Initialize FastAPI app
app = FastAPI()

# Load the trained model saved by the training step
model = joblib.load("model/random_forest_model.joblib")

# These must match the training feature columns (names and order)
FEATURE_COLUMNS = [
    'hour',
    'event_gym', 'event_rest', 'event_work',
    'day_Friday', 'day_Monday', 'day_Saturday', 'day_Sunday',
    'day_Thursday', 'day_Tuesday', 'day_Wednesday',
    'wash_type_delicate', 'wash_type_normal', 'wash_type_spin'
]

# Define request body model using Pydantic; one-hot fields default to 0
class WashInput(BaseModel):
    hour: int
    event_gym: int = 0
    event_rest: int = 0
    event_work: int = 0
    day_Friday: int = 0
    day_Monday: int = 0
    day_Saturday: int = 0
    day_Sunday: int = 0
    day_Thursday: int = 0
    day_Tuesday: int = 0
    day_Wednesday: int = 0
    wash_type_delicate: int = 0
    wash_type_normal: int = 0
    wash_type_spin: int = 0

@app.post("/predict")
def predict(input_data: WashInput):
    # .dict() is the Pydantic v1 name; Pydantic v2 prefers .model_dump()
    input_dict = input_data.dict()
    input_df = pd.DataFrame([input_dict])
    # Align to the training column order; any missing column is filled with 0
    input_df = input_df.reindex(columns=FEATURE_COLUMNS, fill_value=0)

    prediction = model.predict(input_df)[0]

    return {"accepted": bool(prediction)}
```
