# 🛠️ Manual Model Training for Smart Home Intent Classification

**Purpose**: Train the `RandomForestClassifier` on Google Colab (Linux environment) to ensure compatibility with Vertex AI deployment.

**Steps**:
1. Upload your `dataset.csv` to the Colab 'Files' sidebar.
2. Run all cells.
3. Download the generated `model.joblib` file.
4. Place it in your local project's `model_artifacts/` folder and push to GitHub.


In [None]:
# 1. Install Dependencies
# Fixes:
# - huggingface-hub<0.25.0: For sentence-transformers 2.2.2 compatibility
# - numpy<2.0: scikit-learn 1.3.0 fails to import under NumPy 2.x (ComplexWarning import error)
!pip install "numpy<2.0" pandas scikit-learn==1.3.0 sentence-transformers==2.2.2 joblib "huggingface-hub<0.25.0"
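
Since the pins above exist to keep the Colab environment compatible with deployment, it can help to confirm what actually got installed before training. The following is a minimal sketch using only the standard library; `EXPECTED_PINS` and `check_pins` are hypothetical names introduced here, and the pinned versions are copied from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install cell above; adjust if that cell changes.
EXPECTED_PINS = {
    "scikit-learn": "1.3.0",
    "sentence-transformers": "2.2.2",
}


def check_pins(expected, get_version=version):
    """Return a list of human-readable mismatches (empty if all pins match)."""
    problems = []
    for package, wanted in expected.items():
        try:
            found = get_version(package)
        except PackageNotFoundError:
            problems.append(f"{package}: not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{package}: found {found}, expected {wanted}")
    return problems


# Print either the mismatches or a single confirmation line.
for line in check_pins(EXPECTED_PINS) or ["All pinned packages match."]:
    print(line)
```

If a mismatch is reported, restarting the Colab runtime after the install cell is usually needed so the pinned versions are the ones that get imported.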

In [None]:
# 2. Import Libraries
import pandas as pd
import joblib
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import os

print("Libraries imported successfully!")

In [None]:
# 3. Load Data
data_path = "dataset.csv"
if not os.path.exists(data_path):
    print("❌ Error: dataset.csv not found. Please upload it to the Files sidebar!")
else:
    df = pd.read_csv(data_path)
    print(f"✅ Data loaded: {len(df)} rows")
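
The training cell reads the `Sentence` and `Category` columns, so a misnamed header would only surface later as a `KeyError`. A small check like the one below could catch that right after loading. This is a sketch: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical helpers, and in the notebook you would pass `df.columns` instead of the example list.

```python
REQUIRED_COLUMNS = ("Sentence", "Category")  # names used by the training cell


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# In the notebook, call missing_columns(df.columns). A plain list is used
# here so the snippet runs on its own.
example_header = ["Sentence", "category"]  # lowercase typo in the second name
print(missing_columns(example_header))  # → ['Category']
```

The comparison is case-sensitive on purpose, because pandas column lookups are case-sensitive too.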

In [None]:
# 4. Generate Embeddings & Train Model
if 'df' in locals():
    print("🧠 Generating embeddings...")
    embedder = SentenceTransformer('all-MiniLM-L6-v2')
    X = embedder.encode(df['Sentence'].tolist())
    y = df['Category']

    print("🌲 Training Random Forest...")
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_train, y_train)

    print("📊 Evaluation:")
    print(classification_report(y_test, clf.predict(X_test)))
    
    # Save Model
    joblib.dump(clf, "model.joblib")
    print("💾 model.joblib saved! Check the Files sidebar to download it.")
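
Note that `model.joblib` contains only the classifier, not the sentence embedder, so anything that loads it has to embed its input with the same `all-MiniLM-L6-v2` model first. The sketch below shows one way to smoke-test the saved file. `predict_intents` is a hypothetical helper and the example utterance is made up, not taken from the dataset.

```python
import os


def predict_intents(sentences, embedder, clf):
    """Embed the sentences and return the classifier's predicted labels as a list."""
    embeddings = embedder.encode(list(sentences))
    return list(clf.predict(embeddings))


# Only runs when the saved model is present in the working directory.
if os.path.exists("model.joblib"):
    import joblib
    from sentence_transformers import SentenceTransformer

    # Must be the same embedding model that was used for training.
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    clf = joblib.load("model.joblib")
    # Hypothetical example utterance for a quick sanity check.
    print(predict_intents(["turn on the kitchen lights"], embedder, clf))
```

Keeping the embedder and classifier as explicit arguments makes the helper easy to reuse in serving code, where both objects are typically loaded once at startup.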