# Vyapar AI - Model Training Pipeline

This notebook generates synthetic business data and trains 4 machine learning models for the Vyapar AI Microservice.

**Models Trained:**
1.  **Churn Prediction** (Logistic Regression) - Predicts customer risk.
2.  **Inventory Forecasting** (Linear Regression) - Predicts restock needs.
3.  **Lead Scoring** (Random Forest) - Classifies sales leads.
4.  **Expense Fraud Detection** (Isolation Forest) - Detects anomalies.

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.ensemble import RandomForestClassifier, IsolationForest
import joblib
import os
import shutil

# Create directory to store models
os.makedirs("models", exist_ok=True)
print("✅ Environment Ready. Model directory created.")

✅ Environment Ready. Model directory created.


## 1. Customer Churn Prediction
* **Type:** Classification
* **Algorithm:** Logistic Regression
* **Input Features:** Days Inactive, Support Tickets, Monthly Bill
* **Target:** 0 (Stay), 1 (Churn)

In [2]:
# Generate Synthetic Data
X_churn = np.random.rand(1000, 3)
X_churn[:, 0] = X_churn[:, 0] * 100  # Days Inactive (0-100)
X_churn[:, 1] = X_churn[:, 1] * 10   # Tickets (0-10)
X_churn[:, 2] = X_churn[:, 2] * 5000 # Bill (0-5000)

y_churn = []
for x in X_churn:
    # Logic: base churn probability 0.1, +0.5 if inactive > 40 days, +0.3 if tickets > 5
    prob = 0.1
    if x[0] > 40: prob += 0.5
    if x[1] > 5: prob += 0.3
    y_churn.append(1 if np.random.rand() < prob else 0)

# Train Model
clf_churn = LogisticRegression()
clf_churn.fit(X_churn, y_churn)
joblib.dump(clf_churn, "models/churn_model.pkl")

# Note: this is accuracy on the training data, not on held-out data
print(f"✅ Churn Model Trained. Accuracy: {clf_churn.score(X_churn, y_churn):.2f}")

✅ Churn Model Trained. Accuracy: 0.73
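
A score computed on the rows the model was fitted on tends to be optimistic. The following is a minimal sketch of a held-out evaluation, assuming the same synthetic process as the churn cell. The seed, the 75/25 split, the `StandardScaler` step, and the example customer are assumptions for illustration and are not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)  # assumed seed, for repeatable data

# Same synthetic process as the churn cell, written with array operations
X = rng.random((1000, 3)) * np.array([100, 10, 5000])  # days inactive, tickets, bill
prob = 0.1 + 0.5 * (X[:, 0] > 40) + 0.3 * (X[:, 1] > 5)
y = (rng.random(1000) < prob).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scaling is an added assumption: the bill column is far larger than the others
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {test_accuracy:.2f}")

# Churn probability for one hypothetical customer:
# 60 days inactive, 7 support tickets, monthly bill of 1200
customer = np.array([[60.0, 7.0, 1200.0]])
churn_probability = model.predict_proba(customer)[0, 1]
print(f"Churn probability: {churn_probability:.2f}")
```

Because the labels are sampled from probabilities rather than assigned by a fixed rule, no model can reach perfect accuracy on this data; the held-out number is best read against that ceiling.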


## 2. Inventory Health Forecasting
* **Type:** Regression
* **Algorithm:** Linear Regression
* **Input Features:** Current Stock, Daily Sales Average
* **Target:** Recommended Restock Quantity

In [3]:
X_inv = np.random.rand(1000, 2)
X_inv[:, 0] = X_inv[:, 0] * 100 # Stock (0-100)
X_inv[:, 1] = X_inv[:, 1] * 20  # Daily Sales (0-20)

y_inv = []
for x in X_inv:
    stock, sales = x[0], x[1]
    # Logic: If stock covers less than 7 days, restock needed
    # (adding 0.1 to sales avoids division by zero)
    days_left = stock / (sales + 0.1)
    if days_left < 7:
        y_inv.append((10 - days_left) * sales) # Restock to reach 10 days coverage
    else:
        y_inv.append(0)

reg_inv = LinearRegression()
reg_inv.fit(X_inv, y_inv)
joblib.dump(reg_inv, "models/inventory_model.pkl")
print("✅ Inventory Model Trained")

✅ Inventory Model Trained
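
The restock rule is zero for well-stocked items and jumps to a positive quantity once coverage drops below 7 days, so a straight-line fit can only approximate it and may return negative quantities. The sketch below, under the same synthetic assumptions, builds the target with array operations, reports R² on the training data, and clips predictions at zero. The seed and the clipping step are assumptions, not something the notebook does.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # assumed seed

stock = rng.random(1000) * 100  # units in stock (0-100)
sales = rng.random(1000) * 20   # average daily sales (0-20)

# Same rule as the loop in the inventory cell, vectorised
days_left = stock / (sales + 0.1)
restock = np.where(days_left < 7, (10 - days_left) * sales, 0.0)

X = np.column_stack([stock, sales])
reg = LinearRegression().fit(X, restock)
print(f"R^2 on training data: {reg.score(X, restock):.2f}")

# A linear model is not bounded below, so clip before treating the
# prediction as an order quantity
raw = reg.predict(X)
recommended = np.clip(raw, 0, None)
print(f"Negative raw predictions: {(raw < 0).sum()}")
```

If the count of negative predictions is large, a model that can represent the threshold (a tree-based regressor, for example) may be a better fit for this target.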


## 3. Sales Lead Scoring
* **Type:** Classification
* **Algorithm:** Random Forest Classifier
* **Input Features:** Budget, Urgency (1-10)
* **Target:** 0 (Cold Lead), 1 (Hot Lead)

In [4]:
X_lead = np.random.rand(1000, 2)
X_lead[:, 0] = X_lead[:, 0] * 100000 # Budget
X_lead[:, 1] = X_lead[:, 1] * 10     # Urgency

y_lead = []
for x in X_lead:
    # Logic: High budget OR High urgency = Hot Lead
    if x[0] > 50000 or x[1] > 7:
        y_lead.append(1)
    else:
        y_lead.append(0)

clf_lead = RandomForestClassifier(n_estimators=10)
clf_lead.fit(X_lead, y_lead)
joblib.dump(clf_lead, "models/lead_model.pkl")
print("✅ Lead Model Trained")

✅ Lead Model Trained
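
As a sketch of how the lead classifier could be queried, the block below refits a forest on data generated by the same rule and scores two hypothetical leads. The seed, `random_state`, and the example leads are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)  # assumed seed

X = rng.random((1000, 2)) * np.array([100000, 10])   # budget, urgency
y = ((X[:, 0] > 50000) | (X[:, 1] > 7)).astype(int)  # same rule as the lead cell

clf = RandomForestClassifier(n_estimators=10, random_state=1).fit(X, y)

# Two hypothetical leads: large budget with low urgency,
# and small budget with low urgency
leads = np.array([[80000.0, 2.0], [10000.0, 2.0]])
labels = clf.predict(leads)
hot_probability = clf.predict_proba(leads)[:, 1]

for (budget, urgency), label, p in zip(leads, labels, hot_probability):
    print(f"budget={budget:.0f} urgency={urgency:.0f} -> label={label}, P(hot)={p:.2f}")

print("Feature importances (budget, urgency):", clf.feature_importances_)
```

`predict_proba` gives a score that can be ranked or thresholded, which is often more useful for lead prioritisation than the hard 0/1 label.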


## 4. Expense Audit (Anomaly Detection)
* **Type:** Unsupervised Learning (Anomaly Detection)
* **Algorithm:** Isolation Forest
* **Input Features:** Transaction Amount
* **Output:** -1 (Anomaly/Fraud), 1 (Normal). The model is fitted without labels; these are the values it predicts.

In [5]:
# Generate "Normal" transactions (< 5000)
X_exp_normal = np.random.rand(800, 1) * 5000
# Generate "Fraud" transactions (5000 - 10000)
X_exp_anomaly = np.random.rand(200, 1) * 5000 + 5000 
X_exp = np.concatenate([X_exp_normal, X_exp_anomaly])

# contamination=0.2 matches the share of synthetic "fraud" rows (200 of 1000)
clf_exp = IsolationForest(contamination=0.2, random_state=42)
clf_exp.fit(X_exp)
joblib.dump(clf_exp, "models/expense_model.pkl")
print("✅ Expense Model Trained")

✅ Expense Model Trained
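
Isolation Forest scores points by how easily they are isolated, not by how large they are, so it is worth checking which amounts actually get flagged. The sketch below, assuming the same synthetic amounts, compares the flagged share in each range and scores a few hypothetical transactions. The seed and the example amounts are assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)  # assumed seed

normal = rng.random((800, 1)) * 5000        # amounts in 0-5000
fraud = rng.random((200, 1)) * 5000 + 5000  # amounts in 5000-10000
X = np.concatenate([normal, fraud])

clf = IsolationForest(contamination=0.2, random_state=42).fit(X)

labels = clf.predict(X)            # -1 = anomaly, 1 = normal
scores = clf.decision_function(X)  # negative scores are flagged as anomalies

flagged = labels == -1
print(f"Flagged overall:       {flagged.mean():.1%}")
print(f"Flagged below 5000:    {flagged[:800].mean():.1%}")
print(f"Flagged 5000 and over: {flagged[800:].mean():.1%}")

# Score a few hypothetical amounts
amounts = np.array([[50.0], [2500.0], [9900.0]])
print(clf.predict(amounts))
```

With a single uniformly distributed feature, the model can also flag amounts near the lower edge of the range, so the per-range breakdown is a useful sanity check before relying on the -1 label as a fraud signal.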


In [None]:
shutil.make_archive("models", "zip", "models")
print("📦 Models zipped successfully as 'models.zip'")

# If running in Google Colab, uncomment below to download:
# from google.colab import files
# files.download("models.zip")
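
Once the archive is unpacked, the microservice would need to load each `.pkl` file and pass features in the same column order used for training. The sketch below shows one way this could look; `load_model` is a hypothetical helper, and a small stand-in model is trained and saved to a temporary directory so the example runs on its own.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for models/churn_model.pkl so the sketch is self-contained
rng = np.random.default_rng(0)
X = rng.random((200, 3)) * np.array([100, 10, 5000])
y = (X[:, 0] > 40).astype(int)
trained = LogisticRegression(max_iter=1000).fit(X, y)


def load_model(path):
    """Hypothetical helper: load one pickled estimator from disk."""
    return joblib.load(path)


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "churn_model.pkl"
    joblib.dump(trained, model_path)
    served = load_model(model_path)

# Feature order must match training: days inactive, tickets, monthly bill
request = np.array([[75.0, 3.0, 1800.0]])
prediction = int(served.predict(request)[0])
print("Predicted churn label:", prediction)
```

Pickled estimators are best loaded with the same scikit-learn version that produced them, and only from sources you trust, since unpickling can execute arbitrary code.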