**Step 1: Define the problem**

A ride-hailing or delivery app such as Pathao or Foodpanda needs to tell a customer: "Your ride will arrive in X minutes."

We'll predict the ETA (estimated time of arrival, in minutes) from:
* Distance (km)
* Traffic level (low/medium/high)
* Time of day (morning/noon/evening/night)
* Weather (clear/rainy)



**Step 2: Create a Mock Dataset**

We'll generate synthetic but plausible ride data so the example is self-contained.

**Step 3: Preprocess**

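* Encode the categorical variables (Traffic, TimeOfDay, Weather) with one-hot encoding (`OneHotEncoder`)
* Normalize Distance if needed (plain linear regression does not require it, so the notebook skips this)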
* Split into train/test (80/20)

**Step 4: Train the Model**

* Use Linear Regression
* Train the model to predict ETA
* Evaluate with Root Mean Squared Error (RMSE)

**Step 5: Save Model & Encoder**


**Step 6: Deploy on Streamlit Cloud**

**1. Setup and Libraries**

In [25]:
# !pip install pandas scikit-learn joblib

In [26]:
import joblib
import pandas as pd
import numpy as np
import random
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

**2. Create Ride Data**

In [27]:
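# Note: this seeds only Python's `random` module. The distances below come
# from np.random, which is not seeded, so they vary between runs.
random.seed(42)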

def generate_data(n=1000):
  distances = np.round(np.random.uniform(1,15, n),2)  # 1 to 15 km
  traffic_levels = random.choices(['low','medium','high'], k=n)
  times_of_day = random.choices(['morning','noon','evening','night'], k=n)
  weather_conditions = random.choices(['clear','rainy'], k=n)

  eta = []
  for dist, traffic, tod, weather in zip(distances,traffic_levels,times_of_day,weather_conditions):
    base_time = dist * 3  # base 3 mins per km
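    # Traffic multiplier (exactly one branch applies)
    if traffic == 'low': base_time *= 1.2
    elif traffic == 'medium': base_time *= 1.5
    elif traffic == 'high': base_time *= 1.8
    if weather == 'rainy': base_time *= 1.3  # rain adds 30%
    if tod in ['morning', 'evening']: base_time *= 1.2  # morning and evening add 20%

    # Lower bound: the ride cannot average more than 40 km/h (1.5 min per km).
    # With the multipliers above, base_time is always at least 3.6 min per km,
    # so this acts only as a safeguard.
    min_time = dist * 1.5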
    final_time = max(base_time, min_time)
    eta.append(round(final_time, 1))

  df = pd.DataFrame({
      'Distance_km':distances,
      'Traffic': traffic_levels,
      'TimeOfDay': times_of_day,
      'Weather': weather_conditions,
      'ETA_min': eta
  })
  return df

df = generate_data(1500)

df.head()

| | Distance_km | Traffic | TimeOfDay | Weather | ETA_min |
|---|---|---|---|---|---|
| 0 | 10.1 | medium | evening | rainy | 70.9 |
| 1 | 2.39 | low | night | rainy | 11.2 |
| 2 | 4.42 | low | morning | rainy | 24.8 |
| 3 | 8.01 | low | evening | clear | 34.6 |
| 4 | 2.97 | high | night | clear | 16.0 |
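
As a sanity check, the ETA rule in `generate_data` can be applied by hand to the rows shown above. The sketch below restates that rule as a standalone function; the name `eta_minutes` and the `TRAFFIC_FACTOR` lookup are illustrative helpers, not part of the notebook.

```python
# Illustrative restatement of the ETA rule used in generate_data.
# TRAFFIC_FACTOR and eta_minutes are hypothetical names for this sketch.
TRAFFIC_FACTOR = {"low": 1.2, "medium": 1.5, "high": 1.8}


def eta_minutes(dist_km, traffic, time_of_day, weather):
    """Return the synthetic ETA in minutes, rounded to one decimal."""
    minutes = dist_km * 3                      # base: 3 minutes per km
    minutes *= TRAFFIC_FACTOR[traffic]         # traffic multiplier
    if weather == "rainy":
        minutes *= 1.3                         # rain adds 30%
    if time_of_day in ("morning", "evening"):
        minutes *= 1.2                         # morning and evening add 20%
    floor = dist_km * 1.5                      # 40 km/h lower bound
    return round(max(minutes, floor), 1)


# First row above: 10.1 km, medium traffic, evening, rainy
# 10.1 * 3 * 1.5 * 1.3 * 1.2 = 70.902, which rounds to 70.9
print(eta_minutes(10.1, "medium", "evening", "rainy"))
```

Because every factor multiplies the distance, the true relationship is multiplicative, while linear regression fits an additive one. That mismatch is one likely reason the model's error does not reach zero even though the data contains no noise.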


**3. Preprocessing**

In [28]:
# Separate features and target

x = df[['Distance_km', 'Traffic', 'TimeOfDay', 'Weather']]
y = df['ETA_min']

# One-hot encoding categorical variables
encoder = OneHotEncoder(sparse_output=False)
x_encoded = encoder.fit_transform(x[['Traffic','TimeOfDay','Weather']])

# Combine with distance
x_final = np.hstack([x[['Distance_km']].values, x_encoded])

x_train, x_test, y_train, y_test = train_test_split(x_final, y, test_size=0.2, random_state=42)

**4. Train Model**

In [29]:
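# positive=True constrains all coefficients to be non-negative
model = LinearRegression(positive=True)
model.fit(x_train,y_train)

y_pred = model.predict(x_test)
y_pred = np.maximum(y_pred, 0)  # an ETA cannot be negative, so clip at zero
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # RMSE is in the same unit as ETA_min (minutes)
print(rmse)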

5.330170897625306
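
An RMSE of roughly 5.3 minutes is easier to judge next to a naive baseline, such as always predicting the mean training ETA. The sketch below shows one way to compute that; `rmse` and `baseline_rmse` are hypothetical helpers, and the numbers are toy values rather than results from the notebook.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def baseline_rmse(y_train, y_test):
    """RMSE obtained by always predicting the mean of the training targets."""
    mean_eta = float(np.mean(y_train))
    return rmse(y_test, np.full(len(y_test), mean_eta))


# Toy values for illustration only
y_train_demo = [10.0, 20.0, 30.0]
y_test_demo = [15.0, 25.0]
print(baseline_rmse(y_train_demo, y_test_demo))
```

In the notebook, calling `baseline_rmse(y_train, y_test)` would give the figure to compare against; the model is only useful if its RMSE is clearly lower.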


**5. Save Model & Encoder**

In [30]:
joblib.dump(model, 'ride_eta_model.pkl')
joblib.dump(encoder, "encoder.pkl")

['encoder.pkl']

**6. Deploy with Streamlit (`streamlit_app.py`)**
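
The notebook does not include the app itself, so the following is only a sketch of what `streamlit_app.py` might look like, assuming the two files saved in the previous step sit next to the script. The helper `build_features`, the widget labels, and the input ranges are assumptions chosen to mirror the training data, not the author's actual app.

```python
from pathlib import Path

import numpy as np
import pandas as pd

MODEL_PATH = Path("ride_eta_model.pkl")
ENCODER_PATH = Path("encoder.pkl")
CAT_COLS = ["Traffic", "TimeOfDay", "Weather"]


def build_features(encoder, distance_km, traffic, time_of_day, weather):
    """Return a one-row feature matrix: distance first, then one-hot columns."""
    cats = pd.DataFrame([[traffic, time_of_day, weather]], columns=CAT_COLS)
    encoded = np.asarray(encoder.transform(cats))
    return np.hstack([np.array([[float(distance_km)]]), encoded])


def run_app():
    # Imported lazily so build_features can be reused without Streamlit
    import joblib
    import streamlit as st

    model = joblib.load(MODEL_PATH)
    encoder = joblib.load(ENCODER_PATH)

    st.title("Ride ETA Predictor")
    distance = st.number_input("Distance (km)", min_value=1.0,
                               max_value=15.0, value=5.0, step=0.1)
    traffic = st.selectbox("Traffic", ["low", "medium", "high"])
    time_of_day = st.selectbox("Time of day",
                               ["morning", "noon", "evening", "night"])
    weather = st.selectbox("Weather", ["clear", "rainy"])

    if st.button("Predict ETA"):
        features = build_features(encoder, distance, traffic,
                                  time_of_day, weather)
        eta = max(float(model.predict(features)[0]), 0.0)  # clip at zero
        st.success(f"Your ride will arrive in about {eta:.1f} minutes")


# Only start the UI when the saved artifacts are available
if MODEL_PATH.exists() and ENCODER_PATH.exists():
    run_app()
else:
    print("Model files not found; run the notebook cells that save them first.")
```

Locally this would be started with `streamlit run streamlit_app.py`. For Streamlit Cloud, the repository would most likely also need a `requirements.txt` listing `streamlit`, `scikit-learn`, `joblib`, `pandas`, and `numpy`, plus the two `.pkl` files.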