# ✈️ Voyage Analytics – Travel Intelligence System  
### Integrating Machine Learning & MLOps for Smart Travel Decision Systems


## 👨‍💻 Project Details

**Name:** P. Rohith  
**Batch:** DSAI / AIML  

This project was developed as part of the Data Science & Artificial Intelligence program, with a focus on real-world ML system deployment.


## 📘 Project Summary

Voyage Analytics – Travel Intelligence System is an end-to-end Machine Learning project designed to solve real-world problems in the travel and tourism industry.

This system integrates three intelligent modules:

1. ✈️ Flight Price Prediction  
   Predicts the estimated flight ticket price based on travel details such as airline agency, source city, destination city, travel class, distance, travel timing, and booking parameters.

2. 👤 Gender Prediction Model  
   Predicts customer gender based on behavioral and financial attributes such as age, number of bookings, total spending, and annual income.

3. 🏨 Hotel Recommendation System  
   Recommends similar hotels based on feature similarity using scaling and similarity computation techniques.

The project follows a complete Machine Learning lifecycle:
- Data Preprocessing
- Feature Engineering
- Model Training (Random Forest Regressor & Classifier)
- Model Evaluation
- Model Serialization using Joblib
- Deployment using Streamlit
- Containerization using Docker
- Kubernetes Enablement (MLOps integration)

This project demonstrates how ML models can be productionized and deployed as an interactive web application.


## 🔗 GitHub Repository

GitHub Repo Link:  
👉 https://github.com/rohith-ponnala/voyage-analytics-mlops

The repository contains:
- Jupyter Notebooks (Model Development)
- Trained Model Files
- Streamlit Application
- Docker Configuration
- MLOps Components


## 🎯 Problem Statement

The travel industry generates massive amounts of customer and booking data, but many travel platforms fail to leverage this data effectively for intelligent decision-making.

Customers often struggle with:
- Uncertainty in flight ticket pricing
- Lack of personalized hotel recommendations
- Limited behavioral insights about customer segmentation

There is a need for an intelligent system that:
- Predicts dynamic flight pricing
- Identifies customer behavioral patterns
- Recommends hotels based on similarity metrics
- Deploys ML models in a scalable and production-ready environment

Voyage Analytics addresses these challenges by building an integrated Travel Intelligence System powered by Machine Learning and MLOps practices.


# 1️⃣ Project Initialization & Library Imports


## 1.1 Mount Google Drive


In [None]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


## 1.2 Define Dataset and Model Paths


In [None]:
import os

PROJECT_PATH = "/content/drive/MyDrive/voyage-analytics"
DATA_PATH = os.path.join(PROJECT_PATH, "data")
MODEL_PATH = os.path.join(PROJECT_PATH, "models")

os.makedirs(MODEL_PATH, exist_ok=True)


## 1.3 Load Datasets


In [None]:
import pandas as pd

flights_df = pd.read_csv(os.path.join(DATA_PATH, "flights.csv"))
users_df = pd.read_csv(os.path.join(DATA_PATH, "users.csv"))
hotels_df = pd.read_csv(os.path.join(DATA_PATH, "hotels.csv"))


# 2️⃣ Dataset Overview


In [None]:
print("Flights Data:", flights_df.shape)
print("Users Data:", users_df.shape)
print("Hotels Data:", hotels_df.shape)


Flights Data: (271888, 10)
Users Data: (1340, 5)
Hotels Data: (40552, 8)
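
Besides the shapes, a per-column count of missing values shows how many rows a later `dropna()` call would discard. The sketch below uses a small made-up frame (its column names and values are assumptions, not taken from the project CSVs); the same two calls apply to `flights_df`, `users_df`, and `hotels_df`.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value in each column
demo = pd.DataFrame({
    "price": [100.0, np.nan, 250.0],
    "agency": ["AirA", "AirB", None],
})

# Number of missing entries per column
missing_per_column = demo.isna().sum()

# Rows that survive dropna(): only rows with no missing value at all
rows_after_dropna = len(demo.dropna())

print(missing_per_column)
print("rows kept:", rows_after_dropna, "of", len(demo))
```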


# 3️⃣ Flight Price Prediction Model



## 3.1 Data Preprocessing – Flight Dataset


In [None]:
from sklearn.preprocessing import LabelEncoder

flight_df = flights_df.dropna().copy()

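# Fit a separate encoder per column and keep it, so the same mapping can be reapplied at inference time
flight_encoders = {}
for col in flight_df.select_dtypes(include='object').columns:
    le = LabelEncoder()
    flight_df[col] = le.fit_transform(flight_df[col])
    flight_encoders[col] = le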
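
Label encoding only helps at prediction time if the string-to-integer mapping learned during training can be reapplied to new input. The following is a minimal sketch of that idea on a toy frame; the column names (`agency`, `travel_class`), the values, and the helper `encode_row` are hypothetical and not part of the project code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy training frame; names and values are illustrative only
train = pd.DataFrame({
    "agency": ["AirA", "AirC", "AirA", "AirB"],
    "travel_class": ["economy", "business", "first", "economy"],
})

# Fit one encoder per categorical column and keep each of them
encoders = {}
encoded = train.copy()
for col in train.columns:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(train[col])

def encode_row(row, encoders):
    """Encode a dict of raw strings with the fitted encoders.

    Raises ValueError if a value was not seen during fitting.
    """
    out = {}
    for col, enc in encoders.items():
        if row[col] not in set(enc.classes_):
            raise ValueError(f"unseen value {row[col]!r} in column {col!r}")
        out[col] = int(enc.transform([row[col]])[0])
    return out

new_row = encode_row({"agency": "AirC", "travel_class": "economy"}, encoders)
print(new_row)
```

`LabelEncoder` assigns integers in sorted order of the class labels, so the codes carry no meaning beyond identity. Tree-based models tolerate this; for linear models a one-hot encoding is usually the safer choice.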


## 3.2 Train-Test Split (Flight Model)


In [None]:
from sklearn.model_selection import train_test_split

X = flight_df.drop("price", axis=1)
y = flight_df["price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


## 3.3 Train Random Forest Regressor


In [None]:
from sklearn.ensemble import RandomForestRegressor

flight_model = RandomForestRegressor(
    n_estimators=100,
    random_state=42
)

flight_model.fit(X_train, y_train)
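
A fitted random forest exposes `feature_importances_`, which gives a quick view of which columns drive the predictions and is a useful sanity check before trusting an evaluation score. The sketch below runs on synthetic data (the column names `distance`, `noise_a`, and `noise_b` are placeholders); on the notebook's model the same pattern would use `flight_model.feature_importances_` with `X_train.columns`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 500

# Synthetic stand-in for the flight features; names are placeholders
X_demo = pd.DataFrame({
    "distance": rng.uniform(100, 1000, n),
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})

# The target depends on distance only, plus a little noise
y_demo = 0.5 * X_demo["distance"] + rng.normal(scale=1.0, size=n)

demo_model = RandomForestRegressor(n_estimators=50, random_state=42)
demo_model.fit(X_demo, y_demo)

# Impurity-based importances, normalized to sum to 1, sorted highest first
importances = pd.Series(
    demo_model.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

If a single column accounts for almost all of the importance, it is worth checking whether that column is a legitimate input or an identifier or derived value that would not be available when predicting.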


## 3.4 Evaluate Flight Model Performance


In [None]:
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

y_pred = flight_model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

rmse, r2


(np.float64(1.142791345969367), 0.9999900889432187)
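
The tuple above is (RMSE, R²). RMSE is expressed in the same units as the price, while R² compares the model's squared error with that of always predicting the mean price. As a small worked example on made-up numbers, both metrics can be computed by hand with NumPy:

```python
import numpy as np

# Made-up true prices and predictions
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_hat = np.array([110.0, 140.0, 190.0, 260.0])

# RMSE: square root of the mean squared residual
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

# R^2: 1 - (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(rmse_manual, r2_manual)
```

Here every residual has magnitude 10, so the RMSE is 10, and R² is 1 - 400/12500 = 0.968.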

# 4️⃣ Gender Classification Model



## 4.1 Data Preprocessing – User Dataset


In [None]:
user_df = users_df.dropna().copy()

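# Keep one fitted encoder per column so predictions (e.g. gender) can be decoded later
user_encoders = {}
for col in user_df.select_dtypes(include='object').columns:
    user_encoders[col] = LabelEncoder()
    user_df[col] = user_encoders[col].fit_transform(user_df[col])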


## 4.2 Train Random Forest Classifier


In [None]:
from sklearn.ensemble import RandomForestClassifier

X_user = user_df.drop("gender", axis=1)
y_user = user_df["gender"]

X_train_u, X_test_u, y_train_u, y_test_u = train_test_split(
    X_user, y_user, test_size=0.2, random_state=42
)

gender_model = RandomForestClassifier(random_state=42)
gender_model.fit(X_train_u, y_train_u)


## 4.3 Evaluate Gender Classification Model


In [None]:
from sklearn.metrics import accuracy_score

y_pred_u = gender_model.predict(X_test_u)
accuracy_score(y_test_u, y_pred_u)


0.3880597014925373
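
An accuracy figure is easier to judge next to a trivial baseline, such as always predicting the most frequent class. The sketch below shows the pattern with scikit-learn's `DummyClassifier` on synthetic labels (the 5:3:2 class ratio is made up); on the notebook's data the baseline would be fitted on `X_train_u`, `y_train_u` and scored on the test split.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Made-up labels with three classes in a 5:3:2 ratio
y_demo = np.array([0] * 50 + [1] * 30 + [2] * 20)
X_demo = np.zeros((len(y_demo), 1))  # the dummy model ignores the features

# Baseline that always predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_demo, y_demo)

baseline_acc = accuracy_score(y_demo, baseline.predict(X_demo))
print(baseline_acc)
```

A classifier that does not clearly beat this baseline has learned little from its features, whatever its raw accuracy.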

# 5️⃣ Hotel Recommendation System


## 5.1 Standardizing Numerical Hotel Features Using StandardScaler


In [None]:
from sklearn.preprocessing import StandardScaler

hotel_features = hotels_df.select_dtypes(include='number')

scaler = StandardScaler()
scaled_features = scaler.fit_transform(hotel_features)
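
`StandardScaler` centers each column on zero and divides by its standard deviation, so that columns measured on large scales do not dominate the similarity computation. A minimal check on made-up numbers, reproducing the transform by hand:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy numeric features (rows are records, columns are two numeric attributes); values are made up
features = np.array([
    [100.0, 1.0],
    [200.0, 2.0],
    [300.0, 3.0],
    [400.0, 6.0],
])

scaled = StandardScaler().fit_transform(features)

# Same result by hand: subtract the column mean, divide by the population std (ddof=0)
manual = (features - features.mean(axis=0)) / features.std(axis=0)

print(np.allclose(scaled, manual))
```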





## 5.2 Cosine Similarity-Based Hotel Recommendation Function


In [None]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def recommend_hotels(hotel_index, top_n=5):
    # Compute similarity only for one hotel
    sim_scores = cosine_similarity(
        scaled_features[hotel_index].reshape(1, -1),
        scaled_features
    )[0]

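    # Rank by similarity (highest first) and exclude the query hotel itself,
    # which is not guaranteed to sort first when other rows have identical features
    ranked = np.argsort(sim_scores)[::-1]
    similar_indices = ranked[ranked != hotel_index][:top_n]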
    return similar_indices
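
To see how the ranking behaves, here is a self-contained sketch of the same idea on four hand-picked vectors. The function `recommend` and the array `toy` are illustrative stand-ins; unlike the notebook function, this version takes the feature matrix as an argument.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def recommend(features, index, top_n=2):
    """Return indices of the top_n rows most similar to row `index`."""
    scores = cosine_similarity(features[index].reshape(1, -1), features)[0]
    ranked = np.argsort(scores)[::-1]          # highest similarity first
    return ranked[ranked != index][:top_n]     # drop the query row itself

toy = np.array([
    [1.0, 0.0],    # row 0: the query
    [0.9, 0.1],    # row 1: almost the same direction as row 0
    [0.0, 1.0],    # row 2: orthogonal to row 0
    [-1.0, 0.0],   # row 3: opposite direction to row 0
])

print(recommend(toy, 0))
```

Cosine similarity compares direction only, so row 1 ranks first and the opposite-pointing row 3 ranks last. In the notebook, the returned values are row positions in `scaled_features`, which line up with `hotels_df`, so `hotels_df.iloc[recommend_hotels(0)]` would show the matching records.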




# 6️⃣ Save All Trained Models Using Joblib


In [None]:
import os
import joblib

# Create model directory if not exists
os.makedirs(MODEL_PATH, exist_ok=True)

# Save models
joblib.dump(flight_model, os.path.join(MODEL_PATH, "flight_price_model.pkl"))
joblib.dump(gender_model, os.path.join(MODEL_PATH, "gender_classification_model.pkl"))
joblib.dump(scaler, os.path.join(MODEL_PATH, "hotel_scaler.pkl"))
joblib.dump(scaled_features, os.path.join(MODEL_PATH, "hotel_features.pkl"))

print("✅ All models saved successfully!\n")

# List every file in the model directory (this includes files already present before this run)
saved_files = os.listdir(MODEL_PATH)

print("📦 Saved Model Files:")
for file in saved_files:
    print(" -", file)



✅ All models saved successfully!

📦 Saved Model Files:
 - flight_price_model.pkl
 - flight_encoders.pkl
 - gender_classification_model.pkl
 - hotel_scaler.pkl
 - hotel_features.pkl
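
Saved artifacts are read back with `joblib.load`. The sketch below checks a save/load round trip on a tiny stand-in model trained on synthetic data; the file name `demo_model.pkl` and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train a tiny stand-in model on synthetic data
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(50, 3))
y_demo = X_demo @ np.array([1.0, 2.0, 3.0])
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)

# Save to a temporary directory and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly
same = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print(same)
```

In the notebook's setting the equivalent call would be `joblib.load(os.path.join(MODEL_PATH, "flight_price_model.pkl"))`. Pickled scikit-learn models are tied to the library version that produced them, so the serving environment should pin the same scikit-learn version used for training.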


# ✅ Project Completed Successfully

Models Trained:
- Flight Price Prediction (Regression)
- Gender Classification (Classification)
- Hotel Recommendation (Similarity-based)

Models Saved:
- flight_price_model.pkl
- gender_classification_model.pkl
- hotel_scaler.pkl
- hotel_features.pkl

Ready for Deployment:
- Flask API
- Docker
- MLflow
- Kubernetes
- Streamlit



## 📊 Business Insights & Conclusion

The Voyage Analytics system offers the following insights for travel companies and booking platforms:

🔹 Dynamic Pricing Strategy  
Flight price prediction helps platforms adjust pricing based on travel parameters and demand patterns.

🔹 Customer Segmentation  
Gender prediction from customer attributes is intended to support targeted marketing campaigns; with a test accuracy of about 0.39, the current classifier needs improvement before it can be relied on for this.

🔹 Personalized Recommendations  
Similarity-based hotel recommendation is meant to improve customer experience and increase booking conversion rates.

🔹 Production-Ready Deployment  
With Streamlit deployment, Docker containerization, and Kubernetes enablement, the system is designed to scale beyond the notebook.

### 🚀 Conclusion

This project demonstrates the complete lifecycle of building, training, deploying, and productionizing Machine Learning models for a real-world travel use case.

Voyage Analytics is not just a model development project; it is an integrated ML system aligned with modern MLOps practices.
