<center><p float="center">
  <img src="https://upload.wikimedia.org/wikipedia/commons/e/e9/4_RGB_McCombs_School_Brand_Branded.png" width="300" height="100"/>
  <img src="https://mma.prnewswire.com/media/1458111/Great_Learning_Logo.jpg?p=facebook" width="200" height="100"/>
</p></center>

<center><font size=10>Artificial Intelligence and Machine Learning</font></center>
<center><font size=6>Model Deployment: Week 2 - Containerization</font></center>

<center><img src="https://www.vectorlogo.zone/logos/airbnb/airbnb-ar21.svg" width="720"></center>

<center><b><font size='5'>Airbnb Rental Price Prediction</font></b></center>

# Problem Statement

## Business Context

Airbnb has transformed the short-term rental industry by providing a platform that connects property owners with travelers seeking accommodation. From individual hosts offering single-room listings to large-scale property management firms, Airbnb serves a broad spectrum of customers with diverse needs. With an expanding number of listings globally, setting the right rental price has become increasingly complex. Hosts aim to optimize their pricing strategies to maximize occupancy while staying competitive enough to attract guests. Airbnb itself must keep its pricing structure attractive so that both hosts and travelers find value.

To address these challenges, Airbnb has adopted a data-driven approach to create a dynamic pricing strategy. By leveraging historical rental data, Airbnb uses predictive models to assist hosts in setting optimal rental prices. This solution not only enhances business profitability but also improves customer satisfaction by ensuring fair and competitive pricing.

The existing model for rental price prediction, while effective, is facing challenges with scaling and deployment. The demand for real-time predictions from Airbnb’s geographically dispersed hosts and internal teams is growing, leading to a need for a scalable and reliable solution.

## Objective

As a Data Scientist at Airbnb, you are tasked with deploying the rental price prediction model into a containerized, decoupled web application for efficient, real-time use by Airbnb hosts and internal teams. The goal is to ensure that the model can be accessed reliably by geographically distributed teams and stakeholders, providing them with accurate rental price predictions in a user-friendly manner. The solution should:

- **Enable Real-Time Predictions:** Allow Airbnb hosts to input property details and obtain rental price predictions instantly, improving data-driven pricing decisions.

- **Ensure Scalable Deployment:** Address the challenges of scaling and distributing the model by deploying it in a containerized, microservice-based architecture.

- **Provide Seamless User Interaction:** Host the model’s prediction API in a Flask backend and create a Streamlit frontend, decoupled from the backend, allowing for easy integration and a smooth user experience.

- **Simplify Deployment Across Environments:** Leverage Docker to encapsulate the model, environment, and dependencies, ensuring consistent and reliable deployment in multiple Hugging Face Spaces, avoiding issues related to different system environments.

Successful implementation will streamline Airbnb’s pricing strategy, reduce latency in providing price recommendations, and enhance the overall host and user experience across the platform.

# Deployment Approach

The crux of our solution is to decouple the frontend and backend of the application for better accessibility and seamless integration with other services. This modular design enhances maintainability and scalability.

**Backend Development (Flask):**

1. We will develop a Flask application (app.py) responsible for:
   - Loading the serialized XGBoost model we trained and saved as `rental_price_prediction_model_v1_0.joblib`.
   - Exposing two API endpoints:
     - `/v1/rental`: For predicting the rental price of a single property, accepting input features as JSON, mirroring the data format used during training and EDA.
     - `/v1/rentalbatch`: For batch predictions on multiple properties, processing data uploaded as a CSV file, consistent with the dataset structure.
2. This backend handles the core prediction logic, applying the same data preprocessing steps (imputation, scaling, and encoding) used during model training to ensure consistency and accuracy. It returns predictions as JSON responses, facilitating easy integration with the frontend.
3. We'll deploy this Flask app, the serialized model, and the requirements.txt file to a Hugging Face Space using a Dockerfile. This makes the prediction service publicly accessible via a unique URL.


**Frontend Development (Streamlit):**

1. A separate Streamlit application (app.py) will serve as the user interface, tailored for Airbnb hosts and internal teams.

2. This frontend will include:
  - Form-based inputs for online predictions, allowing users to enter property details like room type, accommodates, bathrooms, etc., aligning with the features used in the model.
  - A CSV file uploader for batch predictions, enabling users to analyze multiple properties simultaneously, similar to the dataset format used for training.
3. The Streamlit app will use the `requests` library to communicate with the Flask API, sending input data and displaying predictions in a user-friendly format. This interaction facilitates a seamless user experience.
4. Similar to the backend, we'll deploy the Streamlit app to a Hugging Face Space using a `Dockerfile` with its own `requirements.txt` for managing dependencies.

Once deployed, Airbnb stakeholders can access the application through the frontend URL. They can input property details or upload a CSV file to obtain instant rental price predictions, enabling data-driven pricing strategies. This decoupled approach ensures flexibility and ease of access for Airbnb users.

# Load the Serialized Model

**Note:** To ensure continuity and leverage our previous work, we'll utilize the serialized XGBoost model trained and saved in the previous week.

In [None]:
import os

# Create a folder to upload your trained serialized model into it
os.makedirs("backend_files", exist_ok=True)

- We need to now upload the serialized model (`rental_price_prediction_model_v1_0.joblib`) into the `backend_files` folder.
- Once uploaded, we will load this model into our application for generating rental price predictions.

- This approach allows us to seamlessly integrate the pre-trained model into our deployment workflow, eliminating the need for retraining.

In [None]:
# Define the file path to load the uploaded serialized model
model_path = "backend_files/rental_price_prediction_model_v1_0.joblib"

In [None]:
import joblib

# Load the saved model pipeline from the file
saved_model = joblib.load(model_path)

# Confirm the model is loaded
print("Model loaded successfully.")

Model loaded successfully.


In [None]:
saved_model

Let's try making predictions on the batch dataset (`airbnb_rental_batch_data.csv`) using the deserialized model.
- Please ensure that the saved model (`rental_price_prediction_model_v1_0.joblib`) is loaded before making predictions.
- We will apply `np.exp` to the predictions to convert the log prices into actual prices.
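
The conversion from log price back to actual price can be illustrated on a single number. This is a minimal sketch that assumes the model's target was the natural logarithm of the price, which is consistent with the use of `np.exp` here; the helper name `log_price_to_price` and the value `150.0` are illustrative and not taken from the dataset.

```python
import math


def log_price_to_price(log_price: float) -> float:
    """Invert a natural-log transform: price = e ** log_price."""
    return math.exp(log_price)


# Round trip: taking the log of a price and exponentiating recovers the price
nightly_price = 150.0  # illustrative value, not from the dataset
recovered = log_price_to_price(math.log(nightly_price))
print(round(recovered, 2))
```

`np.exp` applies the same operation element-wise to the whole array of predictions.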

In [None]:
import pandas as pd
import numpy as np

# Load the data:
airbnb_batch_data = pd.read_csv("airbnb_rental_batch_data.csv")

In [None]:
# Make predictions (get log_prices)
predictions_log_prices = saved_model.predict(airbnb_batch_data)

In [None]:
# Convert log prices to actual prices
predictions_actual_prices = np.exp(predictions_log_prices)

In [None]:
# Display predictions:
print(predictions_actual_prices)

[402.91107  588.39026  160.17053   52.501183 868.17816  118.19457
  36.735027 911.51044  173.2732    49.24195  156.42319  213.47662
 132.43124  206.422    158.66995 ]


- As we can see, the model can be directly used for making predictions without any retraining.

# App Backend

## Setting up a Hugging Face Docker Space for the Backend

- We are creating a Hugging Face Docker Space for our backend using the Hugging Face Hub API.
- This automates the space creation process and enables seamless deployment of our Flask app.

In [None]:
# Import the login function from the huggingface_hub library
from huggingface_hub import login

# Login to your Hugging Face account using your access token
# Replace "YOUR_HUGGINGFACE_TOKEN" with your actual token
login(token="YOUR_HUGGINGFACE_TOKEN")

# Import the create_repo function from the huggingface_hub library
from huggingface_hub import create_repo

In [None]:
# Try to create the repository for the Hugging Face Space
try:
    create_repo("RentalPricePredictionBackend",  # Replace "RentalPricePredictionBackend" with the desired space name
        repo_type="space",  # Specify the repository type as "space"
        space_sdk="docker",  # Specify the space SDK as "docker" to create a Docker space
        private=False  # Set to True if you want the space to be private
    )
except Exception as e:
    # Handle potential errors during repository creation
    if "RepositoryAlreadyExistsError" in str(e):
        print("Repository already exists. Skipping creation.")
    else:
        print(f"Error creating repository: {e}")

Error creating repository: 409 Client Error: Conflict for url: https://huggingface.co/api/repos/create (Request ID: Root=1-681c547b-5e138f5411385b5029b9c86d;277dfb6c-8be9-4f6b-a664-d2a4dce94f40)

You already created this space repo

- The `409 Conflict` response above means the Space already exists, so it is safe to continue. The check for `"RepositoryAlreadyExistsError"` in the message text did not match, which is why the generic error branch was printed instead. Passing `exist_ok=True` to `create_repo` avoids the error when the Space is already there.


## Flask Web Framework


In [None]:
%%writefile backend_files/app.py
# Import necessary libraries
import numpy as np
import joblib  # For loading the serialized model
import pandas as pd  # For data manipulation
from flask import Flask, request, jsonify  # For creating the Flask API

# Initialize the Flask application
rental_price_predictor_api = Flask("Airbnb Rental Price Predictor")

# Load the trained machine learning model
model = joblib.load("rental_price_prediction_model_v1_0.joblib")

# Define a route for the home page (GET request)
@rental_price_predictor_api.get('/')
def home():
    """
    This function handles GET requests to the root URL ('/') of the API.
    It returns a simple welcome message.
    """
    return "Welcome to the Airbnb Rental Price Prediction API!"

# Define an endpoint for single property prediction (POST request)
@rental_price_predictor_api.post('/v1/rental')
def predict_rental_price():
    """
    This function handles POST requests to the '/v1/rental' endpoint.
    It expects a JSON payload containing property details and returns
    the predicted rental price as a JSON response.
    """
    # Get the JSON data from the request body
    property_data = request.get_json()

    # Extract relevant features from the JSON data
    sample = {
        'room_type': property_data['room_type'],
        'accommodates': property_data['accommodates'],
        'bathrooms': property_data['bathrooms'],
        'cancellation_policy': property_data['cancellation_policy'],
        'cleaning_fee': property_data['cleaning_fee'],
        'instant_bookable': property_data['instant_bookable'],
        'review_scores_rating': property_data['review_scores_rating'],
        'bedrooms': property_data['bedrooms'],
        'beds': property_data['beds']
    }

    # Convert the extracted data into a Pandas DataFrame
    input_data = pd.DataFrame([sample])

    # Make prediction (get log_price)
    predicted_log_price = model.predict(input_data)[0]

    # Calculate actual price
    predicted_price = np.exp(predicted_log_price)

    # Convert predicted_price to Python float
    predicted_price = round(float(predicted_price), 2)
    # The conversion above is needed because np.exp returns a NumPy float32 value here.
    # NumPy float32 is not JSON serializable, so passing it directly to Flask's jsonify raises an error.

    # Return the actual price
    return jsonify({'Predicted Price (in dollars)': predicted_price})


# Define an endpoint for batch prediction (POST request)
@rental_price_predictor_api.post('/v1/rentalbatch')
def predict_rental_price_batch():
    """
    This function handles POST requests to the '/v1/rentalbatch' endpoint.
    It expects a CSV file containing property details for multiple properties
    and returns the predicted rental prices as a dictionary in the JSON response.
    """
    # Get the uploaded CSV file from the request
    file = request.files['file']

    # Read the CSV file into a Pandas DataFrame
    input_data = pd.read_csv(file)

    # Make predictions for all properties in the DataFrame (get log_prices)
    predicted_log_prices = model.predict(input_data).tolist()

    # Calculate actual prices
    predicted_prices = [round(float(np.exp(log_price)), 2) for log_price in predicted_log_prices]

    # Create a dictionary of predictions with property IDs as keys
    property_ids = input_data['id'].tolist()  # Assuming 'id' is the property ID column
    output_dict = dict(zip(property_ids, predicted_prices))  # Use actual prices

    # Return the predictions dictionary as a JSON response
    return output_dict

# Run the Flask application in debug mode if this script is executed directly
if __name__ == '__main__':
    rental_price_predictor_api.run(debug=True)

Writing backend_files/app.py
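
The `/v1/rental` endpoint reads every feature with `property_data[...]`, so a request that omits a field raises a `KeyError` inside the handler and the client sees a server error rather than a helpful message. One possible guard is sketched below; `REQUIRED_FIELDS` and `find_missing_fields` are hypothetical names that are not part of the app above, and the feature list is copied from the `sample` dictionary in `app.py`.

```python
# Feature names copied from the `sample` dictionary in app.py
REQUIRED_FIELDS = [
    "room_type", "accommodates", "bathrooms", "cancellation_policy",
    "cleaning_fee", "instant_bookable", "review_scores_rating",
    "bedrooms", "beds",
]


def find_missing_fields(property_data: dict) -> list:
    """Return the required feature names that are absent from the payload."""
    return [name for name in REQUIRED_FIELDS if name not in property_data]


# A deliberately incomplete payload to show what the check reports
incomplete_payload = {"room_type": "Private room", "accommodates": 2}
print(find_missing_fields(incomplete_payload))
```

If the returned list is non-empty, the endpoint could respond with a 400 status and the list of missing names instead of attempting a prediction.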


## Dependencies File

In [None]:
%%writefile backend_files/requirements.txt
pandas==2.2.2
numpy==2.0.2
scikit-learn==1.6.1
xgboost==2.1.4
joblib==1.4.2
Werkzeug==2.2.2
flask==2.2.2
gunicorn==20.1.0
requests==2.28.1
uvicorn[standard]
streamlit==1.43.2

Writing backend_files/requirements.txt


## Dockerfile

In [None]:
%%writefile backend_files/Dockerfile
FROM python:3.9-slim

# Set the working directory inside the container
WORKDIR /app

# Copy all files from the current directory to the container's working directory
COPY . .

# Install dependencies from the requirements file without using cache to reduce image size
RUN pip install --no-cache-dir --upgrade -r requirements.txt

# Define the command to start the application using Gunicorn with 4 worker processes
# - `-w 4`: Uses 4 worker processes for handling requests
# - `-b 0.0.0.0:7860`: Binds the server to port 7860 on all network interfaces
# - `app:rental_price_predictor_api`: Runs the Flask instance named `rental_price_predictor_api` defined in `app.py`
CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:7860", "app:rental_price_predictor_api"]

Writing backend_files/Dockerfile


## Uploading Files to Hugging Face Space for the Backend

In [None]:
# for hugging face space authentication to upload files
from huggingface_hub import HfApi

repo_id = "Thiresh/RentalPricePredictionBackend"  # Your Hugging Face space id

# Initialize the API
api = HfApi()

# Upload the Flask backend files stored in the folder called backend_files
api.upload_folder(
    folder_path="/content/backend_files",  # Local folder path
    repo_id=repo_id,  # Hugging face space id
    repo_type="space",  # Hugging face repo type "space"
)

CommitInfo(commit_url='https://huggingface.co/spaces/Thiresh/RentalPricePredictionBackend/commit/71d67fd9dca6e026d8623a69dd25dcc878af3ac4', commit_message='Upload folder using huggingface_hub', commit_description='', oid='71d67fd9dca6e026d8623a69dd25dcc878af3ac4', pr_url=None, repo_url=RepoUrl('https://huggingface.co/spaces/Thiresh/RentalPricePredictionBackend', endpoint='https://huggingface.co', repo_type='space', repo_id='Thiresh/RentalPricePredictionBackend'), pr_revision=None, pr_num=None)

# App Frontend

## Setting up a Hugging Face Docker Streamlit Space for the Frontend

## Points to note before executing the below cells
- Create a Streamlit space on Hugging Face by following the instructions provided on the content page titled **`Creating Spaces and Adding Secrets in Hugging Face`** from Week 1

## Streamlit for Interactive UI

In [None]:
# Create a folder for storing the files needed for frontend UI deployment
os.makedirs("frontend_files", exist_ok=True)

In [None]:
%%writefile frontend_files/app.py
import streamlit as st
import pandas as pd
import requests

# Set the title of the Streamlit app
st.title("Airbnb Rental Price Prediction")

# Section for online prediction
st.subheader("Online Prediction")

# Collect user input for property features
room_type = st.selectbox("Room Type", ["Entire home/apt", "Private room", "Shared room"])
accommodates = st.number_input("Accommodates (Number of guests)", min_value=1, value=2)
bathrooms = st.number_input("Bathrooms", min_value=1, step=1, value=2)
cancellation_policy = st.selectbox("Cancellation Policy (kind of cancellation policy)", ["strict", "flexible", "moderate"])
cleaning_fee = st.selectbox("Cleaning Fee Charged?", ["True", "False"])
instant_bookable = st.selectbox("Instantly Bookable?", ["False", "True"])
review_scores_rating = st.number_input("Review Score Rating", min_value=0.0, max_value=100.0, step=1.0, value=90.0)
bedrooms = st.number_input("Bedrooms", min_value=0, step=1, value=1)
beds = st.number_input("Beds", min_value=0, step=1, value=1)

# Convert user input into a DataFrame
input_data = pd.DataFrame([{
    'room_type': room_type,
    'accommodates': accommodates,
    'bathrooms': bathrooms,
    'cancellation_policy': cancellation_policy,
    'cleaning_fee': cleaning_fee,
    'instant_bookable': 'f' if instant_bookable=="False" else "t",  # Convert to 't' or 'f'
    'review_scores_rating': review_scores_rating,
    'bedrooms': bedrooms,
    'beds': beds
}])

# Make prediction when the "Predict" button is clicked
if st.button("Predict"):
    response = requests.post("https://<username>-<repo_id>.hf.space/v1/rental", json=input_data.to_dict(orient='records')[0])  # Send data to Flask API
    if response.status_code == 200:
        prediction = response.json()['Predicted Price (in dollars)']
        st.success(f"Predicted Rental Price (in dollars): {prediction}")
    else:
        st.error("Error making prediction.")

# Section for batch prediction
st.subheader("Batch Prediction")

# Allow users to upload a CSV file for batch prediction
uploaded_file = st.file_uploader("Upload CSV file for batch prediction", type=["csv"])

# Make batch prediction when the "Predict Batch" button is clicked
if uploaded_file is not None:
    if st.button("Predict Batch"):
        response = requests.post("https://<username>-<repo_id>.hf.space/v1/rentalbatch", files={"file": uploaded_file})  # Send file to Flask API
        if response.status_code == 200:
            predictions = response.json()
            st.success("Batch predictions completed!")
            st.write(predictions)  # Display the predictions
        else:
            st.error("Error making batch prediction.")

Overwriting frontend_files/app.py


## Dependencies File

In [None]:
%%writefile frontend_files/requirements.txt
pandas==2.2.2
requests==2.28.1
streamlit==1.43.2

Overwriting frontend_files/requirements.txt


## Dockerfile

In [None]:
%%writefile frontend_files/Dockerfile
# Use a minimal base image with Python 3.9 installed
FROM python:3.9-slim

# Set the working directory inside the container to /app
WORKDIR /app

# Copy all files from the current directory on the host to the container's /app directory
COPY . .

# Install Python dependencies listed in requirements.txt
RUN pip3 install -r requirements.txt

# Define the command to run the Streamlit app on port 8501 and make it accessible externally
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0", "--server.enableXsrfProtection=false"]

# NOTE: XSRF protection is disabled so that CSV file uploads (needed for batch predictions) work when the app is accessed externally

## Uploading Files to Hugging Face Space for the Frontend

In [None]:
# for hugging face space authentication to upload files
from huggingface_hub import HfApi

repo_id = "--------------------------"  # Your Hugging Face space id

# Initialize the API
api = HfApi()

# Upload the Streamlit app files stored in the folder called frontend_files
api.upload_folder(
    folder_path="/content/frontend_files",  # Local folder path
    repo_id=repo_id,  # Hugging face space id
    repo_type="space",  # Hugging face repo type "space"
)

CommitInfo(commit_url='https://huggingface.co/spaces/Thiresh/RentalPricePredictionFrontendDocker/commit/868d01bae88e1b0d793ab5f3de0a582f2c02a525', commit_message='Upload folder using huggingface_hub', commit_description='', oid='868d01bae88e1b0d793ab5f3de0a582f2c02a525', pr_url=None, repo_url=RepoUrl('https://huggingface.co/spaces/Thiresh/RentalPricePredictionFrontendDocker', endpoint='https://huggingface.co', repo_type='space', repo_id='Thiresh/RentalPricePredictionFrontendDocker'), pr_revision=None, pr_num=None)

# Inferencing using Flask API


As the ***frontend and backend are decoupled***, we can ***access the backend directly for predictions***.
- The decoupling ensures seamless interaction with the deployed model while leveraging the API for scalable inference.

Let's see how to interact with the Flask API programmatically within this notebook to perform **online** and **batch inference**.

We will
1. Send API requests for both online and batch inference.
2. Process and check the model predictions.

In [None]:
import json  # To handle JSON formatting for API requests and responses
import requests  # To send HTTP requests to the deployed Flask API

import pandas as pd  # For data manipulation and analysis
import numpy as np  # For numerical computations

In [None]:
model_root_url = "https://<username>-<repo_id>.hf.space"  # Base URL of the deployed Flask API on Hugging Face Space; enter user name and space name before running the cell
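
The `<username>-<repo_id>` placeholder has to be filled in by hand. The sketch below shows one way to build the base URL, assuming the Space subdomain is the lowercased user name and Space name joined by a hyphen, with any other special characters replaced by hyphens. That convention is an assumption worth verifying against the direct URL shown on the Space page; `space_base_url` is a hypothetical helper.

```python
import re


def space_base_url(username: str, space_name: str) -> str:
    """Build the assumed direct URL of a Space: https://<username>-<space_name>.hf.space"""
    subdomain = f"{username}-{space_name}".lower()
    # Assumption: characters other than letters, digits, and hyphens become hyphens
    subdomain = re.sub(r"[^a-z0-9-]", "-", subdomain)
    return f"https://{subdomain}.hf.space"


# Space id used earlier in this notebook: Thiresh/RentalPricePredictionBackend
print(space_base_url("Thiresh", "RentalPricePredictionBackend"))
```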

In [None]:
model_url = model_root_url + "/v1/rental"  # Endpoint for online (single) inference

Since our model predictions are served through the Flask endpoint we created, we need to call this endpoint to make a prediction.

> ```@rental_price_predictor_api.post('/v1/rental')```

In [None]:
model_batch_url = model_root_url + "/v1/rentalbatch"  # Endpoint for batch inference

> ```@rental_price_predictor_api.post('/v1/rentalbatch')```

## Online Inference

The idea is to send a single request to the API and receive an immediate response. This is useful for real-time applications like recommendation systems and fraud detection.

* This data is sent as a JSON payload in a POST request to the model endpoint.
* The model processes the input features and returns a prediction as a JSON payload.

In [None]:
payload = {
  "room_type": "Entire home/apt",
  "accommodates": 5,
  "bathrooms": 3,
  "cancellation_policy": "strict",
  "cleaning_fee": True,
  "instant_bookable": "f",
  "review_scores_rating": 90,
  "bedrooms": 3,
  "beds": 3
}

# This payload dictionary includes all the necessary features in the expected
# format for online (single property) prediction, ensuring consistency with the
# model's training data.
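
When this dictionary is passed through the `json=` argument of `requests.post`, it is serialized to a JSON string before being sent. The sketch below uses the standard `json` module to show roughly what that body looks like; it is an illustration of the encoding, not the exact bytes `requests` produces.

```python
import json

payload = {
    "room_type": "Entire home/apt",
    "accommodates": 5,
    "bathrooms": 3,
    "cancellation_policy": "strict",
    "cleaning_fee": True,
    "instant_bookable": "f",
    "review_scores_rating": 90,
    "bedrooms": 3,
    "beds": 3,
}

# Serialize the dictionary; Python's True becomes the JSON literal true
body = json.dumps(payload)
print(body)

# Decoding the body gives back an equal dictionary, which is what
# request.get_json() hands to the Flask endpoint
decoded = json.loads(body)
print(decoded == payload)
```

Note that `cleaning_fee` is sent here as a boolean, while the Streamlit form sends the strings `"True"` or `"False"`. Whether the model treats both forms the same depends on the preprocessing used during training, so it is worth checking.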

In [None]:
# Sending a POST request to the model endpoint with the test payload
response = requests.post(model_url, json=payload)

In [None]:
response

<Response [200]>

- The `<Response [200]>` you see is an HTTP status code.
- It indicates that your request was successful, and the server was able to process it without any problems.
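
Other status codes signal different outcomes, such as the `409` seen earlier when creating a Space that already existed. The standard library's `http.HTTPStatus` maps codes to their standard phrases, as in this small sketch; `describe_status` is a hypothetical helper that treats any 2xx code as success.

```python
from http import HTTPStatus


def describe_status(status_code: int) -> str:
    """Return the code, its standard phrase, and whether it is a 2xx success."""
    status = HTTPStatus(status_code)
    outcome = "success" if 200 <= status_code < 300 else "failure"
    return f"{status_code} {status.phrase} ({outcome})"


for code in (200, 409, 500):
    print(describe_status(code))
```

With `requests`, `response.status_code` provides the integer to pass in.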

In [None]:
print(response.json())

{'Predicted Price (in dollars)': 422.59}


## Batch Inference

The idea is to send many records in a single request and receive all the predictions in one response. The backend reads the entire dataset, runs it through the ML model, and returns the prediction for every row in the file. This is useful for applications like loan default prediction and customer churn prediction, where we don't need results instantaneously.

* This data is sent as a CSV file in a POST request to the model endpoint.
* The model processes each row containing the input features and returns the predictions for each row as one single JSON payload.

In [None]:
import pandas as pd

In [None]:
# Load the sample batch data for Airbnb
airbnb_batch_data = pd.read_csv("airbnb_rental_batch_data.csv")

- The model was trained using a certain set of numerical and categorical features before being serialized.
- We need to use the same set of features and pass the data to the API in order to get predictions.
- We define these feature lists below and select the necessary columns from the batch data so that the model receives the expected input format. The `id` column is included as well because the backend uses it to label each prediction in its response.

In [None]:
# List of numerical features in the Airbnb dataset
numeric_features = [
    'id',
    'accommodates',
    'bathrooms',
    'review_scores_rating',
    'bedrooms',
    'beds'
]

# List of categorical features in the Airbnb dataset
categorical_features = [
    'room_type',
    'cancellation_policy',
    'cleaning_fee',
    'instant_bookable'
]

# Define predictor matrix (X) using selected numeric and categorical features
batch_input_data = airbnb_batch_data[numeric_features + categorical_features]

In [None]:
# Prepare batch input for API request
batch_input = {
    'file': batch_input_data.to_csv(header=True, index=False).encode('utf-8')
}

In [None]:
# Send request to the model API for batch predictions
response = requests.post(
    model_batch_url,  # Model endpoint URL
    files=batch_input
)

In [None]:
response

<Response [200]>

In [None]:
response.text

'{"3808709":158.67,"6304928":213.48,"6901257":156.42,"7919400":132.43,"13418779":206.42,"14567890":402.91,"15678901":588.39,"16789012":160.17,"17890123":52.5,"18901234":868.18,"19012345":118.19,"20123456":36.74,"21234567":911.51,"22345678":173.27,"23456789":49.24}\n'

- As we can see, we receive a JSON where each key represents a property ID, and the value represents the model's predicted rental price (in dollars) for that property.
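
To work with these predictions in Python, the response text can be parsed into a dictionary. The sketch below uses the first four entries of the response shown above; with a live response, `response.json()` performs the same parsing step. JSON object keys are always strings, so the property IDs are converted back to integers here.

```python
import json

# First four entries of the response text shown above
response_text = '{"3808709":158.67,"6304928":213.48,"6901257":156.42,"7919400":132.43}'

predictions = json.loads(response_text)

# Convert the string keys back to integer property IDs
by_id = {int(property_id): price for property_id, price in predictions.items()}

# Find the property with the highest predicted price in this subset
highest_id = max(by_id, key=by_id.get)
print(highest_id, by_id[highest_id])
```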

# Conclusion


1. **Flexibility and Scalability**: By separating the frontend and backend, we can easily update or scale each component independently. This means we can make changes to the user interface without affecting the prediction model, or vice versa. This also allows us to handle a large number of requests by scaling the backend without impacting the frontend's performance. It's like having a system with changeable parts, making it more adaptable and robust.

2. **Technology Agnostic**: The decoupled architecture allows us to use different technologies for the frontend and backend. For example, we can use Streamlit for the frontend and Flask for the backend, or any other suitable technologies. This flexibility enables us to choose the best tools for the job at hand.

3. **Reusability**: The backend API can be reused by other applications or services. This means we can integrate the prediction functionality into different parts of Airbnb's platform or even share it with external partners. This fosters greater efficiency and integration possibilities, extending the model's benefits beyond a single application. It's like creating a versatile tool that can be used in various projects, maximizing its value.

<font size=6 color="blue">Power Ahead!</font>
___