The Customer Propensity Model project aims to help an early-stage e-commerce company increase its conversion rates by predicting the likelihood that a user will make a purchase. The model is built by analyzing user behavior and historical data, and it predicts the probability of a user purchasing a product within a specified time frame, allowing the company to target users effectively with personalized marketing campaigns.
CustomerPropensityModel
│
├── .github
│   └── workflows
│       └── automate.yml
│
├── CustomerPropensityModel.egg-info
│
├── artifacts
│   ├── model.pkl
│   ├── modelling_data.csv
│   ├── preprocessor.pkl
│   ├── raw.csv
│   ├── raw_processed.csv
│   └── raw_with_rfm_features.csv
│
├── build
│
├── dist
│
├── logs
│
├── mlruns
│   └── 0
│
├── notebooks
│   ├── EDA with RFM Modelling.ipynb
│   ├── Feature Engineering.ipynb
│   └── Model building.ipynb
│
├── src
│   ├── components
│   │   ├── data_ingestion.py
│   │   ├── data_preprocessing.py
│   │   ├── data_transformation.py
│   │   ├── feature_engineer.py
│   │   ├── model_evaluation.py
│   │   └── model_trainer.py
│   │
│   ├── exception
│   │   └── logger.py
│   │
│   ├── pipeline
│   │   ├── prediction_pipeline.py
│   │   └── training_pipeline.py
│   │
│   └── utils
│       └── utils.py
│
├── templates
│   ├── form.html
│   ├── index.html
│   └── result.html
│
├── .dockerignore
├── .gitignore
├── Dockerfile
├── app.py
├── init_setup.sh
├── requirements.txt
├── requirements_dev.txt
├── setup.py
└── test.py
The project is organized into several components:
- artifacts: Contains datasets and pre-trained models.
- src: Packages the source code for data processing, model training, and evaluation.
- app.py: A simple Flask application that exposes an API for getting model predictions.
- Dockerfile: Defines the Docker image for containerizing the Flask application.
- automate.yml: Configures the CI/CD pipelines for automated deployment.
The src package contains modules for data processing, model training, and evaluation:
- Data Ingestion: Handles data loading and preprocessing.
- Feature Engineering: Performs feature engineering and generates RFM (Recency, Frequency, Monetary) features.
- Data Preprocessing: Preprocesses data for model training.
- Model Trainer: Trains the machine learning model using preprocessed data.
- Model Evaluation: Evaluates the trained model's performance.
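To make the RFM features mentioned above concrete, here is a minimal sketch in plain Python. It assumes each purchase is a (user_id, purchase_date, amount) record; that layout, the function name compute_rfm, and the reference date are assumptions for illustration, and the project's feature_engineer.py may compute these values differently.

```python
from collections import defaultdict
from datetime import date


def compute_rfm(purchases, reference_date):
    """Return {user_id: (recency_days, frequency, monetary)}.

    `purchases` is an iterable of (user_id, purchase_date, amount) tuples;
    this layout is a hypothetical stand-in for the project's raw data.
    """
    by_user = defaultdict(list)
    for user_id, purchase_date, amount in purchases:
        by_user[user_id].append((purchase_date, amount))

    rfm = {}
    for user_id, rows in by_user.items():
        last_purchase = max(d for d, _ in rows)
        recency = (reference_date - last_purchase).days  # days since last purchase
        frequency = len(rows)                            # number of purchases
        monetary = sum(a for _, a in rows)               # total amount spent
        rfm[user_id] = (recency, frequency, monetary)
    return rfm


# Made-up sample data, only to show the shape of the result.
purchases = [
    ("u1", date(2024, 1, 5), 20.0),
    ("u1", date(2024, 1, 20), 35.0),
    ("u2", date(2024, 1, 10), 50.0),
]
rfm = compute_rfm(purchases, reference_date=date(2024, 1, 31))
print(rfm)
```

A lower recency value means a more recent purchase, so in this toy data u1 is both more recent and more frequent than u2.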
The project has two pipelines:
- Training Pipeline: Consists of several steps, including data ingestion, feature engineering, and data preprocessing, that together train the model.
- Prediction Pipeline: The prediction pipeline utilizes the trained model to predict user purchase propensity based on input data.
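The idea of a training pipeline as an ordered chain of steps can be sketched as follows. This is not the code in training_pipeline.py: the run_pipeline helper and all step functions are hypothetical stand-ins with toy logic, meant only to show how each step's output feeds the next one.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_pipeline(steps: List[Step], data: Any) -> Any:
    """Run each step in order, feeding its output to the next step."""
    for name, step in steps:
        print(f"running step: {name}")
        data = step(data)
    return data


# Hypothetical stand-ins for the real components in src/components.
def ingest(_: Any) -> list:
    return [{"user": "u1", "purchases": 3}, {"user": "u2", "purchases": 0}]


def engineer_features(rows: list) -> list:
    return [{**r, "is_buyer": r["purchases"] > 0} for r in rows]


def preprocess(rows: list) -> list:
    return [(r["purchases"], int(r["is_buyer"])) for r in rows]


def train(pairs: list) -> dict:
    # Toy "model": the share of rows labeled as buyers.
    return {"buyer_rate": sum(label for _, label in pairs) / len(pairs)}


training_steps = [
    ("data ingestion", ingest),
    ("feature engineering", engineer_features),
    ("data preprocessing", preprocess),
    ("model training", train),
]
model = run_pipeline(training_steps, None)
print(model)
```

Keeping the steps in a list makes the order explicit and lets a single step be swapped or tested in isolation.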
The Flask application serves as the interface for interacting with the Customer Propensity Model. It provides endpoints for viewing the home page, testing the server's availability, and making predictions using the trained model.
- Home Page (/): Renders the index.html template, which provides options for making predictions.
- Ping Endpoint (/ping): A simple endpoint for testing the server's availability. Returns "Success" when accessed via a GET request.
- Prediction Endpoint (/predict): Accepts both GET and POST requests. When accessed via GET, it renders a form (form.html) populated with options for selecting input features. Upon submitting the form via POST request, it processes the input data, makes predictions using the trained model, and renders the result (result.html).
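The three endpoints described above could be wired up roughly as in the sketch below. It is not the project's app.py: predict_propensity is a hypothetical placeholder that returns a fixed probability, the days_active field name is assumed, and inline templates stand in for index.html, form.html, and result.html so that the example runs without a templates directory.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)


def predict_propensity(features: dict) -> float:
    """Hypothetical placeholder for the trained model; returns a probability."""
    return 0.5


@app.route("/")
def home():
    # The real app renders index.html; an inline template keeps this sketch self-contained.
    return render_template_string('<a href="/predict">Make a prediction</a>')


@app.route("/ping", methods=["GET"])
def ping():
    # Simple availability check.
    return "Success"


@app.route("/predict", methods=["GET", "POST"])
def predict():
    if request.method == "GET":
        # Stand-in for form.html, with a single example field.
        return render_template_string(
            '<form method="post"><input name="days_active"><button>Predict</button></form>'
        )
    features = request.form.to_dict()
    probability = predict_propensity(features)
    # Stand-in for result.html.
    return render_template_string(
        "Purchase probability: {{ p }}%", p=round(probability * 100, 2)
    )
```

Flask's built-in test client (app.test_client()) can exercise these routes without starting a server, which is convenient for automated endpoint tests.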
The prediction endpoint accepts the following input features:
- Category: The category of the product.
- Subcategory: The subcategory of the product.
- Days Active: Number of days the user has been active.
- R: Recency (a measure of how recently the user made a purchase).
- F: Frequency (a measure of how often the user makes purchases).
- M: Monetary value (a measure of how much money the user spends).
- Loyalty: Loyalty status of the user.
- Avg Purchase Gap: Average time gap between purchases.
- Add to Cart to Purchase Ratios: Ratio of add-to-cart actions to purchases.
- Add to Wishlist to Purchase Ratios: Ratio of add-to-wishlist actions to purchases.
- Click Wishlist Page to Purchase Ratios: Ratio of clicks on wishlist page to purchases.
- User Path: Path followed by the user on the website.
- Cart to Purchase Ratios (Category and Subcategory): Ratios of cart actions to purchases for both category and subcategory.
- Wishlist to Purchase Ratios (Category and Subcategory): Ratios of wishlist actions to purchases for both category and subcategory.
- Click Wishlist to Purchase Ratios (Category and Subcategory): Ratios of clicks on wishlist to purchases for both category and subcategory.
- Product View to Purchase Ratios (Category and Subcategory): Ratios of product views to purchases for both category and subcategory.
The prediction endpoint returns the predicted probability of the user making a purchase, expressed as a percentage.
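As a rough illustration of the input and output handling, the sketch below converts raw form strings into typed features and formats a probability as a percentage. The field keys and both helper names are assumptions chosen for this example (only a subset of the listed features is shown); the actual form fields and the preprocessing in prediction_pipeline.py may differ.

```python
# Hypothetical form keys; the real form.html may use different names.
NUMERIC_FIELDS = ("days_active", "R", "F", "M", "avg_purchase_gap")
CATEGORICAL_FIELDS = ("category", "subcategory", "loyalty", "user_path")


def parse_features(form: dict) -> dict:
    """Convert raw form strings into typed values.

    Raises KeyError for a missing field and ValueError for a non-numeric value.
    """
    features = {}
    for name in NUMERIC_FIELDS:
        features[name] = float(form[name])
    for name in CATEGORICAL_FIELDS:
        features[name] = form[name].strip()
    return features


def as_percentage(probability: float) -> str:
    """Format a probability in [0, 1] as a percentage string."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return f"{probability * 100:.2f}%"


# Made-up form submission, for illustration only.
form = {
    "days_active": "12", "R": "3", "F": "5", "M": "240.5", "avg_purchase_gap": "2.4",
    "category": " electronics ", "subcategory": "phones",
    "loyalty": "gold", "user_path": "home>product>cart",
}
features = parse_features(form)
print(features["M"], features["category"], as_percentage(0.25))
```

Validating the probability range before formatting makes it easier to spot a model or preprocessing bug instead of showing a nonsensical percentage.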
The Dockerfile provided in the project repository allows for containerizing the Customer Propensity Model application using a multi-stage build strategy. This strategy helps reduce the size of the final Docker image by separating the build dependencies from the runtime environment.
The Dockerfile consists of two stages:
- Builder Stage: In this stage, a Python 3.8 slim-buster image is used to install the project dependencies specified in the requirements.txt file. This stage sets the working directory to /install and copies only the requirements.txt file to leverage Docker's caching mechanism. It then installs the dependencies into the /install directory using pip. This stage is responsible for creating a temporary image used for building the dependencies.
- Final Stage: The final Docker image is created based on another Python 3.8 slim-buster image. This stage sets the working directory to /app and copies the installed dependencies from the builder stage into the /usr/local directory. It then copies the rest of the application files into the /app directory. After copying, any unnecessary files are cleaned up to reduce the image size. Finally, the command to run the Flask application is specified using the CMD directive, which starts the Flask server on 0.0.0.0:5000.
To build the Docker image for the Customer Propensity Model application, navigate to the project directory containing the Dockerfile and execute the following command:
docker build -t customer-propensity-model .
Once the Docker image is built, you can run a Docker container using the following command:
docker run -d -p 5000:5000 customer-propensity-model
This command starts a Docker container in the background from the customer-propensity-model image and publishes the container's port 5000 on port 5000 of the host machine. You can then access the Customer Propensity Model application by visiting http://localhost:5000 in your web browser.
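Because the container is started in the background, it can be handy to confirm that the application is actually answering before using it. The helper below is a small sketch that polls the /ping endpoint with the standard library; the function name wait_for_ping and its default retry settings are assumptions for illustration, not part of the project.

```python
import time
import urllib.error
import urllib.request


def wait_for_ping(base_url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll <base_url>/ping until it answers with HTTP 200 or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/ping", timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # the container may still be starting up
        time.sleep(delay)
    return False
```

With the container from the command above running, calling wait_for_ping("http://localhost:5000") should return True once the Flask server is up.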
The project utilizes GitHub Actions for Continuous Integration (CI) and Continuous Delivery (CD) pipelines to automate the testing, building, and deployment processes.
The CI pipeline checks the correctness and reliability of the Customer Propensity Model application by running automated tests with pytest. These tests validate the functionality of key endpoints in the Flask application. The test.py file contains the tests, which verify that the Flask application runs properly.
- Workflow Name: Containerizing the Image and deploying it to EC2
- Trigger: Automatically triggered upon pushing changes to the main branch.
- Jobs:
  - Job 1: Runs tests with pytest.
  - Job 2: Deploys the Docker image to an Amazon EC2 instance.
The CD pipeline automates the deployment of the Docker image containing the Flask application to an Amazon EC2 instance.
- Checkout: Checks out the code from the repository.
- Install Python 3: Sets up the Python environment for testing.
- Install Dependencies: Installs project dependencies listed in requirements_dev.txt.
- Run tests with pytest: Executes automated tests using pytest.
- Print AWS Secrets: Prints AWS access keys for authentication.
- Configure AWS Credentials: Configures AWS credentials for accessing services.
- Login to Amazon ECR: Logs in to Amazon Elastic Container Registry (ECR) for container image storage.
- Build, tag, push image to Amazon ECR: Builds the Docker image, tags it, and pushes it to Amazon ECR.
- Deploy docker image from ECR to EC2 instance: Deploys the Docker image from ECR to an EC2 instance, running the Flask application.
These CI/CD pipelines ensure the application is thoroughly tested and efficiently deployed to the production environment, enhancing development productivity and maintaining application quality.