Detect Data Drift

Motivation

Data drift occurs when the distribution of input features in production differs from the distribution in the training data, which can make predictions less accurate and degrade model performance.

To limit the impact of data drift, this workflow automates three steps: detecting drift, notifying the data team, and triggering model retraining.
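To make the idea of "detecting drift" concrete, here is a minimal sketch that compares a reference sample of one numeric feature against a current sample using the two-sample Kolmogorov-Smirnov statistic. It is a generic illustration only and is not taken from this repo's code in "src"; the function names and the 0.2 threshold are assumptions chosen for the example.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Return the two-sample Kolmogorov-Smirnov statistic.

    This is the largest absolute gap between the empirical CDFs
    of the two samples, a value between 0 and 1.
    """
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the largest gap.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def has_drifted(reference, current, threshold=0.2):
    """Flag drift when the KS statistic exceeds a threshold.

    The default threshold is an illustrative assumption, not a
    recommended value; tune it for your data.
    """
    return ks_statistic(reference, current) > threshold
```

In a workflow like this one, a True result from a check of this kind would be what leads to the notification and retraining steps.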

Try it out

Clone the repo:

git clone https://github.com/khuyentran1401/detect-data-drift-pipeline

Next, create and start a Docker container running a PostgreSQL server with prepopulated tables:

docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=... -e POSTGRES_USER=... khuyentran1401/bikeride-postgres:latest

Before running the application, add the required environment variables to the ".env" file:

POSTGRES_USERNAME=...
POSTGRES_PASSWORD=...
SLACK_WEBHOOK=...
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY_ID=...

Encode these environment variables and save them in the ".env_encoded" file by running:

bash encode_env.sh
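The contents of encode_env.sh are not shown here. As a rough sketch of what an encoding step like this might do, the Python function below assumes that each KEY=VALUE line is turned into an entry with a "SECRET_" prefix and a base64-encoded value, which is the form in which open-source Kestra reads secrets from environment variables. The function name and the prefix default are assumptions for illustration; the actual script may differ.

```python
import base64


def encode_env_lines(lines, prefix="SECRET_"):
    """Convert KEY=VALUE lines into PREFIX+KEY=base64(VALUE) lines.

    Blank lines, comment lines, and lines without "=" are skipped.
    The "SECRET_" prefix is an assumption about the target format.
    """
    encoded = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, value = line.split("=", 1)
        b64 = base64.b64encode(value.encode("utf-8")).decode("ascii")
        encoded.append(f"{prefix}{key}={b64}")
    return encoded
```

For example, passing the lines of ".env" to this function and writing the result to ".env_encoded" would produce one encoded entry per variable.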

Now, start the containers for Kestra:

docker compose up -d

You can access Kestra's user interface at http://localhost:8080.

To import the example flows into Kestra, click the "Import" button and select the files located in the "kestra_pipeline" directory.

After importing, the example flows appear in the Kestra UI.
