Skip to content

luiswize/MLOpsCapstoneProject

Repository files navigation

MLOpsCapstoneProject

Project created with MLOps-Template cookiecutter. For more info: https://mlopsstudygroup.github.io/mlops-guide/

📋 Requirements

  • DVC
  • Python3 and pip

Create a virtual environment

python3 -m venv mlopsenv

Activate the virtual env

source mlopsenv/bin/activate  

To install dependencies

pip install -r requirements.txt

🏃🏻 Running Project

⚗️ Using DVC

Initialize dvc

dvc init

To set the remote storage service on Firebase

dvc remote add -d storage gs://mlops-85438.appspot.com

To configure credentials ask the team for the mlops-storage-key.json file

export GOOGLE_APPLICATION_CREDENTIALS=mlops-storage-key.json

In this point we are ready to go! 🚀🚀

How to use dvc

Once we have seted up dvc, we can add, pull or push the data we wish.

  1. Chechout to your development branch in git.
git checkout -b dvc-test 
  1. Modify our data, work in your features.
git add files
git tag -a version_name -m "This is version name .... "
  1. In yout branch, bring the versioned data tou need, internally git takes the .dvc file and bring the data associated to the tag
dvc pull

Similar to git add file_path, dvc add file_path adds the files to the dvc versioning

dvc add file_path

Uploads data from the DVC repository(analog to git push)

dvc push

Download data from the DVC repository(analog to git pull)

dvc pull

Versioning our data

To setup a data version

git tag -a version_name -m "This is a comment to reference this data version" 

To push the data.dvc versioned file to git

git push origin --tags

Use case

Lets supose we have some new data we want to add dvc versioning.

  1. Add the new data
dvc add data
  1. Push The changes to our storage bucker
dvc push data
  1. Generate new tag
git tag -a new_version_name -m "New version of the data"
git add .
git commit -m "New data version"
git push
git push origin --tags

In this point we have a new data version, each of this versions have different 'data.dvc' for each version

To change from data version versionA to versionB

git checkout tags/versionB # this changes the `data.dvc` file to the versionB branch
dvc pull # this brings the data from  versionB tag

Bring the initial dvc data to your local environment

Inside the main branch

dvc pull

This brings into your local the already preprocessed file: data/preprocessing_output/test.parquet.gzip

To work in your features, create a new branch and version your own files with dvc inside your branch.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors