FSDL Spring 2021

How to Integrate Dataset Quality Metrics And Flywheel in a MLOps Pipeline

(Demonstration using License Plate Recognition App)

Authors

Introduction

An AI system rests on two pillars: code (the model and algorithm) and data. In a recent interview, Andrew Ng stressed the importance of high-quality data. In this project we focus on the data side of an AI system: the goal is to demonstrate how Data Quality Metrics and the Data Flywheel concept can improve model performance.

Data Quality Metric

Visit this repository to learn more about our latest work on the Data Quality Metric hypothesis: https://github.com/changsin/FSDL
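The metric itself is defined in the linked repository and is not reproduced here. As an illustration of the kind of low-level check a dataset quality metric can build on, the sketch below validates YOLO-format annotation files (the `<class_id> <x_center> <y_center> <width> <height>` format, normalized to [0, 1], that YOLOv5 trains on) and reports the fraction of malformed label lines. The function names and this "label error rate" definition are hypothetical and are not the project's Data Quality Metric.

```python
from __future__ import annotations

from pathlib import Path


def check_yolo_label_line(line: str, num_classes: int) -> list[str]:
    """Return the problems found in one YOLO-format label line (empty list if none).

    Expected format, one object per line:
        <class_id> <x_center> <y_center> <width> <height>
    with all four coordinates normalized to [0, 1].
    """
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        x, y, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return ["non-numeric field"]

    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} out of range")
    if not all(0.0 <= v <= 1.0 for v in (x, y, w, h)):
        problems.append("coordinate outside [0, 1]")
    if w <= 0.0 or h <= 0.0:
        problems.append("non-positive box size")
    return problems


def label_error_rate(label_dir: str, num_classes: int) -> float:
    """Fraction of non-blank label lines in label_dir/*.txt with at least one problem."""
    total = bad = 0
    for path in Path(label_dir).glob("*.txt"):
        for line in path.read_text().splitlines():
            if not line.strip():
                continue  # ignore blank lines
            total += 1
            if check_yolo_label_line(line, num_classes):
                bad += 1
    return bad / total if total else 0.0
```

For a single-class license plate detector you would call something like `label_error_rate("path/to/labels", num_classes=1)`, where the path is a placeholder for wherever your YOLO label files live.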

Repository Structure

The training code is in the training directory, the notebooks we used are in the notebooks directory, and the License Plate Recognition app is in the license_plate_recogniser directory.

License Plate Recognition Application (LPR)

To use the LPR application, install Docker. Clone the repository (mahavird/fsdl_project), change into the license_plate_recogniser directory, and build the Docker image with:

docker build --network host -t lpr:latest .

To start the container, run:

docker run -it --network host -p 8080:8080 lpr

Then open the following URL in your browser:

localhost:8080

You should now see the app's landing page.

The dashboard is built with Streamlit.
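To check from a script (for example in CI) that the container is actually serving the dashboard, a small HTTP smoke test is enough. This is a minimal sketch using only the standard library; it assumes the app answers a plain GET on http://localhost:8080, matching the port used in the `docker run` command, and the function name `app_is_up` is hypothetical.

```python
import urllib.error
import urllib.request


def app_is_up(url: str = "http://localhost:8080", timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to `url` answers with a 2xx status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: treat the app as down.
        return False
```

Run `app_is_up()` after starting the container; it returns False instead of raising when nothing is listening, which makes it easy to use in a retry loop while the container boots.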

Deploying on GCP:

To run the app in the cloud, we use the Google Cloud ecosystem.

Be aware that executing this step will incur charges on your Google Cloud account.

To run the app in the cloud, go to the license_plate_recogniser directory and execute:

make

If everything succeeds, you should see the service running in your Google Cloud Console. We use Google Cloud Run, Google's serverless container platform, to run the app in the cloud.
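One detail that matters on Cloud Run: the platform tells the container which port to listen on through the `PORT` environment variable (8080 by default), and the server must listen on all interfaces. The sketch below builds a Streamlit launch command that respects this; it is an illustration, not the repository's actual Dockerfile or entrypoint, and the script name `app.py` is a hypothetical placeholder.

```python
import os
from typing import List, Mapping, Optional


def streamlit_command(script: str = "app.py",
                      env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build a Streamlit launch command that listens on the port Cloud Run assigns.

    Cloud Run passes the port to listen on in the PORT environment variable
    (8080 by default). Falling back to 8080 keeps the same command working
    for a local `docker run`.
    """
    env = os.environ if env is None else env
    port = env.get("PORT", "8080")
    return [
        "streamlit", "run", script,
        "--server.port", port,
        "--server.address", "0.0.0.0",  # listen on all interfaces inside the container
        "--server.headless", "true",    # don't try to open a browser
    ]
```

An entrypoint script could pass the result to `os.execvp(cmd[0], cmd)` so that Streamlit replaces the Python process and receives the container's stop signals directly.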

Model Training

We used YOLOv5 as our baseline model. To replicate the training process, see this Colab notebook:

We used Weights & Biases for experiment tracking. Here are our validation loss curves, mAP, and prediction images:
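The mAP tracked during training counts a predicted box as correct only when its intersection over union (IoU) with a ground-truth box reaches a threshold: 0.5 for mAP@0.5, while mAP@0.5:0.95 averages over thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below shows only that matching step; computing the full mAP additionally requires ranking detections by confidence and integrating the precision-recall curve. The helper names are illustrative.

```python
from typing import Sequence

Box = Sequence[float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """A detection counts as correct at this threshold if its IoU is >= threshold."""
    return iou(pred, gt) >= threshold
```

For example, two 2x2 boxes that overlap by half their width have an IoU of 1/3, so the prediction would not count as a true positive at the 0.5 threshold, even though it covers half of the plate.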

Dataset:

Datasets used in the project are:

Tech Stack used

PyTorch
Streamlit
Docker
Weights & Biases
Google Cloud Run

Future Work:

Complete the data flywheel loop to integrate active learning
Replicate the results of SOTA papers on the UFPR dataset
Integrate the Data Quality Metric into the pipeline
Include results on other license plate datasets
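Closing the data flywheel with active learning means letting the deployed model choose which new images are worth labeling next. One common strategy, shown here as an illustration and not as the method this project has committed to, is least-confidence sampling: rank unlabeled images by how unsure the detector is and send the top ones to annotators. The input format (a mapping from a hypothetical image id to the detector's confidence scores for that image) is an assumption.

```python
from typing import Dict, List, Sequence


def image_uncertainty(confidences: Sequence[float]) -> float:
    """Least-confidence score for one image: 1 - highest detection confidence.

    An image with no detections gets the maximum score of 1.0, since the model
    found no plate with any confidence.
    """
    return 1.0 - max(confidences, default=0.0)


def select_for_labeling(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Return up to `budget` image ids, most uncertain first (ties keep input order)."""
    ranked = sorted(predictions,
                    key=lambda img: image_uncertainty(predictions[img]),
                    reverse=True)
    return ranked[:budget]
```

Images with no detections at all rank first, which suits license plate recognition, where a missed plate is usually the costliest error; the newly labeled images would then be added to the training set for the next retraining round.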
