Links: [🌐 Tutorial Website] [👀 Review Slides] [📜 Tutorial Paper]
The goal of this tutorial is to showcase different notions of data importance, based on the DataScope library. The notebooks revolve around a toy problem of classifying the sentiment of recommendation letters.
We provide several notebooks that walk you through different data debugging scenarios.
We begin by detailing how to leverage data importance to identify impactful label errors in the data.
We extend the previous notebook with a complex feature encoding pipeline, and show that we can just as easily trace data errors through such pipelines.
We extend our example use case with dataframe operations, and show that we can trace data errors through these relational operations as well.
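To give a rough feel for what "data importance" means before opening the notebooks, here is a minimal, self-contained sketch of leave-one-out importance with a 1-nearest-neighbor classifier. It does not use the DataScope API, and the notebooks may rely on other importance notions. The helper names (`predict_1nn`, `accuracy`, `loo_importance`) and the toy data are hypothetical and exist only for illustration.

```python
# Leave-one-out importance: how much does validation accuracy drop
# when a single training example is removed? A negative score means
# the model does better without that example, which hints at a label error.

def predict_1nn(train, x):
    """Return the label of the training point whose feature is closest to x."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def accuracy(train, valid):
    """Fraction of validation points that 1-NN classifies correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in valid)
    return correct / len(valid)

def loo_importance(train, valid):
    """Importance of example i = accuracy(all data) - accuracy(all data minus i)."""
    base = accuracy(train, valid)
    return [
        base - accuracy(train[:i] + train[i + 1:], valid)
        for i in range(len(train))
    ]

# Made-up data: (feature, label), with label 0 = negative, 1 = positive.
# The example at index 2 is deliberately mislabeled (it should be 0).
train = [(0.0, 0), (1.0, 0), (2.0, 1), (8.0, 1), (9.0, 1), (10.0, 1)]
valid = [(0.4, 0), (1.2, 0), (2.3, 0), (8.6, 1), (9.4, 1)]

scores = loo_importance(train, valid)

# The example with the lowest importance is the top label-error suspect.
suspect = min(range(len(train)), key=lambda i: scores[i])
print("importance scores:", scores)
print("most suspicious training example:", suspect, train[suspect])
```

In this toy setup the mislabeled example is the only one with a negative score, so sorting by importance surfaces it first. Leave-one-out requires one retraining per example, which is one reason more efficient importance measures are attractive on real datasets.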
After cloning the repo and entering the repo directory, run the following commands:
make shell
make setup
make jupyter
Note: The make setup command installs all the Python dependencies and needs to be run only the first time you set up the repository.