Optimize Notebook Unit Tests #1538

laserprec · 2021-09-24T13:30:39Z

Description

To improve the developer experience of working with this repo, we will begin migrating our DevOps pipelines to GitHub Actions, leveraging popular DevOps tools like tox and flake8. This will be a sequence of updates to our DevOps infrastructure:

~~Propose and draft the CI pipeline in Github Action~~
~~Set up self-hosted machines to run our GPU workloads~~
~~Create feature parity with the existing CI pipelines (pr-gates & nightly-build)~~
~~Run tests on the appropriate dependency subsets~~
Optimize build time for unit tests <---- (Current PR)
Enforce flake8 (coding style checks) on the build and clean up coding styles to pass flake8 checks
Deprecate CI pipelines in ADO and switch to GitHub Actions

In this PR:

The idea behind this PR is to run our notebook unit tests on a smaller, synthetic dataset.
This brings two advantages:

No external dependencies when running the notebooks (we avoid downloading the dataset from the internet)
Flexibility in controlling the data under test (we can generate new synthetic datasets of various sizes and with random values)
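
To make the two advantages above concrete, here is a minimal sketch of seeded synthetic rating data. It uses only the standard library rather than pandera, and the function name `make_fake_ratings`, the column names, and the value ranges are assumptions chosen for illustration, not the schema introduced by this PR.

```python
import random


def make_fake_ratings(n_rows, n_users=10, n_items=10, seed=0):
    """Return n_rows synthetic rating records (hypothetical helper)."""
    # A dedicated Random instance keeps the data reproducible per seed
    # without touching the global random state.
    rng = random.Random(seed)
    return [
        {
            "userID": rng.randint(1, n_users),
            "itemID": rng.randint(1, n_items),
            "rating": float(rng.randint(1, 5)),
            "timestamp": rng.randint(0, 2**31 - 1),
        }
        for _ in range(n_rows)
    ]


# Size and seed are both under the test author's control, and nothing
# is downloaded.
small = make_fake_ratings(100, seed=42)
again = make_fake_ratings(100, seed=42)
```

Because the generator is seeded, two calls with the same arguments yield identical rows, which keeps a notebook test deterministic.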

Changes:

New dependencies introduced to recommenders['dev']:
1. pandera: a very useful tool for data schema validation and fake data generation. It is very similar to pydantic, but works specifically with pandas dataframes.
2. pytest-mock: a pytest plugin that gives easier access to the mock tools and is syntactically easier to use in pytest. See here for its value proposition :).
New MockMovielensSchema class for data synthesis
Add a "mock100" data size to the functions load_spark_df and load_pandas_df in recommenders.datasets.movielens
Parametrize some of the notebooks to make use of the synthetic data
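
The changes above could be sketched roughly as follows: a loader that routes the "mock100" size to locally generated rows, plus a mock that checks the download step is never reached. This is a sketch under stated assumptions, not the code in this PR. `RatingsLoader` and its methods are hypothetical, the reading of "mock100" as 100 rows is an assumption, and the standard library's `unittest.mock` is used directly (pytest-mock's `mocker` fixture exposes the same patching API).

```python
import random
from unittest import mock

MOCK_SIZE = "mock100"  # size name from the PR; the 100-row meaning is an assumption


class RatingsLoader:
    """Hypothetical loader for illustration only (not the recommenders API)."""

    def download(self, size):
        # Stand-in for the real download step, which needs network access.
        raise ConnectionError(f"cannot download MovieLens {size} in this sketch")

    def synthesize(self, n_rows, seed=42):
        # Build small random rows locally; no file or network I/O involved.
        rng = random.Random(seed)
        return [
            {
                "userID": rng.randint(1, 10),
                "itemID": rng.randint(1, 10),
                "rating": float(rng.randint(1, 5)),
            }
            for _ in range(n_rows)
        ]

    def load(self, size):
        # Route the mock size to synthetic data, everything else to download.
        if size == MOCK_SIZE:
            return self.synthesize(100)
        return self.download(size)


loader = RatingsLoader()
with mock.patch.object(RatingsLoader, "download") as fake_download:
    mock_rows = loader.load("mock100")
    fake_download.assert_not_called()  # the synthetic path skips the download

    fake_download.return_value = [{"userID": 1, "itemID": 2, "rating": 3.0}]
    real_rows = loader.load("100k")
    fake_download.assert_called_once_with("100k")
```

Patching the download method lets a test exercise both branches without any network access.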

Optimization results (offline experiment):

Optimization results (GitHub Actions):

pr-gate from the latest successful commit:

with the current changes:

This cuts the build time by a little more than 66% 😃.
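
For reference, a reduction figure like the one above is the relative difference between the old and new build durations. The sketch below uses made-up durations, since the measured times are not reproduced in this text; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Fraction of the original duration that was saved."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Hypothetical durations in minutes, NOT the measured values from this PR.
example = relative_reduction(30.0, 10.0)
```

A reduction of a little more than 66% means the new build takes roughly one third of the original time.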

Related Issues

#1507

Checklist:

I have followed the contribution guidelines and code style for this project.
I have added tests covering my contributions.
I have updated the documentation accordingly.
This PR is being made to staging branch and not to main branch.

review-notebook-app · 2021-09-24T13:30:43Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

examples/02_model_collaborative_filtering/als_deep_dive.ipynb

recommenders/datasets/mock/movielens.py

tests/unit/examples/test_notebooks_pyspark.py

tests/unit/recommenders/datasets/mock/test_movielens.py

recommenders/datasets/mock/movielens.py

recommenders/datasets/movielens.py

examples/03_evaluate/als_movielens_diversity_metrics.ipynb

miguelgfierro

looks great!

setup.py

This reverts commit 84d83e2.

codecov-commenter · 2021-10-08T22:27:32Z

Codecov Report

Merging #1538 (e8fd200) into staging (9f19411) will decrease coverage by 0.05%.
The diff coverage is 94.75%.

@@             Coverage Diff             @@
##           staging    #1538      +/-   ##
===========================================
- Coverage    62.12%   62.07%   -0.06%     
===========================================
  Files           84       84              
  Lines         8397     8492      +95     
===========================================
+ Hits          5217     5271      +54     
- Misses        3180     3221      +41

| Impacted Files | Coverage Δ | |
|---|---|---|
| recommenders/evaluation/spark_evaluation.py | `86.60% <ø> (-0.06%)` | ⬇️ |
| recommenders/evaluation/python_evaluation.py | `93.68% <94.11%> (-3.21%)` | ⬇️ |
| recommenders/datasets/movielens.py | `68.58% <96.15%> (-8.95%)` | ⬇️ |
| recommenders/utils/constants.py | `100.00% <100.00%> (ø)` | |
| recommenders/utils/spark_utils.py | `96.15% <100.00%> (+0.15%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 846d214...e8fd200.
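
The figures in the coverage diff can be checked by hand, assuming coverage is computed as hits divided by lines. The sketch below redoes that arithmetic with the numbers from the report. The helper name `coverage_pct` is hypothetical, and the note about truncation versus rounding is a guess at why the summary says 0.05% while the diff says -0.06%, not documented Codecov behavior.

```python
import math


def coverage_pct(hits, lines):
    """Coverage as a percentage of executable lines."""
    return 100.0 * hits / lines


base = coverage_pct(5217, 8397)  # staging, before this PR
head = coverage_pct(5271, 8492)  # with this PR merged
delta = head - base              # negative: coverage drops slightly

# Truncating to two decimals reproduces the displayed percentages.
base_shown = math.floor(base * 100) / 100
head_shown = math.floor(head * 100) / 100

# Rounding the change gives the -0.06 shown in the diff; truncating its
# magnitude would give 0.05, which may explain the summary line.
delta_rounded = round(delta, 2)
```

The PR adds 95 lines, of which 54 are hits and 41 are misses, so the new lines are covered at a lower rate than the existing baseline and the overall percentage dips slightly.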

Jianjie Liu added 10 commits September 23, 2021 22:54

Mock Movielens schema v1

9e252c5

Mock schema experiment

feef435

use csv and change datetime to int

e4f41e7

Try more experiment with 10 rows and another NB

772bbc6

Try mock100 dataset on other NBs

49f874d

Add mock_movielens test marker

0c7adba

Parametrize als_deep_dive NB

83f26e8

Mock movielens schema v2

5e679be

Don't use 100k dataset

eb939b8

Re-wire local import to minimize module-wide dependency

581c1ed

Jianjie Liu added 2 commits September 24, 2021 14:46

Runnable in non-spark env

5862f25

Rename test marker to fake_movielens

eca5abf

laserprec changed the base branch from main to staging September 24, 2021 15:08

Jianjie Liu added 5 commits September 24, 2021 16:52

Re-render diversity_metric NB outputs

7600784

Re-render als_deep_dive NB outputs

04f1371

Specify tmp path for data serialization

79eff3b

Add pytest-mock as 'dev' dependency

477391f

Add spark test markers to new tests

97c5be0

laserprec changed the title ~~Laserprec/opt unit tests exp~~ Optimize Notebook Unit Tests Sep 24, 2021

laserprec requested review from gramhagen, anargyri and miguelgfierro September 24, 2021 20:08

laserprec added the build and notebook labels Sep 24, 2021

laserprec added this to the GitHub Action Migration & CI Infra Enchancement milestone Sep 24, 2021

laserprec linked an issue Sep 24, 2021 that may be closed by this pull request

Tests runnable on default CI agent #1507

Closed

laserprec marked this pull request as ready for review September 24, 2021 21:23

laserprec requested a review from loomlike as a code owner September 24, 2021 21:23

laserprec requested a review from yueguoguo as a code owner September 24, 2021 21:23

Jianjie Liu added 5 commits September 27, 2021 16:01

Small code cleanup

cb2d140

Install 'dev' dependencies in ADO build

65a5327

Undone default partition changes

c2a4458

Merge latest from 'staging' branch

47fcc95

Fix bug after merge

2306b2b

miguelgfierro reviewed Sep 30, 2021

examples/02_model_collaborative_filtering/als_deep_dive.ipynb (resolved)

miguelgfierro reviewed Sep 30, 2021

gramhagen reviewed Sep 30, 2021

recommenders/datasets/mock/movielens.py (outdated, resolved)

gramhagen reviewed Sep 30, 2021

recommenders/datasets/movielens.py (resolved)

anargyri reviewed Oct 5, 2021

recommenders/datasets/movielens.py (outdated, resolved)

anargyri reviewed Oct 5, 2021

recommenders/datasets/movielens.py (outdated, resolved)

anargyri reviewed Oct 5, 2021

examples/03_evaluate/als_movielens_diversity_metrics.ipynb (outdated, resolved)

anargyri reviewed Oct 5, 2021

examples/03_evaluate/als_movielens_diversity_metrics.ipynb (outdated, resolved)

Jianjie Liu added 3 commits October 5, 2021 19:58

Undo datatype changes

b0bcd75

Merge mock schema into movielens.py

fd33efe

Remove fake_movielens marker

33c05cd

laserprec requested review from gramhagen and miguelgfierro October 6, 2021 18:58

gramhagen approved these changes Oct 7, 2021

anargyri approved these changes Oct 7, 2021

miguelgfierro approved these changes Oct 8, 2021

Jianjie Liu added 2 commits October 8, 2021 14:28

Add pandera as a core dependency

84d83e2

Run als quickstart NB on mock100

06ff901

anargyri reviewed Oct 8, 2021

setup.py (outdated, resolved)

Jianjie Liu added 2 commits October 8, 2021 16:10

Revert "Add pandera as a core dependency"

a097dc9

This reverts commit 84d83e2.

Merge lastest changes from 'staging'

e8fd200

laserprec merged commit 68066dd into staging Oct 11, 2021

laserprec deleted the laserprec/opt-unit-tests-exp branch October 11, 2021 11:14
