Optimize Notebook Unit Tests #1538

Merged
merged 29 commits into from
Oct 11, 2021

Conversation

laserprec
Contributor

@laserprec laserprec commented Sep 24, 2021

Description

To improve the developer experience of working with this repo, we will begin migrating our DevOps pipelines to GitHub Actions, leveraging popular DevOps tools like tox and flake8. This will be a sequence of updates to our DevOps infrastructure:

  1. Propose and draft the CI pipeline in GitHub Actions
  2. Set up self-hosted machines to run our GPU workloads
  3. Create feature parity with the existing CI pipelines (pr-gates & nightly-build)
  4. Run tests on the appropriate dependency subsets
  5. Optimize build time for unit tests <---- (Current PR)
  6. Enforce flake8 (coding style checks) on the build and clean up coding styles to pass flake8 checks
  7. Deprecate CI pipelines in ADO and switch to GitHub Actions

In this PR:

The idea behind this PR is to use a smaller synthetic dataset to run our notebook unit tests.
This brings two advantages:

  1. No external dependencies to run notebooks (we avoid downloading the dataset from the internet)
  2. Flexibility in controlling the data we are testing (we can generate new synthetic datasets of various sizes and with random values)
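
As a rough illustration of both advantages, here is a minimal sketch of a seeded synthetic ratings generator. It uses only the standard library's `random` module rather than pandera, so it is not the implementation in this PR; the function name `make_mock_ratings`, the column names, and the value ranges are assumptions made for the example.

```python
import random

# Column names are assumed for illustration only.
COLUMNS = ("userID", "itemID", "rating", "timestamp")


def make_mock_ratings(n_rows=100, n_users=10, n_items=20, seed=0):
    """Return n_rows synthetic rating records as a list of dicts.

    A dedicated, seeded random.Random instance keeps the output
    reproducible without touching the global random state.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        rows.append(
            {
                "userID": rng.randint(1, n_users),
                "itemID": rng.randint(1, n_items),
                "rating": float(rng.randint(1, 5)),
                # Arbitrary range of Unix timestamps, chosen for the sketch.
                "timestamp": rng.randint(880_000_000, 900_000_000),
            }
        )
    return rows


# No network access is needed, and size and seed are under our control.
ratings = make_mock_ratings(n_rows=100, seed=42)
```

Because the generator is seeded, two calls with the same arguments return identical rows, which keeps notebook tests deterministic while still letting us vary the size.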

Changes:

  • New dependencies introduced to recommenders['dev']:
    1. pandera: a tool for data schema validation and synthetic data generation. It is similar to pydantic but works specifically with pandas DataFrames.
    2. pytest-mock: a pytest plugin that gives easier access to mocking tools with simpler syntax in pytest. See here for its value proposition :).
  • New MockMovielensSchema class for data synthesis
  • Add a "mock100" size option to the load_spark_df and load_pandas_df functions in recommenders.datasets.movielens
  • Parametrize some of the notebooks to make use of the synthetic data
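
To make the "mock100" option and the role of mocking more concrete, here is a minimal sketch under stated assumptions. `RemoteSource`, `generate_mock_rows`, and `load_rows` are hypothetical names, not functions from recommenders, and the rule that the digits after "mock" give the row count is an assumption made for the example. It uses the standard library's `unittest.mock`, which pytest-mock wraps through its `mocker` fixture.

```python
import random
from unittest import mock


class RemoteSource:
    """Hypothetical stand-in for code that downloads the real dataset."""

    @staticmethod
    def download(size):
        # Fail loudly so the sketch shows that tests never reach the network.
        raise RuntimeError("network access attempted for size " + size)


def generate_mock_rows(n_rows, seed=0):
    """Build (user, item, rating) tuples from a seeded generator."""
    rng = random.Random(seed)
    return [
        (rng.randint(1, 10), rng.randint(1, 20), float(rng.randint(1, 5)))
        for _ in range(n_rows)
    ]


def load_rows(size="100k"):
    """Dispatch on size: "mock<N>" is synthesized, anything else is downloaded.

    Reading N from the size string is an assumption for this example.
    """
    if size.startswith("mock"):
        return generate_mock_rows(int(size[len("mock"):]))
    return RemoteSource.download(size)


# The synthetic path needs no download at all.
mock_rows = load_rows("mock100")

# For code paths that still call the downloader, a patch replaces it
# for the duration of the block and restores it afterwards.
with mock.patch.object(
    RemoteSource, "download", return_value=[(1, 1, 5.0)]
) as fake_download:
    patched_rows = load_rows("100k")
    fake_download.assert_called_once_with("100k")
```

In a pytest test, `mocker.patch.object(RemoteSource, "download", return_value=...)` would play the same role without the `with` block, since pytest-mock undoes the patch when the test finishes.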

Optimization results (offline experiment):

[Images: opt_result_evaluation_diversity_success, opt_result_als_deep_dive_success, opt_result_rlmc_quickstart_success, opt_result_surprise_svd_nb_success, opt_result_spark_tuning_success, and one untitled image]

Optimization results (GitHub Actions):

pr-gate from the latest successful commit: [image]
with the current changes: [image]

This reduces the build time by a little more than 66% 😃.

Related Issues

#1507

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.
  • This PR is being made to staging branch and not to main branch.


@laserprec laserprec changed the base branch from main to staging September 24, 2021 15:08
@laserprec laserprec changed the title from "Laserprec/opt unit tests exp" to "Optimize Notebook Unit Tests" Sep 24, 2021
@laserprec laserprec added the build and notebook (Notebook related issues) labels Sep 24, 2021
@laserprec laserprec linked an issue Sep 24, 2021 that may be closed by this pull request
@laserprec laserprec marked this pull request as ready for review September 24, 2021 21:23
Outdated, resolved review threads:
recommenders/datasets/mock/movielens.py (3 threads)
tests/unit/examples/test_notebooks_pyspark.py
tests/unit/recommenders/datasets/mock/test_movielens.py

Collaborator

@miguelgfierro miguelgfierro left a comment

looks great!

Outdated, resolved review thread: setup.py
@codecov-commenter

Codecov Report

Merging #1538 (e8fd200) into staging (9f19411) will decrease coverage by 0.05%.
The diff coverage is 94.75%.

@@             Coverage Diff             @@
##           staging    #1538      +/-   ##
===========================================
- Coverage    62.12%   62.07%   -0.06%     
===========================================
  Files           84       84              
  Lines         8397     8492      +95     
===========================================
+ Hits          5217     5271      +54     
- Misses        3180     3221      +41     
Impacted Files Coverage Δ
recommenders/evaluation/spark_evaluation.py 86.60% <ø> (-0.06%) ⬇️
recommenders/evaluation/python_evaluation.py 93.68% <94.11%> (-3.21%) ⬇️
recommenders/datasets/movielens.py 68.58% <96.15%> (-8.95%) ⬇️
recommenders/utils/constants.py 100.00% <100.00%> (ø)
recommenders/utils/spark_utils.py 96.15% <100.00%> (+0.15%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 846d214...e8fd200.

@laserprec laserprec merged commit 68066dd into staging Oct 11, 2021
@laserprec laserprec deleted the laserprec/opt-unit-tests-exp branch October 11, 2021 11:14
Labels
notebook Notebook related issues
Development

Successfully merging this pull request may close these issues.

Tests runnable on default CI agent
5 participants