Add CCI cache for test data #748

Merged
merged 19 commits into from
May 1, 2020

Conversation

mthrok
Contributor

@mthrok mthrok commented Apr 30, 2020

This PR adds a caching mechanism for the CircleCI unit tests.

  • The .vector_cache and .data directories are cached.
  • The @slow decorator (both its implementation and its usages) is removed.
  • The CI job takes 45 mins without the cache and 7 mins with it, which I believe is reasonable.
  • Caches expire weekly. Once we set up a nightly job, a new cache will be created every Sunday.
    • Note: CircleCI does not provide a mechanism to bust the cache.
      To refresh the cache manually, update the version string in the cache key,
      e.g. data-v1-{{ checksum ".circleci-weekly" }} -> data-v2-...
    • I also applied the same rule to the conda environment cache.
  • The resulting cache is shared by all CI jobs, regardless of Python version, branch, etc.
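The weekly expiry falls out of how the key file is produced: the workflow writes the year and week number into .circleci-weekly, and the cache key embeds a checksum of that file, so the key changes whenever the week number does. A minimal sketch of that behaviour follows; the helper name `weekly_key` is hypothetical, and `date -d` assumes GNU coreutils (as on a typical Linux CI image).

```shell
#!/bin/sh
# weekly_key DATE prints "<year>-<week>" for DATE.
# %U is the week of the year with Sunday as the first day of the week (00-53).
# NOTE: `date -d` is a GNU extension; BSD/macOS date uses different flags.
weekly_key() {
    date -d "$1" +"%Y-%U"
}

weekly_key 2020-04-25   # a Saturday             -> 2020-16
weekly_key 2020-04-26   # the following Sunday   -> 2020-17
```

Because the value only changes on Sunday, every job that runs within the same week computes the same key and can share one cache.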

@mthrok
Contributor Author

mthrok commented Apr 30, 2020

  1. First run, without any cache and with the slow tests in test_vocab enabled [link]
    • Took 32m 47s
      • 18m 23s for running the tests
      • 12m 13s for storing .vector_cache (archive + upload 5.2 GiB)

@mthrok
Contributor Author

mthrok commented Apr 30, 2020

  2. Second run, with the cached .vector_cache directory [link]
    • Took 5m 59s
      • 1m 43s for restoring the cache (download + unarchive 5.2 GiB)
      • 2m 12s for running the tests!

@mthrok
Contributor Author

mthrok commented Apr 30, 2020

@cpuhrsch

Dramatic improvement for test_vocab.py.

@mthrok
Contributor Author

mthrok commented Apr 30, 2020

  3. Enabled the tests from the test/data directory. Ran with .vector_cache cached and .data not cached [link]
    • Total time 32m
      • 10m 55s for the tests
      • 17m for storing the cache (7 GiB)

@mthrok
Contributor Author

mthrok commented Apr 30, 2020

  4. Ran the same tests as 3, with both .vector_cache and .data cached
    • Total time 7m 20s
      • 2m 45s for restoring the cache (7 GiB)
      • 2m 40s for the tests

@mthrok
Contributor Author

mthrok commented Apr 30, 2020

  5. Ran the same workflow as 4, without any cache [link]
    • Total time 45m 47s
      • 26m 6s for the tests
      • 17m 4s for storing the cache
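For reference, the totals from the last two runs above (45m 47s with a cold cache, 7m 20s with a warm one) can be turned into a time saving and a speedup factor. The snippet below is only a back-of-the-envelope calculation on those reported figures; the helper `to_seconds` is a hypothetical name.

```shell
#!/bin/sh
# to_seconds MINUTES SECONDS prints the duration in seconds.
to_seconds() { echo $(( $1 * 60 + $2 )); }

uncached=$(to_seconds 45 47)   # total time with no cache available
cached=$(to_seconds 7 20)      # total time with .vector_cache and .data restored

saved=$(( uncached - cached ))
# POSIX shell arithmetic is integer-only, so keep one decimal by scaling by 10.
ratio_x10=$(( uncached * 10 / cached ))

echo "saved: ${saved}s, speedup: $(( ratio_x10 / 10 )).$(( ratio_x10 % 10 ))x"
```

Note that the cold-cache total includes about 17 minutes spent storing the cache, a cost that is paid only by the first run after the key changes.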

@mthrok mthrok changed the title [TEST] Add CCI cache for .vector_cache directory Add CCI cache for .vector_cache directory May 1, 2020
@mthrok mthrok changed the title Add CCI cache for .vector_cache directory Add CCI cache for test data May 1, 2020
@mthrok mthrok marked this pull request as ready for review May 1, 2020 02:32
@cpuhrsch
Contributor

cpuhrsch commented May 1, 2020

Once this lands we'll need to think of new tests to check that the downloads work.

- run:
    name: Generate cache key
    # This will refresh cache on Sundays, nightly build should refresh the cache.
    command: echo "$(date +"%Y-%U")" > .circleci-weekly

We could use this as an opportunity to verify that the download functions continue to download the same files.
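One way such a check could look is to record a digest for each downloaded file once and compare against it whenever the cache is rebuilt. The sketch below is an illustration only, not something this PR implements: the file `sample.txt`, the manifest name `manifest.sha256`, and the helper `verify_downloads` are all hypothetical, and `sha256sum` assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical stand-in for a freshly downloaded dataset file.
workdir="$(mktemp -d)"
printf 'example payload\n' > "$workdir/sample.txt"

# Record the digest once; in practice the manifest would live in the repository.
( cd "$workdir" && sha256sum sample.txt > manifest.sha256 )

# verify_downloads DIR succeeds only if every file listed in DIR/manifest.sha256
# still matches its recorded digest.
verify_downloads() {
    ( cd "$1" && sha256sum --check --quiet manifest.sha256 >/dev/null 2>&1 )
}

if verify_downloads "$workdir"; then
    echo "downloads unchanged"
else
    echo "downloads differ from the recorded digests"
fi
```

Running this on the weekly cache refresh would flag an upstream file that has changed or gone missing, while the cached runs in between stay fast.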

@mthrok mthrok merged commit 8b58a22 into pytorch:master May 1, 2020
@mthrok mthrok deleted the cci-cache branch May 1, 2020 14:48