Conversation

@datumbox (Contributor) commented Jul 4, 2022

The cache path currently used by the video references doesn't factor in parameters that affect the stored data (such as clip length, Kinetics version and frame rate). This means that every time we switch to a different model, we must invalidate the cache. This PR avoids that by including the specific parameters in the SHA-1 hash used for caching.
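
For context, a minimal sketch of what a parameter-aware cache path can look like. The helper name _get_cache_path, the h[:10] truncation and the exact path layout are assumptions based on the reference training script and the log below, not necessarily the merged code verbatim:

import hashlib
import os
from argparse import Namespace

def _get_cache_path(filepath, args):
    # Mix the parameters that affect the stored clips into the hashed value, so that
    # changing clip length, Kinetics version or frame rate produces a different cache file.
    value = f"{filepath}-{args.clip_len}-{args.kinetics_version}-{args.frame_rate}"
    h = hashlib.sha1(value.encode()).hexdigest()
    cache_path = os.path.join("~", ".torch", "vision", "datasets", "kinetics", h[:10] + ".pt")
    return os.path.expanduser(cache_path)

# Illustrative values only; a different clip_len (or version/frame rate) yields a new file name.
args = Namespace(clip_len=16, kinetics_version="400", frame_rate=15)
print(_get_cache_path("/data/kinetics/train", args))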

Proof this works:

Loading data
Loading training data
It is recommended to pre-compute the dataset cache on a single-gpu first, as it will be faster
100%|██████████| 1205/1205 [03:21<00:00,  5.97it/s]
./vision/torchvision/datasets/video_utils.py:223: UserWarning: There aren't enough frames in the current video to get a clip for the given clip length and frames between clips. The video (and potentially others) will be skipped.
  warnings.warn(
Saving dataset_train to ~/.torch/vision/datasets/kinetics/9ff8f5833e.pt
Took 203.63010907173157
Loading validation data
Loading dataset_test from ~/.torch/vision/datasets/kinetics/9ff8f5833e.pt

@YosuaMichael (Contributor) left a comment

Thanks @datumbox for this PR! This is definitely a pain point during experimentation, and quite often the cache can lead to weird bugs if we are not aware of it.

I only have one small comment about adding step_between_clips; other than that it looks good.

import hashlib

value = f"{filepath}-{args.clip_len}-{args.kinetics_version}-{args.frame_rate}"
h = hashlib.sha1(value.encode()).hexdigest()

@YosuaMichael (Contributor) commented on this snippet:

Should we also add args.step_between_clips here?

@datumbox (Contributor, Author) replied:

We don't have such a parameter on args. If we introduce it, we should absolutely add it here.
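
Purely for illustration: if a step_between_clips argument were introduced as discussed, the hashed value would presumably be extended along these lines (args.step_between_clips is hypothetical; it does not exist on args today, and the other values are made up):

from argparse import Namespace

# Hypothetical args including the discussed step_between_clips field.
args = Namespace(clip_len=16, kinetics_version="400", frame_rate=15, step_between_clips=1)
filepath = "/data/kinetics/train"
value = f"{filepath}-{args.clip_len}-{args.kinetics_version}-{args.frame_rate}-{args.step_between_clips}"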

@YosuaMichael (Contributor) replied:

Got it

@datumbox merged commit 8f98aee into pytorch:main Jul 5, 2022
@datumbox deleted the references/param_cache branch July 5, 2022 08:34
facebook-github-bot pushed a commit that referenced this pull request Jul 6, 2022
Summary:
* Update the dataset cache to factor in parameters from the args.

* Fix linter

Reviewed By: jdsgomes

Differential Revision: D37643907

fbshipit-source-id: e0590c3e3b596dd3e12041c095c4b961941e0d38
3 participants