
Warning message "DirectedInterleave selected an exhausted input" #7

Closed
crockwell opened this issue Apr 20, 2019 · 3 comments
Labels: question (Further information is requested)

Comments

@crockwell

Hello, occasionally while training I get this warning message repeated for many inputs (thousands of times in a row). It happens with default parameters using either gin/best/baseline_imagenet.gin or gin/best/baselinefinetune_imagenet.gin, on a single GPU or multiple GPUs. Training does not seem to fail. Could this be something on my end? Thanks in advance! I've attached an example excerpt below.

2019-04-19 21:40:41.379233: W tensorflow/core/kernels/data/experimental/directed_interleave_dataset_op.cc:199] DirectedInterleave selected an exhausted input: 72
2019-04-19 21:40:41.379277: W tensorflow/core/kernels/data/experimental/directed_interleave_dataset_op.cc:199] DirectedInterleave selected an exhausted input: 459
2019-04-19 21:40:41.379319: W tensorflow/core/kernels/data/experimental/directed_interleave_dataset_op.cc:199] DirectedInterleave selected an exhausted input: 306

@lamblin added the question (Further information is requested) label on Apr 20, 2019
@lamblin
Collaborator

lamblin commented Apr 20, 2019

This is also something we have on our end, when doing batch (non-episodic) training.
The reason is that the "batch" pipeline is set up so that:

  • it goes through a full epoch of the dataset before starting iterating again over examples that have been seen previously, and
  • it samples each class proportionally to the total number of examples in that class (which is implemented by the Op in directed_interleave_dataset_op.cc).

So on average the proportion of examples drawn from each class matches the class imbalance, but towards the end of an epoch some classes have been sampled more often and are depleted of examples, so the sampler has to re-draw until it selects a class with examples remaining.
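To make that concrete, here is a minimal sketch of that kind of pipeline (toy class sizes, not the library's actual code), using tf.data.experimental.sample_from_datasets(), which is backed by the DirectedInterleave op that prints the warning:

```python
import tensorflow as tf

# Hypothetical, imbalanced per-class example counts.
class_sizes = [1000, 200, 50]

# One finite dataset per class: each class's examples are seen once per epoch.
class_datasets = [tf.data.Dataset.range(n) for n in class_sizes]

# Sample classes proportionally to their total number of examples.
# Towards the end of an epoch the smaller classes run out first, so the
# selector can pick an already-empty input and has to re-draw, logging
# "DirectedInterleave selected an exhausted input" each time.
weights = [n / sum(class_sizes) for n in class_sizes]
dataset = tf.data.experimental.sample_from_datasets(class_datasets,
                                                    weights=weights)

# Repeating only after the selector means no class restarts until the whole
# epoch (every class) has been consumed.
dataset = dataset.repeat()
```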

An alternative would be to apply .repeat() to each class's stream of examples before the class selector (sample_from_datasets()) rather than after. This would be done in BatchReader.create_dataset_input_pipeline, by calling self.construct_class_datasets(..., repeat=True) at the beginning and removing dataset = dataset.repeat() at the end.
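In terms of the tf.data API, that change would look roughly like this (again a toy sketch rather than the repository's code; the real change would go through self.construct_class_datasets(..., repeat=True)):

```python
import tensorflow as tf

class_sizes = [1000, 200, 50]  # hypothetical per-class example counts

# Repeat each class's stream *before* the class selector, so no input can
# ever become exhausted and the warning goes away.
class_datasets = [tf.data.Dataset.range(n).repeat() for n in class_sizes]

weights = [n / sum(class_sizes) for n in class_sizes]
dataset = tf.data.experimental.sample_from_datasets(class_datasets,
                                                    weights=weights)
# No trailing dataset.repeat() is needed. Class proportions are unchanged in
# expectation, but the guarantee of a full epoch before any example is
# reused is lost.
```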

We did not test whether that change would have any influence on training time or performance. If you end up trying it out, please let us know :).

@lamblin
Collaborator

lamblin commented May 3, 2019

If you experience that warning when doing episodic training, on the other hand, it could indicate that the limit on the number of open files is too low (see ulimit -n), and some of the classes might not be read at all.
A limit of 1024 is not enough for training on "all" datasets; 100000 is enough, but there is probably a more reasonable value somewhere in between.
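If it is more convenient than ulimit -n, the limit can also be checked and raised from inside the Python process (on Linux; the target value below is just the one mentioned above):

```python
import resource

# Inspect the current soft and hard limits on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit (soft, hard):", soft, hard)

# Raise the soft limit, capped at the hard limit (only root can go beyond it).
target = 100000
new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```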

@crockwell
Author

Thank you! So far I believe this has only occurred in batch training.
