
Warning message "DirectedInterleave selected an exhausted input" #7

Closed
crockwell opened this issue Apr 20, 2019 · 3 comments
Labels: question (Further information is requested)

Comments

@crockwell

Hello, occasionally while training I get this warning message repeated for many inputs (thousands of times in a row). It happens with default parameters using either gin/best/baseline_imagenet.gin or gin/best/baselinefinetune_imagenet.gin, on a single GPU or multiple GPUs. Training does not seem to fail. Could this be something on my end? Thanks in advance! I've attached an example excerpt below.

2019-04-19 21:40:41.379233: W tensorflow/core/kernels/data/experimental/directed_interleave_dataset_op.cc:199] DirectedInterleave selected an exhausted input: 72
2019-04-19 21:40:41.379277: W tensorflow/core/kernels/data/experimental/directed_interleave_dataset_op.cc:199] DirectedInterleave selected an exhausted input: 459
2019-04-19 21:40:41.379319: W tensorflow/core/kernels/data/experimental/directed_interleave_dataset_op.cc:199] DirectedInterleave selected an exhausted input: 306

@lamblin added the question (Further information is requested) label on Apr 20, 2019
@lamblin
Collaborator

lamblin commented Apr 20, 2019

This is also something we have on our end, when doing batch (non-episodic) training.
The reason is that the "batch" pipeline is set up so that:

  • it goes through a full epoch of the dataset before starting iterating again over examples that have been seen previously, and
  • it samples each class proportionally to the total number of examples in that class (which is implemented by the Op in directed_interleave_dataset_op.cc).

So on average the proportion of examples drawn from each class matches the class imbalance, but towards the end of an epoch some classes have been sampled more often and are depleted of examples, so the sampler has to re-draw until it selects a class with examples remaining.
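To make that concrete, here is a minimal sketch of that kind of pipeline (toy class sizes, not the library's actual code), using tf.data.experimental.sample_from_datasets(), which is backed by the DirectedInterleave op that prints the warning:

```python
import tensorflow as tf

# Hypothetical, imbalanced per-class example counts.
class_sizes = [1000, 200, 50]

# One finite dataset per class: each class's examples are seen once per epoch.
class_datasets = [tf.data.Dataset.range(n) for n in class_sizes]

# Sample classes proportionally to their total number of examples.
# Towards the end of an epoch the smaller classes run out first, so the
# selector can pick an already-empty input and has to re-draw, logging
# "DirectedInterleave selected an exhausted input" each time.
weights = [n / sum(class_sizes) for n in class_sizes]
dataset = tf.data.experimental.sample_from_datasets(class_datasets,
                                                    weights=weights)

# Repeating only after the selector means no class restarts until the whole
# epoch (every class) has been consumed.
dataset = dataset.repeat()
```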

An alternative would be to apply .repeat() to each class's stream of examples before the class selector (sample_from_datasets()) rather than after. This would be done in BatchReader.create_dataset_input_pipeline, by calling self.construct_class_datasets(..., repeat=True) at the beginning and removing dataset = dataset.repeat() at the end.
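In terms of the tf.data API, that change would look roughly like this (again a toy sketch rather than the repository's code; the real change would go through self.construct_class_datasets(..., repeat=True)):

```python
import tensorflow as tf

class_sizes = [1000, 200, 50]  # hypothetical per-class example counts

# Repeat each class's stream *before* the class selector, so no input can
# ever become exhausted and the warning goes away.
class_datasets = [tf.data.Dataset.range(n).repeat() for n in class_sizes]

weights = [n / sum(class_sizes) for n in class_sizes]
dataset = tf.data.experimental.sample_from_datasets(class_datasets,
                                                    weights=weights)
# No trailing dataset.repeat() is needed. Class proportions are unchanged in
# expectation, but the guarantee of a full epoch before any example is
# reused is lost.
```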

We did not test whether that change would have any influence on training time or performance. If you end up trying it out, please let us know :).

@lamblin
Collaborator

lamblin commented May 3, 2019

If you experience that warning when doing episodic training, on the other hand, it could indicate that the limit on the number of open files is too low (see ulimit -n), and some of the classes might not be read at all.
A limit of 1024 is not enough for training on "all" datasets; 100000 is enough, but there is probably a more reasonable value somewhere in between.
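If it is more convenient than ulimit -n, the limit can also be checked and raised from inside the Python process (on Linux; the target value below is just the one mentioned above):

```python
import resource

# Inspect the current soft and hard limits on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit (soft, hard):", soft, hard)

# Raise the soft limit, capped at the hard limit (only root can go beyond it).
target = 100000
new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```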

@crockwell
Author

Thank you! So far I believe this has only occurred in batch training.
