
add prefetch_factor for multiprocessing prefetching process #41130

Closed
wants to merge 10 commits

Conversation

@yl-to (Contributor) commented Jul 8, 2020

Fixes #40604.
Adds a parameter to DataLoader to configure the per-worker prefetch count. Before this change, the prefetching logic always prefetched 2 * num_workers data items; this commit makes that configurable, e.g. you can specify prefetching 10 * num_workers data items.
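For illustration, a minimal usage sketch of the new parameter (assuming the keyword-only form the review below converges on):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(100))

# Each of the 4 workers keeps 10 batches in flight, so up to
# 4 * 10 = 40 batches are prefetched in total (the old hard-coded
# behavior corresponds to prefetch_factor=2).
loader = DataLoader(dataset, batch_size=8, num_workers=4, prefetch_factor=10)

for (batch,) in loader:
    pass  # consume batches as usual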

@yl-to requested a review from apaszke as a code owner on July 8, 2020 17:42
@yl-to (Contributor, Author) commented Jul 8, 2020

@ssnl @albanD

@albanD (Collaborator) left a comment

Looks ok to me (just small edit in the doc)
Will wait for @ssnl's approval.

Review comment on torch/utils/data/dataloader.py (outdated, resolved)
@yl-to (Contributor, Author) commented Jul 9, 2020

> Looks ok to me (just small edit in the doc)
> Will wait for @ssnl's approval.

Can we land this since another reviewer is not responding to this?

@albanD (Collaborator) left a comment

Given #13023 it looks good on his side.
Just a small update on the doc phrasing and it will be good.

Review comment on torch/utils/data/dataloader.py (outdated, resolved)
@ssnl (Collaborator) left a comment

BC breaking by adding positional arguments in the middle of a signature

@ssnl (Collaborator) commented Jul 9, 2020

> > Looks ok to me (just small edit in the doc)
> > Will wait for @ssnl's approval.
>
> Can we land this since another reviewer is not responding to this?

Hi, I don't want to sound critical. But it is less than 24 hours from opening this PR when you commented. Various people have different responsibilities that may prevent them from responding quickly (personally I am in the middle of planning a move). I think it would be better if we are a bit more patient to get more thorough reviews than to rush a patch in. Thanks for understanding!

@ssnl (Collaborator) commented Jul 9, 2020

Re: the patch

Maybe we can make this a keyword only argument? We weren't able to for a lot of these args because we supported py2. But now we can! Also there should be an error if this is set when num_workers = 0.
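For reference, a minimal sketch of what a keyword-only argument looks like in a Python 3 signature (illustrative only; the names mirror DataLoader but this is not the exact PyTorch code):

class DataLoader:
    # Parameters after the bare '*' can only be passed by keyword, so
    # adding prefetch_factor there cannot break existing positional
    # call sites.
    def __init__(self, dataset, batch_size=1, shuffle=False,
                 num_workers=0, *, prefetch_factor=2):
        self.dataset = dataset
        self.num_workers = num_workers
        self.prefetch_factor = prefetch_factor

dataset = list(range(10))
DataLoader(dataset, 1, False, 4, prefetch_factor=4)   # OK
# DataLoader(dataset, 1, False, 4, 4)                 # TypeError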

@yl-to (Contributor, Author) commented Jul 9, 2020

> > > Looks ok to me (just small edit in the doc)
> > > Will wait for @ssnl's approval.
> >
> > Can we land this since another reviewer is not responding to this?
>
> Hi, I don't want to sound critical. But it is less than 24 hours from opening this PR when you commented. Various people have different responsibilities that may prevent them from responding quickly (personally I am in the middle of planning a move). I think it would be better if we are a bit more patient to get more thorough reviews than to rush a patch in. Thanks for understanding!

I am so sorry. I mentioned reviewers at random, thinking that you might be busy with other important work and hadn't noticed this issue. I will be more careful next time.

@yl-to (Contributor, Author) commented Jul 9, 2020

> Re: the patch
>
> Maybe we can make this a keyword only argument? We weren't able to for a lot of these args because we supported py2. But now we can! Also there should be an error if this is set when num_workers = 0.

I am working on this and will push a new commit.

@dr-ci (bot) commented Jul 9, 2020

💊 CI failures summary and remediations

As of commit c6d6779 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@ssnl (Collaborator) commented Jul 10, 2020

> > > > Looks ok to me (just small edit in the doc)
> > > > Will wait for @ssnl's approval.
> > >
> > > Can we land this since another reviewer is not responding to this?
> >
> > Hi, I don't want to sound critical. But it is less than 24 hours from opening this PR when you commented. Various people have different responsibilities that may prevent them from responding quickly (personally I am in the middle of planning a move). I think it would be better if we are a bit more patient to get more thorough reviews than to rush a patch in. Thanks for understanding!
>
> I am so sorry. I mentioned reviewers at random, thinking that you might be busy with other important work and hadn't noticed this issue. I will be more careful next time.

No worries! Thanks for contributing! :D

@yl-to (Contributor, Author) commented Jul 10, 2020

> Re: the patch
>
> Maybe we can make this a keyword only argument? We weren't able to for a lot of these args because we supported py2. But now we can! Also there should be an error if this is set when num_workers = 0.

I have turned prefetch_factor into a keyword-only parameter.
Regarding raising an error when num_workers = 0 and prefetch_factor is set:

  1. If num_workers == 0, the code goes to the _SingleProcessDataLoaderIter class, which doesn't use prefetch_factor at all.
  2. If num_workers > 0, the code goes to the _MultiProcessingDataLoaderIter class, and it works regardless of whether the user set prefetch_factor.

Do we really need to raise an error? Please let me know if I have misunderstood the code. Thanks! @ssnl
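For context, a simplified sketch of the dispatch being described (the real DataLoader.__iter__ has more logic; the class names are from the PyTorch source):

# Sketch of DataLoader.__iter__ dispatch (simplified):
def __iter__(self):
    if self.num_workers == 0:
        # Single-process path: batches are fetched lazily in the main
        # process, so prefetch_factor is never consulted.
        return _SingleProcessDataLoaderIter(self)
    else:
        # Multiprocessing path: each worker keeps prefetch_factor
        # batches in flight.
        return _MultiProcessingDataLoaderIter(self)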

@ssnl (Collaborator) commented Jul 10, 2020

@yl-to Yes we still want to raise an error because setting prefetch_factor has no effect when num_workers = 0 so we should tell users that they shouldn't set it with num_workers=0. You can see similar arg check code in DataLoader.__init__.

@yl-to (Contributor, Author) commented Jul 10, 2020

> @yl-to Yes we still want to raise an error because setting prefetch_factor has no effect when num_workers = 0 so we should tell users that they shouldn't set it with num_workers=0. You can see similar arg check code in DataLoader.__init__.

@ssnl How can we know whether prefetch_factor is the default or was set by the user? Would setting prefetch_factor's default value to None initially be a good idea here?

@ssnl (Collaborator) commented Jul 10, 2020 via email

Just compare against the default value 2. A similar pattern can be found in DataLoader.__init__.

@yl-to (Contributor, Author) commented Jul 11, 2020

> Just compare against the default value 2. A similar pattern can be found in DataLoader.__init__.

Thanks! Will create a new commit.
What I was thinking of before was the case where a user sets num_workers = 0 and at the same time explicitly specifies prefetch_factor = 2 (which is the default value itself).
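A minimal sketch of the resulting check, assuming the default value 2 (per the caveat just noted, an explicit prefetch_factor=2 with num_workers=0 is indistinguishable from the default and would pass silently):

# Inside DataLoader.__init__ (a sketch, not the verbatim PyTorch code):
if num_workers == 0 and prefetch_factor != 2:
    raise ValueError('prefetch_factor has no effect when num_workers=0; '
                     'set num_workers > 0 to enable multiprocessing '
                     'prefetching.')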

@ssnl (Collaborator) left a comment

Also need some tests in test_dataloader.py, and revert the submodule change.

Review comment on torch/utils/data/dataloader.py (outdated, resolved)
@ssnl (Collaborator) commented Jul 13, 2020

Also remember to update the signature here

DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
           batch_sampler=None, num_workers=0, collate_fn=None,
           pin_memory=False, drop_last=False, timeout=0,
           worker_init_fn=None)
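The updated signature might then read roughly as follows (a sketch; the exact merged signature may include further keyword-only arguments):

DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
           batch_sampler=None, num_workers=0, collate_fn=None,
           pin_memory=False, drop_last=False, timeout=0,
           worker_init_fn=None, *, prefetch_factor=2)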

@ailzhang added the label: triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Jul 13, 2020
@ssnl (Collaborator) left a comment

Still needs some tests, and revert the submodule changes.

Review comment on docs/source/data.rst (outdated, resolved)
@yl-to (Contributor, Author) commented Jul 13, 2020

> Also need some tests in test_dataloader.py, and revert the submodule change.

@ssnl, just to clarify two things:

  1. You want me to write new unit-test functions in the DataLoader tests, right? If so, what kind of test do we need for this added parameter? Is there an existing test case similar to this one?
  2. What does "revert the submodule change" mean? Like updating the submodule?

Sorry, I am really new to open-source development.

@ssnl (Collaborator) commented Jul 14, 2020

Re: tests
I would recommend augmenting the following test code to add tests for a custom prefetch_factor.

For the following, just adding extra self._test_* calls would be fine:

def test_sequential_workers(self):
    self._test_sequential(DataLoader(self.dataset, num_workers=4))

def test_seqential_batch_workers(self):
    self._test_sequential(DataLoader(self.dataset, batch_size=2, num_workers=4))

def test_shuffle_workers(self):
    self._test_shuffle(DataLoader(self.dataset, shuffle=True, num_workers=4))

def test_shuffle_batch_workers(self):
    self._test_shuffle(DataLoader(self.dataset, batch_size=2, shuffle=True, num_workers=4))

For the following, it would be good to add a for prefetch_factor in [2, 4] outer loop

# [no auto-batching] multiprocessing loading
num_workers = 3
sizes_for_all_workers = [0, 4, 20]
expected = sorted(sum((list(range(s)) for s in sizes_for_all_workers), []))
assert len(sizes_for_all_workers) == num_workers, 'invalid test case'
dataset = WorkerSpecificIterableDataset(sizes_for_all_workers)
dataloader = DataLoader(dataset, num_workers=num_workers, batch_size=None,
                        worker_init_fn=set_faulthander_if_available)
dataloader_iter = iter(dataloader)
fetched = sorted(dataloader_iter)
for a, b in zip(fetched, expected):
    # non-batched should not convert ints into tensors
    self.assertIsInstance(a, torch._six.int_classes)
    self.assertEqual(a, b)
# DataLoader should match len of the iterable-style dataset (if implemented)
self.assertEqual(len(dataloader), len(dataset))
# When loading more than len(dataset) data, after accessing len(dataloader),
# we should get a warning. See NOTE [ IterableDataset and __len__ ].
dataset = CountingIterableDataset(20)
dataloader = DataLoader(dataset, num_workers=num_workers,
                        worker_init_fn=set_faulthander_if_available)
it = iter(dataloader)
for _ in range(40):
    self.assertNotWarn(lambda: next(it), "Should not warn before accessing len(dataloader)")
self.assertEqual(len(dataloader), len(dataset))
self.assertEqual(len(dataloader), 20)
it = iter(dataloader)
for _ in range(20):
    self.assertNotWarn(lambda: next(it), "Should not warn before exceeding length")
for _ in range(3):
    with self.assertWarnsRegex(
            UserWarning,
            r"but [0-9]+ samples have been fetched\. For multiprocessing data-loading, this",
            msg="Should always warn after exceeding length"):
        next(it)

# [auto-batching] multiprocessing loading
num_workers = 3
sizes_for_all_workers = [0, 4, 20]
expected = sorted(sum((list(range(s)) for s in sizes_for_all_workers), []))
assert len(sizes_for_all_workers) == num_workers, 'invalid test case'
dataset = WorkerSpecificIterableDataset(sizes_for_all_workers)
# worker 0 should return 0 batches
# worker 1 should return 1 batches
# worker 2 should return 3 batches
dataloader = DataLoader(dataset, num_workers=num_workers, batch_size=7)
dataloader_iter = iter(dataloader)
fetched = list(dataloader_iter)
self.assertEqual(len(fetched), 4)
fetched = set(tuple(t.tolist()) for t in fetched)
self.assertEqual(fetched, {tuple(range(4)), tuple(range(7)), tuple(range(7, 14)), tuple(range(14, 20))})
# [auto-batching] test that workers exit gracefully
workers = dataloader_iter._workers
del dataloader_iter
try:
    for w in workers:
        w.join(JOIN_TIMEOUT)
        self.assertFalse(w.is_alive())
        self.assertEqual(w.exitcode, 0)
finally:
    for w in workers:
        w.terminate()

# [auto-batching & drop_last] multiprocessing loading
num_workers = 3
sizes_for_all_workers = [0, 4, 20]
expected = sorted(sum((list(range(s)) for s in sizes_for_all_workers), []))
assert len(sizes_for_all_workers) == num_workers, 'invalid test case'
dataset = WorkerSpecificIterableDataset(sizes_for_all_workers)
# worker 0 should return 0 batches
# worker 1 should return 1 batches
# worker 2 should return 3 batches
dataloader = DataLoader(dataset, num_workers=num_workers, batch_size=7, drop_last=True,
                        worker_init_fn=set_faulthander_if_available)
dataloader_iter = iter(dataloader)
fetched = list(dataloader_iter)
self.assertEqual(len(fetched), 2)
fetched = set(tuple(t.tolist()) for t in fetched)
self.assertEqual(fetched, {tuple(range(7)), tuple(range(7, 14))})
# [auto-batching & drop_last] test that workers exit gracefully
workers = dataloader_iter._workers
del dataloader_iter
try:
    for w in workers:
        w.join(JOIN_TIMEOUT)
        self.assertFalse(w.is_alive())
        self.assertEqual(w.exitcode, 0)
finally:
    for w in workers:
        w.terminate()
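Per the suggestion above, wrapping those blocks in an outer loop might look like the following sketch (abbreviated; only the first block is shown, indented one level under the loop):

for prefetch_factor in [2, 4]:
    # [no auto-batching] multiprocessing loading
    num_workers = 3
    sizes_for_all_workers = [0, 4, 20]
    expected = sorted(sum((list(range(s)) for s in sizes_for_all_workers), []))
    dataset = WorkerSpecificIterableDataset(sizes_for_all_workers)
    dataloader = DataLoader(dataset, num_workers=num_workers, batch_size=None,
                            prefetch_factor=prefetch_factor,
                            worker_init_fn=set_faulthander_if_available)
    fetched = sorted(iter(dataloader))
    for a, b in zip(fetched, expected):
        self.assertEqual(a, b)
    # ... the auto-batching and drop_last blocks above are wrapped the
    # same way, passing prefetch_factor=prefetch_factor to DataLoader ...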

@ssnl (Collaborator) commented Jul 14, 2020

For submodule change, you can see it if you click on the "Files Changed" tab on this webpage.

@yl-to force-pushed the prefetch_factor branch 2 times, most recently from e8d0f6c to 0254129 on July 20, 2020 10:07
@yl-to (Contributor, Author) commented Jul 20, 2020

@ssnl Hi Tongzhou, tests are added and the branch is rebased. Please review when you have time. Thanks!

@ssnl (Collaborator) commented Jul 21, 2020

One minor thing! Looks great otherwise. Thanks!

@ssnl (Collaborator) left a comment

Thanks a lot! Waiting on CI.

@yl-to (Contributor, Author) commented Jul 22, 2020

> Thanks a lot! Waiting on CI.

It seems CI has passed; please land this when you have time, thanks! @ssnl

@yl-to (Contributor, Author) commented Jul 23, 2020

> Thanks a lot! Waiting on CI.

@ssnl Hi Tongzhou, sorry to bother you, but is there any signal or notification once the commit is merged? I don't know how to track the merging status.

Thanks!

@albanD (Collaborator) left a comment

Thanks!
My bad, I missed the notification from Simon's approval.

@facebook-github-bot (Contributor) left a comment

@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented
@albanD merged this pull request in 1b55e2b.

Labels: Merged, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Projects: none yet
Development: successfully merging this pull request may close this issue: "Parameterize _MultiProcessingDataLoaderIter's multiplier of prefetched items"
7 participants