
add prefetch_factor for multiprocessing prefetching process #41130

Closed
wants to merge 10 commits

Conversation

@yl-to (Contributor) commented Jul 8, 2020

Fixes #40604.
Adds a parameter to DataLoader to configure the per-worker prefetch count. Before this change, the prefetching logic always prefetched 2 * num_workers data items; this commit makes that configurable, e.g. you can specify prefetching 10 * num_workers data items.
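For illustration, a minimal usage sketch of the new parameter (assuming the keyword-only form the review below converges on):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(100))

# Each of the 4 workers keeps 10 batches in flight, so up to
# 4 * 10 = 40 batches are prefetched in total (the old hard-coded
# behavior corresponds to prefetch_factor=2).
loader = DataLoader(dataset, batch_size=8, num_workers=4, prefetch_factor=10)

for (batch,) in loader:
    pass  # consume batches as usual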

@yl-to requested a review from apaszke as a code owner on July 8, 2020 17:42
@yl-to (Contributor, Author) commented Jul 8, 2020

@ssnl @albanD

@albanD (Collaborator) left a comment

Looks ok to me (just small edit in the doc)
Will wait for @ssnl's approval.

Review comment on torch/utils/data/dataloader.py (outdated, resolved)
@yl-to (Contributor, Author) commented Jul 9, 2020

> Looks ok to me (just small edit in the doc)
> Will wait for @ssnl's approval.

Can we land this since another reviewer is not responding to this?

@albanD (Collaborator) left a comment

Given #13023 it looks good on his side.
Just a small update on the doc phrasing and it will be good.

Review comment on torch/utils/data/dataloader.py (outdated, resolved)
@ssnl (Collaborator) left a comment

BC breaking by adding positional arguments in the middle of a signature

@ssnl (Collaborator) commented Jul 9, 2020

> > Looks ok to me (just small edit in the doc)
> > Will wait for @ssnl's approval.
>
> Can we land this since another reviewer is not responding to this?

Hi, I don't want to sound critical. But it is less than 24 hours from opening this PR when you commented. Various people have different responsibilities that may prevent them from responding quickly (personally I am in the middle of planning a move). I think it would be better if we are a bit more patient to get more thorough reviews than to rush a patch in. Thanks for understanding!

@ssnl (Collaborator) commented Jul 9, 2020

Re: the patch

Maybe we can make this a keyword only argument? We weren't able to for a lot of these args because we supported py2. But now we can! Also there should be an error if this is set when num_workers = 0.
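For reference, a minimal sketch of what a keyword-only argument looks like in a Python 3 signature (illustrative only; the names mirror DataLoader but this is not the exact PyTorch code):

class DataLoader:
    # Parameters after the bare '*' can only be passed by keyword, so
    # adding prefetch_factor there cannot break existing positional
    # call sites.
    def __init__(self, dataset, batch_size=1, shuffle=False,
                 num_workers=0, *, prefetch_factor=2):
        self.dataset = dataset
        self.num_workers = num_workers
        self.prefetch_factor = prefetch_factor

dataset = list(range(10))
DataLoader(dataset, 1, False, 4, prefetch_factor=4)   # OK
# DataLoader(dataset, 1, False, 4, 4)                 # TypeError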

@yl-to (Contributor, Author) commented Jul 9, 2020

> > > Looks ok to me (just small edit in the doc)
> > > Will wait for @ssnl's approval.
> >
> > Can we land this since another reviewer is not responding to this?
>
> Hi, I don't want to sound critical. But it is less than 24 hours from opening this PR when you commented. Various people have different responsibilities that may prevent them from responding quickly (personally I am in the middle of planning a move). I think it would be better if we are a bit more patient to get more thorough reviews than to rush a patch in. Thanks for understanding!

I am so sorry. I mentioned reviewers at random, thinking that you might be busy with other important work and hadn't noticed this issue. I will be more careful next time.

@yl-to (Contributor, Author) commented Jul 9, 2020

> Re: the patch
>
> Maybe we can make this a keyword only argument? We weren't able to for a lot of these args because we supported py2. But now we can! Also there should be an error if this is set when num_workers = 0.

I am working on this and will push a new commit.

@dr-ci (bot) commented Jul 9, 2020

💊 CI failures summary and remediations

As of commit c6d6779 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@ssnl (Collaborator) commented Jul 10, 2020

> > > > Looks ok to me (just small edit in the doc)
> > > > Will wait for @ssnl's approval.
> > >
> > > Can we land this since another reviewer is not responding to this?
> >
> > Hi, I don't want to sound critical. But it is less than 24 hours from opening this PR when you commented. Various people have different responsibilities that may prevent them from responding quickly (personally I am in the middle of planning a move). I think it would be better if we are a bit more patient to get more thorough reviews than to rush a patch in. Thanks for understanding!
>
> I am so sorry. I mentioned reviewers at random, thinking that you might be busy with other important work and hadn't noticed this issue. I will be more careful next time.

No worries! Thanks for contributing! :D

@yl-to (Contributor, Author) commented Jul 10, 2020

> Re: the patch
>
> Maybe we can make this a keyword only argument? We weren't able to for a lot of these args because we supported py2. But now we can! Also there should be an error if this is set when num_workers = 0.

I have turned prefetch_factor into a keyword-only parameter.
Regarding raising an error when num_workers = 0 and prefetch_factor is set:

  1. If num_workers == 0, the code goes to the _SingleProcessDataLoaderIter class, which doesn't use prefetch_factor at all.
  2. If num_workers > 0, the code goes to the _MultiProcessingDataLoaderIter class, and it works regardless of whether the user set prefetch_factor.

Do we really need to raise an error? Please let me know if I have misunderstood the code. Thanks! @ssnl
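For context, a simplified sketch of the dispatch being described (the real DataLoader.__iter__ has more logic; the class names are from the PyTorch source):

# Sketch of DataLoader.__iter__ dispatch (simplified):
def __iter__(self):
    if self.num_workers == 0:
        # Single-process path: batches are fetched lazily in the main
        # process, so prefetch_factor is never consulted.
        return _SingleProcessDataLoaderIter(self)
    else:
        # Multiprocessing path: each worker keeps prefetch_factor
        # batches in flight.
        return _MultiProcessingDataLoaderIter(self)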

@ssnl (Collaborator) commented Jul 10, 2020

@yl-to Yes we still want to raise an error because setting prefetch_factor has no effect when num_workers = 0 so we should tell users that they shouldn't set it with num_workers=0. You can see similar arg check code in DataLoader.__init__.

@yl-to (Contributor, Author) commented Jul 10, 2020

> @yl-to Yes we still want to raise an error because setting prefetch_factor has no effect when num_workers = 0 so we should tell users that they shouldn't set it with num_workers=0. You can see similar arg check code in DataLoader.__init__.

@ssnl How can we know whether prefetch_factor is the default or was set by the user? Would setting prefetch_factor's default value to None initially be a good idea here?

@ssnl (Collaborator) commented Jul 10, 2020 via email

Just compare against the default value 2. A similar pattern can be found in DataLoader.__init__.

@yl-to (Contributor, Author) commented Jul 11, 2020

> Just compare against the default value 2. A similar pattern can be found in DataLoader.__init__.

Thanks! Will create a new commit.
What I was thinking of before was the case where a user sets num_workers = 0 and at the same time explicitly specifies prefetch_factor = 2 (which is the default value itself).
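A minimal sketch of the resulting check, assuming the default value 2 (per the caveat just noted, an explicit prefetch_factor=2 with num_workers=0 is indistinguishable from the default and would pass silently):

# Inside DataLoader.__init__ (a sketch, not the verbatim PyTorch code):
if num_workers == 0 and prefetch_factor != 2:
    raise ValueError('prefetch_factor has no effect when num_workers=0; '
                     'set num_workers > 0 to enable multiprocessing '
                     'prefetching.')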

@ssnl (Collaborator) left a comment

Also need some tests in test_dataloader.py, and revert the submodule change.

Review comment on torch/utils/data/dataloader.py (outdated, resolved)
@ssnl (Collaborator) commented Jul 13, 2020

Also remember to update the signature here

DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
           batch_sampler=None, num_workers=0, collate_fn=None,
           pin_memory=False, drop_last=False, timeout=0,
           worker_init_fn=None)
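The updated signature might then read roughly as follows (a sketch; the exact merged signature may include further keyword-only arguments):

DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
           batch_sampler=None, num_workers=0, collate_fn=None,
           pin_memory=False, drop_last=False, timeout=0,
           worker_init_fn=None, *, prefetch_factor=2)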

@ailzhang added the label: triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Jul 13, 2020
@ssnl (Collaborator) left a comment

Still needs some tests, and revert the submodule changes.

Review comment on docs/source/data.rst (outdated, resolved)
@yl-to (Contributor, Author) commented Jul 13, 2020

> Also need some tests in test_dataloader.py, and revert the submodule change.

@ssnl, just to clarify two things:

  1. You want me to write new unit-test functions in the DataLoader tests, right? If so, what kind of test do we need for this added parameter? Is there an existing test case similar to this one?
  2. What does "revert the submodule change" mean? Like updating the submodule?

Sorry, I am really new to open-source development.

@ssnl (Collaborator) commented Jul 14, 2020

Re: tests
I would recommend augmenting the following test code to add tests for a custom prefetch_factor.

For the following, just adding extra self._test_* calls would be fine:

def test_sequential_workers(self):
    self._test_sequential(DataLoader(self.dataset, num_workers=4))

def test_seqential_batch_workers(self):
    self._test_sequential(DataLoader(self.dataset, batch_size=2, num_workers=4))

def test_shuffle_workers(self):
    self._test_shuffle(DataLoader(self.dataset, shuffle=True, num_workers=4))

def test_shuffle_batch_workers(self):
    self._test_shuffle(DataLoader(self.dataset, batch_size=2, shuffle=True, num_workers=4))

For the following, it would be good to add a for prefetch_factor in [2, 4] outer loop

# [no auto-batching] multiprocessing loading
num_workers = 3
sizes_for_all_workers = [0, 4, 20]
expected = sorted(sum((list(range(s)) for s in sizes_for_all_workers), []))
assert len(sizes_for_all_workers) == num_workers, 'invalid test case'
dataset = WorkerSpecificIterableDataset(sizes_for_all_workers)
dataloader = DataLoader(dataset, num_workers=num_workers, batch_size=None,
                        worker_init_fn=set_faulthander_if_available)
dataloader_iter = iter(dataloader)
fetched = sorted(dataloader_iter)
for a, b in zip(fetched, expected):
    # non-batched should not convert ints into tensors
    self.assertIsInstance(a, torch._six.int_classes)
    self.assertEqual(a, b)
# DataLoader should match len of the iterable-style dataset (if implemented)
self.assertEqual(len(dataloader), len(dataset))
# When loading more than len(dataset) data, after accessing len(dataloader),
# we should get a warning. See NOTE [ IterableDataset and __len__ ].
dataset = CountingIterableDataset(20)
dataloader = DataLoader(dataset, num_workers=num_workers,
                        worker_init_fn=set_faulthander_if_available)
it = iter(dataloader)
for _ in range(40):
    self.assertNotWarn(lambda: next(it), "Should not warn before accessing len(dataloader)")
self.assertEqual(len(dataloader), len(dataset))
self.assertEqual(len(dataloader), 20)
it = iter(dataloader)
for _ in range(20):
    self.assertNotWarn(lambda: next(it), "Should not warn before exceeding length")
for _ in range(3):
    with self.assertWarnsRegex(
            UserWarning,
            r"but [0-9]+ samples have been fetched\. For multiprocessing data-loading, this",
            msg="Should always warn after exceeding length"):
        next(it)

# [auto-batching] multiprocessing loading
num_workers = 3
sizes_for_all_workers = [0, 4, 20]
expected = sorted(sum((list(range(s)) for s in sizes_for_all_workers), []))
assert len(sizes_for_all_workers) == num_workers, 'invalid test case'
dataset = WorkerSpecificIterableDataset(sizes_for_all_workers)
# worker 0 should return 0 batches
# worker 1 should return 1 batches
# worker 2 should return 3 batches
dataloader = DataLoader(dataset, num_workers=num_workers, batch_size=7)
dataloader_iter = iter(dataloader)
fetched = list(dataloader_iter)
self.assertEqual(len(fetched), 4)
fetched = set(tuple(t.tolist()) for t in fetched)
self.assertEqual(fetched, {tuple(range(4)), tuple(range(7)), tuple(range(7, 14)), tuple(range(14, 20))})
# [auto-batching] test that workers exit gracefully
workers = dataloader_iter._workers
del dataloader_iter
try:
    for w in workers:
        w.join(JOIN_TIMEOUT)
        self.assertFalse(w.is_alive())
        self.assertEqual(w.exitcode, 0)
finally:
    for w in workers:
        w.terminate()

# [auto-batching & drop_last] multiprocessing loading
num_workers = 3
sizes_for_all_workers = [0, 4, 20]
expected = sorted(sum((list(range(s)) for s in sizes_for_all_workers), []))
assert len(sizes_for_all_workers) == num_workers, 'invalid test case'
dataset = WorkerSpecificIterableDataset(sizes_for_all_workers)
# worker 0 should return 0 batches
# worker 1 should return 1 batches
# worker 2 should return 3 batches
dataloader = DataLoader(dataset, num_workers=num_workers, batch_size=7, drop_last=True,
                        worker_init_fn=set_faulthander_if_available)
dataloader_iter = iter(dataloader)
fetched = list(dataloader_iter)
self.assertEqual(len(fetched), 2)
fetched = set(tuple(t.tolist()) for t in fetched)
self.assertEqual(fetched, {tuple(range(7)), tuple(range(7, 14))})
# [auto-batching & drop_last] test that workers exit gracefully
workers = dataloader_iter._workers
del dataloader_iter
try:
    for w in workers:
        w.join(JOIN_TIMEOUT)
        self.assertFalse(w.is_alive())
        self.assertEqual(w.exitcode, 0)
finally:
    for w in workers:
        w.terminate()
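Per the suggestion above, wrapping those blocks in an outer loop might look like the following sketch (abbreviated; only the first block is shown, indented one level under the loop):

for prefetch_factor in [2, 4]:
    # [no auto-batching] multiprocessing loading
    num_workers = 3
    sizes_for_all_workers = [0, 4, 20]
    expected = sorted(sum((list(range(s)) for s in sizes_for_all_workers), []))
    dataset = WorkerSpecificIterableDataset(sizes_for_all_workers)
    dataloader = DataLoader(dataset, num_workers=num_workers, batch_size=None,
                            prefetch_factor=prefetch_factor,
                            worker_init_fn=set_faulthander_if_available)
    fetched = sorted(iter(dataloader))
    for a, b in zip(fetched, expected):
        self.assertEqual(a, b)
    # ... the auto-batching and drop_last blocks above are wrapped the
    # same way, passing prefetch_factor=prefetch_factor to DataLoader ...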

@ssnl (Collaborator) commented Jul 14, 2020

For submodule change, you can see it if you click on the "Files Changed" tab on this webpage.

@yl-to force-pushed the prefetch_factor branch 2 times, most recently from e8d0f6c to 0254129 on July 20, 2020 10:07
@yl-to (Contributor, Author) commented Jul 20, 2020

@ssnl Hi Tongzhou, tests are added and the branch is rebased. Please review when you have time. Thanks!

@ssnl (Collaborator) commented Jul 21, 2020

One minor thing! Looks great otherwise. Thanks!

@ssnl (Collaborator) left a comment

Thanks a lot! Waiting on CI.

@yl-to (Contributor, Author) commented Jul 22, 2020

> Thanks a lot! Waiting on CI.

It seems CI has passed; please land this when you have time, thanks! @ssnl

@yl-to (Contributor, Author) commented Jul 23, 2020

> Thanks a lot! Waiting on CI.

@ssnl Hi Tongzhou, sorry to bother you, but is there any signal or notification once the commit is merged? I don't know how to track the merging status.

Thanks!

@albanD (Collaborator) left a comment

Thanks!
My bad, I missed the notification from Simon's approval.

@facebook-github-bot (Contributor) left a comment

@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented
@albanD merged this pull request in 1b55e2b.

Labels: Merged, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Projects: none yet
Development: successfully merging this pull request may close this issue: "Parameterize _MultiProcessingDataLoaderIter's multiplier of prefetched items"
7 participants