[DataLoader] Move loop content into a function to ensure we don't preserve anything #83595
Conversation
Dr. CI: ✅ No failures (0 pending) as of commit 93cd751. 💚 Looks good so far! This comment was automatically generated by Dr. CI.
I think this is faster (not sure by how much) and should not cause any issues. I want to benchmark this to see if there is any noticeable difference, but haven't gotten to it yet.
@VitalyFedyunin @ejguan, can you confirm this doesn't cause any problems?
```python
            break
        except queue.Full:
            continue
    del r  # save memory
```
I feel like it would be simpler if we just do `r = None` at the end if we want to move the Tensor out of scope. WDYT?
But the original problem, that the GIL is blocked when gc tries to clean up the Tensor, still persists.
> I feel like it would be simpler if we just do `r = None` at the end if we want to move the Tensor out of scope. WDYT?

The problem is that doing `r = None` here only clears the tuple itself; `data` is still a local variable that remains alive. So this line does pretty much nothing today. We could do `del r, data` to solve this, but moving it into a function makes it more future-proof, as all local state will be properly removed.

> But the original problem, that the GIL is blocked when gc tries to clean up the Tensor, still persists.

Not sure how this is linked? If you're talking about the expensive unmap that happens on these objects, #83623 should fix that.
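The point that `del r` alone leaves the payload alive through `data` can be demonstrated with `weakref` (an illustrative sketch on CPython, where refcounting frees objects as soon as the last reference is gone; `Payload` stands in for a Tensor):

```python
import weakref

class Payload:
    """Stands in for a large Tensor."""

def del_tuple_only():
    data = Payload()
    ref = weakref.ref(data)
    r = (0, data)
    del r                     # drops the tuple, but `data` still holds the payload
    return ref() is not None  # True: payload survives

def via_helper_function():
    holder = {}
    def body():
        data = Payload()
        holder["ref"] = weakref.ref(data)
        r = (0, data)         # no explicit cleanup needed
    body()                    # all locals dropped when the function returns
    return holder["ref"]() is None  # True: payload was collected
```

Deleting only the tuple keeps the object reachable via `data`, while returning from a helper function releases every local at once.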
> If you're talking about the expensive unmap that happens on these objects, #83623 should fix that.

Thanks for pointing it out. It looks good.

> The problem is that doing `r = None` here only clears the tuple itself; `data` is still a local variable that remains alive. So this line does pretty much nothing today. We could do `del r, data` to solve this, but moving it into a function makes it more future-proof, as all local state will be properly removed.

Oh, yeah. The Tensor is also referenced by `data`. I agree on the idea of future-proofing. And, do you mind moving this function out of `while not done_event.is_set():`? Otherwise, I believe each iteration would create a new function object.
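The recreation cost being pointed out comes from executing the `def` statement on every pass through the loop, which allocates a fresh function object each time (illustrative sketch; the function names are made up):

```python
def loop_with_inner_def(n):
    total = 0
    for i in range(n):
        def body(x):   # a new function object is built on every iteration
            return x + 1
        total += body(i)
    return total

def _body(x):          # built once at module load, reused by the loop
    return x + 1

def loop_with_outer_def(n):
    total = 0
    for i in range(n):
        total += _body(i)
    return total
```

Both compute the same result; `timeit` would show the inner-def version paying a small per-iteration allocation cost on top of the call itself.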
Yes, I left it there on purpose, but without any strong reason: re-creating it each iteration means there are no captured variables kept alive by the function. But that shouldn't happen here indeed. Moving it out!
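The captured-variable concern mentioned above: a function keeps whatever it closes over alive for as long as the function object itself exists (illustrative sketch, not DataLoader code):

```python
import weakref

class Big:
    """Stands in for a large buffer."""

def make_closure():
    big = Big()
    ref = weakref.ref(big)
    def fn():
        # `big` lives in a closure cell, kept alive as long as `fn` exists
        return big
    return fn, ref
```

On CPython, dropping the function releases the closure cell and, with it, the captured object; re-creating the function each iteration would release the previous iteration's captures the same way.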
Thank you! I think this PR should be good to go after moving the function out of the while loop.
Force-pushed from 0f0dde1 to 93cd751.
Thanks albanD and Erjia 🚀
@pytorchbot merge -g
@pytorchbot successfully started a merge job. Check the current status here.
[DataLoader] Move loop content into a function to ensure we don't preserve anything (#83595)

Summary: Can lead to CPU memory saving as we don't hold onto the pin memory buffer as long as we used to.

Pull Request resolved: #83595
Approved by: https://github.com/ejguan, https://github.com/NivekT
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/38348362608a47371c65d7fd52db138b4c6a5d65
Reviewed By: atalman
Differential Revision: D38852489
Pulled By: albanD
fbshipit-source-id: 13021b949a34b64ffb9835992b77ef82fcbb0e85