
Conversation

@albanD (Collaborator) commented Aug 17, 2022

Can lead to CPU memory saving as we don't hold onto the pin memory buffer as long as we used to.

@facebook-github-bot (Contributor) commented Aug 17, 2022


✅ No Failures (0 Pending)

As of commit 93cd751 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.



@NivekT changed the title from "Move loop content into a function to ensure we don't preserve anything" to "[DataLoader] Move loop content into a function to ensure we don't preserve anything" on Aug 17, 2022
@NivekT added the labels release notes: dataloader, topic: improvements, and topic: performance on Aug 17, 2022
@NivekT (Contributor) left a comment

I think this is faster (not sure by how much) and should not cause any issues. I want to benchmark this to see if there is any noticeable difference, but I haven't gotten to it yet.

@VitalyFedyunin @ejguan, can you confirm this doesn't cause any problems?

Code under review (excerpt):

                break
            except queue.Full:
                continue
        del r  # save memory
Contributor:

I feel like it would be simpler if we just did `r = None` at the end if we want to move the Tensor out of scope. WDYT?

But the original problem, that the GIL is blocked while gc cleans up the Tensor, still persists.

Collaborator (Author):

> I feel like it would be simpler if we just did `r = None` at the end if we want to move the Tensor out of scope. WDYT?

The problem is that doing `r = None` here only clears the tuple itself; `data` is still a local variable that remains alive, so this line does pretty much nothing today.
We could do `del r, data` to solve this, but moving the body into a function makes it more future-proof, as all local state will be properly removed.

> But the original problem, that the GIL is blocked while gc cleans up the Tensor, still persists.

Not sure how this is linked?
If you're talking about the expensive unmap that happens on these objects, #83623 should fix that.
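For illustration, a minimal sketch of the two shapes being discussed. This is not the actual torch.utils.data pin-memory worker: `in_queue`, `out_queue`, `pin_fn`, `done_event`, `do_one_step`, and `TIMEOUT` are placeholder names, and the real loop has more error handling.

```python
# Simplified sketch only; placeholder names, not the real _pin_memory_loop.
import queue

TIMEOUT = 5.0  # placeholder polling timeout


def worker_loop_style(in_queue, out_queue, pin_fn, done_event):
    """Old shape: every name lives directly in the loop body."""
    while not done_event.is_set():
        try:
            r = in_queue.get(timeout=TIMEOUT)
        except queue.Empty:
            continue
        idx, data = r
        data = pin_fn(data)
        r = (idx, data)
        while not done_event.is_set():
            try:
                out_queue.put(r, timeout=TIMEOUT)
                break
            except queue.Full:
                continue
        del r  # drops the tuple only; `data` still references the pinned buffer
               # until the next iteration rebinds it


def worker_function_style(in_queue, out_queue, pin_fn, done_event):
    """New shape: the loop body is a function, so every local dies on return."""
    while not done_event.is_set():
        # Defined inside the loop at this point of the discussion;
        # its placement is what the next comments are about.
        def do_one_step():
            try:
                r = in_queue.get(timeout=TIMEOUT)
            except queue.Empty:
                return
            idx, data = r
            data = pin_fn(data)
            while not done_event.is_set():
                try:
                    out_queue.put((idx, data), timeout=TIMEOUT)
                    break
                except queue.Full:
                    continue
            # No `del` needed: r, idx and data all disappear when we return.

        do_one_step()
```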

Contributor:

> If you're talking about the expensive unmap that happens on these objects, #83623 should fix that.

Thanks for pointing it out. It looks good.

> The problem is that doing `r = None` here only clears the tuple itself; `data` is still a local variable that remains alive, so this line does pretty much nothing today. We could do `del r, data` to solve this, but moving the body into a function makes it more future-proof, as all local state will be properly removed.

Oh, yeah, the Tensor is also referenced by `data`. I agree with the idea of making it future-proof. And do you mind moving this function out of `while not done_event.is_set():`? Otherwise, I believe each iteration would create a new function object.

Collaborator (Author):

Yes, I left it there on purpose, but without any strong reason.
Re-creating it on every iteration means that no captured variable is kept alive by the function between iterations.

But that shouldn't happen here indeed. Moving it out!
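For illustration, a small sketch of the placement question (again with placeholder names, not the actual PyTorch code): defining the helper once before the loop avoids creating a new function object every iteration, and since the helper closes over only long-lived names, keeping the same function object around holds nothing extra alive.

```python
# Simplified sketch only; placeholder names, not the real _pin_memory_loop.
import queue

TIMEOUT = 5.0  # placeholder polling timeout


def worker(in_queue, out_queue, pin_fn, done_event):
    # Defined once, before the loop: no fresh function object per iteration.
    # The helper closes over long-lived names only (the queues, pin_fn,
    # done_event), so keeping it alive for the whole loop pins nothing extra;
    # per-item state (r, idx, data) is still dropped every time it returns.
    def do_one_step():
        try:
            r = in_queue.get(timeout=TIMEOUT)
        except queue.Empty:
            return
        idx, data = r
        out_queue.put((idx, pin_fn(data)))

    while not done_event.is_set():
        do_one_step()
```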

@ejguan (Contributor) left a comment

Thank you! I think this PR should be good to go after moving the function out of the while loop.

@albanD force-pushed the pin_memory_dataloader_nit branch from 0f0dde1 to 93cd751 on August 18, 2022 15:44
@NivekT (Contributor) left a comment

Thanks albanD and Erjia 🚀

@albanD (Collaborator, Author) commented Aug 18, 2022

@pytorchbot merge -g

@pytorchmergebot (Collaborator):
@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the green (-g) flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

facebook-github-bot pushed a commit that referenced this pull request on Aug 19, 2022:
[DataLoader] Move loop content into a function to ensure we don't preserve anything (#83595) (#83595)

Summary:
Can lead to CPU memory saving as we don't hold onto the pin memory buffer as long as we used to.

Pull Request resolved: #83595
Approved by: https://github.com/ejguan, https://github.com/NivekT

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/38348362608a47371c65d7fd52db138b4c6a5d65

Reviewed By: atalman

Differential Revision: D38852489

Pulled By: albanD

fbshipit-source-id: 13021b949a34b64ffb9835992b77ef82fcbb0e85