[tensor] refactor chunk mgr and impl MemStatsCollectorV2 #1077

Merged on Jun 9, 2022 (10 commits)

@ver217 (Member) commented Jun 7, 2022

Refactor the chunk manager to make memory usage easier to monitor.

@@ -236,10 +235,9 @@ def access_chunk(self, tensor: torch.Tensor) -> None:
         self.accessed_chunks.add(chunk)
         self.total_mem[chunk.device_type] += chunk.mem
 
-    def release_chunk(self, tensor: torch.Tensor) -> None:
+    def release_chunk(self, chunk: Chunk) -> None:
Contributor

Will this API change affect our old code?

Member Author

I have updated all the relevant code.

Contributor

You should update version.txt in this PR and publish a new release.

Member Author

The chunk manager's methods are not meant to be called by users directly, so I don't think user code will be affected by this PR.
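For anyone who does call the manager directly, here is a minimal sketch of how a call site might migrate to the chunk-based signature. `ToyChunk`, `ToyChunkManager`, and `register` are hypothetical stand-ins, not the project's classes. Only the names `get_chunk`, `access_chunk`, `release_chunk`, `accessed_chunks`, and `total_mem` appear in the diff, and the accounting done on release is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyChunk:
    """Hypothetical stand-in for a chunk: an id, a device type, a size in bytes."""
    chunk_id: int
    device_type: str
    mem: int


class ToyChunkManager:
    """Toy manager illustrating a release_chunk that takes a chunk, not a tensor."""

    def __init__(self) -> None:
        self.tensor_chunk_map = {}          # id(tensor) -> ToyChunk
        self.accessed_chunks = set()
        self.total_mem = defaultdict(int)   # device type -> bytes in use

    def register(self, tensor, chunk: ToyChunk) -> None:
        # Hypothetical helper: record which chunk a tensor lives in.
        self.tensor_chunk_map[id(tensor)] = chunk

    def get_chunk(self, tensor) -> ToyChunk:
        return self.tensor_chunk_map[id(tensor)]

    def access_chunk(self, chunk: ToyChunk) -> None:
        if chunk in self.accessed_chunks:
            return
        self.accessed_chunks.add(chunk)
        self.total_mem[chunk.device_type] += chunk.mem

    def release_chunk(self, chunk: ToyChunk) -> None:
        # Assumed behavior: undo the bookkeeping done in access_chunk.
        if chunk not in self.accessed_chunks:
            return
        self.accessed_chunks.remove(chunk)
        self.total_mem[chunk.device_type] -= chunk.mem


mgr = ToyChunkManager()
tensor = object()                            # placeholder for a torch.Tensor
mgr.register(tensor, ToyChunk(0, "cuda", 1024))

chunk = mgr.get_chunk(tensor)                # look the chunk up once...
mgr.access_chunk(chunk)
mem_while_accessed = mgr.total_mem["cuda"]
mgr.release_chunk(chunk)                     # ...then pass the chunk itself
```

A caller that used to pass a tensor would resolve it with `get_chunk` first, which also avoids repeating the lookup when several tensors share one chunk.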

    def get_chunks(self, tensors: Iterable[torch.Tensor]) -> FrozenSet[Chunk]:
        return frozenset([self.get_chunk(tensor) for tensor in tensors])

    def add_extern_static_tensor(self, tensor: torch.Tensor) -> None:
Contributor

I think you need to make sure the static tensor cannot be registered as a chunk-managed tensor later.

Member Author

Yes. Done.
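The kind of guard being discussed could look roughly like the sketch below. This is not the code merged in the PR: `ToyStaticAwareManager`, `register_managed_tensor`, the `nbytes` parameter, and the choice to raise `ValueError` are all assumptions made for illustration. Only the method name `add_extern_static_tensor` comes from the diff.

```python
class ToyStaticAwareManager:
    """Hypothetical manager that keeps static and chunk-managed tensors disjoint."""

    def __init__(self) -> None:
        # Tensors are tracked by id(); callers must keep them alive meanwhile.
        self.managed_ids = set()
        self.static_ids = set()
        self.total_static_mem = 0

    def add_extern_static_tensor(self, tensor, nbytes: int) -> None:
        if id(tensor) in self.managed_ids:
            raise ValueError("tensor is already chunk-managed")
        if id(tensor) in self.static_ids:
            return  # already counted, so do not add its size twice
        self.static_ids.add(id(tensor))
        self.total_static_mem += nbytes

    def register_managed_tensor(self, tensor) -> None:
        # The check the reviewer asks for: refuse tensors registered as static.
        if id(tensor) in self.static_ids:
            raise ValueError("tensor was registered as an external static tensor")
        self.managed_ids.add(id(tensor))


manager = ToyStaticAwareManager()
static_tensor = object()                     # placeholder for a torch.Tensor
manager.add_extern_static_tensor(static_tensor, nbytes=4096)

try:
    manager.register_managed_tensor(static_tensor)
    rejected = False
except ValueError:
    rejected = True
```

Checking in both directions keeps the two sets disjoint regardless of registration order, so a tensor's memory is never counted both as static and as part of a chunk.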

@ver217 ver217 merged commit be01db3 into hpcaitech:main Jun 9, 2022
@ver217 ver217 deleted the feature/gemini branch June 9, 2022 12:56

2 participants