
[doc] added documentation to chunk and chunk manager #1094

Merged
FrankLeeeee merged 4 commits into hpcaitech:main from FrankLeeeee:doc/chunk on Jun 10, 2022

Conversation

@FrankLeeeee
Contributor

While reading through the source of chunk.py, I found that it lacks documentation, so I added it for better readability.

@FrankLeeeee
Contributor Author

By the way, there are some APIs whose semantics I found misleading:

  1. Shouldn't chunk.is_free actually be is_empty?
  2. chunk.access looks more like chunk.synchronize to me.

@ver217

@ver217
Member

ver217 commented Jun 10, 2022

> By the way, there are some APIs whose semantics I found misleading:
>
>   1. Shouldn't chunk.is_free actually be is_empty?
>   2. chunk.access looks more like chunk.synchronize to me.

  1. Yes.
  2. access includes moving the data to CUDA (and broadcasting it). The broadcast is not always needed, since ZeRO may not be applied.

@FrankLeeeee
Contributor Author

FrankLeeeee commented Jun 10, 2022

According to the code below, broadcast is always called when chunk.access() is invoked.

def access(self) -> None:
    """
    Broadcast the chunk to synchronize the tensors across data parallel processes.
    """
    # recover the chunk on non-owner processes
    # and broadcast the chunk from the source to all processes
    if not self.is_src_rank:
        self.data.storage().resize_(self.size)
    self.data.data = self.data.to(get_current_device())
    dist.broadcast(self.data, self.global_src_rank, group=gpc.get_group(ParallelMode.DATA))

    # update tensor meta info
    self._update_tensors_ptr()
    if not self.is_src_rank:
        self._update_tensors_state(TensorState.HOLD, prev_state=TensorState.FREE)

@ver217
Member

ver217 commented Jun 10, 2022

> According to the code below, broadcast is always called when chunk.access() is invoked.
>
> def access(self) -> None:
>     """
>     Broadcast the chunk to synchronize the tensors across data parallel processes.
>     """
>     # recover the chunk on non-owner processes
>     # and broadcast the chunk from the source to all processes
>     if not self.is_src_rank:
>         self.data.storage().resize_(self.size)
>     self.data.data = self.data.to(get_current_device())
>     dist.broadcast(self.data, self.global_src_rank, group=gpc.get_group(ParallelMode.DATA))
>
>     # update tensor meta info
>     self._update_tensors_ptr()
>     if not self.is_src_rank:
>         self._update_tensors_state(TensorState.HOLD, prev_state=TensorState.FREE)

Oh, yes. But chunk_manager.access_chunk() doesn't always broadcast.
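For illustration, a minimal sketch of the manager-level guard being described here, using hypothetical names (enable_distributed_storage, accessed_chunks, chunk.move_device) rather than the actual ChunkManager code:

def access_chunk(self, chunk: Chunk) -> None:
    # Hypothetical sketch, not the actual ChunkManager implementation.
    if chunk in self.accessed_chunks:
        # the chunk is already materialized on this process
        return
    if self.enable_distributed_storage:
        # the chunk is sharded across data parallel ranks (ZeRO):
        # recover storage on non-owner ranks and broadcast from the source
        chunk.access()
    else:
        # every rank already holds a full copy, so no collective is needed;
        # just make sure the chunk lives on the current device
        chunk.move_device(get_current_device())
    self.accessed_chunks.add(chunk)

Under this assumption, chunk.access() (and its broadcast) only runs when chunk storage is distributed, which matches the point that the broadcast is not always needed.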

@FrankLeeeee
Contributor Author

FrankLeeeee commented Jun 10, 2022

@ver217 I have replaced is_free with is_empty. As for the access API, it is not a big deal, but perhaps you can give it a clearer name: access suggests that the caller gets some value in return and does not convey what this API actually does.

@ver217
Member

ver217 commented Jun 10, 2022

> @ver217 I have replaced is_free with is_empty. As for the access API, it is not a big deal, but perhaps you can give it a clearer name: access suggests that the caller gets some value in return and does not convey what this API actually does.

OK

@FrankLeeeee merged commit cb18922 into hpcaitech:main on Jun 10, 2022
@FrankLeeeee deleted the doc/chunk branch on June 13, 2022