Implement Partial Memory Flush #22

Open
osanj opened this issue Jan 7, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@osanj
Owner

osanj commented Jan 7, 2020

As soon as a cache is dirty and a related shader is run again, the entire buffer is flushed. This can become quite expensive for large buffers and for CPU-GPU communication. The Vulkan specification allows partial copies using buffer regions: https://www.khronos.org/registry/vulkan/specs/1.1-extensions/man/html/vkCmdCopyBuffer.html

Currently, only a single region (the entire buffer) is provided:

lava/lava/api/pipeline.py

Lines 201 to 202 in e82f6d3

region = vk.VkBufferCopy(0, 0, src_buffer.get_size())
vk.vkCmdCopyBuffer(self.command_buffer_handle, src_buffer.handle, dst_buffer.handle, 1, [region])

The specific use case that should benefit from this is partial invalidation of large buffers. Ideally the regions can be computed agnostically, e.g. by comparing the current bytes with the new bytes.
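As a rough sketch of what passing several regions could look like: the helper below turns a list of dirty `(offset, size)` ranges into copy descriptors that use the same offset in source and destination. `BufferCopy` and `build_copy_regions` are hypothetical names used only for illustration; `BufferCopy` stands in for `vk.VkBufferCopy` so that the snippet runs without a Vulkan binding.

```python
from collections import namedtuple

# Hypothetical stand-in for vk.VkBufferCopy(srcOffset, dstOffset, size),
# used here so the sketch runs without a Vulkan binding installed.
BufferCopy = namedtuple("BufferCopy", ["srcOffset", "dstOffset", "size"])


def build_copy_regions(dirty_ranges, buffer_size):
    """Convert (offset, size) byte ranges into copy regions.

    Source and destination offsets are identical, because the staging
    buffer and the device buffer are assumed to share the same layout.
    """
    regions = []
    for offset, size in sorted(dirty_ranges):
        if size <= 0:
            raise ValueError("region size must be positive")
        if offset < 0 or offset + size > buffer_size:
            raise ValueError("region exceeds the buffer bounds")
        regions.append(BufferCopy(offset, offset, size))
    return regions


regions = build_copy_regions([(256, 64), (0, 16)], buffer_size=1024)

# With real vk.VkBufferCopy objects, the existing call would presumably become:
# vk.vkCmdCopyBuffer(self.command_buffer_handle, src_buffer.handle,
#                    dst_buffer.handle, len(regions), regions)
```

The bounds checks mirror what the existing single-region call implicitly guarantees, namely that each region has a non-zero size and stays inside the buffer.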

Tasks:

  • experiment with the regions
  • implement more fine-grained "dirty byte detection" (already in ByteCache, except scalar arrays)
  • allow for manual dirty setting?
  • make configurable?
  • figure out whether mapping host memory to the CPU buffer is a bottleneck as well, or whether mapping the entire memory can be kept
    self.vulkan_buffer.map(bytez)
  • write tests
@osanj osanj added the enhancement New feature or request label Jan 7, 2020
@osanj osanj modified the milestones: 0.4.0, 0.5.0 Jan 7, 2020
@osanj
Owner Author

osanj commented Jan 19, 2020

In the first implementation, the new buffer bytes are compared with the existing ones, then a mask is created and converted into regions (offset & size).
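The compare, mask, and convert steps can be sketched roughly as follows. This is not the code from the repository; `dirty_regions` is a hypothetical helper that assumes both byte strings have the same length.

```python
import numpy as np


def dirty_regions(old_bytes, new_bytes):
    """Return (offset, size) pairs covering every run of changed bytes."""
    old = np.frombuffer(old_bytes, dtype=np.uint8)
    new = np.frombuffer(new_bytes, dtype=np.uint8)
    if old.size != new.size:
        raise ValueError("buffers must have the same length")

    # True wherever a byte differs between the old and new contents
    mask = old != new

    # Pad with False on both sides so every run has a start and an end edge
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])

    # Edges alternate between run start (inclusive) and run end (exclusive)
    starts, ends = edges[0::2], edges[1::2]
    return [(int(s), int(e - s)) for s, e in zip(starts, ends)]


old = bytes(8)
new = bytearray(old)
new[1] = 1
new[2] = 1
new[6] = 1
regions = dirty_regions(old, bytes(new))  # [(1, 2), (6, 1)]
```

Adjacent changed bytes collapse into one region, while any unchanged byte between two changes splits them into separate regions.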

In the associated test, floats are changed from 1 to 2. On the byte level this only changes 2 out of 4 bytes per float, resulting in a lot of small regions being copied:

lava/test/buffer.py

Lines 262 to 267 in e110d18

buffer_in["arrayIn"] = np.ones(length, dtype=np.float32)
stage.run_and_wait()
np.testing.assert_equal(buffer_in["arrayIn"].unwrap(), buffer_out["arrayOut"].unwrap())
buffer_in["arrayIn"][length // 2:] = 2
stage.run_and_wait()
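The byte-level effect can be checked in isolation. The sketch below uses a small `length` and an explicit little-endian `float32` dtype (both assumptions for illustration) and counts how many bytes change per float and how many separate byte runs result.

```python
import numpy as np

length = 8
old = np.ones(length, dtype="<f4")  # little-endian float32
new = old.copy()
new[length // 2:] = 2

# View each float as its 4 raw bytes: 1.0 is 0x3F800000, 2.0 is 0x40000000
old_bytes = old.view(np.uint8).reshape(length, 4)
new_bytes = new.view(np.uint8).reshape(length, 4)
changed = old_bytes != new_bytes

# Number of differing bytes for each float
changed_per_float = changed.sum(axis=1).tolist()

# Count runs of consecutive changed bytes across the flat buffer
flat = changed.ravel()
num_regions = int(np.count_nonzero(flat[1:] & ~flat[:-1])) + int(flat[0])
```

Because the two unchanged bytes of each float separate the changed ones, every modified float ends up as its own 2-byte region, so the region count grows linearly with the number of modified floats.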

Some other tests showed that this leads to increased rather than reduced upload duration... :rage1:

@osanj osanj modified the milestones: 0.4.0, 0.5.0 Jun 29, 2020