WIP acceleration structure API proposal #3544

Draft
wants to merge 8 commits into
base: master

Contributor

@tangmi tangmi commented Dec 19, 2020

Initial API proposal for just the acceleration structure part of adding ray tracing support. Once the API design is set, we can implement (and smoke test) and then move on to adding ray tracing pipelines and ray query feature detection.

I have a lot of open questions in the comments, and I intend this draft to be the place to track discussion. @kvark: I intend to drive discussion on individual pieces at #gfx:matrix.org, using this PR as a reference, so don't worry about reviewing everything at once (I'll make sure we get through all of it)!

The main theme of nearly all my questions is how far we can deviate from VK_KHR_acceleration_structure to better fit gfx (and potentially gain some API safety). Additionally, my intention is to make an API that is implemented trivially in Vulkan and pretty easily in DX12. Metal is an afterthought (although it is being thought about!), but I don't expect to make any significant concessions in the API to accommodate MPS.

Here are some notes on implementation differences I expect between DX12 and Vulkan; how we resolve them may be worth discussing:

  • B::AccelerationStructure
    • VK: vk::AccelerationStructureKHR
    • DX: Probably a reference to an ID3D12Resource (a buffer), since there's no handle associated with an acceleration structure
  • Serialize/Deserialize
    • VK:
      • Get serialized size: ash::extensions::khr::AccelerationStructure::write_acceleration_structures_properties with ash::vk::QueryType::ACCELERATION_STRUCTURE_SERIALIZATION_SIZE_KHR (ACCELERATION_STRUCTURE_COMPACTED_SIZE_KHR reports the compacted size instead)
    • DX:
      • Get serialized size: ID3D12GraphicsCommandList4::EmitRaytracingAccelerationStructurePostbuildInfo (or BuildRaytracingAccelerationStructure), neither of which uses a QueryPool. We can likely modify dx12's B::QueryPool to special-case the acceleration structure queries.
  • Host operations
    • VK: vk::PhysicalDeviceAccelerationStructureFeaturesKHR::acceleration_structure_host_commands + methods on Device
    • DX: unsupported natively, could be emulated in gfx
  • Specifically laid out structs for GPU buffers:
    • Transform matrix: 3x4 row matrix that corresponds to mint::RowMatrix3x4. Metal uses a 4x4 matrix (matrix_float4x4), which I believe maps to mint::RowMatrix4, given this "Working with Matrices" doc.
    • AABB positions: Both APIs use a float3 min, then max to specify the AABB. Metal puts a bounding box on every acceleration structure and doesn't have an AABB-specific variant.
    • TLAS instance struct:
  • Building acceleration structures
    • Primitive counts
      • VK: Separates the geometry data and the primitive counts into VkAccelerationStructureGeometryKHR (descriptor time) and VkAccelerationStructureBuildRangeInfoKHR (build time), respectively. vkGetAccelerationStructureBuildSizesKHR takes a primitive count array that we can use for the DX case.
      • DX: Requires the primitive count at descriptor time when creating D3D12_RAYTRACING_GEOMETRY_TRIANGLES_DESC or D3D12_RAYTRACING_GEOMETRY_AABBS_DESC, and doesn't require any additional data at build time. We will just add the primitive count array values to the geometry info structs as we build them.
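
A minimal sketch of the "primitive counts" note above: folding a Vulkan-style parallel count array into DX12-style per-geometry descs. All type and function names here (`GeometryDesc`, `Dx12StyleGeometry`, `fold_primitive_counts`) are hypothetical stand-ins for illustration and are not taken from this PR or from either API.

```rust
/// Hypothetical geometry kinds; the real descs carry buffers, formats, etc.
#[derive(Debug, Clone, PartialEq)]
enum GeometryKind {
    Triangles,
    Aabbs,
}

/// Hypothetical backend-agnostic geometry description (no count inside).
#[derive(Debug, Clone, PartialEq)]
struct GeometryDesc {
    kind: GeometryKind,
}

/// Hypothetical stand-in for a DX12-style desc, which carries its count inline.
#[derive(Debug, PartialEq)]
struct Dx12StyleGeometry {
    kind: GeometryKind,
    primitive_count: u32,
}

/// Folds a parallel `primitive_counts` array into per-geometry descs.
/// Returns `None` if the two slices disagree in length.
fn fold_primitive_counts(
    geometries: &[GeometryDesc],
    primitive_counts: &[u32],
) -> Option<Vec<Dx12StyleGeometry>> {
    if geometries.len() != primitive_counts.len() {
        return None;
    }
    Some(
        geometries
            .iter()
            .zip(primitive_counts)
            .map(|(geometry, &primitive_count)| Dx12StyleGeometry {
                kind: geometry.kind.clone(),
                primitive_count,
            })
            .collect(),
    )
}
```

Keeping the counts separate in the gfx-level type matches the Vulkan shape, and the fold only has to happen inside a DX12-like backend.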

Related: #2418

// - write_acceleration_structures_properties

// TODO for checking if a serialized blob is valid with the current driver version. DX12 docs say this is for PIX tooling and building an AS from scratch is "likely to be faster than loading one from disk" (https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_raytracing_acceleration_structure_copy_mode). The Vulkan spec/literature doesn't mention perf implications of serialization.
// - get_device_acceleration_structure_compatibility
Contributor Author

This function operates basically the same in Vulkan (vkGetDeviceAccelerationStructureCompatibilityKHR) and DX (CheckDriverMatchingIdentifier). Both take a pointer to the serialized data and read two GUIDs (32 bytes) to see whether it matches the current driver (DX can signal slightly more info than VK).

On the comment in AccelerationStructureCopyMode: I wonder whether serializing an acceleration structure is something we need to expose as part of the API. I can't find any literature on why one would want to do this in Vulkan, but as the link mentioned above says, DX mainly uses serialization for PIX and not to save on build time. If this is also true for Vulkan (e.g. RenderDoc wants it or something), we may be able to leave this off the API and handle the debug tooling case as a backend implementation detail (i.e. I don't think people are writing debug tooling targeting the gfx API yet).
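
To illustrate the "two GUIDs (32 bytes)" point, here is a rough sketch of splitting the leading bytes of a serialized blob into the two identifiers that would be handed to the driver. It assumes the layout described in the comment above (two consecutive 16-byte identifiers); `VersionHeader` and `read_version_header` are hypothetical names, and the actual compatibility decision is always made by the driver, not by user code.

```rust
/// Size of one identifier in the header (16 bytes, as in `VK_UUID_SIZE`).
const UUID_SIZE: usize = 16;

/// The two identifiers assumed to sit at the start of a serialized blob.
#[derive(Debug, PartialEq, Eq)]
struct VersionHeader {
    driver_uuid: [u8; UUID_SIZE],
    compatibility_uuid: [u8; UUID_SIZE],
}

/// Reads the 32-byte version header, or returns `None` if the blob is too short.
fn read_version_header(blob: &[u8]) -> Option<VersionHeader> {
    let bytes = blob.get(..2 * UUID_SIZE)?;
    let mut driver_uuid = [0u8; UUID_SIZE];
    let mut compatibility_uuid = [0u8; UUID_SIZE];
    driver_uuid.copy_from_slice(&bytes[..UUID_SIZE]);
    compatibility_uuid.copy_from_slice(&bytes[UUID_SIZE..]);
    Some(VersionHeader {
        driver_uuid,
        compatibility_uuid,
    })
}
```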

Member

Wouldn't you want a ray-traced scene to load faster if the data like this is loaded from disk?
I'd be totally fine just postponing any serialization stuff if you don't consider it useful.

Contributor Author

Do you know of a vulkan resource where we can ask about the perf implications of serialization (or just general "best practices")? The DX docs (for D3D12_RAYTRACING_ACCELERATION_STRUCTURE_COPY_MODE_DESERIALIZE) seem to imply this isn't a perf win:

This mode is only intended for tools such as PIX, though nothing stops any app from using it, but this mode is only permitted when developer mode is enabled in the OS. This copy operation is not intended to be used for caching acceleration structures, because running a full acceleration structure build is likely to be faster than loading one from disk.

Member

Khronos Slack would be the best place to ask

}
}

// TODO AFAIK rust doesn't have custom sized fields, so we'll need some binary writer wrapper to actually support this at an API level.
Contributor Author

@tangmi tangmi Dec 19, 2020

Maybe packed_struct or Pre-RFC: Generic integers (uint and int) could be applicable here? I'm unsure about the likelihood of adding a library dependency or of relying on an unimplemented language feature proposal.

Member

gfx-hal is a low-level API. It's fine to cut some corners and just document what bits are expected within u32.
Extra dependencies are discouraged

Contributor Author

My concern here is that we probably want this struct to be bit-for-bit compatible with VkAccelerationStructureInstanceKHR and D3D12_RAYTRACING_INSTANCE_DESC so users can create an array of acceleration_structure::Instance and upload it directly to a GPU buffer. Otherwise, we need to settle for describing this carefully and letting users BYO packed struct impl... thoughts?

The Metal backend will probably have to do something special to repack into its own special format, however

Member

I did mean preserving the bit-for-bit layout if this needs to go into the driver. Instead of bitfields, we could just have raw unsigned integers with documented layouts; that's what I meant by cutting corners.
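
A minimal sketch of what "raw unsigned integers with documented layouts" could look like. The struct below is hypothetical (it is not the PR's `acceleration_structure::Instance`); it follows the field order of VkAccelerationStructureInstanceKHR, and the bit positions assume the usual little-endian bitfield packing where the 24-bit field occupies the low bits.

```rust
/// Hypothetical instance record intended to be uploaded to a GPU buffer as-is.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Instance {
    /// Row-major 3x4 transform.
    transform: [[f32; 4]; 3],
    /// Bits 0..24: custom index; bits 24..32: visibility mask.
    custom_index_and_mask: u32,
    /// Bits 0..24: shader binding table record offset; bits 24..32: flags.
    sbt_offset_and_flags: u32,
    /// Device address (or handle) of the referenced bottom-level structure.
    acceleration_structure_reference: u64,
}

const LOW_24_BITS: u32 = 0x00FF_FFFF;

/// Packs a 24-bit value and an 8-bit value into one `u32`.
/// Any bits of `low24` above bit 23 are discarded.
fn pack_24_8(low24: u32, high8: u8) -> u32 {
    (low24 & LOW_24_BITS) | (u32::from(high8) << 24)
}

/// Splits a packed `u32` back into its 24-bit and 8-bit parts.
fn unpack_24_8(packed: u32) -> (u32, u8) {
    (packed & LOW_24_BITS, (packed >> 24) as u8)
}
```

With `#[repr(C)]` this struct is 64 bytes, so a slice of them could in principle be copied straight into an instance buffer; a backend with a different native layout would still have to repack.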


#[derive(Debug)]
pub enum AccelerationStructureCopyMode {
Clone,
Contributor Author

bikeshed: "clone" is the vk terminology, but I think in Rust it implies a deep copy, which I don't believe is what the acceleration structure copy command does.

Member

Indeed, let's rename. Shallow? ShallowClone?

Contributor Author

Any strong opinions against just Copy? I do think simpler is probably better, but I don't think even Clone would confuse many people familiar with gfx/vulkan...

Member

No strong opinion, Copy is fine :)

@@ -604,6 +604,46 @@ pub trait CommandBuffer<B: Backend>: fmt::Debug + Any + Send + Sync {
/// Requests a timestamp to be written.
unsafe fn write_timestamp(&mut self, stage: pso::PipelineStage, query: query::Query<B>);

/// TODO docs
/// `build_range_infos` must be the same length as `infos` and each element must have a length equal to the parallel `info` entry's `geometries.len()`. there's probably a way to make this safer without mangling the api shape too much...
Contributor Author

@tangmi tangmi Dec 19, 2020

These (and build_acceleration_structures_indirect) are some spicy requirements, but the API is unsafe, so I guess it could be argued to be reasonable?

Member

If they need to be the same length, and we aren't passing the pointers to these arrays directly in any of the backends anyway, why not make this one array? I.e.

infos: &[(acceleration_structure::AccelerationStructureGeometryDesc<B>, &[acceleration_structure::AccelerationStructureBuildRangeDesc])],

Member

Also, we should probably make this an iterator, like in other commands (see wait_events)

Contributor Author

Any suggestions on the extra requirement (also for the indirect variant) of:

for (info, build_range_infos) in infos.iter().zip(build_range_infos.iter()) {
    assert_eq!(info.geometries.len(), build_range_infos.len());
}

Member

an assert_eq sounds ok, for now at least.
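
Putting the suggestions in this thread together (one iterator of pairs, plus an `assert_eq!` for the inner lengths), a rough sketch could look like the following. `GeometryDesc`, `BuildRangeDesc`, `BuildInfo` and `validate_build_inputs` are hypothetical stand-ins, not the PR's types or signatures.

```rust
/// Hypothetical stand-ins for the PR's descriptor types.
struct GeometryDesc;
struct BuildRangeDesc;

/// Hypothetical build info holding one entry per geometry.
struct BuildInfo<'a> {
    geometries: &'a [GeometryDesc],
}

/// Walks `(info, build ranges)` pairs and checks that every info has exactly
/// one build range per geometry. Returns the total number of ranges seen.
///
/// Pairing the two in a single iterator item removes the outer
/// "same length" requirement; only the inner one still needs a check.
fn validate_build_inputs<'a, I>(descs: I) -> usize
where
    I: Iterator<Item = (&'a BuildInfo<'a>, &'a [BuildRangeDesc])>,
{
    let mut total_ranges = 0;
    for (info, build_ranges) in descs {
        assert_eq!(
            info.geometries.len(),
            build_ranges.len(),
            "each geometry needs exactly one build range"
        );
        total_ranges += build_ranges.len();
    }
    total_ranges
}
```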

/// A description of the geometry data needed to populate an acceleration structure.
///
/// TODO: there's something here that smells w/ what fields are needed to get the required build size vs what fields are needed to actually build. Also, the top/bottom levels having different requirements on which fields are valid.
// TODO: I don't like that this is reused for the "get build sizes" and "do actual build" with complex rules on what fields are ignored when. Perhaps we could use DX12's model for `D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC` and `D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS` and move at least the src/dst fields onto the actual "build acceleration structure" command.?
Contributor Author

This is the struct I'm most interested in feedback on. As mentioned in the many TODO comments on and around it, this is a central type: most examples I've seen just keep a mutable instance of it around and update fields as each method needs them.

I think we can move the src/dst/scratch values onto build_acceleration_structures* or have another type that holds src/dst/scratch and an AccelerationStructureGeometryDesc.

Member

That sounds appropriate, let's try that!
I assume the Vulkan backend would be able to build this stuff internally with no issues if we separate it and follow DX12 style here?

Contributor Author

That should be correct.

Tangentially related: I'd wager we won't be able to reuse vulkan or dx structs across calls that effectively (i.e. the user will have a gfx *Desc, rather than a vulkan or dx *Desc struct), so I assume that we're okay with recreating these descriptor structs as needed?

Member

That's correct, yes. We'll be recreating them all the time.
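
One way to picture the split agreed on in this thread: a geometry description that is enough for the size query, wrapped by a build-only type that adds src/dst/scratch. This is only a sketch under assumed names; `GeometryDesc`, `BuildDesc`, the `u64` handle aliases and both helper functions are hypothetical, and `total_primitives` is merely a placeholder for a real size query.

```rust
/// Hypothetical level marker.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Level {
    Top,
    Bottom,
}

/// What to build. Sufficient on its own for a "get build sizes" style query.
struct GeometryDesc<'a> {
    level: Level,
    primitive_counts: &'a [u32],
}

/// Hypothetical handle types standing in for backend resources.
type AccelerationStructure = u64;
type BufferAddress = u64;

/// Where to build it. Only needed by the actual build command.
struct BuildDesc<'a> {
    geometry: &'a GeometryDesc<'a>,
    /// `Some` when updating an existing structure, `None` for a fresh build.
    src: Option<AccelerationStructure>,
    dst: AccelerationStructure,
    scratch: BufferAddress,
}

/// Placeholder for a size query: it can only see geometry, never src/dst.
fn total_primitives(geometry: &GeometryDesc) -> u64 {
    geometry.primitive_counts.iter().map(|&c| u64::from(c)).sum()
}

/// A build is an update when it names a source structure.
fn is_update(desc: &BuildDesc) -> bool {
    desc.src.is_some()
}
```

With this shape there are no "ignored when querying sizes" fields left on the shared type, which is the smell the TODO comments describe.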

TopLevel,
///
BottomLevel,
// TODO vulkan supports "generic level" (where the concrete build type is specified), but discourages its use for applications "written directly for Vulkan" since it "could affect capabilities or performance in the future" (https://www.khronos.org/blog/vulkan-ray-tracing-final-specification-release). Perhaps this is to better support `vkd3d-proton`, but we probably don't want it exposed in gfx?
Member

Technically, if there is a possibility of running vkd3d over gfx-portability, then we should try to make it possible.
For example, people could use it to run windows games on macOS.
In this case, we can make a call to not support other levels at least for now, just leaving a comment about this omission.

Contributor Author

I'll add this back. I wonder if there's a nicer way to handle Generic, as it's allowed in the "get sizes" and "create" functions, but not during the "build" commands.
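
One possible (purely illustrative) way to handle Generic being valid for "get sizes" and "create" but not for "build" is a second, narrower enum with a fallible conversion. The names `Level`, `BuildLevel` and the error message are assumptions for this sketch, not part of the PR.

```rust
use std::convert::TryFrom;

/// Level accepted by the hypothetical "get sizes" and "create" entry points.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    TopLevel,
    BottomLevel,
    Generic,
}

/// Level accepted by the hypothetical "build" commands: no `Generic` here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuildLevel {
    TopLevel,
    BottomLevel,
}

impl TryFrom<Level> for BuildLevel {
    type Error = &'static str;

    /// Narrows a creation-time level to a build-time level.
    fn try_from(level: Level) -> Result<Self, Self::Error> {
        match level {
            Level::TopLevel => Ok(BuildLevel::TopLevel),
            Level::BottomLevel => Ok(BuildLevel::BottomLevel),
            Level::Generic => Err("a concrete level must be chosen at build time"),
        }
    }
}
```

The trade-off is an extra type in the API in exchange for making the invalid combination unrepresentable at the build call.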

Member

@kvark kvark left a comment

looks like github is eating the comments :(

@@ -2829,6 +2829,75 @@ impl com::CommandBuffer<Backend> for CommandBuffer {
);
}

unsafe fn build_acceleration_structures<'a, I>(&self, _descs: I)
Member

are you sure you don't want to just make this a default implementation in the trait, to avoid all the boilerplate in backends that don't support it?

Contributor Author

I've undone the backend stubs for now and will make a default "unimplemented" implementation in gfx-hal.
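
A minimal sketch of the default "unimplemented" approach discussed here, using a simplified hypothetical trait rather than the real gfx-hal `CommandBuffer<B>` (which is generic over the backend and takes an iterator). Backends that do not support acceleration structures simply do not override the method.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for the build descriptor type.
struct BuildDesc;

/// Simplified stand-in for the hal command buffer trait.
trait CommandBuffer: Debug {
    /// Default body: backends without support need no boilerplate stub.
    unsafe fn build_acceleration_structures(&mut self, _descs: &[BuildDesc]) {
        unimplemented!("acceleration structures are not supported by this backend")
    }
}

/// A backend that relies on the default implementation.
#[derive(Debug)]
struct NoRayTracingCommandBuffer;

impl CommandBuffer for NoRayTracingCommandBuffer {}

/// A backend that overrides the default; here it only counts recorded builds.
#[derive(Debug, Default)]
struct RecordingCommandBuffer {
    recorded_builds: usize,
}

impl CommandBuffer for RecordingCommandBuffer {
    unsafe fn build_acceleration_structures(&mut self, descs: &[BuildDesc]) {
        self.recorded_builds += descs.len();
    }
}
```

Calling the method on a backend that keeps the default panics at runtime, so callers would still be expected to check feature support first.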

Member

kvark commented Jan 6, 2021

Merge remote-tracking branch 'upstream/master' into acceleration-stru…

Please try to rebase instead of merging, to keep history linear

Contributor Author

tangmi commented Jan 7, 2021

Merge remote-tracking branch 'upstream/master' into acceleration-stru…

Please try to rebase instead of merging, to keep history linear

I'll do this when I clean up this draft, please call me out if I forget then!

Contributor Author

tangmi commented Jan 26, 2021

@kvark the changes in hal are ready for review. Please note there's no rush since the backends depend on new releases of upstream deps and the usefulness of this API depends on ray-tracing-pipelines or ray-query being also defined and implemented.

Some things on my mind:

  • I'm wondering if it's worth it to commit the API first, then the implementations or hold off until we have at least the Vulkan backend (which requires a new ash release >=0.32).
    • Note that I'm also not confident in the quality of what I've hacked together here for Vulkan.
  • I will turn the leftover TODO comments into separate issues (i.e. work for host operations, capture replay)
  • I'll definitely clean up the history (rebase) before the API is merged.
  • I'm working on the ray-tracing-pipelines part of the API in a branch: https://github.com/tangmi/gfx/tree/ray-tracing-api
    • I don't have hardware to test ray-query, but I imagine there's very little work in hal to support it.

Member

kvark commented Jan 26, 2021

Thank you for moving this forward!

First of all, we've been trying to wrap up hal-0.7 release for the last month. I'd like to do this before merging the changes here. Also note how a lot of type signatures are simplified now: Borrow, ExactSizeIterator, and IntoIterator are all gone! If you haven't already, please consider updating this PR to reflect that change with the new API.

Please note there's no rush since the backends depend on new releases of upstream deps

We are totally cool with depending on github revisions from master. We only switch to releases when necessary, i.e. during a release. So this is not a blocker to merging the PR.

Member

kvark commented Apr 30, 2021

@tangmi what's the status of this work?

Contributor Author

tangmi commented May 2, 2021

@tangmi what's the status of this work?

The current state is that I have an API shape I'm pretty happy with and am in the middle of fixing some issues in the Vulkan impl (I think the last thing I was working on was overthinking mapping lists-of-lists of structs such that all the lifetimes agree).

Unfortunately, life and work have taken me away from this for the last couple of months. Also, I (and everyone else) have had a really hard time getting hold of a GPU that can test ray query, although if ray tracing pipelines work, it's very likely that ray query will also work. I'm definitely still eager to get back into gfx-rs.

On the plus side, I believe ash has since published a release, so I think we don't need to point at some random revision anymore!

Member

kvark commented May 3, 2021

@tangmi it would be good if gfx-rs organization created a "hardware fund" for cases like this...

Contributor Author

tangmi commented May 4, 2021

@kvark I'd happily contribute to a fund like that--sadly, I just can't find any ray tracing cards for sale at all this year!

(I know there's scalpers around but I feel icky supporting that!)

Member

kvark commented May 4, 2021

@tangmi now with Vulkan's RT support, aren't there any AMD cards on the market? Sorry, I'm not generally shopping for these.

Member

kvark commented May 4, 2021

It would be great to chat on #gfx:matrix.org (or a different platform, if you have a preference?) and see if we can unblock your progress.

@tangmi tangmi force-pushed the acceleration-structure-api branch from 16c5212 to 1f622c9 Compare June 3, 2021 04:22