From 504ffac6e508b635b1e37361c223e1f9fd42b837 Mon Sep 17 00:00:00 2001
From: Anders Leino
Date: Fri, 7 Feb 2025 16:12:52 +0200
Subject: [PATCH 1/2] Copy proposals from the Slang repo

This is a copy of docs/proposal from the Slang repository [1] at revision [2].

[1] https://github.com/shader-slang/slang
[2] 79aebc18d54db3f0be8bd6529c0d79f4d8d4fc58

This helps to address https://github.com/shader-slang/slang/issues/6155.
---
 proposals/000-template.md                         |   51 +
 proposals/001-where-clauses.md                    |  348 +++++
 proposals/002-type-equality-constraints.md        |  191 +++
 proposals/003-atomic-t.md                         |  153 +++
 proposals/004-initialization.md                   |  447 +++++++
 proposals/005-write-only-textures.md              |   61 +
 proposals/007-variadic-generics.md                |  679 ++++++++++
 proposals/008-tuples.md                           |  140 +++
 proposals/009-ifunc.md                            |  142 +++
 proposals/010-new-diff-type-system.md             |  285 +++++
 proposals/011-structured-binding.md               |   47 +
 proposals/012-language-version-directive.md       |  221 ++++
 proposals/013-aligned-load-store.md               |   58 +
 proposals/014-extended-length-vectors.md          |  193 +++
 proposals/015-descriptor-handle.md                |  260 ++++
 proposals/016-slangpy.md                          |  366 ++++++
 proposals/017-shader-record.md                    |  371 ++++++
 proposals/018-packed-data-intrinsics.md           |  114 ++
 proposals/019-cooperative-vector.md               |  407 ++++++
 proposals/020-stage-switch.md                     |   93 ++
 proposals/README.md                               |   18 +
 proposals/implementation/ast-ir-serialization.md  |  286 +++++
 proposals/legacy/001-basic-interfaces.md          |  254 ++++
 proposals/legacy/002-api-headers.md               |  952 ++++++++++++++
 proposals/legacy/003-error-handling.md            |  296 +++++
 proposals/legacy/004-com-support.md               |  240 ++++
 proposals/legacy/005-components.md                |  507 ++++++++
 proposals/legacy/006-artifact-container-format.md | 1119 +++++++++++++++++
 28 files changed, 8299 insertions(+)
 create mode 100644 proposals/000-template.md
 create mode 100644 proposals/001-where-clauses.md
 create mode 100644 proposals/002-type-equality-constraints.md
 create mode 100644 proposals/003-atomic-t.md
 create mode 100644 proposals/004-initialization.md
 create mode 100644 proposals/005-write-only-textures.md
 create mode 100644 proposals/007-variadic-generics.md
 create mode 100644 proposals/008-tuples.md
 create mode 100644 proposals/009-ifunc.md
 create mode 100644 proposals/010-new-diff-type-system.md
 create mode 100644 proposals/011-structured-binding.md
 create mode 100644 proposals/012-language-version-directive.md
 create mode 100644 proposals/013-aligned-load-store.md
 create mode 100644 proposals/014-extended-length-vectors.md
 create mode 100644 proposals/015-descriptor-handle.md
 create mode 100644 proposals/016-slangpy.md
 create mode 100644 proposals/017-shader-record.md
 create mode 100644 proposals/018-packed-data-intrinsics.md
 create mode 100644 proposals/019-cooperative-vector.md
 create mode 100644 proposals/020-stage-switch.md
 create mode 100644 proposals/README.md
 create mode 100644 proposals/implementation/ast-ir-serialization.md
 create mode 100644 proposals/legacy/001-basic-interfaces.md
 create mode 100644 proposals/legacy/002-api-headers.md
 create mode 100644 proposals/legacy/003-error-handling.md
 create mode 100644 proposals/legacy/004-com-support.md
 create mode 100644 proposals/legacy/005-components.md
 create mode 100644 proposals/legacy/006-artifact-container-format.md

diff --git a/proposals/000-template.md b/proposals/000-template.md
new file mode 100644
index 0000000..cb03778
--- /dev/null
+++ b/proposals/000-template.md
@@ -0,0 +1,51 @@
+SP #000: Proposal Template
+=================
+
+This document provides a starting point for a larger feature proposal.
+The sections in it are suggested, but can be removed if they don't make sense for a chosen feature.
+
+The first section should provide a concise description of **what** the feature is and, if possible, **why** it is important.
+
+A proposal for a Slang language/compiler feature or system should start with a concise description of what the feature is and why it could be important.
+
+Status
+------
+
+Status: Design Review/Planned/Implementation In-Progress/Implemented/Partially Implemented. Note here whether the proposal is unimplemented, in-progress, has landed, etc.
+
+Implementation: [PR 000] [PR 001] ... (list links to PRs)
+
+Author: authors of the design doc and the implementation.
+
+Reviewer: Reviewers of the proposal and implementation.
+
+Background
+----------
+
+The background section should explain where things stand in the language/compiler today, along with any relevant concepts or terms of art from the wider industry.
+If the proposal is about solving a problem, this section should clearly illustrate the problem.
+If the proposal is about improving a design, it should explain where the current design falls short.
+
+Related Work
+------------
+
+The related work section should show examples of how other languages, compilers, etc. have solved the same or related problems. Even if there are no direct precedents for what is being proposed, there should ideally be some points of comparison for where ideas sprang from.
+
+Proposed Approach
+-----------------
+
+Explain the idea in enough detail that a reader can concretely know what you are proposing to do. Anybody who is just going to *use* the resulting feature/system should be able to read this and get an accurate idea of what that experience will be like.
+
+Detailed Explanation
+--------------------
+
+Here's where you go into the messy details related to language semantics, implementation, corner cases and gotchas, etc.
+Ideally this section provides enough detail that a contributor who wasn't involved in the proposal process could implement the feature in a way that is faithful to the original.
+
+Alternatives Considered
+-----------------------
+
+Any important alternative designs should be listed here.
+If somebody comes along and says "that proposal is neat, but you should just do X" you want to be able to show that X was considered, and give enough context on why we made the decision we did.
+This section doesn't need to be defensive, or focus on which of various options is "best".
+Ideally we can acknowledge that different designs are suited for different circumstances/constraints.
diff --git a/proposals/001-where-clauses.md b/proposals/001-where-clauses.md
new file mode 100644
index 0000000..02f60a0
--- /dev/null
+++ b/proposals/001-where-clauses.md
@@ -0,0 +1,348 @@
+SP #001: `where` Clauses
+===============
+
+We propose to allow generic declarations in Slang to move the constraints on generic type parameters outside of the `<>` and onto distinct `where` clauses.
+
+Status
+------
+
+Status: Partially implemented. The only unimplemented case is the canonicalization of generic constraints.
+
+Implementation: [PR 4986](https://github.com/shader-slang/slang/pull/4986)
+
+Reviewed by: Theresa Foley, Yong He
+
+
+Background
+----------
+
+Slang supports generic type parameters with *constraints* on them.
+Currently constraints can only be written as part of the declaration of the type parameter itself, e.g.:
+
+    void resolve<T : IResolvable, U : IResolver<T>, V : IResolveDestination<T>>(
+        ResolutionContext<U> context, List<T> stuffToResolve, out V destination)
+    { ... }
+
+The above example illustrates how intermixing the declaration of the type parameters with their constraints can make for long declarations that can be difficult for programmers to read and understand.
+
+Introducing `where` clauses allows a programmer to state the constraints *after* the rest of the declaration header, e.g.:
+
+    void resolve<T, U, V>(ResolutionContext<U> context, List<T> stuffToResolve, out V destination)
+        where T : IResolvable,
+        where U : IResolver<T>,
+        where V : IResolveDestination<T>
+    { ... }
+
+This latter form makes it easier to quickly glean the overall shape of the function signature.
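The readability difference is easy to reproduce in Rust, which already supports both forms. Below is a compilable sketch; the traits `Resolvable` and `Resolver` are hypothetical stand-ins for the interfaces above, not part of any real library:

```rust
// Hypothetical traits standing in for IResolvable / IResolver<T>.
trait Resolvable {}
trait Resolver<T> {
    fn resolve_one(&self, item: &T) -> u32;
}

// Inline form: the bounds crowd the parameter list.
fn resolve_inline<T: Resolvable, U: Resolver<T>>(resolver: &U, items: &[T]) -> u32 {
    items.iter().map(|item| resolver.resolve_one(item)).sum()
}

// `where` form: the shape of the signature is visible at a glance.
fn resolve_where<T, U>(resolver: &U, items: &[T]) -> u32
where
    T: Resolvable,
    U: Resolver<T>,
{
    items.iter().map(|item| resolver.resolve_one(item)).sum()
}
```

Both functions compile to identical code; the only difference is where the reader finds the constraints.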
+
+A second important benefit of `where` clauses is that they open the door to expressing more complicated constraints on and between type parameters, such as allowing constraints on *associated types*, e.g.:
+
+    void writePackedData<T, U>(T src, out U dst)
+        where T : IPackable,
+        where T.Packed : IWritable
+    { ... }
+
+Related Work
+------------
+
+Many other languages with support for generics have introduced `where` clauses, and most follow a broadly similar shape. To present our `resolve` example in various other languages:
+
+### Rust
+
+Rust supports `where` clauses with a comma-separated list of constraints:
+
+    fn resolve<T, U, V>(context: ResolutionContext<U>, stuffToResolve: List<T>, destination: &mut V)
+        where T : IResolvable,
+              U : IResolver<T>,
+              V : IResolveDestination<T>,
+    { ... }
+
+### Swift
+
+Swift's `where` clauses are nearly identical to Rust's:
+
+    func resolve<T, U, V>(context: ResolutionContext<U>, stuffToResolve: List<T>, destination: inout V)
+        where T : IResolvable,
+              U : IResolver<T>,
+              V : IResolveDestination<T>
+    { ... }
+
+### C#
+
+C# is broadly similar, but uses multiple `where` clauses, one per constraint:
+
+    void resolve<T, U, V>(ResolutionContext<U> context, List<T> stuffToResolve, out V destination)
+        where T : IResolvable
+        where U : IResolver<T>
+        where V : IResolveDestination<T>
+    { ... }
+
+### Haskell
+
+While Haskell is a quite different language from the others mentioned here, Haskell typeclasses have undeniably influenced the concept of traits/protocols in Rust/Swift.
+
+In Haskell a typeclass is not something a type "inherits" from; instead it uses a type parameter for even the `This` type.
+Type parameters in Haskell are also introduced implicitly rather than explicitly.
+The `resolve` example above would become something like:
+
+    resolve :: (Resolvable t, Resolver u t, ResolveDestination v t) =>
+        ResolutionContext u -> List t -> v
+
+We see here that the constraints are all grouped together in the `(...) =>` clause before the actual type signature of the function.
+That clause serves a similar semantic role to `where` clauses in these other languages.
+
+Proposed Approach
+-----------------
+
+For any kind of declaration that Slang allows to have generic parameters, we will allow a `where` clause to appear after the *header* of that declaration.
+A `where` clause consists of the (contextual) keyword `where`, followed by a comma-separated list of *constraints*:
+```csharp
+    struct MyStuff<T, U> : IFoo
+        where T : IFoo, IBar
+        where T : IBaz
+        where U : IArray
+    { ... }
+```
+A `where` clause is only allowed after the header of a declaration that has one or more generic parameters.
+
+Each constraint must take the form of one of the type parameters from the immediately enclosing generic parameter list, followed by a colon (`:`), and then followed by a type expression that names an interface or a conjunction of interfaces.
+Multiple constraints can be defined for the same parameter.
+
+We haven't previously defined what the header of a declaration is, so we briefly illustrate what we mean by showing where the split between the header and the *body* of a declaration is for each of the major kinds of declarations that are supported. In each case a comment `/****/` is placed between the header and body:
+
+```csharp
+// variables:
+let v : Int /****/ = 99;
+var v : Int /****/ = 99;
+Int v /****/ = 99;
+
+// simple type declarations:
+typealias X : IFoo /****/ = Y;
+associatedtype X : IFoo /****/;
+
+// functions and other callables:
+Int f(Float y) /****/ { ... }
+func f(Float y) -> Int /****/ { ... }
+init(Float y) /****/ { ... }
+subscript(Int idx) -> Float /****/ { ... }
+
+// properties
+property p : Int /****/ { ... }
+
+// aggregates
+extension Int : IFoo /****/ { ... }
+struct Thing : Base /****/ { ... }
+class Thing : Base /****/ { ... }
+interface IThing : IBase /****/ { ... }
+enum Stuff : Int /****/ { ... }
+```
+In practice, the body of a declaration starts at the `=` for declarations with an initial-value expression, at the opening `{` for declarations with a `{}`-enclosed body, or at the closing `;` for any other declarations.
+
+With the introduction of `where` clauses, we can extend the type system to allow more kinds of type constraints. In this proposal,
+we allow the type constraints following `where` to be one of:
+- Type conformance constraint, in the form of `T : IBase`
+- Type equality constraint, in the form of `T == X`
+
+In both cases, the left-hand side of a constraint can be a simple generic type parameter, or any type that is dependent on some
+generic type parameter. For example, the following is allowed:
+```csharp
+interface IFoo { associatedtype A; }
+struct S<T, U>
+    where T : IFoo
+    where T.A == U
+{}
+```
+
+Detailed Explanation
+--------------------
+
+### Implementation
+
+The compiler implementation already represents generics in a form where the type parameters are encoded separately from the constraints that depend on them.
+The constraints act somewhat like additional unnamed parameters of a generic.
+At the Slang IR level these constraint parameters are made into explicit parameters used to pass around *witness tables*.
+
+During parsing, a `where` clause can simply add the constraints to the outer generic (and error out if there isn't one).
+The actual representation of constraints will be no different than before, so many downstream compilation steps should be unaffected.
+
+Some parts of the codebase have historically assumed that a given generic type parameter can have at most *one* constraint;
+these cases will need to be identified and fixed to allow for zero or more constraints per parameter.
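The witness-table lowering mentioned above can be sketched in ordinary Rust: a hypothetical `FooWitness` struct plays the role of the witness table that the IR threads through as an extra parameter. The names are illustrative only, not actual Slang IR constructs:

```rust
// A hand-rolled "witness table" for a hypothetical IFoo interface:
// one function pointer per interface requirement.
struct FooWitness<T> {
    get_value: fn(&T) -> i32,
}

// A source-level generic like `int f<T : IFoo>(T x)` lowers to a function
// where the constraint has become an explicit witness-table parameter.
fn f_lowered<T>(witness: &FooWitness<T>, x: &T) -> i32 {
    (witness.get_value)(x) + 1
}
```

Allowing several constraints per parameter then simply means threading several such witness parameters.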
+
+Semantic checking of generics will need to validate that the left-hand side of each constraint is a direct reference to one of the type parameters of the immediately enclosing generic;
+previously, the semantic checking logic could *assume* that this was the case, since the parser would only create constraints in that form.
+
+### Interaction With Overloading and Redeclaration
+
+Probably the most important semantic issue that arises from `where` clauses is deciding whether two different function declarations count as distinct overloads, or as redeclarations (or redefinitions) of the same function signature.
+
+The existing form for declaring constraints:
+
+    void f<T : IFoo>( ... )
+    { ... }
+
+should be treated as sugar for the equivalent `where`-based form:
+
+    void f<T>( ... )
+        where T : IFoo
+    { ... }
+
+The two declarations of `f` there should not only be counted as redeclarations/redefinitions, but they should also be *indistinguishable* to all clients of the module where they appear.
+A module that `import`s the module defining `f` should not be able to tell which form it was declared with.
+Both forms of the definition should result in the *same* signature and mangled name in Slang IR.
+
+Furthermore, with `where` clauses it becomes possible to write equivalent constraints in more than one way.
+A `where` clause can be used instead of a conjunction of interfaces:
+
+    void f<T : IFoo & IBar>( ... )
+    { ... }
+
+    void f<T>( ... )
+        where T : IFoo,
+              T : IBar
+    { ... }
+
+It is also possible to use `where` clauses to introduce constraints that are *redundant*, either by repeating the same constraint:
+
+    void f<T>( ... )
+        where T : IFoo,
+              T : IFoo
+    { ... }
+
+or by constraining a type to two interfaces, where one inherits from the other:
+
+    interface IBase {}
+    interface IDerived : IBase {}
+
+    void f<T>( ... )
+        where T : IBase,
+              T : IDerived
+    { ... }
+
+Technically it was already possible to have redundancy in a constraint by using a conjunction of two interfaces where one inherits from the other:
+
+    void f<T : IBase & IDerived>( ... )
+    { ... }
+
+One question that is raised by the possibility of redundant constraints is whether the compiler should produce a diagnostic for them and, if so, whether it should be a warning or an error.
+While it may seem obvious that redundant constraints are to be avoided, it is possible that refactoring of `interface` hierarchies could change whether existing constraints are redundant or not, potentially forcing widespread edits to code that is semantically unambiguous (and just a little more verbose than necessary).
+We propose that redundant constraints should probably produce a warning, with a way to silence that warning easily.
+
+### Canonicalization
+
+The long and short of the above section is that there can be multiple ways to write semantically equivalent generic declarations, by changing the form, order, etc. of constraints.
+We want the signature of a function (and its mangled name, etc.) to be identical for semantically equivalent declaration syntax.
+In order to ensure that a declaration's mangled name is independent of the form of its constraints, we must have a way to *canonicalize* those constraints.
+
+The Swift compiler codebase includes a document that details the rules used for canonicalization of constraints for that compiler, and we can take inspiration from it.
+Our constraints are currently much more restricted, so canonicalization can follow a much simpler process, such as:
+
+* Start with the list of user-written constraints, in declaration order
+* Iterate the following to convergence:
+  * For each constraint of the form `T : ILeft & IRight`, replace that constraint with constraints `T : ILeft` and `T : IRight`
+* Remove each constraint that is implied by another constraint
+  * For now, that means removing `T : IBase` if there is already a constraint `T : IDerived` where `IDerived` inherits from `IBase`
+* Sort the constraints
+  * For constraints `T : IFoo` and `U : IBar` on different type parameters, order them based on the order of the type parameters `T` and `U`
+  * For constraints `T : IFoo` and `T : IBar` on the *same* type parameter, order them based on a canonicalized ordering on the interfaces `IFoo` and `IBar`
+
+The above ordering assumes that we can produce a canonical ordering of `interface`s.
+More generally, we will eventually want a canonical ordering on all types and *values* that might appear in constraints.
+For now, we will limit ourselves to an ordering on nominal types, and other declaration references:
+
+* A generic parameter is always ordered before anything other than generic parameters
+  * Parameters from outer generics are ordered before those from inner generics
+  * Parameters from the same generic are ordered based on their order in the parameter list
+* Two declaration references to distinct declarations are ordered based on a lexicographic order for their qualified names, meaning:
+  * If one qualified name is a prefix of the other (e.g., `A.B` and `A.B.C`), then the prefix is ordered first
+  * Otherwise, compare the first name component (from left to right) where the names differ, and order them based on a lexicographic string comparison of the name at that component.
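To make the steps concrete, here is a toy canonicalizer in Rust over string-encoded constraints. It performs the conjunction-splitting, implied-constraint removal, and sorting described above; plain lexicographic order stands in for the canonical type ordering, and all names are illustrative:

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy canonicalizer for constraints of the form `param : IFoo & IBar`.
fn canonicalize(
    constraints: &[(&str, &str)],        // (type parameter, "IFoo & IBar & ...")
    inherits: &HashMap<&str, Vec<&str>>, // interface -> its transitive base interfaces
) -> Vec<(String, String)> {
    // Step 1: split conjunctions like `T : IFoo & IBar` into atomic constraints.
    let mut atomic: BTreeSet<(String, String)> = BTreeSet::new();
    for (param, rhs) in constraints {
        for iface in rhs.split('&').map(str::trim) {
            atomic.insert((param.to_string(), iface.to_string()));
        }
    }
    // Step 2: drop `T : IBase` when some `T : IDerived` already implies it.
    let implied: BTreeSet<(String, String)> = atomic
        .iter()
        .flat_map(|(param, iface)| {
            inherits
                .get(iface.as_str())
                .into_iter()
                .flatten()
                .map(move |base| (param.clone(), base.to_string()))
        })
        .collect();
    // Step 3: the BTreeSet already yields a sorted, de-duplicated result.
    atomic.into_iter().filter(|c| !implied.contains(c)).collect()
}
```

The real compiler would order constraints by parameter declaration order and a canonical interface ordering rather than by string comparison, but the shape of the pass is the same.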
+
+Alternatives Considered
+-----------------------
+
+There really aren't any compelling alternatives to `where` clauses among the languages that Slang takes design influence from.
+We could try to design something to solve the same problems from first principles, but the hypothetical benefits of doing so are unclear.
+
+When it comes to the syntactic details, we could consider disallowing type lists on the right-hand side of a conformance constraint, and instead allowing multiple comma-separated constraints to share a single `where` keyword:
+
+    struct MyStuff<T> : Base, IFoo
+        where T : IFoo,
+              T : IBar
+    { ... }
+
+This alternative form may result in more compact code, without duplicated `where` keywords, but may make it harder to achieve tidy diffs when editing the constraints on declarations.
+
+Future Directions
+-----------------
+
+### Allow more general types on the right-hand side of `:`
+
+Currently, the only constraints allowed using `:` have a concrete (non-`interface`) type on the left-hand side, and an `interface` (or conjunction of interfaces) on the right-hand side.
+In the context of `class`-based hierarchies, we can also consider having constraints that limit a type parameter to subtypes of a specific concrete type:
+
+    class Base { ... }
+    class Derived : Base { ... }
+
+    void f<T>( ... )
+        where T : Base
+    { ... }
+
+### Allow `where` clauses on non-generic declarations
+
+We could consider allowing `where` clauses to appear on any declaration nested under a generic, such that those declarations are only usable when certain additional constraints are met.
+E.g.,:
+
+    struct MyDictionary<K, V>
+    {
+        ...
+
+        K minimumKeyUsed()
+            where K : IComparable
+        { ... }
+    }
+
+In this example, the user's dictionary type can be queried for the minimum key that is used for any entry, but *only* if the keys are comparable.
+
+Most of what can be done with this more flexible placement of `where` clauses can *also* be accomplished using extensions.
+E.g., the above example could instead be written:
+
+    struct MyDictionary<K, V>
+    { ... }
+
+    extension MyDictionary<K, V>
+        where K : IComparable
+    {
+        K minimumKeyUsed()
+        { ... }
+    }
+
+### Implied Constraints
+
+In many cases a generic function signature will use the type parameters as explicit arguments to generic types that impose their own requirements.
+To be concrete, consider:
+
+    struct Dictionary<K, V>
+        where K : IHashable
+    { ... }
+
+    V myLookupFunc<K, V>(
+        Dictionary<K, V> dictionary, K key, V default)
+    { ... }
+
+In this case, the current Slang language rules will reject `myLookupFunc`. The type of the `dictionary` parameter is passing `K` as an argument to `Dictionary<...>` but does not have an in-scope constraint that ensures that `K : IHashable`.
+The current compiler requires the function to be rewritten as:
+
+    V myLookupFunc<K, V>(
+        Dictionary<K, V> dictionary, K key, V default)
+        where K : IHashable
+    { ... }
+
+But this additional constraint ends up being pointless; in order to invoke `myLookupFunc` the programmer must have a `Dictionary` to pass as argument for the `dictionary` parameter, which means that the `Dictionary` type must already be well-formed based on the information the caller function has.
+
+The compiler can eliminate the need for such constraints by adding additional rules for expanding the set of constraints on a generic during canonicalization.
+For any generic type `X<A, B, C, ...>` appearing in:
+
+* the signature of a function declaration
+* the bases of a type declaration
+* the existing generic constraints
+
+The expansion step would add whatever constraints are required by `X`, with the arguments `A, B, C, ...` substituted in for the parameters of `X`.
diff --git a/proposals/002-type-equality-constraints.md b/proposals/002-type-equality-constraints.md
new file mode 100644
index 0000000..3361272
--- /dev/null
+++ b/proposals/002-type-equality-constraints.md
@@ -0,0 +1,191 @@
+Allow Type Equality Constraints on Generics
+===========================================
+
+We propose to allow *type equality* constraints in `where` clauses.
+
+Status
+------
+
+In progress.
+
+Background
+----------
+
+As of proposal [001](001-where-clauses.md), Slang allows for generic declarations to include a *`where` clause* which enumerates constraints on the generic parameters that must be satisfied by any arguments provided to that generic:
+
+    V findOrDefault<K, V>( HashTable<K, V> table, K key )
+        where K : IHashable,
+              V : IDefaultInitializable
+    { ... }
+
+Currently, the language only accepts *conformance* constraints of the form `T : IFoo`, where `T` is one of the parameters of the generic, and `IFoo` is either an `interface` or a conjunction of interfaces, which indicate that the type `T` must conform to `IFoo`.
+
+This proposal is motivated by the observation that when an interface has associated types, there is currently no way for a programmer to introduce a generic that is only applicable when an associated type satisfies certain constraints.
+
+As an example, consider an interface for types that can be "packed" into a smaller representation for in-memory storage (instead of a default representation optimized for access from registers):
+
+    interface IPackable
+    {
+        associatedtype Packed;
+
+        init(Packed packed);
+        Packed pack();
+    }
+
+Next, consider a hypothetical interface for types that can be deserialized from a stream:
+
+    interface IDeserializable
+    {
+        init( InputStream stream );
+    }
+
+Given these definitions, we might want to define a function that takes a packable type, and deserializes it from a stream:
+
+    T deserializePackable<T>( InputStream stream )
+        where T : IPackable
+    {
+        return T( T.Packed(stream) );
+    }
+
+As written, this function will fail to compile because the compiler cannot assume that `T.Packed` conforms to `IDeserializable`, in order to support initialization from a stream.
+
+A brute-force solution would be to add the `IDeserializable` constraint to the `IPackable.Packed` associated type, but doing so may not be consistent with the vision the designer of `IPackable` had in mind. Indeed, there is no reason to assume that `IPackable` and `IDeserializable` even have the same author, or are things that the programmer trying to write `deserializePackable` can change.
+
+It might seem that we could improve the situation by introducing another generic type parameter, so that we can explicitly constrain it to be deserializable:
+
+    T deserializePackable<T, P>( InputStream stream )
+        where T : IPackable,
+              P : IDeserializable
+    {
+        return T( P(stream) );
+    }
+
+This second attempt *also* fails to compile.
+In this case, there is no way for the compiler to know that `T` can be initialized from a `P`, because it cannot intuit that `P` is meant to be `T.Packed`.
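Rust can already express the fix: a bound on the associated type itself (`T::Packed: Deserializable`) is precisely the kind of constraint being proposed. A compilable sketch, with hypothetical traits mirroring `IPackable` and `IDeserializable`:

```rust
trait Packable {
    type Packed;
    fn unpack(packed: Self::Packed) -> Self;
}

trait Deserializable {
    fn deserialize(stream: &[u8]) -> Self;
}

// `T::Packed: Deserializable` constrains the *associated type*, which is
// exactly what the two failed Slang attempts above could not express.
fn deserialize_packable<T>(stream: &[u8]) -> T
where
    T: Packable,
    T::Packed: Deserializable,
{
    T::unpack(<T::Packed as Deserializable>::deserialize(stream))
}
```

Neither an extra type parameter nor a change to `Packable` is needed; the caller's concrete type supplies the evidence.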
+
+Our two failed attempts can each be fixed by introducing two new kinds of constraints:
+
+* Conformance constraints on associated types: `T.A : IFoo`
+
+* Equality constraints on associated types: `T.A == X`
+
+Related Work
+------------
+
+Both Rust and Swift support additional kinds of constraints on generics, including the cases proposed here.
+The syntax in those languages matches what we propose.
+
+Proposed Approach
+-----------------
+
+In addition to conformance constraints on generic type parameters (`T : IFoo`), the compiler will also support constraints on associated types of those parameters (`T.A : IFoo`), and associated types of those associated types (`T.A.B : IFoo`), etc.
+
+In addition, the compiler will accept constraints that restrict an associated type (`T.A`, `T.A.B`, etc.) to be equal to some other type.
+The other type may be a concrete type, another generic parameter, or another associated type.
+
+Detailed Explanation
+--------------------
+
+### Parser
+
+The parser already supports nearly arbitrary type expressions on both sides of a conformance constraint, and then validates that the types used are allowed during semantic checking.
+The only change needed at that level is to split `GenericTypeConstraintDecl` into two cases: one for conformance constraints, and another for equality constraints, and then to support constraints with `==` instead of `:`.
+
+### Semantic Checking
+
+During semantic checking, instead of checking that the left-hand type in a constraint is always one of the generic type parameters, we could instead check that the left-hand type expression is either a generic type parameter or `X.AssociatedType` where `X` would be a valid left-hand type.
+
+The right-hand type for conformance constraints should be checked the same as before.
+
+The right-hand type for an equality constraint should be allowed to be an arbitrary type expression that names a proper (and non-`interface`) type.
+
+One subtlety is that in a type expression like `T.A.B` where both `A` and `B` are associated types, it may be that the `B` member of `T.A` can only be looked up because of another constraint like `T.A : IFoo`.
+When performing semantic checking of a constraint in a `where` clause, we need to decide which of the constraints may inform lookup when resolving a type expression like `X.A`.
+Some options are:
+
+* We could consider only constraints that appear before the constraint that includes that type expression. In this case, a programmer must always introduce a constraint `X : IFoo` before a constraint that names `X.A`, if `A` is an associated type introduced by `IFoo`.
+
+* We could consider *all* of the constraints simultaneously (except, perhaps, the constraint that we are in the middle of checking).
+
+The latter option is more flexible, but may be (much) harder to implement in practice.
+We propose that for now we use the first option, but remain open to implementing the more general case in the future.
+
+Given an equality constraint like `T.A.B == X`, semantic checking needs to detect cases where an `X` is used and a `T.A.B` is expected, or vice versa.
+These cases should introduce some kind of cast-like expression, which references the type equality witness as evidence that the cast is valid (and should, in theory, be a no-op).
+
+Semantic checking of equality constraints should identify contradictory sets of constraints.
+Such contradictions can be simple to spot:
+
+    interface IThing { associatedtype A; }
+    void f<T>()
+        where T : IThing,
+              T.A == String,
+              T.A == Float
+    { ... }
+
+but they can also be more complicated:
+
+    void f<T, U>()
+        where T : IThing,
+              U : IThing,
+              T.A == String,
+              U.A == Float,
+              T.A == U.A
+    { ... }
+
+In each case, an associated type is being constrained to be equal to two *different* concrete types.
+There is no possible set of generic arguments that could satisfy these constraints, so declarations like these should be rejected.
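One way to implement this check is a small union-find-style pass over the equality constraints: merge the two sides of each `==` into one equivalence class, then flag any class containing two distinct concrete types. A sketch with naive class merging and illustrative names:

```rust
use std::collections::HashMap;

/// Returns true if the equality constraints force one associated type
/// to equal two distinct concrete ("independent") types.
fn has_contradiction(equalities: &[(&str, &str)], is_concrete: fn(&str) -> bool) -> bool {
    // Assign each type an equivalence-class id; merging relabels one class.
    let mut class: HashMap<&str, usize> = HashMap::new();
    let mut next_id = 0;
    for &(a, b) in equalities {
        for t in [a, b] {
            if !class.contains_key(t) {
                class.insert(t, next_id);
                next_id += 1;
            }
        }
        let (ca, cb) = (class[a], class[b]);
        for id in class.values_mut() {
            if *id == cb {
                *id = ca; // merge b's class into a's
            }
        }
    }
    // A class holding two distinct concrete types is unsatisfiable.
    (0..next_id).any(|c| {
        class
            .iter()
            .filter(|&(&t, &id)| id == c && is_concrete(t))
            .count()
            > 1
    })
}
```

Both of the `void f` examples above trip this check, including the indirect `T.A == U.A` case, because the merge step carries `String` and `Float` into the same class.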
+
+We propose that the simplest way to identify and diagnose contradictory constraints like this is during canonicalization, as described below.
+
+### IR
+
+At the IR level, a conformance constraint on an associated type is no different than any other conformance constraint: it lowers to an explicit generic parameter that will accept a witness table as an argument.
+
+The choice of how to represent equality constraints is more subtle.
+One option is to lower an equality constraint to *nothing* at the IR level, under the assumption that the casts that reference these constraints should lower to nothing.
+Doing so would introduce yet another case where the IR we generate doesn't "type-check."
+The other option is to lower a type equality constraint to an explicit generic parameter which is then applied via an explicit op to convert between the associated type and its known concrete equivalent.
+The representation of the witnesses required to provide *arguments* for such parameters is something that hasn't been fully explored, so for now we propose to take the first (easier) option.
+
+### Canonicalization
+
+Adding new kinds of constraints affects *canonicalization*, which was discussed in proposal 0001.
+Conformance constraints involving associated types should already be order-able according to the rules in that proposal, so we primarily need to concern ourselves with equality constraints.
+
+We propose the following approach:
+
+* Take all of the equality constraints that arise after any expansion steps
+* Divide the types named on either side of any equality constraint into *equivalence classes*, where if `X == Y` is a constraint, then `X` and `Y` must be in the same equivalence class
+  * Each type in an equivalence class will either be an associated type of the form `T.A.B...Z`, derived from a generic type parameter, or an *independent* type, which here means anything other than those associated types.
+  * Because of the rules enforced during semantic checking, each equivalence class must have at least one associated type in it.
+  * Each equivalence class may have zero or more independent types in it.
+* For each equivalence class with more than one independent type in it, diagnose an error; the application is attempting to constrain one or more associated types to be equal to multiple distinct types at once
+* For each equivalence class with exactly one independent type in it, produce new constraints of the form `T.A.B...Z == C`, one for each associated type in the equivalence class, where `C` is the independent type
+* For each equivalence class with zero independent types in it, pick the *minimal* associated type (according to the type ordering), and produce new constraints of the form `T.A... == U.B...` for each *other* associated type in the equivalence class, where `U.B...` is the minimal associated type.
+* Sort the new constraints by the associated type on their left-hand side.
+
+Alternatives Considered
+-----------------------
+
+The main alternative here would be to simply not have these kinds of constraints, and push programmers to use type parameters instead of associated types in cases where they want to be able to enforce constraints on those types.
+E.g., the `IPackable` interface from earlier could be rewritten into this form:
+
+
+    interface IPackable<Packed>
+    {
+        init(Packed packed);
+        Packed pack();
+    }
+
+With this form for `IPackable`, it becomes possible to use additional type parameters to constrain the `Packed` type:
+
+    T deserializePackable<T, P>( InputStream stream )
+        where T : IPackable<P>,
+              P : IDeserializable
+    {
+        return T( P(stream) );
+    }
+
+While this workaround may seem reasonable in an isolated example like this, there is a strong reason why languages like Slang choose to have both generic type parameters (which act as *inputs* to an abstraction) and associated types (which act as *outputs*).
+We believe that associated types are an important feature, and that they justify the complexity of these new kinds of constraints.
\ No newline at end of file
diff --git a/proposals/003-atomic-t.md b/proposals/003-atomic-t.md
new file mode 100644
index 0000000..c846ad3
--- /dev/null
+++ b/proposals/003-atomic-t.md
@@ -0,0 +1,153 @@
+SP #003 - `Atomic<T>` type
+==============
+
+
+Status
+------
+
+Author: Yong He
+
+Status: Implemented.
+
+Implementation: [PR 5125](https://github.com/shader-slang/slang/pull/5125)
+
+Reviewed by: Theresa Foley, Jay Kwak
+
+Background
+----------
+
+HLSL defines atomic intrinsics to work on free references to ordinary values such as `int` and `float`. However, this doesn't translate well to Metal and WebGPU,
+which define an `atomic<T>` type and only allow atomic operations to be applied to values of `atomic<T>` type.
+
+Slang's Metal backend follows the same technique as SPIRV-Cross and the DXIL->Metal converter, which relies on C++ undefined behavior: it casts an ordinary `int*` pointer to an `atomic<int>*` pointer
+and then calls the atomic intrinsic on the reinterpreted pointer. This is fragile and not guaranteed to work in the future.
+
+To make the situation worse, WebGPU bans all possible ways to cast a normal pointer into an `atomic<T>` pointer. In order to provide a truly portable way to define
+atomic operations and allow them to be translatable to all targets, we will also need an `Atomic<T>` type in Slang that maps to `atomic<T>` in WGSL and Metal, and maps to
+plain `T` for HLSL/SPIRV.
+
+
+Proposed Approach
+-----------------
+
+We define an `Atomic<T>` type that functions as a wrapper over `T` and provides atomic operations:
+```csharp
+enum MemoryOrder
+{
+    Relaxed = 0,
+    Acquire = 1,
+    Release = 2,
+    AcquireRelease = 3,
+    SeqCst = 4,
+}
+
+[sealed] interface IAtomicable {}
+[sealed] interface IArithmeticAtomicable : IAtomicable, IArithmetic {}
+[sealed] interface IBitAtomicable : IArithmeticAtomicable, IInteger {}
+
+[require(cuda_glsl_hlsl_metal_spirv_wgsl)]
+struct Atomic<T : IAtomicable>
+{
+    T load(MemoryOrder order = MemoryOrder.Relaxed);
+
+    [__ref] void store(T newValue, MemoryOrder order = MemoryOrder.Relaxed);
+
+    [__ref] T exchange(T newValue, MemoryOrder order = MemoryOrder.Relaxed); // returns old value
+
+    [__ref] T compareExchange(
+        T compareValue,
+        T newValue,
+        MemoryOrder successOrder = MemoryOrder.Relaxed,
+        MemoryOrder failOrder = MemoryOrder.Relaxed);
+}
+
+extension<T : IArithmeticAtomicable> Atomic<T>
+{
+    [__ref] T add(T value, MemoryOrder order = MemoryOrder.Relaxed); // returns original value
+    [__ref] T sub(T value, MemoryOrder order = MemoryOrder.Relaxed); // returns original value
+    [__ref] T max(T value, MemoryOrder order = MemoryOrder.Relaxed); // returns original value
+    [__ref] T min(T value, MemoryOrder order = MemoryOrder.Relaxed); // returns original value
+}
+
+extension<T : IBitAtomicable> Atomic<T>
+{
+    [__ref] T and(T value, MemoryOrder order = MemoryOrder.Relaxed); // returns original value
+    [__ref] T or(T value, MemoryOrder order = MemoryOrder.Relaxed); // returns original value
+    [__ref] T xor(T value, MemoryOrder order = MemoryOrder.Relaxed); // returns original value
+    [__ref] T increment(MemoryOrder order = MemoryOrder.Relaxed); // returns original value
+    [__ref] T decrement(MemoryOrder order = MemoryOrder.Relaxed); // returns original value
+}
+
+extension int : IArithmeticAtomicable {}
+extension uint : IArithmeticAtomicable {}
+extension int64_t : IBitAtomicable {}
+extension uint64_t : IBitAtomicable {}
+extension double : IArithmeticAtomicable {}
+extension float : IArithmeticAtomicable {}
+extension half : IArithmeticAtomicable {}
+
+// Operator overloads:
+// All operator overloads use MemoryOrder.Relaxed semantics.
+__prefix T operator++<T : IArithmeticAtomicable>(__ref Atomic<T> v); // returns new value.
+__postfix T operator++<T : IArithmeticAtomicable>(__ref Atomic<T> v); // returns original value.
+__prefix T operator--<T : IArithmeticAtomicable>(__ref Atomic<T> v); // returns new value.
+__postfix T operator--<T : IArithmeticAtomicable>(__ref Atomic<T> v); // returns original value.
+T operator+=<T : IArithmeticAtomicable>(__ref Atomic<T> v, T operand); // returns new value.
+T operator-=<T : IArithmeticAtomicable>(__ref Atomic<T> v, T operand); // returns new value.
+T operator|=<T : IBitAtomicable>(__ref Atomic<T> v, T operand); // returns new value.
+T operator&=<T : IBitAtomicable>(__ref Atomic<T> v, T operand); // returns new value.
+T operator^=<T : IBitAtomicable>(__ref Atomic<T> v, T operand); // returns new value.
+```
+
+We allow `Atomic<T>` to be defined in struct fields, as array elements, as elements of `RWStructuredBuffer` types,
+or as groupshared variable types or `__ref` function parameter types. For example:
+
+```hlsl
+struct MyType
+{
+    int ordinaryValue;
+    Atomic<int> atomicValue;
+}
+
+RWStructuredBuffer<MyType> atomicBuffer;
+
+void main()
+{
+    atomicBuffer[0].atomicValue.add(1);
+    printf("%d", atomicBuffer[0].atomicValue.load());
+}
+```
+
+In groupshared memory:
+
+```hlsl
+void main()
+{
+    groupshared Atomic<int> c;
+    c.add(1);
+}
+```
+
+Note that on many targets, it is invalid to use an `atomic<T>` type to define a local variable or a function parameter, or in any way
+to cause an `atomic<T>` to reside in the local/function/private address space. Slang should be able to lower the type
+into its underlying type; the use of an atomic type in these positions simply has no meaning. However, we are going to leave
+this legalization as future work and leave such situations as undefined behavior for now.
+
+This should be handled by a legalization pass similar to `lowerBufferElementTypeToStorageType` but operating
+in the opposite direction: the "loaded" value from a buffer is converted into an atomic-free type, and storing a value leads to an
+atomic store at the corresponding location.
+
+For non-WGSL/Metal targets, we can simply lower the type out of existence into its underlying type.
+
+# Related Work
+
+An `Atomic<T>` type exists in almost all CPU programming languages and is the proven way to express atomic operations over different
+architectures that have different memory models. WGSL and Metal follow this trend by requiring atomic operations to be expressed
+this way. This proposal makes Slang follow this trend and makes `Atomic<T>` the recommended way to express atomic operations
+going forward.
+
+# Future Work
+
+As discussed in previous sections, we should consider adding a legalization pass to allow the `Atomic<T>` type to be used anywhere in
+any memory space, and to legalize it into the plain underlying type where the atomic semantics have no or trivial
+meaning.
diff --git a/proposals/004-initialization.md b/proposals/004-initialization.md
new file mode 100644
index 0000000..20904c6
--- /dev/null
+++ b/proposals/004-initialization.md
@@ -0,0 +1,447 @@
+SP #004: Initialization
+=================
+
+This proposal documents the desired behavior of initialization-related language semantics, including default constructors, initialization lists and variable initialization.
+
+Status
+------
+
+Status: Design Approved, implementation in-progress.
+
+Implementation: N/A
+
+Author: Yong He
+
+Reviewer: Theresa Foley, Kai Zhang
+
+Background
+----------
+
+Slang has introduced several different syntaxes around initialization to provide syntactic compatibility with HLSL/C++. As the language evolves, there are many corners where
+the semantics around initialization are not well defined, causing confusion or leading to surprising behaviors.
+
+This proposal attempts to provide a design for where we want the language to be in terms of how initialization is handled in all different places.
+
+Related Work
+------------
+
+C++ has many different ways and syntaxes to initialize an object: through explicit constructor calls, initialization lists, or implicitly in a member/variable declaration.
+A variable in C++ can also be in an uninitialized state after its declaration. HLSL inherits most of these behaviors from C++ by allowing variables to be uninitialized.
+
+On the other hand, languages like C# and Swift have a set of well-defined rules to ensure every variable is initialized after its declaration.
+C++ allows using the initialization list syntax to initialize an object. The semantics of initialization lists depend on whether or not explicit constructors
+are defined on the type.
+
+Proposed Approach
+-----------------
+
+In this section, we document all concepts and rules related to initialization, constructors and initialization lists.
+
+### Default Initializable type
+
+A type is considered "default-initializable" if it provides a constructor that can take 0 arguments, so that it can be constructed with `T()`.
+
+### Variable Initialization
+
+Generally, a variable is considered uninitialized at its declaration site without an explicit value expression.
+For example,
+```csharp
+struct MyType { int x; }
+
+void foo()
+{
+    MyType t; // t is uninitialized.
+    var t1 : MyType; // same in modern syntax, t1 is uninitialized.
+}
+```
+
+However, the Slang language has been allowing implicit initialization of variables whose types are default-initializable.
+For example,
+```csharp
+struct MyType1 {
+    int x;
+    __init() { x = 0; }
+}
+void foo() {
+    MyType1 t1; // `t1` is initialized with a call to `__init`.
+}
+```
+
+We would like to move away from this legacy behavior towards a consistent semantics of never implicitly initializing a variable.
+To maintain backward compatibility, we will keep the legacy behavior, but remove the implicit initialization when the variable is defined
+in modern syntax:
+```csharp
+void foo() {
+    var t1: MyType; // `t1` will no longer be initialized.
+}
+```
+We will also remove the default initialization semantics for traditional syntax in modern Slang modules that come with an explicit `module` declaration.
+
+Trying to use a variable without initializing it first is an error.
+For backward compatibility, we will introduce a compiler option to turn this error into a warning, but we may deprecate this option in the future.
+
+### Generic Type Parameter
+
+A generic type parameter is not considered default-initializable by default. As a result, the following code should leave `t` in an uninitialized state:
+```csharp
+void foo<T>()
+{
+    T t; // `t` is uninitialized at declaration.
+}
+```
+
+### Synthesis of constructors for member initialization
+
+If a type already defines any explicit constructors, do not synthesize any constructors for initializer-list calls. An initializer list expression
+for the type must exactly match one of the explicitly defined constructors.
+
+If the type doesn't provide any explicit constructors, the compiler needs to synthesize the constructors for the calls that the initializer
+lists translate into, so that an initializer list expression can be used to initialize a variable of the type.
+
+For each type, we will synthesize one constructor with the same visibility as the type itself:
+
+The signature for the synthesized initializer for type `V struct T` is:
+```csharp
+V T.__init(member0: typeof(member0) = default(member0), member1 : typeof(member1) = default(member1), ...)
+```
+where `V` is a visibility modifier, `(member0, member1, ... memberN)` is the set of members that have visibility `V`, and `default(member0)`
+is the value defined by the initialization expression in `member0` if it exists, or the default value of `member0`'s type.
+If `member0`'s type is not default-initializable and the member doesn't provide an initial value, then the parameter will not have a default value.
+
+The synthesized constructor will be marked as `[Synthesized]` by the compiler, so the call site can inject additional compatibility logic when calling a synthesized constructor.
+
+The body of the constructor will initialize each member with the value coming from the corresponding constructor argument if such an argument exists;
+otherwise the member will be initialized to its default value, either defined by the init expr of the member, or the default value of the type if the
+type is default-initializable. If the member type is not default-initializable and a default value isn't provided on the member, then constructor
+synthesis will fail and the constructor will not be added to the type. Failure to synthesize a constructor is not an error on its own, but an error will appear
+if the user tries to initialize a value of the type in question assuming such a constructor exists.
+
+Note that if every member of a struct contains a default expression, the synthesized `__init` method can be called with 0 arguments; however, this will not cause a variable declaration to be implicitly initialized. Implicit initialization is a backward compatibility feature that only works for user-defined `__init()` methods.
+
+### Single argument constructor call
+
+A call to a constructor with a single argument is always treated as syntactic sugar for a type cast:
+```csharp
+int x = int(1.0f); // is treated as (int) 1.0f;
+MyType y = MyType(arg); // is treated as (MyType)arg;
+MyType x = MyType(y); // equivalent to `x = y`.
+```
+
+The compiler will attempt to resolve all type casts using type coercion rules; if that fails, it will fall back to resolving the expression as a constructor call.
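The constructor-synthesis rules above can be sketched in Python (for illustration only; the `Member` record and `synthesize_ctor_params` helper are hypothetical names, not compiler source):

```python
# Illustrative sketch of the synthesis rules: visible members become
# parameters, hidden members must have a derivable default or synthesis fails.
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    visibility: str            # "public" or "internal"
    has_init_expr: bool        # member declares `= <expr>`
    default_initializable: bool  # member's type has a 0-argument ctor

def synthesize_ctor_params(type_visibility, members):
    """Return (params, ok): the synthesized parameter list (name, has_default)
    at the type's own visibility, or ok=False when synthesis must fail."""
    params = []
    for m in members:
        can_default = m.has_init_expr or m.default_initializable
        if m.visibility == type_visibility:
            # Visible member becomes a parameter; it carries a default value
            # only when one can be derived.
            params.append((m.name, can_default))
        elif not can_default:
            # A hidden member with no derivable default cannot be initialized
            # by the ctor body, so no ctor is added to the type.
            return [], False
    return params, True

# `Visibility1` from the examples below: internal `x` has no init expr,
# so the public ctor cannot be synthesized.
_, ok = synthesize_ctor_params("public",
    [Member("x", "internal", False, False),
     Member("y", "public", True, False)])
assert not ok

# `Visibility2`: internal `x = 1` allows a public `__init(int y = 0)`.
params, ok = synthesize_ctor_params("public",
    [Member("x", "internal", True, False),
     Member("y", "public", True, False)])
assert ok and params == [("y", True)]
```

Note that this sketch models `int` as not default-initializable, matching the proposal's direction of never implicitly initializing values.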
+
+### Inheritance Initialization
+For derived structs, Slang will synthesize the constructor by bringing in the parameters from the base struct's constructor if the base struct also has a synthesized constructor. For example:
+```csharp
+struct Base
+{
+    int x;
+    // compiler synthesizes:
+    // __init(int x) { ... }
+}
+struct Derived : Base
+{
+    int y;
+    // compiler synthesizes:
+    // __init(int x, int y) { ... }
+}
+```
+
+However, if the base struct has explicit ctors, the compiler will not synthesize a constructor for the derived struct.
+For example, given
+```csharp
+struct Base { int x; __init(int x) { this.x = x; } }
+struct Derived : Base { int y; }
+```
+The compiler will not synthesize a constructor for `Derived`, and the following code will fail to compile:
+```csharp
+
+Derived d = {1}; // error, no matching ctor.
+Derived d = {1, 2}; // error, no matching ctor.
+Derived d = Derived(1); // error, no matching ctor.
+Derived d = Derived(1, 2); // error, no matching ctor.
+```
+
+
+### Initialization List
+
+Slang allows initialization of a variable by assigning it with an initialization list.
+Generally, Slang will always try to resolve initialization list coercion as if it is an explicit constructor invocation.
+For example, given:
+```csharp
+S obj = {1,2};
+```
+Slang will try to convert the code into:
+```csharp
+S obj = S(1,2);
+```
+
+Following the same logic, an empty initializer list will translate into a default-initialization:
+```csharp
+S obj = {};
+// equivalent to:
+S obj = S();
+```
+
+Note that an initializer list of a single argument does not translate into a type cast, unlike the constructor call syntax. Initializing with a single element in the initializer list always translates directly into a constructor call.
For example:
+```csharp
+void test()
+{
+    MyType t = {1};
+    // translates to direct constructor call:
+    // MyType t = MyType.__init(1);
+    // which is NOT the same as:
+    // MyType t = MyType(1)
+    // or:
+    // MyType t = (MyType)1;
+}
+```
+
+If the above code passes type check, then it will be used as the way to initialize `obj`.
+
+If the above code does not pass type check, and if there is only one constructor for `MyType` that is synthesized as described in the previous section (and therefore marked as `[Synthesized]`), Slang continues to check if `S` meets the definition of a "legacy C-style struct" type. A type is a "legacy C-style" type if it is a:
+- Basic scalar type (e.g. `int`, `float`).
+- Enum type.
+- Sized array type where the element type is a C-style type.
+- Tuple type where all member types are C-style types.
+- A "C-style" struct.
+
+A struct is C-style if all of the following conditions are met:
+- It does not inherit from any other types.
+- It does not contain any explicit constructors defined by the user.
+- All its members have the same visibility as the type itself.
+- All its members are legacy C-style types.
+Note that C-style structs are allowed to have member default values.
+In that case, we perform a legacy "read data" style consumption of the initializer list to synthesize the arguments to call the constructor, so that the following behavior is valid:
+
+```csharp
+struct Inner { int x; int y; };
+struct Outer { Inner i; Inner j; }
+
+// Initializes `o` into `{ Inner{1,2}, Inner{3,0} }`, by synthesizing the
+// arguments to call `Outer.__init(Inner(1,2), Inner(3, 0))`.
+Outer o = {1, 2, 3};
+```
+
+If the type is not a legacy C-style struct, Slang should produce an error.
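The legacy "read data" consumption can be sketched in Python (illustration only; the shape encoding and `legacy_read` name are hypothetical, not compiler source):

```python
# A type shape is modeled as "s" for a scalar field, or a list of sub-shapes
# for a nested C-style struct. Scalars are consumed from the flat initializer
# list in order; when the list runs out, remaining fields default to 0 here.
def legacy_read(shape, it):
    if shape == "s":
        return next(it, 0)  # consume one scalar, defaulting when exhausted
    return [legacy_read(sub, it) for sub in shape]

# Outer { Inner i; Inner j; } with Inner { int x; int y; }
outer = [["s", "s"], ["s", "s"]]

# Outer o = {1, 2, 3}; yields { Inner{1,2}, Inner{3,0} } as in the example.
result = legacy_read(outer, iter([1, 2, 3]))
assert result == [[1, 2], [3, 0]]
```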
+
+### Legacy HLSL syntax to cast from 0
+
+HLSL allows a legacy syntax to cast literal `0` to a struct type, for example:
+```hlsl
+struct MyStruct { int x; }
+void test()
+{
+    MyStruct s = (MyStruct)0;
+}
+```
+
+Slang treats this as equivalent to an empty initialization:
+```csharp
+MyStruct s = (MyStruct)0;
+// is equivalent to
+MyStruct s = {};
+```
+
+Examples
+-------------------
+```csharp
+
+// Assume everything below is public unless explicitly declared.
+
+struct Empty
+{
+    // compiler synthesizes:
+    // __init();
+}
+void test()
+{
+    Empty s0 = {}; // Works, `s0` is considered initialized via ctor call.
+    Empty s1; // `s1` is considered uninitialized.
+}
+
+struct CLike
+{
+    int x; int y;
+    // compiler synthesizes:
+    // __init(int x, int y);
+}
+void test1()
+{
+    CLike c0; // `c0` is uninitialized.
+
+    // case 1: initialized with synthesized ctor call using legacy logic to form arguments,
+    // and `c1` is now `{0,0}`.
+    // (we will refer to this scenario as "initialized with legacy logic" for
+    // the rest of the examples):
+    CLike c1 = {};
+
+    // case 2: initialized with legacy initializer list logic, `c2` is now `{1,0}`:
+    CLike c2 = {1};
+
+    // case 3: initialized with ctor call `CLike(1,2)`, `c3` is now `{1,2}`:
+    CLike c3 = {1, 2};
+}
+
+struct ExplicitCtor
+{
+    int x;
+    int y;
+    __init(int x) {...}
+    // compiler does not synthesize any ctors.
+}
+void test2()
+{
+    ExplicitCtor e0; // `e0` is uninitialized.
+    ExplicitCtor e1 = {1}; // calls `__init`.
+    ExplicitCtor e2 = {1, 2}; // error, no ctor matches initializer list.
+}
+
+struct DefaultMember {
+    int x = 0;
+    int y = 1;
+    // compiler synthesizes:
+    // __init(int x = 0, int y = 1);
+}
+void test3()
+{
+    DefaultMember m; // `m` is uninitialized.
+    DefaultMember m1 = {}; // calls `__init()`, initialized to `{0,1}`.
+    DefaultMember m2 = {1}; // calls `__init(1)`, initialized to `{1,1}`.
+    DefaultMember m3 = {1,2}; // calls `__init(1,2)`, initialized to `{1,2}`.
+}
+
+struct PartialInit {
+    // warning: not all members are initialized.
+    // members should either be all-uninitialized or all-initialized with
+    // default expr.
+    int x;
+    int y = 1;
+    // compiler synthesizes:
+    // __init(int x, int y = 1);
+}
+void test4()
+{
+    PartialInit i; // `i` is not initialized.
+    PartialInit i1 = {2}; // calls `__init`, result is `{2,1}`.
+    PartialInit i2 = {2, 3}; // calls `__init`, result is `{2,3}`.
+}
+
+struct PartialInit2 {
+    int x = 1;
+    int y; // warning: not all members are initialized.
+    // compiler synthesizes:
+    // __init(int x, int y);
+}
+void test5()
+{
+    PartialInit2 j; // `j` is not initialized.
+    PartialInit2 j1 = {2}; // error, no ctor match.
+    PartialInit2 j2 = {2, 3}; // calls `__init`, result is `{2,3}`.
+}
+
+public struct Visibility1
+{
+    internal int x;
+    public int y = 0;
+    // the compiler does not synthesize any ctor.
+    // the compiler will try to synthesize:
+    // public __init(int y);
+    // but then it will find that `x` cannot be initialized.
+    // so this synthesis will fail and no ctor will be added
+    // to the type.
+}
+void test6()
+{
+    Visibility1 t = {0, 0}; // error, no matching ctor
+    Visibility1 t1 = {}; // error, no matching ctor
+    Visibility1 t2 = {1}; // error, no matching ctor
+}
+
+public struct Visibility2
+{
+    // Visibility2 contains members of different visibility,
+    // which disqualifies it from being considered a C-style struct.
+    // Therefore we will not attempt the legacy fallback logic for
+    // initializer-list syntax.
+    internal int x = 1;
+    public int y = 0;
+    // compiler synthesizes:
+    // public __init(int y = 0);
+}
+void test7()
+{
+    Visibility2 t = {0, 0}; // error, no matching ctor.
+    Visibility2 t1 = {}; // OK, initialized to {1,0} via ctor call.
+    Visibility2 t2 = {1}; // OK, initialized to {1,1} via ctor call.
+}
+
+internal struct Visibility3
+{
+    // Visibility3 is considered a C-style struct,
+    // because all members have the same visibility as the type.
+    // Therefore we will attempt the legacy fallback logic for
+    // initializer-list syntax.
+    // Note that C-style structs can still have init exprs on members.
+    internal int x;
+    internal int y = 2;
+    // compiler synthesizes:
+    // internal __init(int x, int y = 2);
+}
+internal void test8()
+{
+    Visibility3 t = {0, 0}; // OK, initialized to {0,0} via ctor call.
+    Visibility3 t1 = {1}; // OK, initialized to {1,2} via ctor call.
+    Visibility3 t2 = {}; // OK, initialized to {0,2} via legacy logic.
+}
+
+internal struct Visibility4
+{
+    // Visibility4 is considered a C-style struct,
+    // and we still synthesize a ctor for member initialization.
+    // Because every member of Visibility4 has an init expr, the
+    // synthesized ctor can be called with 0 arguments.
+    internal int x = 1;
+    internal int y = 2;
+    // compiler synthesizes:
+    // internal __init(int x = 1, int y = 2);
+}
+internal void test9()
+{
+    Visibility4 t = {0, 0}; // OK, initialized to {0,0} via ctor call.
+    Visibility4 t1 = {3}; // OK, initialized to {3,2} via ctor call.
+    Visibility4 t2 = {}; // OK, initialized to {1,2} via ctor call.
+}
+```
+
+### Zero Initialization
+
+The Slang compiler supports an option to force zero-initialization of all local variables.
+This is currently implemented by adding `IDefaultInitializable` conformance to all user-defined
+types. With the direction we are heading, we should remove this option in the future.
+For now we can continue to provide this functionality, but through an IR rewrite pass instead
+of by changing the frontend semantics.
+
+When the user specifies `-zero-initialize`, we should still use the same front-end logic for
+all the checking. After lowering to IR, we should insert a `store` after all `IRVar : T` to
+initialize them to `defaultConstruct(T)`.
+
+
+Q&A
+-----------
+
+### Should global static and groupshared variables be default initialized?
+
+Similar to local variables, declarations are not default-initialized at their declaration sites.
+In particular, it is difficult to efficiently initialize global variables safely and correctly in a general way on platforms such as Vulkan,
+so implicit initialization for these variables can come with serious performance consequences.
+
+### Should `out` parameters be default initialized?
+
+Following the same philosophy of not initializing any declarations, `out` parameters are also not default-initialized.
+
+Alternatives Considered
+-----------------------
+
+One important decision point is whether or not Slang should allow variables to be left in an uninitialized state after their declaration, as is allowed in C++. In contrast, C# forces everything to be default-initialized at its declaration site, which comes at the cost of burdening developers with defining a default value for each type.
+Our opinion is that we want to allow variables to remain uninitialized, and to have compiler validation checks inform
+the developer that something is wrong if they try to use a variable in an uninitialized state. We believe it is desirable to tell the developer what's wrong instead of using a heavyweight mechanism to ensure everything is initialized at declaration sites, which can have non-trivial performance consequences for GPU programs, especially when the variable is declared in groupshared memory.
diff --git a/proposals/005-write-only-textures.md b/proposals/005-write-only-textures.md
new file mode 100644
index 0000000..698ea6e
--- /dev/null
+++ b/proposals/005-write-only-textures.md
@@ -0,0 +1,61 @@
+SP #005: Write-Only Textures
+=================
+
+Add write-only texture types to Slang's core module.
+
+
+Status
+------
+
+Status: Design Review.
+
+Implementation: N/A
+
+Author: Yong He
+
+Reviewer:
+
+Background
+----------
+
+Slang inherits HLSL's RWTexture types to represent UAV/storage texture resources; this works well for HLSL, GLSL, CUDA and SPIRV targets.
+However, Metal has the notion of write-only textures, and WebGPU has limited support for read-write textures. In WebGPU, a read-write texture can only have
+an uncompressed single-channel 32-bit texel format, which means an `RWTexture2D` cannot be used to write to a `rgba8unorm` texture.
+
+To provide a better mapping to write-only textures on Metal and WebGPU, we propose to add write-only textures to Slang to allow writing portable code
+without relying on backend workarounds.
+
+Proposed Approach
+-----------------
+
+Slang's core module already defines all texture types as a single generic `_Texture` type, where `access` is a value parameter
+representing the allowed access of the texture. The valid values of access are:
+
+```
+kCoreModule_ResourceAccessReadOnly
+kCoreModule_ResourceAccessReadWrite
+kCoreModule_ResourceAccessRasterizerOrdered
+kCoreModule_ResourceAccessFeedback
+```
+
+We propose to add another case:
+
+```
+kCoreModule_ResourceAccessWriteOnly
+```
+
+to represent write-only textures.
+
+
+We also add type aliases prefixed with "W" for all write-only textures:
+```
+WTexture1D, WTexture2D, ...
+```
+
+These types will be reported in the reflection API with `access=SLANG_RESOURCE_ACCESS_WRITE`.
+
+Write-only textures support the `GetDimensions` and `Store(coord, value)` methods. Neither `Load` nor `subscript` is defined for write-only texture types,
+so the user cannot write code that reads from a write-only texture.
+
+Write-only textures are supported on all targets. For traditional HLSL, GLSL, SPIRV and CUDA targets, they are translated
+exactly the same as `RW` textures. For Metal, they map to `access::write`, and for WGSL, they map to `texture_storage_X`.
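The per-target lowering described above can be summarized as a small table (a Python sketch for illustration only; the function name and exact emitted type spellings are assumptions, not the compiler's output):

```python
# Illustrative mapping from the proposed write-only access case to each
# target, following the text: most targets reuse the RW form, while Metal
# and WGSL get genuinely write-only texture types.
def lower_write_only_texture_2d(target):
    if target in ("hlsl", "glsl", "spirv", "cuda"):
        return "RWTexture2D"  # translated exactly the same as RW textures
    if target == "metal":
        return "texture2d<T, access::write>"
    if target == "wgsl":
        return "texture_storage_2d<format, write>"
    raise ValueError(f"unknown target: {target}")

assert lower_write_only_texture_2d("hlsl") == "RWTexture2D"
```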
diff --git a/proposals/007-variadic-generics.md b/proposals/007-variadic-generics.md
new file mode 100644
index 0000000..8034c03
--- /dev/null
+++ b/proposals/007-variadic-generics.md
@@ -0,0 +1,679 @@
+SP #007: Variadic Generics
+=================
+
+Variadic generics are the ability to define and use generic types and functions that have an arbitrary number of generic type parameters.
+For example, a tuple type can be represented as a generic type that has zero or any number of type parameters, i.e. a variadic generic.
+Variadic types and functions are key building blocks for tuple types in the language, and will also enable us to define an
+`IFunc` interface that represents a callable value. The `IFunc` interface allows users to start writing code that
+takes "callback" functions as parameters, and to start using functors or adopting more functional programming idioms.
+
+Supporting variadic generics is a big step up in the Slang type system's expressive power, and will allow more meta-programming logic to be
+written in native Slang code rather than on top of it with macros or custom code generation tools.
+
+Status
+------
+
+Status: Implemented.
+
+Author: Yong He.
+
+Implementation:
+  [PR 4833](https://github.com/shader-slang/slang/pull/4833),
+  [PR 4849](https://github.com/shader-slang/slang/pull/4849),
+  [PR 4850](https://github.com/shader-slang/slang/pull/4850),
+  [PR 4856](https://github.com/shader-slang/slang/pull/4856)
+
+Reviewed by: Kai Zhang, Jay Kwak, Ariel Glasroth.
+
+Background
+----------
+
+We have several cases that will benefit from variadic generics. The simplest example is the `printf` function, which is currently
+defined to have a different overload for each number of arguments. The downsides of duplicating overloads are bloat in the core
+module size and a predefined upper limit on argument count.
If users are to build their own functions that wrap the `printf`
+function, they will have to define a set of overloads for each number of arguments too, further bloating code size.
+
+Some of our users would like to implement the functor idiom in their shader code with interfaces. This is almost possible
+with the existing support for generics and interfaces. For example:
+```
+// Define an interface for the callback function
+interface IProcessor
+{
+    void process(int data);
+}
+
+// The callback function `p` is represented as a functor conforming to the `IProcessor` interface.
+void process<TProcessor : IProcessor, let N : int>(TProcessor p, int data[N])
+{
+    for (int i = 0; i < N; i++)
+        p.process(data[i]);
+}
+
+// Define the functor as a type that conforms to `IProcessor`.
+struct MyProcessorFunc : IProcessor
+{
+    void process(int data) { ... }
+}
+
+void user<let N : int>(int myData[N])
+{
+    // Define an instance of the functor, and pass it to `process`.
+    MyProcessorFunc functor = ...;
+    process(functor, myData);
+}
+```
+
+While this can work, it requires a lot of boilerplate from the user. For each shape of callback, the user must define
+a separate interface. We can reduce this boilerplate if the system has builtin support for `IFunc`:
+
+```
+// The callback function `p` is represented as a functor conforming to the builtin `IFunc` interface.
+void process<TProcessor : IFunc<void, int>, let N : int>(TProcessor p, int data[N])
+{
+    for (int i = 0; i < N; i++)
+        p(data[i]);
+}
+
+// Define the functor as a type that conforms to `IFunc<void, int>`.
+struct MyProcessorFunc : IFunc<void, int>
+{
+    void operator()(int data) { ... }
+}
+
+void user<let N : int>(int myData[N])
+{
+    // Define an instance of the functor, and pass it to `process`.
+    MyProcessorFunc functor = ...;
+    process(functor, myData);
+}
+```
+
+The above code eliminates the user-defined interface by using the builtin `IFunc` interface.
By making `IFunc` builtin,
+we can open the path for the compiler to synthesize conformances to `IFunc` for ordinary functions, and in the future
+add support for lambda expressions that automatically conform to `IFunc`, further simplifying the user code into something like:
+
+```
+// The callback function `p` is represented as a functor conforming to the builtin `IFunc` interface.
+void process<TProcessor : IFunc<void, int>, let N : int>(TProcessor p, int data[N])
+{
+    for (int i = 0; i < N; i++)
+        p(data[i]);
+}
+
+void user<let N : int>(int myData[N])
+{
+    process((int x)=>{...}, myData);
+}
+```
+
+Related Work
+------------
+
+Variadic generics are an advanced type system feature that is missing in many modern languages, including C# and Rust.
+Swift added support for variadic generics in 2022/2023, and this proposal largely follows Swift's design.
+C++ has variadic templates that achieve similar results within the template system.
+Rust supports variadics in a macro system layered above its core type system. While this can solve many user issues,
+we decided not to go down this path, because macros and templates must be expanded before core type checking, which means that
+they can't integrate nicely into modules or be compiled into IR independently of their use sites.
+
+
+Proposed Approach
+-----------------
+
+Slang can follow Swift's solution for variadic generics. A user can define a variadic generic with the syntax:
+
+```
+void myFunc<each T>(expand each T v) {...}
+```
+
+The code above defines a generic function that has a __generic type pack parameter__ `T`, declared with the `each` keyword before `T`.
+The function's parameter list is defined as `expand each T v`, which should be interpreted as a parameter `v` whose type is
+`expand each T`. `expand each T` is a type that represents a pack of types. A parameter whose type is a pack of types can
+accept zero or more arguments during function call resolution.
+
+`myFunc` can be called with an arbitrary number of arguments:
+```
+myFunc(); // OK, zero arguments
+myFunc(1, 2.0f, 3.0h); // OK, three arguments with different types.
+```
+
+A function can forward its variadic parameter to another function that accepts a variadic parameter with the `expand` expression:
+```
+void caller<each T>(expand each T v)
+{
+    myFunc(expand each v);
+}
+```
+
+Generic type pack parameters can be nested, and there can be more than one variadic generic parameter in a single generic decl:
+```
+struct Parent<each T>
+{
+    void f<each U>(...) {...} // OK, nested generics with type pack parameters
+}
+void g<each T, each U>(...) { ... } // OK, more than one type pack parameter in a single generic.
+```
+
+However, when more than one generic type pack parameter is referenced in a single `expand` expression, there is an implicit
+requirement that these type packs have the same number of elements. For example:
+```
+// implicitly requiring T and U to have the same number of elements.
+void g<each T, each U>(expand Pair<each T, each U> pairs) {...}
+
+void user()
+{
+    // We will match (int, float) to `T`, and (uint16_t, half) to `U`:
+    g(
+        Pair<int, uint16_t>(1, 2),
+        Pair<float, half>(1.0f, 2.0h) );
+}
+```
+
+In the example above, the type `expand Pair<each T, each U>` defines a pack of types where each element in the pack is formed by
+replacing `each T` and `each U` in the __pattern type__ `Pair<each T, each U>` with the corresponding elements in type packs `T` and `U`.
+Because the pattern type references two different type pack parameters `T` and `U`, we require that `T` and `U`
+have the same number of types; this allows us to resolve `g` by evenly dividing the argument list
+into two parts, such that `T = (int, float)` and `U = (uint16_t, half)`. With that, `expand Pair<each T, each U>` is then substituted
+into the type pack `(Pair<int, uint16_t>, Pair<float, half>)`.
+
+Generic type pack parameters can have type constraints:
+
+```
+void f<each T : IFloat>(expand each T v) {}
+```
+
+This means that every type in the type pack `T` must conform to the interface `IFloat`.
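The pack-matching step described above can be simulated with a small Python sketch (illustration only; `match_packs` is a hypothetical name, and argument types are modeled as strings):

```python
# Recover the packs T and U for the call to `g` by "unzipping" the element
# types of the Pair arguments; the two packs are equal in length by
# construction, as the implicit requirement demands.
def match_packs(pair_arg_types):
    T = tuple(t for (t, _) in pair_arg_types)
    U = tuple(u for (_, u) in pair_arg_types)
    return T, U

T, U = match_packs([("int", "uint16_t"), ("float", "half")])
assert T == ("int", "float")
assert U == ("uint16_t", "half")
```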
You can use any expression inside `expand` when it is used on values:
```
interface IGetValue
{
    int getValue();
}

void print<each T>(expand each T) {...}

void f<each T : IGetValue>(expand each T v)
{
    print(expand (each v).getValue());
}
```

Here, `expand (each v).getValue()` will expand the pattern expression `(each v).getValue()` into a pack of values. The result of this `expand` expression
is a pack of values where each element of the pack is computed by substituting `each v` in the pattern expression with each element in `v`. The resulting
pack of `int` values is then passed to the `print` function, which also takes a pack of values.

For now, we require that all variadic generic type packs appear at the end of a parameter list, after any ordinary parameters. This means that the following
definitions are invalid:

```
void f<each T, U>(expand each T v, U u) {} // Error, ordinary parameter `U` after type pack.
void g<each T>(expand each T v, int x) {} // Error, ordinary parameter after type pack.
void k<U, each T>(expand each T v, U u) {} // Error.
void h<U, each T>(U u, expand each T v) {} // OK.
```

Additionally, we establish these restrictions on how `expand` and `each` may be used:
- The pattern type of an `expand` type expression must capture at least one generic type pack parameter in an `each` expression.
- The type expression after `each` must refer to a generic type pack parameter, and the `each` expression can only appear inside an `expand` expression.

These rules mean that expressions like `expand int`, or `each T` on its own, are invalid.

Similarly, when using `expand` and `each` on values, we require that:
- The pattern expression of an `expand` expression must capture at least one value whose type is a generic type pack parameter.
- The expression after `each` must refer to a value whose type is a generic type pack parameter, and the `each` expression can only appear inside an `expand` expression.
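The value-expansion semantics described above (e.g. `expand (each v).getValue()`) behave like mapping a function over the elements of a pack. A rough Python model (illustrative only; the real expansion happens at compile time, and `Wrapped` is a hypothetical stand-in for a type conforming to `IGetValue`):

```python
# Model: `expand (each v).getValue()` maps the pattern over every element.
class Wrapped:
    """Hypothetical stand-in for a type conforming to IGetValue."""
    def __init__(self, x):
        self.x = x
    def get_value(self):
        return self.x

def expand_get_value(pack):
    # Substitute `each v` with each element of the pack and collect the results.
    return tuple(elem.get_value() for elem in pack)

assert expand_get_value((Wrapped(1), Wrapped(2))) == (1, 2)
```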
Combined with type equality constraints, a variadic generic type pack can be used to define a homogeneously typed parameter pack:
```
void calcInts<each T>(expand each T values) where T == int
{
    ...
}
```

Detailed Explanation
--------------------

To implement variadic generics, we need to introduce several semantic constructs in our type system.
### `GenericTypePackParameterDecl`
When a generic parameter is defined with the `each` keyword, such as in `void f<each T>`, the parser should create a new type of AST node inside the generic, which we name a `GenericTypePackParameterDecl`. With this addition, a generic parameter can be a `GenericTypeParameterDecl`, `GenericValueParameterDecl`, `GenericTypeConstraintDecl`,
or `GenericTypePackParameterDecl`. When the user defines type constraints on a generic type pack parameter, we will form a `GenericTypeConstraintDecl` whose `subType` is a
`DeclRefType` referencing the `GenericTypePackParameterDecl`.

### Type Pack
A type pack represents a pack of types. The simplest form of a type pack is a `ConcreteTypePack`, which is a list of concrete element types, such as `(int, float, float3)`.
In a generic decl such as `void f<each T>(T v)`, `T` refers to an abstract type pack represented by the generic type pack parameter `T`. The type of parameter `v` in this case
is a `DeclRefType(GenericTypePackParameterDecl "T")`.
The most general case of a type pack is defined by the `expand PatternExpr` type expression. In this case, the expression will be translated into an `ExpandType`, representing an
abstract type pack that can be evaluated by substituting all `each X` expressions in the `PatternExpr` with a corresponding element in `X`, and joining all the resulting element types
into a type pack.
Note that a `ConcreteTypePack` is very similar in semantic meaning to a `Tuple`, except that a `ConcreteTypePack` also carries automatic flattening semantics, such that
`ConcreteTypePack(ConcreteTypePack(a,b), c)` is equivalent to, and can be simplified to, `ConcreteTypePack(a,b,c)`.

In summary, a type pack can be represented by one of:
- `ConcreteTypePack`, a simple concrete list of element types.
- `DeclRefType(GenericTypePackParameterDecl)`, a simple reference to a generic type pack parameter.
- `ExpandType(PatternType)`, an abstract type pack resulting from expanding and evaluating `PatternType`.

### `ExpandType` and `EachType`
The type expression `expand each T` should be translated into `ExpandType(EachType(T), T)`. Here the first argument in `ExpandType` is the `PatternType`, which is what we will
use to expand into a concrete type pack. The second argument `T` represents all the generic type pack parameters that are captured by `PatternType`. The reason to explicitly
keep track of captured generic type pack parameters is to make it easy to determine the size of the type pack without having to look into `PatternType`, and to ensure we never lose
the size info even when the pattern type itself is substituted into something that is independent of any generic type pack parameters.

For example, consider the substitution process on the following case:

```
typealias F<T> = int; // result of F is not dependent on T.
typealias MyPack<each T> = expand F<each T>;
typealias Pack3 = MyPack<float, double, void>;
```

We can know from this definition that `Pack3` should evaluate to `(int, int, int)`. But let's see step by step how this is done in the type system.

First, `Pack3` is evaluated to `MyPack<float, double, void>`. To further resolve this, we will plug the argument `ConcreteTypePack(float, double, void)` into the
definition of `MyPack`.
The definition `expand F<each T>` is represented as:
```
ExpandType(
    pattern = DeclRefType(
        GenericAppDeclRef(F,
            args = [EachType(DeclRefType(GenericTypePackParamDecl "T"))])
    ),
    capture = DeclRefType(GenericTypePackParamDecl "T")
)
```

But this type is simplifiable, because `F` refers to a type alias whose definition is:

```
DeclRefType(StructDecl "int")
```

So the `expand F<each T>` type can be further simplified down to:

```
ExpandType(
    pattern = DeclRefType(StructDecl "int"),
    capture = DeclRefType(GenericTypePackParamDecl "T")
)
```

Note that in this definition, the pattern type no longer contains any references to a `GenericTypePackParamDecl`, so there is no way for us
to know how many elements the `ExpandType` should expand into from the pattern type alone. Fortunately, we still keep a reference to
the generic type pack parameter through the `capture` argument of the `ExpandType`. This allows us to evaluate it to `(int, int, int)` when
we apply the substitution `T=ConcreteTypePack(float, double, void)` to it.

Let's take a look at another, more contrived example to understand the substitution process. Assume we have:

```
interface IFoo
{
    associatedtype Assoc;
};
struct Foo : IFoo
{
    typealias Assoc = int;
};
struct Foo2 : IFoo
{
    typealias Assoc = float;
};
typealias MyPack<each T : IFoo> = expand (each T).Assoc;
typealias Pack2 = MyPack<Foo, Foo2>;
```

When evaluating `Pack2`, we will first form a `ConcreteTypePack(Foo, Foo2)` and use it to substitute the `T` parameter in `MyPack`.
We then continue to resolve this type alias by substituting `T = ConcreteTypePack(Foo, Foo2)` into `expand (each T).Assoc`.
The expression `expand (each T).Assoc` is translated into

```
ExpandType(
    pattern =
        DeclRefType(
            LookupDeclRef(
                EachType(DeclRefType(GenericTypePackParamDecl "T")),
                IFoo::Assoc
            )
        ),
    capture = DeclRefType(GenericTypePackParamDecl "T")
)
```

Substituting this with `DeclRefType(T) = ConcreteTypePack(Foo, Foo2)`, we get:

```
ExpandType(
    pattern =
        DeclRefType(
            LookupDeclRef(
                EachType(ConcreteTypePack(Foo, Foo2)),
                IFoo::Assoc
            )
        ),
    capture = ConcreteTypePack(Foo, Foo2)
)
```

Since the captured type pack in the `ExpandType` is now a concrete type pack, we can turn this `ExpandType` into a
`ConcreteTypePack` by substituting `pattern` twice, with `EachType(...)` replaced by the corresponding element in the input `ConcreteTypePack`, to form:
```
ConcreteTypePack(
    DeclRefType(
        LookupDeclRef(
            Foo,
            IFoo::Assoc
        )
    ),
    DeclRefType(
        LookupDeclRef(
            Foo2,
            IFoo::Assoc
        )
    )
)
```

And by resolving the `LookupDeclRef`s, we get:
```
ConcreteTypePack(
    DeclRefType(StructDecl "int"),
    DeclRefType(StructDecl "float")
)
```

which is the correct representation for the type pack `(int, float)`.

#### Simplification Rules for `Expand` and `Each` Types

By the definition of `expand` and `each`, we have these simplification rules:
- `expand each T` => `T`
- `each expand T` => `T`


### Type Constraints for Subtype Relationships

We define the subtype relationship for type packs so that, given a type pack `TPack`, we say
`TPack` is a subtype of `IFoo` (written `TPack:IFoo`) if every type in `TPack` is a subtype of `IFoo`.

In a generic definition `__generic<each T : IFoo>`, we will say the type pack `T` is a subtype of
`IFoo`. In the generic definition, we will have a `GenericTypeConstraintDecl` where
`subType = DeclRefType(GenericTypePackParamDecl "T")` and `supType = IFoo`.
The fact that `T:IFoo` holds is represented by a `DeclaredSubtypeWitness` whose `declRef` points to this
`GenericTypeConstraintDecl`.

The subtype witness for a `ConcreteTypePack(T0, T1, ... Tn) : IBase` is represented by
`TypePackSubtypeWitness(SubtypeWitness(T0:IBase), SubtypeWitness(T1:IBase), ..., SubtypeWitness(Tn:IBase))`.

If a type pack `T` is a subtype of `IBase`, then `each T` is also a subtype of `IBase`.
The subtype witness for an `EachType(typePack) : IBase` is represented by
`EachSubtypeWitness(SubtypeWitness(typePack : IBase))`.

If a pattern type `P` is a subtype of `IBase`, then `expand P` is also a subtype of `IBase`.
The subtype witness for an `ExpandType(patternType, capture) : IBase` is represented by
`ExpandSubtypeWitness(SubtypeWitness(patternType : IBase))`.

Similar to `ExpandType` and `EachType`, we will have simplification rules such that:

- `ExpandSubtypeWitness(EachSubtypeWitness(x))` => `x`
- `EachSubtypeWitness(ExpandSubtypeWitness(x))` => `x`.

#### Canonical Representation of `TransitiveSubtypeWitness` for Type Packs

Given:
```
interface IBase
{
}

interface IDerived : IBase
{
}

__generic<each T : IDerived> ...
```

The witness of `DeclRefType("T")` conforming to `IDerived` will be represented by
```
DeclaredSubtypeWitness(
    sub = DeclRefType(GenericTypePackParamDecl "T")
    sup = `IDerived`
)
```

To represent the witness of `DeclRefType("T")` conforming to `IBase`, we will need to make use
of the `TransitiveSubtypeWitness`. For simplicity of IR generation, we would like `TransitiveSubtypeWitness`
to not have to deal with the case where `sub` is a type pack.
Therefore, instead of representing `DeclRefType("T") : IBase` as something like:

```
TransitiveSubtypeWitness(
    subIsMid = DeclaredSubtypeWitness(
        sub = DeclRefType(GenericTypePackParamDecl "T")
        sup = `IDerived`),
    midIsSup = DeclaredSubtypeWitness(DeclRef(IDerived:IBase))
)
```

where the `subType` of the witness is a type pack, which isn't very convenient to work with,
we will represent the same witness as:

```
ExpandSubtypeWitness(
    TransitiveSubtypeWitness(
        subIsMid = EachSubtypeWitness(DeclaredSubtypeWitness(
            sub = DeclRefType(GenericTypePackParamDecl "T")
            sup = `IDerived`)),
        midIsSup = DeclaredSubtypeWitness(DeclRef(IDerived:IBase))
    )
)
```

Note that this second representation effectively represents `expand ((each T) : IBase)`, where the `subType` of the `TransitiveSubtypeWitness`
is now an `EachType` and no longer a type pack. Doing this transformation allows us to avoid the situation where a transitive witness lookup
is done on a pack of witnesses, thereby simplifying the IR.

### Matching Arguments to Packs

When resolving an overload to form a `DeclRef` to a generic decl, or resolving an overload in a function call, we need to match arguments to generic/function
parameters. Before introducing variadic type packs, this matching was trivial: the argument at index `i` matches the parameter at index `i`.

With type packs, we need to generalize this logic. Because we have required all type pack parameters to appear at the end of the generic or function parameter
list, we can still match arguments 1:1 to all the non-type-pack parameters first. Once we have matched arguments to non-type-pack parameters and there
are additional arguments remaining, they must be for type pack parameters. If an argument is itself a concrete or abstract type pack, then we can continue to match
that argument 1:1 to the parameter.
If not, then we require all the remaining arguments to be individual types and not type packs. Because we require all type pack
parameters to have equal size, we can divide the remaining arguments evenly by the number of type pack parameters, and form a `TypePack`/`ValuePack` from that number
of arguments to supply to each type pack parameter.

For example, assume we have:
```
struct S<T, each U, each V> {}
```

When resolving the overload for `S<int, int, void, float, bool>`, we have three parameters: `T`, `U`, `V`, and five arguments: `int`, `int`, `void`, `float`, `bool`.
We will first perform argument matching and match `T=int`. Now we have four arguments remaining and two type pack parameters. We can then divide 4 by 2 to get the
number of elements for each type pack argument, form a `TypePack(int, void)` to use as the matched argument for `U`, and form a `TypePack(float, bool)`
to use as the matched argument for `V`.

After matching and the remaining overload resolution logic, `S<int, int, void, float, bool>` will be represented as:
```
GenericAppDeclRef
    genericDecl = "S"
    args = [
        DeclRefType("int"), // For `T`
        TypePack(DeclRefType("int"), DeclRefType("void")), // For `U`
        TypePack(DeclRefType("float"), DeclRefType("bool")) // For `V`
    ]
```

Similarly, when resolving a function call with variadic parameters, we will perform argument matching and create a `PackExpr` to use as the argument to a packed parameter. Given:
```
void f<each T, each U>(int x, expand each T t, expand each U u) {...}
```

A call in the form of `f(3, Foo(), Bar(), 1.0f, false)` will be converted to:

```
f(3, Pack(Foo(), Bar()), Pack(1.0f, false))
```

after resolving the call. The `Pack(...)` represents the `PackExpr` synthesized by the compiler to create a `ValuePack` whose type is a `TypePack`, so
it can be used as an argument to a `TypePack` parameter.
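The counting logic above can be sketched as follows (illustrative Python, not the actual overload-resolution code):

```python
# Match arguments against a parameter list whose trailing parameters are
# type packs: ordinary parameters match 1:1, then the remaining arguments
# are divided evenly among the pack parameters.
def match_args(ordinary_count, pack_count, args):
    ordinary = args[:ordinary_count]
    rest = args[ordinary_count:]
    if pack_count == 0:
        if rest:
            raise TypeError("too many arguments")
        return ordinary, []
    if len(rest) % pack_count != 0:
        raise TypeError("cannot divide remaining arguments evenly among packs")
    n = len(rest) // pack_count
    packs = [tuple(rest[i * n:(i + 1) * n]) for i in range(pack_count)]
    return ordinary, packs

# One ordinary parameter and two pack parameters, five arguments:
ordinary, packs = match_args(1, 2, ["int", "int", "void", "float", "bool"])
assert ordinary == ["int"]
assert packs == [("int", "void"), ("float", "bool")]
```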
### IR Representation

#### Expressing Types

A concrete type pack is represented as `IRTypePack(T0, T1, ..., Tn)` in the IR, and an abstract type pack such as an `expand` type will eventually be specialized into an `IRTypePack`. This means that a function parameter whose type is a type pack is translated into a single parameter of `IRTypePack` type. Again, `IRTypePack` is in many ways similar to `IRTupleType`, except that `IRTypePack`s are automatically flattened into enclosing type packs during specialization.

We will represent `expand` and `each` types in the IR almost 1:1 with how they are represented in the AST. Note that types are hoistable insts in Slang IR and are globally deduplicated based on their operands, so representing them in the natural way allows these types to benefit from Slang IR's global deduplication.

This means that `each T` is represented as `IREachType(T)`, and `expand patternType` is represented as `IRExpandType(PatternType, capturedTypePacks)`
in the IR.

For example, the type `expand vector<each T, each U>`, where `T` and `U` are generic type pack parameters, is represented in the IR as:
```
%T = IRParam : IRTypePackParameterKind;
%U = IRParam : IRTypePackParameterKind;

%et = IREach %T;
%eu = IREach %U;

%v = IRVectorType(%et, %eu)
%expandType = IRExpandType(%v, %T, %U) // v is pattern; T,U are captured type packs.
```

Note that this kind of type hierarchy representation is only used during IR lowering, in order to benefit from IR global deduplication of type definitions. The representation in this form isn't convenient for specialization.
Once the IR lowering step is complete, we will convert all type representations to the same form as the value representation described in the following section.

#### Expressing Values

A value whose type is a type pack is called a value pack. A value pack is represented in the IR as an `IRMakeValuePack` inst.
For example, the value pack `(1,2,3)` will be represented in the IR as:
```
IRMakeValuePack(1,2,3) : IRTypePack(int, int, int)
```

An `expand(PatternExpr)` expression should be represented in the IR as:
```
%e = IRExpand : IRExpandType(...)
{
    IRBlock
    {
        %index = IRParam : int;
        yield PatternExpr; // may use `index` here.
    }
}
```
The `IRExpand` is treated like a compile-time for loop whose body is expressed as basic blocks that are the children of the `IRExpand` inst.
The body starts with an `%index` parameter that represents the loop index within the value pack, and the CFG inside `IRExpand` should end with a single
`yield` terminal instruction "returning" the mapped value for the element at `%index` in the input value pack.

For example, given `v` as a value pack whose type is a type pack, `let x = expand (each v) + 1` will be represented in the IR as:

```
%v = /*some value pack whose type is a TypePack*/
%x = IRExpand : IRTypePack(...)
{
    IRBlock
    {
        %index = IRParam : int;
        %e = IRGetTupleElement(%v, %index);
        %r = IRAdd %e 1;
        IRYield %r;
    }
}
```

In this simple example, the `IRExpand` contains only one basic block. It is possible for `IRExpand` to have more than one basic block if the pattern expression
contains a `?:` operator, in which case there will be a branching CFG structure inside the `IRExpand`.

Also note that `each v` is translated into `IRGetTupleElement(%v, %index)`, which extracts the element at `%index` from the tuple value represented by `%v`.

#### IR Specialization

Specializing the IR for an `IRExpand` inst with a concrete value pack is very similar to loop unrolling. Given the example in the previous section
on the expression `expand (each v) + 1`, we can specialize the `IRExpand` inst with `v` being a known value pack such as `IRMakeTuple(1,2,3)` in two steps.
Step 1 is to copy the children of the `IRExpand` inst three times to where the `IRExpand` inst itself is located, and during each copy, replace
all references to the `IRParam` with the concrete index for that copy. Specializing the above IR code with `IRMakeTuple(1,2,3)` therefore leads to:

```
%block0 = IRBlock
{
    %e0 = IRGetTupleElement(%v, 0);
    %r0 = IRAdd %e0 1;
    yield %r0;
}
%block1 = IRBlock
{
    %e1 = IRGetTupleElement(%v, 1);
    %r1 = IRAdd %e1 1;
    yield %r1;
}
%block2 = IRBlock
{
    %e2 = IRGetTupleElement(%v, 2);
    %r2 = IRAdd %e2 1;
    yield %r2;
}
%mergeBlock = IRBlock
{
    ...
}
```

Step 2 is to hook up the copied blocks by replacing all the `yield` instructions with `branch` instructions, and to form the final result of the value pack
by packing up all the values computed at each "loop iteration" in an `IRMakeValuePack` inst:

```
%block0 = IRBlock
{
    %e0 = IRGetTupleElement(%v, 0);
    %r0 = IRAdd %e0 1;
    branch %block1;
}
%block1 = IRBlock
{
    %e1 = IRGetTupleElement(%v, 1);
    %r1 = IRAdd %e1 1;
    branch %block2;
}
%block2 = IRBlock
{
    %e2 = IRGetTupleElement(%v, 2);
    %r2 = IRAdd %e2 1;
    branch %mergeBlock;
}
%mergeBlock = IRBlock
{
    %expand = IRMakeValuePack(%r0, %r1, %r2);
}
```

With this, we can replace the original `IRExpand` inst with `%expand`, and specialization is done. The specialized instructions like `IRGetTupleElement(%v, 0)` will be picked up
in a follow-up step during specialization and replaced with the actual value at the specified index, since `%v` is a known value pack represented by an `IRMakeValuePack`. So after
folding and other simplifications, specializing the original expression with `IRMakeValuePack(1,2,3)` results in
```
%expand = IRMakeValuePack(2,3,4)
```
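The two-step unrolling can be modeled as an ordinary map over the concrete pack (illustrative Python; the compiler achieves this by cloning blocks as shown above):

```python
# Model of specializing an IRExpand against a known value pack: clone the
# body once per element, substituting the loop index, then collect the
# per-iteration results into a value pack.
def specialize_expand(value_pack, body):
    return tuple(body(value_pack, index) for index in range(len(value_pack)))

# Specializing `expand (each v) + 1` against the pack (1, 2, 3):
result = specialize_expand((1, 2, 3), lambda v, i: v[i] + 1)
assert result == (2, 3, 4)  # matches IRMakeValuePack(2,3,4) after folding
```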
Specialization of types and witnesses follows the same idea as value specialization, but since types and witnesses are represented directly as ordinary insts and operands instead of as the
nested children of an `IRExpand`, we will use a recursive process on the type structure to perform the specialization. Most of the recursion logic should be trivial; the only
interesting case is specializing `IRExpandType` and `IREachType`. During the recursion, we maintain a state called `indexInPack` that represents the current expansion
index while specializing the pattern type of an `IRExpandType`. Then, when we get to specialize an `IREachType(TPack)`, we know which index in the pack we are currently
expanding by looking at the `indexInPack` context variable, and we replace `IREachType(TypePack(T0, T1, ... Tn))` with the `T` at `indexInPack`.

After the specialization pass, there should be no more `IRExpand` and `IRExpandType` instructions in the IR, and we can lower the remaining `IRTypePack`s the same way as `IRTupleType`s.


Alternatives Considered
-----------------------

We considered the C++ `...` operator syntax and Swift's `repeat each` syntax, and ended up picking Swift's design because it is easier to parse and less ambiguous. Swift is strict about requiring `each` to precede any use of a generic type pack parameter, so `void f<each T>(T v)` is not valid Swift syntax; this prevents confusion about what `T` means in that context. In Slang we don't require this, because `expand each T` is always simplified down to `T`, and `T` alone can refer to the type pack.

We also considered not adding variadic generics support to the language at all, and just implementing `Tuple` and `IFunc` as special system builtin types, like how it is done in C#. However, we
believe that this approach is too limited when it comes to what the user can do with tuples and `IFunc`.
Given Slang's position as a high-performance, GPU-first language, it is more important for Slang than for other, CPU-oriented languages to have a powerful type system that can provide zero-cost abstraction for meta-programming tasks. That led us to believe that the language and its users can benefit from proper support for variadic generics.
diff --git a/proposals/008-tuples.md b/proposals/008-tuples.md
new file mode 100644
index 0000000..a052585
--- /dev/null
+++ b/proposals/008-tuples.md
@@ -0,0 +1,140 @@
SP #008 - Tuples
==============

Now that we have variadic generics in the language following [SP #007], we should be able to support the `Tuple` type as a core language feature.
`Tuple` types are useful in many places to reduce boilerplate in user code, such as in function return types, where they eliminate the need to define
`struct`s that exist only to return multiple values from a single function. Adding `Tuple` types to Slang will also simplify interop with other languages, such as Python
and C++, that have tuple types.

Status
------

Author: Yong He

Status: Implemented.

Implementation: [PR 4856](https://github.com/shader-slang/slang/pull/4856).

Reviewed by: Jay Kwak, Kai Zhang, Ariel Glasroth.

Background
----------

Tuple types are widely supported in almost all modern programming languages, including C++, C#, Swift, Rust, and Python. Supporting tuple types
in Slang brings the language to parity with other languages, allows users to practice the same coding idioms in Slang, and allows Slang code
to interop more directly with the parts of a user application written in other languages.


Proposed Approach
-----------------

With variadic generics support, we can now easily define a `Tuple` type in the core module as:
```
__generic<each T>
__magic_type(TupleType)
struct Tuple
{
    __intrinsic_op($(kIROp_MakeTuple))
    __init(expand each T);
}
```

This will allow users to instantiate tuple types from their code with `Tuple(v0, v1, v2)`.
### Constructing Tuple Values

To make it easy to construct tuples, we will define a `makeTuple` function in the core module as:
```
__intrinsic_op($(kIROp_MakeTuple))
Tuple<expand each T> makeTuple<each T>(expand each T values);
```

With generic argument inference, this enables the user to write:
```
makeTuple(1, 2.0f) // returns Tuple<int, float>(1, 2.0f)
```

### Accessing Tuple Elements

We can extend the logic of vector element access to access tuple elements. Given `t` as a tuple, these expressions are valid:
```
t._0 // Access the first element
t._1 // Access the second element
```

### Swizzling

We can easily support tuple swizzles:
```
let t = Tuple(1, 2.0f);
let v = t._1_0;
// v == Tuple(2.0f, 1)
```

### Concatenation

We can define the tuple concatenation operation in the core module as:
```
Tuple<expand each T, expand each U> concat<each T, each U>(Tuple<expand each T> first, Tuple<expand each U> second)
{
    return makeTuple(expand each first, expand each second);
}
```


### Counting

The `countof` expression can be used on type packs or tuple values to obtain the number of elements in a type pack or tuple.
The result is usable as a compile-time constant, for example as a generic argument.

```
int bar<let n : int>()
{
}
int foo<each T>()
{
    bar<countof T>(); // OK, countof T is a compile time constant.

    Tuple<expand each T> t;
    let c = countof t; // OK, countof can be used on tuple values.
}
```

### Operator Overloads

We should have builtin operator overloads for all comparison operators when every element type of a tuple conforms to `IComparable`.
This can be supported by defining an overload for these operators in the core module of the form:
```
bool assign(inout bool r, bool v) { r = v; return v; }

__generic<each T : IComparable>
bool operator < (Tuple<expand each T> t0, Tuple<expand each T> t1)
{
    bool greater = false;
    bool equals = true;
    expand greater || assign(greater, equals && (each t0) > (each t1)) || assign(equals, equals && (each t0) == (each t1));
    return !greater && !equals;
}
```


Alternatives Considered
----------------

Should we allow other operator overloads for tuples? This seems useful to have, but right now it is a bit tricky
because we haven't really settled on the builtin interfaces. We need to finalize things like `IFloat`, `IInteger`,
`IArithmetic`, `ILogic`, etc. first.

Should we automatically treat a `Tuple` type as conforming to any interface `IFoo` if every element in the tuple conforms to
`IFoo`? We can't, because this is not well-defined. For example, if `IFoo` has a method that returns `int`,
should the tuple type's equivalent method return a `Tuple` of `int`s or just a single `int`? In some cases you want one, but
other times you want the other. And if the method returns a tuple, it is no longer consistent with the base interface
definition, so this is ill-formed.

We also considered having an overload of `concat` that appends individual elements to the end of a tuple, such as:
```
Tuple<expand each T, expand each U> concat<each T, each U>(Tuple<expand each T> t, expand each U values);
```
However, this could lead to surprising behavior when the user writes `concat(t0, t1, t2)` where `t1` and `t2` are also tuples.
With this overload, the result would be `(t0_0, t0_1, ... t0_n, t1, t2)`, while the user could be expecting `t1` and `t2`
to be flattened into the resulting tuple. To avoid this surprising behavior, we decided not to include this overload in the core module.
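Returning to the comparison operators above: the `operator <` overload is intended to compute a lexicographic comparison, walking elements left to right, latching `greater` at the first element that compares greater while `equals` tracks whether all elements so far compared equal. That intent can be modeled in Python (illustrative only; Python tuples already compare lexicographically natively):

```python
# Model of the expanded `operator <` body: `greater` latches at the first
# differing element that compares greater; `equals` tracks prefix equality.
def tuple_less(t0, t1):
    greater = False
    equals = True
    for a, b in zip(t0, t1):
        if equals and a > b:
            greater = True
        equals = equals and a == b
    return not greater and not equals

assert tuple_less((1, 2.0), (1, 3.0)) is True
assert tuple_less((2, 0.0), (1, 9.0)) is False
assert tuple_less((1, 2.0), (1, 2.0)) is False
```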
diff --git a/proposals/009-ifunc.md b/proposals/009-ifunc.md
new file mode 100644
index 0000000..373e2de
--- /dev/null
+++ b/proposals/009-ifunc.md
@@ -0,0 +1,142 @@
SP #009 - IFunc interface
==============

Now that we have variadic generics in the language following [SP #007], we should be able to define a builtin `IFunc` interface that represents
things that can be called with the `()` operator. This will allow users to write generic functions that take a callback object, and to adopt more
functional programming idioms.

Status
------

Author: Yong He

Status: Implemented.

Implementation: [PR 4905](https://github.com/shader-slang/slang/pull/4905) [PR 4926](https://github.com/shader-slang/slang/pull/4926)

Reviewed by: Kai Zhang, Jay Kwak

Background
----------

Callbacks are an idiom that frequently shows up in complex codebases. Currently, Slang users can implement this idiom with
interfaces:

```
interface ICondition
{
    bool test(int x);
}

int countElement(int data[100], ICondition condition)
{
    int count = 0;
    for (int i = 0; i < data.getCount(); i++)
        if (condition.test(data[i]))
            count++;
    return count;
}

bool myCondition(int x) { return x%2 == 0; } // select all even numbers.

struct MyConditionWrapper : ICondition
{
    bool test(int x) { return myCondition(x); }
};

void test()
{
    int data[100] = ...;
    int count = countElement(data, MyConditionWrapper());
}
```

As can be seen, this is a lot of boilerplate. With a builtin `IFunc` interface, we can
allow the compiler to automatically make ordinary functions conform to the interface,
eliminating the need for defining interfaces and wrapper types.
Proposed Approach
-----------------

We should support overloading of `operator()`, and use the function call syntax to call the `operator()` member, similar to C++:
```
struct Functor
{
    int operator()(float p) {}
}

void test()
{
    Functor f;
    f(1.0f);
}
```

We propose `IFunc`, `IMutatingFunc`, `IDifferentiableFunc`, and `IDifferentiableMutatingFunc`, defined as follows:

```
// Function objects with a mutating state.
interface IMutatingFunc<TR, each TP>
{
    [mutating]
    TR operator()(expand each TP p);
}

// Function objects that do not have a mutating state.
interface IFunc<TR, each TP> : IMutatingFunc<TR, each TP>
{
    TR operator()(expand each TP p);
}

// Differentiable functions
interface IDifferentiableMutatingFunc<TR, each TP> : IMutatingFunc<TR, each TP>
{
    [Differentiable]
    [mutating]
    TR operator()(expand each TP p);
}

interface IDifferentiableFunc<TR, each TP> : IFunc<TR, each TP>, IDifferentiableMutatingFunc<TR, each TP>
{
    [Differentiable]
    TR operator()(expand each TP p);
}
```

The `IMutatingFunc` interface is for defining functors that have mutable state. The following example demonstrates its use:

```
void forEach(int data[100], inout IMutatingFunc<void, int> f)
{
    for (int i = 0; i < data.getCount(); i++)
        f(data[i]);
}

struct CounterFunc : IMutatingFunc<void, int>
{
    int count;

    [mutating]
    void operator()(int data)
    {
        if (data % 2 == 0)
            count++;
    }
};

void test()
{
    int data[100] = ...;
    CounterFunc f;
    f.count = 0;
    forEach(data, f);
    printf("%d", f.count);
}
```

# Coercion of ordinary functions

Eventually, we should allow ordinary functions to be automatically coercible to `IFunc` interfaces. But this is out of scope
for the initial `IFunc` work, because we believe the implementation can be simpler if we support lambda functions first, then
implement ordinary function coercion as a special case of lambda expressions.
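The `CounterFunc` example above corresponds closely to a callable object in Python (a hypothetical analogy only: `__call__` plays the role of `operator()`, and mutating `self` corresponds to a `[mutating]` method):

```python
# Analogy for IMutatingFunc: a callable object carrying mutable state.
class CounterFunc:
    def __init__(self):
        self.count = 0

    def __call__(self, data):   # plays the role of operator()
        if data % 2 == 0:
            self.count += 1

def for_each(data, f):          # f is any callable, like an IMutatingFunc
    for x in data:
        f(x)

f = CounterFunc()
for_each(range(100), f)
assert f.count == 50            # 0, 2, ..., 98 are even
```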
\ No newline at end of file
diff --git a/proposals/010-new-diff-type-system.md b/proposals/010-new-diff-type-system.md
new file mode 100644
index 0000000..6115781
--- /dev/null
+++ b/proposals/010-new-diff-type-system.md
@@ -0,0 +1,285 @@
# SP #010: New Differentiable Type System

## Problem
Our current `IDifferentiable` system has some flaws. It works fine for value types, since we can assume that every input gets a corresponding output or 'return' value. It works poorly for buffer/pointer types, since we don't 'return' a buffer, but simply want the getters/setters to be differentiable, and the resulting type to have a second buffer/pointer for the differential data.

Here's a demonstrative example with our current codebase when we use value types (like `float`):
```csharp
[Differentiable]
float add(float a, float b)
{
    return a + b;
}

// Synthesized derivative:
[Differentiable]
void s_bwd_add(inout DifferentialPair<float> dpa, inout DifferentialPair<float> dpb, float.Differential d_out)
{
    // A backward derivative method is currently responsible for 'setting' the differential values.
    dpa = DifferentialPair<float>(dpa.p, d_out);
    dpb = DifferentialPair<float>(dpb.p, d_out);
}
```

Unfortunately, this makes little sense if we decide to use buffer or pointer types:
```csharp
struct DiffPtr<T : IDifferentiable> : IDifferentiable
{
    StructuredBuffer<T> bufferRef;
    uint64_t offset;

    [Differentiable] T get() { ... }
    [Differentiable] void set(T t) { ... }
    /*
       Problem 1:
       We use custom derivatives for get() and set() to backprop and
       read gradients. If DiffPtr is differentiable, then get() and
       set() need to operate on the *pair* type and not this struct type.
       There is no proper way to do this currently.
+    */
+};
+
+[Differentiable]
+void add(DiffPtr<float> a, DiffPtr<float> b, DiffPtr<float> output)
+{
+    output.set(a.get() + b.get());
+}
+
+// Synthesized derivative:
+[Differentiable]
+void s_bwd_add(
+    inout DifferentialPair<DiffPtr<float>> a,
+    inout DifferentialPair<DiffPtr<float>> b,
+    inout DifferentialPair<DiffPtr<float>> output)
+{
+    /*
+       Problem 2:
+
+       Current backward mode semantics require that the method assume that the differentials
+       a.d and b.d are empty/zero, and it is the backward method's job to populate the result.
+
+       It doesn't make sense to 'set' the differential part since it is a buffer ref.
+       Rather, we want the user to provide the differential pointer, and use custom derivatives of
+       the getters/setters to propagate derivatives.
+
+       This also means methods like dzero(), dadd() and dmul() make no sense
+       in the context of pointer types. They cannot be initialized within a derivative method.
+    */
+}
+
+```
+
+## Workarounds
+At the moment the primary workaround is to use a **non-differentiable buffer type** with differentiable methods, and always initialize the object with two pointers for both the primal and differential buffers. This is how our `DiffTensorView` object works.
+Unfortunately, this is a rather hacky workaround with several drawbacks:
+1. `DiffTensorView` does not conform to `IDifferentiable`, but is used for derivatives. This makes our type system less useful, as `is_subtype` checks from applications using reflection need workarounds to account for corner cases like these.
+2. `DiffTensorView` always has two buffer pointers, even when used in non-differentiable methods. This is extra data in the struct, and potentially extra tensor allocations (we explicitly handle this case in `slangtorch` by leaving the diff part uninitialized if a primal method is invoked).
+3. Higher-order derivatives don't work well with this workaround.
Differentiating a method twice needs a set of 4 pointers, but we need to account for this ahead of time by using new types like `DiffDiffTensorView`, which worsens the problem of carrying around extra data where it's not required.
+
+
+## Solution
+
+We'll need to make the following 4 additions/changes:
+### 1. `[deriv_method]` function decorator.
+Intended for easy definition of custom derivatives for struct methods. It has the following properties:
+1. Accesses to `this` within `[deriv_method]` are differential pairs.
+2. Methods decorated with `[deriv_method]` cannot be called as regular methods (they can still be explicitly invoked with `bwd_diff(obj.method)`), and do not show up in the auto-complete list.
+
+See the next section for example uses of `[deriv_method]`.
+
+### 2. Split `IDifferentiable` interface: `IDifferentiableValueType` and `IDifferentiablePtrType`
+This approach moves away from "type-driven" derivative semantics and towards more "function-driven" derivative semantics.
+We no longer have `dadd`, `dzero`, `dmul`, etc.; we use default initialization instead of `dzero`, and the backward derivative of the `use` method instead of `dadd`.
+
+Further, `IDifferentiablePtrType` types don't have any of these properties. They do not need a way to 'add', and it is especially important that there is no default initializer. We never want the compiler to be able to create a new object of `IDifferentiablePtrType`, since we want to get the user-provided pointers.
+
+Additionally, we can use `IDifferentiableValueType` as the current `IDifferentiable` for backwards compatibility (it should just work in 95% of cases, since no one really defines dadd/dzero/dmul explicitly anyway).
+
+Here's the new set of base interfaces:
+```csharp
+interface __IDifferentiableBase { } // Helper type for our implementation.
+interface IDifferentiableValueType : __IDifferentiableBase
+{
+    associatedtype Differential : IDifferentiableValueType & IDefaultInitializable;
+    [Differentiable] This use(); // auto-synthesized
+}
+
+interface IDifferentiablePtrType : __IDifferentiableBase
+{
+    associatedtype Differential : IDifferentiablePtrType;
+}
+
+```
+
+Some extras in the core module allow us to constrain the diffpair type for things like `IArithmetic`:
+```csharp
+// --- CORE MODULE EXTRAS ---
+
+interface ISelfDifferentiableValueType : IDifferentiableValueType
+{
+    // Force arithmetic types to be a differential pair of the same two types.
+    // Make it simple to define derivatives of arithmetic operations.
+    //
+    associatedtype Differential : This;
+}
+
+extension IFloat : ISelfDifferentiableValueType
+{ }
+
+extension float
+{
+    // trivial auto-synthesis (maybe we even prevent the user from overriding this)
+    float use() { return this; }
+
+    // trivial auto-synthesis (maybe we even prevent the user from overriding this).
+    // Within a [deriv_method], `this` is a differential pair, so the forward
+    // derivative of use() can simply return it.
+    [ForwardDerivativeOf(use)]
+    [deriv_method] DifferentialPair<float> use_fwd() { return this; }
+
+    // auto-synthesized if necessary by invoking the use_bwd for all fields.
+    // we need to provide implementation for 'leaf' types.
+    [BackwardDerivativeOf(use)]
+    [deriv_method] [mutating] void use_bwd(float d) { this.d += d; }
+}
+
+// The new system lets us define differentiable pointers easily.
+// IDifferentiablePtrType'd values are simply treated as references, so they can be freely
+// duplicated without requiring a `use()` for correctness.
+//
+struct DPtr<T> : IDifferentiablePtrType
+{
+    typealias Differential = DPtr<T>;
+
+    Buffer<T> buffer;
+    uint64 offset;
+
+    [ForwardDerivative(get_fwd)]
+    [BackwardDerivative(get_bwd)]
+    T get() { return this.buffer[offset]; }
+
+    [deriv_method] DifferentialPair<T> get_fwd()
+    {
+        return diffPair(this.p.buffer[offset], this.d.buffer[offset]);
+    }
+
+    [deriv_method] void get_bwd(Differential d)
+    {
+        this.d.InterlockedAdd(offset, d);
+    }
+
+    DPtr<T> operator+(uint o) { return DPtr<T>{buffer, offset + o}; }
+}
+
+// Or we can define a fancier differentiable pointer that does a hashgrid
+struct DHashGridPtr<T> : IDifferentiablePtrType
+{
+    typealias Differential = DPtr<T>;
+
+    Buffer<T> buffer;
+    uint64 offset;
+
+    [ForwardDerivative(get_fwd)]
+    [BackwardDerivative(get_bwd)]
+    T get() { return this.buffer[offset]; }
+
+    [deriv_method] DifferentialPair<T> get_fwd()
+    {
+        return diffPair(this.p.buffer[offset], this.d.buffer[offset]);
+    }
+
+    [deriv_method] void get_bwd(Differential d)
+    {
+        this.d.InterlockedAdd(offset * N + hash(get_thread_id()), d);
+    }
+}
+```
+
+### 3. Every time we 'reuse' an object that conforms to `IDifferentiableValueType`, we split it with `use()`, and we use `__init__()` where necessary to initialize an accumulator.
+Example:
+```csharp
+float f(float a)
+{
+    return add(a, a);
+}
+float add(float a, float b)
+{
+    return a + b;
+}
+
+// Synthesized derivatives
+void add_bwd(inout DiffPair<float> dpa, inout DiffPair<float> dpb, float d_out)
+{
+    dpa = diffPair(dpa.p, d_out);
+    dpb = diffPair(dpb.p, d_out);
+}
+
+// Preprocessed-f (before derivative generation)
+float f_with_use_expansion(float a)
+{
+    float a_extra = a.use();
+    return add(a, a_extra);
+}
+
+// After fwd-mode:
+DiffPair<float> f_fwd(DiffPair<float> dpa)
+{
+    DiffPair<float> dpa_extra = dpa.use_fwd();
+    return add_fwd(dpa, dpa_extra);
+}
+
+
+// bwd-mode:
+void f_bwd(inout DiffPair<float> dpa, float d_out)
+{
+    // fwd-pass
+
+    // split
+    DiffPair<float> dpa_extra = dpa.use_fwd();
+    // -------
+
+    // bwd-pass
+    var dpa_extra_bwd = DiffPair<float>(dpa_extra.p, float.Differential::__init__());
+    add_bwd(dpa, dpa_extra_bwd, d_out);
+
+    // merge
+    dpa.use_bwd(dpa_extra_bwd.d);
+}
+```
+
+### 4. Objects that conform to `IDifferentiablePtrType` are used without splitting. They are simply not 'transposed' at all, because there is nothing to transpose. The fwd-mode pair is used as is.
+Here's the same example as above, but with the `DPtr` type defined earlier.
+
+```csharp
+void f(DPtr<float> a, DPtr<float> output)
+{
+    add(a, a, output);
+}
+
+void add(DPtr<float> a, DPtr<float> b, DPtr<float> output)
+{
+    output.set(a.get() + b.get());
+}
+
+// Synthesized derivatives
+// (note: no inout req'd for IDifferentiablePtrType)
+// important difference is that `ptr` types don't get transposed, only
+// methods on the objects are.
+// they DO NOT have a default initializer (the user must supply the differential part)
+void add_bwd(
+    DifferentialPair<DPtr<float>> dpa,
+    DifferentialPair<DPtr<float>> dpb,
+    DifferentialPair<DPtr<float>> output)
+{
+    // forward pass.
+    var a_p = dpa.p.get();
+    var b_p = dpb.p.get();
+    // ----
+
+    // backward pass.
+    float.Differential d_val = DPtr<float>::set_bwd(output); // set_bwd works on the entire pair.
+    DifferentialPair<float> a_get_bwd = diffPair(a_p, float.Differential::__init__());
+    DifferentialPair<float> b_get_bwd = diffPair(b_p, float.Differential::__init__());
+    operator_float_add_bwd(a_get_bwd, b_get_bwd, d_val);
+    DPtr<float>::get_bwd(dpa, a_get_bwd.d);
+    DPtr<float>::get_bwd(dpb, b_get_bwd.d);
+}
+```
diff --git a/proposals/011-structured-binding.md b/proposals/011-structured-binding.md
new file mode 100644
index 0000000..59b671e
--- /dev/null
+++ b/proposals/011-structured-binding.md
@@ -0,0 +1,47 @@
+SP #011: Structured Binding
+=================
+
+Tuple types can reduce the boilerplate code of defining auxiliary structs, but they can introduce readability issues because the elements are not named.
+To mitigate this issue, we should support structured binding as a convenient way to access tuple elements with meaningful names.
+
+# Status
+
+Status: Proposal in review.
+
+Implementation: N/A
+
+# Proposed Approach
+
+Users should be able to use `let` syntax to assign a composite type to a binding structure:
+
+```
+let tuple = makeTuple(1.0f, 2, 3);
+let [a, b, c] = tuple;
+```
+
+Where the `let [...]` statement is syntactic sugar for:
+```
+let a = tuple._0;
+let b = tuple._1;
+let c = tuple._2;
+```
+
+The right hand side of a structured binding can be a tuple, an array, or a struct type.
+It is not an error if the composite value has more elements than the binding structure.
+
+Mutable bindings are not allowed.
+
+# Alternatives Considered

+We could have allowed mutable bindings with the syntax:
+```
+var [a,b,c] = ...
+```
+That defines mutable variables a,b,c whose values are copied from the structure.
+However, mutable bindings can lead to confusion when modifying `a` doesn't change the value
+in the source composite object from the binding. To avoid this confusion, we simply disallow
+it.
+
+Supporting mutation on the original composite object can be tricky, as it involves reference types
+that do not exist in the language.
For simplicity we consider that to be out of scope of this +proposal. \ No newline at end of file diff --git a/proposals/012-language-version-directive.md b/proposals/012-language-version-directive.md new file mode 100644 index 0000000..3a75e50 --- /dev/null +++ b/proposals/012-language-version-directive.md @@ -0,0 +1,221 @@ +SP #012: Introduce a `#language` Directive +========================================= + +Status: Design Review + +Implementation: - + +Author: Theresa Foley + +Reviewer: - + +Introduction +------------ + +We propose to add a preprocessor directive, `#language` to allow a Slang source file to specify the version of the Slang language that is used in that file. +The basic form is something like: + + // MyModule.slang + + #language slang 2024.1 + + ... + +In the above example, the programmer has declared that their file is written using version 2024.1 of the Slang language and standard library. +Slang toolsets with versions below 2024.1 will refuse to compile this file, since it might use features that they do not support. +Slang toolsets with versions greater than 2024.1 *may* refuse to compile the file, if they have removed support for that language version. + +Putting a language version directly in source files allows the Slang language and standard library to evolve (including in ways that remove existing constructs) without breaking existing code. +A single release of the Slang toolchain may support a range of language versions, with different supported language/library constructs, and select the correct features to enable/disable based on the `#language` directive. + +When a release of the Slang toolchain doesn't support the language version requested by a programmer's code, the diagnostics produced can clearly state the problem and possible solutions, such as switching to a different toolchain version, or migrating code. 
+ +Background +---------- + +Like many programming languages, Slang experiences a tension between the desire for rapid innovation/evolution and stability. +One of the benefits that users of Slang have so far enjoyed has been the rapid pace of innovation in the language and its standard library. +However, as developers start to have larger bodies of Slang code, they may become concerned that changes to the language could break existing code. +There is no magical way to keep innovating while also keeping the language static. + +This proposal is an attempt to find a middle road between the extremes of unconstrained evolution and ongoing stasis. + +Related Work +------------ + +### GLSL ### + +The most obvious precedent for the feature proposed here is the [`#version` directive](https://www.khronos.org/opengl/wiki/Core_Language_(GLSL)#Version) in GLSL, which can be used to specify a version of the GLSL language being used and, optionally, a profile name: + + #version 460 core + +There are some key lessons from the history of GLSL that are worth paying attention to: + +* When OpenGL ES was introduced, the OpenGL ES Shading Language also used an identical `#version` directive, but the meaning of a given version number was different between GLSL and GLSL ES (that is, different language features/capabilities were implied by the same `#version`, depending on whether one was compiling with a GLSL or GLSL ES compiler). The use of the optional profile name is highly encouraged when there might be differences in capability not encoded by just the version number. + +* Initially, the version numbers for OpenGL and GLSL were not aligned. For example, OpenGL 2.0 used GLSL 1.10 by default. This led to confusion for developers, who needed to keep track of what API version corresponded to what language version. The version numbers for OpenGL and GLSL became aligned starting with OpenGL 3.3 and GLSL 3.30. 
+
+* A common, but minor, gotcha for developers is that the GLSL `#version` directive can only be preceded by trivia (whitespace and comments) and, importantly, cannot be preceded by any other preprocessor directives. This limitation has created problems when applications want to, e.g., prepend a sequence of `#define`s to an existing shader that starts with a `#version`.
+
+When a GLSL file does not include a `#version` directive, it implicitly indicates version 1.10.
+This is a safe default for pre-existing code, but it effectively makes the directive mandatory for any shader written against a newer version of the language.
+
+### Racket ###
+
+While it is a very different sort of language than Slang, it is valuable to make note of the Racket programming language's [`#lang` notation](https://docs.racket-lang.org/guide/Module_Syntax.html#%28part._hash-lang%29).
+
+A `#lang` line like:
+
+    #lang scribble/base
+
+indicates that the rest of the file should be read using the language implementation in the module named `scribble/base`.
+Different modules can implement vastly different languages (e.g., `scribble/base` is a LaTeX-like document-preparation language).
+
+This construct in Racket is extremely flexible, allowing for entirely different languages (e.g., custom DSLs) to be processed by the Racket toolchain, but it could also trivially be used to support things like versioning:
+
+    #lang slang 2024.1
+
+While we do not necessarily need or want the same degree of flexibility for the Slang language itself, it is worth noting that the Slang project, and its toolchain, is in the situation of supporting multiple distinct languages/dialects (Slang, a GLSL-flavored dialect, and an HLSL-flavored dialect), and has extensive logic for inferring the right language to use on a per-file basis from things like file extensions.
+
+The Racket toolchain treats files without a `#lang` line as using the ordinary Racket language, and provides whatever language and library features were current at the time that toolchain was built.
+
+### Other Languages ###
+
+Most other language implementations do not embed versioning information in source files themselves, and instead make the language version be something that is passed in via compiler options:
+
+* gcc and clang use the `-std` option to select both a language and a version of that language: e.g., `c99` vs `c++14`.
+
+* dxc uses the `-HV` option to specify the version of the HLSL language to use, typically named by a year: e.g., `2016` or `2021`.
+
+* Rust developers typically use configuration files for the Cargo package manager, which allows specifying the Rust language and compiler version to use with syntax like `rust-version = "1.56"`.
+
+When language versions are not specified via these options, most toolchains select a default, but that default may change between releases of the toolchain (e.g., recent versions of clang will use C++17 by default, even if older releases of the toolchain defaulted to C++14 or lower).
+
+
+Proposed Approach
+-----------------
+
+### Language and Compiler Versions ###
+
+We will differentiate between two kinds of versions, which will have aligned numbering:
+
+* The *language version* determines what language features (keywords, attributes, etc.) and standard library declarations (types, functions, etc.) are available, and what their semantic guarantees are.
+
+* The *compiler version* or *toolset version* refers to the version of a release of the actual Slang tools such as `slangc`, `slang.dll`, etc.
+
+This proposal doesn't intend to dictate the format used for version numbers, since that is tied into the release process for the Slang toolset.
+We expect that version numbers will start with a year, so that, e.g., `2025.0` would be the first release in the year 2025.
+
+A given version of the Slang toolset (e.g., `2024.10`) should always support the matching language version.
+
+If this proposal is accepted, we expect releases of the Slang toolset to support a *range* of language versions, ideally covering a full year or more of backwards compatibility.
+This proposal does not seek to make any guarantees about the level of backwards compatibility, leaving that for the Slang project team to determine in collaboration with users.
+
+### `#language` Directives ###
+
+Any file that is being processed as Slang code (as opposed to HLSL or GLSL) may have a `#language` directive and, if it does, that directive determines the language version required by that file.
+
+A `#language` directive specifying Slang version 2025.7 would look like:
+
+    #language slang 2025.7
+
+If the toolset being used supports the requested version, it will process that file with only the capabilities of that language version.
+If the requested version is out of the range supported by the toolset (either too old or too new), compilation will fail with an appropriate diagnostic.
+
+If a file has *no* version directive, then the toolset will process that file as if it requested the version corresponding to the toolset release (e.g., a 2025.1 toolset release would compile such a file using language version 2025.1).
+
+Detailed Explanation
+--------------------
+
+* A `#language` directive must only be preceded by trivia (whitespace and comments).
+
+* The directive must always be of the form `#language slang <version>`; it is not valid to only list a version number without the language name `slang`.
+
+* The version number follows the syntactic form of a floating-point literal, and might be lexed as one for simplicity, but internally each of the components of the version should be treated as an integer. For example, a version `2026.10` is *not* equivalent to `2026.1`, and is a higher version number than `2026.9`.
+
+* We are not proposing strict adherence to [Semantic Versioning](https://semver.org/) at this time.
+
+* The `#language` directive will not support specifying a version in the form `MAJOR.MINOR.PATCH` - only `MAJOR` and `MAJOR.MINOR` are allowed. The assumption is that patch releases should always be backwards-compatible, and a given toolset can always safely use the highest patch number that matches the requested version.
+
+* If the version number is given as just the form `MAJOR` instead of `MAJOR.MINOR`, then the toolset will use the highest language version it supports that has that major version. That is, `#language slang 2026` is not an alias for `2026.0`, but instead acts as a kind of wildcard, matching any `2026.*` version.
+
+* The directive is allowed to be given as just `#language slang`, in which case the toolset will use the highest supported language version, as it would when the directive is absent.
+
+* If an explicit compiler option was used to select a language other than Slang (e.g., via `-lang hlsl` to explicitly select HLSL), then the `#language` directive described in this proposal will result in an error being diagnosed.
+
+* When the toolset version and the language version are not the same (e.g., a `2026.1` compiler is applied to `2025.3` code), the requested language version *only* affects what language and library constructs are supported, and their semantics. Things like performance optimizations, supported targets, etc. are still determined by the toolset.
+
+Alternatives Considered
+-----------------------
+
+### Compiler Options ###
+
+The main alternative here is to allow the language version to be specified via compiler options.
+The existing `-lang` option for `slangc` could be extended to include a language version: e.g., `slang2025.1`.
+
+This proposal is motivated by extensive experience with the pain points that arise when semantically-significant options, flags, and capabilities required by a project are encoded not in its source code, but only in its build scripts or other configuration files.
+Anybody who has been handed a single `.hlsl` file and asked to simply compile it (e.g., to reproduce a bug or performance issue) likely knows the litany of questions that need to be answered before that file is usable: what is the entry point name? What stage? What shader model?
+
+The addition of the `[shader(...)]` attribute to HLSL represents a significant improvement to quality-of-life for developers, in part because it encodes the answers to two of the above questions (the entry point name and stage) into the source code itself.
+We believe this example should be followed, to enable as much information as possible that is relevant to compilation to be embedded in the source itself.
+
+### Leaving Out the Language Name ###
+
+It is tempting to support a directive with *just* the language version, e.g. something like:
+
+    #version 2025.3
+
+We strongly believe that including an explicit language name is valuable for future-proofing, especially given the lessons from GLSL and GLSL ES mentioned earlier.
+
+The requirement of an explicit language name has the following benefits:
+
+* It avoids any possible confusion with the GLSL `#version` directive, which serves a similar purpose but has its own incompatible version numbering. If we supported using just a bare version number, then there would be a strong push to use the name `#version` for our directive instead of `#language`.
+
+* It leaves the syntax open to future extensions, such that the Slang toolset could recognize other languages/dialects (e.g., supporting a `#language hlsl 2021` directive). Further down that particular rabbit-hole would be support for Racket-style DSLs.
+
+* It could potentially encourage other languages with overlapping communities (such as HLSL, WGSL, etc.) to adopt a matching/compatible directive, thus increasing the capability for tooling to automatically recognize and work with multiple languages.
+ +### Naming ### + +The name for this directive could easily lead to a lot of bikeshedding, with major alternatives being: + +* We could use `#version` to match GLSL, *especially* if we decide to change the approach and allow the language name `slang` to be elided. However, as discussed above, we think that this is likely to cause confusion between Slang's directive and the GLSL directive, especially when the Slang toolchain supports both languages. + +* We could hew closely to the precedent of Racket and use `#lang` instead of `#language`. There is no strong reason to pursue actual *compatibility* between the two (i.e., so that Slang code can be fed to the Racket toolchain, or vice versa), so the benefits of using the exact same spelling are minimal. This proposal favors being more humane and spelling things out fully over using contractions. + +### What is the right default? ### + +Given that existing Slang code doesn't use anything like `#language`, we cannot require that all files add the directive without breaking **all** existing code (which is unacceptable). +Thus we need to provide a default behavior for files that don't use `#language`, and there are seemingly only two reasonable options: + +* We can follow the precedent of GLSL and treat the absence of the directive as a request for the lowest possible version of the language. In our case, that would mean locking code without a `#language` to whatever was the last language version before `#language` was introduced. + +* We can follow the precedent of most other toolsets, and treat the absence of the directive as a request for the latest *stable* language version. That is, in the absence of a directive a program should have access to all *non-experimental* language features. 
+ +While the first option was the right choice for GLSL (where existing applications might feed existing GLSL to new GPU drivers with new compiler implementations, and need it to Just Work), it ultimately leads to the directive being *de facto* required after a certain point (does anybody intentionally write GLSL 1.10 code any more?). + +Furthermore, interpreting a missing directive as asking for some old language version will not play nicely with our intention to allow newer toolset releases to (eventually) drop support for older language versions. It is probably reasonable for Slang releases in 2026 to no longer support the 2024 version of the language, but if that meant they would have to reject all source files that omit `#language` because of a version mismatch... again, we are back to the recognition that the directive has become required without us explicitly saying so. + +The second option means that omitting the `#language` directive will remain a meaningful choice for developers: it indicates an intention to track the language as it evolves, and to accept that some things might break along the way. +Any developer that does not accept those terms would need to specify the language version they are sticking with, which is exactly what the `#language` directive does. + + +Future Directions +----------------- + +Some of the more likely future directions include: + +* Allow the version part of `#language` to support a patch number, so that code that requires a particular bug fix to compile can be annotated with that fact and thus fail compilation cleanly when processed with a buggy toolset version. + +* Extend support for versioning to modules other than the standard library. The `#language` directive effectively introduces a versioning scheme for the Slang standard library and allows a user-defined module to specify the version(s) it is compatible with. 
This system could be extended to allow all modules (including user-defined modules) to define their own version numbers, and for `import` declarations to identify the version of a dependency that is required. + + * Taking such a direction should only be done with a careful survey of existing approaches to versioning used by package managers for popular languages, to ensure that we do not overlook important features that make management of dependency versions practical. + +Some less likely or less practical directions include: + +* Add other supported language names. Supporting something like `#language hlsl 2021` would allow for our HLSL-flavored dialect to recognize different versions of HLSL without resorting to command-line flags. + +* If we did the above, we would probably need to consider allowing a "safe" subset of preprocessor directives to appear before a `#language` line - most importantly, `#if` and `#ifdef`, so that one could conditionally compile a `#language` line depending on whether HLSL-flavored code is being processed with the Slang toolchain (which would support `#language`) or other compilers like dxc (which might not). + +* In the far-flung future, we could consider the Racket-like ability to have a `#language` directive support looking up a language implementation module matching the language name, and then parsing the rest of the file as that language, for DSL support, etc. + diff --git a/proposals/013-aligned-load-store.md b/proposals/013-aligned-load-store.md new file mode 100644 index 0000000..ea6f495 --- /dev/null +++ b/proposals/013-aligned-load-store.md @@ -0,0 +1,58 @@ +SP #013: Aligned load store +========================================= + +Status: Experimental + +Implementation: [PR 5736](https://github.com/shader-slang/slang/pull/5736) + +Author: Yong He (yhe@nvidia.com) + +Reviewer: + +Introduction +---------- + +On many architectures, aligned vector loads (e.g. 
loading a float4 with 16 byte alignment) are often more efficient than ordinary unaligned loads. Slang's pointer type does not encode any additional alignment info, and all pointer reads/writes by default assume the alignment of the underlying pointee type, which is 4 bytes for float4 vectors. This means that loading from a `float4*` will result in unaligned load instructions.
+
+This proposal attempts to provide a way for performance-sensitive code to specify an aligned load/store through Slang pointers.
+
+
+Proposed Approach
+------------
+
+We propose to add intrinsic functions to perform aligned load/store through a pointer:
+
+```
+T loadAligned<int alignment, T>(T* ptr);
+void storeAligned<int alignment, T>(T* ptr, T value);
+```
+
+Example:
+
+```
+uniform float4* data;
+
+[numthreads(1,1,1)]
+void computeMain()
+{
+    var v = loadAligned<8>(data);
+    storeAligned<16>(data+1, v);
+}
+```
+
+Related Work
+------------
+
+### GLSL ###
+
+GLSL supports the `align` layout on a `buffer_reference` block to specify the alignment of the buffer pointer.
+
+### SPIRV ###
+
+In SPIRV, the alignment can either be encoded as a decoration on the pointer type, or as a memory operand on the OpLoad and OpStore operations.
+
+### Other Languages ###
+
+Most C-like languages allow users to put additional attributes on types to specify the alignment of the type. All loads/stores through pointers of the type will use the alignment.
+
+Instead of introducing type modifiers on data or pointer types, Slang should explicitly provide `loadAligned` and `storeAligned` intrinsic functions that lower to `OpLoad` and `OpStore` with the `Aligned` memory operand when generating SPIRV. This way we don't have to deal with the complexity around rules of handling type coercion between modified/unmodified types, or recalculating alignment for pointers representing an access chain.
Developers writing performance-sensitive code can always be assured that the alignment specified on each critical load or store will be assumed, without having to work backwards through type modifications and think about the typing rules associated with such modifiers.
\ No newline at end of file
diff --git a/proposals/014-extended-length-vectors.md b/proposals/014-extended-length-vectors.md
new file mode 100644
index 0000000..4dcc7f8
--- /dev/null
+++ b/proposals/014-extended-length-vectors.md
@@ -0,0 +1,193 @@
+# SP #014: Extended Length Vectors
+
+This proposal introduces support for vectors with 0 or more than 4 components in
+Slang, extending the current vector type system while maintaining compatibility
+with existing features.
+
+## Status
+
+Status: Design Review
+
+Implementation: N/A
+
+Author: Ellie Hermaszewska
+
+Reviewer: TBD
+
+## Background
+
+Currently, Slang supports vectors between 1 and 4 components (float1, float2,
+float3, float4, and so on for other element types), following HLSL conventions.
+This limitation stems from historical GPU hardware constraints and HLSL's
+graphics-focused heritage. However, modern compute applications may require
+working with longer vectors for tasks like machine learning, scientific
+computing, and select graphics tasks.
+
+## Related Work
+
+- C++: std::array provides fixed-size array containers but lacks
+  vector-specific operations. SIMD types like std::simd (C++26) support
+  hardware-specific vector lengths.
+
+- CUDA: While CUDA doesn't provide native long vectors, libraries like Thrust
+  implement vector abstractions. Built-in vector types are limited to 1-4
+  components (float1, float2, float3, float4).
+
+- OpenCL: Provides vector types up to 16 components (float2, float4, float8,
+  float16, etc.) with full arithmetic and logical operations.
+ +- Modern CPU SIMD: Hardware support for longer vectors continues to grow: + - Intel AVX-512: 512-bit vectors (16 float32s) + - ARM SVE/SVE2: Scalable vector lengths up to 2048 bits + - RISC-V Vector Extensions: Variable-length vector support + +These approaches demonstrate different strategies for handling longer vectors, +from fixed-size containers to hardware-specific implementations. + +## Motivation + +The primary motivation for extended length vectors is to support mathematical +operations and algorithms that naturally operate on higher-dimensional vectors. +While Slang already supports arrays for data storage, certain computations +specifically require vector semantics and operations that arrays don't provide. + +A key principle of this proposal is consistency: there is no fundamental +mathematical reason to limit vectors to 4 components. While the current limit +stems from graphics hardware history, modern compute applications shouldn't be +constrained by this arbitrary boundary. Supporting any natural length N +provides a clean, orthogonal design that follows mathematical principles rather +than historical limitations. 
+
+Some example use cases:
+
+- Geometric Algebra and Clifford Algebras:
+
+  - 6D vectors for Plücker coordinates (representing lines in 3D space)
+  - 6D vectors for screw theory in robotics (combining rotation and translation)
+  - 8D vectors for dual quaternions in their vector representation
+  - Higher-dimensional geometric products and outer products
+
+- Machine Learning:
+
+  - Neural network feature vectors where vector operations (dot products,
+    normalization) are fundamental
+  - Distance metrics in high-dimensional embedding spaces
+  - Principal Component Analysis with multiple components
+
+- Scientific Computing:
+  - Spherical harmonics: Vector operations on coefficient spaces beyond 4D
+  - Quantum computing: State vectors in higher-dimensional Hilbert spaces
+
+This extension maintains Slang's mathematical vector semantics while enabling
+computations that naturally operate in higher dimensions. The focus is not on
+data storage (which arrays already handle) but on preserving vector-specific
+operations and mathematical properties in higher dimensions and improving
+consistency in the language.
+
+## Proposed Approach
+
+We propose extending Slang's vector type system to support vectors of arbitrary
+length, supporting as many of the operations available to 4-vectors as
+possible. Hardware limitations will prohibit a complete implementation; for
+example, certain atomic operations may not be possible on longer vectors.
+
+Key aspects:
+
+- Support vectors of any length that is a natural number, subject to the same
+  limits as fixed-length arrays
+- Maintain existing syntax for vectors up to length 4
+- Maintain numeric swizzling operators
+- Support standard vector operations (add, multiply, etc.)
+
+## Detailed Explanation
+
+### Type System Integration
+
+There are no type system changes required, as the vector type constructor is
+already parameterized over arbitrary vector length.
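As a sketch of the point above — illustrative only, since the exact operation list is part of the scope of this work — the existing `vector<T, N>` type constructor would simply start accepting any natural `N`:

```slang
// Illustrative sketch: declarations that become legal once the
// 4-component restriction on vector<T, N> is lifted.
vector<float, 6> pluckerLine;  // Plücker coordinates of a line in 3D space
vector<float, 16> features;    // a small ML feature vector
vector<float, 0> empty;        // zero-length vector (see Additional notes)

// Generic code already written against vector<T, N> applies unchanged;
// e.g. a dot product over 16 components:
float similarity(vector<float, 16> a, vector<float, 16> b)
{
    return dot(a, b); // componentwise multiply followed by a reduction
}
```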
+
+### Operations
+
+#### Component Access:
+
+```slang
+vector<float, 8> v;
+float f0 = v.x; // First component
+float f1 = v[1]; // Array-style access
+float2 f3 = v._6_7; // Last two components via numeric swizzle
+vector<float, 8> v2 = float2(1,2).xxxxxxxx; // extended swizzling example
+```
+
+#### Arithmetic Operations:
+
+Any operations currently supported and generic over restricted vector length
+will be made unrestricted.
+
+Part of the scope of this work is to generate a precise list.
+
+For example, all componentwise operations will be supported, as well as
+reductions.
+
+Cross product will remain restricted to 3-vectors, and will not be overloaded
+to 0-vectors or 7-vectors; this is due to the worse type inference and error
+messages that would result should such overloads be added.
+
+#### Atomic Operations:
+
+Not supported
+
+#### Storage Operations:
+
+Most platforms restrict the types of data that can be stored in textures; this
+proposal does not intend to work around these restrictions.
+
+### Implementation Details
+
+Memory Layout:
+
+- Vectors are stored in contiguous memory
+- Alignment follows platform requirements
+- Padding may be required for certain lengths, for example padding to the
+  nearest multiple of 4
+
+Performance Considerations:
+
+- No performance degradation for 1-, 2-, 3-, or 4-vectors
+- SIMD implementation work possible, not initially required
+
+## Alternatives Considered
+
+Fixed Maximum Length:
+
+- Could limit to common sizes (8, 16, 32)
+- Simpler implementation but less flexible
+- Rejected due to limiting future use cases
+
+Do nothing:
+
+- See motivation section
+
+## Additional notes
+
+### Zero-Length Vectors
+
+This proposal includes support for zero-length vectors.
While seemingly
+unusual, zero-length vectors are mathematically well-defined and provide
+important completeness properties:
+
+- They are the natural result of certain slicing operations
+- They serve as the identity element for vector concatenation
+
+### Extended Matrices
+
+While this proposal focuses on vectors, it naturally suggests a future
+extension to matrices beyond the current 4x4 limit. Extended matrices would
+follow similar principles.
+
+However, extended matrices introduce additional considerations:
+
+- Memory layout and padding strategies for large matrices
+- Optimization of common operations (multiplication, transpose)
+
+These matrix-specific concerns are best addressed in a separate proposal that
+can build upon the extended vector foundation established here.
diff --git a/proposals/015-descriptor-handle.md b/proposals/015-descriptor-handle.md
new file mode 100644
index 0000000..b123a39
--- /dev/null
+++ b/proposals/015-descriptor-handle.md
+SP #015 - `DescriptorHandle` type
+==============
+
+## Status
+
+Author: Yong He
+
+Status: In Experiment.
+
+Implementation: [PR 6028](https://github.com/shader-slang/slang/pull/6028)
+
+Reviewed by: Theresa Foley, Jay Kwak
+
+## Background
+
+Textures, sampler states and buffers are typically passed to shaders as opaque handles whose size and storage address are undefined. These handles are communicated to the GPU via "bind states" that are modified with host-side APIs. Because the handle has unknown size, it is not possible to read, copy or construct such a handle from shader code, and it is not possible to store the handle in buffer memory. This makes both host code and shader code difficult to write and prevents more flexible encapsulation or clean object-oriented designs.
+
+With recent advancements in hardware capabilities, many modern graphics systems are adopting a "bindless" parameter passing idiom, where all resource handles are passed to the shader in a single global array, and all remaining references to textures, buffers or sampler states are represented as a single integer index into that array. This allows shader code to work around the restrictions on opaque handle types.
+
+Direct3D Shader Model 6.6 introduces the "Dynamic Resources" capability, which further simplifies writing bindless shader code by removing the need to even declare the global array.
+
+We believe that graphics developers will greatly benefit from a system-defined programming model for the bindless parameter passing idiom that is versatile and cross-platform, and that provides a consistent interface so that different shader libraries using the bindless pattern can interop with each other without barriers.
+
+## Proposed Approach
+
+We introduce a `DescriptorHandle<T>` type that is defined as:
+```
+struct DescriptorHandle<T> : IComparable
+    where T : IOpaqueDescriptor
+{
+    [require(hlsl_glsl_spirv)]
+    __init(uint2 value); // For HLSL, GLSL and SPIRV targets only.
+}
+```
+Where `IOpaqueDescriptor` is an interface that is implemented by all texture, buffer and sampler state types:
+
+```slang
+enum DescriptorKind
+{
+    Unknown,
+    Texture,
+    CombinedTextureSampler,
+    Buffer,
+    Sampler,
+    AccelerationStructure,
+}
+interface IOpaqueDescriptor
+{
+    static const DescriptorKind kind;
+}
+```
+
+All builtin types that implement the `IOpaqueDescriptor` interface provide a convenience type alias for the corresponding `DescriptorHandle` type. For example, `Texture2D.Handle` is an alias for `DescriptorHandle<Texture2D>`.
+
+### Basic Usage
+
+`DescriptorHandle<T>` should provide the following features:
+
+- `operator *` to dereference the pointer and obtain the actual descriptor handle `T`.
+- Implicit conversion to `T` when used in a location that expects `T`.
+- When targeting HLSL, GLSL and SPIRV, `DescriptorHandle<T>` can be explicitly cast to and from a `uint2` value.
+- Equality comparison.
+
+For example:
+
+```slang
+uniform DescriptorHandle<Texture2D> texture;
+
+// `SamplerState.Handle` is equivalent to `DescriptorHandle<SamplerState>`.
+uniform SamplerState.Handle sampler;
+
+void test()
+{
+    // Explicit cast from bindless handle to an uint2 value.
+    // (Available on HLSL, GLSL and SPIRV targets only)
+    let idx = (uint2)texture;
+
+    // Constructing bindless handle from uint2 value.
+    // (Available on HLSL, GLSL and SPIRV targets only)
+    let t = DescriptorHandle<Texture2D>(idx);
+
+    // Comparison.
+    ASSERT(t == texture);
+
+    // OK, `t` is first implicitly dereferenced to produce `Texture2D`, and
+    // then `Texture2D::Sample` is called.
+    // The `sampler` argument is implicitly converted from
+    // `DescriptorHandle<SamplerState>` to `SamplerState`.
+    t.Sample(sampler, float2(0,0));
+
+    // Alternatively, the following syntax is also allowed, to
+    // make `DescriptorHandle` appear more like a pointer:
+    t->Sample(*sampler, float2(0, 0));
+}
+```
+
+A `DescriptorHandle<T>` type has target-dependent size, but it is always a concrete/physical data type and valid in all memory locations. For HLSL and SPIRV targets, it is represented by a two-component vector of 32-bit unsigned integers (`uint2`), and laid out as such. On these targets, builtin conversion functions are provided to construct
+a `DescriptorHandle<T>` from a `uint2` value.
+
+On targets where descriptor handles are already concrete and sized types, `DescriptorHandle<T>` simply translates to `T`, and has size and alignment matching the corresponding native type, which is queryable with Slang's reflection API.
+
+This means that on all targets where `DescriptorHandle<T>` is supported, you can use a `DescriptorHandle<T>` type in any context where an ordinary data type, e.g. `int`, is allowed, such as in buffer elements.
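To illustrate that last point, here is a hypothetical sketch (the `Material` struct and `sampleAlbedo` function are invented for this example) of storing descriptor handles in ordinary buffer memory — something the opaque handle types themselves do not permit:

```slang
// Hypothetical example: DescriptorHandle<T> is a physical type, so it can be
// mixed freely with ordinary data inside buffer elements.
struct Material
{
    Texture2D.Handle albedoMap;  // alias for DescriptorHandle<Texture2D>
    SamplerState.Handle sampler; // alias for DescriptorHandle<SamplerState>
    float2 uvScale;              // plain data alongside the handles
}

uniform StructuredBuffer<Material> materials;

float4 sampleAlbedo(uint materialIndex, float2 uv)
{
    let m = materials[materialIndex];
    // Implicit dereference converts the handles to Texture2D/SamplerState.
    return m.albedoMap.Sample(m.sampler, uv * m.uvScale);
}
```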
+
+### Obtaining Descriptor from `DescriptorHandle`
+
+Depending on the target platform and the design choices of the user's application, the way to obtain the actual
+descriptor from a `DescriptorHandle<T>` integer handle can vary. Slang does not dictate how this conversion is done;
+instead, this is left to the user via Slang's link-time specialization ability.
+
+Slang defines the following core module declarations:
+
+```slang
+extern T getDescriptorFromHandle<T>(DescriptorHandle<T> handle) where T : IOpaqueDescriptor
+{
+    // Default Implementation
+    return defaultGetDescriptorFromHandle(handle);
+}
+```
+
+`getDescriptorFromHandle` is used to convert a bindless handle to the actual opaque resource handle.
+If this function is not provided by the user, the default implementation defined in the core module will be used.
+
+By default, the core module implementation of `getDescriptorFromHandle` should use the `ResourceDescriptorHeap` and
+`SamplerDescriptorHeap` builtin objects when generating HLSL code. When generating code for other targets, `getDescriptorFromHandle`
+will fetch the descriptor from a system-defined global array of the corresponding descriptor type.
+
+If/when SPIRV is extended to expose capabilities similar to D3D's `ResourceDescriptorHeap` feature, we should change the default implementation
+to use that instead. Until we know the default implementation of `getDescriptorFromHandle` is stable, we should advise users
+to provide their own implementation of `getDescriptorFromHandle` to prevent breakages.
+
+If the user application requires a different bindless implementation, this default behavior can be overridden by defining
+`getDescriptorFromHandle` in the user code. Below is a possible user-space implementation of `getDescriptorFromHandle`
+for Vulkan:
+
+```slang
+
+// All texture and buffer handles are defined in descriptor set 100.
+[vk::binding(0, 100)]
+__DynamicResource<__DynamicResourceKind.General> resourceHandles[];
+
+// All sampler handles are defined in descriptor set 101.
+[vk::binding(0, 101)]
+__DynamicResource<__DynamicResourceKind.Sampler> samplerHandles[];
+
+export T getDescriptorFromHandle<T>(DescriptorHandle<T> handle) where T : IOpaqueDescriptor
+{
+    if (T.kind == DescriptorKind.Sampler)
+        return samplerHandles[((uint2)handle).x].asOpaqueDescriptor();
+    else
+        return resourceHandles[((uint2)handle).x].asOpaqueDescriptor();
+}
+```
+
+The user can call the `defaultGetDescriptorFromHandle` function from their implementation of `getDescriptorFromHandle` to dispatch to the default behavior.
+
+### Uniformity
+
+By default, the value of a `DescriptorHandle` object is assumed to be dynamically uniform across all
+execution threads. If this is not the case, the user is required to mark the `DescriptorHandle` as `nonuniform`
+*immediately* before dereferencing it:
+```slang
+void test(DescriptorHandle<Texture2D> t)
+{
+    nonuniform(t)->Sample(...);
+}
+```
+
+If the descriptor handle value is not uniform and `nonuniform` is not called, the result may be
+undefined.
+
+### Combined Texture Samplers
+
+On platforms without native support for combined texture samplers, we will use both components of the
+underlying `uint2` value: the `x` component stores the bindless handle for the texture, and the `y` component stores the bindless handle for the sampler.
+
+For example, given:
+
+```slang
+uniform DescriptorHandle<Sampler2D> s;
+void main()
+{
+    float2 uv = ...;
+    s.SampleLevel(uv, 0.0);
+}
+```
+
+The Slang compiler should emit HLSL as follows:
+
+```hlsl
+uniform uint2 s;
+void main()
+{
+    float2 uv = ...;
+    Texture2D(ResourceDescriptorHeap[s.x]).SampleLevel(
+        SamplerState(SamplerDescriptorHeap[s.y]),
+        uv,
+        0.0);
+}
+```
+
+## Alternatives Considered
+
+We initially considered supporting a more general `DescriptorHandle<T>` where `T` can be any composite type, for example, allowing the following:
+
+```slang
+struct Foo
+{
+    Texture2D t;
+    SamplerState s;
+    float ordinaryData;
+}
+
+uniform DescriptorHandle<Foo> foo;
+```
+
+which is equivalent to:
+
+```slang
+struct Bindless_Foo
+{
+    DescriptorHandle<Texture2D> t;
+    DescriptorHandle<SamplerState> s;
+    float ordinaryData;
+}
+uniform Bindless_Foo foo;
+```
+
+While relaxing `T` this way adds an extra layer of convenience, it introduces complicated
+semantic rules to the type system, and there is an increased chance of exposing tricky corner
+cases that are hard to get right.
+
+An argument for allowing `T` to be a general composite type is that it enables sharing the same
+code for both bindless and bindful systems. But this argument can also be countered by
+allowing the compiler to treat `DescriptorHandle<T>` as `T` in a special mode if this feature is found to be useful.
+
+For now we think that restricting `T` to be an `IOpaqueDescriptor` type will result in a much simpler implementation, and is likely sufficient for current needs. Given that the trend of modern GPU architectures is moving towards bindless idioms and the whole idea of opaque handles may disappear in the future, we should be cautious about inventing too many heavyweight mechanisms around opaque handles. Nevertheless, this proposal still allows us to relax this requirement in the future if it becomes clear that such a feature is valuable to our users.
+
+In the initial version of this proposal, `DescriptorHandle` was named `Bindless`.
During discussion, we determined that this naming can be confusing to users who come from the general GPU compute community and haven't heard the term "bindless resources". We believe `DescriptorHandle` is a better name because it reflects the essence of the type more accurately, and is consistent with D3D12 terminology in that `DescriptorHandle` is the shader-side representation of the `D3D12_GPU_DESCRIPTOR_HANDLE` structure.
+
+The initial version of the proposal defines `DescriptorHandle` to be backed by an 8-byte integer value independent of the target. This is changed
+so that Slang only guarantees `DescriptorHandle` to be a physical data type, with target-dependent size. Slang guarantees that `DescriptorHandle`
+will be lowered to a `uint2` value when targeting HLSL, GLSL and SPIRV, but not on other targets. This is because on targets where `T` is already a
+physical type, its size can vary and may not fit in an 8-byte structure. For example, `StructuredBuffer<T>` maps to a `{T*, size_t}` structure when
+targeting CUDA, which takes 16 bytes. Meanwhile, forcing `DescriptorHandle` to be `uint64_t` makes the feature unusable on lower-tier hardware
+where 64-bit integers are not supported. Representing the handle with `uint2` allows the feature to be used without requiring this additional
+capability.
+
+The initial proposal also reserved a value for an invalid/null handle. This was removed because we cannot find
+a safe value that won't be used across all targets we support. In particular, this is not possible on CUDA
+and Metal because it is not possible to interpret these handles as plain integers.
+
+## Conclusion
+
+This proposal introduces a standard way to achieve the bindless parameter passing idiom on current graphics platforms.
+Standardizing the way of writing bindless parameter binding code is essential for creating reusable shader code
+libraries.
The convenience language features around the `DescriptorHandle` type should also make shader code easier to write
+and to maintain. Finally, by using Slang's link-time specialization feature,
+this proposal allows Slang to avoid dictating one specific way of passing
+the actual descriptor handles to the shader code, and allows the user to customize how the conversion from integer handle
+to descriptor is done in a way that best suits the application's design.
\ No newline at end of file
diff --git a/proposals/016-slangpy.md b/proposals/016-slangpy.md
new file mode 100644
index 0000000..baf28e4
--- /dev/null
+++ b/proposals/016-slangpy.md
+# SP#016: SlangPy
+
+## Status
+
+Author: Chris Cummings, Benedikt Bitterli, Sai Bangaru, Yong He
+
+Status: Design Review
+
+Implementation:
+
+Reviewed by:
+
+## Background
+
+With first-class support for automatic differentiation, Slang makes it easy to write differentiable SIMT kernel code that seamlessly interops with GPU-accelerated graphics features such as ray tracing. Slang has the potential to bridge traditional graphics applications with machine learning techniques, helping researchers and developers in both the graphics and AI communities rapidly prototype and deploy high-performance, reusable SIMT kernel code for their neural applications.
+
+However, making kernel code easy to write is only half the story. Today, Slang users who want to use Slang from within Python, e.g. to train a neural graphics MLP, have to write thousands of lines of glue code to expose Slang and simple graphics APIs to PyTorch. We want to make it effortless to get going with neural graphics algorithms, so we want to remove the need for this boilerplate.
+
+## Goals
+
+Since most of the AI community is on Python, we propose to introduce a Python package called `slangpy` that allows Python code to call Slang shaders easily.
+
+It is worth noting, before going into SlangPy's goals further, that we previously tried a version of Slang<->Python binding in the SlangTorch project. The design here is highly influenced by learnings there — those learnings are summarized below, but broadly, we want to expose the full range of Slang features to Python, allowing Python code to run all forms of Slang code not just on CUDA, but on all target graphics APIs that Slang supports, without a hard dependency on PyTorch.
+With that in mind, the SlangPy package aims to enable rapid prototyping of graphics/machine learning applications with Slang as the kernel language. To achieve this goal, SlangPy should offer the following features:
+
+- Python abstraction of GPU features, with simple-to-use functions and classes to initialize and control GPU execution.
+- Hide unnecessary low-level graphics API concepts such as command encoding and parameter passing, and allow use of the graphics pipeline with a couple of lines of Python code instead of thousands of lines of C++ code.
+- Allow Slang code to be called from Python as directly as possible, through automatic boilerplate generation that allows calling any Slang function on a tensor/array domain.
+- Support authoring differentiable renderers by allowing automatic differentiation of graphics shader entrypoints for both rasterization and raytracing pipelines.
+- Interop with PyTorch without a hard dependency on PyTorch: available for users who need to work with PyTorch, but those who don't need not install PyTorch to use SlangPy.
+
+## Proposed Approach
+
+The key ideas behind `slangpy` are:
+
+- We use Slang's reflection API to understand the structure of Slang code, and expose Slang functions as callable functions inside Python objects.
+- When loading a Slang module into Python, you connect that module to a SlangPy device — this device is where the Slang functions should run when called, and allows you to specify e.g.
which GPU and which API [Vulkan, OptiX, CUDA], etc.
+- SlangPy's binding model aims to make simple, unambiguous calls very easy, with little to no extra syntax. However, for complex or ambiguous calls, a `map` method is provided that allows the user to be explicit about how to vectorize a function and resolve any ambiguities.
+- The execution model of SlangPy is that codegen and shader compilation for a given function happen at function call time. We then cache the generated code on disk so that subsequent runs have minimal overhead.
+
+
+We will develop SlangPy in a GitHub repo parallel to slang, named slangpy. This is because we would like SlangPy to have a different development and release cadence from the Slang compiler, and we want to avoid having Slang users who build from source pull in unrelated dependencies when they are not using SlangPy.
+
+In keeping with Slang's general feature process, the SlangPy feature set will be treated as experimental — and may stay this way for quite some time. This is because the space is fairly dynamic, and we expect that as we try to use this with real developers, the interfaces may change quite a lot.
+
+### Calling Slang Entrypoints From Python
+The most basic level of service that SlangPy provides is to allow Python code to run shaders written in Slang. For example, given an `add` function in Slang:
+
+```hlsl
+// A simple function that adds two numbers together
+float add(float a, float b)
+{
+    return a + b;
+}
+```
+
+The user can define a wrapping compute shader entrypoint to run the `add` function on two input buffers, and store the result in an output buffer:
+
+```hlsl
+// example.slang
+
+[numthreads(128, 1, 1)]
+void addBuffer(NDBuffer<float, 1> a, NDBuffer<float, 1> b, NDBuffer<float, 1> result, int i : SV_DispatchThreadID)
+{
+    result[i] = add(a[i], b[i]);
+}
+```
+
+Here the `NDBuffer<T, N>` type is defined by SlangPy to represent an `N`-dimensional structured buffer of element type `T`.
+
+With SlangPy, the user can run this `addBuffer` compute shader with a few Python calls:
+
+```python
+# example.py
+
+import numpy
+import slangpy as spy
+
+# Create a slangpy GPU execution context
+device = spy.init_device()
+
+# Load the Slang module
+module = spy.Module.load_from_file(device, "example.slang")
+
+# Create numpy arrays
+a = numpy.random.rand(1024)
+b = numpy.random.rand(1024)
+r = numpy.zeros(1024)
+
+# Create buffers for kernel call
+bufferA = spy.NDBuffer(device, shape = (1024,))
+bufferB = spy.NDBuffer(device, shape = (1024,))
+bufferRs = spy.NDBuffer(device, shape = (1024,))
+
+# Populate buffer contents
+bufferA.from_numpy(a)
+bufferB.from_numpy(b)
+bufferRs.from_numpy(r)
+
+# Call the function and print the result
+module.addBuffer.dispatch(1024, bufferA, bufferB, bufferRs)
+print(bufferRs)
+```
+
+As you can see, calling Slang code using raw dispatch requires the user to write a lot of boilerplate code in both Slang and Python. To simplify this task, SlangPy also provides the next level of service: marshaling of common Python types and automatic wrapping of functions into kernels, to eliminate the need to explicitly define the compute shader entrypoint in Slang or to create buffers manually in Python.
+
+### Calling Slang Functions Directly From Python
+
+The following code achieves the same `add` functionality, without writing boilerplate for the compute shader entrypoint, or buffer marshaling in Python.
+
+First, our Slang module needs to provide a definition for `add`, but there is no need to define the `addBuffer` entrypoint.
+
+```hlsl
+// example.slang
+
+// A simple function that adds two numbers together
+float add(float a, float b)
+{
+    return a + b;
+}
+
+```
+
+And we will allow this function to be called from Python directly:
+
+```python
+import numpy
+import slangpy as spy
+
+# Create a SlangPy GPU execution context
+device = spy.init_device()
+
+# Load the Slang module
+module = spy.Module.load_from_file(device, "example.slang")
+
+# Call the function and print the result
+pa = numpy.random.rand(1024)
+pb = numpy.random.rand(1024)
+
+result = module.add(pa, pb)
+print(result)
+```
+
+When the user calls `add` directly in this style, SlangPy will use Slang's reflection API to find out the parameter types of `a` and `b`, and infer how to wrap the function into a kernel given the runtime types of the Python variables `pa` and `pb`. Since `add` takes two float scalar parameters, and the user is passing two numpy arrays, SlangPy knows it needs to generate a wrapper kernel entrypoint that dispatches the `add` function over a thread grid of size 1024, and to create temporary buffers to hold the input and output contents.
+
+### Device Object
+The user starts by creating a SlangPy `device`. This represents a GPU execution context from which you can load Slang modules and run Slang code on a GPU device. You can specify which target API you want to run the Slang code on. Currently we support creating a D3D12, Vulkan or CUDA+OptiX backed device.
+
+### Type Marshaling
+
+SlangPy generates code that marshals between Slang and Python. The binding model we use is designed to make the common case easy, without requiring boilerplate. When the binding code detects ambiguity, the call will fail and require the user to make a more explicit call.
+The guiding principle is to make the common case easy but not to guess when there is ambiguity; in that case the user is required to call `map()` on the function to explicitly map buffer dimensions to parameters.
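That "easy when unambiguous, fail otherwise" behavior can be illustrated with a small self-contained sketch. This is not SlangPy's actual implementation, just the shape-resolution idea it describes: scalar arguments broadcast to any call shape, buffer-like arguments must agree on shape, and disagreements are reported rather than guessed at:

```python
def resolve_dispatch_shape(arg_shapes):
    """Illustrative sketch of SlangPy-style dispatch-shape resolution.

    Scalars (shape ()) broadcast to any call shape; buffer-like arguments
    must agree on shape, otherwise the call is ambiguous and the user must
    be explicit (in SlangPy, via map()).
    """
    shape = None
    for s in arg_shapes:
        if s == ():  # scalar: broadcasts to whatever the call shape is
            continue
        if shape is None:
            shape = s
        elif shape != s:
            raise ValueError(
                f"ambiguous call: argument shapes {shape} and {s} differ")
    return shape if shape is not None else ()

# Two 1024-element buffers and a scalar resolve to a 1024-thread dispatch.
print(resolve_dispatch_shape([(1024,), (1024,), ()]))  # -> (1024,)
```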
+
+#### Marshaling of basic types
+
+A basic scalar integer or float can be passed to a Slang function as is:
+```hlsl
+// test.slang
+void test(int x, float y) { … }
+```
+
+```python
+# test.py
+module = spy.Module.load_from_file(device, "test.slang")
+
+module.test(5, 6.0) # OK, passing basic scalar types as is.
+```
+
+#### Marshaling of array, vector and matrix types
+
+You can also map Python and numpy arrays to vectors, matrices, or arrays:
+```
+// test.slang
+void test1(float3 x) {}
+void test2(float3 x[2]) {}
+void test3(float3x3 x) {}
+```
+
+```
+# test.py
+module = spy.Module.load_from_file(device, "test.slang")
+
+module.test1([5, 6, 7]) # OK, passing a matching-size array to a vector.
+module.test2([[1,2,3],[5, 6, 7]]) # OK, passing a matching-size array to a vector array.
+module.test2(numpy.zeros((2, 3))) # OK, passing a numpy array of matching dimensions.
+module.test3(numpy.zeros((3, 3))) # OK, passing a numpy array of matching dimensions.
+```
+
+#### Marshaling of struct types
+
+SlangPy should allow users to pass a Python dictionary to a function parameter that accepts a struct:
+```hlsl
+// test.slang
+struct MyType { int x; float y; }
+void test(MyType v) { ... }
+```
+
+```python
+# test.py
+module = spy.Module.load_from_file(device, "test.slang")
+
+module.test({"x":5, "y":6.0}) # OK, passing a dictionary to a struct type.
+```
+
+In case of ambiguity, the user can specify explicitly, from the Python side, which Slang type the dictionary should be converted to before it is used as an argument:
+
+```python
+# test.py
+module.test({'_type': 'MyType', 'x': 5, 'y': 6.0})
+```
+
+### NDBuffer
+SlangPy provides an `NDBuffer` type to represent a device-side buffer that can hold any data.
+
+A buffer can be created by calling the `NDBuffer` constructor:
+
+```python
+image_1 = spy.NDBuffer(device, dtype=module.Pixel, shape=(16, 16))
+image_2 = spy.NDBuffer(device, dtype=module.Pixel, shape=(16, 16))
+```
+
+The buffer can be initialized from a numpy array:
+
+```python
+image_2.from_numpy(0.1 * np.random.rand(16 * 16 * 3).astype(np.float32))
+```
+
+Buffers can be used to call Slang functions directly:
+
+```python
+result = module.add(image_1, image_2)
+```
+
+In this scenario, `module.add` is the Python-side object that maps to the `add` function in `example.slang`. Recall that `add` is defined as:
+
+```hlsl
+float add(float a, float b)
+{
+    return a + b;
+}
+```
+
+When this Slang function is called with `a` and `b` being buffers, SlangPy will automatically generate a compute shader entrypoint that dispatches to `add` for every corresponding element in buffers `a` and `b`, and then invokes the generated compute shader. This is called automatic __broadcasting__. SlangPy will also automatically allocate a new buffer to hold the result of the computation and return the buffer object from the call to `module.add`.
+If a more complex mapping of buffer elements to execution threads is required, SlangPy provides the `map` function to be explicit about which argument dimensions map to which execution threads (kernel dimensions).
+
+On the Python side, we also provide an `NDDifferentiableBuffer` type that acts as a pair of `NDBuffer`s of the same shape, holding both the primal and the gradient values. An `NDDifferentiableBuffer` object can be passed directly to a `DifferentialPair`-typed parameter in Slang.
+
+### Function Objects
+
+For each Slang function, SlangPy will create a Python-side function object during loading of the Slang module, which allows users to invoke the function from Python.
+
+The function object provides methods and properties to allow invoking the Slang function in various ways.
+
+Specifically, a function object provides:
+
+- `call(args...)`: Call the function with a given set of arguments. This will generate and compile a new kernel if need be, then immediately dispatch it and return any results.
+- `append_to(args...)`: Similar to call, but appends the dispatch to a command list for future submission rather than dispatching immediately.
+- `constants(dict[str,any])`: specify link-time constants to specialize the function.
+- `set()`: set additional global uniform parameters.
+- `map()`: specifies how to map each argument's dimensions to kernel dimensions.
+- `bwds` property: returns a function object for the backward derivative propagation function.
+- `return_type()`: specify the desired return type of the function when called from Python.
+
+### Interop With PyTorch
+
+SlangPy supports creating an NDBuffer from a PyTorch tensor:
+
+```python
+t = torch.tensor(...)
+buffer = spy.NDBuffer(t)
+```
+
+And you can create a PyTorch tensor from an NDBuffer:
+
+```python
+torchTensor = buffer.torch()
+```
+
+### Example: Training An MLP with SlangPy
+
+Along with the core functionality of SlangPy, we should also ship an MLP library in SlangPy's release package, to make using and training MLPs easy.
+
+You can write a Slang function that uses an MLP to compute a value:
+```slang
+// example.slang
+import mlp;
+
+[Differentiable]
+float[8] compute(MLP<32,8> mlp, float input[32])
+{
+    return mlp.run(input);
+}
+```
+And you can call `compute` from Python as:
+
+```python
+import slangpy as spy
+
+device = spy.init_device()
+module = device.load_module_from_file("example.slang")
+
+N = 128
+inputTensor = spy.Tensor.numpy(device, numpy.random.rand(N, 32).astype(numpy.float32))
+outputTensor = spy.Tensor.numpy(device, numpy.random.rand(N, 8).astype(numpy.float32))
+outputTensor = module.compute(mlp_weights, inputTensor, _result=Tensor)
+```
+
+To train an MLP, you can define a loss function in Slang, then call the backward derivative of `loss` and use it in a training loop:
+
+```slang
+// example.slang
+
+[Differentiable]
+float loss(MLP<32,8> mlp, float input[32], float target[8])
+{
+    float result = 0.0;
+    let output = compute(mlp, input);
+    for (int i = 0; i < 8; i++) result += square(output[i] - target[i]);
+    return result;
+}
+
+void updateWeights(inout float param, float grad, float step)
+{
+    param -= grad * step;
+}
+```
+
+In Python code, you can invoke the backward derivative of `loss` with `module.loss.bwds`:
+
+```python
+# example.py
+from slangpy.types import Tensor
+
+max_iterations = int(1e5)
+mlp_weights = Tensor.from_numpy(device, numpy.random.rand(32, 8).astype(numpy.float32))
+mlp_weights = mlp_weights.with_grads()
+
+lossBuffer = Tensor.zeros(device, dtype=module.float, shape=(N, ))
+lossGrad = Tensor.from_numpy(device, numpy.ones(shape=(N, ), dtype=numpy.float32))
+lossBuffer = lossBuffer.with_grads(grad_in=lossGrad)
+
+targetOutput = Tensor.from_numpy(device, numpy.random.rand(N, 8).astype(numpy.float32))
+
+learning_rate = 1e-4
+
+for iter in range(max_iterations):
+    mlp_weights.grad_out.clear()
+    module.loss.bwds(mlp_weights, inputTensor, targetOutput, _result=lossBuffer)
+
+    module.updateWeights(mlp_weights.flatten_dtype(),
mlp_weights.grad_out.flatten_dtype(), learning_rate) + + currentLoss = lossBuffer.to_numpy().mean() + + print(f'Iteration {iter}. Loss:{currentLoss}') + +``` + +## Learnings from Slang Torch + +We currently provide a slang-torch python package to allow Slang code to beused as a PyTorch kernel. Slang-torch also provides python functions to load a slang module, and call an exported Slang function from python. + +The main feedback we received from slang-torch users is that it still requires developers to write a substantial amount of boilerplate both in Slang to wrap ordinary functions in entrypoints, and in Python to setup the parameters for the kernel call. These boilerplate setups slow down iteration. + +Another issue with slang-torch is that it is tightly coupled with PyTorch tensors. For users who are not working with PyTorch, this dependency is very inconvenient. + +Finally, slang-torch uses CUDA as the only execution target, while SlangPy is cross-platform and can run using D3D, Vulkan and CUDA/Optix as the underlying API to bring access to a much wider range of GPU features. + +This proposal aims at addressing these issues in slang-torch, and make it truly effortless to use Slang from Python. diff --git a/proposals/017-shader-record.md b/proposals/017-shader-record.md new file mode 100644 index 0000000..abd16a8 --- /dev/null +++ b/proposals/017-shader-record.md @@ -0,0 +1,371 @@ +SP #017: Disambiguate `uniform` via `[[push_constant]]` and `[[shader_record]]` +======================================================================= + +This document proposes a first step toward resolving ambiguities and bugs that +arise from the current usage of `uniform` (and to a lesser extent, `varying`) +in Slang, particularly in ray tracing pipelines. + + + +It suggests promoting Vulkan's `[[push_constant]] uniform T` and +`[[shader_record]] uniform T` global annotations to more universal `[[push_constant]] T` and +`[[shader_record]] T` entrypoint parameter annotations. 
These annotations can then be used
+to unify compile-time mappings across APIs like Direct3D, Vulkan, OptiX, etc. These
+new annotations act as clearer, more consistent markers of data binding points,
+compared to the overloaded semantics of `uniform` parameters today.
+
+Additionally, this document proposes annotations to more clearly distinguish between
+payload and hit attribute entry point parameters, extending the use of `[[payload]] T`
+and introducing `[[hit_attribute]] T`, respectively. This would allow users to more explicitly
+mark payload parameters as being constant, e.g. `void anyhit([[payload]] in MyConstPayload p)`,
+where current usage, `void anyhit(in MyConstPayload p)`, would otherwise incorrectly map
+`p` to a hit attribute, and `void anyhit(MyConstPayload p)` would map `p` to the shader record.
+
+Status
+------
+
+**Status**: Design Review
+
+**Implementation**:
+No implementation yet; to be proposed in future PR(s).
+
+**Author**:
+*Nate Morrical*
+
+**Reviewer**:
+*Tess Foley, Slang Team & Community*
+
+Background
+----------
+
+Historically, shading languages like the RenderMan Shading Language (RSL) had a
+relatively simple distinction between `uniform` values (meaning, constant over
+a surface) and `varying` values (those that vary per-vertex or per-fragment).
+In more modern GPU pipelines, however, the notion of what is "uniform" and what
+"varies" has become nuanced---shaders run at different rates (e.g., per-thread,
+per-warp, per-thread-group, per-intersected-geometry, per-recursion-depth, etc.),
+and these concepts vary further depending on the API (Vulkan vs. Direct3D) and the
+pipeline stage (vertex, fragment, compute, or ray tracing).
+
+In ray tracing specifically, the term `uniform` has an additional burden:
+- For `raygen`, `closesthit`, `anyhit`, or `intersection` shaders,
+  parameters marked `uniform` are mapped to the per-shader-record data
+  (in Vulkan, via the *shader binding table* record).
+- For compute shaders or other pipelines, the same `uniform` today
+  maps to push constants (e.g., Vulkan's push constants or D3D root constants).
+
+This has led to confusion and mismatch in portability:
+- In Vulkan, certain rapidly changing parameters go into push constants, while other,
+  more fixed parameters (e.g., vertex/index buffer pointers) go into the shader record.
+  When mixed entry points are present (for example, a compute shader animating
+  triangle vertex positions, followed by subsequent RT entrypoints to render that
+  mesh), which uniforms map to where is inconsistent.
+  - Additionally, with mixed entry point setups, there appears to be no way to leverage
+    Vulkan's per-entry-point-type push constants mechanism for ray tracing entry points.
+    Raygen push constants alias with closest-hit push constants, and so on across stages.
+    Instead of pushing constants to a specific stage type, users today must instead
+    update all SBT entries, e.g., for all closest hits, to achieve the same expected behavior.
+- In Direct3D, by contrast, these concepts do not have direct 1:1 equivalents
+  in HLSL or DXIL. Instead, developers simulate push constants/shader-record
+  data via constant buffer (`ConstantBuffer`) or resource descriptors,
+  with CPU-side code controlling the *root signature* (including local root
+  signatures for ray tracing).
+
+Certain annotations today lead to undefined behavior and system crashes.
+- For example, consider the following, where annotations are supplied but ignored.
+  ```
+  [shader("raygeneration")]
+  void simpleRayGen([push_constants] ConstData params, [shader_record] RayGenData record) { ... }
+  ```
+  - This causes undefined behavior: both parameters implicitly add the keyword `uniform`, both
+    then map to the shader record, and the annotations are otherwise ignored.
+
+Moreover, the keywords `uniform` and `varying` do not convey the actual
+"rate" or "lifetime scope" well here. One developer's "uniform" might need to be
+"per-warp" or "per-thread" in another context.
+
+Beyond this, the subtle distinction in default behavior between `in T` and `inout T` on ray tracing
+entrypoints can quickly lead to undefined behavior, where the user's intention to mark a
+payload value as constant instead remaps the payload to either hit attribute registers or the shader
+binding table.
+
+
+Related Work
+------------
+In the original **RenderMan Shading Language** (RSL), the syntax of a declaration was
+`[class] [type] [ "[ n ]" ]`, where class may be `constant`, `uniform`, `varying`,
+or `vertex`. For traditional shaders, `uniform` and `varying` had very precise meanings
+rooted in how RenderMan handled per-primitive or per-sample shading. Pixar's documentation
+and the RenderMan specification at the time defined them as:
+- `uniform` variables are those whose values are constant over whatever portion of the
+  surface is being shaded, while `varying` variables are those that may take on different values at different locations on the
+  surface being shaded.
+  > For example, shaders inherit a color and a transparency from the graphics state.
+  These values do not change from point to point on the surface and are thus uniform variables.
+  Color and opacity can also be specified at the vertices of geometric primitives (see Section 5,
+  Geometric Primitives). In this case they are bilinearly interpolated across the surface, and
+  therefore are varying variables. [RISpec, Section 11](https://hradec.com/ebooks/CGI/RPS_13.5/prman_technical_rendering/users_guide/RISpec-html/section11.html)
+- As the language evolved, ambiguities emerged with respect to the "rate" at which values vary.
+  For example, `facevarying` was added in a [subsequent revision](https://hradec.com/ebooks/CGI/RPS_13.5/prman_technical_rendering/users_guide/RISpec-html/appendix.I.html)
+  to disambiguate certain interpolation behaviors for subdivision surfaces.
+  > Associated with each geometric primitive definition are additional primitive variables that
+  are passed to their shaders. These variables may define quantities that are constant over
+  the surface (class constant), piecewise-constant but with separate values per subprimitive
+  (class uniform), bilinearly interpolated (class varying and facevarying), or fully interpolated
+  (class vertex). If the primitive variable is uniform, there is one value per surface facet.
+  If the primitive variable is varying, there are four values per surface facet, one for each corner
+  of the unit square in parameter space (except polygons, which are a special case). On parametric
+  primitives (quadrics and patches), varying primitive variables are bilinearly interpolated across
+  the surface of the primitive. Colors, opacities, and shading normals are all examples of varying
+  primitive variables. If a primitive variable is facevarying it will be linearly interpolated.
+  [More here](https://hradec.com/ebooks/CGI/RPS_13.5/prman_technical_rendering/users_guide/RISpec-html/section5.html#primitive%20variables)
+
+As shading languages evolved, the rate at which something varies or remains uniform has continued to grow
+in complexity.
+- **GLSL**: Introduced specialized input/output qualifiers for geometry,
+  tessellation, etc., but these only partially address the nuance of multiple
+  compute or ray tracing rates, let alone mixed entry points.
+- **HLSL**: In Direct3D 12, "root constants" and "local root signatures" can
+  emulate Vulkan's push constants and shader record. HLSL, however, does not
+  have a built-in type or keyword that designates "this is root-constant data."
+- **Slang**: Currently allows global-scope `cbuffer` or `ConstantBuffer` with
+  attributes like `[[vk::push_constant]]` or `[[vk::shader_record]]` to map to
+  Vulkan's push constants and the shader binding table.
However, having a plain
+  `uniform` parameter in an entry point can become ambiguous when compiling
+  to multiple backends or mixing multiple entry points in the same Slang module.
+  Globally attributed uniform buffers also prevent more localized, minimal usage of
+  per-dispatch constant values.
+
+The following attempts to disambiguate the two overloaded uses of `uniform` that
+Slang users face today.
+
+Proposed Approach
+-----------------
+
+
+1. **Extend the use of `[[push_constant]] T` to entry point parameters**
+   We would extend the use of `[[push_constant]]` to become semantically equivalent to:
+   ```
+   [[vk::push_constant]] ConstantBuffer<T>
+   ```
+
+   This attribute would be used like so:
+
+   ```
+   [shader("raygen")]
+   void entrypoint_1([[push_constant]] FirstParams p) {...}
+
+   [shader("closesthit")]
+   void entrypoint_2([[push_constant]] SecondParams p) {...}
+   ```
+
+   Note, usage of this annotation would allow users to drop the keyword `uniform` entirely, in favor
+   of the more rate-specific binding nomenclature.
+
+   For targets that do not have a first-class notion of push constants,
+   Slang would map types marked with this annotation to a normal constant buffer
+   or equivalent. By extending the use of this annotation, we give developers a clear,
+   explicit signal in their Slang code that a given parameter is intended to be
+   "per dispatch/draw" or "root constant" data.
+
+   Additionally, this resolves ambiguities where `[[push_constant]]` parameters precede
+   shader record parameters, leading to undefined behavior when both are unintentionally mapped to the
+   shader record.
+
+   From here, we would amend the comment,
+   > Mark a global variable as a Vulkan push constant.
+   which appears in Slang's VS Code extension to reflect the updated usage and intended universal
+   behavior across targets. The annotation being universal would then signal that behavior is well defined across
+   all possible backends, rather than specific to Vulkan.
+
+
+2. **Extend the use of `[[shader_record]] T`**
+   Likewise, we would extend the use of `[[shader_record]]` to become semantically equivalent to:
+   ```
+   [[vk::shader_record]] ConstantBuffer<T>
+   // or
+   void entrypoint([[vk::shader_record]] uniform T, ...)
+   ```
+   Again, this annotation would drop the keyword `uniform` entirely.
+
+   This annotation clarifies that the data is intended to reside in the shader binding
+   table record for ray tracing pipelines in Vulkan and in OptiX. For Direct3D 12, Slang
+   would map this to a local root signature. There, the new annotation is purely a semantic
+   wrapper around `ConstantBuffer<T>` but with additional reflection information letting
+   the application code handle it properly across APIs.
+
+3. **Map all Vulkan Entry-Point `uniform` Parameters to `[[push_constant]]`**
+   * Define bare `uniform` as `[[push_constant]]` for all entry points, including ray
+     tracing shaders. This is the most common usage pattern, so I'd argue it makes sense for
+     this to be the default behavior. Ensuring consistent behavior of `uniform` across all entry
+     point types when no rate-specific attribute is specified will help disambiguate current
+     usage in mixed compute and RT entry point setups. Then, encourage developers to declare
+     typed parameters as `[[push_constant]]` or `[[shader_record]]` to remove ambiguity.
+
+4. **Enhance Reflection**
+   Slang's reflection mechanism should expose a distinct "kind" or category for
+   entry point parameters marked as `[[push_constant]]` vs. `[[shader_record]]`.
+   The application code can then inspect reflection data to see how the compiler decided to place
+   each parameter.
+
+5. **Consolidate Push Constants / Shader Record Data**
+   * As an optimization, or even a required step in certain backends, multiple
+     `[[push_constant]]` declarations in a single scope could be fused into one physical
+     push-constant region at the IR/legalization step.
+   * Similarly, `[[shader_record]]` annotated declarations for a single entry point could be
+     linearized into a single region, with each field laid out in memory consecutively.
+     This design ensures we do not end up with multiple push constants or multiple
+     local root signatures overshadowing each other.
+
+6. **Introduce `[[hit_attribute]]`**
+   A new user-facing annotation that is only legal for ray tracing entry points, and that is
+   semantically equivalent to:
+   ```
+   void entrypoint(in T param)
+   ```
+   This annotation clarifies that the given parameter is intended to map to the hit attribute
+   registers assigned by either built-in intersectors or user-geometry intersectors (as opposed
+   to payload registers with constant usage, or to the shader record when the `in` qualifier is
+   omitted entirely).
+
+7. **Extend `[[payload]]`**
+   Similar to `[[shader_record]]` and `[[push_constant]]`, we would extend the use of the `[[payload]]`
+   annotation to clarify that an entry point parameter is intended to map to user-defined payload registers.
+   This would be semantically equivalent to:
+   ```
+   void entrypoint(inout T param)
+   ```
+   And would be used like so:
+   ```
+   void entrypoint([[payload]] T param)
+   ```
+
+   This annotation clarifies that the given parameter is intended to map to the payload registers
+   assigned by the user. `in`, `out`, and `inout` would all become legal qualifiers on the parameter, with
+   `inout` being the default.
+
+   This would resolve the ambiguity regarding `in` and implicit `uniform` incorrectly mapping to hit attributes
+   and shader records.
+
+
+Detailed Explanation
+--------------------
+1. **Language-Level Model**
+   In a future ideal version of Slang, we might introduce explicit syntax for
+   specifying data "rates," e.g. `[[thread_group]]`, `[[wave]]`, `[[lane]]`, or more.
+   This is consistent with advanced GPU programming models that differentiate per-thread,
+   per-wavefront, per-group, per-dispatch, etc.
However, this proposal focuses on
+   disambiguating push constant uniforms from shader record uniforms, as this is the most
+   pressing distinction.
+
+2. **Entry-Point Parameter Rules**
+   * If an entry point parameter is annotated with `[[push_constant]]`, Slang recognizes it as
+     data bound as push constants (Vulkan), launch parameters (OptiX), or root constants (D3D).
+     Any additional `uniform` attribute following `[[push_constant]]` is optional, as `uniform` is
+     already implied.
+   * If an entry point parameter is annotated with `[[shader_record]]`, Slang recognizes it as
+     data bound as part of the local record (Vulkan's and OptiX's SBT, D3D's local root signature).
+     Similarly, any additional `uniform` attribute following `[[shader_record]]` is optional, as `uniform` is
+     already implied.
+   * If the function parameter is labeled `uniform`, but does not specify one of the
+     above explicit annotations, the compiler may insert an "implicit" `[[push_constant]]` to optimize
+     performance based on the target language.
+   * The default behavior of `uniform` will be to map to `[[push_constant]]` for all entry point
+     types.
+
+3. **IR and Reflection Impact**
+   * For Vulkan, the compiler can remap annotations to the necessary `[[vk::push_constant]]` or
+     `[[vk::shader_record]]` attributes under the hood.
+   * For D3D, the compiler would still produce a `ConstantBuffer` or resource
+     binding in reflection metadata, but with an additional
+     `slang::TypeReflection::Kind::PushConstant` or `slang::TypeReflection::Kind::ShaderRecord` hint.
+     This allows app code to unify or alias these buffers with local or global root
+     signatures.
+
+
+4. **Migration Strategy**
+   * Existing Slang code using `[[vk::push_constant]] ConstantBuffer<T>` or
+     `[[vk::shader_record]] ConstantBuffer<T>` remains valid; it just becomes a more
+     verbose variant of the promoted annotations.
+   * We would want to legalize the use of `[[vk::push_constant]]` and `[[vk::shader_record]]`
+     as recognized attributes of entry point parameters. (Today, these compile, but don't seem to
+     be respected.)
+   * Legacy usage of bare `uniform` parameters in ray tracing entry points is rare. Still,
+     we might want to emit a warning and guidance on migration to more rate-specific annotations
+     while the change is new.
+
+5. **Example**
+Rather than this:
+```
+[shader("anyhit")]
+void myAnyHitShader(
+    in T1 a,
+    inout T2 b,
+    T3 c,
+    uniform T4 d
+) {...}
+```
+
+We would now support the following:
+
+```
+[shader("anyhit")]
+void myAnyHitShader(
+    [[hit_attribute]] T1 a,
+    [[payload]] inout T2 b,
+    [[push_constant]] T3 c,
+    [[shader_record]] T4 d
+) {...}
+```
+
+* On Vulkan, constant "uniform" data is now compiled to two distinct binding regions:
+  * `T3` in the push constant region
+  * `T4` in the shader binding table record
+* On D3D, the same code would reflect as two `ConstantBuffer` regions, with
+reflection metadata marking them as "intended for push constant" vs. "intended
+for shader record."
+* `[[payload]]` and `[[hit_attribute]]` distinguish which registers map to `T1` and `T2`.
+* Any remaining unannotated parameters would default to `[[push_constant]]`, which would match
+current behavior with other entry point types.
+
+It would also now be possible to specify different push constant structures in a mixed ray tracing
+entrypoint setup.
+
+Alternatives Considered
+-----------------------
+1. Keep Using `uniform` As Is
+   * We could continue to rely on `uniform` parameters, automatically mapping them to
+     different memory regions based on stage. However, this has proven confusing, and has
+     negative performance implications, especially in complex pipelines mixing ray tracing,
+     compute, and graphics. It also does not convey the notion of "what is uniform over what,"
+     which is critical in advanced GPU code.
+
+2. Global Scope or Separate Slang Modules
+   * One could define separate .slang files for each entry point or set of parameters,
+     so that only one usage of `[[vk::push_constant]]` or `[[vk::shader_record]]` is
+     visible at a time. This can work for small projects, but breaks down at scale and
+     is not a satisfying long-term fix.
+
+3. `SV_Payload` and `SV_Attributes`
+   * At one point in time, we had `SV_RayPayload`, which has since been removed
+     (but still seemingly exists in our OptiX examples?). Technically, both ray payload
+     and hit attribute register offsets are considered to be "system values", though I
+     personally prefer `([[annotation]] T myT)` over `(T myT : SV_SystemValue)`.
+
+4. `[[constant]]` rather than `[[push_constant]]`
+   * "Push constant" verbiage comes from Vulkan; however, other target APIs like OptiX/CUDA have
+     similar concepts. In OptiX, such parameters are called "launch parameters". Using a more general
+     name like `[[constant]]` could help clarify that the attribute is meant for more than just Vulkan.
+     Still, extending `[[push_constant]]` might be a more natural approach, seeing as we already have
+     this annotation.
+
+By extending the `[[push_constant]]` and `[[shader_record]]` annotations, we offer a clearer
+language-level model that reduces the ambiguity around `uniform` and bridges the gap
+across GPU backends and pipeline stages. This proposal is a stepping stone toward a
+more comprehensive "rates" system in Slang, while providing immediate, practical
+improvements to developers.
diff --git a/proposals/018-packed-data-intrinsics.md b/proposals/018-packed-data-intrinsics.md
new file mode 100644
index 0000000..dda4124
--- /dev/null
+++ b/proposals/018-packed-data-intrinsics.md
@@ -0,0 +1,114 @@
+SP #018: Data pack/unpack intrinsics
+====================================
+
+Adds intrinsics for converting numeric vector data to and from packed unsigned integer values.
+
+## Status
+
+Status: Design Review.
+
+Implementation: N/A
+
+Author: Darren Wihandi and Slang community.
+
+Reviewer: Yong He.
+
+## Background
+
+Data packing/unpacking intrinsics provide great utility. Slang's core module, which derives from HLSL, already defines integer pack/unpack intrinsics that were introduced in SM 6.6.
+Floating-point variants, however, are not defined. Floating-point pack/unpack intrinsics exist as built-in intrinsics in GLSL, SPIR-V, Metal, and WGSL but not in HLSL and Slang, and
+there is no way to access the intrinsics provided by these other target shading languages.
+
+## Proposed Approach
+
+We propose to add new packed-data intrinsics to cover both floating-point and integer types. Although the HLSL integer intrinsics are already implemented, integer variants are also
+added to remain independent of the HLSL specification and syntax. For floating-point processing, different variants exist that handle conversion between unorm, snorm, and
+unnormalized/standard IEEE 754 floats and vectors of 16-bit and 32-bit floats.
+
+### Floating-point Unpack Functions
+
+A set of unpack intrinsics is added to decompose a 32-bit integer of packed 8-bit or 16-bit chunks and reinterpret them as a vector of unorm, snorm, or standard floats and
+halves. Each 8-bit or 16-bit chunk is converted to either a normalized float through a conversion rule or a standard IEEE-754 floating-point value.
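To make the conversion rules concrete before listing the signatures, here is a pure-Python sketch of the 8-bit unorm/snorm cases. This is an illustration, not part of the proposal: the snake_case names are ours, and the rounding mode is approximated with Python's `round`.

```python
# Illustrative model of the 8-bit normalized pack/unpack conversion rules:
# unorm maps [0, 1] <-> [0, 255]; snorm maps [-1, 1] <-> [-127, 127],
# with the unpack side clamping -128/127 back up to -1.0.

def pack_unorm4x8(v):
    """Pack four floats in [0, 1] into one uint32, lowest byte first."""
    out = 0
    for i, f in enumerate(v):
        c = round(max(0.0, min(1.0, f)) * 255.0)
        out |= c << (8 * i)
    return out

def unpack_unorm4x8_to_float(packed):
    """Inverse of pack_unorm4x8: each byte becomes c / 255."""
    return [((packed >> (8 * i)) & 0xFF) / 255.0 for i in range(4)]

def pack_snorm4x8(v):
    """Pack four floats in [-1, 1] as two's-complement signed bytes."""
    out = 0
    for i, f in enumerate(v):
        c = round(max(-1.0, min(1.0, f)) * 127.0)
        out |= (c & 0xFF) << (8 * i)  # store as a two's-complement byte
    return out

def unpack_snorm4x8_to_float(packed):
    """Inverse of pack_snorm4x8: each signed byte becomes max(c / 127, -1)."""
    result = []
    for i in range(4):
        c = (packed >> (8 * i)) & 0xFF
        if c >= 128:  # sign-extend the byte
            c -= 256
        result.append(max(c / 127.0, -1.0))
    return result
```

For instance, `pack_snorm4x8([-1.0, 0.0, 0.0, 0.0])` produces `0x81` (the two's-complement byte for -127), and unpacking it recovers `-1.0` exactly.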
+``` +float4 unpackUnorm4x8ToFloat(uint packedVal); +half4 unpackUnorm4x8ToHalf(uint packedVal); + +float4 unpackSnorm4x8ToFloat(uint packedVal); +half4 unpackSnorm4x8ToHalf(uint packedVal); + +float2 unpackUnorm2x16ToFloat(uint packedVal); +half2 unpackUnorm2x16ToHalf(uint packedVal); + +float2 unpackSnorm2x16ToFloat(uint packedVal); +half2 unpackSnorm2x16ToHalf(uint packedVal); + +float2 unpackHalf2x16ToFloat(uint packedVal); +half2 unpackHalf2x16ToHalf(uint packedVal); +``` +### Floating-point Pack Functions + +A set of pack intrinsics are added to pack a vector of unorm, snorm, or standard floats and halves to a 32-bit integer of packed 8-bit or 16-bit float chunks. Each vector element is converted +to an 8-bit or 16-bit integer chunk through conversion rules, then packed into one 32-bit integer value. + +``` +uint packUnorm4x8(float4 unpackedVal); +uint packUnorm4x8(half4 unpackedVal); + +uint packSnorm4x8(float4 unpackedVal); +uint packSnorm4x8(half4 unpackedVal); + +uint packUnorm2x16(float2 unpackedVal); +uint packUnorm2x16(half2 unpackedVal); + +uint packSnorm2x16(float2 unpackedVal); +uint packSnorm2x16(half2 unpackedVal); + +uint packHalf2x16(float2 unpackedVal); +uint packHalf2x16(half2 unpackedVal); +``` + +### Integer Unpack Functions + +A set of unpack intrinsics are added to decompose a 32-bit integer containing four packed 8-bit signed or unsigned integer values and reinterpret them as vectors of 16-bit or 32-bit integers. +These intrinsics support sign-extension for signed integers and zero-extension for unsigned integers. + +``` +uint32_t4 unpackUint4x8ToUint32(uint packedVal); +uint16_t4 unpackUint4x8ToUint16(uint packedVal); + +int32_t4 unpackInt4x8ToInt32(uint packedVal); +int16_t4 unpackInt4x8ToInt16(uint packedVal); +``` + +### Integer Pack Functions + +A set of pack intrinsics are added to convert a vector of 16-bit or 32-bit signed or unsigned integers into a 32-bit packed representation, storing only the lower 8 bits of each value. 
+The clamped variants ensure each 8-bit value is clamped to [0, 255] for unsigned values and [-128, 127] for signed values.
+
+```
+uint packUint4x8(uint32_t4 unpackedVal);
+uint packUint4x8(uint16_t4 unpackedVal);
+
+uint packInt4x8(int32_t4 unpackedVal);
+uint packInt4x8(int16_t4 unpackedVal);
+
+uint packUint4x8Clamp(int32_t4 unpackedVal);
+uint packUint4x8Clamp(int16_t4 unpackedVal);
+
+uint packInt4x8Clamp(int32_t4 unpackedVal);
+uint packInt4x8Clamp(int16_t4 unpackedVal);
+```
+
+### Implementation Details
+
+#### Normalized float conversion rules
+Normalized float conversion rules are standard across GLSL/SPIR-V, Metal, and WGSL. Slang follows these standards. The conversion rules for each target are detailed in:
+- Section 8.4, `Floating-Point Pack and Unpack Functions`, of the GLSL language specification, which is also used by the SPIR-V extended instruction set for GLSL.
+- Section 7.7.1, `Conversion Rules for Normalized Integer Pixel Data Types`, of the Metal Shading Language specification.
+- Sections [16.9 Data Packing Built-in Functions](https://www.w3.org/TR/WGSL/#pack-builtin-functions) and [16.10 Data Unpacking Built-in Functions](https://www.w3.org/TR/WGSL/#unpack-builtin-functions) of the WebGPU Shading Language specification.
+
+#### Targets without built-in intrinsics
+For targets without built-in intrinsics, the implementation is done manually using a combination of arithmetic and bitwise operations.
+
+#### Built-in packed datatypes
+Unlike HLSL's implementation, which introduces new packed datatypes (`uint8_t4_packed` and `int8_t4_packed`), unsigned 32-bit integers are used directly, and no new packed datatypes are introduced.
diff --git a/proposals/019-cooperative-vector.md b/proposals/019-cooperative-vector.md
new file mode 100644
index 0000000..0d103ec
--- /dev/null
+++ b/proposals/019-cooperative-vector.md
@@ -0,0 +1,407 @@
+# SP #019: Cooperative Vector
+
+This proposal introduces support for cooperative vectors in Slang.
+
+## Status
+
+Status: Design Review
+
+Implementation: N/A
+
+Author: Ellie Hermaszewska
+
+Reviewer: TBD
+
+## Background
+
+Slang supports cooperative vector operations, which are optimized for
+matrix-vector multiplies, for example to accelerate the evaluation of small
+neural networks. The design of this feature is based on the SPIR-V
+extension SPV_NV_cooperative_vector.
+
+Cooperative vectors are a new set of types that, unlike normal vector types,
+have arbitrary length and support a limited set of operations. They are
+designed to cooperate behind the scenes when performing matrix-vector
+multiplies, without requiring fully occupied subgroups or uniform control flow.
+
+## Semantics
+
+- Cooperative vectors are logically stored in the invocation they belong to,
+but can cooperate behind the scenes for matrix-vector operations.
+
+- Unlike cooperative matrices, cooperative vectors don't require a fully
+occupied subgroup or uniform control flow, although these conditions can
+increase performance.
+
+- The order of arithmetic operations in these functions is
+implementation-dependent. The SPIR-V extension specifies that the internal
+precision of floating-point operations is defined by the client API.
+
+- Integer operations used in multiplication are performed at the precision of
+the result type and are exact (with the usual wrapping rules).
+
+- Cooperative vector types cannot (yet) themselves be stored in buffers
+
+## Slang API
+
+### CoopVec Type
+
+The core of this feature is the `CoopVec` type:
+
+```hlsl
+struct CoopVec<T : __BuiltinArithmeticType, let N : int> : IArray<T>, IArithmetic
+{
+    // Zero constructor
+    __init();
+    // Broadcast
+    __init(T t);
+    // Coercion
+    __init(CoopVec other);
+    // Variadic component-wise constructor, for example CoopVec(1, 2, 3)
+    __init(expand each U args) where U == T;
+
+    // Array-like access
+    T __subscript(int index);
+}
+```
+
+For initialization there are several options:
+
+- zero initialization `CoopVec()`
+- broadcast initialization `CoopVec(255)`
+- variadic initialization `CoopVec(0, 128, 255)`
+- casting initialization `CoopVec(CoopVec())`
+
+It can be indexed with the subscript operator:
+
+- `CoopVec(1, 28, 546, 9450)[2] == 546`
+- `CoopVec(1, 11, 105, 816)[1] = 12`
+
+Other operations include:
+
+- binary operators `+`, `-`, `*`, `/`, `%`; these behave as elementwise operations
+- unary negation `-`
+- comparison operators `==`, `!=`, `<`, `>`, `<=`, `>=`, implementing a lexicographic ordering
+- `min`, `max`, `clamp`, `step`, `exp`, `log`, `tanh`, `atan`, `fma`, all operating elementwise
+
+It's also possible to set values in mutable cooperative vectors with `fill(T t)`
+and `copyFrom(CoopVec other)`.
+
+## Basic Usage
+
+### Loading and Storing
+
+Cooperative vectors can be loaded from and stored to buffers:
+
+- `[RW]StructuredBuffer`
+- `[RW]ByteAddressBuffer`
+- `groupshared` arrays
+
+> Note that `StructuredBuffers` are not supported for the HLSL backend
+
+This can be done using the static member function `load(Buffer buffer, int32_t byteOffset)`.
+
+For structured buffers and `groupshared` arrays, the element type of the buffer determines
+the cooperative vector element type; note that the offset must be a
+multiple of the element stride in bytes.
+
+```hlsl
+StructuredBuffer<int32_t> inputBuffer;
+
+RWByteAddressBuffer outputBuffer;
+
+func foo()
+{
+    int myOffsetInBytes = 64;
+    // Load a cooperative vector using the type-inferring wrapper
+    let vecA = coopVecLoad<5>(inputBuffer, myOffsetInBytes);
+
+    // Load using the static member function
+    let vecB = CoopVec<int32_t, 5>.load(inputBuffer); // implicit zero offset
+
+    // Perform operations...
+    let vecC = vecA + vecB;
+
+    // Store a cooperative vector
+    vecC.store(outputBuffer, 128);
+}
+```
+
+The full signatures are as follows:
+
+```hlsl
+CoopVec<T, N> coopVecLoad<let N : int, T : __BuiltinArithmeticType>(ByteAddressBuffer buffer, int32_t byteOffset = 0);
+CoopVec<T, N> coopVecLoad<let N : int, T : __BuiltinArithmeticType>(RWByteAddressBuffer buffer, int32_t byteOffset = 0);
+CoopVec<T, N> coopVecLoad<let N : int, T : __BuiltinArithmeticType>(StructuredBuffer<T> buffer, int32_t byteOffset = 0);
+CoopVec<T, N> coopVecLoad<let N : int, T : __BuiltinArithmeticType>(RWStructuredBuffer<T> buffer, int32_t byteOffset = 0);
+CoopVec<T, N> coopVecLoad<let N : int, T : __BuiltinArithmeticType, let M : int>(__constref groupshared const T[M] data, int32_t byteOffset = 0);
+```
+
+> Be aware that the target platform might impose alignment constraints on the
+> offset.
+
+### Matrix Multiplication
+
+Below is an example of matrix-vector multiplication with bias.
+
+Matrix multiplication operations (`coopVecMatMul`, `coopVecMatMulPacked`,
+`coopVecMatMulAdd` and `coopVecMatMulAddPacked`) perform a matrix-vector
+multiply where the vector is treated as a column vector and is
+left-multiplied by the matrix.
+
+> Please take care to make sure that the buffer interpretations are supported
+> by your implementation. Not all platforms support all combinations.
+
+The `...Packed` variants are the most general functions, where the user is able
+to fully specify the width of the matrix (although this is strongly dependent
+on the `inputInterpretation` parameter). When not using packed inputs, the
+matrix width must be equal to the input vector's length, and the
+non-`...Packed` variants wrap this common use case.
+ +```slang +StructuredBuffer inputBuffer; +StructuredBuffer matrixBuffer; +StructuredBuffer biasBuffer; +RWStructuredBuffer outputBuffer; + +func foo() +{ + let vec = coopVecLoad<4>(inputBuffer); + // The result type is determined by the first two generic parameters, in + // this case int32_t and 4, + let result = coopVecMatMulAdd( + vec, + // Matrix buffer interpretation and offset in bytes + CoopVecComponentType::SignedInt8, + matrixBuffer, + 0, + // Bias buffer interpretation and offset in bytes + CoopVecComponentType::SignedInt8, + biasBuffer, + 0, + // Output interpretation + CoopVecComponentType::SignedInt32, + // Matrix transposition + CoopVecMatrixLayout::RowMajor, + false, + // matrix stride + 4 + ); + coopVecStore(result, outputBuffer); +} +``` + +```slang +StructuredBuffer packedInputBuffer; +StructuredBuffer matrixBuffer; +StructuredBuffer biasBuffer; +RWStructuredBuffer outputBuffer; + +func bar() +{ + let packedVec = coopVecLoad<1>(packedInputBuffer); + let k = 4; // 1 * the packing factor + // The result type is still determined by the first two generic parameters, + // in this case int32_t and 4, + let result = coopVecMatMulAddPacked( + vec, + // Matrix buffer interpretation and offset in bytes + CoopVecComponentType::SignedInt8Packed, + k, + matrixBuffer, + 0, + // Bias buffer interpretation and offset in bytes + CoopVecComponentType::SignedInt8, + biasBuffer, + 0, + // Output interpretation + CoopVecComponentType::SignedInt32, + // Matrix transposition + CoopVecMatrixLayout::RowMajor, + false, + // matrix stride + 4 + ); + coopVecStore(result, outputBuffer); +} +``` + +The full types: + +```hlsl +[require(cooperative_vector)] +CoopVec coopVecMatMulPacked< + T : __BuiltinArithmeticType, + let M : int, + let PackedK : int, + U : __BuiltinArithmeticType + >( + CoopVec input, + constexpr CoopVecComponentType inputInterpretation, + constexpr int k, + $(buffer.type) matrix, + int32_t matrixOffset, + constexpr CoopVecComponentType matrixInterpretation, 
+    constexpr CoopVecMatrixLayout memoryLayout,
+    constexpr bool transpose,
+    constexpr uint matrixStride = 0
+);
+
+[require(cooperative_vector)]
+CoopVec<T, M> coopVecMatMul<
+    T : __BuiltinArithmeticType,
+    let M : int,
+    let K : int,
+    U : __BuiltinArithmeticType
+    >(
+    CoopVec<U, K> input,
+    constexpr CoopVecComponentType inputInterpretation,
+    $(buffer.type) matrix,
+    int32_t matrixOffset,
+    constexpr CoopVecComponentType matrixInterpretation,
+    constexpr CoopVecMatrixLayout memoryLayout,
+    constexpr bool transpose,
+    constexpr uint matrixStride = 0
+);
+
+[require(cooperative_vector)]
+CoopVec<T, M> coopVecMatMulAddPacked<
+    T : __BuiltinArithmeticType,
+    let M : int,
+    let PackedK : int,
+    U : __BuiltinArithmeticType
+    >(
+    CoopVec<U, PackedK> input,
+    constexpr CoopVecComponentType inputInterpretation,
+    constexpr int k,
+    $(buffer.type) matrix,
+    int32_t matrixOffset,
+    constexpr CoopVecComponentType matrixInterpretation,
+    $(buffer.type) bias,
+    int32_t biasOffset,
+    constexpr CoopVecComponentType biasInterpretation,
+    constexpr CoopVecMatrixLayout memoryLayout,
+    constexpr bool transpose,
+    constexpr uint matrixStride = 0
+);
+
+[require(cooperative_vector)]
+CoopVec<T, M> coopVecMatMulAdd<
+    T : __BuiltinArithmeticType,
+    let M : int,
+    let K : int,
+    U : __BuiltinArithmeticType
+    >(
+    CoopVec<U, K> input,
+    constexpr CoopVecComponentType inputInterpretation,
+    $(buffer.type) matrix,
+    int32_t matrixOffset,
+    constexpr CoopVecComponentType matrixInterpretation,
+    $(buffer.type) bias,
+    int32_t biasOffset,
+    constexpr CoopVecComponentType biasInterpretation,
+    constexpr CoopVecMatrixLayout memoryLayout,
+    constexpr bool transpose,
+    constexpr uint matrixStride = 0
+);
+```
+
+There also exist in-place matrix-multiplication accumulation member functions,
+following the signatures above:
+
+- `matMulAccum`
+- `matMulAccumPacked`
+- `matMulAddAccum`
+- `matMulAddAccumPacked`
+
+### Accumulation Operations
+
+- The `coopVecOuterProductAccumulate` operation computes the outer product of two
+vectors and atomically accumulates the result into a buffer.
+
+- The `coopVecReduceSumAccumulate` operation performs a component-wise atomic
+addition of a vector into a buffer.
+
+```hlsl
+void coopVecOuterProductAccumulate<
+    T : __BuiltinArithmeticType,
+    let M : int,
+    let N : int
+    >(
+    CoopVec<T, M> a,
+    CoopVec<T, N> b,
+    $(buffer.type) matrix,
+    int32_t matrixOffset,
+    constexpr uint matrixStride,
+    constexpr CoopVecMatrixLayout memoryLayout,
+    constexpr CoopVecComponentType matrixInterpretation
+);
+
+void coopVecReduceSumAccumulate<
+    T : __BuiltinArithmeticType,
+    let N : int
+    >(
+    CoopVec<T, N> v,
+    $(buffer.type) buffer,
+    int32_t offset
+);
+```
+
+> Note that these operations are not accelerated on the HLSL target.
+
+### Enums and Constants
+
+```hlsl
+enum CoopVecMatrixLayout
+{
+    RowMajor,
+    ColumnMajor,
+    InferencingOptimal,
+    TrainingOptimal
+};
+
+enum CoopVecComponentType
+{
+    FloatE4M3,
+    FloatE5M2,
+    Float16,
+    Float32,
+    Float64,
+    SignedInt8,
+    SignedInt16,
+    SignedInt32,
+    SignedInt64,
+    SignedInt8Packed,
+    UnsignedInt8,
+    UnsignedInt16,
+    UnsignedInt32,
+    UnsignedInt64,
+    UnsignedInt8Packed
+};
+```
+
+## SPIR-V Translation
+
+Cooperative vector operations in Slang are directly translated to their
+corresponding SPIR-V instructions:
+
+- `CoopVec` corresponds to `OpTypeCooperativeVectorNV`
+- `coopVecLoad` corresponds to `OpCooperativeVectorLoadNV`
+- `coopVecStore` corresponds to `OpCooperativeVectorStoreNV`
+- `coopVecMatMul` corresponds to `OpCooperativeVectorMatrixMulNV`
+- `coopVecMatMulAdd` corresponds to `OpCooperativeVectorMatrixMulAddNV`
+- `coopVecOuterProductAccumulate` corresponds to `OpCooperativeVectorOuterProductAccumulateNV`
+- `coopVecReduceSumAccumulate` corresponds to `OpCooperativeVectorReduceSumAccumulateNV`
+
+## HLSL Translation
+
+The types
and operations are lowered to the experimental API described here:
+https://confluence.nvidia.com/pages/viewpage.action?spaceKey=DX&title=CooperativeVectors+aka+Neural+Graphics+for+DX
+
+Please note that SPIR-V is the recommended backend at this time, because it
+targets a more stable extension.
+
+## Translation for other targets
+
+The `CoopVec` type is lowered to a fixed-size array, and cooperative operations
+are emulated in each thread.
diff --git a/proposals/020-stage-switch.md b/proposals/020-stage-switch.md
new file mode 100644
index 0000000..d2cd94f
--- /dev/null
+++ b/proposals/020-stage-switch.md
+# SP#020: `stage_switch`
+
+## Status
+
+Author: Yong He
+
+Status: In Experiment
+
+Implementation: [PR 6311](https://github.com/shader-slang/slang/pull/6311)
+
+Reviewed by: Jay Kwak
+
+## Background
+
+We need to provide a mechanism for authoring stage-specific code that works with the capability system. For example, the user may want to define a function `ddx_or_zero(v)` that returns `ddx(v)` when called from a fragment shader, and returns `0` when called from other shader stages. Without a mechanism for writing stage-specific code, there is no way to define a valid function that can be used from both a fragment shader and a compute shader in a single compilation.
+
+The user can work around this problem with the preprocessor:
+
+```
+float ddx_or_zero(float v)
+{
+#ifdef FRAGMENT_SHADER
+    return ddx(v);
+#else
+    return 0.0;
+#endif
+}
+
+[shader("compute")]
+[numthreads(1,1,1)]
+void computeMain() { ddx_or_zero(...); }
+
+[shader("fragment")]
+float4 fragMain() { ddx_or_zero(...); }
+```
+
+However, this requires the application to compile the source file twice with different pre-defined macros. It is impossible to use a single compilation to generate one SPIRV module that contains both entry points.
+
+## Proposed Approach
+
+We propose to add a new construct, `__stage_switch`, that works like `__target_switch` but switches on stages.
With `__stage_switch`, the above code can be written as:
+
+```
+float ddx_or_zero(float v)
+{
+    __stage_switch
+    {
+    case fragment:
+        return ddx(v);
+    default:
+        return 0.0;
+    }
+}
+
+[shader("compute")]
+[numthreads(1,1,1)]
+void computeMain()
+{
+    ddx_or_zero(...); // returns 0.0
+}
+
+[shader("fragment")]
+float4 fragMain()
+{
+    ddx_or_zero(...); // returns ddx(...)
+}
+```
+
+With `__stage_switch`, the two entry points can be compiled into a single SPIRV module in one go, without requiring any preprocessor macros to be set up.
+
+Unlike `switch`, there is no fallthrough between cases in a `__stage_switch`. Every case implicitly ends with a `break` if one is not written by the user. However, one special form of fallthrough is supported: multiple `case` labels may appear next to each other with nothing else in between, for example:
+
+```
+__stage_switch
+{
+case fragment:
+case vertex:
+case geometry:
+    return 1.0;
+case anyhit:
+    return 2.0;
+default:
+    return 0.0;
+}
+```
+
+## Alternatives Considered
+
+We considered reusing the existing `__target_switch` and extending it to allow switching between different stages. However, this turns out to be difficult to implement when ordinary capabilities are mixed together with stages, because specialization to stages needs to happen at a much later point in the compilation pipeline than specialization to capabilities. Using a separate switch allows us to easily tell apart the code that requires specialization at different phases of compilation, and also allows us to provide cleaner error messages.
+
+## Conclusion
+
+`__stage_switch` adds the missing functionality from `__target_switch` that allows the user to write stage-specific code that gets specialized for each unique entry-point stage. This works together with the capability system to provide early type-system checks that ensure the correctness of user code, without requiring use of the preprocessor to guard calls to stage-specific functions.
diff --git a/proposals/README.md b/proposals/README.md
new file mode 100644
index 0000000..4dbf076
--- /dev/null
+++ b/proposals/README.md
+Proposals
+=========
+
+This directory contains proposals / "RFCs" for Slang language/compiler/system features.
+In general, proposals are used for features that are large or complicated enough that the design and/or plan benefits from being discussed in detail before we commit to making code changes.
+
+## How to make a proposal ##
+
+1. Copy the template doc, `000-template.md`, to a new document. Fill in the details of your proposal, and give the document a descriptive name following the same formatting. Leave the number as `000`.
+2. Submit a PR with your proposal doc in the `slang/docs/proposals/` directory to solicit input from the maintainers of Slang.
+3. Integrate feedback and iterate until you have affirmative approval from maintainers.
+4. Include maintainer approval information and update the proposal status in your proposal header prior to merge.
+
+The maintainer accepting the merge should assign the proposal a number.
+
+Implementation of features large enough to require a proposal doc should not begin until the doc has been accepted and merged.
+
diff --git a/proposals/implementation/ast-ir-serialization.md b/proposals/implementation/ast-ir-serialization.md
new file mode 100644
index 0000000..cc65c07
--- /dev/null
+++ b/proposals/implementation/ast-ir-serialization.md
+Improving AST and IR (De)Serialization
+======================================
+
+Background
+----------
+
+Slang supports serialization of modules after front-end compilation.
+A serialized module contains a combination of AST-level and IR-level information.
+
+The AST-level information is primarily concerned with the `Decl` hierarchy of the module, and is used for semantic checking of other modules that `import` it.
+The serialized AST needs to record information about types, as well as functions/methods and their signatures. +In principle, there is no need for function *bodies* to be encoded, and any non-`public` declarations could also be stripped. + +The IR-level information is primarily concerned with encoding the generated Slang IR code for the `public` symbols in the module, so that they can be linked with other modules that might reference those symbols by their mangled name. +The serialized IR for a module does *not* encode information sufficient for semantic checking of modules that `import` it. + +Currently, deserialization of the AST or IR for a module is an all-or-nothing operation. +Either the entire `Decl` hierarchy of the AST is deserialized and turned into in-memory C++ objects, or none of it is. +Similarly, we can either construct the `IRInst` hierarchy for an entire module, or none of it. + +Releases of the Slang compiler typically included a serialized form of the core module, and the runtime cost of deserializing this module has proven to be a problem for users of the compiler. +Because parts of the Slang compiler are not fully thread-safe/reentrant, the core module must be deserialized for each "global session," so that deserialization cost is incurred per-thread in scenarios with thread pools. +Even in single-threaded scenarios, the deserialization step adds significantly to the startup time for the compiler, making single-file compiles less efficient than compiling large batches of files in a single process. + +Overview of Proposed Solution +----------------------------- + +The long/short of the proposed solution is to enable lazy/on-demand deserialization of both the AST and IR information for a module. + +Enabling on-demand deserialization requires defining a new on-disk representation for serialized AST and IR modules. +Defining entirely new formats for serialization, implementing them, and then switching over to them will be a large task on its own. 
+It is recommended that we retain *both* the old and new implementations of serialization in the codebase, until we are confident that we are ready to "flip the switch" and exclusively support the new one. + +In addition to changes related to the new serialized formats, there will of course be changes required to the C++ code that interacts with previously-compiled modules. +This document will attempt to highlight the critical places where logic will need to change for both the AST and IR. + +IR +--- + +We will start by discussing the IR case, because it is simpler than the AST case. +The way the compiler works with Slang IR was designed with the assumption that on-demand deserialization would eventually be desired, so there are (hopefully) fewer thorny issues. + +### Intercept During IR Linking + +We expect that the primary (and perhaps only) point where on-demand deserialization will be needed is when running IR-level linking. + +IR linking takes as input a collection of one or more IR modules, and information to identify one or more public/exported symbols (typically shader entry points) for which target code is desired. +The linker creates a *fresh* `IRModule` for the linked result, and clones/copies IR instructions from the input modules over to the output using a superficially-simple process: + +1. Given an instruction in an input module to be copied over, use an `IRBuilder` on the output module to create a deep copy of that instruction and its children. + +2. Whenever an instruction being copied over references another top-level instruction local to the same input module (that is, one without a linkage decoration), either construct a deep copy of the referenced instruction in the output module, or find and re-use a copy that was made previously. + +3. 
Whenever an instruction being copied over references a top-level instruction that might be resolved to come from another module (that is, one with a linkage decoration), use the mangled name on the linkage decoration to search *all* of the input modules for candidate instructions that match. Use some fancy logic to pick one of them (the details aren't relevant at this exact moment) and then copy the chosen instruction over, more or less starting at (1) above. + +A key observation is that nothing about these steps actually *cares* if the input module is realized in-memory as a bunch of `IRInst`s, or just as serialized data. +Furthermore, there are only two cases where a top-level instruction in an input module might need to be copied over to the output module: + +* When it is referenced by another instruction inside the same module +* When it is referenced from another module by its mangled name + +So, the long/short of the proposed changes to the C++ code for the IR is to make it so that the input to IR linking is a collection of one or more **serialized** IR modules, and the linker only deserializes the specific top-level instructions that need to be copied over to the output. +Effectively, the linker is just deserializing instructions from the serialized input modules *into* the output module. + +### On-Disk Encoding + +This document will sketch one *possible* on-disk encoding for an IR module that supports on-demand deserialization, but it is not intended to take away the freedom for the implementer to make other choices. + +We propose that the serialized IR should use a section-based format where the entire file can be memory-mapped in and then the byte range of individual sections can be found without any actual deserialization or copying of data. 
+
+Two of the sections in the file will contain:
+
+* The raw bytes representing serialized IR instruction trees
+* An array of *top-level instruction entries*
+
+Top-level instruction entries can be referenced by index (a *top-level instruction index*).
+Each entry specifies the range of bytes in the "raw bytes" section above that encodes that instruction and its operands/children.
+We propose that index 0 be used as a special case to indicate a null or missing instruction; the entry at that index should not be used for real data.
+The "raw bytes" encoding of a given top-level instruction and its tree of children need *not* support random access.
+
+When encoding the operands of an instruction in the "raw bytes" section, there must be a way to determine whether an operand refers to another instruction in the same top-level-instruction tree, or refers to *another* top-level instruction tree.
+Operands that refer to other top-level instruction trees will store the top-level instruction index, so that the matching entry can be found easily.
+
+When performing on-demand deserialization, application code can easily maintain a map from the top-level instruction index to the corresponding `IRInst*` to cache and re-use deserialized instructions.
+It can even use a flat array of `IRInst*` allocated based on the number of entries, for simplicity.
+
+Along with the above sections, the serialized format should contain:
+
+* A *string table* which stores the raw data for any strings in the IR module (including mangled symbol names) and allows strings to be referenced by a simple *string table index*
+* A hash-table or other acceleration structure that maps mangled names (as string table indices) to top-level instruction indices.
+
+> Note: One detail being swept under the rug a bit here is what to do when a module has multiple top-level instructions with the *same* mangled name.
+> So long as we retain that flexibility (which we may not need in the presence of `target_switch`), the acceleration structure might have to map from a mangled name to a *list* of top-level instruction indices. + +If we decide that storing a serialized hash-table adds too much complexity, we can instead store a flat array of pairs (string-table-index, top-level-instruction-index) sorted by the content of those strings, and then do a binary search. +Whether we use hashing or a binary search, it would be ideal if looking up a top-level instruction by mangled name did not require deserializing the acceleration structure. + +> Note: Another small detail here is that the serialized format being proposed does not clearly distinguish cases (2) and (3) in the deserialization/linking process described above. +> More or less, the linker would find an operand that references another top-level instruction in the same module, and would then deserialize it on-demand from that module, along the lines of case (2). +> If, after deserialization, we find that the instruction has a linkage decoration, we can jump to step (3) and scan for instructions in *other* modules that match on name. + + +AST +--- + +### Intercept During Lookup and Enumeration + +There are many more parts of the compiler that touch AST structures that *might* be in a partially-deserialized state, so there are far more contact points that will have to be discovered and handled. + +At the most basic, the proposal is to change `ContainerDecl` so that it supports having only a subset of its child declarations (aka "members") loaded at a time. +The two main ways that the child declarations are accessed are: + +* Enumeration of *all* members (or all members of a given class), which involves iterating over the `ContainerDecl::members` array. + +* Lookup of members matching a specific name, which involves using the `ContainerDecl::memberDictionary` dictionary. 
+
+Currently the `memberDictionary` field is private, and all access to it goes through methods that check whether the dictionary needs to be rebuilt.
+The `members` field should also be made private, so that we can carefully intercept any code that wants to enumerate all members of a declaration.
+
+We should probably also make the `memberDictionary` field map from a name to the *index* of a declaration in `members`, instead of directly to a `Decl*`.
+
+> Note: We're ignoring the `ContainerDecl::transparentMembers` field here, but it does need to be taken into account in the actual implementation.
+
+There is already a field `ContainerDecl::dictionaryLastCount` that is used to encode some state related to the status of the `memberDictionary` field.
+We can update the representation used by that field so that it supports four cases instead of the current three:
+
+* If `count == members.getCount()`, then the `members` and `memberDictionary` fields are accurate and ready to use for enumeration and lookup, respectively.
+
+* Otherwise, if `count >= 0`, then the `members` array is accurate, and the `memberDictionary` includes the first `count` members, but not those at or after `count` in the array.
+
+* If `count == -1`, then the `members` array is accurate, but the `memberDictionary` is invalid, and needs to be recreated from scratch.
+
+* If `count == -2`, then we are in a new case where the declaration is being lazily loaded from some external source.
+
+In the new "lazy-loading" case, any entries in the `memberDictionary` will be accurate, but the absence of an entry for a given `Name*` does *not* guarantee that the declaration has no children matching that name.
+The `members` array will either be empty, or will be correctly-sized for the number of children that the declaration has.
+The entries in `members` may be null, however, if the corresponding child declaration has not been deserialized.
+ +We will need to attach a pointer to information related to lazy-loading to the `ContainerDecl`. +The simplest approach would be to add a field to `ContainerDecl`, but we could also consider using a custom `Modifier` if we are concerned about bloat. + +#### Enumeration + +If some piece of code wants to enumerate all members of a given `ContainerDecl` that is in the lazy-loading mode, then we will need to: + +* Allocate a correctly-sized `members` array, if one has not already been created. + +* Walk through each child-declaration index and on-demand load the child declaration at that index (if its entry in `members` is null) + +This is a relatively simple answer, and it is likely that the biggest problems will arise around code that is currently enumerating all members of a container but that we would rather *didn't* (e.g., code that enumerates all `extension`s). + +One potential cleanup/improvement would be to create a unique `Name*` for each kind of symbol that has no name of its own. +E.g., each `init` declaration could be treated as-if its name was `$init`, and so on for `$subscript`, `$extension`, etc. +That change would mean that enumerating all child declarations of certain specific classes is equivalent to *looking up* child declarations of a given name. +We should consider making such a change if/when we see that code is enumerating all declarations and forcing full deserialization where it wasn't needed. + +#### Lookup + +The main place where the `ContainerDecl::memberDictionary` field needs to be accessed is during name lookup. +When looking up a name in a `ContainerDecl` that is in lazy-loading mode, the process would be: + +* If the lookup finds a valid index in `memberDictionary`, then that is the index of the first child declaration with that name (and the others can be found via the `Decl::nextInContainerWithSameName` field). 
+ +* We could potentially use a sentinel value like a `-1` index in `memberDictionary` to indicate that there are definitely no members of that name. + +* Otherwise, we will need to inspect the serialized representation of the given `ContainerDecl` to see if there are not-yet-deserialized members matching that name. + +In that last case, we either find that there were *no* matching members in the serialized data, in which case we could stash a sentinel value in `memberDictionary`, or we find that there *were* one or more members, in which case we should deserialize those members into the `members` array, stash the indices of the first one into `membersDictionary` and then return the resulting (deserialized) members. + +### On-Disk Encoding + +Similar to what is proposed for the IR, we propose to use a section-based format for AST serialization, and that two of the key sections should be: + +* An array of *AST entries* each referenced by an *AST entry index* +* A section for the raw data of each serialized AST entry + +As in the IR case, we propose that an AST entry index of zero be used to represent a null or missing entry. + +Like the IR, the AST has an underlying design where each node has some number of *children*, which it owns, and also some number of *operands*, which it references. +Despite the similarity, the AST structure is more complicated than the IR structure for a few reasons: + +* The operands of one AST node might reference AST nodes from outside the same top-level declaration, but that are not *themselves* top-level declarations (they might be a child of a child of a top-level declaration). + +* While much of the AST structure is made of `Decl`s, there can also be references to `Type`s and `DeclRef`s, etc. Some of the uniformity of the IR ("everything is an `IRInst`") is missing. + +These complications lead to two big consequences for the encoding: + +* The array of *AST entries* will not just contain the entries for top-level `Decl`s. 
It needs to contain an entry for each `Decl` that might be referenced from elsewhere in the AST. For simplicity, it will probably contain *all* `Decl`s that are not explicitly stripped as part of producing the serialized AST.
+
+* The array won't even consist of just `Decl`s. It will also need to have entries for things like `DeclRef`s and `Type`s that can also be referenced as operands of AST nodes.
+
+As a stab at a simple representation, each AST entry should include:
+
+* A *tag* that defines the subclass of the node (more or less like the tags we use on AST nodes at runtime)
+
+* A range of bytes in the raw data that holds the serialized representation of that node (e.g., its operands)
+
+An entry for a `ContainerDecl` should include (whether directly or encoded in the raw data...)
+
+* A contiguous range of AST entry indices that represent the direct children of the node, in declaration order (the order they'd appear in `ContainerDecl::members`)
+
+* A contiguous range of AST entry indices that represent all the descendants of the node
+
+Index `1` in the entry array should probably represent the entire module, and thus establish a root for the entire `Decl` hierarchy and associated ranges.
+We should require that a parent declaration is always listed before its children.
+
+Given the above representation, there is no need to explicitly encode the parent of a `Decl`.
+Given an AST entry index for a `Decl`, we can find its parent by recursing through the hierarchy starting at the root, and doing a binary search at each hierarchy level to find the (unique) child declaration at that level which contains that index in its range of descendants.
+
+When there is a request to on-demand deserialize a `Decl` based on its AST entry index, we would need to first deserialize each of its ancestors, up the hierarchy.
+That on-demand deserialization of the ancestors can follow the flow given above for recursively walking the hierarchy to find which declaration at each level contains the given index.
+
+In order to support lookup of members of a declaration by name, we propose the following:
+
+* A string table for storing all the strings used in the AST (including names), so that each can be referenced by a string table index
+
+* A lookup structure that maps each string table index to a list of *all* `Decl`s in the module that have that name, in sorted order.
+
+As in the IR case, the lookup structure could be something like a hash table, or it could simply be an array of key-value pairs where the keys are sorted by their string values.
+
+Given a parent decl `P`, and a name for a member we want to look up in it, the procedure would basically be:
+
+* Use the lookup structure to find the (sorted) list of AST entry indices for all `Decl`s with the given name
+
+* Use binary searches on that sorted list to find the subset of indices that are within the range that represents child declarations of `P`.
+
+> Note: One key detail being glossed over here is when lookup needs to traverse through the "bases" of a type.
+> The `InheritanceDecl`s that represent the bases of a type are very similar to "transparent" declarations.
+> If we want to support lazy lookup of members of type declarations like `struct`s, we would need to be able to eagerly deserialize the `InheritanceDecl` members, without also deserializing all the others.
+> This might be a good use case for the idea pitched above, of giving all unnamed declarations of a given category a single synthetic name (e.g., `$base`), so that they can be queried by ordinary by-name lookup, which would trigger on-demand deserialization.
+
+It would be possible to store individual lookup structures as part of the serialized data for each `ContainerDecl`, but the approach given here sacrifices some possible efficiency in the lookup step for the sake of storing less data on each AST node.
+
+Shared Stuff
+------------
+
+Some bits of the implementation described here are the same, or have large overlap, between the IR and AST case.
+This section describes some points about implementation that could apply to both.
+
+### String Tables
+
+It seems reasonable to encode string tables the same way for both IR and AST (and any other parts of Slang that would like to serialize lots of strings).
+
+As with other structures described above, a string table should probably be split into two pieces:
+
+* An array of *entries*, one for each string in the table
+* A range of raw bytes for the data associated with each entry
+
+As with other structures, we recommend leaving index 0 unused, to represent a null or absent string (as something distinct from an *empty* string).
+
+In practice, the string tables created by a compiler like Slang (and *especially* any string tables that might include things like mangled names) will contain many strings with common prefixes. In order to optimize for this case, we can store strings in a structure inspired by a "suffix tree."
+
+First we collect all of the strings that need to be stored.
+Then we perform a lexicographic sort on them.
+Then for each string table entry, we store: + +* The size, in bytes, of the *prefix* it shares with the preceding string in the table +* The byte range in the "raw data" area for the data of the string that comes after that prefix + +### Abbreviations + +As discussed above, both the IR and AST can be described in simplified terms as a hierarchy of nodes, where each node comprises: + +* An opcode/tag +* Zero or more *operands*, which are references to other nodes in the hierarchy +* Zero or more *children*, which are uniquely owned by this node + +> Note: The Slang IR representation sticks very zealously to this model, but the AST is a lot more loose as a byproduct of starting as a purely ad hoc C++ class hierarchy. +> Ideally we should be able to serialize the AST *as if* it had a more uniform structure than it really has, but we might also want to do refactoring passes on the in-memory representation of the AST to make it more uniform. + +A serialized module will typically contain a large number of nodes, and there will often be a high degree of redundancy between nodes. +That redundancy can be exploited to reduce the size of each node. + +The basic idea is that rather than explicitly encoding the opcode/tag and number of children, each serialized node would encode an index into a table of *abbreviations* stored as a section of the serialized data. +This idea is loosely based on how abbreviations work in DWARF debug information, although greatly simplified. 
+ +Each entry in the abbreviation table would store: + +* The opcode/tag used by all nodes that are defined with this abbreviation +* (Optionally) zero or more operands that are used as a prefix on the operands of nodes using this abbreviation +* The number of additional operands to read for each node +* (Optionally) some information on the number of children, or on structurally-identical children that all nodes created from this abbreviation should share as a prefix (e.g., a set of IR decorations) + +When deserializing a node, code would read its abbreviation index first, and then look up the corresponding abbreviation to both find important information about the node, and also to drive deserialization of the rest of its data (e.g., by determining how many operands to read before reading in children). + +In cases where the low-level serialization uses things like variable-length encodings for integers, the abbreviations can be sorted so that the most-frequently used abbreviations have the lowest indices, and thus take the fewest bits/bytes to encode. diff --git a/proposals/legacy/001-basic-interfaces.md b/proposals/legacy/001-basic-interfaces.md new file mode 100644 index 0000000..669537b --- /dev/null +++ b/proposals/legacy/001-basic-interfaces.md @@ -0,0 +1,254 @@ +Basic Interfaces +================ + +The Slang core module is in need of basic interfaces that allow generic code to be written that abstracts over built-in types. +This document sketches what the relevant interfaces and their operations might be. + +Status +------ + +In discussion. 
+
+Background
+----------
+
+One of the first things that a user who comes from C++ might try to do with generics in Slang is write an operation that works across `float`s, `double`s, and `half`s:
+
+```
+T horizontalSum<T>( vector<T, 4> v )
+{
+    return v.x + v.y + v.z + v.w;
+}
+```
+
+A function like `horizontalSum` does not compile because without a constraint on the type parameter `T`, the compiler has no reason to assume that `T` supports the `+` operator.
+A new user is often stymied at this point, because no appropriate `interface` seems to exist, and there does not appear to be a way to *define* an appropriate interface.
+
+As a user gets more experienced with Slang, they may learn how to use `extension`s to define something nearly suitable:
+
+```
+interface IMyAddable { This add(This rhs); }
+
+extension float : IMyAddable { float add(float rhs) { return this + rhs; } }
+// ...
+
+T horizontalSum<T : IMyAddable>( vector<T, 4> v )
+{
+    return v.x.add(v.y).add(v.z).add(v.w);
+}
+```
+
+While that approach works (or should work), it requires a user to know how to use `extension`s and the `This` type, which are complicated even for experienced users. The resulting code is also less readable because it uses `.add(...)` instead of the ordinary `+` operator.
+
+Many users eventually find out about the `__BuiltinFloatingPointType` interface, and write something like:
+
+```
+T horizontalSum<T : __BuiltinFloatingPointType>( vector<T, 4> v )
+{
+    return v.x + v.y + v.z + v.w;
+}
+```
+
+This alternative is much more palatable to users, but it results in them using a double-underscored interface (which we consider to mean "implementation details that are subject to change"). Users often get tripped up when they find out that certain operations that make sense to be available through `__BuiltinFloatingPointType` are not available (because those operations were not needed in the definition of the core module, which is what the `__` interfaces were created to support).
+
+Related Work
+------------
+
+There are several languages that have constructs similar to our `interface`s, and which provide built-in interfaces for simple math operations that are suitable for use with the built-in types provided by the language.
+
+Existing solutions can be broadly categorized based on whether their built-in interfaces are related to semantic/mathematical structures, or are purely about specific classes of operators.
+
+Haskell and Swift are both examples of languages where the built-in interfaces are intended to be semantic. Haskell provides type classes such as `Additive`, `Ring`, `Algebraic`, `RealTranscendental`, etc.
+Swift is similar (although it provides a less complete hierarchy of algebraic structures than Haskell), but also includes more detail about machine number representations, so that it has `BinaryFloatingPoint`, `FixedWidthInteger`, etc.
+
+Rust is in the other camp: it has a built-in interface corresponding to each of its overloadable operators. The `Add` and `Sub` traits allow the built-in `+` and `-` operators to be overloaded for a user-defined type, but impose no implicit or explicit semantic expectations on those operations.
+
+It may help to describe a concrete example of how the difference between the two camps affects design. The Rust `Add` trait is implemented by the left-hand-side type, and does not constrain the right-hand side or result type of an addition. A Rust programmer may implement `Add` for a type `X` so that `x + ...` expects a right-hand-side operand of some other type `Y` and produces a result of yet *another* type `Z`. Knowing that `X` supports the `Add` trait does *not* mean that it is possible to take the sum of a list of `X`s, because there is no guarantee that `x0 + x1` is valid, or that `X` has a logical "zero" value that could be used as the sum of an empty list.
+
+In contrast, in Swift a type `X` that conforms to `AdditiveArithmetic` must provide a `+` operation that takes two `X` values and yields an `X`. It also requires that `X` provide a `static` property `zero` of type `X`, to represent its zero value. As a result, it is possible to write a generic function in Swift that can compute the sum of a list of `T` values, provided `T` conforms to `AdditiveArithmetic`.
+
+Proposed Approach
+-----------------
+
+Slang supports operators as ordinary overloadable functions, so the rationale behind the Rust operator traits does not seem to apply. We propose to implement a modest hierarchy of numeric interfaces in the style of Haskell/Swift.
+
+### Changes to Operator Lookup
+
+Currently, when Slang encounters an operator invocation like `a + b`, it treats this as more or less equivalent to a function call `+( a, b )`. The compiler looks up `+` in the current lexical environment, and then applies overload resolution to the result of lookup.
+
+We propose that the rules in that case should be changed so that the compiler *also* performs lookup of the operator (`+` in this case) in the context of the static types of `a` and `b`. That change would in theory allow "operator overloads" to be defined as `static` functions within a type they apply to (whether on the left-hand or right-hand side). As a consequence, such a change would also mean that `interface`s could conveniently include operator overloads as requirements.
+
+### IAdditive
+
+The `IAdditive` interface is for types where addition, subtraction, and zero have meaning.
+
+```
+interface IAdditive
+{
+    // The zero value for this type
+    static property zero : This { get; }
+
+    // Add two values of this type
+    static func +(left: This, right: This) -> This;
+
+    // Subtract two values of this type
+    static func -(left: This, right: This) -> This;
+}
+```
+
+### INumeric
+
+The `INumeric` interface is for types that are more properly number-like.
+Note that this interface does not define division, because the division operations on integers and floating-point numbers are sufficiently different in semantics.
+
+```
+interface INumeric : IAdditive
+{
+    // Initialize from an integer
+    __init< T : IInteger >( T value );
+
+    // Multiply two values of this type
+    static func *(left: This, right: This) -> This;
+}
+```
+
+### ISignedNumeric
+
+Only signed numbers logically support negation (although we all know negation also gets applied to unsigned numbers, where it has meaningful and useful semantics).
+
+```
+interface ISignedNumeric : INumeric
+{
+    // Negate a value of this type
+    static prefix func -(value: This) -> This;
+}
+```
+
+
+### IInteger
+
+The `IInteger` interface codifies the basic things that generic code wants to be able to access for any integer type.
+
+```
+interface IInteger : INumeric
+{
+    // Smallest representable value
+    static property minValue : This { get; }
+
+    // Largest representable value
+    static property maxValue : This { get; }
+
+    // Initialize from a floating-point value
+    // (what rounding mode? round-to-nearest-even?)
+    __init< T : IFloatingPoint >( T value );
+
+
+    // Integer quotient
+    static func /(left: This, right: This) -> This;
+
+    // Integer remainder (or is it modulus? or is it undefined which?)
+    static func %(left: This, right: This) -> This;
+}
+```
+
+### IUnsignedInteger
+
+```
+interface IUnsignedInteger : IInteger
+{
+
+}
+```
+
+### ISignedInteger
+
+The main interesting thing we'd want from a signed integer type is the ability to convert it to the same-size unsigned integer type.
+
+```
+interface ISignedInteger : IInteger, ISignedNumeric
+{
+    // Equivalent unsigned type (can always hold magnitude)
+    associatedtype Unsigned : IUnsignedInteger;
+
+    // Get the magnitude of this value (may not be representable
+    // as `This` type, if it is `minValue`)
+    property magnitude : Unsigned { get; }
+}
+```
+
+### IFloatingPoint
+
+The `IFloatingPoint` interface provides the minimum of what users expect a floating-point type to support.
+It includes the ability to check for special values (not-a-number, infinities), as well as the value of various standard constants.
+
+```
+interface IFloatingPoint : INumeric, ISignedNumeric
+{
+    property isFinite : bool { get; }
+    property isInfinite : bool { get; }
+    property isNaN : bool { get; }
+    property isNormal : bool { get; }
+    property isDenormal : bool { get; }
+
+    // TODO: breaking into magnitude/exponent
+
+    static property infinity : This { get; }
+    static property nan : This { get; }
+    static property pi : This { get; }
+
+    // TODO: min/max finite values, smallest non-zero value, etc.
+
+    // Initialize from another floating-point value.
+    __init< T : IFloatingPoint >( T value );
+
+    // Floating-point division
+    static func /(left: This, right: This) -> This;
+}
+```
+
+### ISpecialFunctions
+
+The `ISpecialFunctions` interface is for floating-point types that also have full support for the standard suite of special functions provided by something like `<math.h>`.
+It is pulled out as a distinct interface from `IFloatingPoint` because many platforms support floating-point types like `double` without also having full support for special functions on those types.
+
+```
+interface ISpecialFunctions : IFloatingPoint
+{
+    static This cos(This value);
+    static This sin(This value);
+    // TODO: fill this out
+}
+```
+
+Questions
+---------
+
+### Should these all be `IBuiltin*`? Should we have separate interfaces for built-in and user types?
+
+The main reason for the current `__Builtin` interfaces is that they allow us to define built-in functions that are generic over those interfaces, but which map to a single instruction in the Slang IR. The relevant operations are not currently defined as
+
+### What should the naming convention be for `interface`s in Slang?
+
+These would be the first `interface`s officially exposed by the core module.
+While most of our existing code written in Slang uses an `I` prefix as the naming convention for `interface`s (e.g., `IThing`), we have never really discussed that choice in detail.
+Whatever we decide to expose for this stuff is likely to become the de facto convention for Slang code.
+
+The `I` prefix is precedented in COM and C#/.NET/CLR, which are likely to be familiar to many developers using Slang.
+Because of COM, it is also the convention used in the C++ API headers for Slang and GFX.
+
+The Rust and Swift languages do not use a naming convention to distinguish traits/protocols from other types.
+This choice is intentional, and it might be good to understand the motivation behind it.
+At least one potential benefit of not distinguishing such types is that beginning programmers can write code that is "more generic" than they might otherwise write.
+
+Alternatives Considered
+-----------------------
+
+One important alternative is to follow the precedent of Rust and avoid basing these interfaces on semantic structures.
+That choice is important in Rust in part because there is no way for a type to support an operator other than by implementing the built-in operator traits.
+If the operator traits had prescriptive semantics, they might cause problems for types that want to support the operators but cannot fit within the semantic constraints.
+In contrast, Slang allows operator overloads to be defined independent of interfaces (they are orthogonal features), so there is no risk of developers being "locked in" by our attempts to provide richer interfaces.
+ +Conversely, one could worry that our interfaces do not provide *enough* semantics. We may find that users need additional interfaces that sit "in between" these ones, or that carve up the same operations into smaller units. +This proposal contends that we need to have *something* in this space, and that it doesn't make sense to try to get these interfaces 100% perfect until we've had some lived experience with them. +Fortunately, the Slang language is not yet at a point of trying to guarantee perfect source stability of these interfaces, nor anything like strong binary compatibility guarantees. +If we make mistakes here, we have time to fix them. + diff --git a/proposals/legacy/002-api-headers.md b/proposals/legacy/002-api-headers.md new file mode 100644 index 0000000..5efbc1d --- /dev/null +++ b/proposals/legacy/002-api-headers.md @@ -0,0 +1,952 @@ + +Revise Slang/GFX API Headers +============================ + +The public C/C++ API headers for Slang (and GFX) are in need of cleanup and refactoring for us to reach a "1.0" API. +This document attempts to document the guidelines that we will follow in such a refactor. + +Status +------ + +In discussion. + +Background +---------- + +The Slang API header (`slang.h`) has evolved over many years, going back as far as the "Spire" research project, which predates Slang (Spire is the reason for the `sp` prefix on functions in the C API). + +At some point, we made a conscious decision to move toward a COM-based API, both because it would simplify our story around binary compatibility and because it would allow us to provide more convenient API models in cases where subtyping/inheritance is fundamental to the domain. +Unfortunately, the net result has been that we have two different APIs cluttering up the same header file (the old C one, and the newer C++/COM one), and the one that is presented *first* is actually the one we would rather users didn't reach for. 
+The two APIs are sometimes out of sync, with one providing services the other doesn't. + +While the GFX project started later and was thus able to start using COM interfaces and a C++ API from the start, it still faces some challenges around API evolution and binary compatibility. +As support for GPU features (whether pre-existing or new) gets added, we find that the various interfaces want to grow and the various `Desc` structures want to add new fields. +Without care, neither of those is a binary-compatible change for user code. + +A concern across both Slang and GFX is that we have tended to design our APIs around the *most complicated* use cases we intend to support, at the expense of the *simplest* cases. +We know that we cannot remove support for difficult cases, but it would be good to support concise code for simple use cases, and to support a "progressive disclosure" style that allows users to gradually adopt more involved API constructs as they become necessary. + +Related Work +------------ + +There are obviously far too many C/C++ APIs and approaches to design for C/C++ APIs for us to review them all. +We will simply note a few key examples that can be relevant for comparison. + +The gold standard for C/C++ APIs is ultimately plain C. Plain C is easy for most systems programmers to understand and benefits from having a well-defined ABI on almost every interesting platform. FFI systems for other languages tend to work with plain C APIs. Clarity around ABI makes it easy to know what changes/additions to a plain C API will and will not break binary compatibility. The Cg compiler API and the Vulkan GPU API are good examples of C-based APIs in the same domains as Slang and GFX, respectively. These APIs reveal some of the challenges of using plain C for large and complicated APIs: + +* The lack of subtype polymorphism is a problem when a domain fundamentally has subtyping. 
The Cg reflection API uses a single `CGtype` type to represent all types, so that an operation like `cgGetMatrixSize` is applicable to any type, not just matrix types. The API cannot guide a programmer toward correct usage, and must define what happens in all possible incorrect cases.
+
+* C has no built-in model for error signalling or handling. Error codes and out-of-band values (`NULL`, `-1`) are the norm, and there are a multitude of API-specific conventions for how they are applied.
+
+* C has no built-in model for memory and lifetime management. Most APIs either expose create/delete pairs or some kind of per-type reference-counting retain/release operations. In either case, the application developer is left to ensure that the operations are correctly invoked, often by writing their own C++ wrapper around the raw C API.
+
+Some developers opt for a "Modern C++" philosophy where the public API of a system makes direct use of standard C++ library types where possible.
+For example, strings are passed as `std::string`s, cases that need polymorphism expose `class` hierarchies, types that benefit from reference-counted lifetime management explicitly use `std::shared_ptr<...>`, and errors are signaled by `throw`ing exceptions.
+The Falcor API subscribes to aspects of this approach.
+The biggest challenges with Modern C++ APIs are:
+
+* Source compatibility can usually be achieved, but binary compatibility is hard to achieve even *within* a version of a system, much less across versions. The central problem is that C++ ABIs are often compiler-specific (rather than standard on a platform), and even for a single compiler like gcc or Visual Studio, the binary interface to the C++ standard library can and does break between versions.
+
+* While C++ exceptions are a built-in error-handling scheme, they are almost universally disliked among the kinds of developers who use APIs like Slang.
Enabling exceptions in most compilers adds overhead, and actually using exceptions for their intended purpose (catching and handling errors) tends to be onerous.
+
+* Reference-counted lifetime management in Modern C++ relies on standard library types like `std::shared_ptr`, a type that is both inefficient and inconvenient. Most developers in our target demographic end up using "intrusive" reference counts (when they use reference-counting at all) because they are more efficient and convenient.
+
+COM is first and foremost an idiomatic way of using C++ to define APIs that are reasonably convenient while also dealing with the recurring problems of typical C and C++ approaches.
+COM defines rules for error handling, memory management, and interface versioning that all compatible APIs can use.
+While code using COM-based APIs is often verbose, it is largely consistent across all such APIs.
+
+A key place where COM does *not* provide a complete answer is around fine-grained "extensibility" of APIs, of the kind that commonly occurs with GPU APIs like OpenGL, D3D, and Vulkan.
+Across such APIs, we see a wide variety of strategies for dealing with extensibility:
+
+* OpenGL uses an approach where objects are typically opaque but mutable, and a large number of fine-grained operations are used to massage an object into the correct state for use. Often the fine-grained state-setting operations are all able to use a single API entry point for key-value parameter setting (e.g., `glTexParameteri`), and a new feature can be exposed simply by defining constants for new keys and/or values. When new operations are required, they need to be queried using string-based lookup of functions.
+
+* D3D11 uses COM interfaces and "desc" structures (called "descriptors" at the time). For example, a mutable `D3D11_RASTERIZER_DESC` structure is filled in and used to create an *immutable* `ID3D11RasterizerState`.
If extended features are required, a new interface like `ID3D11RasterizerState1` and/or a new descriptor type like `D3D11_RASTERIZER_DESC1` is defined. In all cases, the "desc" structure holds the union of all state that a given type supports.
+
+* Vulkan uses "desc" structures (usually called "info" or "create info" structures), which contain a baseline set of state/fields, along with a linked list of dynamically-typed/-tagged extension structures. New functionality that only requires changes to "desc" structures can be added by defining a new tag and extension structure. New operations are added in a manner similar to OpenGL.
+
+* D3D12 also uses COM interfaces and "desc" structures (although now officially called "descriptions" to not overload the use of "descriptor" in descriptor tables), much like D3D11, and sometimes uses the same approach to extensibility (e.g., there are currently `ID3D12Device`, `ID3D12Device1`, ... `ID3D12Device9`). In addition, D3D12 has also added two variations on Vulkan-like models for creating pipeline state (`ID3D12Device2::CreatePipelineState` and `ID3D12Device5::CreateStateObject`), using a notion of more fine-grained "subobjects" that are dynamically-typed/-tagged and each have their own "desc".
+
+It is important to note that even with the nominal flexibility that COM provides around versioning, D3D12 has opted for a more fine-grained approach when dealing with something as complicated as GPU pipeline state.
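The Vulkan-style chained-extension-structure pattern described above can be sketched generically. The tag and structure names below are hypothetical illustrations, not actual Vulkan or proposed Slang declarations:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagged extension structures in the Vulkan "pNext" style:
// every structure begins with a type tag and a pointer to the next
// extension in the chain, so a new feature only needs a new tag + struct.
enum class StructType : uint32_t
{
    BaseCreateInfo,
    DebugNameExtension,
};

struct StructHeader
{
    StructType type;
    const StructHeader* next;
};

struct BaseCreateInfo
{
    StructHeader header{ StructType::BaseCreateInfo, nullptr };
    int width = 0;
};

struct DebugNameExtension
{
    StructHeader header{ StructType::DebugNameExtension, nullptr };
    const char* name = nullptr;
};

// An implementation walks the chain, consuming the tags it understands
// and skipping the ones it does not.
const StructHeader* findInChain(const StructHeader* chain, StructType type)
{
    for (; chain; chain = chain->next)
        if (chain->type == type)
            return chain;
    return nullptr;
}
```

The key property is that an older implementation can safely skip tags it does not recognize, while a newer one can find its extension structures anywhere in the chain.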
+
+* We will focus on providing "shortcut" operations in the public API that allow developers to optimize for common cases and reduce the amount of boilerplate.
+
+* We will expose a C API that wraps the COM API using `inline` functions. We will attempt to make the C API idiomatic when/as possible.
+
+* We can eventually/gradually provide a set of C++ wrapper/utility types that can further streamline the experience of using Slang, by hiding some of the details of COM reference counting and "desc" structs. The utility code could also translate COM-style result codes into C++ exceptions, if we find that this is desired by some users.
+
+Detailed Explanation
+--------------------
+
+At the end of this document there is a lengthy code block that sketches a possible outline for what the `slang.h` header *could* look like.
+
+Questions
+---------
+
+### Will we generate all or some of the API header? If so, what will be the "ground truth" version?
+
+Note that Vulkan and SPIR-V benefit from having ground-truth computer-readable definitions, allowing both header files and tooling code to be generated.
+
+### Can we actually make a reasonably idiomatic C API that wraps a COM one, or should we admit defeat and have everything look like `slang_<Interface>_<method>(...)`?
+
+Alternatives Considered
+-----------------------
+
+We haven't seriously considered many alternatives in detail, other than the possibility of a plain C API (which we have tried, but not been able to make work).
+
+Appendix: A Header of Notes
+---------------------------
+
+The following code represents a sketch of a header that tries to match this proposal (and includes a lot of its own discussion/commentary).
+
+```
+/* Slang API Header (Proposed)
+
+This file is an attempt to sketch how the API headers for both
+Slang and gfx could be organized in order to provide a nice
+experience for developers who belong to different camps in terms
+of what they want to see.
+
+Goals:
+
+* Support both C and C++ access to the API, with matching types, etc.
+
+* Able to use COM interfaces including use of inheritance/polymorphism
+
+* When compiling as C++, it should be possible to mix-and-match both C and C++ APIs
+
+Constraints:
+
+
+Questions:
+
+* Do we actually need to restrict to block comments for pedantic compatibility
+  with old C versions? Are line comments close enough to universally-supported?
+
+*/
+
+#ifdef __cplusplus
+
+/* Because of our goals above, we will actually end up with what amounts to
+two copies of the C API, depending on whether or not we are compiling as C++.
+
+We start with the C++ COM-lite API, since that is the baseline. Note that
+in this proposal, everything is being defined in the `slang` namespace,
+rather than first declaring many things as C types and then mapping that
+over to C++.
+*/
+
+namespace slang
+{
+    /* Note that in this proposal, everything is being declared in the `slang`
+    namespace first, rather than the old model of declaring various things
+    in C and then importing them into the namespace.
+    */
+
+/* Basic Types */
+
+    typedef int32_t Int32;
+    /* typedefs for basic types, as needed ... */
+
+
+/* Enumerations and Constants */
+
+    /* Non-flag enumerations will use `enum class`. If we need to support clients
+    using older C++ versions/compilers, we can discuss macro-based ways to try
+    to work around this.
+
+    Except in cases where there is an *extremely* compelling reason to do something
+    different, all enumerations use the `int` tag type that is the default for
+    `enum class`.
+    */
+
+
+    /** Severity of a diagnostic generated by the compiler.
+    ...
+    */
+    enum class Severity
+    {
+        Note, /**< An informative message. */
+        /* ... */
+    };
+
+    /* TODO: We need a clearly-defined policy for how to handle "flags" enumerations.
+
+    My strong opinion is that we should generally avoid flags in public API just
+    because of the tendency to run out of bits sooner or later, but I also understand
+    the appeal...
+    */
+
+    struct SlangTargetFlag
+    {
+        enum : UInt32
+        {
+            DumpIR = 1 << 0,
+            /* ... */
+        };
+    };
+    typedef UInt32 SlangTargetFlags;
+
+    /* I'm not sure whether the `Result` type ought to be declared with `enum class`.
+    It would be nice to have the extra level of type safety, but it would also make our
+    `Result` incompatible with macros and template types that are intended to work with
+    `HRESULT`s.
+    */
+    enum Result : Int32
+    {
+        /* I *do* think we should go ahead and define all the cases of `Result`s
+        that we expect our API to traffic in right here in the `enum`, so that users
+        can easily inspect result codes in the debugger.
+        */
+
+        OK = 0,
+        /* ... */
+    };
+
+/* Forward Declarations */
+
+    /* Simple Types */
+    struct UUID;
+    /* ... */
+
+    /* "Desc" Types */
+    struct SessionDesc;
+    /* ... */
+
+    /* Interface Types */
+    struct ISession;
+
+
+/* Structure Types */
+
+    /* There's not much to say for the easy case... */
+
+    struct UUID
+    {
+        uint32_t data1;
+        uint16_t data2;
+        uint16_t data3;
+        uint8_t data4[8];
+    };
+    /* ... */
+
+    /* The more interesting bit is "descriptor" sorts of structures, which
+    we've done a lot of back-and-forth on how best to support.
+
+    I'm going to write up something here while also acknowledging that picking
+    a good policy for how to handle this stuff is an orthogonal design choice.
+    */
+
+    enum class DescType : UInt32
+    {
+        None,
+        SessionDesc,
+        /* ... */
+    };
+    enum class DescTag : UInt64;
+
+    #define SLANG_MAKE_DESC_TAG(TYPE) DescTag(UInt64(slang::DescType::TYPE) << 32 | sizeof(slang::TYPE))
+
+    struct SessionDesc
+    {
+        DescTag tag = SLANG_MAKE_DESC_TAG(SessionDesc);
+
+        TargetDesc const* targets = nullptr;
+        Int targetCount = 0;
+        /* ...
*/
+
+        /* Note: There is some subtlety here if we use default member
+        initializers here, but also want to expose these types directly
+        via the C API (in cases where somebody is using the C API but
+        compiling with a C++ compiler).
+
+        The tag approach here is intended to support something akin to
+        the Vulkan style, when using the C API:
+
+            SlangSessionDesc sessionDesc = { SLANG_DESC_TAG_SESSION_DESC };
+            ...
+
+        That code will not compile under C++11, because of the default
+        member initializers in `slang::SessionDesc`, but it *will* compile
+        under C++14 and later.
+
+        If we want to deal with C++11 compatibility in that case, we can, but
+        it would slightly clutter up the way we declare these things. Realistically,
+        we'd just split the two types:
+
+            struct _SessionDesc { ... data but no initialization ... };
+            struct SessionDesc : _SessionDesc { ... put a default constructor here ... };
+
+        I'm not a fan of that option if we can avoid it.
+        */
+    };
+
+    /* ... */
+
+    /* *After* all the "desc" types are defined, we can actually define the enum
+    for their tags (just to make life easier for users looking at things in their
+    debugger).
+    */
+    enum class DescTag : UInt64
+    {
+        None = 0,
+        SessionDesc = SLANG_MAKE_DESC_TAG(SessionDesc),
+        /* ... */
+    };
+
+    /* Versioning: If we are in a situation where we'd like to change a type that
+    has already been tagged, we should first consider just creating an additional
+    "extension" desc type, to be used together with the original. By adding
+    suitable convenience APIs, we can make this easy to work with.
+
+    If we really do decide that we want a new version of a specific desc, we should
+    start by doing the thing D3D does, and make a new numbered type:
+
+        struct SessionDesc { ... the original ... };
+        struct SessionDesc1 { ... the new one ... };
+
+    When possible, the new type should use matching field names/types and ordering.
+    Even if we are just adding fields, we should not try using inheritance (just
+    because the C++ spec doesn't guarantee enough about how inheritance is implemented).
+
+    The new structure type should get its own desc type/tag, distinct from the original.
+
+    If we decide that we want clients to compile against the latest version of these
+    types by default, we can shuffle around the names, but we need to be careful to
+    *also* shuffle the `DescType` cases (so that the binary values stay the same):
+
+        struct SessionDesc0 { ... the original ... };
+        struct SessionDesc1 { ... the new one ... };
+        typedef SessionDesc1 SessionDesc;
+
+    At the point where we introduce a second version, it is probably the right time
+    to enable developers to lock in to any version they choose. In the code above
+    the user can always just use `SessionDesc0` or `SessionDesc1` explicitly, or they
+    can just stick with `SessionDesc` in the case where they always want the latest
+    at the point they compile.
+
+    (If we wanted to get really "future-proof" we'd define every struct with the `0`
+    suffix right out of the gate, and always have the `typedef` in place. I'm not convinced
+    that would ever pay off.)
+
+    I expect most of this to be a non-issue if we are zealous about using fine-grained
+    rather than catch-all descriptors at this level of the API.
+
+    (There's more I could talk about here, but this isn't supposed to be the topic at hand)
+    */
+
+
+/* Interfaces */
+
+    /* There's an open question of how to name the `IUnknown` equivalent
+    once things are all namespaced. We could use `slang::IUnknown`, but I fear
+    that could lead to complications or confusion for codebases that also use
+    MS-provided COM-ish APIs.
+    */
+
+    struct ISession : public ISlangUnknown
+    {
+        SLANG_COM_INTERFACE(...);
+
+        /* In order to maximize our ability to evolve the API while maintaining
+        binary compatibility, I'm going to recommend the somewhat bold step
+        of defaulting to making interface entry points that are "implementation
+        details" rather than intended for direct use in most cases.
+        */
+
+        virtual SLANG_NO_THROW Result SLANG_MCALL _createCompileRequest(
+            void const* const* descs,
+            Int descCount,
+            UUID const& uuid,
+            void** outObject) = 0;
+
+        /* Instead, most users will directly call the operations only through
+        wrappers that provide conveniently type-safe behavior:
+        */
+        inline Result createCompileRequest(
+            CompileRequestDesc const& desc,
+            ICompileRequest** outCompileRequest)
+        {
+            return _createCompileRequest(
+                &desc, 1, SLANG_UUID_OF(ICompileRequest),
+                (void**)outCompileRequest);
+        }
+
+        /* An important property of this design is that we can easily define
+        convenience overloads that take direct parameters for common cases:
+        */
+        inline Result createCompileRequestForPath(
+            const char* path,
+            ICompileRequest** outCompileRequest)
+        {
+            ...;
+        }
+
+        /* Versioning: Note that we can easily define convenience overloads
+        for multiple versions of descriptor types (`CompileRequestDesc` and
+        `CompileRequestDesc1`), or for *combinations* of descriptor types:
+        */
+        inline Result createCompileRequest(
+            CompileRequestDesc const& requestDesc,
+            ExtraFeatureDesc const& extraFeatureDesc,
+            ICompileRequest** outCompileRequest)
+        {
+            void const* descs[] = { &requestDesc, &extraFeatureDesc };
+            return _createCompileRequest(
+                descs, 2, SLANG_UUID_OF(ICompileRequest),
+                (void**)outCompileRequest);
+        }
+
+        /* As a final detail, we should consider whether to support overloads
+        that work with either our `slang::ComPtr` *or* an application's own
+        smart pointer type.
+
+        The user could override the smart pointer type by defining macros.
+    The defaults would be:
+
+        #define SLANG_SMART_PTR(TYPE) slang::ComPtr<TYPE>
+        #define SLANG_SMART_PTR_WRITE_REF(PTR) ((PTR).writeRef())
+    */
+#ifndef SLANG_DISABLE_SMART_POINTER_OVERLOADS
+    inline Result createCompileRequest(
+        CompileRequestDesc const& desc,
+        SLANG_SMART_PTR(ICompileRequest)* outCompileRequest)
+    {
+        return _createCompileRequest(
+            &desc, 1, SLANG_UUID_OF(ICompileRequest),
+            SLANG_SMART_PTR_WRITE_REF(*outCompileRequest));
+    }
+#endif
+
+    /* If we ever have cases where we want to support utility/wrapper operations
+    of higher complexity than what we feel comfortable making `inline` (that is,
+    stuff that might be best off in a `slang-utils` static library) we could conceivably
+    handle those via judicious use of `extern`:
+    */
+    inline Result createCompileRequestFromJSON(
+        char const* jsonBlob, //< a serialized form of the compilation state
+        ICompileRequest** outCompileRequest)
+    {
+        extern Result slang_ISession_createCompileRequestFromJSON(
+            char const*, ICompileRequest**);
+        return slang_ISession_createCompileRequestFromJSON(jsonBlob, outCompileRequest);
+    }
+    /* I doubt we'd ever really need that kind of approach, and would always decide
+    that functionality either belongs in core Slang (perhaps as a new derived interface)
+    or can go as global (non-member) functions in a utility library.
+    */
+
+    /* ... */
+ };
+
+ /* Note: I'm assuming here that we continue our implicit contract in terms
+ of versioning of COM interfaces:
+
+ * Every interface is thought of as having a contract about who can *provide*
+ and who can *consume* it. For many (like `ISession`) only the Slang implementation
+ is supposed to provide it and only users consume it. Some callback interfaces
+ go the other way, and a few (like `slang::IBlob`) need to go both directions.
+
+ * For interfaces that Slang provides and the user consumes, we can append new
+ `virtual` methods onto the end.
This realistically needs a check somewhere, such + that we fail creation of the `IGlobalSession` if the user compiled against a + header that exposes the new method but is linking a DLL that doesn't. This is + what the `SLANG_API_VERSION` in the original header is supposed to be for: we + should increment it when we expand the API contract, and the global-session + creation should fail if the application is asking for too new of a version. + + * For interfaces that Slang consumes, we cannot realistically add/remove anything. + Theoretically, we could delete some of the `virtual` methods if we no longer + expect to call them, but that could still break client code that uses `override` + on their definitions. + + * We need to try very hard not to change the interface types of parameters to + non-wrapper COM methods, even if the result should be binary-compatible. There + are cases where it might be reasonable and "type-safe," but each and every one + probably needs clear auditing. + + Ideally the rules we use for Slang-provided interfaces can help us avoid the + proliferation of `IThing`, `IThing2`, `IThing3`, etc. We need to be careful about + that in the long run, though, because we may find that it causes problems in the wild + if software needs to interact with Slang in a context where the developer cannot + control the version of the Slang DLL they will be using at runtime. + + In theory, we could solve that problem by letting a user pass `SLANG_API_VERSION` + *in* to the header via a `#define`, and have us skip over any declarations introduced + after the given version. + */ +} + +/* Back outside the namespace (but still in the case where we know C++ is +supported), we can define the C-compatible API by using the C++ API +as its underlying representation. +*/ + +extern "C" +{ + +/* Basic Types */ + + /* The C-API types can just be `typedef`s of the C++ ones */ + + typedef slang::Int32 SlangInt32; + /* ... 
*/
+
+/* Enumerations and Constants */
+
+ /* For the case where we *know* a C++ compiler is being used, we
+ can actually use the C++ `enum class` declarations to provide
+ the enumerated types of the C API.
+ */
+ typedef slang::Severity SlangSeverity;
+
+ /* We can use macros to define the C-API enum cases, while still
+ preserving the type safety of the `enum class` approach.
+
+ Note: We could also use `static const`s here, but that seems like
+ overkill.
+ */
+ #define SLANG_SEVERITY_NONE (slang::Severity::None)
+ /* ... */
+
+/* Structure Types */
+
+ /* For the case of providing the C API for a C++ compiler, we can
+ directly use the C++ structure types in all cases. */
+
+ typedef slang::UUID SlangUUID;
+ typedef slang::SessionDesc SlangSessionDesc;
+ /* ... */
+
+/* Interfaces */
+
+ /* Because we are compiling as C++, defining the types for the interfaces
+ is easy, and we can easily pass objects between modules/files that are
+ using the two versions of the API without any casting:
+ */
+ typedef slang::ISession SlangSession;
+ /* ... */
+
+ /* In order to have a plain-C API, the user of course needs a way to
+ dispatch into those interfaces.
+
+ Note: There is a big question here of whether the API header should be
+ trying to define the C API functions `inline` here or not.
+
+ The argument for using `inline` is that it doesn't add any additional
+ requirements for somebody using the C API from within C/C++, compared to
+ the C++ API.
+
+ The argument *against* is that for things like binding to other languages,
+ the user would probably prefer that these operations have linkage.
+
+ Realistically, the right thing is for the header to include both declarations
+ *and* definitions, but to allow the application to conditionalize the inclusion
+ of the definitions *and* enable/disable the use of `inline` for declarations/definitions.
+ A user could use that control to compile their own linkable stub with C-compatible
+ functions.
+ */
+
+ /* We need to provide the fully-general version of the function, for clients
+ that might need it, but we probably don't want that to be the first one users
+ reach for.
+ */
+ inline SlangResult _SlangSession_createCompileRequest(
+    SlangSession* session,
+    void const* const* descs,
+    SlangInt descCount,
+    SlangUUID const* uuid,
+    void** outObject)
+ {
+    return session->_createCompileRequest(descs, descCount, *uuid, outObject);
+ }
+
+ /* The catch here is that the C++ API used overloading as a way
+ to provide convenient wrappers around the fully-general core operations,
+ and also to provide versioning support.
+
+ We could define the same set of overloads here, with the same names, for
+ use by clients who don't actually care about C compatibility but just like
+ a C-style API. That is probably worth doing.
+
+ Otherwise, we realistically need to start defining some de-facto naming
+ scheme and/or versioning for stuff in the C API. At least one wrapper
+ should be "blessed" as the default one.
+ */
+ inline SlangResult SlangSession_createCompileRequest(
+    SlangSession* session,
+    SlangCompileRequestDesc const* desc,
+    SlangCompileRequest** outCompileRequest)
+ {
+    return session->createCompileRequest(*desc, outCompileRequest);
+ }
+
+ /* Note that we need/want to provide wrappers for *all* the operations
+ on each interface, even the ones they inherit. E.g.:*/
+ inline uint32_t SlangSession_addRef(
+    SlangSession* session)
+ { ... }
+ /* The reason for this is so that a pure-C user doesn't *have* to rely on
+ implicit conversion of these types to their bases (which in this path is
+ made possible via C++ features, but wouldn't be available in a true pure-C
+ world).
+ */
+
+ /* If/when we start to deal with versioning of either the "desc" type or
+ the interface involved in such an operation, we will need to do the numeric-suffix
+ thing or similar stuff to distinguish the old and new functions.
+
+ We can probably do some work to always make the latest version (or at least
+ the one we want users to be using) have the short/clean name. Binary compatibility
+ shouldn't actually break so long as the signature of the new function can technically
+ handle calls of the old form (since the COM-level bottleneck function won't care about
+ the static types of descs - just their tags).
+ */
+
+ /* Finally, the C API level is where we should define the core factory entry
+ point for creating and initializing the Slang global session (just like
+ in the current header). Here we just generalize it for creating "any" global
+ object, based on a UUID and a bunch of descs.
+ */
+ SLANG_API SlangResult slang_createObject(
+    void const* const* descs,
+    SlangInt descCount,
+    SlangUUID const* uuid,
+    void** outObject);
+
+ /* The actual global session creation is then a wrapper like everything else.
+ */
+ inline SlangResult SlangGlobalSession_create(
+    SlangGlobalSessionDesc const* desc,
+    SlangGlobalSession** outGlobalSession)
+ {
+    return slang_createObject(
+        (void const* const*)&desc, 1, SLANG_UUID_OF(slang::IGlobalSession), (void**)outGlobalSession);
+ }
+}
+
+#else
+
+/* All of the above declarations (even the C-level ones) only work if we are
+compiling as C++. Thus we need a distinct strategy to define everything in the
+case where we are compiling as pure C.
+
+The basic strategy isn't that hard: we just do things the raw C way.
+There will be a lot of repetition involved, but this proposal assumes we are
+generating as much of the API as possible anyway.
+*/
+
+/* Basic Types */
+
+ /* We just define the basic types directly, without the indirection
+ through the declarations in the `slang::` namespace.
+ */
+
+ typedef int32_t SlangInt32;
+ /* ... */
+
+/* Enumerations and Constants */
+
+ /* Every enum in this case is a `typedef` plus an actual `enum`:
+ */
+
+ typedef int SlangSeverity;
+ enum
+ {
+    SLANG_SEVERITY_NONE = 0,
+    /* ... */
+ };
+
+ /* ...
*/
+
+/* Structure Types */
+
+ /* The simple case stays simple, just with the gross bit of
+ duplicating a *lot* of what we already had in the C++ API.
+
+ (There's a big design question here of whether we can/should try
+ to remove as much duplication as possible in order to reduce
+ boilerplate, even if it comes at the cost of clarity because of
+ heavier reliance on macros, etc.)
+ */
+
+ typedef struct SlangUUID
+ {
+    uint32_t data1;
+    uint16_t data2;
+    uint16_t data3;
+    uint8_t data4[8];
+ } SlangUUID;
+ /* ... */
+
+ /* The desc-related stuff is really just a translation of the
+ same basic ideas to plain C: */
+
+ typedef SlangUInt32 SlangDescType;
+ enum
+ {
+    SLANG_DESC_TYPE_NONE = 0,
+    SLANG_DESC_TYPE_SessionDesc,
+    /* ... */
+ };
+ typedef SlangUInt64 SlangDescTag;
+
+ #define SLANG_MAKE_DESC_TAG(TYPE) \
+    ((SlangDescTag)(((SlangUInt64)SLANG_DESC_TYPE_##TYPE << 32) | sizeof(Slang##TYPE)))
+
+ typedef struct SlangSessionDesc
+ {
+    SlangDescTag tag;
+
+    SlangTargetDesc const* targets;
+    SlangInt targetCount;
+    /* ... */
+ } SlangSessionDesc;
+ /* ... */
+
+ #define SLANG_DESC_TAG_SESSION_DESC SLANG_MAKE_DESC_TAG(SessionDesc)
+ /* ... */
+
+/* Forward Declarations */
+
+ typedef struct SlangSession SlangSession;
+ /* ... */
+
+/* Interfaces */
+
+ /* There's already a lot known about how to define COM interfaces for
+ consumption from C, so this is actually mostly straightforward.
+
+ Note: these definitions would *only* be needed in the case where we
+ are compiling the actual implementations of the C API functions. It
+ is possible that we can/should just not bother with these, under
+ the assumption that anybody who wants a true pure-C API probably wants
+ a linkable "stub" library anyway, in which case we can provide that
+ library ourselves, and compile it as C++.
+ */
+
+ /* TODO: The big thing I'm skipping here is setup for the UUIDs.
+ I think we can provide C-compatible macros for those pretty easily,
+ but exactly what that should look like is maybe more complicated.
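+
+ For illustration only (a hypothetical macro, not a settled design), the
+ C-compatible setup might look something like:
+
+    #define SLANG_DEFINE_UUID(NAME, D1, D2, D3, B0, B1, B2, B3, B4, B5, B6, B7) \
+        static const SlangUUID NAME = { D1, D2, D3, { B0, B1, B2, B3, B4, B5, B6, B7 } }
+
+ which expands to a plain aggregate initializer that works in both C and
+ C++ translation units, at the cost of duplicating the constant per
+ translation unit.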
*/
+
+ struct SlangSession
+ {
+    /* The long/short is that we define a pointer field to a struct
+    of function pointers, which matches the expected C++ virtual
+    function table layout.
+    */
+
+    struct
+    {
+        /* Note: methods from all base interfaces need to go here... */
+
+        SLANG_NO_THROW SlangResult SLANG_MCALL (*_createCompileRequest)(
+            SlangSession* session,
+            void const* const* descs,
+            SlangInt descCount,
+            SlangUUID const* uuid,
+            void** outObject);
+
+        /* ... */
+
+    } * vtbl;
+ };
+ /* ... */
+
+ /* With the core type declarations out of the way, the actual functions
+ that forward to it are easy enough:
+ */
+ inline SlangResult _SlangSession_createCompileRequest(
+    SlangSession* session,
+    void const* const* descs,
+    SlangInt descCount,
+    SlangUUID const* uuid,
+    void** outObject)
+ {
+    /* The only interesting complications here are the `->vtbl`
+    and the need to pass `session` explicitly. We could probably
+    macro away the difference if we don't want to have distinct
+    C-API-compiled-via-C++ and C-API-compiled-via-C cases.
+    */
+    return session->vtbl->_createCompileRequest(
+        session, descs, descCount, uuid, outObject);
+ }
+
+ /* The declarations of the global session creation stuff are almost
+ identical, so there's no real need to duplicate it here.
+ */
+
+ /* For the true pure-C users, we probably want to provide convenience
+ functions and/or macros to enable the casts that should be statically
+ possible.*/
+ inline SlangUnknown* SlangSession_asUnknown(SlangSession* session)
+ {
+    return (SlangUnknown*) session;
+ }
+ /* ... */
+
+
+/*
+
+Okay, so that's the basic idea of the proposal for how to expose our API(s).
+
+I realize this didn't get into the actual details of type hierarchies or what
+the actual "desc" types need to be for Slang and gfx. The focus here was much
+more on the syntactic side of things, in terms of how we can define our API
+so that both C and C++ are usable and can be freely intermixed within a codebase.
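+
+As a tiny (hypothetical) illustration of that intermixing: because the C types
+are `typedef`s of the C++ ones in that path, a pointer can flow between the two
+API styles without any casting:
+
+    slang::ISession* session = ...;
+    SlangCompileRequestDesc desc = {};
+    SlangCompileRequest* request = nullptr;
+    SlangSession_createCompileRequest(session, &desc, &request);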
+
+*/
+
+/* There's probably an entire additional document that could be written about
+utility/wrapper stuff to make the interfaces nicer for C++ users. Some examples
+follow:
+
+We could consider having a hierarchy of wrapper smart-pointer types that codify the
+reference-counting policies without the user having to really think about `ComPtr` stuff:
+
+    struct Unknown
+    {
+    public:
+        // typical stuff...
+
+
+    protected:
+        slang::IUnknown* _ptr = nullptr;
+    };
+
+    struct Session : Unknown
+    {
+    public:
+        ISession* get() const { return (ISession*)_ptr; }
+        operator ISession*() const { return get(); }
+
+        Result createCompileRequest(
+            CompileRequestDesc const& desc,
+            CompileRequest* outCompileRequest)
+        { ... }
+    };
+
+Another thing to consider is whether any of our COM-ish wrappers should allow for
+use of exceptions instead of `Result`s:
+
+    struct ISession : ...
+    {
+        ...
+
+#if SLANG_ENABLE_SMART_PTR
+        ...
+
+    #if SLANG_ENABLE_EXCEPTIONS
+        SLANG_SMART_PTR(ICompileRequest) createCompileRequest(
+            CompileRequestDesc const& desc)
+        {
+            SLANG_SMART_PTR(ICompileRequest) compileRequest;
+            SLANG_THROW_IF_FAIL(_createCompileRequest(
+                &desc, 1, SLANG_UUID_OF(ICompileRequest), compileRequest.writeRef()));
+            return compileRequest;
+        }
+
+        ...
+    #endif
+#endif
+    };
+
+Both for the sake of the C API and especially for gfx (both C and C++), we should consider
+defining some coarse-grained aggregate desc types as utilities:
+
+    struct SimpleRasterizationPipelineStateDesc
+    {
+        // sub-descs for all the relevant pieces:
+        //
+        PipelineProgramDesc program;
+        DepthStencilDesc depthStencil;
+        MultisampleDesc multisample;
+        PrimitiveTopologyDesc primitiveTopology;
+        NVAPIDesc nvapi;
+        // ...
+
+        // "fluent"-style setters for all the relevant pieces:
+
+        SimpleRasterizationPipelineStateDesc& setEnableDepthTest(bool value)
+        {
+            markSubDescUsed(SubDesc::DepthStencil);
+            depthStencil.enableDepthTest = value;
+            return *this;
+        }
+
+        // ...
+
+        // This is also the logical granularity to provide things like
+        // List members for attachments, etc. rather than just pointer-and-count:
+
+    private: List<AttachmentDesc> colorAttachments;
+    public: AttachmentDesc& addColorAttachment();
+
+        // There should also be convenience constructors for common cases
+        // (especially relevant for things like textures).
+
+        // In the simplest implementation strategy, we keep a bitmask for which
+        // of the sub-descs have actually been used (either requested by the user,
+        // or set to non-default values):
+        //
+        enum class SubDesc { Program, DepthStencil, /* ... */ Count };
+        uint32_t usedSubDescs = 0;
+        void markSubDescUsed(SubDesc d)
+        {
+            uint32_t bit = 1 << int(d);
+            if(usedSubDescs & bit) return;
+
+            usedSubDescs |= bit;
+            updatePointers();
+        }
+
+        // We then maintain a compacted array of all the sub-descriptors needed
+        // to form the combined state for passing along to the lower-level API.
+        //
+        void* subDescs[int(SubDesc::Count)];
+        int subDescCount = 0;
+
+        void updatePointers()
+        {
+            subDescCount = 0;
+            if(usedSubDescs & (1 << int(SubDesc::Program)))
+            {
+                subDescs[subDescCount++] = &program;
+            }
+            /// ...
+        }
+    };
+
+While the implementation of these monolithic desc types would not necessarily be pretty,
+it would enable users who want the benefits of the "one big struct" approach to get
+what they seem to want.
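+
+For illustration (hypothetical usage; the creation entry point shown is not part
+of this proposal), call sites might read along these lines, with unset sub-descs
+simply omitted from the compacted array that gets passed down:
+
+    SimpleRasterizationPipelineStateDesc desc;
+    desc.setEnableDepthTest(true);
+    desc.addColorAttachment();
+    device->createPipelineState(desc, pipelineState.writeRef());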
+ +The next step down this road is to take these aggregate desc types and turn them into +actual API objects for the purposes of the C API, so that users can more conveniently +create stuff: + + GFXRasterizationPipelineStateBuilder* GFXDevice_beginCreateRasterizationPipelineState( + GFXDevice* device); + + void GFXRasterizationPipelineStateBuilder_setEnableDepthTest( + GFXRasterizationPipelineStateBuilder* builder, + bool enable); + + // Note: frees the given `builder`, so user doesn't have to do it manually + GFXPipelineState* GFXRasterizationPipelineStateBuilder_create( + GFXRasterizationPipelineStateBuilder* builder); + +Obviously the function names are very verbose there, but they could probably be cleaned +up a lot if we want to go down this route. Certainly, if we decide that C API users are +not going to be inclined to use a lot of fine-grained descs, this starts to seem like +an increasingly attractive way to go. +*/ + +#endif +``` \ No newline at end of file diff --git a/proposals/legacy/003-error-handling.md b/proposals/legacy/003-error-handling.md new file mode 100644 index 0000000..e8fb444 --- /dev/null +++ b/proposals/legacy/003-error-handling.md @@ -0,0 +1,296 @@ +Error Handling +============== + +Slang should support a modern system for functions to signal, propagate, and handle errors. + +Status +------ + +In discussion. + +Background +---------- + +Errors happen. It is impossible for any programming language to statically rule out the possibility of unexpected situations arising at runtime. +There are a wide variety of strategies used in programming, both provided by languages and enforced by idiom in codebases. + +Not all errors are alike, in that some are more expected and reasonable to handle than others. +Most errors can fit into a few broad categories like: + +* Unrecoverable or nearly unrecoverable failures like resource exhaustion (out of memory), or an OS-level signal to terminate the process. 
+
+* Incorrect usage of an API in ways that violate invariants. For example, passing a negative value to a function that says it only accepts positive values.
+
+* Out-of-range or otherwise invalid data coming from program users. For example, a console program asks the user to type a number, but the user enters some string that does not parse as a number.
+
+* Failure of an operation that will usually succeed, but for which exceptional circumstances can lead to failures. For example, when reading from an open file we typically expect success, but failure is possible for many reasons outside of a programmer's control (like network disruption when accessing a remote file). A robust program often wants to recover from such failures, but often the policy for how recovery should occur is at a higher level than the code that first detects the error.
+
+These different categories often benefit from different strategies:
+
+* Typically there is neither a reason nor a desire to do anything about nearly-unrecoverable errors; the program has well and truly crashed.
+
+* When programmers violate the invariants of an API, they typically want to know about it as early as possible (during development) so they can fix their code. Breaking into the debugger is often the best answer, and in many cases trying to propagate or recover from such failures would be wasted effort.
+
+* When an operation could fail due to mal-formed data coming from a user, programmers typically want to be forced to handle the failure case at the point where the error may arise. In languages that have an `Optional` or `Maybe` type, it is often easiest to return that.
+
+* Unpredictable, exceptional, and recoverable errors are among the hardest to deal with, and often benefit from direct language support.
+
+The Slang language currently doesn't have direct support for *any* form of error handling, but this document focuses on errors in the last of the categories above.
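As a concrete (non-Slang) illustration of the `Optional`/`Maybe` strategy for the user-input category, a C++ sketch might encode the failure case directly in the return type, forcing callers to handle it at the point of use:

```
#include <optional>
#include <string>

// Parsing user input: failure is an expected outcome, so it is encoded
// in the return type rather than signaled through a side channel.
std::optional<int> parseNumber(const std::string& text)
{
    try
    {
        size_t consumed = 0;
        int value = std::stoi(text, &consumed);
        if(consumed != text.size())
            return std::nullopt; // trailing non-numeric characters
        return value;
    }
    catch(...)
    {
        return std::nullopt; // empty, non-numeric, or out-of-range input
    }
}
```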
+ +Related Work +------------ + +In the absence of language support, developers typically signal and propagate errors using *error codes*. The COM `HRESULT` type is a notable example of a well-defined system for using error codes in C/C++ and other languages. +Error codes have the benefit of being easy to implement, and relatively light-weight. +The main drawback of error codes is that developers often forget to check and/or propagate them, and when they do remember to do so it adds a lot of boilerplate. +Additionally, reserving the return value of every function for returning an error code makes code more complex because the *actual* return value must be passed via a function parameter. + +C++ uses *exceptions* for errors in various categories, including unpredictable but recoverable failures. +Propagation of errors up the call stack is entirely automatic, with unwinding of call frames and destruction of their local state occurring as part of the search for a handler. +Neither functions that may throw nor call sites to such functions are syntactically marked. +Exceptions in C++ have often been implemented in ways that add overhead and require complicated support in platform ABIs and intermediate languages to support. + +Java uses exceptions with similar rules to C++, but adds a restriction that functions must be marked with the types of exceptions they may throw or propagate, except for those that inherit from `RuntimeException`, which are intended to represent some of the other categories of error in our taxonomy (like simple invariant violations and nearly unrecoverable errors). +The need to mark every function that might fail (or propagate failure) was seen by most developers at the time as unreasonably onerous. +Developers often smuggled other kinds of exceptions out through `RuntimeException`s, to get them through API layers that were not designed to support exceptions. 
+
+Both Rust and Swift try to strike a balance between error codes and languages with exceptions.
+At a high level, each takes an approach where the generated code is comparable to an error-code-based solution (so that no special ABI or IL support is needed), but direct syntactic support makes propagating and/or handling errors more convenient.
+
+In Rust, a function that returns `Result<SomeType, SomeError>` either returns successfully with a value of type `SomeType`, or fails with an error of type `SomeError`.
+The `Result` type is itself just a Rust `enum`, so that results can be handled by pattern-matching with `match`, `if let`, etc.
+Direct syntactic support is added so that in the body of a `Result`-returning function, a postfix `?` operator can be applied to an expression of type `Result<X, E>` to implicitly propagate `E` on any failure, and return the `X` value otherwise.
+Some higher-order functions can Just Work with `Result`-returning functions, if their signatures are compatible, but many operations like `map()`, `fold()`, etc. need distinct overloads that support `Result`s.
+Functions that return `X` and those that return `Result<X, E>` are not directly convertible.
+
+Swift provides more syntactic support for errors than Rust, although the underlying mechanism is similar.
+A Swift function may have `throws` added between the parameter list and return type to indicate that a function may yield an error.
+All errors in Swift must implement the `Error` protocol, and all functions that can `throw` may produce any `Error` (although there are proposals to extend Swift with "typed `throws`").
+Any call site to a `throws` function must have a prefix `try` (e.g., `try f(a, b)`), which works similarly to Rust's `?`; any error produced by the called function is propagated, and the ordinary result is returned.
+Swift provides an explicit `do { ... } catch ...` construct that allows handlers to be established.
+It also provides for conversion between exceptions and an explicit `Result` type, akin to Rust's.
+Higher-order functions may be declared as `rethrows` to indicate that whether or not they throw depends on whether or not any of their function-type parameters is actually a `throws` function at a call site.
+Any non-`throws` function/closure may be implicitly converted to the equivalent `throws` signature, so that non-throwing functions are subtypes of the throwing ones.
+
+
+The model used in Swift is compatible with the more general notion of *effects* in type theory.
+A simple model of function types like `D -> R` can be extended to support zero or more effects `E0`, `E1`, etc. that live "on the arrow": `D -{E0, E1}-> R`.
+Purely functional languages like Haskell sometimes use monads as a way to represent effects: a function `D -> IO R` is effectively a function from `D` to `R` with the additional effect that it may perform IO.
+Making effects more explicit allows a type system to reason about sub-typing in the presence of effects (a function type without effect `E` is a subtype of a function with that effect), and to express code that is generic over effects.
+
+Proposed Approach
+-----------------
+
+We propose a modest starting point for error handling in Slang that can be extended over time.
+The model borrows heavily from Swift, but also focuses on strongly-typed errors.
+
+The core module will provide a built-in interface for errors, initially empty:
+
+```
+interface IError {}
+```
+
+User code can define their own types (`struct` or `enum`) that conform to `IError`:
+
+```
+enum MyError : IError
+{
+    BadHandle,
+    TimedOut,
+    // ...
+}
+```
+
+User-defined functions (in both traditional and "modern" syntax) will support a `throws ...` clause to specify the type of errors that the function may produce:
+
+```
+float f(int x) throws MyError { ... }
+
+func g(x: int) throws MyError -> float { ...
}
+```
+
+Call sites to a `throws` function must wrap any potentially-throwing expression with a `try`:
+
+```
+float g(int y) throws MyError
+{
+    return 1.0f + try f(y-1);
+}
+```
+
+Code can explicitly raise an error using a `throw` expression:
+
+```
+throw MyError.TimedOut;
+```
+
+We will allow `catch` clauses to come at the end of any `{}`-enclosed scope, where they will apply to any errors produced by `throw` or `try` expressions in that scope.
+
+```
+{
+    ...
+    try f(...);
+    ...
+
+    catch( e: MyError ) { ... }
+}
+```
+
+We will also want to add `defer` statements, as they are defined in Go, Rust, Swift, etc.
+The statements under a `defer` will always be run when exiting a scope, even if exiting as part of error propagation.
+
+Detailed Explanation
+--------------------
+
+Consider a function that uses most of the facilities we have defined:
+
+```
+float example(int x) throws MyError
+{
+    if(someCondition)
+    {
+        throw MyError.TimedOut;
+    }
+    ...
+    defer { someCleanup(); }
+    ...
+    {
+        let y : int = 1 + try g(...);
+
+        catch(e : MyError)
+        { ... }
+    }
+    ...
+    return someValue;
+}
+```
+
+We will show how a function in this form can be transformed via incremental steps into something that can be understood and compiled without specific support for errors.
+
+### Change Signature
+
+First, we transform the signature of the function so that it returns something akin to an `Optional<MyError>` and returns its result via an `out` parameter, and modify any `return` points to write the `out` parameter and return `null` (the not-present case of `Optional`):
+
+```
+Optional<MyError> example_modified(int x, out float _result)
+{
+    ...
+
+    _result = someValue;
+    return null;
+}
+```
+
+### Desugar `try` Expressions
+
+Next we can convert any `try` expressions into a more explicit form, to match the transformation of the signature.
A statement like this:
+
+```
+let y : int = 1 + try g(...);
+```
+
+transforms into something like:
+
+```
+var _tmp : int;
+let _err : Optional<MyError> = g_modified(..., out _tmp);
+if( _err != null )
+{
+    throw _err.wrappedValue;
+}
+let y : int = 1 + _tmp;
+```
+
+### Desugar `throw` Expressions
+
+For every `throw` site in a function body, there will either be no in-scope `catch` clause that matches the type thrown, or there will be exactly one most-deeply-nested `catch` that statically matches.
+Front-end semantic checking should be able to associate each `throw` with the appropriate `catch` if any.
+
+For `throw` sites with no matching `catch`, the operation simply translates to a `return` of the thrown error (because of the way we transformed the function signature).
+
+For `throw` sites with a matching `catch`, we treat the operation as a "`goto` with argument" that jumps to the `catch` clause and passes it the error.
+Note that our IR structure already has a concept of "`goto` with arguments".
+
+### Desugar `defer` Statements
+
+Handling of `defer` statements is actually the hardest part of this proposal, and as such we should probably handle `defer` as a distinct feature that just happens to overlap with what is being proposed here.
+
+### Subtyping: Front-End
+
+We should (at some point) add a `Never` type to the Slang type system, which would be an uninhabitable type suitable for use as the return type of functions that never return:
+
+```
+func exit(code: int) -> Never; // C `exit()` never returns
+```
+
+`Never` is effectively a subtype of *every* type and, as such, an expression of type `Never` can be implicitly converted to any type.
+
+A `throw` expression has the type `Never`, allowing a user to write code like:
+
+```
+// Because `Never` can convert to `int`, this is valid:
+int x = value > 0 ? value : throw MyError.OutOfBounds;
+```
+
+A function without a `throws` clause is semantically equivalent to a function with `throws Never`.
+If we make that equivalence concrete at the type-system level, then a higher-order function can be generic over both throwing and non-throwing functions:
+
+```
+func map(
+    f: (D) throws E -> R,
+    l: List<D>)
+    throws E -> List<R>;
+```
+
+A function type with `throws X` is a subtype of a function with `throws Y` if `X` is a subtype of `Y`.
+That includes the case where `X` is `Never`, so that a non-`throws` function type is a subtype of any `throws` function type with the same parameter/result signature.
+
+### Subtyping: Low-Level
+
+The subtyping relationship for `Never` *values* is irrelevant to codegen. Any place in the IR that has a `Never` value available to it represents unreachable code.
+
+The subtyping relationship for `Never` in function types is more challenging, both for result types and error types. At the most basic, we can inject trampoline/thunk functions at any points where we have a `Never`-yielding function and need a function that returns `X` to pass along.
+
+If we were doing low-level code generation for a platform where we can define our ABI, it would be possible to have `throws` and non-`throws` functions use distinct calling conventions, such that:
+
+* The ordinary parameters and results are passed in the same registers/locations in both conventions.
+
+* The error value (if any) in the `throws` convention is passed via registers/locations that are callee-save in the non-`throws` convention.
+
+Under that model, a call site to a potentially-`throws` function can initialize the registers/locations for the error result to `null`/zero before dispatching to the callee.
+If the callee is actually a non-`throws` function it would not touch those registers, and no error would be detected.
+In that case, a non-`throws` function/closure could be used directly as a `throws` one with no conversion.
+Such calling-convention trickery isn't really possible to implement when emitting code in a high-level language like HLSL/GLSL or C/C++.
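When emitting for such high-level targets, the lowering described under "Detailed Explanation" amounts to ordinary error-code style. The following is a minimal C++ sketch of that idea (illustrative only, not actual compiler output; a zero-valued enum case stands in for the `null`/no-error state):

```
#include <cstdint>

// Hypothetical error type, with the zero value reserved for "no error".
enum class MyError : int32_t { None = 0, BadHandle, TimedOut };

// Lowered form of (Slang): float f(int x) throws MyError
MyError f_lowered(int x, float& _result)
{
    if(x < 0)
        return MyError::TimedOut; // corresponds to `throw MyError.TimedOut`
    _result = float(x) * 2.0f;
    return MyError::None; // corresponds to an ordinary `return`
}

// Lowered form of (Slang): float g(int y) throws MyError
//                          { return 1.0f + try f(y-1); }
MyError g_lowered(int y, float& _result)
{
    float _tmp;
    // `try f(y-1)`: check the error result and propagate it on failure.
    if(MyError err = f_lowered(y - 1, _tmp); err != MyError::None)
        return err;
    _result = 1.0f + _tmp;
    return MyError::None;
}
```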
+ +Questions +-------------- + +### Should we support the superficially simpler case of "untyped" `throws`? + +Having an `IError` interface allows us to eventually decide that `throws` without an explicit type is equivalent to `throws IError`. +It doesn't seem necessary to implement that convenience for a first pass, especially when there are use cases for `throws` that might not want to get into the mess of existential types. + +### Should the transformations described here be implemented during AST->IR lowering, or at the IR level? + +That's a great question! My guess is that some desugaring will happen during lowering, but we will probably want to keep `throws` functions more explicitly represented in the IR until fairly late, so that we can desugar them differently for different targets (if desired). + +### Do we need `Optional<T>` to be supported to make this work? + +It is unlikely that we'd need it to be a user-visible feature in a first pass, but we might want it at the IR level. +For this feature to work, we really need `sizeof(Optional<X>)` to be the same as `sizeof(X)` for simple cases where `X` is an `enum` or (for suitable targets) a type that is pointer-based. + +A first pass at the feature might only support cases where error types are `enum`s and where the zero value is the "no error" case. + +### Should we have a `Result` type akin to what Rust/Swift have? Should a `throws E` function be equivalent to one that returns `Result<T, E>`? + +That all sounds nice, but for now it seems like overkill. +Slang doesn't really have any facilities for programming with higher-order functions, pattern matching, etc. so adding types that mostly shine in those cases seems like a stretch. + +Alternatives Considered +----------------------- + +We could decide that Slang shouldn't be in the business of providing error-handling sugar *at all* and make this a problem for users.
+That isn't really a reasonable plan for any modern language, but it is the status quo and null hypothesis if we don't start in on a better plan. + +We could try to focus on C++ interop/compatibility and decide that errors in Slang should use exceptions, and only make "proper" language-supported error handling available to platforms that support exceptions at a suitably low level. +Doing so would give us all the disadvantages of C++ exceptions, and also mean that most of our users wouldn't end up using our error-handling tools, because doing so would render code non-portable. diff --git a/proposals/legacy/004-com-support.md b/proposals/legacy/004-com-support.md new file mode 100644 index 0000000..dc0bd52 --- /dev/null +++ b/proposals/legacy/004-com-support.md @@ -0,0 +1,240 @@ +COM Support +=========== + +When Slang is used as a host/CPU programming language, it is likely that users will want to interact with COM interfaces, either by consuming them or implementing them. +The Slang language and compiler should provide some first-class features to make working with COM interfaces feel lightweight and natural. + +Status +------ + +Implemented. + +Background +---------- + +COM is not perfect, but it is one of the only real solutions for cross-platform portable C++ APIs that care about binary compatibility and versioning. +Developers who use Slang are likely to write code that uses COM, whether to interact with Slang itself (and/or GFX), or with platform APIs like D3D. + +While COM provides idioms for addressing many practical challenges, it is also inconvenient in that it introduces a lot of *boilerplate*: + +* COM types all need to implement the core `IUnknown` operations for casting/querying and reference counting. + +* Code using COM interfaces needs to perform `AddRef` and `Release` operations manually, or use smart pointer types to automate lifetime management.
+ +* Code that calls into COM interfaces typically needs to use boilerplate code and/or macros to deal with `HRESULT` error codes, handling or propagating them as needed. + +Our in-progress work on supporting CPU programming in Slang emphasizes supporting idiomatic code without a lot of boilerplate. +Our intended path includes things that are compatible with the COM philosophy, like reference-counted lifetime management and idiomatic use of result/error codes, but those features don't currently align with the more explicit style used by COM in C/C++. + +Related Work +------------ + +The .NET platform includes some support for allowing .NET `interface`s and COM interfaces to interoperate. +TODO: Need to study this and learn how it works. + +Proposed Approach +----------------- + +We propose to allow COM interfaces to be declared using the Slang `interface` construct, with an appropriate attribute or modifier: + +``` +[COM] interface IDevice +{ + ITexture createTexture(__read TextureDesc desc) throws HRESULT; + + void setTexture(int index, ITexture texture); +} +``` + +A declaration like the above will translate into output C++ along the lines of: + +``` +struct IDevice : public IUnknown +{ + virtual HRESULT SLANG_MCALL createTexture(TextureDesc const& desc, ITexture** _result) = 0; + + virtual void SLANG_MCALL setTexture(int index, ITexture* texture) = 0; +}; +``` + +Key things to note: + +* The `[COM] interface` becomes a C++ `struct` that inherits from `IUnknown` +* Methods defined in the `[COM] interface` become pure-virtual `SLANG_MCALL` methods in C++ +* Parameters/values of a `[COM] interface` type `IFoo` in Slang translate to `IFoo*` in C++ +* Methods that have a `throws HRESULT` clause are transformed to have an `HRESULT` return type and an output parameter for their result + +A Slang `class` can declare that it implements zero or more `[COM] interface`s. Code like this: + +``` +class MyTexture : ITexture +{ + // ...
+} + +class MyDevice : IDevice +{ + ITexture createTexture(__read TextureDesc desc) throws HRESULT + { + return ...; + } + + void setTexture(int index, ITexture texture) + { + ...; + } +} +``` + +translates into output C++ like this: + +``` +class MyTexture : public slang::Object, public ITexture +{ + // ... +}; + +class MyDevice : public slang::Object, public IDevice +{ + virtual HRESULT QueryInterface(REFIID riid, void **ppvObject) { ... } + virtual ULONG AddRef() { ... } + virtual ULONG Release() { ... } + + HRESULT createTexture(TextureDesc const& desc, ITexture** _result) SLANG_OVERRIDE + { + *_result = ...; + return S_OK; + } + + void setTexture(int index, ITexture* texture) SLANG_OVERRIDE + { + ... + } +}; +``` + +All Slang `class`es translate to C++ classes that inherit from `slang::Object` (equivalent to the `RefObject` type within the current Slang implementation). +A `class` that inherits any `[COM]` interfaces includes an implementation of `IUnknown` plus the methods to override all the interface requirements. + +In ordinary code that makes use of `[COM] interface` types: + +``` +struct Stuff +{ + ITexture t; +} +ITexture getTexture(Stuff stuff) +{ + return stuff.t; +} +ITexture someOperations(IDevice device) +{ + let t = device.createTexture(...); + return t; +} +``` + +the C++ output uses C++ smart-pointer types for local variables, `struct` fields, and function results: + +``` +struct Stuff +{ + ComPtr<ITexture> t; +}; +ComPtr<ITexture> getTexture(Stuff stuff) +{ + return stuff.t; +} +HRESULT someOperations(IDevice* device, ComPtr<ITexture>& _result) +{ + ComPtr<ITexture> t; + HRESULT _err = device->createTexture(&t); + if(_err < 0) return _err; + + _result = t; + return 0; +} +``` + +As a small optimization, an `in` parameter of a `[COM] interface` type translates as a C++ parameter of just the matching pointer type (see `device` above).
+ +Note that the translation of idiomatic `HRESULT` return codes into `throws HRESULT` functions in Slang allows code working with COM interfaces to benefit from the convenience of the Slang error handling model. + + +Detailed Explanation +-------------------- + +This is a case where the simple explanation above covers most of the interesting stuff. + +There are a lot of semantic checks we'd need/want to implement to make sure `[COM]` interfaces are used correctly: + +* Any `interface` that inherits from one or more other `[COM]` interfaces must itself be `[COM]` + +* Any concrete type that implements a `[COM]` interface must be a `class` + +There are also detailed implementation questions to be answered around the in-memory layout of `class` types that implement `[COM]` interfaces. +In particular, we might want to be able to optimize for the case of a single-inheritance `class` hierarchy that mirrors a `[COM] interface` hierarchy, since this comes up often for COM-based APIs: + +``` +interface IBase { void baseFunc(); } +interface IDerived : IBase { void derivedFunc(); } + +class BaseImpl : IBase { ... } +class DerivedImpl : BaseImpl, IDerived { ... } +``` + +Using a naive translation to C++, the `DerivedImpl` type could end up with *three* different virtual function table (`vtbl`) pointers embedded in it: one for `slang::Object`, one for `IBase`, and one for `IDerived`. +Clearly the `vtbl`s for `IBase` and `IDerived` could be shared, but C++ `class`es can't easily express this. +Furthermore, if we are able to tune our strategy for layout, we can set things up so that `[COM] interface`s consume `vtbl` slots starting at index `0` and counting up, while any `virtual` methods in `class`es consume slots starting at index `-1` and counting *down*. +Using such a layout strategy we can actually allow a type like `DerivedImpl` above to use only a *single* `vtbl` pointer. 
+ +Questions +-------------- + +### Should we emit COM code that works at the plain C level, or idiomatic C++? + +I honestly don't know. Emitting idiomatic C++ (and using things like smart pointers) certainly makes the output code easier to understand. + +### Can we make this work with more advanced features of Slang interfaces? + +Some Slang features don't have perfect analogues. For example, given that `[COM] interface`s can only be conformed to by `class` types, the use of `[mutating]` isn't especially meaningful. + +There is no reason why a `[COM] interface` couldn't make use of `static` methods, but there would be no way to call those from C++ without an instance of the interface type. + +A `[COM] interface` could include `property` declarations, provided that we define the rules for how they translate into getter/setter operations in the generated output. + +One interesting case is that a `[COM] interface` could allow use of `This`, as well as `associatedtype`s that are themselves constrained by a `[COM] interface`. +For example, we could instead define our `IDevice` interface from before as: + +``` +[COM] interface IDevice +{ + associatedtype Texture : ITexture; + + Texture createTexture(__read TextureDesc desc) throws HRESULT; + + void setTexture(int index, Texture texture); +} +``` + +We could set things up so that the `associatedtype` has no impact on the C++ translation of `IDevice`: all parameter/result types that use the `Texture` associated type would translate to `ITexture*` in C++. +As such, a more refined `interface` like this would not disrupt the binary interface of a COM-based API, but could be allowed to express more of the constraints of the underlying API at compile time. +For example, the above use of `associatedtype` would prevent Slang code from mixing up textures across devices: + +``` +void someFunc( IDevice a, IDevice b ) +{ + let x = a.createTexture(...); + let y = b.createTexture(...); + + a.setTexture(0, y); // COMPILE TIME ERROR!
+} +``` + +In this example, `y` has type `b.Texture`, while `a.setTexture` expects an argument of type `a.Texture` (a distinct type, even if it also conforms to `ITexture`). +The benefits of this approach are probably purely hypothetical until we make it a lot easier to work with dependent types like `a.Texture` in Slang code. + +Alternatives Considered +----------------------- + +The main alternative would be to have Slang's model for interop with C/C++ focus primarily on C alone, and only allow use of COM-based APIs through C-compatible interfaces. diff --git a/proposals/legacy/005-components.md b/proposals/legacy/005-components.md new file mode 100644 index 0000000..ff53d0f --- /dev/null +++ b/proposals/legacy/005-components.md @@ -0,0 +1,507 @@ +Components +========== + +We propose to extend Slang with a construct for defining coarse-grained *components* that can be used to assemble shader programs. + +Status +------ + +Under discussion. + +Background +---------- + +First, a bit of terminology. In the context of a specific language like Slang, a term like "component" or "module" will often have a narrow meaning, but when we want to discuss "modularity" broadly we need to have a way to refer to the *things* we want to have be modular: the units of modularity. +In this document we will use the term *unit* to refer abstractly to anything that is a unit of modularity for some context/purpose/system, and try to reserve other terms for cases where we mean something more specific. + +While Slang has many features that address modularity for "small" units, it is still lacking in constructs that adequately address the needs of "large" units. +These are qualitative distinctions, but some examples may clarify the kind of distinction we mean. +An interface `ILight` for light sources, and a `struct` type `OmniLight` that conforms to it are small units. +An entire `LightSampling` module in a path tracer is a much larger unit. 
+ +The main tools that Slang provides for "small" units are: `interface`s, `struct`s, and `ParameterBlock`s. +Interfaces allow a developer to codify the types and operations that clients of a unit may rely on, as well as the requirements that implementations must provide. +Structure types are the main way developers can implement an interface, and it is important for GPU efficiency that Slang `struct`s are *value types*. +By using `ParameterBlock`s, developers can connect the units of modularity used *within* shader code with the parameters passed from *outside* their shaders. + +When we talk about "large" units of modularity, the main thing Slang provides are, well, modules. +A Slang module is basically just a collection of global-scope declarations: types, functions, shader parameters, and entry points. +The `import` construct allows modules to express a dependency on one another, and if/when we add `public`/`internal` visibility qualifiers it will also be able to restrict clients of a module to its defined interface. + +What modules *don't* provide is any of the flexibility that `interface`s provide for "small" units like `struct` types. +There is no first-class way for a Slang programmer to define a common interface that multiple modules implement, and then to express another piece of code that can work with any of those implementations. +Aside from falling back to preprocessor hackery (which negates many of the benefits of Slang in terms of separate compilation), the only way for developers to try to recoup those benefits is to use the tools for "small" units. + +Let's consider a placeholder/pseudocode set of Slang modules that work together: + +``` +// Lighting.slang + +interface ILight { ... } + +StructuredBuffer gLights; + +void doLightingStuff() { ... } +``` + +``` +// Materials.slang +import Lighting; + +interface IMaterial { ... 
} +StructuredBuffer gMaterials; + +void doMaterialStuff() +{ + doLightingStuff(); +} +``` + +``` +// Integrator.slang +import Lighting; +import Materials; + +ParameterBlock gIntegratorParams; + +void doIntegratorStuff() +{ + doLightingStuff(); + doMaterialStuff(); +} +``` + +The details of the module implementations are not the important part here. +The key is that each module defines a collection of types, operations, and shader parameters, and there is a dependency relationship between the modules. +Note that the `Integrator` module depends on both `Lighting` and `Materials`, but it does *not* need to be actively concerned with the fact that `Materials` *also* depends on `Lighting`. + +If we look only at a leaf module like `Lighting` it seems simple enough to translate it into something based on our "small" modularity features: + +``` +// Lighting.slang + +interface ILight { ... } + +interface ILightingSystem +{ + void doLightingStuff(); +} + +struct DefaultLightingSystem : ILightingSystem +{ + StructuredBuffer lights; + + void doLightingStuff() { ... } +} +``` + +Here we were able to move most of the global-scope code in `Lighting.slang` into a `struct` type called `DefaultLightingSystem`. +We were also able to define an explicit `interface` for the system, which makes explicit that we don't consider `gLights` part of the public interface of the system (only `doLightingStuff()`). +By defining the interface, we also create the possibility of plugging in other implementations of `ILightingSystem` - for example, we can imagine a `NullLightingSystem` that actually doesn't perform any lighting (perhaps useful for performance analysis). + +Translating `Materials` in the same way that we did for `Lighting` leads to some immediate questions: + +``` +// Materials.slang +import Lighting; + +interface IMaterial { ...
} + +interface IMaterialSystem +{ + void doMaterialStuff(); +} + +struct DefaultMaterialSystem +{ + StructuredBuffer materials; + + void doMaterialStuff() + { + /* ???WHAT GOES HERE??? */.doLightingStuff(); + } +} +``` + +When our `DefaultMaterialSystem` wants to invoke code for lighting, it needs a way to refer to the lighting system. +Beyond that, we want it to be able to work with *any* implementation of `ILightingSystem`. + +A naive first attempt might be to give `DefaultMaterialSystem` a field that refers to an `ILightingSystem`: + +``` +struct DefaultMaterialSystem +{ + ILightingSystem lighting; + ... + void doMaterialStuff() + { + lighting.doLightingStuff(); + } +} +``` + +An approach like that (or even one using a `ParameterBlock`) runs into the problem that it is going to force the Slang compiler to use its layout strategy for dynamic dispatch, which cannot handle the resource types in `DefaultLightingSystem` when compiling for most current GPU targets. +There are other problems with directly aggregating an `ILightingSystem` into our material system, but those will need to wait for a bit. + +If we want to allow the code in our `DefaultMaterialSystem` to statically specialize to the type of the lighting system, we end up having to use generics, either by making the whole type generic: + +``` +struct DefaultMaterialSystem< L : ILightingSystem > +{ + ParameterBlock<L> lighting; + ... +} +``` + +or by just making the `doMaterialStuff()` operation generic, with the lighting system being passed in as a parameter. + +``` +struct DefaultMaterialSystem +{ + ... + void doMaterialStuff< L : ILightingSystem>( L lighting ) + { + lighting.doLightingStuff(); + } +} +``` + +Each of those options moves the responsibility for managing the lighting system type up a level of abstraction: whatever code works with a material system needs to manage the details. + +When we now step up the next level to the `Integrator` module, the approach using `struct`s really starts to show cracks.
+We have the option of making the `DefaultIntegratorSystem` a generic on the type of *both* subsystems, and reference them via parameter blocks: + +``` +struct DefaultIntegratorSystem< L : ILightingSystem, M : IMaterialSystem > +{ + ParameterBlock<L> lighting; + ParameterBlock<M> material; + ... +} +``` + +or we have to make all the relevant operations on the integrator generic, and pass the relevant subsystem instances in as parameters: + +``` +struct DefaultIntegratorSystem +{ + ... + void doIntegratorStuff< L : ILightingSystem, M : IMaterialSystem >( + L lighting, + M material) + { + lighting.doLightingStuff(); + material.doMaterialStuff(lighting); + } +} +``` + +In each case, more and more responsibility for configuration of implementation details is being punted up to the next higher level of abstraction. +In the first case, somebody else is responsible for instantiating a type like: + +``` +DefaultIntegratorSystem< DefaultLightingSystem, DefaultMaterialSystem<DefaultLightingSystem> > +``` + +Also, the application code that works with that messy type needs to make sure to fill in *one* parameter block for the lighting system, but set it into *both* the material system and integrator. + +In the second case, note how the integrator already has to ensure that it passes along the `lighting` subsystem to the `doMaterialStuff` operation, and anybody who invokes `doIntegratorStuff` would have to do the same, but for *two* subsystems. + +The whole thing doesn't scale and becomes intractable with more than a few subsystems. +Trying to work anything like inheritance into the mix just falls flat completely. + +Related Work +------------ + +There is a lot of work in general-purpose programming languages around defining larger-scale modularity units. + +SML (Standard ML) has both modules and *signatures*, which are effectively interfaces for modules. +Modules can be parameterized on other modules based on signatures, and instantiated to use different concrete implementations.
+ +Beta and gbeta unify both classes and functions into a single construct called a *pattern*, and show that patterns (including pattern inheritance) can be used for things akin to traditional modules. +A variety of techniques for *family polymorphism* in the world of Java and similar languages followed on from that tradition. +The Scala language continues in the same vein, with papers and presentations on Scala advocating for using `class`es to model large units of modularity akin to modules. + +In the world of "enterprise" software using Java, C#, JavaScript, etc. there is a large family of techniques and systems for "dependency injection" which is used to automate some or all of the process of "wiring up" the concrete implementations of various subsystems/components based on explicit representations of dependencies (often attached as metadata on the fields of a type). + +Modern general-purpose game engines like Unity and Unreal often use a "component" concept, where a game entity/object is composed of multiple loosely-coupled sub-objects (components). +Often these systems allow dependencies between component types to be stated explicitly, with runtime or tools support for ensuring that objects are not created with unsatisfied dependencies. + +Note that almost all of the approaches enumerated above rely deeply on the fact that a dependency of unit `X` on unit `Y` can be handily represented as a single pointer/reference in most general-purpose programming languages. For example, in C++: + +``` +class Y { ... }; +class X +{ + Y* y; + ... +}; +``` + +In the above, an instance of `X` can always find the `Y` it depends on easily and (relatively) efficiently. +There is no particularly high overhead to having `X` directly store an indirect reference to `Y` (at least not for coarse-grained units), and it is trivial for multiple units like `X` to all share the same *instance* of `Y` (potentially even including mutable state, for applications that like that sort of thing).
+ +In general most CPU languages (and especially OOP ones) can express the concepts of "is-a" and "has-a" but they often don't distinguish between when "has-a" means "refers-to-and-depends-on-a" vs. when it means "aggregates-and-owns-a". +This is important when looking at a GPU language like Slang, where "aggregates-and-owns-a" is easy (we have `struct` types), but "refers-to-and-depends-on-a" is harder (not all of our targets can really support pointers). + +Proposed Approach +----------------- + +We propose to introduce a new construct called a *component type* to Slang, which can be used to describe units of modularity larger than what `struct`s are good for, but that is intentionally defined in a way that allows it to be used in cases where fully general `class` types could not be supported. + +To render our earlier examples in terms of component types: + +``` +// Lighting.slang + +interface ILight { ... } + +interface ILightingSystem +{ + void doLightingStuff(); +} + +__component_type DefaultLightingSystem : ILightingSystem +{ + StructuredBuffer lights; + + void doLightingStuff() { ... } +} +``` + +``` +// Materials.slang +import Lighting; + +interface IMaterial { ... } +interface IMaterialSystem +{ + void doMaterialStuff(); +} + +__component_type DefaultMaterialSystem : IMaterialSystem +{ + __require lighting : ILightingSystem; + + StructuredBuffer materials; + + void doMaterialStuff() + { + lighting.doLightingStuff(); + } +} +``` + +``` +// Integrator.slang +import Lighting; +import Materials; + +interface IIntegratorSystem +{ + void doIntegratorStuff(); +} + +__component_type DefaultIntegratorSystem : IIntegratorSystem +{ + __require ILightingSystem; + __require IMaterialSystem; + + ParameterBlock params; + + void doIntegratorStuff() + { + doLightingStuff(); + doMaterialStuff(); + } +} +``` + +The `__component_type` keyword is akin to `struct` or `class`, but introduces a component type.
+Component types are similar to both structure and class types in that they can: + +* Conform to zero or more interfaces +* Define fields, methods, properties, and nested types +* Eventually: optionally inherit from one (or more) other component types + +The key thing that a `__component_type` can do that a `struct` cannot (but that a `class` might be allowed to) is include nested `__require` declarations. +In the simplest form, a `__require` declaration is of the form: + +``` +__require someName : ISomeInterface; +``` + +Within the scope where the `__require` is visible, `someName` will refer to a value that conforms to `ISomeInterface`, but code need not know what the value is (nor what its type is). +The other form of `__require`: + +``` +__require ISomeInterface; +``` + +can be seen as shorthand for something like: + +``` +__require _anon : ISomeInterface; +using _anon.*; // NOTE: not actual Slang syntax +``` + +One more construct is needed to complete the feature, and it can be introduced by illustrating a concrete type that pulls together our default implementations: + +``` +__component_type MyProgram +{ + __aggregate lighting : DefaultLightingSystem; + __aggregate DefaultMaterialSystem; + __aggregate DefaultIntegratorSystem; +} +``` + +An `__aggregate` declaration is only allowed inside a `__component_type` (or a `class`, if we allow it). +Similar to `__require`, an `__aggregate` can either name the thing being aggregated, or leave it anonymous (and have its members imported into the current scope). +The semantics of `__aggregate SomeType` are similar to just declaring a field of `SomeType`, but the key distinction is that the aggregated sub-object is conceptually allocated and initialized as *part* of the outer object (one alternative name for the keyword would be `__part`). +It is not possible to assign to an `__aggregate` member like it is a field (although if the type is a reference type, *its* fields are visible and might still be mutable).
+ +At the point where an `__aggregate SomeType` member is declared, the front-end semantic checking must be able to find/infer the identity of a value to use to satisfy each `__require` member in `SomeType`. +For example, because `DefaultIntegratorSystem` declares `__require IMaterialSystem`, the compiler searches in the current context for a value that can provide that interface. +It finds a single suitable value: the value implicitly defined by `__aggregate DefaultMaterialSystem`, and thus "wires up" the input dependency of the `DefaultIntegratorSystem`. + +It is possible for a `__require` in an `__aggregate`d member to be satisfied via another `__require` of its outer type: + +``` +__component_type MyUnit +{ + __require ILightingSystem; + __aggregate DefaultMaterialSystem; +} +``` + +In the above example, the `ILightingSystem` requirement in `DefaultMaterialSystem` will be satisfied using the `ILightingSystem` `__require`d by `MyUnit`. + +In cases where automatic search and connection of dependencies does not work (or yields an ambiguity error), the user will need some mechanism to be able to explicitly specify how dependencies should be satisfied. + +While the above examples do not show it, component types should be allowed to contain shader entry points. + +Detailed Explanation +-------------------- + +Component types need to be restricted in where and how they can be used, to avoid creating situations that would give them all the flexibility of arbitrary `class`es. +The only places where a component type may be used are: + +* `__require` and `__aggregate` declarations +* Function parameters +* Generic arguments (??? Need to double-check how this can go wrong) + +Any given `__component_type` either has no `__require`s and is thus concrete, or it has a nonzero number of `__require`s and is abstract. + +We can ignore the `__require`s in a component type (if any) and form an equivalent `struct` type. 
+In that `struct` type, `__aggregate`s turn into ordinary fields. +For example, the `MyProgram` (concrete) and `MyUnit` (abstract) types above become: + +``` +struct MyProgram +{ + DefaultLightingSystem lighting; + DefaultMaterialSystem _anon0; + DefaultIntegratorSystem _anon1; +}; +struct MyUnit +{ + DefaultMaterialSystem _anon2; +}; +``` + +When a component type is used as a function parameter (including an implicit `this` parameter) it effectively maps to a function that takes additional (generic) parameters corresponding to each `__require`. +For example, given: + +``` +void doStuff( MyUnit u ) { ... u.doMaterialStuff(); ... } +``` + +we would generate something like: + +``` +void doStuff< T : ILightingSystem >( + T _anon3, // for the `__require : ILightingSystem` in `MyUnit` + MyUnit u) +{ + ... + DefaultMaterialSystem_doMaterialStuff(_anon3, u._anon2); + ... +} +``` + +Note that when the generated code invokes an operation through one of the `__aggregate`d members of a component type, where the aggregated type had `__require`ments, the compiler must pass along the additional parameters that represent those requirements in the current context. + +Effectively, the compiler generates all of the boilerplate parameter-passing that the programmer would have otherwise had to write by hand. + +It might or might not be obvious that the notion of "component type" being described here has a clear correspondence to the `IComponentType` interface provided by the Slang runtime/compilation API. +It should be possible for us to provide reflection services that allow a programmer to look up a component type by name and get an `IComponentType`. +The existing APIs for composing, specializing, and linking `IComponentType`s should Just Work for explicit `__component_type`s. +Large aggregates akin to `MyProgram` above can be defined entirely via the C++ `IComponentType` API at program runtime.
+ +Questions +--------- + +### How do we explain to users when to use component types vs. when to use modules? Or `struct`s? + +The basic advice should be something like: + +* If the thing feels subsystem-y, favor modules or component types. If it feels object-y or value-y, use a `struct`. This is loose, but intuition is good here. + +* If the thing wants to depend on other subsystems through `interface`s, to allow mix-and-match flexibility, it should probably be a component type and not a module. + +* If the thing wants to have its own shader parameters, then we encourage users to consider that a component type is likely to be what they want, so that they don't pollute the global scope. + +That last point is important, since a component type allows users to define a collection of global shader parameters and entry points that use them as a unit, without putting those parameters in the global scope, which is something that was not really possible before. + +### Can the `__component_type` construct just be subsumed by either `struct` or `class`? + +Maybe. The key challenge is that component types need to provide the "look and feel" of by-reference re-use rather than by-value copying. A `__require T` should effectively act like a `T*` and not a bare `T` value, so I am reluctant to say that should map to `struct`. + +### But what about `[mutating]` operations and writing to fields of component types, then? + +Yeah... that's messy. If component types really are by-reference, then they should be implicitly mutable even without passing as `inout`, and should ideally also support aliasing. We need to make sure we get clarity on this. + +### Is `__aggregate` really required? Isn't it basically just a field? + +An `__aggregate X` acts a lot like a field if `X` is a *value* type, but in cases where `X` is a *reference* type, there is a large semantic distinction. + +### How does all this stuff relate to inheritance? 
+
+There are some things that can be done with (multiple) inheritance that can also be expressed via `__require`s. For example, both can represent the "diamond" pattern:
+
+```
+class A { ... }
+class B : A { ... }
+class C : A { ... }
+class D : B, C { ... }
+```
+
+```
+__component_type A { ... }
+__component_type B { __require A; ... }
+__component_type C { __require A; ... }
+__component_type D { __require B; __require C; ... }
+```
+
+The Spark shading language research project used multiple mixin class inheritance to compose units of shader code akin to what is being proposed here as component types (hmm... I guess that should go into related work...).
+
+In general, using inheritance to model something that isn't an "is-a" relationship is poor modeling.
+Inheritance as a modeling tool cannot capture some patterns that are possible with `__aggregate` (notably, with mixin inheritance you can't get multiple "copies" of a component).
+Most importantly, when inheritance is abused for modeling like this, the resulting code can be confusing. Consider:
+
+```
+abstract class MyFeature : ISystemA, ISystemB
+{ ... }
+```
+
+From this declaration, it is not possible to tell whether `MyFeature` implements just `ISystemA`, just `ISystemB`, both, or neither.
+The distinction between an inheritance clause ("I implement this thing") vs. a `__require` ("I need *somebody else* to implement this thing") is important documentation.
+
+Alternatives Considered
+-----------------------
+
+I'm not aware of any big design alternatives that don't amount to more or less the same thing with different syntax.
+One alternative is to try to do something like ML-style "signatures" for our modules, and allow something like `import ILighting` to express module-level dependencies on abstracted interfaces.
+
+Another alternative is to do what this document proposes, but make it work with the existing `struct` keyword (or `class`) instead of adding a new one.
diff --git a/proposals/legacy/006-artifact-container-format.md b/proposals/legacy/006-artifact-container-format.md
new file mode 100644
index 0000000..04daf3b
--- /dev/null
+++ b/proposals/legacy/006-artifact-container-format.md
@@ -0,0 +1,1119 @@
+Shader Container Format
+=======================
+
+This proposal is for a file hierarchy based structure that can be used to represent compile results and, more generally, a 'shader cache'. Ideally it would:
+
+* Not require an extensive code base to implement
+* Be flexible and customizable for specific use cases
+* Make a simple, fast implementation possible
+* Use simple and open standards where appropriate
+* Be human readable/alterable where possible/appropriate
+* Provide a way to merge or split out contents that is flexible and easy, ideally without using Slang tooling
+
+It should be able to store
+
+* Compiled kernels
+* Reflection/layout information
+* Diagnostic information
+* "meta" information detailing user specific, and product specific information
+* Source
+* Debug information
+* Binding meta data
+* Customizable, and user specified additional information
+
+API support needs to allow
+
+* Interchangeable use of static shader cache/slang compilation/combination of the two
+  * Implies compilation needs to be initiated in some way that is compatible with shader cache keying
+* Ability to store compilations as they are produced
+
+It needs to be able to relate and group products such that, with suitable keys, it is relatively fast and easy to find appropriate results.
+
+Its importance/relevance:
+
+* Provides a way to represent complex compilation results
+* Could be used to support an open standard around 'shader cache'
+* Provides a standard 'shader cache' system that can be used for Slang tooling and customers
+* Supports Slang tooling and language features
+
+## Use
+
+There are several kinds of usage scenario
+
+* A runtime shader cache
+* A runtime shader cache with persistence
+* A capture of compilations
+* A baked persistent cache - must also work without shader source
+* A baked persistent cache, that is obfuscated
+
+A runtime shader cache has the following characteristics:
+
+* Can work with mechanisms that do not require any user control (such as naming). I.e. inputs/options alone can define a 'key'.
+* It is okay to have keys/naming that are not human understandable/readable.
+* The source is available - such that hashes based on source contents can be produced.
+* It does not matter if hashes/keys are static between runs.
+* It is not important that a future version would be compatible with a previous version or vice versa.
+* Could be all in memory.
+* May need mechanisms to limit the working set
+* Generated source can be made to work, because it is possible to hash generated source
+
+At the other end of the spectrum, a baked persistent cache
+
+* Probably wants user control over naming
+* Doesn't have access to source, so can't use that as part of a hash
+* Probably doesn't have access to dependencies
+* Having some indirection between a request and a result is a useful feature
+* Ideally can be manipulated and altered without significant tooling
+* Generated source may need to be identified in some other way than the source itself
+
+It should be possible to serialize out a 'runtime shader cache' into the same format as used for the persistent cache. It may be harder to use such a cache without Slang tooling, because the mapping from compilation options to keys will probably not be simple.
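The runtime scenario above, where inputs/options alone define the key, can be sketched in a few lines. This is a minimal, illustrative sketch only: the FNV-1a hash and the `RuntimeShaderCache`/`CachedKernel` names are assumptions for demonstration, not part of any existing Slang or gfx API.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative FNV-1a hash over a string, folding into a running hash value.
inline uint64_t fnv1a(const std::string& text, uint64_t hash = 14695981039346656037ull)
{
    for (unsigned char c : text)
    {
        hash ^= c;
        hash *= 1099511628211ull;
    }
    return hash;
}

// Stand-in for a compiled kernel blob; a real cache would hold target bytes.
struct CachedKernel
{
    std::string blob;
};

class RuntimeShaderCache
{
public:
    // The key is derived purely from inputs: source contents plus options text.
    static uint64_t makeKey(const std::string& source, const std::string& options)
    {
        return fnv1a(options, fnv1a(source));
    }

    const CachedKernel* find(uint64_t key) const
    {
        auto it = m_entries.find(key);
        return it == m_entries.end() ? nullptr : &it->second;
    }

    void add(uint64_t key, CachedKernel kernel)
    {
        m_entries[key] = std::move(kernel);
    }

private:
    std::unordered_map<uint64_t, CachedKernel> m_entries;
};
```

A real implementation would also need a mechanism to limit the working set, and would fold the transitive dependencies of the source into the hash, as discussed in the hashing sections below.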
+
+Status
+------
+
+## Gfx
+
+There is a run time shader cache that is implemented in gfx.
+
+There is some work around a file system backed shader cache in gfx.
+
+## Artifact System
+
+The Artifact system provides a mechanism to transport source/compile results through the Slang compiler. It already supports most of the different items that need to be stored.
+
+Artifact has support for "containers". An artifact container is an artifact that can contain other artifacts. Support for different 'file system' style container formats is also implemented. The currently supported underlying container formats are
+
+* Zip
+* Riff
+  * Riff without compression
+  * Riff with deflate
+  * Riff with LZ4 compression
+
+Additionally the mechanisms already implemented support
+
+* The OS filesystem
+* A virtual file system
+* A 'chroot' of the file system (using RelativeFileSystem)
+
+Accessing a file system via Artifact is as simple as adding a modification to the default handler to load the container, and implementing `expandChildren`, which will allow traversal of the container. In general this works in a 'lazy' manner: children are not expanded unless requested, and files are not decompressed unless required. The system also provides a caching mechanism such that a representation, such as an uncompressed blob, can be associated with the artifact.
+
+Very little code is needed to support this behavior, because IExtFileArtifactRepresentation and the use of the ISlangFileSystemExt interface mean it can work using the existing mechanisms.
+
+It is a desired feature of the container format that it can be represented as a 'file system', and have the option of being human readable where appropriate.
Doing so allows
+
+* Third parties to develop tools/formats that suit their specific purposes
+* Different containers to be used
+* A structure that is generally simple to understand
+* Editing and manipulation of contents using pre-existing, extensive and cross platform tools
+* A simple basis
+
+This document is, at least in part, about how to structure the file system to represent a 'shader cache' like scenario.
+
+Incorporating the 'shader container' into the Artifact system will require a suitable Payload type. It may be acceptable to use `ArtifactPayload::CompileResults`. The IArtifactHandler will need to know how to interpret the contents. This will need to occur lazily at the `expandChildren` level. This will create IArtifacts for the children, where some aspects are lazily evaluated and others are interpreted at expansion. For example, setting up the ArtifactDesc will need to happen at expansion.
+
+Background
+==========
+
+The following long section provides background discussion on a variety of topics. Jump to the [Proposed Approach](#proposed-approach) for what is actually being suggested in conclusion.
+
+To enumerate the major challenges
+
+* How to generate a key for the runtime scenario
+* How to produce keys for the persistent scenario - implies user control, and human readability
+* How to represent compilation in a composable 'nameable' way
+* How to produce options from a named combination
+
+The mechanism for producing keys in the runtime scenario could be used to check if an entry in the cache is out of date.
+
+A compilation can be configured in many ways.
Including
+
+* The source, including source injection
+* Pre-processor defines
+* Compile options - optimization, debug information, include paths, libraries
+* Specialization types and values
+* Target and target specific features API/tools/operating system
+* Specific version of Slang and/or downstream compilers
+* Pipeline aspects
+* Slang components
+
+In general we probably don't want to use the combination of source and/or the above options as a 'key'. Such a key would be hard and slow to produce. It would not be something that could be created and used easily by an application. Moreover it is commonly useful to be able to name results such that the actual products can be changed and have things still work.
+
+Background: Hashing source
+==========================
+
+Hashing source is something that is needed for the runtime cache scenario, as it is necessary to generate a key purely from 'input', of which source is a part. It can also be used in the persistent scenario, in order to validate if everything is in sync. That sync checking might perhaps only be performed when source and other resources are available.
+
+The fastest/simplest way to hash source is to take the blob and hash that. Unfortunately there are several issues
+
+* Ignores dependencies - if this source includes another file the hash will also need to depend on that transitively
+* Hash changes with line end character encoding
+* Hash is sensitive to white space changes in general
+
+A way to work around whitespace issues would be to use a tokenizer, or a 'simplified' tokenizer that only handles the necessary special cases. An example special case would be that white space in a string is always important. Such a solution does not require an AST or rely on a specific tokenizer. A hash could be made of the concatenation of all of the lexemes with white space inserted between them.
+
+Another approach would be to hash each "token" as produced. Doing so doesn't require memory allocation for the concatenation.
You could special case short strings or single chars, and hash longer strings.
+
+## Dependencies
+
+It's not enough to rely on hashing of input source, because `#include` or other resource references, such as modules or libraries, may be involved.
+
+If we are relying on dependencies specified at least in part by `#include`, it implies the preprocessor must be executed. This could be used for other languages such as C/C++. Some care would need to be taken because *some* includes will probably not be located by our preprocessor, such as system include paths in C++. For the purpose of hashing, an implementation could ignore `#includes` that cannot be resolved. This may work for some scenarios - but doesn't work in general, because symbols defined in unfound includes might cause other includes. Thus this could lead to other dependencies not being found, or being assumed when they weren't possible.
+
+In practice, whilst not perfect, it may work well enough to be broadly usable.
+
+## AST
+
+A hash could be performed via the AST. This assumes
+
+1) You can produce an AST for the input source - this is not generally true as source could be CUDA, C++, etc
+2) The AST would have to be produced post preprocessing - because prior to preprocessing it may not be valid source
+3) If 3rd parties are supposed to be able to produce a hash, it requires them to implement a Slang lexer/parser in general
+4) Depending on how the AST is used, it may not be stable between versions
+
+Another disadvantage around using the AST is that it requires the extra work and space for parsing.
+
+Using the AST does allow using pre-existing Slang code. It is probably more resilient to structure changes. It would also provide Slang specific information more simply - such as imports.
+
+## Slang lexer
+
+If we wanted to use the Slang lexer, it would imply the hash process would
+
+1) Load the file
+2) Lex the file
+3) Preprocess the file (to get dependencies).
Throw these tokens away (we want the hash of just the source)
+4) Hash the file's tokens from the original lex
+5) Dependencies would require hashing and combining
+
+For hashing the Slang language, and probably HLSL, we can use the Slang preprocessor tokenizer, and hash the tokens (actually probably just the token text).
+
+Using the Slang lexer/preprocessor may work reasonably for other languages such as C++/C/HLSL/GLSL. It does imply a reliance on a fairly large amount of Slang source.
+
+## Simplified Lexer
+
+We may want to use some simple lexer. A problem with using a lexer at all is that it adds a great amount of complexity to a stand-alone implementation. The simplified lexer would
+
+* Simplify white space - much of it we can strip
+* Honor string representations (we can't strip whitespace inside strings)
+* Honor identifiers
+* We may want some special cases around operators and the like
+* Honor `#include` (but ignore preprocessor behavior in general)
+* Ignore comments
+
+We need to handle `#include` such that we have dependencies. This can lead to dependencies that aren't required in actual compilation.
+
+We need to honor some language specific features - such as, say, `import` in Slang.
+
+Such an implementation would be significantly simpler, and more broadly applicable, than the Slang lexer/parser etc. Writing an implementation would determine how complex it is - but it would seem to be at minimum 100s of lines of code.
+
+We can provide source for an implementation. We could also provide a shared library that made the functionality available via a COM interface. This may help many usage scenarios, but we would want to limit the complexity as much as possible.
+
+## Generated Source
+
+Generated source can be part of a hash if the source is available. As touched on, there are scenarios where generated source may not be available.
+
+We could side step the issues around source generation if we push that problem back onto users.
If they are using code generation, the system could require providing a string that uniquely identifies the generation that is being used. This would perhaps be a requirement for a persistent cache. For a temporary runtime cache, we can allow hash generation from source.
+
+Background: Hashing Stability
+=============================
+
+Ideally a hashing mechanism can be resilient to unimportant changes. The previous section described some approaches for changes in source. The other area of significant complexity is around options. If options are defined as JSON (or some other 'bag of values'), hashing can be performed relatively easily with a few rules. If the representation is such that when a value is not set the default is used, the hash is only affected by options that are explicitly set.
+
+When the hashing is on some native representation this isn't quite so simple, as a typical hash function will include all fields. A field value, default or not, will alter the hash. Therefore adding or removing a field will necessarily change the hash.
+
+One way around this would be to use a hashing regime that only alters the hash if the values are not default.
+
+```C++
+Hash calcHash(const Options& options)
+{
+    const Options defaultOptions;
+
+    Hash hash = ...;
+    if (options.someOption != defaultOptions.someOption)
+    {
+        hash = combineHash(hash, options.someOption.getHash());
+    }
+    // ...
+    return hash;
+}
+```
+
+This could perhaps be simplified with some macro magic.
+
+```C++
+struct HashCalculator
+{
+    template<typename T>
+    void hashIfDifferent(const T& value, const T& defaultValue)
+    {
+        if (value != defaultValue)
+        {
+            hash = combineHash(hash, value.getHash());
+        }
+    }
+
+    Hash hash;
+};
+
+Hash calcHash(const Options& options)
+{
+    HashCalculator calc;
+    const Options defaultOptions;
+
+    calc.hashIfDifferent(options.someOption, defaultOptions.someOption);
+    // ...
+
+    return calc.hash;
+}
+```
+
+This is a little more clumsy, but if we wanted to use a final native representation, it is workable.
+
+Note that the ordering of hashing is also important for stability.
+
+Background: Key Naming
+======================
+
+The container could be seen as a glorified key value store, with the key identifying a kernel and associated data.
+
+Much of the difficulty here is how to define the key. If it's a combination of the 'inputs', it would be huge and complicated. If it's a hash, then it can be short, but not human readable, and without considerable care not stable against small or irrelevant changes.
+
+For a runtime cache type scenario, the instability and lack of human readability of the key probably doesn't matter too much. How slow and complicated it is to produce the key probably is a consideration.
+
+For any cache that is persistent, how naming occurs probably is important, because
+
+* Our 'options' aren't going to make much sense with other compilers (if we want the standard to be more broadly applicable)
+* The options we have will not remain static
+* Having an indirection is useful from an application development and shipping perspective
+* The *name* perhaps doesn't always have to indicate every aspect of a compilation from the point of view of the application
+
+One idea touched on in this document is to make 'naming' a user space problem: compilations are defined by the combination of 'named' options. In order to produce a shader cache name we have a concatenation of names. The order can be user specified. Order could also break down into a "directory" hierarchy as necessary.
+
+Some options will need to be part of some order.
This is perhaps all a little abstract, so as an example
+
+```JSON
+{
+    // Configuration
+    "configuration" : {
+        "debug" : {
+            "group" : "configuration",
+            "optimization" : "0",
+            "debug-info" : true,
+            "defines" : [ "-DDEBUG=1" ]
+        },
+        "release" : {
+            "optimization" : "2",
+            "debug-info" : true,
+            "defines" : [ "-DRELEASE=1" ]
+        },
+        "full-release" : {
+            "optimization" : "3",
+            "debug-info" : false,
+            "defines" : [ "-DRELEASE=1", "-DFULL_RELEASE=1" ]
+        }
+    },
+
+    // Target
+    "target" : {
+        "vk" : {
+        },
+        "d3d12" : {
+        },
+        "cpu" : {
+        }
+    },
+
+    // Stage
+    "stage" : {
+        "compute" : {
+        }
+    },
+
+    "combinations" : [
+        {
+            "key" : [ "vk", "compute", ["release", "full-release"] ],
+            "options" :
+            {
+                "optimization" : 1
+            }
+        }
+    ]
+}
+```
+
+The combination in this manner doesn't quite work, because some combinations may imply different options. The "combinations" section tries to address this by providing a way to 'override' behavior. This could of course be achieved with a callback mechanism. We may also want to have options that don't appear in the key, allowing 'overriding' behavior without needing a 'combinations' section. The implication is that, when looking up in the application, only the 'named' configuration is needed.
+
+This whole mechanism provides a way of specifying a compilation by a series of names that can produce a unique human readable key. It is under user control, but how the combination takes place is, at least by default, defined within an implementation.
+
+It may be necessary to define options by tool chain. Doing so would mean the names can group together what might be quite different options on different compilers. Having options defined in JSON means that the mechanisms described here can be used for other tooling. If the desire is to have a more broadly applicable 'shader cache' representation, this is desirable.
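To make the naming scheme concrete, a key can be produced by concatenating the chosen names in group order, e.g. target, then stage, then configuration. The following is an illustrative sketch only; the `makeKey` helper and the `/` separator are assumptions, not part of any proposed API.

```cpp
#include <string>
#include <vector>

// Produce a human readable cache key from named option sets, combined in a
// user-specified group order (e.g. target, then stage, then configuration).
std::string makeKey(const std::vector<std::string>& namesInGroupOrder)
{
    std::string key;
    for (const auto& name : namesInGroupOrder)
    {
        if (!key.empty())
            key += "/"; // the separator lets the key double as a path
        key += name;
    }
    return key;
}
```

Using `/` as the separator means the same key can also serve directly as a directory hierarchy within the container.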
+
+If it is necessary to obfuscate the contents, it would be possible to put the human readable key through a hash, and then the hash can be used for lookup.
+
+## Container Location
+
+We could consider the contents of the container as 'flat', with files for each of the keys. There could be several files related to a key if we use the file association mechanism (as opposed to a single manifest).
+
+Whilst this works, it probably isn't particularly great from an organizational point of view. It might be more convenient if we can use the directory hierarchy, at least optionally. For example putting all the kernels for a target together...
+
+```
+/target/name-entryPoint
+```
+
+Or
+
+```
+/target/name/entryPoint-generated-hash
+```
+
+Where 'generated-hash' is the hash for generated source code.
+
+Perhaps this information would be configured in a JSON file for the repository.
+
+What happens if we want to obfuscate? We could make the whole path obfuscated. We could, in the configuration, describe which parts of the name will be obfuscated, such that it's okay to see there are different 'target' names.
+
+## Default names
+
+When originally discussed, the idea was that all options can be named, and thus any combination is just a combination of names. That combination produces the key.
+
+Whilst this works, it might make sense to allow 'meta' names for common types. The things that will typically change for a fixed set of options would be
+
+* The input translation unit source
+* The target (in the Slang sense)
+* The entryPoint/stage (can we say the entry point name implies the stage?)
+
+We could have pseudo names for these commonly changed values. If there are multiple input source files for a translation unit, we could key on the first.
+
+We could also have some 'names' that are built in. For example default configuration names such as 'debug' and 'release'. They can be changed as part of configuration but have some default meaning.
Options can perhaps override the defaults.
+
+Using the pseudo name idea might mean it is possible to produce reasonable default names. Moreover we can still use the hashing mechanism to either report a validation issue, or trigger recompilation, when everything needed to do as much is available.
+
+## Target
+
+A target can be quite a complicated thing to represent. Artifact has
+
+* 'Kind' - executable, library, object code, shared library
+* 'Payload' - SPIR-V, DXIL, DXBC, Host code, Universal, x86_64 etc...
+  * Version
+* 'Style' - Kernel, host, unknown
+
+This doesn't take into account a specific 'platform', where that could vary a kernel depending on the specific features of the platform. There are different versions of SPIR-V and there are different extensions.
+
+This doesn't cover the breadth though, because for CPU targets there is additionally
+
+* Operating system - including operating system version
+* Tool chain - Compiler
+
+Making this part of the filename could lead to very long filenames. The more detailed information could be made available in associated JSON files.
+
+This section doesn't provide a specific plan on how to encapsulate the subtlety around a 'target'. Again, how this is named is probably something that is controllable in user space, but there are some reasonable defaults when it is not defined.
+
+Background: Describing Options
+==============================
+
+We need some way to describe options for compilation. The most 'obvious' way would be something like the IDownstreamCompiler interface and associated types
+
+```C++
+struct Options
+{
+    Includes ...;
+    Optimizations ...;
+    Miscellaneous ...;
+    Source* source[];
+    EntryPoint ... ;
+    ...
+    CompilerSpecific options;
+};
+
+ICompiler
+{
+    Result compile(const Options* options, IArtifact** outArtifact);
+};
+```
+
+For this to work we need a mapping from 'options' to the cached result (if there is one).
There are problems around this, because
+
+* Items may be specified multiple times
+* Ideally the hash would, or at least could, remain stable with updates to options
+* Also ideally the user might want control over what constitutes a new version/key
+* Calculating a hash is fairly complicated, and would need to take into account ordering
+
+Another option might be to split common options from options that are likely to be modified per compilation. For example
+
+```C++
+struct Options
+{
+    const char* name;       ///< Human readable name
+    Includes ...;
+    Optimizations ...;
+    Miscellaneous ...;
+    Source* source[];
+    ...
+    CompilerSpecific options;
+};
+
+struct CompileOptions
+{
+    Stage stage ...;
+    SpecializationArgs ...;
+    EntryPoint ...;
+};
+
+ICompiler
+{
+    Result createOptions(Options* options, IOptions** outOptions);
+
+    Result compile(IOptions* options, const CompileOptions* compileOptions, IArtifact** outArtifact);
+};
+```
+
+Having the split greatly simplifies key production, because we can use the unique human name, and the very much simpler values of CompileOptions, to produce a key.
+
+Specifying options in this way is tied fairly tightly to the Slang API. We can generalize the named options by allowing more than one named option set.
+
+## Bag of Named Options
+
+Perhaps identification is something that is largely in user space for the persistent scenario. You could imagine a bag of 'options' that are typically named. Then the output name is the concatenation of the names. If an option set isn't named it doesn't get included. Perhaps the order of the naming defines the precedence.
+
+This 'bag of options' would need some way to know the order in which the names would be combined. This could be achieved with another parameter or option that describes name ordering. Defining the ordering could be achieved, if different types of options are grouped, by specifying the group. The ordering would only be significant for named items that will be concatenated.
The ordering of the options could define the order of precedence of application.
+
+Problems:
+
+* How to combine all of these options to compile?
+* How to define what options are set? Working at the level of a struct doesn't work if you want to override a single option.
+* The grouping - how does it actually work? It might require specifying what group a set of options is in.
+
+An advantage of this approach is that the policy of how naming works is a user space problem. It is also powerful in that it allows control over compilation that has some independence from the name.
+
+We could have some options that are named, but do not appear as part of the name/path within the container. The purpose of this is to allow customization of a compilation, without that customization necessarily appearing within the application code. The container could store the group of named options that is used, such that it is possible to recreate the compilation, or perhaps to detect there is a difference.
+
+### JSON options
+
+One way of dealing with the 'bag of options' issue would be to just make the runtime JSON options representation describe options. Merging JSON at the most basic level is straightforward. For certain options it may make sense to have them describe adding, merging or replacing. We could add this control via a key prefix.
+
+```JSON
+{
+    "includePaths" : ["somePath", "another/path"],
+    "someValue" : 10,
+    "someEnum" : "enumValue",
+    "someFlags" : 12
+}
+```
+
+As an example
+
+```JSON
+{
+    "+includePaths" : ["yet/another"],
+    "intValue" : 20,
+    "-someValue" : null,
+    "+someFlags" : 1
+}
+```
+
+When merged produces
+
+```JSON
+{
+    "includePaths" : ["somePath", "another/path", "yet/another"],
+    "someEnum" : "enumValue",
+    "someFlags" : 13,
+    "intValue" : 20
+}
+```
+
+It's perhaps also worth pointing out that using JSON as the representation provides a level of compatibility. Things that are not understood can be ignored. It is human readable and understandable.
We only need to convert the final JSON into the options that are then finally processed.
+
+One nice property of a JSON representation is that it is potentially the same for processing and hashing.
+
+### Producing a hash from JSON options
+
+One approach would be to just hash the JSON, if that is the representation. We might want a pass to filter down to just known fields, and perhaps some other sanity processing.
+
+* Filtering
+* Ordering - the order of fields is generally not the order we want to combine. One option would be to order keys alphabetically.
+* Handling values that can have multiple representations (if we allow an enum as int or text, we need to hash with one or the other)
+* Duplicate handling
+
+Alternatively the JSON could be converted into a native representation and that hashed. The problem with this is that, without a lot of care, the hash will not be stable with respect to small changes in the native representation.
+
+Another advantage of using JSON for hash production is that it is something that could be performed fairly easily in user space.
+
+Two issues remain significant with this approach
+
+* Filtering - how?
+* Handling multiple representations for values
+
+Filtering is not trivial - it's not a question of just specifying what fields are valid, because doing so requires context. In essence it is necessary to describe types, and then describe where in a hierarchy a type is used.
+
+I guess this could be achieved with... JSON. For example
+
+```
+{
+    "types" :
+    {
+        "SomeType" :
+        {
+            "kind" : "struct",
+            "derivesFrom" : "...",
+            "fields" : [
+                [ "name", "type", "default" ]
+            ]
+        },
+        "MainOptions" :
+        {
+            "..."
+        }
+    },
+    "structure" :
+    [
+        "MainOptions"
+    ]
+}
+```
+
+When traversing we use the 'structure' to work out where a type is used.
+
+This is workable, but adds significant additional complexity.
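The filtering and ordering points above can be made concrete for a flat bag of options. In this minimal sketch (the FNV-1a hash and all names are illustrative assumptions), keys are visited in sorted order for a stable field ordering, and fields equal to their default are skipped, so explicitly setting a default value hashes the same as not setting it:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Illustrative FNV-1a hash over a string, folding into a running hash value.
inline uint64_t fnv1a(const std::string& text, uint64_t hash)
{
    for (unsigned char c : text)
    {
        hash ^= c;
        hash *= 1099511628211ull;
    }
    return hash;
}

// A flat, JSON-like bag of options; std::map iterates keys in sorted order,
// which gives a stable ordering independent of how the options were written.
using OptionBag = std::map<std::string, std::string>;

uint64_t hashOptions(const OptionBag& options, const OptionBag& defaults)
{
    uint64_t hash = 14695981039346656037ull;
    for (const auto& [key, value] : options)
    {
        auto it = defaults.find(key);
        if (it != defaults.end() && it->second == value)
            continue; // default-valued field: skip, for stability across versions
        hash = fnv1a(key, hash);
        hash = fnv1a(value, hash);
    }
    return hash;
}
```

Skipping default values means that introducing a new option with a default does not disturb existing hashes, which is the stability property discussed above.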
+
+The issue around different representations could also use the information in the description to convert into some canonical form.
+
+The structure could potentially be generated via reflection information.
+
+## Native bag of options
+
+Options could be represented via an internal struct on which a hash can be performed.
+
+Input can be described as "deltas" to the current options. The final options are the combination of all the deltas, which would produce the final options structure for use. The hash of all of the options is the hash of the final structure.
+
+How are the deltas described?
+
+The in memory representation is not trivial, in that if we want to add a struct to a list we would need a way to describe this.
+
+Whilst in the runtime the 'field' could be uniquely identified by an offset, within a file format representation it would need to be something that works across targets, and is resistant to changes in contents. That implies it should be a name.
+
+## Slang's Component System
+
+Slang has a component system that can be used for combining options to produce a compilation. An argument can be made that it should be part of the hashing representation, as it is part of compilation.
+
+If combination is at the level of components, then as long as components are serializable, we can represent a compilation by a collection of components. IComponentType has several derived interfaces...
+
+* IEntryPoint
+* ITypeConformance
+* IModule
+
+Components can be constructed into composites through `createCompositeComponentType`, which describes aspects of how the combination takes place.
+
+If the components were serializable (say, as JSON), we could describe a compilation as a combination of components. If components are named, a concatenation of names could name a compilation.
+
+It doesn't appear as if there is a way to more finely control the application of component types. For example if there was a desire to change the optimization option, it would appear to remain part of the ICompileRequest (it's not part of a component).
This implies this mechanism as it stands, whilst allowing composition, doesn't provide the more nuanced composition. Additional component types could perhaps be added which would add such control.
+
+Perhaps having components is not necessary as part of the representation, as the 'component' system is a mechanism for achieving a 'bag of options', and so we can get the same effect by using that mechanism without components.
+
+The 'naming' options idea implies that options and ways of combining options can be stored within the configuration for a container. Perhaps there is additionally a runtime API that allows creation of deltas.
+
+### JSON options
+
+One way of dealing with the 'bag of options' issue would be to make JSON the runtime representation of options. Merging JSON is, at the most basic level, straightforward. For certain options it may make sense to describe adding, merging or replacing; we could add this control via a key prefix.
+
+```JSON
+{
+    "includePaths" : ["somePath", "another/path"],
+    "someValue" : 10,
+    "someEnum" : "enumValue",
+    "someFlags" : 12
+}
+```
+
+As an example, merging the delta
+
+```JSON
+{
+    "+includePaths" : ["yet/another"],
+    "intValue" : 20,
+    "-someValue" : null,
+    "+someFlags" : 1
+}
+```
+
+produces
+
+```JSON
+{
+    "includePaths" : ["somePath", "another/path", "yet/another"],
+    "someEnum" : "enumValue",
+    "someFlags" : 13,
+    "intValue" : 20
+}
+```
+
+It's perhaps also worth pointing out that using JSON as the representation provides a level of compatibility: things that are not understood can be ignored. It is human readable and understandable. We only need to convert the final JSON into the options that are then finally processed.
+
+One nice property of a JSON representation is that it is potentially the same for processing and hashing.
+
+### Producing a hash from JSON options
+
+One approach would be to just hash the JSON, if that is the representation. We might want a pass to filter down to just the known fields, and perhaps some other sanity processing:
+
+* Filtering
+* Ordering - the order of fields is generally not the order we want to combine. One option would be to order keys alphabetically.
+* Handling values that can have multiple representations (if we allow an enum as int or text, we need to hash with one or the other)
+* Duplicate handling
+
+Alternatively the JSON could be converted into a native representation and that hashed. The problem with this is that, without a lot of care, the hash will not be stable with respect to small changes in the native representation.
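The prefix-based merge and the stable-hash idea above can be sketched together (Python, purely illustrative; the per-type merge rules - list append for `+` on lists, bitwise OR for `+` on integer flags - are assumptions, since the text only shows one worked example):

```python
import hashlib
import json

def merge_options(base, delta):
    """Merge a delta onto a base options dict.

    '+key' appends to a list, or ORs into an integer flag field;
    '-key' removes the key; a plain key replaces the value.
    (These per-type rules are assumptions for this sketch.)
    """
    out = dict(base)
    for key, value in delta.items():
        if key.startswith("+"):
            name = key[1:]
            if isinstance(out.get(name), list):
                out[name] = out[name] + list(value)
            else:
                out[name] = out.get(name, 0) | value
        elif key.startswith("-"):
            out.pop(key[1:], None)
        else:
            out[key] = value
    return out

def hash_options(options):
    # Serialize with sorted keys so the hash is stable regardless of
    # the order in which options were declared or merged.
    canonical = json.dumps(options, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

base = {"includePaths": ["somePath", "another/path"],
        "someValue": 10, "someEnum": "enumValue", "someFlags": 12}
delta = {"+includePaths": ["yet/another"], "intValue": 20,
         "-someValue": None, "+someFlags": 1}
merged = merge_options(base, delta)
# merged == {"includePaths": ["somePath", "another/path", "yet/another"],
#            "someEnum": "enumValue", "someFlags": 13, "intValue": 20}
```

The alphabetical key ordering in `hash_options` is what makes the hash independent of merge order, which is the stability property discussed above.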
+
+Another advantage of using JSON for hash production is that it is something that could be performed fairly easily in user space.
+
+Two issues remain significant with this approach:
+
+* Filtering - how?
+* Handling multiple representations for values
+
+Filtering is not trivial - it's not a question of just specifying what fields are valid, because doing so requires context. In essence it is necessary to describe types, and then describe where in a hierarchy a type is used.
+
+This could itself be achieved with JSON. For example:
+
+```
+{
+    "types" : {
+        "SomeType" : {
+            "kind" : "struct",
+            "derivesFrom" : "...",
+            "fields" : [
+                ["name", "type", "default"]
+            ]
+        },
+        "MainOptions" : {
+            "..." : "..."
+        }
+    },
+    "structure" : [ "MainOptions" ]
+}
+```
+
+When traversing, we use the 'structure' to work out where a type is used.
+
+This is workable, but adds significant additional complexity.
+
+The issue around different representations could also be addressed by using the information in the description to convert into some canonical form.
+
+The structure could potentially be generated via reflection information.
+
+## Native bag of options
+
+Options could be represented via an internal struct on which a hash can be performed.
+
+Input can be described as "deltas" to the current options. The final options are the combination of all the deltas - which would produce the final options structure for use. The hash of all of the options is the hash of the final structure.
+
+How are the deltas described?
+
+The in-memory representation is not trivial, in that if we want to add a struct to a list we would need a way to describe this.
+
+Whilst at runtime a 'field' could be uniquely identified by an offset, within a file format representation it would need to be identified by something that works across targets and is resistant to change in contents. That implies it should be a name.
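A minimal sketch of applying name-identified deltas to a native options structure (the dataclass and its field names are hypothetical, standing in for reflection over the real option types):

```python
from dataclasses import dataclass, field

@dataclass
class Options:
    # Hypothetical option fields, standing in for the real options struct.
    listField: list = field(default_factory=list)
    someValue: int = 0

def apply_delta(options, delta):
    # Fields are identified by *name* rather than offset, so the mapping
    # survives layout changes in the native representation.
    for key, value in delta.items():
        if key.startswith("+"):
            getattr(options, key[1:]).append(value)   # add an entry to a list field
        else:
            setattr(options, key, value)              # replace a scalar field
    return options

opts = apply_delta(Options(),
                   {"+listField": {"structField": 10, "anotherField": 20},
                    "someValue": 5})
# opts.listField == [{"structField": 10, "anotherField": 20}]; opts.someValue == 5
```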
+
+If we ensure that all types involved in options are JSON serializable via reflection, this provides a way for code to traffic between, and manipulate, the native types.
+
+How do we add a structure to a list?
+
+```JSON
+{
+    "+listField" : { "structField" : 10, "anotherField" : 20 }
+}
+```
+
+The question is then how to implement this in native code. It looks like it is workable with the functionality already available in RttiUtil.
+
+## Slang's Component System
+
+Slang has a component system that can be used for combining options to produce a compilation. An argument can be made that it should be part of the hashing representation, as it is part of compilation. The `IComponentType` interface has several derived interfaces:
+
+* IEntryPoint
+* ITypeConformance
+* IModule
+
+Components can be combined into composites through `createCompositeComponentType`, which describes aspects of how the combination takes place.
+
+If the components were serializable (say as JSON), we could describe a compilation as a combination of components. If components are named, a concatenation of names could name a compilation.
+
+It doesn't appear as if there is a way to more finely control the application of component types. For example, if there was a desire to change the optimization option, it would appear to remain part of the ICompileRequest (it's not part of a component). This implies that this mechanism as it stands, whilst allowing composition, doesn't provide the more nuanced composition. Additional component types could perhaps be added which would add such control.
+
+Perhaps having components is not necessary as part of the representation: the component system is a mechanism for achieving a 'bag of options', and so we can get the same effect by using that mechanism without components.
+
+Discussion: Container
+=====================
+
+## Manifest or association
+
+A typical container will contain kernels - in effect, blobs. The blobs themselves, or the blob names, are not going to be sufficient to express the amount of information that is necessary to meet the goals laid out at the start of this document. Some extra information may be user supplied. Some extra information might be needed to know how to classify different kernels. Therefore it is necessary to have some system to handle this metadata.
+
+As previously discussed, the underlying container format is a file system. Some limited information could be inferred from the filename. For example, a file with a .spv extension is probably a SPIR-V blob. For richer metadata describing a kernel, something more is needed. One approach could be to have a 'manifest' that describes the contents of the container. Another approach would be to have a file associated with each kernel that describes its contents.
+
+Single Manifest Pros
+
+* Single file describes contents
+* Probably faster to load and use
+* Reduces the amount of extra files
+* Everything describing how the contents are to be interpreted is in one place
+
+Single Manifest Cons
+
+* Not possible to easily add and remove contents - requires editing of the manifest, or tooling
+  * Extra specialized tooling was deemed undesirable in the original problem description
+* Manifest could easily get out of sync with the contents
+
+Associated Files Pros
+
+* Simple
+* Can use normal file system tooling to manipulate
+* The contents of the container are implied by the contents of the file system
+  * Easier to keep in sync
+
+Associated Files Cons
+
+* Requires traversal of the container 'file system' to find the contents
+* Might mean a more 'loose' association between results
+
+Another possible way of doing the association is via a directory structure. Each directory might contain the 'manifest' for that directory.
+
+Given that we want the format to represent a file system, and that we want it to be easy and intuitive to manipulate the representation, using a single manifest is probably ruled out. It remains to be seen which is preferable in practice, but it seems likely that using 'associated files' is the way to go.
+
+## How to represent data
+
+As previously discussed, unless there is a very compelling reason not to, we want to use representations that are open standards and easy to use. We also need such representations to be resilient to changes. It is important that file formats are human readable, or easily changeable into something that is human readable. For these reasons, JSON seems to be a good option for our main 'metadata' representation. Additionally, Slang already has a JSON system.
+
+If it was necessary to have metadata stored in a more compressed format, we could also consider supporting [BSON](https://en.wikipedia.org/wiki/BSON). Conversion between BSON and JSON can be done quickly and simply. BSON is a well-known and used standard.
+
+Discussion: Container Layout
+============================
+
+We probably want
+
+* Global configuration information - describing how names map to contents
+* Configuration that is compiler specific
+  * The format could support configuration for different compilers
+* The 'associated file' style for additional information attached to a result
+
+```
+config
+config/global.json
+config/slang.json
+config/dxc.json
+source/
+source/some-source.slang
+source/some-header.h
+```
+
+The `source` path holds all the unique source used during a compilation. This is the 'deduped' representation. Any include hierarchy is lost. Names are generated such that they remain the same as the original where possible, but are made unique if not. The 'dependency' file for a compilation specifies how the source as included maps to the source held in the source directory.
+The source held in the repository like this provides a way to repeat compilations from source, but isn't the same as the source hierarchy for compilation, and is typically a subset.
+
+`config/global.json` holds configuration that applies to the whole of the container - in particular how names map to the container contents, say the use of directories, or naming concatenation order.
+`config/slang.json` holds how names map to option configuration that is specific to Slang.
+
+We may want to have some config that applies to all the different compilers.
+
+We may want to use the 'name' mechanism for some options, but commonly changing items such as the translation unit source name and entry point name can be passed directly (and used as part of the name).
+
+Let's say we have a config that consists of `target` and `configuration`, and we use the source name and entry point directly. We could have a configuration that expresses this as a location:
+
+```JSON
+{
+    "keyPath" : "$(target)/$(filename)/$(entry-point)-$(configuration)"
+}
+```
+
+Let's say we compile `thing.slang`, with entry point `computeMain` and options
+
+```
+target: vk
+configuration: release
+```
+
+We end up with
+
+```
+vk/thing/computeMain-release.spv
+vk/thing/computeMain-release.spv-info.json
+vk/thing/computeMain-release.spv-diagnostics.json
+vk/thing/computeMain-release.spv-layout.json
+vk/thing/computeMain-release.spv-dependency.json
+```
+
+`-info.json` holds the detailed information about what is in the SPIR-V, both to identify the artifact and for the 'system' in general. This includes:
+
+* Artifact type
+* The combination of options (which might be *more* than in the path)
+* Hashes/other extra information
+
+`-dependency.json` is a mapping of source 'names', as seen during compilation, to the files in the source directory.
+
+Other items associated with the main 'artifact' - typically stored as 'associated' in an artifact - are optional and could be deleted.
+
+Having an extension on the associated types is perhaps not strictly necessary.
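The `$(...)` substitution in `keyPath` above might be sketched as follows (Python, purely illustrative):

```python
import re

def expand_key_path(template, values):
    # Replace each $(name) occurrence in the template with its value.
    return re.sub(r"\$\(([\w-]+)\)", lambda m: values[m.group(1)], template)

path = expand_key_path(
    "$(target)/$(filename)/$(entry-point)-$(configuration)",
    {"target": "vk", "filename": "thing",
     "entry-point": "computeMain", "configuration": "release"})
# path == "vk/thing/computeMain-release"
```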
+Doing so makes it clearer what the items are, from a usability point of view. If this data can be represented in multiple ways - say JSON and BSON - it also makes clear which it is.
+
+Discussion: Interface
+=====================
+
+There are perhaps two ends of the spectrum of how an interface might work. On one end the interface is a 'Slang-like' compiler interface, with the most extreme version being that it *is* the Slang compiler interface. Having an interface like this means
+
+* There is direct access to all options
+* If your application already uses the Slang interface, it can just be switched out for the cache
+* It has full knowledge of the compilation - making identification unambiguous, and trivially allowing fallback to actually doing a compilation
+* It trivially supports more unusual aspects of the API, such as the component system
+* There is no (or very little) new API; the shader container API *is* the Slang API
+
+It also means
+
+* The API has a very large surface area
+* It works at the level of detail of the API
+* It does not provide an application-level indirection to some more meaningful naming/identification
+* It is tied to the Slang compiler - so can't be seen as an interface to 'shader containers' more generally
+* Naming will almost certainly need to include a hash
+* The hash will be hard to produce independently (and will be hard to calculate anyway)
+
+More significantly
+
+* Higher requirement for source
+  * Could store hashes of source seen
+  * If there is source injection, does this even make sense?
+  * Could store modules as Slang IR
+* For generated source it requires the source
+* All source does in general need to be hashed - as paths do not indicate uniqueness
+* How is this obfuscated? The amount of information needed is *all of the settings*.
+
+At the other end of the spectrum, the interface could be akin to passing in a set of user-configurable parameter "names" that identify the input 'options'.
+The most extreme form might look something like:
+
+```
+class IShaderContainer
+{
+    Result getArtifact(const char* const* configNames, Count configNamesCount, const Options& options, IArtifact** outArtifact);
+    Result getOrCreateArtifact(const char* const* configNames, Count configNamesCount, const Options& options, IArtifact** outArtifact);
+};
+```
+
+Q: Perhaps we don't return an IArtifact, because the IArtifact interface is not simple enough. Maybe it returns a blob and an ArtifactDesc?
+Q: We could simplify the IArtifact interface by moving to IArtifactContainer. Perhaps we should do this just for this reason?
+Q: Is there a way to access other information - diagnostics for example? With IArtifact that can be returned as associated data. We probably don't want to create it by default.
+Q: If we wanted to associate 'user data' with a result, how do we do that? It could just be JSON stored in the `-info`?
+Q: We could have a JSON-like interface for arbitrary data?
+
+The combination of the 'configNames' produces the key/paths within the container.
+
+It would probably be desirable to be able to create 'configNames' through an API. This would have to be Slang specific, and not part of this interface. The config system could be passed into the construction of the container. Doing so might provide all the information needed to map names to how to compile something.
+
+This interface may be a little too abstract, and perhaps should have parameters for common types of controls.
+
+As previously touched on, it may be useful to pass in configuration that is *not* part of the key name, to override compilation behavior.
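In spirit, the get-or-create flow of such an interface could be sketched like this (Python rather than the C++ above, purely to show the behaviour; `compile_fn` and all names here are hypothetical):

```python
class ShaderContainer:
    """Sketch of a container keyed purely by the combination of config names."""

    def __init__(self, compile_fn):
        self._store = {}            # key path -> artifact blob
        self._compile = compile_fn  # fallback used on a cache miss

    def get_artifact(self, config_names):
        return self._store.get("/".join(config_names))

    def get_or_create_artifact(self, config_names, options=None):
        key = "/".join(config_names)
        if key not in self._store:
            # 'options' may override compilation behavior without
            # contributing to the key, as discussed above.
            self._store[key] = self._compile(config_names, options)
        return self._store[key]

container = ShaderContainer(
    lambda names, opts: b"kernel-for-" + "/".join(names).encode())
blob = container.get_or_create_artifact(["vk", "thing", "computeMain-release"])
# blob == b"kernel-for-vk/thing/computeMain-release"
```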
+
+This style means
+
+* Naming is trivial
+* Hashing is often not necessary
+* Issues such as 'generated source' are pushed to user space
+* It is user configurable
+* The main interface is very simple and small
+* It can be used with other compilers - because the interface is not tied to Slang in any way
+* It implies the container format itself can be used trivially
+* It is human/application centered
+* Hashing of source/options is still possible, for a variety of purposes, but is not a *requirement*, as it doesn't identify a compilation
+  * Meaning simpler/less stable hashing might be fine
+
+With this style it is implied that the identification of a unique combination is a user-space problem: for example, that the source is static in general, and if not, that identification of generated source is a user-space problem. It's perhaps important to note that mechanisms previously discussed - such as hashing the source - can still be useful and used. The hashing of source could be used to identify, in a development environment, that a recompilation is required. Or an edit of source could be made, and a single command could automatically update all contents to which it is applicable. These are more advanced features, and are not necessary for a user-space implementation, which typically does not require the capability.
+
+More problematically
+
+* It doesn't provide a runtime cache that 'just works', for example just using the Slang API
+* It needs to provide a way, given a combination of config names, to produce the appropriate settings
+  * If it is just a delivery mechanism this isn't a requirement
+* It probably needs both an API and 'config' mechanisms to describe options
+* The indirection may lose some control
+
+All things considered, based on the goals of the effort it seems to make more sense to have an interface that is in the named config style.
+This is because
+
+* It allows trivial third-party implementation
+* It works with other compilers (important if it's to work as some kind of standard)
+* It provides an easy to understand mapping from input to the contents of the cache
+* It can use more advanced features (like source hashing) if desired
+
+How config options are described or combined may be somewhat complicated, but that is not necessary to use the system, and allows different compilers to implement it however is appropriate.
+
+Discussion: Deduping Source
+===========================
+
+When compiling shaders, typically much of the source is shared. Unfortunately it is not generally possible to just save the 'used source', because some source can be generated on demand. One way this is already done by users is to use a specialized include handler that will inject the necessary code.
+
+It is therefore not generally possible to identify source by path, or by unique identity (as used by the Slang file system interface).
+
+It is also the case that compilations can be performed where the source is passed by contents, and the name is not set, or not unique.
+
+The `slang-repro` system already handles these cases, and outputs a map from the input path to the potentially 'uniquified' name within the repro.
+
+You could imagine a container holding a folder of source that is shared between all the kernels. In general this would additionally require, for each kernel, a map from names to uniquified files.
+
+In the `slang-repro` mechanism the source is actually stored in a 'flat' manner, with the actual looked-up paths stored within a map for the compilation. It would be preferable if the source could be stored in a hierarchy similar to the file system it originates from. This would be possible for sources that are on a file system, but would in general lead to deeper and more complex hierarchies contained in the container.
+
+Including source provides a way to distribute a 'compilation', much like the `slang-repro` file.
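A sketch of the uniquification described above: identical contents are stored once, and colliding base names get a distinguishing prefix. (The exact naming scheme here is an assumption; the real `slang-repro` scheme may differ.)

```python
import posixpath

def dedupe_sources(sources):
    """Map each source path to a unique flat name in the container.

    sources is a list of (path, contents); files with identical contents
    share one stored entry, and colliding basenames are uniquified.
    Returns (path -> stored-name map, stored-name -> contents store).
    """
    by_contents, used, mapping, store = {}, set(), {}, {}
    for path, contents in sources:
        if contents in by_contents:           # identical file: reuse entry
            mapping[path] = by_contents[contents]
            continue
        name = posixpath.basename(path)
        stored, n = name, 1
        while stored in used:                 # name clash: uniquify
            stored, n = f"{n}-{name}", n + 1
        used.add(stored)
        by_contents[contents] = stored
        mapping[path] = stored
        store[stored] = contents
    return mapping, store

mapping, _ = dedupe_sources([
    ("a/common.h", "// v1"),
    ("b/common.h", "// v2"),
    ("c/common.h", "// v1"),   # same contents as a/common.h
])
# mapping: a/common.h -> common.h, b/common.h -> 1-common.h, c/common.h -> common.h
```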
+It may also be useful to allow a shader to be recompiled on a target. This could be for many reasons - allowing support for future platforms, allowing recompilation to improve performance, or allowing compilation to happen on client machines for rare scenarios on demand.
+
+We may want to have tooling such that directories of source can be specified and added to the deduplicated source library.
+
+We may also want to have configuration information that describes how the contents map to search paths. It might be useful to have only the differences for lookup stored for a compilation, and some - or perhaps multiple - configuration files that describe the common cases.
+
+Discussion: Artifact With Runtime Interface
+===========================================
+
+It should be noted that *by design* the children of an `IArtifactContainer` are *not* a mechanism that automatically updates some underlying representation, such as files on the file system. Once an IArtifactContainer has been expanded, it allows for manipulation of the children (for example adding and removing). The typical way to produce a zip from an artifact hierarchy is to call a function that writes it out as such. This is not something that happens incrementally.
+
+For an in-memory caching scenario this choice works well. We can update the artifact hierarchy as needed and all is good.
+
+In terms of just saving off the whole container - this is also fine, as we can have a function that, given a hierarchy, saves off the contents into an ISlangMutableFileSystem, such that it ends up on the file system or compressed.
+
+If we want the representation to be *synced* to some backing store, this presents some problems. It seems this most logically happens as part of the compilation interface implementation. The Artifact system doesn't need to know anything about such issues directly.
+
+Once a compilation is complete, an implementation could save the result in the Artifact hierarchy, and write out a representation to disk from that part of the hierarchy. For some file systems, doing this on demand is probably not a great idea. For example, the Zip file system does not free memory directly when a file is deleted. Perhaps as part of the interface there needs to be a way to 'flush' cached data to the backing store. Lastly, there could be a mechanism to write out the changes (or the new archive).
+
+Discussion: Other
+=================
+
+It would be useful to have tooling that makes the following possible.
+
+## Generating the container
+
+* Generate updated kernels automatically offline
+  * For example when a source file changed
+  * For example when a config file changed
+  * Just force rebuilding the whole container
+* Specify the combinations that are wanted in some offline manner
+  * Perhaps compiling in parallel
+  * Perhaps noticing aspects such that work can be shared
+
+## Obfuscation
+
+* Most simply, this could use a hash of a 'key'.
+* Or, if the desire is to obfuscate at the application level, a hash of the *names* as input could be used.
+
+## Stripping containers
+
+At a minimum there needs to be a mechanism to strip out information that is not needed for use on a target.
+
+There probably also needs to be a way to ensure that items such as type names, source names, entry point names and compile options are not trivially contained in the format, as their existence could leak sensitive information about the specifics of a compilation.
+
+## Indexing
+
+No optimized indexing scheme is described as part of this proposal.
+
+Indexing is probably something that happens at the 'runtime interface' level. The index can be built up using the contents of the file system.
+
+No attempt at an index is made as part of the container, unless later we find scenarios where this is important.
+Not having an index means that the file system structure itself describes its contents, and allows manipulation of the container's contents without manipulation of an index or some other tooling.
+
+## Slang IR
+
+It may be useful for a representation to hold the `slang-ir` of a compilation. This would allow some future proofing of the representation, because it would allow support for newer versions of Slang and downstream compilers without distributing source.
+
+Related Work
+============
+
+* Shader cache system as part of gfx (https://github.com/lucy96chen/slang/tree/shader-cache)
+* Lumberyard [shader cache](https://docs.aws.amazon.com/lumberyard/latest/userguide/mat-shaders-custom-dev-cache-intro.html)
+* Unreal [FShaderCache](https://docs.unrealengine.com/5.0/en-US/fshadercache-in-unreal-engine/)
+* Unreal [Derived Data Cache - DDC](https://docs.unrealengine.com/4.26/en-US/ProductionPipelines/DerivedDataCache/)
+* Microsoft [D3DSCache](https://github.com/microsoft/DirectX-Specs/blob/master/d3d/ShaderCache.md)
+
+Lumberyard uses the zip format for its '.pak' format.
+
+Microsoft D3DSCache provides a binary keyed key-value store.
+
+## Gfx
+
+Gfx has a runtime shader cache based on `PipelineKey`, `ComponentKey` and `ShaderCache`. ShaderCache is a key-value store.
+
+A key for a pipeline is a combination of
+
+```
+    PipelineStateBase* pipeline;
+    Slang::ShortList specializationArgs;
+```
+
+A `ShaderComponentID` can be created on the ShaderCache from
+
+```
+    Slang::UnownedStringSlice typeName;
+    Slang::ShortList specializationArgs;
+    Slang::HashCode hash;
+```
+
+For reflected types, a type name is generated if specialized.
+
+The shader cache can be thought of as being parameterized by the pipeline and associated specialization args. It appears to currently only support specialization types.
+
+Gfx does not appear to support any serialization/file representation.
+
+Proposed Approach
+=================
+
+Based on the goals described in the introduction, the proposed approach is:
+
+* Use a collection of named options to describe a compilation
+  * Requires a mechanism for combining
+  * Have some additional symbols (such as the `+` field name prefix described elsewhere) to describe how options should be applied
+* The meaning of names can be described within configuration and through an API
+  * The API might be compiler specific
+* Some names contribute to the key, whilst others do not
+  * Non-inclusion in the key allows customization for a specific result without a key change
+* Some options within a configuration can use standard names; others will need to be compiler specific
+* It is probably easiest to use a native representation for combining
+  * Using the collection-of-names approach makes hash stability, and hashes in general, less important
+* Use JSON/BSON as the format for configuration files
+  * It is possible to have some options defined that *aren't* part of the key name
+  * The actual combination can be stored along with products, such that the combination can be recreated, or an inconsistency detected
+* Use JSON-to-native conversion to produce native types that can then be combined
+* Have some standard ways to generate names for standard scenarios
+  * Such as using the input source name as part of the key
+* Use associated files (not a global manifest), to allow easy manipulation/tooling
+* Source will in general be deduped, with a compilation describing where its source originated
+  * This is similar to how repro files work
+* Some names will be automatically available by default
+* Ideally a 'non-configured' (or default-configured) cache can work for common usage scenarios
+
+For the runtime cache scenario this all still works.
+If an application wants a runtime cache that is memory based and works transparently (i.e. just through the use of the Slang API), this is of course possible, and its output can be made compatible with the format. It will be fragile to Slang API changes, and probably not usable outside of the Slang ecosystem.
+
+Alternatives Considered
+-----------------------
+
+Discussed elsewhere.
+
+## Issues On Github
+
+* Support low-overhead runtime "shader cache" lookups [#595](https://github.com/shader-slang/slang/issues/595)
+* Compilation id/hash [#2050](https://github.com/shader-slang/slang/issues/2050)
+* Support a simple zip-based container format [#860](https://github.com/shader-slang/slang/issues/860)
+

From c5f817497d216e315dcfd27b54dd93f6df162c02 Mon Sep 17 00:00:00 2001
From: Anders Leino
Date: Fri, 7 Feb 2025 16:25:25 +0200
Subject: [PATCH 2/2] Add .github/CODEOWNERS file, just like in slang repo

---
 .github/CODEOWNERS | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 .github/CODEOWNERS

diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
new file mode 100644
index 0000000..5493e5a
--- /dev/null
+++ b/.github/CODEOWNERS
@@ -0,0 +1 @@
+/proposals @tangent-vector @csyonghe @expipiplus1
\ No newline at end of file