Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated to ClangSharp 16.0 and added support for bitfields #1620

Merged
merged 7 commits into from
Jul 16, 2023

Conversation

mikebattista
Copy link
Contributor

Fixed #1601
Fixed #1392

Example struct with bitfields from #1392:

[Documentation("https://docs.microsoft.com/windows/win32/api/d3d12/ns-d3d12-d3d12_raytracing_instance_desc")]
public struct D3D12_RAYTRACING_INSTANCE_DESC
{
	public float[] Transform;

	[NativeBitfield("InstanceID", 0L, 24L)]
	[NativeBitfield("InstanceMask", 24L, 8L)]
	public uint _bitfield1;

	[NativeBitfield("InstanceContributionToHitGroupIndex", 0L, 24L)]
	[NativeBitfield("Flags", 24L, 8L)]
	public uint _bitfield2;

	public ulong AccelerationStructure;
}

Projections can parse the NativeBitfield attribute and provide friendly properties that get/set the appropriate bits.

@riverar
Copy link
Collaborator

riverar commented Jul 14, 2023

Are there docs for NativeBitfield? At a glance, not sure what the parameters mean. Would be good to see a test too.

@mikebattista
Copy link
Contributor Author

mikebattista commented Jul 14, 2023

It reflects the C++ construct: https://en.cppreference.com/w/cpp/language/bit_field.

The attribute encodes the name of the property, the offset of the bits within the shared _bitfield field it is attached to, as well as the number of bits that property consumes (length).

@tannergooding
Copy link
Member

I can have clangsharp explicitly emit it as named parameters if that helps, so we'd get:

-[NativeBitfield("InstanceID", 0L, 24L)]
+[NativeBitfield("InstanceID", offset: 0L, length: 24L)]

It would increase size on disk for the source files a bit, but maybe not significantly enough to matter.

@mikebattista
Copy link
Contributor Author

mikebattista commented Jul 14, 2023

You actually already do this in the C# you generate. I'd honestly prefer that you didn't qualify them explicitly though since with the way we wrap with our own attribute you're forcing us to use lowercase for the names where naturally I'd want to use uppercase.

        [NativeBitfield("InstanceID", offset: 0, length: 24)]
        [NativeBitfield("InstanceMask", offset: 24, length: 8)]
        public uint _bitfield1;

@tannergooding
Copy link
Member

I'd honestly prefer that you didn't qualify them explicitly though since with the way we wrap with our own attribute you're forcing us to use lowercase for the names where naturally I'd want to use uppercase.

Could you clarify? The generated signature is:

internal sealed partial class NativeBitfieldAttribute : Attribute
{
    private readonly string _name;
    private readonly int _offset;
    private readonly int _length;

    /// <summary>Initializes a new instance of the <see cref=\"NativeBitfieldAttribute\" /> class.</summary>
    /// <param name=\"name\">The name of the bitfield that was used in the native signature.</param>
    /// <param name=\"offset\">The offset of the bitfield that was used in the native signature.</param>
    /// <param name=\"length\">The length of the bitfield that was used in the native signature.</param>
    public NativeBitfieldAttribute(string name, int offset, int length)
    {
        _name = name;
        _offset = offset;
        _length = length;
    }

    /// <summary>Gets the length of the bitfield that was used in the native signature.</summary>
    public int Length => _length;

    /// <summary>Gets the name of the bitfield that was used in the native signature.</summary>
    public string Name => _name;

    /// <summary>Gets the offset of the bitfield that was used in the native signature.</summary>
    public int Offset => _offset;
}

So the actual properties are uppercase. Its the method parameter names which are lowercase, which is the standard for .NET

@mikebattista
Copy link
Contributor Author

The way I defined the equivalent attribute for metadata is below. I'm not sure if projections see the names of these constructor parameters. I assumed they did and would have liked to have offset and length be Offset and Length. The code you generate wouldn't compile though if I did that given the explicit lowercase names that you emit. If projections don't see these names, then it's less of a problem.

/// <summary>Defines the layout of a bitfield as it was used in the native signature.</summary>
[AttributeUsage(AttributeTargets.Field, AllowMultiple = true, Inherited = true)]
public class NativeBitfieldAttribute : Attribute
{    
    public NativeBitfieldAttribute(string Name, long offset, long length)
    {
    }
}

@riverar
Copy link
Collaborator

riverar commented Jul 14, 2023

It reflects the C++ construct: https://en.cppreference.com/w/cpp/language/bit_field.

The attribute encodes the name of the property, the offset of the bits within the shared _bitfield field it is attached to, as well as the number of bits that property consumes (length).

Sorry for the confusion. I'm familiar with bitfields; I was just asking about the attribute and the positional arguments. But I see the new docs now, thanks!

The way I defined the equivalent attribute for metadata is below. I'm not sure if projections see the names of these constructor parameters. I assumed they did and would have liked to have offset and length be Offset and Length. [...]

The positional constructor args (used in your impl.) are typically lowercase in .NET. I think I'd prefer to see them lowercase to align with that style, but it's just a nit at this point.

@mikebattista mikebattista merged commit 22d6ad9 into main Jul 16, 2023
7 checks passed
@mikebattista mikebattista deleted the mikebattista/clangsharp-16.0 branch July 16, 2023 22:27
@AArnott
Copy link
Member

AArnott commented Jan 11, 2024

To be very clear, for an 8-bit field, does an offset of 0 and length of 1 mean: field & 0x1 or field & 0x80?
In other words, is the offset counting the bits skipped from the MSB or the LSB?

@riverar
Copy link
Collaborator

riverar commented Jan 11, 2024

Good question. MSVC, of which this is modeled after, orders bitfields from low bit to high bit.

See also: https://learn.microsoft.com/cpp/cpp/cpp-bit-fields

@AArnott
Copy link
Member

AArnott commented Jan 12, 2024

Why are some of these bitfields signed integers? As bit fields, negative numbers don't make sense, yet many of them are signed, which makes declaring the bit fields harder since I mean to express them all as unsigned.

@riverar
Copy link
Collaborator

riverar commented Jan 12, 2024

C++ bit fields are geared towards specifying the size of the annotated data member. C# bit flags are typically attached to enums and geared towards representing true/false values that can be combined.

Negative integers are normal in C++ bitfields. It's useful when you're trying to squeeze a tiny range of numbers into a tiny package.

Example:

struct TvRemoteControlRequest {
  int specialControlCode: 3; // Can store eight integers, -4 to 3
  // ...
}

@riverar
Copy link
Collaborator

riverar commented Jan 12, 2024

Continuing this line of thinking, let's look at

public struct KERB_CLOUD_KERBEROS_DEBUG_DATA
{
	[NativeBitfield("EnabledByPolicy", 0L, 1L)]
	[NativeBitfield("AsRepCallbackPresent", 1L, 1L)]
	[NativeBitfield("AsRepCallbackUsed", 2L, 1L)]
	[NativeBitfield("CloudReferralTgtAvailable", 3L, 1L)]
	[NativeBitfield("SpnOracleConfigured", 4L, 1L)]
	[NativeBitfield("KdcProxyPresent", 5L, 1L)]
	[NativeBitfield("PublicKeyCredsPresent", 6L, 1L)]
	[NativeBitfield("PasswordKeysPresent", 7L, 1L)]
	[NativeBitfield("PasswordPresent", 8L, 1L)]
	[NativeBitfield("AsRepSourceCred", 9L, 8L)]
	public uint _bitfield;
}

One back-of-napkin implementation in C# could be to represent all the fields as the bitfield type uint and generate public properties that use a bit mask to keep the values in range and/or throw an ArgumentOutOfRange exception. (Again, we can probably devise better developer UX here if we spend more than a few minutes. 😅)

public struct KERB_CLOUD_KERBEROS_DEBUG_DATA
{
    private uint _enabledByPolicy;
    public uint EnabledByPolicy
    {
        get => _enabledByPolicy;
        set
        {
            //                ▼ Struct field bitfield size goes here at codegen time
            uint mask = (1 << 1) - 1;
            if ((value & ~mask) != 0)
            {
                throw new ArgumentOutOfRangeException();
            }
            _enabledByPolicy = value & mask;
        }
    }

    // ...
}

@AArnott
Copy link
Member

AArnott commented Jan 12, 2024

I've already implemented it in C# (as of last night), but now that you say negative integers are actually interesting, I may have implemented it incorrectly.
It's somewhat similar to what you describe, but in your example you're using uint, which is easy. It's the int case that I'm concerned about. Are you saying that when the field is signed, that each bitfield is supposed to be interpreted as a smaller signed integer as well? I had guessed that 3 bits from an int should be expressed as a byte (which is unsigned).

@riverar
Copy link
Collaborator

riverar commented Jan 12, 2024

@AArnott Yup, smaller signed integer. To support signed integers, you can rejigger the sample C# above as such:

public struct FLICK_DATA
{
    // [NativeBitfield("iFlickActionCommandCode", 0L, 5L)]

    int _iFlickActionCommandCode;
    public int FlickActionCommandCode
    {
        get => _iFlickActionCommandCode;
        set
        {
            int mask = (1 << 5) - 1;
            if (Math.Abs(value) > mask)
            {
                throw new ArgumentOutOfRangeException();
            }
            _iFlickActionCommandCode = value & mask;
        }
    }

    public int _bitfield;
}

Passing in a -1 would be stored as 31 in this case. If only we had a Span<bit>...

@AArnott
Copy link
Member

AArnott commented Jan 12, 2024

Thanks. Your samples don't demonstrate multiple properties sharing the same field. And while I could ignore that, the getter and setter you do define return the whole field or stomp on the whole field with the new value. That is not how I implemented it in CsWin32. Instead:

  • the getter returns only the masked value, bit-shifted to the LSB. So a mask of 00111000 returns its value at positions 00000111.
  • the setter verifies that the provided value fits within the 00000111 mask, bit shifts it to 00111000, and then alters the field so that it matches that bit-shifted value only in the masked area, and retains its prior value in all the rest of the bits.

Now regarding signed integers, some of these bitfields have length 1 or 2. I haven't verified yet whether any of these short lengths may appear in signed fields, but if they did, I'd have to question whether these are really intended to be signed. 1 bit length certainly cannot be, and it's hard to imagine 2 bits being dedicated to a signed integer either. Do you have any thoughts or experience with this combination (if it does in fact appear in the metadata)?

@riverar
Copy link
Collaborator

riverar commented Jan 13, 2024

That's fair, I couldn't spend much time on the samples. They are definitely missing the interaction with the _bitfield field.

A signed integer bitfield width of 1 can represent 0 and 1 or -1. (It appears in MSVC, this is treated as -1.)
A signed integer bitfield width of 2 can represent -2, -1, 0, and 1.

Some interesting structures with bitfields (out of ~504 total) to consider:

  • signed integer bitfield Windows.Win32.UI.Shell.SHELLSTATEW
  • signed integer bitfield with width of 1: Windows.Win32.System.Rpc.MIDL_STUB_MESSAGE

Also keep in mind the type used influences alignment and just a heads up for some Windows-specific packing considerations.

@tannergooding
Copy link
Member

A signed integer bitfield width of 1 can represent 0 and 1 or -1. (It appears in MSVC, this is treated as -1.)

This is a very important detail and can be tricky to get right. It definitely comes up for cases like BOOL : 1 as well, since BOOL is itself a typedef for int.

All signed bitfield extractions therefore require an extra shift, such as what ClangSharp generates for TerraFX.Interop.Windows here: https://source.terrafx.dev/#TerraFX.Interop.Windows/Windows/um/ShlObj_core/SHELLSTATEW.cs,bee2684bd69b4eb1 (win32metadata opts out of many of these helper APIs).

You'll note, for example, that fShowExtensions which is the second bitfield does its getter as ((_bitfield1 << 30) >> 31) and it's setter as _bitfield1 = (_bitfield1 & ~0x1) | ((BOOL)(value) & 0x1)

The logic ClangSharp uses to generate these helpers is here and supports both Windows and Unix conventions: https://source.clangsharp.dev/#ClangSharp.PInvokeGenerator/PInvokeGenerator.VisitDecl.cs,1781

This includes computing across bitfield width boundaries (i.e. you need 24 bits, but only have 23 bits left in the current field), accounting for signs and multiple backing types, etc.

@AArnott
Copy link
Member

AArnott commented Jan 16, 2024

Whoa. I had no idea how complicated this was going to be. Thanks for all the cautions and links to samples.

InformationOverloadMindBlownGIF

@AArnott
Copy link
Member

AArnott commented Jan 23, 2024

Fixes here: microsoft/CsWin32#1126

All reviews welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update to ClangSharp 16.0 Feature request: use get/set to deal with bitfields.
4 participants