Differentiate between signed and unsigned types #62

Closed
sitio-couto opened this issue Apr 18, 2023 · 8 comments

Comments

@sitio-couto
Collaborator

CIR Integer types are built without any sign information:

// FIXME: break this in s/u and also pass signed param.
ResultType =
    Builder.getIntegerType(static_cast<unsigned>(Context.getTypeSize(T)));

@bcardosolopes
Member

Yep, that's the status quo, and this is also related to #5

Both are something we're gonna have to tackle sooner rather than later anyway, in case you wanna add to your list!

@sitio-couto
Collaborator Author

@bcardosolopes regarding #5, what is the level of granularity we are looking for?

Does a cir.int and cir.float with arbitrary sizes suffice?
Or would a type-per-keyword scheme (cir.char, cir.short, ...) be preferable?

I suggest we mirror MLIR's built-in dialect types (single int type with arbitrary size and one float type per size), since it already tracks signedness, then add a few qualifier attributes to mark const and volatile types.

@bcardosolopes
Member

bcardosolopes commented Apr 25, 2023

Does a cir.int and cir.float with arbitrary sizes suffice? Or would a type-per-keyword scheme (cir.char, cir.short, ...) be preferable?

Tracking arbitrary sizes sounds good enough.

I suggest we mirror MLIR's built-in dialect types (single int type with arbitrary size and one float type per size), since it already tracks signedness

Signedness is pretty important because we want code analysis writers to be able to detect things like integer overflows and whatnots. If we can take advantage of the underlying in-tree primitive types to represent that, all we would need is an extra getter method for C/C++ specific signed/unsigned queries.
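To illustrate why an analysis needs the sign, here is a minimal, self-contained sketch. It is not CIR or MLIR code: `IntTypeInfo` and `addOverflows` are hypothetical names introduced only for this example, and the sketch assumes widths of at most 32 bits and operands that are already representable in the given type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an integer type that records width and signedness.
struct IntTypeInfo {
  unsigned width; // bit width; this sketch assumes 1..32
  bool isSigned;  // true for signed, false for unsigned
};

// Returns true when a + b is not representable in `ty`.
// Assumes a and b are themselves representable in `ty`.
inline bool addOverflows(IntTypeInfo ty, std::int64_t a, std::int64_t b) {
  assert(ty.width >= 1 && ty.width <= 32 && "sketch handles 1..32 bits only");
  const std::int64_t sum = a + b; // exact: both operands fit in 32 bits
  if (ty.isSigned) {
    const std::int64_t max = (std::int64_t{1} << (ty.width - 1)) - 1;
    const std::int64_t min = -max - 1;
    return sum < min || sum > max;
  }
  const std::int64_t max = (std::int64_t{1} << ty.width) - 1;
  return sum < 0 || sum > max;
}
```

With the same operands, `addOverflows(IntTypeInfo{32, true}, 2147483647, 1)` reports an overflow while `addOverflows(IntTypeInfo{32, false}, 2147483647, 1)` does not. A type that only carries a width cannot express that distinction.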

Implementing primitive types while tracking signedness would be step (1).

Step (2): we should also consider adding an optional clang::Type or similar (just like we keep RecordDecls around, wrapped in an attribute, for cir.struct). It's possible some analysis might wanna check uses of (or lack of) size_t, which we know is an alias for a primitive type, but we don't wanna create a new type for it.

then add a few qualifier attributes to mark const and volatile types.

This brings up an interesting point: qualifiers in clang are not part of the type. I believe the intent was to optimize memory usage by not creating extra types every time there's a qualifier variation (and it possibly also helps to implement deductions that drop qualifiers, etc). We should probably handle qualifiers as part of step (2) or a new step (3), so we have some time to think/discuss while we make progress. Thoughts?
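The idea of keeping qualifiers next to the type rather than inside it can be sketched as follows. This is a standalone illustration under stated assumptions, not clang's or CIR's actual implementation: `IntType`, `Qualifier`, `QualifiedType`, and `sameUnqualifiedType` are all hypothetical names.

```cpp
#include <cassert>

// Hypothetical canonical integer type; one object per (width, signedness).
struct IntType {
  unsigned width;
  bool isSigned;
};

// Qualifier bits stored next to the type, not inside it.
enum Qualifier : unsigned {
  QualNone = 0,
  QualConst = 1u << 0,
  QualVolatile = 1u << 1
};

// Hypothetical qualified-type handle: a pointer to the shared type plus bits.
struct QualifiedType {
  const IntType *type;
  unsigned quals;

  bool isConst() const { return (quals & QualConst) != 0; }
  bool isVolatile() const { return (quals & QualVolatile) != 0; }

  // Adding qualifiers only sets bits; the underlying type object is reused.
  QualifiedType withQualifiers(unsigned extra) const {
    assert((extra & ~(QualConst | QualVolatile)) == 0 && "unknown qualifier");
    return QualifiedType{type, quals | extra};
  }

  // Dropping qualifiers is a bit operation; no new type object is needed.
  QualifiedType unqualified() const { return QualifiedType{type, QualNone}; }
};

// Two handles denote the same underlying type if they share the type object.
inline bool sameUnqualifiedType(QualifiedType a, QualifiedType b) {
  return a.type == b.type;
}
```

In this layout a `const` variant and its plain counterpart point at the same `IntType` object, so no extra type is created per qualifier combination and dropping qualifiers never requires a lookup.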

@sitio-couto
Collaborator Author

@bcardosolopes,

Regarding step 1, using the mlir::IntegerType and mlir::FloatType built-in types should suffice.

If we can take advantage of the underlying in-tree primitive types to represent that, all we would need is an extra getter method for C/C++ specific signed/unsigned queries.

The mlir::IntegerType can track the width of an integer as well as whether it is signed/unsigned/signless (See sitio-couto@762cd50).

The built-in Floating point types seem to cover all C/C++ primitives as well.

With this in mind, should we use MLIR's built-in type for C/C++ primitives instead of custom CIR types?
I'm not sure how we would benefit from a custom cir.int/cir.float otherwise.

@bcardosolopes
Member

The mlir::IntegerType can track the width of an integer as well as whether it is signed/unsigned/signless (See sitio-couto@762cd50).

The built-in Floating point types seem to cover all C/C++ primitives as well.

Yeah, I know, it's pretty attractive (I've been there).

With this in mind, should we use MLIR's built-in type for C/C++ primitives instead of custom CIR types? I'm not sure how we would benefit from a custom cir.int/cir.float otherwise.

I still believe we should wrap them so we can customize as we see fit (e.g. adding CIR-specific attributes that won't be dropped by random passes) and shield the rest of CIR from in-tree MLIR changes. For example, if they decide at some point that the types should be part of a specific dialect, we only have to change our implementation in one place. It will also make our lives easier when adding qualifiers (be it by incorporating clang types with specific attributes or adding our own notion of qualifiers).
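A minimal sketch of the wrapping argument, assuming nothing about the real MLIR or CIR APIs: `builtin_sketch::IntegerType` is a made-up stand-in for a framework-owned type (it is not `mlir::IntegerType`), and `cir_sketch::IntType` is a hypothetical dialect-owned wrapper. The point is only that the rest of the code talks to the wrapper's getters.

```cpp
#include <cassert>

// Stand-in for a framework-owned built-in type (NOT the real mlir::IntegerType).
namespace builtin_sketch {
enum class Signedness { Signless, Signed, Unsigned };
struct IntegerType {
  unsigned width;
  Signedness signedness;
};
} // namespace builtin_sketch

// Hypothetical dialect-owned wrapper exposing C/C++-flavoured queries.
namespace cir_sketch {
class IntType {
public:
  IntType(unsigned width, bool isSigned)
      : impl{width, isSigned ? builtin_sketch::Signedness::Signed
                             : builtin_sketch::Signedness::Unsigned} {
    assert(width > 0 && "integer types need at least one bit");
  }

  unsigned getWidth() const { return impl.width; }
  bool isSigned() const {
    return impl.signedness == builtin_sketch::Signedness::Signed;
  }
  bool isUnsigned() const {
    return impl.signedness == builtin_sketch::Signedness::Unsigned;
  }

private:
  // The only place the built-in type is named; if the underlying type moves
  // or changes, only this class needs to be updated.
  builtin_sketch::IntegerType impl;
};
} // namespace cir_sketch
```

Callers would write `cir_sketch::IntType(32, true).isSigned()` and never touch the underlying representation, which is what keeps a change to the built-in type confined to the wrapper.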

@lanza
Member

lanza commented Apr 26, 2023

I still believe we should wrap them so we can customize as we see fit (e.g. adding CIR-specific attributes that won't be dropped by random passes) and shield the rest of CIR from in-tree MLIR changes. For example, if they decide at some point that the types should be part of a specific dialect, we only have to change our implementation in one place. It will also make our lives easier when adding qualifiers (be it by incorporating clang types with specific attributes or adding our own notion of qualifiers).

Yup. As a first principle, we want to avoid being coupled to upstream MLIR changes. Rebasing against MLIR is an absolute nightmare. And when they change the functionality of dialects/types/etc., we become upwards exposed to surprise behavioral differences. I doubt mlir::IntegerType is changing much going forward, but AFAIK there's no guarantee that it won't.

At a high level, we use MLIR as an infrastructure for writing an IR and not as a tree of dialects that we can use.

It would be fine as an intermediate step if we used mlir's type, but that just pushes the work forward to some future date.

sitio-couto added a commit to sitio-couto/clangir that referenced this issue May 13, 2023
Updates CodeGen type converter and emitters to handle
sign information of integer values. Lowering is also
updated to convert const_arrays of signed types.
Most tests were also updated since MLIR uses 's' and 'u'
prefixes on integer types to identify their signs.

Fix llvm#62
@sitio-couto
Collaborator Author

@bcardosolopes @lanza can you take a look at this draft:

#72

It implements a custom cir.int type and attribute to partially detach CIR from MLIR's built-in integers and track signedness information.

@sitio-couto
Collaborator Author

#72 merged

3 participants