Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mandatory FFI declarations proposal #184

Merged
merged 7 commits into from
Mar 30, 2021
Merged

Conversation

ergl
Copy link
Member

@ergl ergl commented Feb 19, 2021

Copy link
Member

@jemc jemc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, a very well-written RFC! 👏

I've provided my opinion on the unresolved questions below.

text/0000-force-declaration-of-ffi-functions.md Outdated Show resolved Hide resolved
text/0000-force-declaration-of-ffi-functions.md Outdated Show resolved Hide resolved
text/0000-force-declaration-of-ffi-functions.md Outdated Show resolved Hide resolved
@SeanTAllen
Copy link
Member

This is a great RFC. I'm in favor.

I'd like to see a couple bits of bookkeeping added...

  • listing of all files in the standard library that use FFI functions and will need to be updated. partially because this will be valuable to the implementer and partially because it might expose an issue we haven't considered
  • same listing for all pony libraries maintained by the ponylang organization

by including in the RFC (particularly the latter set), the RFC wouldn't be considered completed until all needed updates were done.

@ergl
Copy link
Member Author

ergl commented Feb 20, 2021

Thanks for the encouragement everyone, I've left some replies inline.

  • listing of all files in the standard library that use FFI functions and will need to be updated. partially because this will be valuable to the implementer and partially because it might expose an issue we haven't considered
  • same listing for all pony libraries maintained by the ponylang organization

@SeanTAllen I agree with that, I will see if I can include that in the RFC itself.

@ergl
Copy link
Member Author

ergl commented Feb 20, 2021

I've included removing return type declarations at call site, and included some tentative pointers to all the files in the ponyc repo and other pony libraries in the ponylang org that would need to be updated.

Someone else that is more familiar with the compiler could probably add more details to the "Compiler changes" section, I have taken a look at all the places that this RFC would need to change, but it's quite likely that I might have missed some things.

@ergl
Copy link
Member Author

ergl commented Feb 24, 2021

After investigating a bit, I've found that we might have to think more about if/how to write FFI declarations for intrinsic functions. We're currently overloading several intrinsics in the standard library, mainly to force LLVM to return either signed or unsigned integers, for example:

// unsigned.pony
primitive U8
  fun bit_reverse(): U8 => @"llvm.bitreverse.i8"[U8](this)

// signed.pony
primitive I8
    fun bit_reverse(): I8 => @"llvm.bitreverse.i8"[I8](this)

This gets trickier with 32/64 bit types, where an intrinsic function might be used not only to distinguish the sign, but also the type of integer when returning U32/U64/USize/ULong (same applies for I32/I64/ISize/ILong).

primitive U32
  fun bit_reverse(): U32 => @"llvm.bitreverse.i32"[U32](this)

primitive ULong
  fun bit_reverse(): ULong =>
    ifdef ilp32 or llp64 then
      @"llvm.bitreverse.i32"[ULong](this)
    else
      @"llvm.bitreverse.i64"[ULong](this)
    end

primitive USize
  fun bit_reverse(): USize =>
    ifdef ilp32 then
      @"llvm.bitreverse.i32"[USize](this)
    else
      @"llvm.bitreverse.i64"[USize](this)
    end

The Pony compiler doesn't allow you to specify different return types for different calls of the different FFI functions, unless LLVM can ensure that the different types:

  1. Have the same ABI size, or

  2. Can be safely casted to each other

However, these checks only occur during the code generation phase. If the user supplies a FFI declaration, the check against the returned type is done via is_eqtype, which only checks equality via subtyping rules.


Now, one could get around this via casting on the Pony side, like in the examples below. I don't know if this would impose any kind of overhead (since conversions are done via compile_intrinsic), and opens us up to mistakes if one is not careful (although this applies to the current situation, where we're forcing LLVM to cast types for us).

use @"llvm.bitreverse.i8"[U8](src: U8)
use @"llvm.bitreverse.i32"[U32](src: U32)
use @"llvm.bitreverse.i64"[U64](src: U64)

primitive I8
    fun bit_reverse(): I8 => @"llvm.bitreverse.i8"(this.u8()).i8()

primitive U8
  fun bit_reverse(): U8 => @"llvm.bitreverse.i8"(this)

primitive ULong
  fun bit_reverse(): ULong =>
    ifdef (ilp32 or llp64) then
      @"llvm.bitreverse.i32"(n.u32()).ulong()
    else
      @"llvm.bitreverse.i64"(n.u64()).ulong()
    end  

* Use an alternative syntax to let the compiler know about variadic functions at the call site, so that a declaration is not needed, such as (for example) `@printf[I32](...)(arg1, arg2)`.

* Leave the compiler as it is, and do not support platforms with different calling conventions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other alternative is to be able to handle header files from C in some way for this use case. Would keep it convenient for most use cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good add on feature for FFI. I'm not sure I would call it an alternative given what a large change it would be from the existing FFI mechanisms.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought is that it could be used to give enough information to continue handling FFI without "use declarations"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's really nice about the current is that I can compile for any supported platform and it will work and type check without needing any additional headers etc.

I've never installed windows headers on my machine but the declarations give me type checking.

I love that if a pony program type checks on one platform, it type checks on all.

I don't want to lose that.

The FFI declarations are "trust me" for that type checking but I still LOVE what it gives me and do not want to give it up.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The selfish part of me would love to be able to take C header files in lieu of FFI declaration proposals because I spend a non-trivial amount of time converting header files into these types of declarations. It's non-trivial both because of the number of functions I'm dealing with, but also the number of macros and other cpp tomfoolery that's found in many C libraries.

With 'self-authored' FFI declarations it's type-checking my function call against the what I understand the C function prototype to be.

If it's generated from the header-file, then it removes human error from another sensitive part of the process which is a good thing for software reliability - especially when juggling knives with C-FFI (by definition) at runtime.

In order to not lose the platform independence functionality that you value so much @SeanTAllen - would/could/should an independent tool which converts C header files into use statements exist?

Then potentially we could have the best of both worlds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An independent tool could do that. I think it should be optional.

I shouldn't be required to have headers installed to type check.

I don't think there is anything preventing folks from developing such a tool now.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(arguably, this is the same hard problem that I've been struggling with for the last month or so)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing we currently have is you can support multiple API versions of the same library like OpenSSL and type check them all in one go.

That's built on the same foundation as the "type check all platforms" functionality I mentioned earlier.

Copy link
Member

@jemc jemc Mar 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing about the approach of scanning C headers / type signatures, in addition to the extra compiler complexity and the header dependency problem @SeanTAllen mentioned, is that it does not necessarily give enough information to wrap the calls for use in Pony. There are various cases of ambiguity that need an extra information channel (such as human-interpreted documentation, or something like the GIntrospection format that @redvers has been using) to resolve to the most appropriate Pony FFI use patterns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this would be a useful feature for Pony, but I have to side with Sean, I don't think it should be a requirement to have headers installed to use the FFI. Right now I can write some windows-dependent code on unix* platforms by reading the reference on msdn, and the compiler will check the types. It's not a guarantee that the code will work, but I'd like to keep that option open.

@jemc
Copy link
Member

jemc commented Mar 2, 2021

@ergl Regarding the issue with LLVM intrisincs, here's an idea:

We could possibly resolve this use case and also open up some new use cases by adding a feature to have an FFI "alias" declaration.

For example, you could declare the llvm.bitreverse.i8 function with the I8 based signature, then you could also declare an alias called llvm.bitreverse.u8 (an alias of llvm.bitreverse.i8), which has the U8 based signature.

This could also be useful as a convenience for cases where you are say, wrapping a C library that has a consistent prefix on all of its C functions - you could define aliases which omit the prefix, making all of your calling code slightly less verbose at expense of making the declarations more verbose.

What would syntax look like for this? I'm not sure, honestly.

@ergl
Copy link
Member Author

ergl commented Mar 3, 2021

@jemc I'm not sure how aliases would work from the typing point of view. An alias would need to be compatible with the old declaration, either using subtyping rules or using the ABI size / casting check that the compiler does now for FFI calls.

@jemc
Copy link
Member

jemc commented Mar 3, 2021

Yes, the ABI size check is what I was expecting. This would let you have different Pony typings declared for the same ABI function, as in the case of the LLVM intrinsics, as well as some other cases where you might want to vary things on the Pony typings side based on, say, opaque struct pointer patterns vs non-opaque struct pointer patterns.

@jemc
Copy link
Member

jemc commented Mar 3, 2021

So from the Pony type system standpoint they would be distinct functions with different type signatures, but at the FFI level we understand them to be referencing the same FFI function and enforce the compatible ABI types.

@ergl
Copy link
Member Author

ergl commented Mar 4, 2021

I see, so right now we have the following rules:

  1. Two declarations for the same FFI function have to use the same types (exactly the same, the compiler checks that the both declarations are the same decl pointer)

  2. A declaration and a call might differ in the type if the types are equal (using is_eqtype)

We remove the second one, and change it to:

  1. A FFI declaration might alias another FFI declaration as long as the types are compatible:

    • Either they are the same (using is_eqtype)
    • ABI size check
    • Can be safely casted to each other (?)

I'm not so sure about the last one, but the compiler already allows casting ints to pointers (and vice versa) or between different struct types.

@jemc
Copy link
Member

jemc commented Mar 16, 2021

We discussed on the sync call today.

We are wary of adding the "alias" idea into scope of the RFC.

@ergl brought up again his suggestion to use explicit numeric conversions in the standard library, like in the last example in this comment of his above (#184 (comment)).

After looking at this closer, I agree that this is probably the best approach, and should be zero runtime cost (All of the mentioned numeric conversions will be using the same LLVM type internally, so there will be no runtime cost or change in behavior from adding those conversion).

@ergl
Copy link
Member Author

ergl commented Mar 21, 2021

I double-checked all the FFI code in the compiler and in the ponylang org to verify that all cases of FFI calls with multuple return types were due to numeric types. There were some examples of non-numeric types, described below, but are easily fixed. So I think it's safe to go with casting on the Pony side.

Only the compiler contained mismatched types for FFI calls, most related to LLVM intrinsics, but there were other cases in the standard library:

  1. Most variants are signed / unsigned casts of the same type: I8 to U8 or viceversa.

  2. Integer width on 32/64 bit plaftforms: cast between I32/I64 and ISize/ILong. (This also happens for tuple returns like casting (U64, Bool) to (ULong, Bool).

  3. Both of the above: I32 to ULong / USize.

  4. Casting pointer types: Pointer[U8] to Pointer[None] and viceversa. (cases: @pony_os_stderr and @pony_os_stdout).

  5. Other: USize to U8, USize to Pointer[None]. These are found in Windows-related code, mainly modeling the HANDLE type as either USize, U8 or Pointer[None]. These cases could be easily fixed by picking any of the above types, as they are always treated as an opaque type from the Pony point of view.

@ergl
Copy link
Member Author

ergl commented Mar 21, 2021

I've added the type-casting discussion to the RFC. I think this RFC would be ready to be voted on if no one else has any comments on the latest version I pushed.

@SeanTAllen
Copy link
Member

I feel like this RFC is in a good place and should be moved to final comment period and then voted on at the next sync.

I'm in favor of the RFC as it currently stands.

@jemc jemc added the status - final comment period The RFC is finalized. Waiting for final comments. label Mar 23, 2021
@aturley
Copy link
Member

aturley commented Mar 30, 2021

The vote was taken in the 2021-03-30 sync and it passed.

@aturley aturley added status - ready for merge This RFC was approved and is ready to be merged. Waiting for bot... and removed status - final comment period The RFC is finalized. Waiting for final comments. labels Mar 30, 2021
@SeanTAllen
Copy link
Member

@Theodus do you have time to do the RFC shuffle?

@Theodus
Copy link
Contributor

Theodus commented Mar 30, 2021

@Theodus do you have time to do the RFC shuffle?

Yup, I'll get to it later today

@Theodus Theodus merged commit 8197ff4 into ponylang:main Mar 30, 2021
@ergl ergl deleted the ergl/ffi-decl branch March 31, 2021 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status - ready for merge This RFC was approved and is ready to be merged. Waiting for bot...
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants