Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added ArrayPointer pattern #68

Closed
wants to merge 3 commits into from
Closed

Added ArrayPointer pattern #68

wants to merge 3 commits into from

Conversation

pixsperdavid
Copy link
Contributor

Added an ArrayPointer pattern. This is a lighter-weight alternative to the FFISlice pattern which allows expression of a pointer to an array of a type. This allows backends to generate more relevant bindings. Crucially, it forces the C# backend to declare the parameter as an IntPtr rather than a ref T (which makes little sense for an array of values).

@ralfbiedert
Copy link
Owner

On mobile right now, but quick cpomment: the api looks unsound as you can pass in len that mismatches the ffi array. At very least that method would have to be unsafe.

@ralfbiedert ralfbiedert added c-backend-c# C# Backend needs-discussion Something rather fuzzy, input wanted. enhancement Make existing things better. labels Sep 21, 2022
@ralfbiedert
Copy link
Owner

ralfbiedert commented Sep 21, 2022

Alright, I just had another look, some more comments in random order

  • ArrayPointer should probably be #[repr(transparent)] if it only wraps a single pointer
  • as_slice is definitely unsound, as you can easily pass in, say, u64::MAX and have instant UB
  • If I understand you correctly the main reason you'd want ArrayPointer is so C# can generate another pointer type. As ArrayPointer probably can't be made "safe and sound" (it'd have to be unsafe as on its own it can't guarantee correctness of the resulting slice), I'm wondering if it would rather make sense to somehow hint to the C# backend that it should generate bindings differently for a raw pointer type, instead of adding a type pattern that only seems to be useful in C# (I'm not aware of any other language FFI system that'd make any such distinction as C#). That could happen via additional annotations (e.g., #[ffi_hint("csharp", "...")], or via the backend configuration. Haven't thought this last point through though.

@pixsperdavid
Copy link
Contributor Author

as_slice is definitely unsound, as you can easily pass in, say, u64::MAX and have instant UB

I won't argue that it's not unsafe, however, it's exactly equivalent to using the current FFISlice type, but just by passing the length as a parameter value rather than as a struct member:

uint values[1];

pattern_array_pointer(&values, UINT_MAX); // Undefined behavior

my_library_sliceu32 slice;
slice.data = &values;
slice.len = UINT_MAX;
pattern_ffi_slice_1(slice); // Also undefined behavior

If I understand you correctly the main reason you'd want ArrayPointer is so C# can generate another pointer type. As ArrayPointer probably can't be made "safe and sound" (it'd have to be unsafe as on its own it can't guarantee correctness of the resulting slice).

No problem with marking the as_slice function as unsafe, but I just copied the implementation from here, so surely this would also need to be marked unsafe?

@ralfbiedert
Copy link
Owner

ralfbiedert commented Sep 21, 2022

I won't argue that it's not unsafe, however, it's exactly equivalent to using the current FFISlice type, but just by passing the length as a parameter value rather than as a struct member:

I'd say there is a small difference :

  • You are correct that the general premise is that "people call in correctly". So we assume, say, FFISlice contains a valid (pointer, len), and in Rust we safely convert that to a slice.
  • The unsoundness in ArrayPointer however can be caused "purely in Rust". The problem is that even if people call in correctly (i.e., we already have a valid ArrayPointer type) you can still trigger UB in Rust by passing in the wrong value when invoking as_slice().

There's a bit more written in the FAQ.

I just copied the implementation from here, so surely this would also need to be marked unsafe?

See above. I admit there is a bit of a 'philosophical' argument around soundness boundaries across FFI (e.g., one could make the argument that any incoming FFI call should always be treated as maximally unsound, but then you'd also have to consider RwLocking things, doing pointer alignment checks, and more), but IIRC the consensus was that "expecting people to call in properly" is ok.

In that model though each FFISlice is self-contained and sound, so it's permissible to produce a slice without going through unsafe.

(Edit - with the basic idea of 'soundness' in Rust being whether you can use the safe language itself to create UB)

No problem with marking the as_slice function as unsafe

I still think if the main purpose of ArrayPointer is to govern C# codegen, some other construct might be a better solution (not saying it is, just how I understand the PR so far).

Edit Ah, another thought, maybe this can be done exclusively in the C# by just creating other overload with IntPtr instead of refs?

@pixsperdavid
Copy link
Contributor Author

The unsoundness in ArrayPointer however can be caused "purely in Rust". The problem is that even if people call in correctly (i.e., we already have a valid ArrayPointer type) you can still trigger UB in Rust by passing in the wrong value when invoking as_slice().

Ok I see the distinction, I mistakenly thought that the FFISlice fields were public in rust.

I still think if the main purpose of ArrayPointer is to govern C# codegen, some other construct might be a better solution (not saying it is, just how I understand the PR so far).

I considered something like a ffi_hint as you suggested, however my reasoning for the pattern approach was:

  • It's a pre-existing concept within the library (rather than adding a whole new type).
  • It provides additional context on what the author intends the pointer to do, which is otherwise opaque to the library (ie, the backends have no way of knowing if a pointer represents an array, a single value, an optional value, etc.). Even if the only current use for this is solving the C# codegen issue, IMO having more context is beneficial.

- Made representation transparent
- Marked `as_slice` function unsafe
@ralfbiedert
Copy link
Owner

Ah, just saw you pushed an update.

I'd still prefer to first explore other avenues to get the C# backend to emit the right binding.

My issue with going that route is that it seems to have little benefit over using raw pointer types from a Rust or any-other-language perspective.

In Rust you'd still have to call unsafe { array_pointer.as_slice(len) }, which is more-or-less slice::from_raw_parts() just with another name. The "return empty slice" part is convenient, but that could be a simple call (either in your crate, or even
interoptopus) which comes with much less "machinery".

In other languages (C, Python) that concept wouldn't translate to anything else than a fallback, and in C# it really only seems to affect IntPtr or not (again, please tell me if I'm missing something).

On the other hand, I think a hint system to tweak FFI generation is needed and came up a few times in our own bindings.

That said, I'm not strictly arguing that FFI hints are the problem to this particular issue; an even simpler alternatives could be just using *const c_void on Rust and ensure / guarantee that this always translate to IntPtr (semi-related ping @dennisradell since IIRC you had issues IntPtr vs ref too).

I considered something like a ffi_hint as you suggested, however my reasoning for the pattern approach was:

  • It's a pre-existing concept within the library (rather than adding a whole new type).

I don't quite understand, isn't ArrayPointer a whole new type / pattern added?

  • It provides additional context on what the author intends the pointer to do, which is otherwise opaque to the library (ie, the backends have no way of knowing if a pointer represents an array, a single value, an optional value, etc.). Even if the only current use for this is solving the C# codegen issue, IMO having more context is beneficial.

Yes, and I'd totally agree that in many cases this is the right approach. It's just that this particular pattern, IMO, "doesn't carry it's weight" with respect to the user-code saved, safety gains, or general applicability.

(I might even change my opinion on this later on if more benefits became visible, just right now I don't see them).

@pixsperdavid
Copy link
Contributor Author

In Rust you'd still have to call unsafe { array_pointer.as_slice(len) }, which is more-or-less slice::from_raw_parts() just with another name. The "return empty slice" part is convenient, but that could be a simple call (either in your crate, or even
interoptopus) which comes with much less "machinery".

TBH I only added this method as a convenience so entirely happy to lose it if you feel it's unnecessary. It was not the purpose behind the pattern.

I don't quite understand, isn't ArrayPointer a whole new type / pattern added?

It's a new type but the idea of patterns is not a new concept for the library and was considerably simpler to implement than adding a new ffi_hint type would have been. Also being a pattern means that a user familiar with the concept of patterns can make some assumptions about how it will work.

It's just that this particular pattern, IMO, "doesn't carry it's weight" with respect to the user-code saved, safety gains, or general applicability.

My general philosophy is that it's preferable to give automated systems the correct information to allow them to make the right decisions rather than explicitly tell them what result you want. Of course, this is a very idealistic view and requires much pragmatism in it's application, but to explain in more depth my thought process regarding this feature:

What I want on the surface is for the library to generate an IntPtr parameter rather than a ref object parameter, but the reason I want this is because the pointer represents an array, not a single object, and a ref object parameter makes no sense for an array. By using a raw pointer in the rust function signature, it is impossible for the library to know what I intend, so we have to give it that information somehow.

If we simply 'override' what the library would usually do by telling it explicitly what result we want in C# specifically, via something like ffi_hint, then we are solving the surface problem, but next time the a feature is added to the library in which specific code generation behaviour needs to be decided on the basis of whether a pointer represents an array or not, we're going to run into the same problem, and more ffi_hint attributes will be needed. However if we provide a mechanism to resolve the core ambiguity, the library will have the information to make the right decision in future scenarios yet unknown to us now.

I'm not necessarily suggesting that the pattern approach I've employed here is the best way of conveying this information, it might be better done by some variety of attribute, however rather than simply correcting the C# output, I feel that it is far more useful to indicate the purpose of the pointer parameter as that ambiguity is the root of the problem.

@ralfbiedert
Copy link
Owner

ralfbiedert commented Sep 28, 2022

Thanks for the detailed reply.

More on a side note, from a design paradigms perspective I think I mostly agree with your reasoning; (so far) I've probably just differed in what the underlying vs. surface problem is. Within this library I mostly look at things with a C-lens, and there a distinction between single-element and multi-element pointers doesn't really exist, so adding these types feel like fixing what could also be labeled "bad C# ergonomics" (w.r.t, declaring C# FFI interfaces and working with ref / IntPtr). So then fixing the actual problem in this particular case just telling the backend what you want (and a similar "what you want" machinery would be needed, say, to tell C# whether you want the helpers metioned in #27).

However since C#-like marshalling details (e.g., the question of #[In, out] and whether content behind pointers need to be copied back) I could see myself getting around having interoptopus::patterns::ptr with a number of types which, in a first iteration, would just fall back to regular Rust pointers (e.g., via .into()) and FFI pointers (via .fallback_type())

Names tbd:

  • A generic FFIPointer<T, SingleElementOrMultiple, RustRW, FFIRW> indicating how that pointer is being used (type T of target, single or multiple elements, and who reads and writes; bonus points if we can make T so that it also accepts and enum of type alternatives although I wouldn't know how to do that)
  • various aliases to common FFIPointer<A, B, C, D> combinations.

The generic FFIPointer would fallback to *T or *c_void in backends for now or could be used for other optimizations (e.g., deciding if an #[In, out] should be applied.

Edit: bonus-bonus points if MultipleElements can also encode variable or constant many elements.

@pixsperdavid
Copy link
Contributor Author

Have some time now to look at #71 so closing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c-backend-c# C# Backend enhancement Make existing things better. needs-discussion Something rather fuzzy, input wanted.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants