Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider checking returned enums for validity #123

Closed
gkoz opened this issue Sep 24, 2015 · 48 comments
Closed

Consider checking returned enums for validity #123

gkoz opened this issue Sep 24, 2015 · 48 comments

Comments

@gkoz
Copy link
Member

gkoz commented Sep 24, 2015

It might be a good idea to test if the enums we get from the other side have valid values. This could require using newtype c_int wrappers instead of enums in the sys layer and implementing conversions between that and Rust enums.

@EPashkin
Copy link
Member

Note: there was long discussion about enums in gtk-rs/gdk#154

@EPashkin
Copy link
Member

Problem:
While enum copy with FromGlib implementation was intended to fix this issue,
current implementation
actually don't fix it as wrong ffi-side enum values actually count as one of cases.
Also copying enum introduced enums with different layout from FFI-side enum and even different integer values for same name.

@EPashkin
Copy link
Member

Solution 1.
Just rexport enums from FFI-side.
Proposed by @johncf. He even added PR for it #354.

@EPashkin
Copy link
Member

EPashkin commented Apr 25, 2017

Solution 2.
Stay with copy but change implementation to match integers. This way default branch with panic can be added.
Update: @sdroege proposed also add UNKNOWN case to all ffi enums too.

@EPashkin
Copy link
Member

Solution 3.
gtk-rs/gdk#154 (comment)
Proposed by @GuillaumeGomez. I don't understand it 😭

@GuillaumeGomez
Copy link
Member

🤣

@EPashkin
Copy link
Member

@GuillaumeGomez, @sdroege as #354 appeared again, we need decide what to do.

@GuillaumeGomez
Copy link
Member

So to put it simply: we have the choice between safety and performance. So I propose we add both (the performance will be _unchecked).

@sdroege
Copy link
Member

sdroege commented Jun 14, 2017

What exactly is the proposal to do here, the API to be added?

@johncf
Copy link

johncf commented Jun 14, 2017

So to put it simply: we have the choice between safety and performance.

If I remember correctly, the status quo right now is no more safer than what remains after the changes in #354.

@GuillaumeGomez
Copy link
Member

@johncf: I agree on this, that's why I propose to add both. The "safe" one will use match and the "performant" one will just cast.

@johncf
Copy link

johncf commented Jun 14, 2017

@GuillaumeGomez That change can come later after we land these changes.

Basically, this was what you had proposed to do while converting from integers to enums. That would be a pervasive change too with a more profound effect overall.

@johncf
Copy link

johncf commented Jun 14, 2017

@GuillaumeGomez Actually it can be done a bit more efficiently like this:

#[repr(C)]
pub enum Test {
   A = 0,
   B = 1,
   C = 2,
   D = 3,
}

fn toTest(v: u32) -> Test {
    match v {
        0..4 => unsafe { std::mem::transmute(v) },
        _ => panic!("Bad!"),
    }
}

Then again, I should remind you that what this would prevent are only rare upstream bugs. Rust users will never need to use this function (even indirectly).

@sdroege
Copy link
Member

sdroege commented Jun 14, 2017

The 0..4 match assumes that enum values are incremented by 1, which is not necessarily the case. You could have an enum that has all fibonacci numbers as values ;) And the code generator would have to check the enum values to detect cases where matches can be compressed (also I would assume that the compiler could do exactly that for us).

@GuillaumeGomez
Copy link
Member

@johncf: Wo, you just solved both issues! However it's not for sure that every values follow in the enums unfortunately. :-/

@johncf
Copy link

johncf commented Jun 14, 2017

@sdroege @GuillaumeGomez Yes, true. But most of the time, they are... But that was not the point I was trying to make!! 😭 It's that plain u32 will not come from the Rust side. So whether that change is worth the cost itself should be considered carefully.

@johncf
Copy link

johncf commented Jun 14, 2017

Anyways, as far the current changes are concerned, if everyone's on-board, I'll proceed to create a Gtk PR (which will also be huge ~4000 lines reduction).

@GuillaumeGomez
Copy link
Member

Fine by me.

@sdroege
Copy link
Member

sdroege commented Jun 14, 2017

I think it's all fine as-is, let's worry about that another time. Also for bitflags you probably want to use the from_bits_truncate then for the same reason.

In the case of enums (and flags), this can btw happen without a bug anywhere if the bindings were generated for an older version of the C library, and a new enum value or flag was added.

@johncf
Copy link

johncf commented Jun 14, 2017

Alright, then I'll PR Gtk and Pango repositories soon.

@sdroege
Copy link
Member

sdroege commented Jun 15, 2017

from_bits_truncate seems to be used for flags already, so that point is solved. For enums I think we should really check them for validity instead of transmuting (as that will cause undefined behaviour), and it can happen without any bugs anywhere (API additions in the C library, older Rust library).

Panicking seems a bit extreme though, maybe we can match all other values to the __Nonexhaustive variant... and possibly print a warning. Any newly added enum value to the C library must be safe to be ignored, otherwise the C library changed API (i.e. is not backwards compatible anymore to the older version) and must've been reflected in the C library name already (at least!): libglib-2.0.so.0 would become libglib-2.0.so.1, or even better because this is an API change and not just an ABI change libglib-3.0.so.0.

@sdroege
Copy link
Member

sdroege commented Jun 15, 2017

Update: @sdroege proposed also add UNKNOWN case to all ffi enums too.

@EPashkin We can't add an unknown case to the FFI enums, as it might conflict with the C enum at a later time (if you use a plain value), or is not possible with repr(C) (if you do some kind of Unknown(u32) or a ZST).

@EPashkin
Copy link
Member

@sdroege Yes. This why we redefine enums without repr(C) and added __Nonexhaustive(()) too prevent int conversion. So library user mostly don't see ffi enums.

@sdroege
Copy link
Member

sdroege commented Jun 15, 2017

@sdroege Yes. This why we redefine enums without repr(C) and added __Nonexhaustive(()) too prevent int conversion. So library user mostly don't see ffi enums.

Yes that's all good :) The problem is that on the FFI side there might be values returned that are not part of the FFI enum. But as those are repr(C) and with integers, maybe it's ok if those are transmuted to an integer later, and then passed through a conversion trait for getting the non-FFI enum.

@sdroege
Copy link
Member

sdroege commented Jun 15, 2017

Related rust-lang/rust-memory-model#41

@sdroege
Copy link
Member

sdroege commented Jun 16, 2017

So it seems to do anything safely here we could do:

  1. need some changes to how repr(C) enums work in Rust (so it's safe to pass out-of-enum values around)

  2. still duplicate the enums (sys and non-sys), so that the latter can be non-repr(C) and thus contain __Nonexhaustive (ZST can't be in repr(C)), or even better something to carry "unknown" enum values around (__Unknown(u32))

  3. have conversion traits implemented that do the mapping between the repr(C) enum and the proper Rust enum, handling unknown values in a meaningful way without panicking

  4. and 3) can be done on our side already, 1) is already broken in the current situation but likely does not cause any immediate problems as long as we just pass around the FFI enums, and convert them to some kind of integer for matching/conversion to the non-repr(C) enum

Comments?

Edit: Instead of not doing anything about 1), we could also use plain integers in the sys crates for functions, etc and require explicit conversion there already. But that seems suboptimal API-wise.

@johncf
Copy link

johncf commented Jun 16, 2017

@sdroege Sounds okay...

I understand that __Nonexhaustive may be required for some enums, but is it really required for all of them? Perhaps I'm just being naive.

And regarding __Unknown(u32), why be so forgiving as to work with broken code? Why not just panic (or an ffi-safe abort mechanism)?

@sdroege
Copy link
Member

sdroege commented Jun 16, 2017

I understand that __Nonexhaustive may be required for some enums, but is it really required for all of them? Perhaps I'm just being naive.

A configuration could be added to that, but generally the safe thing to do is to assume that the C library can add new values without breaking compatibility. Also this is how things are currently done already.

And regarding __Unknown(u32), why be so forgiving as to work with broken code? Why not just panic (or an ffi-safe abort mechanism)?

It's not necessarily broken code. There are completely valid scenarios when the C library starts returning values that are unknown at the Rust side, like when a new version of the C library adds a new value but the Rust side was not updated yet (and does not have to, the C library could do this change in a backwards compatible way).

@EPashkin
Copy link
Member

@sdroege, 1) IMHO if we really need _Unknown(u32) better make FromGlib Result-like or add separate trait for it. Currently I don't see reason to do this.
2) Currently I prefer panic over returning __Nonexhaustive for bad values, as nobody will remember to check this value and it not rustish way, but agree that we can use __Nonexhaustive if panic don't acceptable.

@sdroege
Copy link
Member

sdroege commented Jun 16, 2017

IMHO if we really need _Unknown(u32) better make FromGlib Result-like or add separate trait for it. Currently I don't see reason to do this.

It's not really needed, no, but then you would need to map to some other value (__Nonexhaustive?) or panic

Currently I prefer panic over returning __Nonexhaustive for bad values, as nobody will remember to check this value and it not rustish way, but agree that we can use __Nonexhaustive if panic don't acceptable.

Panic is a problem because it makes backwards compatible upgrades of the C library a breaking upgrade at the Rust level. And while it's true that nobody will check __Nonexhaustive directly, by just having it there you must have a _ case in your match statements and somehow handle "unknown" cases. (Which is how things are nowadays already).

Using __Nonexhaustive for unknown values is mostly so that no panic is needed. You need to use some value, and either some new one (__Unknown, __Unknown(u32)) has to be added or we just reuse/abuse that one.

@sdroege
Copy link
Member

sdroege commented Jun 16, 2017

better make FromGlib Result-like or add separate trait for it

Also the existing FromGlib seems sufficient for that, you don't need (and don't want) to return a Result<,> from there.

In the end this is all no different from e.g. std::io::ErrorKind. New values can be added at any time, and your code needs to handle that in one way or another.

@johncf
Copy link

johncf commented Jun 16, 2017

I think it would be good if we could find any examples of such a "backwards-compatible" change occurring in the past. If no such examples could be found, it might be good to get the perspective from a core Gtk developer about their stance on the subject, before adding __Unknown(u32) variant.

And, ideally, we should not add __Nonexhaustive to enums that are known to not change in the future in a "backwards-compatible" way. But if that information is hard to gather, and there are instances of this happening, then this cannot be avoided.

@EPashkin
Copy link
Member

So alternative of direct reexport is this? (I use values instead ffi as converting in match cases don't works).

#[doc(hidden)]
impl FromGlib<ffi::GtkButtonRole> for ButtonRole {
    fn from_glib(value: ffi::GtkButtonRole) -> Self {
        skip_assert_initialized!();
        match value as i32 {
            0 => ButtonRole::Normal,
            1 => ButtonRole::Check,
            2 => ButtonRole::Radio,
            _ => ButtonRole::__Nonexhaustive(()),
        }
    }
}

@sdroege
Copy link
Member

sdroege commented Jun 17, 2017

I think it would be good if we could find any examples of such a "backwards-compatible" change occurring in the past. If no such examples could be found, it might be good to get the perspective from a core Gtk developer about their stance on the subject, before adding __Unknown(u32) variant.

(Not core Gtk developer here, but GStreamer core developer and 10+ years GLib/GIO/GObject contributor)
It happens quite often (also see again my comment about std::io::ErrorKind, it also happens in Rust). One case I remember right now is https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-base-libs/html/gst-plugins-base-libs-GstVideoAlignment.html#GstVideoFormat (see the Since: 1.x markers at the bottom). I'm sure if you look close enough, you'll also find cases in GIO and GTK.

And, ideally, we should not add __Nonexhaustive to enums that are known to not change in the future in a "backwards-compatible" way. But if that information is hard to gather, and there are instances of this happening, then this cannot be avoided.

The problem is that you can't detect these cases automatically, so you have to assume by default that enums can be extended (as that is the more general case). Configuration could be added to override that on a per-enum basis. Note also that nowadays we add __Nonexhaustive to every enum on the Rust side, so keeping it that way wouldn't be a change.

So alternative of direct reexport is this? (I use values instead ffi as converting in match cases don't works).

The direct re-export gives you undefined behaviour. The way how it is done now should be safe at least (ignoring for now that repr(C) enums currently do not allow arbitrary values) and panics. What you suggest would be an improvement on that (note that you can just do __Nonexhaustive without the () unit/empty tuple). For debugging purposes (at least) I would add a __Unknown(u32) so that this information is not completely lost on translation.

@EPashkin
Copy link
Member

  1. __Nonexhaustive without () allow cast to int that we want disallow (added in Make cast generated enums to i32 non-scalar #251)
  2. Adding u32 or i32 to enum will increased it size, I prefer don't do it, but don't completely disagree

@sdroege
Copy link
Member

sdroege commented Jun 17, 2017

__Nonexhaustive without () allow cast to int that we want disallow (added in #251)

Ah! Good to know :)

Adding u32 or i32 to enum will increased it size, I prefer don't do it, but don't completely disagree

Maybe behind some kind of "debug" feature?

@johncf
Copy link

johncf commented Jun 17, 2017

@sdroege Thanks for the example. I only now found out that all Gtk 3 versions ship shared objects with their major version set to 0 (e.g. libgtk-3.so.0.2200.15 for 3.22). Sorry I was unaware of this; I always assumed that the major version would be 22 in that case, which we can specify to the linker (which, when I thought about it, is not a very good idea). Since all Gtk3 versions are considered to have "fully backwards-compatible API," I realized that this is indeed unavoidable. (I guess things cannot always be as clean as you want it to be...)

@johncf
Copy link

johncf commented Jun 17, 2017

I was thinking about the implementation part of this, and it seems like it might be easier to define the sys enum types as u32 and the variants as simple u32 constants. Otherwise the generated conversion code in the safe side would have to use raw u32 constants in the match block (which matches a sys enum as u32-ed? This should probably be done only after Rust defines a behavior for invalid transmuted repr(C) enum values).

@johncf
Copy link

johncf commented Jun 17, 2017

A cleaner solution might be: rust-lang/rfcs#2008

But it might be a long wait.

@EPashkin
Copy link
Member

Part of enums has negative values (ex. GtkTextBufferTargetInfo).
All enums in gtk-rs is convertible to i32.
So instead u32 we better use i32 and maybe add check for size it we found that it wrong.

@EPashkin
Copy link
Member

@johncf You agreed that the re-export would not go?

@sdroege So fixed version of enum, will look like this?

#[derive(Clone, Copy, Debug, Eq, PartialEq, Hash)]
pub enum ButtonRole {
    Normal,
    Check,
    Radio,
    #[doc(hidden)]
    __Unknown(i32),
}

#[doc(hidden)]
impl ToGlib for ButtonRole {
    type GlibType = ffi::GtkButtonRole;

    fn to_glib(&self) -> ffi::GtkButtonRole {
        match *self {
            ButtonRole::Normal => ffi::GTK_BUTTON_ROLE_NORMAL,
            ButtonRole::Check => ffi::GTK_BUTTON_ROLE_CHECK,
            ButtonRole::Radio => ffi::GTK_BUTTON_ROLE_RADIO,
            ButtonRole::__Unknown(value) => unsafe{std::mem::transmute(value)},
        }
    }
}

#[doc(hidden)]
impl FromGlib<ffi::GtkButtonRole> for ButtonRole {
    fn from_glib(value: ffi::GtkButtonRole) -> Self {
        skip_assert_initialized!();
        match value as i32 {
            0 => ButtonRole::Normal,
            1 => ButtonRole::Check,
            2 => ButtonRole::Radio,
            value => ButtonRole::__Unknown(value),
        }
    }
}

@EPashkin
Copy link
Member

Note: size of enum changed from 1 to 8. But maybe in GUI application it really don't matter.

@sdroege
Copy link
Member

sdroege commented Jun 17, 2017

@EPashkin That would be one option, yes. I think ideally we would not use enums in the sys crates, but just an integer of the appropriate size plus constants (see discussion in the nikomatsakis/rust-memory-model repository).

I'm not sure if i32 is always correct though, the rules for the underlying integer type of C enums is not trivial IIRC. And yours would fail for values > 2**31

@johncf
Copy link

johncf commented Jun 17, 2017

@johncf You agreed that the re-export would not go?

That's right... Unless rfc#2008 is implemented in a way that helps with this situation.

@EPashkin
Copy link
Member

EPashkin commented Jun 17, 2017

@sdroege Problem with integers that it will be allow to group in function calls. So #[repr(C)]enum will need anyway. We can change definition GTK_BUTTON_ROLE_NORMAL to be integer, but then it can't be used as function parameter.
We can add check on generating enum in normal mode to check it range,
but IMHO enums always used as gint (https://github.com/GNOME/glib/blob/master/gobject/gparamspecs.c#L2046)

@sdroege
Copy link
Member

sdroege commented Jun 18, 2017

Enums in C are unfortunately not always "int", quoting the standard:

Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined,108) but shall be capable of representing the values of all the members of the enumeration.
[...]
An implementation may delay the choice of which integer type until all enumeration constants have been seen.

In practice, enums are "int" or "unsigned int" usually, depending on whether they contain signed values or values >= 2**31. Also I believe GI writes down enum values as signed integers, and bitfields (aka flags) as unsigned integers (but please check that before depending on it).

@EPashkin
Copy link
Member

Note: this maybe need reworking when rust-lang/rust-memory-model#41 fixed on rust side.

@sdroege
Copy link
Member

sdroege commented Jun 26, 2017

Also note that this is still not 100% correct, see discussion in there. It should be fine in practice though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants