Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creates a string from unvalidated utf8 #96

Closed
psychon opened this issue Jan 23, 2021 · 4 comments
Closed

Creates a string from unvalidated utf8 #96

psychon opened this issue Jan 23, 2021 · 4 comments

Comments

@psychon
Copy link

psychon commented Jan 23, 2021

I have no words:

impl GetAtomNameReply {
[...]
    pub fn name(&self) -> &str {
        unsafe {
            let field = self.ptr;
            let len = xcb_get_atom_name_name_length(field) as usize;
            let data = xcb_get_atom_name_name(field);
            let slice = std::slice::from_raw_parts(data as *const u8, len);
            // should we check what comes from X?
            std::str::from_utf8_unchecked(&slice)
        }
    }
}
@psychon
Copy link
Author

psychon commented Jan 23, 2021

This should affect every list of chars in the XML description, I think:
https://github.com/rtbo/rust-xcb/blob/b44e52845e21621613d57bf5eb47c48c2ad521dd/rs_client.py#L1360-L1361

$ git grep '<list type="char'  | wc -l
68
$ git grep  '<list type="char' | cut -f1 -d: | uniq -c
      2 src/dri2.xml
      7 src/glx.xml
      5 src/randr.xml
      1 src/render.xml
      1 src/sync.xml
      2 src/xf86dri.xml
      2 src/xf86vidmode.xml
      5 src/xinput.xml
     18 src/xproto.xml
     22 src/xselinux.xml
      3 src/xv.xml

(Well, not really, the above also counts lists that are sent to the X11 server and not only those that are received)

Some examples in xproto:

  • Setup::vendor()
  • GetAtomNameReply::name()
  • ListFontsWithInfoReply::name()
  • Anything containing STR is affected:
    • ListFontsReply
    • GetFontPathReply
    • ListExtensionsReply

@psychon
Copy link
Author

psychon commented Jan 23, 2021

For atoms, the name should be in Latin1: https://www.x.org/releases/X11R7.6/doc/xproto/x11protocol.html#requests:InternAtom

The string should use the ISO Latin-1 encoding.

However, the X11 server just treats them as a bunch of bytes / a C string. I am not sure what the server does with embedded \0 bytes, but this is not our problem.

Anyway, this is the handler for InternAtom requests that mostly just forwards to MakeAtom: https://gitlab.freedesktop.org/xorg/xserver/-/blob/f08ab719df921e1269691553daf56853380fb241/dix/dispatch.c#L1065

This then does some work with a hash table / a binary tree / I am not quite sure what and uses strcncmp to compare things: https://gitlab.freedesktop.org/xorg/xserver/-/blob/f08ab719df921e1269691553daf56853380fb241/dix/atom.c#L74

So, the X11 server will not mind if an atom name is not valid utf8.

I see two solutions on how to fix this in xcb-rs and both require API changes:

  • Change the return value from &str to &[u8] and directly return the slice instead of calling std::str::from_utf8_unchecked()
  • Change the return value to some kind of Result<&str, SomeKindOfError>, e.g. the return value of std::str::from_utf8()

For convenience, it might make sense to also add &str variants of the API again, but this then requires some way to return errors, so Option<&str> or Result<&str, something-here>.

@rtbo
Copy link
Collaborator

rtbo commented Nov 11, 2021

v1.0-beta.0 is released (see branch v1.0-dev). Return value is still &str, but the value is unwrapped rather than unchecked, so at least it is not unsafe anymore.

Before I release v1.0 I will go for your proposal:

fn name(&self) -> Result<&str, Utf8Error>;
fn name_raw(&self) -> &[u8];

@rtbo
Copy link
Collaborator

rtbo commented Dec 26, 2021

There is a change of API that I think is more sound.

As the X documentation states that the strings (those specified as <list type="char" /> in XML) are encoded in Latin-1 (ISO 8859-1), I've introduced the types Lat1Str and Lat1String as borrowed and owned versions of a Latin-1 slice.
Methods returning a string now actually return &Lat1Str. The user would call Lat1Str::to_utf8 and get a Cow<str> (Cow::Borrowed is returned if the string is ASCII, which I think to be 100% of the cases).
That way the user always get valid UTF-8 in a sound way, without unwrapping any result.
See Lat1Str and x::GetAtomNameReply::name.
See also here for usage example.

Constructing Lat1Str may be too cumbersome for the user and is also not needded. Therefore strings supplied by the user to requests (e.g. x::InternAtom::name) are slice of bytes. You would typically use it the same way, but with an additional b before the string literal. E.g:

x::InternAtom {
    only_if_exists: true,
    name: b"WM_PROTOCOLS",
}

If possible I'd like to receive feedback on this new API before 1.0 is released. I've published v1.0.0-beta.3 on crates.io with this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants