Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Borrow trait so UniCase can be used as a hashmap key #22

Open
lilith opened this issue Oct 11, 2017 · 7 comments
Open

Support Borrow trait so UniCase can be used as a hashmap key #22

lilith opened this issue Oct 11, 2017 · 7 comments

Comments

@lilith
Copy link

lilith commented Oct 11, 2017

HashMap<UniCase<String>,String>::get(UniCase<&str>)

@Ryman
Copy link
Contributor

Ryman commented Oct 12, 2017

For reference: there were issues with the soundness of this previously, it might be worth reinvestigating: #6

@lilith
Copy link
Author

lilith commented Oct 13, 2017

This works, but soundness is reliant on repr(transparent), which isn't implemented yet.
The heavy reliance on AsRef instead of Borrow means that most code would have to be duplicated.

Enums like Encoding also aren't compatible with this strategy.

impl Ascii<str> {
    fn borrowed(s: &str) -> &Self {
        unsafe {
            &*(s as *const str as *const Ascii<str>)
        }
    }
}

impl Borrow<Ascii<str>> for Ascii<String> {
    #[inline]
    fn borrow(&self) -> &Ascii<str> { Ascii::borrowed(&self.0) }
}

The approach in #6 still produces the same error. It also looks specific to Cow<>.

I'm currently using this instead:


// TODO: uncomment this when support lands in rustc for improved safety
//#[repr(transparent)]
#[derive(Copy,Clone,Debug)]
pub struct AsciiFolding<S: ?Sized>(S);

impl<S> AsciiFolding<S> {
    pub fn new(s: S) -> Self {
       AsciiFolding(s)
    }
}

impl<S: AsRef<str>> AsRef<str> for AsciiFolding<S> {
    #[inline]
    fn as_ref(&self) -> &str {
        self.0.as_ref()
    }
}

impl<S: fmt::Display> fmt::Display for AsciiFolding<S> {
    #[inline]
    fn fmt(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        fmt::Display::fmt(&self.0, fmt)
    }
}

impl<S: ?Sized> AsciiFolding<S> {
    pub fn borrowed(s: &S) -> &Self {
        unsafe { &*(s as *const S as *const AsciiFolding<S>) }
    }
}

impl Borrow<AsciiFolding<str>> for AsciiFolding<String> {
    #[inline]
    fn borrow(&self) -> &AsciiFolding<str> {
        AsciiFolding::borrowed(self.0.borrow())
    }
}

impl<S: Borrow<str> + ?Sized> PartialEq for AsciiFolding<S> {
    #[inline]
    fn eq(&self, other: &Self) -> bool {
        self.0.borrow().eq_ignore_ascii_case(other.0.borrow())
    }
}

impl<S: Borrow<str> + ?Sized> Eq for AsciiFolding<S> {}

impl<S: Borrow<str> + ?Sized> Hash for AsciiFolding<S> {
    #[inline]
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        for byte in self.0.borrow().bytes().map(|b| b.to_ascii_lowercase()) {
            hasher.write_u8(byte);
        }
    }
}

#[test]
fn test_hashmap_key() {
    let mut m = std::collections::HashMap::new();
    m.insert(AsciiFolding("v".to_owned()), "value");

    m.get(AsciiFolding::borrowed("v")).unwrap();
}

@troiganto
Copy link

Since repr(transparent) now exists (since 1.28.0), does this mean this is unblocked now? What would have to be done to get this into unicase?

benesch added a commit to benesch/rust-phf that referenced this issue Nov 6, 2020
uncased is an ASCII-only case insensitive comparison library like
UniCase, but it has better support for being used in a map.

Specifically, UniCase does not support the Borrow trait (see
seanmonstar/unicase#22), so the type

    phf::Map<UniCase<&'static str>, _>

only supports lookups with static string literals, which has limited
usefulness.

By contrast, the the equivalent uncased construction

    phf::Map<&'static Uncased, _>

*does* support lookups with non-static strings.
benesch added a commit to benesch/rust-phf that referenced this issue Nov 6, 2020
uncased is an ASCII-only case insensitive comparison library like
UniCase, but it has better support for being used in a map.

Specifically, UniCase does not support the Borrow trait (see
seanmonstar/unicase#22), so the type

    phf::Map<UniCase<&'static str>, _>

only supports lookups with static string literals, which has limited
usefulness.

By contrast, the the equivalent uncased construction

    phf::Map<&'static Uncased, _>

*does* support lookups with non-static strings.
benesch added a commit to benesch/rust-phf that referenced this issue Nov 6, 2020
uncased is an ASCII-only case insensitive comparison library like
UniCase, but it has better support for being used in a map.

Specifically, UniCase does not support the Borrow trait (see
seanmonstar/unicase#22), so the type

    phf::Map<UniCase<&'static str>, _>

only supports lookups with static string literals, which has limited
usefulness.

By contrast, the the equivalent uncased construction

    phf::Map<&'static Uncased, _>

*does* support lookups with non-static strings.
@rushmorem
Copy link

rushmorem commented Mar 26, 2021

I recently needed this. To work around it, I wrapped UniCase in an enum AnyCase:

enum AnyCase<'a> {
    Owned(UniCase<String>),
    Borrowed(UniCase<&'a str>),
}

This way, I can use the Owned variant (and 'static Borrowed) for insertions and the Borrowed one for lookups.

@vallentin
Copy link

Another alternative solution is also HashMap<UniCase<Cow<str>>, _> and then do .get(&UniCase::new(Cow::Borrowed(...)))

@rushmorem
Copy link

You are right. That was basically Cow. I have dropped the anycase module. Thanks for pointing that out!

@ijackson
Copy link

What we want is &Unicase<str> which would be a fat pointer like &str. But this is not really doable.

The problem is that Borrow requires that the returned pointer (&Unicase<str>, we are imagining) borrows only data from the original input. But <Unicase<String> as Borrow<Unicase<str>> has to borrow the discriminant, and also the actual string data, but they are in separate places.

In principle I think it might be possible to make some kind of custom fat pointer using the unstable pointer metadata API, but it would involve unsafe and be extremely hairy (and I can't see how to avoid stealing a bit from the length, since where else is that bit going to live?).

An alternative would be to have a nonoptimised UnicaseNoOpt<str> which could be transparent, so that &'s UnicaseNoOpt<str> can be made from &'s str with one transmute. (I like newtype wrappers around str so I do this quite often and it works well.) It is always fine to treat an ascii-only Unicase as a needing-unicode one, so this would be algorithmically correct. It would then be straightforward to impl Borrow<UnicaseNoOpt<str>> for Unicase<String>. The downside is that the optimisation goes away and also it would involve a new newtype with a whole matrix of trait impls and conversion methods and so on.

I may experiment with this.

benesch added a commit to benesch/rust-phf that referenced this issue Jun 16, 2021
uncased is an ASCII-only case insensitive comparison library like
UniCase, but it has better support for being used in a map.

Specifically, UniCase does not support the Borrow trait (see
seanmonstar/unicase#22), so the type

    phf::Map<UniCase<&'static str>, _>

only supports lookups with static string literals, which has limited
usefulness.

By contrast, the the equivalent uncased construction

    phf::Map<&'static Uncased, _>

*does* support lookups with non-static strings.
benesch added a commit to benesch/rust-phf that referenced this issue Jun 16, 2021
uncased is an ASCII-only case insensitive comparison library like
UniCase, but it has better support for being used in a map.

Specifically, UniCase does not support the Borrow trait (see
seanmonstar/unicase#22), so the type

    phf::Map<UniCase<&'static str>, _>

only supports lookups with static string literals, which has limited
usefulness.

By contrast, the the equivalent uncased construction

    phf::Map<&'static Uncased, _>

*does* support lookups with non-static strings.
benesch added a commit to benesch/rust-phf that referenced this issue Jun 16, 2021
uncased is an ASCII-only case insensitive comparison library like
UniCase, but it has better support for being used in a map.

Specifically, UniCase does not support the Borrow trait (see
seanmonstar/unicase#22), so the type

    phf::Map<UniCase<&'static str>, _>

only supports lookups with static string literals, which has limited
usefulness.

By contrast, the the equivalent uncased construction

    phf::Map<&'static Uncased, _>

*does* support lookups with non-static strings.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants