Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Using with complex types like HashMap #83

Open
nyurik opened this issue Apr 3, 2023 · 5 comments
Open

Question: Using with complex types like HashMap #83

nyurik opened this issue Apr 3, 2023 · 5 comments

Comments

@nyurik
Copy link

nyurik commented Apr 3, 2023

I would like to build this type of a struct, just like I outlined in stackoverflow question. It seems Ouroboros is the closest to this, but I am still not certain if it's possible:

#[self_referencing]
struct StringIndex {
    store: Vec<String>,

    #[borrows(store)]
    #[covariant]
    index: HashMap<&'this str, usize>,
}

and implement some magical method to perform "lookup or push":

impl StringIndex {
  pub fn get_or_insert(&mut self, key: &str) -> usize {
    match self.index.entry(key) {
      Occupied(v) => v.key(),
      Vacant(v) => {
        let idx = self.store.len();
        // Create a new clone, and a borrowed ref to it
        let new_value  = key.to_string();
        let borrowed_value = &key;  // This wouldn't work in Rust
        self.store.push(new_value);
        self.index.insert(borrowed_value, idx);
        idx
      }
    }
  }
}
@kpreid
Copy link
Contributor

kpreid commented Apr 4, 2023

The thing that ouroboros allows you to do is to keep a borrow that points to "this same object", but it's still subject to the rule that mutable access is exclusive access. Since store is borrowed perpetually, you can never get an &mut to it; doing so would be unsound (more than not possible without help — not something a valid program may ever do).

You can use an interior-mutable type as store, though. In particular, typed_arena::Arena might suffice; this compiles:

use std::collections::HashMap;
use typed_arena::Arena;

#[ouroboros::self_referencing]
struct StringIndex {
    store: Arena<String>,

    #[borrows(store)]
    #[covariant]
    index: HashMap<&'this str, usize>,
}

impl StringIndex {
    pub fn get_or_insert(&mut self, key: &str) -> usize {
        self.with_mut(|fields| -> usize {
            if let Some(&value) = fields.index.get(key) {
                value
            } else {
                let idx = fields.store.len();
                let stored_str: &str = fields.store.alloc(key.to_owned());
                fields.index.insert(stored_str, idx);
                idx
            }
        })
    }
}

However, given what you've written I imagine you also want to be able to get from usize to &str. typed_arena::Arena doesn't do that because it intends to offer mutability of the elements; what you want is an append-only list (which probably exists, but I don't know of one to recommend). You could do it by adding a second vector, but perhaps there's a better way than this:

use std::collections::HashMap;
use typed_arena::Arena;

#[ouroboros::self_referencing]
struct StringIndex {
    store: Arena<String>,

    #[borrows(store)]
    #[covariant]
    index: HashMap<&'this str, usize>,

    #[borrows(store)]
    #[covariant]
    rev_index: Vec<&'this str>,
}

impl StringIndex {
    pub fn get_or_insert(&mut self, key: &str) -> usize {
        self.with_mut(|fields| -> usize {
            if let Some(&value) = fields.index.get(key) {
                value
            } else {
                let idx = fields.store.len();
                let stored_str: &str = fields.store.alloc(key.to_owned());
                fields.rev_index.push(stored_str);
                fields.index.insert(stored_str, idx);
                idx
            }
        })
    }

    pub fn lookup_idx(&self, idx: usize) -> Option<&str> {
        self.borrow_rev_index().get(idx).copied()
    }
}

@nyurik
Copy link
Author

nyurik commented Apr 4, 2023

@kpreid thank you for such an awesome write up and explanation!!! TIL :) You are absolutely correct -- my use case is an ever-growing (accumulator style) vector -- the user needs to build a vector of values (using Vec::push() only, no modifications/deletions) and use the index of those values. Once done inserting, the entire vector will be consumed. So the ideal interface to this accumulator/interner is:

// Note that this interface should support `&'static str`, `String`, or any other `T`s,
// but not a mix of them in the same container.
fn append<T>(&mut self, value: T) -> usize;
fn into(self) -> Vec<T>;

From the sounds of it, I may have to actually implement this as a separate crate with unsafe functionality - if the Vec is only growing, the non-exposed references to the vector's values shouldn't violate any Rust memory rules. So internally it would be a Vec<T> and HashMap<&T, usize>, and as long as I don't allow any kind of vector access or modification beyond push, this should be sound.

P.S. Possible optimization of the above is if the T is a &'static V -- in this case it might make sense for the internal HashMap to store T itself, and not a ref to it (avoid double redirect): HashMap<T, usize>

@kpreid
Copy link
Contributor

kpreid commented Apr 4, 2023

the user needs to build a vector of values (using Vec::push() only, no modifications/deletions) and use the index of those values. Once done inserting, the entire vector will be consumed.

In this case, typed_arena is all you need, because it has Arena::into_vec(), and in order to get the Arena out, you can use ouroboros's generated into_heads() method. Just add this method to my first code sample (not tested):

    pub fn into_vec(self) -> Vec<String> {
        self.into_heads().store.into_vec()
    }

if the Vec is only growing, the non-exposed references to the vector's values shouldn't violate any Rust memory rules.

No, that's not reliably true: when the Vec grows, it allocates new memory and moves the existing data into it, invalidating references.

@nyurik
Copy link
Author

nyurik commented Apr 4, 2023

thanks, when i tried, i realized that the ref was into Vec, not into the memory allocated for the string itself. I came up with this (without Arena): I simply duplicate the String object as is, and then implement the Drop trait to ensure that I don't do the double-free. Note that I couldn't figure out which version of the insert_or_get to call, so I had to be explicit.

mod stores {
    use std::collections::HashMap;
    use std::mem;

    #[derive(Default)]
    struct LeakyKeyHashMap<K, V>(pub HashMap<K, V>);

    impl<K, V> Drop for LeakyKeyHashMap<K, V> {
        fn drop(&mut self) {
            for k in mem::take(&mut self.0).into_keys() {
                mem::forget(k);
            }
        }
    }

    #[derive(Default)]
    pub struct Store<T> {
        keys: Vec<T>,
        index: LeakyKeyHashMap<T, usize>,
    }

    impl<T> Store<T> {
        pub fn into_vec(self) -> Vec<T> {
            self.keys
        }
    }

    impl Store<String> {
        pub fn insert_or_get(&mut self, key: String) -> usize {
            if let Some(index) = self.index.0.get(&key) {
                *index
            } else {
                let index = self.keys.len();
                let (ptr, len, cap) = (key.as_ptr(), key.len(), key.capacity());
                self.keys.push(key);
                let key_dup = unsafe { String::from_raw_parts(ptr as *mut u8, len, cap) };
                self.index.0.insert(key_dup, index);
                index
            }
        }
    }

    impl Store<&'static str> {
        pub fn insert_or_get(&mut self, key: &'static str) -> usize {
            if let Some(index) = self.index.0.get(&key) {
                *index
            } else {
                let index = self.keys.len();
                self.keys.push(key);
                self.index.0.insert(key, index);
                index
            }
        }
    }
}

fn main() {
    use stores::Store;

    let mut store = Store::default();

    let foo_idx = Store::<String>::insert_or_get(&mut store, "foo".to_string());
    Store::<String>::insert_or_get(&mut store, "bar".to_string());
    let foo_idx2 = Store::<String>::insert_or_get(&mut store, "foo".to_string());
    assert_eq!(foo_idx, foo_idx2);

    let keys = store.into_vec();
    assert_eq!(keys, vec!["foo".to_string(), "bar".to_string()]);
}

@nyurik
Copy link
Author

nyurik commented Apr 4, 2023

I made a new crate with the above idea, hope it is "sound" :) https://github.com/nyurik/dup-indexer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants