On a scale from 1 to 10, I would rate my Rust experience a 3. I first used Rust about three months ago; in my current and previous roles I primarily worked in Python and C++. I do aspire to write idiomatic Rust, but at this point I don't feel confident judging whether my code qualifies.
For Part 2, I implemented the fixed-size open-addressing hash table with linear probing as requested. To report the first and last items that were inserted, we have to implement some kind of insertion-order tracking.
To retain O(1) time complexity for all the requested operations, I decided to use a doubly linked list. This makes it easy to remove keys that were inserted somewhere in the middle without having to search through many entries.
Rather than allocating actual nodes for the doubly linked list, I use the indices of the table itself as the links. Since the table is fixed size, this seemed like the most obvious and straightforward way to implement the list.
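The layout described above can be sketched roughly as follows. This is an illustrative, std-only sketch (the names `Slot`, `OrderedTable`, etc. are my own, not the actual implementation, and it uses std's `DefaultHasher` for self-containedness where the real code uses `ahash`): each slot stores its key/value plus the table indices of its insertion-order neighbors, forming an intrusive doubly linked list with no separate heap allocations.

```rust
const EMPTY: usize = usize::MAX; // sentinel meaning "no neighbor"

struct Slot {
    key: Option<String>,
    value: u64,
    prev: usize, // index of the slot inserted before this one
    next: usize, // index of the slot inserted after this one
}

struct OrderedTable {
    slots: Vec<Slot>,
    head: usize, // index of the first-inserted key
    tail: usize, // index of the last-inserted key
    mask: usize, // capacity - 1; capacity is a power of two
}

impl OrderedTable {
    fn with_capacity_pow2(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        let slots = (0..capacity)
            .map(|_| Slot { key: None, value: 0, prev: EMPTY, next: EMPTY })
            .collect();
        OrderedTable { slots, head: EMPTY, tail: EMPTY, mask: capacity - 1 }
    }

    // Linear probing: start at hash(key) & mask and scan forward until we
    // find the key or an empty slot. Assumes capacity exceeds the number
    // of unique keys, so an empty slot always exists.
    fn probe(&self, key: &str) -> usize {
        use std::hash::{Hash, Hasher};
        let mut hasher = std::collections::hash_map::DefaultHasher::new();
        key.hash(&mut hasher);
        let mut idx = (hasher.finish() as usize) & self.mask;
        loop {
            match &self.slots[idx].key {
                Some(k) if k == key => return idx,
                None => return idx,
                _ => idx = (idx + 1) & self.mask, // collision: step forward
            }
        }
    }

    fn insert(&mut self, key: &str, value: u64) {
        let idx = self.probe(key);
        if self.slots[idx].key.is_none() {
            // New key: append to the insertion-order list in O(1).
            self.slots[idx].key = Some(key.to_string());
            self.slots[idx].prev = self.tail;
            if self.tail != EMPTY {
                self.slots[self.tail].next = idx;
            } else {
                self.head = idx;
            }
            self.tail = idx;
        }
        self.slots[idx].value = value;
    }

    fn first(&self) -> Option<&str> {
        self.slots.get(self.head).and_then(|s| s.key.as_deref())
    }

    fn last(&self) -> Option<&str> {
        self.slots.get(self.tail).and_then(|s| s.key.as_deref())
    }
}
```

Because each slot carries its own `prev`/`next` indices, removing a key inserted in the middle only touches its two neighbors, keeping removal O(1) as well.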
To run the tests:

```shell
cargo test
```

To run the actual code that reads the book and inserts it into the table:

```shell
cargo run --bin part2 --release
```

or, if you have `just` installed (`cargo install just`), you can run:

```shell
just part2
```
I benchmarked against Rust's standard `HashMap`. Performance is similar, which seems reasonable given that we also carry the added functionality of insertion-order tracking.
Optimizations:
- Used `ahash` rather than the default SipHash-based hasher. It is significantly faster while still distributing keys well across the table.
- Used a power of two for the table size, so the index `hash % capacity` can be computed with a single bitwise AND: `hash & (capacity - 1)`.
- Chose a large enough power of two to keep the load factor low, which minimizes collisions. This requires knowing the approximate number of unique keys in advance.
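The power-of-two trick above relies on the identity `h % capacity == h & (capacity - 1)` whenever `capacity` is a power of two, replacing an integer division with a single AND. A minimal illustration (`index_for` is a hypothetical helper, not a function from the actual code):

```rust
// For any power-of-two capacity, h % capacity == h & (capacity - 1),
// because capacity - 1 is a mask of the low bits.
fn index_for(hash: u64, capacity: usize) -> usize {
    debug_assert!(capacity.is_power_of_two());
    (hash as usize) & (capacity - 1)
}
```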
The parsing time here is heavily dominated by network time. Once the response arrives, the whole JSON is parsed very quickly relative to the time it takes to actually receive the bytes.
Using `simd_json` rather than the default `serde_json` to deserialize does seem to help a bit with parsing speed.
The run-time complexity is O(n), where n is the number of entries in the GET response.
To run:

```shell
cargo run --bin part3 --release
```

or, if you have `just` installed:

```shell
just part3
```