hsts_list takes 26Mb - can it be lower? #25929

Open
paulrouget opened this issue Mar 9, 2020 · 4 comments
paulrouget (Contributor) commented Mar 9, 2020

hsts_list (public and private) takes 26 MB in memory. That's a pretty large blob to allocate right at startup.

Would there be any better way to store it?

jdm (Member) commented Mar 9, 2020

SimonSapin (Member) commented Mar 19, 2020

    pub struct HstsEntry {
        pub host: String,
        pub include_subdomains: bool,
        pub max_age: Option<u64>,
        pub timestamp: Option<u64>,
    }

There are some easy wins here. The String could be a Box<str>, which drops the capacity field since these hosts are never resized.
The three bits for the bool and the discriminants for each Option could be packed into a bitflags so they don’t occupy 3 × 64 bits because of alignment. Alternatively some more manual bit packing could be used if we don’t need the full u64 range.
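To make the packing suggestion concrete, here is a minimal sketch that compares the in-memory size of the struct above with a packed variant. `PackedHstsEntry`, the flag constants, and `entry_sizes` are hypothetical names invented for this illustration, not anything in Servo, and the sketch hand-rolls the flags in a `u8` rather than using the bitflags crate so that it stays dependency-free.

```rust
use std::mem::size_of;

// The layout quoted above, reproduced so the two sizes can be compared.
#[allow(dead_code)]
pub struct HstsEntry {
    pub host: String,
    pub include_subdomains: bool,
    pub max_age: Option<u64>,
    pub timestamp: Option<u64>,
}

// Hypothetical flag bits: one for the bool, one per Option discriminant.
const INCLUDE_SUBDOMAINS: u8 = 1 << 0;
const HAS_MAX_AGE: u8 = 1 << 1;
const HAS_TIMESTAMP: u8 = 1 << 2;

// Hypothetical packed variant: Box<str> instead of String, and the
// three one-bit facts share a single byte.
pub struct PackedHstsEntry {
    host: Box<str>,
    max_age: u64,   // only meaningful when HAS_MAX_AGE is set
    timestamp: u64, // only meaningful when HAS_TIMESTAMP is set
    flags: u8,
}

impl PackedHstsEntry {
    pub fn new(
        host: &str,
        include_subdomains: bool,
        max_age: Option<u64>,
        timestamp: Option<u64>,
    ) -> Self {
        let mut flags = 0;
        if include_subdomains {
            flags |= INCLUDE_SUBDOMAINS;
        }
        if max_age.is_some() {
            flags |= HAS_MAX_AGE;
        }
        if timestamp.is_some() {
            flags |= HAS_TIMESTAMP;
        }
        PackedHstsEntry {
            host: host.into(),
            max_age: max_age.unwrap_or(0),
            timestamp: timestamp.unwrap_or(0),
            flags,
        }
    }

    pub fn host(&self) -> &str {
        &self.host
    }

    pub fn include_subdomains(&self) -> bool {
        self.flags & INCLUDE_SUBDOMAINS != 0
    }

    pub fn max_age(&self) -> Option<u64> {
        if self.flags & HAS_MAX_AGE != 0 { Some(self.max_age) } else { None }
    }

    pub fn timestamp(&self) -> Option<u64> {
        if self.flags & HAS_TIMESTAMP != 0 { Some(self.timestamp) } else { None }
    }
}

// Returns (size of the original struct, size of the packed struct) in bytes.
pub fn entry_sizes() -> (usize, usize) {
    (size_of::<HstsEntry>(), size_of::<PackedHstsEntry>())
}
```

Printing the result of `entry_sizes()` on the target platform shows how many bytes per entry the packing saves. This only measures the inline part of each entry; the heap allocation for the host string is separate.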

It looks like the current code has a single HashMap for both the preloaded entries and other entries. I assume the preloaded ones are typically more numerous, making most of those 26 MB. If we keep the preloaded entries in a phf map https://github.com/sfackler/rust-phf (and keep a separate HashMap for the rest) they would be kept in static memory, which the OS can mmap in and out of cache and share across Servo processes.
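The split between static preloaded entries and a dynamic map could look roughly like the sketch below. To keep it dependency-free it uses a sorted static slice with binary search as a stand-in for a phf map; the idea of keeping the preloaded data in static memory is the same, but this is not the rust-phf API. `PRELOADED`, `HstsList`, and the sample hosts are hypothetical, and checking the dynamic map first is an assumption, not something the thread specifies.

```rust
use std::collections::HashMap;

// Hypothetical preloaded data as (host, include_subdomains), sorted by host.
// In a real build this table would be generated from the preload list.
static PRELOADED: &[(&str, bool)] = &[
    ("example.com", true),
    ("example.org", false),
    ("login.example.net", true),
];

pub struct HstsList {
    // Entries learned at runtime; only these live on the heap.
    dynamic: HashMap<String, bool>,
}

impl HstsList {
    pub fn new() -> Self {
        HstsList { dynamic: HashMap::new() }
    }

    pub fn insert(&mut self, host: &str, include_subdomains: bool) {
        self.dynamic.insert(host.to_owned(), include_subdomains);
    }

    // Returns the include_subdomains flag for an exact host match, if any.
    // Assumption: runtime entries take precedence over preloaded ones.
    pub fn lookup(&self, host: &str) -> Option<bool> {
        if let Some(&flag) = self.dynamic.get(host) {
            return Some(flag);
        }
        PRELOADED
            .binary_search_by(|&(h, _)| h.cmp(host))
            .ok()
            .map(|i| PRELOADED[i].1)
    }
}
```

With this shape, startup does no allocation for the preloaded entries at all; only hosts added through `insert` cost heap memory.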

SimonSapin (Member) commented Mar 19, 2020

rust-phf already exists, so it would be relatively easy to use. An automaton like Gecko’s would take more effort but likely produce faster lookups and maybe more compact storage.

skrzyp1 (Contributor) commented Jun 1, 2020

I saw that the aho-corasick crate is already used in Servo, so I tried using it here as an experiment.
However, the best size I have gotten so far is 168 MiB, measured with massif: aho_corasick::nfa::NFA<S>::add_sparse_state (nfa.rs:206).
I might be doing something wrong (this is my first time using this crate), or it is just the wrong tool for the job.
aho-corasick is probably more general than we need: we only have to match patterns anchored at the beginning of the input, so a custom automaton could be smaller.
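As a rough illustration of what "anchored at the beginning" means, here is a minimal byte trie that only answers whole-key lookups starting from the root. `Trie` is a hypothetical type written for this sketch. It is pointer-based and therefore not compact; a version meant to save memory would need to flatten the nodes into arrays and share common suffixes, which this sketch does not attempt.

```rust
use std::collections::BTreeMap;

// One trie node: outgoing edges keyed by byte, plus an end-of-key marker.
#[derive(Default)]
pub struct Trie {
    children: BTreeMap<u8, Trie>,
    terminal: bool,
}

impl Trie {
    // Adds `key`, creating nodes along its byte path as needed.
    pub fn insert(&mut self, key: &str) {
        let mut node = self;
        for &b in key.as_bytes() {
            node = node.children.entry(b).or_default();
        }
        node.terminal = true;
    }

    // Anchored lookup: walks from the root and succeeds only if the
    // whole key ends on a terminal node. No substring search is done.
    pub fn contains(&self, key: &str) -> bool {
        let mut node = self;
        for b in key.as_bytes() {
            match node.children.get(b) {
                Some(next) => node = next,
                None => return false,
            }
        }
        node.terminal
    }
}
```

Because every lookup starts at the root, there is no need for the failure transitions that a general multi-pattern searcher maintains.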
