Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated version of pull request #25 #28

Merged
merged 9 commits into from
Apr 22, 2024
Merged

Updated version of pull request #25 #28

merged 9 commits into from
Apr 22, 2024

Conversation

LucaCappelletti94
Copy link
Contributor

@LucaCappelletti94 LucaCappelletti94 commented Apr 15, 2024

Updated version of pull request #25

Once merged, this pull request resolves the issues #23 and #24

@LucaCappelletti94 LucaCappelletti94 changed the title Updated version of pull request # Updated version of pull request #25 Apr 15, 2024
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were these two test source files deleted?

@shanecelis
Copy link
Collaborator

Will mem_dbg let us compare the size of the dictionary versus the size of the trie like this issue aims to?

@LucaCappelletti94
Copy link
Contributor Author

Yes, it should easily be able to do that. If you scroll down in the readme of this other project, you will find mem_dbg examples comparing trie, a simple Vec and an RCL. Next up I will add an RCL compressed with Huffman.

@shanecelis
Copy link
Collaborator

Hi Luca,

This mem_dbg stuff is cool. I wrote a little exploratory code to get a feel for it that I'd like to turn into a test.

    #[cfg(feature = "mem_dbg")]
    #[test]
    fn memsize() {
        use std::{env, io::{BufReader, BufRead}, fs::{File}};
        use mem_dbg::*;

        let mut builder = TrieBuilder::new();

        let repo_root = env::var("CARGO_MANIFEST_DIR").expect("CARGO_MANIFEST_DIR environment variable must be set.");
        let edict2_path = format!("{}/benches/edict.furigana", repo_root);
        println!("Reading dictionary file from: {}", edict2_path);

        let mut n_words = 0;
        for result in BufReader::new(File::open(edict2_path).unwrap()).lines() {
            let l = result.unwrap();
            builder.push(l);
            n_words += 1;
        }
        println!("Read {} words.", n_words);

        let trie = builder.build();
        let size = trie.mem_size(SizeFlags::default());
        eprintln!("Trie size {size}");
        let uncompressed: Vec<String> = trie.predictive_search("").collect();
        eprintln!("Uncompressed size {}", uncompressed.mem_size(SizeFlags::default()));

    }

The file itself is 3,717,355 bytes. When I run the above

$ cargo t --features mem_dbg memsize -- --nocapture
...
Read 185536 words.
Trie size 4174393
Uncompressed size 7984707

I'm surprised the trie is bigger than the dictionary file itself. I would have expected it to be smaller. Fortunately, it is half the size of the "uncompressed" version of the contents.

Collecting the uncompressed version takes a while, so it should probably not be kept as a unit test. Maybe it could be a benchmark somehow.

@shanecelis
Copy link
Collaborator

I'm intent on merging this. Can you update the Changelog?

@LucaCappelletti94
Copy link
Contributor Author

Sure, on it.

@LucaCappelletti94
Copy link
Contributor Author

Should I remove the travis badge? It is no longer working since travis became a paid services.

@LucaCappelletti94
Copy link
Contributor Author

Updated changelog - feel free to publish. I would suggest to remove the travis stuff before.

@shanecelis shanecelis merged commit 6d8fafd into laysakura:master Apr 22, 2024
@shanecelis
Copy link
Collaborator

Thanks for the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants