Add ability to write lexicons directly #32

Merged
merged 11 commits into from
Mar 7, 2022

JMMackenzie
Member

Implements direct writes for the .termlex and .doclex files when running ciff2pisa.

@JMMackenzie JMMackenzie requested a review from elshize March 4, 2022 02:26
src/lib.rs Outdated
Comment on lines 344 to 350
let termlex: PayloadVector = std::fs::read_to_string(&index_paths.terms)?
    .trim()
    .split('\n')
    .map(str::to_string)
    .collect();
let mut lex_path = BufWriter::new(File::create(&index_paths.termlex)?);
termlex.write(&mut lex_path)?;
Member

Since we're using this code twice, it might be good to create a build_lexicon function (or whatever name you think fits) that takes the input txt file and the output lexicon path and does all of this: reads the file, creates the vector, and writes it out.

We can actually add it to the payload_vector module and rewrite the test_write test slightly to use it. The test would need to write the txt file itself first, rather than reading one that's checked into git, because of the newline problem on Windows; in that case I think we can remove that file altogether. You could also rewrite this code so it doesn't explicitly use \n as the separator, e.g., by calling lines on a BufReader instance.

I think this might work, though I'm typing it off the top of my head:

let termlex = BufReader::new(File::open(input)?)
    .lines()
    .collect::<Result<PayloadVector, _>>()?;
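
For illustration, here is a rough sketch of what such a helper could look like. It is only a sketch under assumptions: `write_entries` is a hypothetical stand-in for `PayloadVector::write` and uses a made-up length-prefixed layout, not the real lexicon format, and `read_lexicon_lines` and the `build_lexicon` signature are guesses at a reasonable shape. The part that matters is that `BufRead::lines` strips both `\n` and `\r\n`, so the result does not depend on the platform's line endings.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::Path;

/// Reads one entry per line. `BufRead::lines` drops a trailing `\n` or
/// `\r\n`, so Windows line endings do not leak into the entries.
fn read_lexicon_lines(input: &Path) -> io::Result<Vec<String>> {
    BufReader::new(File::open(input)?).lines().collect()
}

/// Hypothetical stand-in for `PayloadVector::write`. The layout here
/// (entry count, then each entry as length + bytes, all little-endian u64)
/// is an assumption made for this sketch, not the actual lexicon format.
fn write_entries<W: Write>(entries: &[String], out: &mut W) -> io::Result<()> {
    out.write_all(&(entries.len() as u64).to_le_bytes())?;
    for entry in entries {
        out.write_all(&(entry.len() as u64).to_le_bytes())?;
        out.write_all(entry.as_bytes())?;
    }
    Ok(())
}

/// Reads the text file at `input`, builds the entry vector, and writes it
/// to `output`: the three steps that were duplicated for terms and documents.
fn build_lexicon(input: &Path, output: &Path) -> io::Result<()> {
    let entries = read_lexicon_lines(input)?;
    let mut writer = BufWriter::new(File::create(output)?);
    write_entries(&entries, &mut writer)?;
    writer.flush()
}
```

With a helper of this shape, the two call sites would reduce to one call each (terms to termlex, and the document titles to doclex), and a test could write its own input file to a temporary directory before calling it instead of relying on a file checked into git.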

Member Author

Agreed. I have worked the function in but have not yet updated the test case.

@JMMackenzie JMMackenzie requested a review from elshize March 7, 2022 01:21
@JMMackenzie
Member Author

@elshize Is this ready for review/merge?

@elshize
Member

elshize commented Mar 7, 2022

@JMMackenzie if you're ok with my edits, then go ahead and merge.

@JMMackenzie JMMackenzie merged commit 01d6af5 into master Mar 7, 2022
@JMMackenzie JMMackenzie deleted the lexicons branch March 7, 2022 01:39