Add ability to write lexicons directly #32
src/lib.rs
let termlex: PayloadVector = std::fs::read_to_string(&index_paths.terms)?
    .trim()
    .split('\n')
    .map(str::to_string)
    .collect();
let mut lex_path = BufWriter::new(File::create(&index_paths.termlex)?);
termlex.write(&mut lex_path)?;
Since we're using this code twice, it might be good to create a `build_lexicon` function (or whatever name you think fits) that takes an input txt file and an output lexicon path and does all of this: reads, creates the vector, writes.
We can actually add it to the `payload_vector` module and rewrite the `test_write` test slightly to use it. We'd need to first write the txt file instead of reading it from a file that's in git, because of the newlines problem on Windows; in that case we can remove that file altogether, I think. You could also rewrite this code to not explicitly use `\n` as the separator (e.g., use `lines` on a `BufReader` instance).
I think this might work, though I'm typing it off the top of my head:
let termlex = BufReader::new(File::open(input)?)
.lines()
.collect::<Result<PayloadVector, _>>()?;
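To illustrate the newline concern, here is a minimal sketch that compares the two ways of reading terms. The helper names `read_terms` and `split_terms` are hypothetical, and `Vec<String>` stands in for `PayloadVector` so the example compiles without the crate; writing the lexicon is left out.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: collects one term per line from any buffered reader.
/// `BufRead::lines` strips a trailing `\n` or `\r\n` from each line.
/// `Vec<String>` is a stand-in for `PayloadVector` (an assumption of this sketch).
fn read_terms<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}

/// The approach from the diff, for comparison: trim the whole text,
/// then split on `\n` only, which leaves any `\r` attached to the terms.
fn split_terms(text: &str) -> Vec<String> {
    text.trim().split('\n').map(str::to_string).collect()
}
```

With CRLF input such as `"aardvark\r\nbanana\r\n"`, `read_terms(input.as_bytes())` should yield clean terms, while `split_terms(input)` keeps a trailing `\r` on every term except the last, where `trim` removes it.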
Agreed. I have worked the function in but have not yet updated the test case.
This reverts commit 9e5ce2b.
@elshize Is this ready for review/merge?
@JMMackenzie if you're ok with my edits, then go ahead and merge.
Implements direct writes for the .termlex and .doclex files when running ciff2pisa.