Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass document by reference. #11

Merged
merged 9 commits into from
Aug 12, 2022
Merged

Conversation

tmpfs
Copy link
Collaborator

@tmpfs tmpfs commented Aug 11, 2022

I think it makes more sense to pass a reference to the document when adding to the index as it is only a reference that is passed to the extraction function for each field.

Also I have a use case where I need to store the original documents and this prevents an unnecessary clone when adding to the index.

Thanks 🙏

@marcus-pousette
Copy link
Collaborator

Thanks for the PR!

This looks good. Not sure why I passed documents by value in the first place when creating this.

Before merging:

  • Could you make sure that the cargo fmt check works as expected? See the failing pipeline step.

  • Could you make sure cargo bench works? There is a compile error in test_benchmark.rs

@tmpfs
Copy link
Collaborator Author

tmpfs commented Aug 11, 2022

Sure @marcus-pousette, those are fixed now 👍

@tmpfs
Copy link
Collaborator Author

tmpfs commented Aug 11, 2022

Technically this is a breaking change, do you want to bump the major version, maybe 2.0.0-alpha-1?

Also, I think a better API for Filter would be to return &str and Tokenizer could return Vec<&str> and you can take ownership in add_document_to_index() where appropriate, what do you think?

@marcus-pousette
Copy link
Collaborator

marcus-pousette commented Aug 11, 2022

A major version bump sounds good. Especially if we could have the improvements in the Filter and Tokenizer as you mentioned. I think the updates you mentioned sound great!

Additional things:

  • I realized the readme has to be changed with this PR See the example where add_document_to_index is invoked. Should be reference now on the document in stead of values.

  • cargo clippy is complaining, see failiing pipeline

If you are motivated to make the Tokenizer, and Filter update, feel free to include those in this PR or in another PR

@tmpfs
Copy link
Collaborator Author

tmpfs commented Aug 11, 2022

Thanks @marcus-pousette, lets see if we can get the Tokenizer and Filter changes into this PR while I am at it, should get to it in the next day or so 👍

@tmpfs
Copy link
Collaborator Author

tmpfs commented Aug 12, 2022

@marcus-pousette, I think this is good now, please review 👍

The benchmarks show a little performance improvement 😁

Copy link
Collaborator

@marcus-pousette marcus-pousette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes you have done look good.

I included some suggestions for additional improvements that are now possible because of your changes, if you like to apply them (for even more performance gains) feel free to :). Otherwise I will merge this!

src/index.rs Outdated Show resolved Hide resolved
src/index.rs Outdated Show resolved Hide resolved
src/index.rs Outdated Show resolved Hide resolved
src/index.rs Outdated Show resolved Hide resolved
@tmpfs
Copy link
Collaborator Author

tmpfs commented Aug 12, 2022

Good catch @marcus-pousette , I have updated with those changes 👍

@marcus-pousette marcus-pousette merged commit 9d7a7fd into quantleaf:master Aug 12, 2022
@marcus-pousette
Copy link
Collaborator

Well done! Thanks 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants