-
-
Notifications
You must be signed in to change notification settings - Fork 599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Function Score Query #2396
base: main
Are you sure you want to change the base?
Add Function Score Query #2396
Conversation
How you would say this relates to the existing |
Ah sorry, you wrote that in the linked issue... |
Sorry that I'm not quite used to contribute to large OSS projects so I'm not sure the exact practices on linking Issues and PR. Will add the issue link in the PR next time |
…he fastfield test to multiple documents
Regarding the formatting errors, note that you'll need to use a nightly toolchain to apply the correct formatting configuration. |
src/query/function_score_query.rs
Outdated
} | ||
} | ||
|
||
impl DocSet for FunctionScorer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you implement the other methods too? seek for instance, has a default impemnetation but it will be catastrophically slow.
src/query/function_score_query.rs
Outdated
use crate::query::{EnableScoring, Explanation, Query, Scorer, Weight}; | ||
use crate::{DocId, DocSet, Score, SegmentReader, TantivyError}; | ||
|
||
type Function = Arc<dyn Fn(&SegmentReader, Score, DocId) -> Score + Send + Sync + 'static>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If your fucntion needs to access a column from a fast field, you don't want to access it at each call.
The function should be a factory that takes a SegmentReader
and returns a scoring function
I have removed the use of The use of The current signature should only be a simple |
I think @fulmicoton's proposed approach is very right on target and has precedence in other Tantivy API, i.e. make the user pass a fallible factory function which given a segment reader will produce an infallible per-document scoring function. |
What comes up in my mind is a new enum like pub enum ScoreFunction {
ScoreOnly(Arc<dyn Fn(Score) -> Score + Send + Sync + 'static>),
ScoreWithDetails(Arc<dyn Fn(&SegmentReader, Score, DocId) -> Score + Send + Sync + 'static>),
}
pub struct FunctionScoreQuery {
query: Box<dyn Query>,
score_function: ScoreFunction,
} so that we can match different types of If we accept users to dynamically use the segment reader based on different conditions, we have to include it in the closure input parameter and in the end each call will still calling the segment reader. Or do you have any suggestions? |
The idea is not allow the scoring function access to the segment reader but only to the relevant column itself, because after that column has been loaded, access to the values becomes essentially infallible. type ScoreFunction = Arc<dyn Fn(DocId, Score) -> Score + Send + Sync + 'static>;
type ScoreFunctionFactory = Arc<dyn Fn(&SegmentReader) -> crate::Result<ScoreFunction> + Send + Sync + 'static>; (The score function will access the column retrieved from the segment reader in the factory by capturing it.) |
Yes, that makes sense by utilizing the type checker system. However, I'm still thinking how can we be benefited with not accessing the segment reader by every call, as @fulmicoton mentioned:
I think it's better to separate two structs as |
The implementation of FunctionScoreQuery which accepts an existing query and a closure for applying new scoring. You may consider this query as a
.map()
function which does not filter the documents out, but rather change their score.Closes #2395