Get the span as a result of a match #242
Comments
Tagging @V0ldek for notifications |
We already have Unfortunately, we can't exactly get zero-copy, because of the additional work required to get the end of a match – starting indices are easy (hence However, if the purpose is for an external tool to parse the results, then trimming the whitespace is probably not required. I'd expect any parser to just discard the trailing whitespace. Maybe we can expose this and explicitly document that it's an approximate span, where the end might point to some whitespace after the actual match. Call it |
To make the issue concrete, there are currently two real cases where a match includes whitespace that needs to be trimmed. One, if the JSON is weird and has spaces before commas delimiting items in an object or list:
{
  "a": 42 ,
  "b": 43
}
with query [ 42 , 43 ] where This is a corner-case, since no sane person or formatter writes JSONs like that. The other happens all the time, however, and it's when the match happens at the end of an object/list and so it will include all whitespace until the parent's closing brace/bracket:
{
  "a": {
    "b": 42
  }
}
where In the last case, the correct span to report is 20-21, but to get that we need to trim the whitespace after the match. In the fast approximate span endpoint the report would be 20-25, which would include all the whitespace. |
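For a consumer that does need the exact end, the trimming described above is cheap to do on the caller's side. The following is a minimal sketch, not library code: `is_json_whitespace`, `trim_span_end` and `NESTED` are hypothetical names, spans are assumed to be half-open byte ranges (so the exact span "20-21" from the comment becomes `20..22`), and the nested document is assumed to use two-space indentation, which is what makes the offsets 20 and 25 line up.

```rust
/// Returns true for the four whitespace bytes allowed by the JSON grammar.
fn is_json_whitespace(byte: u8) -> bool {
    matches!(byte, b' ' | b'\t' | b'\n' | b'\r')
}

/// Shrinks the half-open span `start..end` so that it no longer covers
/// trailing JSON whitespace. The start is left untouched, because only
/// the end of an approximate span can be off.
fn trim_span_end(input: &[u8], start: usize, end: usize) -> (usize, usize) {
    let mut end = end;
    while end > start && is_json_whitespace(input[end - 1]) {
        end -= 1;
    }
    (start, end)
}

/// The nested document from the comment above (assumed two-space indentation).
const NESTED: &str = "{\n  \"a\": {\n    \"b\": 42\n  }\n}";
```

Under these assumptions, `trim_span_end(NESTED.as_bytes(), 20, 25)` yields `(20, 22)`, and `&NESTED[20..22]` is the matched value `42`. Note that a parser which tolerates trailing whitespace could take `&NESTED[20..25]` directly and skip this step.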
@charles-paperman Is there value in adding this to |
I believe it is purely for the lib. I don't think reporting the span in rq makes a lot of sense. |
So the API here would be:
fn approx_spans<I, S>(&self, input: &I, sink: &mut S) -> Result<(), EngineError>
where
    I: Input,
    S: Sink<MatchSpan>;
Internally, this would use a different recorder that eschews all the byte-copying and whitespace trimming and works with reported offsets only. It'd still need to use the stack to make sure the responses are ordered properly; we should follow a similar design to
The documentation MUST explicitly and prominently state that the spans will be inaccurate at the ends, but with the guarantee that only trailing JSON whitespace characters are included. |
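The guarantee stated above can be written down as a small predicate, which might be useful when testing such an endpoint against the exact one. This is only a sketch under assumptions: `Span` and `is_valid_approximation` are hypothetical names that are not part of the library, and spans are taken to be half-open byte ranges.

```rust
/// Hypothetical stand-in for a reported span: the half-open byte range `start..end`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// Returns true if `approx` is an acceptable approximation of `exact` over `input`:
/// the starts agree, the approximate end is not before the exact end, and every
/// extra byte covered by `approx` is JSON whitespace.
fn is_valid_approximation(input: &[u8], exact: Span, approx: Span) -> bool {
    approx.start == exact.start
        && approx.end >= exact.end
        && approx.end <= input.len()
        && input[exact.end..approx.end]
            .iter()
            .all(|b| matches!(b, b' ' | b'\t' | b'\n' | b'\r'))
}
```

For the list `[ 42 , 43 ]`, the value `42` occupies bytes `2..4`. An approximate span of `2..5` passes the check because it only adds the space before the comma, while `2..6` fails because it would swallow the comma itself.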
Efficient approximate copy-free span would be neat! |
- Engine can return an approximate span of the match, where "approximate" means the start index is correct, but the end index might include trailing whitespace after the match.
- This mode is much faster than full `matches`, close to the performance of `count`, especially for large result sets.
- This is a library-only feature.
Ref: #242
Implemented in v0.8.1 as |
Given a string in RAM, it would be nice to collect the spans of all the results so that they can be parsed and loaded with an external solution.