Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SearchHit explicitly saves and uses source content-type #104903

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ldematte
Copy link
Contributor

@ldematte ldematte commented Jan 30, 2024

source is passed as a BytesReference, but then its original content type is required to write out XContent and it needs to be inferred - which is deprecated across several functions.

This PR adds a content type parameter whenever a source is specified or set in a SearchHit.
This preliminary work will make "chunkification" of the missing bits of SearchHit response (the source field, missing from #104770) simpler and more efficient.

@ldematte ldematte added :Search/Search Search-related issues that do not fall into other categories test-update-serverless >refactoring labels Jan 30, 2024
@ldematte ldematte marked this pull request as ready for review January 30, 2024 16:04
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jan 30, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna
Copy link
Member

javanna commented Jan 31, 2024

hey @ldematte having worked a long time ago on the deprecation of content type guessing, what is the plan for removing that for source? If I remember correctly there are places where we don't store the content type and we have no other way than guessing it?

@ldematte
Copy link
Contributor Author

ldematte commented Feb 1, 2024

Hey @javanna, thanks for taking a look.
I'm not 100% sure what you are asking; my purpose with this PR is quite limited, focusing on SearchHit.

Currently, when we are writing out source, we use XContentHelper.writeRawField; this needs to guess the content type of the source to see if the field can written "as is" or needs to be parsed and re-written.
Having the content-type stored along source would help removing this deprecated call.

Today a SearchHit may have a valid source field or not; my understanding is that if it does have it, it can either come from the Transport protocol (SearchHit#readFrom), from XContent parsing (via MAP_PARSER and a Map<String, Object>), or from setting it directly via SearchHit#sourceRef.

For the first one, we can transmit the content-type too, adding it to the protocol (from now on) and fallback to guessing it when it's not present (older nodes/protocol).
The second one: since we are parsing, we have access to the parser and its content type, so we can save it.
For the last one: I tracked down 2 non-test usages of SearchHit#sourceRef.

  • EnrichShardMultiSearchAction: in this case the content type is fixed to XContentType.SMILE - the method that produces the filterSource to be passed to sourceRef (filterSource) uses it explicitly.
  • FetchSourcePhase: in this case the bytes come from a Source, which already has a sourceContentType(). It is true that this is (again) guessed from the source bytes in case this is not available.

Looking at Source a bit more in depth, I see that in many cases this is built without content-type indeed; this means I'm not solving that much here, the content-type is still guessed most of the time. But my scope was limited to SearchHit. It's a little step, but it's one place less where content-type is guessed and will help in serializing source to XConent in chunks. Let me know if this is OK, if this does answer your question, or if you think that I should expand the scope.

@javanna
Copy link
Member

javanna commented Feb 2, 2024

Thanks @ldematte I am ok with the current scope. I wanted to see what the high-level plan was, and I think there is still an issue around the fact that we don't store the source content type in all places. Certainly content type guessing from SearchHit itself can be removed with this PR.

@ldematte
Copy link
Contributor Author

ldematte commented Feb 2, 2024

I wanted to see what the high-level plan was, and I think there is still an issue around the fact that we don't store the source content type in all places

I agree with you @javanna: this is only a small bit, by no means I would consider the issue solved. Nothing has changed high-level I'm afraid, I only happened across this bit doing related work, so the scope is quite limited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>refactoring :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team test-update-serverless v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants