-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reflect latest changes in synthetic source documentation #109501
Conversation
Documentation preview: |
Pinging @elastic/es-docs (Team:Docs) |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
@@ -61,10 +60,8 @@ types: | |||
** <<numeric-synthetic-source,`long`>> | |||
** <<range-synthetic-source,`range` types>> | |||
** <<numeric-synthetic-source,`scaled_float`>> | |||
** <<search-as-you-type-synthetic-source,`search_as_you_type`>> | |||
** <<numeric-synthetic-source,`short`>> | |||
** <<text-synthetic-source,`text`>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
text
is controversial - if there is no keyword subfield it will store text in a field which is kind of the same as non-native approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a category "It's complicated" :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo native means the field type handling synthetic source without relying on ignored source. It says nothing about how it implements synthetic source.
I wonder if we should expand "modifications" section on this page and mention f.e. precision loss in |
Another question is if we should add section for synthetic source for all field types. It is not too useful in my opinion since it will just say "Synthetic source is supported by default.". |
|
||
There are a couple of restrictions to be aware of: | ||
<<synthetic-source-fields-native-list, Majority of field types>> natively support synthetic `_source`. Source for the field is constructed using existing data, most commonly <<doc-values,`doc_values`>>, meaning no additional space is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/Majority/ The majority/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "Source for the field is usually reconstructed using doc values. However, sometimes stored fields are used, which can result in slight storage overhead."
Wanna mention something about objects (disabled, dynamic, ignore_*) and nested objects? Also, shall we add the highlight note as mentioned in the mtg? |
++ |
Yeah maybe use italics or smth to stress the |
<<synthetic-source-modifications,modifications>> compared to the original JSON. | ||
* Synthetic `_source` can be used with indices that contain only these field | ||
types: | ||
If a field type does not have native synthetic `_source` support, original data of this field is stored as is in the same way as the `_source` field in non-synthetic mode. Similarly, malformed values of fields that use <<ignore-malformed,`ignore_malformed`>> need to be stored as is, requiring additional storage space. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add that the part of the _source is then stored in a special source field called _ignore_source
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we either explain somewhere the difference between native and non-native synthetic source...so that we can use the terminology later or we don't use such terminology and explain it differently like for instance:
"If a field type supports document reconstruction by means of doc values...otherwise, if doc values are not available the field value is stored as is..."
I am not in favor of one or the other...just saying that if we use "native synthetic source" we should explain it beforehand...or don't use it at all.
@@ -7,8 +7,7 @@ searchable), but it is stored so that it can be returned when executing | |||
_fetch_ requests, like <<docs-get,get>> or <<search-search,search>>. | |||
|
|||
If disk usage is important to you then have a look at | |||
<<synthetic-source,synthetic `_source`>> which shrinks disk usage at the cost of | |||
only supporting a subset of mappings and slower fetches or (not recommended) | |||
<<synthetic-source,synthetic `_source`>> which shrinks disk usage at the cost of slower fetches or (not recommended) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename at the cost of slower fetches or (not recommended)
into at the cost of slower access to _source, in get and search APIs
@kkrik-es Yes, i'll add a highlight separately. We are still accumulating highlights :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This info is super helpful! 🎉 Added some minor grammar/flow recommendations.
@@ -54,7 +54,7 @@ documents, the document `_id` is a hash of the document's dimensions and | |||
`@timestamp`. A TSDS doesn't support custom document `_id` values. | |||
|
|||
* A TSDS uses <<synthetic-source,synthetic `_source`>>, and as a result is | |||
subject to a number of <<synthetic-source-restrictions,restrictions>>. | |||
subject to <<synthetic-source-modifications,modifications>> applied to `_source` field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subject to <<synthetic-source-modifications,modifications>> applied to `_source` field. | |
subject to <<synthetic-source-modifications,modifications>> applied to the `_source` field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing the
haunts me all my life, thank you
@@ -7,8 +7,7 @@ searchable), but it is stored so that it can be returned when executing | |||
_fetch_ requests, like <<docs-get,get>> or <<search-search,search>>. | |||
|
|||
If disk usage is important to you then have a look at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If disk usage is important to you then have a look at | |
If disk usage is important to you, then consider the following options: |
<<synthetic-source,synthetic `_source`>> which shrinks disk usage at the cost of slower fetches or (not recommended) | ||
<<disable-source-field,disabling the `_source` field>> which also shrinks disk | ||
usage but disables many features. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider defining these briefly up here. if you don't want to define them here, I'd remove the downsides and keep the links only.
<<synthetic-source,synthetic `_source`>> which shrinks disk usage at the cost of slower fetches or (not recommended) | |
<<disable-source-field,disabling the `_source` field>> which also shrinks disk | |
usage but disables many features. | |
- Using <<synthetic-source,synthetic `_source`>>, which reconstructs source content at the time of retrieval instead of storing it on disk. This shrinks disk usage, at the cost of slower access to `_source` in <<docs-get,Get>> and <<search-search,Search>> queries. | |
- <<disable-source-field,Disabling the `_source` field completely>>. This shrinks disk | |
usage but disables features that rely on `_source`. |
|
||
There are a couple of restrictions to be aware of: | ||
<<synthetic-source-fields-native-list, Majority of field types>> natively support synthetic `_source`. Source for the field is constructed using existing data, most commonly <<doc-values,`doc_values`>>, meaning no additional space is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<<synthetic-source-fields-native-list, Majority of field types>> natively support synthetic `_source`. Source for the field is constructed using existing data, most commonly <<doc-values,`doc_values`>>, meaning no additional space is used. | |
<<synthetic-source-fields-native-list,Most field types>> natively support synthetic `_source`. The source for natively supported fields is constructed using existing data, most commonly <<doc-values,`doc_values`>>, meaning that no additional space is used. |
@elasticmachine update branch |
Hi @lkts, I've created a changelog YAML for you. Note that since this PR is labelled |
|
||
[[synthetic-source-fields-native-list]] | ||
===== Field types that support synthetic source with no storage overhead | ||
Following field types support synthetic source using data from <<doc-values,`doc_values`>> and <<stored-fields, stored fields>> and require no additional storage space to construct `_source` field. Additional storage is required to store ignored field values produced as a result of using <<ignore-malformed,`ignore_malformed`>> or <<ignore-above,`ignore_above`>> parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Following/The following/
s/construct _source
field/construct the _source
field/
|
||
|
||
[[synthetic-source-fields-native-list]] | ||
===== Field types that support synthetic source with no storage overhead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the fence on whether to report the fields that have additional indexing overhead when ignore_malformed
or ignore_above
are set. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fair, i imagine normally you expect malformed values to be a small portion of your data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the copying parser always copies the current value preemptively, iiuc? This should increase overhead for all indexed docs, not just the ones with malformed values..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. I don't know if it should be in this section but we can mention it somewhere. Let me follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's wait with mentioning the copy parser overhead until we have a better understanding how much it actually is? I suspect it is minimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're getting there.
Shall we call out that object
params are supported too? Esp disabled
is an interesting one, as it increases storage but works as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just some tiny changes. Thanks
Co-authored-by: Tyler Perkins <110836770+tylerperk@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved with a couple of small additional suggestions!
@@ -178,4 +153,38 @@ that ordering. | |||
|
|||
[[synthetic-source-modifications-ranges]] | |||
====== Representation of ranges | |||
Range field vales (e.g. `long_range`) are always represented as inclusive on both sides with bounds adjusted accordingly. See <<range-synthetic-source-inclusive, examples>>. | |||
Range field vales (e.g. `long_range`) are always represented as inclusive on both sides with bounds adjusted accordingly. See <<range-synthetic-source-inclusive, examples>>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this say values
?
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Hi @lkts, I've updated the changelog YAML for you. Note that since this PR is labelled |
This reverts commit 32e80d3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR changes synthetic source documentation to reflect recent improvements.