Edit time series docs for clarity #3222
base: main
…docs-content into mw-tsds-final-countdown
manage-data/data-store/data-streams/time-series-data-stream-tsds.md
@kkrik-es @gmarouli Thanks for all your comments! I think I've resolved everything, but please take another quick look at manage-data/data-store/data-streams/time-series-data-stream-tsds.md (or any other topics in the section, but those 3 are the main ones).
Metrics differ from dimensions in that while dimensions generally remain constant, metrics are expected to change over time, even if rarely or slowly.
:::{tip}
Metrics are expected to change (even if rarely or slowly), while dimensions generally remain constant.
:::
You can probably remove this note. Dimension values may also change, e.g. as nodes join and leave a cloud deployment.
#### `_tsid` metadata field [tsid]

[Pass-through](elasticsearch://reference/elasticsearch/mapping-reference/passthrough.md#passthrough-dimensions) fields may be configured as dimension containers. In this case, their sub-fields are automatically included in the routing path.
The `_tsid` is an automatically generated object containing the document's dimensions. It's intended for internal {{es}} use, so in most cases you won't need to work with it. The format of the `_tsid` field is subject to change.
The `_tsid` is an automatically generated object containing the document's dimensions. It's intended for internal {{es}} use, so in most cases you won't need to work with it. The format of the `_tsid` field is subject to change.
The `_tsid` is an automatically generated object derived from the document's dimensions. It's intended for internal {{es}} use, so in most cases you won't need to work with it. The format of the `_tsid` field is subject to change.
"Containing" is not accurate (it used to contain parts of the dimension values, but not any more). It's calculated using all dimension values per doc.
navigation_title: "Querying"
products:
- id: elasticsearch
---
We can probably include a reference to the `TS` command here as tech preview.
Amazing, thanks!
A TSDS document is uniquely identified by its time series and timestamp, both of which are used to generate the document `_id`. So, two documents with the same dimensions and the same timestamp are considered to be duplicates. When you use the `_bulk` endpoint to add documents to a TSDS, a second document with the same timestamp and dimensions overwrites the first. When you use the `PUT /<target>/_create/<_id>` format to add an individual document and a document with the same `_id` already exists, an error is generated.
:::{tip}
{{es}} uses dimensions and timestamps to generate time series document `_id` values. Two documents with the same dimensions and timestamp are considered duplicates.
Maybe this is a bit too much detail. But 409s (either due to actual duplicates or due to misconfiguration) are a common issue and are difficult to debug. So I think briefly mentioning the symptom could help. Alternatively, we could also add it to a section of common issues. This can definitely be a follow-up.
{{es}} uses dimensions and timestamps to generate time series document `_id` values. Two documents with the same dimensions and timestamp are considered duplicates.
{{es}} uses dimensions and timestamps to generate time series document `_id` values. Two documents with the same dimensions and timestamp are considered duplicates. Duplicates are rejected during ingestion with a `409 Conflict` status.
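To illustrate the duplicate rule in the suggested wording, here is a toy Python model. It is not how {{es}} actually computes `_tsid` or `_id` (the real format is internal and subject to change): the SHA-256 scheme, `series_id`, `doc_id`, `TimeSeriesStore`, and `ConflictError` are all hypothetical stand-ins. It only shows the two properties discussed in this thread: the series identifier is calculated from all dimension values, and the document ID combines it with `@timestamp`, so a second document with identical dimensions and timestamp is rejected, analogous to a `409 Conflict`.

```python
import hashlib
import json


class ConflictError(Exception):
    """Hypothetical stand-in for a 409 Conflict response."""


def series_id(dimensions: dict) -> str:
    # Toy identifier calculated from *all* dimension values. Sorting keys makes
    # it independent of the order in which dimensions are listed.
    canonical = json.dumps(dimensions, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def doc_id(dimensions: dict, timestamp: str) -> str:
    # Toy document ID: the series identifier combined with the timestamp.
    raw = f"{series_id(dimensions)}|{timestamp}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:20]


class TimeSeriesStore:
    """In-memory stand-in for a TSDS that only accepts generated IDs."""

    def __init__(self, dimension_fields: list[str]) -> None:
        self.dimension_fields = dimension_fields
        self.docs: dict[str, dict] = {}

    def create(self, doc: dict) -> str:
        dims = {f: doc[f] for f in self.dimension_fields}
        new_id = doc_id(dims, doc["@timestamp"])
        if new_id in self.docs:
            # Same dimensions and same timestamp: a duplicate, so reject it.
            raise ConflictError(f"409 Conflict: _id {new_id} already exists")
        self.docs[new_id] = doc
        return new_id


store = TimeSeriesStore(["host", "region"])
store.create({"host": "a", "region": "eu", "@timestamp": "2025-01-01T00:00:00Z", "cpu": 0.4})
try:
    # Same dimension values and timestamp; only the metric value differs.
    store.create({"host": "a", "region": "eu", "@timestamp": "2025-01-01T00:00:00Z", "cpu": 0.9})
except ConflictError as err:
    print("rejected:", err)
```

Note that the metric value plays no part in the ID, which is why a corrected metric for an existing data point still collides instead of being stored alongside it.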
- To define a metric, use the `time_series_metric` mapping parameter. For more details, refer to [Metrics](/manage-data/data-store/data-streams/time-series-data-stream-tsds.md#time-series-metric).
- (Optional) Define a `date` or `date_nanos` mapping for the `@timestamp` field. If you don't specify a mapping, {{es}} maps `@timestamp` as a `date` field with default options.
* (Optional) Other index settings, such as [`index.number_of_replicas`](elasticsearch://reference/elasticsearch/index-settings/index-modules.md#dynamic-index-number-of-replicas), for the data stream's backing indices.
- A priority higher than `200`, to avoid [collisions](/manage-data/data-store/templates.md#avoid-index-pattern-collisions) with built-in templates.
I think we need to add a lifecycle management section here, at least for stateful; in serverless, this is enforced. I would recommend adding the following to the template:
"lifecycle": {
  "enabled": true
}
The main reason we need this is rollover: if a user doesn't add this, they are going to end up with a gigantic index. Everything else is optional; we can leave it under the advanced setup.
I second this. And I think that in the advanced setup we should at least explain why lifecycle management is useful for setting up rollover (a quick sentence to give readers a reason why the provided links are useful, in case they don't know about them).
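A minimal sketch of a template that follows the lifecycle suggestion, written as the Python dict you might send as the body of the create index template API (`PUT _index_template/<name>`). The template name, index pattern, priority value, and field names (`host.name`, `cpu.usage`) are hypothetical. It sets `index.mode` to `time_series`, leaves out `index.routing_path` (a reviewer notes in this PR that the setting gets auto-populated), uses a priority above `200`, and includes the `"lifecycle": {"enabled": true}` block so that data stream lifecycle can take care of rollover.

```python
import json

# Hypothetical names: "metrics-myapp-*" pattern and the "host.name" /
# "cpu.usage" fields are placeholders for illustration only.
template_body = {
    "index_patterns": ["metrics-myapp-*"],
    "data_stream": {},
    # Higher than 200 to avoid collisions with built-in templates.
    "priority": 500,
    "template": {
        "settings": {
            # No index.routing_path here: per the review, it is auto-populated.
            "index.mode": "time_series",
        },
        "mappings": {
            "properties": {
                "host.name": {"type": "keyword", "time_series_dimension": True},
                "cpu.usage": {"type": "double", "time_series_metric": "gauge"},
                "@timestamp": {"type": "date"},
            }
        },
        # Enables data stream lifecycle management, which handles rollover.
        "lifecycle": {"enabled": True},
    },
}

# Serialize to the JSON body you would send with PUT _index_template/<name>.
print(json.dumps(template_body, indent=2))
```

Everything beyond `index.mode`, the dimension and metric mappings, and `lifecycle` is optional and could stay in the advanced setup, as suggested above.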
---

# Reindex a TSDS [tsds-reindex]
# Reindex a time series data stream [tsds-reindex]
@kkrik-es if I understand correctly, this reindexing manual suggests reindexing the data of one data stream into a single backing index of another data stream, right?
If this is true, then I think we need to add a disclaimer before a user gets unpleasantly surprised.
We could also mention the reindex data stream API that was added for upgrades, I will check if it works.
Yeah, not ideal. It's orthogonal to this PR, though; let's file an issue to provide a better path (I thought we had one...).
We should at least mention that the result will be a single index, in a note box or something. @marciw, what do you think?
The rest is indeed orthogonal to this PR.
* One or more [metric fields](#time-series-metric)
* An auto-generated document `_id` (custom `_id` values are not supported)
* **Backing indices:** A TSDS uses [time-bound indices](/manage-data/data-store/data-streams/time-bound-tsds.md) to store data from the same time period in the same backing index.
* **Dimension-based routing:** The routing logic uses dimension fields to map data to shards, improving storage efficiency and query performance.
Also, consider linking to the corresponding section in time-bound-tsds.md.
* **Dimension-based routing:** The routing logic uses dimension fields to map data to shards, improving storage efficiency and query performance.
* **Dimension-based routing:** The routing logic uses dimension fields to map all data points of a time series to the same shard, improving storage efficiency and query performance, and ensuring that duplicate data points are rejected.
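The suggested wording says that routing maps all data points of a time series to the same shard. A toy Python model of that property, assuming a hypothetical `shard_for` function that hashes only the dimension values and takes a modulo; the actual {{es}} routing hash and shard computation are different. Because timestamps and metric values are excluded from the hash, every point in a series lands on one shard, so a duplicate can be detected on that shard alone.

```python
import hashlib
import json


def shard_for(doc: dict, dimension_fields: list[str], num_shards: int) -> int:
    """Toy routing: hash only the dimension values, then take a modulo."""
    dims = {f: doc[f] for f in dimension_fields}
    digest = hashlib.sha256(json.dumps(dims, sort_keys=True).encode("utf-8")).digest()
    # Read the first 4 bytes as an unsigned integer and map it onto a shard.
    return int.from_bytes(digest[:4], "big") % num_shards


dimension_fields = ["host", "region"]
# Six data points from one series: same dimensions, different timestamps and metrics.
points = [
    {"host": "a", "region": "eu", "@timestamp": f"2025-01-01T00:00:{s:02d}Z", "cpu": s / 60}
    for s in range(0, 60, 10)
]
shards = {shard_for(p, dimension_fields, num_shards=3) for p in points}
print(shards)  # a set containing a single shard number
```

Different series (for example, another `host` value) may land on other shards, which spreads load while keeping each series together.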
```
Most time series data contains repeated values. Dimensions are repeated across documents in the same time series. The metric values of a time series may also change slowly over time.
You can use the {{esql}} [`TS` command](elasticsearch://reference/query-languages/esql/commands/ts.md) (in technical preview) to query time series data streams. The `TS` command is optimized for time series data. It also enables the use of aggregation functions that efficiently process metrics per time series, before aggregating results.
Maybe remove the "(in technical preview)". The `TS` docs also contain that, and it's another thing we need to remember to keep in sync when `TS` goes beta or GA.
I kinda like the note here, so that people don't miss that it shouldn't be used in production yet. We'll hopefully update it when the time comes, as it's very prominent.
"routing_path": [ "metricset" ]
}
"index.mode": "time_series",
"index.routing_path": ["dimension_field"]
This setting gets auto-populated, let's remove it.
Thank you, Marci, looks great! I have added a few minor comments for your consideration.
Small additional thing I noticed: the link in the "Create a data stream and add sample data" section of the quickstart is now broken, because we have moved the accepted time range section to the Time-bound indices page.
* The matching index template for a TSDS must contain the `index.routing_path` index setting. A TSDS uses this setting to perform [dimension-based routing](#dimension-based-routing).
* A TSDS uses internal [index sorting](elasticsearch://reference/elasticsearch/index-settings/sorting.md) to order shard segments by `_tsid` and `@timestamp`.
* TSDS documents only support auto-generated document `_id` values. For TSDS documents, the document `_id` is a hash of the document's dimensions and `@timestamp`. A TSDS doesn't support custom document `_id` values.
* A TSDS uses [synthetic `_source`](elasticsearch://reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source), and as a result is subject to some [restrictions](elasticsearch://reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source-restrictions) and [modifications](elasticsearch://reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source-modifications) applied to the `_source` field.
I think we should still mention this: it could cause problems for people who aren't aware that synthetic source is enabled.
Co-authored-by: Yannis Roussos <yannis.roussos@elastic.co>
Adds docs for the new OTLP endpoint added via elastic/elasticsearch#133057

Closes #3363

---------

Co-authored-by: Fabrizio Ferri-Benedetti <fabri.ferribenedetti@elastic.co>
Co-authored-by: Kostas Krikellas <131142368+kkrik-es@users.noreply.github.com>
Part of https://github.com/elastic/docs-team/issues/31?issue=elastic%7Cdocs-team%7C41
Status
🟢 Ready for PM/engineer review
🚧 Not ready for tech writer review
⚠️ Note for reviewers: We're going for "MVP" docs for now and tracking additional improvements in #3179
Changes
TODO: