Skip to content

Conversation

bwplotka
Copy link
Member

@bwplotka bwplotka commented Jul 5, 2024

Co-authored-by: Callum Styan <callumstyan@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
* Added summary on top.
* Be precise on 2xx / 204 status codes.
* Simplified response header section
* Removed eventual durability case, with "accepted" wording that's obvious.
* Grammarly

Signed-off-by: bwplotka <bwplotka@gmail.com>
@bwplotka
Copy link
Member Author

bwplotka commented Jul 8, 2024

Ok, with new wording suggested by @cstyan I was able to simplify the section plus add clear example. Also updated 200 section (we often use 204 No content HTTP status instead, which makes more sense).

Changes:

  • Added summary on top.
  • Be precise on 2xx / 204 status codes.
  • Simplified response header section
  • Removed eventual durability case, with "accepted" wording that's obvious.
  • Grammarly

PTAL @cstyan @npazosmendez @alexgreenbank

Samples OR Histogram Samples not written = error

Signed-off-by: bwplotka <bwplotka@gmail.com>
@bwplotka
Copy link
Member Author

bwplotka commented Jul 9, 2024

See Receiver prometheus/prometheus#14427 and Sender prometheus/prometheus#14444 implementation with those.

Generally it allows Sender to catch common bug AND calculate correct partial error metrics.

Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
@bwplotka
Copy link
Member Author

Rdy for final call!

* Broken 1.0 Receiver implementations: Senders SHOULD assume [415 HTTP Unsupported Media Type](https://www.rfc-editor.org/rfc/rfc9110.html#name-415-unsupported-media-type) status code when sending the data using `io.prometheus.write.v2.Request` request and receiving 2xx HTTP status code, but none of the `X-Prometheus-Remote-Write-Written-*` response headers from the Receiver. This is a common issue for the 1.0 Receivers that do not check the `Content-Type` request header; accidental decoding of the `io.prometheus.write.v2.Request` payload with `prometheus.WriteRequest` schema results in empty result and no decoding errors.
* Unknown broken implementations or issues: Senders MAY use those headers to detect broken Sender and Receiver implementations or other problems.

On the 2xx HTTP status code, Senders CAN assume that a missing write response header means that no such a piece of data was written by the Receiver (count of `0`). value. One exception is that Senders MUST NOT assume the same (empty header means zero) when using the deprecated `prometheus.WriteRequest` Protobuf Message in the request.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
On the 2xx HTTP status code, Senders CAN assume that a missing write response header means that no such a piece of data was written by the Receiver (count of `0`). value. One exception is that Senders MUST NOT assume the same (empty header means zero) when using the deprecated `prometheus.WriteRequest` Protobuf Message in the request.
On the 2xx HTTP status code, Senders CAN assume that a missing write response header means that no such a piece of data was written by the Receiver (count of `0`). One exception is that Senders MUST NOT assume the same (empty header means zero) when using the deprecated `prometheus.WriteRequest` Protobuf Message in the request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can rephrase this for clarity. Something like:

Upon receiving a 2xx HTTP status code with missing write response headers:

  • If the Sender used the io.prometheus.write.v2.Request schema to encode the request, it CAN assume that no data was written by the Receiver (count of 0).
  • If the Sender used a deprecated prometheus.WriteRequest Protobuf Message, it MUST NOT assume that no data was written by the Receiver (count of 0).

In the second case, should Sender assume the data was written correctly? (I might be missing some context here).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

So first case is broader - as there might be more messages in future, but I will check how we can improve.

In the second case, should Sender assume the data was written correctly? (I might be missing some context here).

Correct, there is no way to tell UNLESS you see the header with values for 1.0 message (even if 2.0 receiver was used, we don't know that). I will add example perhaps - might help

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, PTAL @thampiotr

bwplotka and others added 4 commits July 11, 2024 11:38
Co-authored-by: Piotr <17101802+thampiotr@users.noreply.github.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Piotr <17101802+thampiotr@users.noreply.github.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Copy link
Contributor

@thampiotr thampiotr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I still think the accepted wording was okay, it was really ingest/ingested we needed to remove imo since that implied some kind of persistence which rules out some receivers or at least significantly increases the response time on writes.

@bwplotka
Copy link
Member Author

bwplotka commented Jul 12, 2024

Thanks @cstyan

I want to make sure you are happy with final wording decision too (: We could switch back to Accepted but:

  • We don't use it anywhere, it has unknown definition. What definition of it you would put? Would it be the same or different to Write?
  • We ruled out Receive word [PRW 2.0] rc.2 (breaking change): Added required response headers for durability. #2486 (comment) from the opposite reason to ingest: We don't want to rule out ingesting receivers and with ingest we would rule only validating receivers. It feels accept has the same collocation as receive here - for the receivers that actually ingest it does not give the feeling you should bump this number ONLY if ingested perhaps?

WDYT?

@cstyan
Copy link
Member

cstyan commented Jul 12, 2024

I want to make sure you are happy with final wording decision too (: We could switch back to Accepted but:

I'm not unhappy, but I do think Accepted reads nicer than X-Prometheus-Remote-Write-Written-X. But the point that we have to define Accepted is a valid one

  • We don't use it anywhere, it has unknown definition. What definition of it you would put? Would it be the same or different to Write?

What about:

Accepted, and a 200 response, does not strictly require that data has already been persisted to long term storage and is queryable, but rather implies that the data has been accepted in some form and will be persisted. This could mean the data has been written to a WAL or queuing system with delivery guarantees such as Kafka. Receiver implementations should be careful, users will expect that any piece of data that is "Accepted" will be persisted to durable storage and not lost.

And then...

Damn, now I'm just contradicting myself over the course of multiple comments, including this one I am writing right now.

I think Ingested implies too much that we can't enforce, and Write-Written doesn't read nicely, Received or Accepted is good with some version of the description from earlier in this comment.

@bwplotka
Copy link
Member Author

Thanks! I agree Received == Accepted, but the reason we rule out receive fits accepted IMO, so it implies it's NEVER durable write (in practice, it depends on the receiver). Write is universal, it's whatever remote write is for, so best. Ingest is too in the other extreme.

Agree it does not read well.

What about we imply "write" and have 'X-Prometheus-Remote-Written-Samples'? (:

@bwplotka
Copy link
Member Author

The more I think about it, the more I like accepted/received like we had and really focus on this NEVER being durable write count, and always about initial validation/how many samples not rejected.. but then we need rewrite this section a bit for that and perhaps never send them on internal errors (non recoverable?). Tricky.

description of what `Written` means in the context of the RW2.0 spec.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@bwplotka
Copy link
Member Author

We switched to X-Prometheus-Remote-Write-*-Written form. Naming is hard, but this should work great. Thanks @cstyan for last updates 💪🏽 Let's go!

@bwplotka bwplotka merged commit 206136b into main Jul 19, 2024
5 checks passed
@bwplotka bwplotka deleted the rw20-empty-req branch July 19, 2024 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[PRW 2.0 spec] Sample Write Confirmation in Response
3 participants