
Support spooling to disk #5441

Closed
axw opened this issue Jun 14, 2021 · 6 comments
Comments

@axw
Member

axw commented Jun 14, 2021

APM Server should support spooling events to disk, to avoid dropping data in the event that Elasticsearch is unavailable or temporarily overwhelmed.

Beats has a "disk queue", which is on its way to GA status: elastic/beats#22602. We have been waiting for this so that we can use it in APM Server.

We also have badger-based storage to support tail-based sampling. It would be a shame to have two different local storage mechanisms (not to mention storing data twice), so we may want to consider generalising this instead of using the libbeat disk queue.

@axw
Member Author

axw commented Jun 14, 2021

Some more thoughts on extending our use of badger:

In addition to giving us a single on-disk storage mechanism, it would also enable encryption at rest and compression. Naturally badger stores arbitrary []byte values (as opposed to being beat.Event-oriented), which would help us move toward a more efficient codec for our events (#4120).

Another aspect to consider is when in the process we should be spooling to disk. If we use the libbeat disk queue, we're restricted to spooling when publishing the events (after processing and transformation). This means that if our model processors and transformation logic are not fast enough, we still risk dropping events. I don't think there is much risk of this at the moment, but that could change in the future.

The big question for me at the moment is whether it is (or can be reasonably made) fit-for-purpose. I think we would only use key lookup for tail-based sampling. For spooling we would ideally maintain a read position in the value log (~queue), advancing the position as we send the events to libbeat. We might be able to approximate this with badger.Stream.
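The "read position in the value log" idea above can be sketched without badger itself. Below is a minimal stdlib Go model (the `spool` type and its methods are hypothetical illustrations, not APM Server or badger code): events are written under monotonically increasing keys, and a consumer persists a cursor key, resuming strictly after it — roughly what one would approximate on top of badger's key-ordered iteration or `badger.Stream`.

```go
package main

import (
	"fmt"
	"sort"
)

// spool is a hypothetical in-memory stand-in for a key-ordered store
// such as badger. Keys embed a monotonically increasing sequence number,
// so a reader can persist a "read position" and resume after restart.
type spool struct {
	entries map[string][]byte
	seq     uint64
}

// append stores an event under the next sequence-numbered key.
// Zero-padding keeps lexicographic order equal to numeric order.
func (s *spool) append(event []byte) {
	key := fmt.Sprintf("q%020d", s.seq)
	s.seq++
	s.entries[key] = event
}

// readFrom returns all events with keys strictly greater than pos,
// plus the new position to persist once they are acknowledged.
func (s *spool) readFrom(pos string) (events [][]byte, newPos string) {
	keys := make([]string, 0, len(s.entries))
	for k := range s.entries {
		if k > pos {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	newPos = pos
	for _, k := range keys {
		events = append(events, s.entries[k])
		newPos = k
	}
	return events, newPos
}

func main() {
	s := &spool{entries: map[string][]byte{}}
	s.append([]byte("event-1"))
	s.append([]byte("event-2"))

	// First pass: read everything, remember the cursor.
	events, pos := s.readFrom("")
	fmt.Println(len(events)) // 2

	// Second pass only sees events appended after the cursor.
	s.append([]byte("event-3"))
	events, _ = s.readFrom(pos)
	fmt.Println(string(events[0])) // event-3
}
```

In a real badger-backed version the cursor itself would be written to the store (and garbage collection would delete acknowledged entries), but the resume-after-cursor shape is the same.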

@stuartnelson3
Contributor

I've been curious about using Prometheus's WAL, but if key lookup is a requirement for tail-based sampling then I believe it's a non-starter.

@axw
Member Author

axw commented Jun 14, 2021

@stuartnelson3 we definitely need key lookup for tail-based sampling, but it's not a strict requirement that we only store data once on disk - that's just a nice-to-have. So, worth considering.

@zube zube bot added this to the 7.15 milestone Jul 1, 2021
@axw axw modified the milestones: 7.15, 7.16 Aug 4, 2021
@axw axw added v7.16.0 and removed v7.15.0 labels Aug 4, 2021
@axw axw removed this from the 7.16 milestone Sep 23, 2021
@stuartnelson3
Contributor

sqlite in WAL mode (or not in WAL mode) might also be of interest: https://www.sqlite.org/wal.html
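As a rough illustration of the sqlite suggestion (the table and column names below are hypothetical, not a proposal): enabling WAL lets readers proceed concurrently with the single writer, and an append-only table with an autoincrementing id gives a natural read position, analogous to the queue cursor discussed above.

```sql
-- Enable write-ahead logging; readers no longer block the writer.
PRAGMA journal_mode=WAL;

-- Minimal spool sketch: events are appended with an autoincrementing id,
-- and a consumer remembers (or deletes up to) the last id it has shipped.
CREATE TABLE IF NOT EXISTS spool (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,
    event BLOB NOT NULL
);
```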

@ruflin
Member

ruflin commented Nov 30, 2021

I think it would be very unfortunate if, now that the libbeat disk queue has gone GA, we add another queue to the system. @faec is working on some refactoring of the libbeat pipeline. Please sync up with her directly to see how the pieces can fit together before we add one more option.

@axw
Member Author

axw commented Nov 15, 2022

We are not planning to implement this any time soon. It's conceivable that we might make use of the Elastic Agent Shipper, which does have disk-based queuing built in. We'll open this back up if we plan to implement it.

@axw axw closed this as not planned Nov 15, 2022