
Use WAL to gather samples for remote write #4588

Merged
merged 3 commits into prometheus:master from cstyan:callum-tail-wal Feb 12, 2019

Conversation

@cstyan (Contributor) commented Sep 7, 2018

Example of WAL tailing. Still need to do something with what we read out of the WAL.

cc @csmarchbanks @tomwilkie

@cstyan cstyan force-pushed the cstyan:callum-tail-wal branch 2 times, most recently Sep 7, 2018

@cstyan (Contributor, Author) commented Sep 12, 2018

@csmarchbanks @tomwilkie please see the latest commit; it's still an ugly hack, but it's working ™️

To test, I installed InfluxDB and used the remote storage example. The only issue seems to be with some TSDB metrics:
level=debug ts=2018-09-12T22:11:50.929937835Z caller=client.go:88 storage=InfluxDB msg="cannot send to InfluxDB, skipping sample" value=NaN sample="prometheus_tsdb_wal_truncate_duration_seconds{instance=\"localhost:9090\", job=\"prometheus\", quantile=\"0.99\"} => NaN @[1536790310.871]"


@cstyan cstyan force-pushed the cstyan:callum-tail-wal branch 3 times, most recently Sep 17, 2018

@cstyan (Contributor, Author) commented Sep 19, 2018

@tomwilkie could use a sanity check review here. Also @csmarchbanks if you're interested :)

Some thoughts:

  • The way I'm setting up and starting the goroutine that's reading from the WAL feels wrong. One idea is that main could start remote storage and pass it a channel. Remote storage would start the goroutine immediately, and it would wait on this channel to receive a string containing the path to the WAL dir, then proceed as it does now (roughly the shape sketched below this list).
  • If we need to obey the Appender interface, we could take in a second channel that sends the data we read from the WAL to one or more goroutines that just do writes, or potentially relabelling and writes, similar to the way the queues are set up now. Except rather than Appender Add() being called as a result of scraping, we would call it in decodeSegment or somewhere around there.
  • I'm still not sure how to handle, or whether we want, reading the WAL segments via some type of pointer into the WAL, or by periodically polling the current segment, rather than the current setup of using fsnotify. I think we still want to use fsnotify to get file creation events.
  • Relabeling seems expensive. Maybe the global config external labels bit is applied to samples in TSDB on ingestion, so we don't need to do it here? Still need to investigate.
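
For concreteness, a minimal sketch of the idea in the first bullet, assuming hypothetical names (this is not the code in this PR):

```go
// Minimal sketch: remote storage starts the WAL watcher goroutine
// immediately, and the goroutine blocks until main sends it the WAL
// directory over a channel.
package remote

import "log"

type Storage struct {
	walDirCh chan string
	quit     chan struct{}
}

func NewStorage() *Storage {
	s := &Storage{
		walDirCh: make(chan string, 1),
		quit:     make(chan struct{}),
	}
	// Started before we know where the WAL lives.
	go s.watchWAL()
	return s
}

// SetWALDir would be called from main once the TSDB, and hence the WAL path,
// is known.
func (s *Storage) SetWALDir(dir string) {
	s.walDirCh <- dir
}

func (s *Storage) watchWAL() {
	select {
	case dir := <-s.walDirCh:
		log.Println("tailing WAL in", dir)
		// ... open segments and start reading, as the example main.go does now ...
	case <-s.quit:
		return
	}
}
```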
@csmarchbanks (Contributor) commented Sep 19, 2018

No real opinions on the first two points yet.

I'm still not sure how to handle/if we want to read the WAL segments via some type of pointer into the WAL, or periodically polling the current segment, rather than the current setup of using fsnotify. I think we still want to use fsnotify to get file creation events.

I think some type of pointer will be necessary for when a remote write endpoint goes down. We will need to not move forward in the WAL for that remote write until it comes back up. fsnotify could then be used to keep track of whether there are new segments for the pointer to move to when it finishes reading the current segment. Also, if the current segment is deleted the pointer would have to be moved to the next segment.

Relabeling seems expensive. Maybe the global config external labels bit is applied to samples in tsdb on ingestion and so we don't need to do it here? Still need to investigate here.

Isn't relabeling done on every sample now? So it shouldn't be any more expensive than the current implementation? Since you can specify relabel configs for each remote write I don't know how you can avoid it.
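
A rough sketch of the per-remote-write pointer idea being discussed here; the types and fields are hypothetical, not what the PR ends up implementing:

```go
// Hypothetical sketch of one WAL position per remote_write endpoint, so a
// down endpoint stops advancing without blocking the others.
package remote

type walPosition struct {
	segment int   // index of the WAL segment currently being read
	offset  int64 // byte offset within that segment
}

// One position per configured remote_write endpoint, keyed by endpoint URL.
type walPointers map[string]*walPosition

// advance moves an endpoint to the start of the next segment, e.g. when it
// finishes the current segment or the current segment has been truncated away.
func (p walPointers) advance(endpoint string, nextSegment int) {
	pos, ok := p[endpoint]
	if !ok {
		pos = &walPosition{}
		p[endpoint] = pos
	}
	pos.segment = nextSegment
	pos.offset = 0
}
```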

@cstyan (Contributor, Author) commented Sep 19, 2018

I think some type of pointer will be necessary for when a remote write endpoint goes down. We will need to not move forward in the WAL for that remote write until it comes back up. fsnotify could then be used to keep track of if there are new segments for the pointer to move to when it finishes reading the current segment. Also, if the current segment is deleted the pointer would have to be moved to the next segment

Right, I keep forgetting that there can be multiple remote write URLs.

Isn't relabeling done on every sample now? So it shouldn't be any more expensive than the current implementation? Since you can specify relabel configs for each remote write I don't know how you can avoid it.

I just want to confirm whether or not the external labels are applied on sample ingestion into TSDB. If they are, we don't need to attempt to apply them again when reading out of the tsdb. See: https://github.com/prometheus/prometheus/pull/4588/files#diff-4a762c727e88ccd4b3454b9312afcccdR46

@brian-brazil (Member) commented Sep 19, 2018

I just want to confirm whether or not the external labels are applied on sample ingestion into TSDB.

They aren't. They're added as the samples are sent out by remote_write, before write_relabel_configs.
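
Purely for illustration (not the actual Prometheus code path), the ordering described here could look like the sketch below, with write relabeling stubbed out; labels are modelled as a plain map for brevity:

```go
// Illustrative sketch of the ordering: external labels are attached on the
// way out of remote_write, before write_relabel_configs are applied.
package remote

// addExternalLabels adds each external label unless the series already has a
// label with that name.
func addExternalLabels(series, external map[string]string) map[string]string {
	for name, value := range external {
		if _, ok := series[name]; !ok {
			series[name] = value
		}
	}
	return series
}

// prepareForSend shows the order of operations only.
func prepareForSend(series, external map[string]string) map[string]string {
	series = addExternalLabels(series, external)
	return applyWriteRelabel(series) // stand-in for write_relabel_configs
}

// applyWriteRelabel is a stub; the real step runs the configured relabel
// rules and may drop the sample entirely.
func applyWriteRelabel(series map[string]string) map[string]string {
	return series
}
```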

@csmarchbanks (Contributor) commented Sep 19, 2018

The way I'm setting up and starting the go routine that's reading from the WAL feels wrong.

I don't think main should start the wal watcher. That will need to be inside the remote write code, and done for each remote write config (like how we currently have a QueueManager for each).

What would be the reason for delaying starting the wal watcher using a channel? Seems like it would actually be good to have the wal reader read to the end of the wal (without sending any of the samples) before scraping starts to avoid sending duplicate samples?

@cstyan (Contributor, Author) commented Sep 19, 2018

I don't think main should start the wal watcher. That will need to be inside the remote write code, and done for each remote write config (like how we currently have a QueueManager for each).

Agreed.

What would be the reason for delaying starting the wal watcher using a channel? Seems like it would actually be good to have the wal reader read to the end of the wal (without sending any of the samples) before scraping starts to avoid sending duplicate samples?

We need to know the location of the tsdb/wal, which isn't necessarily set at the time that remote storage is created.

@cstyan cstyan force-pushed the cstyan:callum-tail-wal branch Sep 20, 2018

@cstyan cstyan changed the title from "WIP: Add WAL tailing example." to "WIP: use WAL to gather samples for remote write" Sep 24, 2018

@cstyan cstyan force-pushed the cstyan:callum-tail-wal branch Sep 26, 2018

@cstyan (Contributor, Author) commented Sep 26, 2018

Some updates, @csmarchbanks @tomwilkie: we now do relabeling once per series per segment during storeSeries, and we store the series labels as prompb.Label rather than tsdb or pkg Label, so that we don't need to recreate the labels as those types on every call to storeSamples.

Regarding the reads: the segment reader's Next() function will ensure we always get an entire record (something we can interpret as a metric with some value and timestamp) or an error.

Rather than trying to find a way to have some kind of pointer into the WAL per remote write config, we could keep the existing structure of a goroutine per config, plus an additional routine that does the actual sending. With the addition of a buffered queue, we could limit the number of samples that can be read from the WAL and stored, waiting to be sent, at any one time. We simply check the channel length to see if it's full and, if so, skip reading within the existing select statement (see the sketch below).
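
A minimal sketch of that buffered-queue back pressure, assuming hypothetical names and a generic event channel in place of the real fsnotify plumbing:

```go
// Sketch only: a bounded channel sits between the WAL reader and the sender,
// and the reader skips a read pass whenever the queue is already full.
package remote

type sample struct {
	ref uint64
	t   int64
	v   float64
}

func tailWAL(events <-chan struct{}, queue chan sample, quit <-chan struct{}) {
	for {
		select {
		case <-quit:
			return
		case <-events: // e.g. an fsnotify write event on the current segment
			if len(queue) == cap(queue) {
				// Queue is full: skip this read to apply back pressure.
				continue
			}
			// ... read new records from the segment and push decoded samples
			// onto queue; a separate goroutine drains queue and sends ...
		}
	}
}
```

Here queue would be created with make(chan sample, capacity), so len and cap give the current fill level.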

@cstyan cstyan force-pushed the cstyan:callum-tail-wal branch Oct 2, 2018

@cstyan (Contributor, Author) commented Oct 2, 2018

The latest commit made some changes to the proto definitions: certain fields can no longer be null, which results in []type rather than []*type and therefore fewer object allocations.

Thanks to @bwplotka for this PR: #4579

@cstyan (Contributor, Author) commented Oct 2, 2018

Pretty significant refactor in the latest commit:

  • there is now a single WAL watcher, rather than one per remote write config
  • make use of Tom's original work (the queue manager) to do parallelization, and move things like the cache of series labels from the WAL watcher to the queue managers
  • the WAL reading code was failing regularly under heavy (for my laptop) load if we got anything other than a tsdb series or sample record; we now handle the rest, and we seem to decode invalid records quite often (a rough shape of that handling is sketched below)
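
A hypothetical sketch of handling every record type read from a segment rather than only series and samples; the decoder here is a stand-in, not the tsdb API:

```go
// Sketch: dispatch on the record type instead of erroring on anything that
// isn't a series or samples record.
package remote

import "errors"

type recordType int

const (
	recordInvalid recordType = iota
	recordSeries
	recordSamples
	recordTombstones
)

var errUnknownRecord = errors.New("unknown WAL record type")

func handleRecord(rec []byte) error {
	switch decodeType(rec) {
	case recordSeries:
		// Decode series records and cache their labels for later sends.
		return nil
	case recordSamples:
		// Decode samples and enqueue them for remote write.
		return nil
	case recordTombstones:
		// Tombstones are not needed for remote write; skip them.
		return nil
	case recordInvalid:
		// Previously this made the reader fail; now we surface it instead.
		return errUnknownRecord
	}
	return errUnknownRecord
}

// decodeType is a stand-in for the real record type decoding.
func decodeType(rec []byte) recordType {
	if len(rec) == 0 {
		return recordInvalid
	}
	return recordType(rec[0])
}
```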

@cstyan cstyan force-pushed the cstyan:callum-tail-wal branch Dec 7, 2018

@cstyan cstyan referenced this pull request Dec 18, 2018: add live reader for WAL #481 (Merged)

@cstyan cstyan force-pushed the cstyan:callum-tail-wal branch Jan 8, 2019

@cstyan cstyan force-pushed the cstyan:callum-tail-wal branch 5 times, most recently Jan 15, 2019

@tomwilkie tomwilkie force-pushed the cstyan:callum-tail-wal branch 3 times, most recently Jan 18, 2019

@tomwilkie tomwilkie changed the title from "WIP: use WAL to gather samples for remote write" to "Use WAL to gather samples for remote write" Jan 18, 2019


@cstyan cstyan force-pushed the cstyan:callum-tail-wal branch Jan 19, 2019

@cstyan cstyan force-pushed the cstyan:callum-tail-wal branch Feb 6, 2019

@tomwilkie (Member) commented Feb 12, 2019

Done a final pass on this - LGTM!

@tomwilkie (Member) commented Feb 12, 2019

(Going to manually squash some of the changes so we don't spam the commit log)

cstyan and others added some commits Sep 7, 2018

Tail the TSDB WAL for remote_write
This change switches the remote_write API to use the TSDB WAL. This should reduce memory usage and prevent sample loss when the remote endpoint is down.

We use the new LiveReader from TSDB to tail WAL segments.  Logic for finding the tracking segment is included in this PR.  The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.

Enqueuing a sample for sending via remote_write can now block, to provide back pressure. Queues are still required to achieve parallelism and batching. We have updated the queue config based on new defaults for queue capacity and pending samples values; much smaller values are now possible. The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.

As part of this change, we attempt to guarantee that samples are not lost; however, this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (e.g. 400s).

This change also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for a given series to reduce GC pressure

Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics

Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Various fixes to locking & shutdown for WAL-based remote write.
- Remove datarace in the exported highest scrape timestamp.
- Backoff on enqueue should be per-sample - reset the result for each sample.
- Remove diffKeys, unused ctx and cancelfunc in WALWatcher, 'name' from writeTo interface, and pass it to constructor.
- Reorder functions in WALWatcher depth-first according to call graph.
- Fix vendor/modules.txt.
- Split out the various timer periods into consts at the top of the file.
- Move w.currentSegmentMetric.Set close to where we set the currentSegment.
- Combine r.Next() and isClosed(w.quit) into a single loop.
- Unnest some ifs in WALWatcher.watch, propagate errors in decodeRecord, and add some new lines to make it easier to read.
- Reorganise checkpoint handling to reduce nesting and make it easier to follow.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Various improvements to WAL based remote write.
- Use the queue name in WAL watcher logging.
- Don't return from watch if the reader error was EOF.
- Fix sample timestamp check logic regarding what samples we send.
- Refactor so we don't need readToEnd/readSeriesRecords
- Fix wal_watcher tests since readToEnd no longer exists

Signed-off-by: Callum Styan <callumstyan@gmail.com>

@tomwilkie tomwilkie force-pushed the cstyan:callum-tail-wal branch to 65405dc Feb 12, 2019

@tomwilkie tomwilkie merged commit 37e35f9 into prometheus:master Feb 12, 2019

2 of 3 checks passed

ci/circleci: build (CircleCI is running your tests)
ci/circleci: test (Your tests passed on CircleCI!)
continuous-integration/travis-ci/pr (The Travis CI build passed)