This repository has been archived by the owner. It is now read-only.

testing: integration, system, and failure modeling #14

Open
peterbourgon opened this issue Jan 16, 2017 · 2 comments

@peterbourgon peterbourgon commented Jan 16, 2017

The failure modes have been thought through, but not empirically validated. We should build a basic testing harness for failure modeling, in the style of a simplified Jepsen. It would also be good to establish a way to verify overall system throughput (MBps) and ingest-to-query latency.

@mcgourty mcgourty commented Jul 20, 2018

@peterbourgon could you give a rough estimate of max throughput and latency?

@peterbourgon peterbourgon commented Jul 20, 2018

OK Log system write throughput should be approximately equal to disk write capacity on the ingest nodes, and should scale linearly. That is, if you have 10 ingest nodes that can each do 250 MBps of sustained writes, and if you have a large collection of well-balanced producers connected to them, the overall write throughput as observed by the system should be 250 MBps x 10 = 2.5 GBps.
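
As a rough illustration of the arithmetic above, here is a minimal Go sketch that sums per-node sustained write capacity under the linear-scaling assumption. The function name `aggregateWriteMBps` and the per-node figures are hypothetical and are not part of OK Log; the result is only an upper-bound estimate that assumes well-balanced producers.

```go
package main

import "fmt"

// aggregateWriteMBps is a hypothetical helper (not part of OK Log).
// It estimates system-wide write throughput by summing the sustained
// disk write capacity of each ingest node, assuming linear scaling
// and well-balanced producers.
func aggregateWriteMBps(perNodeMBps []float64) float64 {
	total := 0.0
	for _, mbps := range perNodeMBps {
		total += mbps
	}
	return total
}

func main() {
	// Ten ingest nodes, each assumed to sustain 250 MBps of writes.
	nodes := make([]float64, 10)
	for i := range nodes {
		nodes[i] = 250
	}
	fmt.Printf("%.0f MBps\n", aggregateWriteMBps(nodes)) // prints "2500 MBps"
}
```

A real measurement harness would compare this estimate against observed throughput; the sketch only captures the expected ceiling.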

Latency is bimodal: in high-throughput environments, it's controlled by network speed; in low-throughput environments, it's approximately equal to ingest segment flush max age plus store segment target age. That is, if ingest flush max age is 3s, and store segment target age is 3s, then ingest-to-query should be approximately 6s (worst case).
