Improve write performance for RDS #1457

Open
jcarres-mdsol opened this Issue Dec 28, 2016 · 5 comments


@jcarres-mdsol
Contributor

I have a server using MySQL managed by Amazon RDS.

It is currently averaging 1200 IOPS and 7 MB/s of writes.
This means the average I/O writes about 6 KB.

The limit on how much data RDS can read or write is based on the number of I/O operations. If, for instance, those writes were near the 16 KB limit instead of the current 6 KB, the write throughput of the database would more than double.
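To make the arithmetic explicit, here is a rough calculation from the numbers above. It assumes decimal units (1 MB = 1000 KB) and that IOPS stays constant while the I/O size grows, which is a simplification.

```python
# Figures quoted in this issue. Decimal units (1 MB = 1000 KB) are an assumption.
iops = 1200
write_mb_per_s = 7
max_io_kb = 16  # per-I/O size limit mentioned above

avg_io_kb = write_mb_per_s * 1000 / iops          # average size of one write
throughput_at_max_mb = iops * max_io_kb / 1000    # same IOPS, full-size writes
gain = max_io_kb / avg_io_kb                      # relative improvement

print(f"average I/O size: {avg_io_kb:.1f} KB")                                 # 5.8 KB
print(f"throughput at {max_io_kb} KB per I/O: {throughput_at_max_mb:.1f} MB/s")  # 19.2 MB/s
print(f"gain at the same IOPS: {gain:.1f}x")                                   # 2.7x
```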

I am guessing that to accomplish this, the server would need to buffer traces before writing them to the DB, which may not be an easy fix.
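A minimal sketch of what such a buffer could look like, not Zipkin code. `SpanBuffer`, its parameters, and the `flush` callback are all hypothetical names. It flushes when the buffered bytes reach a threshold or when the oldest item is too old, but the age is only checked when a new span arrives; a real version would also need a timer and thread safety.

```python
import time


class SpanBuffer:
    """Hypothetical buffer: collects encoded spans and hands them over as one batch."""

    def __init__(self, flush, max_bytes=16 * 1024, max_age_s=1.0, clock=time.monotonic):
        self._flush = flush          # callback that writes one batch to storage
        self._max_bytes = max_bytes  # flush once this many bytes are buffered
        self._max_age_s = max_age_s  # flush once the oldest span is this old
        self._clock = clock
        self._items = []
        self._size = 0
        self._first_at = None

    def add(self, encoded_span: bytes) -> None:
        if self._first_at is None:
            self._first_at = self._clock()
        self._items.append(encoded_span)
        self._size += len(encoded_span)
        too_big = self._size >= self._max_bytes
        too_old = self._clock() - self._first_at >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if not self._items:
            return
        batch, self._items = self._items, []
        self._size = 0
        self._first_at = None
        self._flush(batch)
```

Anything held in a buffer like this is lost if the process dies before the flush, so the thresholds trade write size against how much data is at risk.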

@jcarres-mdsol jcarres-mdsol changed the title from Improve write performance to Improve write performance for RDS Dec 28, 2016
@adriancole
Contributor

Yeah, it would be a buffer/pipeline issue. You could buffer more in instrumentation or introduce buffering on the server.

For example, with Kafka it is safe to try to collect more because the backlog is persistent. I think zipkin-aws has another buffering layer, used for SQS, for a similar problem.

cc @llinder @mansu for thoughts.

@jcarres-mdsol
Contributor

So it seems the Zipkin server would benefit from a buffering feature that can be used for all inputs?

@llinder
Member
llinder commented Jan 4, 2017

For SQS, the sender buffers before writing to SQS. This helps make use of the 256KB message cap that SQS imposes and reduces API calls. The SQS collector only reads as fast as the storage layer accepts writes. For our use case, SQS is effectively an off-memory buffer, just as Kafka would be.
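As a rough illustration of sender-side packing under a size cap (not the actual zipkin-aws sender), spans can be grouped greedily so each group stays within the limit. The `pack` function is hypothetical, and this sketch ignores any encoding overhead added when a group is serialized into one message.

```python
MAX_MESSAGE_BYTES = 256 * 1024  # SQS message cap cited above


def pack(encoded_spans, max_bytes=MAX_MESSAGE_BYTES):
    """Greedily group encoded spans so each group's total size stays within max_bytes."""
    batch, size = [], 0
    for span in encoded_spans:
        if len(span) > max_bytes:
            raise ValueError("single span exceeds the message cap")
        # Start a new group when adding this span would cross the cap.
        if batch and size + len(span) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(span)
        size += len(span)
    if batch:
        yield batch
```

Each yielded group would then go out as one message, so fewer API calls are needed for the same number of spans.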

Looking at the MySQLSpanConsumer, I don't see any logic that imposes a fixed 6KB limit. It might be possible to introduce some logic to dynamically adjust the batch size to some tunable value, though.
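A sketch of what a tunable batch size could look like. This is not MySQLSpanConsumer code: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the `spans` table, its columns, and `insert_in_batches` are made up for illustration. The point is only that rows are written in chunks of a configurable size, one transaction per chunk.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size):
    """Insert rows in chunks of batch_size, one transaction per chunk.

    Returns the number of chunks written.
    """
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        with conn:  # commits on success, rolls back on error
            conn.executemany("INSERT INTO spans (trace_id, name) VALUES (?, ?)", chunk)
        batches += 1
    return batches


# Hypothetical table, standing in for the real storage schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spans (trace_id INTEGER, name TEXT)")
rows = [(i, f"span-{i}") for i in range(250)]
print(insert_in_batches(conn, rows, batch_size=100))  # 3 chunks: 100, 100, 50
```

Adjusting `batch_size` dynamically, for example based on observed row sizes, would sit on top of this.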

If you're stuck with MySQL for storage and you're using HTTP as the transport layer, I would probably consider augmenting writes with SQS or Kafka, just to keep spikes in traffic from overwhelming your storage layer.

Beyond tuning batch inserts for a specific storage component, I don't think there is much benefit in Zipkin server buffering anything since there are much better solutions such as SQS, Kafka or a more scalable storage layer.
