Graphite: Revised TcpSender #4202
…4195 Motivation: In 3.7, we added some `Int => Expression[FiniteDuration]` implicits that in some cases take over the expected `Any => Expression[Any]`. Modification: Drop those implicits and add lots of overrides for during-like loops. Result: No more conflicts. Breaking binary change, can only be released in 3.8.
Thanks for contributing!
Could you please:
- run `compile` to format your code and organize your imports
- address comments
- sign our CLA: https://docs.google.com/forms/d/1hbpxVqJ5hIYYJuOjOyOzub9fJFGFGvvfnB5w5LCWdUw/edit
- rebase your branch on top of our main, so we don't have merge commits
Thanks!
case Event(GraphiteMetrics(bytes), data: ConnectedData) =>
  buffer(bytes, data) match {
    case Success(data) => stay() using data
    case Failure(_) => goto(BufferOverflow) using NoData
case _ =>
private def writeFirst(data: ConnectedData): Unit = {
  data.connection ! Write(data.storage(0), Ack(data.storageOffset))
}
Single instruction, so the `{}` is useless.
  for ((bytes, i) <- data.storage.zipWithIndex) {
    data.connection ! Write(bytes, Ack(data.storageOffset + i))
  }
}
Single instruction, so the `{}` is useless.
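To illustrate the suggestion, here is a minimal, Akka-free sketch of the two write helpers written as single expressions without braces. It assumes a hypothetical `Pending` class standing in for `ConnectedData`, uses `String` payloads instead of `ByteString`, and returns the `(payload, ackId)` pairs that would be sent instead of calling `data.connection ! Write(...)`. The ack numbering (`storageOffset + i`) mirrors the loop above.

```scala
// Hypothetical stand-in for ConnectedData: only the fields the write helpers use.
final case class Pending(storageOffset: Int, storage: Vector[String])

object WriteSketch {
  // Single expression, so no braces are needed.
  // Instead of `connection ! Write(bytes, Ack(id))`, return what would be sent.
  def writeAll(data: Pending): Vector[(String, Int)] =
    for ((bytes, i) <- data.storage.zipWithIndex) yield (bytes, data.storageOffset + i)

  // Only the oldest unacknowledged chunk, tagged with the current offset.
  def writeFirst(data: Pending): (String, Int) =
    (data.storage(0), data.storageOffset)
}
```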
private def acknowledge(ack: Int, data: ConnectedData): ConnectedData = {
  require(ack == data.storageOffset, s"Received wrong ack $ack at ${data.storageOffset}")
  require(data.storage.nonEmpty, s"Storage was empty at ack $ack")
What happens if this throws?
The actor will be restarted.
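Outside an actor, the effect of a failed `require` can be checked directly: it throws an `IllegalArgumentException` whose message starts with `requirement failed: `. The sketch below is a minimal illustration assuming a hypothetical `Buffered` stand-in for `ConnectedData`, and it assumes that `acknowledge` drops the acknowledged head chunk and advances the offset; the rest of the original method body is not shown here.

```scala
import scala.util.Try

// Hypothetical stand-in for ConnectedData, keeping only the buffering fields.
final case class Buffered(storageOffset: Int, storage: Vector[String])

object AckSketch {
  // Same checks as the reviewed code: a false condition makes `require`
  // throw IllegalArgumentException("requirement failed: " + message).
  def acknowledge(ack: Int, data: Buffered): Buffered = {
    require(ack == data.storageOffset, s"Received wrong ack $ack at ${data.storageOffset}")
    require(data.storage.nonEmpty, s"Storage was empty at ack $ack")
    // Assumption: the acknowledged chunk is the head of the buffer.
    data.copy(storageOffset = data.storageOffset + 1, storage = data.storage.tail)
  }

  // Capturing the exception shows what the actor's supervisor would receive.
  def tryAcknowledge(ack: Int, data: Buffered): Try[Buffered] =
    Try(acknowledge(ack, data))
}
```

Inside the actor, the exception propagates to the supervisor. Under Akka's default supervisor strategy an ordinary `Exception` leads to a restart, which creates a new actor instance, so whatever was buffered in the FSM data at that point is lost.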
@@ -47,7 +55,14 @@ private[graphite] class TcpSender(
       unstashAll()
       val connection = sender()
       connection ! Register(self)
-      goto(Running) using ConnectedData(connection, failures)
+      goto(Running) using ConnectedData(connection, failures, 0, Vector.empty[ByteString], 0L, false, 0)
Replace `Vector.empty[ByteString]` with `Nil`.
 private[sender] sealed trait TcpSenderData
 private[sender] case object NoData extends TcpSenderData
 private[sender] final case class DisconnectedData(retry: Retry) extends TcpSenderData
-private[sender] final case class ConnectedData(connection: ActorRef, retry: Retry) extends TcpSenderData
+private[sender] final case class ConnectedData(connection: ActorRef, retry: Retry, storageOffset: Int, storage: Vector[ByteString], stored: Long, suspended: Boolean, nack: Int) extends TcpSenderData
Why Vector?
data.connection ! Write(bytes, Ack(currentOffset(data)))
buffer(bytes, data) match {
  case Success(data) => stay() using data
  case Failure(_) => goto(BufferOverflow) using NoData
case _ =>
logger.info(s"Sending metrics to Graphite server located at: $remote")
data.connection ! Write(bytes, Ack(currentOffset(data)))
buffer(bytes, data) match {
  case Success(data) => stay() using data
Rename so you're not shadowing the `data` reference.
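A minimal sketch combining the two suggestions on this handler: a `case _` fallback and a binder that does not shadow `data`. It is Akka-free: `Outcome`, `Stay`, `Overflow`, `Chunks`, `MaxStored`, and the body of `buffer` are hypothetical stand-ins, assuming only that `buffer` returns a `scala.util.Try`, as the `Success`/`Failure` patterns in the reviewed code suggest.

```scala
import scala.util.{Success, Try}

// Hypothetical stand-in for ConnectedData.
final case class Chunks(storage: Vector[String], stored: Long)

// Hypothetical, Akka-free stand-ins for `stay() using ...` and `goto(BufferOverflow)`.
sealed trait Outcome
final case class Stay(data: Chunks) extends Outcome
case object Overflow extends Outcome

object HandlerSketch {
  val MaxStored = 10L // hypothetical buffer limit, counted in characters

  // Assumed shape of `buffer`: append the chunk, fail once the limit is exceeded.
  def buffer(bytes: String, data: Chunks): Try[Chunks] = Try {
    val stored = data.stored + bytes.length
    require(stored <= MaxStored, s"Buffer overflow at $stored")
    data.copy(storage = data.storage :+ bytes, stored = stored)
  }

  def onMetrics(bytes: String, data: Chunks): Outcome =
    buffer(bytes, data) match {
      case Success(updated) => Stay(updated) // new name, so `data` is not shadowed
      case _                => Overflow      // a Try that is not a Success is a Failure
    }
}
```

Because a `Try` is either a `Success` or a `Failure`, `case _` covers exactly the `Failure(_)` branch here.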
}
// Connection actor failed to send metric, log it as a failure
case Event(CommandFailed(Write(_, Ack(ack))), data: ConnectedData) =>
  logger.info(s"Failed to write to Graphite server located at: $remote, buffering...")
Shouldn't it be a warning?
goto(WaitingForConnection) using DisconnectedData(newFailures)
when(Buffering)(event => {
  var toAck = 10
  var peerClosed = false
Those should be in `ConnectedData`.
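A minimal sketch of the suggestion, assuming the buffering counters become fields of the state class. Because `toAck` and `peerClosed` are declared inside the `event => { ... }` handler, they appear to be re-created with their initial values on every event, so they cannot count anything across events; keeping them in the FSM data and returning an updated copy avoids that. `BufferingState` and `BufferingSketch` are hypothetical names, and stored chunks are simplified to `String`s.

```scala
// Hypothetical slice of ConnectedData: the former local `var`s become fields.
final case class BufferingState(toAck: Int, peerClosed: Boolean, storage: Vector[String])

object BufferingSketch {
  // Initial values taken from the reviewed code (`toAck = 10`, `peerClosed = false`).
  val Initial: BufferingState =
    BufferingState(toAck = 10, peerClosed = false, storage = Vector.empty)

  // An Ack releases the oldest chunk and decrements the counter. A new value is
  // returned and the previous one is untouched, so nothing resets between events.
  def onAck(state: BufferingState): BufferingState =
    state.copy(toAck = state.toAck - 1, storage = state.storage.drop(1))

  // The peer closing its side is recorded in the state as well.
  def onPeerClosed(state: BufferingState): BufferingState =
    state.copy(peerClosed = true)
}
```

In the FSM, each handler would then end with something like `stay() using updated`, carrying the counters from one event to the next.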
@vJoeyz Have you seen all the
They appeared after running
???
Yes.
Do you use something like
Not that I'm aware of, no. Anyway, I've removed them. :)
Yeah, but your code is not properly formatted. It would be if you were using Also, your commit history is broken because it contains more than just your own commits rebased on top of our main branch.
@vJoeyz We'll release 3.7.5 next week. By then, do you think you'll be able to fix your commit history and your formatting?
I tried to fix formatting and wartremover on your branch, but still,
6b3802f to e6bb51e
84fab42 to 772f5a0
I've tested this fix and it works in my case. Before the fix, the integration with Graphite used to stop after ~24h. Please merge it into stable, thanks :)
@biski There's no way to merge this work as is:
Contributions welcome
ba5492e to 91ec84b
Sorry for my late reply; my health got in the way for the past half year. I'll try to fix it up this week. :)
@vJoeyz Oh, sorry to hear that! I hope you're doing well now.
Closing, as this has been idle for too long and can't be merged.
Contents
This pull request contains the implementation of a revised `TcpSender` used for sending statistics to Graphite. This revised version implements NACK-based write back-pressure with suspending, providing more reliability and data conservation when sending statistics.

Rationale
In the current version of the `TcpSender`, when a response isn't received within the `WritePeriod` (1 second by default) from Gatling's config (due to a slow database connection, resource problems, etc.), the request is rejected, but the data is never buffered or resent. Furthermore, if this happens more than 5 times (as specified by the hardcoded `maxRetries`), writing to Graphite halts completely and is never resumed.

I have tested this implementation in my load tests, and it fully resolves all the issues I had with the current `TcpSender` when recording large amounts of data to a database.
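The following Akka-free sketch models NACK-based write back-pressure with suspending under simplifying assumptions, and is not the PR's code: payloads are `String`s; `WriteOut` and `ResumeWritingOut` are hypothetical stand-ins for the `Write`/`Ack` and `ResumeWriting` commands a sender would send to Akka's TCP connection actor; `nack` models receiving `CommandFailed` for a write; and `resumed` models receiving `WritingResumed`. Every metric stays buffered until it is acknowledged, so a rejected write is re-sent after resuming instead of being dropped.

```scala
// Hypothetical messages the sender would emit to the connection.
sealed trait Out
final case class WriteOut(bytes: String, ack: Int) extends Out
case object ResumeWritingOut extends Out

// Hypothetical slice of ConnectedData used by the sketch.
final case class SenderState(storageOffset: Int, storage: Vector[String], suspended: Boolean)

object NackSketch {
  val Empty: SenderState = SenderState(0, Vector.empty, suspended = false)

  // New metrics are always buffered; they are only written while not suspended.
  def send(bytes: String, s: SenderState): (SenderState, Vector[Out]) = {
    val ackId = s.storageOffset + s.storage.size
    val next = s.copy(storage = s.storage :+ bytes)
    if (s.suspended) (next, Vector.empty) else (next, Vector(WriteOut(bytes, ackId)))
  }

  // A successful Ack releases the oldest buffered chunk.
  def ack(id: Int, s: SenderState): SenderState = {
    require(id == s.storageOffset, s"Received wrong ack $id at ${s.storageOffset}")
    s.copy(storageOffset = s.storageOffset + 1, storage = s.storage.tail)
  }

  // A NACK suspends writing and asks the connection to resume, keeping the data.
  def nack(s: SenderState): (SenderState, Vector[Out]) =
    (s.copy(suspended = true), Vector(ResumeWritingOut))

  // Once writing has resumed, every still-unacknowledged chunk is written again.
  def resumed(s: SenderState): (SenderState, Vector[Out]) = {
    val writes = s.storage.zipWithIndex.map { case (b, i) => WriteOut(b, s.storageOffset + i) }
    (s.copy(suspended = false), writes)
  }
}
```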
Credits
This implementation was heavily inspired by Akka's sample implementation.