-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add latency package #14
Conversation
Pull Request Test Coverage Report for Build 5357773285
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a few comments to start.
Reviewed 2 of 6 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @robertodauria)
internal/handler/handler.go
line 51 at r2 (raw file):
) type Handler struct {
This type is a handler.Handler
. I didn't notice this before. See: https://google.github.io/styleguide/go/decisions#package-vs-exported-symbol-name
Consider renaming this package to provide context for its contents, e.g. ndt8.Handler
.
This doesn't have to be in this PR.
internal/latency/latency.go
line 22 at r2 (raw file):
// Handler is the handler for latency tests. type Handler struct {
This is a latency.Handler
-- this is self-describing name, and clear in the context where it's used. 👍
internal/latency/latency.go
line 73 at r2 (raw file):
log.Debug("session created", "id", mid) rw.Write([]byte(mid))
What must the client do with this value? If they must format this into a LatencyPacket for the first request, what do you think of sending the content the client should include in the first packet? e.g. a complete serialized "c2s" LatencyPacket.
internal/latency/latency.go
line 90 at r2 (raw file):
return } // TODO: mfence?
Are there memory errors to debug? What is this TODO?
internal/latency/latency.go
line 125 at r2 (raw file):
} n, writeErr := conn.WriteTo(b, remoteAddr) if err != nil {
The error handling here and above is unclear. marshalErr
or writeErr
may be non-nil but we'll continue?
Do these mean only to save the current error value in err
?
internal/latency/latency.go
line 135 at r2 (raw file):
session.SendTimesMu.Lock() defer session.SendTimesMu.Unlock() session.SendTimes[seq] = time.Now()
I believe we want to record the timestamp before the call to conn.WriteTo
. We cannot control or quantify delays between write operations and this timestamp; this could be a source of unquantified error. For the round trip time, the latency will include the syscall overhead for the recv and the write. That's okay. I don't think the timestamp here is a reliable way to avoid it.
If you want timestamps that avoid syscall overhead, there may be a way using SO_TIMESTAMPNS
, but I don't have experience with this.
internal/latency/latency.go
line 141 at r2 (raw file):
log.Debug("packet sent", "len", n, "id", session.ID, "seq", seq) }, memoryless.Config{
This kind of scheduling limits the data to only network jitter calculations, and prevents application-level inter-arrival jitter calculations.
internal/latency/latency.go
line 189 at r2 (raw file):
if !session.Started { session.Started = true timeout, _ := context.WithTimeout(context.Background(), 5*time.Second)
Rather than ignore the cancel
, can the timeout be created within the sendLoop instead?
pkg/latency/model/result.go
line 45 at r2 (raw file):
// Packets is a list of every measurement received or sent as part of // this latency measurement. Packets []LatencyPacket
Can we eliminate the redundancy in the Type/ID fields for archival data? I can imagine an ArchivePacket or similar that includes only Seq / LastRTT fields. What do you think?
Stop memoryless.Run() by canceling the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)
internal/latency/latency.go
line 125 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
The error handling here and above is unclear.
marshalErr
orwriteErr
may be non-nil but we'll continue?Do these mean only to save the current error value in
err
?
It wasn't doing what I wanted. I believe now it does. PTAL again?
internal/latency/latency.go
line 135 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
I believe we want to record the timestamp before the call to
conn.WriteTo
. We cannot control or quantify delays between write operations and this timestamp; this could be a source of unquantified error. For the round trip time, the latency will include the syscall overhead for the recv and the write. That's okay. I don't think the timestamp here is a reliable way to avoid it.If you want timestamps that avoid syscall overhead, there may be a way using
SO_TIMESTAMPNS
, but I don't have experience with this.
Done.
internal/latency/latency.go
line 141 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
This kind of scheduling limits the data to only network jitter calculations, and prevents application-level inter-arrival jitter calculations.
OTOH, memoryless has the property that you can see if there are cyclic network patterns with regard to round-trip time, while with fixed-interval scheduling you could align to the cycle and never notice them. Which property is more desirable for us?
internal/latency/latency.go
line 189 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Rather than ignore the
cancel
, can the timeout be created within the sendLoop instead?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)
internal/handler/handler.go
line 51 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
This type is a
handler.Handler
. I didn't notice this before. See: https://google.github.io/styleguide/go/decisions#package-vs-exported-symbol-nameConsider renaming this package to provide context for its contents, e.g.
ndt8.Handler
.This doesn't have to be in this PR.
Ack. That hurts to read. I'll open a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @robertodauria)
internal/latency/latency.go
line 125 at r2 (raw file):
Previously, robertodauria (Roberto D'Auria) wrote…
It wasn't doing what I wanted. I believe now it does. PTAL again?
I still don't understand: why do we check err
here instead of writeErr
?
internal/latency/latency.go
line 135 at r2 (raw file):
Previously, robertodauria (Roberto D'Auria) wrote…
Done.
Should SendTimes
be updated only on success? Or maybe it doesn't matter b/c that value is only used once the echo is received again?
My original thought was to take the timestamp just before the write and update SendTimes
down here after everything else was successful.
internal/latency/latency.go
line 141 at r2 (raw file):
Previously, robertodauria (Roberto D'Auria) wrote…
OTOH, memoryless has the property that you can see if there are cyclic network patterns with regard to round-trip time, while with fixed-interval scheduling you could align to the cycle and never notice them. Which property is more desirable for us?
My desire is only that we document our choices and that our choices are fully considered. e.g. "We do not support application level inter arrival jitter calculations in favor of the potential for detecting cyclic network patterns". Or vice versa. Which property do you want to support or that this tool will be most applicable to?
internal/latency/latency.go
line 134 at r3 (raw file):
// include the ping packet's write time. This is intentional. session.SendTimesMu.Lock() defer session.SendTimesMu.Unlock()
Do we need to hold the lock open?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)
internal/latency/latency.go
line 73 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
What must the client do with this value? If they must format this into a LatencyPacket for the first request, what do you think of sending the content the client should include in the first packet? e.g. a complete serialized "c2s" LatencyPacket.
Done. We now send a full kickoff packet instead of just the measurement id. The design document has been updated accordingly.
internal/latency/latency.go
line 90 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Are there memory errors to debug? What is this TODO?
Done.
internal/latency/latency.go
line 125 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
I still don't understand: why do we check
err
here instead ofwriteErr
?
Done. That was a mistake.
internal/latency/latency.go
line 135 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Should
SendTimes
be updated only on success? Or maybe it doesn't matter b/c that value is only used once the echo is received again?My original thought was to take the timestamp just before the write and update
SendTimes
down here after everything else was successful.
I don't think it matters since it's only used when a corresponding "pong" is received, as you observed. I moved the update to just after WriteTo
returned successfully in the interest of saving a mutex lock/unlock.
internal/latency/latency.go
line 141 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
My desire is only that we document our choices and that our choices are fully considered. e.g. "We do not support application level inter arrival jitter calculations in favor of the potential for detecting cyclic network patterns". Or vice versa. Which property do you want to support or that this tool will be most applicable to?
(discussed on VC) I have documented this on the design document under "Alternatives considered".
internal/latency/latency.go
line 134 at r3 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Do we need to hold the lock open?
Done.
pkg/latency/model/result.go
line 45 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Can we eliminate the redundancy in the Type/ID fields for archival data? I can imagine an ArchivePacket or similar that includes only Seq / LastRTT fields. What do you think?
Changed as discussed on VC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @robertodauria)
internal/latency/latency.go
line 141 at r2 (raw file):
Previously, robertodauria (Roberto D'Auria) wrote…
(discussed on VC) I have documented this on the design document under "Alternatives considered".
Thank you. Please add a single sentence summary of the decision as a comment to motivate these expected delays.
internal/latency/latency.go
line 63 at r7 (raw file):
dataDir: dir, sessions: cache, sessionsMu: &sync.Mutex{},
nit: you're returning a *Handler, so the Handler sessionMu field could be a literal mutex rather than a pointer, and no mutex alloc is necessary. You do not have code that copies the Handler but depends on the mutex remaining the same? I suggest just changing the type of sessionsMu to sync.Mutex
, and removing this line.
Part of what got my attention is that the embedded mutex pattern is typical in Go. This isn't wrong, but it's extra steps, so maybe a little more complex than needed. Simple is good as often as we can find it. The whole will always be complex.
internal/latency/latency.go
line 103 at r7 (raw file):
_, err = rw.Write(b) if err != nil { rw.WriteHeader(http.StatusInternalServerError)
nit: any write prompts the http library to send a status OK header if no other header has been sent yet. So, the write error is either b/c it couldn't send even the "status ok" header or it happened after that and this header isn't meaningful / would be ignored.
I'm not really sure what to do affirmatively if there is a write error here. Logs could be too verbose, metrics may not be seen. metrics are probably the right way given the options I see.
internal/latency/latency.go
line 142 at r7 (raw file):
_, err = rw.Write(b) if err != nil { rw.WriteHeader(http.StatusInternalServerError)
Same comment as above about write errors for rw.
internal/latency/latency.go
line 239 at r7 (raw file):
defer session.SendTimesMu.Unlock() if m.Seq >= len(session.SendTimes) { return errorInvalidSeqN
A valid mid but invalid sequence number would be a very interesting error case.
internal/latency/latency.go
line 284 at r7 (raw file):
// The receive time should be recorded as soon as possible after // reading the packet, to improve accuracy. recvTime := time.Now()
Based on the comment, move this above the log statement?
internal/latency/latency.go
line 287 at r7 (raw file):
err = h.processPacket(conn, addr, buf[:n], recvTime) if err != nil { log.Debug("failed to process packet", "err", err)
Who is the intended audience for this log message? Would it help to include some conn metadata to make these messages include diagnostic hints, e.g. remote IP? It looks like the common cases will be corrupt json, invalid seq, or unauthorized. But, maybe we'll only rarely have debug enabled, but what will that person want to see?
internal/persistence/file.go
line 42 at r7 (raw file):
var filename string if subtest != "" { filename = datatype + "-" + subtest + "-" +
nit: is there a subtest name for latency that can preserve the previous structure? e.g. "network" for the network latency definition you added to the design, and leave as an option for other definitions in the future? e.g. "application" or other?
I'm just trying to make this goop simpler :)
pkg/latency/model/result.go
line 45 at r2 (raw file):
Previously, robertodauria (Roberto D'Auria) wrote…
Changed as discussed on VC.
This is much better.
pkg/latency/model/result.go
line 137 at r7 (raw file):
ID: s.ID, GitShortCommit: prometheusx.GitShortCommit, Version: "",
Add TODO?
pkg/latency/model/result.go
line 143 at r7 (raw file):
Results: s.Results, PacketsSent: len(s.SendTimes), PacketsReceived: len(s.Results),
Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)
internal/latency/latency.go
line 141 at r2 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Thank you. Please add a single sentence summary of the decision as a comment to motivate these expected delays.
Done.
internal/latency/latency.go
line 103 at r7 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
nit: any write prompts the http library to send a status OK header if no other header has been sent yet. So, the write error is either b/c it couldn't send even the "status ok" header or it happened after that and this header isn't meaningful / would be ignored.
I'm not really sure what to do affirmatively if there is a write error here. Logs could be too verbose, metrics may not be seen. metrics are probably the right way given the options I see.
Added a TODO (will add Prometheus metrics in a future PR).
internal/latency/latency.go
line 142 at r7 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Same comment as above about write errors for rw.
Added a TODO.
internal/latency/latency.go
line 239 at r7 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
A valid mid but invalid sequence number would be a very interesting error case.
Done. I added an INFO log and a TODO to add a Prometheus metric. Not completely sure about the log level, which assumes we'll run this code in production with an INFO minium log level.
internal/latency/latency.go
line 284 at r7 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Based on the comment, move this above the log statement?
Done.
internal/latency/latency.go
line 287 at r7 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Who is the intended audience for this log message? Would it help to include some conn metadata to make these messages include diagnostic hints, e.g. remote IP? It looks like the common cases will be corrupt json, invalid seq, or unauthorized. But, maybe we'll only rarely have debug enabled, but what will that person want to see?
The intended audience is a developer running this manually with loglevel=DEBUG on their local machine. Since the unconditional DEBUG message just before this one already includes the address (and the packet's payload) I did not want to repeat the same info twice. I added the remote address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)
pkg/latency/model/result.go
line 137 at r7 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Add TODO?
Done.
pkg/latency/model/result.go
line 143 at r7 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Is this correct?
Done. (It was not.)
internal/persistence/file.go
line 42 at r7 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
nit: is there a subtest name for latency that can preserve the previous structure? e.g. "network" for the network latency definition you added to the design, and leave as an option for other definitions in the future? e.g. "application" or other?
I'm just trying to make this goop simpler :)
That works. Reverted to the original file.go and used "network" as the subtest name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 3 files at r10, 2 of 2 files at r11, all commit messages.
Reviewable status: 0 of 1 approvals obtained (waiting on @robertodauria)
cmd/msak-latency/main.go
line 1 at r11 (raw file):
package main
Please stop adding new files to an already large PR.
Large PRs are slower to review, more likely to let bugs through, and generally lower the quality of the end result.
internal/latency/latency.go
line 103 at r7 (raw file):
Previously, robertodauria (Roberto D'Auria) wrote…
Added a TODO (will add Prometheus metrics in a future PR).
Please ensure the TODO includes removing the extraneous writes to rw
also.
internal/latency/latency.go
line 287 at r7 (raw file):
Previously, robertodauria (Roberto D'Auria) wrote…
The intended audience is a developer running this manually with loglevel=DEBUG on their local machine. Since the unconditional DEBUG message just before this one already includes the address (and the packet's payload) I did not want to repeat the same info twice. I added the remote address.
Okay. Please format the addr the same way in both log lines. Currently one is "addr=addr" in the second it's "src=addr". My intention is to help a future developer line up separate messages, possibly among a flood of messages to find the two they want easily. This is hard won instinct!
internal/latency/latency.go
line 52 at r11 (raw file):
archive := i.Value().Archive() archive.EndTime = time.Now() _, err := persistence.WriteDataFile(dir, "latency", "network", archive.ID, archive)
I know I suggested "network" - since we are discussing "transport throughput" and "application throughput" - how about calling this "transport" for "transport latency", as distinct from a future test that could be "application latency"?
pkg/latency/model/result.go
line 58 at r11 (raw file):
} type RTT struct {
Rather than RTT.RTT
how about RoundTrip.RTT
?
Then the fields currently named Results
can be RoundTrips []RoundTrip
.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @robertodauria)
pkg/latency/model/result.go
line 58 at r11 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Rather than
RTT.RTT
how aboutRoundTrip.RTT
?Then the fields currently named
Results
can beRoundTrips []RoundTrip
.What do you think?
Or more literally, Packet.RTT
and Packet.Lost
?
Packets []Packet
pkg/latency/model/result.go
line 132 at r11 (raw file):
Started: false, StartedMu: &sync.Mutex{},
I agree with what you said in chat - these do not need to be pointers either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)
internal/latency/latency.go
line 103 at r7 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Please ensure the TODO includes removing the extraneous writes to
rw
also.
Just to make sure I understand, you're suggesting to not even try changing the status code on a write error? If so, I can just do it in this PR.
internal/latency/latency.go
line 52 at r11 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
I know I suggested "network" - since we are discussing "transport throughput" and "application throughput" - how about calling this "transport" for "transport latency", as distinct from a future test that could be "application latency"?
That works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)
internal/latency/latency.go
line 287 at r7 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Okay. Please format the addr the same way in both log lines. Currently one is "addr=addr" in the second it's "src=addr". My intention is to help a future developer line up separate messages, possibly among a flood of messages to find the two they want easily. This is hard won instinct!
Done.
pkg/latency/model/result.go
line 58 at r11 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Or more literally,
Packet.RTT
andPacket.Lost
?
Packets []Packet
It could be confused with LatencyPacket which is the one sent over the wire. Maybe I like RoundTrips
more...
pkg/latency/model/result.go
line 132 at r11 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
I agree with what you said in chat - these do not need to be pointers either.
Done. They aren't anymore.
cmd/msak-latency/main.go
line 1 at r11 (raw file):
Previously, stephen-soltesz (Stephen Soltesz) wrote…
Please stop adding new files to an already large PR.
Large PRs are slower to review, more likely to let bugs through, and generally lower the quality of the end result.
It was meant for a separate PR, I added it by mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)
pkg/latency/model/result.go
line 58 at r11 (raw file):
Previously, robertodauria (Roberto D'Auria) wrote…
It could be confused with LatencyPacket which is the one sent over the wire. Maybe I like
RoundTrips
more...
Changed to RoundTrips
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 approvals obtained (waiting on @robertodauria)
internal/latency/latency.go
line 103 at r7 (raw file):
Previously, robertodauria (Roberto D'Auria) wrote…
Just to make sure I understand, you're suggesting to not even try changing the status code on a write error? If so, I can just do it in this PR.
Yes, if the write fails writing again will also fail for one of two possible reasons.
internal/latency/latency.go
line 52 at r11 (raw file):
Previously, robertodauria (Roberto D'Auria) wrote…
That works.
We discussed on VC that "transport" is not correct actually - but "application" is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @robertodauria)
This PR introduces the latency package, implementing the UDP latency test.
This change is