-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Round of Changes #2
Conversation
In most cases the metrics being collected are not actually timings, but instead are counters. There are, however, some things that *are* actually "timers" which we need to address in a future commit. For now, we still submit the counters, as timers, to statsd to maintain the statsd aggregation facilities that are useful for per request stat tracking. (Tests are probably failing after this commit)
RackReporter reports only counters (as timers), which are useful in a per request context (e.g. how many minor collections are happening per request), using sampling strategies. However, data for the entire process, which previously was reported as Rack::Server should not be sampled. Instead, the PeriodicReporter reports to statsd as counts and gauges, providing a more accurate look at the server's characteristics over time. The intention of PeriodicReporter is to be used by a clock that measures, and then reports on an interval corresponding to the statsd flushInterval (by default something like 10 seconds). A clock is not included in this commit. Stay tuned.
Periodic collects metrics and reports them to reporter on an interval, rather than on a per request basis (such as the Rack class). To avoid large rearchitecting, it mimics that of the Rack Middleware by conforming to the environment passing model of the reporter and metrics storage, using *it's* thread local storage for that of persisted last values of counters to track changes. In addition, we add a RailTie, specifically for the enabling and starting the periodic reporter.
- ReporterTest ensures that report implementations do nothing - Move ReporterTest -> RackReporterTest - Add PeriodicReporterTest which tests sample rates implementation.
Provide a test helper for Statsd which yields after a call to `:block`. This provides a basis for mocking out the `:count` and `gauge` methods.
Sorry about the background-reporter being in here too. It's pretty easy to ignore some of that as the last commit just adds 3 files. |
Connects to heroku/opex-team#241 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor style nits. There are some places where we could clean up logic. There's lots of room for personal opinion on style though. lining up equals and hashes helps with readability a lot.
My biggest ask is possibly for more docs. In my dream world each of these classes would have a small blurb saying what they do and then a tiny example. It's not 100% critical, but helps a ton if the next person to maintain this code isn't either one of us.
There is some hex math being done. If you could document how that works or why it works, or link to some other documentation about what exactly you're doing that would be amazing.
lib/trashed/instruments/backlog.rb
Outdated
@pid = nil | ||
end | ||
|
||
def start(state, counters, gauges) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using this method anywhere? Can we get rid of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meter.rb has a if instrument.respond_to?(:start)
so it's safe to remove.
lib/trashed/instruments/backlog.rb
Outdated
basefile = "/proc/#{@pid}/" | ||
else | ||
basefile = "/proc/net/" | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is the only place we're actively using the @pid
. Instead of storing the @pid and then building a new string every time, could we instead do this calculation once in the initializer?
Also you don't need the then
keyword.
def initialize(pid = 'net')
@basefile = "/proc/#{pid}/"
end
I'm not sure if you want to do any kind of type checking here. I.e. if pid is entered as a non integer and not "net" then maybe we want to error?
if pid == "net" || pid = Integer(pid)
raise ArgumentError, "You entered #{pid} which is not an integer or 'net'"
On the flip side since we're opening the file later I guess it will fail if it doesn't exist, so that's reasonable enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call on this.
lib/trashed/instruments/backlog.rb
Outdated
end | ||
|
||
[:tcp, :tcp6].each do |name| | ||
ProcTCP.new(File.open(basefile + name)).by(:st => TCP_SYN_RECV) do |entry| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do File.join(basefile, name)
to ensure that you don't accidentally have a double slash or a missing slash. Alternatively you could use a pathname object
require 'pathname'
PathName.new("/proc/#{pid}").join(name).open
On a style note :st => TCP_SYN_RECV
is usually done as st: TCP_SYN_RECV
. If the keys are symbols.
lib/trashed/instruments/backlog.rb
Outdated
class ProcTCP | ||
def initialize(file, pid=nil) | ||
@file = file | ||
@pid = pid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we're using this @pid variable in this class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! This got refactored for testing, and ...
lib/trashed/instruments/backlog.rb
Outdated
:retrnsmt => pieces[10], | ||
:uid => pieces[11].to_i, | ||
:timeout => pieces[12].to_i, | ||
:inode => pieces[13].to_i, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit, I would line these up and move over to 1.9.3 style hashes:
local_addr: toip(pieces[1]),
local_port: pieces[2].hex,
remote_addr: toip(pieces[3]),
remote_port: pieces[4].hex,
st: pieces[5].hex,
tx_queue: pieces[6].to_i,
rx_queue: pieces[7].to_i,
tr: pieces[8].hex,
tm_when: pieces[9],
retrnsmt: pieces[10],
uid: pieces[11].to_i,
timeout: pieces[12].to_i,
inode: pieces[13].to_i,
lib/trashed/instruments/backlog.rb
Outdated
def toip(piece) | ||
if piece.length == 8 | ||
i = piece.hex | ||
"#{i & 0xff}.#{(i >> 8) & 0xff}.#{(i >> 16) & 0xff}.#{(i >> 24) & 0xff}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some docs on what exactly this method is doing? This is hex magic as far as i'm concerned
lib/trashed/instruments/backlog.rb
Outdated
TCP_CLOSE_WAIT = 8 | ||
TCP_LAST_ACK = 9 | ||
TCP_LISTEN = 10 | ||
TCP_CLOSING = 11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you line up the equals?
TCP_ESTABLISHED = 1
TCP_SYN_SENT = 2
TCP_SYN_RECV = 3
TCP_FIN_WAIT1 = 4
TCP_FIN_WAIT2 = 5
TCP_TIME_WAIT = 6
TCP_CLOSE = 7
TCP_CLOSE_WAIT = 8
TCP_LAST_ACK = 9
TCP_LISTEN = 10
TCP_CLOSING = 11
test/backlog_test.rb
Outdated
count += 1 | ||
end | ||
assert_equal expected, count | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be longer but I would probably blow these up into two separate tests. It's a bit more verbose, but slightly more clear from a readability standpoint what's being tested:
count = 0
Trashed::Instruments::ProcTCP.new(tcp).by(:st => Trashed::Instruments::TCP_LISTEN) do |entry|
count += 1
end
assert_equal 2, count
count = 0
Trashed::Instruments::ProcTCP.new(tcp).by(:st => Trashed::Instruments::TCP_ESTABLISHED) do |entry|
count += 1
end
assert_equal 7, count
test/backlog_test.rb
Outdated
|
||
class BacklogTest < Minitest::Test | ||
def tcp | ||
StringIO.new(<<-HEREDOC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for putting these in here. Will make maintenance much easier.
lib/trashed/reporter.rb
Outdated
class PeriodicReporter < Reporter | ||
|
||
def report_logger(env) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blank method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base class has a thing that is utilized in the RackReporter (which is being killed in this fork).
For reasons of Linux, and, related to Puma things, this SYN_RECV counter isn't an effective tool for measuring anything meaningful. The Linux kernel actually has two queues, and the queue that is important to measure ... can't be. So, this SYN_RECV counter is really measuring the number of client initiated connections that haven't completed the 3-way TCP handshake (e.g. haven't ACK'd), and that's a typical thing of a malicious actor, or very slow internet connection (but something we'd unlikely catch at a 20 second polling). There's an item in Puma (and possibly other ruby web servers, called the "backlog", which represents the number of connections waiting for a thread in the thread pool. This is likely a more interesting measurement, but even more interesting is probably the amount of time those connections sit idle waiting to be picked up, which we can get from the Heroku router. Since we're targetting Heroku with this, we'll simply dismiss this as being useful and move on.
This PR is taking on a life of it's own, and I'm going to merge and close it. Will Follow up with some additional PRs that address other things. |
Grab the TCP listen backlog from proc on Linux.
Connects to heroku/opex-team#241