Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Round of Changes #2

Merged
merged 21 commits into from
Sep 27, 2017
Merged

Round of Changes #2

merged 21 commits into from
Sep 27, 2017

Conversation

apg
Copy link

@apg apg commented Sep 14, 2017

Grab the TCP listen backlog from proc on Linux.

Connects to heroku/opex-team#241

In most cases the metrics being collected are not actually timings,
but instead are counters. There are, however, some things that *are*
actually "timers" which we need to address in a future commit.

For now, we still submit the counters, as timers, to statsd to
maintain the statsd aggregation facilities that are useful for per
request stat tracking.

(Tests are probably failing after this commit)
RackReporter reports only counters (as timers), which are useful in a
per request context (e.g. how many minor collections are happening per
request), using sampling strategies.

However, data for the entire process, which previously was reported as
Rack::Server should not be sampled. Instead, the PeriodicReporter
reports to statsd as counts and gauges, providing a more accurate look
at the server's characteristics over time.

The intention of PeriodicReporter is to be used by a clock that
measures, and then reports on an interval corresponding to the statsd
flushInterval (by default something like 10 seconds). A clock is
not included in this commit. Stay tuned.
Periodic collects metrics and reports them to reporter on an interval,
rather than on a per request basis (such as the Rack class). To avoid
large rearchitecting, it mimics that of the Rack Middleware by
conforming to the environment passing model of the reporter and
metrics storage, using *it's* thread local storage for that of
persisted last values of counters to track changes.

In addition, we add a RailTie, specifically for the enabling and
starting the periodic reporter.
- ReporterTest ensures that report implementations do nothing
- Move ReporterTest -> RackReporterTest
- Add PeriodicReporterTest which tests sample rates implementation.
Provide a test helper for Statsd which yields after a call to
`:block`. This provides a basis for mocking out the `:count` and
`gauge` methods.
@apg
Copy link
Author

apg commented Sep 14, 2017

Sorry about the background-reporter being in here too. It's pretty easy to ignore some of that as the last commit just adds 3 files.

@apg
Copy link
Author

apg commented Sep 15, 2017

Connects to heroku/opex-team#241

Copy link

@schneems schneems left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor style nits. There are some places where we could clean up logic. There's lots of room for personal opinion on style though. lining up equals and hashes helps with readability a lot.

My biggest ask is possibly for more docs. In my dream world each of these classes would have a small blurb saying what they do and then a tiny example. It's not 100% critical, but helps a ton if the next person to maintain this code isn't either one of us.

There is some hex math being done. If you could document how that works or why it works, or link to some other documentation about what exactly you're doing that would be amazing.

@pid = nil
end

def start(state, counters, gauges)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using this method anywhere? Can we get rid of it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meter.rb has a if instrument.respond_to?(:start) so it's safe to remove.

basefile = "/proc/#{@pid}/"
else
basefile = "/proc/net/"
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is the only place we're actively using the @pid. Instead of storing the @pid and then building a new string every time, could we instead do this calculation once in the initializer?

Also you don't need the then keyword.

def initialize(pid = 'net')
  @basefile = "/proc/#{pid}/"
end

I'm not sure if you want to do any kind of type checking here. I.e. if pid is entered as a non integer and not "net" then maybe we want to error?

if pid == "net" || pid = Integer(pid)
  raise ArgumentError, "You entered #{pid} which is not an integer or 'net'"

On the flip side since we're opening the file later I guess it will fail if it doesn't exist, so that's reasonable enough.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on this.

end

[:tcp, :tcp6].each do |name|
ProcTCP.new(File.open(basefile + name)).by(:st => TCP_SYN_RECV) do |entry|

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do File.join(basefile, name) to ensure that you don't accidentally have a double slash or a missing slash. Alternatively you could use a pathname object

require 'pathname'
PathName.new("/proc/#{pid}").join(name).open

On a style note :st => TCP_SYN_RECV is usually done as st: TCP_SYN_RECV. If the keys are symbols.

class ProcTCP
def initialize(file, pid=nil)
@file = file
@pid = pid

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we're using this @pid variable in this class

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! This got refactored for testing, and ...

:retrnsmt => pieces[10],
:uid => pieces[11].to_i,
:timeout => pieces[12].to_i,
:inode => pieces[13].to_i,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit, I would line these up and move over to 1.9.3 style hashes:

local_addr:  toip(pieces[1]),
local_port:  pieces[2].hex,
remote_addr: toip(pieces[3]),
remote_port: pieces[4].hex,
st:          pieces[5].hex,
tx_queue:    pieces[6].to_i,
rx_queue:    pieces[7].to_i,
tr:          pieces[8].hex,
tm_when:     pieces[9],
retrnsmt:    pieces[10],
uid:         pieces[11].to_i,
timeout:     pieces[12].to_i,
inode:       pieces[13].to_i,

def toip(piece)
if piece.length == 8
i = piece.hex
"#{i & 0xff}.#{(i >> 8) & 0xff}.#{(i >> 16) & 0xff}.#{(i >> 24) & 0xff}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some docs on what exactly this method is doing? This is hex magic as far as i'm concerned

TCP_CLOSE_WAIT = 8
TCP_LAST_ACK = 9
TCP_LISTEN = 10
TCP_CLOSING = 11

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you line up the equals?

TCP_ESTABLISHED = 1
TCP_SYN_SENT    = 2
TCP_SYN_RECV    = 3
TCP_FIN_WAIT1   = 4
TCP_FIN_WAIT2   = 5
TCP_TIME_WAIT   = 6
TCP_CLOSE       = 7
TCP_CLOSE_WAIT  = 8
TCP_LAST_ACK    = 9
TCP_LISTEN      = 10
TCP_CLOSING     = 11

count += 1
end
assert_equal expected, count
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be longer but I would probably blow these up into two separate tests. It's a bit more verbose, but slightly more clear from a readability standpoint what's being tested:

count = 0
Trashed::Instruments::ProcTCP.new(tcp).by(:st => Trashed::Instruments::TCP_LISTEN) do |entry|
  count += 1
end
assert_equal 2, count

count = 0
Trashed::Instruments::ProcTCP.new(tcp).by(:st => Trashed::Instruments::TCP_ESTABLISHED) do |entry|
  count += 1
end
assert_equal 7, count


class BacklogTest < Minitest::Test
def tcp
StringIO.new(<<-HEREDOC

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for putting these in here. Will make maintenance much easier.

class PeriodicReporter < Reporter

def report_logger(env)
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blank method?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base class has a thing that is utilized in the RackReporter (which is being killed in this fork).

For reasons of Linux, and, related to Puma things, this SYN_RECV
counter isn't an effective tool for measuring anything meaningful.
The Linux kernel actually has two queues, and the queue that is
important to measure ... can't be. So, this SYN_RECV counter is
really measuring the number of client initiated connections that
haven't completed the 3-way TCP handshake (e.g. haven't ACK'd),
and that's a typical thing of a malicious actor, or very slow
internet connection (but something we'd unlikely catch at a 20
second polling).

There's an item in Puma (and possibly other ruby web servers,
called the "backlog", which represents the number of connections
waiting for a thread in the thread pool. This is likely a more
interesting measurement, but even more interesting is probably
the amount of time those connections sit idle waiting to be
picked up, which we can get from the Heroku router. Since we're
targetting Heroku with this, we'll simply dismiss this as being
useful and move on.
@apg apg changed the title Tcp listen backlog Round of Changes Sep 27, 2017
@apg
Copy link
Author

apg commented Sep 27, 2017

This PR is taking on a life of it's own, and I'm going to merge and close it. Will Follow up with some additional PRs that address other things.

@apg apg merged commit bbbd3d0 into master Sep 27, 2017
@apg apg deleted the tcp-listen-backlog branch September 27, 2017 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants