New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Round of Changes #2

Merged

apg merged 21 commits into master from tcp-listen-backlog

Sep 27, 2017

apg commented Sep 14, 2017 •

edited

Loading

Grab the TCP listen backlog from proc on Linux.

Connects to heroku/opex-team#241

apghero added 13 commits

August 10, 2017 10:21


          Broken: Rename usage of timings to counters

27ae0ce

In most cases the metrics being collected are not actually timings,
but instead are counters. There are, however, some things that *are*
actually "timers" which we need to address in a future commit.

For now, we still submit the counters, as timers, to statsd to
maintain the statsd aggregation facilities that are useful for per
request stat tracking.

(Tests are probably failing after this commit)


          Broken: Subclass Reporter for per-request, server metrics purposes

f19e12a

RackReporter reports only counters (as timers), which are useful in a
per request context (e.g. how many minor collections are happening per
request), using sampling strategies.

However, data for the entire process, which previously was reported as
Rack::Server should not be sampled. Instead, the PeriodicReporter
reports to statsd as counts and gauges, providing a more accurate look
at the server's characteristics over time.

The intention of PeriodicReporter is to be used by a clock that
measures, and then reports on an interval corresponding to the statsd
flushInterval (by default something like 10 seconds). A clock is
not included in this commit. Stay tuned.


          Broken: Provide a clock implementation and a RailTie for it

8d85d68

Periodic collects metrics and reports them to reporter on an interval,
rather than on a per request basis (such as the Rack class). To avoid
large rearchitecting, it mimics that of the Rack Middleware by
conforming to the environment passing model of the reporter and
metrics storage, using *it's* thread local storage for that of
persisted last values of counters to track changes.

In addition, we add a RailTie, specifically for the enabling and
starting the periodic reporter.


          raise on parent reporter for unimplemented methods

44792d6


          Make the pre-existing tests pass.

e2e9693


          Add some new tests, and rearrange a bit

2e24e83

- ReporterTest ensures that report implementations do nothing
- Move ReporterTest -> RackReporterTest
- Add PeriodicReporterTest which tests sample rates implementation.


          Add a test for the PeriodicReporter hitting statsd

5a584a8

Provide a test helper for Statsd which yields after a call to
`:block`. This provides a basis for mocking out the `:count` and
`gauge` methods.


          Implement nop report_logger for PeriodicReporter

ebe4edd


          Remove a spurious puts in periodic.

54f4138


          raise NotImplementedError instead.

06ab17f


          Merge the periodic tie into the main railtie.

ccf9ee5


          Move the Statsd helper into PeriodicReporterTest

c81b89b


          Add an Instrument for SYN RECV sockets (e.g. w. for accept(2))

0c2394c

apg requested review from freeformz and schneems

September 14, 2017 23:37

Author

apg commented Sep 14, 2017

Sorry about the background-reporter being in here too. It's pretty easy to ignore some of that as the last commit just adds 3 files.

Author

apg commented Sep 15, 2017

Connects to heroku/opex-team#241

schneems requested changes

View reviewed changes

schneems left a comment

Some minor style nits. There are some places where we could clean up logic. There's lots of room for personal opinion on style though. lining up equals and hashes helps with readability a lot.

My biggest ask is possibly for more docs. In my dream world each of these classes would have a small blurb saying what they do and then a tiny example. It's not 100% critical, but helps a ton if the next person to maintain this code isn't either one of us.

There is some hex math being done. If you could document how that works or why it works, or link to some other documentation about what exactly you're doing that would be amazing.

lib/trashed/instruments/backlog.rb Outdated

+                      @pid = nil
+                    end
+                    def start(state, counters, gauges)

schneems Sep 15, 2017

Are we using this method anywhere? Can we get rid of it?

Author

apg Sep 15, 2017

meter.rb has a if instrument.respond_to?(:start) so it's safe to remove.

lib/trashed/instruments/backlog.rb Outdated

+                        basefile = "/proc/#{@pid}/"
+                      else
+                        basefile = "/proc/net/"
+                      end

schneems Sep 15, 2017

It looks like this is the only place we're actively using the @pid. Instead of storing the @pid and then building a new string every time, could we instead do this calculation once in the initializer?

Also you don't need the then keyword.

def initialize(pid = 'net')
  @basefile = "/proc/#{pid}/"
end

I'm not sure if you want to do any kind of type checking here. I.e. if pid is entered as a non integer and not "net" then maybe we want to error?

if pid == "net" || pid = Integer(pid)
  raise ArgumentError, "You entered #{pid} which is not an integer or 'net'"

On the flip side since we're opening the file later I guess it will fail if it doesn't exist, so that's reasonable enough.

Author

apg Sep 15, 2017

Good call on this.

lib/trashed/instruments/backlog.rb Outdated

+                      end
+                      [:tcp, :tcp6].each do |name|
+                        ProcTCP.new(File.open(basefile + name)).by(:st => TCP_SYN_RECV) do |entry|

schneems Sep 15, 2017

You can do File.join(basefile, name) to ensure that you don't accidentally have a double slash or a missing slash. Alternatively you could use a pathname object

require 'pathname'
PathName.new("/proc/#{pid}").join(name).open

On a style note :st => TCP_SYN_RECV is usually done as st: TCP_SYN_RECV. If the keys are symbols.

lib/trashed/instruments/backlog.rb Outdated

+                  class ProcTCP
+                    def initialize(file, pid=nil)
+                      @file = file
+                      @pid = pid

schneems Sep 15, 2017

I don't think we're using this @pid variable in this class

Author

apg Sep 15, 2017

Good catch! This got refactored for testing, and ...

lib/trashed/instruments/backlog.rb Outdated

+                        :retrnsmt => pieces[10],
+                        :uid => pieces[11].to_i,
+                        :timeout => pieces[12].to_i,
+                        :inode => pieces[13].to_i,

schneems Sep 15, 2017

Style nit, I would line these up and move over to 1.9.3 style hashes:

local_addr:  toip(pieces[1]),
local_port:  pieces[2].hex,
remote_addr: toip(pieces[3]),
remote_port: pieces[4].hex,
st:          pieces[5].hex,
tx_queue:    pieces[6].to_i,
rx_queue:    pieces[7].to_i,
tr:          pieces[8].hex,
tm_when:     pieces[9],
retrnsmt:    pieces[10],
uid:         pieces[11].to_i,
timeout:     pieces[12].to_i,
inode:       pieces[13].to_i,

lib/trashed/instruments/backlog.rb Outdated

+                    def toip(piece)
+                      if piece.length == 8
+                        i = piece.hex
+                        "#{i & 0xff}.#{(i >> 8) & 0xff}.#{(i >> 16) & 0xff}.#{(i >> 24) & 0xff}"

schneems Sep 15, 2017

Can you add some docs on what exactly this method is doing? This is hex magic as far as i'm concerned

lib/trashed/instruments/backlog.rb Outdated

+                  TCP_CLOSE_WAIT = 8
+                  TCP_LAST_ACK = 9
+                  TCP_LISTEN = 10
+                  TCP_CLOSING = 11

schneems Sep 15, 2017

Can you line up the equals?

TCP_ESTABLISHED = 1
TCP_SYN_SENT    = 2
TCP_SYN_RECV    = 3
TCP_FIN_WAIT1   = 4
TCP_FIN_WAIT2   = 5
TCP_TIME_WAIT   = 6
TCP_CLOSE       = 7
TCP_CLOSE_WAIT  = 8
TCP_LAST_ACK    = 9
TCP_LISTEN      = 10
TCP_CLOSING     = 11

test/backlog_test.rb Outdated

+                      count += 1
+                    end
+                    assert_equal expected, count
+                  end

schneems Sep 15, 2017

It would be longer but I would probably blow these up into two separate tests. It's a bit more verbose, but slightly more clear from a readability standpoint what's being tested:

count = 0
Trashed::Instruments::ProcTCP.new(tcp).by(:st => Trashed::Instruments::TCP_LISTEN) do |entry|
  count += 1
end
assert_equal 2, count

count = 0
Trashed::Instruments::ProcTCP.new(tcp).by(:st => Trashed::Instruments::TCP_ESTABLISHED) do |entry|
  count += 1
end
assert_equal 7, count

test/backlog_test.rb Outdated

+              class BacklogTest < Minitest::Test
+                def tcp
+                   StringIO.new(<<-HEREDOC

schneems Sep 15, 2017

Thank you for putting these in here. Will make maintenance much easier.

lib/trashed/reporter.rb Outdated

+                class PeriodicReporter < Reporter
+                  def report_logger(env)
+                  end

schneems Sep 15, 2017

Blank method?

Author

apg Sep 25, 2017

The base class has a thing that is utilized in the RackReporter (which is being killed in this fork).

apghero added 8 commits

September 15, 2017 16:36


          Address requested feedback.

d3dd979


          Make clear what are tests are doing.

58d3e55


          Chopping block.

d9413b3


          Remove some excess things.

80b8b66


          Rename emitted metric for TCP Backlog.

238ff00


          It's not a hash, it's a list of pairs.

1ffba21


          Remove the Process.pid and try to capture it all.

21fa451


          The tcp_backlog meter actually doesn't actually bear any fruit

43acba0

For reasons of Linux, and, related to Puma things, this SYN_RECV
counter isn't an effective tool for measuring anything meaningful.
The Linux kernel actually has two queues, and the queue that is
important to measure ... can't be. So, this SYN_RECV counter is
really measuring the number of client initiated connections that
haven't completed the 3-way TCP handshake (e.g. haven't ACK'd),
and that's a typical thing of a malicious actor, or very slow
internet connection (but something we'd unlikely catch at a 20
second polling).

There's an item in Puma (and possibly other ruby web servers,
called the "backlog", which represents the number of connections
waiting for a thread in the thread pool. This is likely a more
interesting measurement, but even more interesting is probably
the amount of time those connections sit idle waiting to be
picked up, which we can get from the Heroku router. Since we're
targetting Heroku with this, we'll simply dismiss this as being
useful and move on.

apg changed the title ~~Tcp listen backlog~~ Round of Changes

Author

apg commented Sep 27, 2017

This PR is taking on a life of it's own, and I'm going to merge and close it. Will Follow up with some additional PRs that address other things.

apg merged commit bbbd3d0 into master

apg deleted the tcp-listen-backlog branch

September 27, 2017 18:17

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet