Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect HTTP transfer stats #366

Merged
merged 11 commits into from
Jun 16, 2015
Merged

Collect HTTP transfer stats #366

merged 11 commits into from
Jun 16, 2015

Conversation

rubyist
Copy link
Contributor

@rubyist rubyist commented Jun 3, 2015

I was doing some measuring today and added some code that might be generally useful with respect to some analytics functionality previously discussed.

What's in this PR is a modification of the HTTP tracing code that keeps track of the sizes and timings of every http request that goes through the default client. Once you have a response you can link it to a key in a way similar to statsd (though far less capable, at this point), e.g.:

LogTransfer("lfs.api.upload", res)

It tracks the size of both the header and body portions of the transfers, as well as the timing. The header vs body sizes can be useful for determining the point at which the HTTP overhead overtakes the file size (e.g. thousands of small files in LFS).

I'd love to be able to break down the timings into the various pieces of the http request, but that's a whole other can of worms best left for another day.

As it stands, when tracing with GIT_CURL_VERBOSE=1, you'll get output like this:

(100 of 100 files) 50.00 MB 1m22s
HTTP Transfer Stats

Concurrent Transfers: 3, Batch: false

lfs.api.upload:
  Transfers: 100
  Time: 120870.59ms
  Wire Data (Bytes): 81700
    Request Size (Bytes): 37300
      Headers: 28500
      Bodies: 8800
    Response Size (Bytes): 44400
      Headers: 12700
      Bodies: 31700
  Mean Wire Size: 817
    Requests: 373.00
    Responses: 444.00
  Mean Transfer Time: 1208.71ms

lfs.data.upload:
  Transfers: 100
  Time: 121015.36ms
  Wire Data (Bytes): 52461500
    Request Size (Bytes): 52459800
      Headers: 31000
      Bodies: 52428800
    Response Size (Bytes): 1700
      Headers: 1700
      Bodies: 0
  Mean Wire Size: 524615
    Requests: 524598.00
    Responses: 17.00
  Mean Transfer Time: 1210.15ms


Total Transfers: 200
Total Time: 241885.94ms

or

(100 of 100 files) 50.00 MB 1m15s
HTTP Transfer Stats

Concurrent Transfers: 10, Batch: false

lfs.api.upload:
  Transfers: 100
  Time: 350577.71ms
  Wire Data (Bytes): 81700
    Request Size (Bytes): 37300
      Headers: 28500
      Bodies: 8800
    Response Size (Bytes): 44400
      Headers: 12700
      Bodies: 31700
  Mean Wire Size: 817
    Requests: 373.00
    Responses: 444.00
  Mean Transfer Time: 3505.78ms

lfs.data.upload:
  Transfers: 100
  Time: 350871.42ms
  Wire Data (Bytes): 52461500
    Request Size (Bytes): 52459800
      Headers: 31000
      Bodies: 52428800
    Response Size (Bytes): 1700
      Headers: 1700
      Bodies: 0
  Mean Wire Size: 524615
    Requests: 524598.00
    Responses: 17.00
  Mean Transfer Time: 3508.71ms


Total Transfers: 200
Total Time: 701449.13ms

or

(100 of 100 files) 50.00 MB 37s
HTTP Transfer Stats

Concurrent Transfers: 10, Batch: true

lfs.api.batch:
  Transfers: 1
  Time: 766.13ms
  Wire Data (Bytes): 41046
    Request Size (Bytes): 9204
      Headers: 291
      Bodies: 8913
    Response Size (Bytes): 31842
      Headers: 128
      Bodies: 31714
  Mean Wire Size: 41046
    Requests: 9204.00
    Responses: 31842.00
  Mean Transfer Time: 766.13ms

lfs.data.upload:
  Transfers: 100
  Time: 186513.05ms
  Wire Data (Bytes): 52461500
    Request Size (Bytes): 52459800
      Headers: 31000
      Bodies: 52428800
    Response Size (Bytes): 1700
      Headers: 1700
      Bodies: 0
  Mean Wire Size: 524615
    Requests: 524598.00
    Responses: 17.00
  Mean Transfer Time: 1865.13ms


Total Transfers: 101
Total Time: 187279.18ms

Some things to consider before this is a real thing:

  • Use a separate tracing var for it, http tracing output is messy with concurrent transfers
  • Probably want some mutexes around the hashes, it's kind of OK as is but a little iffy
  • Make double sure we're not storing extra data when not tracing

Future work could further aggregate these, send them off somewhere, make more statsd-like outputs (percentiles, sums, histograms, etc).

@technoweenie
Copy link
Contributor

Big 👍. Though I think separating this from GIT_CURL_VERBOSE would be a good idea.

Future work could further aggregate these, send them off somewhere, make more statsd-like outputs (percentiles, sums, histograms, etc).

This could be a common log format (I'm partial to grohl of course). Something simple that the Git LFS tool can produce for something else to dump into splunk, statsd, spark, etc.

@rubyist
Copy link
Contributor Author

rubyist commented Jun 3, 2015

Yeah that's what I was getting at in that first checkbox, it gets drowned out in the GIT_CURL_VERBOSE output, a separate flag is def needed.

I like the common log format idea, dumping right to that would probably save a bit of memory while tracing, and some more request metadata could be added in.

@rubyist
Copy link
Contributor Author

rubyist commented Jun 5, 2015

I've done a bunch of clean up on this and I think it does enough for now to continue other benchmarking work. With the latest changes, if you set GIT_LOG_STATS (open to a better name) it will log the stats for all transfers in that command run to a timestamp-based log file in .git/lfs/objects/logs/http. I pulled the display output out of this code. I've got some code up in a gist that will render what was there. This could be made part of the lfs logs command, or left external and made into a proper repo. I'm not sure it's necessary to have all that in the client proper, external tooling might be better (and can be extended without interfering with the client release cycle).

@rubyist rubyist closed this Jun 16, 2015
@rubyist rubyist reopened this Jun 16, 2015
rubyist added a commit that referenced this pull request Jun 16, 2015
Collect HTTP transfer stats
@rubyist rubyist merged commit 257f909 into master Jun 16, 2015
@rubyist rubyist deleted the statistician branch June 16, 2015 14:43
// header because it may not exist or be -1 in the case of chunked responses.
resstats := &transferStats{HeaderSize: resHeaderSize, Start: start}
transfersLock.Lock()
transfers[res] = &transfer{requestStats: reqstats, responseStats: resstats, url: req.URL.String()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not be necessary to dump the url, since you can get it from the response object with res.Request.URL.

Maybe it makes too little difference, but you could build the transfer object outside of the lock.

t := &transfer{requestStats: reqstats, responseStats: resstats, url: req.URL.String()}
transfersLock.Lock()
transfers[res] = t
transfersLock.Unlock()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should build the object outside of the lock. Not a big deal, but feels better. I'm not sure what you mean by "dump the url", though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants