Modifies service_rpc to use chunked transfer #1
Conversation
Looks great @melder, I will give this a spin on a staging server.
OK I tested this on a staging server with a plain GitLab 6.3 install using the supplied nginx config file. Setup:
Test:
Before:
After:
This is really impressive @melder! We should probably also test with Apache, because a lot of people use that.
Did another test, still with nginx: Setup:
Test:
Before:
After:
Did a git clone with Git 1.7.10 (the oldest version supported by GitLab) through nginx; it worked.
Running into trouble with Apache. I used this configuration file: https://github.com/gitlabhq/gitlab-recipes/blob/master/web-server/apache/gitlab.conf When I do the clone of the repository with the big file in it, I get the following: It seems the Unicorn worker is getting killed after 30s (the default). It is not clear to me why this does not happen when nginx is the frontend. Any thoughts @melder @axilleas @SAG47?
On closer inspection I am not sure whether the Apache issue described above is related to this PR at all. If I use the baseline Grack version I get the same error when doing the HTTP clone through Apache.
👍 for this PR /cc @randx
@jacobvosmaer try doing the very large clone with Apache but without using the patch. Apache handles chunking by default, whereas nginx does not before version 1.3.9. Ubuntu 12.04 comes with
@melder @jacobvosmaer Also, I'm not entirely sure the chunked encoding is implemented fully to spec with regard to HTTP/1.1.
This is a Windows end of line. Should this not be a Unix end of line (LF = "\n"), since it is hosted on a Linux system?
@SAG47 this is required by the HTTP protocol.
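For context, the CRLF requirement comes from the chunked-encoding wire format itself: each chunk is a hex size line, the data, and CRLF delimiters, regardless of the host OS's native line endings. A minimal Ruby sketch with hypothetical helper names (not code from this PR):

```ruby
# Hypothetical helpers illustrating the HTTP/1.1 chunked wire format.
# The spec mandates CRLF ("\r\n") after the hex size line and after each
# chunk body, independent of the server's native line endings.
def encode_chunk(data)
  format("%x\r\n%s\r\n", data.bytesize, data)
end

# A chunked body is terminated by a zero-length chunk.
def last_chunk
  "0\r\n\r\n"
end
```

For example, `encode_chunk("hello")` produces `"5\r\nhello\r\n"`.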
@SAG47 I am aware of issues with pushing to GitLab through nginx, where the chunking done by the Git client seems not to be supported by nginx. I am observing the following while cloning a repo with a 170MB file from GitLab 6.3:
lib/grack/server.rb
Not really an issue but more of a grammar correction. Shouldn't the comment be # *stream* it to the client?
This typo is very old (check the git history) but I agree with @SAG47 that we might as well correct it at this time.
@SAG47 also note that the nginx link you gave earlier states:
This PR deals with the response, not the request.
Shouldn't there be a |
I take that back. It looks like it follows the chunked-encoding standard from my reading. I was having trouble reading the RFC as I'm not used to their format. I could still be wrong about reading it though.
@SAG47 I drew your attention to this PR because for a moment I feared it broke things for Apache. I spoke too soon, perhaps, because after more testing I have not been able to find a test case for git HTTP access that works through Apache before but breaks after applying this patch to Grack. Do you see anything in this PR that looks problematic for Apache?
@jacobvosmaer There appears to be a bug filed against Apache mod_proxy mishandling chunked responses. In the Apache config, try setting
I'm able to clone, pull, and push to repositories via HTTP on RHEL 6.5, Apache httpd 2.2.15 release 29.el6_4. I have not tested very large files in a repo, as mentioned in this post.
Thanks for the link to the bug, @SAG47. It is not clear to me whether that bug weighs against merging this PR. What do you think?
@jacobvosmaer I just tested large files in a repository over HTTP in my vanilla production 6.3 instance. Here are my results and the steps I took.
Here's the output from the final push of the large file. I don't appear to be experiencing this issue in my instance. I don't know how merging this patch will affect a working setup. I'll need to create a test environment and merge this PR to be sure. I probably can't do that today, as I've got a lot on my plate.
@SAG47 is |
@jacobvosmaer This is the exact Apache config I use in production, where I conducted those tests. My config is a little outdated and I have not made any changes based on this PR. ... ah darn, I forgot to edit out the server name... well, the internet now knows where I host GitLab at Drexel. Oh well, it's fairly secure.
Thanks for sharing the config @SAG47 (I hope you won't get in trouble). I have tried before/after again with Apache, this time cloning on the server itself. I see the same sort of memory improvements from this patch with Apache as with nginx. I was testing the clone via a slow connection before; I wonder if that caused the timeout, because Unicorn had to wait for Apache to send the data to my machine. With a fast connection (localhost, on the GitLab server) the clone went fine. This PR still looks good to me, and I think Apache users would also appreciate Unicorn using less memory. :)
@jacobvosmaer if that's the case, then a timeout can now be reliably determined with the 8192-byte chunks. Just determine your speed with the server and the amount of time it takes to transfer 8MB (slowest between upload/download). Then set the Unicorn/Apache/nginx timeouts to be much greater than that value. I didn't realize the transfer happened in these blocks before. Now that I do, I see this being relatively easy to document as a recommendation for admins.
Thanks for the suggestion @SAG47. I have no reservations about this PR at this point.
I think 8MB is a typo as well; it's 8192 bytes, so 8KB. http://www.ruby-doc.org/core-2.0.0/IO.html#method-i-read I updated the comments to save you guys a little work. Let me know if you have any questions; I couldn't decipher any over this long thread :)
@melder Ah, you're right! Silly of me not to notice my own mistake of calling 8192 bytes 8MB. Bah!
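To make the 8192-byte read size concrete, here is a small illustration using StringIO as a stand-in for the pipe Grack actually reads from (an assumed setup for demonstration, not code from this PR):

```ruby
require "stringio"

# StringIO stands in for the real git command output pipe.
io = StringIO.new("x" * 20_000)

chunks = []
# IO#read(8192) returns at most 8192 bytes per call, and nil at EOF,
# so a 20,000-byte stream arrives as 8192 + 8192 + 3616 bytes.
while (block = io.read(8192))
  chunks << block
end

chunks.map(&:bytesize) # => [8192, 8192, 3616]
```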
@SAG47 thank you
And thank you @melder!!
Efficient Clone
Since gitlabhq/grack#1, this warning is no longer needed, I think.
https://github.com/gitlabhq/gitlabhq/issues/3079
When downloading/uploading git data, just like with any other web request, a Content-Length header is expected. The problem is that the entire response must first be read in order to calculate its length. This is okay with small repos, but breaks servers when serving repos several GBs in size. The entire repo would first be loaded into memory and then served to the client, making it inefficient, unfriendly (git may appear to hang while the server is loading the repo), and prone to OOM errors.
Chunked transfer solves these issues by doing away with the Content-Length header. It simply grabs data and sends it to the client as it is read. The result is an immediate download on the client side and negligible memory consumption on the server side.
Note that this fix does not provide an improvement for single-threaded servers like Thin. It works wonderfully on Unicorn, though.