Remove 8K limit for single access.log line #332
Conversation
Can one of the admins verify this patch?
ok to test
This change needs a lot of work, including a decision on what to do on overflows. See inlined comments for a partial review, including a list of overflow handling options to consider.
src/log/File.cc (Outdated)
    char* buf = NULL;

    while (true) {
        buf = new char[bufsize];
We must not allocate and delete memory on every request unless such expensive operations are necessary. This allocation is not necessary AFAICT. For example, we could start with a pre-allocated 16KB on-stack or static buffer and switch to a larger buffer only when needed.
Buffers allocated on stacks are the #1 cause of vulnerabilities in software. It is much safer to allocate memory in the heap. Also, 16K is an excessive amount of memory to allocate on the stack (the former amount was 8K, which is also excessive for the stack). Allocating a static buffer would be fine, but then the function is no longer thread-safe, unless the memory can be allocated with the C++11 thread_local directive.
AFAIK, this function does not need to be thread safe. In fact, virtually no Squid function needs to be thread safe except a very small set of specialized functions dedicated to AUFS code (src/DiskIO/DiskThreads/). Outside of threaded AUFS code, virtually any "I will do X because of thread safety" argument is invalid. This logging code is outside of threaded AUFS code.
And if you convert this code to use SBuf, then we do not need to argue whether allocating X KB on stack is "safe" -- SBuf does not use stack for its storage.
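The grow-only-when-needed idea can be sketched in portable C++ without SBuf. This is an illustration only, not the actual Squid code: the function name, the initial buffer size, and the std::string return type are assumptions made for the sketch.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Format into a heap buffer, starting small and growing only when
// vsnprintf reports that the output did not fit. The character storage
// lives on the heap (inside the vector), so large lines never touch
// the stack.
std::string formatLine(const char *fmt, ...)
{
    std::vector<char> buf(128); // initial guess; grows on demand
    for (;;) {
        va_list args;
        va_start(args, fmt);
        const int n = vsnprintf(buf.data(), buf.size(), fmt, args);
        va_end(args);
        if (n < 0)
            return std::string(); // encoding error; give up
        if (static_cast<size_t>(n) < buf.size())
            return std::string(buf.data(), n); // it fit
        buf.resize(n + 1); // too small; retry with the exact size needed
    }
}
```

Note that the allocation is amortized: lines under the initial size cost one allocation, and only genuinely long lines trigger a second, exactly sized one.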
src/log/File.cc
Outdated
break; | ||
} | ||
|
||
// Try again with a bigger buffer |
Please find a way to avoid duplicating SBuf::vappendf() code. For example, it may be possible to reuse a static SBuf buf object in this function instead of all this C-like code.
Agreed.
This change request has not been addressed in 35f3acd. There are other problems in that commit as well, but I am going to ignore them until this change request is addressed -- most of the other problems should disappear with this change.
src/log/File.cc
Outdated
if (fmt[strlen(fmt) - 1] == '\n') | ||
buf[8191] = '\n'; | ||
if (!buf) { | ||
logfileWrite(lf, (char*)"-", 1); |
@squid-cache/commiters, besides warning the admin via cache.log, what should we do if an access log line grows insanely big and, hence, should not be allocated and logged "as is"? AFAICT, our options include:
1. Log a single dash or "[...]" to allow automated detection/handling but provide no access.log information to track the problem.
2. Log what we can, possibly misleading automation tools with a truncated field value (e.g., "12" instead of "1234").
3. Log what we can but append "[...]" or some other line truncation indicator to allow automated detection/handling.
4. Split the line into multiple lines, breaking the "one transaction, one line" rule.
5. Truncate the largest field(s) as necessary, using option 1, 2, or 3 above for each truncated field.
6. Allocate until we run out of RAM. See the comment below.

All of the above options except options 5 and 6 break the configured logformat.
AFAICT, Squid currently uses option 2. If we cannot agree on how to improve/fix the current behavior, this PR should probably preserve it.
I like option 6 the most. What do you think?
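For illustration, option 3 could look roughly like the helper below. The function name and the exact limit are invented for this sketch; only the "[...]" marker idea comes from the list above.

```cpp
#include <string>

// Sketch of option 3: keep as much of the line as fits within the
// limit, replacing the tail with an explicit truncation marker so that
// automated log parsers can detect shortened lines.
std::string truncateWithMarker(const std::string &line, size_t limit)
{
    static const std::string marker = "[...]";
    if (line.size() <= limit)
        return line; // fits as-is, nothing to do
    if (limit <= marker.size())
        return marker.substr(0, limit); // degenerate: limit smaller than marker
    return line.substr(0, limit - marker.size()) + marker;
}
```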
Needless to say, we can adopt the current PR approach of having no limit at all: allocate more RAM until Squid is killed by the OS.
This option 6 may actually be the best option because it is simple, and OOM deaths caused by this code are rather unlikely. Please note that Squid already keeps pieces of what it wants to log in RAM.
src/log/File.cc
Outdated
if (s > 8192) { | ||
s = 8192; | ||
// bufsize will be large enough for most log lines | ||
int bufsize = 16 * 1024; |
Please use size_t for sizes.
On the architectural level please be aware that this 8KB buffer is not the only limit imposed on log lines.
IMO this magic 8192 number should just be replaced with LOGFILE_BUFSZ, which can be made replaceable at build time with -DLOGFILE_BUFSZ when it is moved to log/File.h. The Log:: objects are overdue for a full refactoring to use C++ instead of C virtual functions, which seems out of scope here.
AFAICT, it is the only 8KB limit in some environments, and even if it is never the only limit, the PR intent to increase that limit is still valid. You have not said that it is invalid, but I want to explicitly clarify that nothing in your comment should be misinterpreted as PR intent invalidation. BTW, your summary is very useful for many purposes! Please consider copying it to some wiki page.
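The build-time override suggested above could be as small as the fragment below. The 32 KB default matches the value mentioned later in this thread, but the exact placement in log/File.h is an assumption.

```cpp
// Default log line buffer size; can be overridden at build time,
// e.g. with CXXFLAGS="-DLOGFILE_BUFSZ=65536".
#ifndef LOGFILE_BUFSZ
#define LOGFILE_BUFSZ (32 * 1024)
#endif
```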
I do not think a
Agreed on both counts. That refactoring should be done, but in a separate independent PR.
Completed requested modifications and ran tests.
I've removed the thread_local declaration from logfilePrintf().
LOGFILE_BUFSZ is currently defined as 32K, which should be more than adequate for most lines.
Agreed. In case you are implying the opposite, increasing the initial buffer size does not address my change request regarding code duplication. I am not sure why you are ignoring it (and a few other change requests), but I am happy to answer any questions about them. Please go back to my review and make sure that change requests not already marked as "resolved" are addressed, starting with the code duplication request.
Silly question: why did Squid create SBuf instead of using std::string?
Greatly simplified code using SBuf. |
In short, std::string is a bad choice for the frequently shared, parsed, and peeked-at raw buffers that Squid needs for shoveling I/O results around and similar performance-sensitive tasks. The general-purpose class was not designed for that kind of performance work. For example, modern std::string cannot be safely shared without copying. Also, std::string has some scary and slow locale-dependent code (at least it was present in older std::string implementations).
Thank you for addressing all of my earlier change requests! AFAICT, the PR is still very buggy, but
- You are not responsible for the remaining bugs.
- You are on the right track overall, with only one high-level issue remaining in your current code.
Please see inlined change requests for details.
src/log/File.cc (Outdated)
    logfileWrite(lf, buf, (size_t) s);
    static SBuf sbuf;
    sbuf.clear();
    sbuf.vappendf(fmt, args);
Other bugs notwithstanding, the maximum line size is now limited to 256 MB (SBuf::maxSize, or 268435455 bytes to be precise). That is more than enough, of course, but I wonder whether we should add code to handle overflows. Without such code, an unlikely huge access.log line will either kill Squid or not appear in the access.log at all. Thus, we are back to the six overflow handling options discussed earlier, except that the simplest option 6 is off the table. Since we no longer deal with raw formatting code here, I suggest working around this problem instead of solving it directly:
1. Do not handle overflows in logfilePrintf(). Just check that if SBuf throws, there will be some ERROR message about that in cache.log. To check, temporarily hack logfilePrintf() to add Must(false) and see what happens -- do that before moving on to step 2 below.
2. Change Log::Format::SquidCustom() to print the already formatted(!!) line using logfileWrite() instead of abusing logfilePrintf(). You may need to refactor a little so that Format::assemble() appends a newline when needed.

The above approach is based on the assumption that the most likely (if not the only) source of logfilePrintf() overflows is custom log formats. And if we want to improve overflow handling there, we have to improve Format::assemble(), where overflow handling approaches like the nice option 5 posted earlier are much easier to implement. We do not have to do those improvements in this PR though -- the above two steps should be enough for now as far as overflow handling is concerned.
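The "appends a newline when needed" part of step 2 is tiny. A hypothetical standalone version might look like this (in Squid the result would go to logfileWrite() rather than being returned as a string):

```cpp
#include <string>

// Ensure the already formatted record ends with a trailing newline,
// adding one only when the formatted line lacks it.
std::string terminateRecord(std::string line)
{
    if (line.empty() || line.back() != '\n')
        line += '\n';
    return line;
}
```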
The above approach is based on the assumption that the most likely (if not the only source) of logfilePrintf() overflows is custom log formats.
Alex, that assumption is wrong. Anything that logs URL/URI and/or header field-values can produce very long lines. That means all log formats built into Squid are possible overflow sources. Format::assemble() is also used in all the non-log places where logformat macros are supported (error pages, deny_info URLs, external ACL helper I/O), so care needs to be taken not to break any of those by forcing implicit line-wrapping into its output.
Alex: the most likely source of logfilePrintf() overflows is custom log formats
Amos: that assumption is wrong. [...] all log formats built into Squid are possible overflows
Your statement does not contradict my assumption AFAICT, but, more importantly, I believe that the already implemented PR changes invalidate your statement: Some logfilePrintf() callers cannot overflow at all (even before the PR), and only custom log format has any chance of producing log lines that exceed 268435455 characters (the post-PR limit).
care needs to be taken not to break any of those by forcing implicit line-wrapping into Format::assemble() output
Yes, that is why I said "appends a new line when needed". One way to handle this without adding parameters/options to assemble() could be to append a newline to the configured logformat string.
This change request is still unresolved. The two-step plan I have suggested still applies AFAICT.
I am out of time on this project. I will not implement the suggested change.
I will not implement the suggested change.
Thank you for sharing your decision. I believe the requested change should be implemented before this PR is approved because it is not difficult to implement, will address a potentially important in-scope problem, and even provide a significant performance improvement. Hopefully, somebody else will take over and finish this work.
However, if other @squid-cache/commiters disagree, they may be able to approve this PR (after a thorough from-scratch review) without this change because they may decide that the PR is a step forward, while that known problem is unlikely to surface in real deployments.
I went through all my change requests and marked the addressed ones (and the ones that do not need to be addressed) as resolved. AFAICT, there are two change requests remaining. I added a "not addressed" comment to each of them. One is optional/trivial. The other one requires a little bit of work.

Rant: GitHub does not make it very easy to find what needs to be done, but if you use the web interface and start from the actual review post (rather than notification emails or some random change requests), you can usually find all the change requests that may need your attention. This can be time consuming, especially when there are outdated blobs and when reviewers do not mark addressed change requests promptly, but that is the only way I know of that works. You can always ask about a particular change request's status if you are not sure, of course.

Please avoid using individual change requests for posting general PR questions or comments (like this one). Each change request should be focused on that specific change.
I am removing my "changes requested" mark so that I am not blocking progress with this PR. Please see the last remaining change request for details.
Allow others to move this PR forward as discussed elsewhere
Log lines with values causing them to exceed SBuf::maxSize will overflow and trigger an exception. Currently, that exception is expected to be caught in the main loop handler and shut down Squid. Since this is a transaction-specific problem, it could be better handled as a transaction-specific exception, letting other client connections and the proxy continue unaffected. This is a TODO rather than an XXX because the large SBuf size limit is hard to encounter, and malicious attempts to trigger it remotely are prohibited by the HTTP message size limits.
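The transaction-specific handling described in this TODO might look like the sketch below. The function name, the exception type, and the single-dash fallback are assumptions for illustration; Squid's real code paths differ.

```cpp
#include <stdexcept>
#include <string>

// Sketch: treat an oversized log line as a per-transaction failure
// instead of letting the exception reach the main loop and shut the
// proxy down. Returns false (and produces a placeholder) on overflow.
bool logOneTransaction(const std::string &line, size_t maxSize, std::string &out)
{
    try {
        if (line.size() > maxSize)
            throw std::length_error("log line exceeds buffer limit");
        out = line; // normal case: log the line as-is
        return true;
    } catch (const std::exception &) {
        out = "-"; // swallow the error for this transaction only
        return false;
    }
}
```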
Added a code TODO to document Alex's outstanding issue for a future dev to resolve.
This PR now passes the 10+ day criteria and since we have v6 as main already this can go in now.
If it actually fixes that bug, yes. But AFAIK, once Anubis starts triggering its build commit, it is too late: the commit already being tested with the existing title will either be merged or fail.
It looks like something is going wrong with the pr-auto triggers. The status says "build triggered for merge commit", but Jenkins has no queued build for this PR.
I was worried about the lack of Jenkins results as well. @eduard-bagdasaryan, could you please take a look? I do not know much about the Jenkins setup...
Based on the bug and PR descriptions, I assume it does, but I do not know for sure. The reporter has not updated the bug report yet(?) with this information (but that could also be because this PR does not apply cleanly to v4).
It should not be too late until the tested commit is merged into master. Anubis should notice the title change and regenerate the merge commit, which should restart testing (IIRC). AFAICT, as of now, Anubis has not even created the merge commit (on the auto branch) because Anubis waits for Jenkins test results (of the GitHub-generated commit).
It looks like Jenkins was down at the time of fc2501d, thus missing the notification. The easiest way to re-trigger it is probably to re-send the corresponding webhook push event (the list of notifications should be available on the Webhooks GitHub page). I cannot do it myself due to permission restrictions.
Hmm... I do not see a push event that corresponds to fc2501d dated 2020-02-12. The latest events are 2020-02-08 and 2020-02-18. Neither event is about this PR AFAICT. I am going to commit a non-change in the hope of triggering a new Jenkins test.
The function logfilePrintf() currently limits a single line to no more than 8K characters. It allocates an 8K buffer on the stack, which is not safe. This PR eliminates the 8K limit, and moves the buffer to the heap.
@yadij, please let me know whether you would backport commit a03343c into v5 (or accept a backport). A year of silence may imply "no", but it could also imply that you just did not see the question or forgot about it... Let's close this question, one way or another.