Remove 8K limit for single access.log line #332
Conversation
Can one of the admins verify this patch?
ok to test
This change needs a lot of work, including a decision on what to do on overflows. See inlined comments for a partial review, including a list of overflow handling options to consider.
src/log/File.cc (Outdated)
    char* buf = NULL;

    while (true) {
        buf = new char[bufsize];
We must not allocate and delete memory on every request unless such expensive operations are necessary. This allocation is not necessary AFAICT. For example, we could start with a pre-allocated 16KB on-stack or static buffer and switch to a larger buffer only when needed.
Buffers allocated on stacks are the #1 cause of vulnerabilities in software. It is much safer to allocate memory in the heap. Also, 16K is an excessive amount of memory to allocate on the stack (the former amount was 8K, which is also excessive for the stack). Allocating a static buffer would be fine, but then the function is no longer thread-safe, unless the memory can be allocated with the C++11 thread_local directive.
AFAIK, this function does not need to be thread safe. In fact, virtually no Squid function needs to be thread safe except a very small set of specialized functions dedicated to AUFS code (src/DiskIO/DiskThreads/). Outside of threaded AUFS code, virtually any "I will do X because of thread safety" argument is invalid. This logging code is outside of threaded AUFS code.
And if you convert this code to use SBuf, then we do not need to argue whether allocating X KB on stack is "safe" -- SBuf does not use stack for its storage.
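The grow-only-when-needed idea can be sketched in portable C++ without SBuf. This is an illustration only, not the actual Squid code: the function name, the initial buffer size, and the std::string return type are assumptions made for the sketch.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Format into a heap buffer, starting small and growing only when
// vsnprintf reports that the output did not fit. The character storage
// lives on the heap (inside the vector), so large lines never touch
// the stack.
std::string formatLine(const char *fmt, ...)
{
    std::vector<char> buf(128); // initial guess; grows on demand
    for (;;) {
        va_list args;
        va_start(args, fmt);
        const int n = vsnprintf(buf.data(), buf.size(), fmt, args);
        va_end(args);
        if (n < 0)
            return std::string(); // encoding error; give up
        if (static_cast<size_t>(n) < buf.size())
            return std::string(buf.data(), n); // it fit
        buf.resize(n + 1); // too small; retry with the exact size needed
    }
}
```

Note that the allocation is amortized: lines under the initial size cost one allocation, and only genuinely long lines trigger a second, exactly sized one.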
src/log/File.cc
Outdated
break; | ||
} | ||
|
||
// Try again with a bigger buffer |
Please find a way to avoid duplicating SBuf::vappendf() code. For example, it may be possible to reuse a static SBuf buf object in this function instead of all this C-like code.
Agreed.
This change request has not been addressed in 35f3acd. There are other problems in that commit as well, but I am going to ignore them until this change request is addressed -- most of the other problems should disappear with this change.
src/log/File.cc
Outdated
if (fmt[strlen(fmt) - 1] == '\n') | ||
buf[8191] = '\n'; | ||
if (!buf) { | ||
logfileWrite(lf, (char*)"-", 1); |
@squid-cache/commiters, besides warning the admin via cache.log, what should we do if an access log line grows insanely big and, hence, should not be allocated and logged "as is"? AFAICT, our options include:
1. Log a single dash or "[...]" to allow automated detection/handling but provide no access.log information to track the problem.
2. Log what we can, possibly misleading automation tools with a truncated field value (e.g., "12" instead of "1234").
3. Log what we can but append "[...]" or some other line truncation indicator to allow automated detection/handling.
4. Split the line into multiple lines, breaking the "one transaction, one line" rule.
5. Truncate the largest field(s) as necessary, using option 1, 2, or 3 above for each truncated field.
6. Allocate until we run out of RAM. See the comment below.

All of the above options except options 5 and 6 break the configured logformat.
AFAICT, Squid currently uses option 2. If we cannot agree on how to improve/fix the current behavior, this PR should probably preserve it.
I like option 6 the most. What do you think?
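For illustration, option 3 could look roughly like the helper below. The function name and the exact limit are invented for this sketch; only the "[...]" marker idea comes from the list above.

```cpp
#include <string>

// Sketch of option 3: keep as much of the line as fits within the
// limit, replacing the tail with an explicit truncation marker so that
// automated log parsers can detect shortened lines.
std::string truncateWithMarker(const std::string &line, size_t limit)
{
    static const std::string marker = "[...]";
    if (line.size() <= limit)
        return line; // fits as-is, nothing to do
    if (limit <= marker.size())
        return marker.substr(0, limit); // degenerate: limit smaller than marker
    return line.substr(0, limit - marker.size()) + marker;
}
```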
Needless to say, we can adopt the current PR approach of having no limit at all: allocate more RAM until Squid is killed by the OS.
This option 6 may actually be the best option because it is simple, and OOM deaths caused by this code are rather unlikely. Please note that Squid already keeps pieces of what it wants to log in RAM.
src/log/File.cc
Outdated
if (s > 8192) { | ||
s = 8192; | ||
// bufsize will be large enough for most log lines | ||
int bufsize = 16 * 1024; |
Please use size_t for sizes.
On the architectural level please be aware that this 8KB buffer is not the only limit imposed on log lines.
IMO this magic 8192 number should just be replaced with LOGFILE_BUFSZ, which can be made replaceable at build time with -DLOGFILE_BUFSZ when it is moved to log/File.h. The Log:: objects are overdue for a full refactoring to use C++ instead of C virtual functions, which seems out of scope here.
AFAICT, it is the only 8KB limit in some environments, and even if it is never the only limit, the PR intent to increase that limit is still valid. You have not said that it is invalid, but I want to explicitly clarify that nothing in your comment should be misinterpreted as PR intent invalidation. BTW, your summary is very useful for many purposes! Please consider copying it to some wiki page.
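The build-time override suggested above could be as small as the fragment below. The 32 KB default matches the value mentioned later in this thread, but the exact placement in log/File.h is an assumption.

```cpp
// Default log line buffer size; can be overridden at build time,
// e.g. with CXXFLAGS="-DLOGFILE_BUFSZ=65536".
#ifndef LOGFILE_BUFSZ
#define LOGFILE_BUFSZ (32 * 1024)
#endif
```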
I do not think a
Agreed on both counts. That refactoring should be done, but in a separate independent PR.
Completed requested modifications and ran tests.
I've removed the thread_local declaration from logfilePrintf().
LOGFILE_BUFSZ is currently defined as 32K, which should be more than adequate for most lines.
Agreed. In case you are implying the opposite, increasing the initial buffer size does not address my change request regarding code duplication. I am not sure why you are ignoring it (and a few other change requests), but I am happy to answer any questions about them. Please go back to my review and make sure that change requests not already marked as "resolved" are addressed, starting with the code duplication request.
Silly question: why did Squid create SBuf instead of using std::string?
Greatly simplified code using SBuf. |
In short, std::string is a bad choice for the frequently shared, parsed, and peeked-at raw buffers that Squid needs for shoveling I/O results around and similar performance-sensitive tasks. The general-purpose class was not designed for that kind of performance work. For example, modern std::string cannot be safely shared without copying. Also, std::string has some scary and slow locale-dependent code (at least it was present in older std::string implementations).
Thank you for addressing all of my earlier change requests! AFAICT, the PR is still very buggy, but
- You are not responsible for the remaining bugs.
- You are on the right track overall, with only one high-level issue remaining in your current code.
Please see inlined change requests for details.
src/log/File.cc (Outdated)
    logfileWrite(lf, buf, (size_t) s);
    static SBuf sbuf;
    sbuf.clear();
    sbuf.vappendf(fmt, args);
Other bugs notwithstanding, the maximum line size is now limited to 256 MB (SBuf::maxSize, or 268435455 bytes to be precise). That is more than enough, of course, but I wonder whether we should add code to handle overflows. Without such code, an unlikely huge access.log line will either kill Squid or not appear in the access.log at all. Thus, we are back to the six overflow handling options discussed earlier, except that the simplest option 6 is off the table. Since we no longer deal with raw formatting code here, I suggest working around this problem instead of solving it directly:
1. Do not handle overflows in logfilePrintf(). Just check that if SBuf throws, there will be some ERROR message about that in cache.log. To check, temporarily hack logfilePrintf() to add Must(false) and see what happens -- do that before moving on to step 2 below.
2. Change Log::Format::SquidCustom() to print the already formatted(!!) line using logfileWrite() instead of abusing logfilePrintf(). You may need to refactor a little so that Format::assemble() appends a newline when needed.

The above approach is based on the assumption that the most likely (if not the only) source of logfilePrintf() overflows is custom log formats. And if we want to improve overflow handling there, we have to improve Format::assemble(), where overflow handling approaches like the nice option 5 posted earlier are much easier to implement. We do not have to do those improvements in this PR though -- the above two steps should be enough for now as far as overflow handling is concerned.
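The "appends a newline when needed" part of step 2 is tiny. A hypothetical standalone version might look like this (in Squid the result would go to logfileWrite() rather than being returned as a string):

```cpp
#include <string>

// Ensure the already formatted record ends with a trailing newline,
// adding one only when the formatted line lacks it.
std::string terminateRecord(std::string line)
{
    if (line.empty() || line.back() != '\n')
        line += '\n';
    return line;
}
```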
The above approach is based on the assumption that the most likely (if not the only source) of logfilePrintf() overflows is custom log formats.
Alex, that assumption is wrong. Anything that logs URL/URI and/or header field-values can produce very long lines. That means all log formats built into Squid are possible overflow sources. Format::assemble() is also used in all the non-log places where logformat macros are supported (error pages, deny_info URLs, external ACL helper I/O), so care needs to be taken not to break any of those by forcing implicit line-wrapping into its output.
Alex: the most likely source of logfilePrintf() overflows is custom log formats
Amos: that assumption is wrong. [...] all log formats built into Squid are possible overflows
Your statement does not contradict my assumption AFAICT, but, more importantly, I believe that the already implemented PR changes invalidate your statement: Some logfilePrintf() callers cannot overflow at all (even before the PR), and only custom log format has any chance of producing log lines that exceed 268435455 characters (the post-PR limit).
care needs to be taken not to break any of those by forcing implicit line-wrapping into Format::assemble() output
Yes, that is why I said "appends a new line when needed". One way to handle this without adding parameters/options to assemble() could be to append a newline to the configured logformat string.
This change request is still unresolved. The two-step plan I have suggested still applies AFAICT.
I am out of time on this project. I will not implement the suggested change.
I will not implement the suggested change.
Thank you for sharing your decision. I believe the requested change should be implemented before this PR is approved because it is not difficult to implement, will address a potentially important in-scope problem, and even provide a significant performance improvement. Hopefully, somebody else will take over and finish this work.
However, if other @squid-cache/commiters disagree, they may be able to approve this PR (after a thorough from-scratch review) without this change because they may decide that the PR is a step forward, while that known problem is unlikely to surface in real deployments.
I went through all my change requests and marked the addressed ones (and the ones that do not need to be addressed) as resolved. AFAICT, there are two change requests remaining. I added a "not addressed" comment to each of them. One is optional/trivial. The other one requires a little bit of work.

Rant: GitHub does not make it very easy to find what needs to be done, but if you use the web interface and start from the actual review post (rather than notification emails or some random change requests), you can usually find all the change requests that may need your attention. This can be time consuming, especially when there are outdated blobs and when reviewers do not mark addressed change requests promptly, but that is the only way I know of that works. You can always ask about a particular change request's status if you are not sure, of course.

Please avoid using individual change requests for posting general PR questions or comments (like this one). Each change request should be focused on that specific change.
I am removing my "changes requested" mark so that I am not blocking progress with this PR. Please see the last remaining change request for details.
Allow others to move this PR forward as discussed elsewhere
Log lines with values causing them to exceed SBuf::maxSize will overflow and trigger an exception. Currently, that exception is expected to be caught in the main loop handler and shut down Squid. Since this is a transaction-specific problem, it could be better handled as a transaction-specific exception, letting other client connections and the proxy continue unaffected. This is a TODO rather than an XXX because the large SBuf size limit is hard to encounter, and malicious attempts to trigger it remotely are prohibited by the HTTP message size limits.
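The transaction-specific handling described in this TODO might look like the sketch below. The function name, the exception type, and the single-dash fallback are assumptions for illustration; Squid's real code paths differ.

```cpp
#include <stdexcept>
#include <string>

// Sketch: treat an oversized log line as a per-transaction failure
// instead of letting the exception reach the main loop and shut the
// proxy down. Returns false (and produces a placeholder) on overflow.
bool logOneTransaction(const std::string &line, size_t maxSize, std::string &out)
{
    try {
        if (line.size() > maxSize)
            throw std::length_error("log line exceeds buffer limit");
        out = line; // normal case: log the line as-is
        return true;
    } catch (const std::exception &) {
        out = "-"; // swallow the error for this transaction only
        return false;
    }
}
```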
Added a code TODO to document Alex's outstanding issue for a future dev to resolve.
This PR now passes the 10+ day criteria and since we have v6 as main already this can go in now.
If it actually fixes that bug, yes. But AFAIK, once Anubis starts triggering its build commit, it is too late: the commit already being tested with the existing title will either be merged or fail.
It looks like something is going wrong with the pr-auto triggers. The status says "build triggered for merge commit", but Jenkins has no queued build for this PR.
I was worried about the lack of Jenkins results as well. @eduard-bagdasaryan, could you please take a look? I do not know much about the Jenkins setup...
Based on the bug and PR descriptions, I assume it does, but I do not know for sure. The reporter has not updated the bug report yet(?) with this information (but that could also be because this PR does not apply cleanly to v4).
It should not be too late until the tested commit is merged into master. Anubis should notice the title change and regenerate the merge commit, which should restart testing (IIRC). AFAICT, as of now, Anubis has not even created the merge commit (on the auto branch) because Anubis waits for Jenkins test results (of the GitHub-generated commit).
It looks like Jenkins was down at the time of fc2501d, thus missing the notification. The easiest way to re-trigger it is probably to re-send the corresponding webhook push event (the list of notifications should be available on the Webhooks GitHub page). I cannot do it myself due to permission restrictions.
Hmm... I do not see a push event that corresponds to fc2501d dated 2020-02-12. The latest events are 2020-02-08 and 2020-02-18. Neither event is about this PR AFAICT. I am going to commit a non-change in the hope of triggering a new Jenkins test.
The function logfilePrintf() currently limits a single line to no more than 8K characters. It allocates an 8K buffer on the stack, which is not safe. This PR eliminates the 8K limit, and moves the buffer to the heap.
@yadij, please let me know whether you would backport commit a03343c into v5 (or accept a backport). A year of silence may imply "no", but it could also imply that you just did not see the question or forgot about it... Let's close this question, one way or another.