
Remove 8K limit for single access.log line #332

Closed
wants to merge 12 commits

Conversation

redmeadowman
Contributor

The function logfilePrintf() currently limits a single line
to no more than 8K characters. It allocates an 8K buffer
on the stack, which is not safe. This PR eliminates
the 8K limit, and moves the buffer to the heap.
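The general pattern the PR describes -- size the line first, then format it into a heap buffer of exactly the right size -- can be sketched as a two-pass vsnprintf(). This is an illustrative stand-alone sketch, not the actual Squid change; formatLogLine is a hypothetical name:

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Two-pass formatting: measure, then fill a heap buffer (no fixed
// on-stack array, no arbitrary 8K cap).
static std::string formatLogLine(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    va_list argsCopy;
    va_copy(argsCopy, args);

    // First pass: ask vsnprintf how many bytes the formatted line needs.
    const int needed = vsnprintf(nullptr, 0, fmt, args);
    va_end(args);
    if (needed <= 0) {
        va_end(argsCopy);
        return std::string();
    }

    // Second pass: format into a heap buffer of exactly the right size.
    std::vector<char> buf(needed + 1);
    vsnprintf(buf.data(), buf.size(), fmt, argsCopy);
    va_end(argsCopy);
    return std::string(buf.data(), needed);
}
```

The va_copy is required because the first vsnprintf pass consumes the original va_list.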

@squid-prbot
Collaborator

Can one of the admins verify this patch?

@kinkie
Contributor

kinkie commented Nov 14, 2018

ok to test

Contributor

@rousskov rousskov left a comment

This change needs a lot of work, including a decision on what to do on overflows. See inlined comments for a partial review, including a list of overflow handling options to consider.

src/log/File.cc Outdated
char* buf = NULL;

while (true) {
buf = new char[bufsize];
Contributor

We must not allocate and delete memory on every request unless such expensive operations are necessary. This allocation is not necessary AFAICT. For example, we could start with a pre-allocated 16KB on-stack or static buffer and switch to a larger buffer only when needed.
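The reviewer's suggestion can be sketched as follows; formatWithFallback and the 16KB figure are illustrative, not Squid code:

```cpp
#include <cstdarg>
#include <cstdio>
#include <memory>
#include <string>

// Format into a pre-allocated buffer first; fall back to a heap
// allocation only for the rare oversized line. The static buffer is
// not thread-safe, which the discussion below argues is acceptable
// for Squid logging code.
static std::string formatWithFallback(const char *fmt, ...)
{
    static char fixed[16 * 1024]; // allocated once, reused on every call
    va_list args;
    va_start(args, fmt);
    va_list argsCopy;
    va_copy(argsCopy, args);

    const int needed = vsnprintf(fixed, sizeof(fixed), fmt, args);
    va_end(args);
    if (needed < 0) {
        va_end(argsCopy);
        return std::string();
    }
    if (needed < static_cast<int>(sizeof(fixed))) {
        va_end(argsCopy);
        return std::string(fixed, needed); // common case: no heap allocation
    }

    // Rare case: the line did not fit; allocate exactly what is needed.
    std::unique_ptr<char[]> big(new char[needed + 1]);
    vsnprintf(big.get(), needed + 1, fmt, argsCopy);
    va_end(argsCopy);
    return std::string(big.get(), needed);
}
```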

Contributor Author

Buffers allocated on the stack are the #1 cause of vulnerabilities in software. It is much safer to allocate memory on the heap. Also, 16K is an excessive amount of memory to allocate on the stack (the former amount was 8K, which is also excessive for the stack). Allocating a static buffer would be fine, but then the function is no longer thread-safe, unless the memory is declared with the C++11 thread_local keyword.

Contributor

AFAIK, this function does not need to be thread safe. In fact, virtually no Squid function needs to be thread safe except a very small set of specialized functions dedicated to AUFS code (src/DiskIO/DiskThreads/). Outside of threaded AUFS code, virtually any "I will do X because of thread safety" argument is invalid. This logging code is outside of threaded AUFS code.

And if you convert this code to use SBuf, then we do not need to argue whether allocating X KB on stack is "safe" -- SBuf does not use stack for its storage.

src/log/File.cc Outdated (resolved)
src/log/File.cc Outdated
break;
}

// Try again with a bigger buffer
Contributor

Please find a way to avoid duplicating SBuf::vappendf() code. For example, it may be possible to reuse a static SBuf buf object in this function instead of all this C-like code.
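The suggested shape might look like the sketch below, written with a minimal stand-in for SBuf (the real class lives in Squid's src/sbuf/ and handles growth internally in its own vappendf()); MiniBuf and logOneLine are hypothetical names for illustration only:

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Minimal stand-in for Squid's SBuf, just enough to show the shape.
class MiniBuf {
public:
    void clear() { data_.clear(); }

    // Append printf-style output, growing on the heap as needed.
    void vappendf(const char *fmt, va_list args) {
        va_list copy;
        va_copy(copy, args);
        const int needed = vsnprintf(nullptr, 0, fmt, copy);
        va_end(copy);
        if (needed <= 0)
            return;
        std::vector<char> tmp(needed + 1);
        vsnprintf(tmp.data(), tmp.size(), fmt, args);
        data_.append(tmp.data(), needed);
    }

    const std::string &view() const { return data_; }

private:
    std::string data_;
};

// The suggested logfilePrintf() shape: one static buffer, cleared and
// refilled on each call, then written out whole (the real code would
// pass the result to logfileWrite()).
static std::string logOneLine(const char *fmt, ...) {
    static MiniBuf buf; // reused across calls; no per-line allocation churn
    buf.clear();
    va_list args;
    va_start(args, fmt);
    buf.vappendf(fmt, args);
    va_end(args);
    return buf.view();
}
```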

Contributor Author

Agreed.

Contributor

This change request has not been addressed in 35f3acd. There are other problems in that commit as well, but I am going to ignore them until this change request is addressed -- most of the other problems should disappear with this change.

src/log/File.cc Outdated
if (fmt[strlen(fmt) - 1] == '\n')
buf[8191] = '\n';
if (!buf) {
logfileWrite(lf, (char*)"-", 1);
Contributor

@rousskov rousskov Nov 14, 2018

@squid-cache/commiters, besides warning the admin via cache.log, what should we do if an access log line grows insanely big and, hence, should not be allocated and logged "as is"? AFAICT, our options include:

  1. Log a single dash or [...] to allow automated detection/handling but providing no access.log information to track the problem.
  2. Log what we can, possibly misleading automation tools with a truncated field value (e.g., 12 instead of 1234).
  3. Log what we can but append [...] or some other line truncation indicator to allow automated detection/handling.
  4. Split the line into multiple lines, breaking "one transaction, one line" rule.
  5. Truncate the largest field(s) as necessary, using option 1, 2, or 3 above for each truncated field.
  6. Allocate until we run out of RAM. See the comment below.

All of the above options except options 5 and 6 break configured logformat.

AFAICT, Squid currently uses option 2. If we cannot agree how to improve/fix the current behavior, this PR should probably preserve it.

I like option 6 the most. What do you think?
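For illustration, option 3 above might look like the following sketch; truncateForLog, the marker string, and the limit parameter are hypothetical, not proposed Squid code:

```cpp
#include <string>

// Option 3 sketch: cap the line at a limit and append a visible
// truncation marker so automated log parsers can detect truncation.
static std::string truncateForLog(const std::string &line,
                                  std::string::size_type limit)
{
    static const std::string marker = "[...]";
    if (line.size() <= limit)
        return line;
    if (limit <= marker.size())
        return marker.substr(0, limit);
    return line.substr(0, limit - marker.size()) + marker;
}
```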

Contributor

@rousskov rousskov Nov 15, 2018

  6. Needless to say, we can adopt the current PR approach of having no limit at all -- allocate more RAM until Squid is killed by the OS.

This option 6 may actually be the best option because it is simple and OOM deaths caused by this code are rather unlikely. Please note that Squid already keeps pieces of what it wants to log in RAM.

src/log/File.cc Outdated (resolved)
src/log/File.cc Outdated (resolved)
src/log/File.cc Outdated
if (s > 8192) {
s = 8192;
// bufsize will be large enough for most log lines
int bufsize = 16 * 1024;
Contributor

Please use size_t for sizes.

src/log/File.cc Outdated (resolved)
@rousskov rousskov changed the title Remove 8K limit for single log line Remove 8K limit for single access.log line Nov 15, 2018
@yadij
Copy link
Contributor

yadij commented Nov 15, 2018

On the architectural level please be aware that this 8KB buffer is not the only limit imposed on log lines.

  • The UDP module is limited to jumbogram packet size (usually 512, 4KB, 32KB or 64KB). Anything larger than the local network accepts is silently dropped.
  • The syslog module is limited to the jumbogram packet size less the syslog timestamp+header bytes. Anything larger is either truncated as an opaque string (ksyslog, syslogng), or automatically split into multi-line (rsyslog, systemd).
  • The daemon module is limited to the helper I/O protocol buffer in the receiving helper. Anything longer is handled by the specific daemon implementation;
    • Squid has a hard limit of internal buffer to LOGFILE_BUFSZ*LOGFILE_MAXBUFS (currently 32K * 128 respectively). Anything longer will be dropped with a warning.
    • our bundled daemon accepts up to LOGFILE_BUF_LEN (currently 64K). Anything longer varies in behaviour depending on what bytes exist at the 64KB boundary - usually appending to access.log but may also trigger unexpected daemon behaviour if wrapped lines are misinterpreted as different log entries with other daemon action codes.
    • other daemons we can only expect and rely on HELPER_INPUT_BUFFER (currently 8KB) being accepted.

IMO this magic 8192 number should just be replaced with LOGFILE_BUFSZ, which can be made replaceable at build time with -DLOGFILE_BUFSZ once it is moved to log/File.h. The Log:: objects are overdue for a full refactoring to use C++ instead of C virtual functions, which seems out of scope here.
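The build-time override pattern described above might look like this guarded default; the snippet is illustrative, not the current log/File.h contents:

```cpp
// Give the macro a default in the header, but let
// -DLOGFILE_BUFSZ=... on the compiler command line replace it.
#ifndef LOGFILE_BUFSZ
#define LOGFILE_BUFSZ (32 * 1024)
#endif

static constexpr int logfileBufSize() { return LOGFILE_BUFSZ; }
```

Building with, say, `-DLOGFILE_BUFSZ=65536` would then change the buffer size without editing any source file.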

@rousskov
Contributor

On the architectural level please be aware that this 8KB buffer is not the only limit imposed on log lines.

AFAICT, it is the only 8KB limit in some environments, and even if it is never the only limit, the PR's intent to increase that limit is still valid. You have not said that it is invalid, but I want to explicitly clarify that nothing in your comment should be misinterpreted as invalidating that intent.

BTW, your summary is very useful for many purposes! Please consider copying it to some wiki page.

IMO this magic 8196 number should just be replaced with LOGFILE_BUFSZ which can be made replaceable at build time with -DLOGFILE_BUFSZ when it is moved to log/File.h

I do not think a LOGFILE_BUFSZ macro is a good solution here because 8KB is too small for most production deployments these days while significantly larger defaults become increasingly inappropriate for stack allocation. Having unlimited buffer is probably the best solution here, and that eliminates the need for defining LOGFILE_BUFSZ.

@rousskov
Contributor

The Log:: objects are overdue for a full refactoring to use C++ instead of C virtual functions - which seems out of scope here.

Agreed on both counts. That refactoring should be done, but in a separate independent PR.

@redmeadowman
Contributor Author

Completed requested modifications and ran tests.

@redmeadowman
Contributor Author

I've removed the thread_local declaration from logfilePrintf().

@redmeadowman
Contributor Author

LOGFILE_BUFSZ is currently defined as 32K, which should be more than adequate for most lines.

@rousskov
Contributor

LOGFILE_BUFSZ is currently defined as 32K, which should be more than adequate for most lines.

Agreed. In case you are implying the opposite, increasing initial buffer size does not address my change request regarding code duplication. I am not sure why you are ignoring it (and a few other change requests), but I am happy to answer any questions about them.

Please go back to my review and make sure that change requests not already marked as "resolved" are addressed, starting with the code duplication request (which starts with Please find a way to avoid duplicating SBuf::vappendf()). Please note that change requests marked as "outdated" by GitHub may still need to be addressed -- the essence of the problem may remain even if the code the change request was attached to has changed.

@redmeadowman
Contributor Author

Silly question, why did Squid create SBuf instead of using std::string?

@redmeadowman
Contributor Author

Greatly simplified code using SBuf.

@rousskov
Contributor

Silly question, why did Squid create SBuf instead of using std::string?

In short, std::string is a bad choice for frequently shared, parsed, and peeked at raw buffers that Squid needs for shoveling I/O results around and similar performance-sensitive tasks. The general-purpose class was not designed for that kind of performance work. For example, modern std::string cannot be safely shared without copying. Also, std::string has some scary and slow locale-dependent code (at least it was present in older std::string implementations).

Contributor

@rousskov rousskov left a comment

Thank you for addressing all of my earlier change requests! AFAICT, the PR is still very buggy, but

  1. You are not responsible for the remaining bugs.
  2. You are on the right track overall, with only one high-level issue remaining in your current code.

Please see inlined change requests for details.

src/log/File.cc (resolved)
src/log/File.cc Outdated (resolved)
src/log/File.cc Outdated
logfileWrite(lf, buf, (size_t) s);
static SBuf sbuf;
sbuf.clear();
sbuf.vappendf(fmt, args);
Contributor

Other bugs notwithstanding, the maximum line size is now limited to 256 MB (SBuf::maxSize or 268435455 bytes to be precise). That is more than enough, of course, but I wonder whether we should add code to handle overflows. Without such code, an unlikely huge access.log line will either kill Squid or not appear in the access.log at all. Thus, we are back to the six overflow handling options discussed earlier, except that the simplest option 6 is off the table. Since we no longer deal with raw formatting code here, I suggest working around this problem instead of solving it directly:

  1. Do not handle overflows in logfilePrintf(). Just check that if SBuf throws, there will be some ERROR message about that in cache.log. To check, temporarily hack logfilePrintf() to add Must(false) and see what happens -- do that before moving on to step 2 below.
  2. Change Log::Format::SquidCustom() to print the already formatted(!!) line using logfileWrite() instead of abusing logfilePrintf(). You may need to refactor a little so that Format::assemble() appends a new line when needed.

The above approach is based on the assumption that the most likely (if not the only) source of logfilePrintf() overflows is custom log formats. And if we want to improve overflow handling there, we have to improve Format::assemble(), where overflow-handling approaches like the nice option 5 posted earlier are much easier to implement. We do not have to do those improvements in this PR though -- the above two steps should be enough for now as far as overflow handling is concerned.

Contributor

The above approach is based on the assumption that the most likely (if not the only source) of logfilePrintf() overflows is custom log formats.

Alex, that assumption is wrong. Anything that logs URL/URI and/or header field-values can produce very long lines. That means all log formats built into Squid can overflow. Format::assemble is also used in all the non-log places where logformat macros are supported (error pages, deny_info URLs, external ACL helper I/O), so care needs to be taken not to break any of those by forcing implicit line-wrapping into its output.

Contributor

Alex: the most likely source of logfilePrintf() overflows is custom log formats

Amos: that assumption is wrong. [...] all log formats built into Squid are possible overflows

Your statement does not contradict my assumption AFAICT, but, more importantly, I believe that the already implemented PR changes invalidate your statement: Some logfilePrintf() callers cannot overflow at all (even before the PR), and only custom log format has any chance of producing log lines that exceed 268435455 characters (the post-PR limit).

care needs to be taken not to break any of those by forcing implicit line-wrapping into Format::assemble() output

Yes, that is why I said "appends a new line when needed". One way to handle this without adding parameters/options to assemble() could be to append a newline to the configured logformat string.

Contributor

This change request is still unresolved. The two-step plan I have suggested still applies AFAICT.

Contributor Author

I am out of time on this project. I will not implement the suggested change.

Contributor

I will not implement the suggested change.

Thank you for sharing your decision. I believe the requested change should be implemented before this PR is approved because it is not difficult to implement, will address a potentially important in-scope problem, and even provide a significant performance improvement. Hopefully, somebody else will take over and finish this work.

However, if other @squid-cache/commiters disagree, they may be able to approve this PR (after a thorough from-scratch review) without this change because they may decide that the PR is a step forward, while that known problem is unlikely to surface in real deployments.

src/log/File.cc Outdated (resolved)
@rousskov
Contributor

Reading back through the comments above it is not clear to me what changes have been requested for this pull request.

I went through all my change requests and marked the addressed ones (and the ones that do not need to be addressed) as resolved. AFAICT, there are two change requests remaining. I added a "not addressed" comment to each of them. One is optional/trivial. The other one requires a little bit of work.

Rant: GitHub does not make it very easy to find what needs to be done, but if you use the web interface and start from the actual review post (rather than notification emails or some random change requests), you can usually find all the change requests that may need your attention. This can be time consuming, especially when there are outdated blobs and when reviewers do not mark addressed change requests promptly, but that is the only way I know that works. You can always ask about a particular change request status if you are not sure, of course.

Please avoid using individual change requests for posting general PR questions or comments (like this one). Each change request should be focused on that specific change.

Contributor

@rousskov rousskov left a comment

I am removing my "changes requested" mark so that I am not blocking progress with this PR. Please see the last remaining change request for details.


@rousskov rousskov dismissed their stale review November 27, 2018 21:47

Allow others to move this PR forward as discussed elsewhere

@rousskov rousskov added the S-waiting-for-more-reviewers needs a reviewer and/or a second opinion label Nov 27, 2018
@squid-anubis squid-anubis added M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels and removed M-failed-other https://github.com/measurement-factory/anubis#pull-request-labels labels May 6, 2019
@yadij yadij added the feature maintainer needs documentation updates for merge label Dec 23, 2019
Log lines with values causing them to exceed SBuf::maxSize will overflow and trigger an exception.

Currently that exception is expected to be caught in the main loop handler and shut down Squid. Since this is a transaction-specific problem, it could be better handled as a transaction-specific exception, letting other client connections and the proxy continue unaffected.

This is marked TODO rather than XXX because the large SBuf size limit is hard to encounter, and malicious attempts to trigger it remotely are prevented by the HTTP message size limits.
Contributor

@yadij yadij left a comment

Added a code TODO to document Alex's outstanding issue for a future dev to resolve.

This PR now passes the 10+ day criteria and since we have v6 as main already this can go in now.

@yadij yadij added M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels and removed S-waiting-for-more-reviewers needs a reviewer and/or a second opinion labels Feb 11, 2020
@rousskov
Contributor

@yadij, do you want to link the new bug 5026 from this PR title before it is merged?

@yadij
Contributor

yadij commented Feb 17, 2020

If it actually fixes that bug, yes. But AFAIK, once Anubis starts testing its build commit it is too late: the commit already being tested with the existing title will either be merged or fail.

@yadij
Contributor

yadij commented Feb 18, 2020

It looks like something is going wrong with the pr-auto triggers. The status says "build triggered for merge commit", but Jenkins has no queued build for this PR.

@rousskov
Contributor

It looks like something is going wrong with the pr-auto triggers. The status says "build triggered for merge commit", but Jenkins has no queued build for this PR.

I was worried about the lack of Jenkins results as well. @eduard-bagdasaryan, could you please take a look? I do not know much about Jenkins setup...

@rousskov
Contributor

If it actually fixes that bug, yes.

Based on the bug and PR descriptions, I assume it does, but I do not know for sure. The reporter has not updated the bug report yet(?) with this information (but that could also be because this PR does not apply cleanly to v4).

AFAIK once Anubis starts triggering its build commit it is too late, the commit already being tested with the existing title will either be merged or fail.

It should not be too late until the tested commit is merged into master. Anubis should notice the title change and regenerate the merge commit, which should restart testing (IIRC).

AFAICT, as of now, Anubis has not even created the merge commit (on the auto branch) because Anubis waits for Jenkins test results (of the GitHub-generated commit).

@eduard-bagdasaryan
Contributor

It looks like something is going wrong with the pr-auto triggers.

It looks like Jenkins was down at the time of fc2501d, thus missing the notification. The easiest way to re-trigger it is probably to re-send the corresponding webhook push event (the list of notifications should be available on Webhooks GitHub page). I cannot do it myself due to permission restrictions.

@rousskov
Contributor

The easiest way to re-trigger it is probably to re-send the corresponding webhook push event

Hmm.. I do not see a push event that corresponds to fc2501d dated 2020-02-12. The latest events are 2020-02-08 and 2020-02-18. Neither event is about this PR AFAICT. I am going to commit a non-change in hope to trigger a new Jenkins test.

squid-anubis pushed a commit that referenced this pull request Feb 18, 2020
@squid-anubis squid-anubis added the M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels label Feb 18, 2020
@squid-anubis squid-anubis added M-merged https://github.com/measurement-factory/anubis#pull-request-labels and removed M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels labels Feb 18, 2020
@rousskov
Contributor

rousskov commented Jun 2, 2021

@yadij, please consider backporting this fix (i.e. commit a03343c) to v5 which suffers from the same logline truncation problem. I have heard that a similar fix works for v5. Please let me know if you want Factory to confirm and post a backporting pull request to save you time.

@rousskov
Contributor

@yadij, please consider backporting this fix (i.e. commit a03343c) to v5 which suffers from the same logline truncation problem. I have heard that a similar fix works for v5. Please let me know if you want Factory to confirm and post a backporting pull request to save you time.

@yadij, please let me know whether you would backport commit a03343c into v5 (or accept a backport). A year of silence may imply "no", but it could also imply that you just did not see the question or forgot about it... Let's close this question, one way or another.

Labels
feature maintainer needs documentation updates for merge M-merged https://github.com/measurement-factory/anubis#pull-request-labels
8 participants