Do not exit with an error if maximum archive size exceeded #64
Conversation
pkg/crawl/identify.go (Outdated)
@@ -209,7 +209,7 @@ func (i *Log4jIdentifier) findArchiveVulnerabilities(ctx context.Context, depth
 func (i *Log4jIdentifier) vulnerabilityFileWalkFunc(depth uint, result *Finding, versions Versions, obfuscated bool) archive.FileWalkFn {
 	return func(ctx context.Context, path string, size int64, contents io.Reader) (proceed bool, err error) {
 		archiveType, ok := i.ParseArchiveFormat(path)
-		if ok && depth < i.ArchiveMaxDepth {
+		if ok && depth < i.ArchiveMaxDepth && size < i.ArchiveMaxSize {
I'm under the impression we only need to apply the max archive size to zips, because zips keep their directory at the end of the content and have to be read into a buffer before we can walk them.
For tar-based archives, I believe we can avoid filling a buffer at all and can support archives that are over the archive max size, so this change would mean we no longer support those.
Can we log the error from the nested call to findArchiveVulnerabilities on line 230 instead of returning it there? That way we defer the erroring logic to the code that creates the size-capped buffers. A rough sketch of the idea follows the quoted lines below.
log4j-sniffer/pkg/crawl/identify.go, lines 228 to 231 in 3618fc4:

finding, innerVersions, err := i.findArchiveVulnerabilities(ctx, depth+1, walker, obfuscated)
if err != nil {
	return false, err
}
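Something like this is what I have in mind - a minimal sketch rather than the final change, and i.ErrorLogger is a hypothetical hook for surfacing non-fatal errors, not an existing field on Log4jIdentifier:

finding, innerVersions, err := i.findArchiveVulnerabilities(ctx, depth+1, walker, obfuscated)
if err != nil {
	// Report the nested failure and keep walking instead of propagating
	// the error, which currently aborts the whole crawl.
	i.ErrorLogger(ctx, "skipping nested archive %s: %v", path, err)
	return true, nil
}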
You're right, we weren't limited before - the error message had me thinking it applied to all archives, so I've wrapped it to make it clearer that it's only hit on a zip.
To better describe what is going on as I fix this up:
- For tars, the file headers are inline in the archive, so we can read nested archives using only a small read buffer per archive, with no extra memory, until we find a zip.
- Zip files need random access and tar files do not offer it, so once a zip file is encountered, however deeply nested, we have to buffer it into memory with the current implementation.
- While nested tars aren't free, they are much cheaper: each file header appears before its contents in the stream, so we can decide whether we actually need the following bytes or can just throw them away, without random access. The sketch below illustrates the difference.
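For reference, a self-contained sketch of that difference using only the standard library - illustrative, not the sniffer's actual walking code:

package main

import (
	"archive/tar"
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// A tar can be walked as a stream: each entry's header precedes its bytes,
// so nothing beyond tar.Reader's own buffer is held in memory.
func walkTar(r io.Reader) error {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		fmt.Println("tar entry:", hdr.Name, hdr.Size)
		// Entry contents could be read from tr here, or skipped entirely
		// by calling Next again.
	}
}

// A zip needs random access (io.ReaderAt plus total size) because its
// central directory sits at the end, so a zip arriving as a stream must be
// buffered in full first - which is where a size cap matters.
func walkZip(r io.Reader) error {
	buf, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	zr, err := zip.NewReader(bytes.NewReader(buf), int64(len(buf)))
	if err != nil {
		return err
	}
	for _, f := range zr.File {
		fmt.Println("zip entry:", f.Name, f.UncompressedSize64)
	}
	return nil
}

func main() {
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	walk := walkTar
	if filepath.Ext(os.Args[1]) == ".zip" {
		walk = walkZip
	}
	if err := walk(f); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}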
Yup, this is my understanding of it, so we're on the same page.
Let me know if you want to pair on the fix.
I know we're keen to get this out, so I'm approving as is and will follow up with some small tidy-ups I believe we should be making.
Before: if the maximum archive size was exceeded, we'd exit with an error.
After: if the maximum archive size is exceeded, we print information about it and continue.
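In practice the new behaviour amounts to something like the following inside the file walk callback shown in the diff above - a sketch only, the exact check and log message are assumptions rather than the final code:

// When an archive exceeds the cap, report it and tell the walker to
// proceed rather than returning an error that aborts the crawl.
if ok && size >= i.ArchiveMaxSize {
	fmt.Fprintf(os.Stderr, "skipping archive %s: size %d exceeds maximum archive size %d\n", path, size, i.ArchiveMaxSize)
	return true, nil
}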