Do not exit with an error if maximum archive size exceeded #64
Conversation
pkg/crawl/identify.go (Outdated)
@@ -209,7 +209,7 @@ func (i *Log4jIdentifier) findArchiveVulnerabilities(ctx context.Context, depth
 func (i *Log4jIdentifier) vulnerabilityFileWalkFunc(depth uint, result *Finding, versions Versions, obfuscated bool) archive.FileWalkFn {
 	return func(ctx context.Context, path string, size int64, contents io.Reader) (proceed bool, err error) {
 		archiveType, ok := i.ParseArchiveFormat(path)
-		if ok && depth < i.ArchiveMaxDepth {
+		if ok && depth < i.ArchiveMaxDepth && size < i.ArchiveMaxSize {
I'm under the impression we only need to apply the max archive size to zips, because zips keep their directory at the end of the content and have to be read into a buffer before we can walk them.
For tar-based archives, I believe we can avoid filling a buffer at all and can support archives that are over the archive max size, so this change would mean we no longer support those.
Can we log the error from the nested call to findArchiveVulnerabilities on line 230 instead of returning it there? That way we defer the erroring logic to the code that creates the size-capped buffers. A rough sketch of the idea follows the quoted lines below.
log4j-sniffer/pkg/crawl/identify.go, lines 228 to 231 in 3618fc4:

finding, innerVersions, err := i.findArchiveVulnerabilities(ctx, depth+1, walker, obfuscated)
if err != nil {
	return false, err
}
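Something like this is what I have in mind - a minimal sketch rather than the final change, and i.ErrorLogger is a hypothetical hook for surfacing non-fatal errors, not an existing field on Log4jIdentifier:

finding, innerVersions, err := i.findArchiveVulnerabilities(ctx, depth+1, walker, obfuscated)
if err != nil {
	// Report the nested failure and keep walking instead of propagating
	// the error, which currently aborts the whole crawl.
	i.ErrorLogger(ctx, "skipping nested archive %s: %v", path, err)
	return true, nil
}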
You're right, we weren't limited before - the error message had me thinking it applied to all archives, so I've wrapped it to make it clearer that it's only hit on a zip.
To better describe what is going on as I fix this up:
- For tars, the file headers are inline in the archive, so we can read nested archives using only a small read buffer per archive, with no extra memory, until we find a zip.
- Zip files need random access and tar files do not offer it, so once a zip file is encountered, however deeply nested, we have to buffer it into memory with the current implementation.
- While nested tars aren't free, they are much cheaper: each file header appears before its contents in the stream, so we can decide whether we actually need the following bytes or can just throw them away, without random access. The sketch below illustrates the difference.
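For reference, a self-contained sketch of that difference using only the standard library - illustrative, not the sniffer's actual walking code:

package main

import (
	"archive/tar"
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// A tar can be walked as a stream: each entry's header precedes its bytes,
// so nothing beyond tar.Reader's own buffer is held in memory.
func walkTar(r io.Reader) error {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		fmt.Println("tar entry:", hdr.Name, hdr.Size)
		// Entry contents could be read from tr here, or skipped entirely
		// by calling Next again.
	}
}

// A zip needs random access (io.ReaderAt plus total size) because its
// central directory sits at the end, so a zip arriving as a stream must be
// buffered in full first - which is where a size cap matters.
func walkZip(r io.Reader) error {
	buf, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	zr, err := zip.NewReader(bytes.NewReader(buf), int64(len(buf)))
	if err != nil {
		return err
	}
	for _, f := range zr.File {
		fmt.Println("zip entry:", f.Name, f.UncompressedSize64)
	}
	return nil
}

func main() {
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	walk := walkTar
	if filepath.Ext(os.Args[1]) == ".zip" {
		walk = walkZip
	}
	if err := walk(f); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}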
Yup, this is my understanding of it, so we're on the same page.
Let me know if you want to pair on the fix.
I know we're keen to get this out, so I'm approving as is and will follow up with some small tidy-ups I believe we should be making.
Before: if the maximum archive size was exceeded, we'd exit with an error.
After: if the maximum archive size is exceeded, we print information about it and continue.
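In practice the new behaviour amounts to something like the following inside the file walk callback shown in the diff above - a sketch only, the exact check and log message are assumptions rather than the final code:

// When an archive exceeds the cap, report it and tell the walker to
// proceed rather than returning an error that aborts the crawl.
if ok && size >= i.ArchiveMaxSize {
	fmt.Fprintf(os.Stderr, "skipping archive %s: size %d exceeds maximum archive size %d\n", path, size, i.ArchiveMaxSize)
	return true, nil
}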