
Do not exit with an error if maximum archive size exceeded #64

Merged
merged 3 commits into develop from archive_max_size_fix on Jan 11, 2022

Conversation

hpryce
Contributor

@hpryce commented Jan 11, 2022

Before: if the maximum archive size was exceeded, we'd exit with an error.
After: if the maximum archive size is exceeded, we print information about this and continue.
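
In other words (an illustrative sketch only; printInfo is a hypothetical logging helper, not the actual code - see the diff below):

if ok && size >= i.ArchiveMaxSize {
	// Sketch: report the oversized archive and keep walking rather than
	// returning an error from the walk function.
	i.printInfo("skipping archive %s: exceeds maximum archive size", path)
	return true, nil
}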

@changelog-app

changelog-app bot commented Jan 11, 2022

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Do not exit with an error if maximum archive size exceeded

Check the box to generate changelog(s)

  • Generate changelog entry

@@ -209,7 +209,7 @@ func (i *Log4jIdentifier) findArchiveVulnerabilities(ctx context.Context, depth
 func (i *Log4jIdentifier) vulnerabilityFileWalkFunc(depth uint, result *Finding, versions Versions, obfuscated bool) archive.FileWalkFn {
 	return func(ctx context.Context, path string, size int64, contents io.Reader) (proceed bool, err error) {
 		archiveType, ok := i.ParseArchiveFormat(path)
-		if ok && depth < i.ArchiveMaxDepth {
+		if ok && depth < i.ArchiveMaxDepth && size < i.ArchiveMaxSize {
Contributor

I'm under the impression we only need to use the max archive size for zips, because zips have the directory tree at the end of the zip content and are read into a buffer before we can walk it.

For tar based archives, I believe we can actually avoid filling up a buffer and can support tar-based archives that are over the archive max size, so this change would mean we no longer support those.

Can we log an error from the nested call to findArchiveVulnerabilities on line 230 instead of returning an error there? That way we defer the erroring logic to the code that creates the size-capped buffers.

finding, innerVersions, err := i.findArchiveVulnerabilities(ctx, depth+1, walker, obfuscated)
if err != nil {
return false, err
}
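
For example (a sketch only; the logging helper shown here is hypothetical, not the repository's actual API), the nested error could be reported and the walk allowed to continue:

finding, innerVersions, err := i.findArchiveVulnerabilities(ctx, depth+1, walker, obfuscated)
if err != nil {
	// Sketch: report the failure on the nested archive and keep walking
	// instead of aborting. i.printInfo is a hypothetical helper.
	i.printInfo("could not scan nested archive %s: %v", path, err)
	return true, nil
}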

Contributor Author

You're right, we weren't limited before - the error had me thinking it applied to all archives, so I've wrapped it so that it's clearer when it's being hit on a zip.

To better describe what is going on as I fix this up:

  • For tars, the file headers are inline in the archive, so we can read nested archives without buffering to memory, using just the (small) per-archive read buffer, until we find a zip.
  • Zip files need random access, which tar files do not offer, so once a zip file is encountered, however deeply nested, we have to buffer it to memory with the current implementation.
  • So while nested tars aren't free, they are much cheaper: they don't need random access to the contents, because the file header appears first in the stream, and we use it to decide whether we actually need the following bytes or can just throw them away. A rough sketch of the difference follows this list.
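
To make the difference concrete, here is a minimal, self-contained sketch using only the standard library (this is not the project's walker; the filename check, size limit, and function names are illustrative): tar entries stream header-first and can be skipped cheaply, while a nested zip must be buffered so archive/zip can seek to the central directory at the end of the file.

package main

import (
	"archive/tar"
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"os"
	"strings"
)

// walkTar streams a tar archive: each header arrives before its content, so we
// can decide whether to read or discard the bytes that follow. A nested zip is
// the exception: it needs random access, so it is buffered into memory first.
func walkTar(r io.Reader, maxZipSize int64) error {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if !strings.HasSuffix(hdr.Name, ".zip") {
			continue // not a zip: tar.Reader simply skips ahead to the next header
		}
		if hdr.Size >= maxZipSize {
			// Report and continue rather than failing the whole walk.
			fmt.Printf("skipping %s: exceeds maximum archive size\n", hdr.Name)
			continue
		}
		// The zip central directory lives at the end of the file, so buffer it.
		buf, err := io.ReadAll(io.LimitReader(tr, hdr.Size))
		if err != nil {
			return err
		}
		zr, err := zip.NewReader(bytes.NewReader(buf), int64(len(buf)))
		if err != nil {
			return err
		}
		for _, f := range zr.File {
			fmt.Println("nested zip entry:", f.Name)
		}
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: walktar <file.tar>")
		return
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := walkTar(f, 64*1024*1024); err != nil {
		panic(err)
	}
}

A real scanner would dispatch on the detected archive type rather than the ".zip" suffix, but the shape of the tradeoff is the same.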

Contributor

Yup, this is my understanding of it, so we're on the same page.
Let me know if you want to pair on the fix.

Contributor

@glynternet left a comment

I know we're keen to get this out, so I'm approving as is and will follow up with some small tidy-ups I believe we should be making.

bulldozer-bot merged commit 2f550d9 into develop on Jan 11, 2022
bulldozer-bot deleted the archive_max_size_fix branch on January 11, 2022 at 17:44