Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for recursing into zip files #24

Merged
merged 11 commits into from
Dec 20, 2021
Merged

Conversation

glynternet
Copy link
Contributor

@glynternet glynternet commented Dec 18, 2021

Recursing into zip files is now supported up to a configurable depth, --nested-archive-max-depth. By default this is set to 0, which means an archive on disk will have its contents inspected but an archive within it would not be unarchived for inspection.
Nested archives will only be inspected if they are below a configurable size, --nested-archive-max-depth.
To control memory usage for unarchiving, a combination of --nested-archive-max-depth and --nested-archive-max-depth should be used. This will only control the memory usage on top of the base memory usage of log4j-sniffer.


Also had to change zip logic to use a new WalkFn of ZipWalkFn, where instead of a path, we pass in a *zip.Reader. Path cannot be used anymore because previously it was referring to a path on disk, whereas when nested inside another zip we are not resolving the reader on an os path but one relative inside the zip.

Likely after this PR we will need to unify logic so that zips in tars and tars in zips will be recursed, etc.

@changelog-app
Copy link

changelog-app bot commented Dec 18, 2021

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Recursing into zip files is now supported up to a configurable depth, --nested-archive-max-depth. By default this is set to 0, which means an archive on disk will have its contents inspected but an archive within it would not be unarchived for inspection.
Nested archives will only be inspected if they are below a configurable size, --nested-archive-max-depth.
To control memory usage for unarchiving, a combination of --nested-archive-max-depth and --nested-archive-max-depth should be used. This will only control the memory usage on top of the base memory usage of log4j-sniffer.

Check the box to generate changelog(s)

  • Generate changelog entry

@glynternet glynternet force-pushed the gh/resursive-zip branch 3 times, most recently from a752675 to f81b66e Compare December 18, 2021 17:48
@glynternet glynternet force-pushed the gh/resursive-zip branch 2 times, most recently from c688e26 to 2b2ed99 Compare December 19, 2021 18:35
@@ -38,13 +38,14 @@ func TestBadVersions(t *testing.T) {
{name: "single bad version", directory: "../examples/single_bad_version", count: 1, finding: crawl.JarName | crawl.ClassPackageAndName | crawl.ClassFileMd5},
{name: "multiple bad versions", directory: "../examples/multiple_bad_versions", count: 13, finding: crawl.JarName | crawl.ClassPackageAndName | crawl.ClassFileMd5},
{name: "inside a dist", directory: "../examples/inside_a_dist", count: 2, finding: crawl.JarNameInsideArchive},
{name: "nested inside multiple zips", directory: "../examples/recursively_nested_zip", count: 1, finding: crawl.JarNameInsideArchive},
{name: "inside a par", directory: "../examples/inside_a_par", count: 1, finding: crawl.JarNameInsideArchive},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should now also find crawl.ClassPackageAndName and crawl.ClassFileMd5

archiveResult |= JarNameInsideArchive
versions[archiveVersion] = struct{}{}
}
return true, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still want to check inside - it's valuable to know if it's just a name match or the md5 etc also matches from classes inside.

{name: "inside a dist", directory: "../examples/inside_a_dist", count: 2, finding: crawl.JarNameInsideArchive},
{name: "inside a par", directory: "../examples/inside_a_par", count: 1, finding: crawl.JarNameInsideArchive},
{name: "inside a dist", directory: "../examples/inside_a_dist", count: 2, finding: crawl.JarNameInsideArchive | crawl.ClassPackageAndName | crawl.ClassFileMd5},
{name: "inside a par", directory: "../examples/inside_a_par", count: 1, finding: crawl.JarNameInsideArchive | crawl.ClassPackageAndName | crawl.ClassFileMd5},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can also add the archived_fat_jar example - getting us to all the examples checked in at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently don't recurse into zips that are contained within an archive.
I will add this as a follow up, I'd prefer not to grow this PR anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this is still not enough, but will come later.

@bulldozer-bot bulldozer-bot bot merged commit e9addc2 into develop Dec 20, 2021
@bulldozer-bot bulldozer-bot bot deleted the gh/resursive-zip branch December 20, 2021 15:34
hpryce pushed a commit that referenced this pull request Dec 20, 2021
hpryce pushed a commit that referenced this pull request Dec 20, 2021
hpryce pushed a commit that referenced this pull request Dec 21, 2021
hpryce pushed a commit that referenced this pull request Dec 21, 2021
hpryce pushed a commit that referenced this pull request Dec 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants