-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for recursing into zip files #24
Conversation
Generate changelog in
|
a752675
to
f81b66e
Compare
c688e26
to
2b2ed99
Compare
Recursing into zip files is now supported up to 3 zips deep.
7701e7c
to
cdc0752
Compare
@@ -38,13 +38,14 @@ func TestBadVersions(t *testing.T) { | |||
{name: "single bad version", directory: "../examples/single_bad_version", count: 1, finding: crawl.JarName | crawl.ClassPackageAndName | crawl.ClassFileMd5}, | |||
{name: "multiple bad versions", directory: "../examples/multiple_bad_versions", count: 13, finding: crawl.JarName | crawl.ClassPackageAndName | crawl.ClassFileMd5}, | |||
{name: "inside a dist", directory: "../examples/inside_a_dist", count: 2, finding: crawl.JarNameInsideArchive}, | |||
{name: "nested inside multiple zips", directory: "../examples/recursively_nested_zip", count: 1, finding: crawl.JarNameInsideArchive}, | |||
{name: "inside a par", directory: "../examples/inside_a_par", count: 1, finding: crawl.JarNameInsideArchive}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should now also find crawl.ClassPackageAndName and crawl.ClassFileMd5
pkg/crawl/identify.go
Outdated
archiveResult |= JarNameInsideArchive | ||
versions[archiveVersion] = struct{}{} | ||
} | ||
return true, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still want to check inside - it's valuable to know if it's just a name match or the md5 etc also matches from classes inside.
22b5f3f
to
60fa8e5
Compare
{name: "inside a dist", directory: "../examples/inside_a_dist", count: 2, finding: crawl.JarNameInsideArchive}, | ||
{name: "inside a par", directory: "../examples/inside_a_par", count: 1, finding: crawl.JarNameInsideArchive}, | ||
{name: "inside a dist", directory: "../examples/inside_a_dist", count: 2, finding: crawl.JarNameInsideArchive | crawl.ClassPackageAndName | crawl.ClassFileMd5}, | ||
{name: "inside a par", directory: "../examples/inside_a_par", count: 1, finding: crawl.JarNameInsideArchive | crawl.ClassPackageAndName | crawl.ClassFileMd5}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can also add the archived_fat_jar example - getting us to all the examples checked in at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently don't recurse into zips that are contained within an archive.
I will add this as a follow up, I'd prefer not to grow this PR anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out this is still not enough, but will come later.
Recursing into zip files is now supported up to a configurable depth,
--nested-archive-max-depth
. By default this is set to 0, which means an archive on disk will have its contents inspected but an archive within it would not be unarchived for inspection.Nested archives will only be inspected if they are below a configurable size,
--nested-archive-max-depth
.To control memory usage for unarchiving, a combination of
--nested-archive-max-depth
and--nested-archive-max-depth
should be used. This will only control the memory usage on top of the base memory usage oflog4j-sniffer
.Also had to change zip logic to use a new WalkFn of ZipWalkFn, where instead of a path, we pass in a
*zip.Reader
. Path cannot be used anymore because previously it was referring to a path on disk, whereas when nested inside another zip we are not resolving the reader on an os path but one relative inside the zip.Likely after this PR we will need to unify logic so that zips in tars and tars in zips will be recursed, etc.