Support deleting findings #93

Merged
merged 14 commits into from
Feb 4, 2022

Conversation

glynternet
Contributor

@glynternet glynternet commented Jan 31, 2022

Closes #71

Adds a subcommand, delete, that crawls the filesystem for vulnerable log4j versions and can delete them based on some configuration supplied by flags.

The command uses the same flags as crawl to configure rate limiting of scanning, max archive depth, etc.

Features unique to delete are:

  • --filepath-owner - A filepath regex pattern and templated owner can be provided. When encountering a finding, if the containing filepath matches the pattern provided by filepath-owner, then the owner template will be expanded against the filepath match to see if the owner of the file matches.
    This is specifically designed for cases like ^/home/(\w+)/.+:$1 where the owner of a file may depend on the path to it.
  • --skip-owner-check - Skips any file owner checks. This is required on Windows systems where file ownership seems to be non-trivial.
  • --finding-match - Specifies the finding types that a vulnerability must include for the containing file to be deleted.
Delete files containing log4j vulnerabilities.

Crawl the file system from root, detecting files containing log4j-vulnerabilities and deleting them if they meet certain requirements determined by the command flags.
Root must be provided and can be a single file or directory.

Dry-run mode is enabled by default: instead of deleting a file, a line is output stating which file would be deleted if dry-run mode were disabled.
It is recommended to run first with dry-run mode enabled, check the logged output, and then run again with dry-run disabled using the same configuration flags.
Use --dry-run=false to turn off dry-run mode, enabling deletes.

When used on Windows, deleting based on file ownership is unsupported, so skip-owner-check should be used instead of filepath-owner.

Usage:
  log4j-sniffer delete <root> [flags]

Examples:
Delete all files nested beneath /path/to/dir that are owned by foo and contain findings that match both classFileMd5 and jarFileObfuscated.

log4j-sniffer delete /path/to/dir --dry-run=false --filepath-owner ^/path/to/dir/.*:foo --finding-match classFileMd5 --finding-match jarFileObfuscated

Flags:
      --archive-open-mode string                             Supported values:
                                                               standard - standard file opening will be used. This may cause the filesystem cache to be populated with reads from the archive opens.
                                                               directio - direct I/O will be used when opening archives that require sequential reading of their content without being able to skip to file tables at known locations within the file.
                                                                          For example, "directio" can have an effect on the way that tar-based archives are read but will have no effect on zip-based archives.
                                                                          Using "directio" will cause the filesystem cache to be skipped where possible. "directio" is not supported on tmpfs filesystems and will cause tmpfs archive files to report an error. (default "standard")
      --archives-per-second-rate-limit int                   The maximum number of archives to scan per second. 0 for unlimited.
      --directories-per-second-rate-limit int                The maximum number of directories to crawl per second. 0 for unlimited.
      --dry-run                                              When true, a line will be output instead of deleting a file. Use --dry-run=false to enable deletion. (default true)
      --enable-obfuscation-detection                         Enable applying partial bytecode matching to Jars that appear to be obfuscated. (default true)
      --enable-partial-matching-on-all-classes               Enable partial bytecode matching to all class files found.
      --enable-trace-logging                                 Enables trace logging whilst crawling. disable-detailed-findings must be set to false (the default value) for this flag to have an effect.
      --filepath-owner strings                               Provide a filepath pattern and owner template that will be used to check whether a file should be deleted or not when it is deemed to be vulnerable.
                                                             Multiple values can be provided and values must be provided in the form filepath_pattern:owner_template, where a filepath pattern and owner template are colon separated.

                                                             When a file is deemed to be vulnerable, the path of the file containing the vulnerability will be matched against all filepath patterns.
                                                             For all filepath matches, the owner template will be expanded against the filepath pattern match to resolve to a file owner value that the actual file owner will then be compared against.
                                                             Owner templates may use template variables, e.g. $1, $2, $name, that correspond to capture groups in the filepath pattern. Please refer to the standard go regexp package documentation at https://pkg.go.dev/regexp#Regexp.Expand for more detailed expanding behaviour.

                                                             If no filepaths match, the file will not be deleted. If any filepaths match, all matching filepath patterns' corresponding expanded templated owner values must match against the actual file owner for the file to be deleted.

                                                             Examples:
                                                             --filepath-owner ^/foo/bar/.+:qux would consider /foo/bar/baz for deletion only if it is owned by qux.
                                                             --filepath-owner ^/foo/bar/.+:qux and --filepath-owner ^/foo/bar/baz/.+:quuz would not consider /foo/bar/baz/corge for deletion if owned by either qux or quuz because both would need to match.
                                                             --filepath-owner ^/foo/(\w+)/.*:$1 would consider /foo/bar/baz for deletion only if it is owned by bar.

      --finding-match strings                                When supplied, any vulnerable finding must contain all values that are provided to finding-match for it to be considered for deletion.
                                                             These values are considered on a finding-by-finding basis, i.e. an archive containing two separate vulnerable jars will only be deleted if either of the contained jars matches all finding-match values.

                                                             Supported values are as follows, but can be provided case-insensitively:
                                                             - ClassBytecodeInstructionMd5
                                                             - ClassBytecodePartialMatch
                                                             - ClassFileMd5
                                                             - JarFileObfuscated
                                                             - JarName
                                                             - JarNameInsideArchive
                                                             - JndiLookupClassName
                                                             - JndiLookupClassPackageAndName
                                                             - JndiManagerClassName
                                                             - JndiManagerClassPackageAndName

                                                             Example:
                                                             --finding-match classFileMd5 and --finding-match jarFileObfuscated would only delete a file containing a vulnerability if the vulnerability contains a class file hash match and an obfuscated jar name.
                                                             If a vulnerable finding contained only one of these finding-match values then the file would not be considered for deletion.

  -h, --help                                                 help for delete
      --ignore-dir strings                                   Specify directory pattern to ignore. Use multiple times to supply multiple patterns.
                                                             Patterns should be relative to the provided root.
                                                             e.g. ignore "^/proc" to ignore "/proc" when using a crawl root of "/"
      --maximum-average-obfuscated-class-name-length int     The maximum average class name length for a class to be considered obfuscated. (default 3)
      --maximum-average-obfuscated-package-name-length int   The maximum average package name length for a class to be considered obfuscated. (default 3)
      --nested-archive-max-depth uint                        The maximum depth to recurse into nested archives.
                                                             A max depth of 0 will open up an archive on the filesystem but not any nested archives.
      --nested-archive-max-size uint                         The maximum compressed size in bytes of any nested archive that will be unarchived for inspection.
                                                             This limit is applied per depth level.
                                                             The overall limit to nested archive size unarchived should be controlled
                                                             by both the nested-archive-max-size and nested-archive-max-depth. (default 5242880)
      --per-archive-timeout duration                         If this duration is exceeded when inspecting an archive,
                                                             an error will be logged and the crawler will move onto the next file. (default 15m0s)
      --skip-owner-check                                     When provided, the owner of a file will not be checked before attempting a delete.

@changelog-app

changelog-app bot commented Jan 31, 2022

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

A subcommand, log4j-sniffer delete, has been added that crawls the filesystem for vulnerable log4j versions and can delete them based on some configuration supplied by flags.

The flags configure deletion based on file ownership (on Unix-like systems) and the type of findings found within a file.

Please run log4j-sniffer delete -h for detailed documentation.

Contributor

@nmiyake nmiyake left a comment

Thanks for working on this! I like/am encouraged by the fact that this is able to leverage the existing Crawl framework.

Given the current design of the rest of the system, I think this PR/implementation is very reasonable and well-structured -- left some general feedback, but nothing too major.

cmd/delete.go Outdated
for _, value := range directoriesWithOwners {
    split := strings.Split(value, ":")
    if len(split) != 2 {
        return fmt.Errorf(`invalid directory-with-owner, must contain 2 colon-separated segments but got %q`, value)
Contributor

Previous 2 returns were doing errors.New, but this is doing fmt.Errorf -- no strong opinion on which should be used, but they should be used consistently within the same function.

Contributor Author

My understanding was that errors.New should be used when the message is static and fmt.Errorf when the message changes depending on the state of the application/codepath.

What's the reasoning behind sticking to one of errors.New or fmt.Errorf within a function? I don't follow.

//
// If the filepath and detailed finding both match for a given crawl.Path then a file is eligible for deletion.
// In this case, Process will always return false to state that this file should no longer exist and that inspecting
// this file for more findings should not be undertaken.
Contributor

nit: extra whitespace between "should" and "not"


// DirectoryMatch returns true when the directory containing path matches the TemplatedOwner DirectoryExpression field.
func (r TemplatedOwner) DirectoryMatch(path string) bool {
    dir, _ := filepath.Split(path)
Contributor

Pretty sure that filepath.Dir is more appropriate here (functionality should be mostly the same, but believe it's semantically more correct -- you can play around with exact output at https://go.dev/play/p/L5SwrEo8uxc)

Contributor Author

Awesome, thanks, forgot about that function.

pkg/crawl/report.go
@@ -108,7 +154,9 @@ type Log4jIdentifier struct {

// HandleFindingFunc is called with the given findings and versions when Log4jIdentifier identifies
// a log4j vulnerability whilst crawling the filesystem.
type HandleFindingFunc func(ctx context.Context, path Path, result Finding, version Versions)
// The bool returned by HandleFindingFunc, when false, will instruct the identification of the file to cease.
Contributor

Would expand upon this/update phrasing, as I'm unsure of what "will instruct the identification of the file to cease" actually means
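
One possible reading of that comment, consistent with the Process doc comment quoted earlier in this review (returning false means the file should not be inspected for further findings), is sketched here. The handleFunc and inspect names are hypothetical and greatly simplified compared with the real HandleFindingFunc signature.

```go
package main

import "fmt"

// handleFunc is a hypothetical, simplified callback: its bool result tells the
// caller whether to keep inspecting the current file for more findings.
type handleFunc func(finding string) bool

// inspect calls handle for each finding in order and stops at the first call
// that returns false. It returns how many findings were handed to handle.
func inspect(findings []string, handle handleFunc) int {
	handled := 0
	for _, f := range findings {
		handled++
		if !handle(f) {
			break // e.g. the file was deleted, so further inspection is pointless
		}
	}
	return handled
}

func main() {
	findings := []string{"JarName", "ClassFileMd5", "JndiLookupClassName"}
	n := inspect(findings, func(f string) bool {
		// Pretend the file is deleted once a class file hash match is seen.
		return f != "ClassFileMd5"
	})
	fmt.Println(n) // 2
}
```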

    directoryMatches int
    owner string
)
for _, match := range ms.Matchers {
Contributor

nit, but would name variable matcher rather than match for clarity

if len(path) == 0 {
    return true
}
match, err := d.FilepathMatch(path[0])
Contributor

Would extract path[0] to a named variable based on number of times it's used here (and to make it more clear what it semantically represents)

Contributor

This will currently crash if d.FilepathMatch is not specified (since default value is nil). Generally, it is best form for exported structs to be functional/have sensible defaults in their empty form.

Main options would be:

  1. Define behavior when these values aren't specified (match all/match none)
  2. Check if these values aren't specified and throw an appropriate error on function invocation
  3. Unexport the struct, make it an interface and then have creation function return properly initialized one
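
As a rough sketch of option 2 (option 1 would instead pick a default such as "match none"), the check could look like the following. The Deleter type here is a simplified, hypothetical stand-in with a single field, and shouldDelete is an invented method name.

```go
package main

import (
	"errors"
	"fmt"
)

// Deleter is a simplified, hypothetical stand-in for the struct under review.
type Deleter struct {
	FilepathMatch func(filepath string) (bool, error)
}

// shouldDelete follows option 2: a missing FilepathMatch yields an error
// instead of a panic from invoking a nil function.
func (d Deleter) shouldDelete(path string) (bool, error) {
	if d.FilepathMatch == nil {
		return false, errors.New("FilepathMatch must be set")
	}
	return d.FilepathMatch(path)
}

func main() {
	// The zero value no longer crashes; it reports a usable error.
	_, err := Deleter{}.shouldDelete("/path/to/file.jar")
	fmt.Println(err)

	d := Deleter{FilepathMatch: func(string) (bool, error) { return true, nil }}
	ok, err := d.shouldDelete("/path/to/file.jar")
	fmt.Println(ok, err) // true <nil>
}
```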

cmd/delete.go Outdated
    return errors.New("at least one --directory-with-owner value must be provided or --skip-owner-check must be set")
}

var ms []deleter.Matcher
Contributor

For this and the 3-4 blocks after, decomposing into a function that returns the arguments based on the relevant input flags may make the function more readable (just by decreasing size). Not a huge deal, but a suggestion.

FilepathMatch func(filepath string) (bool, error)
FindingMatch func(finding crawl.Finding) bool
DryRun bool
Delete func(filepath string) error
Contributor

Is there a reason that this is configurable right now? Looks like the only current usage sets it to os.Remove, and right now if this isn't specified as part of the struct then using it will cause a crash (will attempt to invoke a nil function). Unless there's a concrete planned usage where this is set to something different, probably fine for the implementation to start with just hard-coding os.Remove in the delete code.

Contributor Author

It was purely for testing purposes to stub it out and test the handling of the returned error whilst avoiding interacting with the filesystem.

I'll move it to be hardcoded and add forced deleting of non-existent files to test the error handling case.
That does seem less clean to me. Am I thinking about this the wrong way?

Contributor

Ah got it, sorry, missed that this was used in tests. In that case, I think it's fine either way.

cmd/delete.go
Contributor

@nmiyake nmiyake left a comment

Thanks for iterating! Provided a bit more feedback, but on the whole I think this looks good.

func (f Finding) String() string {
    var out []string
    // for all non-zero bits, append string for finding if it exists
    for i := 0; f > 0; i, f = i+1, f>>1 {
Contributor

This logic is a bit hard to parse/read to me -- personally, I would find it easier to follow if it was a for loop of the form:

for i := JndiLookupClassName /* or NothingDetected? */; i != (valueAfterClassFileMd5); i <<= 1 {
    // logic
}

The above construction would make it clearer to me that we're iterating through each valid "Finding" type and appending based on that. However, it would require defining a new terminal Finding type (so that there is an end point for the iteration).

I don't feel strongly enough about this to assert that the change must be made, but wanted to provide the feedback.
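
A self-contained sketch of the suggested loop shape, assuming findings are single-bit flags declared in ascending order. The constants below are a hypothetical subset, and findingEnd is the invented terminal value; the real package's values and ordering may differ. The matchesAll method also illustrates the "must contain all values" rule described for --finding-match.

```go
package main

import (
	"fmt"
	"strings"
)

// Finding is a bit set of detection types (hypothetical subset of values).
type Finding uint

const (
	JndiLookupClassName Finding = 1 << iota
	JarName
	ClassFileMd5
	findingEnd // terminal value: one bit past the last valid Finding
)

var findingNames = map[Finding]string{
	JndiLookupClassName: "JndiLookupClassName",
	JarName:             "JarName",
	ClassFileMd5:        "ClassFileMd5",
}

// String lists the names of all bits set in f, from lowest to highest bit.
func (f Finding) String() string {
	var out []string
	for bit := JndiLookupClassName; bit != findingEnd; bit <<= 1 {
		if f&bit != 0 {
			out = append(out, findingNames[bit])
		}
	}
	return strings.Join(out, ",")
}

// matchesAll reports whether every bit in required is also set in f.
func (f Finding) matchesAll(required Finding) bool {
	return f&required == required
}

func main() {
	f := JndiLookupClassName | ClassFileMd5
	fmt.Println(f)                                    // JndiLookupClassName,ClassFileMd5
	fmt.Println(f.matchesAll(ClassFileMd5))           // true
	fmt.Println(f.matchesAll(ClassFileMd5 | JarName)) // false
}
```

The trade-off is the one the reviewer mentions: the loop bounds become explicit, at the cost of maintaining a terminal constant alongside the real values.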

@glynternet glynternet merged commit dabeb5a into develop Feb 4, 2022
Development

Successfully merging this pull request may close these issues.

Support deleting of vulnerable files
3 participants