Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Input Engine with triage functionality #777

Merged
merged 41 commits into from Jul 9, 2020

Conversation

Niraj-Kamdar
Copy link
Contributor

@Niraj-Kamdar Niraj-Kamdar commented Jul 3, 2020

New features:

  • Created input_engine that can handle various input files with triage data
  • Currently, input engine supports csv and json file.
  • To get cves from input file - Usage: cve-bin-tool -i=test.csv
  • Input engine can also be used to supplement triage data - Usage: cve-bin-tool -i=test.csv /path/to/scan
  • User can now add remarks field in csv or json which can have any value from following values ( Here, values in parenthesis are aliases for that specific type. )
    1. NewFound (1, n, N)
    2. Unexplored (2, u, U)
    3. Confirmed (3, c, C)
    4. Mitigated, (4, m, M)
    5. Ignored (5, i, I)
  • Output will be also represented in above order.
  • tests for input engine added under test_input_engine.
  • removed test for old csv2cve

@codecov-commenter
Copy link

codecov-commenter commented Jul 3, 2020

Codecov Report

Merging #777 into master will decrease coverage by 0.93%.
The diff coverage is 81.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #777      +/-   ##
==========================================
- Coverage   88.32%   87.38%   -0.94%     
==========================================
  Files         139      140       +1     
  Lines        2483     2474       -9     
  Branches      301      297       -4     
==========================================
- Hits         2193     2162      -31     
- Misses        226      243      +17     
- Partials       64       69       +5     
Flag Coverage Δ
#longtests 87.38% <81.55%> (-0.94%) ⬇️
Impacted Files Coverage Δ
cve_bin_tool/csv2cve.py 0.00% <0.00%> (-63.37%) ⬇️
cve_bin_tool/cvedb.py 87.60% <ø> (+2.12%) ⬆️
cve_bin_tool/version_signature.py 76.59% <ø> (ø)
test/test_data/ffmpeg.py 100.00% <ø> (ø)
test/test_data/kerberos.py 100.00% <ø> (ø)
test/test_data/lighttpd.py 100.00% <ø> (ø)
cve_bin_tool/util.py 70.58% <33.33%> (-13.42%) ⬇️
cve_bin_tool/cli.py 82.90% <71.11%> (-3.27%) ⬇️
test/test_scanner.py 90.00% <79.16%> (-6.39%) ⬇️
test/test_output_engine.py 97.61% <90.00%> (-1.22%) ⬇️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ce183b0...de68871. Read the comment docs.

from .version import VERSION
from .version_scanner import VersionScanner

ERROR_CODES = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom error codes

Comment on lines +44 to +47
def excepthook(exctype, exc, trace):
oldHook(exctype, exc, trace)
if ERROR_CODES.get(exctype):
sys.exit(ERROR_CODES[exctype])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom except hook to also throw exit code while exception raise

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really lovely, thanks. One of our users (I think someone running this on a Yocto based system?) was saying that they use the return codes to colourize their CI dashboard, so this may be very helpful to others as well.

Comment on lines 86 to 92
parser.add_argument(
"-i",
"--input-file",
action="store",
default=None,
help="Provide input filename",
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New option to specify input file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be the actual path to the file? I guess it should be file-path


LOGGER.info("")
LOGGER.info("Overall CVE summary: ")
LOGGER.info(
f"There are {scanner.files_with_cve} files with known CVEs detected"
f"There are {cve_scanner.products_with_cve} products with known CVEs detected"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a better name because files_with_cves won't make much sense while using it as csv2cve

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thinking, though I wonder if products will be confusing for people scanning directories, as they may think of a "product" as a .deb or somesuch and we have a different definition. it's tempting to just say "things" instead of files or products, but I guess that doesn't sound quite as formal.

else:
csv2cve.update_logLevel(log_level=args.log_level)
csv2cve.generate_output(args.output_file, args.format)
argv = argv or sys.argv
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dummy csv2cve file

Comment on lines +11 to +20
class InvalidCsvError(Exception):
""" Given File is an Invalid CSV """


class MissingFieldsError(Exception):
""" Missing needed fields """


class InvalidJsonError(Exception):
""" Given File is an Invalid JSON """
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom exceptions to raise exceptions

Comment on lines +23 to +28
class Remarks(OrderedEnum):
NewFound = 1, "1", "NewFound", "n", "N"
Unexplored = 2, "2", "Unexplored", "u", "U", ""
Mitigated = 4, "4", "Mitigated", "m", "M"
Confirmed = 3, "3", "Confirmed", "c", "C"
Ignored = 5, "5", "Ignored", "i", "I"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom enum class that can handle multiple aliases easily and can also be sorted.

@@ -109,3 +110,25 @@ def pattern_match(text, patterns):
if fnmatch.fnmatch(text, pattern):
return True
return False


class OrderedEnum(Enum):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to provide comparison methods that allows sorting of list of Remarks.

@@ -118,6 +118,7 @@ jobs:
ACTIONS: 1
LONG_TESTS: 0
NO_EXIT_CVE_NUM: 1
PYTHONIOENCODING: 'utf8'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes problem with Windows CI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YAY! Thanks for this

test/test_cli.py
test/test_extractor.py
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@terriko terriko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a bunch of nitpicky comments about error codes and better PR splits, but given that @SinghHrmn I think is waitnig for this PR to be merged, I'm going to merge and file issues for my feedback rather than request changes.

@Niraj-Kamdar one big thing though is that you really need to get into the habit of grouping the changes in PRs by function, so the checker changes here should have been in a separate branch and a separate PR. Much of the minor refactoring stuff should also have been split out. We've let you get into bad habits because we had so much refactoring that needed doing that I was expecting to see giant PRs, and that's on me and @pdxjohnny as mentors, but it's something I think is worth working on going forwards.

Here's a couple of helpful articles on how to split up PRs that I think you might find helpful: https://unhashable.com/stacked-pull-requests-keeping-github-diffs-small/ https://www.thedroidsonroids.com/blog/splitting-pull-request

CONTAINS_PATTERNS = [r"http://bugzilla.gnome.org/enter_bug.cgi?product=GStreamer"]
CONTAINS_PATTERNS = [
r"http://bugzilla.gnome.org/enter_bug.cgi?product=GStreamer",
r"libgstreamer-((\d+\.)+\d+)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be altering the checker default so it scans for VERSION_PATTERNS as well as CONTAINS_PATTERNS by default? It seems silly to duplicate lines in each checker if we probably actually want version patterns to also be a valid detection string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that can be helpful.

Comment on lines +44 to +47
def excepthook(exctype, exc, trace):
oldHook(exctype, exc, trace)
if ERROR_CODES.get(exctype):
sys.exit(ERROR_CODES[exctype])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really lovely, thanks. One of our users (I think someone running this on a Yocto based system?) was saying that they use the return codes to colourize their CI dashboard, so this may be very helpful to others as well.

@@ -37,8 +60,7 @@ def __call__(self, parser, namespace, value, option_string=None):

def main(argv=None):
""" Scan a binary file for certain open source libraries that may have CVEs """
if argv is None:
argv = sys.argv
argv = argv or sys.argv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of prefer the clarity of the is None check in some cases (because we get so many new pythonistas looking at this project) but this isn't a thing I expect a lot of new contributors to look at so I think streamlining this is the right choice.

default=None,
help="provide output filename (default: output to stdout)",
default="",
help="provide output filename",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the default is fine, but don't change the help message here: that's to clarify for users because the default messages from argparser didn't make the behaviour clear.

Suggested change
help="provide output filename",
help="provide output filename (default: output to stdout)",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add that.

"-x",
"--extract",
action="store_true",
default=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been wondering about this: should we change the default to auto-extract? I've opened an issue to discuss, since that's out of scope for this PR. #786


LOGGER.info("")
LOGGER.info("Overall CVE summary: ")
LOGGER.info(
f"There are {scanner.files_with_cve} files with known CVEs detected"
f"There are {cve_scanner.products_with_cve} products with known CVEs detected"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thinking, though I wonder if products will be confusing for people scanning directories, as they may think of a "product" as a .deb or somesuch and we have a different definition. it's tempting to just say "things" instead of files or products, but I guess that doesn't sound quite as formal.

argv = argv or sys.argv
if len(argv) < 2:
LOGGER.error("Invalid Usage: csv file required")
return -2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're importing cli, we might as well use the same error code list here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change that

csv2cve.update_logLevel(log_level=args.log_level)
csv2cve.generate_output(args.output_file, args.format)
LOGGER.error("Invalid Usage: csv file required")
return -4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another error code spot.


@staticmethod
def openssl_convert(version):
def openssl_convert(version: str) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay type hints!

@@ -4,7 +4,7 @@
"version": "4.1.3",
"version_strings": [
"%s version 4.1.3",
"Codec %s is not recognized by FFmpeg.",
"Codec '%s' is not recognized by FFmpeg.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In future, can you put unrelated changes to checkers in a separate PR? They could all be grouped together, but they shouldn't be grouped with the InputEngine changes. It's considered bad form by most open source projects to group them the way you have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have put it here because I have to make VersionScanner generator to use InputEngine parsed data as supplement and it was breaking the test_scanner because I have also included test for contains string previously but I have forget to clean up scanner and test was only passing due to previous scan has detected it. So, that's why when I converted that in generator this bug come to surface. I should have divide this PR in two parts: 1) Making version_scanner generator and 2) building input_engine. Thanks, for the feedback. I will try to keep PR size smaller.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants