-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parser for flawfinder #31
Add parser for flawfinder #31
Conversation
firehose/parsers/flawfinder.py
Outdated
weakness_messages.append(message) | ||
# TODO: Flawfinder can returns more than one CWE, | ||
# when this rappends, get_cwe cannot return a valid value | ||
# Fix this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to support more than one CWE per issue in the schema? Currently <cwe>
is optional within <issue>
; should it be zeroOrMore
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to ask this. I don't know what the best pratice in this case. When flawfinder returns more than one CWE, what should i do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you post some examples of what the output looks like?
My current opinion is that we should change the schema and model.py to support multiple CWEs per issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
./docs/examples/rtsp.c:170: [4] (buffer) sscanf:
The scanf() family's %s operation, without a limit specification, permits
buffer overflows (CWE-120, CWE-20). Specify a limit to %s, or use a
different input function.
./docs/examples/synctime.c:155: [4] (buffer) sscanf:
The scanf() family's %s operation, without a limit specification, permits
buffer overflows (CWE-120, CWE-20). Specify a limit to %s, or use a
different input function.
In my opinion, Add multiple CWEs would generate a more realistic output, once one warning could be related to more than one CWE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can send another PR doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'll update the schema accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will first make this change in models, then i finish the flawfinder parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidmalcolm I have a doubt. Projects that already use firehose, and pass to Issue model a single cwe using a integer value. Should we maintain the compatibility? (Allows de user to pass a integer or a list of integers). Or we keep only a list of integers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change impacts the others parsers too.
76666dd
to
d10b185
Compare
Hey @davidmalcolm , i have updated this PR. I added some tests, and have included flawfinder in the documentation. Now the parser capture the first CWE, as we had discussed. I created the issue #35, to map the multiple cwes problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks; this is looking good. I have some concerns about version-handling, and would prefer it if the parser captured the severity of issues (and maybe categories also).
Sorry if this seems nit-picky.
firehose/parsers/flawfinder.py
Outdated
""" | ||
|
||
generator = Generator(name='flawfinder', | ||
version=flawfinder()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that a Generator's version
can be None
. It looks like the actual version is embedded in the first line of the output:
Flawfinder version 1.31, (C) 2001-2014 David A. Wheeler.
so presumably parse_file
should be looking for lines like that and parsing them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Now i get the version from the Flawfinder output.
firehose/parsers/flawfinder.py
Outdated
def get_cwe(line): | ||
"""Extract CWE from flawfinder alarms | ||
It's possible that a alarm not have a correlated CWE | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have a comment here referencing issue #35, and that it's just the first CWE if there's more than one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
firehose/parsers/flawfinder.py
Outdated
try: | ||
return check_output(["flawfinder", "--version"]).strip() | ||
except Exception: | ||
return '1.31' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted above, I'd much prefer it if parse_file
parsed the version header in the output.
If the output file doesn't contain a version string, then I think it's better for the parser to leave the Generator
's version
as None
, rather than trying to scrape it from flawfinder --version
(or guessing). Rationale: I use firehose in mock-with-analysis
, which does the analysis in a chroot. There might be a different version of flawfinder in the chroot compared to what's in the host $PATH
.
So I think it's better to get rid of this function and do the version-handling in parse_file
.
firehose/parsers/flawfinder.py
Outdated
line = loaded_file.readline() | ||
|
||
counter = 0 | ||
for weakness in weakness_paths: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than manually incrementing counter
, could this code use Python's enumerate
builtin? (and maybe use idx
rather then counter
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
firehose/parsers/flawfinder.py
Outdated
|
||
issue = Issue(cwe, None, location, | ||
Message(text=weakness_messages[counter]), notes=None, | ||
trace=None, severity=None, customfields=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re severity
: Flawfinder's docs say:
The risk level is shown inside square brackets and varies from 0, very little risk, to 5, great risk.
It would be good to capture that number as Issue.severity
(as a string, without the square brackets):
http://firehose.readthedocs.io/en/latest/data-model.html#firehose.model.severity
firehose/parsers/flawfinder.py
Outdated
else: | ||
cwe = int(weakness_cwes[counter]) | ||
|
||
issue = Issue(cwe, None, location, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The None
here is the Issue.testid
:
http://firehose.readthedocs.io/en/latest/data-model.html#firehose.model.testid
Perhaps the (shell) system:
stuff should be captured here. If I'm reading the Flawfinder docs correctly:
Each entry has a filename, a colon, a line number, a
risk level in brackets (where 5 is the most risky), a category, the name of the function, and a description of
....then Flawfinder's "category" here is "shell" and the name of the function is "system". I'm not yet sure how that should map to a firehose Issue.testid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidmalcolm how should we handle this? Apprently flawfinder does not have a testid value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use flawfinder's "category" here for the FIrehose "testid"? Looking at the sample output, that would give values like "shell", "format", "buffer", "race", which seems sane.
Hey @davidmalcolm , i updated this PR with the last revision. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates.
I've got a few more nitpicks (sorry), covering code I had trouble following.
firehose/parsers/flawfinder.py
Outdated
|
||
:arg1: file passed by cli | ||
:returns: file loaded in memory | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't actually load a file into memory; it merely returns a file
object representing a file opened for reading. The code would be clearer if you got rid of this function and replace users with just the open
builtin.
firehose/parsers/flawfinder.py
Outdated
""" Parser loaded flawfinder output | ||
|
||
:loaded_file: flawfinder report, loaded in memory | ||
:returns: Firehose Analysis object, representing the final XML. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"loaded_file" is actually a file-like object, not an in-memory string. Please rename to "infile", or similar ("input" is a builtint IIRC)
firehose/parsers/flawfinder.py
Outdated
data["message"] = message; | ||
line = message_line | ||
flawfinder_data.append(data) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding it hard to follow this two-stage parsing. If I'm reading things right, the code first reads the lines, building up a list of dicts (flawfinder_data
). There's then a second loop over that list, creating Issue
objects and adding them to analysis.results
.
Wouldn't it be much simpler to have one loop, and append Issue
objects directly to analysis.results
, without the data
dict i.e. work directly with an Issue
object instead.
Hope that makes sense; I may have misread this code.
firehose/parsers/flawfinder.py
Outdated
int(line[cwe_first_int_char]) | ||
cwe_first_int_char += 1 | ||
except Exception: | ||
cwe = line[(cwe_init_str_pos + 4):(cwe_first_int_char)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this be much simpler with a regex e.g.
re.findall(line, "CWE-([0-9]+)")
or somesuch? (which I think would give a list of tuples of strings).
Could then rename the function to find_cwes
and have it return a list of ints, and do the dropping of multiples in the caller.
Can also write a unit test for the CWE parser that way.
firehose/parsers/flawfinder.py
Outdated
|
||
|
||
def flawfinder(first_line): | ||
"""Retrieve flawfinder version from report. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the current name to be unclear; please rename to parse_first_line
(as that's what it does).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename to get_flawfinder_version
should be better, what do you think @davidmalcolm ?
|
||
class TestParseXml(unittest.TestCase): | ||
def parse_example(self, filename): | ||
report = load_file(os.path.join(os.path.dirname(__file__), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that it's a file-like object, maybe turn into:
path = os.path.join(...)
with open(path) as f:
return parse_file (f)
863d9a5
to
8bf2e8a
Compare
@davidmalcolm thanks for the great revision, i had updated the PR. |
4c64931
to
65ac128
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updated version.
This is looking much better, but I have a few more nitpicks (sorry).
firehose/parsers/flawfinder.py
Outdated
arg_file = sys.argv[1] | ||
report = open(arg_file, 'r') | ||
except IOError: | ||
print("File note found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: "note" should be "not".
But an IOError is not necessarily "file not found"; it could be a permissions error, or an attempt to open a directory rather than a file, etc. Maybe just lose this clause?
firehose/parsers/flawfinder.py
Outdated
(prog.search(message_line) and prog2.search(message_line)): | ||
issue_message += " " + message_line | ||
try: | ||
cwe = re.findall("CWE-([0-9]+)", message_line)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like using try/except for parsing; I prefer to use non-exception program flow (rationale: sometimes we end up catching a different exception than the one we thought we were, so it can conceal bugs).
Anything that matches the ([0-9]+)
is known to be convertible to int
, so how about:
# list of ints
cwes = []
before the "while message_line" loop, and:
cwes.extend([int(m.group(0))
for m in re.findall("CWE-([0-9]+)", message_line)])
here, and then after the "message_line" loop:
if cwes:
first_cwe = cwes[0]
else
first_cwe = None
to avoid the two try/except clauses.
Please reinstate the comment about issue #35
firehose/parsers/flawfinder.py
Outdated
try: | ||
testid = re.findall(r"\(([a-z]+)\)", line)[0] | ||
except IndexError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we just want the first match, just use re.search
, which returns None
for the "not-found" case ( https://docs.python.org/3.4/library/re.html#re.search ).
@davidmalcolm I had updated the PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your improvements here.
Ironically, with the code cleanups you've done, you've simplified things enough that I can now see a couple of other issues that were hiding.
firehose/parsers/flawfinder.py
Outdated
testid = re.search("\(([a-z]+)\)", line).group(1) | ||
issue_path = line.split(":")[0] | ||
issue_line = line.split(":")[1] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing the earlier cleanups; the code is clearer now.
With that, the handling of prog
, prog2
and the testid
looks a bit unusual; presumably these always appear on the same line, in a consistent way. Given that, I think it would be improved by using a single regular expression; something like:
# We want to match a line like:
# ./docs/examples/asiohiper.cpp:78: [4] (shell) system:
filename_group = "(.+)"
linenum_group = "([0-9]+)
severity_group = "\[([0-9])\]"
category_group = "\((.+)\)"
funcname_group = "(\S+)"
ws = '\s+'
PAT= (filename_group + ':' + linenum_group + ':' + ws + severity_group + ws
+ category_group + ws + funcname_group)
It should then be possible to do:
m = re.match(PAT, message_line)
if m:
# do stuff with m.groups()
firehose/parsers/flawfinder.py
Outdated
|
||
issue = Issue(first_cwe, testid, location, | ||
Message(text=issue_message), notes=None, | ||
trace=None, severity=None, customfields=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please set the severity
field to the number in square parens (keep it as a string, rather than calling int
on it).
firehose/parsers/flawfinder.py
Outdated
cwes = [] | ||
while message_line != "\n" and not \ | ||
(prog.search(message_line) and prog2.search(message_line)): | ||
issue_message += " " + message_line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could reuse the PAT from above here.
- Add only one cwe in Firehose report - Retrieve flawfinder version from report. - Use enumerate instead of counter. - Add tests - Add comment about multiple cwes. - Fix regex.
@davidmalcolm I had updated the PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the updates; this looks good.
I'm still working on this pull request (missing tests). soon i will remove the 'WIP' tag.