Show prevalence of rules in the output #1737

Open
wants to merge 50 commits into
base: master
Changes from 37 commits
Commits
7603f85
Entropy Methods
Aayush-Goel-04 Jul 29, 2023
f5b38d5
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Aug 2, 2023
bf1f59b
Sort rules in render based on match probability
Aayush-Goel-04 Aug 5, 2023
31bd6b3
Rendering rules into two sections. * for interesting rules.
Aayush-Goel-04 Aug 6, 2023
9ca4f9d
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Aug 7, 2023
78877f2
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Aug 13, 2023
f5f3e87
update
Aayush-Goel-04 Aug 13, 2023
a6797de
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Aug 19, 2023
0b5a326
Update default.py
Aayush-Goel-04 Aug 19, 2023
def2d98
Merge branch 'Aayush-Goel-04/Issue#520' of https://github.com/Aayush-…
Aayush-Goel-04 Aug 19, 2023
039fdbd
Update utils.py
Aayush-Goel-04 Aug 19, 2023
8a0e61b
Merge branch 'master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Aug 19, 2023
f6058b1
Update default.py
Aayush-Goel-04 Aug 19, 2023
dc399c3
Merge branch 'master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Aug 27, 2023
c5302cd
prevalence db update
Aayush-Goel-04 Aug 27, 2023
430bde6
Update default.py
Aayush-Goel-04 Aug 27, 2023
7f1566d
Update capa/render/default.py
Aayush-Goel-04 Aug 28, 2023
24541b6
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Sep 6, 2023
6787555
updated default render
Aayush-Goel-04 Sep 6, 2023
7c84926
Update utils.py
Aayush-Goel-04 Sep 6, 2023
c1f9e72
Revert "Update utils.py"
Aayush-Goel-04 Sep 6, 2023
7d6ec15
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Oct 9, 2023
5c1464c
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Oct 10, 2023
8ede526
Resolving path issues
Aayush-Goel-04 Oct 10, 2023
4476b2c
Update utils.py
Aayush-Goel-04 Oct 10, 2023
6077e99
Update utils.py
Aayush-Goel-04 Oct 10, 2023
bc0d129
Update pyinstaller.spec
Aayush-Goel-04 Oct 16, 2023
12dea73
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Oct 16, 2023
3bce5a9
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Oct 17, 2023
5a0a3a5
Update CHANGELOG.md
Aayush-Goel-04 Oct 20, 2023
e4bb521
Merge branch 'master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Oct 20, 2023
fe4af5c
render output with prevalence for (v) verbose
Aayush-Goel-04 Oct 20, 2023
95bdf5d
Update utils.py
Aayush-Goel-04 Oct 20, 2023
af57da8
Update RuleMetaData with Prevalence
Aayush-Goel-04 Nov 12, 2023
8057a73
Apply suggestions from code review
Aayush-Goel-04 Nov 12, 2023
5102ca1
Imports, Paths, Comments & Exceptions handled
Aayush-Goel-04 Nov 16, 2023
07553a6
Update result_document.py
Aayush-Goel-04 Nov 16, 2023
2c4931d
Update result_document.py
Aayush-Goel-04 Nov 20, 2023
c531a15
Merge branch 'master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Feb 3, 2024
61e7459
Added prevalence to verbose
Aayush-Goel-04 Feb 3, 2024
66d0ab7
linter checks
Aayush-Goel-04 Feb 3, 2024
e3ca32b
Revert "linter checks"
Aayush-Goel-04 Feb 3, 2024
f084040
Update result_document.py
Aayush-Goel-04 Feb 3, 2024
b07d600
Merge branch 'master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Feb 5, 2024
10d2140
Convert database to python files
Aayush-Goel-04 Feb 5, 2024
9bebffc
Lint checks
Aayush-Goel-04 Feb 5, 2024
fa89f44
Delete rules_prevalence.json.gz
Aayush-Goel-04 Feb 25, 2024
d93f135
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Feb 25, 2024
08ea4a9
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Mar 6, 2024
7992b1b
Merge branch 'master' into Aayush-Goel-04/Issue#520
mr-tz Mar 13, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
- binja: add support for forwarded exports #1646 @xusheng6
- binja: add support for symtab names #1504 @xusheng6
- add com class/interface features #322 @Aayush-goel-04
- Show prevalence of rules in the output #520 @Aayush-Goel-04

### Breaking Changes

Expand Down
Binary file not shown.
55 changes: 45 additions & 10 deletions capa/render/default.py
Expand Up @@ -7,6 +7,7 @@
# See the License for the specific language governing permissions and limitations under the License.

import collections
from typing import Dict

import tabulate

Expand Down Expand Up @@ -72,19 +73,30 @@ def rec(match: rd.Match):

def render_capabilities(doc: rd.ResultDocument, ostream: StringIO):
"""
render capabilities sorted by:
- prevalence (rare to unknown)
- namespace (alphabetical)

example::

+-------------------------------------------------------+-------------------------------------------------+
| CAPABILITY | NAMESPACE |
|-------------------------------------------------------+-------------------------------------------------|
| check for OutputDebugString error (2 matches) | anti-analysis/anti-debugging/debugger-detection |
| read and send data from client to server | c2/file-transfer |
| ... | ... |
+-------------------------------------------------------+-------------------------------------------------+
+-------------------------------------------------------+-------------------------------------------------+------------+
| CAPABILITY | NAMESPACE | PREVALENCE |
|-------------------------------------------------------+-------------------------------------------------|------------|
| check for OutputDebugString error (2 matches) | anti-analysis/anti-debugging/debugger-detection | rare |
| ... | ... | ... |
|-------------------------------------------------------|-------------------------------------------------|------------|
| read and send data from client to server | c2/file-transfer | common |
| ... | ... | ... |
+-------------------------------------------------------+-------------------------------------------------+------------+
"""
subrule_matches = find_subrule_matches(doc)

rows = []
# separate rules based on their prevalence
common: Dict[str, str] = {"capability": "", "namespace": "", "prevalence": ""}
had_common = False
rare: Dict[str, str] = {"capability": "", "namespace": "", "prevalence": ""}
had_rare = False

for rule in rutils.capability_rules(doc):
if rule.meta.name in subrule_matches:
# rules that are also matched by other rules should not get rendered by default.
Expand All @@ -97,11 +109,34 @@ def render_capabilities(doc: rd.ResultDocument, ostream: StringIO):
capability = rutils.bold(rule.meta.name)
else:
capability = f"{rutils.bold(rule.meta.name)} ({count} matches)"
rows.append((capability, rule.meta.namespace))

namespace = rule.meta.namespace if rule.meta.namespace is not None else ""
prevalence = rutils.bold(rule.meta.prevalence) if rule.meta.prevalence != "unknown" else "unknown"

if "rare" in prevalence:
rare["capability"] += capability + "\n"
rare["namespace"] += namespace + "\n"
rare["prevalence"] += prevalence + "\n"
had_rare = True
else:
common["capability"] += capability + "\n"
common["namespace"] += namespace + "\n"
common["prevalence"] += prevalence + "\n"
had_common = True

rows = []
if had_rare:
rows.append((rare["capability"], rare["namespace"], rare["prevalence"]))
if had_common:
rows.append((common["capability"], common["namespace"], common["prevalence"]))

if rows:
ostream.write(
tabulate.tabulate(rows, headers=[width("Capability", 50), width("Namespace", 50)], tablefmt="mixed_outline")
tabulate.tabulate(
rows,
headers=[width("Capability", 50), width("Namespace", 50), width("Prevalence", 10)],
tablefmt="mixed_grid",
)
)
ostream.write("\n")
else:
Expand Down
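The grouping done in `render_capabilities` above can be sketched in isolation. This is a minimal illustration, not the PR's code: `group_by_prevalence` and the sample rows are hypothetical, it compares the raw label instead of running a substring test on the bolded string, and it joins cells with newlines rather than appending a trailing `"\n"` to each entry.

```python
from typing import Dict, List, Tuple

# (capability, namespace, prevalence) triples; the sample values below are illustrative only
Match = Tuple[str, str, str]


def group_by_prevalence(matches: List[Match]) -> List[Tuple[str, ...]]:
    """Collapse matches into at most two multi-line rows: rare first, then everything else."""
    groups: Dict[str, List[Match]] = {"rare": [], "other": []}
    for match in matches:
        key = "rare" if match[2] == "rare" else "other"
        groups[key].append(match)

    rows: List[Tuple[str, ...]] = []
    for key in ("rare", "other"):
        if groups[key]:
            # one table row per group; each cell holds the newline-joined values of a column
            columns = zip(*groups[key])
            rows.append(tuple("\n".join(column) for column in columns))
    return rows


rows = group_by_prevalence(
    [
        ("read and send data from client to server", "c2/file-transfer", "common"),
        ("check for OutputDebugString error", "anti-analysis/anti-debugging/debugger-detection", "rare"),
        ("example rule without data", "", "unknown"),
    ]
)
```

Each returned tuple is meant to be passed as a single row to a grid-style table renderer, so the separator line between rows ends up dividing the rare section from the rest.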
16 changes: 16 additions & 0 deletions capa/render/result_document.py
Expand Up @@ -5,13 +5,17 @@
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import gzip
import json
import datetime
import collections
from typing import Dict, List, Tuple, Union, Literal, Optional
from pathlib import Path
from functools import lru_cache

from pydantic import Field, BaseModel, ConfigDict

import capa.main
import capa.rules
import capa.engine
import capa.features.common
Expand Down Expand Up @@ -501,9 +505,20 @@ class MaecMetadata(FrozenModel):
model_config = ConfigDict(frozen=True, populate_by_name=True)


@lru_cache(maxsize=None)
def load_rules_prevalence() -> Dict[str, str]:
CD = capa.main.get_default_root()
file = CD / "assets" / "rules_prevalence_data" / "rules_prevalence.json.gz"
if not file.exists():
return {}
with gzip.open(file, "rb") as gzfile:
return json.loads(gzfile.read().decode("utf-8"))
Collaborator

while we're at it, is it worth defining a pydantic data model for the DB file/format?

Collaborator

looks like the format is dict[rule name, prevalence] which will be hard to represent in pydantic, unless we enumerate all the rule names as potential values. i think the type hint above is a good start. still, adding some comments here showing a snippet of the file would be valuable.
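As a rough illustration of the format discussed here, the sketch below writes and reads a gzip-compressed JSON mapping of rule name to prevalence label. The entries in `EXAMPLE_DB` are assumptions borrowed from the docstring example in `default.py`, and `write_prevalence_db` / `read_prevalence_db` are hypothetical names; the real database may contain different rules or labels.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Dict

# Hypothetical contents: rule name -> prevalence label.
# Rules missing from the mapping fall back to "unknown" at lookup time.
EXAMPLE_DB: Dict[str, str] = {
    "check for OutputDebugString error": "rare",
    "read and send data from client to server": "common",
}


def write_prevalence_db(path: Path, db: Dict[str, str]) -> None:
    """Serialize the mapping as gzip-compressed UTF-8 JSON."""
    with gzip.open(path, "wb") as gzfile:
        gzfile.write(json.dumps(db).encode("utf-8"))


def read_prevalence_db(path: Path) -> Dict[str, str]:
    """Load the mapping; an absent file yields an empty dict, as in the diff."""
    if not path.exists():
        return {}
    with gzip.open(path, "rb") as gzfile:
        return json.loads(gzfile.read().decode("utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "rules_prevalence.json.gz"
    write_prevalence_db(db_path, EXAMPLE_DB)
    loaded = read_prevalence_db(db_path)
    missing = read_prevalence_db(Path(tmp) / "absent.json.gz")
```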



class RuleMetadata(FrozenModel):
name: str
namespace: Optional[str] = None
prevalence: str = "unknown"
authors: Tuple[str, ...]
scope: capa.rules.Scope
attack: Tuple[AttackSpec, ...] = Field(alias="att&ck")
Expand All @@ -521,6 +536,7 @@ def from_capa(cls, rule: capa.rules.Rule) -> "RuleMetadata":
return cls(
name=rule.meta.get("name"),
namespace=rule.meta.get("namespace"),
prevalence=load_rules_prevalence().get(rule.meta.get("name"), "unknown"),
Collaborator

is the rule prevalence database distributed with capa the library? i think it's important that people be able to use capa the library without maintaining this database. so perhaps we want to handle the case of the database not existing here?

Contributor Author

If the database is not present, all rule matches will have a prevalence of unknown in the results.

Collaborator

maybe we can provide a warning if no db is found (in case that's not already there) pointing to one and explaining briefly what it does
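One way the suggested warning might look, as a sketch only: `load_rules_prevalence_from`, the logger name, and the message wording are all hypothetical, and the path is passed in explicitly so the example does not depend on `capa.main.get_default_root()`.

```python
import gzip
import json
import logging
from functools import lru_cache
from pathlib import Path
from typing import Dict

logger = logging.getLogger("capa.prevalence")  # hypothetical logger name


@lru_cache(maxsize=None)
def load_rules_prevalence_from(path: Path) -> Dict[str, str]:
    """Hypothetical variant of load_rules_prevalence() that warns when the file is absent."""
    if not path.exists():
        # the result is cached, so this warning is logged once per distinct path
        logger.warning(
            "rule prevalence database not found at %s; "
            "all rules will be reported with prevalence 'unknown'",
            path,
        )
        return {}
    with gzip.open(path, "rb") as gzfile:
        return json.loads(gzfile.read().decode("utf-8"))


prevalence = load_rules_prevalence_from(Path("does-not-exist") / "rules_prevalence.json.gz")
label = prevalence.get("some rule name", "unknown")
```

Library users without the database still get a usable result; they only see a single log line explaining why every rule is reported as unknown.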

authors=rule.meta.get("authors"),
scope=capa.rules.Scope(rule.meta.get("scope")),
attack=tuple(map(AttackSpec.from_str, rule.meta.get("att&ck", []))),
Expand Down
1 change: 0 additions & 1 deletion capa/render/utils.py
Expand Up @@ -5,7 +5,6 @@
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.

import io
from typing import Union, Iterator

Expand Down
3 changes: 3 additions & 0 deletions capa/render/verbose.py
Expand Up @@ -139,6 +139,9 @@ def render_rules(ostream, doc: rd.ResultDocument):

rows.append((key, v))

prevalence = rutils.bold(rule.meta.prevalence) if rule.meta.prevalence != "unknown" else "unknown"
rows.insert(1, ("prevalence", prevalence))

if rule.meta.scope != capa.rules.FILE_SCOPE:
locations = [m[0] for m in doc.rules[rule.meta.name].matches]
rows.append(("matches", "\n".join(map(format_address, locations))))
Expand Down
3 changes: 3 additions & 0 deletions capa/render/vverbose.py
Expand Up @@ -305,6 +305,9 @@ def render_rules(ostream, doc: rd.ResultDocument):
# library rules should not have a namespace
rows.append(("namespace", rule.meta.namespace))

prevalence = rutils.bold(rule.meta.prevalence) if rule.meta.prevalence != "unknown" else "unknown"
rows.append(("prevalence", prevalence))

if rule.meta.maec.analysis_conclusion or rule.meta.maec.analysis_conclusion_ov:
rows.append(
(
Expand Down
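The same conditional expression appears in `default.py`, `verbose.py`, and `vverbose.py`. A small shared helper could express it once; the sketch below is hypothetical, and `bold` here is a stand-in using ANSI escape codes, not the actual `rutils.bold` implementation.

```python
def bold(text: str) -> str:
    """Stand-in for rutils.bold: wrap text in ANSI bold escape codes."""
    return f"\033[1m{text}\033[0m"


def format_prevalence(prevalence: str) -> str:
    """Hypothetical helper: emphasize known prevalence labels, leave 'unknown' plain."""
    return bold(prevalence) if prevalence != "unknown" else "unknown"


rare_cell = format_prevalence("rare")
unknown_cell = format_prevalence("unknown")
```

With a wrapper like this stand-in, the formatted cell no longer equals the raw label, which would explain why `default.py` tests `"rare" in prevalence` rather than comparing for equality.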
14 changes: 3 additions & 11 deletions capa/rules/__init__.py
Expand Up @@ -16,18 +16,9 @@
import binascii
import collections
from enum import Enum
from pathlib import Path

from capa.helpers import assert_never

try:
from functools import lru_cache
except ImportError:
# need to type ignore this due to mypy bug here (duplicate name):
# https://github.com/python/mypy/issues/1153
from backports.functools_lru_cache import lru_cache # type: ignore

from typing import Any, Set, Dict, List, Tuple, Union, Iterator, Optional
from pathlib import Path
from functools import lru_cache

import yaml
import pydantic
Expand All @@ -43,6 +34,7 @@
import capa.features.common
import capa.features.basicblock
from capa.engine import Statement, FeatureSet
from capa.helpers import assert_never
from capa.features.common import MAX_BYTES_FEATURE_SIZE, Feature
from capa.features.address import Address

Expand Down