Extracts web domain and IP address, implements rendering functions and tests #1944

Closed
wants to merge 1 commit into from

aaronatp
Contributor

@aaronatp aaronatp commented Jan 24, 2024

This PR partially resolves #1907. It extracts web domains and IP addresses, and implements rendering functions and tests.

These changes likely don't require updates to the documentation, but if some users want to, they should be able to repurpose many of the extraction functions fairly easily.

Unfortunately, I'll probably be unavailable during the next few days, but this weekend, I'll ensure this PR passes the CI tests.

I'll probably also add some more tests for the rendering functions.

Please let me know if you have any questions or suggestions!

Below is example output for the default mode:

    +------------------------------+
    | IP addresses and web domains |
    |------------------------------+
    | google.com                   |
    | 192.123.232.08               |
    | my-w3bs1te.net               |
    | maliciooous.r4ndom-site.uhoh |
    | whoops.net                   |
    +------------------------------+
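
The default-mode box above could be produced by something along these lines. This is only a sketch, not the code in this PR: `render_box` and its parameters are hypothetical names, and it uses the same `+---+` border for the separator under the title.

```python
def render_box(title, rows):
    """Render a title and a list of strings as a plain-text box.

    Hypothetical helper for illustration; not the PR's actual renderer.
    """
    # The box is as wide as the longest entry, including the title.
    width = max(len(s) for s in [title, *rows])
    border = "+" + "-" * (width + 2) + "+"

    lines = [border, f"| {title.ljust(width)} |", border]
    lines += [f"| {row.ljust(width)} |" for row in rows]
    lines.append(border)
    return "\n".join(lines)


box = render_box(
    "IP addresses and web domains",
    ["google.com", "my-w3bs1te.net", "whoops.net"],
)
print(box)
```

Padding every row to the width of the longest entry keeps the right-hand border aligned regardless of which domains or addresses are found.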

Here is example output for verbose and vverbose modes:

    +-----------------------------------------------------------+
    | IP addresses and web domains                              |
    |-----------------------------------------------------------+
    | google.com                                                |
    |    |----IP address:                                       |
    |            |----192.0.0.1                                 |
    |    |----Functions used to communicate with google.com:    |
    |            |----InternetConnectA                          |
    |            |----HttpOpenRequestA                          |
    |            |----FtpGetFileA                               |
    |    |----3 occurrences                                     |
    |                                                           |
    | 192.123.232.08                                            |
    |    |----Functions used to communicate with 192.123.232.08:|
    |            |----...                                       |
    |                                                           |
    +-----------------------------------------------------------+

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

@mr-tz
Collaborator

mr-tz commented Jan 26, 2024

very cool, I'll have to take a closer look in the upcoming week at this! thanks for the suggestions.

@aaronatp
Contributor Author

Thanks @mr-tz! I'm just working on a couple of bugs, so I'll let you know when it's done!

@mr-tz mr-tz added the "dont merge" label (Indicate a PR that is still being worked on) Jan 31, 2024
CD = Path(__file__).resolve().parent.parent.parent

# these constants are also defined in capa.main
# defined here to avoid a circular import
Contributor

Perhaps define a module that holds all these constants? That would be better than repeating the definitions.

from capa.render.result_document import ResultDocument
from capa.features.extractors.base_extractor import FeatureExtractor

CD = Path(__file__).resolve().parent.parent.parent
Contributor

Does CD stand for current directory?

for tuple in obj:
    strings.append(tuple[0])

return strings
Contributor

return [tuple[0] for tuple in obj]
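
A minimal sketch of this suggestion, assuming `obj` is an iterable of tuples whose first element is the string of interest. The function name and the sample data are hypothetical, and the loop variable is renamed so it does not shadow the built-in `tuple`.

```python
def first_elements(pairs):
    """Return the first item of each tuple in `pairs`."""
    # Equivalent to appending pair[0] to a list inside a for loop.
    return [pair[0] for pair in pairs]


# Hypothetical (string, count) pairs for illustration.
strings = first_elements([("google.com", 3), ("whoops.net", 1)])
```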

"""
invalid_list = ["win", "exe", "dll", "med"] # add more to this list

for domain in invalid_list:
Contributor

return string not in invalid_list
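
A sketch of the membership test, under two assumptions that the truncated diff does not confirm: that the loop only checks whether the string equals one of the listed entries, and that `invalid_domain` should return True for a rejected string (the `if not invalid_domain(string)` call site suggests this). Under the second assumption the check needs no negation.

```python
# Assumed contents, copied from the diff context above.
INVALID_LIST = ["win", "exe", "dll", "med"]


def invalid_domain(string):
    """Return True if `string` is one of the known non-domain entries.

    Assumes an exact-match check; the original loop body is not shown.
    """
    return string in INVALID_LIST
```

If the original loop does something else, such as comparing only the final label of the string, a plain membership test would not be a drop-in replacement.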

if re.search(DOMAIN_PATTERN, string):
    if not invalid_domain(string):
        try:
            domain_counts[string] += 1
Contributor

domain_counts[string] = domain_counts.get(string, 0) + 1

This way you don't need a try block, and the code is shorter and easier to read.
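
For illustration, here is the `dict.get` pattern next to `collections.Counter`, which the standard library provides for the same job. The function names and the sample input are hypothetical; only the counting idiom comes from the comment above.

```python
from collections import Counter


def count_with_get(strings):
    """Count occurrences using dict.get with a default of 0."""
    counts = {}
    for s in strings:
        # get() returns 0 for a string not seen yet, so no try/except is needed.
        counts[s] = counts.get(s, 0) + 1
    return counts


def count_with_counter(strings):
    """Count occurrences using collections.Counter."""
    return Counter(strings)


sample = ["google.com", "whoops.net", "google.com"]
domain_counts = count_with_get(sample)
```

Both approaches give the same counts. `Counter` also offers helpers such as `most_common()`, which might be useful when rendering the occurrence totals.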


elif is_ip_addr(string):
    try:
        ip_counts[string] += 1
Contributor

Same as above

@mr-tz
Collaborator

mr-tz commented Mar 22, 2024

can this be closed as superseded by #2031?

@aaronatp
Contributor Author

@mr-tz Yes, I'll go ahead and close it!

@aaronatp aaronatp closed this Mar 22, 2024
Development

Successfully merging this pull request may close these issues.

Extract indicators (HBI/NBI) around capability detections
3 participants