Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Rapid7 InsightVM source #1010

Open
wants to merge 33 commits into
base: master
Choose a base branch
from
Open

Add Rapid7 InsightVM source #1010

wants to merge 33 commits into from

Conversation

juju4
Copy link
Contributor

@juju4 juju4 commented Oct 8, 2022

Add Ingestion of Rapid7 InsightVM assets as source. (follow-up of #1000)
Include initial test data and draft schema

Bugs

  • fix import of array and dict, or needed values (addresses, configurations...) to get instance_id, subscription_d and resource_id which are required to make relationship with AzureVirtualMachine.
  • fix mypy issues.

Pending

  • AzureVirtualMachine relationship.
  • import vulnerabilities' detections.
  • may want cleanup retention to be configurable. Typically, EDR keep inventories for 30-45 days.

Reviewed with pylint and black

@ramonpetgrave64
Copy link
Contributor

This is really cool, and thank you!
I can see that this is still draft from you, but I'll try to comment more closely this week.

@juju4
Copy link
Contributor Author

juju4 commented Oct 15, 2022

Like mde, good for review and merge.

cartography/data/jobs/rapid7_import_cleanup.json Outdated Show resolved Hide resolved
.github/workflows/test_suite.yml Outdated Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
cartography/intel/rapid7/util.py Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
docs/schema/rapid7.md Outdated Show resolved Hide resolved
docs/schema/rapid7.md Outdated Show resolved Hide resolved
Comment on lines 49 to 59
TBD
* Azure Tenant contains one or more Rapid7 Hosts.
```
(AzureTenant)-[RESOURCE]->(Rapid7Host)
```
* Azure Subscription contains one or more Rapid7 Hosts.
```
(AzureSubscription)-[RESOURCE]->(Rapid7Host)
```
* Azure Virtual Machine is one single Rapid7 Host.
* Similarly for other cloud providers and Onpremises.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still TBD? Your changes do not create these relationships.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On relationships, I would be interested into some advices.
It does not seem most of the sources of cartography are doing cross-sources relations.
But this is an important point to me.

  • link Cloud VM with tools present on it.
  • problem, tool may have not always the information that makes relation certain. In case of Azure
  • best case is to have resourceid (mde, rapid7 but does not seem always retrieved)
  • crowdstrike gets only subscriptionid, resourcegroup and hostname
  • worst case, only have hostname

I started to make AzureVM relation based on resourceid and where not available AzureSubscription based on subscriptionid.
I removed the latter as I found it making graph less readable.
I was thinking to use PRESENT_IN if relation based on resourceid and MAYBE_PRESENT_IN if relation based just on hostname.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feedback on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much context on how Azure works. But our preference is to only be certain about the relationships we create.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem is always the same when dealing with multiple tools. is there a reliable key to use to join?
Azure resourceid could be but sadly not fetched reliably by most tools, at least in my context.
hostname/short_hostname is the default join key but hardly reliable (hostname reused, different case depending on tool, different length enforcement depending on tool - some enforce 15-chars windows limit...)

At this point, I'm plan to make relation base on resourceid or hostname/short_hostname.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more to add here?

cartography/intel/rapid7/util.py Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
@juju4
Copy link
Contributor Author

juju4 commented Oct 29, 2022

As said, would need a bit of help on mypy error as less familiar with the tool.

Also, for some reason, CLA check is not green, but when going to https://oss.lyft.com/cla, it is marked signed and current. I did it a few weeks ago.

@juju4
Copy link
Contributor Author

juju4 commented Oct 29, 2022

Ah,CLA check is green now

@juju4
Copy link
Contributor Author

juju4 commented Nov 5, 2022

on my side, linter looks all good.
only pyupgrade is failing but not related to code and did recent change to 2.31. 0 (3.2.0 works ok for me)

cartography/cli.py Outdated Show resolved Hide resolved
cartography/cli.py Outdated Show resolved Hide resolved
Input: dataframe row configurations column
"""

if configurations is None or not isinstance(configurations, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the data returned from rapid7s api sometimes have empty fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine to resolve?

azure = json.loads(line["value"])
else:
azure = line["value"]
instance_id = azure["instanceId"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will and "azure" value always be an instance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if azure type, I believe there will be instanceid
but not all azure resources are correctly marked azure type

Copy link
Contributor

@ramonpetgrave64 ramonpetgrave64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do try to address the pylint comments.

@juju4
Copy link
Contributor Author

juju4 commented Nov 13, 2022

I also have a mypy error that I'm not sure how to address

cartography/intel/rapid7/util.py:171: error: Unsupported target for indexed assignment ("Tuple[str, str, str, bool]")  [index]

cartography/intel/rapid7/util.py Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
cartography/intel/rapid7/util.py Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
@ramonpetgrave64
Copy link
Contributor

I was on vacation for the past month, but I'll be taking another look at this PR again next week.

cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
Even w 48GB RAM, only up to 13k...
"""

if authorization[2] and authorization[4]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're checking for positions in this 6-tuple, It's better to convert the authorization to a class or named tuple
https://docs.python.org/3/library/collections.html?highlight=namedtuple#collections.namedtuple

cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
nodes = neo4j_session.run(
"""
MATCH (n:Rapid7Host)
RETURN n.id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to add one more test to check for the relationship to the Azure hosts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did a quick attempt. testing needed

|-------|--------------|
| firstseen| Timestamp of when a sync job first discovered this node |
| lastupdated | Timestamp of the last time the node was updated |
| r7_id | id |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know this is a Rapid7Host node, there's no need to prefix each of the properties with "r7_".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, it depends on each product which may have different output.
Typically for IP/MAC address, if multiple cards, there is no certainty that each product will pick the same. same for os product, version... no shared taxonomy sadly.

if still want to remove prefix, please say if everywhere or only part of fields:

size_interval = 500
result_array: List[Any] = []
df_rapid7_tmp = pandas.DataFrame()
while count < limit:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have a limit? Is it because of API ratelimitting? If that's the case, consider using the @backoff decorator.
https://github.com/lyft/cartography/blob/master/cartography/util.py#L206

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No API rate-limiting AFAIK, more response body size and experience that it is more likely to fail if more depending on context. at least, not on doc https://help.rapid7.com/insightvm/en-us/api/index.html#operation/findAssets

Another reason to use the report option.

else:
nexpose_verify_cert2 = True

count = total_resources = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like count is unused.

cartography/intel/rapid7/util.py Show resolved Hide resolved
cartography/intel/rapid7/util.py Outdated Show resolved Hide resolved
@juju4
Copy link
Contributor Author

juju4 commented Jan 7, 2023

Currently, this can get data from

  • API getAssets
  • local report file
  • remote report file directly from vulnerability management server

I don't know if should keep the API one which has limitations mostly for large inventory.
After, there are probably some cases where it is better than report.
On my side, using remote report file now in a satisfying manner.

Mypy typing errors that I didn't solve (or was impacting functionalities), all related to extract_rapid7_configurations_* functions.
https://github.com/juju4/cartography/actions/runs/3862847617/jobs/6584667101#step:4:67

cartography/intel/rapid7/util.py:51: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]
cartography/intel/rapid7/util.py:80: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]
cartography/intel/rapid7/util.py:104: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]
cartography/intel/rapid7/util.py:128: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]
Found 4 errors in 1 file (checked 441 source files)

@juju4
Copy link
Contributor Author

juju4 commented Feb 25, 2023

is there something that I can help to move this forward?
outside of the points left above for which I would like inputs too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants