Update Metadata type in capa main #1502

Merged

Aayush-Goel-04
Contributor

Closes Issue #1411
Closes PR #1444

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

@Aayush-Goel-04
Contributor Author

Aayush-Goel-04 commented May 31, 2023

I have made some temporary changes here. I will add from_capa APIs to the other models and update the PR soon.
The IDA code also used meta as a dict, so I changed that too.

@Aayush-Goel-04 Aayush-Goel-04 changed the title from "Aayush goel 04/issue#1411" to "Update Metadata type in capa main" May 31, 2023
@Aayush-Goel-04
Contributor Author

capa/capa/main.py

Lines 243 to 255 in 8d016de

def find_capabilities(ruleset: RuleSet, extractor: FeatureExtractor, disable_progress=None) -> Tuple[MatchResults, Any]:
    all_function_matches = collections.defaultdict(list)  # type: MatchResults
    all_bb_matches = collections.defaultdict(list)  # type: MatchResults
    all_insn_matches = collections.defaultdict(list)  # type: MatchResults
    meta = {
        "feature_counts": {
            "file": 0,
            "functions": {},
        },
        "library_functions": {},
    }  # type: Dict[str, Any]

Should I change the feature_counts and library_functions dicts here to their respective models from ResultDocument? Then we wouldn't have to cast meta to a dict and then to Metadata, and it would be easier to apply any changes.
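
A minimal sketch of what this proposal could look like. The class and field names below (FeatureCounts, CapabilitiesMeta, collect_meta) are illustrative assumptions written with stdlib dataclasses; they are not the actual pydantic models in result_document, and the real find_capabilities does much more than this.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FeatureCounts:
    # hypothetical stand-in for the "feature_counts" sub-dict
    file: int = 0
    functions: Dict[int, int] = field(default_factory=dict)  # address -> count


@dataclass
class CapabilitiesMeta:
    # hypothetical container replacing the untyped Dict[str, Any]
    feature_counts: FeatureCounts = field(default_factory=FeatureCounts)
    library_functions: Dict[int, str] = field(default_factory=dict)  # address -> name


def collect_meta(function_counts: Dict[int, int], file_count: int) -> CapabilitiesMeta:
    # fill typed fields directly instead of indexing into nested dicts
    meta = CapabilitiesMeta()
    meta.feature_counts.file = file_count
    meta.feature_counts.functions.update(function_counts)
    return meta


meta = collect_meta({0x401000: 12, 0x401200: 3}, file_count=40)
```

With typed fields, a misspelled key becomes an attribute error at the point of use rather than a silently created dict entry.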

@williballenthin
Collaborator

would you mark this PR as a draft until you're ready for review?

@Aayush-Goel-04 Aayush-Goel-04 marked this pull request as draft May 31, 2023 10:26
@Aayush-Goel-04
Contributor Author

I looked into the meta types used in main. A direct partial update of the metadata with the outputs of find_capabilities and compute_layout is not possible, because frozen = True is set for FrozenModel types. So wherever we want to update the metadata, we have to do it via __dict__.update(data).

@williballenthin
Collaborator

williballenthin commented Jun 1, 2023 via email

capa/main.py Outdated
meta["analysis"].update(counts)
meta["analysis"]["layout"] = compute_layout(rules, extractor, capabilities)

meta.analysis.__dict__.update(counts)
Contributor Author

@Aayush-Goel-04 Aayush-Goel-04 Jun 1, 2023

I don’t think there’s a good reason for it to be frozen.

With frozen = True, we can't assign to attributes directly. For example:

meta.analysis.feature_counts = counts["feature_counts"]

With the changes in this PR, this raises:

File "pydantic/main.py", line 359, in pydantic.main.BaseModel.__setattr__
TypeError: "Analysis" is immutable and does not support item assignment

So it is updated via __dict__.update(counts) instead.
@williballenthin If frozen is removed, we won't need to update it via __dict__.
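
The behaviour described above can be reproduced without pydantic. The sketch below uses a frozen stdlib dataclass as a stand-in (an assumption made for illustration; the Analysis class here is a one-field placeholder, not the real model, and the exception type differs from pydantic's TypeError). It shows that direct assignment is rejected while a write through __dict__ slips past the check.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Analysis:
    # hypothetical, much-reduced stand-in for the real Analysis model
    feature_counts: int = 0


analysis = Analysis()

# Direct assignment is rejected on a frozen instance.
try:
    analysis.feature_counts = 5
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

# Writing into __dict__ skips __setattr__, so the update goes through...
analysis.__dict__.update({"feature_counts": 5})

# ...but so does a misspelled key, which no check will ever flag.
analysis.__dict__.update({"features_counts": 7})
```

The second update is the reason the __dict__ route is fragile: it bypasses whatever the model would otherwise enforce.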

Collaborator

let's remove frozen so that we can do the updates more easily

Contributor Author

Then we won't need from_capa functionality for most of the classes in result_document since we can easily update meta.

Contributor Author

@williballenthin if we are removing frozen, I think we should rename the FrozenModel class, since the name would be misleading. Can you suggest a new name?

Contributor Author

Also, can you look at the script capa_as_library.py? Since we now use Metadata as the meta format everywhere, should the script's dictionary output format be removed?

capa/main.py Outdated
Comment on lines 768 to 770
return rdoc.Metadata.from_capa(
    {
        "timestamp": datetime.datetime.now().isoformat(),
Collaborator

ideally we can just use the rdoc.Metadata constructor directly here.

Comment on lines 143 to 145
return capa.render.result_document.Metadata.from_capa(
    {
        "timestamp": datetime.datetime.now().isoformat(),
Collaborator

ideally we can just use the Metadata constructor directly here.
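
To illustrate the difference between the two routes, here is a rough sketch. The two-field Metadata class, its from_dict helper, and the "0.0.0" version string are hypothetical stand-ins built on stdlib dataclasses; the real Metadata model has many more fields, and from_dict only mimics the role that from_capa plays in the snippet above.

```python
import datetime
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Metadata:
    # hypothetical two-field stand-in for the real model
    timestamp: datetime.datetime
    version: str

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Metadata":
        # indirect route: build a dict, then parse it back into typed fields
        return cls(
            timestamp=datetime.datetime.fromisoformat(data["timestamp"]),
            version=data["version"],
        )


now = datetime.datetime(2023, 6, 1, 12, 0, 0)

# Indirect: serialize to strings, then convert back.
via_dict = Metadata.from_dict({"timestamp": now.isoformat(), "version": "0.0.0"})

# Direct: pass typed values straight to the constructor.
direct = Metadata(timestamp=now, version="0.0.0")
```

Both routes yield the same object here, but the direct call avoids the string round trip and lets a type checker see every field.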

@williballenthin
Collaborator

you're doing good work @Aayush-Goel-04, keep it up! thank you!

@Aayush-Goel-04 Aayush-Goel-04 marked this pull request as ready for review June 3, 2023 10:06
capa/main.py Outdated
@@ -1198,7 +1209,7 @@ def main(argv=None):
return E_FILE_LIMITATION

# TODO: #1411 use a real type, not a dict here.
Collaborator

Suggested change
# TODO: #1411 use a real type, not a dict here.

Comment on lines 25 to 27
class FrozenModel(BaseModel):
    class Config:
        frozen = True
Collaborator

let's not call this Frozen if it's not frozen.

for things that can still stay frozen, let's use the existing FrozenModel. but let's also introduce a Model class for things that aren't frozen.
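
The split suggested here could be sketched roughly as follows. This is a plain-Python stand-in written under the assumption that pydantic is not available; in the PR itself both bases would derive from pydantic's BaseModel, and the Metadata and Address subclasses and their fields are hypothetical examples.

```python
class Model:
    """Mutable base: attributes may be reassigned after construction."""

    def __init__(self, **fields):
        for name, value in fields.items():
            # object.__setattr__ lets subclasses that block assignment still be built
            object.__setattr__(self, name, value)


class FrozenModel(Model):
    """Immutable base: any assignment after construction raises TypeError."""

    def __setattr__(self, name, value):
        raise TypeError(f'"{type(self).__name__}" is immutable')


class Metadata(Model):  # hypothetical: updated as analysis progresses
    pass


class Address(FrozenModel):  # hypothetical: a value that never changes
    pass


meta = Metadata(flavor="static")
meta.flavor = "dynamic"  # allowed on the mutable base

addr = Address(value=0x401000)
try:
    addr.value = 0
    frozen_rejected = False
except TypeError:
    frozen_rejected = True
```

Keeping both bases means the class name tells the reader whether an instance can change after it is built.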

Collaborator

@williballenthin williballenthin left a comment

these changes look awesome! the only thing i'd like to see is the new Model class, then let's merge and get on to the next thing! thanks @Aayush-Goel-04!

Collaborator

@williballenthin williballenthin left a comment

lgtm
🎉

@williballenthin williballenthin merged commit 28629b3 into mandiant:master Jun 6, 2023
@Aayush-Goel-04
Contributor Author

Hey @williballenthin @mr-tz ,
I am really interested in malware analysis and security. Could you recommend some good issues to work on?

Thanks

@mr-tz
Collaborator

mr-tz commented Jun 9, 2023

Awesome, some issues that come to mind:

I think these may be a good fit for your interests and skill set! Happy to hear your thoughts on those or any other open issue. We really appreciate your interest and all the great contributions!
