Implement different ASFF fixing mechanism and other improvements (#88)

* support-trivy-asff-output
* better-error-handling-no-credentials
* better-error-hanlding
* check-tags-in-resource
* linter
* parse-region-from-resource
* comment-function
* linter
* update-aws-arn
* readme-input-asff
* readme
* check-resource-type
* bump-aws-arn
* add-Container
gabrielsoltz · Mar 23, 2024 · 8509d24
1 parent 9015216
commit 8509d24

Showing 13 changed files with 297 additions and 111 deletions.
diff --git a/README.md b/README.md
@@ -64,13 +64,6 @@ Read your security findings from AWS Security Hub with the default filters and e
 ./metahub
 ```
 
-Read your security findings from Prowler as an input file and executes the default context options:
-
-```bash
-python3 prowler.py aws -M json-asff -q
-./metahub --inputs file-asff --input-asff /path/to/prowler-findings.json.asff
-```
-
 Read a specific (filtered by Id) security finding from AWS Security Hub and executes the default context options:
 
 ```bash
@@ -89,6 +82,31 @@ Read all the security findings affecting an AWS Account which are ACTIVE (filter
 ./metahub --sh-filters RecordState=ACTIVE AwsAccountId=123456789012 --mh-filters-tags Environment=stg --context config tags
 ```
 
+# Quick Run (Reading findings from an input ASFF file)
+
+Read your security findings from Prowler as an input file and executes the default context options:
+
+```bash
+python3 prowler.py aws -M json-asff -q
+./metahub --inputs file-asff --input-asff /path/to/prowler-findings.json.asff
+```
+
+Read your security findings from Powerpipe as an input file and executes the default context options:
+
+```bash
+powerpipe benchmark run aws_compliance.benchmark.all_controls --export asff
+./metahub --inputs file-asff --input-asff /path/to/powerpipe-findings.json.asff
+```
+
+Read your security findings from Trivy as an input file and executes the default context options:
+
+```bash
+export AWS_REGION=us-west-1
+export AWS_ACCOUNT_ID=317105492065
+trivy image --format template --template "@contrib/asff.tpl" -o trivy-findings.json.asff public.ecr.aws/n2p8q5p4/metahub:stable
+./metahub --inputs file-asff --input-asff /path/to/trivy-findings.json.asff
+```
+
 # Context
 
 In **MetaHub**, **context** refers to information about the affected resources like their **configuration**, **associations**, **logs**, **tags** and **account**.
@@ -985,22 +1003,28 @@ The minimum policy needed for context includes the managed policy `arn:aws:iam::
 
 # Inputs
 
-MetaHub can read security findings directly from AWS Security Hub using its API. If you don't use Security Hub, you can use any ASFF-based scanner. Most cloud security scanners support the ASFF format. Check with them or leave an issue if you need help.
+MetaHub can read security findings directly from AWS Security Hub using its API. If you don't use Security Hub, you can use any ASFF-compatible scanner. Most cloud security scanners, such as Prowler, Steampipe, and Trivy, support the ASFF format.
 
-If you want to read from an input ASFF file, you need to use the options:
+If you want to read from an input ASFF file, use the option `--inputs file-asff` and provide the path to the file with `--input-asff`. You can provide multiple files separated by spaces:
 
 ```sh
 ./metahub.py --inputs file-asff --input-asff path/to/the/file.json.asff path/to/the/file2.json.asff
 ```
 
-You also can combine AWS Security Hub findings with input ASFF files specifying both inputs:
+You can also combine AWS Security Hub findings with input ASFF files by specifying both inputs (`--inputs file-asff securityhub`). MetaHub processes all findings together and produces a single output:
 
 ```sh
 ./metahub.py --inputs file-asff securityhub --input-asff path/to/the/file.json.asff
 ```
 
 When using a file as input, you can't use the option `--sh-filters` to filter findings, as this option relies on the AWS API for filtering. You can't use the options `--update-findings` or `--enrich-findings` either, as those findings are not in AWS Security Hub. If you are reading from both sources at the same time, only the findings from AWS Security Hub will be updated.
 
+MetaHub also implements some **fixing mechanisms** for ASFF files that are not correctly formatted. This is a best-effort approach to make the findings as useful as possible. It is not perfect, and the underlying problem should still be fixed in the source scanner.
+
+- If the key `Region` is missing from both the Resources and the root level, MetaHub derives the region from the ARN of the affected resource.
+- If the ASFF file does not set the ASFF Resource Type correctly, MetaHub derives it from the ARN of the affected resource using the library [aws-arn](https://github.com/gabrielsoltz/aws-arn).
+- If any other field is missing, such as `SeverityLabel`, `Workflow`, `RecordState`, `Compliance`, `Id`, `ProductArn` or `StandardsControlArn`, MetaHub sets it to `Unknown`.
+
 # Outputs
 
 **MetaHub** can generate different programmatic and visual outputs. By default, all output modes are enabled: `json-short`, `json-full`, `json-statistics`, `json-inventory`, `html`, `csv`, and `xlsx`. If you want only to generate a specific output mode, you can use the option `--output-modes` with the desired output mode. The outputs will be saved in the `outputs/` folder with the execution date.
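
Note on the region fixing mechanism described in the README hunk above: the `parse_region` helper that `lib/context/context.py` imports from `lib.securityhub` is not part of this diff, so the following is only a minimal sketch of how such a fallback could work. The function names `region_from_arn` and `resolve_region`, and the exact fallback order, are assumptions made for illustration and may not match MetaHub's implementation.

```python
def region_from_arn(arn):
    """Return the region field of an ARN, or None when it is empty or malformed.

    An ARN has the shape arn:partition:service:region:account-id:resource.
    """
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        return None
    return parts[3] or None


def resolve_region(finding):
    """Hypothetical fallback: root-level Region, resource Region, then the ARN."""
    resources = finding.get("Resources") or [{}]
    resource = resources[0]
    return (
        finding.get("Region")
        or resource.get("Region")
        or region_from_arn(resource.get("Id", ""))
        or "Unknown"
    )
```

Global resources such as S3 buckets or IAM roles carry an empty region field in their ARN, so in this sketch they fall through to `Unknown` unless the finding itself provides a region.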

diff --git a/lib/AwsHelpers.py b/lib/AwsHelpers.py
@@ -115,18 +115,21 @@ def get_account_alias(logger, aws_account_number, role_name=None, profile=None):
 
 
 def get_boto3_client(logger, service, region, sess, profile=None):
-    if sess:
-        return sess.client(service_name=service, region_name=region)
-    if profile:
-        try:
-            return boto3.Session(profile_name=profile).client(
-                service_name=service, region_name=region
-            )
-        except ProfileNotFound as e:
-            logger.error(
-                "Error getting boto3 client using AWS profile (check --sh-profile): {}".format(
-                    e
+    try:
+        if sess:
+            return sess.client(service_name=service, region_name=region)
+        if profile:
+            try:
+                return boto3.Session(profile_name=profile).client(
+                    service_name=service, region_name=region
                 )
-            )
-            exit(1)
-    return boto3.client(service, region_name=region)
+            except ProfileNotFound as e:
+                logger.error(
+                    "Error getting boto3 client using AWS profile (check --sh-profile): {}".format(
+                        e
+                    )
+                )
+                exit(1)
+        return boto3.client(service, region_name=region)
+    except Exception as e:
+        logger.error("Error getting boto3 client: {}".format(e))
diff --git a/lib/context/context.py b/lib/context/context.py
@@ -6,8 +6,9 @@
 )
 
 import lib.context.resources
-from lib.AwsHelpers import assume_role, get_account_id, get_boto3_client
+from lib.AwsHelpers import assume_role, get_boto3_client
 from lib.config.resources import MetaHubResourcesConfig
+from lib.securityhub import parse_region
 
 
 class Context:
@@ -19,26 +20,34 @@ def __init__(
         mh_filters_tags,
         mh_role,
         cached_associated_resources,
+        current_account_id,
     ):
         self.logger = logger
         self.parse_finding(finding)
         self.get_session(mh_role)
         self.mh_filters_config = mh_filters_config
         self.mh_filters_tags = mh_filters_tags
         self.cached_associated_resources = cached_associated_resources
-        # Move to Config:
         self.drilled_down = True
+        self.current_account_id = current_account_id
+
+    def convert_tags_to_key_value(self, tags):
+        """Tags read from the finding are a dictionary of key-value pairs; convert them to a list of {"Key": ..., "Value": ...} dictionaries"""
+        return [{"Key": key, "Value": value} for key, value in tags.items()]
 
     def parse_finding(self, finding):
         self.finding = finding
         self.resource_account_id = finding["AwsAccountId"]
-        self.resource_type = finding["Resources"][0]["Type"]
-        self.resource_arn = finding["Resources"][0]["Id"]
-        try:
-            self.resource_region = finding["Region"]
-        except KeyError:
-            self.resource_region = finding["Resources"][0]["Region"]
-        self.current_account_id = get_account_id(self.logger)
+        self.resources = finding.get("Resources")
+        if self.resources:
+            self.resource_type = self.resources[0]["Type"]
+            self.resource_arn = self.resources[0]["Id"]
+            self.resource_tags = self.resources[0].get("Tags", False)
+        else:
+            self.resource_type = "Unknown"
+            self.resource_arn = "Unknown"
+            self.resource_tags = False
+        self.resource_region = parse_region(self.resource_arn, self.finding)
 
     def get_session(self, mh_role):
         if mh_role:
@@ -164,40 +173,43 @@ def get_context_tags(self):
         ):
             return resource_tags, resource_matched
 
-        # Execute Tags
-        tags = False
-        client = get_boto3_client(
-            self.logger, "resourcegroupstaggingapi", self.resource_region, self.sess
-        )
+        # Check if Tags are already available in the resource object
+        if not self.resource_tags:
+            tags = False
+            client = get_boto3_client(
+                self.logger, "resourcegroupstaggingapi", self.resource_region, self.sess
+            )
 
-        # Some tools sometimes return incorrect ARNs for some resources, here is an attemp to fix them
-        def fix_arn(arn, resource_type):
-            # Route53 Hosted Zone with Account Id
-            if resource_type == "AwsRoute53HostedZone":
-                if arn.split(":")[4] != "":
-                    fixed_arn = arn.replace(arn.split(":")[4], "")
-                    return fixed_arn
-            return arn
+            # Some tools sometimes return incorrect ARNs for some resources, here is an attempt to fix them
+            def fix_arn(arn, resource_type):
+                # Route53 Hosted Zone with Account Id
+                if resource_type == "AwsRoute53HostedZone":
+                    if arn.split(":")[4] != "":
+                        fixed_arn = arn.replace(arn.split(":")[4], "")
+                        return fixed_arn
+                return arn
 
-        try:
-            response = client.get_resources(
-                ResourceARNList=[fix_arn(self.resource_arn, self.resource_type)]
-            )
             try:
-                tags = response["ResourceTagMappingList"][0]["Tags"]
-            except IndexError:
-                self.logger.info(
-                    "No Tags found for resource: %s (%s)",
+                response = client.get_resources(
+                    ResourceARNList=[fix_arn(self.resource_arn, self.resource_type)]
+                )
+                try:
+                    tags = response["ResourceTagMappingList"][0]["Tags"]
+                except IndexError:
+                    self.logger.info(
+                        "No Tags found for resource: %s (%s)",
+                        self.resource_arn,
+                        self.resource_type,
+                    )
+            except (ClientError, ParamValidationError, Exception) as err:
+                self.logger.warning(
+                    "Error Fetching Tags for resource %s (%s) - %s",
                     self.resource_arn,
                     self.resource_type,
+                    err,
                 )
-        except (ClientError, ParamValidationError, Exception) as err:
-            self.logger.warning(
-                "Error Fetching Tags for resource %s (%s) - %s",
-                self.resource_arn,
-                self.resource_type,
-                err,
-            )
+        else:
+            tags = self.convert_tags_to_key_value(self.resource_tags)
 
         if tags:
             for tag in tags:
@@ -303,12 +315,20 @@ def get_account_organizations(self):
         except ClientError as err:
             organizations = False
             if not err.response["Error"]["Code"] == "AWSOrganizationsNotInUseException":
-                self.logger.warning(
+                self.logger.error(
                     "Failed to describe_organization: %s, for resource: %s - %s",
                     self.resource_account_id,
                     self.resource_arn,
                     err,
                 )
+        except Exception as err:
+            organizations = False
+            self.logger.error(
+                "Failed to describe_organization: %s, for resource: %s - %s",
+                self.resource_account_id,
+                self.resource_arn,
+                err,
+            )
         return organizations
 
     def get_account_organizations_details(self):
@@ -410,21 +430,21 @@ def get_account_alternate_contact(self, alternate_contact_type="SECURITY"):
                 alternate_contact = account_client.get_alternate_contact(
                     AlternateContactType=alternate_contact_type
                 ).get("AlternateContact")
-            except (NoCredentialsError, ClientError, EndpointConnectionError) as err:
+            except (ClientError, EndpointConnectionError) as err:
                 if err.response["Error"]["Code"] == "ResourceNotFoundException":
                     self.logger.info(
                         "No alternate contact found for account %s (%s) - %s",
                         self.resource_account_id,
                         self.resource_arn,
                         err,
                     )
-                else:
-                    self.logger.warning(
-                        "Failed to get_alternate_contact for account %s (%s) - %s",
-                        self.resource_account_id,
-                        self.resource_arn,
-                        err,
-                    )
+            except Exception as err:
+                self.logger.error(
+                    "Failed to get_alternate_contact for account %s (%s) - %s",
+                    self.resource_account_id,
+                    self.resource_arn,
+                    err,
+                )
         return alternate_contact
 
     def get_account_alias(self):
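
Note on the tag handling in `get_context_tags` above: when the finding already carries `Tags` on the resource, the API call is skipped and the tags are converted instead. ASFF stores resource tags as a flat mapping, while the Resource Groups Tagging API returns a list of `Key`/`Value` dictionaries, which is the shape the rest of the method iterates over. A standalone sketch of that conversion, with made-up sample tags:

```python
def convert_tags_to_key_value(tags):
    """Convert an ASFF-style tag mapping into a list of Key/Value dictionaries."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


# Sample input is illustrative only, not taken from a real finding.
asff_tags = {"Environment": "stg", "Owner": "security"}
converted = convert_tags_to_key_value(asff_tags)
```

An empty mapping converts to an empty list, which is falsy, so the later `if tags:` check treats it the same way as a resource with no tags.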

diff --git a/lib/context/resources/AwsIamGroup.py b/lib/context/resources/AwsIamGroup.py
@@ -1,6 +1,5 @@
 """ResourceType: AwsIamGroup"""
 
-
 from botocore.exceptions import ClientError
 
 from lib.AwsHelpers import get_boto3_client

diff --git a/lib/context/resources/AwsIamRole.py b/lib/context/resources/AwsIamRole.py
@@ -1,6 +1,5 @@
 """ResourceType: AwsIamRole"""
 
-
 from botocore.exceptions import ClientError
 
 from lib.AwsHelpers import get_boto3_client

diff --git a/lib/context/resources/AwsIamUser.py b/lib/context/resources/AwsIamUser.py
@@ -1,6 +1,5 @@
 """ResourceType: AwsIamUser"""
 
-
 from datetime import datetime, timezone
 
 from botocore.exceptions import ClientError

diff --git a/lib/context/resources/Container.py b/lib/context/resources/Container.py
@@ -0,0 +1,85 @@
+"""ResourceType: Container"""
+
+from botocore.exceptions import ClientError
+
+from lib.AwsHelpers import get_boto3_client
+from lib.context.resources.Base import ContextBase
+
+
+class Metacheck(ContextBase):
+    def __init__(
+        self,
+        logger,
+        finding,
+        mh_filters_config,
+        sess,
+        drilled=False,
+    ):
+        self.logger = logger
+        self.sess = sess
+        self.mh_filters_config = mh_filters_config
+        self.parse_finding(finding, drilled)
+        self.client = get_boto3_client(
+            self.logger, "ecr-public", self.region, self.sess
+        )
+        self.container = self.describe_container()
+        self.resource_policy = self.get_repository_policy()
+
+    def parse_finding(self, finding, drilled):
+        self.finding = finding
+        self.region = finding["Region"]
+        self.account = finding["AwsAccountId"]
+        self.partition = finding["Resources"][0]["Id"].split(":")[1]
+        self.resource_type = finding["Resources"][0]["Type"]
+        self.resource_arn = finding["Resources"][0]["Id"]
+        if finding["Resources"][0]["Id"].startswith("arn:aws"):
+            self.resource_id = finding["Resources"][0]["Id"].split(":")[-1]
+        elif finding["Resources"][0]["Id"].startswith("public.ecr.aws"):
+            self.resource_id = finding["Resources"][0]["Id"].split("/")[2].split(":")[0]
+
+    # Describe Functions
+    def describe_container(self):
+        try:
+            response = self.client.describe_repositories(
+                repositoryNames=[self.resource_id]
+            )
+            return response
+        except ClientError as err:
+            if not err.response["Error"]["Code"] == "ResourceNotFoundException":
+                self.logger.error(
+                    "Failed to describe_container {}, {}".format(self.resource_id, err)
+                )
+        return False
+
+    # Resource Policy
+
+    def get_repository_policy(self):
+        if self.container:
+            try:
+                response = self.client.get_repository_policy(
+                    repositoryName=self.resource_id
+                )
+                return response
+            except ClientError as err:
+                if (
+                    not err.response["Error"]["Code"]
+                    == "RepositoryPolicyNotFoundException"
+                ):
+                    self.logger.error(
+                        "Failed to get_repository_policy {}, {}".format(
+                            self.resource_id, err
+                        )
+                    )
+        return False
+
+    # Context Config
+
+    def associations(self):
+        associations = {}
+        return associations
+
+    def checks(self):
+        checks = {
+            "resource_policy": self.resource_policy,
+        }
+        return checks
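
Note on `parse_finding` in `Container.py`: `self.resource_id` is only assigned when the resource `Id` starts with `arn:aws` or `public.ecr.aws`, so any other identifier would leave the attribute unset before `describe_container` runs. The sketch below mirrors the two branches from the diff and adds an explicit fallback. The function name `repository_name` and the `None` return value are assumptions for illustration, not part of the commit.

```python
def repository_name(resource_id):
    """Extract a repository name from a container resource Id.

    Mirrors the two branches in Container.parse_finding and returns None
    (a hypothetical fallback) when the Id matches neither form.
    """
    if resource_id.startswith("arn:aws"):
        # Same as the diff: keep everything after the last colon.
        return resource_id.split(":")[-1]
    if resource_id.startswith("public.ecr.aws"):
        # Expected shape: public.ecr.aws/<registry-alias>/<repository>[:tag]
        parts = resource_id.split("/")
        if len(parts) < 3:
            return None
        return parts[2].split(":")[0]
    return None
```

With the Trivy example image from the README, `repository_name("public.ecr.aws/n2p8q5p4/metahub:stable")` yields `metahub`.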