Playbooks (solves #628) (#1123)
* Initialising work on Playbooks

* Setting up playbook_config.json

* Setting up playbook_config.json

* Debugging playbook dataclass file

* Debugging playbook serializer file

* Debugging playbook serializer file

* Debugging playbook serializer file

* Debugging core serializer file for playbooks

* Debugging core serializer file for playbooks

* Setting up test cases, urls and views for playbooks

* Fixing playbook tests

* Adding playbooks urls

* Fixing playbooks urls

* Debugging playbooks python_module

* Saving progress

* Analyze request update.

* Updated Job models

* Cleaning playbooks_manager

* Cleaning playbooks_manager

* Adding test playbook values

* Cleaning playbooks dataclass

* Updating playbook_config.json for testing

* Debugging controller file

* Debugging controller file

* Debugging controller file

* Debugging controller file

* Adding playbook specific responses

* Debugging backend

* Optimising job reports for playbooks

* Adding frontend scanning support for IntelOwl.

* Adding frontend scanning support for IntelOwl.

* Adding scan all support for IntelOwl

* Getting plugins page ready

* Setting up job results page

* Refactoring playbook scanform

* Fixing runtime_configuration bug

* Fixing job status update bug to set job status to running for Playbooks

* Fixing job status update bug for playbooks

* Fixing job status update bug for playbooks using chords

* populating analyzers_to_execute and connectors_to_execute in Playbook APIs

* Adding proper logging to Playbooks

* Taking care of conflicts

* Making it work after taking care of conflicts.

* Fixing scanform changes

* Fixing scanform changes

* Fixing the frontend after merging Playbooks branch

* Removing grouping of Playbooks for now

* Adding backend changes to support the additions of multiple observables

* Taking care of errors in adding Backend support

* Fixing dataclass errors after merge

* Fixing dataclass errors after merge [All parameters]

* Fixing the frontend after taking care of backend merge conflicts

* Fixing frontend bugs

* Acting on Flake8 suggestions

* Making requested changes (cleaning up the code mostly)

* Fixing circular imports issue

* Fixing circular import issues by creating a utility.py file

* Fixing circular import issues by creating a utility.py file and importing tasks inside the functions themselves.

* Fixing invalid arguments bug for filter_playbooks()

* Fixing cleaning data in from_dict() for PlaybookConfig

* Fixing cleaning data in from_dict() for PlaybookConfig [1] (pop error)

* Fixing cleaning data in from_dict() for PlaybookConfig [2] (dictionary iteration error)

* Fixing cleaning data in from_dict() for PlaybookConfig [3] (dictionary creation error)

* Fixing cleaning data in from_dict() for PlaybookConfig [4] (dictionary creation error)

* Returning appropriate response for Playbook endpoints

* Cleaning up API response for Playbook endpoints

* Fixing up the frontend to show jobIds and redirect accordingly.

* Fixing scanpage frontend and backend API bugs

* Fixing package.json format

* Fixing up the frontend to show jobIds and redirect accordingly [1]

* Fixing backend type errors

* Adding migrations

* Adding migrations

* Adding FREE_TO_USE_ANALYZERS Playbooks

* Adding Playbook tests.

* Adding Playbook tests and removing comments which were for me

* Adding playbook test cases

* Fixing frontend bugs

* Fixing frontend bugs [2]

* Fixing frontend bugs [3]

* Fixing frontend bugs [4]

* Fixing frontend bugs [5]

* Fixing frontend bugs where plugins other than Playbooks weren't loading

* Fixing import error for logging

* Removing utility.py and making all its functions classmethods/staticmethods of appropriate classes

* Fixing logging library's import error

* Fixing backend API bugs [1]

* fixing uuid import error

* Adding pre-commit suggested changes

* Fixing frontend bug where requests for files were sent to observable endpoint instead

* Fixing frontend bug where requests for files were sent to observable endpoint instead [1]

* Fixing frontend bug where requests for files were sent to observable endpoint instead [2]

* Fixing parent_playbook=null issue

* Fixing parent_playbook=null issue [1]

* Adding free to use playbooks with all free analyzers, Fixing supports'

* Fixing ObservableTypeWithFile inheritance errors

* Adding 'AllTypes' as an ENUM for choices in Playbooks

* Fixing inheritance errors in AllTypes

* Fixing issue where backend runs any observable for playbooks whether supported or not

* Fixing issue where backend runs any observable for playbooks whether supported or not [1]

* Adding linting

* Enabling multiple observable job results in playbook analyze scan result and adding CodeDoctor suggestions.

* Fixing migrations

* Untracking yarn.lock

* Adding test case for stack_analyzers and fixing AnalyzerActionViewSet perform_retry errors

* Adding test cases and fixing frontend bugs

* Adding linting

* Adding linting

* Fixing test cases

* Fixing test cases [1] and adding linting

* Fixing test cases [2] and adding linting

* Fixing test cases [3] and adding linting

* Fixing test cases [4] and adding linting

* Fixing test cases [5] and adding linting

* Fixing test cases [5] and adding linting

* Adding suggestions for the frontend.

* Adding suggestions for the frontend [1]

* Adding suggestions for the frontend [2]

* Reducing Description max length

* Reducing Description minWidth for Playbooks plugin page

* Reducing Description minWidth for Playbooks plugin page [1]

* Reducing Description minWidth for Playbooks plugin page [2]

* Adding the handling of analyzer/connector report numbers differently on the frontend when playbooks are run

* Fixing package.json changes

* Fixing test case breaking changes

* Fixing test case breaking changes [1]

* Fixing the number of analyzers, connectors and playbooks that show up on 'Plugins Executed'

* Adding analyzer/connector to playbook toggle through radio buttons

* Removing frontend comments

* Wrapping up frontend for Playbooks feature along with all known bugs 🎉

* Fixing pre-commit errors

* Disabling run_all

* Adding frontend support for disabling run_all for playbooks

* improving UX for playbooks

* Rewriting playbook serializers

* Fixing Serializers

* Fixing lint errors

* Fixing status code 500 for playbook APIs

* Fixing bug in playbook serializers that led to no analyzers/connectors being run

* Fixing 500 bugs in playbook run APIs

* Fixing start_playbook() related errors

* Fixing playbook file scan errors

* Fixing playbook file scan errors [1]

* Fixing playbook file scan errors [2]

* Fixing playbook file scan errors [3]

* Fixing playbook file scan errors [4]

* Fixing serializers and frontend

* Fixing serializers [Analyzers and connectors] and frontend

* Revert "Fixing serializers and frontend"

This reverts commit e7fc34b.

* Reverting

* Fixing serializers

* Fixing serializers

* Fixing serializers [1]

* Fixing serializers [2]

* Fixing serializers validation

* Fixing serializers validation for connectors

* Adding playbook documentation

* Fixing bug where error led to parent_playbook remaining empty

* Making it necessary for playbooks to be not empty

* Minor fix for the last commit

* Making parent_playbook nullable

* Adding new migrations and model changes

* Fixing multiple values for argument errors

* Adding support for playbooks and custom configs & fixing bugs

* Fixing response bugs

* Fixing response serializer bugs [1]

* Fixing tasks for playbooks

* Removing unnecessary warnings from showing up on the UI

* Adding warning changes for all serializers and optimising filter_connectors

* Fixing playbook related model values and optimising before_run() methods

* Fixing not null errors due to parent_playbook value

* Adding better logging in test cases

* Adding debugging logs for test cases

* Adding debugging logs for test cases [fixing linting]

* Adding debugging logs for test cases [1]

* Fixing connector checks during CI checks

* Fixing connector serializer

* Optimising connector support in playbooks

* Fixing CI related connector test case issues

* Fixing before_run function for files

* Fixing typo in controller function start_playbook()

* Adding changes for playbooks

* Adding test cases for playbooks

* Updating tests for playbooks

* Fixing tests

* Fixing auto-imports

* Fixing test_start_playbooks_observable

* Adding TEST_PLAYBOOKS for ci

* Adding debugging logs for playbook tests

* Handling exceptions in playbook serializers

* Fixing linting errors

* Moving playbooks up for a while

* Covering edge cases for playbooks

* Moving playbook test workflow up and covering edge cases for playbooks

* Debugging tests for playbooks

* Fixing playbook tests

* Making playbook tests for a single playbook

* Fixing bugs in playbook tests

* Fixing bugs in playbook tests [1]

* Fixing bugs in playbook tests [2]

* Adding documentation and playbook test case for files

* Fixing tests for playbook files

* Fixing tests for playbook files [adding querydict]

* Fixing bugs in tests for playbook files

* Removing failing integrations

* Removing failing integrations

* Removing useless f strings

* Removing analyzers which took too long from free playbook provided

* Pushing playbooks down in github workflows

* Fixing frontend warnings

* Bump django from 3.2.14 to 3.2.15 in /requirements (#1144)

Bumps [django](https://github.com/django/django) from 3.2.14 to 3.2.15.
- [Release notes](https://github.com/django/django/releases)
- [Commits](django/django@3.2.14...3.2.15)

---
updated-dependencies:
- dependency-name: django
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Adding PR suggested changes

* Adding instructions for contributors to add free analyzers in free analyzers playbooks

* Adding instructions for contributors to add free analyzers in free analyzers playbooks

* Letting analyzers fail in playbook tests

* Fixing linting

* fixing playbook tests

* fixing linting errors

* Squashing migrations together

* Adding instructions in PR templates

* adjusted migrations

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matteo Lodi <30625432+mlodic@users.noreply.github.com>
3 people committed Oct 10, 2022
1 parent 26c9e92 commit d4756f1
Showing 62 changed files with 2,631 additions and 467 deletions.
3 changes: 2 additions & 1 deletion .github/ISSUE_TEMPLATE/new_analyzer.md
@@ -14,7 +14,8 @@ assignees: ''
## Type of analyzer
**this can be observable, file, and docker**


## Why should we use it


## Possible implementation
## Possible implementation
1 change: 1 addition & 0 deletions .github/release_template.md
@@ -16,3 +16,4 @@ WARNING: The release will be live within an hour!
- [ ] Wait for [dockerHub](https://hub.docker.com/repository/docker/intelowlproject/intelowl) to finish the builds
- [ ] Merge the PR to the `master` branch. **Note:** Only use "Merge and commit" as the merge strategy and not "Squash and merge". Using "Squash and merge" makes history between branches misaligned.
- [ ] Remove the "wait" statement in the release description.
- [ ] If the analyzer is free, Please add it in the `FREE_TO_USE_ANALYZERS` playbook in `playbook_config.json`
4 changes: 4 additions & 0 deletions .github/workflows/pull_request_automation.yml
@@ -136,6 +136,10 @@ jobs:
run: |
docker/scripts/coverage_test.sh tests.analyzers_manager.test_file_scripts
- name: "Test: Playbooks Manager"
run: |
docker/scripts/coverage_test.sh tests.playbooks_manager.test_controller
- name: "Coverage: generate xml and transfer from docker container to host"
run: |
docker exec intelowl_uwsgi coverage combine
12 changes: 7 additions & 5 deletions api_app/analyzers_manager/classes.py
@@ -93,7 +93,9 @@ def _validate_result(self, result, level=0, max_recursion=190):
result = 9223372036854775807
return result

def before_run(self):
def before_run(self, *args, **kwargs):
parent_playbook = kwargs.get("parent_playbook", "")
self.add_parent_playbook(parent_playbook=parent_playbook)
self.report.update_status(status=self.report.Status.RUNNING)

def after_run(self):
@@ -136,8 +138,8 @@ def __post__init__(self):
self.observable_classification = self._job.observable_classification
return super(ObservableAnalyzer, self).__post__init__()

def before_run(self):
super().before_run()
def before_run(self, *args, **kwargs):
super().before_run(**kwargs)
logger.info(
f"STARTED analyzer: {self.__repr__()} -> "
f"Observable: {self.observable_name}."
@@ -184,8 +186,8 @@ def __post__init__(self):
self.file_mimetype = self._job.file_mimetype
return super(FileAnalyzer, self).__post__init__()

def before_run(self):
super().before_run()
def before_run(self, *args, **kwargs):
super().before_run(**kwargs)
logger.info(
f"STARTED analyzer: {self.__repr__()} -> "
f"File: ({self.filename}, md5: {self.md5})"
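The `before_run` changes above all follow one pattern: the base class grows a `*args, **kwargs` signature so the playbook name can be threaded through without breaking existing callers, and each subclass forwards `**kwargs` to `super()`. A minimal sketch of that pattern, using hypothetical stand-in classes (the real IntelOwl analyzers carry reports, jobs, and logging):

```python
class BaseAnalyzer:
    """Stand-in for the abstract analyzer base class."""

    def __init__(self):
        self.parent_playbook = ""
        self.status = "pending"

    def add_parent_playbook(self, parent_playbook=""):
        self.parent_playbook = parent_playbook

    def before_run(self, *args, **kwargs):
        # Accept parent_playbook if the caller supplies it; default to ""
        # so plain (non-playbook) runs keep working unchanged.
        self.add_parent_playbook(parent_playbook=kwargs.get("parent_playbook", ""))
        self.status = "running"


class ObservableAnalyzer(BaseAnalyzer):
    def before_run(self, *args, **kwargs):
        # Forward kwargs so the base class sees parent_playbook.
        super().before_run(**kwargs)


analyzer = ObservableAnalyzer()
analyzer.before_run(parent_playbook="FREE_TO_USE_ANALYZERS")
print(analyzer.parent_playbook)  # FREE_TO_USE_ANALYZERS
```

The same forwarding appears in both `ObservableAnalyzer` and `FileAnalyzer` in the diff, which is why only the base class needs to know what to do with the value.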
9 changes: 9 additions & 0 deletions api_app/analyzers_manager/constants.py
@@ -20,3 +20,12 @@ class ObservableTypes(models.TextChoices):
DOMAIN = "domain"
HASH = "hash"
GENERIC = "generic"


class AllTypes(models.TextChoices):
IP = "ip"
URL = "url"
DOMAIN = "domain"
HASH = "hash"
GENERIC = "generic"
FILE = "file"
112 changes: 13 additions & 99 deletions api_app/analyzers_manager/controller.py
@@ -4,16 +4,13 @@
import logging
from typing import Dict, List

from celery import chord, uuid
from django.conf import settings
from celery import chord
from django.utils.module_loading import import_string
from rest_framework.exceptions import ValidationError

from intel_owl import tasks
from intel_owl.celery import app as celery_app
from intel_owl.consts import DEFAULT_QUEUE

from ..exceptions import AlreadyFailedJobException
from ..helpers import get_now
from ..models import Job
from .classes import DockerBasedAnalyzer
from .dataclasses import AnalyzerConfig
@@ -27,115 +24,32 @@ def start_analyzers(
analyzers_to_execute: List[str],
runtime_configuration: Dict[str, Dict] = None,
) -> None:
from intel_owl import tasks

# we should not use mutable objects as default to avoid unexpected issues
if runtime_configuration is None:
runtime_configuration = {}

# to store the celery task signatures
task_signatures = []
cleaned_result = AnalyzerConfig.stack_analyzers(
job_id=job_id,
analyzers_to_execute=analyzers_to_execute,
runtime_configuration=runtime_configuration,
)

# get analyzer config
analyzer_dataclasses = AnalyzerConfig.all()

# get job
job = Job.objects.get(pk=job_id)
job.update_status(Job.Status.RUNNING) # set job status to running

# loop over and create task signatures
for a_name in analyzers_to_execute:
# get corresponding dataclass
config = analyzer_dataclasses[a_name]

# if disabled or unconfigured (this check is bypassed in STAGE_CI)
if not config.is_ready_to_use and not settings.STAGE_CI:
logger.info(f"skipping execution of analyzer {a_name}, job_id {job_id}")
continue

# get runtime_configuration if any specified for this analyzer
runtime_params = runtime_configuration.get(a_name, {})
# gen new task_id
task_id = uuid()
# construct arguments
args = [
job_id,
config.asdict(),
{"runtime_configuration": runtime_params, "task_id": task_id},
]
# get celery queue
queue = config.config.queue
if queue not in settings.CELERY_QUEUES:
logger.warning(
f"Analyzer {a_name} has a wrong queue: {queue}."
f" Setting to `{DEFAULT_QUEUE}`"
)
queue = DEFAULT_QUEUE
# get soft_time_limit
soft_time_limit = config.config.soft_time_limit
# create task signature and add to list
task_signatures.append(
tasks.run_analyzer.signature(
args,
{},
queue=queue,
soft_time_limit=soft_time_limit,
task_id=task_id,
)
)
task_signatures = cleaned_result[0]

# fire the analyzers in a grouped celery task
# also link the callback to be executed
# canvas docs: https://docs.celeryproject.org/en/stable/userguide/canvas.html
runner = chord(task_signatures)
cb_signature = tasks.post_all_analyzers_finished.signature(
[job.pk, runtime_configuration], immutable=True
[job_id, runtime_configuration], immutable=True
)

runner(cb_signature)

return None


def job_cleanup(job: Job) -> None:
logger.info(f"[STARTING] job_cleanup for <-- {job.__repr__()}.")
status_to_set = job.Status.RUNNING

try:
if job.status == job.Status.FAILED:
raise AlreadyFailedJobException()

stats = job.get_analyzer_reports_stats()

logger.info(f"[REPORT] {job.__repr__()}, status:{job.status}, reports:{stats}")

if len(job.analyzers_to_execute) == stats["all"]:
if stats["running"] > 0 or stats["pending"] > 0:
status_to_set = job.Status.RUNNING
elif stats["success"] == stats["all"]:
status_to_set = job.Status.REPORTED_WITHOUT_FAILS
elif stats["failed"] == stats["all"]:
status_to_set = job.Status.FAILED
elif stats["failed"] >= 1 or stats["killed"] >= 1:
status_to_set = job.Status.REPORTED_WITH_FAILS
elif stats["killed"] == stats["all"]:
status_to_set = job.Status.KILLED

except AlreadyFailedJobException:
logger.error(
f"[REPORT] {job.__repr__()}, status: failed. Do not process the report"
)

except Exception as e:
logger.exception(f"job_id: {job.pk}, Error: {e}")
job.append_error(str(e), save=False)

finally:
if not (job.status == job.Status.FAILED and job.finished_analysis_time):
job.finished_analysis_time = get_now()
job.status = status_to_set
job.save(update_fields=["status", "errors", "finished_analysis_time"])


def set_failed_analyzer(
job_id: int, name: str, err_msg, **report_defaults
) -> AnalyzerReport:
@@ -153,7 +67,7 @@ def set_failed_analyzer(


def run_analyzer(
job_id: int, config_dict: dict, report_defaults: dict
job_id: int, config_dict: dict, report_defaults: dict, parent_playbook
) -> AnalyzerReport:
aconfig = AnalyzerConfig.from_dict(config_dict)
try:
@@ -164,7 +78,7 @@ def run_analyzer(
raise Exception(f"Class: {cls_path} couldn't be imported")
# else
instance = klass(config=aconfig, job_id=job_id, report_defaults=report_defaults)
report = instance.start()
report = instance.start(parent_playbook=parent_playbook)
except Exception as e:
report = set_failed_analyzer(job_id, aconfig.name, str(e), **report_defaults)

@@ -180,7 +94,7 @@ def post_all_analyzers_finished(job_id: int, runtime_configuration: dict) -> Non
# get job instance
job = Job.objects.get(pk=job_id)
# execute some callbacks
job_cleanup(job)
job.job_cleanup()
# fire connectors when job finishes with success
# avoid re-triggering of connectors (case: recurring analyzer run)
if job.status == Job.Status.REPORTED_WITHOUT_FAILS and (
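The `job_cleanup` function deleted from the controller above (and moved onto the model as `job.job_cleanup()`) decides a job's final status from its per-analyzer report counts. The decision logic, extracted as a plain function with hypothetical string statuses instead of the Django model's `Job.Status` choices:

```python
def resolve_status(expected_reports, stats):
    """Pick a final job status from per-analyzer report counts.

    Mirrors the cascade in the removed job_cleanup: stay 'running' until
    every analyzer has produced a report, then classify by outcome.
    """
    if expected_reports != stats["all"]:
        return "running"  # some analyzers have not reported yet
    if stats["running"] > 0 or stats["pending"] > 0:
        return "running"
    if stats["success"] == stats["all"]:
        return "reported_without_fails"
    if stats["failed"] == stats["all"]:
        return "failed"
    if stats["failed"] >= 1 or stats["killed"] >= 1:
        return "reported_with_fails"
    if stats["killed"] == stats["all"]:
        return "killed"
    return "running"


print(resolve_status(3, {"all": 3, "running": 0, "pending": 0,
                         "success": 2, "failed": 1, "killed": 0}))
# reported_with_fails
```

Note the branch order is kept faithful to the original: a partial failure is checked before the all-killed case, so a mixed failed/killed job reports as `reported_with_fails`.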
88 changes: 87 additions & 1 deletion api_app/analyzers_manager/dataclasses.py
@@ -1,9 +1,16 @@
# This file is a part of IntelOwl https://github.com/intelowlproject/IntelOwl
# See the file 'LICENSE' for copying permission.
import dataclasses
import logging
import typing

from celery import uuid
from celery.canvas import Signature
from django.conf import settings

from api_app.core.dataclasses import AbstractConfig
from api_app.models import Job
from intel_owl.consts import DEFAULT_QUEUE

from .constants import HashChoices, TypeChoices
from .serializers import AnalyzerConfigSerializer
@@ -12,6 +19,8 @@
"AnalyzerConfig",
]

logger = logging.getLogger(__name__)


@dataclasses.dataclass
class AnalyzerConfig(AbstractConfig):
@@ -31,7 +40,6 @@ class AnalyzerConfig(AbstractConfig):
run_hash_type: typing.Literal["md5", "sha256"] = HashChoices.MD5

# utils

@property
def is_type_observable(self) -> bool:
return self.type == TypeChoices.OBSERVABLE
@@ -104,3 +112,81 @@ def all(cls) -> typing.Dict[str, "AnalyzerConfig"]:
def filter(cls, names: typing.List[str]) -> typing.Dict[str, "AnalyzerConfig"]:
all_analyzer_config = cls.all()
return {name: ac for name, ac in all_analyzer_config.items() if name in names}

@staticmethod
def runnable_analyzers(analyzers_to_execute: typing.List[str]) -> typing.List[str]:
analyzer_dataclass = AnalyzerConfig.all()
return [
analyzer
for analyzer in analyzers_to_execute
if analyzer_dataclass.get(analyzer)
]

@classmethod
def stack_analyzers(
cls,
job_id: int,
analyzers_to_execute: typing.List[str],
runtime_configuration: typing.Dict[str, typing.Dict] = None,
parent_playbook="",
) -> typing.Tuple[typing.List[Signature], typing.List[str]]:
from intel_owl import tasks

# to store the celery task signatures
task_signatures = []
analyzers_used = []

analyzers_to_run = cls.runnable_analyzers(
analyzers_to_execute=analyzers_to_execute
)

analyzer_dataclasses = cls.all()

# get job
job = Job.objects.get(pk=job_id)
job.update_status(Job.Status.RUNNING) # set job status to running

# loop over and create task signatures
for a_name in analyzers_to_run:
# get corresponding dataclass
config = analyzer_dataclasses.get(a_name, None)

# if disabled or unconfigured (this check is bypassed in STAGE_CI)
if not config.is_ready_to_use and not settings.STAGE_CI:
logger.info(f"skipping execution of analyzer {a_name}, job_id {job_id}")
continue

# get runtime_configuration if any specified for this analyzer
runtime_params = runtime_configuration.get(a_name, {})
# gen new task_id
task_id = uuid()
# construct arguments
args = [
job_id,
config.asdict(),
{"runtime_configuration": runtime_params, "task_id": task_id},
parent_playbook,
]
# get celery queue
queue = config.config.queue
if queue not in settings.CELERY_QUEUES:
logger.warning(
f"Analyzer {a_name} has a wrong queue."
f" Setting to `{DEFAULT_QUEUE}`"
)
queue = DEFAULT_QUEUE
# get soft_time_limit
soft_time_limit = config.config.soft_time_limit
# create task signature and add to list
task_signatures.append(
tasks.run_analyzer.signature(
args,
{},
queue=queue,
soft_time_limit=soft_time_limit,
task_id=task_id,
)
)
analyzers_used.append(a_name)

return task_signatures, analyzers_used
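The filter-and-collect shape of `stack_analyzers` above (skip unknown or not-ready analyzers, build a signature per survivor, and return both the signatures and the names actually used) can be sketched without the Celery and Django pieces — hypothetical configs stand in for `AnalyzerConfig.all()`, and a tuple stands in for a task signature:

```python
# Hypothetical analyzer registry; the real one comes from AnalyzerConfig.all().
CONFIGS = {
    "Classic_DNS": {"ready": True, "queue": "default"},
    "Unconfigured": {"ready": False, "queue": "default"},
}


def stack(requested, parent_playbook=""):
    """Return (signatures, names_used) for the runnable subset of `requested`."""
    signatures, used = [], []
    for name in requested:
        config = CONFIGS.get(name)
        if config is None or not config["ready"]:
            continue  # skip unknown / unconfigured analyzers, as the diff does
        # Stand-in for tasks.run_analyzer.signature(...); note parent_playbook
        # rides along in the task arguments.
        signatures.append((name, config["queue"], parent_playbook))
        used.append(name)
    return signatures, used


sigs, used = stack(["Classic_DNS", "Unconfigured", "Missing"],
                   parent_playbook="FREE_TO_USE_ANALYZERS")
print(used)  # ['Classic_DNS']
```

Returning `analyzers_used` alongside the signatures is what lets the playbook APIs populate `analyzers_to_execute` with only the analyzers that actually ran, per the commit log above.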
@@ -0,0 +1,18 @@
# Generated by Django 3.2.14 on 2022-10-07 18:52

from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
("analyzers_manager", "0001_initial"),
]

operations = [
migrations.AddField(
model_name="analyzerreport",
name="parent_playbook",
field=models.CharField(blank=True, default="", max_length=128),
),
]
1 change: 1 addition & 0 deletions api_app/analyzers_manager/serializers.py
@@ -29,6 +29,7 @@ class Meta:
"end_time",
"runtime_configuration",
"type",
"parent_playbook",
)

