Playbooks (solves #628) (#1123)
* Initialising work on Playbooks

* Setting up playbook_config.json

* Setting up playbook_config.json

* Debugging playbook dataclass file

* Debugging playbook serializer file

* Debugging playbook serializer file

* Debugging playbook serializer file

* Debugging core serializer file for playbooks

* Debugging core serializer file for playbooks

* Setting up test cases, urls and views for playbooks

* Fixing playbook tests

* Adding playbooks urls

* Fixing playbooks urls

* Debugging playbooks python_module

* Saving progress

* Analyze request update.

* Updated Job models

* Cleaning playbooks_manager

* Cleaning playbooks_manager

* Adding test playbook values

* Cleaning playbooks dataclass

* Updating playbook_config.json for testing

* Debugging controller file

* Debugging controller file

* Debugging controller file

* Debugging controller file

* Adding playbook specific responses

* Debugging backend

* Optimising job reports for playbooks

* Adding frontend scanning support for IntelOwl.

* Adding frontend scanning support for IntelOwl.

* Adding scan all support for IntelOwl

* Getting plugins page ready

* Setting up job results page

* Refactoring playbook scanform

* Fixing runtime_configuration bug

* Fixing job status update bug to set job status to running for Playbooks

* Fixing job status update bug for playbooks

* Fixing job status update bug for playbooks using chords

* populating analyzers_to_execute and connectors_to_execute in Playbook APIs

* Adding proper logging to Playbooks

* Taking care of conflicts

* Making it work after taking care of conflicts.

* Fixing scanform changes

* Fixing scanform changes

* Fixing the frontend after merging Playbooks branch

* Removing grouping of Playbooks for now

* Adding backend changes to support the additions of multiple observables

* Taking care of errors in adding Backend support

* Fixing dataclass errors after merge

* Fixing dataclass errors after merge [All parameters]

* Fixing the frontend after taking care of backend merge conflicts

* Fixing frontend bugs

* Acting on Flake8 suggestions

* Making requested changes (cleaning up the code mostly)

* Fixing circular imports issue

* Fixing circular import issues by creating a utility.py file

* Fixing circular import issues by creating a utility.py file and importing tasks inside the functions themselves.

* Fixing invalid arguments bug for filter_playbooks()

* Fixing cleaning data in from_dict() for PlaybookConfig

* Fixing cleaning data in from_dict() for PlaybookConfig [1] (pop error)

* Fixing cleaning data in from_dict() for PlaybookConfig [2] (dictionary iteration error)

* Fixing cleaning data in from_dict() for PlaybookConfig [3] (dictionary creation error)

* Fixing cleaning data in from_dict() for PlaybookConfig [4] (dictionary creation error)

* Returning appropriate response for Playbook endpoints

* Cleaning up API response for Playbook endpoints

* Fixing up the frontend to show jobIds and redirect accordingly.

* Fixing scanpage frontend and backend API bugs

* Fixing package.json format

* Fixing up the frontend to show jobIds and redirect accordingly [1]

* Fixing backend type errors

* Adding migrations

* Adding migrations

* Adding FREE_TO_USE_ANALYZERS Playbooks

* Adding Playbook tests.

* Adding Playbook tests and removing comments which were for me

* Adding playbook test cases

* Fixing frontend bugs

* Fixing frontend bugs [2]

* Fixing frontend bugs [3]

* Fixing frontend bugs [4]

* Fixing frontend bugs [5]

* Fixing frontend bugs where plugins other than Playbooks weren't loading

* Fixing import error for logging

* Removing utility.py and making all its functions classmethods/staticmethods of appropriate classes

* Fixing logging library's import error

* Fixing backend API bugs [1]

* fixing uuid import error

* Adding pre-commit suggested changes

* Fixing frontend bug where requests for files were sent to observable endpoint instead

* Fixing frontend bug where requests for files were sent to observable endpoint instead [1]

* Fixing frontend bug where requests for files were sent to observable endpoint instead [2]

* Fixing parent_playbook=null issue

* Fixing parent_playbook=null issue [1]

* Adding free to use playbooks with all free analyzers, Fixing supports'

* Fixing ObservableTypeWithFile inheritance errors

* Adding 'AllTypes' as an ENUM for choices in Playbooks

* Fixing inheritance errors in AllTypes

* Fixing issue where backend runs any observable for playbooks whether supported or not

* Fixing issue where backend runs any observable for playbooks whether supported or not [1]

* Adding linting

* Enabling multiple observable job results in playbook analyze scan result and adding CodeDoctor suggestions.

* Fixing migrations

* Untracking yarn.lock

* Adding test case for stack_analyzers and fixing AnalyzerActionViewSet perform_retry errors

* Adding test cases and fixing frontend bugs

* Adding linting

* Adding linting

* Fixing test cases

* Fixing test cases [1] and adding linting

* Fixing test cases [2] and adding linting

* Fixing test cases [3] and adding linting

* Fixing test cases [4] and adding linting

* Fixing test cases [5] and adding linting

* Fixing test cases [5] and adding linting

* Adding suggestions for the frontend.

* Adding suggestions for the frontend [1]

* Adding suggestions for the frontend [2]

* Reducing Description max length

* Reducing Description minWidth for Playbooks plugin page

* Reducing Description minWidth for Playbooks plugin page [1]

* Reducing Description minWidth for Playbooks plugin page [2]

* Adding the handling of analyzer/connector report numbers differently on the frontend when playbooks are run

* Fixing package.json changes

* Fixing test case breaking changes

* Fixing test case breaking changes [1]

* Fixing the number of analyzers, connectors and playbooks that show up on 'Plugins Executed'

* Adding analyzer/connector to playbook toggle through radio buttons

* Removing frontend comments

* Wrapping up frontend for Playbooks feature along with all known bugs 🎉

* Fixing pre-commit errors

* Disabling run_all

* Adding frontend support for disabling run_all for playbooks

* improving UX for playbooks

* Rewriting playbook serializers

* Fixing Serializers

* Fixing lint errors

* Fixing status code 500 for playbook APIs

* Fixing bug in playbook serializers that led to no analyzers/connectors being run

* Fixing 500 bugs in playbook run APIs

* Fixing start_playbook() related errors

* Fixing playbook file scan errors

* Fixing playbook file scan errors [1]

* Fixing playbook file scan errors [2]

* Fixing playbook file scan errors [3]

* Fixing playbook file scan errors [4]

* Fixing serializers and frontend

* Fixing serializers [Analyzers and connectors] and frontend

* Revert "Fixing serializers and frontend"

This reverts commit e7fc34b.

* Reverting

* Fixing serializers

* Fixing serializers

* Fixing serializers [1]

* Fixing serializers [2]

* Fixing serializers validation

* Fixing serializers validation for connectors

* Adding playbook documentation

* Fixing bug where error led to parent_playbook remaining empty

* Making it necessary for playbooks to be not empty

* Minor fix for the last commit

* Making parent_playbook nullable

* Adding new migrations and model changes

* Fixing multiple values for argument errors

* Adding support for playbooks and custom configs & fixing bugs

* Fixing response bugs

* Fixing response serializer bugs [1]

* Fixing tasks for playbooks

* Removing unnecessary warnings from showing up on the UI

* Adding warning changes for all serializers and optimising filter_connectors

* Fixing playbook related model values and optimising before_run() methods

* Fixing not null errors due to parent_playbook value

* Adding better logging in test cases

* Adding debugging logs for test cases

* Adding debugging logs for test cases [fixing linting]

* Adding debugging logs for test cases [1]

* Fixing connector checks during CI checks

* Fixing connector serializer

* Optimising connector support in playbooks

* Fixing CI related connector test case issues

* Fixing before_run function for files

* Fixing typo in controller function start_playbook()

* Adding changes for playbooks

* Adding test cases for playbooks

* Updating tests for playbooks

* Fixing tests

* Fixing auto-imports

* Fixing test_start_playbooks_observable

* Adding TEST_PLAYBOOKS for ci

* Adding debugging logs for playbook tests

* Handling exceptions in playbook serializers

* Fixing linting errors

* Moving playbooks up for a while

* Covering edge cases for playbooks

* Moving playbook test workflow up and covering edge cases for playbooks

* Debugging tests for playbooks

* Fixing playbook tests

* Making playbook tests for a single playbook

* Fixing bugs in playbook tests

* Fixing bugs in playbook tests [1]

* Fixing bugs in playbook tests [2]

* Adding documentation and playbook test case for files

* Fixing tests for playbook files

* Fixing tests for playbook files [adding querydict]

* Fixing bugs in tests for playbook files

* Removing failing integrations

* Removing failing integrations

* Removing useless f strings

* Removing analyzers which took too long from free playbook provided

* Pushing playbooks down in github workflows

* Fixing frontend warnings

* Bump django from 3.2.14 to 3.2.15 in /requirements (#1144)

Bumps [django](https://github.com/django/django) from 3.2.14 to 3.2.15.
- [Release notes](https://github.com/django/django/releases)
- [Commits](django/django@3.2.14...3.2.15)

---
updated-dependencies:
- dependency-name: django
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Adding PR suggested changes

* Adding instructions for contributors to add free analyzers in free analyzers playbooks

* Adding instructions for contributors to add free analyzers in free analyzers playbooks

* Letting analyzers fail in playbook tests

* Fixing linting

* fixing playbook tests

* fixing linting errors

* Squashing migrations together

* Adding instructions in PR templates

* adjusted migrations

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matteo Lodi <30625432+mlodic@users.noreply.github.com>
3 people committed Oct 10, 2022
1 parent 26c9e92 commit d4756f1
Showing 62 changed files with 2,631 additions and 467 deletions.
3 changes: 2 additions & 1 deletion .github/ISSUE_TEMPLATE/new_analyzer.md
@@ -14,7 +14,8 @@ assignees: ''
## Type of analyzer
**this can be observable, file, and docker**


## Why should we use it


## Possible implementation
## Possible implementation
1 change: 1 addition & 0 deletions .github/release_template.md
@@ -16,3 +16,4 @@ WARNING: The release will be live within an hour!
- [ ] Wait for [dockerHub](https://hub.docker.com/repository/docker/intelowlproject/intelowl) to finish the builds
- [ ] Merge the PR to the `master` branch. **Note:** Only use "Merge and commit" as the merge strategy and not "Squash and merge". Using "Squash and merge" makes history between branches misaligned.
- [ ] Remove the "wait" statement in the release description.
- [ ] If the analyzer is free, Please add it in the `FREE_TO_USE_ANALYZERS` playbook in `playbook_config.json`
4 changes: 4 additions & 0 deletions .github/workflows/pull_request_automation.yml
@@ -136,6 +136,10 @@ jobs:
run: |
docker/scripts/coverage_test.sh tests.analyzers_manager.test_file_scripts
- name: "Test: Playbooks Manager"
run: |
docker/scripts/coverage_test.sh tests.playbooks_manager.test_controller
- name: "Coverage: generate xml and transfer from docker container to host"
run: |
docker exec intelowl_uwsgi coverage combine
12 changes: 7 additions & 5 deletions api_app/analyzers_manager/classes.py
@@ -93,7 +93,9 @@ def _validate_result(self, result, level=0, max_recursion=190):
result = 9223372036854775807
return result

def before_run(self):
def before_run(self, *args, **kwargs):
parent_playbook = kwargs.get("parent_playbook", "")
self.add_parent_playbook(parent_playbook=parent_playbook)
self.report.update_status(status=self.report.Status.RUNNING)

def after_run(self):
@@ -136,8 +138,8 @@ def __post__init__(self):
self.observable_classification = self._job.observable_classification
return super(ObservableAnalyzer, self).__post__init__()

def before_run(self):
super().before_run()
def before_run(self, *args, **kwargs):
super().before_run(**kwargs)
logger.info(
f"STARTED analyzer: {self.__repr__()} -> "
f"Observable: {self.observable_name}."
@@ -184,8 +186,8 @@ def __post__init__(self):
self.file_mimetype = self._job.file_mimetype
return super(FileAnalyzer, self).__post__init__()

def before_run(self):
super().before_run()
def before_run(self, *args, **kwargs):
super().before_run(**kwargs)
logger.info(
f"STARTED analyzer: {self.__repr__()} -> "
f"File: ({self.filename}, md5: {self.md5})"
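The `before_run` changes above all follow one pattern: the base class grows a `*args, **kwargs` signature so the playbook name can be threaded through without breaking existing callers, and each subclass forwards `**kwargs` to `super()`. A minimal sketch of that pattern, using hypothetical stand-in classes (the real IntelOwl analyzers carry reports, jobs, and logging):

```python
class BaseAnalyzer:
    """Stand-in for the abstract analyzer base class."""

    def __init__(self):
        self.parent_playbook = ""
        self.status = "pending"

    def add_parent_playbook(self, parent_playbook=""):
        self.parent_playbook = parent_playbook

    def before_run(self, *args, **kwargs):
        # Accept parent_playbook if the caller supplies it; default to ""
        # so plain (non-playbook) runs keep working unchanged.
        self.add_parent_playbook(parent_playbook=kwargs.get("parent_playbook", ""))
        self.status = "running"


class ObservableAnalyzer(BaseAnalyzer):
    def before_run(self, *args, **kwargs):
        # Forward kwargs so the base class sees parent_playbook.
        super().before_run(**kwargs)


analyzer = ObservableAnalyzer()
analyzer.before_run(parent_playbook="FREE_TO_USE_ANALYZERS")
print(analyzer.parent_playbook)  # FREE_TO_USE_ANALYZERS
```

The same forwarding appears in both `ObservableAnalyzer` and `FileAnalyzer` in the diff, which is why only the base class needs to know what to do with the value.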
9 changes: 9 additions & 0 deletions api_app/analyzers_manager/constants.py
@@ -20,3 +20,12 @@ class ObservableTypes(models.TextChoices):
DOMAIN = "domain"
HASH = "hash"
GENERIC = "generic"


class AllTypes(models.TextChoices):
IP = "ip"
URL = "url"
DOMAIN = "domain"
HASH = "hash"
GENERIC = "generic"
FILE = "file"
112 changes: 13 additions & 99 deletions api_app/analyzers_manager/controller.py
@@ -4,16 +4,13 @@
import logging
from typing import Dict, List

from celery import chord, uuid
from django.conf import settings
from celery import chord
from django.utils.module_loading import import_string
from rest_framework.exceptions import ValidationError

from intel_owl import tasks
from intel_owl.celery import app as celery_app
from intel_owl.consts import DEFAULT_QUEUE

from ..exceptions import AlreadyFailedJobException
from ..helpers import get_now
from ..models import Job
from .classes import DockerBasedAnalyzer
from .dataclasses import AnalyzerConfig
@@ -27,115 +24,32 @@ def start_analyzers(
analyzers_to_execute: List[str],
runtime_configuration: Dict[str, Dict] = None,
) -> None:
from intel_owl import tasks

# we should not use mutable objects as default to avoid unexpected issues
if runtime_configuration is None:
runtime_configuration = {}

# to store the celery task signatures
task_signatures = []
cleaned_result = AnalyzerConfig.stack_analyzers(
job_id=job_id,
analyzers_to_execute=analyzers_to_execute,
runtime_configuration=runtime_configuration,
)

# get analyzer config
analyzer_dataclasses = AnalyzerConfig.all()

# get job
job = Job.objects.get(pk=job_id)
job.update_status(Job.Status.RUNNING) # set job status to running

# loop over and create task signatures
for a_name in analyzers_to_execute:
# get corresponding dataclass
config = analyzer_dataclasses[a_name]

# if disabled or unconfigured (this check is bypassed in STAGE_CI)
if not config.is_ready_to_use and not settings.STAGE_CI:
logger.info(f"skipping execution of analyzer {a_name}, job_id {job_id}")
continue

# get runtime_configuration if any specified for this analyzer
runtime_params = runtime_configuration.get(a_name, {})
# gen new task_id
task_id = uuid()
# construct arguments
args = [
job_id,
config.asdict(),
{"runtime_configuration": runtime_params, "task_id": task_id},
]
# get celery queue
queue = config.config.queue
if queue not in settings.CELERY_QUEUES:
logger.warning(
f"Analyzer {a_name} has a wrong queue: {queue}."
f" Setting to `{DEFAULT_QUEUE}`"
)
queue = DEFAULT_QUEUE
# get soft_time_limit
soft_time_limit = config.config.soft_time_limit
# create task signature and add to list
task_signatures.append(
tasks.run_analyzer.signature(
args,
{},
queue=queue,
soft_time_limit=soft_time_limit,
task_id=task_id,
)
)
task_signatures = cleaned_result[0]

# fire the analyzers in a grouped celery task
# also link the callback to be executed
# canvas docs: https://docs.celeryproject.org/en/stable/userguide/canvas.html
runner = chord(task_signatures)
cb_signature = tasks.post_all_analyzers_finished.signature(
[job.pk, runtime_configuration], immutable=True
[job_id, runtime_configuration], immutable=True
)

runner(cb_signature)

return None


def job_cleanup(job: Job) -> None:
logger.info(f"[STARTING] job_cleanup for <-- {job.__repr__()}.")
status_to_set = job.Status.RUNNING

try:
if job.status == job.Status.FAILED:
raise AlreadyFailedJobException()

stats = job.get_analyzer_reports_stats()

logger.info(f"[REPORT] {job.__repr__()}, status:{job.status}, reports:{stats}")

if len(job.analyzers_to_execute) == stats["all"]:
if stats["running"] > 0 or stats["pending"] > 0:
status_to_set = job.Status.RUNNING
elif stats["success"] == stats["all"]:
status_to_set = job.Status.REPORTED_WITHOUT_FAILS
elif stats["failed"] == stats["all"]:
status_to_set = job.Status.FAILED
elif stats["failed"] >= 1 or stats["killed"] >= 1:
status_to_set = job.Status.REPORTED_WITH_FAILS
elif stats["killed"] == stats["all"]:
status_to_set = job.Status.KILLED

except AlreadyFailedJobException:
logger.error(
f"[REPORT] {job.__repr__()}, status: failed. Do not process the report"
)

except Exception as e:
logger.exception(f"job_id: {job.pk}, Error: {e}")
job.append_error(str(e), save=False)

finally:
if not (job.status == job.Status.FAILED and job.finished_analysis_time):
job.finished_analysis_time = get_now()
job.status = status_to_set
job.save(update_fields=["status", "errors", "finished_analysis_time"])


def set_failed_analyzer(
job_id: int, name: str, err_msg, **report_defaults
) -> AnalyzerReport:
@@ -153,7 +67,7 @@ def set_failed_analyzer(


def run_analyzer(
job_id: int, config_dict: dict, report_defaults: dict
job_id: int, config_dict: dict, report_defaults: dict, parent_playbook
) -> AnalyzerReport:
aconfig = AnalyzerConfig.from_dict(config_dict)
try:
@@ -164,7 +78,7 @@ def run_analyzer(
raise Exception(f"Class: {cls_path} couldn't be imported")
# else
instance = klass(config=aconfig, job_id=job_id, report_defaults=report_defaults)
report = instance.start()
report = instance.start(parent_playbook=parent_playbook)
except Exception as e:
report = set_failed_analyzer(job_id, aconfig.name, str(e), **report_defaults)

@@ -180,7 +94,7 @@ def post_all_analyzers_finished(job_id: int, runtime_configuration: dict) -> Non
# get job instance
job = Job.objects.get(pk=job_id)
# execute some callbacks
job_cleanup(job)
job.job_cleanup()
# fire connectors when job finishes with success
# avoid re-triggering of connectors (case: recurring analyzer run)
if job.status == Job.Status.REPORTED_WITHOUT_FAILS and (
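The `job_cleanup` function deleted from the controller above (and moved onto the model as `job.job_cleanup()`) decides a job's final status from its per-analyzer report counts. The decision logic, extracted as a plain function with hypothetical string statuses instead of the Django model's `Job.Status` choices:

```python
def resolve_status(expected_reports, stats):
    """Pick a final job status from per-analyzer report counts.

    Mirrors the cascade in the removed job_cleanup: stay 'running' until
    every analyzer has produced a report, then classify by outcome.
    """
    if expected_reports != stats["all"]:
        return "running"  # some analyzers have not reported yet
    if stats["running"] > 0 or stats["pending"] > 0:
        return "running"
    if stats["success"] == stats["all"]:
        return "reported_without_fails"
    if stats["failed"] == stats["all"]:
        return "failed"
    if stats["failed"] >= 1 or stats["killed"] >= 1:
        return "reported_with_fails"
    if stats["killed"] == stats["all"]:
        return "killed"
    return "running"


print(resolve_status(3, {"all": 3, "running": 0, "pending": 0,
                         "success": 2, "failed": 1, "killed": 0}))
# reported_with_fails
```

Note the branch order is kept faithful to the original: a partial failure is checked before the all-killed case, so a mixed failed/killed job reports as `reported_with_fails`.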
88 changes: 87 additions & 1 deletion api_app/analyzers_manager/dataclasses.py
@@ -1,9 +1,16 @@
# This file is a part of IntelOwl https://github.com/intelowlproject/IntelOwl
# See the file 'LICENSE' for copying permission.
import dataclasses
import logging
import typing

from celery import uuid
from celery.canvas import Signature
from django.conf import settings

from api_app.core.dataclasses import AbstractConfig
from api_app.models import Job
from intel_owl.consts import DEFAULT_QUEUE

from .constants import HashChoices, TypeChoices
from .serializers import AnalyzerConfigSerializer
@@ -12,6 +19,8 @@
"AnalyzerConfig",
]

logger = logging.getLogger(__name__)


@dataclasses.dataclass
class AnalyzerConfig(AbstractConfig):
@@ -31,7 +40,6 @@ class AnalyzerConfig(AbstractConfig):
run_hash_type: typing.Literal["md5", "sha256"] = HashChoices.MD5

# utils

@property
def is_type_observable(self) -> bool:
return self.type == TypeChoices.OBSERVABLE
@@ -104,3 +112,81 @@ def all(cls) -> typing.Dict[str, "AnalyzerConfig"]:
def filter(cls, names: typing.List[str]) -> typing.Dict[str, "AnalyzerConfig"]:
all_analyzer_config = cls.all()
return {name: ac for name, ac in all_analyzer_config.items() if name in names}

@staticmethod
def runnable_analyzers(analyzers_to_execute: typing.List[str]) -> typing.List[str]:
analyzer_dataclass = AnalyzerConfig.all()
return [
analyzer
for analyzer in analyzers_to_execute
if analyzer_dataclass.get(analyzer)
]

@classmethod
def stack_analyzers(
cls,
job_id: int,
analyzers_to_execute: typing.List[str],
runtime_configuration: typing.Dict[str, typing.Dict] = None,
parent_playbook="",
) -> typing.Tuple[typing.List[Signature], typing.List[str]]:
from intel_owl import tasks

# to store the celery task signatures
task_signatures = []
analyzers_used = []

analyzers_to_run = cls.runnable_analyzers(
analyzers_to_execute=analyzers_to_execute
)

analyzer_dataclasses = cls.all()

# get job
job = Job.objects.get(pk=job_id)
job.update_status(Job.Status.RUNNING) # set job status to running

# loop over and create task signatures
for a_name in analyzers_to_run:
# get corresponding dataclass
config = analyzer_dataclasses.get(a_name, None)

# if disabled or unconfigured (this check is bypassed in STAGE_CI)
if not config.is_ready_to_use and not settings.STAGE_CI:
logger.info(f"skipping execution of analyzer {a_name}, job_id {job_id}")
continue

# get runtime_configuration if any specified for this analyzer
runtime_params = runtime_configuration.get(a_name, {})
# gen new task_id
task_id = uuid()
# construct arguments
args = [
job_id,
config.asdict(),
{"runtime_configuration": runtime_params, "task_id": task_id},
parent_playbook,
]
# get celery queue
queue = config.config.queue
if queue not in settings.CELERY_QUEUES:
logger.warning(
f"Analyzer {a_name} has a wrong queue."
f" Setting to `{DEFAULT_QUEUE}`"
)
queue = DEFAULT_QUEUE
# get soft_time_limit
soft_time_limit = config.config.soft_time_limit
# create task signature and add to list
task_signatures.append(
tasks.run_analyzer.signature(
args,
{},
queue=queue,
soft_time_limit=soft_time_limit,
task_id=task_id,
)
)
analyzers_used.append(a_name)

return task_signatures, analyzers_used
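The filter-and-collect shape of `stack_analyzers` above (skip unknown or not-ready analyzers, build a signature per survivor, and return both the signatures and the names actually used) can be sketched without the Celery and Django pieces — hypothetical configs stand in for `AnalyzerConfig.all()`, and a tuple stands in for a task signature:

```python
# Hypothetical analyzer registry; the real one comes from AnalyzerConfig.all().
CONFIGS = {
    "Classic_DNS": {"ready": True, "queue": "default"},
    "Unconfigured": {"ready": False, "queue": "default"},
}


def stack(requested, parent_playbook=""):
    """Return (signatures, names_used) for the runnable subset of `requested`."""
    signatures, used = [], []
    for name in requested:
        config = CONFIGS.get(name)
        if config is None or not config["ready"]:
            continue  # skip unknown / unconfigured analyzers, as the diff does
        # Stand-in for tasks.run_analyzer.signature(...); note parent_playbook
        # rides along in the task arguments.
        signatures.append((name, config["queue"], parent_playbook))
        used.append(name)
    return signatures, used


sigs, used = stack(["Classic_DNS", "Unconfigured", "Missing"],
                   parent_playbook="FREE_TO_USE_ANALYZERS")
print(used)  # ['Classic_DNS']
```

Returning `analyzers_used` alongside the signatures is what lets the playbook APIs populate `analyzers_to_execute` with only the analyzers that actually ran, per the commit log above.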
@@ -0,0 +1,18 @@
# Generated by Django 3.2.14 on 2022-10-07 18:52

from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
("analyzers_manager", "0001_initial"),
]

operations = [
migrations.AddField(
model_name="analyzerreport",
name="parent_playbook",
field=models.CharField(blank=True, default="", max_length=128),
),
]
1 change: 1 addition & 0 deletions api_app/analyzers_manager/serializers.py
@@ -29,6 +29,7 @@ class Meta:
"end_time",
"runtime_configuration",
"type",
"parent_playbook",
)

