
Playbooks (automated scripts) #628

Closed
Tracked by #680 ...
eshaan7 opened this issue Aug 11, 2021 · 14 comments
eshaan7 commented Aug 11, 2021

Description:
Playbooks could be a high-level abstraction over an IntelOwl analysis: pre-written playbooks that define a flow like:

 given an IoC -> run these particular analyzers -> run these particular connectors, etc.

For example, a user may want that, given an IP, the analysis fetches the netblocks -> the AS number -> BGP peering data, etc. Such a playbook could be written by the user or shipped directly with IntelOwl to automate that flow.

Implementation Idea:

  • A playbook could be another type of Plugin class.
  • A playbook would define a certain observable type along with the analyzers and connectors it executes.
  • The configuration could look like:
{
    "IP_BASIC_INFO": {
        "supports": ["ip"],
        "description": "Fetch basic info for an IP",
        "python_module": "ip_basic_info.IPBasicInfo",
        "analyzers": {
            "AbuseIPDB": null,
            "Shodan": {
                "include_honeyscore": true
            },
            "FireHol_IPList": null
        },
        "connectors": {
            "MISP": {
                "ssl_check": true
            },
            "OpenCTI": {
                "ssl_verify": true
            }
        }
     }
}
  • Such playbooks would be easy for anyone to write, and there could be great value in users sharing their playbooks publicly with others over Twitter, Slack, etc.
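
For illustration, the configuration above could be loaded into a simple dataclass, mirroring how analyzer/connector configs are handled. This is just a sketch; names like `PlaybookConfig` and `from_dict` are hypothetical, not IntelOwl's actual API.

```python
# Hypothetical sketch: loading the proposed playbook configuration into a
# dataclass. Class and method names are illustrative, not IntelOwl's API.
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PlaybookConfig:
    name: str
    supports: List[str]
    description: str
    python_module: str
    analyzers: Dict[str, Optional[dict]] = field(default_factory=dict)
    connectors: Dict[str, Optional[dict]] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, name: str, raw: dict) -> "PlaybookConfig":
        # raw is one entry of the playbook configuration JSON file
        return cls(name=name, **raw)


raw_config = json.loads("""
{
    "IP_BASIC_INFO": {
        "supports": ["ip"],
        "description": "Fetch basic info for an IP",
        "python_module": "ip_basic_info.IPBasicInfo",
        "analyzers": {"AbuseIPDB": null, "Shodan": {"include_honeyscore": true}},
        "connectors": {"MISP": {"ssl_check": true}}
    }
}
""")

playbooks = {n: PlaybookConfig.from_dict(n, c) for n, c in raw_config.items()}
print(playbooks["IP_BASIC_INFO"].supports)  # ['ip']
```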

I do think this could be a great addition to IntelOwl.

Nothing about the implementation is final, because this idea is still very fresh (we could even think of a name other than "playbook"), so I encourage everyone to discuss it in detail below.

mlodic commented Aug 16, 2021

Yeah, it does make sense, in particular because we now have so many analyzers that it is difficult for people to understand which ones to use or not; the first instinct is to just run all the analyzers. Also, experienced users could create custom playbooks to get a flow that they can easily repeat each time.

Plus, I would create 2 different levels:

  • pre-written analyzer/connector combos, exactly like the one explained above with the JSON code: this is useful to repeat the same kind of analysis over time, combine similar analyzers together, etc.
  • a combination of two or more of the previous ones to create an "investigation" (this is an upper level). The idea is to help people organize a workflow where playbooks are executed based on the results of the previous ones. For example: you could run the "ip basic info" playbook; based on the data that is found, the "investigation" would suggest running the "hash analysis playbook" on a hash found by an analyzer executed by the previous playbook, or the "domain analysis playbook" if a suspicious domain is found. Then the user would choose the next playbook.
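
The second level could be sketched roughly like this (purely illustrative Python, not IntelOwl code; all names are hypothetical): based on the observable types found in a playbook's reports, the "investigation" layer suggests follow-up playbooks for the user to choose from.

```python
# Illustrative sketch of the "investigation" idea: map observable types found
# in a playbook's results to suggested follow-up playbooks. Hypothetical names.
NEXT_PLAYBOOK_FOR_TYPE = {
    "hash": "hash_analysis_playbook",
    "domain": "domain_analysis_playbook",
    "ip": "ip_basic_info",
}


def suggest_next_playbooks(found_observables: list) -> list:
    """Given observables extracted from previous reports, suggest follow-ups."""
    suggestions = []
    for obs in found_observables:
        playbook = NEXT_PLAYBOOK_FOR_TYPE.get(obs["type"])
        if playbook and playbook not in suggestions:
            suggestions.append(playbook)
    return suggestions


# e.g. the "ip basic info" playbook surfaced a hash and a suspicious domain:
print(suggest_next_playbooks([
    {"type": "hash", "value": "44d88612fea8a8f36de82e1278abb02f"},
    {"type": "domain", "value": "evil.example.com"},
]))  # ['hash_analysis_playbook', 'domain_analysis_playbook']
```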

@eshaan7 eshaan7 added this to the v4.0 milestone Aug 31, 2021
@eshaan7 eshaan7 changed the title [Feature] Playbooks (automated scripts) Playbooks (automated scripts) Sep 1, 2021
@eshaan7 eshaan7 pinned this issue Sep 7, 2021
@mlodic mlodic mentioned this issue Jan 13, 2022
0x0elliot commented Feb 5, 2022

I have been thinking about this issue. The one problem I always get stuck on is how to let users parse the varied API responses safely.

1. How would we even let the users parse this data?

What I mean is:

One analyzer might return something like: {"necessary_data" : "important", "data_not_required_for_this_playbook" : "junk"}

To pass this on to the next analyzer, we would need to let people select this data.

How would that be implemented in the safest way possible, technically? I can't come up with much myself other than developing a light parsing "language" of sorts. Is there something already available like this that is safer?

2. The analyzer responses can be complicated:

There are analyzers that often respond with nested dictionaries and lists.

For example, something like {"data": [{"information_1": [...]}, {"information_2": [...]}]}

For these data structures, we might end up slightly limited: they might need a more advanced parsing language that allows looping as well, and fitting that into the workflow could get too complicated. To be fair, for the time being we can start with something simpler and just tackle the former problem.
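
For what it's worth, a tiny path-based selector, evaluated without eval(), is one safe option, and libraries like JMESPath or jsonpath-ng already implement much richer versions of this idea. A minimal sketch (hypothetical, not a concrete proposal):

```python
# Sketch of a minimal, safe "parsing language": dotted paths with integer
# indices for lists, evaluated without eval()/exec(). Purely illustrative.
from typing import Any


def extract(data: Any, path: str, default: Any = None) -> Any:
    """Follow a path like "data.0.information_1" through nested dicts/lists."""
    current = data
    for part in path.split("."):
        if isinstance(current, dict):
            if part not in current:
                return default
            current = current[part]
        elif isinstance(current, list):
            try:
                current = current[int(part)]
            except (ValueError, IndexError):
                return default
        else:
            return default
    return current


response = {"data": [{"information_1": [1, 2]}, {"information_2": [3]}]}
print(extract(response, "data.0.information_1"))          # [1, 2]
print(extract(response, "data.5.missing", default="n/a"))  # n/a
```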

TL;DR: where do we define the limits of this idea, and how do we even get our hands on a "parsing language" of sorts to get this done safely?

My assumption for these points is that we are treating these playbooks as YAML files that users can upload directly to IntelOwl (this made sense from @eshaan7's description; even something pluggable makes sense, really - it depends on how you folks would like to pursue this project further).

3. Where would we want to implement this?

Should these "playbooks" be a frontend feature or a backend one? All our APIs are nicely in place, so we could make the frontend map the flow with the analyzers and connectors and do the job, while saving the playbook script as an option that can be rerun in the future for that particular user only.

We could do the same but have all this work happen in the backend; this is doable as well. We will have to decide which option is more convenient w.r.t. the amount of potential work to be done, security, scalability, etc.

Another way is to make it one file that the admin can plug into their backend, but this suggestion doesn't seem all that convenient.

cc: @mlodic

mlodic commented Feb 7, 2022

I really love your participation.

First, to be clearer: in my comment I suggested 2 levels ("playbooks" being the first and "investigations" the second), and we also split this issue into #680. So, in some way, I think we can tackle one issue at a time: first the "easier" playbooks, then the investigations.

The "investigation" feature is the most complicated one, but maybe also the most important for the future of the project.
This means it needs serious planning and design before we start implementing anything. So I think we should just start with the playbooks.

Then, regarding your considerations:

  • 1 and 2 can be considered after a first playbook version is implemented, because they address a problem specific to the investigations
  • 3: surely on the backend too. We would want playbooks to also be saved, shared and replicated.

@0x0elliot

Thank you for the clarification, Matteo! I think I confused myself between what a "Playbook" is here and what an "Investigation" is.

So basically,

  • "Playbooks" are just supposed to be a collection of similar analyzers and connectors to be run. This seems simpler to implement.

  • "Investigations" are supposed to run these playbooks one by one in a sequence, not only to compile and capture data as we go (this also seems like a great addition if it wasn't specifically thought of before; we could use a tree-like/web-like structure to represent this information, kind of like you can in Maltego), but also to pass specific kinds of data obtained from the last analyzer on to the next one.

mlodic commented Feb 9, 2022

exactly!

eshaan7 commented Feb 10, 2022

@0x0elliot - I like how well you articulated the potential issues/considerations that have to be taken care of as we implement this feature. These are issues I had already thought of, but they were at the back of my mind and I did not get time to explain them publicly here. You did a great job, so kudos for that.

Should we focus these "playbooks" to be a frontend feature or a backend one

Backend.

A playbook_configuration.json with its own set of dataclass, serializer and view classes. Basically, a playbook would just be a new type of "plugin". We will create new API endpoints:

  • GET /api/playbook_configuration - serialize the JSON config file and send it in the response (the implementation should be very similar to that of the analyzer and connector configuration files)
  • POST /api/playbook_configuration - allow the user to save/create new playbooks from the GUI (related to Allow editing plugin params from the GUI #433)
  • POST /api/execute_playbook - accepts observable_name, file, playbook in POST fields and executes the playbook for the given observable/file.

My assumption for these points is that we are treating these playbooks as YAML files that users can upload directly on IntelOwl

The playbooks will be another type of "plugin", which means each playbook configuration will be stored in a JSON file and will have a corresponding Python class (referenced in python_module).
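
As a rough sketch of what the execute_playbook endpoint could do (the endpoint and field names come from the bullets above; everything else is hypothetical and framework-free for brevity):

```python
# Hypothetical, framework-free sketch of POST /api/execute_playbook validation
# and dispatch. Field names (observable_name, file, playbook) come from the
# proposal above; the function body is illustrative, not IntelOwl code.
def execute_playbook(post_data: dict, configs: dict) -> dict:
    playbook_name = post_data.get("playbook")
    observable = post_data.get("observable_name") or post_data.get("file")
    if not playbook_name or not observable:
        return {"status": 400,
                "errors": "playbook and observable_name/file are required"}
    config = configs.get(playbook_name)
    if config is None:
        return {"status": 404, "errors": f"unknown playbook {playbook_name!r}"}
    # in a real implementation this would create a Job and fire async tasks
    return {
        "status": 200,
        "analyzers_running": list(config["analyzers"]),
        "connectors_running": list(config["connectors"]),
    }


configs = {"IP_BASIC_INFO": {"analyzers": {"AbuseIPDB": None},
                             "connectors": {"MISP": None}}}
print(execute_playbook(
    {"playbook": "IP_BASIC_INFO", "observable_name": "8.8.8.8"}, configs))
```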

@0x0elliot

Thanks for clearing things up further! I appreciate it.

0x0elliot commented Feb 19, 2022

Continuing this thread, I would like to clear up two more things, @mlodic and @eshaan7:

The presence of a python_module

I feel like a proper python_module, especially for "playbooks", isn't required - at least from what I understand. I am aware that the serializers we have defined expect python_module by default, because they were dedicated to handling Analyzers and Connectors properly.

One function/class that executes the analyzers and connectors in the right order should be enough in the main API view file.

POST /api/playbook_configuration - allow user to save/create new playbooks from the GUI.

If we choose to include python_module, then this would also be tough to manage. This is assuming that python_module would contain the logic that executes the analyzers/connectors in order. There are a couple more problems to discuss as well:

  • Would this function just append to the playbook_configuration.json file? This does feel a bit hacky, but I see it as something that won't be too hard to manage. There might be a better solution to this which can come out of Allow editing plugin params from the GUI #433

However, we could do something else and instead save the configurations directly to the Postgres database, giving the user a way to download their configured playbooks. Then, upon the creation of a new instance, the user would be able to simply upload these via the frontend and it would create the playbooks on their Postgres instance. Or, to begin with, we can at least make the backend able to do this and leave the rest for the future.

But I think you folks already tried this out with #433 and had a rough time, so we can just leave it for the future until #433 is solved smoothly.

What do you folks say?

mlodic commented Feb 21, 2022

One function/Class which accordingly executes the analyzers and connectors in the right order should be enough in the main API view class.

I have the same doubts as @0x0elliot. Maybe @eshaan7 could clarify them for us.

Would this function just append to the playbook_configuration.json file?

imho, definitely too hacky.

I also said something about this in #433. I think that, generally, we should start to move away from JSON configuration files as much as possible. In my opinion, it would make sense to have the playbooks managed regularly in the Postgres DB. The user would choose which ones to create, keep and maintain; we would just provide the tools to do that.

0x0elliot commented Feb 22, 2022

I agree with Matteo here. In my opinion, we should write new serializers for playbooks separately and avoid using a JSON config file for the time being. We should start by simply letting people either:

  • manually select from the frontend the analyzers and connectors a playbook has, or
  • upload a JSON file that does it for them (one JSON file per playbook).

In either case, we can let the frontend take care of the parsing and send the data in the appropriate format to the backend, where it would be stored in the database via /api/playbook_configuration [POST for creating new, PATCH for updating existing].

My only (solvable) concern with this approach is what the most appropriate structure for the database models would look like - that is, how we would map the playbooks table to the appropriate connectors and analyzers (and their configurations). I wanted to avoid json or jsonb fields, but given our current setup it seems we would be using them here and there to make this work out. Still, we can at least split the playbook configurations and playbooks into different tables.
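
One possible shape for the split-table idea, sketched with stdlib sqlite3 instead of the actual Django models (all table and column names are hypothetical): playbooks in one table, their per-plugin configurations in another, with JSON kept only for the plugin-specific parameter blobs.

```python
# Illustrative schema sketch (stdlib sqlite3, NOT the real IntelOwl models):
# playbooks and their per-plugin configurations live in separate tables.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE playbook (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    description TEXT,
    supports TEXT NOT NULL          -- JSON list of observable types
);
CREATE TABLE playbook_plugin (
    id INTEGER PRIMARY KEY,
    playbook_id INTEGER NOT NULL REFERENCES playbook(id),
    plugin_type TEXT CHECK (plugin_type IN ('analyzer', 'connector')),
    plugin_name TEXT NOT NULL,
    config TEXT                     -- JSON blob of plugin-specific params
);
""")
conn.execute(
    "INSERT INTO playbook (name, description, supports) VALUES (?, ?, ?)",
    ("IP_BASIC_INFO", "Fetch basic info for an IP", json.dumps(["ip"])),
)
pb_id = conn.execute(
    "SELECT id FROM playbook WHERE name = 'IP_BASIC_INFO'").fetchone()[0]
conn.execute(
    "INSERT INTO playbook_plugin (playbook_id, plugin_type, plugin_name, config)"
    " VALUES (?, ?, ?, ?)",
    (pb_id, "analyzer", "Shodan", json.dumps({"include_honeyscore": True})),
)
row = conn.execute(
    "SELECT plugin_name, config FROM playbook_plugin WHERE playbook_id = ?",
    (pb_id,),
).fetchone()
print(row[0], json.loads(row[1]))  # Shodan {'include_honeyscore': True}
```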

eshaan7 commented Feb 22, 2022

allow user to save/create new playbooks from the GUI.
Would this function just append to the playbook_configuration.json file? This does feel a bit hacky
I think that, generally, we should start to move away JSON configuration files as much as possible

I am not in favor of moving away from the JSON configuration files. In my opinion, that is one of the most important USPs of IntelOwl, i.e. dead-simple customization. This is why, ideally, we want one solution that allows us to store these 3 configurations (analyzer, connector, playbook) as JSON files but also allows the user to manage these configurations from the GUI/API.

Furthermore, #433 is the issue for this problem. Whatever solution we design should be made with all 3 plugin types in mind, since this is not a playbook-specific thing. Obviously, I am open to all kinds of new ideas (including dropping JSON files, but only if I hear a viable and future-proof alternative).

I feel like a proper python_module especially for "playbooks" isn't required.
One function/Class which accordingly executes the analyzers and connectors in the right order should be enough in the main API view file.

I agree with you. You can think of it this way -- we can optionally allow python_module because we already have the primitives set up for this. Right now, in the case of analyzers/connectors, the base class (e.g. Connector) is abstract and must be extended to create a concrete class for each connector. What we can do differently here is make Playbook a concrete class itself and, by default, point each python_module attribute to this concrete, basic Playbook class. This means that when someone creates a new playbook from the API/GUI, python_module points to the Playbook class by default, but one could still extend this Playbook class to add custom logic if they so wish.
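
The abstract-vs-concrete distinction could look roughly like this (a minimal sketch with hypothetical names, not the actual IntelOwl classes):

```python
# Sketch of the idea above: Connector is abstract and must be subclassed per
# connector, while Playbook is concrete and usable as-is as the default
# python_module, yet still open to subclassing. Names are hypothetical.
from abc import ABC, abstractmethod


class Connector(ABC):
    @abstractmethod
    def run(self) -> dict:
        """Each concrete connector must implement its own run()."""


class Playbook:
    """Concrete by design: the default python_module target."""

    def __init__(self, analyzers: list, connectors: list):
        self.analyzers = analyzers
        self.connectors = connectors

    def run(self) -> dict:
        # default behaviour: just report what would be executed, in order
        return {"analyzers": self.analyzers, "connectors": self.connectors}


class CustomPlaybook(Playbook):
    """Optional: users extend Playbook to add custom logic."""

    def run(self) -> dict:
        result = super().run()
        result["note"] = "custom post-processing hook"
        return result


print(Playbook(["Shodan"], ["MISP"]).run()["analyzers"])  # ['Shodan']
try:
    Connector()  # abstract: cannot be instantiated directly
except TypeError:
    print("Connector is abstract")
```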

I feel strongly that the role of IntelOwl is only to lay the foundations for fetching and sending data while providing maximum customization in the parsing and ordering of data at each step. But to play devil's advocate: my thinking here does not take into consideration the 2nd part, the "investigations" - it could make total sense to not allow custom logic in playbooks and leave that part completely to #680.

mlodic commented Feb 22, 2022

I am not in favor of moving away from the JSON configuration files. In my opinion, that is one of the most important USP of IntelOwl i.e. dead simple customization.

Yeah, I don't want to drop any actual feature that allows customization.
Mainly I feel bad regarding the huge size of the analyzer_config.json, that is becoming bigger and bigger and really difficult to manage or read. Maybe we could just split that file and create several files in a single directory and group them by similarity. I think we can talk about that in a separate issue.

I agree with you. You can think of it this way -- we can optionally allow python_module because we already have the primitives set up for this. Right now, in the case of analyzers/connectors, the base class (e.g. Connector) is abstract and must be extended to create a concrete class for each connector. What we can do differently here is make Playbook a concrete class itself and, by default, point each python_module attribute to this concrete, basic Playbook class. This means that when someone creates a new playbook from the API/GUI, python_module points to the Playbook class by default, but one could still extend this Playbook class to add custom logic if they so wish.

This is a smart idea.

0x0elliot commented Feb 22, 2022

I agree with you. You can think of it this way -- we can optionally allow python_module because we already have the primitives set up for this. Right now, in the case of analyzers/connectors, the base class (e.g. Connector) is abstract and must be extended to create a concrete class for each connector. What we can do differently here is make Playbook a concrete class itself and, by default, point each python_module attribute to this concrete, basic Playbook class. This means that when someone creates a new playbook from the API/GUI, python_module points to the Playbook class by default, but one could still extend this Playbook class to add custom logic if they so wish.

This is a pretty good idea! I think this discussion has by now cleared up, for me, a lot of the technicalities of what a Playbook class should look like if implemented. As for investigations, what I understood from this discussion is that I should first have the playbooks feature properly implemented and then move on to investigations later, since all they need is the response from the executed playbook.

@eshaan7 eshaan7 removed the question label Mar 15, 2022
mlodic added a commit that referenced this issue Oct 10, 2022
mlodic commented Oct 12, 2022

closed with #1238

@mlodic mlodic closed this as completed Oct 12, 2022
@mlodic mlodic unpinned this issue Jan 4, 2023