Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Support for subqueries or chaining of queries #1441

Open
dtaivpp opened this issue Mar 15, 2023 · 5 comments
Open

[FEATURE] Support for subqueries or chaining of queries #1441

dtaivpp opened this issue Mar 15, 2023 · 5 comments
Labels
enhancement New feature or request feature

Comments

@dtaivpp
Copy link

dtaivpp commented Mar 15, 2023

Is your feature request related to a problem?
In threat hunting its often the case that you need to "join" on the same table for queries. For example: take a flat index filled with processes and information such as process ID and related details.

Case 1: In an environment it maybe normal for Outlook, OneNote, and some arbitrary.exe to have processes. When they are spawned in a chain like the following:

Outlook
L OneNote
L arbitrary.exe

That could be a malicious attacker starting some sub-process. There needs to be a mechanism for querying like this.

Case 2: When event pipes are spun up for inter-process communication they are named with the spawning process ID. When attackers create these pipes the often use random names. One way of checking if a pipe is validating if the process ID for the pipe is a valid process ID.

Event Pipe: event.1234.xyz

Process, 4321:
L Second process, 1234:
L Third process, 8282 that spun up the pipe

In the above you would see the pipe spun up by the 3rd process and would want to validate the ID of the pipe exists somewhere in the above chain of processes.

What solution would you like?
There should be some straightforward way to handle join queries such as this

What alternatives have you considered?
One compelling alternative would be to use a graph implementation such as Yang-DB. This is a less than optimal solution for most threat hunters. The reason is that most threat hunters are less than familiar with graph databases and how to query them. Additionally, they want their skills to be transferable and many of the other systems they would use to do this same task support joining in some manner.

@dtaivpp dtaivpp added enhancement New feature or request untriaged labels Mar 15, 2023
@MaxKsyunz
Copy link
Collaborator

Sounds like common table expressions would solve this. [ref]

@dtaivpp
Copy link
Author

dtaivpp commented Mar 16, 2023

@MaxKsyunz I think you are right there. Seems like this would be a perfect candidate since the WITH expression is the basis for recursive queries. I'd be a bit nervous to suggest implementing these as that feels like something that without care could take down a cluster but I am sure there must be some safety rails we can implement. Maybe backpressure could ensure these wouldn't run awry?

@dtaivpp dtaivpp added feature and removed untriaged labels Mar 16, 2023
@acedef
Copy link

acedef commented Mar 16, 2023

This is great! Joining/subsearching on the same "table" would be super useful - especially since a lot of useful relationships aren't known at the time of ingest. The named pipe use case is a good example of this.

@acarbonetto
Copy link
Collaborator

I was thinking this was a problem between tables. Doing table/index joins is hard for OpenSearch, because it isn't optimized to do cross-query searches. Using an alternate index source to map these joins would be extremely helpful for the plugin to process. One option is the use of materialized views (as proposed #1080), or a secondary storage like Spark. These data sources could accommodate relational joins data better than OpenSearch. Alternatively, a graph database would work very well to map relationships between indexes, and as a bonus - graph databases are well-known to help solve threat detection on graph-like systems (using a system like https://tinkerpop.apache.org/docs/current/reference/)

However, secondary storage doesn't always scale well like OpenSearch (e.g. tinkergraph) and requires that the user map their data on ingest.

Treating this as a single-index problem, were one might want to compare/join a single index against itself can be solved (in some cases) by a query re-write. Alternatively, mapping the data in a Nested object or Join object could potentially also solve the sub-query instance (https://opensearch.org/docs/latest/field-types/join/). I'm wondering if Join objects could satisfy the need for this use case.

@acarbonetto
Copy link
Collaborator

I put together a quick proposal for JOIN with USING to handle the same-table parent-child relation query. This solves the issue and utilizes OpenSearch specific functionality, so we won't be overloading the OS-SQL plugin.

Two caveats:

  1. users will have to setup their mappings properly with parent-child relations, and
  2. OS-SQL will need to set the routing shard itself (since this isn't configured by the OS system)

The syntax calls would look something like this (using a game-of-thrones dataset with houses and their members being the parent-child relations):

OS-SQL query:

SELECT m.name
FROM got as m
JOIN got as h USING h.member_of_house.house
WHERE h.housename = "Targaryen"

Mapping setup would look like this in the database:

{
  "mappings": {
    "properties": {
      "member_of_house": { 
        "type": "join",
        "relations": {
          "house": "member" 
        }
      },
      ...

The house data would be setup thusly:

{"index":{"_id":"1"}}
{"words":"Fire And Blood","housename":"Targaryen","sigil":"Dragon","seat":"Dragonstone", "member_of_house":"house"}

And the house members thusly:

{"index":{"_id":"4"}}
{"name":{"firstname":"Daenerys","lastname":"Targaryen","ofHerName":1},"nickname":"Daenerys \"Stormborn\"","gender":"F","parents":{"father":"Aerys","mother":"Rhaella"},"titles":[{"title":"motherOfDragons"},{"title":"queenOfTheAndals"},{"title":"breakerOfChains"},{"title":"Khaleesi"}],"member_of_house":{"name":"member", "parent":"1"}}

The pushdown to OpenSearch would look like:

{
    "query": {
        "has_parent": {
            "parent_type": "house",
            "query": {
                "match": {
                    "house": "Targaryen"
                }
            }
        }
    },
    "_source": {
        "includes": [
            "name"
        ],
        "excludes": []
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request feature
Projects
None yet
Development

No branches or pull requests

4 participants