[FEATURE] Support for subqueries or chaining of queries #1441

dtaivpp · 2023-03-15T20:57:10Z

Is your feature request related to a problem?
In threat hunting it's often the case that you need to "join" a table against itself. For example, take a flat index filled with process records, each carrying a process ID and related details.

Case 1: In an environment it may be normal for Outlook, OneNote, and some arbitrary.exe to each have processes. But suppose they are spawned in a chain like the following:

Outlook
└ OneNote
  └ arbitrary.exe

That could be a malicious attacker starting a sub-process. There needs to be a mechanism for querying chains like this.
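To make the shape of Case 1 concrete, here is a minimal sketch of the self-join being asked for. It uses SQLite only because it ships with Python; it does not reflect what the OpenSearch SQL plugin supports. The table name `process` and the fields `pid`, `parent_pid`, and `name` are assumptions, since the issue only says the index holds a process ID and related details.

```python
import sqlite3

# Hypothetical flat "process" records: (pid, parent_pid, name).
ROWS = [
    (100, 1, "Outlook"),
    (200, 100, "OneNote"),
    (300, 200, "arbitrary.exe"),
    (400, 1, "OneNote"),        # benign: OneNote not started by Outlook
    (500, 1, "arbitrary.exe"),  # benign: arbitrary.exe not started by OneNote
]

def find_suspicious_chains(rows):
    """Return (grandparent_pid, parent_pid, child_pid) for Outlook -> OneNote -> arbitrary.exe."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE process (pid INTEGER PRIMARY KEY, parent_pid INTEGER, name TEXT)"
    )
    conn.executemany("INSERT INTO process VALUES (?, ?, ?)", rows)
    # Join the table against itself twice: child -> parent -> grandparent.
    query = """
        SELECT gp.pid, p.pid, c.pid
        FROM process AS c
        JOIN process AS p  ON c.parent_pid = p.pid
        JOIN process AS gp ON p.parent_pid = gp.pid
        WHERE gp.name = 'Outlook'
          AND p.name  = 'OneNote'
          AND c.name  = 'arbitrary.exe'
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(find_suspicious_chains(ROWS))  # prints [(100, 200, 300)]
```

Only the one chain where all three processes are linked parent to child is returned; the standalone OneNote and arbitrary.exe rows are ignored.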

Case 2: When event pipes are spun up for inter-process communication they are named with the spawning process ID. When attackers create these pipes they often use random names. One way of checking whether a pipe is legitimate is validating that the process ID in the pipe name is a valid process ID.

Event Pipe: event.1234.xyz

Process, 4321:
└ Second process, 1234:
  └ Third process, 8282 that spun up the pipe

In the above you would see the pipe spun up by the third process and would want to validate that the ID in the pipe name exists somewhere in the above chain of processes.
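The check described in Case 2 can be sketched in a few lines of plain Python. This is an illustration of the logic only, not of any query syntax. The `event.<pid>.<suffix>` name layout is taken from the example above; the `PARENT_OF` mapping and all function names are hypothetical.

```python
# Hypothetical lookup: pid -> parent pid (None for the root of the chain).
PARENT_OF = {4321: None, 1234: 4321, 8282: 1234}

def pid_from_pipe_name(pipe_name):
    """Extract the process ID from a name shaped like 'event.<pid>.<suffix>'; None if absent."""
    parts = pipe_name.split(".")
    if len(parts) < 2 or not parts[1].isdigit():
        return None
    return int(parts[1])

def ancestry(pid, parent_of):
    """Return pid followed by its ancestors, stopping on unknown pids or cycles."""
    chain = []
    seen = set()
    while pid is not None and pid in parent_of and pid not in seen:
        chain.append(pid)
        seen.add(pid)
        pid = parent_of[pid]
    return chain

def pipe_is_plausible(pipe_name, creator_pid, parent_of):
    """True if the pid embedded in the pipe name is the creator or one of its ancestors."""
    pipe_pid = pid_from_pipe_name(pipe_name)
    return pipe_pid is not None and pipe_pid in ancestry(creator_pid, parent_of)

print(pipe_is_plausible("event.1234.xyz", 8282, PARENT_OF))  # prints True
print(pipe_is_plausible("event.9999.xyz", 8282, PARENT_OF))  # prints False
```

The ancestry walk is the part that needs a join of the index against itself: each step looks up another process record by the parent ID of the previous one.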

What solution would you like?
There should be some straightforward way to handle join queries such as these.

What alternatives have you considered?
One compelling alternative would be to use a graph implementation such as Yang-DB. This is a less than optimal solution for most threat hunters. The reason is that most threat hunters are less than familiar with graph databases and how to query them. Additionally, they want their skills to be transferable and many of the other systems they would use to do this same task support joining in some manner.

MaxKsyunz · 2023-03-16T02:17:58Z

Sounds like common table expressions would solve this. [ref]

dtaivpp · 2023-03-16T13:04:28Z

@MaxKsyunz I think you are right there. Seems like this would be a perfect candidate, since the WITH expression is the basis for recursive queries. I'd be a bit nervous to suggest implementing these, as that feels like something that, without care, could take down a cluster, but I am sure there must be some safety rails we can implement. Maybe backpressure could ensure these wouldn't go awry?
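For reference, a recursive common table expression with one possible safety rail (a hard cap on recursion depth) might look like the sketch below. SQLite stands in here only because it ships with Python and supports WITH RECURSIVE; nothing in it reflects the plugin's syntax or behavior. The `process` table, its columns, and the `max_depth` parameter are assumptions made for illustration, and a depth cap is a different mechanism from the backpressure mentioned above.

```python
import sqlite3

def ancestors(conn, start_pid, max_depth=16):
    """Walk parent links upward from start_pid, never recursing past max_depth levels."""
    # The "chain.depth < :max_depth" condition bounds the recursion, so even
    # cyclic parent links cannot make the query run forever.
    query = """
        WITH RECURSIVE chain(pid, parent_pid, depth) AS (
            SELECT pid, parent_pid, 0 FROM process WHERE pid = :start
            UNION ALL
            SELECT p.pid, p.parent_pid, chain.depth + 1
            FROM process AS p
            JOIN chain ON p.pid = chain.parent_pid
            WHERE chain.depth < :max_depth
        )
        SELECT pid FROM chain ORDER BY depth
    """
    params = {"start": start_pid, "max_depth": max_depth}
    return [row[0] for row in conn.execute(query, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE process (pid INTEGER PRIMARY KEY, parent_pid INTEGER)")
conn.executemany(
    "INSERT INTO process VALUES (?, ?)",
    [(4321, None), (1234, 4321), (8282, 1234)],
)

print(ancestors(conn, 8282))               # prints [8282, 1234, 4321]
print(ancestors(conn, 8282, max_depth=1))  # prints [8282, 1234]
```

With the cap in place, the worst-case work per query is bounded by the depth limit rather than by the shape of the data.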

acedef · 2023-03-16T14:55:25Z

This is great! Joining/subsearching on the same "table" would be super useful - especially since a lot of useful relationships aren't known at the time of ingest. The named pipe use case is a good example of this.

acarbonetto · 2023-03-21T21:34:22Z

I was thinking this was a problem between tables. Doing table/index joins is hard for OpenSearch, because it isn't optimized to do cross-query searches. Using an alternate index source to map these joins would be extremely helpful for the plugin to process. One option is the use of materialized views (as proposed in #1080), or a secondary storage like Spark. These data sources could accommodate relational join data better than OpenSearch. Alternatively, a graph database would work very well to map relationships between indexes, and as a bonus, graph databases are well known to help solve threat detection on graph-like systems (using a system like https://tinkerpop.apache.org/docs/current/reference/).

However, secondary storage doesn't always scale as well as OpenSearch (e.g. TinkerGraph) and requires that the user map their data on ingest.

Treating this as a single-index problem, where one might want to compare/join a single index against itself, it can be solved (in some cases) by a query rewrite. Alternatively, mapping the data in a Nested object or Join object could potentially also solve the sub-query case (https://opensearch.org/docs/latest/field-types/join/). I'm wondering if Join objects could satisfy the need for this use case.

acarbonetto · 2023-03-23T17:46:41Z

I put together a quick proposal for JOIN with USING to handle the same-table parent-child relation query. This solves the issue and utilizes OpenSearch-specific functionality, so we won't be overloading the OS-SQL plugin.

Two caveats:

users will have to set up their mappings properly with parent-child relations, and
OS-SQL will need to set the routing shard itself (since this isn't configured by the OS system)

The syntax calls would look something like this (using a game-of-thrones dataset with houses and their members being the parent-child relations):

OS-SQL query:

SELECT m.name
FROM got as m
JOIN got as h USING h.member_of_house.house
WHERE h.housename = "Targaryen"

Mapping setup would look like this in the database:

{
  "mappings": {
    "properties": {
      "member_of_house": { 
        "type": "join",
        "relations": {
          "house": "member" 
        }
      },
      ...

The house data would be set up thusly:

{"index":{"_id":"1"}}
{"words":"Fire And Blood","housename":"Targaryen","sigil":"Dragon","seat":"Dragonstone", "member_of_house":"house"}

And the house members thusly:

{"index":{"_id":"4"}}
{"name":{"firstname":"Daenerys","lastname":"Targaryen","ofHerName":1},"nickname":"Daenerys \"Stormborn\"","gender":"F","parents":{"father":"Aerys","mother":"Rhaella"},"titles":[{"title":"motherOfDragons"},{"title":"queenOfTheAndals"},{"title":"breakerOfChains"},{"title":"Khaleesi"}],"member_of_house":{"name":"member", "parent":"1"}}
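One detail the sample documents leave out: with the join field type, a child document has to land on the same shard as its parent, which is normally done by indexing the child with a routing value equal to the parent's ID. The sketch below shows one way the bulk action lines could carry that. The helper `bulk_lines` is hypothetical, and using a `routing` key in the bulk action metadata is my reading of the bulk API, so treat it as an assumption to verify.

```python
import json

def bulk_lines(doc_id, source, join_field="member_of_house"):
    """Return the two NDJSON lines for one bulk 'index' action.

    A child document (join value is an object with a "parent" key) gets
    routing set to its parent's ID; a parent document gets no routing.
    """
    action = {"index": {"_id": doc_id}}
    join_value = source.get(join_field)
    if isinstance(join_value, dict) and "parent" in join_value:
        action["index"]["routing"] = join_value["parent"]
    return [json.dumps(action), json.dumps(source)]

house = {"housename": "Targaryen", "member_of_house": "house"}
member = {
    "name": {"firstname": "Daenerys", "lastname": "Targaryen"},
    "member_of_house": {"name": "member", "parent": "1"},
}

for line in bulk_lines("1", house) + bulk_lines("4", member):
    print(line)
```

The member's action line comes out with `"routing": "1"`, while the house's action line carries only its `_id`.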

The pushdown to OpenSearch would look like:

{
    "query": {
        "has_parent": {
            "parent_type": "house",
            "query": {
                "match": {
                    "housename": "Targaryen"
                }
            }
        }
    },
    "_source": {
        "includes": [
            "name"
        ],
        "excludes": []
    }
}
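As a rough illustration of the translation step, the function below assembles a request body of this shape from the pieces of the SQL query: the parent relation name, the WHERE field and value (`h.housename` in the query above), and the selected columns. The function name and its parameters are hypothetical and are not part of the plugin.

```python
def has_parent_pushdown(parent_type, field, value, select_fields):
    """Build a has_parent search body for a same-index parent-child JOIN."""
    return {
        "query": {
            "has_parent": {
                # Name of the parent side of the join relation in the mapping.
                "parent_type": parent_type,
                # Predicate from the WHERE clause, applied to parent documents.
                "query": {"match": {field: value}},
            }
        },
        # Columns from the SELECT list, returned from the child documents.
        "_source": {"includes": list(select_fields), "excludes": []},
    }

body = has_parent_pushdown("house", "housename", "Targaryen", ["name"])
print(body["query"]["has_parent"]["query"])  # prints {'match': {'housename': 'Targaryen'}}
```

The predicate is evaluated against the parent (house) documents, and the hits returned are the child (member) documents, which matches the direction of the JOIN in the proposal.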

dtaivpp added the enhancement and untriaged labels Mar 15, 2023

dtaivpp added feature and removed untriaged labels Mar 16, 2023

This was referenced Mar 24, 2023

SQL ingestion and ops - standup #791

Open

Allow individual shards to be targeted during query execution [FEATURE] #1478

Open

This was referenced May 10, 2023

Design for Same-Table-JOINs Bit-Quill/opensearch-project-sql#263

Merged

Design for Same-Table-JOINs #1623

Draft

acarbonetto mentioned this issue Jun 13, 2023

Set operation improvement #50

Open

Yury-Fridlyand mentioned this issue Jul 13, 2023

Support parent/child join in SQL #236

Closed

YANG-DB mentioned this issue Jul 3, 2024

[FEATURE]Add Missing (OpenSearch-based) PPL commands opensearch-project/opensearch-spark#408

Open
