# Query a Data Source

In real-world hunts, the first step is to get data from a data source such as an EDR, a SIEM, or a firewall, so that you can analyze, filter, and merge the data, or form new queries. In this hunt book, you will learn how to use the `GET` command to retrieve data from a data source.

## What you will learn

0. How to set up data sources?
1. How to `GET` data from a data source?
2. How to `GET` data from a Kestrel variable?
3. How to refer to a Kestrel variable in a `GET` command?
4. How to combine `NEW` and `GET` in a hunt flow?
5. Exercise: get network-traffic with destination IP

### 0. How to set up data sources?

Skip the data source setup if you just want to try Kestrel and learn basic concepts here. You will be using a canned data source in this hunt book:
```
file:///tmp/lab101.json
```

For real-world hunts in your organization, you need to deploy Kestrel and connect your data sources:
1. [Install Kestrel runtime](https://kestrel.readthedocs.io/en/latest/installation/runtime.html)
2. [Connect to Data Sources](https://kestrel.readthedocs.io/en/latest/installation/datasource.html)

### 1. How to `GET` data from a data source?

Given the data source `file:///tmp/lab101.json`, which packs Sysmon logs from a Windows 10 host on 2021-04-03, you will write a `GET` command that queries the data source and retrieves `svchost.exe` processes.

`GET` retrieves entities of one type that match the criteria given in the `WHERE` clause from the data source specified in `FROM`. In other words, it is a query that matches entities against the pattern defined in `WHERE`. Visit the [language specification of GET](https://kestrel.readthedocs.io/en/latest/language.html#get) to learn more about the syntax and usage.
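As a rough mental model only (this is not how Kestrel is implemented), the clauses of a `GET` can be read as successive filters: the entity type, the `WHERE` criteria, and the `START`/`STOP` time range used in this hunt book's queries. The Python sketch below uses made-up records and a hypothetical helper `get_entities`; the field names `type`, `name`, `pid`, and `time`, and the half-open time window, are assumptions for illustration.

```python
from datetime import datetime, timezone

# Made-up records standing in for what a data source might hold.
RECORDS = [
    {"type": "process", "name": "svchost.exe", "pid": 1, "time": "2021-04-03T00:30:00Z"},
    {"type": "process", "name": "cmd.exe", "pid": 2, "time": "2021-04-03T01:00:00Z"},
    {"type": "process", "name": "svchost.exe", "pid": 3, "time": "2021-04-03T05:00:00Z"},
    {"type": "file", "name": "svchost.exe", "time": "2021-04-03T00:45:00Z"},
]


def parse_time(ts):
    """Parse an ISO 8601 UTC timestamp ending in 'Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def get_entities(records, entity_type, where, start, stop):
    """Toy GET: keep records of one entity type that satisfy `where`
    and fall inside the window [start, stop) (half-open by assumption)."""
    t0, t1 = parse_time(start), parse_time(stop)
    return [
        r for r in records
        if r["type"] == entity_type
        and where(r)
        and t0 <= parse_time(r["time"]) < t1
    ]


svchost = get_entities(
    RECORDS,
    "process",
    lambda r: r["name"] == "svchost.exe",
    "2021-04-03T00:00:00Z",
    "2021-04-03T02:00:00Z",
)
print([r["pid"] for r in svchost])  # prints [1]
```

Only the first record passes all three filters: the second has a different name, the third is outside the time range, and the fourth is a `file` rather than a `process`.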

```
svchost = GET process
          FROM file:///tmp/lab101.json
          WHERE name = 'svchost.exe'
          START 2021-04-03T00:00:00Z STOP 2021-04-03T02:00:00Z
```



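| VARIABLE | TYPE | #(ENTITIES) | #(RECORDS) | directory* | file* | ipv4-addr* | ipv6-addr* | mac-addr* | network-traffic* | process* | user-account* | x-ecs-destination* | x-ecs-network* | x-ecs-process* | x-ecs-source* | x-ecs-user* | x-oca-asset* | x-oca-event* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| svchost | process | 389 | 533 | 1078 | 1114 | 3190 | 1910 | 1066 | 1014 | 725 | 1062 | 2016 | 2016 | 2120 | 2024 | 2124 | 2132 | 2132 |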


### 2. How to `GET` data from a Kestrel variable?

`GET` can be used against a data source or a Kestrel variable; both are _a pool of entities_. The differences are:
- A data source contains multiple types of entities, while a Kestrel variable only has one type.
- The execution of `GET` against a data source sends queries to the data source, while the execution of `GET` against a Kestrel variable is performed on the local cache of the variable.
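The second difference can be illustrated with a small Python sketch. It is a toy model, not Kestrel's internals: `ToyDataSource` is a hypothetical class that counts the queries it receives, and a Kestrel variable is modeled as a plain list of cached records of one entity type.

```python
class ToyDataSource:
    """Stand-in for a remote data source that counts the queries it receives."""

    def __init__(self, records):
        self._records = records
        self.queries_sent = 0

    def query(self, entity_type, predicate):
        self.queries_sent += 1  # every call models a round trip to the source
        return [
            r for r in self._records
            if r["type"] == entity_type and predicate(r)
        ]


# Made-up records: the source mixes several entity types.
source = ToyDataSource([
    {"type": "process", "name": "svchost.exe",
     "command_line": "svchost.exe -k netsvcs -p -s Schedule"},
    {"type": "process", "name": "svchost.exe",
     "command_line": "svchost.exe -k LocalService"},
    {"type": "file", "name": "notes.txt"},
])

# GET against the data source: one query is sent, and the result is kept
# in a variable that holds a single entity type.
svchost = source.query("process", lambda r: r["name"] == "svchost.exe")

# GET against the variable: a purely local filter over the cached entities.
scheduler = [r for r in svchost if r["command_line"].endswith("-s Schedule")]

print(source.queries_sent, len(svchost), len(scheduler))  # prints 1 2 1
```

The query counter stays at 1 after the second step, which mirrors the point that refining a variable does not contact the data source again.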

In the first task, you got all `svchost.exe` processes. Now get the subset of them that are Windows schedulers, which are launched with the arguments `-k netsvcs -p -s Schedule` in their command line.

```
scheduler = GET process
            FROM svchost
            WHERE command_line = 'C:\Windows\system32\svchost.exe -k netsvcs -p -s Schedule'
```



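| VARIABLE | TYPE | #(ENTITIES) | #(RECORDS) | directory* | file* | ipv4-addr* | ipv6-addr* | mac-addr* | network-traffic* | process* | user-account* | x-ecs-destination* | x-ecs-network* | x-ecs-process* | x-ecs-source* | x-ecs-user* | x-oca-asset* | x-oca-event* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| scheduler | process | 18 | 18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |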


When getting entities from a Kestrel variable, the syntax can be simplified to:

```
# this is equivalent to the previous Kestrel statement, i.e., scheduler == scheduler2
scheduler2 = svchost WHERE command_line = 'C:\Windows\system32\svchost.exe -k netsvcs -p -s Schedule'
```



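| VARIABLE | TYPE | #(ENTITIES) | #(RECORDS) | directory* | file* | ipv4-addr* | ipv6-addr* | mac-addr* | network-traffic* | process* | user-account* | x-ecs-destination* | x-ecs-network* | x-ecs-process* | x-ecs-source* | x-ecs-user* | x-oca-asset* | x-oca-event* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| scheduler2 | process | 18 | 18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |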


### 3. How to refer to a Kestrel variable in a `GET` command?

By default, you can use a [STIX pattern](http://docs.oasis-open.org/cti/stix/v2.0/stix-v2.0-part5-stix-patterning.html) in the `WHERE` clause of `GET`. You can also go beyond the standard STIX pattern and refer to a Kestrel variable in the pattern, which is known as a parameterized STIX pattern. This is useful when you want to create new hunt steps based on previous ones, bring in data or knowledge from other Kestrel variables (covered in the next task), or hunt across multiple data sources.
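One way to picture a variable reference is as a placeholder that gets replaced by the concrete values held in the variable before the pattern is sent to a data source. The Python sketch below shows that idea under assumptions: `expand_reference` and `render` are hypothetical helpers, the PIDs are made up, and the use of a STIX `IN` comparison is a plausible expansion rather than a statement of what Kestrel generates.

```python
def render(value):
    """Render a Python value as a STIX literal (integers bare, strings quoted)."""
    if isinstance(value, int):
        return str(value)
    # Escape backslashes first, then single quotes, as STIX string literals require.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def expand_reference(entity_type, attribute, referenced_entities, referenced_attribute):
    """Build a standard STIX pattern from a variable reference by listing
    the distinct values of the referenced attribute."""
    values = sorted({e[referenced_attribute] for e in referenced_entities})
    if not values:
        raise ValueError("referenced variable has no values to match against")
    rendered = ", ".join(render(v) for v in values)
    return f"[{entity_type}:{attribute} IN ({rendered})]"


# Made-up stand-in for a variable holding process entities.
scheduler = [{"pid": 1234}, {"pid": 1234}, {"pid": 5678}]

# Models a clause of the form: WHERE pid = scheduler.pid
pattern = expand_reference("process", "pid", scheduler, "pid")
print(pattern)  # prints [process:pid IN (1234, 5678)]
```

Duplicate values collapse into one entry, so the resulting pattern depends only on the distinct attribute values in the referenced variable.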

You will get all processes whose `pid` matches that of the scheduler processes, but over a broader time range: instead of the 2 hours specified in the first task (`START 2021-04-03T00:00:00Z STOP 2021-04-03T02:00:00Z`), you will query the entire day of `2021-04-03`.

```
scheduler3 = GET process
             FROM file:///tmp/lab101.json
             WHERE pid = scheduler.pid
             START 2021-04-03T00:00:00Z STOP 2021-04-04T00:00:00Z
```



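| VARIABLE | TYPE | #(ENTITIES) | #(RECORDS) | directory* | file* | ipv4-addr* | ipv6-addr* | mac-addr* | network-traffic* | process* | user-account* | x-ecs-destination* | x-ecs-network* | x-ecs-process* | x-ecs-source* | x-ecs-user* | x-oca-asset* | x-oca-event* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| scheduler3 | process | 18 | 18 | 1101 | 1150 | 3226 | 1928 | 1084 | 1014 | 1132 | 1080 | 2016 | 2016 | 2210 | 2024 | 2214 | 2222 | 2222 |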


You now get 72 lines of logs, which describe the single scheduler process.

If you write the `GET` command against a different data source, you perform the hunt across two data sources. This is a common technique for hunting across multiple data sources such as firewall logs and EDR systems, e.g., in our RSA demo of a [cross-host APT hunt](https://www.youtube.com/watch?v=tASFWZfD7l8).

### 4. How to combine `NEW` and `GET` in a hunt flow?

You may want to find `network-traffic` with a source/destination IP address. You can simply write the IP in the `WHERE` clause of the `GET`, but you need to write the same string again if you need it in another `GET`. Is there a way to store the IP address in a Kestrel variable and refer to it in the following hunt?

Note that a Kestrel variable is a list of entities and cannot hold a string. However, you can use `NEW` to create an `ipv4-addr` or `ipv6-addr` entity with the IP address as its attribute and store it in a Kestrel variable.

```
winlaptop141 = NEW ipv4-addr ["10.171.5.141"]

nt_src141 = GET network-traffic
            WHERE src_ref.value = winlaptop141.value
            START 2021-04-03T00:00:00Z STOP 2021-04-04T00:00:00Z
```



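| VARIABLE | TYPE | #(ENTITIES) | #(RECORDS) | directory* | file* | ipv4-addr* | ipv6-addr* | mac-addr* | network-traffic* | process* | user-account* | x-ecs-destination* | x-ecs-network* | x-ecs-process* | x-ecs-source* | x-ecs-user* | x-oca-asset* | x-oca-event* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| winlaptop141 | ipv4-addr | 1 | 533 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| nt_src141 | network-traffic | 23 | 29 | 1107 | 1143 | 3277 | 1939 | 1095 | 1020 | 1143 | 1091 | 2161 | 2161 | 2301 | 2169 | 2305 | 2313 | 2313 |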


Note that the `FROM` clause is omitted in this `GET`, so it defaults to the last data source used in the hunt (`file:///tmp/lab101.json`).
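The fallback behavior can be pictured as a session that remembers the most recent data source. The Python sketch below is a toy model under assumptions: `ToySession` and `resolve_source` are hypothetical names, and raising an error when no earlier source exists is an illustrative choice, not a description of Kestrel's actual error handling.

```python
class ToySession:
    """Remembers the most recent data source, as a stand-in for a hunt session."""

    def __init__(self):
        self.last_source = None

    def resolve_source(self, source=None):
        """Return the source a GET would use: the explicit one if given,
        otherwise the last one used earlier in the session."""
        if source is not None:
            self.last_source = source
            return source
        if self.last_source is None:
            raise ValueError("no FROM clause and no earlier data source to fall back on")
        return self.last_source


session = ToySession()

# A GET with an explicit FROM records the data source.
first = session.resolve_source("file:///tmp/lab101.json")

# A later GET without FROM falls back to the recorded source.
second = session.resolve_source()
print(second)  # prints file:///tmp/lab101.json
```

In this model, a `GET` without `FROM` in a fresh session has nothing to fall back on, which is why a step with an explicit `FROM` has to run first.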

### 5. Exercise: get network-traffic with destination IP

Can you get `network-traffic` with destination IP `13.86.101.172` on the day `2021-04-03` from the same data source?

Tip: if you want to omit the `FROM` clause, you first need to execute a hunt step in this hunt book that includes `FROM`.\
Tip: the destination IP attribute of `network-traffic` is `dst_ref.value`.