# Query a Data Source

In real-world hunts, the first step is to retrieve data from a data source such as an EDR, a SIEM, or a firewall, so you can analyze, filter, and merge the data, or form new queries. In this huntbook, you will learn how to use the [GET](https://kestrel.readthedocs.io/en/stable/language/commands.html#get) command to retrieve data from a data source.

## What you will learn

0. How to set up data sources?
1. How to `GET` data from a data source?
2. How to `GET` data from a Kestrel variable?
3. How to refer to a Kestrel variable in a `GET` command?
4. How to combine `NEW` and `GET` in a hunt flow?
5. Exercise: get `network-traffic` with destination IP

### 0. How to set up data sources?

Skip the data source setup if you just want to try Kestrel and learn basic concepts in this huntbook. You will be using a canned data source here:
```
file:///tmp/lab101.json
```

For real-world hunts in your organization, you need to deploy Kestrel and connect your data sources:
1. [Install Kestrel runtime](https://kestrel.readthedocs.io/en/latest/installation/runtime.html)
2. [Connect to Data Sources](https://kestrel.readthedocs.io/en/latest/installation/datasource.html)

### 1. How to `GET` data from a data source?

Given the canned data `file:///tmp/lab101.json`, which packs Sysmon logs from a Windows 10 host on 2021-04-03, you will write a `GET` command that queries the data source and retrieves `svchost.exe` processes.

`GET` retrieves entities of one type that satisfy the criteria given in the `WHERE` clause from the data source specified in `FROM`. In other words, it is a query that matches entities against the pattern defined in `WHERE`. Visit the [language specification of GET](https://kestrel.readthedocs.io/en/stable/language/commands.html#get) to learn more about the syntax and usage, and visit the [entity in Kestrel](https://kestrel.readthedocs.io/en/stable/language/eav.html#entities-in-kestrel) page to find which entities and attributes are supported.

In [1]:
svchost = GET process
          FROM file:///tmp/lab101.json
          WHERE name = 'svchost.exe'
          START 2021-04-03T00:00:00Z STOP 2021-04-03T02:00:00Z



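| VARIABLE | TYPE | #(ENTITIES) | #(RECORDS) | directory* | file* | ipv4-addr* | ipv6-addr* | mac-addr* | network-traffic* | process* | user-account* | x-ecs-destination* | x-ecs-network* | x-ecs-process* | x-ecs-source* | x-ecs-user* | x-oca-asset* | x-oca-event* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| svchost | process | 389 | 533 | 1078 | 1114 | 3190 | 1910 | 1066 | 1014 | 725 | 1062 | 2016 | 2016 | 2120 | 2024 | 2124 | 1066 | 2132 |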
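
To look at the entities themselves rather than the summary, a display step can follow the `GET`. The sketch below assumes the `DISP` command with an `ATTR` clause, as described in the Kestrel command reference; the chosen attributes are only an illustration.

```
# show selected attributes of the processes held in `svchost`
DISP svchost ATTR name, pid, command_line
```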


### 2. How to `GET` data from a Kestrel variable?

`GET` can be used against a data source or a Kestrel variable; both are _pools of entities_. The differences are:
- A data source contains multiple types of entities, while a Kestrel variable only has one type.
- The execution of `GET` against a data source sends queries to the data source, while the execution of `GET` against a Kestrel variable is performed on the local cache of the variable.

In the first task, you got all `svchost.exe` processes. Now get the subset of them that are Windows schedulers, whose command line contains the arguments `-k netsvcs -p -s Schedule`.

In [2]:
scheduler = GET process
            FROM svchost
            WHERE command_line = 'C:\Windows\system32\svchost.exe -k netsvcs -p -s Schedule'



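| VARIABLE | TYPE | #(ENTITIES) | #(RECORDS) | directory* | file* | ipv4-addr* | ipv6-addr* | mac-addr* | network-traffic* | process* | user-account* | x-ecs-destination* | x-ecs-network* | x-ecs-process* | x-ecs-source* | x-ecs-user* | x-oca-asset* | x-oca-event* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| scheduler | process | 18 | 18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |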


Or use a simpler syntax (for getting from a variable):

In [3]:
# this is equivalent to the previous Kestrel statement, i.e., scheduler == scheduler2
scheduler2 = svchost WHERE command_line = 'C:\Windows\system32\svchost.exe -k netsvcs -p -s Schedule'



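| VARIABLE | TYPE | #(ENTITIES) | #(RECORDS) | directory* | file* | ipv4-addr* | ipv6-addr* | mac-addr* | network-traffic* | process* | user-account* | x-ecs-destination* | x-ecs-network* | x-ecs-process* | x-ecs-source* | x-ecs-user* | x-oca-asset* | x-oca-event* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| scheduler2 | process | 18 | 18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |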
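
If you prefer not to spell out the full path of the executable, a wildcard match is one alternative. This is a minimal sketch, assuming the `LIKE` operator with `%` as the wildcard; `scheduler3` is a hypothetical variable name, and the result has not been checked against `scheduler`.

```
# match any command line that ends with the scheduler arguments
scheduler3 = svchost WHERE command_line LIKE '%-k netsvcs -p -s Schedule'
```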


### 3. How to refer to a Kestrel variable in a `GET` command?

You just created a pattern (the `WHERE` clause) from a single [Comparison Expression](https://kestrel.readthedocs.io/en/stable/language/ecgp.html#single-comparison-expression-pattern). You can describe [multiple attributes](https://kestrel.readthedocs.io/en/stable/language/ecgp.html#single-node-graph-pattern) of an [entity](https://kestrel.readthedocs.io/en/stable/language/tac.html#entity) in a pattern, or even multiple entities as a [connected graph pattern](https://kestrel.readthedocs.io/en/stable/language/ecgp.html#centered-graph-pattern).
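
As an illustration of a pattern with more than one comparison, the sketch below constrains both the process and its parent. It assumes that parent information is reachable through `parent_ref.name` and that `svchost.exe` is launched by `services.exe` on this host; `svchost_child` is a hypothetical variable name, and the query has not been run against the canned data.

```
# two comparisons in one pattern: the process name and its parent's name
svchost_child = GET process
                FROM file:///tmp/lab101.json
                WHERE name = 'svchost.exe' AND parent_ref.name = 'services.exe'
                START 2021-04-03T00:00:00Z STOP 2021-04-03T02:00:00Z
```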

A more advanced way to build a Kestrel pattern is to refer to existing variables. This is useful when you want to create new hunt steps based on previous ones, bring in data or knowledge from other Kestrel variables (covered in the next task), or hunt across multiple data sources.

Next, you will get all processes that have the same name as those in `scheduler` but a different `PID` from those in `scheduler2`. Instead of writing out the process name (`svchost.exe`, a string) and the `PID`s of `scheduler2` (integers) explicitly, you can do:

In [4]:
svchost_other = GET process
                FROM file:///tmp/lab101.json
                WHERE name = scheduler.name AND pid != scheduler2.pid
                START 2021-04-03T00:00:00Z STOP 2021-04-03T02:00:00Z



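| VARIABLE | TYPE | #(ENTITIES) | #(RECORDS) | directory* | file* | ipv4-addr* | ipv6-addr* | mac-addr* | network-traffic* | process* | user-account* | x-ecs-destination* | x-ecs-network* | x-ecs-process* | x-ecs-source* | x-ecs-user* | x-oca-asset* | x-oca-event* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| svchost_other | process | 371 | 515 | 1594 | 1635 | 4749 | 2847 | 1581 | 1521 | 1264 | 1575 | 4536 | 4536 | 4680 | 4554 | 4689 | 1581 | 4707 |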


You get 371 entities. Let's see if the number is right: `371 = 389 - 18`
- 371: number of processes in `svchost_other`
- 389: number of processes in `svchost`
- 18: number of processes in `scheduler2`

If you write the `GET` command against a different data source, you perform the hunt across two data sources. This is a common technique for hunting across multiple data sources such as firewall logs and EDR systems, e.g., in our RSA demo of a [cross-host APT hunt](https://www.youtube.com/watch?v=tASFWZfD7l8).
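
A cross-source step could look like the sketch below. The data source `stixshifter://host2` is hypothetical; substitute one that is configured in your deployment, along with a time range that fits its data.

```
# reuse the process name found in the first data source to query a second one
svchost_host2 = GET process
                FROM stixshifter://host2
                WHERE name = scheduler.name
                START 2021-04-03T00:00:00Z STOP 2021-04-03T02:00:00Z
```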

### 4. How to combine `NEW` and `GET` in a hunt flow?

You may want to find `network-traffic` with a given source or destination IP address. You can simply write the IP in the `WHERE` clause of the `GET`, but you would have to write the same string again if you need it in another `GET`. Is there a way to store the IP address in a Kestrel variable and refer to it in subsequent hunt steps?

Note that a Kestrel variable is a list of entities and cannot hold a string. However, you can use `NEW` to create an `ipv4-addr` or `ipv6-addr` entity with the IP address as its attribute and store it in a Kestrel variable.

In [5]:
winlaptop141 = NEW ipv4-addr ["10.171.5.141"]

nt_src141 = GET network-traffic
            WHERE src_ref.value = winlaptop141.value
            START 2021-04-03T00:00:00Z STOP 2021-04-04T00:00:00Z



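| VARIABLE | TYPE | #(ENTITIES) | #(RECORDS) | directory* | file* | ipv4-addr* | ipv6-addr* | mac-addr* | network-traffic* | process* | user-account* | x-ecs-destination* | x-ecs-network* | x-ecs-process* | x-ecs-source* | x-ecs-user* | x-oca-asset* | x-oca-event* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| winlaptop141 | ipv4-addr | 1 | 533 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| nt_src141 | network-traffic | 23 | 29 | 1623 | 1664 | 4836 | 2876 | 1610 | 1527 | 1664 | 1604 | 4739 | 4739 | 4883 | 4757 | 4892 | 1610 | 4910 |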


Note that the `FROM` clause is omitted in this `GET`; it defaults to the last data source (`file:///tmp/lab101.json`) used in the hunt.
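
The same variable can then be reused in later steps without repeating the string. A minimal sketch, assuming the cell above has been executed; `nt_dst141` is a hypothetical variable name:

```
# traffic whose destination is the same laptop, reusing `winlaptop141`
nt_dst141 = GET network-traffic
            WHERE dst_ref.value = winlaptop141.value
            START 2021-04-03T00:00:00Z STOP 2021-04-04T00:00:00Z
```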

To automate hunts, you can go a step further: instead of using `NEW` to create an `ipv4-addr` variable, load the IP or IP list from a file on disk directly into a Kestrel variable. This way, the huntbook can be parameterized for each run. See the [LOAD](https://kestrel.readthedocs.io/en/stable/language/commands.html#load) command, which is discussed in another huntbook in this tutorial.
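
A parameterized version might look like the sketch below. The file `/tmp/suspicious_ips.csv` is hypothetical and is assumed to hold one IP address per row under a `value` column; see the `LOAD` documentation for the supported file formats.

```
# load IP addresses from disk as ipv4-addr entities
ips = LOAD /tmp/suspicious_ips.csv AS ipv4-addr

# match traffic whose source is any of the loaded addresses
nt_from_ips = GET network-traffic
              WHERE src_ref.value = ips.value
              START 2021-04-03T00:00:00Z STOP 2021-04-04T00:00:00Z
```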

### 5. Exercise: get network-traffic with destination IP

Can you get `network-traffic` with destination IP `13.86.101.172` on the day `2021-04-03` from the same data source?

Tip: if you want to omit the `FROM` clause, you must first execute a hunt step in this huntbook that has a `FROM` clause.\
Tip: the destination IP attribute of `network-traffic` is `dst_ref.value`.