# YARA retro hunt with timegating on the Spectra Analyze
This notebook shows how to use the ReversingLabs SDK
to add a YARA ruleset, start a Retro Hunt, and filter its output by time.

### Used Spectra Analyze methods
- **create_or_update_yara_ruleset**
- **enable_or_disable_yara_ruleset**
- **start_or_stop_yara_local_retro_scan**
- **get_yara_local_retro_scan_status**
- **get_yara_ruleset_matches_v2**
- **get_summary_report_v2**
- **delete_yara_ruleset**

### Credentials
Credentials are loaded from a local file instead of being written here in plain text.
To learn how to create a credentials file, see the **Storing and using credentials** section in the [README file](./README.md).
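
Judging by the keys read in the first code cell of this notebook, the credentials file is assumed to have roughly the shape sketched below. The host and token values are placeholders, and your file may contain additional sections described in the README.

```python
import json

# Placeholder values only; replace them with your own instance URL and API token.
# The key names mirror what this notebook reads: "a1000" -> "a1000_url" and "token".
credentials_template = {
    "a1000": {
        "a1000_url": "https://a1000.example.com",
        "token": "<your-api-token>",
    }
}

template_text = json.dumps(credentials_template, indent=1)
print(template_text)
```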

### Time gated Retro Hunt with Spectra Analyze
The Retro Hunt concept on Spectra Analyze is similar to the one on TiCloud, with some differences.
In this example we will use the Spectra Analyze local YARA Retro Hunt to illustrate these differences.

The first code block is almost identical to its TiCloud counterpart.
Note however that, depending on your Spectra Analyze certificate setup, you might need to set the `verify` parameter to `False`.
The Spectra Analyze ReversingLabs SDK module uses only one status

In [None]:
import json
from itertools import islice
import dateutil.parser  # import the submodule explicitly; a bare "import dateutil" may not expose dateutil.parser
import datetime

from ReversingLabs.SDK.a1000 import A1000

CREDENTIALS = json.load(open("credentials.json"))
HOST = CREDENTIALS.get("a1000").get("a1000_url")
TOKEN = CREDENTIALS.get("a1000").get("token")
USER_AGENT = json.load(open('../user_agent.json'))["user_agent"]

# Set the verify parameter to False if your Spectra Analyze instance doesn't have a valid CA certificate
a1000 = A1000(
    host=HOST,
    token=TOKEN,
    verify=True,
    user_agent=USER_AGENT
)

The ruleset used in this example is the same one from before.
Since your Spectra Analyze instance will most certainly have a smaller sample collection to work with,
you should consider changing this rule to target samples from your collection.
As before, we create the ruleset on the Spectra Analyze instance, here with the `ticloud` parameter set to `False`.
If your instance is set up with the TiCloud integration, you may explore the TiCloud YARA Retro Hunt using the Spectra Analyze API.

In [None]:
RULESET_NAME = "Cookbook_NSIS_Installer"
RULESET_CONTENT = f"""
import "pe"

rule {RULESET_NAME}
{{
	/* a */
    meta:
        offset = "0x4031d1"
        examplar = "4313d352e0dafd1f22b6517126a655cae3b444fa758d2845eddfbe72f24f7bdd"
    strings:
        $op = {{
            81[2-3]efbeadde [2-6]
            81[2-3]496e7374 [2-6]
            81[2-3]736f6674 [2-6]
            81[2-3]4e756c6c
        }}
        $nsis = "\\xef\\xbe\\xad\\xdeNullsoftInst"
    condition:
        pe.sections[pe.section_index(@op)].characteristics & (pe.SECTION_MEM_READ | pe.SECTION_MEM_EXECUTE) and
        $nsis in (pe.overlay.offset..pe.overlay.offset+pe.overlay.size)
}}
"""

response = a1000.create_or_update_yara_ruleset(RULESET_NAME, RULESET_CONTENT, ticloud=False)
print(response.status_code)
print(json.dumps(response.json(), indent=1))

Once the rule is present on the appliance we may start the local Retro Hunt.
We can check the Retro Hunt status to see the start and end timestamps and the hunt progress.

In [None]:
response = a1000.enable_or_disable_yara_ruleset(True, RULESET_NAME)
print(response.status_code)
print(json.dumps(response.json(), indent=1))

response = a1000.start_or_stop_yara_local_retro_scan("START")
print(response.status_code)
print(json.dumps(response.json(), indent=1))

response = a1000.get_yara_local_retro_scan_status()
print(response.status_code)
print(json.dumps(response.json(), indent=1))

The SDK method `get_yara_ruleset_matches_v2()` allows us to get a paginated view of samples
which match our rule.

In [None]:
response = a1000.get_yara_ruleset_matches_v2(RULESET_NAME, page="1", page_size="10")
print(response.status_code)
print(json.dumps(response.json()["results"], indent=1))

To make the consumption of matched samples more convenient we will define a generator function.
The structure of the generator `a1000_complete_feed()` is the same as in the TiCloud example.
We use the matches function to get the first page of the matches feed.
The `next` field of the JSON response holds the full URL of the next page.
To keep things simple we will just check for the presence of this field and assume that the page index increases by 1.
Depending on your use case you may wish to inspect or alter this URL before fetching the next page.
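
If you would rather follow the returned link than assume the index increases by 1, the page number can be read from the URL itself. The following is a minimal sketch, assuming the link carries the page index in a `page` query parameter. The helper name `page_from_next_url` and the example URL are hypothetical, so check the links your instance actually returns.

```python
from urllib.parse import urlparse, parse_qs


def page_from_next_url(next_url):
    """Return the page index found in a 'next' link, or None when there is no next page."""
    if not next_url:
        return None
    query = parse_qs(urlparse(next_url).query)
    # ASSUMPTION: the page index is carried in a "page" query parameter.
    values = query.get("page")
    return int(values[0]) if values else None


# Hypothetical link, used only to exercise the helper.
example = "https://a1000.example.com/some/endpoint/?page=3&page_size=100"
print(page_from_next_url(example))
print(page_from_next_url(None))
```

A helper like this could replace the `next_page + 1` step in a generator, at the cost of depending on the exact URL layout.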

Any match found in the response is yielded to the caller.
Once every match on a page has been passed to the caller, the generator fetches the next page, until all pages have been consumed.

The output of the cell below shows the first match of the feed.
Here we control the number of elements using the `limit` parameter of the generator.

**NOTE:** If the rule is cloud-enabled and the cloud Retro Hunt has been started, the feed might contain cloud samples.
Such samples may only have the SHA1 hash present, and you may not be able to pull them onto the appliance for analysis.
These are simply differences between local and cloud Retro Hunt matches.

In [None]:
def a1000_complete_feed(limit=None):
    produced = 0
    next_page = 1
    while next_page:
        parsed = a1000.get_yara_ruleset_matches_v2(RULESET_NAME, page=str(next_page), page_size="100").json()
        # NOTE that Spectra Analyze API returns a direct link but the page pointer is an incrementing integer
        next_page = next_page + 1 if parsed.get("next") else None
        for e in parsed.get("results", []):
            produced += 1
            yield e
            if limit and produced >= limit:
                return

match = list(a1000_complete_feed(1))[0]
print(json.dumps(match, indent=1))

Since this example uses only the local Retro Hunt, only local samples will be present in the feed.
This means we can use `get_summary_report_v2()` to enrich the matches found.
Here is an example of that report for the first match in the feed.

In [None]:
response = a1000.get_summary_report_v2(
    match["sha256"], 
    skip_reanalysis=True, 
    include_networkthreatintelligence=False, 
    fields=["sha256", "local_first_seen", "local_last_seen"]
)
print(response.status_code)
print(json.dumps(response.json(), indent=1))

Spectra Analyze allows us to retrieve the summary reports in bulk.
As with the TiCloud example, we will create a helper function, here named `batched_a1000()`.

We then define the `group_with_summary()` function which will iterate over the matches feed in batches.
Note that we request only the fields of the summary report that are of interest to us.
These are: `sha256`, `local_first_seen`, `local_last_seen` and `classification`.
We will use these fields to filter out samples which are not of interest from the feed.
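
The batching idea can be tried without contacting the appliance. The following is a minimal sketch using `itertools.islice` on a plain range. The name `batched` is used here only for illustration.

```python
from itertools import islice


def batched(iterable, n):
    """Yield successive lists of at most n items from iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


batches = list(batched(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The final batch may be shorter than `n`, so a bulk request per batch works for a feed of any length.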

Again, as with the previous example, we print out some of the matches and their summaries.

In [None]:
def batched_a1000(iterable, n):
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch
        
        
def group_with_summary(retro_matches):
    for batch in batched_a1000(retro_matches, 100):
        sha256_to_match = {e["sha256"]: e for e in batch}
        batch_summary = a1000.get_summary_report_v2(
            list(sha256_to_match.keys()), 
            skip_reanalysis=True, 
            include_networkthreatintelligence=False, 
            fields=["sha256", "local_first_seen", "local_last_seen", "classification"]
        ).json()
        for summary in batch_summary["results"]:
            yield sha256_to_match[summary["sha256"]], summary

print(json.dumps(list(group_with_summary(a1000_complete_feed(3))), indent=1))

Our filter function is `a1000_filter()`, which accepts three parameters.
The `earlier` and `later` parameters are two timezone-aware `datetime` objects.
The filter requires that both the first seen and the last seen time of the matched sample fall between these values.
The third parameter is the list of classifications we are interested in.
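
The filter logic can be checked on made-up data before running it against the appliance. The following is a simplified sketch: it takes a summary dict directly rather than a (match, summary) pair, parses timestamps with the standard library's `datetime.fromisoformat` instead of `dateutil`, and assumes timestamps with an explicit `+00:00` offset. The name `in_window` and all sample values are hypothetical.

```python
import datetime


def in_window(earlier, later, classifications):
    """Build a predicate over summary dicts (a simplified stand-in for the notebook's filter)."""
    def inner(summary):
        fs = datetime.datetime.fromisoformat(summary["local_first_seen"])
        ls = datetime.datetime.fromisoformat(summary["local_last_seen"])
        return (
            earlier <= fs <= later
            and earlier <= ls <= later
            and summary["classification"].lower() in classifications
        )
    return inner


utc = datetime.timezone.utc
earlier = datetime.datetime(2024, 1, 1, tzinfo=utc)
later = datetime.datetime(2024, 12, 31, tzinfo=utc)

# Made-up summaries; only the three fields used by the filter are present.
summaries = [
    {"local_first_seen": "2024-03-01T10:00:00+00:00",
     "local_last_seen": "2024-06-01T10:00:00+00:00",
     "classification": "MALICIOUS"},
    {"local_first_seen": "2023-03-01T10:00:00+00:00",
     "local_last_seen": "2024-06-01T10:00:00+00:00",
     "classification": "malicious"},
    {"local_first_seen": "2024-03-01T10:00:00+00:00",
     "local_last_seen": "2024-06-01T10:00:00+00:00",
     "classification": "goodware"},
]

kept = list(filter(in_window(earlier, later, ["malicious"]), summaries))
print(len(kept))  # 1
```

Both bounds need to be timezone-aware here, since Python raises a `TypeError` when a naive `datetime` is compared with an aware one.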

Finally, we can compose a stream of matches and their summaries from the functions and generators defined so far.
We will consume this stream using `list` and `islice` to create a list of at most 10 (match, summary) pairs.
We will then print out the first element of that list.

The limit of 10 elements is only there to keep the example simple.
You may wish to further enrich the stream by calling `get_summary_report_v2()` again with additional fields,
or to send the samples from the stream for reanalysis.

In [None]:
def a1000_time_parse(dt):
    return dateutil.parser.isoparse(dt)

def a1000_filter(earlier, later, classifications):
    def inner(match_summary):
        _, summary = match_summary
        fs = a1000_time_parse(summary["local_first_seen"])
        ls = a1000_time_parse(summary["local_last_seen"])
        classification = summary["classification"].lower()
        return earlier <= fs <= later and earlier <= ls <= later and classification in classifications
    return inner


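# Use a timezone-aware "now" in UTC; calling now() without a timezone and then
# replace(tzinfo=...) would label local wall-clock time as UTC.
now = datetime.datetime.now(datetime.timezone.utc)
# 54 weeks is slightly more than one calendar year
year_ago = now - datetime.timedelta(weeks=54)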

stream = filter(
    a1000_filter(year_ago, now, ["malicious"]), 
    group_with_summary(a1000_complete_feed())
)
consumed = list(islice(stream, 10))

print(json.dumps(consumed[0], indent=1))

To round off the example we will clean up the appliance by doing the following:
- stop the local Retro Hunt
- check its status
- disable the ruleset we created at the start of the example
- delete the ruleset

**NOTE:** Depending on the size of your sample collection, the Retro Hunt may already be finished.
In that case the STOP call returns a 412 error code.

In [None]:
response = a1000.start_or_stop_yara_local_retro_scan("STOP")
print(response.status_code)
print(json.dumps(response.json(), indent=1))

response = a1000.get_yara_local_retro_scan_status()
print(response.status_code)
print(json.dumps(response.json(), indent=1))

response = a1000.enable_or_disable_yara_ruleset(False, RULESET_NAME)
print(response.status_code)
print(json.dumps(response.json(), indent=1))

response = a1000.delete_yara_ruleset(RULESET_NAME)
print(response.status_code)
print(json.dumps(response.json(), indent=1))
