# YARA retro hunt with timegating on TiCloud
This notebook shows how to use the ReversingLabs SDK
to add a YARA ruleset, start a retro hunt, and filter the output by time.

### Used TitaniumCloud classes
- **YARAHunting** (*TCA-0303 - Create/Delete a YARA Ruleset*)
- **YARARetroHunting** (*TCA-0319 - Start/Cancel YARA Retro Hunt*)
- **FileReputation** (*TCA-0101 - File Reputation (Malware Presence)*)

### Credentials
Credentials are loaded from a local file instead of being written here in plain text.
To learn how to create the credentials file, see the **Storing and using credentials** section in the [README file](./README.md).
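
A minimal sketch of the loading step, assuming the credentials file holds a `ticloud` object with `username` and `password` keys (the layout the next cell reads) and that `user_agent.json` holds a `user_agent` key. The helper name `load_ticloud_config` is hypothetical, and the temporary files only stand in for your real ones.

```python
import json
import tempfile
from pathlib import Path


def load_ticloud_config(credentials_path, user_agent_path,
                        host="https://data.reversinglabs.com"):
    """Build the keyword arguments shared by the TiCloud classes.

    Hypothetical helper: assumes {"ticloud": {"username": ..., "password": ...}}
    in the credentials file and {"user_agent": ...} in the user agent file.
    """
    with open(credentials_path) as f:
        ticloud = json.load(f)["ticloud"]
    with open(user_agent_path) as f:
        user_agent = json.load(f)["user_agent"]
    return {
        "host": host,
        "username": ticloud["username"],
        "password": ticloud["password"],
        "user_agent": user_agent,
    }


# Usage with throwaway files holding placeholder values (not real credentials)
with tempfile.TemporaryDirectory() as tmp:
    creds = Path(tmp) / "credentials.json"
    agent = Path(tmp) / "user_agent.json"
    creds.write_text(json.dumps({"ticloud": {"username": "u/demo", "password": "secret"}}))
    agent.write_text(json.dumps({"user_agent": "example-notebook/1.0"}))
    example_config = load_ticloud_config(creds, agent)

print(sorted(example_config))
```

Using `with` blocks closes both files right after reading, and indexing with `[...]` fails with a clear `KeyError` if a key is missing.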

### Time gated Retro Hunt with TiCloud
The first step in our hunt for interesting samples is to create a YARA ruleset on TiCloud, which we will use in our retro hunt.
Before that, we import the necessary classes and load our configuration.

In [None]:
import json
import time
import datetime
from itertools import islice

from ReversingLabs.SDK.ticloud import YARAHunting, YARARetroHunting, FileReputation


# Load the TiCloud credentials and the user agent string from local files
with open("credentials.json") as f:
    CREDENTIALS = json.load(f)
USERNAME = CREDENTIALS["ticloud"]["username"]
PASSWORD = CREDENTIALS["ticloud"]["password"]

with open("../user_agent.json") as f:
    USER_AGENT = json.load(f)["user_agent"]

# Keyword arguments shared by every TiCloud class used in this notebook
config = {
    "host": "https://data.reversinglabs.com",
    "username": USERNAME,
    "password": PASSWORD,
    "user_agent": USER_AGENT
}

The YARA ruleset in our example has a single rule that looks for 32-bit and 64-bit NSIS installers.
This rule should provide us with a varied set of samples.
For now, let's create the ruleset on TiCloud.
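
The ruleset content in the next cell is assembled with a Python f-string so that the rule name can be interpolated. Inside an f-string, literal YARA braces must be written as `{{` and `}}`, and `\\x` in the Python source becomes `\x` in the rule text. A minimal illustration with a toy rule (its name and body are made-up placeholders, not the NSIS rule):

```python
# Toy rule name and body, used only to show how the f-string renders
rule_name = "demo_rule"
rule_text = f"""
rule {rule_name}
{{
    strings:
        $magic = "\\xef\\xbe\\xad\\xde"
    condition:
        $magic
}}
"""

# The rendered text has single braces and single backslashes
print(rule_text)
```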

In [None]:
RULESET_NAME = "retro_hunt_with_timegating_NSIS_Installer"
RULESET_CONTENT = f"""
import "pe"

rule {RULESET_NAME}
{{
	/* a */
    meta:
        offset = "0x4031d1"
        examplar = "4313d352e0dafd1f22b6517126a655cae3b444fa758d2845eddfbe72f24f7bdd"
    strings:
        $op = {{
            81[2-3]efbeadde [2-6]
            81[2-3]496e7374 [2-6]
            81[2-3]736f6674 [2-6]
            81[2-3]4e756c6c
        }}
        $nsis = "\\xef\\xbe\\xad\\xdeNullsoftInst"
    condition:
        pe.sections[pe.section_index(@op)].characteristics & (pe.SECTION_MEM_READ | pe.SECTION_MEM_EXECUTE) and
        $nsis in (pe.overlay.offset..pe.overlay.offset+pe.overlay.size)
}}
"""

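yara_hunting = YARAHunting(**config)
# Remove any ruleset with the same name left over from a previous run
yara_hunting.delete_ruleset(RULESET_NAME)

# Upload the ruleset and show the API response
response = yara_hunting.create_ruleset(RULESET_NAME, RULESET_CONTENT)
print(response.status_code)
print(json.dumps(response.json(), indent=1))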

Once the ruleset is present on TiCloud, we are ready to start the retro hunt.
After starting it, we can check its status.

In [None]:
yara_retro = YARARetroHunting(**config)

response = yara_retro.start_retro_hunt(RULESET_NAME)
print(response.status_code)
print(json.dumps(response.json(), indent=1))

In [None]:
response = yara_retro.check_status(RULESET_NAME)
print(response.status_code)
print(json.dumps(response.json(), indent=1))

As we can see, the retro hunt status for our ruleset is "RUNNING".

Before we start consuming the matches for our ruleset, we should give TiCloud some time to find them.
Take a 10-minute break before continuing.
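
If you would rather wait until the hunt leaves the `RUNNING` state than sleep for a fixed time, the status check can be repeated in a loop with a timeout. A minimal sketch: `poll_until` is a hypothetical helper, and in practice `get_status` would wrap `yara_retro.check_status(RULESET_NAME)` and extract the status string from its JSON. That JSON path is not shown in this notebook, so the fake status source below is only a placeholder.

```python
import time


def poll_until(get_status, is_done, interval=60.0, timeout=600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call get_status() every `interval` seconds until is_done(status) is true.

    Returns the last status, or raises TimeoutError once `timeout` seconds pass.
    `sleep` and `clock` are parameters so the helper can be exercised offline.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if is_done(status):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout} seconds")
        sleep(interval)


# Usage with a fake status source; "FINISHED" is a placeholder, not a documented value
fake_statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
final = poll_until(lambda: next(fake_statuses), lambda s: s != "RUNNING", interval=0)
print(final)
```

Taking a predicate instead of a fixed list of terminal statuses avoids hard-coding status names that this notebook does not document.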

Now let's query the feed for matches from the last 24 hours and take a look at the first 3 of them.

**NOTE:** The YARA retro hunt feed holds at most 10,000 matches. See the official documentation for how to handle this edge case.

In [None]:
# Unix timestamp (in seconds) for 24 hours ago, passed to the feed as a string
start = str(int(time.time() - 60 * 60 * 24 * 1))

feed = yara_retro.yara_retro_matches_feed(time_format="timestamp", time_value=start).json()["rl"]["feed"]

print("Time range", feed["time_range"])

# Show only the first 3 matches instead of dumping the whole page
for entry in feed.get("entries", [])[:3]:
    print(json.dumps(entry, indent=1))

The YARA retro hunt feed is paginated by the `time_value` parameter.
The `rl.feed.last_timestamp` field of each response acts as a cursor: passing it as the `time_value` of the next request returns the next page of the feed.
Now let us define a generator function that consumes the feed in its entirety.
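
The cursor pattern is easiest to see against a fake feed. A minimal, offline sketch: `fake_feed_page` stands in for `yara_retro.yara_retro_matches_feed(...).json()["rl"]["feed"]` and is entirely hypothetical, including its page size of 2 and its "strictly newer than the cursor" semantics. The loop stops when a page is empty or the cursor stops advancing, which guards against requesting the same page forever.

```python
# Hypothetical matches with Unix timestamps, held by a fake feed
FAKE_MATCHES = [{"sha1": f"{i:040x}", "timestamp": 1000 + i} for i in range(5)]


def fake_feed_page(time_value, page_size=2):
    """Return at most page_size matches strictly newer than time_value."""
    newer = [m for m in FAKE_MATCHES if m["timestamp"] > int(time_value)]
    page = newer[:page_size]
    last = page[-1]["timestamp"] if page else int(time_value)
    return {"entries": page, "last_timestamp": last}


def iterate_feed(fetch_page, start_time):
    """Follow the last_timestamp cursor until the feed runs dry."""
    cursor = str(start_time)
    while True:
        page = fetch_page(cursor)
        entries = page.get("entries", [])
        yield from entries
        next_cursor = page.get("last_timestamp")
        # Stop on an empty page, a missing cursor, or a cursor that did not move
        if not entries or next_cursor is None or str(next_cursor) == cursor:
            return
        cursor = str(next_cursor)


timestamps = [m["timestamp"] for m in iterate_feed(fake_feed_page, 999)]
print(timestamps)
```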

In [None]:
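def complete_retro_feed(start_time, limit=None):
    """Yield retro hunt matches from start_time onwards, following the
    last_timestamp cursor until the feed is exhausted or `limit` matches are yielded."""
    next_page = start_time
    produced = 0
    while next_page:
        raw = yara_retro.yara_retro_matches_feed(time_format="timestamp", time_value=next_page)
        parsed = raw.json()["rl"]["feed"]
        entries = parsed.get("entries", [])
        for e in entries:
            produced += 1
            yield e
            if limit and produced >= limit:
                return
        last_timestamp = parsed.get("last_timestamp")
        # Stop on an empty page, a missing cursor (str(None) would be "None"),
        # or a cursor that did not advance; any of these would otherwise loop forever
        if not entries or last_timestamp is None or str(last_timestamp) == next_page:
            return
        next_page = str(last_timestamp)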

The generator function `complete_retro_feed()` provides us only with samples that match our rules.
We would like to keep only the samples that were first seen in the last week.
Since this information is not present in the retro hunt feed, we will use the TCA-0101 File Reputation query.
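
The age check boils down to comparing a parsed `first_seen` value with a cutoff one week in the past. A minimal sketch, assuming `first_seen` uses the `%Y-%m-%dT%H:%M:%S` layout parsed later in this notebook and is expressed in UTC (an assumption worth checking against the TCA-0101 documentation). Attaching the UTC timezone before calling `.timestamp()` avoids the local-time shift that `time.mktime()` would introduce. The helper names are hypothetical.

```python
import datetime

WEEK_SECONDS = 60 * 60 * 24 * 7


def first_seen_epoch(first_seen):
    """Convert a 'YYYY-MM-DDTHH:MM:SS' string, assumed to be UTC, to Unix seconds."""
    parsed = datetime.datetime.strptime(first_seen, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=datetime.timezone.utc).timestamp()


def seen_within(first_seen, seconds, now):
    """True if first_seen lies no more than `seconds` before `now` (Unix seconds)."""
    return first_seen_epoch(first_seen) >= now - seconds


# Fixed "now" for a reproducible example: 2024-01-15T00:00:00 UTC
now = datetime.datetime(2024, 1, 15, tzinfo=datetime.timezone.utc).timestamp()
print(seen_within("2024-01-10T12:00:00", WEEK_SECONDS, now))  # True, 4.5 days earlier
print(seen_within("2023-12-01T00:00:00", WEEK_SECONDS, now))  # False, 45 days earlier
```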

To do this, we first define a helper function `batched_ticloud()`.
It groups the output of our `complete_retro_feed()` generator into batches.
We do this because TCA-0101 bulk queries are more efficient than single-hash queries, and the maximum batch size is 100.
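
With at most 100 hashes per bulk request, N matches need ceil(N / 100) reputation calls, so 250 matches cost 3 requests instead of 250 single-hash lookups. On Python 3.12 and later, the standard library's `itertools.batched()` does the same grouping as `batched_ticloud()`, except that it yields tuples rather than lists. A sketch of a helper, here called `batched` (a hypothetical name), that prefers the standard library version when it exists:

```python
import itertools
import math


def batched(iterable, n):
    """Yield lists of at most n items, using itertools.batched when present (3.12+)."""
    if hasattr(itertools, "batched"):
        for chunk in itertools.batched(iterable, n):
            yield list(chunk)
        return
    # Fallback for older Pythons: same approach as batched_ticloud()
    it = iter(iterable)
    while chunk := list(itertools.islice(it, n)):
        yield chunk


hashes = [f"{i:040x}" for i in range(250)]
batches = list(batched(hashes, 100))
print([len(b) for b in batches])     # size of each bulk request
print(math.ceil(len(hashes) / 100))  # number of requests needed
```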

We pass the retro matches feed (any iterable of matches) as an argument to `group_with_reputation()`.
Elements of that iterable are batched using the helper function.
For each batch, we create a list of sample SHA-1 hashes for the File Reputation query.
The reputation entries from that query are then paired with the corresponding YARA matches.
Finally, each (match, reputation) pair is yielded.
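
The pairing step is a dictionary join on SHA-1. A minimal offline sketch with made-up data: `matches` imitates retro hunt entries and `reputation_entries` imitates the `rl.entries` list of a bulk TCA-0101 response, keeping only the `query_hash.sha1` and `status` fields this notebook reads. This hypothetical variant, `pair_by_sha1`, also reports hashes that received no reputation entry instead of silently dropping them.

```python
def pair_by_sha1(matches, reputation_entries):
    """Pair each match with its reputation entry, keyed by SHA-1.

    Returns (pairs, missing): pairs is a list of (match, reputation) tuples,
    missing lists match hashes that got no reputation entry back.
    Duplicate reputation entries for the same hash are ignored after the first.
    """
    sha1_to_match = {m["sha1"]: m for m in matches}
    pairs = []
    for reputation in reputation_entries:
        sha1 = reputation["query_hash"]["sha1"]
        match = sha1_to_match.pop(sha1, None)
        if match is not None:
            pairs.append((match, reputation))
    return pairs, list(sha1_to_match)


# Made-up data in the shape the notebook reads
matches = [{"sha1": "a" * 40, "rule": "demo"}, {"sha1": "b" * 40, "rule": "demo"}]
reputation_entries = [{"status": "MALICIOUS", "query_hash": {"sha1": "a" * 40}}]

pairs, missing = pair_by_sha1(matches, reputation_entries)
print(pairs)
print(missing)
```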

In [None]:
file_reputation = FileReputation(**config)

def batched_ticloud(iterable, n):
    """Yield lists of at most n consecutive items from iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

def group_with_reputation(retro_matches):
    """Yield (retro hunt match, file reputation entry) pairs, querying 100 hashes at a time."""
    for batch in batched_ticloud(retro_matches, 100):
        sha1_to_match = {e["sha1"]: e for e in batch}
        batch_reputation = file_reputation.get_file_reputation(
            list(sha1_to_match.keys())
        ).json()["rl"]
        if batch_reputation.get("invalid_hashes"):
            print("Invalid hashes in stream", batch_reputation["invalid_hashes"])
        for reputation in batch_reputation.get("entries", []):
            # Example entry: {'status': 'UNKNOWN', 'query_hash': {'sha1': '46c2c0dd5fc12062e3c390ef6cfb8ace6ec1274a'}}
            reputation_hash = reputation["query_hash"]["sha1"]
            yield sha1_to_match[reputation_hash], reputation

# Fetch a single match together with its reputation
first_pair = next(group_with_reputation(complete_retro_feed(start, 1)))
print(json.dumps(first_pair, indent=1))

Finally, we can use the `filter()` function to compose a stream of YARA matches and reputations that are interesting to us.
Let us find malicious samples that were first seen in the last week.

Here we collect the first 10 samples that satisfy our condition and print the first one.
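
Because `filter()`, `islice()` and the generators in this notebook are all lazy, collecting 10 results means no further feed pages or reputation batches are requested once the tenth interesting sample is found. A small offline illustration, with a hypothetical counting source standing in for the retro hunt stream:

```python
from itertools import islice

pulled = []  # records every item the consumer actually requested


def counting_source(n):
    """Stand-in for the retro hunt stream: yields 0..n-1 and records each pull."""
    for i in range(n):
        pulled.append(i)
        yield i


# Keep multiples of 3 and stop after the first four hits
hits = list(islice(filter(lambda x: x % 3 == 0, counting_source(1_000_000)), 4))
print(hits)         # [0, 3, 6, 9]
print(len(pulled))  # 10, not 1,000,000
```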

In [None]:
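def interesting_filter(age_limit, classifications):
    """Build a predicate for (match, reputation) pairs.

    A pair passes if the sample was first seen at or after age_limit (Unix seconds)
    and its reputation status is one of classifications.
    """
    def inner(match_reputation):
        _, reputation = match_reputation
        try:
            parsed = datetime.datetime.strptime(reputation["first_seen"], "%Y-%m-%dT%H:%M:%S")
        except (KeyError, ValueError):
            # No first_seen value, or an unexpected format
            return False
        # Assuming first_seen is in UTC; time.mktime() would treat it as local time
        first_seen = parsed.replace(tzinfo=datetime.timezone.utc).timestamp()
        return first_seen >= age_limit and reputation.get("status", "").upper() in classifications
    return inner


week_ago = int(time.time() - 60 * 60 * 24 * 7)

stream = filter(
    interesting_filter(week_ago, ["MALICIOUS"]),
    group_with_reputation(complete_retro_feed(start))
)
# islice stops consuming the feed once 10 interesting samples are found
consumed = list(islice(stream, 10))

if consumed:
    print(json.dumps(consumed[0], indent=1))
else:
    print("No matching samples found")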

Now we know how to use the YARA Retro Hunt API.
For the sake of completeness, we will show you how to stop the retro hunt and delete rulesets.
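
Outside a notebook, it is easy to skip these cleanup calls when something fails halfway. One way to make them unconditional is a `try`/`finally` block. A minimal sketch: `FakeRetro`, `FakeHunting` and `run_hunt` are hypothetical stand-ins that only record calls; with the real SDK you would pass `yara_retro` and `yara_hunting`, whose `cancel_retro_hunt()` and `delete_ruleset()` methods are the ones used in the following cells.

```python
class FakeRetro:
    """Hypothetical stand-in for YARARetroHunting that records calls."""
    def __init__(self, log):
        self.log = log

    def cancel_retro_hunt(self, name):
        self.log.append(("cancel", name))


class FakeHunting:
    """Hypothetical stand-in for YARAHunting that records calls."""
    def __init__(self, log):
        self.log = log

    def delete_ruleset(self, name):
        self.log.append(("delete", name))


def run_hunt(retro, hunting, name, work):
    """Run work(), then always cancel the retro hunt and delete the ruleset."""
    try:
        return work()
    finally:
        try:
            retro.cancel_retro_hunt(name)
        finally:
            # Runs even if cancelling raised
            hunting.delete_ruleset(name)


def failing_work():
    raise RuntimeError("simulated failure while consuming the feed")


log = []
try:
    run_hunt(FakeRetro(log), FakeHunting(log), "demo_ruleset", failing_work)
except RuntimeError as exc:
    print("work failed:", exc)
print(log)
```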

In [None]:
response = yara_retro.cancel_retro_hunt(RULESET_NAME)
print(response.status_code)
print(json.dumps(response.json(), indent=1))

In [None]:
response = yara_retro.check_status(RULESET_NAME)
print(response.status_code)
print(json.dumps(response.json(), indent=1))

Finally, we can delete our ruleset.

In [None]:
response = yara_hunting.delete_ruleset(RULESET_NAME)
print(response.status_code)
print(json.dumps(response.json(), indent=1))

This was just a simple example of how we can combine two TiCloud APIs to great effect.
**NOTE:** The code in this notebook is given as an example.
You may be able to further improve it by using asynchronous requests.
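
As one starting point, blocking calls such as the bulk reputation query can be run concurrently with `asyncio.to_thread()` (Python 3.9+) and `asyncio.gather()`. This swaps in thread-backed concurrency rather than a true asynchronous HTTP client, and `fake_bulk_lookup` is a hypothetical stand-in for `file_reputation.get_file_reputation(...)`. API rate limits may also cap how much parallelism is useful. A minimal sketch:

```python
import asyncio
import time


def fake_bulk_lookup(batch):
    """Hypothetical blocking stand-in for a bulk reputation request."""
    time.sleep(0.2)  # simulate network latency
    return [{"query_hash": {"sha1": h}, "status": "UNKNOWN"} for h in batch]


async def lookup_all(batches, max_concurrency=4):
    """Run blocking lookups in worker threads, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(batch):
        async with semaphore:
            return await asyncio.to_thread(fake_bulk_lookup, batch)

    # gather() returns results in the same order as the input batches
    return await asyncio.gather(*(one(b) for b in batches))


batches = [[f"{i:040x}" for i in range(j, j + 100)] for j in range(0, 400, 100)]
started = time.perf_counter()
results = asyncio.run(lookup_all(batches))
elapsed = time.perf_counter() - started
print(len(results), round(elapsed, 1))
```

In a Jupyter cell, use `results = await lookup_all(batches)` instead of `asyncio.run(...)`, because the notebook kernel already runs an event loop and `asyncio.run()` refuses to start inside one.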