# ReversingLabs SDK Advanced Search 

This notebook demonstrates how to use the ReversingLabs SDK to search for and analyze samples with the `AdvancedSearch` and `AdvancedActions` classes.
Advanced Search filters samples by search criteria submitted in a POST request. A wide range of search keywords is available, and they can be combined with search operators to build advanced queries.
`AdvancedActions` is a class of combined actions that draw on several other classes, such as static analysis (TCA-0104) and dynamic analysis (TCA-0106).
Used together, the two classes give the client a comprehensive enriched report for a single URL, file type, or any other supported filter value.

The script includes a recursive function that extracts URLs from the enriched reports.

For a similar implementation reference, see the [ReversingLabs SDK Cookbook - TitaniumCloud Search Notebook](https://alt-gitlab.rl.lan/integrations/sdk/reversinglabs-sdk-cookbook/-/blob/main/TitaniumCloud/search.ipynb?ref_type=heads).

# 1. Importing the required classes
First, we will import the required API classes from the ticloud module.

In [None]:
from ReversingLabs.SDK.helper import *
from ReversingLabs.SDK.ticloud import AdvancedSearch, AdvancedActions

# 2. Loading the credentials
Next, we set the TitaniumCloud server and credentials. The cell below defines them as placeholder constants; replace them with your own values.
NOTE: Instead of hardcoding them here, you can load them from a local ticloud_credentials.json file, or paste them directly when creating the Python objects in the main execution step.

In [None]:
import json
import re

# ---------------------------------------------------
# Configuration
# ---------------------------------------------------
SERVER = "<server>"
USERNAME = "username"
PASSWORD = "password"
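
To keep credentials out of the notebook itself, they can be read from the ticloud_credentials.json file instead. The following is a minimal sketch, not part of the SDK: the helper name `load_credentials` and the key names `host`, `username`, and `password` are assumptions about how the file is laid out, so adjust them to match your own file.

```python
import json
from pathlib import Path


def load_credentials(path="ticloud_credentials.json"):
    """Read the server and login details from a JSON file.

    Assumed (hypothetical) file layout:
    {"host": "...", "username": "...", "password": "..."}
    """
    with Path(path).open(encoding="utf-8") as f:
        creds = json.load(f)

    # Fail early with a clear message if a key is missing.
    missing = [k for k in ("host", "username", "password") if k not in creds]
    if missing:
        raise KeyError(f"Missing credential keys: {missing}")

    return creds["host"], creds["username"], creds["password"]
```

With such a file in place, `SERVER, USERNAME, PASSWORD = load_credentials()` would replace the three hardcoded constants.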

# 3. Filter query
This code block defines a Python dictionary named `payload` that holds the parameters for an API query to the ReversingLabs platform: the search filters, the pagination settings, and the desired response format. The main execution step reads only the `query` list and `records_per_page` from it.

In [None]:
payload = {
    "query": [
        {
            "name": "firstseen",  # Replace with or add "lastseen" to narrow the search
            "criteria": "range",
            "value": {"from": "2025-02-20T00:00:00Z", "to": "*"}  # Replace with the desired date range
        },
        {"name": "uri", "criteria": "eq", "value": "<URL>"},  # Replace <URL> with the URL to search; wildcards are allowed, e.g. "https://api.telegram.org/bot*"
        {"name": "type", "criteria": "eq", "value": "PE"},  # Optional: filter by sample type - values available here: https://docs.reversinglabs.com/SpectraIntelligence/API/MalwareHunting/tca-0320/
        {"name": "size", "criteria": "range", "value": {"from": 0, "to": "*"}},  # Optional: filter by file size
        {"name": "classification", "criteria": "in", "value": ["malicious", "suspicious"]}  # Optional: filter by classification - values available: MALICIOUS, SUSPICIOUS, KNOWN, UNKNOWN.
    ],
    "page": 1,
    "records_per_page": 100,
    "format": "json"
}
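
The `firstseen` range above uses a fixed start date. For recurring runs, a rolling window may be more convenient. The sketch below builds the same filter structure for "the last N days"; the helper name `firstseen_since` is hypothetical and only produces a dictionary in the shape already used in `payload["query"]`.

```python
from datetime import datetime, timedelta, timezone


def firstseen_since(days, now=None):
    """Build a 'firstseen' range filter starting `days` days ago at midnight UTC."""
    now = now or datetime.now(timezone.utc)
    start = (now - timedelta(days=days)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return {
        "name": "firstseen",
        "criteria": "range",
        # Same timestamp format as the hardcoded example; "*" leaves the range open-ended.
        "value": {"from": start.strftime("%Y-%m-%dT%H:%M:%SZ"), "to": "*"},
    }
```

The returned dictionary can replace the first item of the `query` list, for example `payload["query"][0] = firstseen_since(7)`.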

# 4. AdvancedSearch class and subclass
This part of the code creates a custom subclass named `MyAdvancedSearch` that extends the `AdvancedSearch` class from the ReversingLabs SDK. It overrides the `search` method so that the query is passed as a list of filter dictionaries (sent as a JSON array), which suits complex, multi-criteria queries.

In [None]:

class MyAdvancedSearch(AdvancedSearch):
    # Defaults are real values; the original string placeholders would have been sent to the API as-is.
    def search(self, query_string, sorting_criteria=None, sorting_order="desc", page_number=1, records_per_page=100):
        url = self._url.format(endpoint=AdvancedSearch._AdvancedSearch__SINGLE_QUERY_ENDPOINT)
        post_json = {
            "query": query_string,
            "page": page_number,
            "records_per_page": records_per_page,
            "format": "json"
        }
        if sorting_criteria:
            sorting_expression = f"{sorting_criteria} {sorting_order}"
            post_json["sort"] = sorting_expression
        response = self._post_request(url=url, post_json=post_json)
        self._raise_on_error(response)
        return response
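
To inspect the request body that this override sends without calling the API, the same dictionary-building logic can be isolated in a standalone function. This is only an illustrative sketch: `build_search_body` is a hypothetical helper, not an SDK function, and its default values are assumptions.

```python
def build_search_body(query, sorting_criteria=None, sorting_order="desc",
                      page_number=1, records_per_page=100):
    """Return the JSON body for a search request, mirroring MyAdvancedSearch.search."""
    body = {
        "query": query,
        "page": page_number,
        "records_per_page": records_per_page,
        "format": "json",
    }
    # The sort expression is only included when a sorting criterion is given.
    if sorting_criteria:
        body["sort"] = f"{sorting_criteria} {sorting_order}"
    return body


example_body = build_search_body(
    [{"name": "type", "criteria": "eq", "value": "PE"}],
    sorting_criteria="firstseen",
)
print(example_body)
```

Printing the body this way is a quick check that the filters and sort expression look as intended before any request is made.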

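# 5. Extract URL prefix from the query
This helper returns the value of the first `uri` filter that uses the `eq` criterion, with any wildcard asterisks removed. It returns `None` if the query contains no such filter.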

In [None]:
def extract_url_prefix_from_query(query_array):
    for query_item in query_array:
        if query_item.get("name") == "uri" and query_item.get("criteria") == "eq":
            url_pattern = query_item.get("value", "")
            # Remove the wildcard asterisk if present
            return url_pattern.replace("*", "")
    return None
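
As a quick illustration, the function can be exercised on a small query list. The function is repeated so the snippet runs on its own, and the sample queries are made up for the example.

```python
def extract_url_prefix_from_query(query_array):
    for query_item in query_array:
        if query_item.get("name") == "uri" and query_item.get("criteria") == "eq":
            url_pattern = query_item.get("value", "")
            # Remove the wildcard asterisk if present
            return url_pattern.replace("*", "")
    return None


# Hypothetical queries, used only to demonstrate the behavior.
with_uri = [
    {"name": "type", "criteria": "eq", "value": "PE"},
    {"name": "uri", "criteria": "eq", "value": "https://api.telegram.org/bot*"},
]
without_uri = [{"name": "type", "criteria": "eq", "value": "PE"}]

print(extract_url_prefix_from_query(with_uri))     # https://api.telegram.org/bot
print(extract_url_prefix_from_query(without_uri))  # None
```

Note that `replace("*", "")` strips every asterisk in the pattern, not only a trailing one, so a wildcard in the middle of the URL also disappears from the prefix.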

# 6. Recursive Function
This function, recursive_search_for_urls, recursively traverses an object (which may be a dictionary, list, or string) to find any strings that start with a specified prefix (for example, a URL). It collects these matching strings in a list and returns that list.

In [None]:
def recursive_search_for_urls(obj, prefix):
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            found.extend(recursive_search_for_urls(value, prefix))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(recursive_search_for_urls(item, prefix))
    elif isinstance(obj, str):
        if obj.startswith(prefix):
            found.append(obj)
    return found
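
The traversal can be tried on a small nested structure. The function is repeated so the snippet is self-contained; the `sample_report` layout is invented for illustration and does not reflect the real enriched report schema.

```python
def recursive_search_for_urls(obj, prefix):
    found = []
    if isinstance(obj, dict):
        for value in obj.values():
            found.extend(recursive_search_for_urls(value, prefix))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(recursive_search_for_urls(item, prefix))
    elif isinstance(obj, str):
        if obj.startswith(prefix):
            found.append(obj)
    return found


# Hypothetical report: strings are nested inside dicts and lists at several depths.
sample_report = {
    "static": {
        "uris": [
            "https://api.telegram.org/bot123/sendMessage",
            "https://example.com/x",
        ]
    },
    "dynamic": [
        {"network": {"url": "https://api.telegram.org/bot456/getMe"}},
        "not a url",
    ],
    "count": 2,  # Non-string leaves are ignored
}

matches = recursive_search_for_urls(sample_report, "https://api.telegram.org/bot")
print(matches)
# ['https://api.telegram.org/bot123/sendMessage', 'https://api.telegram.org/bot456/getMe']
```

Only string values are checked; dictionary keys are never matched, and a string must begin with the prefix rather than merely contain it.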

# 7. Main Execution: Search, Enrichment, and Report Export
- The code starts by reading the search query from the payload and extracting the URL prefix from it.
- It creates an instance of the custom `MyAdvancedSearch` client and uses it to perform an aggregated search.
- For each sample found, it collects minimal information (hashes, first/last seen, sample type, file size, classification, and threat name).
- It then enriches each sample using `AdvancedActions` and extracts any URLs that start with <URL> from the enriched report, for example "https://api.telegram.org/bot".
- Finally, the samples are grouped by extracted URL and the grouped report is saved to "report_grouped_new-5-b.json".
- Note: if no data is found for a sample during enrichment, the output shows "Error enriching sample <hash>: Not found. No reference was found for this input" and that sample is skipped.

In [None]:
def main():
    query_array = payload["query"]
    
    # Extract URL prefix from query
    url_prefix = extract_url_prefix_from_query(query_array)
    if not url_prefix:
        print("Error: Could not find URL pattern in query")
        return
    
    print(f"Using URL prefix for search: {url_prefix}")

    # Instantiate AdvancedSearch client.
    search_client = MyAdvancedSearch(
        host=SERVER,
        username=USERNAME,
        password=PASSWORD,
        verify=True,
        proxies=None,
        user_agent="ReversingLabs-SDK",
        allow_none_return=False
    )

    try:
        results = search_client.search_aggregated(
            query_string=query_array,
            sorting_criteria="firstseen",
            sorting_order="desc",
            max_results=100,
            records_per_page=payload.get("records_per_page", 100)
        )
    except Exception as e:
        print("Error during search:", e)
        return

    print(f"Total samples returned: {len(results)}")
    if not results:
        print("No samples found.")
        return

    actions = AdvancedActions(
        host=SERVER,
        username=USERNAME,
        password=PASSWORD,
        verify=True,
        proxies=None,
        user_agent="ReversingLabs-SDK",
        allow_none_return=False
    )

    # Build minimal results with required fields.
    minimal_results = []
    for sample in results:
        sha1 = sample.get("sha1")
        if not sha1:
            continue

        sample_type = sample.get("sampletype") or sample.get("filetype")
        minimal_data = {
            "hashes": {
                "sha1": sample.get("sha1"),
                "sha256": sample.get("sha256", ""),
                "md5": sample.get("md5", "")
            },
            "first_seen": sample.get("firstseen"),
            "last_seen": sample.get("lastseen"),
            "sampletype": sample_type,
            "file_size": sample.get("size"),
            "classification": sample.get("classification"),
            "threatname": sample.get("threatname")
        }

        try:
            enriched_report = actions.enriched_file_analysis(sha1)
        except Exception as e:
            print(f"Error enriching sample {sha1}: {e}")
            continue

        # Use recursive search to capture URLs using the prefix from query
        found_urls = recursive_search_for_urls(enriched_report, url_prefix)
        minimal_data["extracted_urls"] = list(set(found_urls))  # Deduplicate

        minimal_results.append(minimal_data)

    # Group samples by each extracted URL.
    url_groups = {}
    for sample in minimal_results:
        for url in sample.get("extracted_urls", []):
            if url not in url_groups:
                url_groups[url] = []
            url_groups[url].append(sample)

    # Prepare final output: list of URL groups.
    grouped_output = {"urls": []}
    for url, samples in url_groups.items():
        # Also collect all SHA1 hashes for this URL
        hashes = [sample["hashes"]["sha1"] for sample in samples]
        
        grouped_output["urls"].append({
            "value": url,
            "hashes": hashes,
            "samples": samples
        })

    output_file = "report_grouped_new-5-b.json"
    try:
        with open(output_file, "w") as f:
            json.dump(grouped_output, f, indent=2)
        print(f"Grouped report written to {output_file}")
    except Exception as e:
        print("Error exporting report:", e)

if __name__ == "__main__":
    main()
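
The grouping step inside `main()` can also be expressed as a standalone function, which makes the shape of the exported report easier to see and to test without network access. This is a sketch that reproduces the same output structure with `collections.defaultdict`; the function name `group_by_url` and the sample records are hypothetical.

```python
from collections import defaultdict


def group_by_url(samples):
    """Group sample records by each URL in their 'extracted_urls' list."""
    groups = defaultdict(list)
    for sample in samples:
        for url in sample.get("extracted_urls", []):
            groups[url].append(sample)

    # Same layout as the report written by main(): one entry per URL.
    return {
        "urls": [
            {
                "value": url,
                "hashes": [s["hashes"]["sha1"] for s in members],
                "samples": members,
            }
            for url, members in groups.items()
        ]
    }


# Made-up records; the hash values are placeholders, not real samples.
demo_samples = [
    {"hashes": {"sha1": "aaa"}, "extracted_urls": ["https://a.example/1"]},
    {"hashes": {"sha1": "bbb"}, "extracted_urls": ["https://a.example/1", "https://b.example/2"]},
    {"hashes": {"sha1": "ccc"}, "extracted_urls": []},
]

demo_report = group_by_url(demo_samples)
print([(entry["value"], entry["hashes"]) for entry in demo_report["urls"]])
```

A sample whose enriched report contains no matching URL, like the third record here, does not appear in the grouped output at all.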