# Certificate search
This notebook contains an example of how to use the ReversingLabs SDK 
to find signed MALICIOUS samples.

### Used Spectra Intelligence classes
- **CertificateIndex** (*TCA-0501 - Certificate Index*)
- **CertificateAnalytics** (*TCA-0502 - Certificate Analytics*)
- **CertificateThumbprintSearch** (*TCA-0503 - Certificate Thumbprint Search*)
- **RHA1FunctionalSimilarity** (*TCA-0301 - RHA Functional Similarity (Group by RHA1)*)

### Credentials
Credentials are loaded from a local file instead of being written here in plain text.
To learn how to create a credentials file, see the **Storing and using credentials** section in the [README file](./README.md).

### 1. Certificate Thumbprint Search for signed malicious files
First import Spectra Intelligence classes as necessary and set up credentials.

In [None]:
import json
from ReversingLabs.SDK.ticloud import CertificateAnalytics, CertificateThumbprintSearch, CertificateIndex, RHA1FunctionalSimilarity


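# Context managers close the files once their contents have been read
with open('credentials.json') as credentials_file:
    CREDENTIALS = json.load(credentials_file)
USERNAME = CREDENTIALS.get("ticloud").get("username")
PASSWORD = CREDENTIALS.get("ticloud").get("password")
with open('../user_agent.json') as user_agent_file:
    USER_AGENT = json.load(user_agent_file)["user_agent"]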

config = {
    "host": "https://data.reversinglabs.com",
    "username": USERNAME,
    "password": PASSWORD,
    "user_agent": USER_AGENT
}
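
The cell above implies a particular layout for `credentials.json`: a top-level `ticloud` object holding `username` and `password`. As a minimal sketch (the helper name `load_ticloud_credentials` and the placeholder values are illustrative and not part of the SDK), the file could be checked for missing keys before any request is made:

```python
import json
import os
import tempfile


def load_ticloud_credentials(path):
    """Read username and password from a credentials file (hypothetical helper)."""
    with open(path) as handle:
        data = json.load(handle)
    section = data.get("ticloud") or {}
    # Collect every missing or empty key so the error names all of them at once
    missing = [key for key in ("username", "password") if not section.get(key)]
    if missing:
        raise KeyError(f"credentials file is missing: {', '.join(missing)}")
    return section["username"], section["password"]


# Demonstration with a throwaway file; the values are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "credentials.json")
    with open(demo_path, "w") as handle:
        json.dump({"ticloud": {"username": "u/example", "password": "secret"}}, handle)
    demo_username, demo_password = load_ticloud_credentials(demo_path)

print(demo_username)
```

Failing early with a clear message is usually easier to debug than an `AttributeError` raised by a chained `.get()` call on `None`.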

First, let's assume that we have the thumbprint of a certificate.
In Spectra Intelligence, a certificate thumbprint is an MD5, SHA1 or SHA256 hash value.
Let's define a function that takes a thumbprint value and uses the paginated CertificateIndex class from the ReversingLabs SDK to find signed samples.
It is a generator function that yields sample objects parsed from the Spectra Intelligence response.
Only samples with the MALICIOUS classification are yielded.
The function iterates over the entire paginated collection.

In [None]:
def signed_samples(thumbprint):
    certificate_index = CertificateIndex(**config)
    next_page = None
    while True:
        cert_info = certificate_index.get_certificate_information(thumbprint, next_page_hash=next_page).json()["rl"]
        next_page = cert_info.get("next_page")
        for s in cert_info["samples"]:
            if s["classification"] == "MALICIOUS":
                yield s
        if next_page is None:
            return
        
# Print only the first malicious sample; next() stops early and returns None if nothing matches
print(next(signed_samples("029C9665ECA25F548163FC588AD0A2F176157F89B3B76621E9E4E8086893B92F"), None))
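
The pagination logic can be tried without network access. The sketch below replaces the SDK call with a hypothetical in-memory `fetch_page()` function and made-up page data; only the `samples`, `classification` and `next_page` keys mirror the response fields used above.

```python
# Made-up pages keyed by page token; None stands for the first page.
FAKE_PAGES = {
    None: {
        "samples": [
            {"sha1": "a" * 40, "classification": "MALICIOUS"},
            {"sha1": "b" * 40, "classification": "KNOWN"},
        ],
        "next_page": "page-2",
    },
    "page-2": {
        # The last page carries no "next_page" key
        "samples": [{"sha1": "c" * 40, "classification": "MALICIOUS"}],
    },
}


def fetch_page(next_page_hash=None):
    """Stand-in for the API call (hypothetical); returns one page of results."""
    return FAKE_PAGES[next_page_hash]


def malicious_samples_offline():
    """Walk every page and yield only the samples classified as MALICIOUS."""
    next_page = None
    while True:
        page = fetch_page(next_page_hash=next_page)
        next_page = page.get("next_page")
        for sample in page["samples"]:
            if sample["classification"] == "MALICIOUS":
                yield sample
        if next_page is None:
            return


offline_hits = [sample["sha1"] for sample in malicious_samples_offline()]
print(offline_hits)
```

Because the function is a generator, a page is fetched only when the consumer asks for a sample that is not yet available.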

Next we define a function which accepts a certificate common name (or a common name with wildcards) and a limit for the number of samples to process.
The `search_common_names()` method from the SDK returns a list of certificate thumbprints which match the common name (or a pattern).
For each of these matches we call the SDK method `get_certificate_analytics()`. This returns information about the certificate itself and counters for matched samples.

Some certificates sign a large number of samples, not all of which are malicious, so we use the certificate sample statistics to skip any certificate that has not signed malicious samples.

For any certificate which signs malicious samples we use our previously defined `signed_samples()`.
We combine the `certificate_data` retrieved from certificate analytics with the sample signed by that certificate and yield the pair.

Finally, we print at most 10 pairs of certificates matched with malicious samples they signed.

In [None]:
def extract_sha256(element):
    for hash_object in element["certificate_thumbprints"]:
        if hash_object["name"] == "SHA256":
            return hash_object["value"]
    raise ValueError("No SHA256 hash")

def malicious_count(certificate_data):
    return certificate_data["certificate_analytics"]["statistics"]["malicious"]

def find_malicious_samples(common_name, count):
    thumbprint_search = CertificateThumbprintSearch(**config)
    certificate_analytics = CertificateAnalytics(**config)
    produced = 0
    next_page = None
    while produced < count:
        response = thumbprint_search.search_common_names(common_name, next_page_thumbprint=next_page).json()["rl"]
        next_page = response.get("next_page_thumbprint")
        for element in response["search"]:
            for thumbprint_object in element["thumbprints"]:
                if produced >= count:
                    return
                thumbprint = extract_sha256(thumbprint_object)
                certificate_data = certificate_analytics.get_certificate_analytics(thumbprint).json()["rl"]
                if malicious_count(certificate_data) == 0:
                    continue
                for signed_sample in signed_samples(thumbprint):
                    # Stop as soon as the limit is reached, even in the middle of a certificate
                    if produced >= count:
                        return
                    produced += 1
                    yield certificate_data, signed_sample
        if next_page is None:
            return
        
for cert, sample in find_malicious_samples("google.com", 10):
    print(cert)
    print()
    print(sample)
    print("\n" * 2)
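
The statistics check can also be exercised offline. In the sketch below, `FAKE_ANALYTICS`, `FAKE_SAMPLES` and `malicious_pairs()` are hypothetical stand-ins with made-up data; only the `certificate_analytics` / `statistics` / `malicious` path mirrors the structure read by `malicious_count()`.

```python
# Made-up analytics records keyed by an invented certificate id
FAKE_ANALYTICS = {
    "cert-a": {"certificate_analytics": {"statistics": {"malicious": 0}}},
    "cert-b": {"certificate_analytics": {"statistics": {"malicious": 3}}},
    "cert-c": {"certificate_analytics": {"statistics": {"malicious": 1}}},
}

# Made-up malicious samples signed by each certificate
FAKE_SAMPLES = {
    "cert-a": [],
    "cert-b": ["sample-1", "sample-2", "sample-3"],
    "cert-c": ["sample-4"],
}


def malicious_pairs():
    """Yield (certificate id, sample) pairs, skipping certificates with no malicious samples."""
    for cert_id, analytics in FAKE_ANALYTICS.items():
        if analytics["certificate_analytics"]["statistics"]["malicious"] == 0:
            continue  # no sample lookup is needed for this certificate
        for sample in FAKE_SAMPLES[cert_id]:
            yield cert_id, sample


all_pairs = list(malicious_pairs())
print(all_pairs)
```

Checking the counter first means the sample lookup, which would be a paginated request in the real notebook, is skipped entirely for certificates such as `cert-a`.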

### 2. Finding similar samples
We can now combine the output of our `find_malicious_samples()` function with Spectra Intelligence's RHA1 functional similarity to see if there are any samples which behave similarly to our signed malicious samples.

In [None]:
def find_similar(sample_sha1):
    rha1 = RHA1FunctionalSimilarity(allow_none_return=True, **config)
    page = None
    while True:
        raw = rha1.get_similar_hashes(sample_sha1, page_sha1=page)
        if not raw:
            break
        response = raw.json()["rl"]["group_by_rha1"]
        yield response
        page = response.get("next_page_sha1")
        if page is None:
            break


for cert_obj, sample in find_malicious_samples("microsoft.com", 1):
    cert = cert_obj["certificate_analytics"]["certificate"]
    cn = cert["common_name"]
    thumb = extract_sha256(cert)
    found = 0
    
    print(cn, thumb)
    for matches in find_similar(sample["sha1"]):
        found += 1
        print(json.dumps(matches, indent=4))
        
    if not found:
        print("No samples found for cert")
        
    print("\n" * 2)

This example searches for only one signed malicious sample because a large number of similar samples may be present on Spectra Intelligence.
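
If even a single sample produces more pages of similar hashes than are useful, the number of pages consumed can be capped as well. A minimal sketch, in which the hypothetical `fake_similar_pages()` generator stands in for `find_similar()` and the page contents are invented:

```python
from itertools import islice


def fake_similar_pages():
    """Stand-in for find_similar() (hypothetical); yields invented pages without end."""
    page_number = 0
    while True:
        page_number += 1
        yield {"page": page_number}


MAX_PAGES = 3  # arbitrary cap chosen for this illustration

# islice stops pulling from the generator once MAX_PAGES pages have been read
pages = list(islice(fake_similar_pages(), MAX_PAGES))
print(len(pages))
```

Since `find_similar()` is a generator that yields one page per request, wrapping it in `islice` in the same way should keep later pages from being requested at all.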