# Malware Encoding/Decoding

Let's take a look at the greencat sample. This sample uses a custom decoding function to reveal the address of its Command and Control (C2) server. Because the address is stored in encoded form and only decoded at run time, it does not appear as a plain string during static analysis. Can we reimplement the decoder in Python to extract the address?


Load the [greencat](https://github.com/fullerj/PMA/raw/main/greencat1) malware into IDA. Remember, you need to extract the archive with the password ```infected```.

Go to (press ```g```) ```40297D``` to reverse engineer the decoding function.


In [2]:
c2_encoded_url = "E6E8E4C2E8DEE65CDAC6C2CCCACAE0C2F2D2DCCE5CC6DEDA"  # located at 4046C8 to 4046DF
c2_decoded_url = ""

# Implement greencat's decoder in Python
# Each C2 character is decoded with a right shift by one bit followed by a
# logical AND with 0x7F

# For each encoded byte (two hex digits) in the encoded URL, decode it, then
# concatenate the results to form the decoded URL

nSize = len(c2_encoded_url)
if nSize > 0:
    counter = 0

    while counter < nSize:

      encoded_char = c2_encoded_url[counter:counter+2]
      int_encoded_char = int(encoded_char, 16)
      decoded_char = (int_encoded_char >> 1) & 0x7F
      c2_decoded_url += chr(decoded_char)
      counter += 2

# display the URL to the C2 server
print(c2_decoded_url)

stratos.mcafeepaying.com
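
The same decoding can be written more compactly by working on bytes instead of two-character slices. The sketch below also includes an encoder, which is only an assumption: it is the inverse of the decoder (a left shift by one bit) and has not been confirmed against the sample. The names `decode_c2` and `encode_c2` are illustrative.

```python
def decode_c2(encoded_hex: str) -> str:
    """Shift each byte right by one bit and mask to 7 bits, as in the cell above."""
    return "".join(chr((b >> 1) & 0x7F) for b in bytes.fromhex(encoded_hex))


def encode_c2(plain: str) -> str:
    """Assumed inverse of the decoder: shift each ASCII code left by one bit."""
    return bytes((ord(ch) << 1) & 0xFF for ch in plain).hex().upper()


encoded = "E6E8E4C2E8DEE65CDAC6C2CCCACAE0C2F2D2DCCE5CC6DEDA"
decoded = decode_c2(encoded)
print(decoded)

# Round trip: re-encoding the decoded text should reproduce the original hex
print(encode_c2(decoded) == encoded)
```

If the round trip holds, the same helper could be used to encode other candidate strings when searching a binary for similarly obfuscated data.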


# Prepare the Colab environment for Signature Analysis

## Download malware (`git clone`)
Files:
1. greencat1
2. greencat2
3. not_greencat (stored in the repository under the file name `different`)

Files are stored in the **/PMA** directory

## Install TLSH

Trend Micro Locality Sensitive Hash (TLSH) is a fuzzy matching program and library. Given a file (minimum 50 bytes), TLSH generates a hash value that can be used for similarity comparisons.

In [3]:
!git clone https://github.com/fullerj/PMA.git

!pip install python-tlsh # Trend Micro Locality Sensitive Hash

Cloning into 'PMA'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 14 (delta 3), reused 9 (delta 1), pack-reused 0
Unpacking objects: 100% (14/14), 70.02 KiB | 1.67 MiB/s, done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting python-tlsh
  Downloading python-tlsh-4.5.0.tar.gz (40 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: python-tlsh
  Building wheel for python-tlsh (setup.py) ... done
  Created wheel for python-tlsh: filename=python_tlsh-4.5.0-cp38-cp38-linux_x86_64.whl size=71104 sha256=58ceba6c6f72944bcc9ac7c29b30324087b02269a0f16e4f100bf44a5dc73f22
  Stored in directory: /root/.cache/pip/wheels/7f/56/d9/e0303b2411126bf3

# Locate malware file paths

All samples are stored in /content/PMA. Iterate through the files in the directory to locate greencat1, greencat2, and the not_greencat sample (named `different` in the repository), and save their paths in the variables below for future use.

In [4]:
import os
# get current directory; this is /content
cd = os.getcwd() + "/PMA"

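# identify the path for each sample
# os.listdir() returns entries in arbitrary order, so sort them to keep
# greencat1 ahead of greencat2
samples = sorted(os.listdir(cd))
greencat = [os.path.join(cd, x) for x in samples if 'greencat' in x]
greencat1 = greencat[0]
greencat2 = greencat[1]
# the non-greencat sample is stored under the file name "different"
not_greencat = [os.path.join(cd, x) for x in samples if 'different' in x][0]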

print(greencat1, greencat2, not_greencat)

/content/PMA/greencat1 /content/PMA/greencat2 /content/PMA/different


# Compute SHA256 hexadecimal hashes for each malware file

Fill in the proper code to initialize:

```
greencat1_sha256 = ?
greencat2_sha256 = ?
not_greencat_sha256 = ?
```

Notice that all three hashes are distinct, even though greencat1 and greencat2 are variants of the same malware.

In [5]:
# get the sha256 hexadecimal hash string of greencat1, greencat2, and not_greencat
import hashlib

sha256_hash1 = hashlib.sha256()
with open(greencat1, "rb") as f:
  for byte_block in iter(lambda: f.read(4096), b""):
    sha256_hash1.update(byte_block)

sha256_hash2 = hashlib.sha256()
with open(greencat2, "rb") as f:
  for byte_block in iter(lambda: f.read(4096), b""):
    sha256_hash2.update(byte_block)

sha256_hash_not = hashlib.sha256()
with open(not_greencat, "rb") as f:
  for byte_block in iter(lambda: f.read(4096), b""):
    sha256_hash_not.update(byte_block)

greencat1_sha256 = sha256_hash1.hexdigest()
greencat2_sha256 = sha256_hash2.hexdigest()
not_greencat_sha256 = sha256_hash_not.hexdigest()
print("Greencat1 - sha256sum: {}".format(greencat1_sha256))
print("Greencat2 - sha256sum: {}".format(greencat2_sha256))
print("Not Greencat - sha256sum: {}".format(not_greencat_sha256))

Greencat1 - sha256sum: c23039cf2f859e659e59ec362277321fbcdac680e6d9bc93fc03c8971333c25e
Greencat2 - sha256sum: 8bf5a9e8d5bc1f44133c3f118fe8ca1701d9665a72b3893f509367905feb0a00
Not Greencat - sha256sum: e56f845142fb499a384e96bc7f1236072dbe368d3bdac063a28df227a9172cec
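
To see why a cryptographic hash cannot measure similarity, it helps to hash two inputs that differ in a single byte. The sketch below uses synthetic byte strings rather than the malware files, and `sha256_of_bytes` is an illustrative helper, not part of the lab code.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of an in-memory byte string."""
    return hashlib.sha256(data).hexdigest()


original = b"A" * 4096          # stand-in for a file's contents
modified = b"A" * 4095 + b"B"   # same data with only the last byte changed

h1 = sha256_of_bytes(original)
h2 = sha256_of_bytes(modified)

# Count the positions where the two 64-character digests happen to agree
matching = sum(a == b for a, b in zip(h1, h2))

print(h1)
print(h2)
print("matching hex positions:", matching, "of", len(h1))
```

The two digests look unrelated even though the inputs are almost identical, which is the motivation for the fuzzy hash in the next section.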


# Compute the fuzzy (TLSH) hash for each malware

Fill in the proper code to initialize:
```
greencat1_tlsh = ?
greencat2_tlsh = ?
not_greencat_tlsh = ?
```

Notice the similarities in the output for **greencat1** and **greencat2**, and the dissimilarity with **not_greencat**.


In [7]:
# get the fuzzy hexadecimal hash string for greencat1, greencat2, and not_greencat

import tlsh

# this is really similar to the code above but you will need to find the TLSH 
# python documentation to identify the right APIs

greencat1_fuzzy = tlsh.Tlsh()
with open(greencat1, "rb") as f:
  for byte_block in iter(lambda: f.read(4096), b""):
    greencat1_fuzzy.update(byte_block)
  greencat1_fuzzy.final()

greencat2_fuzzy = tlsh.Tlsh()
with open(greencat2, "rb") as f:
  for byte_block in iter(lambda: f.read(4096), b""):
    greencat2_fuzzy.update(byte_block)
  greencat2_fuzzy.final()

not_greencat_fuzzy = tlsh.Tlsh()
with open(not_greencat, "rb") as f:
  for byte_block in iter(lambda: f.read(4096), b""):
    not_greencat_fuzzy.update(byte_block)
  not_greencat_fuzzy.final()  

greencat1_tlsh = greencat1_fuzzy.hexdigest()
greencat2_tlsh = greencat2_fuzzy.hexdigest()
not_greencat_tlsh = not_greencat_fuzzy.hexdigest()
print("Greencat1 - TLSH: {}".format(greencat1_tlsh))
print("Greencat2 - TLSH: {}".format(greencat2_tlsh))
print("Not Greencat - TLSH: {}".format(not_greencat_tlsh))

Greencat1 - TLSH: T1E1520A432ACC08F3D7C201B66A7DAB22DFF9DA2979399ED78B9409D83C76AD0D111705
Greencat2 - TLSH: T1ED5209432ACC08F3D7C201B66A7DAB22DFF99A2979399ED78B9409D83C76AD0D111705
Not Greencat - TLSH: T18DA35D23B2D88872D0791A788C19AAA8953EFD213D28315B76F93F8D4D3D2C1995C7D3


# Compute the TLSH score to evaluate similarity

1. Compare Greencat1 and Greencat2
2. Compare Greencat1 and Not Greencat
3. Compare Greencat2 and Not Greencat

In [None]:
# Compare fuzzy (TLSH) hashes to compute their distance score
# The lower the score, the more similar the files are

score1_2 = tlsh.diff(greencat1_tlsh, greencat2_tlsh)
print("Similarity score Greencat 1 & 2: {}".format(score1_2))

score1_n = tlsh.diff(greencat1_tlsh, not_greencat_tlsh)
print("Similarity score Greencat 1 & Not Greencat: {}".format(score1_n))

score2_n = tlsh.diff(greencat2_tlsh, not_greencat_tlsh)
print("Similarity score Greencat 2 & Not Greencat: {}".format(score2_n))

# YARA Signatures

Let's make a YARA rule to catch malware communicating with the greencat C2 server. 

If the malware author realizes that we have signatures for all variants of greencat, they can create new malware that is completely different but still relies on the same C2 server. A signature on the C2 address catches that case as well.


In [8]:
# Install yara

!pip install -U git+https://github.com/VirusTotal/yara-python

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/VirusTotal/yara-python
  Cloning https://github.com/VirusTotal/yara-python to /tmp/pip-req-build-m_t7iymq
  Running command git clone --filter=blob:none --quiet https://github.com/VirusTotal/yara-python /tmp/pip-req-build-m_t7iymq
  Resolved https://github.com/VirusTotal/yara-python to commit 4863e25b2698ec6987548ae81349155915d80833
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: yara-python
  Building wheel for yara-python (setup.py) ... done
  Created wheel for yara-python: filename=yara_python-4.2.0-cp38-cp38-linux_x86_64.whl size=864756 sha256=8fdfe8e1f79fca1a97ca0d87c397695737f2483644ec048d43e1c43138503d4d
  Stored in directory: /tmp/pip-ephem-wheel-cache-bva49k91/wheels/60/8e/e8/9f2208d7a75c2e673bb89ce86191672c2838634a55c8d983de
Succ

# Create YARA Rule

First, we will create a YARA rule to identify the greencat C2 server in malware. We will also add known strings.

A possible YARA rule to identify malware using the greencat C2 or known strings is:
```
rule greencatC2   
{
    strings:
        $c2_string = "stratos.mcafeepaying.com"
        $c2_hex = { E6 E8 E4 C2 E8 DE E6 5C DA C6 C2 CC CA CA E0 C2 F2 D2 DC CE 5C C6 DE DA }
        $s1 = "Shell started successfully!"
        $s2 = "Totally %d volumes found."

    condition:
        $c2_string or $c2_hex or $s1 or $s2
}
```
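
The `$c2_hex` pattern is the encoded C2 string written as space-separated bytes. The sketch below shows one way to build that pattern from the hex string used in the decoder cell, and then mimics the rule's `or` condition with plain byte searches. It is a stand-in for illustration, not YARA itself; `to_yara_hex` and `matches_any` are hypothetical helpers, and the sample buffer is synthetic.

```python
def to_yara_hex(encoded_hex: str) -> str:
    """Format a hex string such as 'E6E8' as a YARA hex pattern '{ E6 E8 }'."""
    raw = bytes.fromhex(encoded_hex)
    return "{ " + " ".join(f"{b:02X}" for b in raw) + " }"


c2_encoded_url = "E6E8E4C2E8DEE65CDAC6C2CCCACAE0C2F2D2DCCE5CC6DEDA"
print(to_yara_hex(c2_encoded_url))

# The four patterns from the rule above, expressed as raw bytes
patterns = [
    b"stratos.mcafeepaying.com",
    bytes.fromhex(c2_encoded_url),
    b"Shell started successfully!",
    b"Totally %d volumes found.",
]


def matches_any(data: bytes) -> bool:
    """Return True if any pattern occurs in the data (the rule's 'or' condition)."""
    return any(p in data for p in patterns)


# Synthetic buffer that embeds only the encoded C2 bytes
sample = b"\x00" * 16 + bytes.fromhex(c2_encoded_url) + b"\x00" * 16
print(matches_any(sample))
print(matches_any(b"benign data"))
```

Matching on the encoded bytes matters because the decoded address never appears in the file on disk, so a rule with only `$c2_string` would miss a sample that stores the address in this encoded form.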

In [None]:
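import yara
import os

# greencat's encoded C2 URL as a YARA hex string (the bytes at 4046C8 to 4046DF)
greencat_c2 = "{ E6 E8 E4 C2 E8 DE E6 5C DA C6 C2 CC CA CA E0 C2 F2 D2 DC CE 5C C6 DE DA }"

# get current directory; this is /content
cd = os.getcwd() + "/PMA"

# identify paths to all samples, skipping git metadata such as .git
malware = [os.path.join(cd, x) for x in sorted(os.listdir(cd)) if 'git' not in x]

# YARA text strings must be enclosed in double quotes
rule = yara.compile(sources={
    "n1": 'rule c2_string { strings: $a = "stratos.mcafeepaying.com" condition: $a }',
    "n2": "rule c2_hex { strings: $a = " + greencat_c2 + " condition: $a }",
})

for m in malware:
  # read each sample as raw bytes and scan it with both rules
  with open(m, "rb") as f:
    malware_data = f.read()
  a = rule.match(data=malware_data)
  print(m, a)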