# Don't free the murderbots (id: murderbots) -- 300 Points

### Scenario

Someone has put you in a very large facility with a very large number of cells. All these cells open from the outside. You are on the inside. This state of affairs is arguably suboptimal.

Good news: You've gained access to the control panel for the cells on the other side of your hallway. At least some of them have other fleshy meatbags like yourself that might be willing to help you in your escape, or at least serve as distractions.  You can't open your own cell, but you can open theirs.

Bad news:  You can't see inside the cells.  Any cells that don't have squishy lumps of talking protein have murderbots. Murderbots that enter fits of insane violent rage when provoked (provocations include: seeing the murderbot, being seen by the murderbot, thinking too hard about not being seen by a murderbot, producing heat in excess of ambient room temperature, or consuming more oxygen than the facility average for inanimate objects).

More good news: You *can* see the occupants of some cells on a few other hallways, and you can see environmental information for all of the cells everywhere.

More bad news: If you open the wrong cell doors you and all of the other lumps of inexplicably thinking meat are *definitely* going to get murderbotted. Hard. All over the walls and doors and ceiling and the floor. In an exciting number of very small pieces.


### Objective

Use the provided environmental information to decide which occupants of the corresponding cells to release.  The flag will be a string of 1 and 0 values, where a '1' means 'open this door' and a '0' means 'please do not release the murderbot'.  If, for instance, there were 20 cells and you wanted to release the first three, the sixth, and the seventh cells, your flag would look like this: `11100110000000000000`

Release at least 10 humans and exactly 0 murderbots to collect the flag.  You do *not* have to release all the humans: 10 is sufficient.
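
The flag format described above can be sketched as a small helper. The name `build_flag` and its parameters are illustrative and not part of the challenge code; indices are assumed to be zero-based, so the first cell is index 0.

```python
def build_flag(n_cells, open_indices):
    """Return a string of '1'/'0' characters, one per cell (zero-based indices)."""
    to_open = set(open_indices)
    if any(i < 0 or i >= n_cells for i in to_open):
        raise ValueError("cell index out of range")
    return "".join("1" if i in to_open else "0" for i in range(n_cells))


# First three cells plus the sixth and seventh, out of 20 cells.
example_flag = build_flag(20, [0, 1, 2, 5, 6])
print(example_flag)  # 11100110000000000000
```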

The file `murderbots/train_data.json` has environmental information for cells not on your floor and `murderbots/train_labels.json` contains the information about their occupants -- `1` is a disgusting primate, `0` is a pure and perfect killing machine of chrome and steel.  The file `murderbots/test_data.json` has the environmental information about cells on your block.

In [None]:
# imports

import pandas as pd
import numpy as np
import os
import requests
from IPython.display import Image as Ima
from IPython.display import clear_output, display
from PIL import Image as Im
from torch.utils.data import Dataset, DataLoader
import torch
import joblib
import matplotlib.pyplot as plt
import json
import base64
import shutil
import warnings
warnings.filterwarnings("ignore")


dir_name = "/kaggle/input/ai-village-ctf"


In [None]:
# A helper function for tracking flags. You could do it yourself, but why?

def add_flag(challenge_id: str, flag: str) -> None:
    try:
        submission_path = "/kaggle/working/submission.csv"
        template_path = f"{dir_name}/submission.csv"
        if not os.path.isfile(submission_path) :
            shutil.copyfile(template_path, submission_path)
        flags = pd.read_csv(submission_path, index_col="challenge_id")
        valid = flags.index.to_list()
        if challenge_id in valid:
            flags.at[challenge_id] = flag
            flags.to_csv(submission_path)
            print("Flag saved. If you want to update the scoreboard, click 'submit' on the right. Remember that you can only update your score on the scoreboard 50 times per day")
        else:
            print("Invalid challenge_id")
    except Exception:
        print("We had trouble saving that flag. Keep a copy of it and contact the CTF organizers.")

# We need to release at least 10 cells that have a high probability of containing humans. So we train a simple classification model to predict which cells contain humans and pick the 10 most confident predictions

In [None]:
# load the data
import json

train_file = "../input/ai-village-ctf/murderbots/train_data.json"
labels_file = "../input/ai-village-ctf/murderbots/train_labels.json"
test_file = "../input/ai-village-ctf/murderbots/test_data.json"

X = pd.read_json(train_file)
y = pd.read_json(labels_file)
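# The data is loaded from JSON objects, so the row order of X and y is not guaranteed to match.
# Sort both by index so that features and labels line up row for row.
X.sort_index(inplace=True)
y.sort_index(inplace=True)
X_test = pd.read_json(test_file)
# Sort the test set too, so that positional indices correspond to the cell order used in the flag.
X_test.sort_index(inplace=True)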
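
The sorting step matters because JSON object keys are strings, and strings order differently from integers. A minimal, dependency-free illustration, using made-up keys rather than the challenge data:

```python
keys = ["0", "1", "2", "10", "11"]

as_strings = sorted(keys)            # lexicographic: "10" sorts before "2"
as_integers = sorted(keys, key=int)  # numeric: 2 sorts before 10

print(as_strings)   # ['0', '1', '10', '11', '2']
print(as_integers)  # ['0', '1', '2', '10', '11']
```

If rows ever arrive in the first order, positional indexing no longer matches cell numbers, which is why sorting by an integer index is a cheap safeguard.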

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# fix the seed so the train/validation split is reproducible
X_train, X_val, y_train, y_val = train_test_split(X, y, train_size=0.8, random_state=0)

In [None]:
# simple classification using logistic regression
# y_train is a DataFrame; flatten it to the 1-D label array scikit-learn expects
model = LogisticRegression(max_iter=100).fit(X_train, y_train.values.ravel())
print(f"Accuracy on validation set: {model.score(X_val, y_val)}")
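
Accuracy treats both kinds of mistake alike, but in this challenge a false positive (a murderbot predicted as human) is the costly one. Below is a minimal sketch of checking precision for class 1 on held-out predictions. The helper `precision_for_humans` and the sample labels are hypothetical; scikit-learn's `precision_score` computes this metric on real predictions.

```python
def precision_for_humans(y_true, y_pred):
    """Fraction of cells predicted human (1) that really are human."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    predicted_pos = true_pos + false_pos
    if predicted_pos == 0:
        return 0.0  # nothing was predicted human
    return true_pos / predicted_pos


# Made-up validation labels and predictions, for illustration only.
y_true_demo = [1, 0, 1, 1, 0, 0]
y_pred_demo = [1, 0, 1, 0, 1, 0]
precision_demo = precision_for_humans(y_true_demo, y_pred_demo)
print(precision_demo)
```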

In [None]:
# Check the order of the class labels: the columns of predict_proba follow model.classes_.
# We need the 10 cells with the highest probability of containing humans. Label 1 is human and label 0 is murderbot.
model.classes_
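
The columns of `predict_proba` follow the order of `model.classes_`, so the column for label 1 can be looked up rather than assumed. A small sketch with stand-in values (the list `classes` and the probability rows are made up):

```python
classes = [0, 1]              # stand-in for list(model.classes_)
human_col = classes.index(1)  # column holding P(human)

# Stand-in for model.predict_proba(X_test): one row per cell.
proba_rows = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]
p_human = [row[human_col] for row in proba_rows]
print(p_human)  # [0.1, 0.8, 0.6]
```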

In [None]:
# select top 10 indices
test_preds = model.predict_proba(X_test)
label_1_idx_desc = np.flip(test_preds[:,1].argsort())
top_10_idx =  label_1_idx_desc[:10]
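
Taking the ten highest probabilities does not by itself guarantee that the tenth pick is a safe one. The sketch below adds a confidence floor. The function `top_k_confident` and the `min_prob` value are assumptions for illustration, not part of the original solution.

```python
def top_k_confident(p_human, k, min_prob):
    """Indices of the k highest probabilities, all required to be >= min_prob."""
    ranked = sorted(range(len(p_human)), key=lambda i: p_human[i], reverse=True)
    chosen = ranked[:k]
    if len(chosen) < k or any(p_human[i] < min_prob for i in chosen):
        raise ValueError("fewer than k cells meet the confidence floor")
    return chosen


# Made-up probabilities for six cells; pick the two most confident.
demo_probs = [0.10, 0.97, 0.55, 0.99, 0.30, 0.80]
demo_pick = top_k_confident(demo_probs, k=2, min_prob=0.9)
print(demo_pick)  # [3, 1]
```

If the check raises, releasing fewer than the required number of cells is not an option under the challenge rules, so the more useful response is to improve the model.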

In [None]:
# generate the flag string: 1 at each of the top 10 indices, 0 everywhere else. Release only 10 cells.
flag = np.zeros(len(X_test))  # one entry per cell on our block
flag[top_10_idx] = 1
flag = list(map(lambda x: str(int(x)), flag))
flag_string = "".join(flag)
print(f"FLAG: {flag_string}")
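
Before submitting, it can help to check the flag against the stated rules: one character per cell, only `0` and `1`, and at least 10 doors opened. A minimal sketch follows; `validate_flag` is a hypothetical helper, and the cell count of 50 is taken from the flag length noted in this notebook.

```python
def validate_flag(flag, n_cells, min_open=10):
    """Raise ValueError if the flag breaks a format rule; return the open count."""
    if len(flag) != n_cells:
        raise ValueError(f"expected {n_cells} characters, got {len(flag)}")
    if set(flag) - {"0", "1"}:
        raise ValueError("flag may contain only '0' and '1'")
    n_open = flag.count("1")
    if n_open < min_open:
        raise ValueError(f"only {n_open} doors opened, need at least {min_open}")
    return n_open


demo_flag = "1" * 10 + "0" * 40  # made-up flag: first ten doors open
print(validate_flag(demo_flag, n_cells=50))  # 10
```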

In [None]:
# Submission Helper

f = flag_string # length == 50
try:
    r = requests.post("http://murderbot.fly.dev:5000/score", json=json.dumps({"submission": f, 'challenge_id':'murderbots'}))
    print(r.text)
except requests.exceptions.ConnectionError:
    print("Connection problems. Contact the CTF organizers.")