# Detect Easter Eggs with Template Matching

Here, I detect the Easter eggs with template matching.

With this technique, I detected four eggs in the train set (all of them already known) and five eggs in the test set (two of them already known). Counting by type, there are three kaggle, three alien, and three rocket Easter eggs. So, there are three types of eggs, and three eggs of each type. I suspect that there are members of the [Earth-Trisolaris Organization (ETO)](https://en.wikipedia.org/wiki/The_Three-Body_Problem_(novel)) in SETI.

I also introduce a postprocessing function that uses this detector. Unfortunately, the LB score did not change (0.97 -> 0.97), but I think it is better than nothing.

I use some code, data, and ideas from the following notebooks and web pages. Thank you.
* https://www.kaggle.com/c/seti-breakthrough-listen/discussion/241522
* https://www.kaggle.com/sherlockkay/visualize-rocket
* https://www.kaggle.com/c/seti-breakthrough-listen/discussion/241076
* https://www.kaggle.com/c/seti-breakthrough-listen/discussion/241411
* https://www.kaggle.com/abebe9849/visualization-of-oof?scriptVersionId=63001313&cellId=13
* https://www.kaggle.com/agentauers/seti21-normalization-of-data
* http://labs.eecs.tottori-u.ac.jp/sd/Member/oyamada/OpenCV/html/py_tutorials/py_imgproc/py_template_matching/py_template_matching.html (Japanese)
* https://www.kaggle.com/ttahara/seti-e-t-resnet18d-baseline

[Update, 20210715] [After the competition relaunch](https://www.kaggle.com/c/seti-breakthrough-listen/discussion/253079), I re-ran the script as shown below. I could not find any Easter eggs in the new dataset. Please correct me if I am wrong.

In [None]:
import math
import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import ndimage
from tqdm.notebook import tqdm

In [None]:
data_dir='../input/seti-breakthrough-listen/'

In [None]:
train_df_old = pd.read_csv(data_dir + "old_leaky_data/train_labels_old.csv")
train_df_old["path"] = train_df_old["id"].apply(lambda x: f"{data_dir}old_leaky_data/train_old/{x[0]}/{x}.npy")
train_df_old.head()
test_df_old = pd.read_csv(data_dir + "old_leaky_data/test_labels_old.csv")
test_df_old["path"] = test_df_old["id"].apply(lambda x: f"{data_dir}old_leaky_data/test_old/{x[0]}/{x}.npy")

train_df = pd.read_csv(data_dir + "train_labels.csv")
train_df["path"] = train_df["id"].apply(lambda x: f"{data_dir}train/{x[0]}/{x}.npy")
train_df.head()
test_df = pd.read_csv(data_dir + "sample_submission.csv")
test_df["path"] = test_df["id"].apply(lambda x: f"{data_dir}test/{x[0]}/{x}.npy")

# Generate templates from known Easter eggs
I use the normalization technique from this [notebook](https://www.kaggle.com/agentauers/seti21-normalization-of-data) and a median filter to generate the templates.

In [None]:
def normalize_t(x):
    x = (x - np.mean(x, axis=2, keepdims=True)) / np.std(x, axis=2, keepdims=True)
    return x


def normalize_f(x):
    x = (x - np.mean(x, axis=1, keepdims=True)) / np.std(x, axis=1, keepdims=True)
    return x


def normalize_tf(x):
    x = (x - np.mean(x, axis=2, keepdims=True)) / np.std(x, axis=2, keepdims=True)
    x = (x - np.mean(x, axis=1, keepdims=True)) / np.std(x, axis=1, keepdims=True)
    return x


def normalize_ft(x):
    x = (x - np.mean(x, axis=1, keepdims=True)) / np.std(x, axis=1, keepdims=True)
    x = (x - np.mean(x, axis=2, keepdims=True)) / np.std(x, axis=2, keepdims=True)
    return x
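
The helpers above differ only in which axis is standardized and in what order. As a minimal sketch on a toy array (the shape, the random data, and the helper name `normalize_axis` are assumptions for illustration, not part of the competition data), the last axis normalized is the one that ends up with exactly zero mean and unit standard deviation, and the two orders generally give different results:

```python
import numpy as np


def normalize_axis(x, axis):
    # Standardize along one axis: zero mean and unit standard deviation.
    return (x - x.mean(axis=axis, keepdims=True)) / x.std(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
# Toy stand-in for three panels of 4 rows x 5 columns (invented data).
toy = rng.normal(loc=10.0, scale=2.0, size=(3, 4, 5)).astype(np.float32)

# Same order as normalize_ft: axis 1 first, then axis 2.
ft = normalize_axis(normalize_axis(toy, axis=1), axis=2)
# Same order as normalize_tf: axis 2 first, then axis 1.
tf = normalize_axis(normalize_axis(toy, axis=2), axis=1)

# Largest absolute mean along the axis that was normalized last.
ft_mean_last = float(np.abs(ft.mean(axis=2)).max())
tf_mean_last = float(np.abs(tf.mean(axis=1)).max())
print(ft_mean_last, tf_mean_last)
print("same result for both orders:", np.allclose(ft, tf))
```

The templates and the detector in this notebook both use `normalize_ft`, so they see the data on the same scale.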

In [None]:
def generate_template(img, min_x, max_x, min_y, max_y, threshold=0.5):
    # Keep panels 0, 2, and 4 of the cadence.
    img = img[[0, 2, 4], :, :]
    plt.figure()
    plt.imshow(img.mean(axis=0))
    img = normalize_ft(img)
    # Crop the egg: min_x/max_x index axis 1, min_y/max_y index axis 2.
    img = img[:, min_x:max_x, min_y:max_y]
    # for i in range(3):
    #    plt.figure()
    #    plt.imshow(img[i])
    img = np.clip(img, 0, 3)
    # A scalar size applies to every axis, so this is a 3x3x3 median
    # that also mixes the three panels.
    img = ndimage.median_filter(img, 3)
    # for i in range(3):
    #    plt.figure()
    #    plt.imshow(img[i])
    img = img.mean(axis=0)
    # Binarized copy of the template (values equal to the threshold map to 0).
    img_b = (img > threshold).astype(img.dtype)
    plt.figure()
    plt.imshow(img)
    # plt.figure()
    # plt.imshow(img_b)
    return img, img_b

The paths of the eggs are from the following notebook and discussions.

* https://www.kaggle.com/sherlockkay/visualize-rocket
* https://www.kaggle.com/c/seti-breakthrough-listen/discussion/241076
* https://www.kaggle.com/c/seti-breakthrough-listen/discussion/241411

In [None]:
kaggle_file_path = data_dir + "old_leaky_data/train_old/2/2503d7f6e5c4.npy"
alien_file_path = data_dir + "old_leaky_data/train_old/4/4f7bb8cf2d15.npy"
rocket_file_path = data_dir + "old_leaky_data/train_old/6/6c12bab0aeb4.npy"

image = np.load(kaggle_file_path).astype(np.float32)
kaggle_template, kaggle_template_b = generate_template(image, 15, 70, 20, 170, 0.5)
image = np.load(alien_file_path).astype(np.float32)
alien_template, alien_template_b = generate_template(image, 60, 110, 50, 110, 0.5)
image = np.load(rocket_file_path).astype(np.float32)
rocket_template, rocket_template_b = generate_template(image, 55, 120, 45, 110, 0.5)

# Detect eggs with template matching
Template matching is based on cv2.TM_SQDIFF_NORMED. Please note that a smaller output value means a better match. I use the code from [here (in Japanese, sorry)](http://labs.eecs.tottori-u.ac.jp/sd/Member/oyamada/OpenCV/html/py_tutorials/py_imgproc/py_template_matching/py_template_matching.html).
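
To make the "smaller is better" point concrete, here is a minimal NumPy-only sketch of the normalized squared difference, following the formula given in the OpenCV documentation for `TM_SQDIFF_NORMED`. The brute-force function `sqdiff_normed` and the random test image are assumptions for illustration; the notebook itself calls `cv2.matchTemplate`, which is much faster.

```python
import numpy as np


def sqdiff_normed(image, template):
    """Brute-force normalized squared difference at every template position."""
    ih, iw = image.shape
    th, tw = template.shape
    template = template.astype(np.float64)
    t_energy = np.sum(template ** 2)
    out = np.empty((ih - th + 1, iw - tw + 1), dtype=np.float64)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + th, x:x + tw].astype(np.float64)
            num = np.sum((patch - template) ** 2)
            den = np.sqrt(t_energy * np.sum(patch ** 2))
            out[y, x] = num / den
    return out


rng = np.random.default_rng(0)
template = rng.random((4, 6))
image = rng.random((20, 30))
# Paste the template into the image at row 7, column 12.
image[7:11, 12:18] = template

scores = sqdiff_normed(image, template)
best_row, best_col = (int(i) for i in np.unravel_index(np.argmin(scores), scores.shape))
print(best_row, best_col, scores.min())
```

The minimum of the score map falls exactly where the template was pasted, which is why the detector below thresholds on `min_val` rather than `max_val`.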


In [None]:
def template_matching(img, template, threshold, verbose=True, image_path=None):
    img = img[[0, 2, 4], :, :]
    img = normalize_ft(img)
    img = np.clip(img, 0, 3)
    # Median filter omitted to speed up in the Kaggle notebook.
    # img = ndimage.median_filter(img, 3)
    img = img.mean(axis=0)
    w, h = template.shape[::-1]
    # Other methods: cv2.TM_CCOEFF, cv2.TM_CCOEFF_NORMED, cv2.TM_CCORR,
    #                cv2.TM_CCORR_NORMED, cv2.TM_SQDIFF
    method = cv2.TM_SQDIFF_NORMED

    # Apply template matching
    res = cv2.matchTemplate(img, template, method)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
    if min_val < threshold and verbose:
        # If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take the minimum
        if method in [cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED]:
            top_left = min_loc
        else:
            top_left = max_loc
        bottom_right = (top_left[0] + w, top_left[1] + h)

        cv2.rectangle(img, top_left, bottom_right, 1, 2)

        # If no path is passed, fall back to the notebook-level `image_path`
        # variable set by the calling loop.
        if image_path is None:
            image_path = globals().get("image_path")
        print(min_val)
        print(image_path)
        plt.subplot(121), plt.imshow(res, cmap="gray")
        plt.title("Matching Result"), plt.xticks([]), plt.yticks([])
        plt.subplot(122), plt.imshow(img)
        plt.title("Detected Point"), plt.xticks([]), plt.yticks([])
        plt.show()
    return min_val

# Detect 'Kaggle' Eggs
Here, I use a somewhat liberal threshold to search for the optimal value.

In [None]:
threshold = 0.7

min_vals = []
# Scan the new and the old (leaky) train/test sets with the same template.
for df in (train_df, test_df, train_df_old, test_df_old):
    for image_path in tqdm(df["path"].values):
        img = np.load(image_path).astype(np.float32)
        min_val = template_matching(img, kaggle_template, threshold=threshold)
        min_vals.append(min_val)

min_vals = np.asarray(min_vals)
plt.figure()
_ = plt.hist(min_vals, bins=100)

# Detect Alien Eggs
Here, I use a somewhat liberal threshold to search for the optimal value.

In [None]:
threshold = 0.35

min_vals = []
# Scan the new and the old (leaky) train/test sets with the same template.
for df in (train_df, test_df, train_df_old, test_df_old):
    for image_path in tqdm(df["path"].values):
        img = np.load(image_path).astype(np.float32)
        min_val = template_matching(img, alien_template, threshold=threshold)
        min_vals.append(min_val)

min_vals = np.asarray(min_vals)
plt.figure()
_ = plt.hist(min_vals, bins=100)

# Detect Rocket Eggs
Here, I use a somewhat liberal threshold to search for the optimal value.

In [None]:
threshold = 0.6

min_vals = []
# Scan the new and the old (leaky) train/test sets with the same template.
for df in (train_df, test_df, train_df_old, test_df_old):
    for image_path in tqdm(df["path"].values):
        img = np.load(image_path).astype(np.float32)
        min_val = template_matching(img, rocket_template, threshold=threshold)
        min_vals.append(min_val)

min_vals = np.asarray(min_vals)
plt.figure()
_ = plt.hist(min_vals, bins=100)

I think the optimal threshold is 0.5 for kaggle and rocket, and 0.3 for alien.
My detector may suffer from [Pareidolia](https://en.wikipedia.org/wiki/Pareidolia).
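
Once a threshold is chosen, the candidate files can be listed directly from the collected scores instead of being read off the printed output. The sketch below assumes the paths were collected in the same order as `min_vals`; the file names and score values are invented for illustration, and `candidates_below` is a hypothetical helper.

```python
import numpy as np

# Hypothetical paths and scores, invented for illustration only.
paths = np.array(["a.npy", "b.npy", "c.npy", "d.npy", "e.npy"])
min_vals = np.array([0.92, 0.41, 0.88, 0.95, 0.47])


def candidates_below(paths, min_vals, threshold):
    """Return (path, score) pairs with score < threshold, best match first."""
    order = np.argsort(min_vals)
    return [
        (str(paths[i]), float(min_vals[i]))
        for i in order
        if min_vals[i] < threshold
    ]


hits = candidates_below(paths, min_vals, threshold=0.5)
for path, score in hits:
    print(path, score)
```

Sorting by score puts the most egg-like files first, which makes it easier to check by eye where the real matches end and the pareidolia begins.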

# Use Easter Egg Detector for postprocessing
I use the results of [the current public best notebook](https://www.kaggle.com/ttahara/seti-e-t-resnet18d-baseline).

In [None]:
def easter_egg_postprocessing(sub_df, templates, thresholds, data_dir='../input/seti-breakthrough-listen/'):
    processed_df = sub_df.copy()
    processed_df["path"] = processed_df["id"].apply(
        lambda x: f"{data_dir}test/{x[0]}/{x}.npy"
    )
    # Iterate over index labels so that .loc also works with a non-default index.
    for idx, path in tqdm(processed_df["path"].items(), total=len(processed_df)):
        img = np.load(path).astype(np.float32)
        for template, threshold in zip(templates, thresholds):
            min_val = template_matching(img, template, threshold=threshold)
            if min_val < threshold:
                # An egg was detected: overwrite the model prediction with 1.
                print('path: '+path+', target: '+str(processed_df.loc[idx, "target"])+' -> 1')
                processed_df.loc[idx, "target"] = 1
    processed_df = processed_df.drop(["path"], axis=1)
    return processed_df
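
The override rule itself is simple: a prediction becomes 1 as soon as any template scores below its threshold. A minimal vectorized sketch of that rule, assuming the scores have already been computed for every image and template (the numbers below are invented, and `override_predictions` is a hypothetical helper, not part of the function above):

```python
import numpy as np


def override_predictions(preds, scores, thresholds):
    """Set a prediction to 1 when any template score is below its threshold."""
    preds = np.asarray(preds, dtype=float).copy()
    # scores has shape (n_images, n_templates); thresholds has shape (n_templates,).
    hit = (np.asarray(scores) < np.asarray(thresholds)).any(axis=1)
    preds[hit] = 1.0
    return preds, hit


# Invented model predictions and template scores for three images.
preds = [0.10, 0.80, 0.30]
scores = [
    [0.90, 0.95, 0.85],  # no template matches
    [0.45, 0.90, 0.90],  # first template is below 0.5
    [0.70, 0.31, 0.90],  # 0.31 is not below 0.3, so no match
]
thresholds = [0.5, 0.3, 0.5]  # kaggle, alien, rocket

new_preds, hit = override_predictions(preds, scores, thresholds)
print(new_preds, hit)
```

Only the second image is overridden. The comparison is strict, matching the `min_val < threshold` test used in the postprocessing function.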

In [None]:
# sub_df = pd.read_csv("../input/seti-e-t-resnet18d-baseline/submission.csv")
# processed_df = easter_egg_postprocessing(
#     sub_df,
#     [kaggle_template, alien_template, rocket_template],
#     [0.5, 0.3, 0.5],
#     data_dir
# )
# processed_df.to_csv("sample_submission.csv", index=False)

Here is a list of the eggs my detector found (including known ones).

In [None]:
egg_paths=['../input/seti-breakthrough-listen/old_leaky_data/train_old/2/2503d7f6e5c4.npy', 
           '../input/seti-breakthrough-listen/old_leaky_data/train_old/8/805a7f4cac38.npy', 
           '../input/seti-breakthrough-listen/old_leaky_data/test_old/e/e05a5e667d06.npy', 
           '../input/seti-breakthrough-listen/old_leaky_data/train_old/4/4f7bb8cf2d15.npy', 
           '../input/seti-breakthrough-listen/old_leaky_data/test_old/1/1397c4ab0e5c.npy', 
           '../input/seti-breakthrough-listen/old_leaky_data/test_old/1/1725ceec6de4.npy', 
           '../input/seti-breakthrough-listen/old_leaky_data/train_old/6/6c12bab0aeb4.npy', 
           '../input/seti-breakthrough-listen/old_leaky_data/test_old/1/1e6e43ddc15a.npy', 
           '../input/seti-breakthrough-listen/old_leaky_data/test_old/7/72bc12d576e2.npy']

egg_ids=['2503d7f6e5c4', 
        '805a7f4cac38', 
        'e05a5e667d06', 
        '4f7bb8cf2d15', 
        '1397c4ab0e5c', 
        '1725ceec6de4', 
        '6c12bab0aeb4', 
        '1e6e43ddc15a', 
        '72bc12d576e2']

Should I omit the eggs in the train set when training NN models?
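
If one does decide to drop them, filtering by id is enough. This is only a sketch: the toy data frame and its two non-egg ids are invented, and the four egg ids are the ones from `train_old` in the list above.

```python
import pandas as pd

# The four eggs found in the old train set (from the list above).
train_egg_ids = {"2503d7f6e5c4", "805a7f4cac38", "4f7bb8cf2d15", "6c12bab0aeb4"}

# Toy stand-in for train_df_old; the two non-egg ids are invented.
toy_train_df = pd.DataFrame(
    {"id": ["2503d7f6e5c4", "0000aaaa1111", "6c12bab0aeb4", "ffff2222bbbb"]}
)

# Keep only the rows whose id is not a known egg.
is_egg = toy_train_df["id"].isin(train_egg_ids)
clean_train_df = toy_train_df[~is_egg].reset_index(drop=True)
print(clean_train_df)
```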

Any comments are welcome.