# inspect filters of a DeepBind model

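Paper: Alipanahi et al. (2015), "Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning", Nature Biotechnology. https://www.doi.org/10.1038/nbt.3300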

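DeepBind predicts a binding score $f(s) = \mathrm{net}_W(\mathrm{pool}(\mathrm{rect}_b(\mathrm{conv}_M(s))))$ for a sequence $s$: convolution with motif detectors $M$, rectification with thresholds $b$, pooling over positions, and a small neural network with weights $W$.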
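To make the composition concrete, here is a minimal NumPy sketch of the four stages for a single one-hot sequence. The function name `deepbind_forward`, the argument shapes, and the random weights are assumptions for illustration. The sketch skips sequence padding and uses max pooling only, so it is not a reimplementation of the published model.

```python
import numpy as np


def deepbind_forward(s, M, b, W1, w2):
    """Score one one-hot sequence s of shape (L, A).

    Hypothetical shapes: M (d, m, A) motif detectors, b (d,) thresholds,
    W1 (d, h) hidden weights, w2 (h,) output weights.
    """
    d, m, _ = M.shape
    n_pos = s.shape[0] - m + 1  # no padding in this sketch
    # conv_M: cross-correlate every detector with every length-m window
    X = np.array(
        [[np.sum(s[i : i + m] * M[k]) for k in range(d)] for i in range(n_pos)]
    )
    # rect_b: Y[i, k] = max(0, X[i, k] - b[k])
    Y = np.maximum(0.0, X - b)
    # pool: max over positions, one value per detector
    z = Y.max(axis=0)
    # net_W: ReLU hidden layer followed by a linear readout
    hidden = np.maximum(0.0, z @ W1)
    return float(hidden @ w2)


rng = np.random.default_rng(0)
L, A, d, m, h = 30, 4, 3, 5, 8  # toy sizes, not the paper's
s = np.eye(A)[rng.integers(0, A, size=L)]  # random one-hot sequence
score = deepbind_forward(
    s,
    rng.normal(size=(d, m, A)),
    np.zeros(d),
    rng.normal(size=(d, h)),
    rng.normal(size=h),
)
print(score)
```

Because the pooling step takes a max over positions, the score does not depend on where in the sequence the best motif match occurs.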

Hyperparameters are listed in the [supplementary information pdf](https://static-content.springer.com/esm/art%3A10.1038%2Fnbt.3300/MediaObjects/41587_2015_BFnbt3300_MOESM51_ESM.pdf).

## questions

1. The rectification stage uses $Y_{i,k} = \max(0, X_{i,k} - b_k)$, where $b_k$ is a tunable threshold for motif detector $k$. How do we add $b_k$ in Keras?
2. Does the max-pool operation yield a single value per motif? In other words, is the pooling kernel the same length as the motif?
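One possible reading of both questions, checked numerically below. First, a learned bias added before a plain ReLU plays the role of the threshold with the sign flipped (`bias_k = -b_k`). Second, a pooling window that spans every position of the convolution output, rather than the motif length, leaves one value per motif detector. The array sizes and random values are made up for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pos, n_filters = 10, 3  # hypothetical sizes
X = rng.normal(size=(n_pos, n_filters))  # stand-in for the conv output
b = rng.uniform(0.0, 1.0, size=n_filters)  # per-filter thresholds b_k

thresholded = np.maximum(0.0, X - b)  # max(0, X - b_k)
bias = -b  # what a bias-then-ReLU layer would need to learn
biased = np.maximum(0.0, X + bias)  # relu(X + bias_k)
same = np.allclose(thresholded, biased)

pooled = thresholded.max(axis=0)  # window spans all positions
print(same, pooled.shape)  # True (3,)
```

Under this reading, `use_bias=True` together with `activation=tf.nn.relu` on the convolution layer already covers the threshold, since the bias is free to become negative during training.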

In [None]:
# format code with "black" formatter. optional
%load_ext nb_black

## load data

In [None]:
!wget --timestamping https://www.dropbox.com/s/c3umbo5y13sqcfp/synthetic_dataset.h5

In [None]:
from pathlib import Path
import h5py
import numpy as np

data_path = Path("synthetic_dataset.h5")
with h5py.File(data_path, "r") as dataset:
    x_train = dataset["X_train"][:].astype(np.float32)
    y_train = dataset["Y_train"][:].astype(np.float32)
    x_valid = dataset["X_valid"][:].astype(np.float32)
    y_valid = dataset["Y_valid"][:].astype(np.int32)
    x_test = dataset["X_test"][:].astype(np.float32)
    y_test = dataset["Y_test"][:].astype(np.int32)

x_train = x_train.transpose([0, 2, 1])
x_valid = x_valid.transpose([0, 2, 1])
x_test = x_test.transpose([0, 2, 1])

N, L, A = x_train.shape
print(f"{N} sequences, {L} nts per sequence, {A} nts in alphabet")
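
The transposes above suggest the file stores sequences channels-first as `(N, A, L)`, while `Conv1D` with its default `channels_last` data format expects `(N, L, A)`. A small sketch of the two layouts on toy sequences follows. The `one_hot` helper and the `"ACGT"` column order are assumptions, not something read from the dataset.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed column order; check against the dataset


def one_hot(seq, alphabet=ALPHABET):
    """Return an (L, A) float32 array with a single 1 per row."""
    index = {nt: i for i, nt in enumerate(alphabet)}
    encoded = np.zeros((len(seq), len(alphabet)), dtype=np.float32)
    for pos, nt in enumerate(seq):
        encoded[pos, index[nt]] = 1.0
    return encoded


batch_la = np.stack([one_hot("ACGTAC"), one_hot("TTGACA")])  # (N, L, A)
batch_al = batch_la.transpose([0, 2, 1])  # (N, A, L), channels-first
print(batch_la.shape, batch_al.shape)  # (2, 6, 4) (2, 4, 6)
```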

## create model

In [None]:
import tensorflow as tf

tfk = tf.keras
tfkl = tf.keras.layers

In [None]:
# See "Supplementary Information" PDF
# also see https://www.nature.com/articles/nbt.3300/figures/7
model = tfk.Sequential(
    [
        tfkl.Conv1D(
            filters=16,
            kernel_size=24,
            use_bias=True,
            activation=tf.nn.relu,
            input_shape=(L, A),
            padding="same",
        ),
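        # pool_size=L covers the whole "same"-padded conv output,
        # so this is a global max per filter.
        # TODO: confirm this matches the pooling stage in the paper.
        tfkl.MaxPool1D(pool_size=L),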
        tfkl.Flatten(),
        tfkl.Dense(32, use_bias=False, activation=tf.nn.relu),
        tfkl.Dropout(0.5),
        tfkl.Dense(12, use_bias=True, activation=tf.nn.sigmoid),
    ],
    name="deepbind",
)
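
As a sanity check on the layer sizes above, the output lengths and parameter counts can be worked out by hand. The values `L = 200` and `A = 4` are assumptions here; substitute whatever the data-loading cell printed. The pooling arithmetic relies on `MaxPool1D` defaulting to `strides=pool_size` and `padding="valid"`.

```python
L, A = 200, 4  # assumed sequence length and alphabet size
filters, kernel_size, hidden, outputs = 16, 24, 32, 12

conv_len = L  # padding="same" with stride 1 keeps the length
pool_len = (conv_len - L) // L + 1  # pool_size=L, stride defaults to pool_size
flat = pool_len * filters  # features entering the first Dense layer

conv_params = kernel_size * A * filters + filters  # kernels + biases
hidden_params = flat * hidden  # use_bias=False, so no bias terms
output_params = hidden * outputs + outputs  # weights + biases
total = conv_params + hidden_params + output_params
print(pool_len, flat, total)  # 1 16 2460
```

If `model.summary()` reports a different total, the assumed `A` or one of the layer settings differs from what is written here.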

In [None]:
metrics = [
    tfk.metrics.AUC(curve="ROC", name="auroc"),
    tfk.metrics.AUC(curve="PR", name="aupr"),  # precision-recall
]
model.compile(
    optimizer=tfk.optimizers.Adam(learning_rate=0.001),
    loss=tfk.losses.BinaryCrossentropy(from_logits=False),
    metrics=metrics,
)

In [None]:
callbacks = [
    tfk.callbacks.EarlyStopping(
        monitor="val_aupr",
        patience=20,
        verbose=1,
        mode="max",
        restore_best_weights=False,
    ),
    tfk.callbacks.ReduceLROnPlateau(
        monitor="val_aupr",
        factor=0.2,
        patience=5,
        min_lr=1e-7,
        mode="max",
        verbose=1,
    ),
]
# train
history: tfk.callbacks.History = model.fit(
    x=x_train,
    y=y_train,
    batch_size=100,
    epochs=100,
    shuffle=True,
    validation_data=(x_valid, y_valid),
    callbacks=callbacks,
    verbose=2,
)
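
For a sense of how far `ReduceLROnPlateau` can go with these settings, the sketch below applies the callback's update rule, `new_lr = max(old_lr * factor, min_lr)`, starting from the Adam learning rate until it reaches the floor. It ignores the `patience` bookkeeping and only counts reductions.

```python
lr, factor, min_lr = 1e-3, 0.2, 1e-7  # values from the cells above

schedule = [lr]
while schedule[-1] > min_lr:
    # each plateau multiplies the rate by `factor`, clamped at `min_lr`
    schedule.append(max(schedule[-1] * factor, min_lr))

n_reductions = len(schedule) - 1
print(n_reductions, schedule[-1])  # 6 1e-07
```

Each reduction needs about `patience=5` epochs without improvement in `val_aupr`, and `EarlyStopping` ends training after 20 such epochs, so a single run may stop well before the floor is reached.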

In [None]:
# parameters used in models
# note: pd.read_excel needs the optional openpyxl package to read .xlsx files
import pandas as pd

print("Model parameters")
pd.read_excel(
    "https://static-content.springer.com/esm/art%3A10.1038%2Fnbt.3300/MediaObjects/41587_2015_BFnbt3300_MOESM61_ESM.xlsx"
)