In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import time

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import tensorflow as tf
import tensorflow_probability as tfp


device_name = tf.test.gpu_device_name()
if "GPU" not in device_name:
    print("GPU device not found")
else:
    print('Found GPU at: {}'.format(device_name))

# Introduction


Gene expression in a cell is linked to its environment: if a cell is stimulated by various factors in its environment, intracellular transduction of the signal in turn stimulates the cell's genome.
This is of course a very crude model of the problem, but it helps explain the general idea and goals of this paper.

The goal is to predict the mechanism of action (MoA) of molecules from records of 772 gene expression features and 100 cell viability markers.

In machine learning, given data and the corresponding answers to a problem, we try to learn its rules in order to predict new answers. In this competition, given the cellular response to a stimulus, we try to learn the "cellular rules" that produced this response, i.e. we try to find the mechanism of action of the stimulus from the gene expression.


### A. A stimulus triggers a cellular response, and thus a modulation of gene expression

Let's think of the molecule whose MoA we are trying to predict as the "environmental stimulus" sent to the cell.
If the cell properly receives the stimulus, it triggers a cellular response, which we can monitor through the cell's gene expression.
This is obviously an oversimplification, but let's first resolve two small problems:


    a) A properly received stimulus: What is it?
    
If the stimulus is not strong enough, or if the cell has no receptor able to perceive it, it is like trying to hear imperceptible sounds or ultrasound with human ears. The stimulus must be strong enough and needs specific organs of perception (receptors) to be perceived.
Thus, a cell produces a cellular response we can monitor only if the molecule is presented at a sufficient concentration and the cell has receptors able to "perceive" this molecule.
This idea of receptor-molecule perception is detailed in B.1).
Let's model the received stimulus as a function of the stimulus $s$: $f(s) = \mathrm{ReLU}(r \cdot s + r')$, with $r \geq 0$ and $r' \leq 0$ (if the cell has no receptor, $r = 0$). The cell therefore responds only when $s > -r'/r$: $r$ acts as the receptor sensitivity and $r'$ as an activation threshold.
We simplify this model by defining a valid stimulus (vS) as a properly received stimulus, for which $f(vS) = vS$.
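The received-stimulus model can be written as a small NumPy function. This is an illustrative sketch of the toy formula only, not code from the notebook's pipeline; the name `received_stimulus` and the example values $r = 2$, $r' = -1$ are hypothetical.

```python
import numpy as np


def received_stimulus(s, r, r_prime):
    """Toy model f(s) = ReLU(r * s + r') of how strongly a cell perceives a stimulus.

    s       : stimulus intensity (scalar or array), e.g. a molecule concentration
    r       : receptor sensitivity, r >= 0 (r = 0 means the cell has no receptor)
    r_prime : activation threshold term, r' <= 0
    """
    if r < 0 or r_prime > 0:
        raise ValueError("the model assumes r >= 0 and r' <= 0")
    return np.maximum(0.0, r * np.asarray(s, dtype=float) + r_prime)


doses = np.array([0.0, 0.25, 0.5, 1.0])
print(received_stimulus(doses, r=2.0, r_prime=-1.0))  # responds only above s = -r'/r = 0.5
print(received_stimulus(doses, r=0.0, r_prime=-1.0))  # no receptor: no response at any dose
```

With $r = 0$ the output is always 0 because $r' \leq 0$, which matches the "no receptor, no response" case in the text.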


    b) Monitoring the cellular response: Why?

The working theory of this paper is that a received stimulus triggers a cellular response, which we monitor through gene expression.
This is obviously an oversimplification, so we define a valid CR (cellular response) as a CR that can be monitored through gene expression.
We also extend the definition of a valid stimulus: a properly received stimulus that triggers a valid CR (vCR).



    c) Studying the case where a stimulus can be perceived by several different receptors:

If a vS is properly perceived by several different types of receptors (different receptors triggering different vCRs), we can model the total response as a vector.
VCR denotes the vector that stacks all the vCRs vertically.


    d) Pathways: the case where one specific receptor can trigger several (positive or negative) CRs, and what a positive or negative CR is.

We modeled a receptor as triggering a vCR, but a vS can either activate the receptor or inhibit it: activating a receptor is a positive vS, and inhibiting it is a negative vS.
If a single receptor can trigger several CRs, we can start speaking of pathways.
A pathway (PW) is an internal mechanism that triggers a specific vCR. Activating or inhibiting a PW positively or negatively modulates gene expression.
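The definitions of vCR, VCR and positive/negative stimuli can be made concrete with a toy linear model. Everything below is a hypothetical illustration: the pathway-to-gene effect matrix `PATHWAYS`, the receptor weights, and the assumption that a receptor's vCR is a signed, weighted sum of pathway effects are not taken from the data or from the notebook.

```python
import numpy as np

# Hypothetical effect of 3 pathways (rows) on the expression of 4 genes (columns)
PATHWAYS = np.array([
    [1.0, 0.5, 0.0, 0.0],   # pathway 0 up-regulates genes 0 and 1
    [0.0, -1.0, 1.0, 0.0],  # pathway 1 down-regulates gene 1, up-regulates gene 2
    [0.0, 0.0, 0.0, 2.0],   # pathway 2 up-regulates gene 3
])


def vcr(pathway_weights, sign):
    """vCR of one receptor: signed, weighted sum of the pathways it triggers.

    sign = +1 for a positive vS (receptor activated), -1 for a negative vS (inhibited).
    """
    return sign * (np.asarray(pathway_weights, dtype=float) @ PATHWAYS)


def stacked_vcr(receptors):
    """VCR: one row per receptor that perceives the stimulus."""
    return np.vstack([vcr(weights, sign) for weights, sign in receptors])


# Receptor A activated through pathway 0; receptor B inhibited through pathways 1 and 2
VCR = stacked_vcr([([1, 0, 0], +1), ([0, 1, 1], -1)])
print(VCR)
```

Here the vCRs are stacked as rows of a matrix; `VCR.ravel()` gives the single stacked vector described in c).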



### B. Stimulus, Cellular response, pathways and Mechanisms of Action


    1) Stimulus and Cellular response: the Molecule-Receptor perception
    
A receptor perceives a molecule because of several factors (the molecule's 3D layout, electronegativity, polarity). A molecule can be "perceived" by several types of receptors.
Because of its 3D layout and physico-chemical properties, a molecule triggers a VCR.
In a way, studying the VCR is studying the 3D layout and physico-chemical properties of the molecule.
So when we predict the MoA from data, we are indirectly predicting these properties of the molecule.

The properties of a stimulus determine the molecule-receptor perception, and the type of receptor determines the vCR.

A vCR results from several positive or negative activations of intracellular PWs, which regulate and modulate gene expression. This regulation and modulation of gene expression is what we monitor, and it defines what a vCR is.

    2) Chain Reaction

The CRs and PWs we have discussed so far were downstream: we went from the stimulus through the receptor, then through several pathways, to a modulation of gene expression. This is the downward, or downstream, direction.

Activating a gene leads to the transcription of DNA into RNA, then the translation of RNA into proteins. This is a very simplified model.
These proteins can be perceived by intracellular receptors, modulate the activity of other proteins involved in a specific PW, or even be receptors themselves. A protein can therefore modulate the activity of a PW, i.e. activating one gene can modulate the expression of other genes.

Thus the vCR is the sum of all gene modulations produced by a chain reaction of successive gene modulations.
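Under the strong, hypothetical assumption that gene-to-gene regulation is linear, the chain reaction can be sketched as repeated application of a regulation matrix `W`: round k contributes W^k s, and the vCR is the sum over all rounds. `W_example`, `s_example` and `cascade_response` are illustrative names, not part of the notebook. When every eigenvalue of W has modulus below 1, this sum converges to (I - W)^-1 s.

```python
import numpy as np


def cascade_response(W, s, n_steps=50):
    """Total gene modulation after n_steps rounds of a linear chain reaction.

    W[i, j] : hypothetical effect of a modulation of gene j on gene i
    s       : direct modulation caused by the stimulus (first round)
    """
    W = np.asarray(W, dtype=float)
    step = np.asarray(s, dtype=float)
    total = np.zeros_like(step)
    for _ in range(n_steps):
        total += step
        step = W @ step  # proteins of modulated genes modulate other genes
    return total


# Toy cascade: gene 0 drives gene 1, which drives gene 2
W_example = np.array([[0.0, 0.0, 0.0],
                      [0.5, 0.0, 0.0],
                      [0.0, 0.5, 0.0]])
s_example = np.array([1.0, 0.0, 0.0])
print(cascade_response(W_example, s_example))
```

The stimulus only touches gene 0 directly, yet genes 1 and 2 end up modulated too, which is the point of the chain-reaction argument.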

### C. What is a CoordiNET and why?

    1) CoordiNET?
    

    2) Why?
    
Let's imagine the gene expressions in our data as a time series, i.e. we have the expression level of a first gene, then the expression level of a second gene modified by the first one, and so on.
The MoA problem would then be much easier to solve: we would find high correlations between several consecutive genes and could say "this molecule down-regulates these genes, which are involved in the pro-inflammatory response, so it could be an NF-κB inhibitor".
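The time-series intuition can be quantified by scoring a gene ordering with the mean absolute correlation between neighbouring genes. This is a hypothetical helper (`adjacency_score`), run on synthetic data where each gene is driven by the previous one; it is not the CoordiNET itself, only a way to see why a good ordering places correlated genes next to each other.

```python
import numpy as np


def adjacency_score(x, order):
    """Mean |Pearson correlation| between genes that are neighbours in `order`.

    x     : (n_samples, n_genes) expression matrix
    order : permutation of gene indices, i.e. a candidate 1-D layout
    """
    corr = np.abs(np.corrcoef(x[:, order], rowvar=False))
    # The first off-diagonal holds corr(order[k], order[k + 1])
    return float(np.mean(np.diagonal(corr, offset=1)))


# Synthetic "time-series-like" genes: each gene is driven by the previous one
rng = np.random.default_rng(0)
n_samples, n_genes = 500, 20
x = np.empty((n_samples, n_genes))
x[:, 0] = rng.normal(size=n_samples)
for j in range(1, n_genes):
    x[:, j] = 0.9 * x[:, j - 1] + rng.normal(size=n_samples)

print("natural order :", adjacency_score(x, np.arange(n_genes)))
print("shuffled order:", adjacency_score(x, rng.permutation(n_genes)))
```

Reversing the order gives the same score, since correlation is symmetric; a random order scores lower because strongly correlated genes end up far apart.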


### Going further

This MoA predictor is a promising tool for High-Throughput Screening.
But I think it could be even more interesting: given a specific tissue and a specific disease, we could measure the overall gene expression of the affected cells and then use several drugs to bring gene expression back to a physiological level.
In other words, instead of a standard drug treatment, a specific cocktail of drugs at low concentrations could bring a disrupted homeostasis back to a normal state.

# Ideas

First, train a ConvNet to cluster samples by drug, using train_drug.csv.

Then freeze the ConvNet and train the dense layers on train_targets_scored.

There are 729! possible permutations of the genes on the grid, far too many to try them all (about 10¹⁷⁷²).
We have to find a better way to get a good layout.
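The size of this search space can be checked with Python's standard library: `math.lgamma(n + 1)` is ln(n!), and the exact digit count of 729! confirms the order of magnitude.

```python
import math

n = 729  # 27 * 27 = 9 * 9 * 9 grid positions
# ln(n!) = lgamma(n + 1); divide by ln(10) to get log10(n!)
log10_n_factorial = math.lgamma(n + 1) / math.log(10)
# Exact digit count (Python integers are unbounded)
n_digits = len(str(math.factorial(n)))

print(f"log10({n}!) = {log10_n_factorial:.2f}")
print(f"{n}! has {n_digits} digits")
```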

### I. First attempt: Correlation matrix
    Version: 0, 1
    The correlation matrix seems promising but is computationally expensive! We face overfitting, and fixing it would take too much time.

    
### II. Compute Γ and position genes in a 27x27 or 9x9x9 matrix -- Algorithmic solution
    Matrix dimensions: 27*27 = 9*9*9 = 729
    ---> We will either select the 729 best genes or compute a PCA

    Let X be a tensor of shape (729,1): the 729 genes stacked vertically.
    Let xᵢ, xⱼ be the i-th and j-th genes of X.
    Let γᵢ,ⱼ be the correlation between xᵢ and xⱼ.
    ∀i ∈ [0, 728], Γᵢ = (Σⱼ₌₀⁷²⁸ |γᵢ,ⱼ|) - 1   (the -1 removes the self-correlation |γᵢ,ᵢ| = 1)
    Then get the indices that would sort the array of Γᵢ.
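The Γ score can be sketched with NumPy's `np.corrcoef`, assuming γᵢ,ⱼ is the Pearson correlation (the text does not say which correlation is used). `gamma_scores` is a hypothetical helper name; the toy data has two perfectly correlated features and one independent feature, so the independent one gets the lowest Γ.

```python
import numpy as np


def gamma_scores(x):
    """Gamma_i = sum_j |gamma_ij| - 1 for every gene column i of x (n_samples, n_genes)."""
    # rowvar=False: each column is a variable (gene), each row an observation
    corr = np.corrcoef(x, rowvar=False)
    # Subtracting 1 removes the self-correlation term |gamma_ii| = 1
    return np.abs(corr).sum(axis=1) - 1.0


rng = np.random.default_rng(0)
base = rng.normal(size=500)
x = np.column_stack([base, 2.0 * base + 1.0, rng.normal(size=500)])
gamma = gamma_scores(x)
order = np.argsort(gamma)  # indices that sort Gamma in ascending order
print(gamma, order)
```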
    
    Next idea: genes with a high Γ are more likely to be correlated with other genes than genes with a low Γ.
    We assume that highly correlated genes modulate each other's expression.
    We put low-Γ genes in the corners and borders of the matrix because they are correlated with fewer genes.

    All positions of the matrix must be ranked: a position with a high rank is one where correlation has a strong influence on its neighbours.

    The position ranks and Γ ranks are then used to fill every position with a gene.

    Limitations: the genes are badly positioned, because we ignore the pairwise correlation between each xᵢ, xⱼ.
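One possible reading of the position ranking, sketched for the 27x27 case only: rank each cell by how many in-grid neighbours it has (8 in the interior, 5 on a border, 3 in a corner), break ties by distance to the centre, then give the highest-Γ genes the best-ranked cells. The neighbour-count ranking and the centre tie-break are assumptions, and `neighbour_counts` / `place_genes` are hypothetical names.

```python
import numpy as np


def neighbour_counts(side):
    """Number of 8-connected neighbours inside a side x side grid, for every cell."""
    padded = np.pad(np.ones((side, side), dtype=int), 1)  # zero border around the grid
    # Sum the 3x3 window centred on every cell, then remove the cell itself
    window = sum(padded[di:di + side, dj:dj + side] for di in range(3) for dj in range(3))
    return window - 1


def place_genes(gamma, side=27):
    """Return a side x side grid of gene indices; high-Gamma genes go to well-connected cells."""
    gamma = np.asarray(gamma, dtype=float)
    if gamma.size != side * side:
        raise ValueError("need exactly side * side genes")
    counts = neighbour_counts(side).ravel()
    centre = (side - 1) / 2
    rows, cols = np.indices((side, side))
    dist = np.hypot(rows - centre, cols - centre).ravel()
    # np.lexsort uses the LAST key as primary: most neighbours first, then closest to centre
    position_order = np.lexsort((dist, -counts))
    gene_order = np.argsort(-gamma, kind="stable")  # highest Gamma first
    grid = np.empty(side * side, dtype=int)
    grid[position_order] = gene_order  # k-th best cell receives the k-th highest-Gamma gene
    return grid.reshape(side, side)


grid = place_genes(np.arange(729, dtype=float))  # toy Gamma: gene 728 has the highest score
print(grid[13, 13], grid[0, 0])
```

As the limitations paragraph says, this ignores which pairs of genes are correlated: two high-Γ genes can land side by side even if they are unrelated to each other.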
    
    
### III. Use Γ and a neural network to find the best coordinates of the genes in a 27x27 or 9x9x9 matrix -- CoordiNET
    Find a way to build a model that learns the 3-dimensional layout of a set of 729 flattened features.
    It consumed too many resources and the loss function was not suited: 32 minutes per epoch with very high loss values.
    
# IV. Auto-coordinating layers + Residual blocks

## TO DO:
- Try a CNN with Residual blocks
- Try average pooling
- Try PCA
- DRUGS CLUSTERING
- Correlation matrix with only the non-zero rows

In [None]:
##################### HYPER PARAMS #####################
GENES_REGEX = '^g-|^c-'  # gene-expression (g-*) and cell-viability (c-*) columns
N_SPLITS = 3

BATCH_SIZE=128
EPOCHS=200

LEARNING_RATE = 4e-1
LABEL_SMOOTHING = 1e-5

RLR_PATIENCE = 4
RLR_FACTOR = .7
ES_PATIENCE = 20
########################################################

In [None]:
##################### HYPER PARAMS #####################
REGULARIZATION = 1

INPUT_SHAPE = 729
AUTOCOORD_L1, AUTOCOORD_L2 = 1e-3, 1e-2
AUTO_SIZE = 9**3
RESHAPE_SIZE = (9,9,9,1)
AUTO_DROPOUT = 0

CONV_FILTER = 2
CONV_DROPOUT = .2
RESBLOCK_L1, RESBLOCK_L2 = 1e-12, 1e-11

DENSE_L1, DENSE_L2 = 1e-9, 1e-8
DENSE_DROPOUT = .4

OUTPUT_SIZE = 206
########################################################

In [None]:
print(f'''##################### HYPER PARAMS #####################\n
      BATCH_SIZE = {BATCH_SIZE} \n
      EPOCHS = {EPOCHS}\n
      N_SPLITS = {N_SPLITS}\n
      LABEL_SMOOTHING = {LABEL_SMOOTHING}\n
      GENES_REGEX = {GENES_REGEX}\n
      RLR_PATIENCE = {RLR_PATIENCE}\n
      RLR_FACTOR = {RLR_FACTOR}\n
      ES_PATIENCE = {ES_PATIENCE}\n
########################################################
      ''')

print(f'''##################### HYPER PARAMS #####################\n
      LEARNING_RATE = {LEARNING_RATE}\n
      REGULARIZATION = {REGULARIZATION}\n
      INPUT_SHAPE = {INPUT_SHAPE}\n
      AUTOCOORD_L1, AUTOCOORD_L2 = {AUTOCOORD_L1}, {AUTOCOORD_L2}\n
      AUTO_SIZE = {AUTO_SIZE}\n
      RESHAPE_SIZE = {RESHAPE_SIZE}\n
      AUTO_DROPOUT = {AUTO_DROPOUT}\n
      CONV_FILTER = {CONV_FILTER}\n
      CONV_DROPOUT = {CONV_DROPOUT}\n
      RESBLOCK_L1, RESBLOCK_L2 = {RESBLOCK_L1}, {RESBLOCK_L2}\n
      DENSE_L1, DENSE_L2 = {DENSE_L1}, {DENSE_L2}\n
      DENSE_DROPOUT = {DENSE_DROPOUT}\n
      OUTPUT_SIZE = {OUTPUT_SIZE}\n
########################################################
      ''')

# Data showcase

In [None]:
def head(numpy_array):
    return pd.DataFrame(numpy_array)

train_features = pd.read_csv('../input/lish-moa/train_features.csv')
train_features.head()

genes = train_features.filter(regex=(GENES_REGEX)).to_numpy()
head(genes)


targets = pd.read_csv('../input/lish-moa/train_targets_scored.csv').drop(['sig_id'],axis=1).to_numpy()
non_scored_targets = pd.read_csv('../input/lish-moa/train_targets_nonscored.csv').drop(['sig_id'],axis=1).to_numpy()
drug_targets = pd.get_dummies(pd.read_csv('../input/lish-moa/train_drug.csv').drop(['sig_id'],axis=1)).to_numpy()
all_targets = np.concatenate([targets, non_scored_targets], axis=-1)

train_features.head()

# Data preprocessing

In [None]:
def cleanFeatures(features):
    features_ = features.copy()
    cp_type = {'trt_cp': 0, 'ctl_vehicle': 1}
    cp_dose = {'D1': 0, 'D2': 1}
    features_['cp_type'] = features_['cp_type'].map(cp_type)
    features_['cp_dose'] = features_['cp_dose'].map(cp_dose)
    features_ = pd.get_dummies(features_, columns=['cp_time'])
    features_.drop(['sig_id'], inplace=True, axis=1)
    return features_

X_clean = cleanFeatures(train_features)
X_clean

In [None]:
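def rankGauss(x):
    """RankGauss: map every feature (column) of x to an approximately Gaussian distribution."""
    from scipy.special import erfinv
    N = x.shape[0]
    # Rank each value within its own column (across samples), hence axis=0
    temp = x.argsort(axis=0)
    rank_x = temp.argsort(axis=0) / N
    # Centre the ranks inside (-1, 1) so that erfinv stays finite
    rank_x -= rank_x.mean()
    rank_x *= 2
    efi_x = erfinv(rank_x)
    efi_x -= efi_x.mean()
    return efi_x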
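A RankGauss transform ranks every feature across samples (column-wise) and maps the ranks to normal quantiles. The sketch below is a dependency-light variant, assuming only NumPy and the standard library: it uses `statistics.NormalDist().inv_cdf` instead of `scipy.special.erfinv`. The two are related by erfinv(2p - 1) = Φ⁻¹(p)/√2, so this variant has unit variance, while an erfinv-based version of the same column-wise ranking has a standard deviation of about 1/√2 ≈ 0.71. `rank_gauss` is a hypothetical name.

```python
import numpy as np
from statistics import NormalDist


def rank_gauss(x):
    """Column-wise RankGauss with unit-variance normal scores.

    x : (n_samples, n_features) array; each column is transformed independently.
    """
    n = x.shape[0]
    # Double argsort along axis 0: rank of every value within its own column (0 .. n-1)
    ranks = x.argsort(axis=0).argsort(axis=0)
    # Mid-rank probabilities (r + 0.5) / n lie strictly inside (0, 1)
    p = (ranks + 0.5) / n
    # Standard-normal inverse CDF, applied element-wise
    z = np.vectorize(NormalDist().inv_cdf)(p)
    return z - z.mean(axis=0)


rng = np.random.default_rng(0)
skewed = rng.exponential(size=(1000, 3))  # strongly right-skewed toy features
z = rank_gauss(skewed)
print(z.mean(axis=0), z.std(axis=0))
```

`np.vectorize` is a Python-level loop, so on the full 23,814 x 872 feature matrix scipy's vectorised `erfinv` is much faster; the point here is only the column-wise ranking and the scaling.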

In [None]:
REGEX = '^cp_*'

X = rankGauss(genes)
other_X = X_clean.filter(regex=(REGEX)).to_numpy()  #X_clean['cp_type'].to_numpy()

X.shape, other_X.shape

In [None]:
# from sklearn.preprocessing import quantile_transform
# def quantileTransform(X, seed=51):
#     X_ = X.copy()
#     X_ = quantile_transform(X_, n_quantiles=100,output_distribution='normal', random_state=seed, axis=0)
#     X_ = pd.DataFrame(X_)
#     return X_
# train = quantileTransform(X_clean)
# test = quantileTransform(test)

In [None]:
# from sklearn.decomposition import PCA

# pca = PCA(n_components=729)
# pca.fit(train.append(test))
# train = pca.transform(train)
# test = pca.transform(test)

# Iterative stratification

In [None]:
from shutil import copyfile
copyfile(src = "../input/minimalistic-v2/ml_stratifiers.py", dst = "../working/ml_stratifiers.py")
from ml_stratifiers import MultilabelStratifiedKFold



def getMskfDataset(X, other_X, y, all_y, n_splits=N_SPLITS, SEED=7):
    mskf = MultilabelStratifiedKFold(n_splits=n_splits, random_state=SEED, shuffle=True) # test_size=0.2
    for i, (train_idx, val_idx) in enumerate(mskf.split(X, y)):
        X_train, other_X_train, y_train = X[train_idx], other_X[train_idx], y[train_idx] # np.expand_dims(other_X[train_idx],1)
        X_val, other_X_val, y_val = X[val_idx], other_X[val_idx], y[val_idx] # np.expand_dims(other_X[val_idx],1)
        
        X_train_0, other_X_train_0, y_train_0 = X[train_idx], other_X[train_idx], all_y[train_idx]
        X_val_0, other_X_val_0, y_val_0 = X[val_idx], other_X[val_idx], all_y[val_idx]
        
        X_train_1, other_X_train_1, y_train_1 = X[train_idx], other_X[train_idx], y[train_idx]
        X_val_1, other_X_val_1, y_val_1 = X[val_idx], other_X[val_idx], y[val_idx]
        yield X_train_0, other_X_train_0, y_train_0, X_val_0, other_X_val_0, y_val_0, X_train_1, other_X_train_1, y_train_1, X_val_1, other_X_val_1, y_val_1

# Data Augmentation

Data augmentation with over- and under-correlated features

In [None]:
# def dataAugmentation(X, y, randomness=.3):
#     dim0 =X.shape[0]
#     dim1 = X.shape[1]
#     dim2 = X.shape[2]
#     dim3 = X.shape[3]
#     for i in range(dim0):
#         translation_matrix_1 = np.random.rand(dim1, dim2, dim3)    
#         translation_matrix_2 = np.random.rand(dim1, dim2, dim3) 
#         randint_matrix = np.random.randint(-1,2, (dim1, dim2, dim3))  # high is exclusive: draws -1, 0 or 1
#         X_i = X[i] + X[i]*(randomness*translation_matrix_1) + (randomness*(translation_matrix_2*randint_matrix))
#         yield X_i, y[i]
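The commented-out generator above can also be written as a vectorised NumPy function that jitters a whole array at once. This is a sketch, not the author's final augmentation: the name `augment` is hypothetical, and the symmetric sign draw (-1, 0 or +1) assumes that `randint(-1, 1)`, which only ever returns -1 or 0, was meant to include +1.

```python
import numpy as np


def augment(X, randomness=0.3, seed=None):
    """Jittered copy of X: multiplicative noise plus signed additive noise.

    Mirrors the commented-out dataAugmentation, vectorised over the whole array.
    """
    rng = np.random.default_rng(seed)
    scale = rng.random(X.shape)                 # U[0, 1): relative jitter
    signs = rng.integers(-1, 2, size=X.shape)   # -1, 0 or 1 (upper bound is exclusive)
    shift = rng.random(X.shape) * signs         # additive jitter in (-1, 1)
    return X + X * (randomness * scale) + randomness * shift


X_demo = np.random.default_rng(1).normal(size=(4, 9, 9, 9))  # 4 samples on the 9x9x9 grid
X_aug = augment(X_demo, seed=0)
```

Each value moves by at most `randomness * (|x| + 1)`, and `randomness=0` returns the input unchanged.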

In [None]:
# ds_to_augment = tf.data.Dataset.from_tensor_slices((X_ds,targets))

# aug_ds = tf.data.Dataset.from_generator(
#     dataAugmentation, args=[ds_to_augment._tensors[0], ds_to_augment._tensors[1]],
#     output_signature=(
#         tf.TensorSpec(shape=(729,1), dtype=tf.float64),
#         tf.TensorSpec(shape=(206), dtype=tf.int64))
# ).batch(128)

# ConvNET

In [None]:
class AutoCoordBlock(tf.keras.layers.Layer):
    def __init__(self, dense_size=AUTO_SIZE, trainable=True):
        super(AutoCoordBlock, self).__init__()
        self.dense_size= dense_size
        self.l2 = tf.keras.regularizers.L2(l2=AUTOCOORD_L1)
        self.l1 = tf.keras.regularizers.L1(l1=AUTOCOORD_L2*1e-7)
        self.dense = tf.keras.layers.Dense(self.dense_size, kernel_regularizer=self.l2 ,bias_regularizer=self.l1, trainable=trainable)
        self.batch_norm = tf.keras.layers.BatchNormalization(trainable=trainable)
        self.relu = tf.keras.layers.ReLU()
        self.dropout = tf.keras.layers.Dropout(AUTO_DROPOUT)
        
        self.reshape= tf.keras.layers.Reshape(RESHAPE_SIZE)
    
    @tf.function
    def call(self, x, training=False):
        x = self.dense(x, training=training)
        x = self.batch_norm(x,training=training) 
        x = self.relu(x, training=training)
        x = self.reshape(x, training=training)
        x =  self.dropout(x, training=training) 
        return x


class ResBlock(tf.keras.layers.Layer):
    def __init__(self, pre_filter_size, k3_filter_size, k1_filter_size, padding='same', is_first=False, trainable=True):
        super(ResBlock, self).__init__()
        self.padding = padding
        self.l2 = tf.keras.regularizers.L2(l2=RESBLOCK_L1)
        self.l1 = tf.keras.regularizers.L1(l1=RESBLOCK_L2)
        self.is_first = is_first
        self.dropout = tf.keras.layers.Dropout(CONV_DROPOUT)
        
        if not is_first:
            self.pre_filter_size = pre_filter_size        
            self.pre_conv = tf.keras.layers.Conv3D(self.pre_filter_size, (1,1,1), padding=self.padding, kernel_regularizer=self.l2 ,bias_regularizer=self.l1, trainable=trainable)
            self.batch_norm_pre = tf.keras.layers.BatchNormalization(trainable=trainable)
            self.relu_pre = tf.keras.layers.ReLU()

            self.k1_filter_size = k1_filter_size
            self.conv_k1 = tf.keras.layers.Conv3D(self.k1_filter_size, (5,5,5), padding=self.padding, kernel_regularizer=self.l2 ,bias_regularizer=self.l1, trainable=trainable)
            self.batch_norm_k1 = tf.keras.layers.BatchNormalization(trainable=trainable)
            self.relu_k1 = tf.keras.layers.ReLU()
        
        self.k3_filter_size = k3_filter_size
        self.conv_k3 = tf.keras.layers.Conv3D(self.k3_filter_size, (3,3,3), padding=self.padding, kernel_regularizer=self.l2 ,bias_regularizer=self.l1, trainable=trainable)     
        self.batch_norm_k3 = tf.keras.layers.BatchNormalization(trainable=trainable)       
        self.relu_k3 = tf.keras.layers.ReLU()
        
        self.concat = tf.keras.layers.Concatenate(axis=-1)
    
    @tf.function
    def call(self, x, training=False):
        if not self.is_first:
            x = self.pre_conv(x, training=training)
            x = self.batch_norm_pre(x,training=training)
            x = self.relu_pre(x, training=training)
            x = self.dropout(x, training = training)

            x_k1 = self.conv_k1(x, training=training)  # note: conv_k1 uses a 5x5x5 kernel
            x_k1 = self.batch_norm_k1(x_k1, training=training)
            x_k1 = self.relu_k1(x_k1, training=training)
            x_k1 = self.dropout(x_k1, training=training)  # dropout on the branch output, not on x

        x_k3 = self.conv_k3(x, training=training)
        x_k3 = self.batch_norm_k3(x_k3, training=training)
        x_k3 = self.relu_k3(x_k3, training=training)
        x_k3 = self.dropout(x_k3, training=training)  # dropout on the branch output, not on x
        
        if not self.is_first:
            x_ = self.concat([x_k3, x_k1, x])
        else: 
            x_ = self.concat([x_k3, x])
        return x_

    
class ReduceBlock(tf.keras.layers.Layer):
    def __init__(self, trainable=True):
        super(ReduceBlock, self).__init__()
        self.l2 = tf.keras.regularizers.L2(l2=RESBLOCK_L1)
        self.l1 = tf.keras.regularizers.L1(l1=RESBLOCK_L2)
        self.padding = 'valid'
        self.flatten = tf.keras.layers.Flatten()
        self.dropout = tf.keras.layers.Dropout(CONV_DROPOUT)
        
        self.filter_size_0 = 8*CONV_FILTER
        self.conv_0 = tf.keras.layers.Conv3D(self.filter_size_0, (3,3,3), padding=self.padding, kernel_regularizer=self.l2 ,bias_regularizer=self.l1, trainable=trainable)     
        self.batch_norm_0 = tf.keras.layers.BatchNormalization(trainable=trainable)       
        self.relu_0 = tf.keras.layers.ReLU()
        
        self.filter_size_1 = 16*CONV_FILTER
        self.conv_1 = tf.keras.layers.Conv3D(self.filter_size_1, (3,3,3), padding=self.padding, kernel_regularizer=self.l2 ,bias_regularizer=self.l1, trainable=trainable)     
        self.batch_norm_1 = tf.keras.layers.BatchNormalization(trainable=trainable)       
        self.relu_1 = tf.keras.layers.ReLU()
        
        self.filter_size_2 = 32*CONV_FILTER
        self.conv_2 = tf.keras.layers.Conv3D(self.filter_size_2, (3,3,3), padding=self.padding, kernel_regularizer=self.l2 ,bias_regularizer=self.l1, trainable=trainable)     
        self.batch_norm_2 = tf.keras.layers.BatchNormalization(trainable=trainable)       
        self.relu_2 = tf.keras.layers.ReLU()
        
        self.concat = tf.keras.layers.Concatenate(axis=-1)
    
    @tf.function
    def call(self, x, training=False):
        x_ = self.flatten(x)
        x = self.conv_0(x, training = training)
        x = self.batch_norm_0(x, training = training)
        x = self.relu_0(x, training = training)
        x = self.dropout(x, training = training)
        
        x = self.conv_1(x, training = training)
        x = self.batch_norm_1(x, training = training)
        x = self.relu_1(x, training = training)
        x = self.dropout(x, training = training)
        
        x = self.conv_2(x, training = training)
        x = self.batch_norm_2(x, training = training)
        x = self.relu_2(x, training = training)
        x = self.dropout(x, training = training)
        x = self.flatten(x)
        return self.concat([x, x_])
    
    
class DenseBlock(tf.keras.layers.Layer):
    def __init__(self, dense_size=512, x_is_flat=False, dropout=DENSE_DROPOUT):
        super(DenseBlock, self).__init__()
        self.l2 = tf.keras.regularizers.L2(l2=DENSE_L1)
        self.l1 = tf.keras.regularizers.L1(l1=DENSE_L2)
        self.x_is_flat = x_is_flat
        if not x_is_flat:
            self.flatten = tf.keras.layers.Flatten()
        self.concat = tf.keras.layers.Concatenate(axis=1)
        self.dense = tf.keras.layers.Dense(dense_size, kernel_regularizer=self.l2 ,bias_regularizer=self.l1)
        self.batch_norm = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.dropout = tf.keras.layers.Dropout(dropout)
        
    @tf.function
    def call(self, x, other_x, training=False):
        if not self.x_is_flat:
            x_ = self.concat([self.flatten(x), tf.cast(other_x, 'float')])
        else:
            x_ = self.concat([x, tf.cast(other_x, 'float')])
        x_ = self.dense(x_, training=training)
        x_ = self.batch_norm(x_,training=training) 
        x_ = self.relu(x_, training=training)
        x_ = self.dropout(x_, training=training) 
        return x_
    
    

In [None]:
class AutoCoordNet(tf.keras.Model):
    def __init__(self, output_size=OUTPUT_SIZE, everything_trainable=True):
        super(AutoCoordNet, self).__init__()
        self.auto_block = AutoCoordBlock(trainable=everything_trainable)
        self.res_block_0 = ResBlock(None, 1*CONV_FILTER, None, is_first=True, trainable=everything_trainable)
        self.res_block_1 = ResBlock(2*CONV_FILTER, 2*CONV_FILTER, 1*CONV_FILTER, trainable=everything_trainable)
        self.res_block_2 = ResBlock(4*CONV_FILTER, 4*CONV_FILTER, 2*CONV_FILTER, trainable=everything_trainable)
        self.reduce_block = ReduceBlock(trainable=everything_trainable)
        self.dense0 = DenseBlock(x_is_flat=True)
        self.dense1 = DenseBlock(dense_size=256, x_is_flat=True)
        self.output_layer = tf.keras.layers.Dense(output_size, activation='sigmoid')
        self.history = {'loss':[], 'val_loss':[] }
        
        self.loss_metrics = tf.keras.metrics.BinaryCrossentropy()

#     @tf.function
    def train(self, train_ds, validation_ds, epochs, reduce_lr, early_stopping):
        self.onTrainBegin(reduce_lr, early_stopping)
        for epoch in range(1,epochs+1):
            self.loss_metrics.reset_states()
            start_time = time.time()
            for step,(x_batch, other_x_batch, y_batch) in enumerate(train_ds):           
                self.train_step(x_batch, other_x_batch, y_batch)
            loss = self.loss_metrics.result().numpy()
            self.history['loss'].append(loss)

            self.loss_metrics.reset_states()
            for (val_x_batch, val_other_x_batch, val_y_batch) in validation_ds:
                self.val_step(val_x_batch, val_other_x_batch, val_y_batch)
            val_loss = self.loss_metrics.result().numpy()
            self.history['val_loss'].append(val_loss)

            print(f"Epoch {epoch}/{epochs} -","loss: %.4f,"%(float(loss)),"val_loss: %.4f,"%(float(val_loss)), "time spent: %.2fs."%(time.time()-start_time))
            self.onEpochEnd(epoch, val_loss)
            
            if self.stop_training == True:
                return self.history
        return self.history
                
#     @tf.function
    def predict(self, ds):
        y = []
        for (x_batch, other_x_batch) in ds:
            logits = self(x_batch, other_x_batch, training=False)
            y.extend(logits.numpy())
        return np.array(y)
        
    @tf.function    
    def train_step(self, x, other_x, y):
        with tf.GradientTape() as tape:
            y_pred = self(x, other_x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        self.loss_metrics.update_state(y, y_pred)
    
    @tf.function
    def val_step(self, x, other_x, y):
        val_logits = self(x, other_x, training=False)
        self.loss_metrics.update_state(y, val_logits)

    
    @tf.function
    def call(self, x, other_x, training=False):
        x = self.auto_block(x, training=training)
        x = self.res_block_0(x, training=training)
        x = self.res_block_1(x, training=training)
        x = self.res_block_2(x, training=training)
        x = self.reduce_block(x, training=training)
        x = self.dense0(x, other_x, training=training)
        x = self.dense1(x, other_x, training=training)
        return self.output_layer(x, training=training)
    
    def onTrainBegin(self, reduce_lr, early_stopping):
        # Plain Python on purpose: callbacks are stateful Python objects, not graph ops
        self.reduce_lr, self.early_stopping = reduce_lr, early_stopping
        self.reduce_lr.set_model(self), self.early_stopping.set_model(self)
        self.reduce_lr.on_train_begin(), self.early_stopping.on_train_begin()
    
    def auto_train(self, trainable):
        self.auto_block.trainable = trainable
        
    def res_train(self, trainable):
        self.res_block_0.trainable = trainable    
        self.res_block_1.trainable = trainable
        self.res_block_2.trainable = trainable
        self.reduce_block.trainable = trainable

    def dense_train(self, trainable):
        self.dense0.trainable = trainable
        self.dense1.trainable = trainable
        self.output_layer.trainable = trainable
        
#     @tf.function
    def onEpochEnd(self, epoch, val_loss):
        logs = {'val_loss': val_loss}
        self.reduce_lr.on_epoch_end(epoch,logs), self.early_stopping.on_epoch_end(epoch,logs)
    
    def setOutputSize(self, size):
        self.output_layer = tf.keras.layers.Dense(size, activation='sigmoid')

In [None]:
test_features = pd.read_csv('../input/lish-moa/test_features.csv')
test_genes = test_features.filter(regex=(GENES_REGEX)).to_numpy()
test_clean = cleanFeatures(test_features)
X_test = rankGauss(test_genes)
other_X_test = test_clean.filter(regex=(REGEX)).to_numpy() # REGEX = '^cp_*'
test_ds = tf.data.Dataset.from_tensor_slices((X_test, other_X_test)).batch(BATCH_SIZE)
# X_test.shape, other_X_test.shape, test_ds

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
def plotHistory(model):
    loss = model.history['loss']
    val_loss = model.history['val_loss']

    epochs = range(len(loss))
    start_epoch=5

    plt.plot(epochs[start_epoch:], loss[start_epoch:], 'r', label='Training Loss')
    plt.plot(epochs[start_epoch:], val_loss[start_epoch:], 'b', label='Validation Loss')
    plt.title('Training and validation loss')
    plt.legend()
    plt.show()
    model.summary()

In [None]:
def getCallbacks():
    reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        mode='min',
        factor=RLR_FACTOR,
        patience=RLR_PATIENCE,
        verbose=2)
    early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        mode='min',
        patience=ES_PATIENCE,
        restore_best_weights=True,
        verbose=2)
    return reduce_lr, early_stopping

In [None]:
def getInfer():
    stratified_ds_generator = getMskfDataset(X, other_X, targets, all_targets) 
    preds=[] 
    i = 0
    for (X_train_0, other_X_train_0, y_train_0, X_val_0, other_X_val_0, y_val_0, X_train_1, other_X_train_1, y_train_1, X_val_1, other_X_val_1, y_val_1) in stratified_ds_generator: 
        i+=1
        print(f'FOLD {i} starting...')
#         train_ds_0 = tf.data.Dataset.from_tensor_slices((X_train_0, other_X_train_0, y_train_0)).batch(BATCH_SIZE)
#         val_ds_0 = tf.data.Dataset.from_tensor_slices((X_val_0, other_X_val_0, y_val_0)).batch(BATCH_SIZE)
#         model_0 = AutoCoordNet(output_size=all_targets.shape[-1])
#         model_0.compile(loss = tf.keras.losses.BinaryCrossentropy(label_smoothing = LABEL_SMOOTHING),
#                         optimizer = tf.keras.optimizers.SGD(learning_rate=LEARNING_RATE, momentum=.9))
#         rlr, es = getCallbacks()
#         model_0.train(train_ds_0, val_ds_0, epochs=EPOCHS, reduce_lr=rlr, early_stopping=es)
#         plotHistory(model_0)
#         model_0.save_weights('model_weights')
        
        train_ds_1 = tf.data.Dataset.from_tensor_slices((X_train_1, other_X_train_1, y_train_1)).batch(BATCH_SIZE)
        val_ds_1 = tf.data.Dataset.from_tensor_slices((X_val_1, other_X_val_1, y_val_1)).batch(BATCH_SIZE)
        model_1 = AutoCoordNet()
#         model_1.load_weights('model_weights', by_name=False)
        model_1.setOutputSize(OUTPUT_SIZE)
        model_1.compile(loss = tf.keras.losses.BinaryCrossentropy(label_smoothing = LABEL_SMOOTHING),
                        optimizer = tf.keras.optimizers.SGD(learning_rate=LEARNING_RATE, momentum=.9))
        rlr, es = getCallbacks()
        model_1.train(train_ds_1, val_ds_1, epochs=EPOCHS, reduce_lr=rlr, early_stopping=es)
        plotHistory(model_1)        
        pred = model_1.predict(test_ds)
        preds.append(pred)  
        
    # Average the fold predictions (one (n_test, OUTPUT_SIZE) array per fold)
    infer = np.mean(preds, axis=0)
    return infer

# Prediction

In [None]:
infer = getInfer()
# Control (ctl_vehicle) samples receive no compound, so force their predictions to 0
ctl_idx = np.where(test_features['cp_type'].to_numpy() == 'ctl_vehicle')
infer[ctl_idx] = 0

In [None]:

sub = pd.read_csv('/kaggle/input/lish-moa/sample_submission.csv')

sub.iloc[:,1:] = infer

sub.to_csv('submission.csv', index=False)