<h1><center>RANZCR CLiP - Catheter and Line Position Challenge</center></h1>
<center><img src="https://images.unsplash.com/photo-1440624949267-b8aa7b7293a4?ixlib=rb-1.2.1&ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&auto=format&fit=crop&w=1053&q=80" width="60%"></center>

**In previous versions, I tried different models available in Keras to see how they affect the results.**
**Here is the complete list of <a href='https://keras.io/api/applications/'>[available Keras models]</a>. Please feel free to fork this notebook and tweak it with different models or preprocessing. I would be happy to discuss the results and team up. <span style="color:red">Please do not forget to upvote</span>.**

### Update: Training bigger networks (for example Xception rather than EfficientNetB0) with larger image sizes and batch sizes exhausts GPU memory and results in an error. From version 5 onward, I use a TPU to train the models. The reference notebook I learned from is <a href='https://www.kaggle.com/xhlulu/ranzcr-efficientnet-tpu-training'>[this great notebook]</a>.
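To get a rough feel for why image size and batch size matter, the sketch below estimates the memory taken by the input tensors alone, using this notebook's Version 12 settings (900x900 RGB images decoded to float32, 16 images per replica on 8 replicas). The helper name `batch_bytes` is hypothetical. The estimate ignores weights, activations, and gradients, which typically need far more memory than the inputs, so treat it as a lower bound only.

```python
def batch_bytes(batch_size, img_size, channels=3, bytes_per_value=4):
    """Bytes needed to hold one batch of square float32 images (inputs only)."""
    return batch_size * img_size * img_size * channels * bytes_per_value

# 16 images per replica, 8 replicas, img_size = 900 (this notebook's settings)
per_replica = batch_bytes(16, 900)
global_batch = batch_bytes(16 * 8, 900)

print(f"{per_replica / 2**20:.1f} MiB of input per replica")
print(f"{global_batch / 2**30:.2f} GiB of input for the global batch")
```

Because the cost grows with the square of `img_size`, doubling the image side length quadruples this figure.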

#### <span style="color:blue"> Version 12: using Xception, trainable = True, dropout 0.5. batch_size = 16*8, img_size = 900, random_flip_left_right, random_flip_up_down, random_saturation(0.8, 1.2), random_brightness(0.2), random_hue(0.2), random_contrast(0.8, 1.2), max_epoch = 30 with early stopping callback</span>.

With the tf.data API, we can create complex input pipelines from simple, reusable pieces. For example, the pipeline for an image model might aggregate data from files in a distributed file system, apply random transformations to each image, and merge randomly selected images into a batch for training. <a href='https://www.tensorflow.org/guide/data'>[Reference]</a>
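As a minimal sketch of that idea, the toy pipeline below chains the same kinds of steps (transform, shuffle, batch, prefetch) on in-memory tensors. The random stand-in images, the 5 hypothetical labels, and the names `toy` and `augment` are assumptions for illustration only; the real pipeline in this notebook reads image files instead.

```python
import tensorflow as tf

AUTO = tf.data.experimental.AUTOTUNE

# Hypothetical stand-in data: 10 tiny random "images" with 5 binary labels each
images = tf.random.uniform((10, 8, 8, 3), seed=0)
labels = tf.cast(tf.random.uniform((10, 5), seed=1) > 0.5, tf.float32)

def augment(img, label):
    # Random transformation applied to each image independently
    img = tf.image.random_flip_left_right(img)
    return img, label

toy = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(augment, num_parallel_calls=AUTO)  # transform each element
    .shuffle(10)                            # pick elements in random order
    .batch(4)                               # merge elements into batches
    .prefetch(AUTO)                         # prepare the next batch early
)

# Collect the (image batch, label batch) shapes produced by one pass
shapes = [(x.shape, y.shape) for x, y in toy]
print(shapes)
```

Each step returns a new dataset, so pieces such as `augment` can be swapped or dropped without touching the rest of the chain.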

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

# ML tools 
import tensorflow as tf
from kaggle_datasets import KaggleDatasets
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Activation, Conv2D, MaxPooling2D, Dropout, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import Model
from tensorflow.keras.applications import Xception
import os
from tensorflow.keras import optimizers
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import ReduceLROnPlateau, ModelCheckpoint, EarlyStopping

In [None]:
df_target = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/train.csv')
display(df_target.head(3))
print(df_target.shape)
df_sample = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/sample_submission.csv')
display(df_sample.head(3))
print(df_sample.shape)

In [None]:
target_cols = df_target.drop(['StudyInstanceUID','PatientID'], axis=1).columns.to_list()

In [None]:
n_classes = len(target_cols)
img_size = 900
n_epochs = 30

In [None]:
def auto_select_accelerator():
    try:
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
        tf.config.experimental_connect_to_cluster(tpu)
        tf.tpu.experimental.initialize_tpu_system(tpu)
        strategy = tf.distribute.experimental.TPUStrategy(tpu)
        print("Running on TPU:", tpu.master())
    except ValueError:
        strategy = tf.distribute.get_strategy()
    print(f"Running on {strategy.num_replicas_in_sync} replicas")
    
    return strategy

def build_decoder(with_labels=True, target_size=(img_size, img_size), ext='jpg'):
    def decode(path):
        file_bytes = tf.io.read_file(path) # Reads and outputs the entire contents of the input filename.

        if ext == 'png':
            img = tf.image.decode_png(file_bytes, channels=3) # Decode a PNG-encoded image to a uint8 or uint16 tensor
        elif ext in ['jpg', 'jpeg']:
            img = tf.image.decode_jpeg(file_bytes, channels=3) # Decode a JPEG-encoded image to a uint8 tensor
        else:
            raise ValueError("Image extension not supported")

        img = tf.cast(img, tf.float32) / 255.0 # Casts a tensor to the type float32 and divides by 255.
        img = tf.image.resize(img, target_size) # Resizing to target size
        return img
    
    def decode_with_labels(path, label):
        return decode(path), label
    
    return decode_with_labels if with_labels else decode


def build_augmenter(with_labels=True):
    def augment(img):
        img = tf.image.random_flip_left_right(img)
        img = tf.image.random_flip_up_down(img)
        img = tf.image.random_saturation(img, 0.8, 1.2)
        img = tf.image.random_brightness(img, 0.2)
        img = tf.image.random_contrast(img, 0.8, 1.2)
        img = tf.image.random_hue(img, 0.2)
        return img
    
    def augment_with_labels(img, label):
        return augment(img), label
    
    return augment_with_labels if with_labels else augment

def build_dataset(paths, labels=None, bsize=32, cache=True,
                  decode_fn=None, augment_fn=None,
                  augment=True, repeat=True, shuffle=1024, 
                  cache_dir=""):
    if cache_dir != "" and cache is True:
        os.makedirs(cache_dir, exist_ok=True)
    
    if decode_fn is None:
        decode_fn = build_decoder(labels is not None)
    
    if augment_fn is None:
        augment_fn = build_augmenter(labels is not None)
    
    AUTO = tf.data.experimental.AUTOTUNE
    slices = paths if labels is None else (paths, labels)
    
    dset = tf.data.Dataset.from_tensor_slices(slices)
    dset = dset.map(decode_fn, num_parallel_calls=AUTO)
    dset = dset.cache(cache_dir) if cache else dset
    dset = dset.map(augment_fn, num_parallel_calls=AUTO) if augment else dset
    dset = dset.repeat() if repeat else dset
    dset = dset.shuffle(shuffle) if shuffle else dset
    dset = dset.batch(bsize).prefetch(AUTO) # overlaps data preprocessing and model execution while training
    return dset


In [None]:
COMPETITION_NAME = "ranzcr-clip-catheter-line-classification"
strategy = auto_select_accelerator()
batch_size = strategy.num_replicas_in_sync * 16
print('batch size', batch_size)
GCS_DS_PATH = KaggleDatasets().get_gcs_path(COMPETITION_NAME)

In [None]:
load_dir = '/kaggle/input/ranzcr-clip-catheter-line-classification/'
df_train = pd.read_csv(load_dir + 'train.csv')
paths = GCS_DS_PATH + "/train/" + df_train['StudyInstanceUID'] + '.jpg'

df_sub = pd.read_csv(load_dir + 'sample_submission.csv')
test_paths = GCS_DS_PATH + "/test/" + df_sub['StudyInstanceUID'] + '.jpg'

# Get the multi-labels
label_cols = df_sub.columns[1:]
labels = df_train[label_cols].values

In [None]:
# Split into training and validation sets (12% held out for validation)
train_paths, valid_paths, train_labels, valid_labels = train_test_split(paths, labels, test_size=0.12, random_state=42)

In [None]:
# Build the decoder shared by the training and validation datasets
decoder = build_decoder(with_labels=True, target_size=(img_size, img_size))

# Build the TensorFlow datasets
dtrain = build_dataset(
    train_paths, train_labels, bsize=batch_size, decode_fn=decoder
)

dvalid = build_dataset(
    valid_paths, valid_labels, bsize=batch_size, 
    repeat=False, shuffle=False, augment=False, decode_fn=decoder
)

## Visualizing some images in a batch

In [None]:
# Take the first batch; each dataset element is an (images, labels) pair
images, _ = next(iter(dtrain.take(1)))
images = images.numpy()

In [None]:
fig, axes = plt.subplots(4, 4, figsize=(24,24))
axes = axes.flatten()
for img, ax in zip(images, axes):
    ax.imshow(img)
    ax.axis('off')
plt.tight_layout()
plt.show()

In [None]:
with strategy.scope():
    net = Xception(include_top=False,input_shape=(img_size, img_size, 3), weights='imagenet')
    x = net.output
    x = GlobalAveragePooling2D()(x)
    x = Dropout(0.5)(x)
    output = Dense(n_classes, activation='sigmoid')(x)
    model = Model(inputs=net.input, outputs=output)
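    # `lr` is a deprecated alias; `learning_rate` is the supported argument name
    model.compile(
        optimizer=optimizers.Adam(learning_rate=1e-3),
        loss='binary_crossentropy',
        metrics=[tf.keras.metrics.AUC(multi_label=True)],
    )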

### The whole network architecture

In [None]:
tf.keras.utils.plot_model(model, show_shapes=True)

In [None]:
model.summary()

In [None]:
rlr = ReduceLROnPlateau(monitor = 'val_loss', factor = 0.1, patience = 2, verbose = 0, 
                                min_delta = 1e-4, min_lr = 1e-6, mode = 'min')
        
ckp = ModelCheckpoint('model.h5',monitor = 'val_loss',
                      verbose = 0, save_best_only = True, mode = 'min')
        
es = EarlyStopping(monitor = 'val_loss', min_delta = 1e-4, patience = 5, mode = 'min', 
                    restore_best_weights = True, verbose = 0)
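The two callbacks that monitor `val_loss` interact: the learning rate is cut by a factor of 0.1 after 2 stagnant epochs, while training stops after 5. The following is a simplified pure-Python sketch of that interplay, not Keras's actual implementation. The function name `simulate_callbacks` and the example loss sequence are hypothetical, and the sketch assumes an epoch counts as stagnant when `val_loss` does not beat the best value by more than `min_delta`.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.1, rlr_patience=2,
                       es_patience=5, min_delta=1e-4, min_lr=1e-6):
    """Return (epochs_run, learning rate used in each epoch)."""
    best = float("inf")
    rlr_wait = es_wait = 0
    lrs = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)  # learning rate in effect during this epoch
        if loss < best - min_delta:
            # Improvement: both patience counters reset
            best = loss
            rlr_wait = es_wait = 0
        else:
            rlr_wait += 1
            es_wait += 1
            if rlr_wait >= rlr_patience:
                lr = max(lr * factor, min_lr)  # reduce, but never below min_lr
                rlr_wait = 0
            if es_wait >= es_patience:
                return epoch + 1, lrs  # early stopping
    return len(val_losses), lrs

# Hypothetical run: two improving epochs, then a flat validation loss
epochs_run, lrs = simulate_callbacks([0.5, 0.4] + [0.4] * 10)
print(epochs_run, lrs)
```

Under these assumptions, a run whose validation loss stops improving gets two learning-rate reductions (after the 2nd and 4th stagnant epochs) before early stopping ends training on the 5th.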

In [None]:
steps_per_epoch = train_paths.shape[0] // batch_size

In [None]:
history = model.fit(dtrain,                      
                    validation_data=dvalid,                                       
                    epochs=n_epochs,
                    callbacks=[rlr,es,ckp],
                    steps_per_epoch=steps_per_epoch,
                    verbose=1)

In [None]:
plt.rcParams.update({'font.size': 16})
hist = pd.DataFrame(history.history)
fig, (ax1, ax2) = plt.subplots(figsize=(12,12),nrows=2, ncols=1)
hist['loss'].plot(ax=ax1,c='k',label='training loss')
hist['val_loss'].plot(ax=ax1,c='r',linestyle='--', label='validation loss')
ax1.legend()
hist['auc'].plot(ax=ax2,c='k',label='training AUC')
hist['val_auc'].plot(ax=ax2,c='r',linestyle='--',label='validation AUC')
ax2.legend()
plt.show()

### The best model (lowest validation loss) is saved to model.h5 by the checkpoint callback.
## For the results, please see the second notebook <a href='https://www.kaggle.com/sinamhd9/keras-models-image-data-generator-part2/'> [Part 2]</a>

## Please upvote this notebook and its reference if you found them useful. Thanks!