Added dataset: https://www.kaggle.com/srg9000/cassava-plant-disease-merged-20192020 (merged data from the 2019 and 2020 competitions)

# Using the combined and unlabelled data to pretrain a model
One way to make a network learn useful image features without labels is to train it as an autoencoder.

An autoencoder is a pair of networks, an encoder and a decoder, trained to recreate the source image (or some variant of it, such as a denoised image). The encoder compresses the image into a smaller representation, and the decoder tries to recreate the image from the encodings produced by the encoder.

More on Autoencoders: https://en.wikipedia.org/wiki/Autoencoder
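
To make the idea concrete, here is a minimal NumPy sketch of a linear autoencoder. It is not the model used in this notebook: the toy data, the 8 -> 2 -> 8 layer sizes, the learning rate and the names `w_enc`, `w_dec` and `reconstruct` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 64 samples with 8 features that really lie on a 2-D subspace.
latent = rng.normal(size=(64, 2))
mixing = rng.normal(size=(2, 8))
x = latent @ mixing

# Encoder maps 8 -> 2 (the bottleneck); decoder maps 2 -> 8.
w_enc = rng.normal(scale=0.1, size=(8, 2))
w_dec = rng.normal(scale=0.1, size=(2, 8))


def reconstruct(x, w_enc, w_dec):
    """Encode x into the bottleneck, then decode it back."""
    code = x @ w_enc
    return code, code @ w_dec


def mse(a, b):
    return float(np.mean((a - b) ** 2))


_, recon = reconstruct(x, w_enc, w_dec)
initial_loss = mse(x, recon)

lr = 0.01
for _ in range(2000):
    code, recon = reconstruct(x, w_enc, w_dec)
    err = (recon - x) * (2.0 / x.size)   # gradient of the MSE w.r.t. recon
    grad_dec = code.T @ err              # gradient w.r.t. decoder weights
    grad_enc = x.T @ (err @ w_dec.T)     # gradient w.r.t. encoder weights
    w_dec -= lr * grad_dec
    w_enc -= lr * grad_enc

code, recon = reconstruct(x, w_enc, w_dec)
final_loss = mse(x, recon)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The target of the reconstruction is the input itself, which is why no labels are needed. The Keras generators below do the same thing with `class_mode='input'`.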

In [None]:
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras

In [None]:
IMAGE_SIZE = [512, 512]

## Image augmentation and creating generators

In [None]:
def preprocess_func(image):
    image = tf.image.random_saturation(image, 0.9, 2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    brightness_range=(0.8, 1.2),
    shear_range=5.0,
    zoom_range=0.2, 
    fill_mode="constant",
    validation_split=0.3,
    preprocessing_function=preprocess_func
)


In [None]:
train_gen = datagen.flow_from_directory('../input/cassava-plant-disease-merged-20192020/extra_images', target_size=(512, 512), batch_size=3, class_mode='input', subset='training')
val_gen = datagen.flow_from_directory('../input/cassava-plant-disease-merged-20192020/extra_images', target_size=(512, 512), batch_size=3, class_mode='input', subset='validation')

In [None]:
# Visualize a batch
import matplotlib.pyplot as plt
x = next(iter(train_gen))
plt.imshow(x[0][0])
plt.show()
plt.imshow(x[0][1])
plt.show()
plt.imshow(x[0][2])
plt.show()

# Creating the model

Here, I use EfficientNetB5 as the base model (the encoder). Instead of flattening its output and reshaping it, the decoder upsamples directly from the 2D activation maps.

I combine Conv2D layers with transposed convolutions, as this sometimes improves the results by reducing the mosaic-like (checkerboard) artifacts that transposed convolutions can produce. (This is more of a personal choice; I have not compared the outputs in this case.)

In the transposed convolutions, a dilation and stride of 1 worked better for me than a dilation of 2.
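
Because every convolution in the decoder of the next cell uses 'valid' padding, it is worth checking that the spatial size really ends up back at 512. The sketch below traces the size layer by layer in plain Python. It assumes the EfficientNetB5 encoder reduces a 512x512 input to 16x16 (a total stride of 32); the helper names are hypothetical and only mirror the layer order in `make_model`.

```python
def conv_valid(size, kernel):
    """Output size of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def conv_transpose_valid(size, kernel):
    """Output size of a stride-1 transposed convolution with 'valid' padding."""
    return size + kernel - 1


def upsample_block(size, tx_kernel):
    """Mirror get_upsample_layers: Conv2DTranspose -> UpSampling2D(2) -> Conv2D(3x3)."""
    size = conv_transpose_valid(size, tx_kernel)
    size = size * 2
    return conv_valid(size, 3)


def decoder_output_size(encoder_size=16):
    size = encoder_size              # the 1x1 convolutions keep the size unchanged
    for _ in range(4):               # blocks '1' to '4' use 2x2 transposed kernels
        size = upsample_block(size, 2)
    size = size // 2                 # MaxPool2D((2, 2))
    size = conv_valid(size, 3)       # Conv2D(64, (3, 3))
    for _ in range(2):               # blocks '5' and '6' use 4x4 transposed kernels
        size = upsample_block(size, 4)
    size = conv_valid(size, 3)       # Conv2D(16, (3, 3))
    size = conv_valid(size, 3)       # Conv2D(3, (3, 3))
    return size


print(decoder_output_size(16))  # 512
```

The MaxPool2D step followed by the larger 4x4 transposed kernels is what makes the sizes land exactly on 512. Changing `IMAGE_SIZE` would need the same check again.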

In [None]:
def get_upsample_layers(x, filters, tx_size, block_name):
    with keras.backend.name_scope('decode_'+block_name):
        x = tf.keras.layers.Conv2DTranspose(filters, tx_size, strides=(1, 1), padding='valid',dilation_rate=(1, 1))(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.Dropout(0.1)(x)
        x = tf.keras.layers.UpSampling2D((2,2))(x)
        x = tf.keras.layers.Conv2D(filters, (3,3), activation='relu')(x)
        # x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
    return x

def make_model():
    base_model = tf.keras.applications.EfficientNetB5(
        input_shape=(*IMAGE_SIZE, 3), include_top=False, weights='imagenet'
    )

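    # Fine-tune the whole encoder. Setting trainable on the model already
    # applies to every layer; the loop just makes that explicit.
    base_model.trainable = True
    for layer in base_model.layers:
        layer.trainable = True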

    inputs = tf.keras.layers.Input([*IMAGE_SIZE, 3])
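    # Note: tf.keras EfficientNet models normalize their inputs internally and are
    # documented to expect pixel values in [0, 255]; the generators above feed [0, 1].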
    x = base_model(inputs)
    x = tf.keras.layers.Conv2D(512, (1,1))(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Conv2D(256, (1,1))(x)
    x = get_upsample_layers(x, 256, (2,2), '1')
    x = get_upsample_layers(x, 256, (2,2), '2')
    x = get_upsample_layers(x, 128, (2,2), '3')
    x = get_upsample_layers(x, 128, (2,2), '4')
    x = tf.keras.layers.MaxPool2D((2,2))(x)
    x = tf.keras.layers.Conv2D(64, (3,3))(x)
    x = get_upsample_layers(x, 32, (4,4), '5')
    x = get_upsample_layers(x, 32, (4,4), '6')
    x = tf.keras.layers.Conv2D(16, (3,3), activation='relu')(x)
    x = tf.keras.layers.Conv2D(3, (3,3), activation='sigmoid')(x)
    model = tf.keras.Model(inputs=inputs, outputs=x)      


    return model

In [None]:
model = make_model()

In [None]:
model.summary()

In [None]:
initial_learning_rate = 0.0001
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate, decay_steps=1000, decay_rate=0.95, staircase=True
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
    loss=keras.losses.MSE
)
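
As a rough guide to what this schedule does, `ExponentialDecay` with `staircase=True` follows `initial_learning_rate * decay_rate ** (step // decay_steps)`. The plain-Python sketch below reproduces that formula with the values from the cell above; the function name is hypothetical and is not part of Keras.

```python
def staircase_exponential_decay(step, initial_lr=1e-4, decay_steps=1000, decay_rate=0.95):
    """Learning rate after `step` optimizer updates, decayed in discrete jumps."""
    return initial_lr * decay_rate ** (step // decay_steps)


for step in (0, 999, 1000, 2500):
    print(step, staircase_exponential_decay(step))
```

With `batch_size=3`, one decay step happens every 1000 batches, i.e. after every 3000 images, so the learning rate drops by 5% at each of those points and stays constant in between.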

In [None]:
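callbacks = [
    # {epoch:02d} gives zero-padded integer epochs in the file name (cp01, cp02, ...)
    tf.keras.callbacks.ModelCheckpoint('./pretrained2_cp{epoch:02d}.h5', save_best_only=True, monitor='loss'),
    # EarlyStopping monitors val_loss by default; stated explicitly here
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10),
]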

In [None]:
# Sanity check: run one training step on dummy data to confirm that the input and output shapes match
model.fit(np.zeros((1, 512, 512, 3)), np.zeros((1, 512, 512, 3)), verbose=1)

## Fit the model

In [None]:
tf.keras.backend.clear_session()

In [None]:
model.fit(train_gen, verbose=1, epochs=1, callbacks=callbacks, validation_data=val_gen)

In [None]:
# model.save('pretrained3.h5')

## Check results

In [None]:
import matplotlib.pyplot as plt
x = next(iter(train_gen))
plt.imshow(x[0][0])
plt.show()
plt.imshow(x[0][1])
plt.show()
plt.imshow(x[0][2])
plt.show()

In [None]:
# x = next(iter(train_gen))
# model.fit(x[0],x[0], epochs=10, verbose=0)
xx = model.predict(x[0])
plt.imshow(xx[0])
plt.show()
plt.imshow(xx[1])
plt.show()
plt.imshow(xx[2])
plt.show()

After one epoch, the reconstructions look like they will keep improving with more training (they did for me). We can also add the labelled training images to the pretraining data.

## Separate the encoder

In [None]:
model_encoder = keras.models.Model(model.layers[1].input, model.layers[1].outputs) # separate the base model

# Thank you

### Using autoencoders, we might be able to learn the general shape of the plants, but diseases show up as small patches on the leaves, which do not seem to be major components when reconstructing the source image. For this, we can use something like deep metric learning, which learns features that differentiate the classes. (I will try to publish a notebook for that soon.)
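
To give a flavour of what "learning differentiating features" can mean, one common deep metric learning objective is the triplet loss. The sketch below is only an illustration in plain Python: the notebook does not say which metric learning method is intended, and the function names, the margin of 1.0 and the example embeddings are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]         # embedding of a leaf image
positive = [0.0, 1.0]       # same disease class as the anchor
easy_negative = [3.0, 4.0]  # different class, already far away
hard_negative = [1.0, 0.0]  # different class, as close as the positive

print(triplet_loss(anchor, positive, easy_negative))  # 0.0
print(triplet_loss(anchor, positive, hard_negative))  # 1.0
```

The loss only cares about relative distances between classes, not about reconstructing every pixel, so small disease patches can matter as much as the overall leaf shape.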

If you find this kernel helpful, please upvote it and leave any suggestions in the comments.