

![](https://web14.bernama.com/storage/photos/4a4a6e340871b6d98de1a9c458ffb2635ebcf0e42c9ed)

<h1> <center> Shopee - Price Match Guarantee </center> </h1> 
<h2> <center> Convolutional Auto Encoder for Image Retrieval </center> </h2>



# Table of contents

* [1. Introduction](#section_one)
    - [1.1 What is Shopee](#section_one_one)
    - [1.2 What do we need to do?](#section_one_two)
    - [1.3 How to evaluate the model?](#section_one_three)
* [2. Preparing the data & EDA](#section_two)
    - [2.1 Libraries and loading the files](#section_two_one)
    - [2.2 Cleaning the data](#section_two_two)
    - [2.3 EDA](#section_two_three)
* [3. Image similarity pipeline](#section_three)
    - [3.1 Plan of Attack](#section_three_one)
    - [3.2 Convolutional Autoencoder to generate latent representation of images](#section_three_two)
    - [3.3 Autoencoder to reduce the dimensionality](#section_three_three)
    - [3.4 Ensemble clustering](#section_three_four)
    - [3.5 Visualization of results](#section_three_five)
* [4. Conclusion](#section_four)
     

<a id="section_one"></a>
# 1. Introduction 

## 1.1 What is Shopee?

Shopee is the leading e-commerce platform in Southeast Asia and Taiwan.  Customers appreciate its easy, secure, and fast online shopping experience tailored to their region. The company also provides strong payment and logistical support along with a 'Lowest Price Guaranteed' feature on thousands of Shopee's listed products.

## 1.2 What do we need to do?

Retail companies use a variety of methods to assure customers that their products are the cheapest. Among them is product matching, which allows a company to offer products at rates that are competitive with the same product sold by another retailer.

The purpose of this competition is to use our machine learning skills to build a model that predicts which items are the same product, based on the image and text information of each item.

## 1.3 How to evaluate the model?

Each item is represented by a row in a dataframe. For each item, the model predicts all the items that correspond to the same product. The F1 score is then computed for each row and averaged over all rows.

The F1 score is defined as:

$$ F_1 = \frac{tp}{tp + \frac{1}{2}(fp + fn)} $$

where $tp$ is the number of true positives, $fp$ the number of false positives and $fn$ the number of false negatives.
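As an illustration of the row-wise metric, here is a minimal sketch in plain Python. The function name `row_f1` and the toy posting IDs are made up for this example; only the formula comes from the definition above.

```python
def row_f1(target, predicted):
    """F1 score between the true and predicted sets of posting IDs for one row."""
    target, predicted = set(target), set(predicted)
    tp = len(target & predicted)   # IDs that are both predicted and correct
    fp = len(predicted - target)   # IDs predicted but not in the target
    fn = len(target - predicted)   # IDs in the target that were missed
    return tp / (tp + 0.5 * (fp + fn))

# Two toy rows: (true matches, predicted matches)
rows = [
    (["a", "b", "c"], ["a", "b"]),  # one match missed
    (["d"], ["d", "e"]),            # one wrong extra match
]
scores = [row_f1(target, predicted) for target, predicted in rows]
mean_f1 = sum(scores) / len(scores)  # the competition metric is this average
print(scores, mean_f1)
```

Each row always contains at least its own posting ID in both sets, so the denominator is never zero in practice.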



<a id="section_two"> </a>
# 2. Preparing the data & EDA
<a id="section_two_one"> </a>
## 2.1 Libraries and loading the files 📚

In [None]:
#Basics
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import tqdm
import cv2

#Deep learning
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from albumentations import  OneOf, Compose, CLAHE, HueSaturationValue, RandomGamma, HorizontalFlip

#Sklearn
from sklearn.model_selection import KFold
from sklearn.cluster import DBSCAN
from sklearn.cluster import OPTICS
from sklearn.neighbors import NearestNeighbors


In [None]:
DIR_TRAIN = "../input/shopee-product-matching/train_images/"
DIR_TEST = "../input/shopee-product-matching/test_images/"

TRAIN_CSV = "../input/shopee-product-matching/train.csv"
TEST_CSV = "../input/shopee-product-matching/test.csv"
SS_CSV = "../input/shopee-product-matching/sample_submission.csv"

train = pd.read_csv(TRAIN_CSV)
test = pd.read_csv(TEST_CSV)
sample_submission = pd.read_csv(SS_CSV)
AUTO = tf.data.experimental.AUTOTUNE

<a id="section_two_two"> </a>
## 2.2 Cleaning the data

Here are the first rows of the dataset:

In [None]:
train.head()

Information on the columns:
* posting_id - the ID code for the posting.
* image - the image id/md5sum.
* image_phash - a perceptual hash of the image.
* title - the product description for the posting.
* label_group - ID code for all postings that map to the same product. 

In [None]:
# Keep one row per distinct image file, then rebuild a clean 0..n-1 index
train = train.drop_duplicates(subset=['image']).reset_index(drop=True)
# Map each label group to the posting IDs it contains
tmp = train.groupby('label_group').posting_id.agg('unique').to_dict()
train['target'] = train.label_group.map(tmp)  # ground-truth matches for each posting
filenames = train['image']

<a id="section_two_three"></a>
## 2.3 EDA

In development...

<a id="section_three"></a>
# 3. Image similarity pipeline

<a id="section_three_one"></a>
## 3.1 Plan of attack
       
In this notebook, I focus on image retrieval without using the text information (for the moment). Here is a summary of the pipeline:



![l](https://www.hebergeur-image.com/upload/81.2.149.83-6058a1cbf196d.PNG)

In [None]:
parameters = {"IMG_SIZE" : (256, 256),
              "EPOCHS":1,
              "BATCH_SIZE" : 32,
             "LEARNING_RATE":0.001}

These functions read the images, resize them and apply image augmentation:

In [None]:
def process_path(image_input, image_output):
    # Load the raw bytes of each file, then decode them as 3-channel JPEG images
    img_input = tf.io.read_file(image_input)
    img_input = tf.image.decode_jpeg(img_input, channels=3)

    img_output = tf.io.read_file(image_output)
    img_output = tf.image.decode_jpeg(img_output, channels=3)

    return img_input, img_output

def aug_fn(image, img_size):
    data = {"image": image}
    aug_data = transforms(**data)
    #print(aug_data.shape)
    aug_img = aug_data["image"]
    aug_img = tf.cast(aug_img/255.0, tf.float32)
    aug_img = tf.image.resize(aug_img, size=[img_size, img_size])
    return aug_img

def process_aug(img_input, img_output):
    
    aug_img_input = tf.numpy_function(func=aug_fn, inp=[img_input, 256], Tout=[tf.float32])
    aug_img_input = tf.reshape(aug_img_input, [256, 256, 3])
    aug_img_output = tf.numpy_function(func=aug_fn, inp=[img_output, 256], Tout=[tf.float32])
    aug_img_output = tf.reshape(aug_img_output, [256, 256, 3])
    return aug_img_input, aug_img_output

def set_shapes(img_input, img_output):
    img_input.set_shape((256, 256, 3))
    img_output.set_shape((256, 256, 3))
    return img_input, img_output

def read_image(image_input, image_output):
    
    image_input = tf.io.read_file(image_input)
    #image_input = transforms(image = image_input)
    image_input = tf.io.decode_jpeg(image_input, channels = 3)
    image_input = tf.image.resize(image_input, [256, 256])
    image_input = tf.cast(image_input, tf.float32)/255.0
    
    image_output = tf.io.read_file(image_output)
    #image_output = transforms(image = image_output)
    image_output = tf.io.decode_jpeg(image_output, channels = 3)
    image_output = tf.image.resize(image_output, [256, 256])
    image_output = tf.cast(image_output, tf.float32)/255.0
    
    return image_input, image_output
    

In [None]:
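#transforms = OneOf([CLAHE(clip_limit=2), IAASharpen(), IAAEmboss(), RandomBrightnessContrast()], p=0.3)
# Assumption: the pipeline above is commented out and some of its transforms are not imported,
# so this stand-in, built only from the imported transforms, is what aug_fn uses when
# with_augmentation=True.
transforms = Compose([OneOf([CLAHE(clip_limit=2), HueSaturationValue(), RandomGamma()], p=0.3)])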
def get_train_dataset(batch_size = parameters['BATCH_SIZE'], with_augmentation = False):
    # The autoencoder reconstructs its input, so each image path serves as both input and target
    paths = DIR_TRAIN + filenames
    dataset = tf.data.Dataset.from_tensor_slices((paths, paths))
    if with_augmentation:
        dataset = (dataset.map(process_path, num_parallel_calls=AUTO)
                          .map(process_aug, num_parallel_calls=AUTO)
                          .map(set_shapes, num_parallel_calls=AUTO))
    else:
        dataset = dataset.map(read_image, num_parallel_calls=AUTO)
    # Use the batch_size argument in both branches
    return dataset.batch(batch_size).prefetch(AUTO)

train_dataset = get_train_dataset(parameters['BATCH_SIZE'], with_augmentation = False)

Thanks to [Andrada Olteanu](https://www.kaggle.com/andradaolteanu/shopee-text-prep-fe-image-augmentation) : 

In [None]:
def display_augmentations(path):
    '''Displays different types of augmentations on a chosen image.
    path: the direct path to the desired image.'''
    
    # Read in original image
    original = cv2.imread(path)
    original = cv2.cvtColor(original, cv2.COLOR_BGR2RGB)

    # Transformations
    transform_clahe = Compose([CLAHE(clip_limit=2)])
    transform_huesv = Compose([HueSaturationValue(hue_shift_limit=100)])
    transform_hf = Compose([HorizontalFlip()])
    transform_rg = Compose([RandomGamma(gamma_limit=(200, 400))])
    

    # Apply transformations
    transform_clahe = transform_clahe(image=original)["image"]
    transform_huesv = transform_huesv(image=original)["image"]
    transform_hf = transform_hf(image=original)["image"]
    transform_rg = transform_rg(image=original)["image"]

    all_transformations = [original, transform_clahe, transform_huesv, transform_hf, transform_rg]
    all_names = ["Original", "CLAHE", "HueSaturationValue", "HorizontalFlip", "RandomGamma"]
    
    # Plot
    fig = plt.figure(figsize=(10, 7))
    plt.suptitle(f"Image Augmentations", fontsize=20)
    for k, image in enumerate(all_transformations):
        fig.add_subplot(1, 5, k+1)
        plt.title(all_names[k])
        plt.imshow(image)
        plt.axis("off")

    plt.show();

In [None]:
display_augmentations("../input/shopee-product-matching/test_images/0006c8e5462ae52167402bac1c2e916e.jpg")

<a id="section_three_two"> </a>
## 3.2 Convolutional Autoencoder to generate latent representation of images

The convolutional autoencoder lets us represent each image as a vector with 2048 components (16 * 16 * 8 after the four pooling layers).
An image of dimension 256 * 256 * 3 = 196608 components is thus encoded into a vector with 2048 components.
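As a sanity check, the size of the latent vector can be derived from the layer settings of the encoder defined in the next cell: four 2x2 pooling layers and 8 filters in the last convolution. This is a small arithmetic sketch; the constant names are chosen for this example only.

```python
IMG_SIZE = 256         # input images are 256 x 256 x 3
N_POOLING_LAYERS = 4   # each 2x2 pooling layer halves the height and width
LAST_FILTERS = 8       # number of filters in the last encoder convolution

side = IMG_SIZE
for _ in range(N_POOLING_LAYERS):
    side = -(-side // 2)  # ceiling division, which is what padding="same" gives

latent_dim = side * side * LAST_FILTERS  # size of the flattened encoder output
input_dim = IMG_SIZE * IMG_SIZE * 3
compression = input_dim / latent_dim

print(side, latent_dim, input_dim, compression)
```

The feature map is 16 x 16 x 8 before flattening, which is also the shape the decoder's `Reshape` layer expects.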

In [None]:
class CAE(Model):
  def __init__(self):
    super(CAE, self).__init__()
    self.encoder = tf.keras.Sequential([
        tf.keras.layers.Conv2D(filters = 32, kernel_size = (3, 3), padding = "same",
                               activation = "relu", input_shape = (256, 256, 3)),
        tf.keras.layers.AveragePooling2D(pool_size = (2, 2), padding = "same"),
        tf.keras.layers.Conv2D(filters = 16, kernel_size = (3, 3), padding = "same", 
                               activation = "relu"),
        tf.keras.layers.AveragePooling2D(pool_size = (2, 2), padding = "same"),
        tf.keras.layers.Conv2D(filters = 16, kernel_size = (3, 3), padding = "same", 
                               activation = "relu"),
        tf.keras.layers.AveragePooling2D(pool_size = (2, 2), padding = "same"),
        tf.keras.layers.Conv2D(filters = 8, kernel_size = (3, 3), padding = "same", 
                               activation = "relu"),
        tf.keras.layers.AveragePooling2D(pool_size = (2, 2), padding = "same"),

        tf.keras.layers.Flatten(),
        #tf.keras.layers.Dense(64, activation = 'relu')
    ])

    self.decoder = tf.keras.Sequential([
        #tf.keras.layers.Dense(8192, activation = 'relu'),
        tf.keras.layers.Reshape(target_shape = (16, 16, 8)),
        tf.keras.layers.Conv2D(filters = 8, kernel_size = (3, 3), padding = "same", 
                               activation = "relu"),
        tf.keras.layers.UpSampling2D(size = (2, 2)),
        tf.keras.layers.Conv2D(filters = 16, kernel_size = (3, 3), padding = "same", 
                               activation = "relu"),
        tf.keras.layers.UpSampling2D(size = (2, 2)),
        tf.keras.layers.Conv2D(filters = 16, kernel_size = (3, 3), padding = "same", 
                               activation = "relu"),
        tf.keras.layers.UpSampling2D(size = (2, 2)),
        tf.keras.layers.Conv2D(filters = 32, kernel_size = (3, 3), padding = "same",
                               activation = "relu"),
        tf.keras.layers.UpSampling2D(size = (2, 2)),
        tf.keras.layers.Conv2D(filters = 3, kernel_size = (3, 3), padding = "same", 
                               activation = "sigmoid")
        ])
    
  def call(self, x):
    encoded = self.encoder(x)
    decoded = self.decoder(encoded)
    return decoded


In [None]:
cae_model = CAE()
opt = tf.keras.optimizers.Adam(learning_rate=parameters['LEARNING_RATE'])
cae_model.compile(optimizer=opt, loss='binary_crossentropy')

# train_dataset is already batched, so batch_size is not passed to fit();
# shuffle is also omitted because fit() ignores it for tf.data datasets
cae_model.fit(train_dataset,
              epochs=parameters['EPOCHS'],
              #validation_data=val_dataset
             )

In [None]:
encoding = cae_model.encoder.predict(train_dataset) # Generate the latent representation of each image

In [None]:
train_dataset_encoding = (tf.data.Dataset.from_tensor_slices((encoding, encoding))
                        .batch(parameters['BATCH_SIZE'])
                        .prefetch(AUTO)
                    ) #make it a tensorflow dataset

<a id="section_three_three"> </a>
## 3.3 Autoencoder to reduce the dimensionality

Now that each image is encoded as a vector with 2048 components, these vectors are fed into a second autoencoder that reduces their dimensionality to 64 components.

Why not keep the 2048-component vectors for the clustering step?

Only to reduce the costs of computing, storing and acquiring data.

Why not add layers to the initial convolutional autoencoder so that it both produces a latent representation of the images and reduces the dimensionality?

Because adding convolutional layers to reduce the dimensionality would also require more pooling layers, which make the network lose information in the encoding part.

So why not use Dense layers? When I decode the vector representation using a CAE with dense layers to reduce the dimensionality, the loss is higher.

So I preferred to use two autoencoders: one to get a vector representation of each image, and another to reduce its dimensionality (this performed better than PCA or t-SNE).

In [None]:
class AutoEncoder(Model):
  def __init__(self):
    super(AutoEncoder, self).__init__()
    self.encoder = tf.keras.Sequential([
        tf.keras.layers.Dense(2048),
        tf.keras.layers.LeakyReLU(alpha=0.3),
        tf.keras.layers.Dense(64),
        tf.keras.layers.LeakyReLU(alpha=0.3),

        
    ])
    self.decoder = tf.keras.Sequential([
        tf.keras.layers.Dense(512),
        tf.keras.layers.LeakyReLU(alpha=0.3),
        tf.keras.layers.Dense(2048),
        tf.keras.layers.LeakyReLU(alpha=0.3),
    ])

  def call(self, x):
    encoded = self.encoder(x)
    decoded = self.decoder(encoded)
    return decoded

In [None]:
def scheduler(epoch, lr):
  # Keep the initial learning rate for 10 epochs, then decay it exponentially
  if epoch < 10:
    return lr
  else:
    return float(lr * np.exp(-0.1))  # return a plain float rather than a tensor


autoencoder = AutoEncoder()
learningrate_scheduler = tf.keras.callbacks.LearningRateScheduler(scheduler)
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
autoencoder.compile(optimizer=opt, loss=tf.keras.losses.MeanSquaredError())

# train_dataset_encoding is already batched, so batch_size and shuffle are not passed to fit()
autoencoder.fit(train_dataset_encoding,
                epochs=30,
                callbacks = [learningrate_scheduler],
                #validation_data=val_dataset
               )
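The learning-rate schedule used above can be previewed without training. This sketch replays the same rule in plain Python (using `math.exp` so that it runs on its own), assuming the callback is invoked once at the start of each of the 30 epochs with the current learning rate.

```python
import math

def scheduler(epoch, lr):
    # Same rule as above: constant for 10 epochs, then multiplied by exp(-0.1)
    if epoch < 10:
        return lr
    return lr * math.exp(-0.1)

lr = 0.001  # initial learning rate given to Adam
schedule = []
for epoch in range(30):
    lr = scheduler(epoch, lr)
    schedule.append(lr)

print(schedule[0], schedule[9], schedule[10], schedule[-1])
```

The decay is applied 20 times in total, so the final learning rate is the initial one multiplied by exp(-2), roughly 13.5% of its starting value.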

In [None]:
encoded_images = autoencoder.encoder.predict(train_dataset_encoding)

<a id="section_three_four"></a>
## 3.4 Ensemble clustering

In [None]:
def match_self_dbscan(row):
    if row['posting_id'] not in row['matches_dbscan']:
        return [row['posting_id']] + row['matches_dbscan']
    else:
        return row['matches_dbscan']
def match_self_nn(row):
    if row['posting_id'] not in row['matches_nn']:
        return [row['posting_id']] + row['matches_nn']
    else:
        return row['matches_nn']
def match_self_optics(row):
    if row['posting_id'] not in row['matches_optics']:
        return [row['posting_id']] + row['matches_optics']
    else:
        return row['matches_optics']
def getMetric(col):
    def f1score(row):
        n = len(np.intersect1d(row.target, row[col]))
        return 2 * n / (len(row.target) + len(row[col]))
    return f1score
def match_self_ensembling(row):
    max_len_matches = max([len(row['matches_dbscan']), len(row['matches_nn']), 
                           len(row['matches_optics'])])
    if max_len_matches == len(row['matches_dbscan']):
        return row['matches_dbscan']
    elif max_len_matches == len(row['matches_nn']):
        return row['matches_nn']
    elif max_len_matches == len(row['matches_optics']):
        return row['matches_optics']

Thanks to [Robert Tacbad](https://www.kaggle.com/jollibobert) for some of the code below.

In [None]:
eps_range = np.arange(0.001, 0.009, 0.001)
best_f1_dbscan = 0.0  # best mean F1 seen so far, so start from the minimum
best_f1_optics = 0.0

for eps in eps_range:
    # DBSCAN
    clustering_DBSCAN = DBSCAN(eps=eps, min_samples=2, metric='cosine').fit(encoded_images)
    train['clusters_dbscan'] = clustering_DBSCAN.labels_
    clustered = (train['clusters_dbscan'] != -1)
    tmp = train.loc[clustered].groupby('clusters_dbscan')['posting_id'].agg('unique').to_dict()
    # Keep at most 50 matches per cluster, as plain lists so that
    # match_self_dbscan concatenates lists instead of broadcasting over arrays
    tmp = {key: list(ids[:50]) for key, ids in tmp.items()}
    tmp[-1] = []  # noise points have no match other than themselves
    train['matches_dbscan'] = train['clusters_dbscan'].map(tmp)
    train['matches_dbscan'] = train.apply(match_self_dbscan, axis=1)
    f1_dbscan = train.apply(getMetric('matches_dbscan'), axis=1)
    if f1_dbscan.mean() > best_f1_dbscan:
        # Remember the scores and matches of the best eps so far
        best_f1_dbscan = f1_dbscan.mean()
        best_matches_dbscan = train['matches_dbscan'].copy()
        train['f1_dbscan'] = f1_dbscan
    print("DBSCAN : " + "eps : " + str(eps) + " f1 : " + str(f1_dbscan.mean()))

    # OPTICS
    clustering_OPTICS = OPTICS(min_samples=2, metric='cosine', max_eps = eps).fit(encoded_images)
    train['clusters_optics'] = clustering_OPTICS.labels_
    clustered = (train['clusters_optics'] != -1)
    tmp = train.loc[clustered].groupby('clusters_optics')['posting_id'].agg('unique').to_dict()
    tmp = {key: list(ids[:50]) for key, ids in tmp.items()}
    tmp[-1] = []
    train['matches_optics'] = train['clusters_optics'].map(tmp)
    train['matches_optics'] = train.apply(match_self_optics, axis=1)
    f1_optics = train.apply(getMetric('matches_optics'), axis=1)
    if f1_optics.mean() > best_f1_optics:
        best_f1_optics = f1_optics.mean()
        best_matches_optics = train['matches_optics'].copy()
        train['f1_optics'] = f1_optics
    print("OPTICS : " + "eps : " + str(eps) + " f1 : " + str(f1_optics.mean()))

# Restore the matches obtained with the best eps of each method
train['matches_dbscan'] = best_matches_dbscan
train['matches_optics'] = best_matches_optics

In [None]:
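eps_range = np.arange(0.001, 0.009, 0.001)
best_f1_nn = 0.0  # best mean F1 seen so far

# The neighbor search does not depend on the distance threshold, so run it once
model = NearestNeighbors(n_neighbors=50, metric = 'cosine')
model.fit(encoded_images)
distances, indices = model.kneighbors(encoded_images)

for distance in eps_range:
    preds = []
    for k in range(len(encoded_images)):
        # Keep the neighbors that are closer than the current threshold
        IDX = np.where(distances[k,]<distance)[0]
        IDS = indices[k,IDX]
        # Store plain lists so that match_self_nn concatenates lists
        preds.append(train.iloc[IDS].posting_id.tolist())
    train['matches_nn'] = preds
    train['matches_nn'] = train.apply(match_self_nn, axis=1)
    f1_nn = train.apply(getMetric('matches_nn'), axis=1)
    if f1_nn.mean() > best_f1_nn:
        # Remember the scores and matches of the best threshold so far
        best_f1_nn = f1_nn.mean()
        best_matches_nn = train['matches_nn'].copy()
        train['f1_nn'] = f1_nn
    print("eps : " + str(distance) + " f1 : " + str(f1_nn.mean()))

# Restore the matches obtained with the best threshold
train['matches_nn'] = best_matches_nn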
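The nearest-neighbor step above amounts to keeping, for each item, the neighbors whose cosine distance is below a threshold. Here is a minimal sketch of that idea in plain Python on three toy 2-D embeddings. The helper names and data are invented for this example, and the 50-neighbor cap of the real code is left out.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def threshold_matches(embeddings, ids, threshold):
    """For each embedding, list the IDs whose cosine distance is below the threshold."""
    matches = []
    for u in embeddings:
        matches.append([ids[j] for j, v in enumerate(embeddings)
                        if cosine_distance(u, v) < threshold])
    return matches

# Toy data: the first two vectors point in almost the same direction
embeddings = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]
ids = ["item_a", "item_b", "item_c"]
matches = threshold_matches(embeddings, ids, threshold=0.001)
print(matches)
```

Because an item is at distance zero from itself, it normally appears in its own match list, which is what the `match_self_nn` helper enforces in the notebook.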

In [None]:
train['matches_ensemble'] = train.apply(match_self_ensembling, axis = 1)
train['f1_ensembling'] = train.apply(getMetric('matches_ensemble'), axis=1)
last_f1_ensembling = train['f1_ensembling'].mean()
print("F1 TOTAL: " + str(train['f1_ensembling'].mean()))
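The ensembling rule used here is simple: for each row, keep the longest of the three match lists, with ties going to DBSCAN, then nearest neighbors, then OPTICS. A standalone sketch of that rule follows; `pick_longest` and the toy IDs are hypothetical names for this example.

```python
def pick_longest(*match_lists):
    """Return the longest match list; on ties, the earliest argument wins."""
    return max(match_lists, key=len)

matches_dbscan = ["id_1", "id_2"]
matches_nn = ["id_1", "id_2", "id_3"]
matches_optics = ["id_1"]

ensemble = pick_longest(matches_dbscan, matches_nn, matches_optics)
print(ensemble)
```

Picking the longest list favors recall, so it can also bring in more false positives than a stricter rule such as intersecting the lists.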

<a id="section_three_five"></a>
## 3.5 Visualization of results

In [None]:
train['count_images_predicted'] = train.apply(lambda row : len(row['matches_ensemble']), axis = 1)
train[train['count_images_predicted'] > 1].head(3)

In [None]:
def query_image(filename_img):
    row_image_query = train.query("image == @filename_img")
    similar_images = row_image_query['matches_ensemble'].iloc[0]

    fig = plt.figure(figsize=(15, 15))

    ax = fig.add_subplot(1, 4, 1)
    img = mpimg.imread(DIR_TRAIN + filename_img)
    imgplot = plt.imshow(img)
    ax.set_title('Query image')
    
    ax = fig.add_subplot(1, 4, 2)
    img = mpimg.imread(DIR_TRAIN + train.query("posting_id == @similar_images[0]")['image'].iloc[0])
    imgplot = plt.imshow(img)
    ax.set_title('Similar image 1')
    
    if len(similar_images) > 1:
        ax = fig.add_subplot(1, 4, 3)
        img = mpimg.imread(DIR_TRAIN + train.query("posting_id == @similar_images[1]")['image'].iloc[0])
        imgplot = plt.imshow(img)
        ax.set_title('Similar image 2')
        
    if len(similar_images) > 2:
        ax = fig.add_subplot(1, 4, 4)
        img = mpimg.imread(DIR_TRAIN + train.query("posting_id == @similar_images[2]")['image'].iloc[0])
        imgplot = plt.imshow(img)
        ax.set_title('Similar image 3')

filename_img = "002f978c58a44a00aadfca71c3cad2bb.jpg"
query_image(filename_img)

In [None]:
filename_img = "004076b57135e761ab8b41d84acc4c94.jpg"
query_image(filename_img)

In [None]:
filename_img = "0083bce179f59cdb2234bb2e621bf4b9.jpg"
query_image(filename_img)

<a id="section_four"></a>
# 4. Conclusion

Convolutional autoencoders are a good way to retrieve similar images. However, I got an F1 score of almost 0.56, which I think is not enough for a robust model. A further improvement would be to use a CNN instead of a convolutional autoencoder.

