# Strategy

## Data augmentation
- mixup? It's an augmentation technique that blends pairs of training images (and their labels) into new samples. <span style="color:blue">This sounds cool and useful.</span>
- flip horizontally? <span style="color:blue">Does this mean upside down? There was an argument against flipping left to right: each side of the fin is unique, so a mirrored left side could be mistaken for a right side.</span>
- could borrow other users' datasets to have more datapoints. <span style="color:blue">Good idea, especially the cropped ones.</span>
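
For reference, mixup is usually described as a weighted blend of two training samples and their one-hot labels. The snippet below is a minimal NumPy sketch of that idea, not code from this notebook; `mixup_batch` and its `alpha` parameter are illustrative names.

```python
import numpy as np

def mixup_batch(images, labels, alpha=0.2, rng=None):
    """Blend each sample with a randomly chosen partner from the same batch."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    perm = rng.permutation(len(images))   # partner index for every sample
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels, lam

# Toy batch: four random 8x8 RGB "images" and three classes (made-up data).
rng = np.random.default_rng(0)
images = rng.random((4, 8, 8, 3))
labels = np.eye(3)[[0, 1, 2, 0]]
mixed_images, mixed_labels, lam = mixup_batch(images, labels, rng=rng)
```

The blended labels are soft targets that still sum to one per sample, so they can be used with a categorical cross-entropy loss.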

## Architecture
- Ensemble? Use multiple models and average/vote for a result. <span style="color:blue">This sounds like a good idea.</span>
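
The two combination rules mentioned above can be sketched with NumPy. This assumes every model outputs class probabilities of the same shape; `average_ensemble` and `majority_vote` are illustrative helpers and the probabilities are made up.

```python
import numpy as np

def average_ensemble(prob_list):
    """Soft voting: average the class probabilities, then take the arg-max."""
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_probs.argmax(axis=1)

def majority_vote(prob_list):
    """Hard voting: every model votes for its own arg-max class."""
    votes = np.stack([p.argmax(axis=1) for p in prob_list], axis=0)  # (models, samples)
    n_samples, n_classes = prob_list[0].shape
    counts = np.zeros((n_samples, n_classes), dtype=int)
    for model_votes in votes:
        counts[np.arange(n_samples), model_votes] += 1
    return counts.argmax(axis=1)  # ties go to the lowest class index

# Three hypothetical models, two samples, three classes.
p1 = np.array([[0.6, 0.3, 0.1], [0.1, 0.5, 0.4]])
p2 = np.array([[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]])
p3 = np.array([[0.1, 0.8, 0.1], [0.1, 0.3, 0.6]])
soft = average_ensemble([p1, p2, p3])
hard = majority_vote([p1, p2, p3])
```

On the first sample the two rules disagree: averaging is swayed by the one very confident model, while voting follows the two models that agree.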

### Backbone/Feature extraction
- ResNet? With triplet loss. <span style="color:blue">I like this, though I have never done it before.</span>
- ConvNeXt? <span style="color:blue">Unfamiliar.</span>
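
As a reminder of what triplet loss optimizes, here is a minimal NumPy sketch under the common squared-Euclidean formulation. The margin value and the toy embeddings are assumptions for illustration only.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Per-triplet hinge loss: pull the positive closer than the negative by `margin`."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0)

anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.1, 0.0], [1.0, 0.0]])   # same individual as the anchor
negative = np.array([[1.0, 0.0], [0.5, 0.0]])   # different individual
losses = triplet_loss(anchor, positive, negative)
```

The first triplet is already well separated and contributes no loss; the second has the negative closer than the positive, so it is penalized.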
    
### Head/Classification
- Support Vector Classification (SVC)?
- Bayesian Ridge?
- Gradient Boosting Machine?
- Random Forests?
   

## Validation
- KFolds to cross-validate
- Use Grad-CAM to see what could be improved.
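
The index bookkeeping behind K-fold cross-validation can be sketched in plain Python. This illustrates contiguous, unshuffled folds and is not scikit-learn's implementation; `kfold_indices` is a hypothetical helper.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_indices, validation_indices) for contiguous folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)  # first folds absorb the remainder
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

folds = list(kfold_indices(10, 3))
```

Every sample lands in exactly one validation fold, so each image is scored once by a model that did not train on it.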

# Ideas prioritization:
- ResNet101 with tf.keras.layers.Embedding, clustering embeddings with sklearn.cluster.AffinityPropagation.
- Data augmentation with ImageDataGenerator (tuning its parameters when we have an idea), maybe using save_to_dir to skip the processing in subsequent fetches.
    - We need to stop the IDG from stretching the images
    - We should augment the classes that are in just one or two photos. How? IMG.fit with 100 rows and augment=True yielded 16k images (!)
- Use pyimagesearch.gradcam to check if the model is looking at the correct parts of the images.
- Use other computer vision pretrained models in an ensemble with tf.keras.layers.Average() or a simple voting (I have a snippet).
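
One possible answer to the "How?" above is to count images per individual first and generate extra samples only for the rare ones. The sketch below assumes a hypothetical minimum of three images per class; `augmentation_plan` is not part of the notebook.

```python
from collections import Counter

def augmentation_plan(labels, min_per_class=3):
    """Return how many augmented copies each under-represented class needs."""
    counts = Counter(labels)
    return {label: min_per_class - n for label, n in counts.items() if n < min_per_class}

# Made-up labels: "a" has enough photos, "b" and "c" do not.
labels = ["a", "a", "a", "b", "c", "c"]
plan = augmentation_plan(labels)
```

The resulting counts could then bound how many augmented images are drawn per class, which avoids the uncontrolled blow-up noted in the list.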

In [1]:
import pandas as pd
import tensorflow as tf
import numpy as np
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.imagenet_utils import preprocess_input
from sklearn.model_selection import StratifiedKFold, KFold
from tqdm.autonotebook import tqdm
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
import os, gc
import matplotlib.pyplot as plt
import shutil
import cv2
# !pip install -q -U tensorflow-addons
# import tensorflow_addons as tfa
from tensorflow.keras.callbacks import ModelCheckpoint
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors
from matplotlib import cm
%matplotlib inline


  


# Data preparation
### File names retrieval from labeled CSV

In [2]:
DATA_SUBSET = 300

In [3]:
train_df = pd.read_csv("../input/happy-whale-and-dolphin/train.csv")
train_df = train_df#[:DATA_SUBSET]

# To get only top 10 most photographed whales
# famous_whale_ids = train_df["individual_id"].value_counts()[:10].index.tolist()
# train_df = train_df[train_df['individual_id'].isin(famous_whale_ids)]
# train_df["individual_id"].value_counts()

### Encoding

In [4]:
def prepare_labels(y):
    values = np.array(y)
    label_encoder = LabelEncoder()
    integer_encoded = label_encoder.fit_transform(values)
    onehot_encoder = OneHotEncoder(sparse=False)
    integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
    onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
    y = onehot_encoded
    return y, label_encoder, onehot_encoder
y, label_encoder, onehot_encoder = prepare_labels(train_df['individual_id'])
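
What `prepare_labels` produces can be illustrated without scikit-learn. The sketch below uses made-up IDs and NumPy only; like `LabelEncoder`, `np.unique` assigns integer codes in sorted order of the class names.

```python
import numpy as np

ids = np.array(["whale_b", "whale_a", "whale_b", "whale_c"])

# Integer codes follow the sorted order of the unique IDs.
classes, integer_encoded = np.unique(ids, return_inverse=True)

# One row per sample, one column per class.
onehot = np.eye(len(classes))[integer_encoded]
```

With 15587 individuals the dense one-hot matrix gets wide, which is worth keeping in mind when memory is tight.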

### Formatting

In [5]:
def crop_resize_and_save(df, target_size, subset: ['train', 'test']):
    save_dir = f'/kaggle/working/{subset}_resized/'
    os.makedirs(save_dir, exist_ok=True)
    for filename in tqdm(df['image']):
        img = image.load_img(f"../input/happy-whale-and-dolphin/{subset}_images/"+filename)
        x = image.img_to_array(img)
        scale = max(np.array(target_size)/np.array(x.shape[:-1]))
        x = cv2.resize(x, None, fx = scale, fy = scale, interpolation=cv2.INTER_CUBIC)
        crops = (np.array(x.shape[:-1])-np.array(target_size))//2
        crops = crops.astype(int)
        x = x[crops[0]:target_size[0]+crops[0], crops[1]:target_size[1]+crops[1], :]
        image.save_img(save_dir+filename, x)
    return f'/kaggle/working/{subset}_resized/'

target_size=(300, 300) # what efficientNet expects
                        # EfficientNetB0	224
                        # EfficientNetB1	240
                        # EfficientNetB2	260
                        # EfficientNetB3	300
                        # EfficientNetB4	380
                        # EfficientNetB5	456
                        # EfficientNetB6	528
                        # EfficientNetB7	600
train_image_dir = "../input/happy-whale-and-dolphin/train_images/"
# train_image_dir = crop_resize_and_save(train_df, target_size, 'train')
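
The arithmetic inside `crop_resize_and_save` can be checked on its own: scale the image so that it covers the target, then center-crop the overflow. The sketch below is an illustration with a hypothetical helper, `center_crop_plan`; the rounding only approximates what `cv2.resize` does with `fx`/`fy`.

```python
def center_crop_plan(image_hw, target_hw):
    """Return (scale, resized_hw, crop_offsets) for a cover-and-center-crop resize."""
    scale = max(t / s for t, s in zip(target_hw, image_hw))
    resized_hw = tuple(round(s * scale) for s in image_hw)
    offsets = tuple((r - t) // 2 for r, t in zip(resized_hw, target_hw))
    return scale, resized_hw, offsets

# A 600x900 (height x width) photo cropped to 300x300.
scale, resized_hw, offsets = center_crop_plan((600, 900), (300, 300))
```

Here the image is halved to 300x450 and 75 columns are trimmed from each side, so the aspect ratio is preserved instead of stretched.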

### Data augmentation

In [6]:
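# Note: tf.keras EfficientNet models rescale their inputs internally and expect pixel
# values in [0, 255], so rescale=1./255 here may be worth revisiting.
X_gen = ImageDataGenerator(rescale=1./255, validation_split=0.1, rotation_range=10,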
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range= 0.2,
    zoom_range = 0.3,
)
flow_options=dict(
        x_col='image',
        y_col='individual_id',
        directory=train_image_dir,
        target_size=target_size,
        batch_size=32,
        class_mode='categorical',
        classes=list(label_encoder.classes_),
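        augment=True,  # not a documented flow_from_dataframe argument; it appears to be ignored (augmentation comes from X_gen)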
)
def arcface_gen(gen, y=True):
    while True:
        x_batch = next(gen)
        if isinstance(x_batch, tuple):
            x_batch, y_batch = x_batch
            y_batch_0 = y_batch
            if not y:
                y_batch_0 = np.zeros((len(x_batch), len(label_encoder.classes_)))
            yield [x_batch, y_batch_0], y_batch
        else:
            y_batch = np.zeros((len(x_batch), len(label_encoder.classes_)))
            yield [x_batch, y_batch], y_batch

# X_train = X_gen.flow_from_dataframe(train_df, **flow_options, subset='training')
# X_validation = X_gen.flow_from_dataframe(train_df, **flow_options, subset='validation')

### Data Visualization

In [7]:
def show_items(gen):
    for e, i in zip(arcface_gen(gen), range(10)):
        print(e[0][0].shape, e[0][1], e[1])
# show_items(X_train)
# show_items(X_validation)

In [8]:
#for plotting augmented images
def plotImages(train_data_gen):
    #plotting augmentations of same image
    images_arr = [train_data_gen[1][0][0] for i in range(5)]
    fig, axes = plt.subplots(len(images_arr), 1, figsize=(5,len(images_arr) * 3))
    for img, ax in zip( images_arr, axes):
          ax.imshow(img)
          ax.axis('off')
   
    plt.show()

# plotImages(X_train)

In [9]:
def preview_augmented_data(image_data_generator):
    X_preview = image_data_generator.flow_from_dataframe(
        train_df[:32],
        **flow_options,
        save_to_dir='/kaggle/working/aug_data_preview')

    shutil.rmtree(X_preview.save_to_dir, ignore_errors=True)
    os.makedirs(X_preview.save_to_dir)
        
    fig, axes = plt.subplots(4, 8, figsize=(30, 15))
    axes = axes.flatten()
    image_batch = next(X_preview)[0]
    for i, (ax, ai) in enumerate(zip(axes, image_batch)):
        ax.imshow(ai)

# preview_augmented_data(X_gen)

### Model definition

In [10]:
# src: https://github.com/4uiiurz1/keras-arcface/blob/master/metrics.py

from keras import backend as K
from keras.layers import Layer
from keras import regularizers

import tensorflow as tf


class ArcFace(Layer):
    def __init__(self, n_classes=10, s=30.0, m=0.50, regularizer=None, **kwargs):
        super(ArcFace, self).__init__(**kwargs)
        self.n_classes = n_classes
        self.s = s
        self.m = m
        self.regularizer = regularizers.get(regularizer)

    def build(self, input_shape):
        super(ArcFace, self).build(input_shape[0])
        self.W = self.add_weight(name='W',
                                shape=(input_shape[0][-1], self.n_classes),
                                initializer='glorot_uniform',
                                trainable=True,
                                regularizer=self.regularizer)

    def call(self, inputs):
        x, y = inputs
        c = K.shape(x)[-1]
        # normalize feature
        x = tf.nn.l2_normalize(x, axis=1)
        # normalize weights
        W = tf.nn.l2_normalize(self.W, axis=0)
        # dot product
        logits = x @ W
        # add margin
        # clip logits to prevent zero division when backward
        theta = tf.acos(K.clip(logits, -1.0 + K.epsilon(), 1.0 - K.epsilon()))
        target_logits = tf.cos(theta + self.m)
        # sin = tf.sqrt(1 - logits**2)
        # cos_m = tf.cos(logits)
        # sin_m = tf.sin(logits)
        # target_logits = logits * cos_m - sin * sin_m
        #
        logits = logits * (1 - y) + target_logits * y
        # feature re-scale
        logits *= self.s
        out = tf.nn.softmax(logits)

        return out

    def compute_output_shape(self, input_shape):
        return (None, self.n_classes)
    
    def get_config(self):
        config = super().get_config().copy()
        config.update({
            'n_classes': self.n_classes,
            's': self.s,
            'm': self.m,
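            # Serialize the regularizer so the config stays JSON-friendly when the model is saved.
            'regularizer': regularizers.serialize(self.regularizer),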
        })
        return config
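
The margin logic of the layer above can be reproduced in NumPy to see its effect on a single sample. This is a sketch that mirrors the `call` method under the same defaults, not a replacement for the layer; the toy input and the small scale `s=4.0` are chosen only to make the difference visible.

```python
import numpy as np

def arcface_probs(x, W, y_onehot, s=30.0, m=0.50):
    """Softmax over cosine logits, with an angular margin m added for the true class."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)   # normalize features
    W = W / np.linalg.norm(W, axis=0, keepdims=True)   # normalize class weights
    cos_theta = np.clip(x @ W, -1.0 + 1e-7, 1.0 - 1e-7)
    target = np.cos(np.arccos(cos_theta) + m)          # cos(theta + m)
    logits = s * (cos_theta * (1 - y_onehot) + target * y_onehot)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

x = np.array([[2.0, 1.0]])          # one 2-D embedding
W = np.eye(2)                       # two classes with orthogonal weight vectors
y = np.array([[1.0, 0.0]])          # true class is 0
with_margin = arcface_probs(x, W, y, s=4.0, m=0.5)
no_margin = arcface_probs(x, W, y, s=4.0, m=0.0)
```

The margin lowers the probability assigned to the true class, so the network has to pull embeddings closer to their class center to reach the same loss.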

In [11]:
#I had to run this code twice to avoid errors
def get_custom_model():
    inputs = tf.keras.layers.Input((256, 256, 3))
    outputs = tf.keras.layers.Conv2D(64, kernel_size=(3, 3))(inputs)
    outputs = tf.keras.layers.MaxPooling2D()(outputs)
    outputs = tf.keras.layers.Conv2D(32, kernel_size=(3, 3))(outputs)
    outputs = tf.keras.layers.MaxPooling2D()(outputs)
    outputs = tf.keras.layers.Flatten()(outputs)
    outputs = tf.keras.layers.Dense(80)(outputs)
    outputs = tf.keras.layers.Dense(30)(outputs)
    outputs = tf.keras.layers.Dense(len(label_encoder.classes_))(outputs)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
    model.summary()
    return model

def get_arceff_model(show_summary=False):
    n_classes = len(label_encoder.classes_)
    
    inputs = tf.keras.layers.Input(shape=target_size+(3,), name="input_image")
    labels = tf.keras.layers.Input(shape=(n_classes,), name="input_label")
    
    base_model = tf.keras.applications.EfficientNetB3(include_top=False)
#     base_model = tf.keras.applications.resnet50.ResNet50(include_top=False)
    base_model.trainable=False
    outputs = base_model(inputs)
    outputs = tf.keras.layers.BatchNormalization(name='bn')(outputs)
    outputs = tf.keras.layers.Flatten()(outputs)
    outputs = tf.keras.layers.Dense(512, kernel_initializer='he_normal', name='embeddings')(outputs)
    outputs = tf.keras.layers.Dropout(0.1)(outputs)

#     outputs = tf.keras.layers.GlobalMaxPooling2D(name="gap")(outputs)
#     outputs = tf.keras.layers.LeakyReLU(400)(outputs)
#     outputs = tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1))(outputs) # L2 normalize embeddings
    outputs = ArcFace(n_classes, name='arc')([outputs, labels])
    
    model = tf.keras.Model([inputs, labels], outputs)

    model.compile(loss='categorical_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(learning_rate=0.05),
                  metrics=['accuracy'])
    if show_summary:
        model.summary(line_length=150)
    return model

### Training

In [12]:
input_model_path = "../input/modelhdf5/model_weights.hdf5"
checkpoint_filepath = '/kaggle/working/model_weights.hdf5'
best_weights_filepath = '/kaggle/working/best_model_weights.hdf5'

# os.remove(best_weights_filepath)
if os.path.exists(best_weights_filepath):
    model = get_arceff_model(show_summary=True)
    model.load_weights(best_weights_filepath)
    print('weights loaded')
else:
    min_val_loss = 99999

    fold_current, fold_max = 0, 5
    kfold = KFold(n_splits=fold_max)
    for train, test in kfold.split(train_df):
        fold_current += 1
        print(f'\nFold {fold_current} started.')

        model = get_arceff_model(show_summary=fold_current==1)
        
        X_train = X_gen.flow_from_dataframe(train_df.iloc[train], **flow_options)#, subset='training')
        X_validation = X_gen.flow_from_dataframe(train_df.iloc[test], **flow_options)#, subset='validation')
        
        history = model.fit(arcface_gen(X_train),
            batch_size = X_train.batch_size, # haven't tried other batch sizes
            steps_per_epoch = len(train)//X_train.batch_size,

            validation_data = arcface_gen(X_validation),
            validation_batch_size = X_validation.batch_size,
            validation_steps = len(test)//X_validation.batch_size,

            epochs=6, #something weird was happening with the loss and accuracy values after 10 epochs

            verbose=1,
            callbacks=[ModelCheckpoint(checkpoint_filepath, verbose=1, save_best_only=True, save_weights_only=True)],
        )
        
        fold_min_val_loss = min(history.history['val_loss'])
        if fold_min_val_loss < min_val_loss:
            os.rename(checkpoint_filepath, best_weights_filepath)
            min_val_loss = fold_min_val_loss
            print(f'New best model: val_loss {min_val_loss}')
        print('Fold finished ︵‿︵‿︵‿︵‿︵‿︵‿︵‿︵‿︵‿︵‿')
        break
        
    print('All done.')


Fold 1 started.


2022-04-18 08:45:31.174847: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-04-18 08:45:31.276475: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compil

Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb3_notop.h5
Model: "model"
______________________________________________________________________________________________________________________________________________________
Layer (type)                                     Output Shape                     Param #           Connected to                                      
input_image (InputLayer)                         [(None, 300, 300, 3)]            0                                                                   
______________________________________________________________________________________________________________________________________________________
efficientnetb3 (Functional)                      (None, None, None, 1536)         10783535          input_image[0][0]                                 
______________________________________________________________________________________________________________________________________

2022-04-18 08:46:52.410257: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)


Epoch 1/6


2022-04-18 08:47:01.111831: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005



Epoch 00001: val_loss improved from inf to 8.19972, saving model to /kaggle/working/model_weights.hdf5
Epoch 2/6

Epoch 00002: val_loss improved from 8.19972 to 7.60069, saving model to /kaggle/working/model_weights.hdf5
Epoch 3/6

Epoch 00003: val_loss improved from 7.60069 to 7.49766, saving model to /kaggle/working/model_weights.hdf5
Epoch 4/6

Epoch 00004: val_loss improved from 7.49766 to 7.40423, saving model to /kaggle/working/model_weights.hdf5
Epoch 5/6

Epoch 00005: val_loss improved from 7.40423 to 7.34001, saving model to /kaggle/working/model_weights.hdf5
Epoch 6/6

Epoch 00006: val_loss improved from 7.34001 to 7.27659, saving model to /kaggle/working/model_weights.hdf5
New best model: val_loss 7.2765936851501465
Fold finished ︵‿︵‿︵‿︵‿︵‿︵‿︵‿︵‿︵‿︵‿
All done.


In [13]:
model.load_weights(best_weights_filepath)
prediction_model = tf.keras.Model(inputs=model.inputs[0], outputs=model.get_layer('embeddings').output)

### Embeddings encoding

In [14]:
prediction_flow_options = flow_options.copy()
prediction_flow_options.update(dict(shuffle = False, batch_size=1))
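# Use a plain (non-augmenting) generator so the reference embeddings are deterministic;
# X_gen would apply random shifts, rotations and zooms to every image.
plain_gen = ImageDataGenerator(rescale=1./255)
train_pred_gen = plain_gen.flow_from_dataframe(train_df, **prediction_flow_options)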

# validation_df = train_df[:100].copy()
# val_pred_gen = X_gen.flow_from_dataframe(validation_df, **prediction_flow_options)

Found 51033 validated image filenames belonging to 15587 classes.


In [15]:
# Computes L2-normalized embeddings with the given embedding model.
def predict_embeddings(model, gen):
    embedded_features = model.predict(gen, verbose=1)
    return embedded_features / np.linalg.norm(embedded_features, axis=1, keepdims=True)

nn = NearestNeighbors(n_neighbors=11)
nn.fit(predict_embeddings(prediction_model, train_pred_gen))



NearestNeighbors(n_neighbors=11)
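
Because `predict_embeddings` returns unit-length vectors, the Euclidean distance used by `NearestNeighbors` by default is a monotonic function of cosine similarity: ‖a − b‖² = 2 − 2·(a·b). The check below uses random stand-in embeddings, not real model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize each row

a, b = emb[0], emb[1]
squared_distance = float(np.sum((a - b) ** 2))
cosine_similarity = float(a @ b)
```

So ranking neighbours by Euclidean distance on these embeddings gives the same order as ranking by cosine similarity.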

In [16]:
# get pred from KNN
def embeddings_to_ids(embeddings):
    neighbor_too_far_new_individual_threshold = 0.00005 # the lower the threshold, the more often new_individual ranks first

    dists, inds = nn.kneighbors(X=embeddings, return_distance=True)

    # get labels of the neighbours
    pred_ids =[]
    for ind, dis in zip(inds, dists):
        neighs = pd.DataFrame({'id': train_df.iloc[ind]['individual_id'], 'dist': dis})
        new_row = pd.DataFrame({'id': ['new_individual'], 'dist': [neighbor_too_far_new_individual_threshold]})
        neighs = pd.concat([neighs, new_row], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
        neighs['dist'] = neighs.groupby('id')['dist'].transform('mean')
        ids = neighs.sort_values('dist')['id'].unique()[:5].tolist()
        pred_ids.append(ids)

    first_dist = dists[:,0].ravel()
    mean_first_dist = np.mean(first_dist)
    above_average_distances = sorted(first_dist[first_dist>mean_first_dist], reverse=True)
    print('top 15% longest distances: ', above_average_distances[:1+int(len(above_average_distances)/7)])
    print('our threshold: ', neighbor_too_far_new_individual_threshold)
    return pred_ids

# val_embeddings = predict_embeddings(prediction_model, val_pred_gen)
# pred_ids = embeddings_to_ids(val_embeddings)
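
The ranking rule in `embeddings_to_ids` can be traced on a single query without pandas. The sketch below follows the same steps (mean distance per ID, `new_individual` inserted at the threshold, closest five kept); the IDs and distances are made up and `rank_ids` is an illustrative name.

```python
def rank_ids(neighbor_ids, neighbor_dists, threshold, top_k=5):
    """Rank candidate IDs by mean neighbour distance, with new_individual at the threshold."""
    dists_by_id = {}
    for nid, dist in zip(neighbor_ids, neighbor_dists):
        dists_by_id.setdefault(nid, []).append(dist)
    dists_by_id.setdefault("new_individual", []).append(threshold)
    mean_dist = {nid: sum(d) / len(d) for nid, d in dists_by_id.items()}
    return sorted(mean_dist, key=mean_dist.get)[:top_k]

ranked = rank_ids(["w1", "w2", "w1", "w3"], [0.1, 0.2, 0.5, 0.6], threshold=0.25)
```

Note that `w1` owns the single closest neighbour but ranks third, because its second, distant match raises its mean above the threshold.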

### Visualizing validation set's score and embeddings

In [17]:
def map_per_image(label, predictions):   
    try:
        return 1 / (predictions[:5].index(label) + 1)
    except ValueError:
        return 0.0

def map_per_set(labels, predictions):
    return [map_per_image(l, p) for l,p in zip(labels, predictions)]

# validation_df['pred_ids'] = [' '.join(label_list) for label_list in pred_ids]
# ap5 = map_per_set(validation_df['individual_id'].tolist(), pred_ids)

pd.set_option('display.max_colwidth', 500)
pd.set_option('display.width', 1000)
# print(validation_df)
# print(ap5, np.mean(ap5))
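
A small worked example of the MAP@5 scoring above, with made-up labels. `average_precision_at_5` restates `map_per_image` so that the snippet runs on its own.

```python
def average_precision_at_5(label, predictions):
    """Score 1/rank if the label is among the first five predictions, else 0."""
    top5 = predictions[:5]
    return 1.0 / (top5.index(label) + 1) if label in top5 else 0.0

labels = ["a", "b", "z"]
predictions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "c"]]
scores = [average_precision_at_5(l, p) for l, p in zip(labels, predictions)]
mean_score = sum(scores) / len(scores)
```

A correct first guess scores 1, a correct second guess scores 0.5, and a miss scores 0, so the order of the five predictions matters as much as their content.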

In [18]:
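def scatter3D(embeddings, pred_labels, total_labels):
    tsne = TSNE(3, verbose=1)
    tsne_proj = tsne.fit_transform(embeddings)
    cmap = plt.get_cmap('rainbow')
    fig = plt.figure(figsize=(8,8))
    ax = fig.add_subplot(projection='3d')
#     print(np.unique(pred_labels, return_counts=True))
    # Look up each predicted label's position once instead of on every loop iteration.
    label_to_index = {lab: i for i, lab in enumerate(total_labels)}
    label_indices = np.array([label_to_index[p] for p in pred_labels])
    for i, lab in enumerate(total_labels):
        indices = label_indices == i
        if not indices.any():
            continue
        ax.scatter(tsne_proj[indices, 0],
                   tsne_proj[indices, 1],
                   tsne_proj[indices, 2],
                   # Scale i into [0, 1]; an integer argument would index the colormap's lookup table directly.
                   c=np.array(cmap(i / max(len(total_labels) - 1, 1))).reshape(1, 4),
                   label=lab,
                   alpha=0.5)
    plt.show()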

# scatter3D(val_embeddings, np.array(pred_ids)[:, 0], train_df['individual_id'].unique().tolist()+['new_individual'])

# Prediction
### File names retrieval from test_images folder

In [19]:
test = os.listdir("../input/happy-whale-and-dolphin/test_images")
test_df = pd.DataFrame(test, columns=['image'])#[:DATA_SUBSET]
test_df['predictions'] = ''

In [20]:
test_df.head()

| | image | predictions |
|---|---|---|
| 0 | cd50701ae53ed8.jpg | |
| 1 | 177269f927ed34.jpg | |
| 2 | 9137934396d804.jpg | |
| 3 | c28365a55a0dfe.jpg | |
| 4 | 1a40b7b382923a.jpg | |


In [21]:
# The same cropping process as the train dataset. Test images need
# to undergo the same process for the model to predict on them.
test_image_dir = "../input/happy-whale-and-dolphin/test_images/"
# test_image_dir = crop_resize_and_save(test_df, target_size, 'test') 

In [22]:
# flow_from_directory expects the parent folder plus the name of the "class" subfolder.
target_dir = os.path.basename(test_image_dir.rstrip('/'))
parent_dir = os.path.dirname(test_image_dir.rstrip('/'))

Y_gen = ImageDataGenerator(rescale=1./255)
Y_flow = Y_gen.flow_from_directory(
        directory=parent_dir,
        target_size=target_size,
        batch_size=1,
        shuffle=False,
        class_mode=None,
        classes=[target_dir])

Found 27956 images belonging to 1 classes.


### Embeddings encoding and label inference

In [23]:
%%time

y_embeddings = predict_embeddings(prediction_model, Y_flow)
y_pred_ids = embeddings_to_ids(y_embeddings)

top 15% longest distances:  [0.00016727880920975574, 0.00014104246452978405, 0.00013744163969038928, 0.00012032016375067623, 5.8823287403513595e-05, 5.1762769696894975e-05, 5.1644458652846275e-05, 5.120729605187965e-05, 5.045594672119291e-05, 4.971121021326796e-05, 4.8812141321342235e-05, 4.856537518595193e-05, 4.695087452278052e-05, 4.6499071430694194e-05, 4.527443704065973e-05, 4.509320713634908e-05, 4.399112434406288e-05, 4.373029136149653e-05, 4.3621644317435724e-05, 4.254937846295963e-05, 4.239262904350296e-05, 4.19356577796037e-05, 4.114570781977967e-05, 4.113814903525142e-05, 4.0984138545299256e-05, 4.086080351546733e-05, 4.051872654581746e-05, 3.982780472392032e-05, 3.933725911187893e-05, 3.932195665461842e-05, 3.811026215004178e-05, 3.784974062662396e-05, 3.7503751120489865e-05, 3.7462136187138865e-05, 3.745577679883919e-05, 3.738718057754946e-05, 3.6937387776450546e-05, 3.6836272149830494e-05, 3.674660388471653e-05, 3.669628220100308e-05, 3.645717518460055e-05, 3.640262806064

### Visualizing testing set's embeddings

In [24]:
%%time
# scatter3D(y_embeddings, np.array(y_pred_ids)[:,0], train_df['individual_id'].unique().tolist()+['new_individual'])

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 6.44 µs


### Submission preparation

In [25]:
%%time
str_y_pred_ids = [' '.join(label_list) for label_list in y_pred_ids]

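# Y_flow lists files in sorted order, which can differ from the os.listdir order of test_df,
# so match predictions to images by file name rather than by position.
pred_by_image = dict(zip((os.path.basename(f) for f in Y_flow.filenames), str_y_pred_ids))
test_df['predictions'] = test_df['image'].map(pred_by_image)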
pd.set_option('display.max_colwidth', 500)
test_df.head(10)

CPU times: user 23.5 ms, sys: 1.01 ms, total: 24.5 ms
Wall time: 24.1 ms


| | image | predictions |
|---|---|---|
| 0 | cd50701ae53ed8.jpg | 0f72263cd384 5eb72a46aa6c 938b7e931166 b90d49ab0905 bc48b7c97463 |
| 1 | 177269f927ed34.jpg | 9070d4778a52 f4a45e49df72 b3ddd5b1f9bc 6f021f47ab3a 4c47fe2b6931 |
| 2 | 9137934396d804.jpg | 1e802c3294cc 51081e431bca d9da9aa05a90 aeae6f5bf5cd a52e4ad2dbb3 |
| 3 | c28365a55a0dfe.jpg | 2a76791b975d 4d18b20adff3 b7065da154c5 b54c1f8df53f ebb6f9c885f9 |
| 4 | 1a40b7b382923a.jpg | 091b540fb82e e5c50e5f2e52 29d62ab98b53 cb3ab35e8dfa fcc7ade0c50a |
| 5 | 0eb65d9495a8ad.jpg | ddca2dbdf9c0 2dd8974deb39 7a58308da755 ede672637a68 7a25857bbcfc |
| 6 | 3cf81d69cc5911.jpg | 9e89f8e28807 d36d5a07500f bc1eb2241633 51f29ea40aeb f36c618eda4a |
| 7 | bade5ab0f99289.jpg | d46efdc12fc2 3cdec3c60be9 29cb5ccedb51 a91388b5d86f 5fc809d9e819 |
| 8 | 036dc852e2ec94.jpg | 339e32b3b3d1 be944362657a c02b389616b5 6a767e4dc395 50c307188621 |
| 9 | 1c5bced4d28e48.jpg | 44bd3fec6ad6 da65f819073c 6818edefecc9 7845337998d6 8e5253662392 |


In [26]:
test_df.to_csv('submission.csv',index=False)
test_df.describe()

| | image | predictions |
|---|---|---|
| count | 27956 | 27956 |
| unique | 27956 | 27947 |
| top | cd50701ae53ed8.jpg | new_individual 5f80a6446397 9103a95dee61 be0d0fc07a33 4b00fe572063 |
| freq | 1 | 2 |
