## Sample solution part of [iWildCam 2021 - Starter Notebook](https://www.kaggle.com/nayuts/iwildcam-2021-starter-notebook).

I haven't beaten kaggle_sample_all_zero_iwildcam_2021.csv yet, but I will publish the idea.

1. First we crop the image based on the bbox detected by MegaDetector.
2. In the training data, the correct answer labels are given as annotations, so we can use them to train the model.
3. Classify the cropped images of the test data with the trained model.
4. We choose the animal species and their counts of the image with the highest count among the images in the same image burst.

Cropping is time consuming, so I did it on [a different notebook](https://www.kaggle.com/nayuts/256-x-256-cropped-images). This notebook is also available to the public.

<img src="https://raw.githubusercontent.com/tasotasoso/kaggle_media/main/iwildcam2021/model_image.png" width="***300***">

In [1]:
import sys
sys.path.append('../input/pytorch-image-models/pytorch-image-models-master')

import collections
import gc
import json
import os
import random
import time
import warnings
warnings.simplefilter("ignore")

from albumentations import *
from albumentations.pytorch import ToTensor
import cv2
from imblearn.under_sampling import RandomUnderSampler
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image, ImageFilter
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split
import tifffile as tiff
import timm
import torch
import torch.backends.cudnn as cudnn
import torch.nn as nn
from torch.nn import functional as F
import torch.optim as optim
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torch.utils.data import DataLoader, Dataset, sampler
from tqdm import tqdm_notebook as tqdm

%matplotlib inline

# Settings
- Set batch size, device, spochs, seed (for reproducability)
- store filepaths in variables

In [None]:
!ls ./data/256-x-256-cropped-images

In [68]:
DATASET = "./data/"
CROPPED_DATA = "./data/"

TRAIN_CROPPED_DATA = "./data/crop_train/"
TEST_CROPPED_DATA = "./data/crop_test/"

In [46]:
BATCH_SIZE = 32
DEVICE = ('cuda' if torch.cuda.is_available() else 'cpu')
EPOCHS = 300
NUM_WORKERS = 4
SEED = 2021

- Check if device is GPU (cuda)

In [47]:
DEVICE

'cuda'

- Set seed for reproducability

In [48]:
def set_seed(seed=2**3):
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    torch.backends.cudnn.deterministic = True
set_seed(SEED)

- Load information about cropped train, test data into pandas dataframe

In [49]:
df_cropped_img_ids_train = pd.read_csv(CROPPED_DATA + "cropped_train.csv")
df_cropped_img_ids_test = pd.read_csv(CROPPED_DATA + "cropped_test.csv")

Train data contains
- ID: id of photo
- idx: increase with more detections in one photo
    - Below example shows that 
    1. 1 detection (374)
    2. 1 detection (374)
    3. 3 detections (97, 97, 97)
    4. 1 detection (90)
    5. 1 detection (375)
    6. 8 detections (2, 2, 2, ... 2)
    7. 1 detection (3)
    8. 1 detection (317)
    9. 2 detections (96, 96)
    10. 

In [67]:
df_cropped_img_ids_train.head()

Unnamed: 0,id,idx,category_id
0,905a3c8c-21bc-11ea-a13a-137349068a90,1,374
1,905a3c8c-21bc-11ea-a13a-137349068a90,1,374
2,905a4416-21bc-11ea-a13a-137349068a90,1,97
3,905a4416-21bc-11ea-a13a-137349068a90,2,97
4,905a4416-21bc-11ea-a13a-137349068a90,3,97


In [51]:
df_cropped_img_ids_test.head()

Unnamed: 0,id,idx
0,915879a0-21bc-11ea-a13a-137349068a90,1
1,91588116-21bc-11ea-a13a-137349068a90,1
2,9158a2f4-21bc-11ea-a13a-137349068a90,1
3,9158aaa6-21bc-11ea-a13a-137349068a90,1
4,9158f1a0-21bc-11ea-a13a-137349068a90,1


# Create train dataframe

In [52]:
with open('./data/metadata/iwildcam2021_train_annotations.json', encoding='utf-8') as json_file:
    train_annotations = json.load(json_file)
df_train_annotation = pd.DataFrame(train_annotations["annotations"])

In [53]:
train = df_cropped_img_ids_train[["id", "idx"]].merge(df_train_annotation[["image_id", "category_id"]], 
                                      left_on='id', right_on='image_id')[["id", "idx", "category_id"]]

In [54]:
df_categories = pd.DataFrame(train_annotations["categories"])

In [55]:
cat_idxs = df_categories["id"]

def convert_cat_to_index(x):
    return np.where(cat_idxs==x)[0][0]

In [56]:
train["category_id"] = train["category_id"].map(lambda x: convert_cat_to_index(x))

In [57]:
train.head()

Unnamed: 0,id,idx,category_id
0,905a3c8c-21bc-11ea-a13a-137349068a90,1,164
1,905a3c8c-21bc-11ea-a13a-137349068a90,1,164
2,905a4416-21bc-11ea-a13a-137349068a90,1,39
3,905a4416-21bc-11ea-a13a-137349068a90,2,39
4,905a4416-21bc-11ea-a13a-137349068a90,3,39


# unzip cropped data

In [58]:
#! unzip ../input/256-x-256-cropped-images/croped_images_train.zip 

In [59]:
#! unzip ../input/256-x-256-cropped-images/croped_images_test.zip

# Train

## Create dataset for training

In [60]:
# ====================================================
# Dataset for train
# ====================================================

mean = np.array([0.37087523, 0.370876, 0.3708759] )
std = np.array([0.21022698, 0.21022713, 0.21022706])

def img2tensor(img,dtype:np.dtype=np.float32):
    if img.ndim==2 : img = np.expand_dims(img,2)
    img = np.transpose(img,(2,0,1))
    return torch.from_numpy(img.astype(dtype, copy=False))

class IWildcamTrainDataset(Dataset):
    def __init__(self, df, tfms=None):
        self.ids = df["id"]
        self.idxs = df["idx"]
        self.categories = df["category_id"]
        self.tfms = tfms
        
    def __len__(self):
        return len(self.ids)
    
    def __getitem__(self, idx):
        size = (256, 256)
        image_id = self.ids[idx]
        image_idx = self.idxs[idx]
        iamge_categorie = self.categories[idx]
        
        image_path = TRAIN_CROPPED_DATA + f"{image_id}_{image_idx}.jpg"
        img = cv2.resize(cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB),size)

        if self.tfms is not None:
            augmented = self.tfms(image=img)
            img = augmented['image']
            
        # we should normalize here
        return img2tensor((img/255.0  - mean)/std), torch.tensor(iamge_categorie)

In [61]:
def get_aug(p=1.0):
    return Compose([
        HorizontalFlip(),
        ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.2, rotate_limit=15, p=0.9, 
                         border_mode=cv2.BORDER_REFLECT),
        RandomBrightnessContrast(p=0.9),
    ], p=p)

# Create model

In [62]:
# ====================================================
# EfficientNet Model
# ====================================================

class enet_v2(nn.Module):

    def __init__(self, backbone, out_dim, pretrained=False):
        super(enet_v2, self).__init__()
        self.enet = timm.create_model(backbone, pretrained=pretrained)
        in_ch = self.enet.classifier.in_features
        self.myfc = nn.Linear(in_ch, out_dim)
        self.enet.classifier = nn.Identity()

    def forward(self, x):
        x = self.enet(x)
        x = self.myfc(x)
        return x

In [63]:
model = enet_v2(backbone="tf_efficientnet_b0", out_dim=205)
model.to(DEVICE)

enet_v2(
  (enet): EfficientNet(
    (conv_stem): Conv2dSame(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
    (bn1): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    (act1): SiLU(inplace=True)
    (blocks): Sequential(
      (0): Sequential(
        (0): DepthwiseSeparableConv(
          (conv_dw): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
          (bn1): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (act1): SiLU(inplace=True)
            (conv_expand): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
          )
          (conv_pw): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn2): BatchNorm2d(16, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): I

## train setting

In [64]:
# ====================================================
# Optimizer and Loss
# ====================================================

optimizer = torch.optim.Adam([{'params': model.parameters(), 'lr': 1e-4}])
criterion = nn.CrossEntropyLoss()

## Train

Since we know that [the training data is imbalanced](https://www.kaggle.com/nayuts/iwildcam-2021-overviewing-for-start#EDA), I undersampled it.

In [65]:
rus = RandomUnderSampler(random_state=SEED, replacement=True)

def generate_dataloders(train):
    
    train_resampled, _ = rus.fit_resample(train, train["category_id"])
    test_resampled, _ = rus.fit_resample(train, train["category_id"])

    train_resampled = train_resampled.reset_index(drop=True)
    test_resampled = test_resampled.reset_index(drop=True)
    
    ds_train = IWildcamTrainDataset(train_resampled, tfms=get_aug())
    dl_train = DataLoader(ds_train,batch_size=BATCH_SIZE, shuffle=False, num_workers=NUM_WORKERS)
    ds_test = IWildcamTrainDataset(test_resampled)
    dl_test = DataLoader(ds_test,batch_size=BATCH_SIZE, shuffle=False, num_workers=NUM_WORKERS)
    
    return dl_train, dl_test

In [69]:
# ====================================================
# Train
# ====================================================

for epoch in tqdm(range(EPOCHS)):
    
    dl_train, dl_test = generate_dataloders(train)
    
    ###Train
    model.train()
    train_loss = 0
    
    for data in dl_train:
        optimizer.zero_grad()
        imgs, categories = data
        imgs = imgs.to(DEVICE)
        categories = categories.to(DEVICE)
        
        outputs = model(imgs)
    
        loss = criterion(outputs, categories)
        loss.backward()
        optimizer.step()
            
        train_loss += loss.item()
    train_loss /= len(dl_train)
        
    print(f"EPOCH: {epoch + 1}, train_loss: {train_loss}")
        
    ###Validation
    model.eval()
    valid_loss = 0
        
    for data in dl_test:
        imgs, categories = data
        imgs = imgs.to(DEVICE)
        categories = categories.to(DEVICE)
        
        outputs = model(imgs)
    
        loss = criterion(outputs, categories)
        
        valid_loss += loss.item()
    valid_loss /= len(dl_test)
        
    print(f"EPOCH: {epoch + 1}, valid_loss: {valid_loss}")
        
    
    if (epoch+1)%50 == 0 or (epoch+1)%EPOCHS == 0:
        ###Save model
        torch.save(model.state_dict(), f"{epoch+1}_.pth")

HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=300.0), HTML(value='')))

6184476
EPOCH: 67, valid_loss: 3.27899169921875
EPOCH: 68, train_loss: 2.671871915459633
EPOCH: 68, valid_loss: 3.2799102578844344
EPOCH: 69, train_loss: 2.7005896823746816
EPOCH: 69, valid_loss: 3.2230419431413924
EPOCH: 70, train_loss: 2.6807531203542436
EPOCH: 70, valid_loss: 3.1775190830230713
EPOCH: 71, train_loss: 2.5575363082545146
EPOCH: 71, valid_loss: 3.1833245754241943
EPOCH: 72, train_loss: 2.6587932450430736
EPOCH: 72, valid_loss: 3.146918841770717
EPOCH: 73, train_loss: 2.5649352031094685
EPOCH: 73, valid_loss: 3.105034589767456
EPOCH: 74, train_loss: 2.510074883699417
EPOCH: 74, valid_loss: 3.161461114883423
EPOCH: 75, train_loss: 2.518278343336923
EPOCH: 75, valid_loss: 3.110767228262765
EPOCH: 76, train_loss: 2.4387832496847426
EPOCH: 76, valid_loss: 3.0280093465532576
EPOCH: 77, train_loss: 2.4484084908451353
EPOCH: 77, valid_loss: 2.9954092502593994
EPOCH: 78, train_loss: 2.3887475931218694
EPOCH: 78, valid_loss: 2.9972080162593295
EPOCH: 79, train_loss: 2.4039146602

# Inference

## Create dataset for test

In [71]:
# ====================================================
# Dataset for test
# ====================================================

mean = np.array([0.37087523, 0.370876, 0.3708759] )
std = np.array([0.21022698, 0.21022713, 0.21022706])

class IWildcamTestDataset(Dataset):
    def __init__(self, df, tfms=None):
        self.ids = df["id"]
        self.idx = df["idx"]
        self.tfms = tfms
        
    def __len__(self):
        return len(self.ids)
    
    def __getitem__(self, idx):
        size = (256, 256)
        image_id = self.ids[idx]
        image_idx = self.idx[idx]
        
        image_path = TEST_CROPPED_DATA + f"{image_id}_{image_idx}.jpg"
        
        img = cv2.resize(cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB),size)

        if self.tfms is not None:
            augmented = self.tfms(image=img)
            img = augmented['image']
            
        # we should normalize here
        return img2tensor((img/255.0 - mean)/std), image_id

In [73]:
ds_test = IWildcamTestDataset(df_cropped_img_ids_test)
dl_test = DataLoader(ds_test,batch_size=32,shuffle=False,num_workers=NUM_WORKERS)

## Load trained model

In [74]:
model = enet_v2(backbone="tf_efficientnet_b0", out_dim=205)
model.to(DEVICE)
model.load_state_dict(torch.load(f"{epoch+1}_.pth"))
model.eval()

enet_v2(
  (enet): EfficientNet(
    (conv_stem): Conv2dSame(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
    (bn1): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    (act1): SiLU(inplace=True)
    (blocks): Sequential(
      (0): Sequential(
        (0): DepthwiseSeparableConv(
          (conv_dw): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
          (bn1): BatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (act1): SiLU(inplace=True)
            (conv_expand): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
          )
          (conv_pw): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn2): BatchNorm2d(16, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): I

In [75]:
pred_categories = []
pred_img_ids = []

## inference

In [76]:
with torch.no_grad():
    for imgs, img_ids in tqdm(dl_test):
        imgs = imgs.to(DEVICE)
        
        outputs = model(imgs)
        output_labels = torch.argmax(outputs, dim=1).tolist()
        pred_categories += output_labels
        pred_img_ids += img_ids

HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=1008.0), HTML(value='')))




In [77]:
pred = collections.defaultdict(list)
for category, img_id in zip(pred_categories, pred_img_ids):
    pred[img_id].append(category)

In [80]:
pred

a-a13a-137349068a90': [170],
             '91d55632-21bc-11ea-a13a-137349068a90': [113, 189],
             '91d594ee-21bc-11ea-a13a-137349068a90': [69, 53],
             '91d5aefc-21bc-11ea-a13a-137349068a90': [128],
             '91d60794-21bc-11ea-a13a-137349068a90': [69],
             '91d669dc-21bc-11ea-a13a-137349068a90': [48],
             '91d69d62-21bc-11ea-a13a-137349068a90': [26],
             '91d6cd00-21bc-11ea-a13a-137349068a90': [157],
             '91d71e40-21bc-11ea-a13a-137349068a90': [150],
             '91d74438-21bc-11ea-a13a-137349068a90': [198],
             '91d77750-21bc-11ea-a13a-137349068a90': [32],
             '91d7d0e2-21bc-11ea-a13a-137349068a90': [110],
             '91d827cc-21bc-11ea-a13a-137349068a90': [76],
             '91d85260-21bc-11ea-a13a-137349068a90': [46],
             '91d872ea-21bc-11ea-a13a-137349068a90': [198],
             '91d8768c-21bc-11ea-a13a-137349068a90': [80, 30],
             '91d893ba-21bc-11ea-a13a-137349068a90': [188],
      

# Create submit file

In [81]:
sub = pd.read_csv("./data/sample_submission.csv")
col_Predicted = [col for col in sub.columns if "Predicted" in col]

In [82]:
with open('./data/metadata/iwildcam2021_train_annotations.json', encoding='utf-8') as json_file:
    train_annotations =json.load(json_file)
df_categories = pd.DataFrame.from_records(train_annotations["categories"])

For each image, count the number of each animal species and store them in the corresponding column.

In [83]:
results = []

for key in pred.keys():
    c = collections.Counter(pred[key])
    
    res = []
    cnts = [ 0 for i in range(205)]
    for category, cnt in c.items():
        cnts[category] = cnt
    res += [key] + cnts[1:]
    results.append(res)

Convert to pandas dataframe.

In [84]:
sub_tmp = pd.DataFrame(results, columns=sub.columns)

In [85]:
sub_tmp.head()

Unnamed: 0,Id,Predicted2,Predicted3,Predicted4,Predicted6,Predicted7,Predicted8,Predicted9,Predicted10,Predicted12,...,Predicted559,Predicted562,Predicted563,Predicted564,Predicted565,Predicted566,Predicted567,Predicted568,Predicted570,Predicted571
0,915879a0-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,91588116-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,9158a2f4-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
3,9158aaa6-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,9158f1a0-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [86]:
sub_tmp.to_csv("./sub_tmp.csv", index=False)

Add seq_id information to the counted results. iwildcam2021_test_information.json contains the mapping between the id of the image and the id of the sequence.

In [88]:
with open('./data/metadata/iwildcam2021_test_information.json', encoding='utf-8') as json_file:
    test_information =json.load(json_file)
    
df_test_info = pd.DataFrame(test_information["images"])[["id", "seq_id"]]
df_test_info.head()

Unnamed: 0,id,seq_id
0,8b31d3be-21bc-11ea-a13a-137349068a90,a91ebc18-0cd3-11eb-bed1-0242ac1c0002
1,8cf202be-21bc-11ea-a13a-137349068a90,a91ebc18-0cd3-11eb-bed1-0242ac1c0002
2,8a87e62e-21bc-11ea-a13a-137349068a90,a91ebc18-0cd3-11eb-bed1-0242ac1c0002
3,8e6994f4-21bc-11ea-a13a-137349068a90,a91ebc18-0cd3-11eb-bed1-0242ac1c0002
4,948b29e2-21bc-11ea-a13a-137349068a90,a91ebc18-0cd3-11eb-bed1-0242ac1c0002


Take right join on the image id.

In [89]:
sub_tmp = sub_tmp.merge(df_test_info, left_on="Id", right_on="id", how="right")

In [90]:
sub_tmp.head()

Unnamed: 0,Id,Predicted2,Predicted3,Predicted4,Predicted6,Predicted7,Predicted8,Predicted9,Predicted10,Predicted12,...,Predicted563,Predicted564,Predicted565,Predicted566,Predicted567,Predicted568,Predicted570,Predicted571,id,seq_id
0,8b31d3be-21bc-11ea-a13a-137349068a90,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8b31d3be-21bc-11ea-a13a-137349068a90,a91ebc18-0cd3-11eb-bed1-0242ac1c0002
1,8cf202be-21bc-11ea-a13a-137349068a90,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8cf202be-21bc-11ea-a13a-137349068a90,a91ebc18-0cd3-11eb-bed1-0242ac1c0002
2,8a87e62e-21bc-11ea-a13a-137349068a90,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8a87e62e-21bc-11ea-a13a-137349068a90,a91ebc18-0cd3-11eb-bed1-0242ac1c0002
3,8e6994f4-21bc-11ea-a13a-137349068a90,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8e6994f4-21bc-11ea-a13a-137349068a90,a91ebc18-0cd3-11eb-bed1-0242ac1c0002
4,948b29e2-21bc-11ea-a13a-137349068a90,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,948b29e2-21bc-11ea-a13a-137349068a90,a91ebc18-0cd3-11eb-bed1-0242ac1c0002


Since there are multiple lines for the same sequence ID. We should aggregate them to single line. In this case, we will choose the image with the highest number of animals shown and submit the animal species and the number of animals shown in that image.

In [91]:
sum_counts = []
for i in range(len(sub_tmp)):
    sum_counts.append(sum(sub_tmp.iloc[i][col_Predicted]))

In [92]:
sub_tmp["total"] =  sum_counts
sub_tmp = sub_tmp.sort_values('total', ascending=False)
sub_tmp = sub_tmp[~sub_tmp.duplicated(keep='first', subset='seq_id')].fillna("0")

In [93]:
sub_tmp

Unnamed: 0,Id,Predicted2,Predicted3,Predicted4,Predicted6,Predicted7,Predicted8,Predicted9,Predicted10,Predicted12,...,Predicted564,Predicted565,Predicted566,Predicted567,Predicted568,Predicted570,Predicted571,id,seq_id,total
56856,973b59c8-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,2,0,0,...,0,0,0,0,0,0,0,973b59c8-21bc-11ea-a13a-137349068a90,a9173a42-0cd3-11eb-bed1-0242ac1c0002,10
42319,91fb2bc8-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,91fb2bc8-21bc-11ea-a13a-137349068a90,a92467da-0cd3-11eb-bed1-0242ac1c0002,10
57082,98372ea6-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,98372ea6-21bc-11ea-a13a-137349068a90,a91738a8-0cd3-11eb-bed1-0242ac1c0002,10
15176,8bde6a20-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,8bde6a20-21bc-11ea-a13a-137349068a90,a9173ae2-0cd3-11eb-bed1-0242ac1c0002,10
59874,9903a97c-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,9903a97c-21bc-11ea-a13a-137349068a90,a917398e-0cd3-11eb-bed1-0242ac1c0002,9
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60115,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,8c3b0424-21bc-11ea-a13a-137349068a90,9500e290-21bc-11ea-a13a-137349068a90,0
60148,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,8d2b546a-21bc-11ea-a13a-137349068a90,386634a2-6fe2-11eb-844f-0242ac1c0002,0
60158,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,902c630c-21bc-11ea-a13a-137349068a90,982a8f20-21bc-11ea-a13a-137349068a90,0
60167,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,867fbf02-21bc-11ea-a13a-137349068a90,92c82344-21bc-11ea-a13a-137349068a90,0


I'll match the result to the sample submission format. I was told that the order of the rows is not related to the score, but we will match it just in case.

In [94]:
# Since it was difficult to join the pandas series, I intentionally created an extra column.
sub = sub.reset_index()
sub = sub[["index", "Id"]].merge(sub_tmp, left_on="Id", right_on="seq_id")

In [95]:
sub = sub[["Id_x"] + col_Predicted].rename(columns={"Id_x": "Id"})
sub.to_csv("sub.csv", index=False)

In [96]:
sub.head()

Unnamed: 0,Id,Predicted2,Predicted3,Predicted4,Predicted6,Predicted7,Predicted8,Predicted9,Predicted10,Predicted12,...,Predicted559,Predicted562,Predicted563,Predicted564,Predicted565,Predicted566,Predicted567,Predicted568,Predicted570,Predicted571
0,32ce8026-7ec9-11eb-b675-4f3cc0c82eb3,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,945c6602-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,a91c7e26-0cd3-11eb-bed1-0242ac1c0002,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,9926239e-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,9672184c-21bc-11ea-a13a-137349068a90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
#If we don't delete them, csv files are buried and cannot be retrieved.
!rm -r croped_images_train
!rm -r croped_images_test