# Re-identification
In the second task of your assignment, you will build a person re-identification model that retrieves from the test directory all the images depicting the same person identity as a given query image. More formally, for each image in the queries directory, produce an ordered list of images from the test directory that you believe correspond to the same person identity depicted in the query image. Each query image has at least one corresponding image in the test directory, but the number of corresponding images varies from query to query. The test directory also contains a set of distractor and junk images that should never be returned as the result of a query.

Start by creating your own set of queries from the validation set that you previously used for the classification task and train your person re-identification model. Use the set of queries extracted from the validation set to evaluate the performance of your model. Mean average precision (mAP) is a commonly used metric to evaluate performance in person re-identification and should be included in your analysis. A reference Python implementation of mAP is available here.

Once you are satisfied with the performance of your model, you are required to produce predictions for each query image in the queries folder. For each query image with filename $q_i$, produce a list of images in the test folder with filenames $t_{i,1}, t_{i,2}, \ldots, t_{i,m_i}$ that you believe depict the same person as $q_i$. Note that the number of images $m_i$ associated with $q_i$ is variable. Produce a file containing your predictions named reid_test.txt with 2,248 rows, one for each query image. The $i$-th row has the format $q_i$: $t_{i,1}, t_{i,2}, \ldots, t_{i,m_i}$. For example, the row you would write for the results corresponding to the query image 000616.jpg may be "000616.jpg: 001249.jpg, 000316.jpg, 000924.jpg, 009847.jpg".
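
As a minimal sketch of the row format described above, the helper below builds one line of the predictions file and parses it back. The function names `format_prediction_row` and `parse_prediction_row` are hypothetical and are not part of the assignment or of the notebook code.

```python
def format_prediction_row(query_filename, ranked_filenames):
    """Build one row of the predictions file: 'query: t1, t2, ...'."""
    return "{}: {}".format(query_filename, ", ".join(ranked_filenames))


def parse_prediction_row(row):
    """Inverse of format_prediction_row, useful to sanity-check a written file."""
    query_filename, _, ranked = row.partition(": ")
    return query_filename, [name for name in ranked.split(", ") if name]


# Example row taken from the task description
row = format_prediction_row(
    "000616.jpg",
    ["001249.jpg", "000316.jpg", "000924.jpg", "009847.jpg"],
)
print(row)
```

Parsing a written file back with the second helper is a cheap way to check that every query has exactly one row before submitting.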



## Settings

In [2]:
!pip install -Uqq fastbook
## Import libraries in python
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
import argparse
import time
import json
import sys
import io
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import random
import torch
import glob
import torch.nn as nn
from torchvision.transforms import transforms
from torchvision.utils import make_grid
from torch.utils.tensorboard import SummaryWriter
from torch.utils.data import DataLoader, Dataset
from torch.optim import Adam
from torch.autograd import Variable
from PIL import Image
import torchvision.transforms as tt
from tqdm.notebook import tqdm
import sklearn
import torchvision
import pathlib 
from fastai.vision.all import *
from sklearn import preprocessing
from skimage import io, transform
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.metrics.pairwise import cosine_similarity
# Ignore warnings
import warnings
warnings.filterwarnings("ignore")
import zipfile
from google.colab import drive
import fastbook

fastbook.setup_book()
drive.mount('/content/drive')

Mounted at /content/gdrive
Mounted at /content/drive


In [3]:
%cp ./drive/MyDrive/DL/dataset.zip ./

In [4]:
with zipfile.ZipFile('dataset.zip', 'r') as zip_ref:
    zip_ref.extractall('./')

In [5]:
current_dir = '.'
## Dataset path
train_path = current_dir +'/train'
test_path = current_dir +'/test'
queries_path = current_dir +'/queries'
annotation_path = current_dir +'/annotations_train.csv'
## temporary model path for NN
model_path = './drive/MyDrive/DL/reid_net.pth'
# Checking for device
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

cuda:0


# Defining classes

## Dataset for Market 1501

In [6]:
class ReidDatasetTrainVal(Dataset):
  '''
  Market 1501 dataset for re-id train and validation.
  The class is implemented with the Anchor, Positive, Negative approach.
  Anchor is the main object
  Positive is a photo with the same id (randomly chosen)
  Negative is a photo with different id (randomly chosen)
  '''

  def __init__(self, csv_file, root_dir, transform=None):
      '''
      Args:
          csv_file (string): Path to the csv file with annotations(csv path).
          root_dir (string): Directory with all the images (train_path).
          transform (callable, optional): Optional transform to be applied
              on a sample.
      '''
      self.re_id_df = pd.read_csv(csv_file)
      self.root_dir = root_dir
      self.transform = transform
      print('root_dir: {}'.format(self.root_dir))
      print('csv_file: {}'.format(csv_file))

  def __len__(self):
      return len(self.re_id_df)
  
  def __getitem__(self, idx):
    '''
    filename: name of the image file
    path: path of the image file
    id: image id
    idx: index in dataframe
    idxs: indexes in dataframe
    '''
    if torch.is_tensor(idx):
        idx = idx.tolist()
    # Anchor
    a_filename = str(self.re_id_df.iloc[idx, 1]) 
    a_path = os.path.join(self.root_dir,a_filename)
    a_id = self.re_id_df.iloc[idx, 0]
    a_image = io.imread(a_path)
    # Positive
    p_idxs = self.re_id_df[self.re_id_df['id'] == a_id].index
    p_idx = random.choice(p_idxs)
    p_filename = self.re_id_df.iloc[p_idx, 1]
    p_path = os.path.join(self.root_dir, p_filename)
    p_image = io.imread(p_path)
    p_id = int(self.re_id_df.iloc[p_idx, 0])
    # Negative
    n_idxs = self.re_id_df[self.re_id_df['id'] != a_id].index
    n_idx = random.choice(n_idxs)
    n_filename = self.re_id_df.iloc[n_idx, 1]
    n_path = os.path.join(self.root_dir, n_filename)
    n_image = io.imread(n_path)
    n_id = int(self.re_id_df.iloc[n_idx, 0])

    sample = {
        'a_image': a_image,
        'a_id': a_id,
        'a_path': a_path,
        'a_filename': a_filename,
        'a_id_ds': a_id - 1,
        'p_image': p_image,
        'p_id': p_id,
        'p_path': p_path,
        'p_filename': p_filename,
        'p_id_ds': p_id - 1,
        'n_image': n_image,
        'n_id': n_id,
        'n_path': n_path,
        'n_filename': n_filename,
        'n_id_ds': n_id - 1,
        }

    if self.transform:
        sample['a_image'] = self.transform(sample['a_image'])
        sample['p_image'] = self.transform(sample['p_image'])
        sample['n_image'] = self.transform(sample['n_image'])

    return sample

class ReidDatasetTestQuery(Dataset):
  """
  Market 1501 dataset for re-id testing and queries
  """

  def __init__(self, csv_file, root_dir, transform=None):
    '''
    Args:
        csv_file (string): Path to the csv file with annotations(csv path).
        root_dir (string): Directory with all the images (train_path).
        transform (callable, optional): Optional transform to be applied
            on a sample.
    '''
    self.re_id_df = pd.read_csv(csv_file)
    self.root_dir = root_dir
    self.transform = transform
    print('root_dir: {}'.format(self.root_dir))
    print('csv_file: {}'.format(csv_file))

  def __len__(self):
    return len(self.re_id_df)
  
  def __getitem__(self, idx):
    if torch.is_tensor(idx):
        idx = idx.tolist()

    filename = str(self.re_id_df.iloc[idx, 0])
    path = os.path.join(self.root_dir,filename)
    image = io.imread(path)
    
    sample = {
        'a_filename': filename,
        'path': path,
        'a_image': image
        }

    if self.transform:
        sample['a_image'] = self.transform(sample['a_image'])

    return sample
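
The anchor / positive / negative sampling used by `ReidDatasetTrainVal` can be illustrated without images or pandas. The sketch below is a simplified variant, not the class above: it works on a plain list of `(person_id, filename)` pairs and, unlike the original, excludes the anchor itself from the positives when another image of the same id exists. The name `sample_triplet` and the second filename for id 1032 are assumptions made for illustration.

```python
import random


def sample_triplet(records, anchor_index, rng=random):
    """Return (anchor, positive, negative) filenames for the anchor at anchor_index.

    records is a list of (person_id, filename) pairs.
    """
    anchor_id, anchor_file = records[anchor_index]
    # Positives share the anchor id; the anchor itself is excluded when possible
    positives = [f for i, (pid, f) in enumerate(records)
                 if pid == anchor_id and i != anchor_index]
    if not positives:  # identity with a single image: fall back to the anchor
        positives = [anchor_file]
    # Negatives are all images with a different id
    negatives = [f for pid, f in records if pid != anchor_id]
    return anchor_file, rng.choice(positives), rng.choice(negatives)


records = [
    (1032, "1032_c5_052942248.jpg"),
    (1032, "1032_c1_000000001.jpg"),  # hypothetical filename
    (1442, "1442_c2_073830407.jpg"),
    (524, "0524_c6_093026917.jpg"),
]
anchor, positive, negative = sample_triplet(records, 0, random.Random(0))
print(anchor, positive, negative)
```

Excluding the anchor from the positives avoids triplets whose anchor-positive distance is trivially zero; whether that helps here has not been measured.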


## Custom class for ResNet50 model

In [7]:
class CustomRNet50(nn.Module):
  """
  New Custom ResNet50.
  The class takes care of instantiating a new version of ResNet50.
  """
  def __init__(self, dropout=0.5):
    """
    Take the feature extractor from the pretrained model (every layer except the final fc).
    Then add a classification head that predicts one of the 1501 identities of the dataset.
    """
    super(CustomRNet50, self).__init__()
    # Instantiate a model
    model = torchvision.models.resnet50(pretrained=True)
    self.in_features = model.fc.in_features
    layers = list(model.children())[:-1]
    #Create a feature extractor
    self.feature_extractor = nn.Sequential(*layers)
    # Single classifier for 1501 market
    self.id = nn.Sequential(nn.Dropout(p=dropout), 
                            nn.Linear(self.in_features,1024), 
                            nn.ReLU(), 
                            nn.Dropout(p=dropout), 
                            nn.Linear(1024,1501))

  def forward(self, x):
    x = self.feature_extractor(x)
    x = x.view(x.size(0), -1)
    
    return x, self.id(x)

## Network fit

In [19]:
class network_fit(object):
  """
  Class for network, manage training, validation and test
  """

  def __init__ (self, model, device, train_dataloader, validation_dataloader, test_dataloader, query_dataloader, valid_df, lr):
    """
    Constructor
    generate a CNN
    """
    self.device = device
    self.model = model
    self.vali_df = valid_df
    self.train_dataloader = train_dataloader
    self.validation_dataloader = validation_dataloader
    self.test_dataloader = test_dataloader
    self.query_dataloader = query_dataloader
    self.triplet_cost_function = nn.TripletMarginWithDistanceLoss(swap=True)
    self.cost_function = nn.CrossEntropyLoss()
    self.lr = lr
    self.optimizer = torch.optim.SGD(model.parameters(), lr=self.lr, momentum=0.9, weight_decay=0.001)
  
  def classification_loss(self, a_image_pred, a_image_truth, p_image_pred, p_image_truth, n_image_pred, n_image_truth):
    '''
    Return the sum of all classification cost function
    '''
    a_cost = self.cost_function(a_image_pred, a_image_truth)
    p_cost = self.cost_function(p_image_pred, p_image_truth)
    n_cost = self.cost_function(n_image_pred, n_image_truth)

    return a_cost + p_cost + n_cost    

  def loss(self, a_image_pred, a_image_truth, p_image_pred, p_image_truth,n_image_pred, n_image_truth, a_values, p_values, n_values):
    """
    Calculate the total loss in batch
    """
    classification_loss = self.classification_loss(a_image_pred, a_image_truth, p_image_pred, p_image_truth, n_image_pred, n_image_truth)
    triplet_loss = self.triplet_cost_function(a_values, p_values, n_values)
    # Reuse the triplet loss computed above instead of evaluating it a second time
    tot_loss = classification_loss + triplet_loss
    return tot_loss, triplet_loss, classification_loss


  def train_net(self):
    """
    Training function:
      1- iterate over the dataloader batches
      2- give the model an anchor image, a positive image and a negative image
      3- compute the batch total, triplet and classification losses
      4- backpropagate the total loss
      5- update the weights
      6- reset the gradients to zero
      7- accumulate the train triplet and classification losses
      8- average the losses over the number of batches
    """
    self.model.train()
    
    # Total train loss
    train_triplet_loss = 0.0
    train_classification_loss = 0.0
    batch = 0

    # Iteration in dataloader
    for id, data in enumerate(self.train_dataloader):
      # print('data.pic: {}'.format(data['pic'].size()))
      a_image = data['a_image'].to(self.device)
      a_truth_id = data['a_id_ds'].to(self.device)
      p_image = data['p_image'].to(self.device)
      p_truth_id = data['p_id_ds'].to(self.device)
      n_image = data['n_image'].to(self.device)
      n_truth_id = data['n_id_ds'].to(self.device)
      
      # Predictions (use the model owned by this object, not the global `model`;
      # the outputs are already on the same device as the inputs)
      a_values, a_pred_id = self.model(a_image.float())
      p_values, p_pred_id = self.model(p_image.float())
      n_values, n_pred_id = self.model(n_image.float())

      # Batch loss
      total_loss, triplet_loss, classification_loss = self.loss(a_pred_id, a_truth_id, p_pred_id, p_truth_id, n_pred_id, n_truth_id, a_values, p_values, n_values)
      # Backward
      total_loss.backward()
      # Weights update
      self.optimizer.step()
      # Gradient to zero
      self.optimizer.zero_grad()
      # Train loss (.item() returns a plain float, so the autograd graph is not kept alive)
      train_triplet_loss += triplet_loss.item()
      train_classification_loss += classification_loss.item()/3
      # Batch size
      batch+=1

    # Normalization loss
    triplet_batch_loss = train_triplet_loss/batch
    classification_batch_loss = train_classification_loss/batch
    
    return triplet_batch_loss, classification_batch_loss


  
  def validation_net(self):
    """
    Validation function:
      1- iterate over the dataloader batches
      2- give the model an anchor image, a positive image and a negative image
      3- compute the batch total, triplet and classification losses
      4- accumulate the validation triplet and classification losses
      5- average the losses over the number of batches
    """
    self.model.eval()
    
    # Total validation loss (the train_* variable names are kept from train_net)
    train_triplet_loss = 0.0
    train_classification_loss = 0.0
    batch = 0

    # Iteration in dataloader
    with torch.no_grad(): # reduce memory usage
      for id, data in enumerate(self.validation_dataloader):
        # print('data.pic: {}'.format(data['pic'].size()))
        a_image = data['a_image'].to(self.device)
        a_truth_id = data['a_id_ds'].to(self.device)
        p_image = data['p_image'].to(self.device)
        p_truth_id = data['p_id_ds'].to(self.device)
        n_image = data['n_image'].to(self.device)
        n_truth_id = data['n_id_ds'].to(self.device)
        
        # Predictions (use the model owned by this object, not the global `model`;
        # the outputs are already on the same device as the inputs)
        a_values, a_pred_id = self.model(a_image.float())
        p_values, p_pred_id = self.model(p_image.float())
        n_values, n_pred_id = self.model(n_image.float())
        # Batch loss
        total_loss, triplet_loss, classification_loss = self.loss(a_pred_id, 
                                                                  a_truth_id, 
                                                                  p_pred_id, 
                                                                  p_truth_id, 
                                                                  n_pred_id, 
                                                                  n_truth_id, 
                                                                  a_values, 
                                                                  p_values,
                                                                  n_values)
        # Validation loss (.item() returns a plain float)
        train_triplet_loss += triplet_loss.item()
        train_classification_loss += classification_loss.item()/3
        # Batch size
        batch+=1

      # Normalization loss
      triplet_batch_loss = train_triplet_loss/batch
      classification_batch_loss = train_classification_loss/batch
      
      return triplet_batch_loss, classification_batch_loss



  def mAP_dict(self,ground_truth_dict, pred_dict):
    """
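    The mAP function takes 2 dictionaries (ground truth and predictions):
    for each ground-truth query, every correctly predicted candidate adds
    precision * (1 / number of candidates) to the AP of that query.
    Returns the mean of the AP over all ground-truth queries.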
    """
    map = 0.0
    for gt_query, gt_candidates in ground_truth_dict.items():
      # check for no predictions
      if(not gt_query in pred_dict):
        continue
      ap = 0.0 # area under the curve
      predictions = pred_dict[gt_query] # prediction given query in ground truth
      # calc recall and precision
      recall = 1.0 / len(gt_candidates)
      counter = 0
      for idx, pred in enumerate(predictions):
        # if pred in candidates calc precision and the AP
        if pred in gt_candidates:
          counter +=1
          precision = counter/(idx+1)
          ap += precision*recall
      map+=ap
    # mean overall queries
    map /= len(ground_truth_dict)
    return map



  def extract_feature_vector(self, model, dataloader):
    """
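    Extract the embedding (feature vector) of every image in the dataloader.
    Returns a dataframe with one row per image and the filename in the last column.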
    """
    model.eval()
    embedded_values = []
    filename_imgs = []
    id_imgs = []

    # Iteration in dataloader
    with torch.no_grad():
      for id, data in enumerate(tqdm(dataloader)):
        a_image = data['a_image'].to(self.device)
        a_filename = data['a_filename']
        outputs, _ = model(a_image.float())
        embedded_values.append(outputs.cpu().numpy())
        filename_imgs.append(a_filename)

    embedded_values = np.concatenate(embedded_values)
    filename_imgs = np.concatenate(filename_imgs)

    embedded_values_df = pd.DataFrame(embedded_values)
    embedded_values_df['filename'] = filename_imgs
  
    return embedded_values_df



  def gt_pred_extraction(self,cosine_similarity_df, query_embedded, test_val_embedded_df, valid_ground_truth, max_similar_images=72):
    """
    Given cosine similarity dataframe, query feature vectors and test feature vectors 
    return prediction and ground_truth in dictionary type
    """
    pred_frames = []
    for index, row in cosine_similarity_df.iterrows():
      query_filename = query_embedded.iloc[index,-1]
      # Positions of the most similar test images, best first
      similar_index = row.sort_values(ascending=False)[:max_similar_images].index
      similar_filename = pd.DataFrame(test_val_embedded_df.iloc[similar_index,-1])
      similar_filename['query'] = query_filename
      pred_frames.append(similar_filename)
    # DataFrame.append was removed in pandas 2.0; concatenate once instead
    pred_df = pd.concat(pred_frames, ignore_index=True)
    pred_df = pred_df.rename(columns={'filename': 'prediction'})
    # groupby keeps the row order inside each group, so the ranking is preserved
    pred_dict = pred_df.groupby('query')['prediction'].apply(list).to_dict()
    ground_truth_dict = valid_ground_truth.groupby('query')['candidate'].apply(list).to_dict()
    return pred_dict, ground_truth_dict



  def validation_test(self,model, max_similar_images = 73):
    """
    Test the model on the validation set.
    The ground-truth dataframe is built from the validation set and the mAP is computed on it.
    WORKFLOW:
        1- for each id one filename is chosen as the query; the other images with the same id become its candidates
        2- the dataframe index of each query is saved
        3- the model is run on the validation set using only the anchor images
        4- the embedded values are extracted (2760 rows × 2048 columns)
        5- each embedding is associated with its filename
        6- the rows whose filename belongs to the queries are selected
        7- the remaining rows form the test set
        8- the cosine similarity between the two sets of embeddings is computed
          - Rows -> Query images
          - Columns -> Test images
        9- the similarities of each row are sorted and the top results are kept
        10- the ground truth and the predictions are obtained in dictionary form
        11- the mAP is computed on the two dictionaries
    """
    # 1
    validation_ids = self.vali_df.id.unique()
    self.vali_df = self.vali_df.reset_index()
    print(f'Validation ids len: {len(validation_ids)}')
    query_ids_list = []
    ground_truth = []
    for id in sorted(validation_ids):
      sample = self.vali_df[self.vali_df.id == id].sample()
      # Take the single sampled row explicitly instead of converting a 1-element array
      index = int(sample.index[0])
      idx = int(sample['id'].values[0])
      filename = sample['filename'].values[0]
      truth_df = self.vali_df.loc[(self.vali_df['id'] == idx)&(self.vali_df.filename != filename)]
      truth_df['query'] = filename
      ground_truth.extend(truth_df[['query','filename']].values.tolist())
      # 2
      query_ids_list.append(index)
    valid_ground_truth = pd.DataFrame(ground_truth, columns=['query', 'candidate'])
    val_query_df = self.vali_df[self.vali_df.index.isin(query_ids_list)].sort_values(['id'])
    val_query_filename = val_query_df['filename'].to_numpy()
    # 3-4-5
    embedded_values_df = self.extract_feature_vector(model, self.validation_dataloader)
    # 6 
    query_embedded = embedded_values_df[embedded_values_df['filename'].isin(val_query_filename)]
    print(f'Query embedded shape: {query_embedded.shape}')
    # 7
    test_val_embedded_df = embedded_values_df[~embedded_values_df['filename'].isin(val_query_filename)]
    print(f'Test_validation embedded shape: {test_val_embedded_df.shape}')
    # 8
    cosine_similarity_df = pd.DataFrame(cosine_similarity(query_embedded.iloc[:,:-1], test_val_embedded_df.iloc[:,:-1]))
    # 9-10
    pred_dict, ground_truth_dict = self.gt_pred_extraction(cosine_similarity_df, query_embedded, test_val_embedded_df, valid_ground_truth)
    ##11
    return self.mAP_dict(ground_truth_dict, pred_dict)


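  def create_reid_filetext(self, pred_dict, filename ='reid_test.txt'):
    # Write one row per query in the format "query: pred_1, pred_2, ..."
    # The context manager closes the file even if a write fails
    with open(filename, 'w') as file:
      for key, values in pred_dict.items():
        text = '{}: {}\n'.format(key, ", ".join(values))
        file.write(text)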


  def testing(self, model, max_similar_images = 72):
    """
    Testing the model on query-test folders.
    """
    query_embedded_df = self.extract_feature_vector(model, self.query_dataloader)
    test_embedded_df = self.extract_feature_vector(model, self.test_dataloader)
    cosine_similarity_df = pd.DataFrame(cosine_similarity(query_embedded_df.iloc[:,:-1], test_embedded_df.iloc[:,:-1]))
    
    pred_frames = []
    for index, row in cosine_similarity_df.iterrows():
      query_filename = query_embedded_df.iloc[index,-1]
      # Positions of the most similar test images, best first
      similar_index = row.sort_values(ascending=False)[:max_similar_images].index
      similar_filename = pd.DataFrame(test_embedded_df.iloc[similar_index,-1])
      similar_filename['query'] = query_filename
      pred_frames.append(similar_filename)
    # DataFrame.append was removed in pandas 2.0; concatenate once instead
    pred_df = pd.concat(pred_frames, ignore_index=True)
    pred_df = pred_df.rename(columns={'filename': 'prediction'})
    # groupby keeps the row order inside each group, so the ranking is preserved
    pred_dict = pred_df.groupby('query')['prediction'].apply(list).to_dict()
    # Write a file
    self.create_reid_filetext(pred_dict)

    return pred_dict
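
The retrieval and scoring steps of `network_fit` (rank test images by cosine similarity, then accumulate precision at each hit as in `mAP_dict`) can be checked by hand on a toy example. The sketch below uses only the standard library; the 2-D embeddings and the filenames are made up, and the helper names `cosine`, `rank_gallery` and `average_precision` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_gallery(query_vec, gallery, top_k):
    """Return the top_k gallery filenames sorted by decreasing similarity."""
    scored = sorted(gallery.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]


def average_precision(relevant, ranked):
    """Precision at each hit, weighted by 1 / len(relevant), as in mAP_dict."""
    hits, ap = 0, 0.0
    for position, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
            ap += (hits / position) / len(relevant)
    return ap


# Toy 2-D embeddings (made up for illustration)
gallery = {
    "a.jpg": [1.0, 0.0],
    "x.jpg": [1.0, 1.0],
    "b.jpg": [0.0, 1.0],
    "y.jpg": [-1.0, 0.0],
}
ranked = rank_gallery([1.0, 0.1], gallery, top_k=3)
ap = average_precision({"a.jpg", "b.jpg"}, ranked)
print(ranked, round(ap, 4))
```

Here the two relevant images land at ranks 1 and 3, so the AP is (1/1 + 2/3) / 2 = 5/6. Averaging this value over all queries gives the mAP.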


  

# Preprocessing

## Images distribution

In [9]:
# Extract list of image names
imgs = get_image_files(train_path)
get_image_name = lambda x: str(x).split('/')[-1]
imgs = list(map(get_image_name, imgs))
id_list = [int(x.split('_')[0]) for x in imgs]
ids_imgs_df = pd.DataFrame(zip(id_list, imgs), columns=['id', 'filename'])
ids_imgs_df.head()

Unnamed: 0,id,filename
0,1032,1032_c5_052942248.jpg
1,1442,1442_c2_073830407.jpg
2,1497,1497_c1_033449962.jpg
3,524,0524_c6_093026917.jpg
4,752,0752_c6_055012400.jpg


In [10]:
# Calc min, mean and max in order to decide how many images to retrieve per each query
ad_person = ids_imgs_df.groupby('id')['filename'].count()
print('Min: {},\nMean: {:.2f},\nMax: {}'.format(ad_person.min(), ad_person.mean(), ad_person.max()))

Min: 2,
Mean: 17.30,
Max: 72
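
The per-identity statistics above drive the choice of how many images to retrieve per query (the maximum, 72, is the default of `max_similar_images` in the class). As a small sketch of the same count without pandas, assuming the id is the filename prefix before the first underscore as in the listing above; `images_per_identity` is a hypothetical helper and the last filename is invented to give one id two images.

```python
from collections import Counter


def images_per_identity(filenames):
    """Count images per person id, taking the id from the prefix before the first '_'."""
    return Counter(int(name.split("_")[0]) for name in filenames)


filenames = [
    "1032_c5_052942248.jpg",
    "1442_c2_073830407.jpg",
    "1497_c1_033449962.jpg",
    "0524_c6_093026917.jpg",
    "0524_c1_000000001.jpg",  # hypothetical second image for id 524
]
counts = images_per_identity(filenames)
values = list(counts.values())
print(min(values), sum(values) / len(values), max(values))
```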


## Id-Images dataframes
In this section I work out the structure of the dataframe that will form the train and validation sets.

In [11]:
df = pd.read_csv(annotation_path)
df.head()

Unnamed: 0,id,age,backpack,bag,handbag,clothes,down,up,hair,hat,gender,upblack,upwhite,upred,uppurple,upyellow,upgray,upblue,upgreen,downblack,downwhite,downpink,downpurple,downyellow,downgray,downblue,downgreen,downbrown
0,474,2,1,1,1,1,2,2,2,1,2,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1
1,857,2,1,2,1,2,2,2,2,1,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2
2,1487,2,2,1,1,2,2,2,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1
3,1422,2,1,2,1,2,2,2,1,1,1,1,2,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1
4,856,2,2,1,1,2,2,2,2,1,2,1,2,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1


In [12]:
# New reidentification dataframe
annotation_df = pd.merge(df, ids_imgs_df, on='id')
re_id_df = annotation_df[['id', 'filename']]
re_id_df.to_csv(current_dir+'/re_id_df.csv', index=False)
print('Re-identification-dataframe shape {}'.format(re_id_df.shape))

Re-identification-dataframe shape (12989, 2)


# Dataframes
The train and validation datasets are randomly split 80/20, using the ids as the reference.
The test dataset contains only the filenames.

In [13]:
## Re-identification train and validation 
re_id_df = pd.read_csv(current_dir+'/re_id_df.csv')
# Randomly split ids
idxs = re_id_df.id.unique()
train_idxs, validation_idxs  = sklearn.model_selection.train_test_split(idxs, train_size=0.8)
print('Train_idxs: {}'.format(len(train_idxs)))
print('Validation_idxs: {}'.format(len(validation_idxs)))
print('All idx: {}'.format(len(idxs)))
# Transform Dataframe
train_df = re_id_df[re_id_df.id.isin(train_idxs)]
val_df = re_id_df[re_id_df.id.isin(validation_idxs)]
print('Train DataFrame len: {}'.format(len(train_df)))
print('Validation DataFrame len: {}'.format(len(val_df)))
# Save
train_df.to_csv(current_dir + '/train_re_id.csv', index=False)
val_df.to_csv(current_dir + '/validation_re_id.csv', index=False)

Train_idxs: 600
Validation_idxs: 151
All idx: 751
Train DataFrame len: 10229
Validation DataFrame len: 2760
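
Splitting by id rather than by image keeps every identity entirely in one set, so validation queries are always people the model has not been trained on. The sketch below reproduces that idea with the standard library only; it is not `sklearn.model_selection.train_test_split`, and the helper name `split_ids`, the seed and the placeholder id range are assumptions.

```python
import random


def split_ids(ids, train_fraction=0.8, seed=0):
    """Shuffle the unique ids and cut them into disjoint train / validation sets."""
    unique_ids = sorted(set(ids))
    random.Random(seed).shuffle(unique_ids)
    cut = int(len(unique_ids) * train_fraction)
    return set(unique_ids[:cut]), set(unique_ids[cut:])


# 751 placeholder ids, matching the number of identities reported above
train_ids, val_ids = split_ids(range(1, 752))
print(len(train_ids), len(val_ids))
# No identity appears in both sets
print(train_ids.isdisjoint(val_ids))
```

With 751 identities this yields the same 600 / 151 sizes as the output above, although the ids assigned to each side will differ from the notebook run.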


In [14]:
# Test dataframe
imgs = get_image_files(test_path)
get_image_name = lambda x: str(x).split('/')[-1]
imgs = list(map(get_image_name, imgs))
test_df = pd.DataFrame(imgs, columns=['filename'])
test_csv = 'test_re_id.csv'
test_df.to_csv('{}/{}'.format(current_dir,test_csv), index=False)
test_df = pd.read_csv(test_csv)
# Query dataframe
imgs = get_image_files(queries_path)
get_image_name = lambda x: str(x).split('/')[-1]
imgs = list(map(get_image_name, imgs))
query_df = pd.DataFrame(imgs, columns=['filename'])
query_csv = 'query_re_id.csv'
query_df.to_csv('{}/{}'.format(current_dir, query_csv), index=False)
query_df = pd.read_csv(query_csv)

# Datasets and DataLoaders


In [15]:
# Basic transformation for Market1501
imagenet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

tfms_train = tt.Compose([
    tt.ToPILImage(),
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats,inplace=True)
])

tfms_valid = tt.Compose([
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats)
])

# Datasets
validation_ds = ReidDatasetTrainVal(csv_file = '{}/validation_re_id.csv'.format(current_dir), root_dir = train_path, transform=tfms_valid)
train_ds = ReidDatasetTrainVal(csv_file = '{}/train_re_id.csv'.format(current_dir), root_dir = train_path , transform = tfms_train)
test_ds = ReidDatasetTestQuery(csv_file = test_csv, root_dir = test_path, transform=tfms_valid)
query_ds = ReidDatasetTestQuery(csv_file = query_csv, root_dir = queries_path, transform= tfms_valid)
# DataLoaders
train_dl = DataLoader(train_ds, batch_size=50, shuffle=True, num_workers=2, pin_memory = True)
validation_dl = DataLoader(validation_ds, batch_size=50, shuffle=True, num_workers=2, pin_memory = True)
test_dl = torch.utils.data.DataLoader(test_ds, batch_size=50,shuffle=True, num_workers=0, pin_memory = True)
query_dl = torch.utils.data.DataLoader(query_ds, batch_size=50,shuffle=True, num_workers=0, pin_memory = True)

root_dir: ./train
csv_file: ./validation_re_id.csv
root_dir: ./train
csv_file: ./train_re_id.csv
root_dir: ./test
csv_file: test_re_id.csv
root_dir: ./queries
csv_file: query_re_id.csv


# Basic Model pipeline


In [16]:
# Datasets
validation_ds = ReidDatasetTrainVal(csv_file = '{}/validation_re_id.csv'.format(current_dir), root_dir = train_path, transform=tfms_valid)
train_ds = ReidDatasetTrainVal(csv_file = '{}/train_re_id.csv'.format(current_dir), root_dir = train_path , transform = tfms_train)
test_ds = ReidDatasetTestQuery(csv_file = test_csv, root_dir = test_path, transform=tfms_valid)
query_ds = ReidDatasetTestQuery(csv_file = query_csv, root_dir = queries_path, transform= tfms_valid)
# DataLoaders
train_dl = DataLoader(train_ds, batch_size=50, shuffle=True, num_workers=2, pin_memory = True)
validation_dl = DataLoader(validation_ds, batch_size=50, shuffle=True, num_workers=2, pin_memory = True)
test_dl = torch.utils.data.DataLoader(test_ds, batch_size=50,shuffle=True, num_workers=2,pin_memory = True)
query_dl = torch.utils.data.DataLoader(query_ds, batch_size=50,shuffle=True, num_workers=2,pin_memory = True)

root_dir: ./train
csv_file: ./validation_re_id.csv
root_dir: ./train
csv_file: ./train_re_id.csv
root_dir: ./test
csv_file: test_re_id.csv
root_dir: ./queries
csv_file: query_re_id.csv


In [17]:
# Initialize model
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model = CustomRNet50().to(device)
if (next(model.parameters()).is_cuda):
  print('Model on GPU!!')

Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth


  0%|          | 0.00/97.8M [00:00<?, ?B/s]

Model on GPU!!


In [18]:
# Base Training settings
settings = {
    'epochs': 150,
    'batch_size': 64,
    'stop_epoch': 10,
    'learning_rate': 1e-4
}
# Initialize fit class
net_fit = network_fit(model, device, train_dl, validation_dl, test_dl,query_dl , val_df, lr = settings['learning_rate'])
# Settings
epochs = 50
batch_size = 35
# Settings early stopping
early_stopping = 10
min_val_loss = np.Inf
no_improvement = 0

## Analytics tensorboard writers
writer_loss = SummaryWriter(log_dir="{}/runs/loss".format(current_dir))

for epoch in range(epochs):
  print('|epoch {}/{}'.format(epoch + 1, epochs))
  ## Training
  t_triplet_loss, t_classification_loss = net_fit.train_net()
  # Print training
  print('Train |triplet_loss: {:.3f}|classification_loss: {:.3f}'.format(t_triplet_loss, t_classification_loss))
  ## Validation
  v_triplet_loss, v_classification_loss = net_fit.validation_net()
  # Print validation
  print('Validation |triplet_loss: {:.3f}|classification_loss: {:.3f}'.format(v_triplet_loss, v_classification_loss))
  ## Early stopping
  if (v_triplet_loss < min_val_loss):
    torch.save(model, model_path)
    no_improvement = 0
    min_val_loss = v_triplet_loss
    print('|Model saved')
  else:
    no_improvement +=1
    if (no_improvement == early_stopping):
      print('|Early stopping')
      break

  ## Analytics
  writer_loss.add_scalar('Loss/t_triplet_loss', t_triplet_loss, epoch + 1)
  writer_loss.add_scalar('Loss/t_classification_loss', t_classification_loss, epoch + 1)
  writer_loss.add_scalar('Loss/v_triplet_loss', v_triplet_loss, epoch + 1)
  writer_loss.add_scalar('Loss/v_classification_loss', v_classification_loss, epoch + 1)


# Close tensorboard writers
writer_loss.close()

## Testing on the validation set (validation_test requires the model as its first argument)
mAP = net_fit.validation_test(model)
print('Mean Average Precision in validation: {:.3f}'.format(mAP))


|epoch 1/50
Train |triplet_loss: 1.152|classification_loss: 7.314
Validation |triplet_loss: 0.847|classification_loss: 7.355
|Model saved
|epoch 2/50
Train |triplet_loss: 0.426|classification_loss: 7.158
Validation |triplet_loss: 0.504|classification_loss: 7.417
|Model saved
|epoch 3/50
Train |triplet_loss: 0.287|classification_loss: 6.942
Validation |triplet_loss: 0.452|classification_loss: 7.566
|Model saved
|epoch 4/50
Train |triplet_loss: 0.204|classification_loss: 6.657
Validation |triplet_loss: 0.353|classification_loss: 7.842
|Model saved
|epoch 5/50
Train |triplet_loss: 0.173|classification_loss: 6.358
Validation |triplet_loss: 0.341|classification_loss: 8.164
|Model saved
|epoch 6/50
Train |triplet_loss: 0.140|classification_loss: 6.091
Validation |triplet_loss: 0.286|classification_loss: 8.450
|Model saved
|epoch 7/50
Train |triplet_loss: 0.126|classification_loss: 5.827
Validation |triplet_loss: 0.270|classification_loss: 8.698
|Model saved
|epoch 8/50


KeyboardInterrupt: ignored
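
The early-stopping rule in the training loop monitors the validation triplet loss. Since the validation ids are disjoint from the training ids, the classification head never sees those classes, which may be why the validation classification loss rises in the log. The sketch below isolates the stopping logic in plain Python; `early_stopping_epoch` is a hypothetical helper, and the second loss sequence is made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch), both 1-based; stop_epoch is None if never triggered."""
    best_loss = float("inf")
    best_epoch = None
    no_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            no_improvement = 0  # a checkpoint would be saved here
        else:
            no_improvement += 1
            if no_improvement == patience:
                return best_epoch, epoch
    return best_epoch, None


# Validation triplet losses from the log above: every epoch improves
logged = [0.847, 0.504, 0.452, 0.353, 0.341, 0.286, 0.270]
print(early_stopping_epoch(logged, patience=10))

# Made-up sequence that stops improving after epoch 2
print(early_stopping_epoch([0.9, 0.5, 0.6, 0.55, 0.7], patience=3))
```

On the logged values the rule never fires, which matches the "Model saved" line printed after every epoch before the run was interrupted.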

In [None]:
# %load_ext tensorboard
%reload_ext tensorboard
%tensorboard --logdir=runs

# Testing transformation


In [None]:
# Same name as the one used by tt.Normalize in the pipelines below
imagenet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
tfms_train1 =  tt.Compose([
    tt.ToPILImage(),
    tt.RandomCrop((128, 64), padding=8, padding_mode='reflect'),
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats,inplace=True)
])

tfms_train2 =  tt.Compose([
    tt.ToPILImage(),
    tt.RandomCrop((128, 64), padding=8, padding_mode='reflect'),
    tt.RandomHorizontalFlip(p=0.5), 
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats,inplace=True)
])

tfms_train3 =  tt.Compose([
    tt.ToPILImage(),
    tt.RandomRotation(10),
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats,inplace=True), 
])

tfms_train4 =  tt.Compose([
    tt.ToPILImage(),
    tt.RandomHorizontalFlip(p=0.5), 
    tt.RandomRotation(10),
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats,inplace=True), 
])

tfms_train5 =  tt.Compose([
    tt.ToPILImage(),
    tt.RandomCrop((128, 64), padding=8, padding_mode='reflect'),
    tt.RandomHorizontalFlip(p=0.5), 
    tt.RandomRotation(10),
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats,inplace=True)
])

tfms_train6 =  tt.Compose([
    tt.ToPILImage(),
    tt.RandomCrop((128, 64), padding=8, padding_mode='reflect'),
    tt.RandomHorizontalFlip(p=0.5), 
    tt.RandomRotation(10),
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats,inplace=True), 
    tt.RandomErasing(p=0.5, inplace=True)
])

tfms_train_list = [tfms_train1, tfms_train2, tfms_train3, tfms_train4, tfms_train5, tfms_train6]

tfms_valid = tt.Compose([
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats)
])

# Base Training settings
settings = {
    'epochs': 150,
    'batch_size': 64,
    'stop_epoch': 10,
    'learning_rate': 1e-4
}
for times in range(0,3):
  # Initialize model
  device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
  model = CustomRNet50().to(device)
  if (next(model.parameters()).is_cuda):
    print('Model on GPU!!')

  # Datasets
  test_ds = ReidDatasetTestQuery(csv_file = test_csv, root_dir = test_path, transform=tfms_valid)
  query_ds = ReidDatasetTestQuery(csv_file = query_csv, root_dir = queries_path, transform= tfms_valid)
  # DataLoaders
  test_dl = torch.utils.data.DataLoader(test_ds, batch_size=50,shuffle=True, num_workers=0, pin_memory = True)
  query_dl = torch.utils.data.DataLoader(query_ds, batch_size=50,shuffle=True, num_workers=0, pin_memory = True)

  map_list = []
  for tfms_train in tfms_train_list:
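    # Re-initialize the model so that each transform pipeline is trained from scratch
    model = CustomRNet50().to(device)
    # Datasets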
    validation_ds = ReidDatasetTrainVal(csv_file = '{}/validation_re_id.csv'.format(current_dir), root_dir = train_path, transform=tfms_valid)
    train_ds = ReidDatasetTrainVal(csv_file = '{}/train_re_id.csv'.format(current_dir), root_dir = train_path , transform = tfms_train)
    # DataLoaders
    train_dl = DataLoader(train_ds, batch_size=settings['batch_size'], shuffle=True, num_workers=2, pin_memory = True)
    validation_dl = DataLoader(validation_ds, batch_size=settings['batch_size'], shuffle=True, num_workers=2, pin_memory = True)

    # Initialize fit class
    net_fit = network_fit(model, device, train_dl, validation_dl, test_dl, query_dl, val_df, lr = settings['learning_rate'])
    # Settings early stopping
    min_val_loss = np.inf
    no_improvement = 0

    for epoch in range(settings['epochs']):
      print('|epoch {}/{}'.format(epoch + 1, settings['epochs']))
      ## Training
      t_triplet_loss, t_classification_loss = net_fit.train_net()
      ## Validation
      v_triplet_loss, v_classification_loss = net_fit.validation_net()
      ## Early stopping
      if (v_triplet_loss < min_val_loss):
        torch.save(model, model_path)
        no_improvement = 0
        min_val_loss = v_triplet_loss
      else:
        no_improvement +=1
        if (no_improvement == settings['stop_epoch']):
          print('|Early stopping')
          break

    
    ## Testing on validation test
    model = torch.load(model_path)
    mAP = net_fit.validation_test(model)
    map_list.append(mAP)
    print('Mean Average Precision in validation: {:.3f}'.format(mAP))
  map_df = pd.DataFrame(zip(map_list, range(1,7)), columns = ['map','tfms_id'])
  map_df.to_csv('./drive/MyDrive/DL/Evaluations/reid_tfms_eval{}.csv'.format(times), index=False)
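
The early-stopping bookkeeping in the loop above (`min_val_loss`, `no_improvement`, `stop_epoch`) is repeated in every experiment cell of this notebook. A minimal sketch of how it could be factored into a helper is shown here. The class name `EarlyStopping` and its `step` method are hypothetical and not part of the notebook, and the loss values in the usage lines are made up for illustration.

```python
import math


class EarlyStopping:
    """Track the best validation loss and signal when patience runs out."""

    def __init__(self, patience=10):
        self.patience = patience   # plays the role of settings['stop_epoch']
        self.best = math.inf       # plays the role of min_val_loss
        self.bad_epochs = 0        # plays the role of no_improvement

    def step(self, val_loss):
        """Return (improved, should_stop) for this epoch's validation loss."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience


# Toy usage with made-up loss values (not taken from any run in this notebook).
stopper = EarlyStopping(patience=2)
history = []
for loss in [0.9, 0.5, 0.6, 0.7, 0.4]:
    improved, stop = stopper.step(loss)
    history.append((improved, stop))
    if stop:
        break
```

In a training loop, `improved` would be the point at which to call `torch.save`, and `stop` the point at which to break out of the epoch loop.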

## Evaluating the transformations

In [None]:
path = './drive/MyDrive/DL/Evaluations/reid_tfms_eval'
all_files = glob.glob(path + "*.csv")
print(*all_files, sep='\n')
li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
frame.groupby(['tfms_id']).mean().sort_values(['map'], ascending = False)

# Batch size evaluation


In [None]:
# ImageNet channel means and standard deviations, consumed by tt.Normalize below
imagenet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

tfms_train =  tt.Compose([
    tt.ToPILImage(),
    tt.RandomCrop((128, 64), padding=8, padding_mode='reflect'),
    tt.RandomHorizontalFlip(p=0.5), 
    tt.RandomRotation(10),
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats,inplace=True), 
    tt.RandomErasing(p=0.5, inplace=True)
])

tfms_valid = tt.Compose([
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats)
])

# Base Training settings
setting_master = [
            {
              'epochs': 150,
              'batch_size': 32,
              'stop_epoch': 10,
              'learning_rate': 1e-4
            },
             {
              'epochs': 150,
              'batch_size': 64,
              'stop_epoch': 10,
              'learning_rate': 1e-4
            },
             {
              'epochs': 150,
              'batch_size': 128,
              'stop_epoch': 10,
              'learning_rate': 1e-4
            }
             
]
for times in range(0,3):
  # Initialize model
  device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
  model = CustomRNet50().to(device)
  if (next(model.parameters()).is_cuda):
    print('Model on GPU!!')
  
  # Datasets
  test_ds = ReidDatasetTestQuery(csv_file = test_csv, root_dir = test_path, transform=tfms_valid)
  query_ds = ReidDatasetTestQuery(csv_file = query_csv, root_dir = queries_path, transform= tfms_valid)
  # DataLoaders
  test_dl = torch.utils.data.DataLoader(test_ds, batch_size=50,shuffle=True, num_workers=0, pin_memory = True)
  query_dl = torch.utils.data.DataLoader(query_ds, batch_size=50,shuffle=True, num_workers=0, pin_memory = True)


  map_list = []
  for settings in setting_master:
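    # Re-initialize the model so that each batch size is trained from scratch
    model = CustomRNet50().to(device)
    # Datasets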
    validation_ds = ReidDatasetTrainVal(csv_file = '{}/validation_re_id.csv'.format(current_dir), root_dir = train_path, transform=tfms_valid)
    train_ds = ReidDatasetTrainVal(csv_file = '{}/train_re_id.csv'.format(current_dir), root_dir = train_path , transform = tfms_train)
    # DataLoaders
    train_dl = DataLoader(train_ds, batch_size=settings['batch_size'], shuffle=True, num_workers=2, pin_memory = True)
    validation_dl = DataLoader(validation_ds, batch_size=settings['batch_size'], shuffle=True, num_workers=2, pin_memory = True)

    # Initialize fit class
    net_fit = network_fit(model, device, train_dl, validation_dl, test_dl, query_dl, val_df, lr = settings['learning_rate'])
    # Settings early stopping
    min_val_loss = np.inf
    no_improvement = 0

    for epoch in range(settings['epochs']):
      print('|epoch {}/{}'.format(epoch + 1, settings['epochs']))
      ## Training
      t_triplet_loss, t_classification_loss = net_fit.train_net()
      ## Validation
      v_triplet_loss, v_classification_loss = net_fit.validation_net()
      ## Early stopping
      if (v_triplet_loss < min_val_loss):
        torch.save(model, model_path)
        no_improvement = 0
        min_val_loss = v_triplet_loss
      else:
        no_improvement +=1
        if (no_improvement == settings['stop_epoch']):
          print('|Early stopping')
          break

    
    ## Testing on validation test
    model = torch.load(model_path)
    mAP = net_fit.validation_test(model)
    map_list.append(mAP)
    print('Mean Average Precision in validation: {:.3f}'.format(mAP))
  map_df = pd.DataFrame(zip(map_list, [32,64,128]), columns = ['map','batch_size'])
  map_df.to_csv('./drive/MyDrive/DL/Evaluations/reid_batch_size_eval{}.csv'.format(times), index=False)

In [None]:
path = './drive/MyDrive/DL/Evaluations/reid_batch_size_eval'
all_files = glob.glob(path + "*.csv")
print(*all_files, sep='\n')
li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
frame.groupby(['batch_size']).mean().sort_values(['map'], ascending = False)

# Learning rate evaluation

In [None]:
# ImageNet channel means and standard deviations, consumed by tt.Normalize below
imagenet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

tfms_train =  tt.Compose([
    tt.ToPILImage(),
    tt.RandomCrop((128, 64), padding=8, padding_mode='reflect'),
    tt.RandomHorizontalFlip(p=0.5), 
    tt.RandomRotation(10),
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats,inplace=True), 
    tt.RandomErasing(p=0.5, inplace=True)
])

tfms_valid = tt.Compose([
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats)
])

# Base Training settings
setting_master = [
            {
              'epochs': 150,
              'batch_size': 64,
              'stop_epoch': 10,
              'learning_rate': 1e-3
            },
            {
              'epochs': 150,
              'batch_size': 64,
              'stop_epoch': 10,
              'learning_rate': 1e-2
            },
            {
              'epochs': 150,
              'batch_size': 64,
              'stop_epoch': 10,
              'learning_rate': 1e-1
            }
             
]
for times in range(0,3):
  # Initialize model
  device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
  model = CustomRNet50().to(device)
  if (next(model.parameters()).is_cuda):
    print('Model on GPU!!')
  # Datasets
  test_ds = ReidDatasetTestQuery(csv_file = test_csv, root_dir = test_path, transform=tfms_valid)
  query_ds = ReidDatasetTestQuery(csv_file = query_csv, root_dir = queries_path, transform= tfms_valid)
  # DataLoaders
  test_dl = torch.utils.data.DataLoader(test_ds, batch_size=50,shuffle=True, num_workers=0, pin_memory = True)
  query_dl = torch.utils.data.DataLoader(query_ds, batch_size=50,shuffle=True, num_workers=0, pin_memory = True)


  map_list = []
  for settings in setting_master:
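    # Re-initialize the model so that each learning rate is trained from scratch
    model = CustomRNet50().to(device)
    # Datasets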
    validation_ds = ReidDatasetTrainVal(csv_file = '{}/validation_re_id.csv'.format(current_dir), root_dir = train_path, transform=tfms_valid)
    train_ds = ReidDatasetTrainVal(csv_file = '{}/train_re_id.csv'.format(current_dir), root_dir = train_path , transform = tfms_train)
    # DataLoaders
    train_dl = DataLoader(train_ds, batch_size=settings['batch_size'], shuffle=True, num_workers=2, pin_memory = True)
    validation_dl = DataLoader(validation_ds, batch_size=settings['batch_size'], shuffle=True, num_workers=2, pin_memory = True)

    # Initialize fit class
    net_fit = network_fit(model, device, train_dl, validation_dl, test_dl, query_dl, val_df, lr = settings['learning_rate'])
    # Settings early stopping
    min_val_loss = np.inf
    no_improvement = 0

    for epoch in range(settings['epochs']):
      print('|epoch {}/{}'.format(epoch + 1, settings['epochs']))
      ## Training
      t_triplet_loss, t_classification_loss = net_fit.train_net()
      ## Validation
      v_triplet_loss, v_classification_loss = net_fit.validation_net()
      ## Early stopping
      if (v_triplet_loss < min_val_loss):
        torch.save(model, model_path)
        no_improvement = 0
        min_val_loss = v_triplet_loss
      else:
        no_improvement +=1
        if (no_improvement == settings['stop_epoch']):
          print('|Early stopping')
          break

    
    ## Testing on validation test
    model = torch.load(model_path)
    mAP = net_fit.validation_test(model)
    map_list.append(mAP)
    print('Mean Average Precision in validation: {:.3f}'.format(mAP))
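  # Label each mAP with the learning rate that produced it (setting_master holds 1e-3, 1e-2, 1e-1)
  map_df = pd.DataFrame(zip(map_list, [s['learning_rate'] for s in setting_master]), columns = ['map','lr'])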
  map_df.to_csv('./drive/MyDrive/DL/Evaluations/reid_lr_eval{}.csv'.format(times), index=False)

In [None]:
path = './drive/MyDrive/DL/Evaluations/reid_lr_eval'
all_files = glob.glob(path + "*.csv")
print(*all_files, sep='\n')
li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
frame.groupby(['lr']).mean().sort_values(['map'], ascending = False)

# Final Model

In [20]:
# ImageNet channel means and standard deviations, consumed by tt.Normalize below
imagenet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

tfms_train =  tt.Compose([
    tt.ToPILImage(),
    tt.RandomCrop((128, 64), padding=8, padding_mode='reflect'),
    tt.RandomHorizontalFlip(p=0.5), 
    tt.RandomRotation(10),
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats,inplace=True), 
    tt.RandomErasing(p=0.5, inplace=True)
])

tfms_valid = tt.Compose([
    tt.ToTensor(), 
    tt.Normalize(*imagenet_stats)
])

# Base Training settings
setting_master = [
            {
              'epochs': 150,
              'batch_size': 64,
              'stop_epoch': 10,
              'learning_rate': 1e-2
            }
             
]
for times in range(0,5):
  # Initialize model
  device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
  model = CustomRNet50().to(device)
  if (next(model.parameters()).is_cuda):
    print('Model on GPU!!')
  # Datasets
  test_ds = ReidDatasetTestQuery(csv_file = test_csv, root_dir = test_path, transform=tfms_valid)
  query_ds = ReidDatasetTestQuery(csv_file = query_csv, root_dir = queries_path, transform= tfms_valid)
  # DataLoaders
  test_dl = torch.utils.data.DataLoader(test_ds, batch_size=50,shuffle=True, num_workers=0, pin_memory = True)
  query_dl = torch.utils.data.DataLoader(query_ds, batch_size=50,shuffle=True, num_workers=0, pin_memory = True)


  map_list = []
  for settings in setting_master:
    # Datasets
    validation_ds = ReidDatasetTrainVal(csv_file = '{}/validation_re_id.csv'.format(current_dir), root_dir = train_path, transform=tfms_valid)
    train_ds = ReidDatasetTrainVal(csv_file = '{}/train_re_id.csv'.format(current_dir), root_dir = train_path , transform = tfms_train)
    # DataLoaders
    train_dl = DataLoader(train_ds, batch_size=settings['batch_size'], shuffle=True, num_workers=2, pin_memory = True)
    validation_dl = DataLoader(validation_ds, batch_size=settings['batch_size'], shuffle=True, num_workers=2, pin_memory = True)

    # Initialize fit class
    net_fit = network_fit(model, device, train_dl, validation_dl, test_dl, query_dl, val_df, lr = settings['learning_rate'])
    # Settings early stopping
    min_val_loss = np.inf
    no_improvement = 0

    for epoch in range(settings['epochs']):
      print('|epoch {}/{}'.format(epoch + 1, settings['epochs']))
      ## Training
      t_triplet_loss, t_classification_loss = net_fit.train_net()
      ## Validation
      v_triplet_loss, v_classification_loss = net_fit.validation_net()
      ## Early stopping
      if (v_triplet_loss < min_val_loss):
        torch.save(model, model_path)
        no_improvement = 0
        min_val_loss = v_triplet_loss
      else:
        no_improvement +=1
        if (no_improvement == settings['stop_epoch']):
          print('|Early stopping')
          break

    
    ## Testing on validation test
    model = torch.load(model_path)
    mAP = net_fit.validation_test(model)
    map_list.append(mAP)
    print('Mean Average Precision in validation: {:.3f}'.format(mAP))
  map_df = pd.DataFrame(map_list, columns = ['map'])
  map_df.to_csv('./drive/MyDrive/DL/Evaluations/reid_model_eval{}.csv'.format(times), index=False)

Model on GPU!!
root_dir: ./test
csv_file: test_re_id.csv
root_dir: ./queries
csv_file: query_re_id.csv
root_dir: ./train
csv_file: ./validation_re_id.csv
root_dir: ./train
csv_file: ./train_re_id.csv
|epoch 1/150
|epoch 2/150
|epoch 3/150
|epoch 4/150
|epoch 5/150
|epoch 6/150
|epoch 7/150
|epoch 8/150
|epoch 9/150
|epoch 10/150
|epoch 11/150
|epoch 12/150
|epoch 13/150
|epoch 14/150
|epoch 15/150
|epoch 16/150
|epoch 17/150
|epoch 18/150
|Early stopping



Mean Average Precision in validation: 0.742
Model on GPU!!
root_dir: ./test
csv_file: test_re_id.csv
root_dir: ./queries
csv_file: query_re_id.csv
root_dir: ./train
csv_file: ./validation_re_id.csv
root_dir: ./train
csv_file: ./train_re_id.csv
|epoch 1/150
|epoch 2/150
|epoch 3/150
|epoch 4/150
|epoch 5/150
|epoch 6/150
|epoch 7/150
|epoch 8/150
|epoch 9/150
|epoch 10/150
|epoch 11/150
|epoch 12/150
|epoch 13/150
|epoch 14/150
|epoch 15/150
|epoch 16/150
|Early stopping



Mean Average Precision in validation: 0.671
Model on GPU!!
root_dir: ./test
csv_file: test_re_id.csv
root_dir: ./queries
csv_file: query_re_id.csv
root_dir: ./train
csv_file: ./validation_re_id.csv
root_dir: ./train
csv_file: ./train_re_id.csv
|epoch 1/150
|epoch 2/150
|epoch 3/150
|epoch 4/150
|epoch 5/150
|epoch 6/150
|epoch 7/150
|epoch 8/150
|epoch 9/150
|epoch 10/150
|epoch 11/150
|epoch 12/150
|epoch 13/150
|epoch 14/150
|epoch 15/150
|epoch 16/150
|epoch 17/150
|epoch 18/150
|epoch 19/150
|epoch 20/150
|epoch 21/150
|epoch 22/150
|epoch 23/150
|epoch 24/150
|epoch 25/150
|epoch 26/150
|epoch 27/150
|epoch 28/150
|Early stopping



Mean Average Precision in validation: 0.744
Model on GPU!!
root_dir: ./test
csv_file: test_re_id.csv
root_dir: ./queries
csv_file: query_re_id.csv
root_dir: ./train
csv_file: ./validation_re_id.csv
root_dir: ./train
csv_file: ./train_re_id.csv
|epoch 1/150
|epoch 2/150
|epoch 3/150
|epoch 4/150
|epoch 5/150
|epoch 6/150
|epoch 7/150
|epoch 8/150
|epoch 9/150
|epoch 10/150
|epoch 11/150
|epoch 12/150
|epoch 13/150
|epoch 14/150
|epoch 15/150
|epoch 16/150
|epoch 17/150
|epoch 18/150
|epoch 19/150
|epoch 20/150
|epoch 21/150
|epoch 22/150
|epoch 23/150
|epoch 24/150
|epoch 25/150
|epoch 26/150
|epoch 27/150
|epoch 28/150
|epoch 29/150
|epoch 30/150
|epoch 31/150
|epoch 32/150
|epoch 33/150
|epoch 34/150
|epoch 35/150
|epoch 36/150
|epoch 37/150
|epoch 38/150
|epoch 39/150
|epoch 40/150
|epoch 41/150
|epoch 42/150
|epoch 43/150
|epoch 44/150
|Early stopping



Mean Average Precision in validation: 0.758
Model on GPU!!
root_dir: ./test
csv_file: test_re_id.csv
root_dir: ./queries
csv_file: query_re_id.csv
root_dir: ./train
csv_file: ./validation_re_id.csv
root_dir: ./train
csv_file: ./train_re_id.csv
|epoch 1/150
|epoch 2/150
|epoch 3/150
|epoch 4/150
|epoch 5/150
|epoch 6/150
|epoch 7/150
|epoch 8/150
|epoch 9/150
|epoch 10/150
|epoch 11/150
|epoch 12/150
|epoch 13/150
|epoch 14/150
|epoch 15/150
|epoch 16/150
|epoch 17/150
|epoch 18/150
|epoch 19/150
|epoch 20/150
|epoch 21/150
|epoch 22/150
|epoch 23/150
|Early stopping



Mean Average Precision in validation: 0.729
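
The five validation mAP values printed above can be summarized directly, without going through the saved CSV files. The snippet below is a small sketch using only the standard library. The list is copied by hand from the log lines above, and reporting the sample standard deviation is a choice made here, not something the notebook does.

```python
from statistics import mean, stdev

# Validation mAP printed at the end of each of the five runs above.
run_maps = [0.742, 0.671, 0.744, 0.758, 0.729]

mean_map = mean(run_maps)
std_map = stdev(run_maps)  # sample standard deviation (divides by n - 1)

print('mAP over {} runs: {:.3f} +/- {:.3f}'.format(len(run_maps), mean_map, std_map))
```

The spread between the lowest run (0.671) and the highest (0.758) suggests that single-run comparisons between hyperparameter settings should be read with some caution.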


In [1]:
# Imports needed when this cell is run first in a fresh kernel
import glob
import pandas as pd

path = './drive/MyDrive/DL/Evaluations/reid_model_eval'
all_files = glob.glob(path + "*.csv")
print(*all_files, sep='\n')
li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
frame['map'].mean()


# Testing

In [16]:
model = torch.load(model_path)

In [20]:
net_fit = network_fit(model, device, train_dl, validation_dl, test_dl, query_dl, val_df, lr=1e-2)

In [21]:
net_fit.validation_test(model)

Validation ids len: 151



Query embedded shape: (151, 2049)
Test_validation embedded shape: (2609, 2049)


0.7442318194798301
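
For reference, the value returned by `validation_test` is a mean average precision. The sketch below shows one common way to compute it from ranked lists. It is not the reference implementation mentioned in the assignment, and it is not necessarily what `network_fit.validation_test` does internally. The function names and the toy file names are hypothetical, and normalizing by the total number of relevant images is an assumption (some implementations normalize by the number of relevant images actually retrieved).

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list given the relevant gallery items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / len(relevant)


def mean_average_precision(predictions, ground_truth):
    """Mean of the per-query average precision over all ground-truth queries."""
    aps = [average_precision(predictions.get(query, []), relevant)
           for query, relevant in ground_truth.items()]
    return sum(aps) / len(aps) if aps else 0.0


# Toy example with made-up file names.
toy_predictions = {'q1.jpg': ['a.jpg', 'x.jpg', 'b.jpg'], 'q2.jpg': ['y.jpg', 'c.jpg']}
toy_ground_truth = {'q1.jpg': {'a.jpg', 'b.jpg'}, 'q2.jpg': {'c.jpg'}}

ap_q1 = average_precision(toy_predictions['q1.jpg'], toy_ground_truth['q1.jpg'])
toy_map = mean_average_precision(toy_predictions, toy_ground_truth)
```

For `q1.jpg` the relevant images sit at ranks 1 and 3, giving (1/1 + 2/3) / 2; for `q2.jpg` the single relevant image sits at rank 2, giving 1/2.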

In [22]:
predictions = net_fit.testing(model)


In [23]:
len(predictions.keys())

2248
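
With one entry per query in `predictions`, the dictionary can be written out in the row format required by the assignment (`query: match1, match2, ...`). The sketch below is one possible way to do it. The helper names are hypothetical, the output file name `reid_test.txt` is an assumption based on the assignment text, and sorting the queries by file name is a choice made here for reproducibility.

```python
import os
import tempfile


def format_row(query, matches):
    """Build one line in the 'query: match1, match2, ...' layout."""
    return '{}: {}'.format(query, ', '.join(matches))


def write_predictions(predictions, path):
    """Write one row per query (sorted by file name) and return the row count."""
    with open(path, 'w') as handle:
        for query in sorted(predictions):
            handle.write(format_row(query, predictions[query]) + '\n')
    return len(predictions)


# Toy dictionary reusing the example row from the assignment text.
toy_predictions = {
    '000616.jpg': ['001249.jpg', '000316.jpg', '000924.jpg', '009847.jpg'],
}
out_path = os.path.join(tempfile.mkdtemp(), 'reid_test.txt')  # assumed file name
n_rows = write_predictions(toy_predictions, out_path)
with open(out_path) as handle:
    written = handle.read()
```

Applied to the real `predictions` dictionary, the returned row count should match the 2248 queries reported above.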

In [24]:
predictions

{'000000.jpg': ['003682.jpg',
  '017550.jpg',
  '012105.jpg',
  '018464.jpg',
  '015886.jpg',
  '003213.jpg',
  '012139.jpg',
  '008094.jpg',
  '001251.jpg',
  '008765.jpg',
  '013572.jpg',
  '011978.jpg',
  '003510.jpg',
  '003858.jpg',
  '005759.jpg',
  '009289.jpg',
  '005054.jpg',
  '002369.jpg',
  '010084.jpg',
  '013901.jpg',
  '003021.jpg',
  '001132.jpg',
  '007564.jpg',
  '002933.jpg',
  '008581.jpg',
  '001317.jpg',
  '012951.jpg',
  '018593.jpg',
  '018776.jpg',
  '018252.jpg',
  '003912.jpg',
  '002154.jpg',
  '009973.jpg',
  '012778.jpg',
  '000321.jpg',
  '005756.jpg',
  '012568.jpg',
  '017031.jpg',
  '006984.jpg',
  '014503.jpg',
  '017255.jpg',
  '005328.jpg',
  '005046.jpg',
  '008511.jpg',
  '010379.jpg',
  '006129.jpg',
  '002647.jpg',
  '006895.jpg',
  '015820.jpg',
  '016456.jpg',
  '001167.jpg',
  '008841.jpg',
  '008223.jpg',
  '007774.jpg',
  '007219.jpg',
  '002129.jpg',
  '011112.jpg',
  '002294.jpg',
  '017845.jpg',
  '001886.jpg',
  '008144.jpg',
  '000233.