The New York Botanical Garden (NYBG) herbarium contains more than 7.8 million plant and fungal specimens. Herbaria are a massive repository of plant diversity data. These collections not only represent a vast amount of plant diversity, but since herbarium collections include specimens dating back hundreds of years, they provide snapshots of plant diversity through time. The integrity of the plant is maintained in herbaria as a pressed, dried specimen; a specimen collected nearly two hundred years ago by Darwin looks much the same as one collected a month ago by an NYBG botanist. All specimens not only maintain their morphological features but also include collection dates and locations, and the name of the person who collected the specimen. This information, multiplied by millions of plant collections, provides the framework for understanding plant diversity on a massive scale and learning how it has changed over time.

In [None]:
import numpy as np                                    # Arrays, linear algebra
import pandas as pd                                   # Tabular metadata handling
import os                                             # File handling
import random                                         # Random sampling of image pairs
import time                                           # Timing epochs
import json                                           # Loading metadata
from os.path import join                              # Path handling
from tqdm.notebook import tqdm                        # Progress bars (optional)
import torch                                          # Framework
import torch.nn.functional as F                       # Functional ops (distances, losses)
from torch.utils.data import Dataset, DataLoader      # Dataset and loader utilities
from torchvision import transforms                    # Data augmentation
from PIL import Image, ImageOps                       # Image loading and inversion
from PIL.Image import open as openIm                  # Image loading (short alias)
import matplotlib.pyplot as plt                       # Plotting images
import cv2                                            # Fast reading/resizing for the preview grid

### Configuration

In [None]:
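TRAIN       = "../input/herbarium-2020-fgvc7/nybg2020/train/"
TEST        = "../input/herbarium-2020-fgvc7/nybg2020/test/"
META        = "metadata.json"   # COCO-style metadata file inside each split folder
BATCH_SIZE  = 7                 # image pairs per training batch
NUM_WORKERS = 2
BATCH_EVAL  = 1
SHUFFLE     = True
EPOCHS      = 3
RESIZE      = (800, 600)        # (height, width) passed to transforms.Resize
CLASSES     = 32094             # number of categories in the dataset
LENGTH      = 2*CLASSES         # one similar and one dissimilar pair per category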

## DATA INSIGHT

The metadata is in COCO format.

COCO is a large image dataset designed for object detection, segmentation, person keypoint detection, stuff segmentation, and caption generation. Its annotation files store an `images` list (file names and sizes), an `annotations` list (each entry links an `image_id` to a `category_id`), and a `categories` list, and the herbarium metadata follows the same layout. The COCO API provides Matlab, Python, and Lua interfaces that assist in loading, parsing, and visualizing COCO annotations; the Matlab and Python APIs are complete, while the Lua API provides only basic functionality. Please visit http://cocodataset.org/ for more information on COCO, including the data, paper, tutorials, and the exact annotation format.
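
To make the layout concrete, here is a minimal plain-Python sketch. The `toy_meta` dictionary is invented for illustration (its ids, file names, and category names are not taken from the real `metadata.json`); it only mirrors the three sections the notebook relies on and shows how an annotation joins an image to its class. The helper `label_images` is hypothetical.

```python
# Toy metadata in the COCO layout (all values are made up for illustration)
toy_meta = {
    "images": [
        {"id": 10, "file_name": "images/0/00/10.jpg", "height": 1000, "width": 680},
        {"id": 11, "file_name": "images/0/01/11.jpg", "height": 1000, "width": 676},
    ],
    "annotations": [
        {"id": 10, "image_id": 10, "category_id": 0},
        {"id": 11, "image_id": 11, "category_id": 1},
    ],
    "categories": [
        {"id": 0, "name": "Species A"},
        {"id": 1, "name": "Species B"},
    ],
}


def label_images(meta):
    """Return {file_name: category name} by joining the three COCO sections."""
    names = {c["id"]: c["name"] for c in meta["categories"]}
    files = {im["id"]: im["file_name"] for im in meta["images"]}
    # Each annotation points at one image (image_id) and one class (category_id)
    return {files[a["image_id"]]: names[a["category_id"]] for a in meta["annotations"]}
```

The `pd.merge` call further down performs the same image/annotation join on the real data, just with DataFrames instead of dictionaries.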

**Here is what the JSON file looks like**

### TRAIN FILE

In [None]:
with open(join(TRAIN, META), "r", encoding="ISO-8859-1") as file:
    metadata = json.load(file)

print("Metadata has {} sections. Together they describe every image in the dataset (class, id, size, etc.).".format(len(metadata)))
print("Sections in the metadata:")
for key in metadata:
    print(" - ", key)

print("Number of images in the training set:", len(metadata["images"]))
print("\nHere is what each section of the metadata looks like:\n")
for key, section in metadata.items():
    print(" - number of elements in {}:".format(key), len(section))
    # Most sections are lists; some (e.g. "info") may be dicts, so show their first item instead
    if isinstance(section, dict):
        first = next(iter(section.items()), None)
    else:
        first = section[0] if section else None
    print("\t", first, end="\n\n")
    

### TEST FILE

In [None]:
with open(join(TEST, META), "r", encoding="ISO-8859-1") as file:
    metadata_test = json.load(file)

print("Metadata has {} sections. Together they describe every image in the dataset (id, size, etc.).".format(len(metadata_test)))
print("Sections in the metadata:")
for key in metadata_test:
    print(" - ", key)

print("Number of images in the test set:", len(metadata_test["images"]))
print("\nHere is what each section of the metadata looks like:\n")
for key, section in metadata_test.items():
    print(" - number of elements in {}:".format(key), len(section))
    # Most sections are lists; some (e.g. "info") may be dicts, so show their first item instead
    if isinstance(section, dict):
        first = next(iter(section.items()), None)
    else:
        first = section[0] if section else None
    print("\t", first, end="\n\n")

There are 1030747 images in the training set.

There are 32094 classes in the dataset.

Now let us join the annotations with the image records and look at a sample image.

In [None]:
train_img = pd.DataFrame(metadata['images'])
train_ann = pd.DataFrame(metadata['annotations'])
train_df = pd.merge(train_ann, train_img, left_on='image_id', right_on='id', how='left').drop('image_id', axis=1).sort_values(by=['category_id'])
train_df.head()

In [None]:
im = Image.open(join(TRAIN, "images/156/72/354106.jpg"))
print("Below is image id 354106, which belongs to category id 15672.")
im

**A Sample of Training Images**

In [None]:
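size_of_img = (28, 28)
fig = plt.figure(figsize=(24, 10))
for i in range(60):
    ax = fig.add_subplot(5, 12, i + 1)   # 5 x 12 grid holds exactly 60 thumbnails
    img = cv2.imread(join(TRAIN, metadata["images"][i]["file_name"]))
    img = cv2.resize(img, size_of_img)
    ax.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # OpenCV loads BGR; matplotlib expects RGB
    ax.axis("off")
plt.show()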

## DATALOADER

A lot of the effort in solving any machine learning problem goes into preparing the data. PyTorch provides many tools to make data loading easy and, hopefully, to make your code more readable. In this section, we will see how to load and preprocess/augment data from a non-trivial dataset.

### Dataset class

`torch.utils.data.Dataset` is an abstract class representing a dataset. A custom dataset should inherit from `Dataset` and override the following methods:

* `__len__`, so that `len(dataset)` returns the size of the dataset.
* `__getitem__`, to support indexing such that `dataset[i]` returns the i-th sample.

Let's create a dataset class for the herbarium images. We read the metadata in `__init__` but leave the reading of images to `__getitem__`. This is memory efficient because the images are not all held in memory at once; each one is read only when it is requested.
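
The lazy-loading idea can be sketched without real images. The class below is a hypothetical, minimal map-style dataset: it keeps only `(file_name, category_id)` records in memory and calls an injected `loader` function inside `__getitem__`. It deliberately does not subclass `torch.utils.data.Dataset` so that the sketch runs without PyTorch; in the notebook you would inherit from `Dataset` and pass something like `PIL.Image.open` as the loader. The records and the fake loader in the usage lines are placeholders.

```python
from typing import Callable, Dict, List, Tuple


class LazyImageDataset:
    """Minimal map-style dataset: metadata in memory, images loaded on demand."""

    def __init__(self, records: List[Dict], loader: Callable[[str], object]):
        # Only lightweight (path, label) tuples are stored up front
        self.samples: List[Tuple[str, int]] = [
            (r["file_name"], r["category_id"]) for r in records
        ]
        self.loader = loader
        self.loads = 0  # how many times an image has actually been read

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int):
        path, label = self.samples[idx]
        self.loads += 1  # reading happens here, not in __init__
        return self.loader(path), label


# Usage with placeholder records and a fake loader that just upper-cases the path
records = [
    {"file_name": "a.jpg", "category_id": 0},
    {"file_name": "b.jpg", "category_id": 1},
]
ds = LazyImageDataset(records, loader=str.upper)
```

Constructing `ds` reads nothing; the loader only runs when an item is indexed.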

In [None]:
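def random_img(cat, df, exclude=None):
    """Return the path of a random image of category `cat` in `df`, or None if there is none.

    If `exclude` is given and the category has other images, that path is avoided.
    """
    paths = [join(TRAIN, p) for p in df.loc[df["category_id"] == cat, "file_name"]]
    if exclude is not None and len(paths) > 1:
        paths = [p for p in paths if p != exclude]
    return random.choice(paths) if paths else None

In [None]:
def fill(df, cat, sim):
    """Build an image pair: same category if `sim` is 1, a different random category otherwise."""
    first = random_img(cat, df)
    ncat = cat
    while not sim and ncat == cat:
        ncat = random.randrange(CLASSES)   # category ids are 0 .. CLASSES-1
    return [first, random_img(ncat, df, exclude=first)]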
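
The pairing scheme used by the training dataset (odd indices give a similar pair, even indices a dissimilar pair, and `idx // 2` picks the anchor category) can be sketched in plain Python. This hypothetical `make_pair` helper works on a precomputed `paths_by_cat` dictionary (category id mapped to a list of image paths) instead of filtering the DataFrame on every call; that dictionary and the toy paths are assumptions for illustration, not part of the notebook.

```python
import random


def make_pair(idx, paths_by_cat, rng):
    """Map a flat dataset index to (path_a, path_b, label).

    label 1 = same category (odd idx), label 0 = different categories (even idx);
    idx // 2 selects the anchor category.
    """
    sim = idx % 2
    cat = idx // 2
    anchor = rng.choice(paths_by_cat[cat])
    if sim:
        # Prefer a different image of the same class when one exists
        others = [p for p in paths_by_cat[cat] if p != anchor] or [anchor]
        other = rng.choice(others)
    else:
        # Pick any other category uniformly, then one of its images
        other_cat = rng.choice([c for c in paths_by_cat if c != cat])
        other = rng.choice(paths_by_cat[other_cat])
    return anchor, other, sim


paths_by_cat = {0: ["a0.jpg", "a1.jpg"], 1: ["b0.jpg"], 2: ["c0.jpg", "c1.jpg"]}
```

Because `LENGTH = 2*CLASSES`, indices `0 … 2*CLASSES-1` visit every category once as a similar pair and once as a dissimilar pair per epoch. On the real data, a dictionary like `paths_by_cat` could be built once with `train_df.groupby("category_id")["file_name"].apply(list).to_dict()`.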

In [None]:
class HerbariumDataset(Dataset): 

    def __init__(self, Folder, metadata, trans, length, phase):

        self.transform  =  trans
        self.length     =  length if length is not None else len(metadata["images"])
        self.root       =  Folder
        self.metadata   =  metadata
        self.phase      =  phase
        if self.phase == "TRAIN":
            # image id -> category id, and the list of all category ids
            self.img_cat    =  {a["image_id"]: a["category_id"] for a in self.metadata["annotations"]}
            self.classes    =  [c["id"] for c in self.metadata["categories"]]
        # image id -> full path on disk
        self.paths      =  {im["id"] : join(self.root, im["file_name"]) for im in self.metadata["images"]}

    def _rand_another(self, idx):
        # Random replacement index, used when an image pair cannot be loaded
        return np.random.randint(self.length)

    def image(self, x):
        """Open both images of a pair as RGB; return (None, None) if either is missing or unreadable."""
        if x[0] is None or x[1] is None:
            return None, None
        try:
            return openIm(x[0]).convert("RGB"), openIm(x[1]).convert("RGB")
        except OSError:  # covers FileNotFoundError and PIL's UnidentifiedImageError
            print("The image pair {} could not be opened".format(x))
            return None, None

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        idx = idx.tolist() if torch.is_tensor(idx) else idx
        if self.phase == "TRAIN":
            while True:
                sim = idx % 2   # 1: both images from the same category, 0: different categories
                cat = idx // 2  # anchor category
                pair = fill(train_df, cat, sim)
                im, im2 = self.image(pair)
                if im is None or im2 is None:
                    print("\nfrom ", idx, end="")
                    idx = self._rand_another(idx)
                    print(" moving to index:", idx)
                    continue
                if self.transform:
                    im  = self.transform(im)
                    im2 = self.transform(im2)
                return im, im2, sim

        elif self.phase == "TEST":
            # Test images are unlabelled: return one image together with its id
            record = self.metadata["images"][idx]
            im = openIm(self.paths[record["id"]]).convert("RGB")
            if self.transform:
                im = self.transform(im)
            return im, record["id"]
        else:
            # Any other phase: return the first training image of category `idx`
            path = join(TRAIN, train_df.loc[train_df["category_id"] == idx, "file_name"].iloc[0])
            im   = Image.open(path).convert("RGB")
            if self.transform:
                im = self.transform(im)
            return im
            

## DATA AUGMENTATION

***Data augmentation*** is the technique of increasing the amount and variety of training data by applying label-preserving transformations (flips, crops, color changes, and so on) to existing samples. Deep learning models often need a lot of training data to make reliable predictions, and that data is not always available, so the existing data is augmented to help the model generalize better.

Although data augmentation can be applied in various domains, it is most commonly used in computer vision. Some of the most common data augmentation techniques for images are:

1) Position (geometric) augmentation
* Scaling
* Cropping
* Flipping
* Padding
* Rotation
* Translation

2) Color augmentation
* Brightness
* Contrast
* Saturation

In [None]:
class Invert:
    """Transform that inverts the colors of a PIL image (RGB or L mode)."""

    def __call__(self, img):
        return ImageOps.invert(img)

    def __repr__(self):
        return self.__class__.__name__ + '()'

trans = transforms.Compose(
    [Invert(),
     transforms.CenterCrop(400),
     transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.1, hue=0.1),
     transforms.RandomHorizontalFlip(p=0.50),
     transforms.RandomRotation(25),
     transforms.Resize(RESIZE, interpolation=transforms.InterpolationMode.BILINEAR),  # same as the old interpolation=2
     transforms.ToTensor(),
     # ImageNet statistics on the [0, 1] scale produced by ToTensor
     #transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

In [None]:
def torch_loader( dataset, batch_size, num_workers, shuffle):
    return DataLoader(dataset = dataset, batch_size = batch_size, num_workers = num_workers, shuffle = shuffle)

train_dataset = HerbariumDataset(TRAIN, metadata, trans, LENGTH, "TRAIN")
test_dataset  = HerbariumDataset(TEST, metadata_test, trans, None, "TEST")
train_loader  = torch_loader( train_dataset, BATCH_SIZE, NUM_WORKERS, SHUFFLE)
test_loader   = torch_loader( test_dataset, 3*(BATCH_SIZE), 2, False)

**This is what an image looks like before it is fed to the model**

In [None]:
im = train_dataset[127][0]
to_pil = transforms.ToPILImage()   # separate name so the training transform `trans` is not overwritten
im = to_pil(im)
print(np.array(im).shape)
im

## MODEL

> In **standard classification**, the input image is fed through a series of layers, and at the output we generate a probability distribution over all the classes (typically using a softmax). For example, if we are trying to classify an image as a cat, dog, horse, or elephant, then for every input image we generate 4 probabilities, one for each of the 4 classes. Two important points should be noted here. First, during training we require a large number of images for each class (cats, dogs, horses, and elephants). Second, if the network is trained only on these 4 classes, we cannot expect it to handle any other class, for example “zebra”. If we want our model to classify zebra images as well, we first need to collect a lot of zebra images and then retrain the model. In some applications we have neither enough data for each class nor a fixed set of classes, since the total number of classes is huge and keeps changing, so the cost of data collection and periodic retraining is too high.
On the other hand, in **one-shot classification**, we require only **one training example** for each class. Yes, you got that right: just one. Hence the name one-shot.
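
One plausible way to do one-shot classification with an embedding network, such as the one trained later in this notebook, is nearest-neighbour matching: embed the single example of each class, embed the query, and predict the class whose example is closest. The sketch below assumes the embeddings already exist; the 2-D vectors and class names are made up for illustration, and `one_shot_predict` is a hypothetical helper.

```python
import numpy as np


def one_shot_predict(query, support, labels):
    """Return the label whose single support embedding is closest to `query`.

    support: (num_classes, dim) array, one embedding per class.
    labels:  list of class names aligned with the rows of `support`.
    """
    dists = np.linalg.norm(support - np.asarray(query), axis=1)  # Euclidean distance to each class
    return labels[int(np.argmin(dists))]


support = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
labels = ["cat", "dog", "zebra"]
```

Adding a new class such as "zebra" only means appending one row to `support` and one name to `labels`; the network itself is not retrained.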

### Few shot learning
One of the main requirements of highly accurate deep learning models is a large amount of data. A deep model has a very large number of parameters (and a sizeable set of hyperparameters) to tune, and the amount of data needed to find good values for them is also large.

But what if we need an automated system that can successfully classify images into various classes when only a few examples of each class are available?

Few-shot learning is exactly this problem. We can define few-shot learning as the problem of classifying data into K classes when each class has only a few examples. The paper by Koch et al. ("Siamese Neural Networks for One-shot Image Recognition") suggests ideas for building a neural network architecture to solve this problem.

![](https://camo.githubusercontent.com/1d29ae8092dea858f45e4519b7454782df3c9328/68747470733a2f2f656e637279707465642d74626e302e677374617469632e636f6d2f696d616765733f713d74626e3a414e643947635154684d757375386232754b386b4777724673672d63755a58614e38576337486b666779694d2d3859416643664e5f3275694a51)

The above image is taken from the Coursera course on Deep Learning by DeepLearning.ai.

The following passage, quoted from that paper, describes the one-shot setting:

> Machine learning has been successfully used to achieve state-of-the-art performance in a variety of applications such as web search, spam detection, caption generation, and speech and image recognition. However, these algorithms often break down when forced to make predictions about data for which little supervised information is available. We desire to generalize to these unfamiliar categories without necessitating extensive retraining which may be either expensive or impossible due to limited data or in an online prediction setting, such as web retrieval.
>
> One particularly interesting task is classification under the restriction that we may only observe a single example of each possible class before making a prediction about a test instance. This is called one-shot learning and it is the primary focus of our model presented in this work.
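
Few-shot methods, including the one-shot case above, are commonly evaluated on *episodes*: small tasks sampled from the full label set. Below is a minimal sketch of an N-way K-shot episode sampler. Note that the usual naming differs slightly from the text above: `n_way` counts the classes in an episode and `k_shot` the labelled examples per class. The `paths_by_cat` dictionary (category id mapped to image paths), the `sample_episode` helper, and the toy data are assumptions for illustration, not part of the notebook.

```python
import random


def sample_episode(paths_by_cat, n_way, k_shot, n_query, rng):
    """Sample n_way classes, then k_shot support and n_query query items per class.

    Labels are re-indexed 0 .. n_way-1 inside the episode.
    """
    needed = k_shot + n_query
    eligible = [c for c, paths in paths_by_cat.items() if len(paths) >= needed]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for label, cat in enumerate(classes):
        items = rng.sample(paths_by_cat[cat], needed)  # distinct images per class
        support += [(p, label) for p in items[:k_shot]]
        query += [(p, label) for p in items[k_shot:]]
    return support, query


toy = {c: ["{}_{}.jpg".format(c, i) for i in range(5)] for c in range(10)}
ep_support, ep_query = sample_episode(toy, n_way=3, k_shot=1, n_query=2, rng=random.Random(0))
```

With `k_shot=1` this is exactly the one-shot setting: the model sees one labelled image per class and must label the query images.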

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

In [None]:
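import torch.nn as nn
import torch.optim as optim
import torchvision

# ImageNet-pretrained ResNet-50 used as the shared encoder of the siamese network
model = torchvision.models.resnet50(pretrained=True)

# Fine-tune every layer (this is already PyTorch's default)
for param in model.parameters():
    param.requires_grad = True

# Drop the 1000-way classification head: nn.Identity passes the 2048-d pooled features through as the embedding
model.fc = nn.Identity()
model = model.to(device)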

In [None]:
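class ContrastiveLoss(nn.Module):
    """Contrastive loss for siamese training.

    Label convention matches HerbariumDataset: label = 1 for a similar pair
    (same category), label = 0 for a dissimilar pair.
    """

    def __init__(self, margin=2.0):
        super(ContrastiveLoss, self).__init__()
        self.margin = margin

    def forward(self, output1, output2, label):
        # Shape (batch,): with keepdim=True the (batch, 1) distances would broadcast against label into (batch, batch)
        euclidean_distance = F.pairwise_distance(output1, output2)
        label = label.float()
        # Similar pairs are pulled together; dissimilar pairs are pushed at least `margin` apart
        loss_contrastive = torch.mean(label * torch.pow(euclidean_distance, 2) +
                                      (1 - label) * torch.pow(torch.clamp(self.margin - euclidean_distance, min=0.0), 2))

        return loss_contrastive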
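
As a sanity check of the label convention, the contrastive loss can be written as L = mean(y·d² + (1 − y)·max(0, m − d)²), where d is the Euclidean distance between the two embeddings, y = 1 marks a similar pair (the dataset's `sim = 1`), y = 0 a dissimilar pair, and m is the margin. The NumPy version below is a sketch for checking numbers by hand, not a replacement for the PyTorch module; `contrastive_loss` is a hypothetical helper.

```python
import numpy as np


def contrastive_loss(emb1, emb2, y, margin=2.0):
    """Mean contrastive loss; y = 1 for similar pairs, y = 0 for dissimilar pairs."""
    d = np.linalg.norm(np.asarray(emb1, float) - np.asarray(emb2, float), axis=1)
    y = np.asarray(y, float)
    similar_term = y * d ** 2                                     # pull similar pairs together
    dissimilar_term = (1 - y) * np.maximum(margin - d, 0.0) ** 2  # push dissimilar pairs past the margin
    return float(np.mean(similar_term + dissimilar_term))
```

For two pairs at distance 0.5, one similar and one dissimilar, the terms are 0.25 and (2 − 0.5)² = 2.25, so the mean is 1.25; a dissimilar pair already farther apart than the margin contributes nothing. PyTorch's `F.pairwise_distance` adds a small `eps` to the difference, so its values can differ in the last decimals.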

In [None]:
def train_model(model, num_epochs=1):

    criterion  = ContrastiveLoss()
    optimizer  = optim.Adam(model.parameters(), lr=0.01)
    for epoch in range(num_epochs):
        phase = "train"
        print('Epoch {}/{}'.format(epoch+1, num_epochs), end = "\n"+"_" * 75 + "\n")
        model.train()

        running_loss = 0.0
        starttime    = time.time()
        for i, (inputs1, inputs2, labels) in enumerate(train_loader):
            try:
                inputs1 = inputs1.to(device)
                inputs2 = inputs2.to(device)
                labels  = labels.float().to(device)   # 1 = similar pair, 0 = dissimilar pair
                # The same network (shared weights) embeds both images of each pair
                outputs1 = model(inputs1)
                outputs2 = model(inputs2)
                loss = criterion(outputs1, outputs2, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                running_loss += loss.item()
                if i % 100 == 0 and i != 0:
                    print("average loss so far: {:.4f}".format(running_loss / (i + 1)))
            except RuntimeError as err:   # e.g. CUDA out of memory on a batch
                print("ERROR IN BATCH", i, err)
                continue

        epoch_loss = running_loss / len(train_loader)
        print('{} epoch_loss: {:.4f}, time: {:.1f}s'.format(phase, epoch_loss, time.time() - starttime))

        if epoch != 0:  # Due to memory constraints, no checkpoint is saved after the first epoch
            torch.save(model.state_dict(), "/kaggle/working/epoch_{}.pth".format(epoch+1))

    return model

In [None]:
model_trained = train_model(model, num_epochs=EPOCHS)

In [None]:
torch.cuda.empty_cache()