<a href="https://colab.research.google.com/github/srishti-git1110/Lets-go-deep-with-PyTorch/blob/main/Dataset_and_DataLoader_blog.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
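!pip install -q kaggle
!mkdir -p ~/.kaggle

# kaggle.json is the API token downloaded from your Kaggle account settings
!cp kaggle.json ~/.kaggle/
!ls ~/.kaggle
# Make the token readable only by you; the Kaggle CLI warns otherwise
!chmod 600 ~/.kaggle/kaggle.json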

kaggle.json
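The same credential setup can also be done from plain Python, for example in an environment without `!` shell escapes. This is a stdlib-only sketch. `install_kaggle_token` is a hypothetical helper, and `config_dir` defaults to `~/.kaggle`, the directory used by the commands above.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, config_dir=None):
    """Hypothetical helper: copy kaggle.json into config_dir and restrict it to mode 600."""
    config_dir = Path(config_dir) if config_dir else Path.home() / '.kaggle'
    config_dir.mkdir(parents=True, exist_ok=True)   # like `mkdir -p ~/.kaggle`
    dest = config_dir / 'kaggle.json'
    shutil.copy(token_path, dest)                   # like `cp kaggle.json ~/.kaggle/`
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)     # like `chmod 600` (owner read/write only)
    return dest


# Example, assuming kaggle.json was uploaded to the current working directory:
# install_kaggle_token('kaggle.json')
```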


# Getting the dataset from Kaggle

Here's the link if you want to check it out: https://www.kaggle.com/datasets/lefterislymp/neuralsntua-image-captioning

In [2]:
!kaggle datasets download -d lefterislymp/neuralsntua-image-captioning

Downloading neuralsntua-image-captioning.zip to /content
100% 4.07G/4.08G [01:06<00:00, 56.2MB/s]
100% 4.08G/4.08G [01:06<00:00, 66.0MB/s]


In [4]:
!unzip /content/neuralsntua-image-captioning.zip

Streaming output truncated to the last 5000 lines.
  inflating: flickr30k-images-ecemod/image_dir/_694194816.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_69432510.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_694331092.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_69441488.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_694421819.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_694490980.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_694543955.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_694560000.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_694573726.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_694678960.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_694698435.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_69470505.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_694750680.jpg  
  inflating: flickr30k-images-ecemod/image_dir/_694757895.jpg  
  inflating: flickr30k-images-ecemod/image
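`unzip` prints one line per extracted file, which is why the output above had to be truncated. Passing `-q` to `unzip` silences it; alternatively, Python's standard `zipfile` module can extract the archive without any per-file output. A minimal sketch, with `extract_zip` as a hypothetical helper name:

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Hypothetical helper: extract every member of zip_path into dest_dir.

    Returns the number of regular files (not directories) in the archive.
    """
    with zipfile.ZipFile(zip_path) as zf:
        members = [m for m in zf.infolist() if not m.is_dir()]
        zf.extractall(dest_dir)  # creates dest_dir and sub-folders as needed
    return len(members)


# Example for this notebook (paths as used above):
# n_files = extract_zip('/content/neuralsntua-image-captioning.zip', '/content')
# print(f'extracted {n_files} files')
```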

# PyTorch makes deep learning easier and highly accessible, so we "depend" on it a lot.
So, here are some dependencies:

In [5]:
import torch.nn as nn
import torch
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from transformers import AutoTokenizer  # used by the Dataset class to tokenize captions

from PIL import Image
import os
import pandas as pd

**Custom Dataset class**

In [24]:
class KaggleImageCaptioningDataset(Dataset):
  def __init__(self, train_captions, root_dir, transform=None, bert_model='distilbert-base-uncased', max_len=512):
    # The captions file is '|'-separated with no header row:
    # column 0 holds the image file name, column 2 the caption text.
    self.df = pd.read_csv(train_captions, header=None, sep='|')
    self.root_dir = root_dir
    self.transform = transform
    self.tokenizer = AutoTokenizer.from_pretrained(bert_model)
    self.max_len = max_len

    self.images = self.df.iloc[:,0]
    self.captions = self.df.iloc[:,2]

  def __len__(self):
    # One sample per row, i.e. per caption
    return len(self.df)

  def __getitem__(self, idx):
    # Positional lookup, so it works regardless of the DataFrame's index labels
    caption = self.captions.iloc[idx]
    image_id = self.images.iloc[idx]
    path_to_image = os.path.join(self.root_dir, image_id)
    image = Image.open(path_to_image).convert('RGB')

    if self.transform is not None:
      image = self.transform(image)

    tokenized_caption = self.tokenizer(caption,
                                       padding='max_length',  # pad to max_length
                                       truncation=True,       # truncate to max_length
                                       max_length=self.max_len,
                                       return_tensors='pt')['input_ids']
    # return_tensors='pt' gives shape (1, max_len); drop the extra dim so batches are (B, max_len)
    tokenized_caption = tokenized_caption.squeeze(0)

    return image, tokenized_caption
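The class only needs `__init__`, `__len__` and `__getitem__`: that is PyTorch's map-style dataset protocol, where `DataLoader` asks for `len(dataset)` and then fetches samples by integer index. The sketch below reproduces that shape without the Kaggle files or a Hugging Face download. Everything in it is hypothetical: two made-up rows in the same `'|'`-separated, header-less layout, a toy word-index vocabulary standing in for `AutoTokenizer` (pad id 0), and zero tensors standing in for loaded images.

```python
import io

import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset

# Hypothetical rows: column 0 = image file name, column 1 = unused here, column 2 = caption.
CSV_TEXT = "img_a.jpg|0|a dog runs\nimg_b.jpg|0|two people sit on a bench\n"


class ToyCaptionDataset(Dataset):
    def __init__(self, csv_file, max_len=6):
        self.df = pd.read_csv(csv_file, header=None, sep='|')
        self.images = self.df.iloc[:, 0]
        self.captions = self.df.iloc[:, 2]
        self.max_len = max_len
        # Toy vocabulary built from the captions; id 0 is reserved for padding.
        words = sorted({w for c in self.captions for w in c.split()})
        self.vocab = {w: i + 1 for i, w in enumerate(words)}

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        caption = self.captions.iloc[idx]
        ids = [self.vocab[w] for w in caption.split()][: self.max_len]  # truncate
        ids += [0] * (self.max_len - len(ids))                           # pad
        image = torch.zeros(3, 224, 224, dtype=torch.uint8)  # stand-in for a loaded image
        return image, torch.tensor(ids)


ds = ToyCaptionDataset(io.StringIO(CSV_TEXT))
print(len(ds))  # 2

images, captions = next(iter(DataLoader(ds, batch_size=2)))
print(images.shape, captions.shape)  # torch.Size([2, 3, 224, 224]) torch.Size([2, 6])
```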

# Let's load the data with the mighty DataLoader

In [27]:
root_dir = '/content/flickr30k-images-ecemod/image_dir'
train_captions = '/content/train_captions.csv'
bert_model = 'distilbert-base-uncased'
# Resize the shorter side to 256, crop the central 224x224 patch, then convert to a tensor.
# PILToTensor keeps the raw uint8 pixel values (0-255); it does not scale them to [0, 1].
transform = transforms.Compose([transforms.Resize(256),
                                transforms.CenterCrop(224),
                                transforms.PILToTensor()])
train_dataset = KaggleImageCaptioningDataset(train_captions=train_captions,
                                       root_dir=root_dir,
                                       transform=transform,
                                       bert_model=bert_model)
train_loader = DataLoader(train_dataset, 
                          batch_size=64, 
                          num_workers=2, 
                          shuffle=True)

**Hoping everything went right...**

In [30]:
for batch_num, (image, caption) in enumerate(train_loader):
  if batch_num > 3:
    break
  print(f'batch {batch_num} image {image} tokenized caption {caption}')

batch 0 image tensor([[[[152, 149, 151,  ..., 255, 255, 254],
          [154, 158, 161,  ..., 255, 255, 255],
          [155, 162, 164,  ..., 255, 255, 255],
          ...,
          [211, 211, 212,  ..., 186, 182, 188],
          [212, 213, 213,  ..., 192, 189, 193],
          [215, 216, 215,  ..., 199, 194, 198]],

         [[ 90,  89,  92,  ..., 255, 255, 255],
          [ 90,  94,  95,  ..., 255, 255, 255],
          [ 90,  96,  96,  ..., 255, 255, 255],
          ...,
          [146, 146, 146,  ..., 112, 109, 115],
          [146, 146, 146,  ..., 118, 116, 120],
          [151, 151, 153,  ..., 130, 125, 128]],

         [[ 51,  49,  55,  ..., 255, 255, 255],
          [ 49,  53,  56,  ..., 255, 255, 255],
          [ 47,  54,  54,  ..., 255, 255, 255],
          ...,
          [116, 115, 113,  ...,  69,  67,  74],
          [114, 114, 114,  ...,  76,  75,  78],
          [116, 116, 121,  ...,  91,  86,  88]]],


        [[[ 49,  39,  20,  ...,  34,  30,  28],
          [ 71,  40, 

**IT DID!!!**

Don't forget to experiment with a more complicated dataset.
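For quick experiments with batching, a synthetic `TensorDataset` is enough. In this sketch (10 made-up samples), `batch_size=4` yields batches of 4, 4 and 2, `drop_last=True` discards the short final batch, and the last lines show that a per-sample tensor of shape `(1, L)`, which is what a tokenizer returns with `return_tensors='pt'`, gets batched as `(B, 1, L)`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 synthetic samples: uint8 "images" of shape (3, 8, 8) and token-id rows of length 5.
images = torch.zeros(10, 3, 8, 8, dtype=torch.uint8)
captions = torch.arange(50).reshape(10, 5)
dataset = TensorDataset(images, captions)

# The default collate function stacks samples along a new first (batch) dimension.
loader = DataLoader(dataset, batch_size=4, shuffle=True)
batch_sizes = [imgs.shape[0] for imgs, caps in loader]
print(len(loader), batch_sizes)  # 3 [4, 4, 2]

# drop_last=True discards the incomplete final batch.
loader_drop = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)
print(len(loader_drop))  # 2

imgs, caps = next(iter(loader))
print(imgs.shape, caps.shape)  # torch.Size([4, 3, 8, 8]) torch.Size([4, 5])

# Samples shaped (1, 5), like a tokenizer output with return_tensors='pt',
# are stacked into (B, 1, 5) rather than (B, 5).
wrapped = TensorDataset(captions.unsqueeze(1))
(batch,) = next(iter(DataLoader(wrapped, batch_size=4)))
print(batch.shape)  # torch.Size([4, 1, 5])
```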

# That's how the amazing 🤗 tokenizers work!

In [7]:
!pip install transformers
from transformers import AutoTokenizer

bert_model = 'distilbert-base-uncased'    # use any model of your choice
tokenizer = AutoTokenizer.from_pretrained(bert_model)
tokenizer('hi how are you')

{'input_ids': [101, 7632, 2129, 2024, 2017, 102], 'attention_mask': [1, 1, 1, 1, 1, 1]}
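In the dataset class the tokenizer is called with `padding='max_length'` and `truncation=True`, so every caption comes out exactly `max_len` ids long. Below is a rough pure-Python sketch of that behaviour for a single sentence. It reuses the `[CLS]` (101) and `[SEP]` (102) ids from the output above and assumes `[PAD]` is id 0, as in the BERT uncased vocabulary; `pad_and_truncate` is hypothetical, and the real tokenizer supports more options (sentence pairs, other truncation strategies).

```python
# Assumed special-token ids: [CLS]=101 and [SEP]=102 (seen in the output above), [PAD]=0.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pad_and_truncate(word_ids, max_length):
    """Hypothetical helper mimicking padding='max_length' + truncation=True for one sentence.

    Assumes max_length >= 2 so there is room for [CLS] and [SEP].
    """
    body = word_ids[: max_length - 2]       # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)   # 1 = real token
    n_pad = max_length - len(input_ids)
    return {'input_ids': input_ids + [PAD_ID] * n_pad,
            'attention_mask': attention_mask + [0] * n_pad}  # 0 = padding


words = [7632, 2129, 2024, 2017]  # 'hi how are you' from the output above, without [CLS]/[SEP]
print(pad_and_truncate(words, 8))
# {'input_ids': [101, 7632, 2129, 2024, 2017, 102, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 0, 0]}
print(pad_and_truncate(words, 4))
# {'input_ids': [101, 7632, 2129, 102], 'attention_mask': [1, 1, 1, 1]}
```

With the class's default `max_len=512`, this four-word sentence would be 506 padding ids and only 6 real ones, so a smaller `max_len` may be worth trying for short captions.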

**Thanks for going through my notebook. I hope to see you in a new PyTorch blog of mine!** 👋