Download the COCO 2017 dataset from https://cocodataset.org/

Also, note that the non-LoRA finetuned versions of BLIP2 models are around 15GB each, whereas for a model finetuned using LoRA only the adapter weights are stored, which are much smaller than the full model weights. When we load the LoRA-finetuned model, we first load the base model (from the HF cache if it already exists there; otherwise it is downloaded from the web and cached) and then apply the saved adapter weights on top of it.
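As a rough sanity check on the sizes mentioned above, the sketch below estimates checkpoint sizes from the parameter counts printed in the BLIP2 training log further down in this notebook (83,886,080 trainable out of 3,828,566,016 total). It assumes 4 bytes per parameter (fp32) and that the trainable parameters are exactly the LoRA adapter weights. Both are assumptions, and `checkpoint_size_gb` is an illustrative helper, not part of the project scripts.

```python
# Parameter counts copied from the BLIP2 training log in this notebook.
TRAINABLE_PARAMS = 83_886_080   # assumed to be the LoRA adapter weights
ALL_PARAMS = 3_828_566_016      # base model plus adapter

BYTES_PER_PARAM = 4  # assumption: weights stored as 32-bit floats


def checkpoint_size_gb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Approximate on-disk size in GB (10**9 bytes), ignoring file metadata."""
    return num_params * bytes_per_param / 1e9


base_params = ALL_PARAMS - TRAINABLE_PARAMS
full_gb = checkpoint_size_gb(base_params)
adapter_gb = checkpoint_size_gb(TRAINABLE_PARAMS)
trainable_pct = 100 * TRAINABLE_PARAMS / ALL_PARAMS

print(f"full model  : ~{full_gb:.1f} GB")
print(f"LoRA adapter: ~{adapter_gb:.2f} GB")
print(f"trainable   : {trainable_pct:.2f}% of all parameters")
```

Under these assumptions the full model comes to roughly 15 GB and the adapter to roughly a third of a GB, which is consistent with the figures quoted in the note.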

**Set the paths to the dataset and to the directory for saving temporary files.**

Set the CUDA_VISIBLE_DEVICES environment variable as well.

In [None]:
%env DATASET_PATH=/NLP - Project/COCO 2017 NLP/
%env SAVE_DIR=/nlp_project/
%env CUDA_VISIBLE_DEVICES=0
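The scripts run below presumably pick these variables up from the environment. A minimal sketch of how a script could read them defensively is shown here. `read_path_from_env` is a hypothetical helper written for illustration, and the project's own scripts may read the variables differently.

```python
import os
from pathlib import Path


def read_path_from_env(name: str) -> Path:
    """Return the path stored in environment variable `name`.

    Raises a clear error if the variable is unset or empty, instead of
    silently falling back to some default location.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return Path(value)
```

For example, `read_path_from_env("DATASET_PATH")` returns the dataset directory as a `Path`, and fails early with a readable message if the cell above was skipped.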

**Generate training and validation data**

In [2]:
!python gen_data_from_COCO.py

___________Working on train data_________________
[{'image_id': 558840, 'bbox': [199.84, 200.46, 77.71, 70.88], 'category_id': 58}, {'image_id': 200365, 'bbox': [234.22, 317.11, 149.39, 38.55], 'category_id': 58}, {'image_id': 200365, 'bbox': [239.48, 347.87, 160.0, 57.81], 'category_id': 58}, {'image_id': 200365, 'bbox': [296.65, 388.33, 1.03, 0.0], 'category_id': 58}, {'image_id': 200365, 'bbox': [251.87, 333.42, 125.94, 22.71], 'category_id': 58}, {'image_id': 495357, 'bbox': [337.02, 244.46, 66.47, 66.75], 'category_id': 18}, {'image_id': 116061, 'bbox': [213.81, 192.39, 53.94, 70.28], 'category_id': 18}, {'image_id': 16164, 'bbox': [324.66, 247.92, 250.87, 181.02], 'category_id': 18}, {'image_id': 205350, 'bbox': [260.18, 252.76, 67.91, 53.3], 'category_id': 18}, {'image_id': 74, 'bbox': [61.87, 276.25, 296.42, 103.18], 'category_id': 18}]
Time taken:  0.31240365902582806  minutes.
Written 860001 annotations to file
Done. Time taken:  0.4595642566680908  minutes.
Written 201358 an

**Finetune the classification model.**

Inputs: Image + object name (as text) | Output: Classification label

We'll be using the ViltForQuestionAnswering model.

In [3]:
!python classification_train.py

Available processors list: {64}
All required directories exist of COCO dataset.
Loading train data dictionary............
Len of dic: 171
Encoding data: 100%|█████████████████████████| 171/171 [00:01<00:00, 130.27it/s]
Saving data_encoded to disk.... at /scratch/efk7cz/nlp_project/data_generation/saved_as_pth/classification/classification_train_-1.pth
Done saving!
Some weights of ViltForQuestionAnswering were not initialized from the model checkpoint at dandelin/vilt-b32-mlm and are newly initialized: ['classifier.1.bias', 'classifier.0.weight', 'classifier.1.weight', 'classifier.0.bias', 'classifier.3.weight', 'classifier.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Training for 2 epochs.............
Epoch 0: 100%|█████████████████████████| 6/6 [00:03<00:00,  1.51it/s, loss=5.56]
Epoch 0, Loss: 6.0210206508636475
Epoch 1: 100%|█████████████████████████| 6/6 [00:01<00:00,  3.69it/s, loss=4.61]
Epoch 1, Loss: 4.

**Load the saved classification model, run it on the validation data, and save the results to a CSV file.**

In [4]:
!python classification_val.py

Available processors list: {64}
All required directories exist of COCO dataset.
Loading dictionary............
Len of dic: 19
Dataset({
    features: ['row_id', 'image_id', 'image_path', 'positionName', 'categoryName'],
    num_rows: 19
})
Map: 100%|███████████████████████████████| 19/19 [00:01<00:00, 13.18 examples/s]
Processed dataset:
Dataset({
    features: ['image', 'question', 'answer'],
    num_rows: 19
})
Running Inference:: 100%|███████████████████████| 19/19 [00:01<00:00, 14.66it/s]
Validation Accuracy: 0.15789473684210525
---------Saved the results at classification_results.csv-----------
Total time taken: 0:00:09.051300
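The reported validation accuracy of about 0.158 corresponds to 3 correct predictions out of the 19 validation rows. A minimal sketch of that exact-match computation follows, assuming predictions and labels are compared as plain strings. The lists are made-up placeholders, not the contents of classification_results.csv, and `accuracy` is an illustrative helper rather than the function used in classification_val.py.

```python
def accuracy(predictions, labels):
    """Fraction of positions where the prediction equals the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


# Placeholder data: 19 rows, of which the first 3 are predicted correctly.
labels = ["a"] * 19
predictions = ["a"] * 3 + ["b"] * 16
print(accuracy(predictions, labels))
```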


**Now moving on to the question-answering model, which generates text.**

Inputs: Image + Question (text) | Output: Generated Answer (text)

We'll be finetuning the BLIP2 model using LoRA.

The model itself takes around 15GB on the GPU, so make sure the GPU you are using has at least 32GB of memory to run the training code.
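Before launching the training script, it can help to confirm how much memory the visible GPU has. The sketch below uses PyTorch's `torch.cuda.get_device_properties`. The helper name `gpu_total_memory_gb` and the 32 GB threshold constant are illustrative, and the check only reports total memory, not how much is currently free.

```python
from typing import Optional

REQUIRED_GB = 32  # threshold taken from the note above


def gpu_total_memory_gb(device_index: int = 0) -> Optional[float]:
    """Return total memory of a CUDA device in GB, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device_index)
    return props.total_memory / 1e9


total = gpu_total_memory_gb()
if total is None:
    print("No CUDA device detected (or PyTorch is not installed).")
elif total < REQUIRED_GB:
    print(f"GPU has {total:.1f} GB; training may run out of memory.")
else:
    print(f"GPU has {total:.1f} GB; this should be enough for training.")
```

With CUDA_VISIBLE_DEVICES=0 set as above, device index 0 refers to the single visible GPU.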

In [5]:
!python answer_generation_train.py

Available processors list: {64}
All required directories exist of COCO dataset.
Loading train data dictionary............
Len of dic: 1186
Encoding image data: 100%|████████████████| 1186/1186 [00:01<00:00, 1175.21it/s]
Saving data_encoded to disk.... at /scratch/efk7cz/nlp_project/data_generation/saved_as_pth/answer_generation/answer_generation_train_-1.pth
Done saving!

 Loading the model..........
Downloading shards: 100%|███████████████████████| 8/8 [00:00<00:00, 3219.89it/s]
Loading checkpoint shards: 100%|██████████████████| 8/8 [00:03<00:00,  2.01it/s]
trainable params: 83,886,080 || all params: 3,828,566,016 || trainable%: 2.191057425924767

Training for 2 epochs.........
Epoch 0: 100%|██████████████████████| 38/38 [00:47<00:00,  1.25s/it, loss=0.764]
Epoch 0, Loss: 2.523900082236842
Epoch 1: 100%|███████████████████████| 38/38 [00:45<00:00,  1.19s/it, loss=0.58]
Epoch 1, Loss: 0.5673057154605263
Figure(1000x500)
Saved model to disk after final epoch !!

Inference check on samp

**Load the saved QA model, run it on the validation data, and save the results to a CSV file.**

In [6]:
!python answer_generation_val.py

Available processors list: {64}
All required directories exist of COCO dataset.
Loading val data dictionary............
Len of dic: 112
Encoding image data: 100%|██████████████████| 112/112 [00:00<00:00, 1012.96it/s]
Saving data_encoded to disk.... at /scratch/efk7cz/nlp_project/data_generation/saved_as_pth/answer_generation/answer_generation_val_-1.pth
Done saving!

 Loading the model..........
Downloading shards: 100%|███████████████████████| 8/8 [00:00<00:00, 1838.70it/s]
Loading checkpoint shards: 100%|██████████████████| 8/8 [00:06<00:00,  1.19it/s]
Running Inference:: 100%|█████████████████████| 112/112 [02:58<00:00,  1.59s/it]
---------Saved the results at answer_generation_results.csv-----------
Total time taken: 0:03:19.094545


**To evaluate the generated results, see the intrinsic_evaluation.ipynb and extrinsic_evaluation.ipynb notebooks.**

**Delete the contents of SAVE_DIR**

In [7]:
!rm -rf $SAVE_DIR/*
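Note that if SAVE_DIR is unset or empty when the command above runs, the shell expands it to `rm -rf /*`. A more defensive alternative, sketched in Python, is given below. `clear_directory` is a hypothetical helper, not part of the project, and unlike the shell glob it also removes hidden entries.

```python
import os
import shutil
from pathlib import Path


def clear_directory(path_str: str) -> int:
    """Delete everything inside `path_str` but keep the directory itself.

    Returns the number of top-level entries removed. Refuses to run on an
    empty path or on the filesystem root.
    """
    if not path_str or not path_str.strip():
        raise ValueError("Refusing to clear: path is empty")
    path = Path(path_str).resolve()
    if path == Path(path.anchor):
        raise ValueError("Refusing to clear the filesystem root")
    if not path.is_dir():
        raise ValueError(f"Not a directory: {path}")
    removed = 0
    for entry in path.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real subdirectory: remove recursively
        else:
            entry.unlink()        # regular file or symlink
        removed += 1
    return removed
```

A call such as `clear_directory(os.environ.get("SAVE_DIR", ""))` then raises an error instead of deleting anything when the variable is missing.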

Remove the initially set environment variables.

In [8]:
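# Remove environment variables. %env has no unset option (%env -u NAME
# sets a variable called "-u" instead), so remove them via os.environ.
import os
os.environ.pop("DATASET_PATH", None)
os.environ.pop("SAVE_DIR", None)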
