# ROAM Challenge 2: LATAM Out-of-Distribution Few-shot Challenge

## Unzip the Data:

Extract the supportAugust5.tar archive to access the support samples for the first two classes.

Extract the PartialRelease.tar archive to get the initial 5,000 images.
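
The extraction step can also be done from Python rather than the shell. The following is a minimal sketch using the standard-library `tarfile` module; the helper name `extract_archive` and the destination directory are assumptions for illustration, not part of the challenge kit.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        names = tar.getnames()
        # Newer Python versions support extraction filters that reject
        # unsafe members (absolute paths, links pointing outside dest).
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


# Hypothetical usage, assuming both archives sit in the working directory:
# extract_archive("supportAugust5.tar", "data")
# extract_archive("PartialRelease.tar", "data")
```

The function returns the member names so the result can be checked against the expected folder layout before moving on.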


## Inspect the Data:

Look at the support samples and understand the characteristics of the two vehicle classes.

Analyze the distribution of the 5000 images in the initial release. Note any class imbalances that may need to be addressed.


## Prepare the Data for Training:

Set up a data loader that can efficiently load the images and their corresponding labels.

Decide on the image size and preprocessing steps (resizing, normalization, data augmentation, etc.) that you'll apply to the data.
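
As a starting point for the loading step, the sketch below pairs each image path with its label and yields fixed-size batches. It uses only the standard library and assumes a CSV with `id` and `label` columns and `.jpg` files named after the id, as in this challenge's data. The helper names `load_manifest` and `iter_batches` are hypothetical, and image decoding, resizing, and normalization are deliberately left to whichever framework is chosen later.

```python
import csv
from pathlib import Path


def load_manifest(csv_path, image_dir, ext=".jpg"):
    """Return a list of (image_path, label) pairs read from a CSV file.

    Assumes the CSV has 'id' and 'label' columns; any other columns are ignored.
    """
    image_dir = Path(image_dir)
    with open(csv_path, newline="") as f:
        return [
            (image_dir / f"{row['id']}{ext}", int(row["label"]))
            for row in csv.DictReader(f)
        ]


def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical usage:
# pairs = load_manifest("data/support.csv", "data/support_samples")
# for batch in iter_batches(pairs, 4):
#     ...  # decode, resize, and normalize the images here
```

Keeping the manifest separate from the decoding logic makes it easy to swap preprocessing choices without touching the file bookkeeping.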

In [1]:
from IPython.display import IFrame
src="https://sebastianraschka.com/faq/docs/few-shot.html#:~:text=In%20regular%20supervised%20learning%2C%20we,task%20consists%20of%20different%20classes"
width=920
height=1080
IFrame(src, width, height)

reference: "https://encord.com/blog/few-shot-learning-in-computer-vision/"

Various subsets of the ImageNet dataset are used in different contexts. One of the most widely used is the "ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012–2017 image classification and localization dataset". The research literature also refers to it as ImageNet-1K or ILSVRC2017, reflecting the original ILSVRC challenge that involved 1,000 classes. ImageNet-1K contains 1,281,167 training images, 50,000 validation images, and 100,000 test images. The full original dataset is referred to as ImageNet-21K and contains 14,197,122 images divided into 21,841 classes. Some papers round this up and call it ImageNet-22K.

In [2]:
import sys
import os
import pathlib
import pandas as pd

In [3]:
data_path = "/home/shravan/documents/deeplearning/github/ComputerVision-Research/finetuning/roam2/data"

In [4]:
data_path1 = pathlib.Path(data_path)

In [5]:
os.listdir(data_path1)

['.ipynb_checkpoints',
 'support_samples',
 'solutionAugust5.csv',
 'support.csv',
 'evaluation_data',
 'supportAugust5.tar',
 'PartialRelease.tar']

In [6]:
images_list = os.listdir(data_path1 / "evaluation_data")

In [7]:
os.listdir(f"{data_path}/support_samples")

['1a3aa65100a643bf90d237b9be588b76.jpg',
 '508435b119d04b18b4ca4b68131e375b.jpg',
 '4c643a9bf86845dcba3a3058d31e89ed.jpg',
 '160519cec29849c780ac2ba958ab52ca.jpg',
 '95aa8f477e5b44379486a7b5dd33b118.jpg',
 'cdbc673d9eab4bb1a7edc5cd804588d3.jpg',
 '05b75a8e417e4ee79dcdd4f04660c53c.jpg',
 'ccaf12198649464da6c48caf3b0763b2.jpg',
 '6c303e6c344d4007ba3aceddf5987a1f.jpg',
 'e33b3abe21af47b5af93b92d41e383a9.jpg',
 '8e7a5214307244218ae7a87d675d6358.jpg',
 '2c2c94a8ce304405a15df01c853f854b.jpg']

In [8]:
len(images_list)

5000

In [9]:
df_support = pd.read_csv(data_path1/"support.csv")

In [10]:
df_support

,id,label
0,cdbc673d9eab4bb1a7edc5cd804588d3,1
1,160519cec29849c780ac2ba958ab52ca,1
2,05b75a8e417e4ee79dcdd4f04660c53c,1
3,508435b119d04b18b4ca4b68131e375b,1
4,e33b3abe21af47b5af93b92d41e383a9,1
5,8e7a5214307244218ae7a87d675d6358,1
6,95aa8f477e5b44379486a7b5dd33b118,2
7,6c303e6c344d4007ba3aceddf5987a1f,2
8,2c2c94a8ce304405a15df01c853f854b,2
9,ccaf12198649464da6c48caf3b0763b2,2


In [11]:
df_support.groupby('label').count()

label,id
1,6
2,6


In [12]:
df_solutionAugust5 = pd.read_csv(data_path1/"solutionAugust5.csv")

In [13]:
df_solutionAugust5

,id,split,label
0,afc50dc671ea44fb8375b560c8019b43,public,0
1,621af6f5776541c78bf344b177bdb7ad,public,0
2,1287bddbad1c47e79965dfb5458b8098,public,0
3,a735c1ba09cb47f8be1f21cbdb95c84e,public,0
4,b6142dd35c4e4a888b9fb835c38cb6e2,public,0
...,...,...,...
4995,01458799bc6e4005bc6988abfb710317,public,0
4996,063ac9587b0f46bcae1dd0e5599beebc,public,0
4997,97c7c29232fd496a8e5dcf2469c6e864,public,0
4998,3b542a9f862b4e32b572fd467b7d6cfb,public,0


In [14]:
df_solutionAugust5.groupby('label').count()

label,id,split
0,4922,4922
1,43,43
2,35,35
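
The counts above show a severe imbalance: label 0 dominates, so a model that always predicts 0 already scores very high accuracy. The sketch below quantifies this and derives inverse-frequency class weights using the common "balanced" heuristic, `total / (n_classes * count)`. Applying such weights in a loss function is an assumption about how the imbalance might be handled, not something the challenge prescribes.

```python
# Label counts copied from the groupby output above.
counts = {0: 4922, 1: 43, 2: 35}

total = sum(counts.values())
n_classes = len(counts)

# Accuracy of a trivial classifier that always predicts the majority label.
majority_accuracy = max(counts.values()) / total

# "Balanced" weights: rare classes get proportionally larger weights.
weights = {label: total / (n_classes * n) for label, n in counts.items()}

print(f"majority-class accuracy: {majority_accuracy:.4f}")
for label, w in weights.items():
    print(f"label {label}: weight {w:.4f}")
```

Because the majority baseline is so strong, per-class recall or a macro-averaged metric is likely to be more informative than plain accuracy when comparing models here.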


In [15]:
df_ideal_solution = df_solutionAugust5.copy()

In [19]:
df_ideal_solution[['id', 'label']].to_csv('submission.csv', index=False)

In [34]:
df_solutionAugust5['id'].sort_values()

3656    0025727670db447db2165a0344fbd0bd
3001    00298b05d29744b79b04e1c82a28fde0
740     002b378a1c3b4e6588665f8cec07ff76
4023    0031a76a62e8455b921d4a466197ebf8
3695    00419bdf989d4c03b6a7ebee5ebab1dc
                      ...               
2277    ff9666bf561f4d4c9b4b88efb042d240
1141    ffa1c96086614f38b9f78f9979942d22
3299    ffa61769c041402aa17c841b021e3e5c
2186    ffa6a48a520f458d9df707be69154993
1784    ffdc9f9e9d4a443e99988891e72fe890
Name: id, Length: 5000, dtype: object
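
Rather than comparing the sorted IDs and the sorted filenames by eye, the match between solutionAugust5.csv and the evaluation_data folder can be checked with set operations. This is a minimal sketch; `compare_ids` is a hypothetical helper, and it assumes each image is named `<id>.jpg`, as the listings suggest.

```python
from pathlib import Path


def compare_ids(csv_ids, filenames):
    """Return (ids with no image file, image files with no id), both sorted."""
    id_set = set(csv_ids)
    # Strip the extension so '0025...bd.jpg' compares equal to '0025...bd'.
    stem_set = {Path(name).stem for name in filenames}
    return sorted(id_set - stem_set), sorted(stem_set - id_set)


# Hypothetical usage with the notebook's variables:
# missing, extra = compare_ids(df_solutionAugust5["id"], images_list)
# Both lists being empty would mean every ID has exactly one matching image.
```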

In [35]:
sorted(images_list)

['0025727670db447db2165a0344fbd0bd.jpg',
 '00298b05d29744b79b04e1c82a28fde0.jpg',
 '002b378a1c3b4e6588665f8cec07ff76.jpg',
 '0031a76a62e8455b921d4a466197ebf8.jpg',
 '00419bdf989d4c03b6a7ebee5ebab1dc.jpg',
 '004bae447d3b41e0acf6da71b472cf81.jpg',
 '00577300968d42fcb0e037ea12ce6b76.jpg',
 '00709bc2d0ee469e91299c542e662ca1.jpg',
 '008bf903d52745faa0fb04a13fed8ff9.jpg',
 '00a12f62b6904966b3444f7006bde980.jpg',
 '00bfdca15f0d4f69b0c9ac96970e9b83.jpg',
 '00c48ab4577d4f188393530b90d7562d.jpg',
 '00d11f4622c0438db46d98754e830c03.jpg',
 '00d9c473f7ff41c7aca626361b279c55.jpg',
 '00e1a490f3dc4bfca7abd5d18f17cde0.jpg',
 '00e436d15e714879ab13d15666f36427.jpg',
 '00f5f3c51ce1464494c9ed22bde72649.jpg',
 '0102913a6ffc4225940399ff839d930b.jpg',
 '01072e7feb554c1494cb961f4e926a17.jpg',
 '0107c12d02254114aff81cf806599864.jpg',
 '01128a3573f845519bad14e875f67750.jpg',
 '012819d1a7c242238ef85edd588c9c8f.jpg',
 '01458799bc6e4005bc6988abfb710317.jpg',
 '0164d9bf96c4423384fcb57d8a10a341.jpg',
 '0174c79907444a