Traning with custom dataset #45

Closed
gushengzhao1996 opened this issue Jun 2, 2022 · 19 comments
Labels
question Further information is requested

Comments

@gushengzhao1996

gushengzhao1996 commented Jun 2, 2022

Hi, thanks for your code, it helps me a lot.
But I also ran into some problems as a newbie.
Although the code runs successfully now, I had to make several compromises to work around some errors.
I combined the code from classical_training.ipynb and my_first_few_shot_classifier.ipynb.

I will post all my code step by step and point out the problems I ran into.
I am running Windows 10.
The environment was created with Anaconda.
CUDA 10.2, cuDNN 7.0, PyTorch 1.10.1

Finally, many thanks for your code again.
Let's discuss this together.

@gushengzhao1996 gushengzhao1996 added the question Further information is requested label Jun 2, 2022
@gushengzhao1996
Author

gushengzhao1996 commented Jun 2, 2022

Step 1 Import lib and load datasets

I use the method from PyTorch's official transfer learning tutorial.

import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
from torchvision.models import resnet18
from tqdm import tqdm

from easyfsl.samplers import TaskSampler
from easyfsl.utils import plot_images, sliding_average

from pathlib import Path
import random
from statistics import mean
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import time
import os
import copy

random_seed = 0
np.random.seed(random_seed)
torch.manual_seed(random_seed)
random.seed(random_seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

transform = transforms.Compose([
             transforms.Resize((224, 224)),
             transforms.ToTensor(),
             transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
            ])

data_dir = './competition data/data/'

train_set = datasets.ImageFolder(os.path.join(data_dir, 'train'), transform = transform)
val_set = datasets.ImageFolder(os.path.join(data_dir, 'val'), transform = transform)

# Read the label lists; pd.read_csv accepts a path, so no file handle is left open.
df_train = pd.read_csv("./competition data/train_set.csv")
list_train = list(df_train.to_records(index=False))

df_val = pd.read_csv("./competition data/val_set.csv")
list_val = list(df_val.to_records(index=False))

class_names = train_set.classes

@gushengzhao1996
Author

gushengzhao1996 commented Jun 2, 2022

Step 2 data loader

batch_size = 64
n_workers = 0

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

train_loader = DataLoader(
    train_set,
    batch_size=batch_size,
    num_workers=n_workers,
    pin_memory=True,
    shuffle=True,
)

n_way = 5
n_shot = 1
n_query = 1
n_validation_tasks = 500

val_set.get_labels = lambda: [
    instance[1] for instance in list_val
]

val_sampler = TaskSampler(
    val_set, n_way=n_way, n_shot=n_shot, n_query=n_query, n_tasks=n_validation_tasks
)

val_loader = DataLoader(
    val_set,
    batch_sampler=val_sampler,
    num_workers=n_workers,
    pin_memory=True,
    collate_fn = val_sampler.episodic_collate_fn,
)

In step 2, I ran into two problems.
The first is num_workers. It breaks any iteration over the loader, such as for data in val_loader: or enumerate(val_loader).
The error message is shown below. My workaround is simple: set num_workers = 0.
But I don't think it's the best solution.
BTW, this problem also occurred when I was trying classical_training.ipynb.
It may be caused by the environment.

---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_27300\744700885.py in <module>
----> 1 for data in val_loader:
      2     print(data)

~\Anaconda3\envs\pytorch37\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
    357             return self._iterator
    358         else:
--> 359             return self._get_iterator()
    360 
    361     @property

~\Anaconda3\envs\pytorch37\lib\site-packages\torch\utils\data\dataloader.py in _get_iterator(self)
    303         else:
    304             self.check_worker_number_rationality()
--> 305             return _MultiProcessingDataLoaderIter(self)
    306 
    307     @property

~\Anaconda3\envs\pytorch37\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
    916             #     before it starts, and __del__ tries to join but will get:
    917             #     AssertionError: can only join a started process.
--> 918             w.start()
    919             self._index_queues.append(index_queue)
    920             self._workers.append(w)

~\Anaconda3\envs\pytorch37\lib\multiprocessing\process.py in start(self)
    110                'daemonic processes are not allowed to have children'
    111         _cleanup()
--> 112         self._popen = self._Popen(self)
    113         self._sentinel = self._popen.sentinel
    114         # Avoid a refcycle if the target function holds an indirect

~\Anaconda3\envs\pytorch37\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224 
    225 class DefaultContext(BaseContext):

~\Anaconda3\envs\pytorch37\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323 
    324     class SpawnContext(BaseContext):

~\Anaconda3\envs\pytorch37\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     87             try:
     88                 reduction.dump(prep_data, to_child)
---> 89                 reduction.dump(process_obj, to_child)
     90             finally:
     91                 set_spawning_popen(None)

~\Anaconda3\envs\pytorch37\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

PicklingError: Can't pickle <function <lambda> at 0x000001B06041C1F8>: attribute lookup <lambda> on __main__ failed

The second is the setting of n_way, n_shot and n_query.
I set them to [5, 1, 1] as shown above. However, if I make n_shot or n_query any bigger, even just [5, 2, 1], an error occurs in the training process. The error message is shown below.
I think it's because of my dataset. It has 219 classes and each class only has 10 images.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_27300\2390850040.py in <module>
     14         model.eval()
     15         validation_accuracy = evaluate(
---> 16             few_shot_classifier, val_loader, device = device, tqdm_prefix="Validation"
     17         )
     18         model.train()

~\AppData\Local\Temp\ipykernel_27300\3882128419.py in evaluate(model, data_loader, device, use_tqdm, tqdm_prefix)
     57         ) as tqdm_eval:
     58             #print(tqdm_eval)
---> 59             for _, (support_images, support_labels, query_images, query_labels, _,) in tqdm_eval:
     60                 correct, total = evaluate_on_one_task(
     61                     model,

~\Anaconda3\envs\pytorch37\lib\site-packages\tqdm\std.py in __iter__(self)
   1193 
   1194         try:
-> 1195             for obj in iterable:
   1196                 yield obj
   1197                 # Update and possibly print the progressbar.

~\Anaconda3\envs\pytorch37\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
    519             if self._sampler_iter is None:
    520                 self._reset()
--> 521             data = self._next_data()
    522             self._num_yielded += 1
    523             if self._dataset_kind == _DatasetKind.Iterable and \

~\Anaconda3\envs\pytorch37\lib\site-packages\torch\utils\data\dataloader.py in _next_data(self)
    558 
    559     def _next_data(self):
--> 560         index = self._next_index()  # may raise StopIteration
    561         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    562         if self._pin_memory:

~\Anaconda3\envs\pytorch37\lib\site-packages\torch\utils\data\dataloader.py in _next_index(self)
    510 
    511     def _next_index(self):
--> 512         return next(self._sampler_iter)  # may raise StopIteration
    513 
    514     def _next_data(self):

~\Anaconda3\envs\pytorch37\lib\site-packages\easyfsl\samplers\task_sampler.py in __iter__(self)
     59                     )
     60                     # pylint: enable=not-callable
---> 61                     for label in random.sample(self.items_per_label.keys(), self.n_way)
     62                 ]
     63             )

~\Anaconda3\envs\pytorch37\lib\site-packages\easyfsl\samplers\task_sampler.py in <listcomp>(.0)
     59                     )
     60                     # pylint: enable=not-callable
---> 61                     for label in random.sample(self.items_per_label.keys(), self.n_way)
     62                 ]
     63             )

~\Anaconda3\envs\pytorch37\lib\random.py in sample(self, population, k)
    319         n = len(population)
    320         if not 0 <= k <= n:
--> 321             raise ValueError("Sample larger than population or is negative")
    322         result = [None] * k
    323         setsize = 21        # size of a small set minus size of an empty list

ValueError: Sample larger than population or is negative

@gushengzhao1996
Author

Step 3 Load pre-trained model

This step is more like the part from my_first_few_shot_classifier.ipynb.

I trained the model myself using ResNet18.

from easyfsl.methods import PrototypicalNetworks

model = torch.load('./pretrained_model1.pth', map_location = device)
model.eval()

num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, len(class_names))
#print(conv_net)

few_shot_classifier = PrototypicalNetworks(model).to(device)

@gushengzhao1996
Author

gushengzhao1996 commented Jun 2, 2022

Step 4 Training setup

This part comes from classical_training.ipynb.
I chose it because it includes a validation step, which the other notebook doesn't have.

from torch.optim import SGD, Optimizer, Adam
from torch.optim.lr_scheduler import MultiStepLR, ReduceLROnPlateau
from torch.utils.tensorboard import SummaryWriter


LOSS_FUNCTION = nn.CrossEntropyLoss()

n_epochs = 200
scheduler_milestones = [50, 100, 150, 190]
scheduler_gamma = 0.1
learning_rate = 1e-01
tb_logs_dir = Path(".")

train_optimizer = SGD(
    model.parameters(), lr=learning_rate, momentum=0.9, weight_decay=5e-4
)
train_scheduler = MultiStepLR(
    train_optimizer,
    milestones=scheduler_milestones,
    gamma=scheduler_gamma,
)

tb_writer = SummaryWriter(log_dir=str(tb_logs_dir))

def training_epoch(model_: nn.Module, data_loader: DataLoader, optimizer: Optimizer):
    all_loss = []
    model_.train()
    with tqdm(data_loader, total=len(data_loader), desc="Training") as tqdm_train:
        for images, labels in tqdm_train:
            optimizer.zero_grad()

            loss = LOSS_FUNCTION(model_(images.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()

            all_loss.append(loss.item())

            tqdm_train.set_postfix(loss=mean(all_loss))

    return mean(all_loss)

@gushengzhao1996
Author

Step 5 Start training

This part also comes from classical_training.ipynb.
I modified the validation part.
Because I use my own pre-trained model, it doesn't have .set_use_fc().
Following the transfer learning tutorial, I call model.eval() first so that the validation inputs don't update the model's running statistics (batch norm).
Then I call evaluate() and model.train().

best_state = model.state_dict()
best_model_wts = copy.deepcopy(model.state_dict())
best_validation_accuracy = 0.0
validation_frequency = 4

for epoch in range(n_epochs):
    print(f"Epoch {epoch}")
    average_loss = training_epoch(model, train_loader, train_optimizer)

    if epoch % validation_frequency == validation_frequency - 1:

        # This pre-trained model has no set_use_fc() (the EasyFSL ResNet method used
        # in the original notebook), so we switch to eval mode for validation instead.
        model.eval()
        validation_accuracy = evaluate(
            few_shot_classifier, val_loader, device = device, tqdm_prefix="Validation"
        )
        model.train()

        if validation_accuracy > best_validation_accuracy:
            best_validation_accuracy = validation_accuracy
            best_state = model.state_dict()
            best_model_wts = copy.deepcopy(model.state_dict())
            print("Ding ding ding! We found a new best model!")

        tb_writer.add_scalar("Val/acc", validation_accuracy, epoch)

    tb_writer.add_scalar("Train/loss", average_loss, epoch)

    # Warn the scheduler that we did an epoch
    # so it knows when to decrease the learning rate
    train_scheduler.step()

@ebennequin
Collaborator

Hi @gushengzhao1996, thank you for your kind words and the detailed description of your problem. This is very helpful!

> PicklingError: Can't pickle <function <lambda> at 0x000001B06041C1F8>: attribute lookup <lambda> on __main__ failed

I googled this, and it seems that PyTorch's multiprocessing (used when num_workers > 0) has some problems with Windows. I found this tutorial that might help you to make it work.
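
For what it's worth, the traceback fails inside `reduction.dump` on a `<lambda>`, and the only lambda in the posted code is the one assigned to `val_set.get_labels`. On Windows, worker processes are started with the spawn method, so the dataset (and anything attached to it) is pickled for each worker, and Python's `pickle` cannot serialize lambdas. The sketch below reproduces this with the standard library only; `SimpleNamespace` is a stand-in for the dataset object and the labels are made up.

```python
import pickle
from functools import partial
from types import SimpleNamespace


def is_picklable(obj) -> bool:
    """Return True if `obj` survives pickle.dumps, which spawn-based workers require."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Hypothetical labels standing in for the labels read from the validation CSV.
labels = [0, 0, 1, 1, 2, 2]

# Stand-ins for a dataset object that carries a get_labels attribute.
dataset_with_lambda = SimpleNamespace(get_labels=lambda: list(labels))
dataset_with_partial = SimpleNamespace(get_labels=partial(list, labels))

print(is_picklable(dataset_with_lambda))   # the lambda cannot be pickled
print(is_picklable(dataset_with_partial))  # partial(list, labels) can
print(dataset_with_partial.get_labels())   # still returns a fresh list of labels
```

With the ImageFolder from Step 1, the equivalent would presumably be `val_set.get_labels = partial(list, val_set.targets)`, since torchvision's `ImageFolder` exposes the per-sample class indices as `targets`. That line is an untested assumption for this setup.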

> The second is the setting of n_way, n_shot and n_query.
> I set them to [5, 1, 1] as shown above. However, if I make n_shot or n_query any bigger, even just [5, 2, 1], an error occurs in the training process. The error message is shown below.
> I think it's because of my dataset. It has 219 classes and each class only has 10 images.

This error occurs when a class's population is strictly smaller than n_shot + n_query. I believe the problem is that you have at least one class with only two instances.
The ImageFolder dataset that you're using crawls all subfolders of your root folder. Can you check the number of files in each subfolder?
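
One way to check this, sketched with the standard library only: count the files in each class subfolder and list the classes that cannot supply `n_shot + n_query` images. The function names, the extension filter and the demo folder are assumptions for illustration; point `root` at the folder passed to `ImageFolder` (for example the `val` directory).

```python
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed; adjust to your data


def count_images_per_class(root):
    """Map each class subfolder name to the number of image files inside it (recursively)."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for path in class_dir.rglob("*")
                if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


def classes_too_small(counts, n_shot, n_query):
    """Return the classes that hold fewer than n_shot + n_query images."""
    needed = n_shot + n_query
    return sorted(name for name, count in counts.items() if count < needed)


# Demo on a throwaway folder: two classes with 2 images each.
with tempfile.TemporaryDirectory() as tmp:
    for class_name in ("class_a", "class_b"):
        class_dir = Path(tmp) / class_name
        class_dir.mkdir()
        for i in range(2):
            (class_dir / f"img_{i}.jpg").touch()

    demo_counts = count_images_per_class(tmp)
    print(demo_counts)
    print(classes_too_small(demo_counts, n_shot=1, n_query=1))  # 1 + 1 fits in 2 images
    print(classes_too_small(demo_counts, n_shot=2, n_query=1))  # 2 + 1 does not
```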

@gushengzhao1996
Author

@ebennequin Thanks for your reply.

First problem.
I found someone saying that adding if __name__ == '__main__': before the main code would work, like step 2 in the tutorial. But this didn't work.
As for step 1 in the tutorial, I'm not sure how to do that, because the failing code is something as basic as a for ... in ... loop.
It doesn't affect the training process too much, so I will just leave it as it is.

Second problem.
Yes, that's exactly it. I split the dataset myself with a validation ratio of 0.2, so each class in the validation set has exactly 2 images.

@gushengzhao1996
Author

Another question

Sorry @ebennequin, I have another question about prediction.
I checked #17 and my_first_few_shot_classifier.ipynb, so I understand that I have to set the support images and their labels.

  1. Support set for prediction.
    As I said, my dataset has a total of 219 classes.
    What should I do for the support set? Create it from the train and validation data?
    Use TaskSampler() and its parameters to control it?

  2. How to process the unknown data.
    I checked other people's notebooks in #17.

model.eval()
example_scores = model(
    example_support_images.to(device), #.cuda(),
    example_support_labels.to(device),#.cuda(),
    example_query_images.to(device), #.cuda(),
).detach()

_, example_predicted_labels = torch.max(example_scores.data, 1)

print("Ground Truth / Predicted")
for i in range(len(example_query_labels)):
    print(
        f"{valid_set._image_labels[example_class_ids[example_query_labels[i]]]} / {valid_set._image_labels[example_class_ids[example_predicted_labels[i]]]}"
    )

I think this part from my_first_few_shot_classifier.ipynb is helpful to me,
because the unlabeled dataset I want to predict on is really big.
But I am not sure how to process the dataset so that it looks like example_query_images.
Should I use ImageFolder and DataLoader, like I did for the training set above?

@ebennequin
Collaborator

> Second problem. Yes, that's exactly it. I split the dataset myself with a validation ratio of 0.2, so each class in the validation set has exactly 2 images.

It's common practice in Few-Shot Learning to split the classes (and not the examples inside the classes) between train and val. This way, you're actually validating the ability of your model to solve few-shot tasks on novel classes. In your case, you could save ~40 classes for validation.
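
A minimal sketch of such a class-wise split, using only the standard library. The class names are hypothetical placeholders for the 219 class folders, and the function name and seed are arbitrary choices for illustration.

```python
import random


def split_classes(class_names, n_val_classes, seed=0):
    """Split class names into disjoint train and validation groups."""
    if not 0 < n_val_classes < len(class_names):
        raise ValueError("n_val_classes must be between 1 and len(class_names) - 1")
    shuffled = sorted(class_names)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    val_classes = sorted(shuffled[:n_val_classes])
    train_classes = sorted(shuffled[n_val_classes:])
    return train_classes, val_classes


# Hypothetical class names standing in for the 219 class folders.
all_classes = [f"class_{i:03d}" for i in range(219)]
train_classes, val_classes = split_classes(all_classes, n_val_classes=40)

print(len(train_classes), len(val_classes))
print(set(train_classes) & set(val_classes))  # empty: no class appears in both splits
```

Every image of a validation class then goes into the validation folder, so each validation class keeps all 10 of its images and `n_shot + n_query` can go up to 10.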

As for your additional questions, I have to say I am unsure what problem you're trying to solve or what experiments you're trying to run. Could you give me some context?

@gushengzhao1996
Author

gushengzhao1996 commented Jun 4, 2022

Sorry @ebennequin, FSL is completely new to me, so some of my questions may be about basic concepts.

Basically, what I want to do now is predict the classes of a bunch of images (novel data without labels).
The classes are the same as in the training set, 219 classes in total.

The first question is about the support set.
Should I create a support set from the training data when I am performing prediction?

The second is how to load the novel data for prediction in batches.
As I said, the novel data is huge, so I think it's not a good idea to predict the images one by one.
But I am unsure how to load the novel data.
I googled this and found the code below.
Is this the way to load the data?

data_transforms = {
    'predict': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
}

dataset = {'predict': datasets.ImageFolder("./data", data_transforms['predict'])}
dataloader = {'predict': torch.utils.data.DataLoader(dataset['predict'], batch_size=1, shuffle=False, num_workers=4)}

outputs = list()
since = time.time()
model.eval()
with torch.no_grad():  # no gradients are needed for prediction
    for inputs, labels in dataloader['predict']:
        inputs = inputs.to(device)
        output = model(inputs)
        # Move the scores back to the CPU before converting them to NumPy
        index = output.cpu().numpy().argmax()
        print(index)

@ebennequin
Collaborator

In FSL we usually train the model on a large base dataset, and then apply it to novel classes for which we only have a few examples per class. As I understand it, in your case the 219 classes are the novel classes. With only 10 examples per class, you most likely won't get very good results by training your network on them. Instead, you could use your 219*10 examples (or a subset of them) as your support set. And instead of training your own model, you could simply use off-the-shelf models, e.g. from this great library.
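
To make the role of the support set concrete, here is a pure-Python sketch of the idea behind Prototypical Networks, the method used in Step 3: each class prototype is the mean of that class's support embeddings, and a query is assigned to the class with the nearest prototype. This is not EasyFSL's implementation; the 2-D embeddings and class names are invented for illustration, and in practice the embeddings would come from the backbone network.

```python
from collections import defaultdict


def compute_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class to get one prototype per class."""
    grouped = defaultdict(list)
    for embedding, label in zip(support_embeddings, support_labels):
        grouped[label].append(embedding)
    return {
        label: [sum(values) / len(vectors) for values in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(query_embeddings, prototypes):
    """Assign each query embedding to the class whose prototype is closest."""
    return [
        min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))
        for query in query_embeddings
    ]


# Made-up 2-D embeddings: two support examples for each of two classes.
support_embeddings = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
support_labels = ["cat", "cat", "dog", "dog"]
prototypes = compute_prototypes(support_embeddings, support_labels)

print(prototypes)
print(predict([[1.0, 1.0], [9.0, 11.0]], prototypes))
```

The unlabeled images play the role of the queries: they can be embedded in batches of any size and compared against the same prototypes, which only need to be computed once.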

As for loading, as I explained in #17, the only requirement of EasyFSL is that the query images are fed in as a tensor of shape (n_images, 3, height, width). It seems that the code you provided will work, although

  • random preprocessing such as RandomResizedCrop or RandomHorizontalFlip is usually used for simple data augmentation during training, and it is odd to use it for inference. You might prefer a simple Resize ;
  • you probably don't need the dictionary structure with the "predict" key ;
  • you may want to increase your batch size when defining the DataLoader.

@gushengzhao1996
Author

Thanks!

@shraddha291996

shraddha291996 commented Dec 7, 2022

Hi, thank you so much for your amazing work. I am facing the same issue in my training phase. I have 4 classes in my dataset, with the following n_way, n_shot and n_query:
n_way = 4
n_shot = 1
n_query = 1
The train, test and val sets all contain the 4 classes; only the number of samples differs. Each class has 14 images in the train set and 4 images in the test and val sets. I'm following gushengzhao1996's steps exactly. Any suggestions would be helpful.

Error:
C:\Users\wn00204104\Anaconda3\envs\few-test\python.exe C:\Users\wn00204104\PycharmProjects\Data-Preparation\easy-few-shot-learning\notebooks\test_prot_episodic.py
C:\Users\wn00204104\Anaconda3\envs\few-test\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
C:\Users\wn00204104\Anaconda3\envs\few-test\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=ResNet18_Weights.IMAGENET1K_V1. You can also use weights=ResNet18_Weights.DEFAULT to get the most up-to-date weights.
  warnings.warn(msg)
Epoch 0
Training: 100%|██████████| 37/37 [00:49<00:00, 1.33s/it, loss=22]
Training: 0%| | 0/37 [00:00<?, ?it/s]Epoch 1
Training: 100%|██████████| 37/37 [00:47<00:00, 1.29s/it, loss=4.89]
Training: 0%| | 0/37 [00:00<?, ?it/s]Epoch 2
Training: 100%|██████████| 37/37 [00:48<00:00, 1.30s/it, loss=1.36]
Training: 0%| | 0/37 [00:00<?, ?it/s]Epoch 3
Training: 100%|██████████| 37/37 [00:50<00:00, 1.36s/it, loss=1.07]
Validation: 0%| | 0/500 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\wn00204104\PycharmProjects\Data-Preparation\easy-few-shot-learning\notebooks\test_prot_episodic.py", line 153, in <module>
    validation_accuracy = evaluate(
  File "C:\Users\wn00204104\Anaconda3\envs\few-test\lib\site-packages\easyfsl\methods\utils.py", line 63, in evaluate
    for _, (
  File "C:\Users\wn00204104\Anaconda3\envs\few-test\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "C:\Users\wn00204104\Anaconda3\envs\few-test\lib\site-packages\torch\utils\data\dataloader.py", line 628, in __next__
    data = self._next_data()
  File "C:\Users\wn00204104\Anaconda3\envs\few-test\lib\site-packages\torch\utils\data\dataloader.py", line 670, in _next_data
    index = self._next_index() # may raise StopIteration
  File "C:\Users\wn00204104\Anaconda3\envs\few-test\lib\site-packages\torch\utils\data\dataloader.py", line 618, in _next_index
    return next(self._sampler_iter) # may raise StopIteration
  File "C:\Users\wn00204104\Anaconda3\envs\few-test\lib\site-packages\easyfsl\samplers\task_sampler.py", line 53, in __iter__
    [
  File "C:\Users\wn00204104\Anaconda3\envs\few-test\lib\site-packages\easyfsl\samplers\task_sampler.py", line 56, in <listcomp>
    random.sample(
  File "C:\Users\wn00204104\Anaconda3\envs\few-test\lib\random.py", line 482, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
Thanks

@gushengzhao1996
Author

gushengzhao1996 commented Dec 7, 2022

Hi.
I think my problem is a little bit different from yours.
Let me explain something about the [n-way k-shot] setting. As you said, you set n_way = 4 n_shot = 1 n_query = 1, which means that each task picks 1 sample from each of the 4 classes to form the support set (4 samples in total), and the same again for the query set (also 4 samples in total).
My problem also occurs in the validation step, because my validation set only has 2 samples in each class. So when I set [5, 2, 1], the sampler has to take 2+1 samples from each class. That's why the requested sample is larger than the population (the actual number of samples in each class of the validation set).
However, as you described it, your [n-way k-shot] setting should be OK for your training and validation sets.
But since the error occurred in the validation step, I first suggest you check the number of samples in each class of the validation set.
Second, try changing the [n-way k-shot] setting: set the n-way value smaller (2 or 3).
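
The arithmetic above can be sketched in plain Python. This is a simplified stand-in for an episodic sampler, not EasyFSL's `TaskSampler` code: it draws `n_way` classes, then `n_shot + n_query` items from each, using `random.sample`, which is the call that raises in the tracebacks. The `items_per_label` dictionaries are made up.

```python
import random


def sample_episode(items_per_label, n_way, n_shot, n_query, rng=random):
    """Draw one few-shot task as (support, query) lists of (item, label) pairs."""
    support, query = [], []
    # Raises ValueError if there are fewer than n_way classes.
    for label in rng.sample(sorted(items_per_label), n_way):
        # Raises ValueError if this class has fewer than n_shot + n_query items.
        items = rng.sample(items_per_label[label], n_shot + n_query)
        support += [(item, label) for item in items[:n_shot]]
        query += [(item, label) for item in items[n_shot:]]
    return support, query


# 4 classes with 4 items each: a [4, 1, 1] task can be drawn.
four_by_four = {label: [f"{label}_{i}" for i in range(4)] for label in "abcd"}
support, query = sample_episode(four_by_four, n_way=4, n_shot=1, n_query=1)
print(len(support), len(query))

# 5 classes with 2 items each cannot serve a [5, 2, 1] task.
five_by_two = {label: [f"{label}_{i}" for i in range(2)] for label in "abcde"}
try:
    sample_episode(five_by_two, n_way=5, n_shot=2, n_query=1)
except ValueError as error:
    print("ValueError:", error)
```

If a dataset that should pass this check still fails, the labels seen by the sampler probably differ from the folder layout, which is worth verifying before changing `n_way`.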

@shraddha291996

Hi, thank you very much for your reply and suggestions. I tried different n-way k-shot settings, including the values you suggested (2 or 3), but I still get the same error. Sorry, I'm new to this, and I did not quite understand what you mean by checking the number of samples in each class of the validation set. I'm a bit confused about the difference between splitting the classes and splitting the samples inside each class across the train, test and val sets. Currently my val set has 4 classes with 4 samples each, and my train set has 4 classes with 14 samples each. Can you please show your dataset structure? That would be really helpful.

Thanks

@gushengzhao1996
Author

'Checking the number of samples in each class in the validation set' means:
How many classes are in the validation set?
How many samples are in each of those classes?

My dataset has a total of 219 classes, and each class has 10 images (samples).
The ratio of training, testing, and validation is [6,2,2]. All sets have 219 classes.

As shown above, I don't split the classes between the sets.
BTW, I'm confused by your description.
Usually, we just give the totals for the whole dataset, like 'My dataset has N classes, 80% (of the samples, not the classes) for the training set, 20% for the validation (testing) set.'
From your description, the dataset might have 4 classes, 8 classes, or anything in between (4~8).

@ebennequin
Collaborator

It's also possible that your samples are not correctly loaded by EasySet. I suggest you inspect the class distribution of your dataset, for instance by calling my_dataset.get_labels().
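
A small helper along these lines could summarize what `get_labels()` returns and flag the two conditions under which sampling a task must fail: fewer than `n_way` distinct classes, or a class with fewer than `n_shot + n_query` samples. The helper name and the example label lists are hypothetical.

```python
from collections import Counter


def find_sampling_problems(labels, n_way, n_shot, n_query):
    """List the reasons why n_way/n_shot/n_query tasks cannot be drawn from `labels`."""
    counts = Counter(labels)
    problems = []
    if len(counts) < n_way:
        problems.append(f"only {len(counts)} classes, but n_way={n_way}")
    needed = n_shot + n_query
    for label, count in sorted(counts.items()):
        if count < needed:
            problems.append(f"class {label} has {count} samples, needs {needed}")
    return problems


# Expected layout: 4 classes with 4 samples each, so no problem is reported.
expected_labels = [label for label in range(4) for _ in range(4)]
print(find_sampling_problems(expected_labels, n_way=4, n_shot=1, n_query=1))

# A loading mistake that collapses everything into one class is flagged.
collapsed_labels = [0] * 16
print(find_sampling_problems(collapsed_labels, n_way=4, n_shot=1, n_query=1))
```

In the setup from this thread, it would be called as `find_sampling_problems(val_set.get_labels(), n_way, n_shot, n_query)`.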

@ebennequin
Collaborator

> My dataset has a total of 219 classes, and each class has 10 images (samples).
> The ratio of training, testing, and validation is [6,2,2]. All sets have 219 classes.

While this may be relevant to your use case, this is not the usual Few-Shot Learning setting, in which the model infers at test time on classes that were not seen during training (nor validation). I assume @shraddha291996 is referring to the usual Few-Shot Learning setting, and therefore their train, val and test classes are different.

@shraddha291996

Yes, exactly. I'm trying to set up the usual FSL setting, so it is the classes that should differ between the train, test and val sets, and not the samples inside each class, right?


3 participants