
Getting NaN loss values while training PatchCore model on custom dataset #288

Closed
RahaviSelvarajan opened this issue Apr 27, 2022 · 5 comments
@RahaviSelvarajan

Describe the bug

  • I am training the PatchCore model on a custom dataset. During training, the loss value in the progress bar is shown as NaN. Why would this happen?

Screenshots
Epoch 0: 2%|█▋ Aggregating the embedding extracted from the training set.5/218 [00:06<04:30, 1.27s/it, loss=nan]
Creating CoreSet Sampler via k-Center Greedy
Getting the coreset from the main embedding.
Assigning the coreset as the memory bank.
Epoch 1: 2%|███▏ Aggregating the embedding extracted from the training set. | 5/218 [01:34<1:07:23, 18.98s/it, loss=nan]
Creating CoreSet Sampler via k-Center Greedy
Getting the coreset from the main embedding.
Assigning the coreset as the memory bank.
Epoch 2: 2%|███▏ Aggregating the embedding extracted from the training set. | 5/218 [06:39<4:43:55, 79.98s/it, loss=nan]

@alexriedel1
Contributor

This is because you do not update any weights during the "training" of PatchCore; you only save features while running inference on your training data (https://arxiv.org/pdf/2106.08265.pdf). Doing this for multiple epochs will not improve the model unless you apply random image augmentations in each epoch, which is why the default number of training epochs is 1 and you shouldn't change this parameter without a good reason.
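
For intuition, here is a minimal sketch of what a PatchCore "training" epoch amounts to. This is not anomalib's actual code; `train_loader` is a placeholder `DataLoader` over your normal training images, and the backbone/layer choices are just examples.

```python
import torch
from torchvision.models import resnet18
from torchvision.models.feature_extraction import create_feature_extractor

# Frozen, pretrained CNN used purely as a feature extractor.
backbone = resnet18(weights="DEFAULT").eval()
extractor = create_feature_extractor(backbone, return_nodes=["layer2"])

embeddings = []
with torch.no_grad():  # inference only: no backward pass, so there is no loss to report
    for images, _ in train_loader:  # hypothetical DataLoader yielding (image, label) batches
        feats = extractor(images)["layer2"]  # (B, C, H, W) patch features
        embeddings.append(feats.permute(0, 2, 3, 1).reshape(-1, feats.shape[1]))

embedding = torch.cat(embeddings)  # all patch embeddings, collected in a single pass
# Coreset subsampling (k-center greedy) then keeps a fraction of these rows as the
# memory bank; at test time, anomaly scores are nearest-neighbour distances to that
# bank. No weights are updated anywhere, which is why extra epochs change nothing.
```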

@samet-akcay
Contributor

@RahaviSelvarajan, as @alexriedel1 pointed out, the reason is that the CNN is only used for feature extraction, which doesn't produce a loss value. If this looks confusing, we could consider removing it from the progress bar.

@samet-akcay samet-akcay self-assigned this Jun 17, 2022
@ashwinvaidya17 ashwinvaidya17 added this to the Backlog milestone Oct 3, 2022
@nguyenanhtuan1008

@samet-akcay So in this situation, training for 1 epoch gives the same result as training for multiple epochs?

@samet-akcay
Contributor

samet-akcay commented Oct 5, 2022

@nguyenanhtuan1008, yes. Training for a single epoch vs. multiple epochs would yield the same result for all models that use the CNN only for feature extraction.

The exception would be to apply random image augmentations, as @alexriedel1 pointed out above.
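
For illustration only, one generic way to get a different view of each image in every epoch is to put random transforms inside the dataset, so each pass extracts features from slightly different crops and flips. This is a sketch using plain torchvision transforms, not anomalib's built-in transform configuration; `NormalImageFolder` and `image_paths` are placeholders.

```python
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

# Random transforms are re-sampled on every __getitem__ call, i.e. on every epoch.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

class NormalImageFolder(Dataset):
    """Placeholder dataset over a list of paths to normal (defect-free) images."""

    def __init__(self, image_paths):
        self.image_paths = image_paths

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = Image.open(self.image_paths[idx]).convert("RGB")
        return augment(img)  # a new random view every time the sample is loaded
```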

@nixczhou

Hi, regarding this: how do you do augmentation in PatchCore? It seems like PatchCore needs heavy augmentation, right?
