
1 epoch #67

Closed
dk-teknologisk-mlnn opened this issue Jan 10, 2022 · 20 comments · Fixed by #77

@dk-teknologisk-mlnn

Is it meant to be only one epoch in training?
Your config files state 1 epoch; is that just a quick example?
I tried to train PADIM for 10 epochs on MVTec leather and wood and the metrics stay the same anyway, so it seems nothing is gained by training more.
The Lightning module also warns that there is no optimizer, so I guess training only finds the correct thresholds, and that takes 1 epoch.

@samet-akcay
Contributor

Hi @sequoiagrove, the PADIM algorithm doesn't require any CNN-based learning. It rather uses the CNN to extract the features of the training set, which are then used to fit a multivariate Gaussian model. We therefore use 1 epoch to go through the entire training set and extract the features.
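A minimal sketch of that fitting step (illustrative only, not the anomalib implementation; the feature dimensions and the random "features" are placeholders standing in for real CNN activations):

```python
import torch

# Placeholder for CNN features extracted over the training set: (n_samples, n_features).
features = torch.randn(500, 64)

# The one training "epoch" only gathers features; fitting is just estimating
# the mean and (regularized) covariance of a multivariate Gaussian.
mean = features.mean(dim=0)
cov = torch.cov(features.T) + 1e-5 * torch.eye(64)

# At inference, the anomaly score is the Mahalanobis distance to that Gaussian.
delta = features[0] - mean
score = torch.sqrt(delta @ torch.linalg.inv(cov) @ delta)
```

No gradient step appears anywhere, which is why a single pass over the data is enough.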

@samet-akcay
Contributor

The warning that there is no optimiser is also related to the above statement. Since we use the CNN only to extract the features, no optimiser is set for CNN training.

@dk-teknologisk-mlnn
Author

That's what I thought.
To make inference work I had to copy-paste some code snippets from your different branches to get a displayable heatmap that makes sense.
One issue is that there are no "stats" and thresholds in meta_data, only the image size, so I changed it to return the anomaly map unaltered: output, score = inference.predict(image=args.image_path, superimpose=False)
and inference.py takes anomaly_map, image_score. Then I normalize it myself: i = (i - min) / (max - min). Looking good.
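That manual normalization can be written as a small helper (a generic sketch, not anomalib code):

```python
import numpy as np

def min_max_normalize(anomaly_map: np.ndarray) -> np.ndarray:
    """Rescale an anomaly map to [0, 1] for display: i = (i - min) / (max - min)."""
    a_min, a_max = anomaly_map.min(), anomaly_map.max()
    return (anomaly_map - a_min) / (a_max - a_min)
```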
I tried training my own example with only good samples. It highlighted most of my flaws, except some very small, subtle changes.
Is that the expected outcome, and will it be better at finding small anomalies if I provide annotated anomaly images in training? Or do I need to choose one of the other models for such challenges?

Nevertheless, Impressive work :)

@samet-akcay
Contributor

> One issue is that there are no "stats" and thresholds in meta_data, only the image size, so I changed it to return the anomaly map unaltered: output, score = inference.predict(image=args.image_path, superimpose=False)
> and inference.py takes anomaly_map, image_score. Then I normalize it myself: i = (i - min) / (max - min). Looking good.
> I tried training my own example with only good samples. It highlighted most of my flaws, except some very small, subtle changes.

This is a PR we just merged this morning, and haven't thoroughly tested yet. Maybe @ashwinvaidya17 could provide a better insight here.

> is that the expected outcome, and will it be better at finding small anomalies if I provide annotated anomaly images in training? or do I need to choose one of the other models for such challenges?

The models don't use annotated images, so adding them wouldn't help. To find small anomalies, you could either increase the image size or configure tiling from the config file. This is mainly because when a large image is resized to 256x256, small anomalies become even smaller and detecting them becomes almost impossible. Using a larger input or a tiled input image could give better performance.
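The idea behind tiling can be sketched as follows (an illustrative example, not anomalib's tiler; the 1024x1024 input and 256 tile size are placeholders):

```python
import torch

# Instead of resizing a 1024x1024 image down to 256x256 (shrinking anomalies 16x in
# area), split it into non-overlapping 256x256 tiles that keep the original resolution.
image = torch.randn(1, 3, 1024, 1024)  # placeholder input batch
tile = 256

tiles = image.unfold(2, tile, tile).unfold(3, tile, tile)    # (1, 3, 4, 4, 256, 256)
tiles = tiles.contiguous().view(1, 3, -1, tile, tile)        # (1, 3, 16, 256, 256)
tiles = tiles.permute(0, 2, 1, 3, 4).reshape(-1, 3, tile, tile)

# Each of the 16 tiles is now processed like a full image, so a small defect
# keeps its original pixel footprint.
```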

In addition, our hyper-parameter optimisation tool will soon become publicly available, so parameter tuning could also be done to find the best parametrisation for custom datasets.

@dk-teknologisk-mlnn
Author

Ah ok, I had checked out the repo on Friday. I re-checked it out and now I get good maps out of the box, as long as I revert lightning back to 1.3.6, put the number of workers down to a reasonable amount, and add cv2.waitKey() after imshow.

@samet-akcay
Contributor

Yeah, there is a PR that bumps the lightning version up to 1.6.0dev, but there are some breaking changes, and it might take some time to merge.

Good catch on cv2.waitKey(); we'll add it asap.

@dk-teknologisk-mlnn
Author

Also found that line 149 in torch.py (inferencer) has to be:
anomaly_map = anomaly_map.detach().numpy()
in order to run stfpm models.
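The reason the detach() is needed: a map that still carries gradient history cannot be converted directly, since .numpy() refuses tensors that require grad. A generic reproduction (not the inferencer code; the map here is a random placeholder):

```python
import torch

# A non-leaf tensor that requires grad, standing in for a model's anomaly map.
anomaly_map = torch.rand(256, 256, requires_grad=True) * 2.0

try:
    anomaly_map.numpy()  # raises: can't call numpy() on a Tensor that requires grad
except RuntimeError:
    pass

arr = anomaly_map.detach().numpy()  # detach from the graph first, then convert
```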

and patchcore cannot train at the moment due to datatypes:

| Name | Type | Params

0 | image_threshold | AdaptiveThreshold | 0
1 | pixel_threshold | AdaptiveThreshold | 0
2 | training_distribution | AnomalyScoreDistribution | 0
3 | min_max | MinMax | 0
4 | image_metrics | MetricCollection | 0
5 | pixel_metrics | MetricCollection | 0
6 | model | PatchcoreModel | 68.9 M

68.9 M Trainable params
0 Non-trainable params
68.9 M Total params
275.533 Total estimated model params size (MB)
Epoch 0: 6%|███████████▌ | 8/132 [00:26<06:56, 3.36s/it]
Traceback (most recent call last):
File "tools\train.py", line 66, in
train()
File "tools\train.py", line 61, in train
trainer.fit(model=model, datamodule=datamodule)
File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 458, in fit
self._run(model)
File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 756, in _run
self.dispatch()
File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 797, in dispatch
self.accelerator.start_training(self)
File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 96, in start_training
self.training_type_plugin.start_training(trainer)
File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 144, in start_training
self._results = trainer.run_stage()
File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 807, in run_stage
return self.run_train()
File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 869, in run_train
self.train_loop.run_training_epoch()
File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 566, in run_training_epoch
self.on_train_epoch_end(epoch_output)
File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 606, in on_train_epoch_end
training_epoch_end_output = model.training_epoch_end(processed_epoch_output)
File "d:\projects\anomalib\anomalib\models\patchcore\model.py", line 297, in training_epoch_end
embedding = self.model.subsample_embedding(embedding, sampling_ratio)
File "d:\projects\anomalib\anomalib\models\patchcore\model.py", line 229, in subsample_embedding
random_projector.fit(embedding)
File "d:\projects\anomalib\anomalib\models\patchcore\utils\sampling\random_projection.py", line 124, in fit
self.sparse_random_matrix = self._sparse_random_matrix(n_features=n_features).to(device)
File "d:\projects\anomalib\anomalib\models\patchcore\utils\sampling\random_projection.py", line 85, in _sparse_random_matrix
components[i, c_idx] = data.double()
IndexError: tensors used as indices must be long, byte or bool tensors
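The failing line indexes a tensor with a non-long index tensor. A minimal reproduction of that error and the dtype fix (a generic sketch, not the anomalib code; shapes and values are made up):

```python
import torch

components = torch.zeros(2, 10)
data = torch.ones(3)

c_idx_bad = torch.tensor([1.0, 4.0, 7.0])  # float dtype: invalid as an index tensor
try:
    components[0, c_idx_bad] = data
except IndexError:
    # "tensors used as indices must be long, byte or bool tensors"
    pass

c_idx = c_idx_bad.to(torch.long)  # the fix: cast the indices to long
components[0, c_idx] = data
```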

@samet-akcay
Contributor

Thanks for reporting these!

@ashwinvaidya17
Collaborator

@sequoiagrove Thanks for reporting these 😀
The inference.py does not superimpose anomaly maps. It would be good to add an option for this and make it a part of this issue.
I'll try to reproduce the patchcore issue but it seems to be working in the tests. I'll have a look.

@dk-teknologisk-mlnn
Author

Could it be confusion between torch installs that happened when I struggled to enable my GPU?
From conda list:

pytorch 1.10.1 py3.8_cuda11.3_cudnn8_0 pytorch
pytorch-lightning 1.3.6 pypi_0 pypi
torch 1.8.1 pypi_0 pypi
torch-metrics 1.1.7 pypi_0 pypi
torchaudio 0.10.1 py38_cu113 pytorch
torchmetrics 0.6.2 pypi_0 pypi
torchvision 0.9.1 pypi_0 pypi

@ashwinvaidya17
Collaborator

@sequoiagrove Could be. That's another issue that's been on our list for some time 🙃

@dk-teknologisk-mlnn
Author

Fixed it: in patchcore/utils/sampling/random_projection.py, lines 79-83:

    c_idx = torch.tensor(
        sample_without_replacement(
            n_population=n_features, n_samples=nnz_idx, random_state=self.random_state
        ),
        dtype=torch.long,
    )

@dk-teknologisk-mlnn
Author

dk-teknologisk-mlnn commented Jan 11, 2022

patchcore inference:
File "d:\projects\anomalib\anomalib\utils\normalization\min_max.py", line 31, in normalize
normalized = ((targets - threshold) / (max_val - min_val)) + 0.5
TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'Tensor'

Mixed datatypes.

Brilliant usage of Union btw :)
I guess at the end of patchcore training it should cast the types consistently?

meta_data is:

{'image_threshold': tensor(2.0865), 'pixel_threshold': tensor(2.8785), 'min': tensor(0.7478), 'max': tensor(4.2167), 'image_shape': (1024, 1024)}

anomaly_map is a tensor and pred_score is an array.

If I run padim, both of them are tensors.
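One way to make that normalize call robust to the mixed inputs is to coerce everything to tensors up front (a sketch based on the signature visible in the traceback, not the actual anomalib implementation):

```python
import numpy as np
import torch

def normalize(targets, threshold, min_val, max_val):
    """Min-max normalize around the threshold, coercing numpy inputs to tensors first."""
    if isinstance(targets, np.ndarray):
        targets = torch.from_numpy(targets)
    threshold = torch.as_tensor(threshold)
    min_val = torch.as_tensor(min_val)
    max_val = torch.as_tensor(max_val)
    return ((targets - threshold) / (max_val - min_val)) + 0.5
```

With this, a numpy pred_score and tensor-valued meta_data (as in the patchcore case above) no longer clash.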

@dk-teknologisk-mlnn
Author

Found the issue. I printed the data types throughout the inference. In model.py, score and map are tensors all the way; it is in deploy/torch.py where you ask isinstance(map, Tensor). Both of them are tensors, but the check is False because predictions is two tensors, not just one. The code path for False converts pred_score to numpy, but anomaly_map is assumed to already be numpy. The metadata is still tensors, so I can't just convert the map to numpy as well.
If I just don't convert the score it works, but I guess that breaks some of the other models.
So we need to handle the special case of getting two tensors.

@dk-teknologisk-mlnn
Author

This works for all three models I trained (padim, patchcore and stfpm):

    if isinstance(predictions, Tensor):
        anomaly_map = predictions
        pred_score = anomaly_map.reshape(-1).max()
    else:
        anomaly_map, pred_score = predictions
        if isinstance(pred_score, Tensor):
            pred_score = pred_score.detach()
        else:
            pred_score = pred_score.detach().numpy()

@dk-teknologisk-mlnn
Author

I tried making a new environment and installing the exact versions in your requirements. I had to make the same fixes as above to get patchcore working.
On the MVTec examples it works well on carpet, wood and leather, but on, for example, screw it is nowhere near the reported performance. Is the MVTec benchmark for all the data categories trained with different hyperparameters?
So far the best model on my own datasets is padim.

DATALOADER:0 TEST RESULTS
{'image_AUROC': 0.44906747341156006,
'image_F1': 0.8561151623725891,
'pixel_AUROC': 0.9092798233032227,
'pixel_F1': 0.03107343800365925}

@ashwinvaidya17
Collaborator

@sequoiagrove It is possible that some metrics have diverged from when we collected the results. There is a plan to re-evaluate all the algorithms. Also, a benchmarking script is in PR state which will help gather results, but merging it is pushed back until after a refactor we are planning. Here is a tip: if you want to log anomaly images, you can change log_images to log_images_to: [local]. It will save the results in the results folder after training completes.

@dk-teknologisk-mlnn
Author

Diverged metrics: dropping from 0.99 to 0.44 and 0.03 is rather critical?
Log images: nice :)
Padim also dropped in performance, but not as drastically.
Here's a patchcore result on "good" parts:
[attached image: 008]

this is padim:
DATALOADER:0 TEST RESULTS
{'image_AUROC': 0.7589669823646545,
'image_F1': 0.8787878751754761,
'pixel_AUROC': 0.9781586527824402,
'pixel_F1': 0.22379672527313232}
[attached image: 008padim]

@dk-teknologisk-mlnn
Author

I tried this other patchcore repo (https://github.com/hcw-00/PatchCore_anomaly_detection) on mvtec/screw and it gives me:

{'img_auc': 0.5911047345767575, 'pixel_auc': 0.9048583939897462}

and rather random anomaly maps as well.

@samet-akcay
Contributor

Thanks for reporting this discrepancy @sequoiagrove. We'll investigate the benchmarks.
