
Error in evaluating trained model --test #1135

Closed
llasanudin opened this issue May 30, 2022 · 16 comments · Fixed by #1173
Labels: bug (category: fixes an error in the code)

Comments

@llasanudin

Hi,

So I am training a segmentation model of axon and myelin from microscopy images using axondeepseg. The whole process up to creating the trained model worked just fine. However, when I try to test the trained model, it keeps producing this error: AttributeError: 'NoneType' object has no attribute 'lower'. I was wondering if you could point out where I went wrong (the full output is shown below).

Many thanks,
Leroy

2022-05-30 13:48:44.003 | INFO | ivadomed.utils:init_ivadomed:421 -
ivadomed (2.9.5)

2022-05-30 13:48:44.008 | INFO | ivadomed.utils:get_path_output:373 - CLI flag --path-output not used to specify output directory. Will check config file for directory...
2022-05-30 13:48:44.008 | INFO | ivadomed.utils:get_path_data:385 - CLI flag --path-data not used to specify BIDS data directory. Will check config file for directory...
2022-05-30 13:48:44.009 | INFO | ivadomed.main:set_output_path:207 - Output path already exists: drive/MyDrive/ADS22/model
2022-05-30 13:48:44.107 | INFO | ivadomed.utils:define_device:137 - Using GPU ID 0
2022-05-30 13:48:44.109 | INFO | ivadomed.utils:display_selected_model_spec:147 - Selected architecture: Unet, with the following parameters:
2022-05-30 13:48:44.109 | INFO | ivadomed.utils:display_selected_model_spec:150 - dropout_rate: 0.2
2022-05-30 13:48:44.109 | INFO | ivadomed.utils:display_selected_model_spec:150 - bn_momentum: 0.1
2022-05-30 13:48:44.109 | INFO | ivadomed.utils:display_selected_model_spec:150 - depth: 2
2022-05-30 13:48:44.110 | INFO | ivadomed.utils:display_selected_model_spec:150 - is_2d: True
2022-05-30 13:48:44.110 | INFO | ivadomed.utils:display_selected_model_spec:150 - final_activation: sigmoid
2022-05-30 13:48:44.110 | INFO | ivadomed.utils:display_selected_model_spec:150 - length_2D: [256, 256]
2022-05-30 13:48:44.111 | INFO | ivadomed.utils:display_selected_model_spec:150 - stride_2D: [244, 244]
2022-05-30 13:48:44.111 | INFO | ivadomed.utils:display_selected_model_spec:150 - folder_name: model_NewEllipse_22
2022-05-30 13:48:44.111 | INFO | ivadomed.utils:display_selected_model_spec:150 - in_channel: 1
2022-05-30 13:48:44.111 | INFO | ivadomed.utils:display_selected_model_spec:150 - out_channel: 3
2022-05-30 13:48:44.700 | INFO | ivadomed.loader.bids_dataframe:save:322 - Dataframe has been saved in drive/MyDrive/ADS22/model/bids_dataframe.csv.
2022-05-30 13:48:44.704 | WARNING | ivadomed.loader.utils:split_dataset:102 - After splitting: train, validation and test fractions are respectively 0.571, 0.286 and 0.143 of sample_id.
2022-05-30 13:48:44.786 | INFO | ivadomed.utils:display_selected_transfoms:160 - Selected transformations for the ['testing'] dataset:
2022-05-30 13:48:44.786 | INFO | ivadomed.utils:display_selected_transfoms:162 - Resample: {'hspace': 0.0001, 'wspace': 0.0001}
2022-05-30 13:48:44.787 | INFO | ivadomed.utils:display_selected_transfoms:162 - NormalizeInstance: {'applied_to': ['im']}
Loading dataset: 100% 1/1 [00:00<00:00, 449.31it/s]
2022-05-30 13:48:45.956 | INFO | ivadomed.loader.loader:load_dataset:114 - Loaded 24 axial patches of shape [256, 256] for the testing set.
2022-05-30 13:48:45.957 | INFO | ivadomed.testing:test:52 - Loading model: drive/MyDrive/ADS22/model/best_model.pt
Inference - Iteration 0: 83% 10/12 [00:01<00:00, 2.42it/s]2022-05-30 13:48:52.031 | WARNING | ivadomed.testing:run_inference:249 - No color labels saved due to a temporary issue. For more details see:#720
Lossy conversion from float64 to uint8. Range [0, 1]. Convert image to uint8 prior to saving to suppress this warning.
Lossy conversion from float64 to uint8. Range [0, 1]. Convert image to uint8 prior to saving to suppress this warning.
Inference - Iteration 0: 100% 12/12 [00:03<00:00, 3.34it/s]
2022-05-30 13:49:13.560 | INFO | ivadomed.testing:test:88 - {'dice_score': 0.9126436879383726, 'multi_class_dice_score': 0.9166985962316472, 'precision_score': 0.8887869614675107, 'recall_score': 0.9378164590446905, 'specificity_score': 0.9369739801311445, 'intersection_over_union': 0.8393234837695477, 'accuracy_score': 0.9372683577674897, 'hausdorff_score': 0.0}
2022-05-30 13:49:13.560 | INFO | ivadomed.evaluation:evaluate:33 -
Run Evaluation on drive/MyDrive/ADS22/model/pred_masks

Evaluation: 0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/bin/ivadomed", line 8, in <module>
sys.exit(run_main())
File "/usr/local/lib/python3.7/site-packages/ivadomed/main.py", line 604, in run_main
resume_training=bool(args.resume_training))
File "/usr/local/lib/python3.7/site-packages/ivadomed/main.py", line 517, in run_command
eval_params=context[ConfigKW.EVALUATION_PARAMETERS])
File "/usr/local/lib/python3.7/site-packages/ivadomed/evaluation.py", line 66, in evaluate
fname_gt = [imed_loader_utils.update_filename_to_nifti(fname) for fname in fname_gt]
File "/usr/local/lib/python3.7/site-packages/ivadomed/evaluation.py", line 66, in <listcomp>
fname_gt = [imed_loader_utils.update_filename_to_nifti(fname) for fname in fname_gt]
File "/usr/local/lib/python3.7/site-packages/ivadomed/loader/utils.py", line 423, in update_filename_to_nifti
extension = get_file_extension(filename)
File "/usr/local/lib/python3.7/site-packages/ivadomed/loader/utils.py", line 404, in get_file_extension
extension = next((ext for ext in EXT_LST if filename.lower().endswith(ext)), None)
File "/usr/local/lib/python3.7/site-packages/ivadomed/loader/utils.py", line 404, in <genexpr>
extension = next((ext for ext in EXT_LST if filename.lower().endswith(ext)), None)
AttributeError: 'NoneType' object has no attribute 'lower'

@mariehbourget
Member

Hi Leroy, thank you for reaching out!
I have not been able to reproduce the issue yet, but we'll try to figure it out.

As a first step, would you be able to share the config file you used for training/testing?
It would also help to see the bids_dataframe.csv file located in drive/MyDrive/ADS22/model/bids_dataframe.csv.
You can upload both files here in a zip folder.

Also, could you screenshot the content of the drive/MyDrive/ADS22/model/pred_masks folder?

Thanks!
Marie-Hélène

@llasanudin
Author

Hi Marie,

Below is the screenshot of the pred_masks folder, and these are the config file and bids_dataframe.csv:
config file and bids dataframe.zip

[Screenshot: contents of the pred_masks folder]

@mariehbourget
Member

Thanks @llasanudin for the additional information.
I was able to find the issue and reproduce the error.

The error comes from the target_suffix of the original derivatives files.
Currently, in ivadomed, the testing pipeline only works if there is an underscore at the beginning of the target_suffix and no underscore between words.
For example: _seg-axon-manual works, but _seg-axon_manual doesn't work.

In your case, you have _seg-axon_manual and _seg-myelin_manual, which gives you the following filenames in the pred_masks folder:
sub-LM3_sample-data3_BF_seg-axon_class-0_pred.png
sub-LM3_sample-data3_BF_seg-axon_class-1_pred.png

A quick fix is to replace the underscore before "manual" with a dash (_seg-axon-manual and _seg-myelin-manual), both in the filenames of your dataset and in the ivadomed config file.

The correct filenames of the predictions in the pred_masks folder should then be:
sub-LM3_sample-data3_BF_class-0_pred.png
sub-LM3_sample-data3_BF_class-1_pred.png
and the evaluation should run without errors.

Note that you do not need to re-train the model: once the files are renamed and the config file fixed, the already-trained model should work. However, I recommend using the corrected dataset and target_suffix for any further training.
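
If there are many files to rename, a short script can do it. Here is a minimal sketch (the derivatives path is an assumption; adapt it to your dataset layout):

from pathlib import Path

# Hypothetical location of the label files; adjust to your own dataset.
deriv_root = Path("drive/MyDrive/ADS22/data/derivatives/labels")

# Replace the underscore before "manual" with a dash in every label filename.
for fname in deriv_root.rglob("*_seg-*_manual*"):
    new_name = (fname.name
                .replace("_seg-axon_manual", "_seg-axon-manual")
                .replace("_seg-myelin_manual", "_seg-myelin-manual"))
    if new_name != fname.name:
        fname.rename(fname.with_name(new_name))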

I will follow up shortly with more details for the ivadomed team so we can work on a more permanent fix.
Let us know if you have any questions.

@mariehbourget
Member

Additional details on the issue for the dev team:

  • The testing pipeline of ivadomed (--test) only works if there is an underscore at the beginning of the target_suffix and no underscore between words. For example: _seg-axon-manual works, but _seg-axon_manual doesn't work.
  • The test part is working as expected and writes the predictions in the pred_masks folder.
  • However, the evaluate part, which computes the evaluation metrics between the ground truth and the predictions, fails.
  • This is because the "evaluate" function relies on the filenames of the predictions in pred_masks to find the corresponding ground-truth in the bids_dataframe here:
    # LIST PREDS
    subj_acq_lst = [f.name.split('_pred')[0] for f in path_preds.iterdir() if f.name.endswith('_pred.nii.gz')]
    # Get all derivatives filenames
    all_deriv = bids_df.get_deriv_fnames()
    # LOOP ACROSS PREDS
    for subj_acq in tqdm(subj_acq_lst, desc="Evaluation"):
        # Fnames of pred and ground-truth
        fname_pred = path_preds.joinpath(subj_acq + '_pred.nii.gz')
        derivatives = bids_df.df[bids_df.df['filename']
                                 .str.contains('|'.join(bids_df.get_derivatives(subj_acq, all_deriv)))]['path'].to_list()
        # Ordering ground-truth the same as target_suffix
        fname_gt = [None] * len(target_suffix)
        for deriv in derivatives:
            for idx, suffix in enumerate(target_suffix):
                if suffix in deriv:
                    fname_gt[idx] = deriv
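        # If a suffix never matches any derivative here, the corresponding
        # fname_gt entry stays None; that None is what later reaches
        # get_file_extension() and raises the AttributeError reported above.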
  • So this issue really stems from the names of the prediction files. These names are determined during the "test" part in the "run_inference" function here:
    fname_pred = str(Path(ofolder, Path(fname_ref).name))
    fname_pred = fname_pred.rsplit("_", 1)[0] + '_pred.nii.gz'
  • Because of the rsplit("_", 1)[0], any target_suffix with an underscore in between words will generate the wrong prediction filenames.

A potential fix would be to split on the target_suffix instead of relying on the underscores. At this stage, the target_suffix values are available in testing_params['target_suffix'].
If we go with this, we would have to make sure it works for the multi-rater case as well (when target_suffix is a list of lists).
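
For reference, here is a rough sketch of that idea (illustration only, not the actual ivadomed implementation; the function name and arguments are hypothetical):

from pathlib import Path

def build_pred_fname(fname_ref, ofolder, target_suffix):
    # Flatten the multi-rater case where target_suffix is a list of lists.
    suffixes = [s for item in target_suffix
                for s in (item if isinstance(item, list) else [item])]
    stem = Path(fname_ref).name
    # Strip the full target_suffix (longest first) instead of rsplit("_", 1),
    # so a suffix containing an underscore (e.g. "_seg-axon_manual") still
    # yields the expected "<subject>_pred.nii.gz" filename.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix in stem:
            stem = stem.split(suffix)[0]
            break
    return str(Path(ofolder, stem + '_pred.nii.gz'))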

In any case, I think it would be good to look into this. AFAIK, this is the only place that "limits" the supported syntax of the target_suffix, and this is not very well documented.

@llasanudin
Author

Thank you very much for the help, but I do have some follow-up questions:

  1. I have fixed the target_suffix for the whole dataset and was able to run the evaluation; however, it only generated one prediction mask per iteration, as shown below. Is it meant to do that, or should each image have its own prediction mask? (I used 7 images as the training set.)

[Screenshot: contents of the pred_masks folder after evaluation]

  2. After evaluating the model, I tried to run visualize and compare testing models and it produced this error: TypeError: unhashable type: 'list' (below is the full output). I was wondering if this has anything to do with a problem during the model evaluation process?

2022-05-31 13:41:40.585 | WARNING | ivadomed.scripts.visualize_and_compare_testing_models:<module>:39 - No backend can be used - Visualization will fail
2022-05-31 13:41:40.586 | INFO | ivadomed.utils:init_ivadomed:421 -
ivadomed (2.9.5)

2022-05-31 13:41:40.586 | DEBUG | ivadomed.scripts.visualize_and_compare_testing_models:visualize_and_compare_models:132 - ofolders: ['drive/MyDrive/ADS22/model', 'drive/MyDrive/ADS24/model']
2022-05-31 13:41:40.587 | DEBUG | ivadomed.scripts.visualize_and_compare_testing_models:visualize_and_compare_models:133 - metric: ['dice_class0']
Traceback (most recent call last):
File "pandas/_libs/hashtable_class_helper.pxi", line 5231, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'list'
Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
File "pandas/_libs/hashtable_class_helper.pxi", line 5231, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'list'
Traceback (most recent call last):
File "/usr/local/bin/ivadomed_visualize_and_compare_testing_models", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/ivadomed/scripts/visualize_and_compare_testing_models.py", line 250, in main
visualize_and_compare_models(args.ofolders, args.metric, args.metadata)
File "/usr/local/lib/python3.7/site-packages/ivadomed/scripts/visualize_and_compare_testing_models.py", line 186, in visualize_and_compare_models
print(df)
File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 1002, in repr
show_dimensions=show_dimensions,
File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 1134, in to_string
line_width=line_width,
File "/usr/local/lib/python3.7/site-packages/pandas/io/formats/format.py", line 1053, in to_string
string = string_formatter.to_string()
File "/usr/local/lib/python3.7/site-packages/pandas/io/formats/string.py", line 25, in to_string
text = self._get_string_representation()
File "/usr/local/lib/python3.7/site-packages/pandas/io/formats/string.py", line 40, in _get_string_representation
strcols = self._get_strcols()
File "/usr/local/lib/python3.7/site-packages/pandas/io/formats/string.py", line 31, in _get_strcols
strcols = self.fmt.get_strcols()
File "/usr/local/lib/python3.7/site-packages/pandas/io/formats/format.py", line 540, in get_strcols
strcols = self._get_strcols_without_index()
File "/usr/local/lib/python3.7/site-packages/pandas/io/formats/format.py", line 793, in _get_strcols_without_index
str_columns = self._get_formatted_column_labels(self.tr_frame)
File "/usr/local/lib/python3.7/site-packages/pandas/io/formats/format.py", line 876, in _get_formatted_column_labels
for i, (col, x) in enumerate(zip(columns, fmt_columns))
File "/usr/local/lib/python3.7/site-packages/pandas/io/formats/format.py", line 876, in <listcomp>
for i, (col, x) in enumerate(zip(columns, fmt_columns))
File "/usr/local/lib/python3.7/site-packages/pandas/io/formats/format.py", line 838, in _get_formatter
return self.formatters.get(i, None)
TypeError: unhashable type: 'list'

Here are some files that may be useful: config and evaluation metrics.zip

Many thanks,
Leroy

@mariehbourget
Member

mariehbourget commented May 31, 2022

  1. I have fixed the target_suffix for the whole dataset and was able to run the evaluation; however, it only generated one prediction mask per iteration, as shown below. Is it meant to do that, or should each image have its own prediction mask? (I used 7 images as the training set.)

In the "testing" phase, we usually compute evaluation metrics for the test set only, i.e. the images that were not seen by the model during the training phase.
In the config file, you have a train_fraction of 0.6 and a test_fraction of 0.1.
It means that ~60% of your images were used for training and ~30% for validation during the training phase.
In the evaluation (testing) phase, the test set (~10% of your images or 1 image in your case) is used to evaluate the performance of the model.
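In your case, with 7 images, those fractions correspond to 4 training images (4/7 ≈ 0.571), 2 validation images (2/7 ≈ 0.286) and 1 test image (1/7 ≈ 0.143), which matches the split reported in your training log, hence the single prediction mask.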

If you want to output the segmentations of all the images of your dataset, you can run ivadomed with the command --segment (instead of --test). Note that this will only output the segmentations in the pred_masks folder, and not compute evaluation_metrics. Reference here: https://ivadomed.org/usage.html#usage
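
For example, assuming your config file is at drive/MyDrive/ADS22/model/config_file.json (the name and location here are just an assumption), the command would look something like:

ivadomed --segment -c drive/MyDrive/ADS22/model/config_file.json

(double-check the exact flags on the usage page linked above).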

2. After evaluating the model, I tried to run visualize and compare testing models and it produced this error: TypeError: unhashable type: 'list' (below is the full output). I was wondering if this have anything to do with a problem during the model evaluation process?

Unfortunately, I was not able to reproduce this error using your bids_dataframe.csv and evaluation_metrics.csv files. Can you share the exact command-line you used to launch the script?

@llasanudin
Author

Thank you for your answer, that makes perfect sense. For the second question, I was trying to compare between 2 models (both were trained with the same dataset but different depths). The command line was ivadomed_visualize_and_compare_testing_models --ofolders drive/MyDrive/ADS22/model drive/MyDrive/ADS24/model --metric dice_class0.

Here are some additional files (bids dataframe, evaluation metrics, and config file) from both models that may be useful:
model1.zip model2.zip

@mariehbourget
Member

Thanks for the additional information. I get a somewhat similar error to yours, and there may be a bug with the --metric flag.

Are you able to run the script without the --metric flag?
ivadomed_visualize_and_compare_testing_models --ofolders drive/MyDrive/ADS22/model drive/MyDrive/ADS24/model
(the default is dice_class0)

@llasanudin
Author

Yes, I was able to run the script after removing the --metric flag; however, the violin plots still did not appear. It still shows this warning: No backend can be used - Visualization will fail

[Screenshot: terminal output ending with the backend warning]

@mariehbourget
Member

Thanks @llasanudin, unfortunately I don't have an answer for you at the moment.
As this is a bit out of the scope of the original question, I opened a new issue, #1138, specifically for the violin plot, where we will follow up.

@kanishk16
Contributor

Yes, I was able to run the script after removing the --metric flag; however, the violin plots still did not appear. It still shows this warning: No backend can be used - Visualization will fail

[Screenshot: terminal output ending with the backend warning]

@llasanudin I guess you're running this on Google Colab... As a quick fix, I would suggest the following:

  1. Install ivadomed on your machine, following only the first two steps suggested in the official docs. (We have yet to update the docs; Step 3 is no longer required.)

  2. I couldn't reproduce the visualization failure on my machine without the --metric flag. Of course, there is a bug with the --metric flag and we'll try to look into it ASAP. The visualization failure seems to be Colab-specific; while we look further into it, you could download drive/MyDrive/ADS22/model and drive/MyDrive/ADS24/model to your machine and then run the same command, ivadomed_visualize_and_compare_testing_models --ofolders drive/MyDrive/ADS22/model drive/MyDrive/ADS24/model, to generate this not-so-violin-looking plot (since there is only a single example):

[Plot: violin plot comparing the two models]

kanishk16 self-assigned this on Jun 6, 2022
@llasanudin
Author

Hi everyone, thank you very much for the help!

Your suggestions worked and I was able to generate the plots by not running from Colab. Just another quick question on this: is it possible to test the model on the entire dataset? I have already segmented all images from my dataset using the --segment command, and I want to extract the evaluation metrics from all of the prediction masks. However, because of the dataset split in the config file, I can't simply change the test fraction to 1 and run the entire evaluation.

Many thanks,
Leroy

@mariehbourget
Member

Hi @llasanudin ,
Happy to know that you were able to use the visualization script!

For your other question, as I mentioned earlier, it is unusual to test the model on images that were used in training, so ivadomed doesn't have an automatic way to do this.

However, I can suggest a workaround for your specific case.
If you check in your output path, you will find a file called "split_datasets.joblib". This is the file that controls the split of the dataset. So the idea would be to create a custom file with the following steps:

  1. Generate a custom joblib file for testing.
    Here is an example of how to generate the file using the joblib library, where you will have to specify 2 things:
  • The filenames of the subjects you want to include in the test list.
  • An output path for the .joblib file.
import joblib

train_lst = [] 
valid_lst = [] 
test_lst = ["sub-01_sample-01.png", "sub-01_sample-02.png", "sub-01_sample-03.png"] # Your list of filenames to include in the test set

split_path = "path_to_split_joblib/split_datasets_for_testing.joblib" # Your output path for the joblib file

split_dct = {'train': train_lst, 'valid': valid_lst, 'test': test_lst}
joblib.dump(split_dct, split_path)
  2. Once the .joblib file is generated, you can add its path to the fname_split parameter in a copy of your config file (it supersedes the test_fraction); see the example after this list.

  3. Then run ivadomed with the new config file and the --test command. This will re-segment the images of the test list (and overwrite previous ones in pred_masks), and will compute the evaluation_3Dmetrics.csv file.
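
For illustration, the config entry from step 2 would look something like this (the path is whatever you chose when saving the .joblib file; check your config version for the exact nesting of the split parameters):

"fname_split": "path_to_split_joblib/split_datasets_for_testing.joblib"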

Let me know if any of these steps is not clear.

@GrimmSnark

Hello @mariehbourget

I am helping @llasanudin with the project under discussion in this thread, and I want to get a better understanding of how the joblib file is used. At the moment we want to be able to leave out specific images to be in the test set but still randomise the train and validate image sets. My understanding of how the training process normally works is that for each iteration of the CNN training the train/validate sets are randomly drawn from the fraction of images not reserved in the test set. However, if we use the joblib file do the train/validate steps still work like that?

Best,

Michael

PS. I can move this to a new issue if that is easier for you all.

@mariehbourget
Member

Hi @GrimmSnark and @llasanudin,

The procedure I wrote above with the joblib file was for the special case where you wanted to evaluate the images from the entire dataset after the training (even on images that were used for training).
It may not be the best solution for your current questions, I'll try to detail the split procedure below.

My understanding of how the training process normally works is that for each iteration of the CNN training the train/validate sets are randomly drawn from the fraction of images not reserved in the test set. However, if we use the joblib file do the train/validate steps still work like that?

The normal dataset splitting goes like this:

  1. Without joblib file:

    • The training and test sets are randomly split before training according to train_fraction and test_fraction.
    • The validation set is the remainder (1 - train_fraction - test_fraction).
    • There is a seed for randomization in the config file (random_seed) that you can change to get a different split.
    • The training/validation/test sets remain the same for all the epochs of a given training.
    • A split_datasets.joblib file will be created automatically in your output path describing the split of the training.
  2. With joblib file:

    • You can use the split_datasets.joblib file from a previous training to run the exact same training again (see the sketch after this list).
    • Or you can create a custom joblib file to control exactly what goes into the training, validation, and test sets.
    • When used in fname_split, it supersedes train_fraction, test_fraction and random_seed.
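
As referenced above, here is a minimal sketch for inspecting an existing split file (adjust the path to your own output folder):

import joblib

# Load the split produced by a previous training run.
split = joblib.load("drive/MyDrive/ADS22/model/split_datasets.joblib")
print(split["train"])
print(split["valid"])
print(split["test"])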

At the moment we want to be able to leave out specific images to be in the test set but still randomise the train and validate image sets.

For that specific case, i.e. choosing what is in the test set while keeping the train/valid sets random, we have a parameter that would work better for your situation.
This parameter is data_testing.

Example:
"data_testing": {"data_type": "sample_id", "data_value":["sample-03", "sample-04"]}

  • The training set will be split randomly according to train_fraction.
  • The test set will be the files with sample_id: "sample-03" and "sample-04".
  • The test_fraction is ignored.
  • The remainder is the validation set.

Let me know if I interpreted your question correctly and if you have any questions.

@GrimmSnark

Hello @mariehbourget

Thank you again for your extremely fast reply!

Looking at your response, the data_testing parameter seems to be exactly what we want to use.

Michael

kanishk16 added the bug (category: fixes an error in the code) label on Jun 20, 2022
kanishk16 added a commit that referenced this issue Jun 28, 2022
…et_suffix (#1173)

Fixes #1135

Currently, the `--test` command doesn't support multiple annotations for
the ground truth. So, adding a test to document this behaviour.

Co-authored-by: mariehbourget <54086142+mariehbourget@users.noreply.github.com>