Testing on other data sets #9

Closed
Tonthatj opened this issue May 3, 2019 · 15 comments

@Tonthatj

Tonthatj commented May 3, 2019

Hi, I was testing the model against another dataset of mammograms and was wondering whether the input image dimensions have to match exactly. Your sample cropped images are (2440x3656) and (2607x3818), while ours are (1993x4396) and (2133x4906).

P_00005, RIGHT, CC,  MALIGNANT, 0.1293, 0.0123
P_00005, RIGHT, MLO, MALIGNANT, 0.1293, 0.0123
P_00007, LEFT,  CC,  BENIGN,    0.3026, 0.1753
P_00007, LEFT,  MLO, BENIGN,    0.3026, 0.1753

As you can see, the probabilities for benignity and malignancy (respectively) are extremely low. Out of a dataset of 200, the model accurately predicted only ~10 of them.

@kjgeras
Contributor

kjgeras commented May 3, 2019

@Tonthatj If I understand you correctly, this is what you should expect. This classifier is trained on a very imbalanced data set: only a small fraction of the training examples contain a malignancy. This skews the classifier towards predicting a very low probability of malignancy. What this classifier should be good at is distinguishing between malignant and non-malignant cases (which is captured by AUC); it will not necessarily provide accurate probability estimates.
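
To illustrate the point (this is only an illustrative sketch, not code from this repository): AUC depends only on how the predicted scores rank the examples, not on their absolute values, so uniformly low malignancy probabilities can still yield a high AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

labels = np.array([0, 0, 1, 0, 1])                 # 1 = malignant
probs = np.array([0.01, 0.02, 0.12, 0.03, 0.15])   # all "low", but malignant cases rank highest

print(roc_auc_score(labels, probs))        # 1.0
print(roc_auc_score(labels, probs / 10.0)) # still 1.0: rescaling does not change the ranking
```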

kjgeras closed this as completed May 3, 2019
@Tonthatj
Author

Tonthatj commented May 3, 2019

So what is the threshold for determining whether the study as a whole was malignant or not?

@kjgeras
Contributor

kjgeras commented May 4, 2019

There is no universally "correct" threshold. It depends on the dataset you are planning to apply it to. One way to pick a sensible threshold using validation data is the following: assuming that you expect p% of cancers in your dataset, sort the validation examples by the predicted probability of malignancy, take the top p%, and use the lowest estimated probability of malignancy in that set as your threshold.
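
A minimal sketch of that heuristic, assuming the validation predictions are already in a NumPy array (the names below are illustrative, not from this repository):

```python
import numpy as np

def pick_threshold(val_probs, expected_cancer_fraction):
    """Lowest predicted malignancy probability among the top-p% of validation examples."""
    n_top = max(1, int(round(len(val_probs) * expected_cancer_fraction)))
    top = np.sort(val_probs)[::-1][:n_top]  # examples with the highest predicted probability
    return top[-1]                          # the lowest probability within that top slice

# e.g. expecting ~5% malignant studies:
# threshold = pick_threshold(val_probs, 0.05)
# predicted_malignant = test_probs >= threshold
```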

@kjgeras
Contributor

kjgeras commented May 4, 2019

By the way, I'm not sure if I'm interpreting the numbers in the first post above correctly, but they look weird to me. If you are getting some strange results such as AUC = 0.5, it's probably because you are preprocessing the data differently than we did. Look at the tech report if that is the case: https://cs.nyu.edu/~kgeras/reports/datav1.0.pdf

@Tonthatj
Author

Tonthatj commented May 7, 2019

Hi @kjgeras, we followed all of your preprocessing instructions, with the exception that our DICOM images have larger dimensions. Do you think this is the reason for the poor AUC? All of the cropped image resolutions are larger than the 2290 × 1890 resolution you mentioned in the tech report.

@kjgeras
Contributor

kjgeras commented May 7, 2019

It's difficult to answer this question based on what you wrote. You have to do the preprocessing and image normalization exactly the same way we do. If your pipeline differs from ours in even one detail, you are going to get random predictions.

@Tonthatj
Author

Tonthatj commented May 7, 2019

@kjgeras I have DICOMs with resolutions of 3016 x 4616. Would I have to resize them to 1942 x 2677 and then run your preprocessing to get an accurate result? Can I not use DICOM images unless they match the resolution you used exactly?

@kjgeras
Contributor

kjgeras commented May 7, 2019

It is hard to say. I think we didn't have any images that large in our data set, but that doesn't mean it necessarily wouldn't work. I would suggest the following debugging strategy:

  1. If the AUC you are getting for your dataset is relatively low but clearly not random (i.e. >0.6 and <0.8), then it is possible that the difference in performance is coming from some change in the distribution of the data to which our model might not be robust. The difference could be the size of the images, contrast, digital vs. non-digital mammography, a difference in the definition of the labels, etc.; there are many possibilities here, and if several such problems accumulate, they can degrade performance. The good news in that case is that you could retrain our model with your data (even if your dataset is relatively small) to fix it.

  2. If the AUC is really low (i.e. <0.6) and the predictions look strange (e.g. they are always the same, regardless of the example), the problem is almost certainly coming from a difference in preprocessing or normalization. You need to check every little detail to make sure that you are doing it exactly the way we did it. A quick check along the lines of the sketch below can tell you which of these two cases you are in.
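
A rough sanity check for the two cases above (this is not part of the repository; it assumes sklearn's roc_auc_score and labels/probabilities as NumPy arrays):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def diagnose(labels, probs):
    auc = roc_auc_score(labels, probs)
    spread = np.std(probs)
    if auc < 0.6 or spread < 1e-4:   # near-random or near-constant predictions
        print(f"AUC={auc:.3f}, std={spread:.2e}: likely a preprocessing/normalization mismatch (case 2)")
    elif auc < 0.8:
        print(f"AUC={auc:.3f}: model runs, but the data distribution may differ; consider retraining (case 1)")
    else:
        print(f"AUC={auc:.3f}: looks reasonable")
```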

@Tonthatj
Author

Tonthatj commented May 7, 2019

Just double-checking: you cropped your DICOM images before running them through crop_mammogram.py?

@jpatrickpark
Collaborator

jpatrickpark commented May 8, 2019

We do two stages of cropping. 1. We remove the background from the DICOM images in order to improve loading time, which is done by crop_mammogram. 2. From the cropped images of arbitrary size after the first stage, we further crop (or pad) each image to a specific size (2642x1977 or 2974x1748) in order to feed the model, which is done by data_loading.augmentation.random_augmentation_best_center. So it is okay if your DICOM files have a different resolution. Please make sure you are following both stages in the right order.
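
A self-contained sketch of the second stage described above, purely as an illustration of crop-or-pad to a fixed size; it is not the repository's random_augmentation_best_center implementation, and the target size shown is just an example:

```python
import numpy as np

def crop_or_pad_to(img, target_h, target_w):
    """Center-crop a 2D image that is too large, then zero-pad one that is too small."""
    h, w = img.shape[:2]
    top = max(0, (h - target_h) // 2)
    left = max(0, (w - target_w) // 2)
    img = img[top:top + target_h, left:left + target_w]
    pad_h = target_h - img.shape[0]
    pad_w = target_w - img.shape[1]
    return np.pad(img, ((pad_h // 2, pad_h - pad_h // 2),
                        (pad_w // 2, pad_w - pad_w // 2)), mode="constant")

# Stage 1: crop_mammogram removes the background (output size is arbitrary).
# Stage 2: model_input = crop_or_pad_to(cropped_img, 2974, 1748)  # target size depends on the view
```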

@Tonthatj
Author

Tonthatj commented May 8, 2019

So do the PNG files produced by crop_mammogram.py need to be of a specific size (2642x1977 or 2974x1748) to get an accurate result? Or, as long as they are all one of two specific sizes, will it provide a reasonable result?

@jpatrickpark
Collaborator

There's no image size requirement for the files produced by crop_mammogram.

@jpatrickpark
Collaborator

Please feel free to run the classifiers again now that we have updated the source code.

@Tonthatj
Author

Hey, I am trying to perform transfer learning on another dataset. Unfortunately, I am running into a problem: I have the prediction results and the corresponding truth values in NumPy arrays, and therefore cannot use the torch loss functions. When creating my own cross-entropy function, I cannot call .backward() on the loss that I computed.

Is it possible for you to share your training code? I would like to look at how you calculate the loss.
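
One likely cause, sketched below (this is not the authors' training code): converting the model outputs to NumPy detaches them from the autograd graph, so the computed loss has no gradient to propagate. Keeping everything as torch tensors lets .backward() work:

```python
import torch
import torch.nn.functional as F

# Stand-ins: in practice these would be the network's raw logits and the true class indices.
model_output = torch.tensor([[0.2, 0.8], [0.7, 0.3]], requires_grad=True)
labels = torch.tensor([1, 0])

loss = F.cross_entropy(model_output, labels)  # expects logits of shape (N, C) and integer targets of shape (N,)
loss.backward()                               # works because the outputs were never converted to NumPy
```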

@zphang
Contributor

zphang commented May 22, 2019

Could you post the code you're running and the error message you're getting?
