Testing on other data sets #9

Closed
Tonthatj opened this issue May 3, 2019 · 15 comments

@Tonthatj

Tonthatj commented May 3, 2019

Hi, I was testing the model against another dataset of mammograms and was wondering whether the input image dimensions have to match exactly. Your sample cropped images are (2440x3656) and (2607x3818), while ours are (1993x4396) and (2133x4906).

P_00005, RIGHT, CC,  MALIGNANT, 0.1293, 0.0123
P_00005, RIGHT, MLO, MALIGNANT, 0.1293, 0.0123
P_00007, LEFT,  CC,  BENIGN,    0.3026, 0.1753
P_00007, LEFT,  MLO, BENIGN,    0.3026, 0.1753

As you can see, the probabilities for benignity and malignancy (respectively) are extremely low. Out of a dataset of 200, the model accurately predicted only ~10 of them.

@kjgeras
Contributor

kjgeras commented May 3, 2019

@Tonthatj If I understand you correctly, this is what you should expect. This classifier is trained on a very imbalanced data set: only a small fraction of the training examples contain a malignancy. This skews the classifier towards predicting a very low probability of malignancy. What this classifier should be good at is distinguishing between malignant and non-malignant cases (which is captured by AUC); it will not necessarily provide accurate probability estimates.
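
To illustrate the point (this is only an illustrative sketch, not code from this repository): AUC depends only on how the predicted scores rank the examples, not on their absolute values, so uniformly low malignancy probabilities can still yield a high AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

labels = np.array([0, 0, 1, 0, 1])                 # 1 = malignant
probs = np.array([0.01, 0.02, 0.12, 0.03, 0.15])   # all "low", but malignant cases rank highest

print(roc_auc_score(labels, probs))        # 1.0
print(roc_auc_score(labels, probs / 10.0)) # still 1.0: rescaling does not change the ranking
```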

kjgeras closed this as completed May 3, 2019
@Tonthatj
Author

Tonthatj commented May 3, 2019

So what is the threshold for determining whether the study as a whole was malignant or not?

@kjgeras
Contributor

kjgeras commented May 4, 2019

There is no universally "correct" threshold. It depends on the dataset you are planning to apply it to. One way to pick a sensible threshold using validation data is the following: assuming that you expect p% of cancers in your dataset, sort the validation examples by the predicted probability of malignancy, take the top p%, and use the lowest estimated probability of malignancy in that set as your threshold.
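
A minimal sketch of that heuristic, assuming the validation predictions are already in a NumPy array (the names below are illustrative, not from this repository):

```python
import numpy as np

def pick_threshold(val_probs, expected_cancer_fraction):
    """Lowest predicted malignancy probability among the top-p% of validation examples."""
    n_top = max(1, int(round(len(val_probs) * expected_cancer_fraction)))
    top = np.sort(val_probs)[::-1][:n_top]  # examples with the highest predicted probability
    return top[-1]                          # the lowest probability within that top slice

# e.g. expecting ~5% malignant studies:
# threshold = pick_threshold(val_probs, 0.05)
# predicted_malignant = test_probs >= threshold
```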

@kjgeras
Contributor

kjgeras commented May 4, 2019

By the way, I'm not sure if I'm interpreting the numbers in the first post above correctly, but they look weird to me. If you are getting some strange results such as AUC = 0.5, it's probably because you are preprocessing the data differently than we did. Look at the tech report if that is the case: https://cs.nyu.edu/~kgeras/reports/datav1.0.pdf

@Tonthatj
Author

Tonthatj commented May 7, 2019

Hi @kjgeras, we followed all of your preprocessing instructions, with the exception that our DICOM images have larger dimensions. Do you think this is the reason for the poor AUC? All of the cropped image resolutions are larger than the 2290 × 1890 resolution you mentioned in the tech report.

@kjgeras
Contributor

kjgeras commented May 7, 2019

It's difficult to answer this question based on what you wrote. You have to do the preprocessing and image normalization exactly the same way we do. If your pipeline differs from ours in even one detail, you are going to get random predictions.

@Tonthatj
Author

Tonthatj commented May 7, 2019

@kjgeras I have DICOMs with resolutions of 3016 x 4616. Would I have to resize them to 1942 x 2677 and then run your preprocessing to get an accurate result? Can I not use DICOM images unless they match the resolution you used exactly?

@kjgeras
Contributor

kjgeras commented May 7, 2019

It is hard to say. I think we didn't have any images that large in our data set, but that doesn't mean it necessarily wouldn't work. I would suggest the following debugging strategy:

  1. If the AUC you are getting for your dataset is relatively low but clearly not random (i.e. >0.6 and <0.8), then it is possible that the difference in performance is coming from some change in the distribution of the data to which our model might not be robust. The difference could be the size of the images, contrast, digital vs. non-digital mammography, a difference in the definition of the labels, etc.; there are many possibilities here, and if several such problems accumulate, they can degrade performance. The good news in that case is that you could retrain our model with your data (even if your dataset is relatively small) to fix it.

  2. If the AUC is really low (i.e. <0.6) and the predictions look strange (e.g. they are always the same, regardless of the example), the problem is almost certainly coming from a difference in preprocessing or normalization. You need to check every little detail to make sure that you are doing it exactly the way we did it. A quick check along the lines of the sketch below can tell you which of these two cases you are in.
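
A rough sanity check for the two cases above (this is not part of the repository; it assumes sklearn's roc_auc_score and labels/probabilities as NumPy arrays):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def diagnose(labels, probs):
    auc = roc_auc_score(labels, probs)
    spread = np.std(probs)
    if auc < 0.6 or spread < 1e-4:   # near-random or near-constant predictions
        print(f"AUC={auc:.3f}, std={spread:.2e}: likely a preprocessing/normalization mismatch (case 2)")
    elif auc < 0.8:
        print(f"AUC={auc:.3f}: model runs, but the data distribution may differ; consider retraining (case 1)")
    else:
        print(f"AUC={auc:.3f}: looks reasonable")
```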

@Tonthatj
Author

Tonthatj commented May 7, 2019

Just double-checking: you cropped your DICOM images before running them through crop_mammogram.py?

@jpatrickpark
Collaborator

jpatrickpark commented May 8, 2019

We do two stages of cropping. 1. We remove the background from the DICOM images in order to improve loading time, which is done by crop_mammogram. 2. From the cropped images of arbitrary size after the first stage, we further crop (or pad) each image to a specific size (2642x1977 or 2974x1748) in order to feed the model, which is done by data_loading.augmentation.random_augmentation_best_center. So it is okay if your DICOM files have a different resolution. Please make sure you are following both stages in the right order.
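
A self-contained sketch of the second stage described above, purely as an illustration of crop-or-pad to a fixed size; it is not the repository's random_augmentation_best_center implementation, and the target size shown is just an example:

```python
import numpy as np

def crop_or_pad_to(img, target_h, target_w):
    """Center-crop a 2D image that is too large, then zero-pad one that is too small."""
    h, w = img.shape[:2]
    top = max(0, (h - target_h) // 2)
    left = max(0, (w - target_w) // 2)
    img = img[top:top + target_h, left:left + target_w]
    pad_h = target_h - img.shape[0]
    pad_w = target_w - img.shape[1]
    return np.pad(img, ((pad_h // 2, pad_h - pad_h // 2),
                        (pad_w // 2, pad_w - pad_w // 2)), mode="constant")

# Stage 1: crop_mammogram removes the background (output size is arbitrary).
# Stage 2: model_input = crop_or_pad_to(cropped_img, 2974, 1748)  # target size depends on the view
```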

@Tonthatj
Author

Tonthatj commented May 8, 2019

So do the PNG files produced by crop_mammogram.py need to be of a specific size (2642x1977 or 2974x1748) to get an accurate result? Or, as long as they are all one of two specific sizes, will it provide a reasonable result?

@jpatrickpark
Collaborator

There's no image size requirement for the files produced by crop_mammogram.

@jpatrickpark
Collaborator

Please feel free to run the classifiers again now that we have updated the source code.

@Tonthatj
Author

Hey, I am trying to perform transfer learning on another dataset. Unfortunately, I am running into a problem: I have the prediction results and the corresponding truth values in NumPy arrays, and therefore cannot use the torch loss functions. When creating my own cross-entropy function, I cannot call .backward() on the loss that I computed.

Is it possible for you to share your training code? I would like to look at how you calculate the loss.
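
One likely cause, sketched below (this is not the authors' training code): converting the model outputs to NumPy detaches them from the autograd graph, so the computed loss has no gradient to propagate. Keeping everything as torch tensors lets .backward() work:

```python
import torch
import torch.nn.functional as F

# Stand-ins: in practice these would be the network's raw logits and the true class indices.
model_output = torch.tensor([[0.2, 0.8], [0.7, 0.3]], requires_grad=True)
labels = torch.tensor([1, 0])

loss = F.cross_entropy(model_output, labels)  # expects logits of shape (N, C) and integer targets of shape (N,)
loss.backward()                               # works because the outputs were never converted to NumPy
```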

@zphang
Contributor

zphang commented May 22, 2019

Could you post the code you're running and the error message you're getting?
