Use torchmetric PSNR implementation and argument ordering #693
Conversation
…ntation with torchmetric PSNR
- I replaced the self-implemented PSNR computation with the one provided by torchmetrics.
- The ordering of the torchmetrics function call arguments is actually predictions ("preds") first and then target ("target"), not the other way around.
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
Thanks for this PR, @FelixSteinbauer. Can you elaborate on why the TorchMetrics implementation would be preferred over what is currently available? Especially since the current implementation seems to handle divisions by zero better [ref] than that of torchmetrics [ref]? I think this would make sense if we were using the |
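As a minimal sketch of the division-by-zero point (made-up values, not taken from either implementation): a perfect reconstruction gives MSE = 0, so the plain ratio typically becomes infinite, whereas an epsilon in the denominator keeps the result finite.

import torch
from torchmetrics.image import PeakSignalNoiseRatio

# Hypothetical perfect reconstruction: prediction equals target, so MSE == 0.
target = torch.tensor([1.0, 2.0, 3.0])
prediction = target.clone()

print(PeakSignalNoiseRatio()(preds=prediction, target=target))  # tensor(inf)

# With a small epsilon added to the MSE, the value stays finite (large, but comparable).
epsilon = 1e-8
mse = torch.mean((prediction - target) ** 2)
data_range = target.max() - target.min()
print(10.0 * torch.log10(data_range ** 2 / (mse + epsilon)))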
Codecov Report
@@ Coverage Diff @@
## master #693 +/- ##
==========================================
- Coverage 94.69% 94.68% -0.01%
==========================================
Files 117 117
Lines 8200 8208 +8
==========================================
+ Hits 7765 7772 +7
- Misses 435 436 +1
... and 1 file with indirect coverage changes |
That is a good point. I did neglect the division-by-zero aspect in the description of this PR. Regarding PSNR, I am actually proposing two changes:
I see that, from a practical perspective, you might want to avoid infs and NaNs in your framework. For our challenge, however, we wanted to be as consistent as possible with the "theoretical" definition of PSNR, which also seems to align with the torchmetrics implementation. This (and readability & simplicity) is the reason why I went straight with the torchmetrics implementation. Concerning the |
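For reference, the "theoretical" definition of PSNR discussed here (MAX is the peak signal value, MSE the mean squared error between prediction and target):

PSNR = 10 * log10( MAX^2 / MSE )

This is infinite whenever MSE = 0, i.e. for a perfect reconstruction.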
Regarding what torchmetrics does: are you sure it actually computes PSNR using the range? Because for me it does not (maybe that is a versioning thing, though). If you do:
The output on my machine is:
In the torchmetrics version I have, the line you referenced sets self.min_target to 0 no matter what. Afterwards, in this line, self.min_target is updated with
I am sorry for this lengthy discussion. Maybe what makes sense for GaNDLF is not what makes sense for our specific challenge evaluation. I will also talk to Florian and the others about these aspects and update this post later. |
From my understanding, the range-based implementations are meant for images with negative values, so images that do not go from
Imagine an image that has values from
As our images do not necessarily contain
I agree it is important to spot perfect reconstructions, and we also want to avoid distorting our distribution statistics. |
What is the range for images in the test set? Are all images normalized? |
@FelixSteinbauer: TBH, I am unsure which version of torchmetrics takes the range by default. The version that is currently used by GaNDLF is
I think we would all do better to confer with a statistician regarding the stability of this entire process. If we take the following example "results":
Firstly, we can clearly see, for this case, that the PSNR calculations that consider range are able to discern the difference between the different minimum values in the target, which is, I think, what we would want to consider in a challenge so that the results are as accurate and reliable as possible. Secondly, in my experience working with the stats profs on our end, they will always suggest to either drop
In general, raising run-time exceptions for users is fine when the input is faulty, which is not actually the case when |
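As a small, made-up illustration of the trade-off being weighed here: an inf entry breaks the summary statistics, and simply dropping it silently discards the best-performing case.

import torch

# Hypothetical PSNR column with one perfect reconstruction (MSE == 0 -> inf).
psnr_per_case = torch.tensor([30.0, 32.0, float("inf")])

print(psnr_per_case.mean())                             # tensor(inf) -- the mean is unusable
print(psnr_per_case[psnr_per_case.isfinite()].mean())   # tensor(31.) -- finite, but the best case is gone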
Thanks for conducting these experiments and providing a basis for discussion, @sarthakpati! Currently waiting for feedback from our team. From my naive perspective, it should not matter much, at least for our ranking; we rank the performance for each metric, for each case, and sum up the ranks. I don't see how a participating team would profit systematically from one implementation or the other? In the paper, it might be a bit weird if we report PSNR which is not really PSNR, but PSNR + epsilon sauce? Perfect reconstructions for 3D images should be highly unlikely, so it's more of an academic problem? What is the big benefit of avoiding
I am not a statistics prof, but we could report:
It would be a bit hard to compare if |
I agree that this is primarily an academic problem, and the entire premise rests on whether or not perfect reconstruction is possible. If yes, then it can be easily detected through MSE, and in that case, if epsilon is not present, the entire PSNR column will need to be "engineered" in one of 3 ways:
I feel that 3 is more easily explained in a paper than the rest, and provides a more consistent mechanism for performing calculations.
Yup, large real numbers can be compared, but |
try this code :) |
Hmm, for me, both options sound fine. If we go for the epsilon solution, we should probably report median + median absolute deviation instead of mean+SD as they should be robust to potential large PSNRs introduced by epsilon. |
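A quick illustration of that robustness, with made-up per-case PSNR values where the last case was inflated by the epsilon variant:

import torch

# Hypothetical per-case PSNR values; the last one is an epsilon-inflated "near-perfect" case.
scores = torch.tensor([28.3, 30.1, 29.5, 31.2, 86.0])

print(scores.mean(), scores.std())       # mean and SD are dragged up by the outlier
median = scores.median()
mad = (scores - median).abs().median()   # median absolute deviation
print(median, mad)                       # barely affected by the outlier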
Sarthak suggested outputting both PSNR with and without epsilon. I believe this is a good idea; then the users can choose and will be aware. |
Yup, let's add a new |
That sounds like a good solution. Thanks for this constructive and elaborate discussion. Now, regarding this pull request: should I just close it? Or can/should I do the modifications for |
Sure thing!
I can put this PR as a draft. You can make the changes on this branch itself, and then once you push your changes, they will get automatically reflected on the PR. Sound okay? |
peak_signal_noise_ratio_eps with the initial PSNR implementation (using range and epsilon)
In addition to the vanilla PSNR, the PSNR based on the value range and with epsilon in the denominator is now added to the overall_stats_dict as "psnr_range_eps"
From my understanding, Sarthak's goal is to have one implementation to rule them all? This is tricky as, from my understanding, correct computation of PSNR might require top-down knowledge about the images that cannot be derived from the images themselves. One example of this is our images not featuring background. Here PSNR needs to be computed for a range from
To make it more complicated, some of the images in the BraTS dataset seem to feature negative values, which are probably artifacts from registration or skull-stripping. Such specifics will vary from dataset to dataset. The only option I see to serve and please everyone is to provide arguments for defining |
The current torchmetrics version that gets installed with GaNDLF takes care of the range, and I think we should keep it like that:
>>> import torch
>>> import torchmetrics
>>> torchmetrics.__version__
'0.8.1'
>>> from torchmetrics.image import PeakSignalNoiseRatio
>>> psnr = PeakSignalNoiseRatio()
>>> prediction = torch.Tensor([4,3,2])
>>> target_0 = torch.Tensor([2,3,4])
>>> target_1 = torch.Tensor([0,3,4])
>>> target_2 = torch.Tensor([-1,3,4])
>>> psnr(preds=prediction, target=target_0)
tensor(7.7815)
>>> psnr(preds=prediction, target=target_1)
tensor(3.8021)
>>> psnr(preds=prediction, target=target_2)
tensor(4.1266) |
I don't think we have this case in our dataset... |
If that's the case, then is it not prudent for the artefacts themselves to be fixed? And the PSNR calculation with proper min and max will correctly point this issue out. |
Fixing artefacts would be great, but I think the BraTS umbrella organisation needs to deal with this topic. The outliers are not that big. The 4 strongest outliers in the training set have the following ranges:
So, I think I managed to include the ideas/issues of the above discussion into a unified PSNR function by adding parameters for data_range and epsilon. The function now looks like this:

import torch
from torchmetrics.image import PeakSignalNoiseRatio
# mean_squared_error is assumed to be available in the same module
# (e.g. torchmetrics.functional.mean_squared_error or GaNDLF's own helper)
from torchmetrics.functional import mean_squared_error

def peak_signal_noise_ratio(target, prediction, data_range=None, epsilon=None) -> torch.Tensor:
    """
    Computes the peak signal to noise ratio between the target and prediction.

    Args:
        target (torch.Tensor): The target tensor.
        prediction (torch.Tensor): The prediction tensor.
        data_range (float, optional): If not None, this data range is used as the numerator instead of computing it from the given data. Defaults to None.
        epsilon (float, optional): If not None, this epsilon is added to the denominator of the fraction to avoid infinity as output. Defaults to None.
    """
    if epsilon is None:
        psnr = PeakSignalNoiseRatio(data_range=data_range)
        return psnr(preds=prediction, target=target)
    else:  # reimplement torchmetrics PSNR, but with epsilon
        mse = mean_squared_error(target, prediction)
        if data_range is None:  # compute data_range like torchmetrics if not given
            min_v = 0 if torch.min(target) > 0 else torch.min(target)  # look at this line
            max_v = torch.max(target)
            data_range = max_v - min_v
        return 10.0 * torch.log10(data_range ** 2 / (mse + epsilon))

If you want:
Does this cover every (use)case we wanted? Does that make everyone happy? Suggestions for improvement? |
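To make the intended call patterns of the function above concrete, a minimal usage sketch; the tensors and the epsilon value are made up, and it assumes the function (and its imports) from the previous snippet are in scope:

target = torch.tensor([0.0, 3.0, 4.0])
prediction = torch.tensor([4.0, 3.0, 2.0])

peak_signal_noise_ratio(target, prediction)                                # plain torchmetrics behaviour
peak_signal_noise_ratio(target, prediction, data_range=4.0)                # user-supplied range
peak_signal_noise_ratio(target, prediction, epsilon=1e-8)                  # epsilon variant, range derived from the data
peak_signal_noise_ratio(target, prediction, data_range=4.0, epsilon=1e-8)  # both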
Minor revision requested, but should be good to go otherwise.
I made an example that should clarify the behavior of torchmetrics:

import torch
import torchmetrics
print(torchmetrics.__version__)  # 1.0.1
from torchmetrics import MeanSquaredError
from torchmetrics.image import PeakSignalNoiseRatio
mse = MeanSquaredError()
prediction = torch.Tensor([4, 3, 2])
target_0 = torch.Tensor([2, 3, 4])
target_1 = torch.Tensor([0, 3, 4])
target_2 = torch.Tensor([-1, 3, 4])
def psnr(preds, target):
min_v = 0 if torch.min(target) > 0 else torch.min(target) # look at this line
max_v = torch.max(target)
data_range = max_v - min_v
result = 10.0 * torch.log10(data_range**2 / mse(preds=preds, target=target))
print(result)
return result
def torchmetrics_original_psnr(range, preds, target):
psnr_computer = PeakSignalNoiseRatio(data_range=range)
psnr = psnr_computer(
preds=preds,
target=target,
)
print(psnr)
return psnr
psnr(preds=prediction, target=target_0)
torchmetrics_original_psnr(range=(0, 4), preds=prediction, target=target_0)
print("for positive values torchmetrics ignores the minimum in the data and chooses 0")
torchmetrics_original_psnr(range=(2, 4), preds=prediction, target=target_0)
psnr(preds=prediction, target=target_1)
torchmetrics_original_psnr(range=(0, 4), preds=prediction, target=target_1)
psnr(preds=prediction, target=target_2)
torchmetrics_original_psnr(range=(-1, 4), preds=prediction, target=target_2) |
Minor revisions and we should be good to go.
Co-authored-by: Sarthak Pati <sarthak.pati@pennmedicine.upenn.edu>
Co-authored-by: Sarthak Pati <sarthak.pati@pennmedicine.upenn.edu>
I don't know why changing this comment broke the testing pipeline. Maybe it was not the comment. I changed the quotation marks hoping that would help (probably does not though...)
Now the code should be in the state in which it last worked.
I do not know why the pipeline fails. To me, it seems unrelated to the changes that were made since the last successful run.
@sarthakpati Sorry to bother you again, but I really do not get why the pipeline is failing (OpenFL-Test & CI-PyTest). I already tried to revert to the last state where the checks succeeded, but then the checks still fail. I am not certain what to do now. I don't even understand what the reason for these errors is. |
Apologies, but due to an unforeseen issue, OpenFL tests are failing [ref]. Please hang on tight while we are coordinating a fix, and we will re-initiate the failing run automatically. |
LGTM!
I did not test the changes for GaNDLF specifically as they are minimal; I tested them in our BraTS repo, where they did work, and we basically use the same code anyway...
Fixes #ISSUE_NUMBER
Proposed Changes
Checklist
CONTRIBUTING guide.
If a new pip install step is needed for the PR to be functional, please ensure it is reflected in all the files that control the CI, namely: python-test.yml, and all docker files [1,2,3].