Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve error message for cost function #863

Merged
merged 10 commits into from Apr 17, 2023

Conversation

amanmdesai
Copy link
Contributor

Hello @HDembinski

This PR is regarding #791

@HDembinski
Copy link
Member

HDembinski commented Apr 11, 2023

Cool, thanks! I think the error message is spot on.

Would you be willing to also write a test for this? Otherwise I would merge as is and write a test at some later point. Ideally we want to maintain 100 % test coverage.

A test would go to tests/test_cost.py and the pytest facility to use is with pytest.raises(ValueError): <create condition which triggers the exception>. Please have a look at examples in the source code, just search for pytest.raises.

@HDembinski
Copy link
Member

Thinking more about it, there are a couple of things that need to be improved.

  • I think we should check that the shapes are compatible, not only the length, since the binned cost functions support N-dimensional histograms.
  • The check should be implemented in class BinnedCostWithModel, so that all derived classes benefit from it. A good place seems to be BinnedCostWithModel._pred.

@amanmdesai
Copy link
Contributor Author

Hi @HDembinski

Thanks for the feedback and suggesstions.
I will work on them.

@amanmdesai
Copy link
Contributor Author

Hi @HDembinski

I have added a test for raisevalue error using pytest.raises. However this test fails since scipy is not installed.
In some cases, the other tests in test_cost use the following statement:

@pytest.mark.skipif(not scipy_available, reason="scipy.stats is needed")

so shall I include the above statement to skip test if scipy is not available?
locally, the test seem to pass.

I have also implemented a check for shape compatibility.

Please let me know if something needs to be changed.

Thanks,
Aman

@HDembinski
Copy link
Member

so shall I include the above statement to skip test if scipy is not available?

Yes, please. This is happening because scipy is an optional dependency in some methods. Those methods work with and without scipy being installed, but execute different code in both cases. We want to test both code paths, that's why the tests are run with and without scipy. When you test a method for which scipy is required, you need to explicitly mark so that it is skipped in those test runs.

@amanmdesai
Copy link
Contributor Author

so shall I include the above statement to skip test if scipy is not available?

Yes, please. This is happening because scipy is an optional dependency in some methods. Those methods work with and without scipy being installed, but execute different code in both cases. We want to test both code paths, that's why the tests are run with and without scipy. When you test a method for which scipy is required, you need to explicitly mark so that it is skipped in those test runs.

I see. Thanks. I have added that statment.
I am not sure what to write in tests to check compatibility of shapes (which is probably the reason for decrease in coverage).

src/iminuit/cost.py Outdated Show resolved Hide resolved
Comment on lines 1145 to 1146
if len(self._xe_shape) != len(self._model_xe):
raise ValueError("Variable shapes do not match")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this always matches by construction, no? So this if should never trigger and can be removed.

src/iminuit/cost.py Outdated Show resolved Hide resolved
Comment on lines 1157 to 1161
if len(self._data) != len(d):
raise ValueError(
f"Expected model to return an array of size {len(self._data)},"
f"but it returns an array of size {len(d)}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, but d and self._data could be multidimensional, so we want to check self._data.shape == d.shape.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message needs to be adapted accordingly, "array of size" -> "array with shape" etc

Comment on lines 1403 to 1407
@pytest.mark.skipif(not scipy_available, reason="scipy.stats is needed")
def test_error_message_cost():
from iminuit import cost
import numpy as np
from scipy.stats import norm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this test only requires scipy because you import norm from scipy.stats. That's not necessary. We don't actually want to fit this cost function, so we can invent a bogus model that just returns a constant array with the wrong shape. In other words, you don't need norm to test this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also test the multi-dimensional case, where edges is a tuple of two arrays, with edges for the x and y-axis, and n is a 2d array.

@amanmdesai
Copy link
Contributor Author

Hi @HDembinski

I have implemented all suggesstions with the exception of 2D test (I could not implement this test).

@HDembinski
Copy link
Member

I have implemented all suggesstions with the exception of 2D test (I could not implement this test).

Ok, I implemented the 2D test and found some issue with our implementation. We should already test the length of the array returned by the model before it is further reshaped by _pred, otherwise there still will be confusing error messages.

I made a couple of unrelated changes, to make mypy happier and to increase coverage back to 100 %.

@amanmdesai
Copy link
Contributor Author

I have implemented all suggesstions with the exception of 2D test (I could not implement this test).

Ok, I implemented the 2D test and found some issue with our implementation. We should already test the length of the array returned by the model before it is further reshaped by _pred, otherwise there still will be confusing error messages.

I made a couple of unrelated changes, to make mypy happier and to increase coverage back to 100 %.

Thanks a lot @HDembinski for your help and guidance!

@HDembinski HDembinski merged commit 905dbbd into scikit-hep:develop Apr 17, 2023
7 checks passed
@HDembinski
Copy link
Member

Thanks @amanmdesai !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants