New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement InverseCDFSampler
class
#2229
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fabiopintore for this PR. I've left a few comments to address.
gammapy/utils/random/inverse_cdf.py
Outdated
@@ -0,0 +1,70 @@ | |||
# Licensed under a 3-clause BSD style license - see LICENSE.rst | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove empty line...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
import numpy as np | ||
from numpy.testing import assert_allclose | ||
import scipy.optimize | ||
import scipy.stats as stats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Gammapy we typically use the following import order:
- Python standard libraries
- Numpy, Scipy, Astropy
- Relative imports
Can you fix the order following this scheme?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
gammapy/utils/random/inverse_cdf.py
Outdated
|
||
class InverseCDFSampler: | ||
"""Inverse CDF sampler. | ||
It determines a set of random numbers and calculate the cumulative distribution function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Insert a new line here and de-indent the second line. When you look into the Gammapy code base you'll find a lot of examples, what the docstring format looks like. You ca also build the docs locally using make docs-all
(including tutorials which takes ~15 min.) or python setup.py build_docs
(without tutorials, this will show a few warnings, but runs much faster...). The html docs can be found in docs/_build/html
and you can open them use using your preferred browser or on a Mac open docs/_build/html/index.html
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
gammapy/utils/random/inverse_cdf.py
Outdated
Parameters | ||
---------- | ||
pdf : `~`gammapy.maps.Map` | ||
predicted source counts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indent this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
gammapy/utils/random/inverse_cdf.py
Outdated
---------- | ||
pdf : `~`gammapy.maps.Map` | ||
predicted source counts | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extend the doctoring to also include axis
and random_state
, so that it matches the signature of the __init__
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
assert_allclose((np.mean(hist[0]) * 100./1000.), 1.0, atol=1e-4) | ||
|
||
def test_norm_dist_sampling(): | ||
for i in np.arange(1,100): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of the loop here? Just remove it and choose one sigma for the Gaussian you use for testing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
b2 = (b1 + np.roll(b1,1))/2.0 | ||
b2 = np.delete(b2, 0) | ||
guess = np.array([50, 0.05, 1000]) | ||
fitParams, fitCovariances = scipy.optimize.curve_fit(gaus, b2, a1, guess, sigma=(np.zeros(len(b2))+0.01), maxfev=100000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not need to do curve fit here, np.mean()
and np.stddev()
will do just fine...then you can also remove the gaus
function defined above...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
b2 = np.delete(b2, 0) | ||
guess = np.array([50, 0.05, 1000]) | ||
fitParams, fitCovariances = scipy.optimize.curve_fit(gaus, b2, a1, guess, sigma=(np.zeros(len(b2))+0.01), maxfev=100000) | ||
assert_allclose(fitParams[2], 1000., atol=0.15) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again maybe add the assert on the standard deviation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
idx = InverseCDFSampler(uniform_dist(n_elem=100000), random_state=0) | ||
sampled_elem = idx.sample(1000) | ||
hist = np.histogram(sampled_elem[0],bins=100) | ||
assert_allclose((np.mean(hist[0]) * 100./1000.), 1.0, atol=1e-4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add the assert on the standard deviation i.e. call np.stddev
?
Maybe it would make the test a bit clearer if you convert the sampled indices into coordinates, like so:
def uniform(x, a=-1, b=1):
return np.select([x < a, x > b], [0, 0], 1)
x = np.linspace(-10, 10, 1000)
sampler = InverseCDFSampler(pdf=uniform(x))
idx = sampler.sample(int(1e6))
x_sampled = np.interp(idx, np.arange(1000), x)
np.mean(x_sampled)
np.std(x_sampled)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
sigma = i/100. | ||
idx = InverseCDFSampler(gauss_dist(x=x, mu=0, sigma=sigma), random_state=0) | ||
sampled_elem = idx.sample(1000) | ||
a1, b1 = np.histogram((sampled_elem[0]),bins=100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again maybe convert the indices to coordinates and just compute the mean and standard deviation, as I've described above...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adonath or @fabiopintore - Please see my inline comments and fix them.
To follow PEP8, we should probably rename sortindex
-> sort_index
.
I did not review the implementation here, this is just some stuff pointed out by PyCharm when I opened the file there.
|
||
Parameters | ||
---------- | ||
pdf : `~gammapy.maps.Map` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is pdf
really Map
object?
Should be a Numpy array, no?
Also, it probably should be a probability mass function pmf
(that's what it's called in scipy.stats), i.e. finite probability per bin, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, correct. It should be a generic Numpy array. I think Axel is going to modify the code soon.
In fact we are working with binned distribution, so the mass function should be a more adequate naming convention.
Map of the predicted source counts. | ||
axis : integer | ||
Axis along which sampling the indexes. | ||
random_state : integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please write this docstring like this:
https://docs.gammapy.org/0.12/development/howto.html#random-numbers
I use PyCharm and it uses docstring types to do type and code analysis, and if you write "int" here and later call a method ".uniform" on it it complains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay.
assert_allclose(np.std(x_sampled), sigma, atol=0.005) | ||
|
||
|
||
def test_norm_dist_sampling(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You put two test functions with the same name: test_norm_dist_sampling
What will happen is that the second def
statement replaces the first one, and the first test will never run.
Please rename the test functions to be unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this is a mistake. It will be changed soon.
|
||
Returns | ||
------- | ||
index : tuple of `~numpy.ndarray` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is index
really a tuple. Looking at the code, I think it's a numpy.ndarray
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. We'll modify the string.
InverseCDFSampler
class
This PR implement the new class InverseCDFSampler into `gammapy.utils.random'. This PR request is necessary to the Event-Sampler discussed in #2136 .
We added two tests that use the InverseCDFSampler class to sample from a uniform distribution or a gaussian distribution. The tests compare the sampled distributions with the expected ones.