Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement InverseCDFSampler class #2229

Merged
merged 7 commits into from Jun 13, 2019

Conversation

fabiopintore
Copy link
Contributor

This PR implement the new class InverseCDFSampler into `gammapy.utils.random'. This PR request is necessary to the Event-Sampler discussed in #2136 .

We added two tests that use the InverseCDFSampler class to sample from a uniform distribution or a gaussian distribution. The tests compare the sampled distributions with the expected ones.

@adonath adonath self-assigned this Jun 12, 2019
@adonath adonath added this to the 0.13 milestone Jun 12, 2019
Copy link
Member

@adonath adonath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fabiopintore for this PR. I've left a few comments to address.

@@ -0,0 +1,70 @@
# Licensed under a 3-clause BSD style license - see LICENSE.rst

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove empty line...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

import numpy as np
from numpy.testing import assert_allclose
import scipy.optimize
import scipy.stats as stats
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Gammapy we typically use the following import order:

  1. Python standard libraries
  2. Numpy, Scipy, Astropy
  3. Relative imports
    Can you fix the order following this scheme?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


class InverseCDFSampler:
"""Inverse CDF sampler.
It determines a set of random numbers and calculate the cumulative distribution function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insert a new line here and de-indent the second line. When you look into the Gammapy code base you'll find a lot of examples, what the docstring format looks like. You ca also build the docs locally using make docs-all (including tutorials which takes ~15 min.) or python setup.py build_docs (without tutorials, this will show a few warnings, but runs much faster...). The html docs can be found in docs/_build/html and you can open them use using your preferred browser or on a Mac open docs/_build/html/index.html.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Parameters
----------
pdf : `~`gammapy.maps.Map`
predicted source counts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent this line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

----------
pdf : `~`gammapy.maps.Map`
predicted source counts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extend the doctoring to also include axis and random_state, so that it matches the signature of the __init__ method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

assert_allclose((np.mean(hist[0]) * 100./1000.), 1.0, atol=1e-4)

def test_norm_dist_sampling():
for i in np.arange(1,100):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of the loop here? Just remove it and choose one sigma for the Gaussian you use for testing...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

b2 = (b1 + np.roll(b1,1))/2.0
b2 = np.delete(b2, 0)
guess = np.array([50, 0.05, 1000])
fitParams, fitCovariances = scipy.optimize.curve_fit(gaus, b2, a1, guess, sigma=(np.zeros(len(b2))+0.01), maxfev=100000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not need to do curve fit here, np.mean() and np.stddev() will do just fine...then you can also remove the gaus function defined above...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

b2 = np.delete(b2, 0)
guess = np.array([50, 0.05, 1000])
fitParams, fitCovariances = scipy.optimize.curve_fit(gaus, b2, a1, guess, sigma=(np.zeros(len(b2))+0.01), maxfev=100000)
assert_allclose(fitParams[2], 1000., atol=0.15)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again maybe add the assert on the standard deviation...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

idx = InverseCDFSampler(uniform_dist(n_elem=100000), random_state=0)
sampled_elem = idx.sample(1000)
hist = np.histogram(sampled_elem[0],bins=100)
assert_allclose((np.mean(hist[0]) * 100./1000.), 1.0, atol=1e-4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the assert on the standard deviation i.e. call np.stddev?

Maybe it would make the test a bit clearer if you convert the sampled indices into coordinates, like so:

def uniform(x, a=-1, b=1):
    return np.select([x < a, x > b], [0, 0], 1)

x = np.linspace(-10, 10, 1000)
sampler = InverseCDFSampler(pdf=uniform(x))

idx = sampler.sample(int(1e6))

x_sampled = np.interp(idx, np.arange(1000), x)

np.mean(x_sampled)
np.std(x_sampled)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

sigma = i/100.
idx = InverseCDFSampler(gauss_dist(x=x, mu=0, sigma=sigma), random_state=0)
sampled_elem = idx.sample(1000)
a1, b1 = np.histogram((sampled_elem[0]),bins=100)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again maybe convert the indices to coordinates and just compute the mean and standard deviation, as I've described above...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@adonath adonath merged commit 30e3b1e into gammapy:master Jun 13, 2019
Copy link
Contributor

@cdeil cdeil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adonath or @fabiopintore - Please see my inline comments and fix them.

To follow PEP8, we should probably rename sortindex -> sort_index.

I did not review the implementation here, this is just some stuff pointed out by PyCharm when I opened the file there.


Parameters
----------
pdf : `~gammapy.maps.Map`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is pdf really Map object?
Should be a Numpy array, no?

Also, it probably should be a probability mass function pmf (that's what it's called in scipy.stats), i.e. finite probability per bin, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct. It should be a generic Numpy array. I think Axel is going to modify the code soon.

In fact we are working with binned distribution, so the mass function should be a more adequate naming convention.

Map of the predicted source counts.
axis : integer
Axis along which sampling the indexes.
random_state : integer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write this docstring like this:
https://docs.gammapy.org/0.12/development/howto.html#random-numbers

I use PyCharm and it uses docstring types to do type and code analysis, and if you write "int" here and later call a method ".uniform" on it it complains.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

assert_allclose(np.std(x_sampled), sigma, atol=0.005)


def test_norm_dist_sampling():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You put two test functions with the same name: test_norm_dist_sampling

What will happen is that the second def statement replaces the first one, and the first test will never run.
Please rename the test functions to be unique.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this is a mistake. It will be changed soon.


Returns
-------
index : tuple of `~numpy.ndarray`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is index really a tuple. Looking at the code, I think it's a numpy.ndarray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. We'll modify the string.

@adonath adonath changed the title Adding the InverseCDFSampler Implement InverseCDFSampler class Jul 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants