Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement InverseCDFSampler class #2229

Merged
merged 7 commits into from Jun 13, 2019

Conversation

@fabiopintore
Copy link
Contributor

@fabiopintore fabiopintore commented Jun 12, 2019

This PR implement the new class InverseCDFSampler into `gammapy.utils.random'. This PR request is necessary to the Event-Sampler discussed in #2136 .

We added two tests that use the InverseCDFSampler class to sample from a uniform distribution or a gaussian distribution. The tests compare the sampled distributions with the expected ones.

@adonath adonath self-assigned this Jun 12, 2019
@adonath adonath added this to the 0.13 milestone Jun 12, 2019
Copy link
Member

@adonath adonath left a comment

Thanks @fabiopintore for this PR. I've left a few comments to address.

Loading

@@ -0,0 +1,70 @@
# Licensed under a 3-clause BSD style license - see LICENSE.rst

Copy link
Member

@adonath adonath Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove empty line...

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Loading

import numpy as np
from numpy.testing import assert_allclose
import scipy.optimize
import scipy.stats as stats
Copy link
Member

@adonath adonath Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Gammapy we typically use the following import order:

  1. Python standard libraries
  2. Numpy, Scipy, Astropy
  3. Relative imports
    Can you fix the order following this scheme?

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Loading


class InverseCDFSampler:
"""Inverse CDF sampler.
It determines a set of random numbers and calculate the cumulative distribution function.
Copy link
Member

@adonath adonath Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insert a new line here and de-indent the second line. When you look into the Gammapy code base you'll find a lot of examples, what the docstring format looks like. You ca also build the docs locally using make docs-all (including tutorials which takes ~15 min.) or python setup.py build_docs (without tutorials, this will show a few warnings, but runs much faster...). The html docs can be found in docs/_build/html and you can open them use using your preferred browser or on a Mac open docs/_build/html/index.html.

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Loading

Parameters
----------
pdf : `~`gammapy.maps.Map`
predicted source counts
Copy link
Member

@adonath adonath Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent this line

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Loading

----------
pdf : `~`gammapy.maps.Map`
predicted source counts
Copy link
Member

@adonath adonath Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extend the doctoring to also include axis and random_state, so that it matches the signature of the __init__ method.

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Loading

assert_allclose((np.mean(hist[0]) * 100./1000.), 1.0, atol=1e-4)

def test_norm_dist_sampling():
for i in np.arange(1,100):
Copy link
Member

@adonath adonath Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of the loop here? Just remove it and choose one sigma for the Gaussian you use for testing...

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Loading

b2 = (b1 + np.roll(b1,1))/2.0
b2 = np.delete(b2, 0)
guess = np.array([50, 0.05, 1000])
fitParams, fitCovariances = scipy.optimize.curve_fit(gaus, b2, a1, guess, sigma=(np.zeros(len(b2))+0.01), maxfev=100000)
Copy link
Member

@adonath adonath Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not need to do curve fit here, np.mean() and np.stddev() will do just fine...then you can also remove the gaus function defined above...

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Loading

b2 = np.delete(b2, 0)
guess = np.array([50, 0.05, 1000])
fitParams, fitCovariances = scipy.optimize.curve_fit(gaus, b2, a1, guess, sigma=(np.zeros(len(b2))+0.01), maxfev=100000)
assert_allclose(fitParams[2], 1000., atol=0.15)
Copy link
Member

@adonath adonath Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again maybe add the assert on the standard deviation...

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Loading

idx = InverseCDFSampler(uniform_dist(n_elem=100000), random_state=0)
sampled_elem = idx.sample(1000)
hist = np.histogram(sampled_elem[0],bins=100)
assert_allclose((np.mean(hist[0]) * 100./1000.), 1.0, atol=1e-4)
Copy link
Member

@adonath adonath Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the assert on the standard deviation i.e. call np.stddev?

Maybe it would make the test a bit clearer if you convert the sampled indices into coordinates, like so:

def uniform(x, a=-1, b=1):
    return np.select([x < a, x > b], [0, 0], 1)

x = np.linspace(-10, 10, 1000)
sampler = InverseCDFSampler(pdf=uniform(x))

idx = sampler.sample(int(1e6))

x_sampled = np.interp(idx, np.arange(1000), x)

np.mean(x_sampled)
np.std(x_sampled)

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Loading

sigma = i/100.
idx = InverseCDFSampler(gauss_dist(x=x, mu=0, sigma=sigma), random_state=0)
sampled_elem = idx.sample(1000)
a1, b1 = np.histogram((sampled_elem[0]),bins=100)
Copy link
Member

@adonath adonath Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again maybe convert the indices to coordinates and just compute the mean and standard deviation, as I've described above...

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Loading

@adonath adonath merged commit 30e3b1e into gammapy:master Jun 13, 2019
8 of 9 checks passed
Loading
Copy link
Member

@cdeil cdeil left a comment

@adonath or @fabiopintore - Please see my inline comments and fix them.

To follow PEP8, we should probably rename sortindex -> sort_index.

I did not review the implementation here, this is just some stuff pointed out by PyCharm when I opened the file there.

Loading

Parameters
----------
pdf : `~gammapy.maps.Map`
Copy link
Member

@cdeil cdeil Jun 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is pdf really Map object?
Should be a Numpy array, no?

Also, it probably should be a probability mass function pmf (that's what it's called in scipy.stats), i.e. finite probability per bin, no?

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct. It should be a generic Numpy array. I think Axel is going to modify the code soon.

In fact we are working with binned distribution, so the mass function should be a more adequate naming convention.

Loading

Map of the predicted source counts.
axis : integer
Axis along which sampling the indexes.
random_state : integer
Copy link
Member

@cdeil cdeil Jun 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write this docstring like this:
https://docs.gammapy.org/0.12/development/howto.html#random-numbers

I use PyCharm and it uses docstring types to do type and code analysis, and if you write "int" here and later call a method ".uniform" on it it complains.

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.

Loading

assert_allclose(np.std(x_sampled), sigma, atol=0.005)


def test_norm_dist_sampling():
Copy link
Member

@cdeil cdeil Jun 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You put two test functions with the same name: test_norm_dist_sampling

What will happen is that the second def statement replaces the first one, and the first test will never run.
Please rename the test functions to be unique.

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this is a mistake. It will be changed soon.

Loading

Returns
-------
index : tuple of `~numpy.ndarray`
Copy link
Member

@cdeil cdeil Jun 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is index really a tuple. Looking at the code, I think it's a numpy.ndarray?

Loading

Copy link
Contributor Author

@fabiopintore fabiopintore Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. We'll modify the string.

Loading

@adonath adonath changed the title Adding the InverseCDFSampler Implement InverseCDFSampler class Jul 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants