Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1d Confidence Regions #145

Merged
merged 33 commits into from
Mar 10, 2021
Merged

1d Confidence Regions #145

merged 33 commits into from
Mar 10, 2021

Conversation

htjb
Copy link
Collaborator

@htjb htjb commented Feb 8, 2021

Description

Fixes #142

I have added a function, 'iso_probability_1d' to utils to find the sample values where a 1D posterior has a given probability (e.g. [0.68, 0.95]). I have also edited 'kde_plot_1d' and 'fastkde_plot_1d' so that the user can optionally shade between these regions. The function is flexible enough to take in any number of probability values/ levels (e.g. [0.68, 0.95, 0.99] ect.) and plot the related shaded regions. The user can change the colour of the shaded regions, the gradient of shading (matplotlib alpha values) and invert the shading if they want to emphases disfavoured regions rather than favoured.

Docs, tests and flake8 have all been checked locally.

The basic example use case below will give something like the attached plot.

  import numpy as np
  import matplotlib.pyplot as plt
  from anesthetic.plot import kde_plot_1d
  
  x = np.hstack([np.random.normal(0, 3, 1000), np.random.normal(10, 2, 2000)])
  
  fig, axes = plt.subplots(1, 1)
  # additional kwargs for formatting have been added to documentation
  kde_plot_1d(axes, x, fill=True)
  plt.show()

Checklist:

  • I have performed a self-review of my own code
  • My code is PEP8 compliant (flake8 anesthetic tests)
  • My code contains compliant docstrings (pydocstyle --convention=numpy anesthetic)
  • New and existing unit tests pass locally with my changes (python -m pytest)
  • I have added tests that prove my fix is effective or that my feature works

example

@codecov
Copy link

codecov bot commented Feb 8, 2021

Codecov Report

Merging #145 (3206407) into master (2c00243) will increase coverage by 0.10%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #145      +/-   ##
==========================================
+ Coverage   94.87%   94.98%   +0.10%     
==========================================
  Files          16       16              
  Lines        1581     1614      +33     
==========================================
+ Hits         1500     1533      +33     
  Misses         81       81              
Impacted Files Coverage Δ
anesthetic/plot.py 97.54% <100.00%> (+0.21%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2c00243...3206407. Read the comment docs.

@williamjameshandley
Copy link
Collaborator

Many thanks for this @htjb. Let me know when the code is ready for review

@htjb
Copy link
Collaborator Author

htjb commented Feb 8, 2021

Hi @williamjameshandley, no problem! It looks like all the checks are passing now and the code should be ready for review. Thanks!

Copy link
Collaborator

@williamjameshandley williamjameshandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks @htjb. The final result looks very slick. I think there are a couple of algorithmic/coding improvements we could make which are summarised below.

In terms of the final result, do we think it's better for gray to be the default fill colour, or for the fill colour to be the same as the line? This would make it more consistent with the contour colouring after #140. With that in mind, it might also be more consistent instead of using an alpha-based mode of fill colouring, to use the same cmap scheme as the plot. (this would potentially allow overlays of several lines).

anesthetic/plot.py Outdated Show resolved Hide resolved
anesthetic/utils.py Outdated Show resolved Hide resolved
anesthetic/plot.py Outdated Show resolved Hide resolved
anesthetic/plot.py Outdated Show resolved Hide resolved
@htjb
Copy link
Collaborator Author

htjb commented Feb 8, 2021

Hi @williamjameshandley, thanks for the comments.

In terms of the colour and cmap issue, it sounds like a good idea to use the line colour and the existing cmap kwarg. This should simplify the changes being made. I will have a look at doing this shortly as it may effect some of the more specific comments you have.

@htjb
Copy link
Collaborator Author

htjb commented Feb 8, 2021

@williamjameshandley I have now edited the code so that it uses the cmap kwarg and removed the alpha/fill_color kwargs. I also moved the repeated code for plotting the filled regions into it's own function 'fill_plot_1d' to avoid repetition and hopefully made the remaining doc strings more consistent.

There are some replies to the specific method/code comments still left above. Thanks!

@williamjameshandley
Copy link
Collaborator

Could you update the figure above to see what it now looks like with the colour updates?

@htjb
Copy link
Collaborator Author

htjb commented Feb 8, 2021

Could you update the figure above to see what it now looks like with the colour updates?

See above, with cmap = plt.get_cmap('Blues')

Also removed the invert alpha function as well since you can pass things like 'Blues_r' now.

@williamjameshandley
Copy link
Collaborator

This looks awesome in blue! We should probably get @lukashergt 's opinion on the API. To be more consistent, it might be worth considering using something like the facecolor keyword argument (defaulting to None), although this would not quite be the same as for 2D contours, since these are by default filled (unless we changed the default to be filled for 1D).

@htjb
Copy link
Collaborator Author

htjb commented Feb 8, 2021

@williamjameshandley I did wonder about setting default to filled for 1D because at the moment you have to set fill=True and provide a cmap which is a bit messy. Perhaps just providing a cmap is sufficient?

@williamjameshandley
Copy link
Collaborator

I'm minded purely on consistency grounds to consider filling in 1D by default if we can get this working satisfactorily.

To give us an idea of this in action, are you able to update your gaussian mixture model example to be 3D, with more than one posterior, and loaded as MCMCSamples so that we can see how these look as overlayed triangle plots (this could also then be added as an example for plotting).

@htjb
Copy link
Collaborator Author

htjb commented Feb 9, 2021

HI @williamjameshandley, I can definitely look at expanding the example to higher dimensions perhaps with the data that is already being used in the example notebook. Just to clarify you mean plot 3 1D posteriors on the same graph right? Or a triangle plot? or both?

I will look at changing the method to use a cdf first, shouldn't take to long (famous last words!).

@williamjameshandley
Copy link
Collaborator

williamjameshandley commented Feb 9, 2021

Basically I had in mind the ability to see what the overall effect of filled 1d plots down the diagonal in figures like these, but ideally with a gaussian mixture model example like you have in your first example. Something like:

A = MCMCSamples(np.vstack([np.random.multivariate_normal(mu1, cov1, 1000), np.random.multivariate_normal(mu2, cov2, 2000)]))
B = MCMCSamples(np.vstack([np.random.multivariate_normal(mu3, cov3, 1000), np.random.multivariate_normal(mu4, cov4, 2000)]))
fig, ax = A.plot_2d([0,1,2])
B.plot_2d(ax)

@htjb
Copy link
Collaborator Author

htjb commented Feb 9, 2021

@williamjameshandley Here is an example plot using the code you have suggested.

I think there may be an ordering issue with the filled regions and lines.

multi_dim

@williamjameshandley
Copy link
Collaborator

Very nice -- could you post the code that produces this plot?

Copy link
Collaborator

@williamjameshandley williamjameshandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, minor changes and I think this is good to be added

xmin = kwargs.pop('xmin', None)
xmax = kwargs.pop('xmax', None)
cmap = kwargs.pop('cmap', None)
color = kwargs.pop('color', (next(ax._get_lines.prop_cycler)['color']
if cmap is None else cmap(0.68)))
facecolor = kwargs.pop('facecolor', color)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could facecolor act as the same as filled1d, defaulting to None, but if it is equal to True, then taking color as its default? I don't think facecolor is used otherwise. This would mean we wouldn't need 'filledkde' as an option to plot_1d.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williamjameshandley facecolor is used if cmap is None and filled1d is True to determine the color map. We could remove filled1d and replace,

if filled1d is True:
        c = iso_probability_contours(p[i], contours=levels)
        if cmap is None:
            cmap = basic_cmap(facecolor)
        for j in range(len(c)-1):
            ax.fill_between(x[i], p[i], where=p[i] >= c[j],
                            color=cmap(c[j]))

with

if any([facecolor, cmap]) is not None:
        c = iso_probability_contours(p[i], contours=levels)
        if cmap is None:
            cmap = basic_cmap(facecolor)
        for j in range(len(c)-1):
            ax.fill_between(x[i], p[i], where=p[i] >= c[j],
                            color=cmap(c[j]))

and set facecolor to None as a default. The user could then pass a color map or a facecolor from which to build a cmap to the function to trigger a filled plot?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably implement this by defaulting it to False in the kwargs.pop, and then using an if facecolor check in place of if filled1d, and then within that checking if facecolor is True: facecolor=color. I think that covers all bases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williamjameshandley No problem that sounds good to me. See latest commit.

tests/test_plot.py Outdated Show resolved Hide resolved
Comment on lines 292 to 294
if facecolor is True:
facecolor = color
c = iso_probability_contours(p[i], contours=levels)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic isn't quite right. It should be

if facecolor:
    if facecolor is True:
        facecolor = color
    c = ...

The difference being that if you pass facecolor='blue' as an argument, then this would give you control over the contour colouring. It's unlikely you'd actually want this except for when you want dark edges by passing facecolor='blue', color='k' as kwargs.

The same should be done in the second function

Copy link
Collaborator Author

@htjb htjb Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the logic is as you suggest then when facecolor is none and cmap is also none the following

if cmap is None:
        cmap = basic_cmap(facecolor)

will throw an error.

For it to work as you suggest you would need to remove the if cmap is None: and add in an edgecolor=color to fill_between. I think this would work well...

if facecolor:
        if facecolor is True:
            facecolor = color
        c = iso_probability_contours(p[i], contours=levels)
        cmap = basic_cmap(facecolor)
        for j in range(len(c)-1):
            ax.fill_between(x[i], p[i], where=p[i] >= c[j],
                            color=cmap(c[j]), edgecolor=color)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williamjameshandley This set up actually gives the user quite a bit of freedom with the colour scheme e.g.

from anesthetic.samples import MCMCSamples
import matplotlib.pyplot as plt
import numpy as np

mu1 = [2, 10, -1]
mu2 = [0, 8, 0]
cov1 = 1*np.eye(3)
cov2 =  1*np.eye(3)

A = MCMCSamples(np.vstack([
    np.random.multivariate_normal(mu1, cov1, 2000),
    np.random.multivariate_normal(mu2, cov2, 2000)]))

fig, ax = A.plot_2d(
    [0, 1],
    types={'diagonal': 'kde', 'lower': 'kde', 'upper': 'scatter'},
    diagonal_kwargs={'facecolor': 'C0', 'color': 'k'})
plt.show()

Looks pretty nice to me...
plot_example

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

tests/test_plot.py Show resolved Hide resolved
anesthetic/samples.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@williamjameshandley williamjameshandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, and pleasingly compact. Please squash and merge

@williamjameshandley
Copy link
Collaborator

The only other thing is that if you want this to included in a new minor release you could bump the version number to 2.0.0-beta.6 in the README.rst

Copy link
Collaborator

@lukashergt lukashergt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved (my bad): I am currently getting an error AttributeError: 'Line2D' object has no property 'facecolor' when trying to run the most recent example by you, @htjb. Is that last code example that you've posted a MWE that should work? What did we decide to use for types in the end?

I think there might still be a few things to iron out:

  • Could we have an equivalent plot to the latest one using the example data pc_250 in the anesthetic test folder and showing the results for all parameters ['x0', 'x1', 'x2', 'x3', 'x4']? Something along the lines
    ns = NestedSamples(root="anesthetic/tests/example_data/pc_250")
    fig, axes = make_2d_axes(['x0', 'x1', 'x2', 'x3', 'x4'])
    ns.plot_2d(axes, **kwargs)
    These parameters contain some special cases (uniform, half constrained) where it might be good to investigate the behaviour of the sigma bounds.
    Note: We should avoid using 'C0' as a colour for test examples, as that is what matplotlib (and hence anesthetic) defaults to anyhow, so potential issues won't be visible...
  • Some example settings for the **kwargs to check that they behave as expected:
    • diagonal_kwargs={'facecolor': 'None'}: This should be equal to default behaviour, right? Note the use of 'None', not None.
    • ec='C1', fc='C2', ms=10, diagonal_kwargs={'fc': 'C3'}: How does colour inheritance work, when ec and fc are specified "globally" as opposed to c? But maybe this is too specific, and should be ironed out with a potential overhaul of the default colour behaviour as suggested in contour edgecolor matching to facecolor #140, @williamjameshandley?
  • If we use terms such as facecolor for the 1d plots now, then we should introduce a normalize_kwargs to the start of each plotting function, analogous to e.g. scatter_plot_2d.
  • To make sure this is really behaving the way we expect it to, I think we need to do some proper unit tests. Currently we are just checking whether the code throws any errors, not whether the behaviour is actually as we'd expect... For ways of testing for colour, see e.g. test_scatter_plot_2d in test_plot.py. But if this seems like too much work for this PR, then we can maybe defer it to a later PR. In that case we might want to track this as an issue for an Anesthetic 2.0 release.

@htjb
Copy link
Collaborator Author

htjb commented Mar 9, 2021

The only other thing is that if you want this to included in a new minor release you could bump the version number to 2.0.0-beta.6 in the README.rst

This would be a good thing to do I think. I notice the version is already at 2.0.0-beta.6 I am assuming you meant 2.0.0-beta.7?

@htjb
Copy link
Collaborator Author

htjb commented Mar 9, 2021

@lukashergt in response to your first comment

from anesthetic.samples import NestedSamples, make_2d_axes
import matplotlib.pyplot as plt
import numpy as np

ns = NestedSamples(root="tests/example_data/pc_250")
fig, axes = make_2d_axes(['x0', 'x1', 'x2', 'x3', 'x4'])
ns.plot_2d(axes, diagonal_kwargs={'facecolor': 'C0', 'color': 'k'})
plt.show()

gives
lukas_test_plot

With the current set up if you want facecolor to be 'C0' ie the same as the default for the 2d plots you need to set it as such otherwise it will default to None and the plots will not be filled.

I have played around with diagonal_kwargs={'facecolor': None} which does give the default behaviour and ec='C1', fc='C2', ms=10, diagonal_kwargs={'facecolor': 'C3'} which works to some extent although I don't think kde_1d_plot ect is inheriting the ec and fc parameters at all. I am guessing this is what the normalize_kwargs does? I need to look at that part of the code in more detail.

Happy to look at the testing in more detail and equally happy for it to be rolled over into a different PR.

@htjb
Copy link
Collaborator Author

htjb commented Mar 9, 2021

@lukashergt so in the latest commit the 1d plotting functions can now take fc or facecolor, c or color and ec or edgecolor.

I've also set it up so that the 1d line and the shaded region edges are always the same color with the ec kwarg taking presedence over the color kwarg if both are declared. The default edgecolor is color. Also if shading is not requested then the line color is by default given by the value of ec or color.

Globally defined parameters also take precedence over parameters passed directly to diagonal_kwargs e.g. ec='C1', fc='C2', ms=10, diagonal_kwargs={'fc': 'C3'} gives for the test data

lukas_test_plot

I think that has dealt with your first three requested changes more or less. Let me know if not. I will try and have a look at the testing later today.

@lukashergt
Copy link
Collaborator

lukashergt commented Mar 9, 2021

I have played around with diagonal_kwargs={'facecolor': None} which does give the default behaviour

  • Matplotlib differentiates between None and 'None'. None is normally treated as the default behaviour, i.e. pick next colour from colour cycler. 'None' is treated as not a colour. Hence I think our check for if facecolor needs to be expanded to if facecolor and facecolor not in [None, 'None', 'none'].

 

Globally defined parameters also take precedence over parameters passed directly to diagonal_kwargs e.g. ec='C1', fc='C2', ms=10, diagonal_kwargs={'fc': 'C3'} gives for the test data

Good to see the behaviour. I don't think that is what we want, but this is not introduced by this PR, so let's leave this off for another PR.

 

lukas_test_plot

Thanks for plotting this.

  • Are the shaded levels for 'x2' and 'x4' one-sided? Or are they two-sided but one side squished?
  • Is the behaviour of 'x3' as expected? Could this be improved by having all levels coloured to avoid the gap? @williamjameshandley what are your thoughts?

@htjb
Copy link
Collaborator Author

htjb commented Mar 9, 2021

@lukashergt

I am about to push another commit with a few more detailed tests in and I have added the facecolor not in [None, 'none', 'None'] statement.

Are the shaded levels for 'x2' and 'x4' one-sided? Or are they two-sided but one side squished?

Not sure what you mean here sorry?

@lukashergt
Copy link
Collaborator

Are the shaded levels for 'x2' and 'x4' one-sided? Or are they two-sided but one side squished?

Not sure what you mean here sorry?

For a Gaussian this finds the 0.025, 0.16, 0.84, and 0.975 quantiles and puts a vertical line and fills inbetween with colour, right? What is the behaviour for the 'x2' and 'x4' parameters in the example data (where we don't have a Gaussian, but a posterior that is bounded only from one side)?. Does it still put all 4 of those vertical quantile lines? Or does it put only 2 quantile lines? If only 2, which levels?

lukashergt
lukashergt previously approved these changes Mar 9, 2021
@htjb
Copy link
Collaborator Author

htjb commented Mar 10, 2021

For a Gaussian this finds the 0.025, 0.16, 0.84, and 0.975 quantiles and puts a vertical line and fills inbetween with colour, right? What is the behaviour for the 'x2' and 'x4' parameters in the example data (where we don't have a Gaussian, but a posterior that is bounded only from one side)?. Does it still put all 4 of those vertical quantile lines? Or does it put only 2 quantile lines? If only 2, which levels?

@lukashergt We are using the same function as is used for the 2D plots here. This interpolates the probability as a function of the cdf to find the value for which 68% of the probability mass is contained for example. The relevant function in utils returns two values, c, one for 68% and one for 95% and then the fill_between fills where= pp >= c[j] for j in range(2). For a one sided probability like 'x4' everything to the right of x4=3.2 is above the 68% limit so gets shaded darker right up to the edge of the pdf. Does that clarify what is happening a bit? As I said it is the same utils function as is used for the 2D plots (iso_probability_contors{_from_samples}).

Thanks for signing of on the changes! @williamjameshandley are you happy with the most recent edits as well?

I will have a look at the failed test now and bump the readme version to 2.0.0-beta.7.

Copy link
Collaborator

@williamjameshandley williamjameshandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks @htjb, please squash and merge

@htjb htjb merged commit 633c076 into handley-lab:master Mar 10, 2021
@htjb
Copy link
Collaborator Author

htjb commented Mar 10, 2021

@williamjameshandley @lukashergt Thanks for helping with this! Hopefully people will find it a useful feature 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confidence regions for 1D posteriors
4 participants