Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Labels fix #304

Merged
merged 29 commits into from
Jun 29, 2023
Merged

Labels fix #304

merged 29 commits into from
Jun 29, 2023

Conversation

williamjameshandley
Copy link
Collaborator

@williamjameshandley williamjameshandley commented Jun 28, 2023

Description

Following offline conversations with @lukashergt and @AdamOrmondroyd, this PR aims to resolve #253 and #236.

It upgrades labelled data frames so the LabelledSeries.name attribute now reads as the tex, rather than the label (or something else), which will prove useful in improving plotting tools.

It achieves this by implementing the strategy "If the resulting accessed slice has a 'name' attribute, set this to the value of the label that was dropped".

e.g. a frame with a multiindex column element ('vowel','A','$A$'), would return frame.vowel.A as a series with the name '$A$', rather than 'A' or ('vowel','A').

The tests have been updated to reflect this new behaviour, and new tests added to check that this occurs.

Checklist:

  • I have performed a self-review of my own code
  • My code is PEP8 compliant (flake8 anesthetic tests)
  • My code contains compliant docstrings (pydocstyle --convention=numpy anesthetic)
  • New and existing unit tests pass locally with my changes (python -m pytest)
  • I have added tests that prove my fix is effective or that my feature works
  • I have appropriately incremented the semantic version number in both README.rst and anesthetic/_version.py

@williamjameshandley williamjameshandley added this to the 2.0.0 milestone Jun 28, 2023
Comment on lines +64 to +69
return ac([(_LocIndexer_("loc",
super(_LabelledObject,
self.obj.drop_labels(i))
).__getitem__,
self.obj.get_labels_map(i))
for i in self.obj._all_axes()], key)
Copy link
Collaborator Author

@williamjameshandley williamjameshandley Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several changes to all of the indexers/accessors:

  1. _all_axes now returns None as well, to capture the extra edge case (which was previously tagged on as +[...]
  2. crucially we return the super object in the Indexers (this is one of those, 'how did that ever work?' observations)
  3. We return a list of (function, labels_map) pairs, rather than just a list of functions.

@codecov
Copy link

codecov bot commented Jun 28, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (8456b21) 100.00% compared to head (c714214) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #304   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           30        30           
  Lines         2667      2691   +24     
=========================================
+ Hits          2667      2691   +24     
Impacted Files Coverage Δ
anesthetic/_version.py 100.00% <100.00%> (ø)
anesthetic/labelled_pandas.py 100.00% <100.00%> (ø)
anesthetic/plotting/_matplotlib/core.py 100.00% <100.00%> (ø)
anesthetic/samples.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Collaborator

@lukashergt lukashergt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments inline and two general points below.

Following offline conversations with @lukashergt and @AdamOrmondroyd, this PR aims to resolve #253 and #256.

anesthetic/labelled_pandas.py Outdated Show resolved Hide resolved
anesthetic/labelled_pandas.py Outdated Show resolved Hide resolved
@williamjameshandley
Copy link
Collaborator Author

@williamjameshandley
Copy link
Collaborator Author

williamjameshandley commented Jun 28, 2023

Done in 3f1e2e6

@williamjameshandley
Copy link
Collaborator Author

williamjameshandley commented Jun 28, 2023

I've also snuck in a resolution + tests for #297 in b223d1a

Comment on lines 94 to 97
x = data[self.x].values
y = data[self.y].values
x = data[self.x]
y = data[self.y]
self.x = x.name # transfer the tex label
self.y = y.name # transfer the tex label
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This nicely works for labelled dataframes, but now unlabelled dataframes don't have any x- and y-labels anymore:

samples[['x0', 'x1']].drop_labels().plot_2d()

See also some suggested tests below.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this requires a similar check à la:

if x.name:
    self.x = x.name

i.e. only use x.name if it is not empty.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, that won't help...

I think we might need to ensure that samples.drop_labels().get_label('x0') does not return an empty string, but 'x0' instead...

Copy link
Collaborator

@AdamOrmondroyd AdamOrmondroyd Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in aa715ab I think no

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this requires a similar check à la:

if x.name:
    self.x = x.name

i.e. only use x.name if it is not empty.

I put this in and took it back out again now that name is always used when slicing, but maybe worth retaining anyway

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tackled in 10a5b42

tests/test_samples.py Outdated Show resolved Hide resolved
Comment on lines 424 to 427
def test_plot_1d_no_axes():
np.random.seed(3)
ns = read_chains('./tests/example_data/pc')
ns[['x0', 'x1', 'x2']].plot_1d()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on my above comment, we should add something along the lines of:

axes = ns[['x0', 'x1', 'x2']].plot_1d()
assert axes[0].get_xlabel() == '$x_0$'
assert axes[1].get_xlabel() == '$x_1$'
assert axes[2].get_xlabel() == '$x_2$'

axes = ns[['x0', 'x1', 'x2']].drop_labels().plot_1d()
assert axes[0].get_xlabel() == 'x0'
assert axes[1].get_xlabel() == 'x1'
assert axes[2].get_xlabel() == 'x2'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added these tests, but the drop_labels() ones are failing as expected. What would you expect for a mixture of labelled and unlabelled columns?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a parameter/column has a (tex) label, use that, if a parameter/column does not have a (tex) label then use the column handle.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed this for the separate plots by only changing name to the label if the label exists, rather than setting it to a blank string as currently happens.

This solves the issues for individual plots (e.g. plot.kde_1d()), but not plot_1d/2d(), which use the label mapping to create the axes, which leads to blanks.

tests/test_samples.py Show resolved Hide resolved
Comment on lines 191 to 195
If not provided, then all parameters are plotted. This is intended
for plotting a sliced array (e.g. `samples[['x0','x1]].plot_1d()`.
It is not advisible to plot an entire frame, as it is
computationally expensive, and liable to run into linear algebra
errors for degenerate derived parameters.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true for the 1d case?

@lukashergt
Copy link
Collaborator

Failing tests appear to be an issue in fastkde, not in anesthetic :/

@AdamOrmondroyd
Copy link
Collaborator

There's a similar problem with 1D legends, where the labels are beautifully displayed when they're present, but omitted when they are dropped

fig, ax = plt.subplots()
ns['x0'].plot.kde_1d(ax=ax)
ns['x1'].plot.kde_1d(ax=ax)
ns['x2'].plot.kde_1d(ax=ax)
ax.legend()
Screenshot 2023-06-29 at 15 11 21
ns = ns.drop_labels()
...
Screenshot 2023-06-29 at 15 10 35

@AdamOrmondroyd
Copy link
Collaborator

While this deals with an unlabelled LabelledDataFrame, what about the mixed case where only some columns have labels? e.g.

ns['a'] = ns.x0 + ns.x1
ns['b'] = ns.x0 + ns.x2

ns[['x0', 'a', 'b']].plot_2d()
Screenshot 2023-06-29 at 16 40 37

@lukashergt
Copy link
Collaborator

While this deals with an unlabelled LabelledDataFrame, what about the mixed case where only some columns have labels? e.g.

ns['a'] = ns.x0 + ns.x1
ns['b'] = ns.x0 + ns.x2

ns[['x0', 'a', 'b']].plot_2d()

For this example one could claim that in addition to creating a new parameter 'a' you should also create its label ns.set_label('a', '$a$')...

lukashergt
lukashergt previously approved these changes Jun 29, 2023
Copy link
Collaborator

@lukashergt lukashergt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with the change in 10a5b42 ?

Opinions on #304 (comment) ?

Otherwise happy to approve this, now.

@williamjameshandley
Copy link
Collaborator Author

Happy with the change in 10a5b42

Yes -- to my mind this is quite a neat API.

Opinions on #304 (comment) ?

Adjusted -- will need reapproval

lukashergt
lukashergt previously approved these changes Jun 29, 2023
@williamjameshandley
Copy link
Collaborator Author

While this deals with an unlabelled LabelledDataFrame, what about the mixed case where only some columns have labels? e.g.

ns['a'] = ns.x0 + ns.x1
ns['b'] = ns.x0 + ns.x2

ns[['x0', 'a', 'b']].plot_2d()

Although in an ideal world this would have '$x_0$' 'a' and 'b' as labels...

@AdamOrmondroyd
Copy link
Collaborator

While this deals with an unlabelled LabelledDataFrame, what about the mixed case where only some columns have labels? e.g.

ns['a'] = ns.x0 + ns.x1
ns['b'] = ns.x0 + ns.x2

ns[['x0', 'a', 'b']].plot_2d()

Although in an ideal world this would have '$x_0$' 'a' and 'b' as labels...

I think I side with Lukas - not only is this PR then done, but blank axes serve as a reminder to the user to set_label()

@williamjameshandley williamjameshandley merged commit 61bed43 into master Jun 29, 2023
23 checks passed
@williamjameshandley williamjameshandley deleted the labels branch June 29, 2023 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retain latex label when slicing DataFrame to Series
3 participants