New pandas.DataFrame.attrs feature possibly useful for metadata such as root or label #236

Open
lukashergt opened this issue Oct 12, 2022 · 5 comments
Labels
enhancement New feature or request
Milestone

Comments

@lukashergt
Collaborator

When reviewing #232, I initially got tripped up by the similar kwargs label and labels, and worried that the same kwarg was being passed to MCMCSamples or NestedSamples. As it turns out, they are different: labels holds the (TeX) labels for the columns, whereas label is the label/name of the entire dataframe.

I have come across a potentially interesting Stack Overflow answer mentioning the pandas.DataFrame.attrs dictionary attribute (available since pandas 1.0.0; note that it is flagged as "experimental"). This seems like the right place to put the rest of our "metadata", so that the information persists when copying/pickling/dropping/etc. What do you think?
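To make the idea concrete, here is a minimal sketch using plain pandas only (not anesthetic classes). The keys "label" and "root" and their values are hypothetical choices for illustration. Since attrs is experimental, which operations propagate it may differ between pandas versions, so treat this as something to check rather than a guarantee.

```python
import pickle

import pandas as pd

# Plain pandas stand-in for a samples object (not an anesthetic class).
df = pd.DataFrame({"x0": [0.1, 0.2, 0.3], "x1": [1.0, 2.0, 3.0]})

# Hypothetical metadata keys: "label" names the whole dataframe,
# "root" records where the chains were read from.
df.attrs["label"] = "run A"
df.attrs["root"] = "chains/run_a"

# Round-trip the dataframe through a copy and through pickle.
copied = df.copy()
restored = pickle.loads(pickle.dumps(df))

print(copied.attrs)
print(restored.attrs)
```

If the printed dictionaries match df.attrs on the pandas versions anesthetic supports, attrs would cover at least the copy and pickle cases mentioned above.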

@williamjameshandley, do you know whether/how well anesthetic currently manages to retain the label attribute when copying/pickling/dropping/etc.?

@lukashergt lukashergt added the enhancement New feature or request label Oct 12, 2022
@AdamOrmondroyd
Collaborator

I hope this is relevant to the discussion: something I noticed when working on #235 was that the $\LaTeX$ labels are lost when Samples are sliced to a WeightedLabelledSeries, e.g. (note the number of square brackets):

import anesthetic as ac
ns = ac.read_chains("anesthetic/tests/example_data/pc")

ns["x0"].get_labels()
# array([0.00000000e+000, 1.08840734e-308, 4.83001654e-301...

ns[["x0"]].get_labels()
# array(['$x_0$'], dtype=object)

Perhaps pandas.Series.attrs could store the $\LaTeX$ label?
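A rough sketch of that suggestion, again in plain pandas rather than anesthetic: the helper column_with_label and the tex_labels dictionary are hypothetical names made up for this example, and this is not how WeightedLabelledSeries currently works.

```python
import pandas as pd


def column_with_label(frame, name, tex_labels):
    """Return column `name` as a Series carrying its TeX label in attrs.

    Hypothetical helper: falls back to the column name if no label is known.
    """
    series = frame[name].copy()
    series.attrs["label"] = tex_labels.get(name, name)
    return series


frame = pd.DataFrame({"x0": [0.1, 0.2], "x1": [1.0, 2.0]})
tex_labels = {"x0": "$x_0$", "x1": "$x_1$"}

x0 = column_with_label(frame, "x0", tex_labels)
print(x0.attrs["label"])  # $x_0$
```

The series keeps its name "x0" for indexing, while the label travels alongside it in attrs.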

@lukashergt
Collaborator Author

Regarding the TeX labels: these are not so much metadata of the entire dataframe; more precisely, each one belongs to one corresponding column. As such I really like the current implementation. It also renders beautifully in Jupyter notebooks!

To get the TeX label of one specific column, the suggested way is:

ns.get_label('x0')

That said, it would be nice to have the sliced Samples retain the TeX label.
Does this affect 1D plots?
Is this a bug @williamjameshandley?
We could consider letting the pandas.Series attribute name take on the TeX label rather than the column name, since the column name is only needed in a dataframe for indexing and becomes redundant in a series.
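As a small illustration of the name idea, using plain pandas and a hypothetical tex_labels dictionary (this is not current anesthetic behaviour):

```python
import pandas as pd

frame = pd.DataFrame({"x0": [0.1, 0.2, 0.3]})
tex_labels = {"x0": "$x_0$"}  # hypothetical column-to-label mapping

# Passing a scalar to Series.rename changes the Series name, not the index.
x0 = frame["x0"].rename(tex_labels["x0"])

print(x0.name)  # $x_0$
```

One trade-off of this design is that the sliced series no longer records which column key ("x0") it came from, so anything that looks the column up by name afterwards would need another route.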

@AdamOrmondroyd
Collaborator

...
Does this affect 1D plots?
...

I believe so. For example, when calling ns.x0.plot.hist_1d(), I don't think $x_0$ is accessible at the level of HistPlot._make_plot(), because by that point self.data is just a series. Hopefully that makes sense.
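If the label did travel with the series, plotting code that only sees a series could look it up at that point. A minimal sketch, assuming the label is stored under a hypothetical "label" key in attrs; resolve_axis_label is an invented helper and is not part of anesthetic or of HistPlot._make_plot():

```python
import pandas as pd


def resolve_axis_label(series):
    """Prefer a TeX label stored in attrs, else fall back to the Series name."""
    return series.attrs.get("label", series.name)


plain = pd.Series([0.1, 0.2], name="x0")

labelled = pd.Series([0.1, 0.2], name="x0")
labelled.attrs["label"] = "$x_0$"  # hypothetical key

print(resolve_axis_label(plain))     # x0
print(resolve_axis_label(labelled))  # $x_0$
```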

@williamjameshandley
Collaborator

Related to #253

@williamjameshandley williamjameshandley added this to the 2.0.0 milestone Jun 15, 2023
@williamjameshandley
Collaborator

Also related to #303

@lukashergt lukashergt mentioned this issue Jun 28, 2023
6 tasks