Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure consistent column ordering in DataSet.unstack_anomalous() #99

Merged
merged 4 commits into from
Oct 5, 2021

Conversation

JBGreisman
Copy link
Member

DataSet.unstack_anomalous() would sometimes produce new two-column anomalous DataSets in which the newly created anomalous columns were in different orders. This would occur because internally the function used a set to keep track of the new columns. Since the plus-Friedel columns were suffixed with (+) and the minus-Friedel with (-), this could produce different orders to the columns in the set.

This bug made it so that round-trip operations of DataSet.unstack_anomalous() followed by DataSet.stack_anomalous() sometimes produced a ValueError (#97 ), depending on the Python version being used.

This PR updates DataSet.unstack_anomalous() to internally use a dict for keeping track of the relevant columns. As of Python 3.7+ this preserves the order of input keys.

@JBGreisman JBGreisman added the bug Something isn't working label Oct 5, 2021
@codecov-commenter
Copy link

Codecov Report

Merging #99 (6a6120b) into main (02f95bc) will not change coverage.
The diff coverage is 85.71%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main      #99   +/-   ##
=======================================
  Coverage   98.97%   98.97%           
=======================================
  Files          41       41           
  Lines        1561     1561           
=======================================
  Hits         1545     1545           
  Misses         16       16           
Flag Coverage Δ
unittests 98.97% <85.71%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
reciprocalspaceship/dataset.py 99.01% <85.71%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02f95bc...6a6120b. Read the comment docs.

@kmdalton
Copy link
Member

kmdalton commented Oct 5, 2021

I checked this branch and it appears to fix the issues i was having with careless output. 👍

@JBGreisman JBGreisman merged commit 66fd8fa into main Oct 5, 2021
@JBGreisman JBGreisman deleted the twocol_anom branch October 5, 2021 19:28
@dennisbrookner
Copy link
Contributor

Related to this (and hopefully not compounding the issue) would it make sense for DataSet.stack_anomalous() to take something like plus_suffix and minus_suffix arguments, as alternatives for plus_labels and minus_labels?

Definitely not critical, but would a) occasionally save some typing and b) be internally consistent with the way the defaults work (without breaking previous code), e.g. plus_suffix="(+)" / minus_suffix="(-)"

(This also seems like an easy enough change that I could try to tackle it myself, with blessing?)

@kmdalton
Copy link
Member

kmdalton commented Oct 6, 2021

@dennisbrookner 's suggestion is more consistent with pandas. I think I am in favor of it.

@JBGreisman
Copy link
Member Author

in the future, let's do this sort of discussion in a separate Issue rather than on a closed PR.

I'm fine with adding something like this. Given that unstack_anomalous() takes suffixes=("(+)", "(-)") as an argument, I think it makes sense to support alternate suffixes in stack_anomalous(). I think it also makes sense to generalize the search criteria to columns containing "+" or "-" in stack_anomalous() which would inherently cover either case.

@dennisbrookner
Copy link
Contributor

Noted re: locations of conversations, I'll make a new Issue and link to here. Glad this seems to make sense!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants