Ensure consistent column ordering in `DataSet.unstack_anomalous()` #99

JBGreisman · 2021-10-05T16:08:29Z

DataSet.unstack_anomalous() would sometimes produce new two-column anomalous DataSets in which the newly created anomalous columns were in different orders. This would occur because internally the function used a set to keep track of the new columns. Since the plus-Friedel columns were suffixed with (+) and the minus-Friedel with (-), this could produce different orders to the columns in the set.

This bug made it so that round-trip operations of DataSet.unstack_anomalous() followed by DataSet.stack_anomalous() sometimes produced a ValueError (#97 ), depending on the Python version being used.

This PR updates DataSet.unstack_anomalous() to internally use a dict for keeping track of the relevant columns. As of Python 3.7+ this preserves the order of input keys.

…lous()

codecov-commenter · 2021-10-05T16:30:39Z

Codecov Report

Merging #99 (6a6120b) into main (02f95bc) will not change coverage.
The diff coverage is 85.71%.

@@           Coverage Diff           @@
##             main      #99   +/-   ##
=======================================
  Coverage   98.97%   98.97%           
=======================================
  Files          41       41           
  Lines        1561     1561           
=======================================
  Hits         1545     1545           
  Misses         16       16

Flag	Coverage Δ
unittests	`98.97% <85.71%> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
reciprocalspaceship/dataset.py	`99.01% <85.71%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02f95bc...6a6120b. Read the comment docs.

kmdalton · 2021-10-05T19:18:28Z

I checked this branch and it appears to fix the issues i was having with careless output. 👍

dennisbrookner · 2021-10-06T15:36:43Z

Related to this (and hopefully not compounding the issue) would it make sense for DataSet.stack_anomalous() to take something like plus_suffix and minus_suffix arguments, as alternatives for plus_labels and minus_labels?

Definitely not critical, but would a) occasionally save some typing and b) be internally consistent with the way the defaults work (without breaking previous code), e.g. plus_suffix="(+)" / minus_suffix="(-)"

(This also seems like an easy enough change that I could try to tackle it myself, with blessing?)

kmdalton · 2021-10-06T15:50:13Z

@dennisbrookner 's suggestion is more consistent with pandas. I think I am in favor of it.

JBGreisman · 2021-10-06T16:07:47Z

in the future, let's do this sort of discussion in a separate Issue rather than on a closed PR.

I'm fine with adding something like this. Given that unstack_anomalous() takes suffixes=("(+)", "(-)") as an argument, I think it makes sense to support alternate suffixes in stack_anomalous(). I think it also makes sense to generalize the search criteria to columns containing "+" or "-" in stack_anomalous() which would inherently cover either case.

dennisbrookner · 2021-10-06T17:47:39Z

Noted re: locations of conversations, I'll make a new Issue and link to here. Glad this seems to make sense!

JBGreisman added 4 commits October 5, 2021 10:10

Add test for roundtrip of unstack_anomalous() followed by stack_anoma…

76f18f8

…lous()

Update test

81cab24

Fix #97: Update DataSet.unstack_anomalous() to preserve column order

e25840b

Use centrics accessor in DataSet.unstack_anomalous()

6a6120b

JBGreisman added the bug Something isn't working label Oct 5, 2021

JBGreisman merged commit 66fd8fa into main Oct 5, 2021

JBGreisman deleted the twocol_anom branch October 5, 2021 19:28

dennisbrookner mentioned this pull request Oct 6, 2021

DataSet.stack_anomalous() should accept suffixes as alternatives to full column specification #100

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure consistent column ordering in `DataSet.unstack_anomalous()` #99

Ensure consistent column ordering in `DataSet.unstack_anomalous()` #99

JBGreisman commented Oct 5, 2021

codecov-commenter commented Oct 5, 2021

kmdalton commented Oct 5, 2021

dennisbrookner commented Oct 6, 2021

kmdalton commented Oct 6, 2021

JBGreisman commented Oct 6, 2021

dennisbrookner commented Oct 6, 2021

Ensure consistent column ordering in DataSet.unstack_anomalous() #99

Ensure consistent column ordering in DataSet.unstack_anomalous() #99

Conversation

JBGreisman commented Oct 5, 2021

codecov-commenter commented Oct 5, 2021

Codecov Report

kmdalton commented Oct 5, 2021

dennisbrookner commented Oct 6, 2021

kmdalton commented Oct 6, 2021

JBGreisman commented Oct 6, 2021

dennisbrookner commented Oct 6, 2021

Ensure consistent column ordering in `DataSet.unstack_anomalous()` #99

Ensure consistent column ordering in `DataSet.unstack_anomalous()` #99