Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix concat not respecting order of OrderedDict #25224

Merged
merged 23 commits into from Mar 13, 2019

Conversation

Projects
None yet
5 participants
@alexander-ponomaroff
Copy link
Contributor

commented Feb 8, 2019

  • closes #21510
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
@pep8speaks

This comment has been minimized.

Copy link

commented Feb 8, 2019

Hello @alexander-ponomaroff! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-03-12 16:12:36 UTC
@alexander-ponomaroff

This comment has been minimized.

Copy link
Contributor Author

commented Feb 8, 2019

@jreback @toobaz I've reviewed your comments in Pull Request #21512 that went stale, and created a fix based on what I observed. As @toobaz said, the ordering of dicts in 3.6 is an implementation detail of cpython, and users should not rely on it in 3.6. Based on this, I changed dict_keys_to_ordered_list to use PY37 instead of PY36 and changed the tests that were failing to use PY37 as well.

Please let me know if this is the wrong way of going about this. If my changes are acceptable, I will change the comments in the tests and dict_keys_to_ordered_list to use 3.7 instead of 3.6.

@codecov

This comment has been minimized.

Copy link

commented Feb 8, 2019

Codecov Report

Merging #25224 into master will decrease coverage by 49.87%.
The diff coverage is 66.66%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #25224       +/-   ##
===========================================
- Coverage   92.15%   42.28%   -49.88%     
===========================================
  Files         166      166               
  Lines       52294    52294               
===========================================
- Hits        48193    22111    -26082     
- Misses       4101    30183    +26082
Flag Coverage Δ
#multiple ?
#single 42.28% <66.66%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/reshape/concat.py 44% <0%> (-53.2%) ⬇️
pandas/core/common.py 68.81% <100%> (-29.57%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.35%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.17%) ⬇️
... and 125 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0eddba8...d4e70ef. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Feb 8, 2019

Codecov Report

Merging #25224 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #25224   +/-   ##
=======================================
  Coverage   91.28%   91.28%           
=======================================
  Files         173      173           
  Lines       52965    52965           
=======================================
  Hits        48348    48348           
  Misses       4617     4617
Flag Coverage Δ
#multiple 89.86% <100%> (ø) ⬆️
#single 41.74% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/groupby/generic.py 87% <100%> (ø) ⬆️
pandas/core/reshape/concat.py 97.2% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 26d991f...1557c21. Read the comment docs.

@WillAyd

This comment has been minimized.

Copy link
Member

commented Feb 8, 2019

As @toobaz said, the ordering of dicts in 3.6 is an implementation detail of cpython, and users should not rely on it in 3.6

Might be out of the loop but wasn't it declared that at first but since then guaranteed as a feature? If so I don't think we need to make distinction between 3.6 and 3.7 here

@jreback
Copy link
Contributor

left a comment

pls add a whatsnew note to 0.25 in reshaping.

it may be an implementation detail in PY36 but we are consistently testing for this.

Show resolved Hide resolved pandas/core/common.py Outdated
@alexander-ponomaroff

This comment has been minimized.

Copy link
Contributor Author

commented Feb 11, 2019

@jreback Thank you for the feedback, I will make changes tomorrow.

@alexander-ponomaroff

This comment has been minimized.

Copy link
Contributor Author

commented Feb 12, 2019

@WillAyd I get an error in one of the tests that you created. Would you have any suggestions for me? test_groupby_agg_ohlc_non_first() inside pandas/tests/groupby/test_groupby.py.

@WillAyd

This comment has been minimized.

Copy link
Member

commented Feb 12, 2019

Looks like its the ordering of the columns in the resulting data frame. I think this is actually preferable because it respects the order supplied by the user, so I think OK to update test accordingly

@alexander-ponomaroff

This comment has been minimized.

Copy link
Contributor Author

commented Feb 12, 2019

@WillAyd Thank you for such a fast response. Just to confirm, how do you feel about the change I made?

Show resolved Hide resolved pandas/tests/groupby/test_groupby.py Outdated
@WillAyd

This comment has been minimized.

Copy link
Member

commented Feb 13, 2019

@alexander-ponomaroff

This comment has been minimized.

Copy link
Contributor Author

commented Feb 13, 2019

@WillAyd I will have to investigate this and get back to you. I'm not entirely too sure how my change affected test_groupby_agg_ohlc_non_first() because my changes deal with dicts and OrderedDicts in the concat function. So, concat must be used within one of the functions that's used inside the test, but the test doesn't seem to be dealing with any dicts. Please let me know if anything I'm saying is wrong, I recently started solving actual code bugs in Pandas and am still learning. I started my contributions with easy issues and now trying to move onto harder ones :)

@WillAyd

This comment has been minimized.

Copy link
Member

commented Feb 13, 2019

Yea no worries - your contributions are greatly appreciated.

If it helps here is where the ohlc function is defined. If you trace the underlying function there back you'll see a concat call, hence why this has some impacts

def ohlc(self):

@alexander-ponomaroff

This comment has been minimized.

Copy link
Contributor Author

commented Feb 13, 2019

@WillAyd Thank you. I will be looking into it and will update when I find something.

@alexander-ponomaroff

This comment has been minimized.

Copy link
Contributor Author

commented Feb 14, 2019

@WillAyd I've been trying to figure out how to solve the problem with test_groupby_agg_ohlc_non_first() but I'm stuck. Python 3.6 and after, .agg(['sum', 'ohlc']) maintains the order "sum -> ohlc", .agg(['ohlc', 'sum']) maintains the order "ohlc -> sum". When doing .agg(['ohlc', 'sum']), it passes the test in all the versions. This is due to my recent concat change and I traced the call and get why it happens, but I don't know how to solve the issue in this test, without affecting other tests negatively.

@jreback jreback added this to the 0.25.0 milestone Feb 16, 2019

@alexander-ponomaroff

This comment has been minimized.

Copy link
Contributor Author

commented Feb 17, 2019

@jreback Why am I failing these tests now? Does it have anything to do with changes happened in the last 2 days?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Feb 17, 2019

make sure to merge master
these were just fixed up today

@alexander-ponomaroff

This comment has been minimized.

Copy link
Contributor Author

commented Feb 17, 2019

@jreback I did merge master, right before I pushed my changes. It says that everything is up to date.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Feb 17, 2019

these look related to your changes
run the tests locally and see

alexander-ponomaroff added some commits Feb 19, 2019

@alexander-ponomaroff

This comment has been minimized.

Copy link
Contributor Author

commented Feb 19, 2019

@jreback All tests are green now.

Looks like the previous errors were resolved by themselves. When I ran the test locally, I did not get them.

@jreback
Copy link
Contributor

left a comment

pls merge master as well

Show resolved Hide resolved doc/source/whatsnew/v0.25.0.rst Outdated
Show resolved Hide resolved pandas/tests/reshape/test_concat.py Outdated
@alexander-ponomaroff

This comment has been minimized.

Copy link
Contributor Author

commented Mar 4, 2019

@jreback Fixed requested changes, all tests are green.

Show resolved Hide resolved doc/source/whatsnew/v0.25.0.rst Outdated
Show resolved Hide resolved pandas/tests/groupby/test_groupby.py Outdated
@TomAugspurger
Copy link
Contributor

left a comment

LGTm, aside from a small request in the whatsnew.

('foo', 'ohlc', 'open'), ('foo', 'ohlc', 'high'),
('foo', 'ohlc', 'low'), ('foo', 'ohlc', 'close'),
('foo', 'sum', 'foo'))), index=pd.date_range(
('foo', 'sum', 'foo'), ('foo', 'ohlc', 'open'),

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Mar 12, 2019

Contributor

So the only change in behavior is that the output order is now (consistently) the arguments passed to .agg?

Since we did ['sum', 'ohlc'] the output order is 'sum', 'open', 'high', 'low', 'close'?

If so, let's add an entry to the release notes, under the Groupby bug fix section, saying that .agg now respects the order of arguments when 'ohlc' is one of the aggfuncs.

This comment has been minimized.

Copy link
@WillAyd

WillAyd Mar 12, 2019

Member

This might be rather difficult to express in a whatsnew since it's really just applicable to _aggregate_multiple_funcs here.

Perhaps a follow up to consistently use OrderedDict's in the groupby module is in order?

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Mar 12, 2019

Contributor

Is _aggregate_multiple_funcs called by just by functions like ohlc that have multiple outputs per input? Or is it also called by .agg([funcs])?

This comment has been minimized.

Copy link
@WillAyd

WillAyd Mar 12, 2019

Member

The latter. So this has implications to ordering which previously would have been non-deterministic in Py35.

So I think it might make sense as a follow up to use a OrderedDict and make better guarantees about the returned column order, but don't think that should hold this one up

This comment has been minimized.

Copy link
@WillAyd

WillAyd Mar 12, 2019

Member

See #25692 as follow up

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Mar 12, 2019

@WillAyd
Copy link
Member

left a comment

lgtm

@alexander-ponomaroff

This comment has been minimized.

Copy link
Contributor Author

commented Mar 12, 2019

Thank you @WillAyd and @TomAugspurger for the feedback and the help on this one. @jreback could you take a look if there are any additional changes that need to take place here?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Mar 13, 2019

merging if any additional whatsnew is needed, can do in #25692

@jreback jreback merged commit 86879ac into pandas-dev:master Mar 13, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

sighingnow added a commit to sighingnow/pandas that referenced this pull request Mar 14, 2019

Merge branch 'master' into fix-25587
* master: (22 commits)
  Fixturize tests/frame/test_operators.py (pandas-dev#25641)
  Update ValueError message in corr (pandas-dev#25729)
  DOC: fix some grammar and inconsistency issues in the User Guide (pandas-dev#25728)
  ENH: Add public start, stop, and step attributes to RangeIndex (pandas-dev#25720)
  Make Rolling.apply documentation clearer (pandas-dev#25712)
  pandas-dev#25707 - Fixed flakiness in stata write test (pandas-dev#25714)
  Json normalize nan support (pandas-dev#25619)
  TST: resolve issues with test_constructor_dtype_datetime64 (pandas-dev#24868)
  DEPR: Deprecate box kwarg for to_timedelta and to_datetime (pandas-dev#24486)
  BUG: Preserve name in DatetimeIndex.snap (pandas-dev#25585)
  Fix concat not respecting order of OrderedDict (pandas-dev#25224)
  CLN: remove pandas.core.categorical (pandas-dev#25655)
  TST/CLN: Remove more Panel tests (pandas-dev#25675)
  Pinned pycodestyle (pandas-dev#25701)
  DOC: update date of 0.24.2 release notes (pandas-dev#25699)
  BUG: Fix error in replace with strings that are large numbers (pandas-dev#25616) (pandas-dev#25644)
  BUG: fix usage of na_sentinel with sort=True in factorize() (pandas-dev#25592)
  BUG: Fix to_string output when using header (pandas-dev#16718) (pandas-dev#25602)
  CLN: Remove unused test code (pandas-dev#25670)
  CLN: remove Panel from concat error message (pandas-dev#25676)
  ...

# Conflicts:
#	doc/source/whatsnew/v0.25.0.rst
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.