Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix tzaware dataframe transpose bug #26825

Merged
merged 23 commits into from
Jun 27, 2019
Merged

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Jun 13, 2019

This allows us to get rid of a bunch of xfails and FIXMEs in arithmetic tests. Changes in groupby are the most likely to need attention; this is not an area of the code I know well.

Not sure if TestTranspose belongs somewhere else. Suggestions welcome.

Split a couple of too-widely-scoped groupby tests.

Will follow-up with issues this closes, whatsnew entry, and GH references added to tests.

xref #23988

@codecov
Copy link

codecov bot commented Jun 13, 2019

Codecov Report

Merging #26825 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26825      +/-   ##
==========================================
- Coverage   92.01%      92%   -0.01%     
==========================================
  Files         180      180              
  Lines       50754    50766      +12     
==========================================
+ Hits        46699    46708       +9     
- Misses       4055     4058       +3
Flag Coverage Δ
#multiple 90.64% <100%> (ø) ⬆️
#single 41.84% <35%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/groupby/generic.py 89.37% <100%> (+0.04%) ⬆️
pandas/core/internals/construction.py 96.04% <100%> (+0.08%) ⬆️
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py 96.89% <0%> (-0.12%) ⬇️
pandas/core/sorting.py 98.35% <0%> (ø) ⬆️
pandas/compat/_optional.py 100% <0%> (ø) ⬆️
pandas/core/dtypes/cast.py 90.89% <0%> (+0.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a7f1d69...8b2372e. Read the comment docs.

pandas/core/groupby/generic.py Show resolved Hide resolved
@@ -160,7 +160,29 @@ def init_ndarray(values, index, columns, dtype=None, copy=False):
# on the entire block; this is to convert if we have datetimelike's
# embedded in an object type
if dtype is None and is_object_dtype(values):
values = maybe_infer_to_datetimelike(values)

if values.ndim == 2 and values.shape[0] != 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is much more messy, can we change something else to make this nicer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. I'm looking into the other places where maybe_infer_to_datetimelike is used to see if some of this can go into that. We could separate this whole block into a dedicated function. But one way or another we need to bite the bullet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the inside of list loop should be in pandas.core.dtypes.cast, no? (obviously up until you make the blocks themselves)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to leave this for the next pass when I'm taking a more systematic look at maybe_infer_to_datetimelike

@gfyoung gfyoung added Bug DataFrame DataFrame data structure Datetime Datetime data dtype labels Jun 13, 2019
@jbrockmendel
Copy link
Member Author

Updated with requested edit, GH references in tests, and whatsnew.

@@ -1664,3 +1655,36 @@ def _normalize_keyword_aggregation(kwargs):
order.append((column,
com.get_callable_name(aggfunc) or aggfunc))
return aggspec, columns, order


def _recast_datetimelike_result(result):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so i would put this in pandas.core.dtypes.cast, our dumping ground for casting, right after / before maybe_infer_to-datetimelike (and you can de-privatize)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really kludgy code (and is replacing equally kludgy code; kind of like turtles all the way down). I'd rather keep it close to its only usage and hope it is ripped out by someone who knows this part of the code better

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as i said this look a lot like maybe_convert_objects, try to use that here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note also: we only have 5 tests that go through this path

pandas/core/groupby/generic.py Outdated Show resolved Hide resolved

Notes
-----
- Assumes Groupby._selected_obj has ndim==2 and at least one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this note doesn't seem relevant as you are passing in frame right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wasn't obvious to me bc were talking about the dimensions of two separate objects. Are they necessarily the same?

pandas/core/groupby/generic.py Outdated Show resolved Hide resolved
pandas/core/groupby/generic.py Outdated Show resolved Hide resolved
pandas/core/groupby/generic.py Outdated Show resolved Hide resolved
@@ -160,7 +160,29 @@ def init_ndarray(values, index, columns, dtype=None, copy=False):
# on the entire block; this is to convert if we have datetimelike's
# embedded in an object type
if dtype is None and is_object_dtype(values):
values = maybe_infer_to_datetimelike(values)

if values.ndim == 2 and values.shape[0] != 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the inside of list loop should be in pandas.core.dtypes.cast, no? (obviously up until you make the blocks themselves)


from pandas.core.internals.blocks import make_block

# TODO: What about re-joining object columns?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls reuse the block creation routines below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attempts so far have broken everything. do you have a particular routine in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I mean is you can remove the create_block_manager_from_blocks and let it fall thru to 184 with I think a very small change, e.g.


if .....
     blocks = bvals
else:
     dvals = .......
      blocks = [dvals]

of course pls use a longer name than dvals

# unnecessary if we ever allow 2D DatetimeArray

dvals_list = [maybe_infer_to_datetimelike(values[n, :])
for n in range(len(values))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this a list comprehension & use enumerate if you must., this is very hard to read.

pandas/tests/arithmetic/test_datetime64.py Outdated Show resolved Hide resolved
@jbrockmendel
Copy link
Member Author

@jreback anything else here actionable?

@jreback
Copy link
Contributor

jreback commented Jun 20, 2019

haven’t had a chance to relook

pandas/core/groupby/generic.py Outdated Show resolved Hide resolved

from pandas.core.internals.blocks import make_block

# TODO: What about re-joining object columns?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I mean is you can remove the create_block_manager_from_blocks and let it fall thru to 184 with I think a very small change, e.g.


if .....
     blocks = bvals
else:
     dvals = .......
      blocks = [dvals]

of course pls use a longer name than dvals

@jbrockmendel
Copy link
Member Author

I think comments have been addressed.

@jreback jreback added this to the 0.25.0 milestone Jun 27, 2019
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug DataFrame DataFrame data structure Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Pandas cannot create DataFrame from Numpy Array of TimeStamps
4 participants