Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Move boxing logic to BlockManager #12752

Closed
wants to merge 1 commit into from

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Mar 30, 2016

If base direction looks OK, going to add some tests based on the spec.

@sinhrks sinhrks added Internals Related to non-user accessible pandas implementation Clean labels Mar 30, 2016
@sinhrks sinhrks added this to the 0.18.1 milestone Mar 30, 2016
@@ -1528,7 +1552,9 @@ def _try_coerce_result(self, result):
result = result.astype('m8[ns]')
result[mask] = tslib.iNaT
elif isinstance(result, np.integer):
result = lib.Timedelta(result)
result = self._box_func(result)
elif convert_float and isinstance(result, np.float):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed to coerce .quantile result. Any problem if we coerce float to datetime/timedelta always?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand why you need the convert_float kw? (I see you are using it below)

return self._na_value

if com.is_list_like(qs):
values = [_quantile(values, x * 100, **kwargs) for x in qs]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fixes: #12093 I think? (eg. now we are doing quantileing per block and not per-column)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately no. rolling_quantile uses separate cython logic which raises.

https://github.com/pydata/pandas/blob/master/pandas/algos.pyx#L1762

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, meant #11623

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Currently NOT because DataFrame.quantile use Series.quantile (rather than added Block logic). Will take a look.

@jreback
Copy link
Contributor

jreback commented Mar 31, 2016

@sinhrks this is awesome!!!!!

2 small comments.

@jreback
Copy link
Contributor

jreback commented Mar 31, 2016

further pls review: #12469 which I think can be closed upon merging this? and #10207 the underlying issue.

@@ -131,6 +133,8 @@ def get_values(self, dtype=None):
return an internal format, currently just the ndarray
this is often overriden to handle to_dense like operations
"""
if dtype == object:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

com.is_object_dtype(dtype)

and return self.values.astype(object)

for (_, vals) in data.iteritems()]
# unable to use DataFrame.apply, becasuse data may be empty
result = dict(_quantile(s) for (_, s) in data.iteritems())
result = self._constructor(result, columns=data.columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, this is still going to have the perf issues. can you do this as block based? (or save for diff PR)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that will fix the perf issue (as you already handle the 2-d blocks ok in internals/quantile

# ToDo: Temp logic to avoid GH 12619 and GH 12772
# which affects to DatetimeBlockTZ_try_coerce_result for np.ndarray
if isinstance(result, np.ndarray) and values.ndim > 0:
result = self._holder(result, tz='UTC')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for now. yeah trying to avoid check like this! thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but actually could/should this logic actually be in _try_coerce_result for DatetimeTZBlock?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally yes, but to avoid any side effect ATM.

@sinhrks sinhrks force-pushed the boxer_cln branch 4 times, most recently from 029b698 to b01608d Compare April 3, 2016 03:00
@jreback
Copy link
Contributor

jreback commented Apr 3, 2016

jreback@b7ec843
cleans up the datetime logic & coercion. and makes it more block friendly.

going to merge this, then can come back and fix the frame to use blocks. This is a bit of a special internal method as it is a reduction which we normally don't pass thru to the block manager (so using this raw kw). I have to think about how this works, but this is ok for now. Can clean up in a later PR.

jreback pushed a commit to jreback/pandas that referenced this pull request Apr 3, 2016
CLN: clean up quantile & move to BlockManager

closes pandas-dev#12741
closes pandas-dev#12772
closes pandas-dev#12469
closes pandas-dev#12752
@jreback jreback closed this in 36a8bd9 Apr 3, 2016
@jreback
Copy link
Contributor

jreback commented Apr 3, 2016

thanks @sinhrks

I may do a PR to make the reductions be a little nicer.

@sinhrks sinhrks deleted the boxer_cln branch April 3, 2016 17:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Series.quantile dtype / nan handling issue CLN: Move i8_boxer logic to BlockManager
2 participants