Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: de-duplicate Block.should_store and related #33028

Merged
merged 6 commits into from Mar 27, 2020

Conversation

jbrockmendel
Copy link
Member

FooBlock.should_store is equivalent to is_dtype_equal(self.dtype, other.dtype) for almost all of our Block subclasses. This makes that the implementation for the base class, so we define it in fewer places. xref #32878 on remaining cases.

@jreback jreback added Clean Internals Related to non-user accessible pandas implementation labels Mar 26, 2020
@jreback
Copy link
Contributor

jreback commented Mar 26, 2020

wow, cleans it up

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments on changes to the array values methods

return np.asarray(self.values.astype("datetime64[ns]", copy=False))

def array_values(self) -> ExtensionArray:
return DatetimeArray._simple_new(self.values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no the same as self._holder(self.values) as you did above, this was calling _simple_new on purpose for efficience

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. My thought was that since the DTA constructor is relatively strict, the overhead wouldn't be too bad. Can revert if you have a strong opinion. (and/or we could cache_readonly array_values)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would need to look back or time it again to be sure about the exact impact, but as far as I remember, I did this change when optimizing extract_array, and this was done for a reason

values = self.values.ravel()
result = self._holder(values).astype(object)
return result.reshape(self.values.shape)
return self._holder(self.values).astype(object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here on why _holder can handle 2D?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

if self.ndim == 2:
# Ensure that our shape is correct for DataFrame.
# ExtensionArrays are always 1-D, even in a DataFrame when
# the analogous NumPy-backed column would be a 2-D ndarray.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you keep this comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

@jreback jreback added this to the 1.1 milestone Mar 26, 2020
@jbrockmendel
Copy link
Member Author

@jreback thoughts on using the new default should_store implementation (return is_dtype_equal(self.dtype, other.dtype)` for the remaining places that special case it (ExtensionBlock, ComplexBlock)? For ExtensionBlock im pretty sure what we have now is just wrong, while for ComplexBlock it becomes equivalent to the new default if we were to add an itemsize check, which seems reasonable

@jreback
Copy link
Contributor

jreback commented Mar 26, 2020

@jreback thoughts on using the new default should_store implementation (return is_dtype_equal(self.dtype, other.dtype)` for the remaining places that special case it (ExtensionBlock, ComplexBlock)? For ExtensionBlock im pretty sure what we have now is just wrong, while for ComplexBlock it becomes equivalent to the new default if we were to add an itemsize check, which seems reasonable

yes i think you can

@jbrockmendel
Copy link
Member Author

OK next pass

@jorisvandenbossche jorisvandenbossche merged commit c47e9ca into pandas-dev:master Mar 27, 2020
@jorisvandenbossche
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants