Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Make ExtensionDtype.construct_array_type a method #36136

Closed

Conversation

TomAugspurger
Copy link
Contributor

This allows a single dtype to support multiple array classes.
For arrow-backed strings, we'll likely want a separate array class
for ease of implementation, clarity. But we'll have a parametrized
dtype.

class StringDtype:
    def __init__(self, storage="python"):
        self.storage = storage

    def construct_array_type(self):  # regular method
        if self.storage == "python":
            return StringArray
        else:
            return ArrowStringArray

Closes #36126

This allows a single dtype to support multiple array classes.
For arrow-backed strings, we'll likely want a separate array class
for ease of implementation, clarity. But we'll have a parametrized
dtype.

```python
class StringDtype:
    def __init__(self, storage="python"):
        self.storage = storage

    def construct_array_type(self):  # regular method
        if self.storage == "python":
            return StringArray
        else:
            return ArrowStringArray
```

Closes pandas-dev#36126
@TomAugspurger TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Sep 5, 2020
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully onboard! Just some concerns about backwards compatibility

.. _whatsnew_120.api_breaking.experimental:

Changes to experimental APIs
----------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it needs ~~~~~~~~~~~~~~~~~~~~~~~~ type header

Changes to experimental APIs
----------------------------

- :meth:`pandas.api.extensions.ExtensionDtype.construct_array_type` has changed from a classmethod to a regular method to support one dtype being used for multiple arrays. To migrate, change your definition to a regular method and ensure that your method is called on instances rather than the class (:issue:`36126`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, as long as you are not calling it yourself on the class object, and only implementing it, it shouldn't directly break existing dtypes ? (a class method can also be called from an instance)

(otherwise I think we should do it in a non-breaking way, eg by taking a different name)

@TomAugspurger
Copy link
Contributor Author

I just had a thought... As long as we're careful to only call the method on instances and not types, it may be OK for StringDtype.construct_array_type to be a regular method? The only place we don't do that now is in the test_decimal tests, everywhere in the library we're already working on instances, not types.

It's not the most sound, but maybe it's OK to avoid any risk of API-breaking changes in the interface?

@jorisvandenbossche
Copy link
Member

Yes, that sounds correct to me.

I would still document it that it is allowed? (it might also be useful for external dtypes) Or maybe add a based extension test to encourage downstream users to change it?

@TomAugspurger
Copy link
Contributor Author

Yeah, I think we'll want to encourage people to update. Perhaps via a (non-visisble) deprecationwarning at first, then an extension test asserting that it's not a classmethod, then we make the change in 2.x.

@jreback jreback added this to the 1.2 milestone Sep 5, 2020
@jreback
Copy link
Contributor

jreback commented Sep 5, 2020

Yeah, I think we'll want to encourage people to update. Perhaps via a (non-visisble) deprecationwarning at first, then an extension test asserting that it's not a classmethod, then we make the change in 2.x.

I am ok with just changing this, its an experimental api after all, not worth waiting for 2.x

@jbrockmendel
Copy link
Member

Decimal tests failing

@jbrockmendel
Copy link
Member

@TomAugspurger just pushed a small commit that fixes the broken test

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Sep 18, 2020 via email

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 19, 2020
@TomAugspurger
Copy link
Contributor Author

Going to hold on this till it's definitely needed.

@jorisvandenbossche
Copy link
Member

Personally, I think it is still useful to make this change. As long as we only call the method on instances, it shouldn't actually be a breaking change. So although it might not be explicitly needed for the StringArray (since it can just make it a plain method only there), it might still be useful in general to change this to signal that this is OK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design ExtensionArray Extending pandas with custom dtypes or arrays. Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make ExtensionDtype.construct_array_type a regular method
4 participants