Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: pseudo-public internals API for downstream libraries #40182

Merged
merged 9 commits into from
Mar 5, 2021

Conversation

jbrockmendel
Copy link
Member

Downstream libraries accessing core.internals causes headaches pretty regularly (#40149 (comment), #38134,). The proposal: define a public-ish stable-ish API to expose for these libraries, leaving us free-er to make changes in internals.

This exposes make_block. I expect we'll also need to expose one or both of create_block_manager_from_arrays/create_block_manager_from_blocks (comment in internals.__init__ says they are there for downstream xref #33892)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test which exercises this from a downstream author POV (e.g. create an manipulate things, e.g. like what pyarrow is doing), but only do this by importing the new api you are establishing so they can't cheat

authors

1) Try to avoid using internals directly altogether, and failing that,
2) Use only functions exposed here (or in core.internals)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally we remove 2 as soon as possible.

@@ -0,0 +1,61 @@
"""
This is a pseudo-public API for downstream libraries. We ask that downstream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have these import to pandas.api.internals i think (then you can really control the exports)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know pyarrow accesses the pd.core.internals namespace. not sure about others. we can ask them to change, but for the forseeable future will need these in the namespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right what i mean is let's expose a wrapper api namespace and then we can change the downstream packages when we have released.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough. keep it in this file though? im wary of adding it to the pd.api namespace lest new downstream packages adopt bad habits

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sure

@jreback jreback added API - Consistency Internal Consistency of API/Behavior API Design Internals Related to non-user accessible pandas implementation labels Mar 2, 2021
@jbrockmendel
Copy link
Member Author

can you add a test which exercises this from a downstream author POV (e.g. create an manipulate things, e.g. like what pyarrow is doing), but only do this by importing the new api you are establishing so they can't cheat

i think the way to do this is to rename the internal version, then our existing downstream/io tests will test for this

@jorisvandenbossche
Copy link
Member

Thanks, it's probably a good idea to make a public-ish make_block, so you are more free to make changes/optimizations to the one used internally. Now, then we do need to add tests for this, because it won't longer be tested through our own test suite.

In pyarrow, we currently make use of BlockManager(..) constructor as well. And further, we access CategoricalBlock, DatetimeTZBlock etc, but only as object, eg for isinstance(block, ObjectBlock) (never calling the __init__ of those). Now, I suppose that we could remove this usage (instead of checking for CategoricalBlock, we could also check that block.values is a Categorical, for example)

@@ -0,0 +1,61 @@
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test for this model that asserts the exact name that we expose, similar to https://github.com/pandas-dev/pandas/blob/master/pandas/tests/api/test_api.py

@jreback jreback added this to the 1.3 milestone Mar 5, 2021
from pandas.core import internals
from pandas.core.internals import api


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also assert the entire namespace of api itself e.g. [name for name in dir(api) if not name.startswith('_')] and then compare to a hard coded list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think we want to hard-code the existing list bc whats in there is somewhat haphazard. Once we hear back from all the relevant parties on #40226 we'll be able to make that list

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would fix it now with all of the symbols that are currently exposed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

picking my battles; updated.

@jreback jreback merged commit 6423561 into pandas-dev:master Mar 5, 2021
@jbrockmendel jbrockmendel deleted the api-pseudo-public branch March 5, 2021 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API - Consistency Internal Consistency of API/Behavior API Design Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants