API: pandas.api #13634

jreback · 2016-07-13T02:01:01Z

established a bit of a public api in pandas.api; ATM this only contains the type introspection routines.

1st question

Some disagreement on whether we should not do this, and rather just expose pandas.types directly.

I think pandas.api is actually very important because:

- limits the scope of what we choose to expose in the future; it's not simply 'everything' that isn't nailed down (the case now). This does change pandas to make everything private by default, EXCEPT what is explicitly public.
- a single entry point for other packages to use the pandas public API that is known / stable / maintained (as will be the case with the c-API as well)
- provides consistent naming externally (and allows us to fix / hide internally as needed)
- namespaced. I only import what I need as a user / other package, rather than everything

2nd question

as discussed here, should these deprecated API functions raise DeprecationWarning rather than FutureWarning?
-> done in #13990

Ideally we should resolve this before the end of 0.19.0 (or remove it).


jreback · 2016-07-13T02:01:19Z

@jorisvandenbossche @shoyer @wesm @TomAugspurger @sinhrks

jreback · 2016-07-13T10:20:09Z

@jorisvandenbossche updated the top section.

Related to second question in #13634 (whether to use FutureWarning or DeprecationWarning in deprecating the public pandas.core.common functions).

As those functions are mostly used in library code, and less directly by users in their own code, I think a DeprecationWarning is more appropriate in this case. For example, in our own docs, we started to get warnings due to an example with a statsmodels regression that uses patsy using one of those functions. Note that recent IPython also shows DeprecationWarnings when using a deprecated function interactively.

Author: Joris Van den Bossche <jorisvandenbossche@gmail.com>

Closes #13990 from jorisvandenbossche/common-depr-warning and squashes the following commits:

2de5d48 [Joris Van den Bossche] Use DeprecationWarning for core.common deprecations (GH13634)
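The FutureWarning / DeprecationWarning distinction in the commit message above can be illustrated with a small wrapper. This is a minimal sketch and not the pandas implementation: `deprecate_moved`, `is_even`, and the module names in the warning text are hypothetical. It uses only the standard `warnings` module, whose default filters display FutureWarning but hide DeprecationWarning raised from library code, which is why the latter is less noisy for end users.

```python
import functools
import warnings


def deprecate_moved(new_func, old_name, new_name, category=DeprecationWarning):
    """Return a wrapper that warns about the old location, then forwards the call.

    Hypothetical helper for illustration; not a pandas function.
    """

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_name} instead",
            category,
            stacklevel=2,  # point the warning at the caller, not at this wrapper
        )
        return new_func(*args, **kwargs)

    return wrapper


def is_even(n):
    """Hypothetical routine standing in for a relocated introspection function."""
    return n % 2 == 0


# The old name keeps working but warns on every call.
old_is_even = deprecate_moved(is_even, "old_module.is_even", "new_module.is_even")

# Record warnings explicitly, because the default filters may hide them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_is_even(4)
```

Passing `category=FutureWarning` instead would make the message visible to end users under the default filters, which is the trade-off the second question is about.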

jorisvandenbossche · 2017-04-05T07:40:52Z

We have discussed this in several places, recently also in the PR about the errors module (#15541) and previously when pandas.types was created (#13147 (comment) and comments below).
For the exceptions, we now have a top-level pandas.errors module, for type-related we have a private pandas.types and public pandas.api.types.

With the risk of keeping a settled discussion alive, I would still like to see this discussed some more.

To repeat the comment of @shoyer:

Ultimately, I don't think this matters too much, but given a choice between:

pandas.* indicates private namespace, pandas.api.* includes a public namespace, and

pandas.* includes public namespace, pandas._internals (and so forth) indicates private namespace

I think the latter (option 2) is more user friendly and consistent with the top level namespace pandas already being public facing.

On the other hand, it's true that option 2 is a little harder to transition to (for us), so maybe it's not worth the trouble.

We have recently added new submodules to the top-level namespace (pandas.types, pandas.indexes, pandas.formats, code that was previously mainly in pandas.core, ...), but towards the user these should be regarded as private. IMO this is not a good direction. For new code / refactors, I think we should stick to: only what is public can go in the top-level namespace.

Let's take the example of types. We now have a private pandas.types and a public pandas.api.types, so the less nested location is private, which I don't find ideal.
If we use the approach of pandas.api.types, I think we should at least make the actual implementation in pandas._types, so you don't have the two publicly visible locations.
The other option is to put what is now in pandas.api.types into pandas.types (while the exact implementation and organization in the different submodules of pandas.types remains a private implementation detail).
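Both options in the comment above come down to the same mechanism: the functions live in a private module, and a public namespace re-exports a chosen subset. A minimal sketch of that pattern follows, using in-memory modules so it runs without a package on disk. The names `mypkg._types`, `mypkg.api.types`, `is_integer`, and `_normalize` are hypothetical and only mirror the shape of the discussion.

```python
import types

# Private implementation module (hypothetical name). Its layout can change freely.
_impl = types.ModuleType("mypkg._types")


def _is_integer(value):
    """Return True for int values, excluding bool."""
    return isinstance(value, int) and not isinstance(value, bool)


def _normalize(value):
    """Internal helper that is never meant to be exported."""
    return value


_impl.is_integer = _is_integer
_impl._normalize = _normalize

# Public facade (hypothetical name). Only explicitly listed names are re-exported,
# so moving or renaming things inside the private module does not break callers.
public = types.ModuleType("mypkg.api.types")
public.is_integer = _impl.is_integer
public.__all__ = ["is_integer"]
```

Whether the facade is called `mypkg.api.types` or `mypkg.types` does not change the mechanism; the disagreement in the thread is about which spelling users should see.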

jreback · 2017-04-05T11:44:51Z

Here's my rationale:

We have 3 stakeholders.

users: regular folks who do import pandas as pd
library builders / advanced users: folks who need more 'library' code
implementation

implementation: this may seem strange to mention, but pandas is a large library. We need the flexibility to move code around, expand modules, rename and generally get stuff done internally, without having to constantly deprecate things. This would be extremely disruptive to the public (not to mention burdensome).

For example, pandas.types exists with several sub-modules that are logically defined, simply named and concise. There is no reason for 'users' or 'library' writers to access this at all. It exists simply to organize the code. Sure they can reach in if they really want, but we want to discourage this. Making it pandas._types would surely do this, but that is an ugly name and since this is pure python code a bit misleading.

'users': they are served by the top-level pd namespace, supplemented by pandas.errors. Everything that one could need is here (or as a method).

'library': these folks need code, generally for introspection, that is just too cumbersome to be in a 'user' namespace. Things like is_integer_dtype are a canonical way to introspect things in pandas. We use these internally and provide them as an external API to library writers.

I recently added pandas.api.lib with infer_dtype, which is a useful routine, again for external library writers. This (or another sub-module of pandas.api) can also be the home of a c-api / development api.

So, everything as public is just fine. If someone wants to reach in and use a routine from pandas.format.* or whatever they should have no expectations that this is stable across releases.

For simplicity from a user AND a library pov, it's much better to have a single namespace pandas.api that says: hey, these routines are what I should use, they are documented and won't change
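For readers unfamiliar with the introspection routines mentioned above, here is a short usage sketch. It assumes a pandas version in which both `is_integer_dtype` and `infer_dtype` are importable from `pandas.api.types` (the comment above places `infer_dtype` in `pandas.api.lib` at the time of writing); the sample data is made up.

```python
import pandas as pd
from pandas.api.types import infer_dtype, is_integer_dtype

ints = pd.Series([1, 2, 3])
floats = pd.Series([1.0, 2.5])

# Dtype checks of the kind a downstream library would run before
# choosing a code path.
ints_are_integer = is_integer_dtype(ints)
floats_are_integer = is_integer_dtype(floats)

# infer_dtype inspects the values of an untyped sequence and returns a
# string label describing what it found.
label_for_strings = infer_dtype(["a", "b"])
label_for_ints = infer_dtype([1, 2, 3])
```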

shoyer · 2017-04-05T17:19:45Z

I don't see a strong division between "users" and "advanced users". There is a continuum of use cases. Certainly we should group more things into submodules: this makes it easier to find related functionality.

One choice that would make this super clear is to move all internal stuff into a top level submodule called internals, e.g., pandas.internals.core, pandas.internals.indexes, etc. Then if you're importing from pandas.internals.something, you know it's an internal routine. Otherwise, as a user it's not obvious whether pandas.something exists as a namespace to logically group together the something routines or if it's an internal implementation detail.

jorisvandenbossche · 2017-04-05T22:32:21Z

For example, pandas.types exists with several sub-modules that are logically defined, simply named and concise. There is no reason for 'users' or 'library' writers to access this at all. It exists simply to organize the code.

That is not fully what I meant. The pandas.types module could still have different sub-modules, but what I meant was that only the 'top-level' of pandas.types would be public, so we are still free to implement it as we like in the submodules.
And sure, we will change things in the future, but whether we expose the public names in pandas.api.types or in pandas.types (top-level, not submodules) does not really matter for that? In both cases the functions will not actually be defined in that location.

Making it pandas._types would surely do this, but that is an ugly name and since this is pure python code a bit misleading.

This is IMO not misleading at all, it is actually very explicit that it is internal.
And I agree it is a bit uglier, but it is only about a few import lines in each pandas file. I don't think a bit of ugliness there is that problematic, and IMO it is worth the clarity (if we use the pandas.api approach).
The internals namespace is also a possibility, but this is actually more or less what core is currently. So we could use it for that I think (although less explicit in name). But we moved types, formats, indexes out of core ..., so this would be putting it back (but keeping the reorganization in submodules).

'users': they are served by the top-level pd namespace, supplemented by pandas.errors. Everything that one could need is here (or as a method).

This is already not fully the case. There are public methods in pandas.io.json, pandas.tseries.offsets, pandas.plotting (when that PR is merged), ..

I recently added pandas.api.lib with infer_dtype,

Slightly off topic, but I personally would rather put this in pandas.types, as it is type related. And the fact that it is in our lib is more an implementation detail for the user.

jreback · 2017-04-06T19:21:32Z

ok so plan is to:

document that pandas.core is private
potentially search on github to see if we should be deprecating things more explicitly
move out of pandas to pandas.core namespace
- computation
- formats
- indexes
- sparse
- tools
- tseries
- types
- util (maybe)?
going to leave stats (as when we remove the deprecations for things like pd.rolling_mean this will go away anyhow)

cc @wesm
@jorisvandenbossche
@TomAugspurger

jorisvandenbossche · 2017-04-06T22:32:24Z

I started searching on github, but most of the results on the first pages were just internal imports from people who embedded the full pandas codebase in their repo ... Does anybody know of more advanced github search methods for such things?

jreback · 2017-04-06T22:35:44Z

people who embedded the full pandas codebase in their repo

who the heck does that????

jorisvandenbossche · 2017-04-06T22:38:14Z

Regarding the above list, I am not sure we can easily move tseries, there are too many public functions in that module (offsets and frequencies). So we should think how to organize that one.

For tools, I would take the opportunity to reorganize that a bit. For example add a reshaping submodule for stack/unstack/pivot etc functionality? (just an idea, the current 'tools' name is not very descriptive)

jorisvandenbossche · 2017-04-14T16:37:55Z

I am shortly going to move a bunch of things around. But pandas.io is going to stay (pandas.formats will move).

Can you maybe first give some more details here, so we can discuss some things first before you do the work? (I also have some time this weekend to look at it)

See my questions above (tseries). Further, formats could maybe be moved to io instead of core ?
Regarding util (your question mark), I would maybe leave util where it is (that feels more logical), but then state that this is also private (together with core)?

jreback · 2017-04-14T16:43:38Z

quickly did #15997 (still WIP), moving pandas.formats -> pandas.core.formats (will rebase after @TomAugspurger's #15954).

I think easy to move .computation, .types. Let's do that (I can provide a proxy deprecation module if needed). Then go from there.

Further we will then have a big warning somewhere that pandas.core is now considered private.
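The "proxy deprecation module" mentioned above can be sketched as a module object that stays importable at the old location, forwards attribute access to the new location, and warns on each access. This is an illustrative sketch only: `DeprecatedModuleProxy`, the `mypkg.*` module names, and `format_value` are hypothetical, and the actual pandas helper may work differently.

```python
import types
import warnings


class DeprecatedModuleProxy(types.ModuleType):
    """Stand-in for a moved module: forwards lookups to the new module and warns."""

    def __init__(self, old_name, target):
        super().__init__(old_name)
        self._target = target

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for names that moved.
        if name.startswith("__"):
            raise AttributeError(name)
        try:
            value = getattr(self._target, name)
        except AttributeError:
            raise AttributeError(
                f"module {self.__name__!r} has no attribute {name!r}"
            ) from None
        warnings.warn(
            f"{self.__name__}.{name} is deprecated; "
            f"use {self._target.__name__}.{name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return value


# New (private) home of the code, built in memory for the example.
new_mod = types.ModuleType("mypkg.core.formats")
new_mod.format_value = lambda x: f"<{x}>"

# Old location keeps working through the proxy.
old_mod = DeprecatedModuleProxy("mypkg.formats", new_mod)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = old_mod.format_value(1)
```

In a real package the proxy would be registered in `sys.modules` under the old dotted name so that existing `import` statements continue to resolve.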

jreback · 2017-04-14T16:45:27Z

.util yes could leave there where it is.

I think we should also state that .tseries is private as well.

.tools I will also move to .core

.formats belongs in .core its about printing core stuff (not really about io per se).

jorisvandenbossche · 2017-04-14T16:47:42Z

formats belongs in .core its about printing core stuff (not really about io per se).

But if we expose certain aspects of it in io or io.formats, it seems logical to just move it there. It is about printing core objects, yes, but the line between output and repr is not that clear (to_html, to_string are basically used for both)

jreback · 2017-04-14T16:48:38Z

could move to io just as well. ok, i'll do that after @TomAugspurger merges then.

any problem with the others that I mentioned?

jorisvandenbossche · 2017-04-14T16:48:51Z

I think we should also state that .tseries is private as well.

Then we first need an assessment of what exactly is public in tseries (as there certainly are public objects there) and how we expose this.

jreback · 2017-05-01T21:56:41Z

what is the purpose of pandas.plotting?

jorisvandenbossche · 2017-05-01T22:05:49Z

It holds the non-method plotting functions like scatter_matrix

jreback · 2017-05-01T22:10:57Z

so what is the argument for having that as a top level module then?

TomAugspurger · 2017-05-01T22:15:00Z

so what is the argument for having that as a top level module then?

plotting or types? Assuming you mean types, Joris is saying pandas.types is more consistent with pandas.errors, pandas.plotting, pandas.testing.

jreback · 2017-05-02T00:22:41Z

actually no, I mean I don't really see the purpose of the top-level pandas.plotting.

further pandas.api.* will be expanded in the future. I see no reason to change this.

jreback · 2017-05-02T00:49:55Z

the more I look at this the more I don't think we need pandas.plotting at all. These are all methods on Series or DataFrame, with the exception of scatter_matrix.

Why do we need this again?

TomAugspurger · 2017-05-02T11:10:43Z

scatter_matrix, radviz, andrews_curves, parallel_coordinates, and autocorrelation_plot are all available in pandas.plotting and not on DataFrame.

jorisvandenbossche · 2017-05-02T17:16:42Z

pandas.api.* will be expanded in the future.

What do we envisage that will end up in there, apart from type related things?

jorisvandenbossche · 2017-05-02T17:21:14Z

@jreback To come back to the hashing utilities, I didn't respond yet to this (#13634 (comment)):

I think b) is fine for now. These should really be internal with a couple of things exposed via pandas.api.lib (e.g. just hash_pandas_object and hash_array), so could do that as well, but b) allows us to do that later

We now exposed the public hashing functions in pandas.util.hashing, which means we should keep it (the public exposure) there, and not later move it somewhere else.
If we are not sure the implementation will be kept in util.hashing, we could also expose the public method just in pandas.util instead of pandas.util.hashing

jreback · 2017-05-03T01:06:35Z

both of those are not great
this is purely a library exposed for other developers (dask) so it's public to them
if it's moved we can adjust dask but it is not public in any other way

TomAugspurger · 2017-05-03T20:14:54Z

Thoughts from anyone else? I think the proposals on the table are

Option 1 (current master);

pd.api.types
pd.errors
pd.io (.formats, .json, etc.)
pd.plotting
pd.testing

Option 2 (move types up):

pd.types
pd.errors
pd.io (.formats, .json, etc.)
pd.plotting
pd.testing

Option 3 (maybe? Not sure if this is on the table, but for symmetry)

pd.api.types
pd.api.errors
pd.api.io (.formats, .json, etc.)
pd.api.plotting
pd.api.testing

shoyer · 2017-05-03T20:23:30Z

I like option 2.

jreback · 2017-05-03T22:25:17Z

master is what it is
so -1 on any changes

jorisvandenbossche · 2017-05-03T23:10:57Z

@jreback Can you answer my comment above? #13634 (comment) (if you have an idea of course)

I am ok with option 1 / master (pandas.api.types), if we have an idea of what else we would put in pandas.api

jreback · 2017-05-03T23:20:47Z

@jreback Can you answer my comment above? #13634 (comment) (if you have an idea of course)

yes I think we should move hashing to pandas.api.lib (or something like that) and import the 2 functions (and I'll update dask as well).

jreback · 2017-05-03T23:22:01Z

pandas.util is also a bunch of random code. these should also not really be exposed and should really go in pandas.core.util

jorisvandenbossche · 2017-05-03T23:24:25Z

I think we should move hashing to pandas.api.lib

The objection I currently have against pandas.api.lib is that the fact that some functionality (e.g. in this case the hashing functions) is implemented in _lib is an implementation detail (if it were not in compiled code, it would not be there), and IMO this implementation detail does not need to 'leak' into the user API.

jorisvandenbossche · 2017-05-03T23:28:56Z

Another example is the cache_readonly decorator. I was thinking to use this one in geopandas, so it would be nice if there is a public exposure of it. It would also fit in util or pandas.api.lib, depending on this discussion.

Util is indeed a bunch of random things, but that is in its name, it is a bunch of various useful utility functions that don't really belong somewhere else. I think something like cache_readonly fits in that description.
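For context on what a decorator like `cache_readonly` provides, here is a minimal pure-Python sketch of a cached, read-only property. It is not the pandas implementation; `cached_readonly` and `Polygon` are hypothetical names chosen for illustration.

```python
class cached_readonly:
    """Descriptor that computes a value once per instance and forbids assignment."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        self.__doc__ = func.__doc__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not on an instance
        cache = obj.__dict__.setdefault("_cache", {})
        if self.name not in cache:
            cache[self.name] = self.func(obj)  # first access: compute and store
        return cache[self.name]

    def __set__(self, obj, value):
        # Defining __set__ makes this a data descriptor, so assignment is blocked.
        raise AttributeError(f"can't set attribute {self.name!r}")


class Polygon:
    def __init__(self, sides):
        self.sides = sides
        self.calls = 0  # counts how often the expensive computation ran

    @cached_readonly
    def perimeter(self):
        self.calls += 1
        return sum(self.sides)


p = Polygon([3, 4, 5])
first = p.perimeter
second = p.perimeter  # served from the per-instance cache
```

A downstream library such as the one mentioned above would use this for derived attributes that are expensive to compute and never change after construction.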

jreback · 2017-05-03T23:36:47Z

ok what i could do is make everything in util private but leave the top level namespace to export certain functions as a flat namespace

jorisvandenbossche · 2017-05-03T23:38:44Z

make everything in util private but leave the top level namespace to export certain functions as a flat namespace

yes, that would be a nice idea to expose certain things (like eg the cache_readonly) in the util submodule

jreback · 2017-05-03T23:41:38Z

ok have a look for anything else
maybe Appender, Substitution

CLN: make submodules of pandas.util private

xref pandas-dev#13634

* CLN: move pandas.util.* validators, depr_module, decorators, print_versions to _ leading
* CLN: move remaining extensions to _libs
* pandas.tools.hashing FutureWarning -> DeprecationWarning

jreback added API Design Needs Discussion Requires discussion from core team before further action labels Jul 13, 2016

jreback added this to the 0.19.0 milestone Jul 13, 2016

jorisvandenbossche mentioned this issue Aug 13, 2016

Use DeprecationWarning for core.common deprecations (GH13634) #13990

Closed

jorisvandenbossche mentioned this issue Aug 13, 2016

ENH: PeriodIndex now has period dtype #13941

Closed

4 tasks

jreback modified the milestones: 1.0, 0.19.0 Sep 28, 2016

jorisvandenbossche mentioned this issue Mar 13, 2017

API: expose pandas.errors #15541

Closed

jreback modified the milestones: 0.20.0, 1.0 Apr 6, 2017

jreback self-assigned this Apr 6, 2017

chris-b1 mentioned this issue Apr 14, 2017

ENH: Style blocks #15954

Merged

jreback mentioned this issue Apr 14, 2017

CLN: move .formats #15997

Closed

jreback added a commit to jreback/pandas that referenced this issue May 4, 2017

CLN: make submodules of pandas.util private

ce092a6

xref pandas-dev#13634

jreback mentioned this issue May 4, 2017

CLN: make submodules of pandas.util private #16223

Merged

jreback added a commit to jreback/pandas that referenced this issue May 4, 2017

CLN: make submodules of pandas.util private

f7f324b

xref pandas-dev#13634 CLN: move pandas.util.* validators, depr_module, decorators, print_versions to _ leading

jreback added a commit to jreback/pandas that referenced this issue May 4, 2017

CLN: make submodules of pandas.util private

a7cd350

xref pandas-dev#13634 CLN: move pandas.util.* validators, depr_module, decorators, print_versions to _ leading

jreback added a commit to jreback/pandas that referenced this issue May 4, 2017

CLN: make submodules of pandas.util private

8d7fa0d

xref pandas-dev#13634 CLN: move pandas.util.* validators, depr_module, decorators, print_versions to _ leading

gboeing mentioned this issue May 5, 2017

upgrading to pandas 0.20.0 breaks geopandas 0.2.1 import geopandas/geopandas#443

Closed

This issue was closed.
