Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: expose pandas.errors #15541

Closed
wants to merge 12 commits into from

Conversation

Projects
None yet
7 participants
@jreback
Copy link
Contributor

commented Mar 1, 2017

closes #14800

this doesn't actually deprecate the existing (as we have discussed in #15181) but baby steps.

I also didn't add the 'hated'! SettingWithCopy* ones; these will be removed in pandas 2.0 anyhow, so no point in adding them.

@jreback jreback added the API Design label Mar 1, 2017

@jreback jreback added this to the 0.20.0 milestone Mar 1, 2017

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 1, 2017

@jreback jreback force-pushed the jreback:exceptions branch 3 times, most recently from ff94598 to 4386409 Mar 1, 2017

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 1, 2017

@jorisvandenbossche xref #15181 (comment)

I think this actually works.

In [1]: from pandas.api.exceptions import UnsortedIndexError

In [2]: try:
   ...:     raise UnsortedIndexError()
   ...: except UnsortedIndexError:
   ...:     print("caught")
   ...:     
caught
@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Mar 1, 2017

@jreback second one should be except exceptions.UnsortedIndexError, ;) Missing the exceptions.

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 1, 2017

thanks @TomAugspurger yes that works too

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Mar 1, 2017

I think this actually works.

Yes, but that is because you didn't do any deprecations of the original ones. Referring from two locations to the same exception is no problem, but it was deprecating the old one (eg by using a decorator) that caused the problems discussed before.

I am in favor of putting this just in pandas.exceptions

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 2, 2017

@jorisvandenbossche

I am in favor of putting this just in pandas.exceptions

why would you create another namespace ? when we already have a perfectly good one.

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 3, 2017

@jorisvandenbossche can you comment on your objection?

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Mar 3, 2017

can you comment on your objection?

Well, that comes back to my initial objections to the api namespace :-) (so I should have pressed that more earlier ..)
I am in favour of removing things of the top-level namespace (functions / sub-namespaces) that people shouldn't use (as we are doing), but in favour of adding things users may use.

In this specific case, what I also would do is creating an actual module (let it be pandas.api.exceptions or pandas.exceptions, that's even a bit different discussion) with the code in there, and let our other modules (in the io code, indexing code, etc) import the exceptions from there (instead the other way around as it is now). That gathers all our custom exceptions in one place (but on the other hand moves them away from where they are used, so that's not necessarily a net win), but also gives a consistent api to users. If they encounter such an exception, it is of the same type (I mean with same class path name) as the one they can use to catch it. And you minimize the chance to 'misuse' a private import.

And to be clear, big +1 to make those exceptions more officially public!

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 3, 2017

@jorisvandenbossche sure, but these are really 2 separate things.

  • an API for exceptions to the user
  • move / change the internal usage

I don't think the 2nd part is relevant here. I mean we can always change how they are imported into the public namespace. The point here is to have a separate public API layer (just as we did with pandas.api.types. The actual exceptions being defined elsewhere is fine.

@shoyer

This comment has been minimized.

Copy link
Member

commented Mar 3, 2017

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 3, 2017

@shoyer @jorisvandenbossche what is the reason to deviate from the established pandas.api namespace?

find with exceptions -> errors. actually was maybe going to split out warnings separately.

This is the point of the top-level api namespace, IOW.

api.types, api.errors, api.warnings, api.* whatever (imaging .algos, .lib, .includes, .src) etc.

this is going to eventually proliferate. We cannot have pandas.* namespace be anything but non-public.

We have 1 and only 1 public namespace (or at least moving in that direction).

@shoyer

This comment has been minimized.

Copy link
Member

commented Mar 3, 2017

If the default for namespaces like pandas.* going forward is public, what's the advantage of nesting them like pandas.api.*?

Most libraries don't use special designated api submodules, e.g., consider all the scipy submodules like scipy.stats, scipy.ndimage, scipy.linalg, etc.

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 3, 2017

no, pandas.* namespace is NOT public (or at least it shouldn't be). My statement is that is de-facto currently.

I am thinking about statsmodels.api

The point is that pandas NEEDS a specifically public api, that is not just everything the kitchen sink.

just because scipy/numpy does something does not make it the 'correct' way to do things. sure its a nice reference, but pandas has 2 sets of users, API and 'regular' ones. These are different and need to be treated differently.

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 3, 2017

The problem is that for years people building things on top of pandas have just reached in and used anything. That cannot continue to happen. How better to provide implementation flexibility than to have a specific public API? I thought we had settled this discussion a while back.

@shoyer

This comment has been minimized.

Copy link
Member

commented Mar 3, 2017

Ultimately, I don't think this matters too much, but given a choice between:

  1. pandas.* indicates private namespace, pandas.api.* includes a public namespace, and
  2. pandas.* includes public namespace, pandas._internals (and so forth) indicates private namespace

I think the later (option 2) is more user friendly and consistent with the top level namespace pandas already being public facing. In contrast, note that with statsmodels, users only use statsmodels.api, not the top level namespace statsmodels.

On the other hand, it's true that option 2 is a little harder to transition to (for us), so maybe it's not worth the trouble.

I don't really understand your point about "API" vs "regular" users. I don't think it's a terribly advanced use case to catch appropriate exception types. I do agree that we want to reduce noise in the top level namespace.

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 3, 2017

I don't really understand your point about "API" vs "regular" users. I don't think it's a terribly advanced use case to catch appropriate exception types. I do agree that we want to reduce noise in the top level namespace.

I would this is obvious.

  • 'regular' users are the ones who pretty much use pd.* as there namespace. In theory they may use pandas.api.exceptions (e.g. specifically catching a parser exception).
  • 'API' users are using (formerly) pandas internals to build other things on top of pandas. This is the point of an API.

Ultimately these are part of the same namespace that is publicly exposed. The point is that it is completely silly that people can just use any of the namespace. Sure you can, but pandas has gotten so big, that you simply cannot live that that when you need to move things around to change implementation details.

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 3, 2017

pandas.* includes public namespace, pandas._internals (and so forth) indicates private namespace

how exactly is this more friendly? I would argue that this is much less friendly, simply because you don't know where to find things that you might need. This is a huge surface area, both for dev and for usage. Limiting this scope is hugely important.

@jreback jreback force-pushed the jreback:exceptions branch 2 times, most recently from 998b171 to 56e8631 Mar 8, 2017

@codecov-io

This comment has been minimized.

Copy link

commented Mar 8, 2017

Codecov Report

Merging #15541 into master will decrease coverage by 0.02%.
The diff coverage is 92.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15541      +/-   ##
==========================================
- Coverage   90.98%   90.96%   -0.03%     
==========================================
  Files         143      145       +2     
  Lines       49477    49483       +6     
==========================================
- Hits        45019    45014       -5     
- Misses       4458     4469      +11
Flag Coverage Δ
#multiple 88.72% <92.3%> (-0.01%) ⬇️
#single 40.63% <82.05%> (-0.12%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexing.py 94.1% <ø> (ø) ⬆️
pandas/tslib.py 100% <ø> (ø) ⬆️
pandas/lib.py 100% <ø> (ø) ⬆️
pandas/api/lib/__init__.py 0% <0%> (ø)
pandas/tseries/index.py 95.43% <100%> (ø) ⬆️
pandas/io/excel.py 79.7% <100%> (+0.03%) ⬆️
pandas/computation/align.py 97.91% <100%> (+0.02%) ⬆️
pandas/io/parsers.py 95.52% <100%> (ø) ⬆️
pandas/core/common.py 90.68% <100%> (-0.29%) ⬇️
pandas/core/frame.py 97.56% <100%> (-0.1%) ⬇️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 456e729...e5fbdc8. Read the comment docs.

@jreback jreback force-pushed the jreback:exceptions branch from 56e8631 to f0fb174 Mar 9, 2017

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 9, 2017

related:

this now happens in our doc builds.

/home/joris/miniconda3/envs/dev/lib/python3.5/site-packages/feather/api.py:43: FutureWarning: pandas.lib.infer_dtype is deprecated. Please use pandas._libs.lib.infer_dtype instead.
  inferred_type = pd.lib.infer_dtype(col)

certainly could do a PR upstream to fix this (e.g. with conditional imports). But might be nice
to have
from pandas.api.lib import infer_dtype

as a starting location for some library code. This is a useful enough routine that its nice to expose to the outside world.

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 9, 2017

cc @wesm

@wesm

This comment has been minimized.

Copy link
Member

commented Mar 13, 2017

I've been running into these internal API / external API issues lately (e.g. needing to use DatetimeTZDtype in an external library), so it might be worth documenting the pandas < 0.19 way to get some of these APIs

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Mar 13, 2017

We indeed need to document this better.

To come back on the discussion above, I am also more in favor of "option 2" from #15541 (comment). It's indeed maybe a bit harder to get to that point, but IMO it would be much more clear to be able to say: everything accessible from the pandas.* namespace (directly or in a submodule) is public (rather than eg have to say pandas.types is private, but pandas.api.types you may use). But that is also a more general discussion not only related to exceptions, cfr #13634

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 27, 2017

@shoyer these have long-time meaning in pandas (could be changed, but this is what they are).

mixed* are just non-strings (unicode/bytes) that are actual mixed python objects.

In [9]: infer_dtype(['a', 1])
Out[9]: 'mixed-integer'

In [10]: infer_dtype(['a', 1.5])
Out[10]: 'mixed'

In [11]: infer_dtype([1.5, 1])
Out[11]: 'mixed-integer-float'

In [12]: infer_dtype(['a', pd.Timestamp('20130101')])
Out[12]: 'mixed'

In [14]: infer_dtype([(1, 2, 3)])
Out[14]: 'mixed'

@jreback jreback force-pushed the jreback:exceptions branch from 6efcc6d to b6947a1 Mar 27, 2017

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 27, 2017

pushed some examples.

@jreback jreback force-pushed the jreback:exceptions branch from b6947a1 to 5c2543f Mar 27, 2017

@jorisvandenbossche
Copy link
Member

left a comment

minor comment, looks good to me

-------
string describing the common type of the input data.
Results can include:
- floating

This comment has been minimized.

Copy link
@jorisvandenbossche

jorisvandenbossche Mar 27, 2017

Member

small rst thing: no indentation needed, and white line before the list

This comment has been minimized.

Copy link
@jreback

jreback Mar 27, 2017

Author Contributor

done, though I don't think we actually have a doc-generation section for pandas.api ATM. (i'll make an issue).

@jreback

This comment has been minimized.

Copy link
Contributor Author

commented Mar 29, 2017

@jorisvandenbossche if you have a chance.

@jreback jreback force-pushed the jreback:exceptions branch from 6c21f63 to 7f18b9a Apr 2, 2017

@@ -2,6 +2,7 @@

import warnings
warnings.warn("The pandas.lib module is deprecated and will be "

This comment has been minimized.

Copy link
@chris-b1

chris-b1 Apr 3, 2017

Contributor

Would it be possible to specially catch infer_dtype here and give a more specific message since it has moved and is public? Noticed I'm using it some real code, I think based on your SO answer here - http://stackoverflow.com/a/20670901/3657742.

This comment has been minimized.

Copy link
@jreback

jreback Apr 3, 2017

Author Contributor
In [1]: pd.lib.infer_dtype('foo')
/Users/jreback/miniconda3/envs/pandas/bin/ipython:1: FutureWarning: pandas.lib.infer_dtype is deprecated. Please use pandas._libs.lib.infer_dtype instead.
  #!/Users/jreback/miniconda3/envs/pandas/bin/python
Out[1]: 'string'

hmm, let me see.

This comment has been minimized.

Copy link
@jreback

jreback Apr 3, 2017

Author Contributor

yep

In [1]: pd.lib.infer_dtype('foo')
/Users/jreback/miniconda3/envs/pandas/bin/ipython:1: FutureWarning: pandas.lib is deprecated and will be removed in a future version. You can access infer_dtype in pandas.api.lib.infer_dtype
  #!/Users/jreback/miniconda3/envs/pandas/bin/python
Out[1]: 'string'

This comment has been minimized.

Copy link
@jreback

jreback Apr 3, 2017

Author Contributor

done in latest push

@jreback jreback force-pushed the jreback:exceptions branch from 7f18b9a to e5fbdc8 Apr 3, 2017

@jorisvandenbossche
Copy link
Member

left a comment

Looks good to me!

@jreback jreback closed this in da0523a Apr 3, 2017

linebp added a commit to linebp/pandas that referenced this pull request Apr 17, 2017

API: expose pandas.errors
closes pandas-dev#14800

Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#15541 from jreback/exceptions and squashes the following commits:

e5fbdc8 [Jeff Reback] give nicer deprecation / message on infer_dtype moving
ab4525b [Jeff Reback] typo on pandas.errors in whatsnew
d636ef7 [Jeff Reback] document removed exceptions
3dc4b9a [Jeff Reback] more docs for exceptions
2bb1fbd [Jeff Reback] remove AmbiguousIndexError, completely unused
5754630 [Jeff Reback] fix doc-string
35d225f [Jeff Reback] more examples
e91901d [Jeff Reback] DOC: better docs on infer_type
7e8432d [Jeff Reback] remove need for PandasError sub-class
92b2fdc [Jeff Reback] corrections
991fbb4 [Jeff Reback] API: expose pandas.errors
eec40cd [Jeff Reback] add pandas.api.lib add infer_dtype to pandas.api.lib

jreback added a commit that referenced this pull request Apr 24, 2017

API: expose dtypes in pandas.api.types (#16099)
* API: expose dtypes in pandas.api.types

xref #16015
xref apache/arrow#585
xref #16042
xref #15541 (comment)

linebp added a commit to linebp/pandas that referenced this pull request May 2, 2017

pcluo added a commit to pcluo/pandas that referenced this pull request May 22, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.