Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: expose pandas.errors #15541

Closed
wants to merge 12 commits into from
Closed

API: expose pandas.errors #15541

wants to merge 12 commits into from

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Mar 1, 2017

closes #14800

this doesn't actually deprecate the existing (as we have discussed in #15181) but baby steps.

I also didn't add the 'hated'! SettingWithCopy* ones; these will be removed in pandas 2.0 anyhow, so no point in adding them.

@jreback jreback added this to the 0.20.0 milestone Mar 1, 2017
@jreback
Copy link
Contributor Author

jreback commented Mar 1, 2017

@shoyer xref #15537 (comment)

@jreback jreback force-pushed the exceptions branch 3 times, most recently from ff94598 to 4386409 Compare March 1, 2017 19:45
@jreback
Copy link
Contributor Author

jreback commented Mar 1, 2017

@jorisvandenbossche xref #15181 (comment)

I think this actually works.

In [1]: from pandas.api.exceptions import UnsortedIndexError

In [2]: try:
   ...:     raise UnsortedIndexError()
   ...: except UnsortedIndexError:
   ...:     print("caught")
   ...:     
caught

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 1, 2017

@jreback second one should be except exceptions.UnsortedIndexError, ;) Missing the exceptions.

@jreback
Copy link
Contributor Author

jreback commented Mar 1, 2017

thanks @TomAugspurger yes that works too

@jorisvandenbossche
Copy link
Member

I think this actually works.

Yes, but that is because you didn't do any deprecations of the original ones. Referring from two locations to the same exception is no problem, but it was deprecating the old one (eg by using a decorator) that caused the problems discussed before.

I am in favor of putting this just in pandas.exceptions

@jreback
Copy link
Contributor Author

jreback commented Mar 2, 2017

@jorisvandenbossche

I am in favor of putting this just in pandas.exceptions

why would you create another namespace ? when we already have a perfectly good one.

@jreback
Copy link
Contributor Author

jreback commented Mar 3, 2017

@jorisvandenbossche can you comment on your objection?

@jorisvandenbossche
Copy link
Member

can you comment on your objection?

Well, that comes back to my initial objections to the api namespace :-) (so I should have pressed that more earlier ..)
I am in favour of removing things of the top-level namespace (functions / sub-namespaces) that people shouldn't use (as we are doing), but in favour of adding things users may use.

In this specific case, what I also would do is creating an actual module (let it be pandas.api.exceptions or pandas.exceptions, that's even a bit different discussion) with the code in there, and let our other modules (in the io code, indexing code, etc) import the exceptions from there (instead the other way around as it is now). That gathers all our custom exceptions in one place (but on the other hand moves them away from where they are used, so that's not necessarily a net win), but also gives a consistent api to users. If they encounter such an exception, it is of the same type (I mean with same class path name) as the one they can use to catch it. And you minimize the chance to 'misuse' a private import.

And to be clear, big +1 to make those exceptions more officially public!

@jreback
Copy link
Contributor Author

jreback commented Mar 3, 2017

@jorisvandenbossche sure, but these are really 2 separate things.

  • an API for exceptions to the user
  • move / change the internal usage

I don't think the 2nd part is relevant here. I mean we can always change how they are imported into the public namespace. The point here is to have a separate public API layer (just as we did with pandas.api.types. The actual exceptions being defined elsewhere is fine.

@shoyer
Copy link
Member

shoyer commented Mar 3, 2017 via email

@jreback
Copy link
Contributor Author

jreback commented Mar 3, 2017

@shoyer @jorisvandenbossche what is the reason to deviate from the established pandas.api namespace?

find with exceptions -> errors. actually was maybe going to split out warnings separately.

This is the point of the top-level api namespace, IOW.

api.types, api.errors, api.warnings, api.* whatever (imaging .algos, .lib, .includes, .src) etc.

this is going to eventually proliferate. We cannot have pandas.* namespace be anything but non-public.

We have 1 and only 1 public namespace (or at least moving in that direction).

@shoyer
Copy link
Member

shoyer commented Mar 3, 2017

If the default for namespaces like pandas.* going forward is public, what's the advantage of nesting them like pandas.api.*?

Most libraries don't use special designated api submodules, e.g., consider all the scipy submodules like scipy.stats, scipy.ndimage, scipy.linalg, etc.

@jreback
Copy link
Contributor Author

jreback commented Mar 3, 2017

no, pandas.* namespace is NOT public (or at least it shouldn't be). My statement is that is de-facto currently.

I am thinking about statsmodels.api

The point is that pandas NEEDS a specifically public api, that is not just everything the kitchen sink.

just because scipy/numpy does something does not make it the 'correct' way to do things. sure its a nice reference, but pandas has 2 sets of users, API and 'regular' ones. These are different and need to be treated differently.

@jreback
Copy link
Contributor Author

jreback commented Mar 3, 2017

The problem is that for years people building things on top of pandas have just reached in and used anything. That cannot continue to happen. How better to provide implementation flexibility than to have a specific public API? I thought we had settled this discussion a while back.

@shoyer
Copy link
Member

shoyer commented Mar 3, 2017

Ultimately, I don't think this matters too much, but given a choice between:

  1. pandas.* indicates private namespace, pandas.api.* includes a public namespace, and
  2. pandas.* includes public namespace, pandas._internals (and so forth) indicates private namespace

I think the later (option 2) is more user friendly and consistent with the top level namespace pandas already being public facing. In contrast, note that with statsmodels, users only use statsmodels.api, not the top level namespace statsmodels.

On the other hand, it's true that option 2 is a little harder to transition to (for us), so maybe it's not worth the trouble.

I don't really understand your point about "API" vs "regular" users. I don't think it's a terribly advanced use case to catch appropriate exception types. I do agree that we want to reduce noise in the top level namespace.

@jreback
Copy link
Contributor Author

jreback commented Mar 3, 2017

I don't really understand your point about "API" vs "regular" users. I don't think it's a terribly advanced use case to catch appropriate exception types. I do agree that we want to reduce noise in the top level namespace.

I would this is obvious.

  • 'regular' users are the ones who pretty much use pd.* as there namespace. In theory they may use pandas.api.exceptions (e.g. specifically catching a parser exception).
  • 'API' users are using (formerly) pandas internals to build other things on top of pandas. This is the point of an API.

Ultimately these are part of the same namespace that is publicly exposed. The point is that it is completely silly that people can just use any of the namespace. Sure you can, but pandas has gotten so big, that you simply cannot live that that when you need to move things around to change implementation details.

@jreback
Copy link
Contributor Author

jreback commented Mar 3, 2017

pandas.* includes public namespace, pandas._internals (and so forth) indicates private namespace

how exactly is this more friendly? I would argue that this is much less friendly, simply because you don't know where to find things that you might need. This is a huge surface area, both for dev and for usage. Limiting this scope is hugely important.

@jreback jreback force-pushed the exceptions branch 2 times, most recently from 998b171 to 56e8631 Compare March 8, 2017 14:04
@codecov-io
Copy link

codecov-io commented Mar 8, 2017

Codecov Report

Merging #15541 into master will decrease coverage by 0.02%.
The diff coverage is 92.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15541      +/-   ##
==========================================
- Coverage   90.98%   90.96%   -0.03%     
==========================================
  Files         143      145       +2     
  Lines       49477    49483       +6     
==========================================
- Hits        45019    45014       -5     
- Misses       4458     4469      +11
Flag Coverage Δ
#multiple 88.72% <92.3%> (-0.01%) ⬇️
#single 40.63% <82.05%> (-0.12%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexing.py 94.1% <ø> (ø) ⬆️
pandas/tslib.py 100% <ø> (ø) ⬆️
pandas/lib.py 100% <ø> (ø) ⬆️
pandas/api/lib/__init__.py 0% <0%> (ø)
pandas/tseries/index.py 95.43% <100%> (ø) ⬆️
pandas/io/excel.py 79.7% <100%> (+0.03%) ⬆️
pandas/computation/align.py 97.91% <100%> (+0.02%) ⬆️
pandas/io/parsers.py 95.52% <100%> (ø) ⬆️
pandas/core/common.py 90.68% <100%> (-0.29%) ⬇️
pandas/core/frame.py 97.56% <100%> (-0.1%) ⬇️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 456e729...e5fbdc8. Read the comment docs.

@jreback
Copy link
Contributor Author

jreback commented Mar 9, 2017

related:

this now happens in our doc builds.

/home/joris/miniconda3/envs/dev/lib/python3.5/site-packages/feather/api.py:43: FutureWarning: pandas.lib.infer_dtype is deprecated. Please use pandas._libs.lib.infer_dtype instead.
  inferred_type = pd.lib.infer_dtype(col)

certainly could do a PR upstream to fix this (e.g. with conditional imports). But might be nice
to have
from pandas.api.lib import infer_dtype

as a starting location for some library code. This is a useful enough routine that its nice to expose to the outside world.

@jreback
Copy link
Contributor Author

jreback commented Mar 9, 2017

cc @wesm

@wesm
Copy link
Member

wesm commented Mar 13, 2017

I've been running into these internal API / external API issues lately (e.g. needing to use DatetimeTZDtype in an external library), so it might be worth documenting the pandas < 0.19 way to get some of these APIs

@jorisvandenbossche
Copy link
Member

We indeed need to document this better.

To come back on the discussion above, I am also more in favor of "option 2" from #15541 (comment). It's indeed maybe a bit harder to get to that point, but IMO it would be much more clear to be able to say: everything accessible from the pandas.* namespace (directly or in a submodule) is public (rather than eg have to say pandas.types is private, but pandas.api.types you may use). But that is also a more general discussion not only related to exceptions, cfr #13634

@jreback
Copy link
Contributor Author

jreback commented Mar 27, 2017

@shoyer
updated

  • removed PandasError completely (barely used)
  • added doc-string to infer_dtype

@shoyer
Copy link
Member

shoyer commented Mar 27, 2017

@jreback looks great to me -- thanks!

One question on the docstring for infer_dtype: what do mixed and `mixed-integer mean? It would be nice to give some guidance on these. Here's my guess:

  • I think mixed means "not anything else" (i.e., dtype=object from numpy), not necessarily mixed types (e.g., you get "mixed" for an list of all tuples). So perhaps it would be better called "unknown-or-mixed" or something like that. Maybe this isn't worth changing, but it's definitely worth documenting.
  • I have no idea what 'mixed-integer' means exactly or how to trigger it. I thought this might mean a mixture of integer types (e.g., int32 vs int64) but those just turn up 'integer'.

@jreback
Copy link
Contributor Author

jreback commented Mar 27, 2017

@shoyer these have long-time meaning in pandas (could be changed, but this is what they are).

mixed* are just non-strings (unicode/bytes) that are actual mixed python objects.

In [9]: infer_dtype(['a', 1])
Out[9]: 'mixed-integer'

In [10]: infer_dtype(['a', 1.5])
Out[10]: 'mixed'

In [11]: infer_dtype([1.5, 1])
Out[11]: 'mixed-integer-float'

In [12]: infer_dtype(['a', pd.Timestamp('20130101')])
Out[12]: 'mixed'

In [14]: infer_dtype([(1, 2, 3)])
Out[14]: 'mixed'

@jreback
Copy link
Contributor Author

jreback commented Mar 27, 2017

pushed some examples.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment, looks good to me

-------
string describing the common type of the input data.
Results can include:
- floating
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small rst thing: no indentation needed, and white line before the list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, though I don't think we actually have a doc-generation section for pandas.api ATM. (i'll make an issue).

@jreback
Copy link
Contributor Author

jreback commented Mar 29, 2017

@jorisvandenbossche if you have a chance.

@@ -2,6 +2,7 @@

import warnings
warnings.warn("The pandas.lib module is deprecated and will be "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to specially catch infer_dtype here and give a more specific message since it has moved and is public? Noticed I'm using it some real code, I think based on your SO answer here - http://stackoverflow.com/a/20670901/3657742.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [1]: pd.lib.infer_dtype('foo')
/Users/jreback/miniconda3/envs/pandas/bin/ipython:1: FutureWarning: pandas.lib.infer_dtype is deprecated. Please use pandas._libs.lib.infer_dtype instead.
  #!/Users/jreback/miniconda3/envs/pandas/bin/python
Out[1]: 'string'

hmm, let me see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

In [1]: pd.lib.infer_dtype('foo')
/Users/jreback/miniconda3/envs/pandas/bin/ipython:1: FutureWarning: pandas.lib is deprecated and will be removed in a future version. You can access infer_dtype in pandas.api.lib.infer_dtype
  #!/Users/jreback/miniconda3/envs/pandas/bin/python
Out[1]: 'string'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in latest push

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@jreback jreback closed this in da0523a Apr 3, 2017
jreback added a commit that referenced this pull request Apr 24, 2017
* API: expose dtypes in pandas.api.types

xref #16015
xref apache/arrow#585
xref #16042
xref #15541 (comment)
pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DEPR: relocate exceptions in pandas.core.common
7 participants