New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added DataError for irrecoverable data interface initialization errors #2041

Merged
merged 6 commits into from Oct 28, 2017

Conversation

Projects
None yet
2 participants
@philippjfr
Member

philippjfr commented Oct 28, 2017

We really need to do something about Interface data errors, particularly those that can provide clear and unambiguous exception message when the data is definitely incorrectly defined. In this PR I introduce a DataError, which prevents the Interface from trying to fall back to other data types and therefore allows raising a proper exception message. This approach is much simpler than what I outlined in #1986 and makes the process of raising specific exceptions quite simple.

Addresses:

#1762
#1867

@philippjfr philippjfr added the data label Oct 28, 2017

@philippjfr

This comment has been minimized.

Member

philippjfr commented Oct 28, 2017

@jlstevens this is one PR that does need your review. If you agree this is a decent approach and can't think of any more errors to improve this is ready to merge.

@@ -162,6 +166,8 @@ def initialize(cls, eltype, data, kdims, vdims, datatype=None):
try:
(data, dims, extra_kws) = interface.init(eltype, data, kdims, vdims)
break
except DataError:
raise

This comment has been minimized.

@jlstevens

jlstevens Oct 28, 2017

Member

Isn't something similar needed in the try/except where all the data interfaces are tried, one by one?

This comment has been minimized.

@philippjfr

philippjfr Oct 28, 2017

Member

I'm confused, that's exactly where this is.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Oct 28, 2017

I agree this is important issue to address and adding a custom exception like this makes a lot of sense to me. Just some ideas about possible names:

  • DataError - what you used. Short but a little ambiguous.
  • UserDataError - to make it clear it was supplied to HoloViews?
  • MalformedDataError - bit too long!
  • InterfaceError - might be ok and less generic than DataError.

I think next to DataError, InterfaceError seems like the next possibility. One other idea is that you could automatically customize this exception to autogenerate part of the exception message e.g you could supply the interface class itself to 'InterfaceError' constructor and this could call a (class)method that would augment the supplied message with as much helpful information as possible e.g to make sure the library/datatype is mentioned, maybe point to the docs somewhere etc.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Oct 28, 2017

InterfaceError seems significantly more generic than DataError to me since users don't necessarily know what an Interface even is. UserDataError seems fine, although the exception messages generally already make it clear that it is a user error.

One other idea is that you could automatically customize this exception to autogenerate part of the exception message e.g you could supply the interface class itself to 'InterfaceError' constructor and this could call a (class)method that would augment the supplied message with as much helpful information as possible e.g to make sure the library/datatype is mentioned, maybe point to the docs somewhere etc.

This doesn't make much sense to me, if you look where this is mostly used it's in cases where a particular interface identifies a specific error condition and can raise it directly. Adding extra indirection makes little sense when the original place where the error is raised already has all that information.

Might be good if @jbednar chimes in here as well.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Oct 28, 2017

DataError is fine and the the idea about augmenting the exception message was only a suggestion to consider. Seems fine to me!

@philippjfr

This comment has been minimized.

Member

philippjfr commented Oct 28, 2017

Maybe I didn't consider your proposal correctly. I wouldn't mind if the DataError automatically adds a bit about the interfaces and adds a bit that points to the docs section that describes the allowed data types, e.g. to Gridded Data, Tabular Data or Geometry Data (once that user guide exists).

@philippjfr

This comment has been minimized.

Member

philippjfr commented Oct 28, 2017

In other words whatever raises the error provides a specific error message and the error then adds a more generic bit about data interfaces and points to the docs.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Oct 28, 2017

To make that a bit more concrete, it might look something like this:

Key dimension values and value array 'Fluorescence' shapes do not match. Expected shape (10, 11), does not match actual shape: (10, 10). GridInterface is a gridded data type, for more information about valid data formats see holoviews.org/user_guide/Gridded_Data.html

The second half would be auto-generated based on the Interface type.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Oct 28, 2017

Right. That is the sort of thing I was think of!

Edit: Minor point but I would format it so the core message is separate from the extra help. i.e

Key dimension values and value array 'Fluorescence' shapes do not match. Expected shape (10, 11), does not match actual shape: (10, 10).

GridInterface is a gridded data type, for more information about valid data formats see holoviews.org/user_guide/Gridded_Data.html

@philippjfr

This comment has been minimized.

Member

philippjfr commented Oct 28, 2017

Okay here are some samples from what I've implemented now:

DataError: Key dimension values and value array Fluorescence shapes do not match. Expected shape (61, 111, 50), actual shape: (62, 111, 50)

GridInterface expects gridded data, for more information on supported datatypes see http://holoviews.org/user_guide/Gridded_Datasets.html

and:

DataError: pandas DataFrame column names used as dimensions must be strings not integers.

PandasInterface expects tabular data, for more information on supported datatypes see http://holoviews.org/user_guide/Tabular_Datasets.html

and:

DataError: xarray Dataset must define coordinates for all defined kdims, [Dimension('x'), Dimension('y'), Dimension('z')] coordinates not found.

XArrayInterface expects gridded data, for more information on supported datatypes see http://holoviews.org/user_guide/Gridded_Datasets.html

and:

DataError: If xarray DataArray does not define a name an explicit vdim must be supplied.

XArrayInterface expects gridded data, for more information on supported datatypes see http://holoviews.org/user_guide/Gridded_Datasets.html

and:

DataError: MultiInterface subpaths must all have matching vdims.

MultiInterface expects a list of tabular data, for more information on supported datatypes see http://holoviews.org/user_guide/Tabular_Datasets.html

I'll update the last one once there is a user guide for geometry (path, contour, polygon) data.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Oct 28, 2017

@jlstevens If the tests pass this is ready to merge. I think this is going to be a huge improvement, but I imagine we'll be adding more specific errors over time. We should collect uninformative "None of the available storage backends were able to support the supplied data format." somewhere and aim to ensure that a user always gets a more specific error.

@jlstevens

This comment has been minimized.

Member

jlstevens commented Oct 28, 2017

I think this is going to be a huge improvement

Agreed!

... but I imagine we'll be adding more specific errors over time. We should collect uninformative "None of the available storage backends were able to support the supplied data format." somewhere and aim to ensure that a user always gets a more specific error.

Also a good idea.

If you are happy with this, go ahead and merge once the tests pass.

@philippjfr

This comment has been minimized.

Member

philippjfr commented Oct 28, 2017

Great, tests are now passing. I'll go ahead and merge.

@philippjfr philippjfr merged commit 2a0e807 into master Oct 28, 2017

4 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls First build on dataset_error at 79.955%
Details
s3-reference-data-cache Test data is cached.
Details

@philippjfr philippjfr deleted the dataset_error branch Oct 28, 2017

jbednar added a commit that referenced this pull request Oct 31, 2017

Added DataError for irrecoverable data interface initialization errors (
#2041)

* Added DataError for irrecoverable data interface init errors

* Added additional interface info to DataError

* Fixed DataArray vdim error

* Added DataError for missing xarray coords

* Fixed error for GriddedInterface shape mismatch

* Updated MultiInterface error tests

@pyup-bot pyup-bot referenced this pull request Nov 3, 2017

Closed

Update holoviews to 1.9.0 #104

@pyup-bot pyup-bot referenced this pull request Nov 13, 2017

Closed

Update holoviews to 1.9.1 #120

@pyup-bot pyup-bot referenced this pull request Dec 12, 2017

Merged

Update holoviews to 1.9.2 #139

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment