Dimension preferences #843

jlstevens · 2016-09-01T16:55:41Z

Since the very beginning of the project, there has been some tension as to what Dimension objects are for. There have been two major pressures:

- Keeping dimension objects as lightweight as possible, used to describe key features of the data.
- Using dimension objects to implement features and improve usability. For instance, we've wanted a way to set a default position for sliders, and Dimension is the obvious place to put that. Other usability-related features are soft_range and the formatters.

This situation has made things difficult: Dimension is a core component of HoloViews that we want to keep as simple as possible, yet we also want it to support new features. Things are further complicated by the question of whether two dimension objects are 'equal' and therefore interchangeable.

After discussing this problem with @philippjfr and @jbednar, I think we've found a good compromise based on this insight: some of what Dimension contains is indeed core metadata about the data, while the rest of the options are aimed at usability and are matters of preference (which can be subjective).

The core metadata should be a small list of things that define the most important information about the dimension and the preferences can be a much larger, open-ended set of options aimed at usability. For this reason, we should group these two types of metadata separately.

- Core metadata: name, unit, range (often constrained by the unit definition, e.g. (0, 360) for an angle in degrees and (0, None) for a temperature in Kelvin).
- Preferences: soft_range (the typical useful range as determined by the domain or user preference), default_key and the various formatters.

Based on this, the proposal is that Dimension has only the core metadata as its own parameters, plus a preference parameter that holds an object containing all the preferences. This makes it clear what the most important information about a dimension is, while allowing flexibility in terms of the various preferences.

We still need to discuss the exact form this preference object takes and we should think of finding a syntactically convenient way of defining them.


jbednar · 2016-09-01T16:58:06Z

Sounds good! Deciding what to do with existing preferences like soft_range is also important, as a balance between compatibility and clean semantics.

jlstevens · 2016-09-01T17:00:24Z

Here is one syntax for this I think might be quite nice:

K = hv.Dimension('Kelvin', range=(0,None)).prefs(soft_range=(200,300), default=220)
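A minimal sketch of how this split might look, assuming a frozen dataclass stands in for Dimension. The class, its `preferences` field and the `prefs` method are hypothetical illustrations of the proposal, not the HoloViews API. Preferences are excluded from equality here, which is one possible reading of the goal of keeping core metadata separate from preferences.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class Dimension:
    """Hypothetical stand-in for hv.Dimension: core metadata plus a bag of preferences."""
    name: str
    unit: Optional[str] = None
    range: Tuple[Optional[float], Optional[float]] = (None, None)
    # Preferences are open-ended and deliberately ignored by __eq__.
    preferences: Dict[str, Any] = field(default_factory=dict, compare=False)

    def prefs(self, **kwargs) -> "Dimension":
        # Return a copy with the new preferences merged in; core metadata is untouched.
        return replace(self, preferences={**self.preferences, **kwargs})


K = Dimension('Kelvin', range=(0, None)).prefs(soft_range=(200, 300), default=220)
```

Returning a copy rather than mutating keeps the chained syntax safe to use on shared dimension objects.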

jbednar · 2016-09-01T17:02:56Z

Would the prefs object accept any Parameterized, or only the one we define (with soft_range, default, etc. already enumerated as Parameters)?

jlstevens · 2016-09-01T17:03:07Z

Of course, we also need to be careful around the issue of compatibility. Here is what we could do:

- Issue deprecation warnings when things like soft_range are passed in through the dimension constructor, suggesting the use of the prefs method while making it easy to disable such warnings.
- Eventually remove these parameters (probably between one major release and the next).
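One way the deprecation step could work, sketched with the standard `warnings` module. `make_dimension`, `DEPRECATED_KWARGS` and `PreferencesDeprecationWarning` are hypothetical names; using a dedicated warning subclass is what would make these warnings easy to disable with a single filter.

```python
import warnings

# Hypothetical list of parameters slated to move onto the preferences object.
DEPRECATED_KWARGS = ('soft_range', 'default')


class PreferencesDeprecationWarning(DeprecationWarning):
    """Dedicated subclass so users can silence only these warnings."""


def make_dimension(name, **kwargs):
    """Hypothetical constructor: split off deprecated kwargs and warn about each one."""
    prefs = {}
    for key in DEPRECATED_KWARGS:
        if key in kwargs:
            warnings.warn(
                f"Passing {key!r} to the Dimension constructor is deprecated; "
                f"use .prefs({key}=...) instead.",
                PreferencesDeprecationWarning, stacklevel=2)
            prefs[key] = kwargs.pop(key)
    # Remaining kwargs are treated as core parameters in this sketch.
    return {'name': name, 'core': kwargs, 'prefs': prefs}
```

Users who want silence could then call `warnings.simplefilter('ignore', PreferencesDeprecationWarning)` once.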

I suppose we would need a bit of compatibility for pickling once these parameters are moved to live on the preferences object.

jlstevens · 2016-09-01T17:04:36Z

@jbednar Good question!

Maybe we could offer a way of using a preferences object we can understand (with things we define such as soft_range etc) but support storing other parameters corresponding to metadata that people may want to store on the dimension as preferences but that we won't then act upon.

philippjfr · 2016-09-01T17:14:26Z

Thanks for the summary. Several things to note. First off, I'm wondering about the other parameters, like type and values. The type isn't currently used very consistently, but values are used to define DynamicMaps with concrete sampling.

> Preferences: soft_range (the typical useful range as determined by the domain or user preference), default_key and the various formatters.

> K = hv.Dimension('Kelvin', range=(0,None)).prefs(soft_range=(200,300), default=220)

This does not match the current semantics of range and soft_range. range defines an absolute range to use while soft_range only defines soft limits which can be overridden by the data. In your example the plotting backends would normalize in the range 0 to None.
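To make the distinction concrete, here is a sketch of how an axis range might be resolved under these semantics: soft_range can be extended by the data, while an explicit range wins over both. `effective_range` and `hard_range` (standing in for the `range` parameter) are hypothetical names rather than HoloViews code, and treating a None bound as "unset" is an assumption.

```python
from typing import Optional, Tuple

Range = Tuple[Optional[float], Optional[float]]


def _combine(fn, a, b):
    # Apply min/max to whichever of the two bounds are set.
    values = [v for v in (a, b) if v is not None]
    return fn(values) if values else None


def effective_range(data: Range, soft_range: Range = (None, None),
                    hard_range: Range = (None, None)) -> Range:
    """Hypothetical resolution of the displayed range for one dimension."""
    # soft_range is only a suggestion: the data may extend beyond it.
    lo = _combine(min, data[0], soft_range[0])
    hi = _combine(max, data[1], soft_range[1])
    # An explicit range overrides both the data and soft_range, bound by bound.
    if hard_range[0] is not None:
        lo = hard_range[0]
    if hard_range[1] is not None:
        hi = hard_range[1]
    return (lo, hi)
```

For example, data spanning (150, 280) with soft_range=(200, 300) resolves to (150, 300), while adding a range of (0, None) pins the lower bound to 0.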

> Maybe we could offer a way of using a preferences object we can understand (with things we define such as soft_range etc) but support storing other parameters corresponding to metadata that people may want to store on the dimension as preferences but that we won't then act upon.

Yes, storing other metadata on them makes sense, although perhaps calling it preferences isn't quite correct then.

jlstevens · 2016-09-01T17:18:54Z

> This does not match the current semantics of range and soft_range. range defines an absolute range to use while soft_range only defines soft limits which can be overridden by the data.

Thanks for clarifying!

The key thing is that we agree that range is absolute and soft_range isn't.

I suspect values might be core, as you can have categorical dimensions that are only defined for a number of discrete values. I think type (to the extent we use it) is also core.

philippjfr · 2016-10-10T13:44:22Z

I'll have to clarify again: when I said range is absolute, what I meant is that it overrides everything else, including the data and soft_range. This means it cannot be used as a definition of the actual physical quantity (e.g. Dimension('Temperature', unit='K', range=(0, inf))); it is simply a convenient way for the user to specify which part of the dimension range they are interested in. In other words, the range is just as much a choice as the other parameters, since the user has to set it deliberately to control the axis ranges, color normalization, or which parts of the parameter space can be explored with a DynamicMap.

This means range is not describing some intrinsic and absolute property of a physical quantity (i.e. what you call core metadata) but rather provides convenient metadata about how to display the Dimensioned object and therefore falls into the list of preferences. The only core metadata on our Dimensions is the name and the unit. Defining the actual range of values a physical quantity allows is something that's best left up to a units package which we could eventually integrate with HoloViews.

jlstevens · 2016-10-10T13:54:47Z

Thanks for the clarification. To update the bullet point lists above:

- Core metadata: name, unit, type
- Preferences: range, values, soft_range, default_key and the various formatters.

I agree we are very flexible with how range and values are used, so this new suggestion makes sense. That said, isn't values overloaded to also specify the categories for categorical dimensions?

philippjfr · 2017-03-15T18:31:50Z

As part of 1.7 we will add the preferences to Dimension, and as part of 2.0 we will move existing parameters to the preferences.

jlstevens · 2017-04-05T14:10:38Z

After starting a PR and discussing this issue in detail with Philipp (again!), we came up with a different approach. This is a proposal we are both happy with, and it requires fewer changes while doing away with an explicit concept of dimension preferences:

- Dimensions will now have clearly defined equality semantics (i.e. what dimensions are), which makes me happy.
- The way dimensions are used is largely unchanged, and we can feel free to add parameters to Dimension as necessary, minimizing disruption.

Instead of introducing preferences, the idea is that dimension objects are almost entirely preferences which means almost everything in a dimension can be ignored when comparing them. The key new insight is that what defines a dimension can't be some optional argument, or in other words, 'core' metadata must be whatever is passed as the first (mandatory) positional argument.

Currently there are two things you can put into this mandatory position, a string (the name) or a tuple (the name and label):

a = hv.Dimension('a')
b = hv.Dimension(('b', 'description of the b dimension'))

This means the current rules for equality (defining the semantics) should be:

- Two dimension objects are equal if and only if their names match AND their labels match.
- A dimension matches a string for equality if either the name or the label matches. (This definition is slightly loose, but it is used all over the code. Names and labels are expected to be very different lengths by definition, so there should be little confusion in practice.)

Matching on both name and label should always work between two dimension objects, as the label will match the name by default (i.e. if it isn't explicitly set via the tuple).
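The two equality rules can be sketched with a toy class. `Dim` is hypothetical; it models only the name/label spec, with the label defaulting to the name as described above.

```python
class Dim:
    """Hypothetical minimal dimension: a name plus a label that defaults to the name."""

    def __init__(self, spec):
        if isinstance(spec, tuple):
            self.name, self.label = spec
        else:
            self.name = self.label = spec

    def __eq__(self, other):
        if isinstance(other, str):
            # Loose rule: a string matches if it equals either the name or the label.
            return other in (self.name, self.label)
        if isinstance(other, Dim):
            # Strict rule: two dimensions must agree on both name and label.
            return (self.name, self.label) == (other.name, other.label)
        return NotImplemented


a = Dim('a')
b = Dim(('b', 'description of the b dimension'))
```

Note that `Dim('b')` is not equal to `b`, because its label defaults to `'b'` rather than the description.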

Units

We have long wondered whether units are part of the semantics of a dimension or not. I believe we have finally figured this out:

- Currently our 'unit' parameter is just an optional string used to display the dimension (e.g. on an axis). It is not a rich unit object supporting unit conversions, so it should not be considered core metadata and should not be considered in equality.
- In future, we can support unit objects that are passed as the first positional argument. This makes them part of the core and therefore part of the equality comparison. Unit equality will not be a trivial comparison of the declared unit either: two dimensions with rich unit objects are equal if they are in the same space, which means the units must be able to convert directly to each other.

For instance, a dimension expressed in Kelvin is equal to a dimension expressed in Celsius (given the name and labels are also equal) as both express temperature. A dimension expressed in meters isn't equal to one expressed in Celsius as these are different types of unit. The conversion methods of the rich unit objects of the dimensions will be used to determine whether the conversions are valid.
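A toy sketch of the unit rule. The `QUANTITY` table and the `UnitDim` class are hypothetical; a real implementation would ask the unit objects themselves (or a units library) whether a conversion exists, rather than consult a hard-coded lookup.

```python
# Hypothetical lookup from unit symbol to the physical quantity it measures.
QUANTITY = {'K': 'temperature', 'degC': 'temperature', 'm': 'length', 'km': 'length'}


class UnitDim:
    """Hypothetical dimension whose equality also checks unit convertibility."""

    def __init__(self, name, label=None, unit=None):
        self.name, self.label, self.unit = name, label or name, unit

    def _convertible(self, other):
        if self.unit is None or other.unit is None:
            # Without rich units on both sides, require both to be absent.
            return self.unit is None and other.unit is None
        return QUANTITY[self.unit] == QUANTITY[other.unit]

    def __eq__(self, other):
        if not isinstance(other, UnitDim):
            return NotImplemented
        same_spec = (self.name, self.label) == (other.name, other.label)
        return same_spec and self._convertible(other)
```

With this rule, Kelvin and Celsius dimensions with the same name and label compare equal, while meters and Celsius do not.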

Rules defining Dimension parameters

In short, the first argument to Dimension defines the core metadata used in equality, everything else is to be considered preference parameters.

Unpacking the implications of this rule:

- All state on a dimension instance is expressed by its parameters.
- Core metadata parameters are set through the first (and only) positional argument. This means hv.Dimension('b', label='description of the b dimension') will be disallowed (the tuple format must be used if you want to set the label parameter). Although this restriction could potentially be relaxed if it causes problems, I would like to avoid making label an exception to this rule.
- You have a choice in how rich you want the dimension semantics to be, based on what you choose to pass in as the first argument (string, name-label tuple, potential future unit object).
- The unit parameter is currently set as a kwarg, but one day it should be changed to only allow None or a rich unit object, which will be passed as the first argument. This unit object will require the name and optionally the label, allowing all core parameters to be specified.
- Everything else is considered non-semantic metadata. This is information which isn't about your data but about how you choose to display your data with HoloViews for a given visualization.

I'll now be working on a PR (or two!) to implement the following:

- The updated equality semantics
- Renaming the first argument from name to something else that makes it clear that it can represent the name, the name and label, or some other core data. I can't decide between spec and core right now.
- Removing the deprecated 'initial' option from the values parameter.
- Declaring label as a parameter
- Disallowing label from being passed as a kwarg.
- Fixing the __call__ method, making it work with (name, label) tuples

I realize this comment has grown quite long but it is worth laying things out clearly and explicitly so we can think through the consequences of this plan. This issue has generated a lot of discussion over a number of years and I think we can finally put it to rest!

philippjfr · 2017-04-05T16:34:02Z

Good summary and the to-do list looks correct to me, might just want to add:

- Ensure that redim still works correctly after these changes

jbednar · 2017-04-05T16:49:37Z

Looks good; glad to see this heading toward resolution.

> In future, we can support unit objects that are passed as the first positional argument.

By adding units as a third item in a tuple? Seems a bit clunky, but does seem like it forces the user to know that this is part of the definition of that dimension.

Seems like it ought to be possible for users to add units later, but I'm not certain what I mean by "later". People definitely add units later in wallclock time (after an initial implementation), but maybe they can be considered to add them at the logical start, not logically after initial creation; not sure.

> This means hv.Dimension('b', label='description of the b dimension') will be disallowed

I'm not sure I agree with that; hv.Dimension('b', label='description of the b dimension') is much more explicit about which is which than hv.Dimension(('b','description of the b dimension')). I do see what you mean about the positional argument, but it does have a downside here (as the meaning of a tuple is highly non-obvious, unless it is a named tuple).

> I can't decide between spec and core right now.

I vote for spec.

jlstevens · 2017-04-05T18:54:00Z

After further discussion with @jbednar I think we finally have something all three of us are happy with!

> Although this restriction could potentially be relaxed if it causes problems, I would like to avoid making label an exception to this rule.

Supporting label as a kwarg turned out to be the main sticking point. As I still don't want to make special exceptions regarding what is core and what isn't, we need a new rule:

New rule

Core data is defined by the tuple format supported in the constructor. This is the same tuple format you can use to avoid having to build a dimension object yourself. For instance:

hv.Image(..., vdims=[('z', 'Z-score')])

Whatever parameters can be set in this type of tuple format must be core semantic information: by design, these tuples specify only the most important information, so that you can avoid building a dimension object explicitly.

- You'll be able to go from a dimension object to its tuple format via a spec property. This will have the same name as the first argument to the Dimension constructor. The spec property will always return the most descriptive tuple possible.
- Dimension equality will only be allowed to access the spec property and not any other parameters.
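A sketch of what "equality only looks at spec" could mean in practice. `SpecDim` and its `params` attribute are hypothetical; every keyword argument is treated as non-semantic display metadata that equality never inspects.

```python
class SpecDim:
    """Hypothetical dimension whose equality consults only the spec property."""

    def __init__(self, spec, **params):
        name, label = (spec, spec) if isinstance(spec, str) else spec
        self.name, self.label = name, label
        # Everything else is non-semantic display metadata.
        self.params = params

    @property
    def spec(self):
        # The most descriptive tuple available: always (name, label) in this sketch.
        return (self.name, self.label)

    def __eq__(self, other):
        if isinstance(other, SpecDim):
            return self.spec == other.spec
        return NotImplemented


z1 = SpecDim(('z', 'Z-score'), soft_range=(0, 1))
z2 = SpecDim(('z', 'Z-score'), unit='sigma')
```

Here `z1` and `z2` compare equal despite their different display parameters, because their specs are identical.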

Advantages

This means future unit support would require 3-tuple support:
```
hv.Image(..., vdims=[('z', 'Z-score', unit('m')) ])
```
The advantage is that the unit object could be supplied by some third-party unit library instead of requiring a special HoloViews object that needs a name and optional label (parameters that aren't really about units).
The label parameter would once again be allowed as a keyword. If a label is supplied both in the spec tuple and as a keyword, this ambiguous case would warn and use the label specified by the keyword:

Dimension(('z', 'Z-score'), label='Something else')
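The keyword-wins rule could be sketched as a small resolution helper. `resolve_name_label` is a hypothetical name, not part of HoloViews; it warns whenever a label arrives both in the tuple and as a keyword, then lets the keyword take precedence.

```python
import warnings


def resolve_name_label(spec, label=None):
    """Hypothetical helper: combine a spec with an optional label keyword."""
    if isinstance(spec, tuple):
        name, spec_label = spec
    else:
        name, spec_label = spec, None
    if label is not None and spec_label is not None:
        # Ambiguous: the label was given twice, so warn and prefer the keyword.
        warnings.warn(f"Label {spec_label!r} from the spec tuple is overridden "
                      f"by label={label!r}.", stacklevel=2)
    resolved = label if label is not None else spec_label
    # Fall back to the name when no label was given at all.
    return (name, resolved if resolved is not None else name)
```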

Consequences

- Until unit support is implemented, dimension equality is conceptually straightforward tuple equality between the spec properties of the two dimensions.
  Unfortunately, this isn't quite true due to name sanitization. In the end we decided to keep the current sanitization support for 1.7 but drop it in 2.0. The recommendation after 2.0 will be to use attribute-friendly names on your data source or simply apply redim to get attribute-friendly names.

  Using label will be encouraged as the way of assigning the descriptive, non-attribute-friendly name. People ignoring this advice and who don't redim can still use the horrible kwarg trick in select if absolutely necessary.
- Once unit support is implemented, equality between dimensions won't be strict tuple equality, as we will need to compare the unit types. Equality between dimensions and strings will still be supported (as described in the earlier comment), i.e. matching either the name or the label.
- The following spec formats are currently supported: 'z', ('z', 'Z-score'). With future unit support, two additional formats seem reasonable: ('z', unit('m')) and ('z', 'Z-score', unit('m')). This is an exhaustive list of the spec formats we can foresee right now.
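The four foreseen spec formats could be normalized into a single (name, label, unit) triple. `normalize_spec` is hypothetical, and treating any non-string second element as a unit object is an assumption made for this sketch.

```python
from typing import Any, Optional, Tuple


def normalize_spec(spec) -> Tuple[str, str, Optional[Any]]:
    """Hypothetical: map every foreseen spec format onto (name, label, unit)."""
    if isinstance(spec, str):
        return (spec, spec, None)                  # 'z'
    if len(spec) == 2:
        name, second = spec
        if isinstance(second, str):
            return (name, second, None)            # ('z', 'Z-score')
        return (name, name, second)                # ('z', unit('m'))
    if len(spec) == 3:
        name, label, unit = spec
        return (name, label, unit)                 # ('z', 'Z-score', unit('m'))
    raise ValueError(f"Unsupported dimension spec: {spec!r}")
```

Normalizing up front would let equality compare a fixed-shape triple regardless of which format the user wrote.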

I'm hoping that this is the final proposal that is satisfactory to @jlstevens, @philippjfr and @jbednar!

Edit:

One more TODO item:

- Document the name/label system properly in the notebook tutorials.

jbednar · 2017-04-05T19:24:17Z

Sounds perfect; thanks for writing that up.

jlstevens · 2017-04-08T01:26:08Z

I think this issue has finally been addressed in the PR referenced above (merged).

Glad to finally close this issue!

philippjfr · 2020-02-19T20:15:57Z

Reopening: I strongly disagree with the decisions we made here. I'd prefer not to open this can of worms again, but here we are; we should at least discuss it for 2.0.

jlstevens mentioned this issue Sep 16, 2016

Dynamic Map step size #742

Closed

jlstevens mentioned this issue Oct 10, 2016

Plots missing in Overlay of DynamicMap #914

Closed

philippjfr mentioned this issue Jan 30, 2017

Add consistent support for categorical axes in bokeh #1089

Merged

6 tasks

jlstevens mentioned this issue Jan 30, 2017

Add Dimension.label replacing global alias state #1083

Merged

philippjfr added this to the v1.7.0 milestone Mar 15, 2017

philippjfr added the tag: API label Mar 15, 2017

philippjfr assigned jlstevens Mar 15, 2017

jlstevens mentioned this issue Apr 6, 2017

Tightened semantics of Dimension objects #1245

Merged

11 tasks

jlstevens closed this as completed Apr 8, 2017

philippjfr modified the milestones: v1.7.0, v2.0 Feb 19, 2020

philippjfr reopened this Feb 19, 2020

philippjfr mentioned this issue Feb 19, 2020

Ensure get_dimension does not match Dimensions with mismatching spec #4233

Merged
