Dimension preferences #843

Open
jlstevens opened this issue Sep 1, 2016 · 17 comments

@jlstevens
Contributor

jlstevens commented Sep 1, 2016

Since the very beginning of the project, there has been some tension as to what Dimension objects are for. There have been two major pressures:

  • Keeping dimension objects as lightweight as possible, used to describe key features about the data.
  • Using dimension objects to implement features and improve usability. For instance, we've wanted a way to set a default position for sliders and Dimension is the obvious place to put that. Other things related to usability are soft_range and the formatters.

This situation has made things difficult as Dimension is a core component of HoloViews that we want to keep as simple as possible on one hand while also wanting to support new features on the other. Things are further complicated by questions as to whether two dimension objects are 'equal' and therefore interchangeable.

After discussing this problem with @philippjfr and @jbednar, I think we've found a good compromise based on this insight: some of what Dimension contains is indeed core metadata about the data, while the rest of the options are aimed at usability and are matters of preference (which can be subjective).

The core metadata should be a small list of things that define the most important information about the dimension and the preferences can be a much larger, open-ended set of options aimed at usability. For this reason, we should group these two types of metadata separately.

  • Core metadata: name, unit, range (often constrained by the unit definition, e.g. (0, 360) for an angle in degrees and (0, None) for a temperature in Kelvin).
  • Preferences: soft_range (the typical useful range as determined by the domain or user preference), default_key and the various formatters.

Based on this, the proposal is that Dimension has only the core metadata as its own parameters as well as a preference parameter that holds an object containing all the preferences. This makes it clear as to what the most important information is regarding a dimension while allowing flexibility in terms of the various preferences.

We still need to discuss the exact form this preference object takes and we should think of finding a syntactically convenient way of defining them.

@jbednar
Member

jbednar commented Sep 1, 2016

Sounds good! Deciding what to do with existing preferences like soft_range is also important, as a balance between compatibility and clean semantics.

@jlstevens
Contributor Author

Here is one syntax for this I think might be quite nice:

K = hv.Dimension('Kelvin', range=(0,None)).prefs(soft_range=(200,300), default=220)
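
A minimal sketch of how such a chained call could behave, assuming the preferences live in a separate object. The names `SketchDimension` and `Prefs` are hypothetical stand-ins rather than HoloViews classes, and plain dataclasses are used here where the real design would presumably use Parameterized objects:

```python
from dataclasses import dataclass, field, replace
from typing import Any, Optional, Tuple

Range = Tuple[Optional[float], Optional[float]]


@dataclass(frozen=True)
class Prefs:
    """Hypothetical container for usability-oriented options."""
    soft_range: Range = (None, None)
    default: Any = None


@dataclass(frozen=True)
class SketchDimension:
    """Hypothetical stand-in for hv.Dimension: core metadata plus one prefs object."""
    name: str
    unit: Optional[str] = None
    range: Range = (None, None)
    preference: Prefs = field(default_factory=Prefs)

    def prefs(self, **kwargs):
        # Return a copy with updated preferences; core metadata is untouched.
        # Unknown preference names raise TypeError via the Prefs constructor.
        return replace(self, preference=replace(self.preference, **kwargs))


K = SketchDimension('Kelvin', range=(0, None)).prefs(soft_range=(200, 300), default=220)
```

Returning a copy keeps the chained form side-effect free, which is one possible design choice and not something settled in this thread.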

@jbednar
Member

jbednar commented Sep 1, 2016

Would the prefs object accept any Parameterized, or only the one sample one that we define (with soft_range, default, etc. already enumerated as Parameters)?

@jlstevens
Contributor Author

jlstevens commented Sep 1, 2016

Of course, we also need to be careful around the issue of compatibility. Here is what we could do:

  • Issue deprecation warnings when things like soft_range are passed in through the dimension constructor, suggesting the use of the prefs method while making it easy to disable such warnings.
  • Eventually remove these parameters (probably between one major release and the next).

I suppose we would need a bit of compatibility for pickling once these parameters are moved to live on the preferences object.
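
As a rough illustration of the first bullet, the sketch below warns when a preference is passed through the constructor and still stores it. `SketchDimension` and `PREFERENCE_KEYS` are hypothetical names, not HoloViews code, and the set of keys is simply taken from the options mentioned in this thread:

```python
import warnings

# Hypothetical: options that would move onto the preferences object.
PREFERENCE_KEYS = {'soft_range', 'default'}


class SketchDimension:
    """Hypothetical stand-in for hv.Dimension during the deprecation period."""

    def __init__(self, name, **kwargs):
        self.name = name
        self.preferences = {}
        self.params = {}
        for key, value in kwargs.items():
            if key in PREFERENCE_KEYS:
                # Old-style usage still works, but points users at prefs().
                warnings.warn(
                    f"Passing {key!r} to the constructor is deprecated; "
                    f"use .prefs({key}=...) instead.",
                    DeprecationWarning, stacklevel=2)
                self.preferences[key] = value
            else:
                self.params[key] = value

    def prefs(self, **kwargs):
        self.preferences.update(kwargs)
        return self
```

Because the standard `DeprecationWarning` category is used, users could silence these messages with `warnings.simplefilter('ignore', DeprecationWarning)`.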

@jlstevens
Contributor Author

jlstevens commented Sep 1, 2016

@jbednar Good question!

Maybe we could offer a way of using a preferences object we can understand (with things we define such as soft_range etc) but support storing other parameters corresponding to metadata that people may want to store on the dimension as preferences but that we won't then act upon.

@philippjfr
Member

philippjfr commented Sep 1, 2016

Thanks for the summary. Several things to note: first off, I'm wondering about the other parameters, like type and values. The type isn't currently used very consistently but values are used to define DynamicMaps with concrete sampling.

Preferences: soft_range (the typical useful range as determined by the domain or user preference), default_key and the various formatters.

K = hv.Dimension('Kelvin', range=(0,None)).prefs(soft_range=(200,300), default=220)

This does not match the current semantics of range and soft_range. range defines an absolute range to use while soft_range only defines soft limits which can be overridden by the data. In your example the plotting backends would normalize in the range 0 to None.

Maybe we could offer a way of using a preferences object we can understand (with things we define such as soft_range etc) but support storing other parameters corresponding to metadata that people may want to store on the dimension as preferences but that we won't then act upon.

Yes, storing other metadata on them makes sense, although perhaps calling it preferences isn't quite correct then.

@jlstevens
Contributor Author

jlstevens commented Sep 1, 2016

This does not match the current semantics of range and soft_range. range defines an absolute range to use while soft_range only defines soft limits which can be overridden by the data.

Thanks for clarifying!

The key thing is that we agree that range is absolute and soft_range isn't.

I suspect value might be core as you can have categorical dimensions that are only defined for a number of discrete values. I think type (to the extent we use it) is also core.

@philippjfr
Member

philippjfr commented Oct 10, 2016

I'll have to clarify again: when I said range is absolute, what I meant is that it overrides everything else, including the data or soft_range. What that means is that it cannot be used as a definition of the actual physical quantity (e.g. Dimension('Temperature', unit='K', range=(0, inf))) but is simply a convenient way for the user to specify which part of the dimension range they are interested in. In other words, the range is just as much a choice as the other parameters since the user has to set it deliberately to control the axis ranges, color normalization, or which parts of the parameter space can be explored with a DynamicMap.

This means range is not describing some intrinsic and absolute property of a physical quantity (i.e. what you call core metadata) but rather provides convenient metadata about how to display the Dimensioned object and therefore falls into the list of preferences. The only core metadata on our Dimensions is the name and the unit. Defining the actual range of values a physical quantity allows is something that's best left up to a units package which we could eventually integrate with HoloViews.
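
The precedence described above can be sketched as a small function. This is an illustration, not the HoloViews implementation: `effective_range` is a hypothetical helper, and it assumes that "overridden by the data" means the data may extend past the soft limits while a hard range bound, when set, always wins:

```python
def effective_range(data_range, soft_range=(None, None), hard_range=(None, None)):
    """Combine data extent, soft limits and hard limits into one (low, high) pair."""

    def pick(values, fn):
        # Apply fn (min or max) to the bounds that are actually defined.
        present = [v for v in values if v is not None]
        return fn(present) if present else None

    # Soft limits can widen the extent but never clip the data.
    low = pick((data_range[0], soft_range[0]), min)
    high = pick((data_range[1], soft_range[1]), max)

    # Hard limits override both the data and the soft limits.
    if hard_range[0] is not None:
        low = hard_range[0]
    if hard_range[1] is not None:
        high = hard_range[1]
    return (low, high)


# Data spanning 250 to 320 with soft limits of 200 to 300: the data wins at the top.
soft_only = effective_range((250, 320), soft_range=(200, 300))
# Adding range=(0, None) forces the lower bound to 0, as in the Kelvin example.
with_hard = effective_range((250, 320), soft_range=(200, 300), hard_range=(0, None))
```

Under these assumptions `soft_only` is `(200, 320)` and `with_hard` is `(0, 320)`, which is why a range of (0, None) acts as a display choice and not as a description of the physical quantity.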

@jlstevens
Contributor Author

Thanks for the clarification. To update the bullet point lists above:

  • Core metadata: name, unit, type
  • Preferences: range, values, soft_range, default_key and the various formatters.

I agree we are very flexible with how range and values are used so this new suggestion makes sense. That said, isn't value overloaded to also specify the categories for categorical dimensions?

@philippjfr
Member

As part of 1.7 we will add the preferences to Dimension, and as part of 2.0 we will move existing parameters to the preferences.

@jlstevens
Contributor Author

jlstevens commented Apr 5, 2017

After starting a PR and discussing this issue in detail with Philipp (again!) we came up with a different approach. This is a proposal we are both happy with, and it requires fewer changes while doing away with an explicit concept of dimension preferences:

  • Dimensions will now have clearly defined equality semantics, i.e. a clear definition of what dimensions are, which makes me happy.
  • The way dimensions are used is largely unchanged and we can feel free to add parameters to Dimension as necessary, minimizing disruption.

Instead of introducing preferences, the idea is that dimension objects are almost entirely preferences which means almost everything in a dimension can be ignored when comparing them. The key new insight is that what defines a dimension can't be some optional argument, or in other words, 'core' metadata must be whatever is passed as the first (mandatory) positional argument.

Currently there are two things you can put into this mandatory position, a string (the name) or a tuple (the name and label):

a = hv.Dimension('a')
b = hv.Dimension(('b', 'description of the b dimension'))

This means the current rules for equality (defining the semantics) should be:

  • Two dimension objects are equal if and only if their names match AND their labels match.
  • A dimension matches a string for equality if either the name or label matches. (This definition is slightly loose but it is used all over the code. Names and labels are expected to be very different lengths by definition so there should be little confusion in practice.)

Matching on both name and label should always work between two dimension objects as the label will match the name by default (i.e. if it isn't explicitly set via the tuple).
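
The two equality rules can be sketched as follows. `SketchDimension` is a hypothetical stand-in used only to make the rules concrete; it is not the HoloViews class:

```python
class SketchDimension:
    """Hypothetical stand-in for hv.Dimension illustrating the equality rules."""

    def __init__(self, spec):
        if isinstance(spec, tuple):
            self.name, self.label = spec
        else:
            # The label defaults to the name when only a string is given.
            self.name = self.label = spec

    def __eq__(self, other):
        if isinstance(other, str):
            # Loose rule: a string matches either the name or the label.
            return other in (self.name, self.label)
        if isinstance(other, SketchDimension):
            # Strict rule: both the name and the label must match.
            return (self.name, self.label) == (other.name, other.label)
        return NotImplemented

    # Defining __eq__ disables default hashing; a real implementation would
    # also need a __hash__, which this sketch leaves out.


a = SketchDimension('a')
b = SketchDimension(('b', 'description of the b dimension'))
```

With these definitions `b == 'b'` and `b == 'description of the b dimension'` both hold, while `b == SketchDimension('b')` does not, because the second object's label defaults to 'b'.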

Units

We have long wondered whether units are part of the semantics of a dimension or not. I believe we have finally figured this out:

  • Currently our 'unit' parameter is just an optional string used to display the dimension (e.g. on an axis). It is not a rich unit object supporting unit conversions so it should not be considered core metadata and should not be considered in equality.

  • In future, we can support unit objects that are passed as the first positional argument. This makes them part of the core and therefore part of the equality comparison. Unit equality will not be a trivial comparison of the declared unit either: two dimensions with rich unit objects are equal if they are in the same space, which means the units must be able to directly convert to each other.

For instance, a dimension expressed in Kelvin is equal to a dimension expressed in Celsius (given the name and labels are also equal) as both express temperature. A dimension expressed in meters isn't equal to one expressed in Celsius as these are different types of unit. The conversion methods of the rich unit objects of the dimensions will be used to determine whether the conversions are valid.
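
To make the Kelvin/Celsius example concrete, here is a sketch under strong assumptions: `SketchUnit` and its `convertible_to` method are invented for illustration, and convertibility is modelled by a simple quantity tag, whereas a real unit library would presumably attempt an actual conversion:

```python
class SketchUnit:
    """Hypothetical rich unit: a symbol plus the physical quantity it measures."""

    def __init__(self, symbol, quantity):
        self.symbol = symbol
        self.quantity = quantity

    def convertible_to(self, other):
        # Assumption: units are interconvertible when they measure the same quantity.
        return self.quantity == other.quantity


def same_dimension(spec_a, spec_b):
    """Compare two hypothetical (name, label, unit) specs."""
    name_a, label_a, unit_a = spec_a
    name_b, label_b, unit_b = spec_b
    # Names and labels must match exactly; units only need to be convertible.
    return (name_a, label_a) == (name_b, label_b) and unit_a.convertible_to(unit_b)


kelvin = SketchUnit('K', 'temperature')
celsius = SketchUnit('°C', 'temperature')
metre = SketchUnit('m', 'length')

temps_match = same_dimension(('T', 'Temperature', kelvin), ('T', 'Temperature', celsius))
mixed_match = same_dimension(('T', 'Temperature', kelvin), ('T', 'Temperature', metre))
```

Here `temps_match` is true and `mixed_match` is false, mirroring the distinction drawn in the paragraph above.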

Rules defining Dimension parameters

In short, the first argument to Dimension defines the core metadata used in equality, everything else is to be considered preference parameters.

Unpacking the implications of this rule:

  • All state on a dimension instance is expressed by its parameters.
  • Core metadata parameters are set through the first (and only) positional argument. This means hv.Dimension('b', label='description of the b dimension') will be disallowed (the tuple format must be used if you want to set the label parameter). Although this restriction could potentially be relaxed if it causes problems, I would like to avoid making label an exception to this rule.
  • You have a choice in how rich you want the dimension semantics to be based on what you choose to pass in as the first argument (string, name-label tuple, potential future unit object).
  • The unit parameter is currently set as a kwarg but one day it should be changed to only allow None or a rich unit object which will be passed as the first argument. This unit object will require the name and optionally the label, allowing all core parameters to be specified.
  • Everything else is considered non-semantic metadata. This is information which isn't about your data but about how you choose to display your data with HoloViews for a given visualization.

I'll now be working on a PR (or two!) to implement the following:

  • The updated equality semantics
  • Renaming the first argument from name to something else that makes it clear that it can represent the name, or the name and label, or some other core data. I can't decide between spec and core right now.
  • Remove the deprecated 'initial' option from the values parameter.
  • Declaring label as a parameter
  • Disallowing label from being passed as a kwarg.
  • Fixing the __call__ method, making it work with (name, label) tuples

I realize this comment has grown quite long but it is worth laying things out clearly and explicitly so we can think through the consequences of this plan. This issue has generated a lot of discussion over a number of years and I think we can finally put it to rest!

@philippjfr
Member

Good summary and the to-do list looks correct to me, might just want to add:

  • Ensure that redim still works correctly after these changes

@jbednar
Member

jbednar commented Apr 5, 2017

Looks good; glad to see this heading toward resolution.

In future, we can support unit objects that are passed as the first positional argument.

By adding units as a third item in a tuple? Seems a bit clunky, but does seem like it forces the user to know that this is part of the definition of that dimension.

Seems like it ought to be possible for users to add units later, but I'm not certain what I mean by "later". People definitely add units later in wallclock time (after an initial implementation), but maybe they can be considered to add them at the logical start, not logically after initial creation; not sure.

This means hv.Dimension('b', label='description of the b dimension') will be disallowed

I'm not sure I agree with that; hv.Dimension('b', label='description of the b dimension') is much more explicit about which is which than hv.Dimension(('b','description of the b dimension')). I do see what you mean about the positional argument, but it does have a downside here (as the meaning of a tuple is highly non-obvious, if not a named tuple).

I can't decide between spec and core right now.

I vote for spec.

@jlstevens
Contributor Author

jlstevens commented Apr 5, 2017

After further discussion with @jbednar I think we finally have something all three of us are happy with!

Although this restriction could potentially be relaxed if it causes problems, I would like to avoid making label an exception to this rule.

Supporting label as a kwarg turned out to be the main sticking point. As I still don't want to make special exceptions regarding what is core and what isn't, we need a new rule:

New rule

  • Core data is defined by the tuple format supported in the constructor. This is the same tuple format you can use to avoid having to build a dimension object yourself. For instance:
hv.Image(..., vdims=[('z', 'Z-score')])

Whatever parameters can be set in this type of tuple format must be core semantic information: these tuples are designed to specify only the most important information, so that you can avoid building a dimension object explicitly.

  • You'll be able to go from a dimension object to its tuple format via a spec property. This will be the same name as the first argument to the Dimension constructor. The spec property will always return the most descriptive tuple possible.

  • Dimension equality will only be allowed to access the spec property and not any other parameters.
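
A short sketch of the spec idea, assuming the current two formats ('z' and ('z', 'Z-score')). `SketchDimension` is hypothetical and only shows that the spec round-trips through the constructor and that other parameters play no part in equality:

```python
class SketchDimension:
    """Hypothetical stand-in for hv.Dimension with a spec property."""

    def __init__(self, spec, **params):
        if isinstance(spec, str):
            spec = (spec, spec)  # the label defaults to the name
        self.name, self.label = spec
        # Everything else (range, soft_range, ...) is non-semantic.
        self.params = params

    @property
    def spec(self):
        # Always the most descriptive tuple available.
        return (self.name, self.label)

    def __eq__(self, other):
        if isinstance(other, SketchDimension):
            # Equality consults only the spec, never the other parameters.
            return self.spec == other.spec
        return NotImplemented


z1 = SketchDimension(('z', 'Z-score'), range=(0, 1))
z2 = SketchDimension(z1.spec, soft_range=(-3, 3))
```

Here `z1 == z2` holds even though their display parameters differ, and `SketchDimension('z').spec` gives `('z', 'z')`.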

Advantages

  • This means future unit support would require 3-tuple support:

    hv.Image(..., vdims=[('z', 'Z-score', unit('m')) ])

    The advantage is that the unit object could be supplied by some third-party unit library instead of requiring a special holoviews object that needs a name and optional label (parameters that aren't really about units).

  • label would once again be allowed as a keyword. This ambiguous case would warn and use the label as specified by the keyword:

Dimension(('z', 'Z-score'), label='Something else')
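
The warn-and-override behaviour for this ambiguous case might look like the sketch below. `resolve_spec` is a hypothetical helper, and the warning category and message are assumptions:

```python
import warnings


def resolve_spec(spec, label=None):
    """Return (name, label) from a string or tuple spec plus an optional label keyword."""
    if isinstance(spec, tuple):
        name, spec_label = spec
    else:
        name, spec_label = spec, None

    if label is not None and spec_label is not None and label != spec_label:
        # Ambiguous case: both the tuple and the keyword supply a label.
        warnings.warn(
            f"Label {spec_label!r} from the tuple is overridden by "
            f"keyword label {label!r}.",
            UserWarning, stacklevel=2)

    if label is None:
        # Fall back to the tuple label, then to the name itself.
        label = spec_label if spec_label is not None else name
    return (name, label)
```

For example, `resolve_spec(('z', 'Z-score'), label='Something else')` warns and returns `('z', 'Something else')`, while `resolve_spec('z', label='Z-score')` returns `('z', 'Z-score')` silently.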

Consequences

  • Until unit support is implemented, conceptually dimension equality is straightforward tuple equality between the two dimension spec properties.

  • Unfortunately, it isn't quite true due to name sanitization. In the end we decided to keep the current sanitization support for 1.7 but drop it in 2.0. The recommendation after 2.0 will be to use attribute friendly names on your data source or simply apply redim to get attribute friendly names.

    Using label will be encouraged as the way of assigning the descriptive, non-attribute friendly name. People ignoring this advice and who don't redim can still use the horrible kwarg trick in select if absolutely necessary.

  • Once unit support is implemented, equality between dimensions will no longer be strict tuple equality, as we will need to compare the unit types. Equality between dimensions and strings will still be supported (as described in the earlier comment), i.e. matching either the name or the label.

  • The following spec formats are currently supported: 'z', ('z', 'Z-score'). With future unit support, the following two additional formats seem reasonable: ('z', unit('m')), ('z', 'Z-score', unit('m')). This is an exhaustive list of spec formats we can foresee right now.

I'm hoping that this is the final proposal that is satisfactory to @jlstevens, @philippjfr and @jbednar!

Edit:

One more TODO item:

  • Document the name/label system properly in the notebook tutorials.

@jbednar
Member

jbednar commented Apr 5, 2017

Sounds perfect; thanks for writing that up.

@jlstevens
Contributor Author

I think this issue has finally been addressed in the PR referenced above (merged).

Glad to finally close this issue!

@philippjfr philippjfr modified the milestones: v1.7.0, v2.0 Feb 19, 2020
@philippjfr
Member

Reopening, I strongly disagree with the decisions we made here. I'd prefer not to open this can of worms again but here we are, should at least discuss it for 2.0.
