# Parallel Categories (parcats) trace type for multi dimensional categorical data #2963

Merged
merged 51 commits into plotly:master on Oct 1, 2018

## Conversation

Collaborator

### jonmmease commented Sep 1, 2018 • edited

Continuation of jonmmease#1, now against master in the plotly.js repo.

I believe I addressed all of the outstanding code comments brought up in the old PR. Below is the original post, and my last few comments on the implementation.

## Introduction

This PR is a proposal and an implementation of a new trace type for the interactive exploration of multi-dimensional categorical data sets. My working name for the trace is "Parallel Categories" or parcats for short.

The concept of this trace has been discussed previously in the following plotly.js issues:

I also briefly showed a prototype of this diagram to @chriddyp over screenshare several months ago.

## Related work

The closest prior art to the Parallel Categories Diagram is the Parallel Sets Diagram by Robert Kosara and Caroline Ziemkiewicz.

### Parallel Sets implementations / descriptions

Here is a collection of existing implementations and descriptions of the Parallel Sets Diagram.

#### Parallel Sets Java Program

https://eagereyes.org/parallel-sets

This is a stand-alone Java program by Kosara that implements a Parallel Sets Diagram.

#### Parallel Sets from the DataViz catalog

https://datavizcatalogue.com/methods/parallel_sets.html

#### D3 implementation of Parallel Sets

https://www.jasondavies.com/parallel-sets/

## What's different about the Parallel Categories Diagram?

The primary difference between this Parallel Categories Diagram (parcats from here on) and the Parallel Sets Diagram (parsets from here on) is that the parcats diagram supports a more flexible path coloring scheme.

In all of the examples of parsets diagrams that I have found, the colors of the paths correspond to states in the left-most (or top-most) dimension. In contrast, for the parcats diagram, color may correspond to a column in the dataset that may or may not be present as a dimension in the diagram.

This, admittedly modest, extension has several advantages. Path colors may be set using a numeric array and a color map just like many other plotly.js trace types (scatter, parcoords, etc.). This makes it possible to use the parcats diagram combined with other traces in brushing/crossfiltering configurations.
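One way to picture this: each row of the data set contributes to a path identified by its category in every displayed dimension plus its color value, so a color column that is not itself a dimension simply splits paths further. Below is a rough sketch of that aggregation. The `buildPaths` helper and the sample data are hypothetical illustrations, not the code in this PR.

```javascript
// Hypothetical sketch: group rows into paths keyed by the category chosen in
// each dimension plus the row's color value. Not the PR's calc.js.
function buildPaths(dimensions, colors) {
    var nRows = dimensions[0].values.length;
    var byKey = {};
    for (var i = 0; i < nRows; i++) {
        var cats = dimensions.map(function(dim) { return dim.values[i]; });
        var key = JSON.stringify([cats, colors[i]]);
        if (!byKey[key]) {
            byKey[key] = {categories: cats, color: colors[i], count: 0};
        }
        byKey[key].count += 1;
    }
    return Object.keys(byKey).map(function(k) { return byKey[k]; });
}

// Made-up data: two displayed dimensions, five rows.
var dims = [
    {label: 'Size', values: ['S', 'S', 'S', 'L', 'L']},
    {label: 'Shape', values: ['round', 'round', 'round', 'round', 'square']}
];
// The color column (0 or 1 per row) is not one of the displayed dimensions.
var paths = buildPaths(dims, [0, 0, 1, 0, 0]);
```

Here the three `S`/`round` rows end up in two paths because one of them has a different color value.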

## Dragging and Brushing example

Here is an example of visualizing a 5-dimensional data set with two continuous dimensions and 3 categorical dimensions. This is accomplished by displaying the two continuous dimensions in a 2D scatter plot and the 3 categorical dimensions using the parcats diagram.

I created this example using a branch of plotly.py version 3 built against this branch of plotly.js.

First I show the drag interactions supported by the diagram. Categories (the rectangles) and dimension labels (dimensions are the columns of rectangles) can be dragged to reorder categories and dimensions. Upon release, the diagram animates to a relaxed state with equal spacing between dimensions and categories.

Selection events in the scatter plot are used to update the colors of both the selected points in the scatter plot and the corresponding paths in the parcats diagram. Similarly, click events on categories and paths in the parcats diagram are used to update the colors in both diagrams.
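The example in this post was built with plotly.py, but the same idea can be sketched in plotly.js terms: turn the selected row indices into a shared color array and restyle both traces with it. The helper name `selectionToColors`, the trace indices, and the `line.color` property path (which follows the `marker` to `line` rename recorded in this thread's commit log) are assumptions for illustration.

```javascript
// Hypothetical helper: map a set of selected row indices to a 0/1 color array
// that both traces could share.
function selectionToColors(nRows, selectedIndices) {
    var colors = new Array(nRows).fill(0);
    selectedIndices.forEach(function(i) { colors[i] = 1; });
    return colors;
}

var brushColors = selectionToColors(6, [1, 4]);

// In a page with plotly.js loaded, the array could then be applied to both
// traces from a selection handler (sketch only, assumes trace 0 is the
// scatter trace and trace 1 is the parcats trace):
//
//   gd.on('plotly_selected', function(ev) {
//       if (!ev) return; // selection was cleared
//       var idx = ev.points.map(function(p) { return p.pointNumber; });
//       var c = selectionToColors(nRows, idx);
//       Plotly.restyle(gd, {'marker.color': [c]}, [0]);
//       Plotly.restyle(gd, {'line.color': [c]}, [1]);
//   });
```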

As far as I'm aware, this is the only visualization of multi-dimensional categorical data that supports this kind of two-way data brushing. And, combined with plotly.py version 3, it is certainly the only visualization of this type that would be easily accessible to Python users.

## Color bundling

There are two modes for how the colors of paths are arranged.

In the example above, color is not considered when sorting the paths. This is desirable in a brushing scenario so that the paths remain stable as the colors change during interactions. This behavior is specified by setting the bundlecolors property to false.

Setting the bundlecolors property to true causes paths with like colors to be bundled together as they pass through each category. This results in a cleaner looking diagram and is preferable in cases where the positions of paths do not need to remain stable as colors change.

For example:
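A minimal sketch of the difference between the two modes, looking only at the order of the paths that pass through a single category. The `orderPaths` function and the `prevCat` field are hypothetical; the actual sorting in this PR may differ.

```javascript
// Hypothetical sketch of ordering the paths that pass through one category.
// Each path carries the index of its category in the previous dimension
// (`prevCat`) and a color.
function orderPaths(paths, bundlecolors) {
    return paths.slice().sort(function(a, b) {
        if (bundlecolors && a.color !== b.color) {
            // Bundled: group by color first.
            return a.color < b.color ? -1 : 1;
        }
        // Otherwise order by where the path comes from.
        return a.prevCat - b.prevCat;
    });
}

var through = [
    {prevCat: 0, color: 'red'},
    {prevCat: 0, color: 'blue'},
    {prevCat: 1, color: 'red'},
    {prevCat: 1, color: 'blue'}
];
var unbundled = orderPaths(through, false); // comparison never looks at color
var bundled = orderPaths(through, true);    // like colors end up adjacent
```

With `bundlecolors` off, recoloring a path never changes its position, which is the stability property wanted for brushing.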

## Mocks

Several simple mocks have been added as a part of the current test suite.

parcats_basic

parcats_bundled

parcats_unbundled

## API notes

I tried to model the API as closely as possible after existing trace conventions. There is a top-level dimensions property with label and values sub-properties just as with the parcoords trace. Path colors/colorscales are specified under a dimension.marker parent property.
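For orientation, a trace following this description might look roughly like the sketch below. The data is made up, and the path color settings are placed under `line` rather than `marker`, following the rename recorded in the commit log further down this thread; treat the exact property names as assumptions rather than a reference.

```javascript
// Sketch of a parcats trace using property names mentioned in this thread.
// The category values and colors are made-up sample data.
var trace = {
    type: 'parcats',
    dimensions: [
        {label: 'Class', values: ['1st', '2nd', '2nd', '3rd']},
        {label: 'Survived', values: ['yes', 'no', 'yes', 'no']}
    ],
    // One numeric color value per row, mapped through a colorscale.
    line: {color: [0, 1, 0, 1], colorscale: 'Viridis'}
};

// With plotly.js loaded in a page: Plotly.newPlot('graph-div', [trace]);
```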

## Alternative approach

In the issues cited at the beginning of this PR there was some discussion on the possibility of adding categorical support to the existing Parallel Coordinates Diagram. This diagram was already well under development for our internal needs at the time of these discussions, so I did not pursue this approach.

## TODO

Some items that I know still need to be done

• Font styling support
• Complete attribute descriptions
• Complete the test suite. In terms of my personal testing standards I'd estimate that the test suite is about 50% complete.
• Examples!

So the top-level question for the plotly.js team is, are you all interested in having this diagram be part of plotly.js? It's not the most common use-case, but I think it would be another differentiating feature for the plotly ecosystem.

If you all are interested, I have internal funding to put a bunch more time into this through September. And if we can get it merged in during that time, I can continue helping out with basic maintenance after that.

Let me know what you think!

I just added a mock that demonstrates the color hovermode (parcats_hovermode_color).

The basic idea here is that when you hover on a category, only the paths of a single color are highlighted. What's really useful about this is that the tooltip can then display the absolute probability of the paths of that color that pass through a given category. You can also display the conditional probabilities (Probability of blue given category A, probability of category A given blue).
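The three quantities can be computed directly from path counts. The sketch below assumes each path is reduced to a `{category, color, count}` record for the hovered dimension; the function name and record shape are hypothetical.

```javascript
// Hypothetical sketch of the probabilities described above, computed from
// path counts. `paths` entries: {category, color, count}.
function hoverProbabilities(paths, category, color) {
    var total = 0, inCat = 0, inColor = 0, inBoth = 0;
    paths.forEach(function(p) {
        total += p.count;
        if (p.category === category) inCat += p.count;
        if (p.color === color) inColor += p.count;
        if (p.category === category && p.color === color) inBoth += p.count;
    });
    return {
        joint: inBoth / total,               // P(color and category)
        colorGivenCategory: inBoth / inCat,  // P(color | category)
        categoryGivenColor: inBoth / inColor // P(category | color)
    };
}

var probs = hoverProbabilities([
    {category: 'A', color: 'blue', count: 3},
    {category: 'A', color: 'red', count: 1},
    {category: 'B', color: 'blue', count: 2},
    {category: 'B', color: 'red', count: 4}
], 'A', 'blue');
```

For this sample, 3 of 10 rows are blue and in A, 3 of the 4 rows in A are blue, and 3 of the 5 blue rows are in A.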

Does making hovermode an enumeration of none, category, or color seem like a reasonable way to specify this?

@alexcjohnson
Regarding dragging, I went ahead and added a sankey-style arrangement property to control the dragging behavior. There are three modes (names taken from sankey.arrangement):

• perpendicular (now the default): categories only drag vertically,
dimension labels drag horizontally.
• freeform: category labels can drag vertically and horizontally
(in which case they pull the dimension along with them). Here
dragging a category can reorder the categories and dimensions.
• fixed: dragging of dimensions and categories is disabled.
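The effect of the three modes on category dragging can be summarized as a constraint on the drag offset. This is only an illustrative sketch: `CATEGORY_DRAG_AXES` and `constrainCategoryDrag` are hypothetical names, and the sketch covers category rectangles only, not dimension labels.

```javascript
// Hypothetical summary of which axes a category rectangle may move along
// in each arrangement mode.
var CATEGORY_DRAG_AXES = {
    perpendicular: ['y'],
    freeform: ['x', 'y'],
    fixed: []
};

// Zero out the components of a drag offset that the mode does not allow.
function constrainCategoryDrag(arrangement, dx, dy) {
    var axes = CATEGORY_DRAG_AXES[arrangement];
    return {
        dx: axes.indexOf('x') !== -1 ? dx : 0,
        dy: axes.indexOf('y') !== -1 ? dy : 0
    };
}

var perpendicularDrag = constrainCategoryDrag('perpendicular', 5, 7);
var freeformDrag = constrainCategoryDrag('freeform', 5, 7);
var fixedDrag = constrainCategoryDrag('fixed', 5, 7);
```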

@alexcjohnson
I took a look back through the multi-label hover logic, and added a new 'dimension' hover mode to show it off. This hover mode will display a label for each category in the current dimension (See parcats_hovermode_dimension mock).

If you notice the hover label on B, you can see how the label is pushed downward to keep it from overlapping with the label for C. It might be nice in some cases to also push labels upward to avoid collisions, but at the moment this only pushes things down.
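The downward-only behavior can be sketched as a single top-to-bottom pass. This is an assumption about the general approach, not the PR's hover code; `pushLabelsDown` is a hypothetical name and all labels are given the same height for simplicity.

```javascript
// Hypothetical sketch: given label tops sorted top-to-bottom (pixels, y grows
// downward) and a common label height, push labels down until none overlap.
function pushLabelsDown(tops, height) {
    var out = [];
    tops.forEach(function(top, i) {
        // A label may not start above the bottom of the previous label.
        var minTop = i === 0 ? -Infinity : out[i - 1] + height;
        out.push(Math.max(top, minTop));
    });
    return out;
}

var placed = pushLabelsDown([10, 20, 80], 30);
```

In this sample the second label moves from 20 to 40 to clear the first, and the third stays where it is. Labels are never moved up, which matches the limitation mentioned above.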

Now that there are multiple hovermodes and configurable hoverinfo I do like this mode as an option.

Font support added for dimension labels (labelfont) and category labels (categorylabelfont). labelfont matches the corresponding property name in parcoords.

### Jon M. Mease and others added some commits Nov 27, 2017

• Remove customHovers, replace with loneHover
  This was a relic of an older attempt to display a tooltip per color for the hover node. It worked, but was pretty unwieldy.
• Renamed parcats.marker -> parcats.line
• Renamed shape categories to linear and hspline and made linear the default.
• Cleanup supplyDefaults and add visible dimension property
  (property isn't wired up properly yet)
• Replace tooltip with hoverinfo
  More consistent with other traces, and now it's possible to display only probabilities, only counts, both, none (with hover effects), or skip (no hover effects).
• Added arrangement property that is very similar to the sankey trace
  There are three arrangement modes:
  - perpendicular (default): categories only drag vertically, dimension labels drag horizontally.
  - freeform: category labels can drag vertically and horizontally (in which case they pull the dimension along with them). Here dragging a category can reorder the categories and dimensions.
  - fixed: dragging of dimensions and categories is disabled.
• WIP towards categoryorder/categoryarray/categorylabels
  Not working yet, just a checkpoint
• Added labelfont and categorylabelfont top-level attributes
  to control the font of dimension labels and category labels respectively
• Add counts attribute to parcats_hovermode_dimension mock
  Makes the hoverlabel shifting logic more noticeable and shows off the counts attribute
• Refactor dimension and category dragging tests and test arrangements
  Now there are tests for 'freeform', 'perpendicular', and 'fixed' arrangements for dragging the dimension label and category rectangle.


### alexcjohnson reviewed Sep 28, 2018

src/traces/parcats/calc.js Outdated

### alexcjohnson reviewed Sep 28, 2018

src/traces/parcats/defaults.js Outdated

### alexcjohnson reviewed Sep 28, 2018

src/traces/parcats/parcats.js Outdated
Contributor

### alexcjohnson commented Sep 28, 2018

 @jonmmease this is looking great! Aside from the comments above (all minor and straightforward, I think) the only thing I'd like to see is a mock that puts two parcats traces side-by-side, to verify that this works. Can you just replace two of the existing mocks with one combined mock? Maybe even using layout.grid, looks as though that's plumbed up correctly but would be nice to 🔒 it down!

### Jon M. Mease added some commits Sep 28, 2018

Create parcats_grid_subplots mock with 4 grid-based subplots
This combines the former colorbar and font mocks. And adds a parcats
trace with a Latex category label as well.
Collaborator Author

### jonmmease commented Sep 28, 2018

 Thanks @alexcjohnson! I really appreciate your time on this. In 4117612 I merged the font and colorbar mocks into parcats_grid_subplots, which displays 4 parcats traces in a 2x2 grid using layout.grid subplots. I was pretty happy to see that this just worked 🙂 I also took the opportunity to throw in a parcats trace with pseudo-HTML and MathJax category labels. Let me know if anything else comes to mind!
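For readers unfamiliar with grid subplots, the arrangement described here can be sketched as follows: each trace gets a `domain.row` and `domain.column`, and `layout.grid` declares the grid size. The helper `makeGridFigure` and the placeholder traces are hypothetical and are not the contents of the parcats_grid_subplots mock.

```javascript
// Sketch: place traces into a layout.grid by assigning domain.row/column
// in row-major order.
function makeGridFigure(traces, rows, columns) {
    var data = traces.map(function(trace, i) {
        var placed = Object.assign({}, trace);
        placed.domain = {row: Math.floor(i / columns), column: i % columns};
        return placed;
    });
    return {data: data, layout: {grid: {rows: rows, columns: columns}}};
}

// Four placeholder parcats traces with made-up data.
var baseTraces = [0, 1, 2, 3].map(function() {
    return {
        type: 'parcats',
        dimensions: [{label: 'One', values: ['a', 'b', 'a']}]
    };
});
var fig = makeGridFigure(baseTraces, 2, 2);

// With plotly.js loaded in a page: Plotly.newPlot('graph-div', fig.data, fig.layout);
```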


Contributor

### alexcjohnson commented Sep 28, 2018

 Fantastic grid mock @jonmmease! Good idea to include pseudo-html. And I'm glad that it just worked ™️ 🎉 Having multiple traces also shows off that the default coloring behavior, which I hadn't noticed before, is to pull from the trace color sequence - hence the orange and green coloration when you don't specify a color. Is that really what we want, or would it be better to use 'lightgray' as you had in the old code (that wasn't being used anyway because you had coerced line.color)? parcats traces don't share subplots, which is normally the reason to pull trace colors from a sequence.
Collaborator Author

### jonmmease commented Sep 28, 2018

 That's a good point and I don't have a strong preference. The grey felt like a pretty boring default and I thought our default blue was a nicer starting point. I also liked that the default colors could be specified in a template with the colorway property, but perhaps that's overloading colorway too much. In any case, if you have a preference for a different default I'm happy to make a change 🙂
Member

### nicolaskruchten commented Sep 28, 2018

 Random last-minute thought: is "parcats" a more easily-grokked/commonly-used name than something like "alluvial diagram" ? I would maybe consider renaming this trace :)
Collaborator Author

### jonmmease commented Sep 28, 2018

 From my reading "alluvial diagram" is most often used interchangeably with sankey diagram. For example Wikipedia and datavizproject both define it to be essentially equivalent to our sankey diagram.
Member

 OK.
Contributor

### alexcjohnson commented Sep 28, 2018

> The grey felt like a pretty boring default and I thought our default blue was a nicer starting point.

That makes sense, why don't we just use layout.colorway[0] though, so all parcats traces get the same default color. I think that's the last item, then we'll be ready to merge!

Looking at the examples you showed in the head of this PR, they include a feature we don't, which is categorical coloring based on the first dimension. Not needed for this PR though - I'll make another new issue.

### alexcjohnson referenced this pull request Sep 28, 2018

Open

#### Color by category within one dimension, for parcats (and parcoords?) #3061

Collaborator Author

### jonmmease commented Sep 28, 2018

 Sounds good. I'll do that tonight. What would be the best way to get at colorway from inside supplyDefaults?
Contributor

### alexcjohnson commented Sep 28, 2018

> What would be the best way to get at colorway from inside supplyDefaults?

layout is there already as an arg to handleLineDefaults - you should be able to just grab layout.colorway[0].
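The suggestion amounts to something like the sketch below. It is a simplified stand-in, not the code in defaults.js: `defaultLineColor` is a hypothetical name, and real plotly.js defaults go through its coerce machinery rather than plain property checks.

```javascript
// Hypothetical stand-in for the line-color default: use the user's color if
// one was given, otherwise the first entry of layout.colorway.
function defaultLineColor(traceIn, layout) {
    var userColor = traceIn.line && traceIn.line.color;
    if (userColor !== undefined) return userColor;
    return layout.colorway[0];
}

var layout = {colorway: ['#1f77b4', '#ff7f0e', '#2ca02c']};
var fallback = defaultLineColor({type: 'parcats'}, layout);
var explicit = defaultLineColor({type: 'parcats', line: {color: 'lightgray'}}, layout);
```

Because every parcats trace reads the same `colorway[0]`, they all share one default color instead of stepping through the sequence.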
Collaborator Author

### jonmmease commented Sep 28, 2018

 Alright, I just pushed the colorway[0] change. Almost there!
Contributor

### alexcjohnson commented Sep 28, 2018

 Beautiful! Yeah, to my eye a consistent blue (when that's the head of the colorway) is definitely better. Dunno if @etpinard wanted to take a last look at this, but from my side it's ready to go! 💃

Member

### etpinard commented Oct 1, 2018

 I apologise if this might have been discussed before, but on and most baselines, why isn't the parcats trace centered?
Collaborator Author

### jonmmease commented Oct 1, 2018

 The domain in most of the mocks is set to "domain": {"x": [0.125, 0.625],"y": [0.25, 0.75]}. I made it a bit off center in x to test that the resulting SVG geometry lines up where it should. This geometry is checked explicitly in a couple of tests in parcats_test.js. Then I stuck with the geometry for the rest of the mocks to have a consistent frame for computing mouse interaction locations.
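To see why that domain looks off center, the fractions can be converted to pixel bounds. The sketch below assumes a plot area of 400 by 200 pixels (an arbitrary choice) and flips y because SVG coordinates grow downward; `domainToPixels` is a hypothetical helper, not a function from parcats_test.js.

```javascript
// Hypothetical sketch: convert a trace domain (fractions of the plot area)
// into pixel bounds within that area.
function domainToPixels(domain, plotWidth, plotHeight) {
    return {
        x: domain.x[0] * plotWidth,
        // domain.y is measured from the bottom, SVG y from the top.
        y: (1 - domain.y[1]) * plotHeight,
        width: (domain.x[1] - domain.x[0]) * plotWidth,
        height: (domain.y[1] - domain.y[0]) * plotHeight
    };
}

var box = domainToPixels({x: [0.125, 0.625], y: [0.25, 0.75]}, 400, 200);
```

For this plot size the box spans x from 50 to 250, so its horizontal center (150) sits left of the plot-area center (200), while vertically it is centered.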
Member

### etpinard commented Oct 1, 2018 • edited

 "domain": {"x": [0.125, 0.625],"y": [0.25, 0.75]} Wow. I totally missed that when 👁 the mock JSONs. My bad. Let's merge this thing.

### etpinard merged commit cf2fdc4 into plotly:master Oct 1, 2018 6 checks passed

#### 6 checks passed

• ci/circleci: build - Your tests passed on CircleCI!
• ci/circleci: test-image - Your tests passed on CircleCI!
• ci/circleci: test-image2 - Your tests passed on CircleCI!
• ci/circleci: test-jasmine - Your tests passed on CircleCI!
• ci/circleci: test-jasmine2 - Your tests passed on CircleCI!
• ci/circleci: test-syntax - Your tests passed on CircleCI!
