Xkcd-style plot flow diagrams #111

theq629 · 2014-05-14T08:43:09Z

(Paraphrased from emails by @anoopsarkar:) As a new view on the frontend, use the entities from the current search to produce a visualization like the ones at http://csclub.uwaterloo.ca/~n2iskand/?page_id=13, which is based on xkcd #657. The x-axis is like in the current timeline plot but may need more scrolling to see detail. The y-axis groups entities (especially persons, but we could also use organizations, etc.) into clusters based on co-occurrence in the same events at any given year, and is sorted by frequency. There is a d3 plugin for Sankey diagrams that should work. We can optionally have the thickness of the flow lines represent frequency, but the initial goal should just be to replicate the Waterloo visualization.

theq629 · 2014-05-14T08:49:17Z

On Sankey diagrams, in my understanding the main purpose is to represent proportional flow, so I suspect that it isn't really the best term here but I'm not sure what the better term would be. Regardless, it looks to me like the d3 plugin will work fine. At some point we might need to work out how to draw nicer line start and end points.

On how to do the processing, the plan is for the backend to produce per-entity timelines indicating which reference points or other event clusters the entity is in at each year. The frontend then further processes this (if needed; eg to produce more specific clusters for the plot) and draws the plot. We don't want to send a lot of event data to the frontend for bandwidth reasons, but it may be worth sending more than the minimal amount for the diagram so that we have some flexibility to adjust diagrams on the frontend without having to change backend code.

theq629 · 2014-06-18T18:59:31Z

I've put up early work in the xkcdplottimelines branch and live at http://champ.cs.sfu.ca/WikiHistory/latest/whoosh/wikipediahistoryxkcdplottimelines/. Currently it has a very basic interface and I haven't tried to optimize it at all so it's pretty slow. I'm also not trying to fix clusters' positions on the y-axis yet, just letting the d3 sankey plugin put them where it wants.

The three input areas at the top are starting year, ending year, and entities list. The entities list is comma separated and each item has the format field:value. Mouse over things or otherwise look at the SVG titles to find out what they actually are.
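As a rough illustration of the entities list format described above, here is a minimal parsing sketch. The function name `parse_entities` is hypothetical and not taken from the project, and it assumes that values never contain commas.

```python
def parse_entities(text):
    """Parse 'field:value, field:value' into a list of (field, value) pairs.

    Hypothetical helper for illustration; assumes values contain no commas.
    """
    entities = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas or empty items
        field, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"expected field:value, got {item!r}")
        # Strip again so that 'person: Name' and 'person:Name' are equivalent.
        entities.append((field.strip(), value.strip()))
    return entities
```

For example, `parse_entities("person:Hannibal, person:Scipio Africanus")` yields one pair per entity, with the field and value separated.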

The plot definitely needs some improvements. Unless you like the current look with entity lines spread out so much on the cluster nodes, I'll probably need to edit the plugin or just do the diagram manually. I already have to ignore the plugin's x-axis placement choices and we'll probably want to do the same on the y-axis, likely messing up its layout.

Additionally, assigning links with branching entity lines needs work. Currently it always looks one time step back when making links, always making a direct link when the entity stays in the same cluster and otherwise adding a link from an arbitrary cluster. This can produce jumpy results (eg for person:Hannibal around 205-210BC), but I'm not sure that a better algorithm is totally straightforward.
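The one-step-back rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the branch's actual code: the timeline is assumed to be a mapping from year to a set of cluster ids, the name `one_step_links` is hypothetical, and "arbitrary" is made deterministic here by taking the smallest cluster id.

```python
def one_step_links(timeline):
    """Build links for one entity by looking one time step back.

    timeline: dict mapping year -> set of cluster ids (assumed input shape).
    Returns a list of ((prev_year, cluster), (year, cluster)) pairs.
    """
    years = sorted(timeline)
    links = []
    for prev_year, year in zip(years, years[1:]):
        prev_clusters = timeline[prev_year]
        if not prev_clusters:
            continue  # nothing to link from at the previous step
        for cluster in sorted(timeline[year]):
            if cluster in prev_clusters:
                source = cluster  # entity stays in the same cluster: direct link
            else:
                # "Arbitrary" source; min() is an assumption to keep it deterministic.
                source = min(prev_clusters)
            links.append(((prev_year, source), (year, cluster)))
    return links
```

Because each node only looks backwards, a cluster at the earlier step can end up with no outgoing link at all, which is one way the jumpy results can arise.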

anoopsarkar · 2014-06-19T04:45:12Z

Looks good for a first step. Is it possible to make the timeline longer than the window size and scroll left and right to see more of the timeline?

theq629 · 2014-06-19T05:58:25Z

Yes, I can add a scroll bar. And eventually I'd like to do something with a brush select for zooming and limiting the range (like for the regular timeline zoom), but I'd like to wait until the basic plot is more settled first.

theq629 · 2014-06-19T06:15:53Z

Here are some things I would like to clarify:

1. How to distinguish clusters. If I understand what you said previously in email, you were wanting to fix the position for each cluster on the y-axis, so all the nodes for that cluster are at the same vertical position. Is that right? If we want to have the distinct clusters clear on the diagram, then the only alternative I can think of would be to colour code them like in the sankey plugin example. But I think using colours to distinguish entities as well would be too confusing then, so we'd need to use distinct line styles on the links or just distinguish entities with graph structure.
2. I think it would be clearer and tidier if all the links for a single entity going into the same node went to the same spot on the node rectangle, not spread out like they are now.
3. How to choose links. If, say, we have an entity that is in cluster A at 10 CE and cluster B at 20 CE, then we just need an A-B link for these years. But say it is in clusters A and B at 10 CE and then clusters B and C at 20 CE. I think it's clear that there should be a B-B link. I think it's best to keep all parts of the graph for the entity connected, so we also need to connect 10 CE A to something at 20 CE, and there is no need to connect it to more than one 20 CE node. Similarly 20 CE C needs to connect to something at 10 CE. Right now I just chose these links arbitrarily, and that could be improved with a similarity metric for the clusters, presumably geographic distance for the current clusters.
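One possible reading of the link-choice idea in the last point, sketched under assumptions: shared clusters get a direct link, and every leftover cluster on either side is attached to exactly one nearest cluster on the other side. The name `choose_links`, the `position` mapping of cluster centres, and the use of planar Euclidean distance as the similarity metric are all assumptions for illustration.

```python
import math


def choose_links(prev, cur, position):
    """Choose links between one entity's clusters at two consecutive steps.

    prev, cur: sets of cluster ids at the earlier and later step.
    position: dict cluster id -> (x, y) centre (hypothetical coordinates).
    Returns a sorted list of (earlier_cluster, later_cluster) pairs.
    """
    if not prev or not cur:
        return []

    def dist(a, b):
        # Planar distance is an assumption; real geo-points may need great-circle distance.
        return math.dist(position[a], position[b])

    # Clusters present at both steps always get a direct link (eg B-B).
    links = {(c, c) for c in prev & cur}
    linked_cur = {b for _, b in links}
    # Each unmatched earlier cluster connects to exactly one later cluster.
    for a in sorted(prev - cur):
        b = min(sorted(cur), key=lambda c: dist(a, c))
        links.add((a, b))
        linked_cur.add(b)
    # Each later cluster still unconnected attaches to its nearest earlier cluster.
    for b in sorted(cur - linked_cur):
        a = min(sorted(prev), key=lambda c: dist(c, b))
        links.add((a, b))
    return sorted(links)
```

With this rule every cluster at both steps has at least one link, so the entity's graph stays connected between the two years.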

anoopsarkar · 2014-06-19T08:43:29Z

I don't see the different colors for the different person entities. All the mouseovers say "Hannibal" for me.

About each point above:

1. I think the vertical placement of cluster nodes is less of an issue as long as they are spaced out. The d3 demo of Sankey allows the user to drag these nodes around on the y-axis to get a better view. It's ok if the lines get a bit cluttered since we can use a mouseover on the line to see the entity.
2. The uwaterloo link up above has a css style that might solve this problem perhaps?
3. I think the best thing to do here is to merge the clusters until there is only one cluster per entity for each year. So in your example, -----(A,B)@10CE-----(B,C)@20CE------. Now suppose there are two entities, X and Y. X is in cluster (A,B) in 10CE and Y is in cluster (B,C) in 10CE. Then X is in (D) in 20CE and Y is in (D,E) in 20CE. We would merge them all to get =====(A,B,C)@10CE======(D,E)@20CE====== where === represents the two lines for X and Y. Does that work?
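The merging described in the last point amounts to taking connected components of clusters that share an entity within a year. A minimal sketch using union-find follows; the function name `merge_clusters` and the input shape are assumptions for illustration, not the project's code.

```python
def merge_clusters(entity_clusters):
    """Merge clusters at a single year so each entity ends up in one cluster.

    entity_clusters: dict entity -> set of cluster ids at that year (assumed shape).
    Returns dict entity -> frozenset of the original cluster ids in its merged cluster.
    """
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # All clusters that one entity appears in must end up in the same group.
    for clusters in entity_clusters.values():
        ordered = sorted(clusters)
        for c in ordered:
            find(c)
        for other in ordered[1:]:
            union(ordered[0], other)

    groups = {}
    for c in list(parent):
        groups.setdefault(find(c), set()).add(c)

    return {
        entity: frozenset(groups[find(min(clusters))])
        for entity, clusters in entity_clusters.items()
        if clusters
    }
```

Applied to the 10CE example, `{"X": {"A", "B"}, "Y": {"B", "C"}}` puts both X and Y in the merged cluster (A,B,C). Because the merge is transitive, chains of overlapping clusters can grow into large groups.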

anoopsarkar · 2014-06-19T08:46:48Z

By the way, if we do want to cluster points differently we should consider DBSCAN:

theq629 · 2014-06-19T09:27:06Z

This is how the diagram looks for me: the green (?) lines are person:Hannibal and the blue (?) lines are person:Scipio Africanus.

anoopsarkar · 2014-06-19T09:37:23Z

Ah, I see. I didn't expect Scipio to be only at the end and I was confused by two different lines for the same person initially. I think I said something differently earlier on, but I think one line per entity seems the least confusing. We would have to merge clusters to make this happen.

anoopsarkar · 2014-06-19T09:39:22Z

It would also help to highlight the entire line for each entity on mouseover (as in other Sankey demos).

theq629 · 2014-06-19T09:42:48Z

Yes, I'll change it to highlight the whole line for an entity. The confusingness of branching entity lines is part of why I'd like it to put the endpoints for an entity together on the node. But as you say, this will be much less confusing if there is one node per entity. I think your description for (3) will work, although we may have to see how much the initial clusters overlap in practice.

theq629 · 2014-06-19T09:45:19Z

For the rest, to explain how the sankey plugin works: the plugin does layout to produce positions and sizes for the nodes and end points and sizes for the links. The associated example code then draws rectangles and curves accordingly, and I modified it a bit to get the current plot. The layout part is allocating space for much wider flow-proportional link lines, since it seems to scale them to the available screen space. As far as I can tell there isn't any way to disable that (and it's not just a CSS issue). Similarly I'm having to bump the nodes to the correct x-axis positions after layout since I don't see any way to constrain the layout positions, and that's probably part of why the layout is poor.

So I think probably we should either give up on using the sankey plugin or heavily modify it. If we do want to fix cluster y-positions across time then I think implementing custom layout won't be hard. Otherwise we can probably use one of the more general d3 graph layout implementations.

anoopsarkar · 2014-06-19T09:50:36Z

Have a look at what they say at the bottom of http://csclub.uwaterloo.ca/~n2iskand/?page_id=13

Within the y-range of a cluster, each character that appears in a scene whose median cluster is that cluster gets a unique y-position. Now, there are several ways to go about determining the exact position of the node within the cluster’s y-range. We could average the cluster-positions of the characters appearing in it (i.e. the positions of the characters within that cluster), or apply the heuristic we initially used to determine the y-range on a smaller scale. Both of these ideas resulted in ugly cluttering, which in retrospect should’ve been expected– what’s common to all the scenes placed in cluster x’s range is that they’re all dominated by characters from cluster x. Therefore, when we take into account the within-cluster positions of every character in the scene, they all end up getting placed at approximately the same y-position, causing ugliness. Currently, we’re averaging the positions of all the non-x-cluster characters in the scene if any exist, and of the x-cluster characters otherwise.

anoopsarkar · 2014-06-19T09:51:07Z

They say "heuristic" in the previous link, but I wonder if they fiddled with the placement by hand.

anoopsarkar · 2014-06-19T10:07:26Z

The layout looks a lot nicer when the year constraint -210 to -180 is added. Makes me think that scrolling left or right might do the trick when it comes to the crowded feel of the zoomed out view. Although having one line per entity will also help.

theq629 · 2014-06-19T10:19:59Z

If I understand the uwaterloo system correctly, they are first doing a character clustering based on global scene co-occurrences and then placing scene nodes within a y-axis band for each character cluster. So when they say the "y-range of a cluster", that's referring to a grouping of characters that we don't currently have any analog to. Should we be doing a similar DBSCAN step? An alternative in our case (as long as we continue using the geographic clusters as the underlying data) is to try to make the y-axis roughly represent geographic distance, perhaps projecting the clusters' geo-points onto a most-separating axis and then respacing to look better. I'm not totally sure that we should expect their procedure to work for us since the entity being in multiple clusters at once case must occur a lot more in our data; however that may not matter if we are merging clusters.
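One way the projection idea could look, as a sketch under assumptions: "most-separating axis" is interpreted here as the first principal component of the cluster centres, coordinates are treated as planar, and "respacing" is taken to mean even spacing by rank. The name `y_positions` and its parameters are hypothetical.

```python
import math


def y_positions(points, height=1.0):
    """Assign y positions by projecting cluster centres onto their principal axis.

    points: dict cluster id -> (lon, lat), treated as planar (an assumption).
    Returns dict cluster id -> y in [0, height], evenly respaced by rank.
    """
    n = len(points)
    if n == 0:
        return {}
    mx = sum(p[0] for p in points.values()) / n
    my = sum(p[1] for p in points.values()) / n
    sxx = sum((p[0] - mx) ** 2 for p in points.values())
    syy = sum((p[1] - my) ** 2 for p in points.values())
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points.values())
    # Direction of largest variance for a 2x2 covariance matrix (closed form).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    proj = {c: (p[0] - mx) * ux + (p[1] - my) * uy for c, p in points.items()}
    # Keep only the ordering along the axis, then space clusters evenly.
    order = sorted(proj, key=lambda c: (proj[c], c))
    step = height / max(n - 1, 1)
    return {c: i * step for i, c in enumerate(order)}
```

Even spacing discards the actual distances, so the y-axis only reflects geographic ordering along that axis, not how far apart clusters are.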

Everything in section 3 of the uwaterloo page is done by their javascript (http://csclub.uwaterloo.ca/~n2iskand/comics/narrative/narrative.js), but unfortunately it does not appear to be licensed.

theq629 · 2014-06-19T10:22:47Z

Oh, and additionally I think we probably don't want to be doing much processing for this on the backend unless it can be done globally for the whole data set, so if we do DBSCAN or anything we need to do it on the frontend in javascript. The uwaterloo implementation seems to do that in the python (the input to the javascript is eg http://csclub.uwaterloo.ca/~n2iskand/comics/narrative/luckyluke6_narrative/narrative.json and http://csclub.uwaterloo.ca/~n2iskand/comics/narrative/luckyluke6_narrative/characters.xml).

anoopsarkar · 2014-06-19T10:45:16Z

I don't understand why they have a character cluster if they also have a spatial localization in a frame of the comic. Perhaps they are clustering frames of the comic which in our case would be to group together our spatial clusters. So our plan seems to be fairly reasonable.

I agree we should do the clustering before we launch the backend (if needed, looks like we can reuse our existing clusters).

theq629 · 2014-06-19T11:03:09Z

If I'm understanding correctly, then they are making global character clusters based on co-occurrence across all scenes, and then when they draw the plot they first assign these character clusters positions on the y-axis (trying to put large clusters far apart). Then they produce a Sankey node for each scene (corresponding to our geographic-cluster@year nodes) and position each scene by the y-position for character cluster that's best represented in the scene. Then finally they add lines for each character, ordering the line positions within each scene node according to some procedure I'm not understanding.
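The scene placement step as understood above could be sketched like this. It reflects this thread's reading of the uwaterloo procedure, not their actual code; the names `place_scenes`, `group_of`, and `group_y` are hypothetical, and the ordering of lines within a node is left out.

```python
from collections import Counter


def place_scenes(scenes, group_of, group_y):
    """Position each scene at the y value of its best represented character group.

    scenes: dict scene id -> list of characters in the scene.
    group_of: dict character -> global character group id.
    group_y: dict group id -> y position already assigned to that group.
    Returns dict scene id -> (group id, y).
    """
    placed = {}
    for scene, characters in scenes.items():
        counts = Counter(group_of[ch] for ch in characters if ch in group_of)
        if not counts:
            continue  # no grouped characters in this scene
        # Most characters wins; ties broken by smallest group id (an assumption).
        best = min(counts, key=lambda g: (-counts[g], g))
        placed[scene] = (best, group_y[best])
    return placed
```

In our setting the scenes would correspond to the geographic-cluster@year nodes, and the character groups are the part we currently have no analog for.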

So if that's right then they aren't directly clustering frames or scenes (I assume scenes are sequences of frames determined somehow), but scenes are assigned to the best matching character cluster, producing a sort of clustering of scenes.

In our case, we could definitely do some comparable fixed co-occurrence based clustering before starting the backend. However, it's possible that doing it on the frontend based on what entities are selected for a particular plot would give more relevant results.

Also note that while they do have localization by frames or scenes, it's not exactly spatial in the same way that our geographic clusters are in that it doesn't give any distance metric unless they've added that by hand.

anoopsarkar · 2014-06-19T14:45:10Z

Greedy grouping of our existing geographical clusters might produce the same effect for us. So perhaps we can try that first. We would still need to do placement but we can use their heuristic for this?

anoopsarkar · 2014-06-19T14:46:16Z

The geographical clustering could give us the character groups we need? Can you work out an example to see if this works?

theq629 · 2014-06-19T14:52:39Z

How does the geographical clustering give us character groups? (I mean, I think what you already said about merging clusters will probably work fine to make a working plot, I'm just not sure if it's the same as having global character/entity groups.)

anoopsarkar · 2014-06-19T15:01:55Z

A global character group for them seems to be simply to select where they start on the left hand side. They enter and leave different groups on the y-axis anyway (e.g. Ma Dalton is in group 0 with 2/3 others, but the line curves around to enter and leave different groups/clusters in the plot).

I think we will get big clumps as we keep merging our geoclusters to form groups at each timeline point. Allowing the user to rearrange the nodes might be sufficient to make things less cluttered (or at least takes us off the hook).

theq629 · 2014-06-19T15:49:13Z

Ok, in that case I'll try merging geoclusters first.

anoopsarkar · 2014-06-28T15:19:03Z

Are we stuck on this issue?

theq629 · 2014-06-28T15:36:10Z

Yes, I haven't been feeling very well so I still haven't got the new clusters working yet.

anoopsarkar · 2014-06-28T17:34:52Z

Oh, OK. Feel better. I wanted to know if it was a conceptual issue.

theq629 · 2014-07-02T14:19:57Z

Ok, cluster merging is in. I also made the backend handling for it a fair bit faster, and made the frontend highlight the whole entity line on mouseover, along with changing the node style slightly to make that more visible.

I haven't played too much with it yet, but you can try eg "person:Hannibal, person:Scipio Africanus, person: Antiochus III the Great, person:Philip V of Macedon" for a more complex plot than before and "person:Wang Mang, person:Julius Caesar" to see what happens with totally non-overlapping geographic clusters.

With cluster merging I think the main issue is that entity lines enter and leave the cluster nodes at totally different places. Again I don't think that is changeable with the Sankey plugin, so if you think the merging looks ok then I think the next step could be to work on a new drawing method.

anoopsarkar · 2014-12-24T23:56:22Z

Hmm. That comes across as incredibly confusing. I know we had discussed reference points versus locations as clusters. I don't recall locations being that bad visually. Do you think it is better to switch to locations as the nodes in the storyline view to avoid this outcome?

theq629 · 2014-12-25T00:31:31Z

I get exactly the same result with location-based clusters. I'm not sure it is something we can really avoid. For example, the three entities could easily appear in totally separate events at the same year and same reference point / location, and then they have to go in the same storyline cluster.

theq629 · 2014-12-30T06:08:50Z

For redrawing after a selection, I assume the issue here is losing the zoom rather than the redraw itself. I've changed it now so that it restores the zoom like the timeline does.

For the confusing entity selections, I can't think of any way to make it less confusing within the current general method for making storyline clusters. So I say that we either leave it for now, or disable entity line selections for now if it's too confusing.

anoopsarkar · 2014-12-30T10:19:12Z

let's leave it for now and push to master.

theq629 · 2014-12-30T11:39:15Z

Ok, I went ahead and pushed to master! Make sure to restart the backend for the main site since there are backend changes.

anoopsarkar · 2015-01-01T00:25:52Z

xkcd storyline is live for wikipedia now!

KonceptGeek · 2015-01-01T00:32:08Z

I think it would be a good idea to move the Storyline tab in index.html after Facets.

anoopsarkar · 2015-01-01T00:34:39Z

I agree but let us wait until the merge fest is complete.

anoopsarkar · 2015-01-01T09:16:38Z

There seems to be a corner case bug:

1. Select 1499 BCE - 1314 BCE from the Timeline (does not have to be exact, just don't select until 1000 BCE).
2. Select "Hatshepsut" from the Person facet.
3. Look at the Storyline tab. It is empty. Selecting "Hatshepsut" without the timeline constraint shows two independent lines. It might be that "Hatshepsut" does not co-occur with any other person.

anoopsarkar · 2015-01-01T09:23:48Z

I'm not sure if this needs a solution but it is one of those strange corner cases.

If I select a line in Storyline and then remove the original Facet constraint (as I should be able to do in faceted browsing) then there is a constraint from the Storyline still active but the Storyline view insists on a selection from the facet list. So there is a constraint active but the Storyline view is empty.

theq629 · 2015-01-02T15:05:18Z

For the Hatshepsut case, another likely possibility is that Hatshepsut occurs in only one storyline cluster in that range. Right now that gets filtered out from the visualization since it can't be drawn as a line. I guess I should add a special case that draws it as a dot on top of the node or something like that.
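A minimal sketch of the special case described above, assuming each entity's storyline is a list of (year, cluster) nodes and that BCE years are negative integers. The name `marker_kind` and the input shape are assumptions for illustration.

```python
def marker_kind(nodes, start, end):
    """Decide how to draw an entity within the selected year range.

    nodes: list of (year, cluster id) pairs for one entity (assumed shape).
    start, end: inclusive year range; BCE years are negative integers.
    Returns "line" if a line can be drawn, "dot" for a single node, else None.
    """
    in_range = [(year, cluster) for year, cluster in nodes if start <= year <= end]
    if len(in_range) >= 2:
        return "line"  # at least two nodes, so a link can connect them
    if len(in_range) == 1:
        return "dot"  # single storyline cluster: draw a dot on the node
    return None  # entity does not appear in the range at all
```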

For the selection case, that does seem confusing. Should the storyline view clear its own constraints at the same time it shows the "Make a selection in the facet" text?

anoopsarkar · 2015-01-02T23:44:32Z

A dot sounds good.

Either the constraint is cleared, or once a selection is done in the storyline view it proceeds independently and shows the storyline view consistent with the selection.

anoopsarkar · 2015-01-07T18:14:14Z

I moved the Storyline tab in index.html to be right after the Facets tab

anoopsarkar · 2015-01-08T17:38:39Z

should we close this issue and handle bugs or enhancements with separate issues?

theq629 · 2015-01-09T00:36:40Z

Why don't I finish the two enhancements above, and then we'll close this issue.

anoopsarkar · 2015-01-09T02:23:35Z

ok. sounds good.

anoopsarkar · 2015-01-10T00:18:25Z

How about simplifying the current back and forth between the Facet view and Storyline view by simply generating a default Storyline for the most frequent element of each facet entry. e.g. currently the Storyline view for the Person facet would be Augustus by default. We can also cache the default view as we do with other tabs.

theq629 · 2015-01-11T05:26:54Z

That sounds good, but I'm also wondering if we need to add some sort of introduction message that explains what's being shown, especially if we are introducing arbitrary limits (for the too many entity lines bug, #133). One way to handle that would be to have a help message that is shown initially, like the text search does. But maybe a similar box that doesn't show up until the user asks for it would be fine in this case.

theq629 · 2015-01-14T06:53:51Z

On the default view, isn't it confusing if there is already a storyline generated before a selection is made?

theq629 · 2015-01-14T08:42:14Z

For (2), I've now got it clearing whenever it shows the help text. That also includes when you switch modes (change facets or change to query mode).

For (1), I've added oval dots. Here is the Hatshepsut case:

There is also a single-cluster node visible if you select Hannibal. I'm not sure this is actually the best way to draw them, but at least we can see all entities now. One concern is that the marker has to extend past the node marker to be selectable.

anoopsarkar · 2015-01-14T17:56:14Z

Re: a default view for Storyline. Just like other views have a summary view that is loaded by default, the Storyline can have the most frequent entity in a facet. To make things clearer we can have a drop down menu in the Storyline view that can be used to select the entity (this will also avoid having to switch to the Facet view altogether).

So the top of Storyline would look like this:

{Clear Selection} {Person facet}(see 1) {Augustus [439]}(see 2)

[ Default Storyline view would be Person facet and most frequent Person ]

1: can be used to select other enabled facets (as it is done now)
2: the other elements in this drop down menu would be entities from the selected facet in (1) sorted by frequency. the user can select any of them without switching to the facet view.

One potential issue is that the frequencies and entities will change based on other constraints. But I think that information is readily available already to the Storyline view, isn't it?
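The proposed default and drop down ordering could be sketched as below, assuming the Storyline view already receives per-entity frequency counts for the selected facet under the current constraints. The name `storyline_menu` is hypothetical, and any counts other than the "Augustus [439]" figure quoted above are made up for illustration.

```python
def storyline_menu(counts):
    """Pick the default entity and build drop down labels for one facet.

    counts: dict entity name -> frequency under the current constraints (assumed input).
    Returns (default entity or None, list of labels sorted by frequency).
    """
    # Most frequent first; ties broken alphabetically (an assumption).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    default = ordered[0][0] if ordered else None
    labels = [f"{name} [{n}]" for name, n in ordered]
    return default, labels
```

Since the counts depend on the other active constraints, the menu and its default would need to be recomputed whenever those constraints change.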

theq629 · 2015-01-15T00:23:47Z

Ok, I think that would work. Do the selections made in the main facet still change the storyline selection (and what's shown as selected in the storyline menu)?

anoopsarkar · 2015-01-15T00:25:42Z

No, I think we can decouple the Facet view from Storyline view entirely using this approach. That would give us more flexibility in the Facet view (as raised in #138)

theq629 · 2015-01-15T00:33:37Z

In that case we'd basically just move a copy of the facet queries to the storyline. So definitely doable, but since it's a pretty big change I suggest we make a new issue for it. (And close this issue if the other two things look ok to you.)

anoopsarkar · 2015-01-15T03:05:15Z

yes, sounds good. please merge your current changes with master.

theq629 · 2015-01-15T07:03:58Z

Ok, merged. And I assume that means we can finally close this issue!

theq629 added backend labels May 14, 2014

theq629 mentioned this issue Jan 14, 2015

serious bug: backend crashes on large queries #133

Closed

theq629 mentioned this issue Jan 15, 2015

More storyline improvements #140

Merged

theq629 mentioned this issue Jan 15, 2015

Decoupling storyline and facet views #141

Closed

theq629 closed this as completed Jan 15, 2015
