Xkcd-style plot flow diagrams #111
Comments
On Sankey diagrams, in my understanding the main purpose is to represent proportional flow, so I suspect that it isn't really the best term here, but I'm not sure what the better term would be. Regardless, it looks to me like the d3 plugin will work fine. At some point we might need to work out how to draw nicer line start and end points. On how to do the processing, the plan is for the backend to produce per-entity timelines indicating which reference points or other event clusters the entity is in at each year. The frontend then further processes this (if needed, eg to produce more specific clusters for the plot) and draws the plot. We don't want to send a lot of event data to the frontend for bandwidth reasons, but it may be worth sending more than the minimal amount for the diagram so that we have some flexibility to adjust diagrams on the frontend without having to change backend code. |
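To make the "per-entity timelines" idea concrete, here is a minimal sketch of what the backend output could look like. The flat `(entity, year, cluster_id)` input, the cluster ids, and the function name `build_entity_timelines` are all assumptions for illustration, not the project's actual schema.

```python
from collections import defaultdict

def build_entity_timelines(events):
    """Map entity -> year -> sorted list of cluster ids the entity appears in.

    `events` is an iterable of (entity, year, cluster_id) tuples. This input
    shape is an assumption for the sketch, not the real backend schema.
    """
    timelines = defaultdict(lambda: defaultdict(set))
    for entity, year, cluster_id in events:
        timelines[entity][year].add(cluster_id)
    # Convert to plain dicts with years in ascending order so the result
    # serializes predictably (e.g. to JSON for the frontend).
    return {
        entity: {year: sorted(clusters) for year, clusters in sorted(years.items())}
        for entity, years in timelines.items()
    }

# Hypothetical event data; the cluster ids are made up.
events = [
    ("person:Hannibal", -218, "c1"),
    ("person:Hannibal", -218, "c2"),
    ("person:Hannibal", -216, "c2"),
    ("person:Scipio Africanus", -202, "c3"),
]
timelines = build_entity_timelines(events)
```

Sending only cluster ids per year keeps the payload small, while still leaving the frontend free to regroup clusters without a backend change.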
I've put up early work in the xkcdplottimelines branch and live at http://champ.cs.sfu.ca/WikiHistory/latest/whoosh/wikipediahistoryxkcdplottimelines/. Currently it has a very basic interface and I haven't tried to optimize it at all, so it's pretty slow. I'm also not trying to fix clusters' positions on the y-axis yet, just letting the d3 sankey plugin put them where it wants. The three input areas at the top are starting year, ending year, and entities list. The entities list is comma separated and each item has the format field:value. Mouse over things or otherwise look at the SVG titles to find out what they actually are. The plot definitely needs some improvements. Unless you like the current look, with entity lines spread out so much on the cluster nodes, I'll probably need to edit the plugin or just do the diagram manually. I already have to ignore the plugin's x-axis placement choices and we'll probably want to do the same on the y-axis, likely messing up its layout. Additionally, assigning links with branching entity lines needs work. Currently it always looks one time step back when making links, always making a direct link when the entity stays in the same cluster and otherwise adding a link from an arbitrary cluster. This can produce jumpy results (eg for person:Hannibal around 205-210BC), but I'm not sure that a better algorithm is totally straightforward. |
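The link rule described above (look one time step back, link directly when the entity stays in the same cluster, otherwise link from an arbitrary earlier cluster) can be sketched roughly as follows. This is an illustration in Python rather than the actual frontend code; the function name `build_links`, the timeline shape, and the choice of "first cluster in the list" as the arbitrary source are assumptions.

```python
def build_links(timeline):
    """Build links for one entity from a {year: [cluster ids]} timeline.

    Returns a list of ((prev_year, source_cluster), (year, cluster)) pairs.
    Only the immediately preceding time step in the timeline is consulted.
    """
    links = []
    years = sorted(timeline)
    for prev_year, year in zip(years, years[1:]):
        prev_clusters = timeline[prev_year]
        if not prev_clusters:
            continue  # nothing to link from at the previous step
        for cluster in timeline[year]:
            if cluster in prev_clusters:
                source = cluster           # entity stayed put: direct link
            else:
                source = prev_clusters[0]  # arbitrary pick; the source of "jumpy" lines
            links.append(((prev_year, source), (year, cluster)))
    return links

# Hypothetical timeline with made-up cluster ids.
links = build_links({-210: ["a", "b"], -209: ["a", "c"]})
```

In this example the link into cluster "c" is drawn from "a" purely because "a" happens to be listed first, which is the kind of arbitrary choice that can make a line jump.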
Looks good for a first step. Is it possible to make the timeline longer than the window size and scroll left and right to see more of the timeline? |
Yes, I can add a scroll bar. And eventually I'd like to do something with a brush select for zooming and limiting the range (like for the regular timeline zoom), but I'd like to wait until the basic plot is more settled first. |
Here are some things I would like to clarify:
|
I don't see the different colors for the different person entities. All the mouseovers say "Hannibal" for me. About each point above:
|
By the way, if we do want to cluster points differently we should consider DBSCAN: |
Ah, I see. I didn't expect Scipio to be only at the end and I was confused by two different lines for the same person initially. I think I said something differently earlier on, but I think one line per entity seems the least confusing. We would have to merge clusters to make this happen. |
It would also help to highlight the entire line for each entity on mouseover (as in other Sankey demos). |
Yes, I'll change it to highlight the whole line for an entity. The confusingness of branching entity lines is part of why I'd like it to put an entity's endpoints together on the node. But as you say, this will be much less confusing if there is one node per entity. I think your description for (3) will work, although we may have to see how much the initial clusters overlap in practice. |
For the rest, to explain how the sankey plugin works: the plugin does layout to produce positions and sizes for the nodes and end points and sizes for the links. The associated example code then draws rectangles and curves accordingly, and I modified it a bit to get the current plot. The layout part is allocating space for much wider flow-proportional link lines, since it seems to scale them to the available screen space. As far as I can tell there isn't any way to disable that (and it's not just a CSS issue). Similarly I'm having to bump the nodes to the correct x-axis positions after layout since I don't see any way to constrain the layout positions, and that's probably part of why the layout is poor. So I think probably we should either give up on using the sankey plugin or heavily modify it. If we do want to fix cluster y-positions across time then I think implementing custom layout won't be hard. Otherwise we can probably use one of the more general d3 graph layout implementations. |
Have a look at what they say at the bottom of http://csclub.uwaterloo.ca/~n2iskand/?page_id=13
|
They say "heuristic" in the previous link, but I wonder if they fiddled with the placement by hand. |
The layout looks a lot nicer when the year constraint -210 to -180 is added. Makes me think that scrolling left or right might do the trick when it comes to the crowded feel of the zoomed-out view. Although having one line per entity will also help. |
If I understand the uwaterloo system correctly, they are first doing a character clustering based on global scene co-occurrences and then placing scene nodes within y-axis bands for each character cluster. So when they say the "y-range of a cluster", that's referring to a grouping of characters that we don't currently have any analog to. Should we be doing a similar DBSCAN step? An alternative in our case (as long as we continue using the geographic clusters as the underlying data) is to try to make the y-axis roughly represent geographic distance, perhaps projecting the clusters' geo-points onto a most-separating axis and then respacing to look better. I'm not totally sure that we should expect their procedure to work for us, since the case of an entity being in multiple clusters at once must occur a lot more in our data; however that may not matter if we are merging clusters. Everything in section 3 of the uwaterloo page is done by their javascript (http://csclub.uwaterloo.ca/~n2iskand/comics/narrative/narrative.js), but unfortunately it does not appear to be licensed. |
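One way to read "projecting onto a most-separating axis and then respacing" is to project cluster centroids onto the first principal axis of their 2D positions and then spread them evenly by rank. The sketch below does that in plain Python. It treats longitude/latitude as planar coordinates, which is a simplification, and the function name and centroid values are assumptions for illustration only.

```python
import math

def order_clusters_on_axis(centroids):
    """Map cluster id -> y position in [0, 1], evenly spaced by rank.

    `centroids` maps cluster id -> (x, y), e.g. (longitude, latitude).
    Clusters are ranked by their projection onto the principal axis
    (direction of greatest variance) of the centroid positions.
    """
    ids = sorted(centroids)
    n = len(ids)
    if n == 0:
        return {}
    if n == 1:
        return {ids[0]: 0.5}
    mean_x = sum(centroids[i][0] for i in ids) / n
    mean_y = sum(centroids[i][1] for i in ids) / n
    sxx = sum((centroids[i][0] - mean_x) ** 2 for i in ids)
    syy = sum((centroids[i][1] - mean_y) ** 2 for i in ids)
    sxy = sum((centroids[i][0] - mean_x) * (centroids[i][1] - mean_y) for i in ids)
    # Angle of the major axis of the 2x2 scatter matrix [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ax, ay = math.cos(theta), math.sin(theta)
    proj = {i: (centroids[i][0] - mean_x) * ax + (centroids[i][1] - mean_y) * ay
            for i in ids}
    ranked = sorted(ids, key=lambda i: (proj[i], i))
    # Respace: keep only the order, spread evenly over [0, 1].
    return {cid: rank / (n - 1) for rank, cid in enumerate(ranked)}

# Rough, hand-entered (longitude, latitude) values, for illustration only.
positions = order_clusters_on_axis({
    "rome": (12.5, 41.9),
    "carthage": (10.2, 36.8),
    "antioch": (36.2, 36.2),
})
```

Respacing by rank discards the actual distances, so nearby clusters do not overlap on screen, at the cost of the axis no longer being to scale.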
Oh, and additionally I think we probably don't want to be doing much processing for this on the backend unless it can be done globally for the whole data set, so if we do DBSCAN or anything we need to do it on the frontend in javascript. The uwaterloo implementation seems to do that in the python (the input to the javascript is eg http://csclub.uwaterloo.ca/~n2iskand/comics/narrative/luckyluke6_narrative/narrative.json and http://csclub.uwaterloo.ca/~n2iskand/comics/narrative/luckyluke6_narrative/characters.xml). |
I don't understand why they have a character cluster if they also have a spatial localization in a frame of the comic. Perhaps they are clustering frames of the comic which in our case would be to group together our spatial clusters. So our plan seems to be fairly reasonable. I agree we should do the clustering before we launch the backend (if needed, looks like we can reuse our existing clusters). |
If I'm understanding correctly, then they are making global character clusters based on co-occurrence across all scenes, and then when they draw the plot they first assign these character clusters positions on the y-axis (trying to put large clusters far apart). Then they produce a Sankey node for each scene (corresponding to our geographic-cluster@year nodes) and position each scene by the y-position for the character cluster that's best represented in the scene. Then finally they add lines for each character, ordering the line positions within each scene node according to some procedure I don't understand. So if that's right then they aren't directly clustering frames or scenes (I assume scenes are sequences of frames determined somehow), but scenes are assigned to the best matching character cluster, producing a sort of clustering of scenes. In our case, we could definitely do some comparable fixed co-occurrence based clustering before starting the backend. However, it's possible that doing it on the frontend based on what entities are selected for a particular plot would give more relevant results. Also note that while they do have localization by frames or scenes, it's not exactly spatial in the same way that our geographic clusters are, in that it doesn't give any distance metric unless they've added that by hand. |
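As a rough illustration of the "position each scene by the best represented character cluster" step as I read it, the sketch below picks, for one scene, the character group with the most members present. This is my interpretation and not taken from the uwaterloo code; the function name, the tie-breaking rule (lowest group id), and the example groups are assumptions.

```python
from collections import Counter

def assign_scene_group(scene_characters, character_group):
    """Return the character group best represented in a scene, or None.

    `scene_characters` is a list of character names in the scene and
    `character_group` maps character name -> group id. Ties are broken
    by taking the lowest group id (an arbitrary choice for this sketch).
    """
    counts = Counter(
        character_group[c] for c in scene_characters if c in character_group
    )
    if not counts:
        return None
    best = max(counts.values())
    return min(group for group, n in counts.items() if n == best)

# Hypothetical grouping, for illustration only.
groups = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2}
scene_band = assign_scene_group(["A", "B", "C"], groups)
```

The scene node would then be drawn inside the y-axis band reserved for the returned group.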
Greedily grouping our existing geographical clusters might produce the same effect for us. So perhaps we can try that first. We would still need to do placement, but can we use their heuristic for this? |
The geographical clustering could give us the character groups we need? Can you work out an example to see if this works? |
How does the geographical clustering give us character groups? (I mean, I think what you already said about merging clusters will probably work fine to make a working plot, I'm just not sure if it's the same as having global character/entity groups.) |
A global character group for them seems to be simply to select where they start on the left hand side. They enter and leave different groups on the y-axis anyway (e.g. Ma Dalton is in group 0 with 2/3 others, but the line curves around to enter and leave different groups/clusters in the plot). I think we will get big clumps as we keep merging our geoclusters to form groups at each timeline point. Allowing the user to rearrange the nodes might be sufficient to make things less cluttered (or at least takes us off the hook). |
Ok, in that case I'll try merging geoclusters first. |
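For reference, one way merging could work at a single time step is to union any geoclusters that share an entity, so that each entity ends up in exactly one merged node per year. This is a sketch under that assumption, not the code that went into the branch; the function name and the input shape are hypothetical.

```python
def merge_clusters_for_year(entity_clusters):
    """Merge clusters that share an entity at one time step.

    `entity_clusters` maps entity -> list of cluster ids at a single year.
    Returns entity -> frozenset of the original cluster ids that make up
    the merged node containing that entity.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # All clusters of one entity must end up in the same merged node.
    for clusters in entity_clusters.values():
        for cluster in clusters:
            find(cluster)
        for other in clusters[1:]:
            union(clusters[0], other)

    members = {}
    for cluster in list(parent):
        members.setdefault(find(cluster), set()).add(cluster)
    return {
        entity: frozenset(members[find(clusters[0])])
        for entity, clusters in entity_clusters.items()
        if clusters
    }

# Made-up cluster ids: the first two entities share c2, so c1, c2, c3 merge.
merged = merge_clusters_for_year({
    "person:Hannibal": ["c1", "c2"],
    "person:Scipio Africanus": ["c2", "c3"],
    "person:Wang Mang": ["c9"],
})
```

Because merging is transitive, a chain of shared entities can pull many geoclusters into one node, which is the "big clumps" effect mentioned above.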
Are we stuck on this issue? |
Yes, I haven't been feeling very well so I still haven't got the new clusters working yet. |
Oh, OK. Feel better. I wanted to know if it was a conceptual issue. |
Ok, cluster merging is in. I also made the backend handling for it a fair bit faster, and made the frontend highlight the whole entity line on mouseover, along with changing the node style slightly to make that more visible. I haven't played too much with it yet, but you can try eg "person:Hannibal, person:Scipio Africanus, person: Antiochus III the Great, person:Philip V of Macedon" for a more complex plot than before and "person:Wang Mang, person:Julius Caesar" to see what happens with totally non-overlapping geographic clusters. With cluster merging I think the main issue is that entity lines enter and leave the cluster nodes at totally different places. Again, I don't think that is changeable with the Sankey plugin, so if you think the merging looks ok then I think the next step could be to work on a new drawing method. |
Hmm. That comes across as incredibly confusing. I know we had discussed reference points versus locations as clusters. I don't recall locations being that bad visually. Do you think it is better to switch to locations as the nodes in the storyline view to avoid this outcome? |
I get exactly the same result with location-based clusters. I'm not sure it is something we can really avoid. For example, the three entities could easily appear in totally separate events at the same year and same reference point / location, and then they have to go in the same storyline cluster. |
For redrawing after a selection, I assume the issue here is losing the zoom rather than the redraw itself. I've changed it now so that it restores the zoom like the timeline does. For the confusing entity selections, I can't think of any way to make it less confusing within the current general method for making storyline clusters. So I say that we either leave it for now, or disable entity line selections for now if it's too confusing. |
let's leave it for now and push to master.
|
Ok, I went ahead and pushed to master! Make sure to restart the backend for the main site since there are backend changes. |
xkcd storyline is live for wikipedia now! |
I think it would be a good idea to move the Storyline tab in index.html after Facets. |
I agree but let us wait until the merge fest is complete. |
There seems to be a corner case bug:
|
I'm not sure if this needs a solution but it is one of those strange corner cases. If I select a line in Storyline and then remove the original Facet constraint (as I should be able to do in faceted browsing) then there is a constraint from the Storyline still active but the Storyline view insists on a selection from the facet list. So there is a constraint active but the Storyline view is empty. |
For the Hatshepsut case, another likely possibility is that Hatshepsut occurs in only one storyline cluster in that range. Right now that gets filtered out from the visualization since it can't be drawn as a line. I guess I should add a special case that draws it as a dot on top of the node or something like that. For the selection case, that does seem confusing. Should the storyline view clear its own constraints at the same time it shows the "Make a selection in the facet" text? |
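The special case could be handled by splitting entities into those that can be drawn as lines and those that need a dot before drawing. A minimal sketch, with hypothetical names and a made-up node id:

```python
def split_lines_and_dots(entity_nodes):
    """Separate entities drawable as lines from those needing a dot.

    `entity_nodes` maps entity -> ordered list of storyline nodes the
    entity passes through in the visible range. Entities with two or more
    nodes become lines; entities with exactly one node become dots drawn
    on that node; entities with no nodes are dropped.
    """
    lines, dots = {}, {}
    for entity, nodes in entity_nodes.items():
        if len(nodes) >= 2:
            lines[entity] = nodes
        elif len(nodes) == 1:
            dots[entity] = nodes[0]
    return lines, dots

# "n7" is a made-up node id.
lines, dots = split_lines_and_dots({"person:Hatshepsut": ["n7"]})
```

With this split, a single-cluster entity still shows up in the plot instead of being silently filtered out.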
|
I moved the Storyline tab in index.html to be right after the Facets tab |
should we close this issue and handle bugs or enhancements with separate issues? |
Why don't I finish the two enhancements above, and then we'll close this issue. |
ok. sounds good.
|
How about simplifying the current back and forth between the Facet view and Storyline view by simply generating a default Storyline for the most frequent element of each facet entry. e.g. currently the Storyline view for the Person facet would be Augustus by default. We can also cache the default view as we do with other tabs. |
That sounds good, but I'm also wondering if we need to add some sort of introduction message that explains what's being shown, especially if we are introducing arbitrary limits (for the too many entity lines bug, #133). One way to handle that would be to have a help message that is shown initially, like the text search does. But maybe a similar box that doesn't show up until the user asks for it would be fine in this case. |
On the default view, isn't it confusing if there is already a storyline generated before a selection is made? |
Re: a default view for Storyline. Just like other views have a summary view that is loaded by default, the Storyline can show the most frequent entity in a facet. To make things clearer we can have a drop-down menu in the Storyline view that can be used to select the entity (this will also avoid having to switch to the Facet view altogether). So the top of Storyline would look like this:

{Clear Selection} {Person facet}(see 1) {Augustus [439]}(see 2)
[ Default Storyline view would be Person facet and most frequent Person ]

1: can be used to select other enabled facets (as it is done now)

One potential issue is that the frequencies and entities will change based on other constraints. But I think that information is readily available already to the Storyline view, isn't it? |
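Picking the default entity is then just a matter of taking the most frequent value in the current facet counts. A small sketch, assuming the counts arrive as a plain mapping; the function name is hypothetical and every count other than Augustus [439] is invented for the example.

```python
def default_storyline_entity(facet_counts):
    """Return the most frequent entity in a facet, or None if it is empty.

    `facet_counts` maps entity name -> frequency under the current
    constraints. Ties are broken alphabetically so the default is stable.
    """
    if not facet_counts:
        return None
    # max() returns the first maximal item, so sorting first gives the
    # alphabetically earliest name among equally frequent entities.
    return max(sorted(facet_counts), key=facet_counts.get)

# Only the Augustus count comes from the discussion; the rest are made up.
person_counts = {"Augustus": 439, "Julius Caesar": 300, "Hannibal": 120}
default_entity = default_storyline_entity(person_counts)
```

Since the counts depend on the other active constraints, the default would need to be recomputed whenever those change.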
Ok, I think that would work. Do the selections made in the main facet still change the storyline selection (and what's shown as selected in the storyline menu)? |
No, I think we can decouple the Facet view from Storyline view entirely using this approach. That would give us more flexibility in the Facet view (as raised in #138) |
In that case we'd basically just move a copy of the facet queries to the storyline. So definitely doable, but since it's a pretty big change I suggest we make a new issue for it. (And close this issue if the other two things look ok to you.) |
yes, sounds good. please merge your current changes with master.
|
Ok, merged. And I assume that means we can finally close this issue! |
(Paraphrased from emails by @anoopsarkar:) As a new view on the frontend, use the entities from the current search to produce a visualization like the ones at http://csclub.uwaterloo.ca/~n2iskand/?page_id=13, which is based on xkcd #657. The x-axis is like in the current timeline plot but may need more scrolling to see detail. The y-axis groups entities (especially persons, but we could also use organizations, etc.) into clusters based on co-occurrence in the same events at any given year, and is sorted by frequency. There is a d3 plugin for Sankey diagrams that should work. We can optionally have the thickness of the flow lines represent frequency, but the initial goal should just be to replicate the Waterloo visualization.