Add Sankey diagram #308

Merged
merged 1 commit into from Oct 17, 2016

Conversation

ctessum
Contributor

@ctessum ctessum commented Aug 15, 2016

No description provided.

@ctessum
Contributor Author

ctessum commented Aug 15, 2016

PTAL. Let me know your thoughts about whether this fits here.

@kortschak
Member

I think this is worthwhile having, but a couple of questions.

Why is Flow retained as a pointer and with an inUse field? It seems to me that it's not mutated except for the Group field when there is a defaulting behaviour. Would it not be better to just use a concrete Flow, not mutate it, and either understand that "" is the default or pass "Default" (I don't like this) when f.Group is ""?

Would you accept a bezier curve instead of a cubic spline? I have a bezier package that I can donate. I prefer not to increase imports, given the trouble we have had here with moving infrastructure.

@ctessum
Contributor Author

ctessum commented Aug 15, 2016

  1. When I started writing it I thought it would need to be mutated, but that turned out not to be the case. It could also work to have the default group be "", but I thought it might be confusing for beginning users who try to make a legend and find that it is a grey box with no label. This may not be an important concern though.
  2. I don't have a strong knowledge of the difference between a bezier curve and a spline, but it seems like a change I would be fine with. Would this involve creating a new bezier package in gonum, copying the code into sankey.spline, or something in between?

@kortschak
Member

  1. I think it might be reasonable to prohibit "" named groups. Ascribing a default makes the package opinionated about the user's locale, which is probably not OK.
  2. The package is here. I'm the sole author of this code, so we can bring it into gonum, either as an unexported version of what exists there (probably the right thing to do in the short term) or as a (probably internal) package. In the first instance, see if you can get your code to play with it where it is, and then I can commit it as the author afterwards.
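
For readers unfamiliar with the distinction discussed above, here is a minimal sketch of how a cubic Bezier curve could trace a flow between two stocks. It does not use the bezier package mentioned in the thread, whose API is not shown here. The `point` type and the `cubicBezier` and `flowCurve` functions are hypothetical names, and placing both control points at the horizontal midpoint is an assumption chosen to give an S-shaped curve.

```go
package main

import "fmt"

// point is a hypothetical 2D point type used only for this sketch.
type point struct{ X, Y float64 }

// cubicBezier evaluates the cubic Bezier curve defined by control
// points p0..p3 at parameter t, where t is in [0, 1].
func cubicBezier(p0, p1, p2, p3 point, t float64) point {
	u := 1 - t
	// Bernstein basis polynomials of degree 3.
	b0, b1, b2, b3 := u*u*u, 3*u*u*t, 3*u*t*t, t*t*t
	return point{
		X: b0*p0.X + b1*p1.X + b2*p2.X + b3*p3.X,
		Y: b0*p0.Y + b1*p1.Y + b2*p2.Y + b3*p3.Y,
	}
}

// flowCurve samples n+1 points (n >= 1) along a curve from begin to end.
// Both control points sit at the horizontal midpoint (an assumption),
// so the curve leaves and arrives horizontally.
func flowCurve(begin, end point, n int) []point {
	mid := (begin.X + end.X) / 2
	c1 := point{mid, begin.Y}
	c2 := point{mid, end.Y}
	pts := make([]point, n+1)
	for i := range pts {
		pts[i] = cubicBezier(begin, c1, c2, end, float64(i)/float64(n))
	}
	return pts
}

func main() {
	for _, p := range flowCurve(point{0, 0}, point{1, 2}, 4) {
		fmt.Printf("%.3f %.3f\n", p.X, p.Y)
	}
}
```

Unlike an interpolating spline, the curve only passes through its first and last control points; the middle two shape it without lying on it.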

	if s[i].order != s[j].order {
		return s[i].order < s[j].order
	}
	panic(fmt.Errorf("can't sort stocks:\n%+v\n%+v", s[i], s[j]))
Member

prepend error message with "plotter:"
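
A rough sketch of what that request could look like in a `sort.Interface` implementation follows. The `stock` type, its fields, and the `byOrder` name are hypothetical stand-ins rather than the code in this PR; only the panic message format is taken from the snippet under review.

```go
package main

import (
	"fmt"
	"sort"
)

// stock is a hypothetical, simplified stand-in for the type being sorted.
type stock struct {
	category int
	order    int
	label    string
}

// byOrder sorts stocks by category, then by order within a category.
type byOrder []stock

func (s byOrder) Len() int      { return len(s) }
func (s byOrder) Swap(i, j int) { s[i], s[j] = s[j], s[i] }

func (s byOrder) Less(i, j int) bool {
	if s[i].category != s[j].category {
		return s[i].category < s[j].category
	}
	if s[i].order != s[j].order {
		return s[i].order < s[j].order
	}
	// Two stocks with the same category and order cannot be ranked.
	// The package-name prefix tells the user where the panic came from.
	panic(fmt.Errorf("plotter: can't sort stocks:\n%+v\n%+v", s[i], s[j]))
}

func main() {
	s := byOrder{{0, 1, "b"}, {1, 0, "c"}, {0, 0, "a"}}
	sort.Sort(s)
	fmt.Println(s)
}
```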

@ctessum
Contributor Author

ctessum commented Aug 28, 2016

I just removed the requirement for the group to not be an empty string. There is currently no requirement that the stock labels not be an empty string, so I figured it would be best for the behavior to be consistent.

// SourceStockLabel and ReceptorStockLabel are the labels
// of the stocks that originate and receive the flow,
// respectively.
SourceStockLabel, ReceptorStockLabel string
Member

SourceLabel, DestinationLabel? These diagrams are used in many places where the entities are not financial. Same for the categories below.

Contributor Author

I was thinking of stocks and flows in the sense used here. It is true that the terms are most commonly used in an economic context, but as the Wikipedia page linked above mentions, the amount of CO2 in the atmosphere can also be thought of as a stock. If these terms are deemed not sufficiently general, though, we can use something else.

Member

Fair enough. The issue for me is more that it makes the field names Java-esque. Source and Destination are meaningful and short. SourceCategory and DestinationCategory are unfortunately longer, but an additional type to allay this doesn't seem warranted.

@kortschak
Member

Could you remove the changes from other branches? (Possibly rebase onto master if you want, but the other commits confound the PR.)

@ctessum ctessum force-pushed the sankey branch 2 times, most recently from cea0048 to cb8727c Compare August 29, 2016 15:37
@ctessum
Contributor Author

ctessum commented Aug 29, 2016

PTAL. I'm still not sure about the proper usage of ClipLinesX. @eaburns ?

flows []Flow

// FlowStyle is a function that specifies the
// background colar and border line style of the
Member

s/colar/color/

@kortschak
Member

Where are we up to on this?

@ctessum
Contributor Author

ctessum commented Oct 11, 2016

Sorry for the delay. I have responded to the remaining comments and made some additional changes. PTAL.

Member

@kortschak kortschak left a comment

LGTM with minor comments addressed and @eaburns approval.

@@ -276,7 +276,6 @@ func (s *Sankey) Plot(c draw.Canvas, plt *plot.Plot) {
{catMin, valMin},
{catMax, valMin},
}
// outline := c.ClipLinesX(pts) // This causes half of the lines to disappear.
c.StrokeLines(lineStyle, pts) //outline...)
Member

//outline...) and below?

@ctessum
Contributor Author

ctessum commented Oct 12, 2016

done. @eaburns PTAL

@eaburns
Member

eaburns commented Oct 15, 2016

Very pretty plots, though I don't really understand them.

My one issue is that I don't like the outline-box drawn around the plot within the axes. In my opinion it violates the idea of maximizing the data:ink ratio. The box adds too much ink and no additional information. It's on purpose that Plotinum didn't have a box around the entire plot like many other plotting systems do. Anyway, if you all prefer that, then I don't want to block anything.

I skimmed the code. It looks very nice. LGTM after a squash.

@ctessum
Contributor Author

ctessum commented Oct 15, 2016

The idea behind Sankey diagrams is to visualize stocks and flows, where stocks are some quantity and each flow is some transfer function that relates two stocks. Perhaps the most famous Sankey diagram is this one, where the stocks are numbers of soldiers and the flows are loss rates between events. A more common use of the diagrams is to visualize energy utilization, where the stocks are different types of energy (either waste or useful) and the flows are conversions between the different types.

Both of the examples linked above use solid colors with no outlines to delimit the different stocks and flows, but the implementation here uses outlines with no fill color by default. So the outline box around the whole plotter does represent data: the y-height of the box gives the total value of the stock in the system at a given point on the x axis (just like in a stacked bar plot). So in this example, the outlines show that the trees produce 18 apples total but we only know the fate of 17 of them. If you look closely, you can see that this is due to a difference between the number of apples Sofia receives and the number of apples she does something with. These types of accounting mismatches are common in the types of data used in these plots, and different implementations handle them differently.
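
To make the accounting concrete, here is a small sketch that sums the flows entering and leaving each stock. The `flow` type and `totals` function are hypothetical simplifications, not the types in this PR, and the individual flow values are invented for illustration. Only the totals (18 apples produced, 17 accounted for, with the gap at Sofia) follow the description above.

```go
package main

import "fmt"

// flow is a hypothetical, simplified stand-in for the Flow type
// under review: a value moving from one labelled stock to another.
type flow struct {
	Source, Receptor string
	Value            float64
}

// totals returns, for each stock label, the sum of the flows
// leaving it (out) and the sum of the flows entering it (in).
func totals(flows []flow) (out, in map[string]float64) {
	out = make(map[string]float64)
	in = make(map[string]float64)
	for _, f := range flows {
		out[f.Source] += f.Value
		in[f.Receptor] += f.Value
	}
	return out, in
}

func main() {
	// Invented values: the trees produce 18 apples in total.
	flows := []flow{
		{"Large tree", "Sofia", 5},
		{"Large tree", "Mohamed", 6},
		{"Small tree", "Sofia", 3},
		{"Small tree", "Mohamed", 4},
		{"Sofia", "Eaten", 4},
		{"Sofia", "Sold", 3},
		{"Mohamed", "Eaten", 5},
		{"Mohamed", "Sold", 5},
	}
	out, in := totals(flows)
	for _, s := range []string{"Sofia", "Mohamed"} {
		fmt.Printf("%s: received %v, passed on %v, unaccounted %v\n",
			s, in[s], out[s], in[s]-out[s])
	}
}
```

In a stacked layout, the height drawn for a stock would presumably be the larger of its incoming and outgoing totals, which is why a mismatch like Sofia's shows up as a visible gap in the outline.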

@eaburns
Member

eaburns commented Oct 15, 2016

I see. So is it sufficient to only have the line on the top? Just curious.

@ctessum
Contributor Author

ctessum commented Oct 15, 2016

We could, but I think that would be the equivalent of not outlining the bottom of the bar in a bar chart, and bar charts currently do have lines at the bottom.

@eaburns
Member

eaburns commented Oct 16, 2016

Yeah, the bar charts are kind of noisy. Oh well. Too late to fix them.
