log-axis of histogram handles missing bins badly #210

Closed
eigenhombre opened this Issue Nov 1, 2013 · 7 comments

eigenhombre commented Nov 1, 2013

In every histogramming package I've used (several for high energy physics), log scales are easily obtained as options on histograms.

After a fair amount of digging, I managed to figure out how to set log scale for the Y axis in histograms. However, this behaves very badly in the case where some bins are missing entries.

For example,

(ns foo.graphs
    (:require [incanter.core :refer :all]
              [incanter.stats :refer :all]
              [incanter.charts :refer :all]))
(let [values [1 2 1 2 3 1 2 4 6]
      hist (histogram values :nbins 6)]
    (set-axis hist :y (log-axis))
    (view hist))

[Screenshot: screen shot 2013-10-31 at 11 03 35 pm]

The correct behavior, IMO, is to leave the missing bin empty without making the Y axis go crazy as it does in the attached screenshot. In other words, the maximum value for Y in this case should be 3 (there are three 1s); the bin for 5 should be empty; and there should be a value of 1 for 3, 4 and 6.
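
For illustration, here is a minimal sketch in plain Clojure of the counts described above and of why the empty bin is troublesome on a log scale. It counts occurrences per integer value, which is an assumption made for the example and not Incanter's or JFreeChart's actual binning code; `bin-counts` and `log10-count` are hypothetical names.

```clojure
;; Sketch only: counts per integer value, not Incanter's actual binning.
(def values [1 2 1 2 3 1 2 4 6])

(defn bin-counts
  "Returns a sorted map from each integer in [lo, hi] to how often it occurs in xs."
  [xs lo hi]
  (let [freqs (frequencies xs)]
    (into (sorted-map)
          (for [k (range lo (inc hi))]
            [k (get freqs k 0)]))))

;; The bin for 5 has a count of 0; the largest count is 3.
(def counts (bin-counts values 1 6))

(defn log10-count
  "log10 of a bin count; a count of 0 yields negative infinity."
  [n]
  (Math/log10 (double n)))

;; The empty bin maps to negative infinity, which no finite axis range can show.
(def empty-bin-height (log10-count (get counts 5)))
```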

Correct handling of logarithmic scales for histograms is actually a fairly important feature for a statistical package, IMO.

Thanks!

eigenhombre commented Nov 1, 2013

In case it's helpful, though it's a bit kludgy, the following works:

(let [values [1 2 1 2 3 1 2 4 6]
      nbins 6
      hist (histogram values :nbins nbins)
      axis (org.jfree.chart.axis.LogarithmicAxis. "")
      basedata (-> hist .getXYPlot .getDataset)
      values (for [i (range nbins)] (.getY basedata 0 i))
      maxvalue (apply max values)]
  (.setLowerBound axis 0.5)
  (.setUpperBound axis (inc maxvalue))
  (set-axis hist :y axis)
  (view hist))

Contributor

jakubholynet commented Dec 28, 2013

You raise two points that are worth discussing: 1) how to make log-axis easily available through the API so that people don't need to search for it, and 2) how to handle the display of missing bins. Let me focus on no. 2.

Since Incanter is only a wrapper around JFreeChart here, finding the best solution would require understanding JFC well, which I do not :-( However, after looking at its API and playing with it a little, I found this somewhat better workaround:

(let [values (concat [1 2 1 2 3 1 2 4] (repeat 100 6))
      hist (histogram values :nbins 6)
      axis (doto (log-axis)
             (.setSmallestValue 0.1))] ;; workaround: setSmallestValue > 0
    (set-axis hist :y axis)
    (view hist))

resulting in:
[Image: hist-with-bound01]

However, I still do not know what the preferred solution would be. Call setSmallestValue by default? Fix org.jfree.chart.axis.LogAxis to display 0 as 0 instead of log(0)? (That would be confusing, because we could no longer tell whether the actual value was 0 or 1, as both would be displayed as 0.) Or filter the dataset and remove the 0s from it before displaying?

Comments welcome.

PS: Notice that we use LogAxis, while JFC also has org.jfree.chart.axis.LogarithmicAxis.
PPS: I wish we used D3 instead of JFC; it has a much cleaner and more understandable API!
PPPS: I am considering making this possible: (log-axis :smallest-value 0.1)
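
To make two of the options in this comment concrete, here is a rough sketch in plain Clojure. It shows what filtering zero-count bins could look like, and why displaying 0 as 0 is ambiguous on a log scale. `drop-empty-bins` and `clamped-log10` are hypothetical helpers written for this example; neither exists in Incanter or JFreeChart.

```clojure
;; Sketch only: operates on a plain map of bin -> count, not on a JFreeChart dataset.
(def counts {1 3, 2 3, 3 1, 4 1, 5 0, 6 1})

(defn drop-empty-bins
  "Removes the entries whose count is zero, so log(0) is never computed."
  [bin->count]
  (into (sorted-map)
        (remove (fn [[_ c]] (zero? c)) bin->count)))

(defn clamped-log10
  "The 'display 0 as 0' idea: returns 0.0 for a zero count, log10 otherwise."
  [n]
  (if (zero? n)
    0.0
    (Math/log10 (double n))))

;; The bin for 5 is gone after filtering.
(def non-empty (drop-empty-bins counts))

;; log10(1) is 0.0, so a count of 0 and a count of 1 land on the same height.
(def ambiguous? (= (clamped-log10 0) (clamped-log10 1)))
```

Filtering avoids the ambiguity, at the cost of no longer representing the empty bin in the dataset at all.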

eigenhombre commented Jan 3, 2014

Thanks for the detailed reply on this.

In another project I set the minimum to 0.5 by default. I like the :smallest-value option to log-axis, but would go further and say that for histograms there should be a logarithmic option (something like (histogram :log-y true)) which sets smallest-value automatically on the axis to eliminate this sharp edge. This would at least cover the common use case of showing exponential and/or Gaussian distributions in a friendly way. What do you think?
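
As a sketch of what such an automatic default might compute, the helper below derives log Y axis bounds from integer bin counts. `log-y-bounds` is a hypothetical name, not an Incanter function. The lower bound of 0.5 is the value mentioned above, and the upper bound of one more than the largest count is an assumption borrowed from the earlier workaround.

```clojure
;; Sketch only: a hypothetical helper, not part of Incanter.
(defn log-y-bounds
  "Returns [lower upper] bounds for a log Y axis over integer bin counts.
   0.5 lies below the smallest non-zero count (1), so bins with a single
   entry still get a visible bar, and zero counts never reach log(0)."
  [counts]
  [0.5 (double (inc (apply max counts)))])

;; Counts from the example in this issue: bounds are 0.5 and 4.0.
(def example-bounds (log-y-bounds [3 3 1 1 0 1]))
```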

I definitely agree that D3 is much nicer to work with than JFC! :-)

jakubholynet pushed a commit to jakubholynet/incanter that referenced this issue Jan 9, 2014

Jakub Holy
Fix incanter#210 By setting smallest val on log axis
- so that values close to 0 do not make the axis go crazy

Ref: incanter#210

Contributor

jakubholynet commented Jan 9, 2014

I think that setting smallest-value by default is reasonable; pull request sent.

If family permits, I'd like to look into enhancing histogram. But I'd prefer it to be more generic than what you propose, so that other axis types can be chosen as well (is that meaningful?) and more options can be supplied.

Ex:

(histogram :y-axis :logarithmic) ; optional, minimal syntax, abbrev. for this below:
(histogram :y-axis [:type :logarithmic, :base 10, :int-ticks? true, :smallest-value 0.5]) ; full syntax w/ vector w/ type + opts
;; altern. design, less declarative, more flexible:
(histogram :y-axis (log-axis :base 10, :int-ticks? true, :smallest-value 0.5))

comments?
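
One way the first two variants could relate is sketched below in plain Clojure: the keyword form is expanded into the vector form, which is then turned into a single option map. `normalize-axis-spec` and `default-log-opts` are hypothetical names for this illustration only, and the defaults are an assumption taken from the example values above. Nothing here is existing Incanter API.

```clojure
;; Sketch only: hypothetical option handling for a proposed :y-axis argument.
(def default-log-opts
  ;; Assumed defaults, copied from the example values in the proposal.
  {:base 10, :int-ticks? true, :smallest-value 0.5})

(defn normalize-axis-spec
  "Turns the keyword form or the vector form of :y-axis into one option map."
  [spec]
  (cond
    ;; :logarithmic is shorthand for [:type :logarithmic]
    (keyword? spec) (normalize-axis-spec [:type spec])
    ;; [:type t, k v, ...] becomes a map; log axes get the defaults filled in
    (vector? spec)  (let [{axis-type :type :as opts} (apply hash-map spec)]
                      (if (= axis-type :logarithmic)
                        (merge default-log-opts opts)
                        opts))
    :else (throw (IllegalArgumentException.
                  (str "Unsupported axis spec: " (pr-str spec))))))

;; Explicit options override the assumed defaults.
(def custom (normalize-axis-spec [:type :logarithmic, :smallest-value 0.1]))
```

With this shape, the minimal syntax costs almost nothing to support once the vector form exists. The third variant would bypass the normalization entirely by accepting an axis object directly.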

eigenhombre commented Jan 9, 2014

+1.

John

Contributor

jakubholynet commented Jan 9, 2014

Thank you, John. Any comments on the proposed variants 1, 2, and 3? In particular, no. 2 or no. 3? If 2, is it worth implementing no. 1 as well?

eigenhombre commented Jan 9, 2014

Sorry, I thought you were proposing all three! :-) #3 is slightly more abstract so I like it a little better; #1 would definitely be good as well since I expect most people won't care about the additional options.

alexott closed this in 5a55a69 Jan 10, 2014

alexott added a commit that referenced this issue Jan 10, 2014
