fix hist limit issue for step* for both linear and log scale #1075

Closed
wants to merge 5 commits into
from

Conversation

Projects
None yet
6 participants
Contributor

piti118 commented Aug 12, 2012

Fix two closely related issue:
#1066

and
#196

lib/matplotlib/axes.py
@@ -7946,6 +7946,22 @@ def hist(self, x, bins=10, range=None, normed=False, weights=None,
x += 0.5*(bins[1]-bins[0])
if log:
+ #in the case of log scale there are three cases to consider for the y axis limit
+ #1) has zero or negative bin but other bins may be 0<n<1 due to weighting. we want positive minimum*0.1 to be the minimum
@WeatherGod

WeatherGod Aug 14, 2012

Member

Please keep within 80 characters line width as per PEP8.

+
+ if np.any(ndata>0):
+ if np.all(ndata[ndata!=0]>=1):
+ minimum = 0.1 # case 2)
@mdboom

mdboom Aug 14, 2012

Owner

Shouldn't this be log_base ** -1? (i.e. 0.1 only makes sense when the log base is 10).

@piti118

piti118 Aug 14, 2012

Contributor

Yes, but how do I access the base of log scale axis?

@mdboom

mdboom Aug 14, 2012

Owner

Ah, I see. I missed the fact that hist doesn't take a base as argument. hist is a bit of an anomaly -- it takes a 'log' argument, whereas most other things (except bar*) work by requiring a subsequent call to set_xscale or set_yscale, which of course take an optional "base". Making this also work that way would be a little bit of work, because this minimum adjustment stuff you've added here would have to be moved to the draw phase. Ideally, that's where I'd like to see this go, but I don't know if that's worth abandoning this fix, and we've have to maintain backward compatiblity anyway -- this "log" kwarg goes back at least 5 years, as far as I can tell.

We could either add an additional keyword "log_base", or extend "log" so it can be "True" (meaning base 10 as it does now), or a number, meaning a base. I would also recommend updating bar* to be in line with that.

However, I'm aware that none of this is premised on any real world use case -- just my own desire for consistency across uses of log scale, so I could be persuaded to leave this as-is by further discussion.

@piti118

piti118 Aug 14, 2012

Contributor

This might be kind of hard to set the minimum correctly if log base can be changed between to calls. Consider this

ax.hist( x1, log=10)
ax.hist(x2, log=2)

The first hist call would have no idea that eventually the base of log scale will be changed to 2.

But, I do not think it would be hard for use to do another call to set_ylim though. The histogram will be painted all the way down (see line 7892) anyway.

@mdboom

mdboom Aug 14, 2012

Owner

Yes -- that, in a nutshell, is why I'm not a fan of "log" being a kwarg to this method. You may want to hold of on implementing any sort of fix for this issue until we discuss it -- it may be best to just leave as-is.

@efiring, @WeatherGod, @pelson: any thoughts?

@WeatherGod

WeatherGod Aug 17, 2012

Member

w.r.t. not being able to access the axes's log base, we are assuming base 10 throughout this part of the code, and we have always done so before. We could always look to future PRs to address the handling of arbitrary log bases. I see this more as a good bug fix to an outstanding issue.

I guess it would be smart, though, for future developers, to somehow comment the spots that are assuming base 10 so that they know to change it if/when we extend the interface in the future to accept arbitary log bases.

@mdboom

mdboom Aug 20, 2012

Owner

I tend to agree with @WeatherGod here: let's leave this assuming base 10 for now.

Member

WeatherGod commented Aug 14, 2012

log in hist has always been an odd duck. It tries to be like the bar* functions, but not quite. Note, however, that unlike the bar* functions, a log kwarg in hist actually has a purpose besides convenience. While one could pre-transform their data for the bar* functions, one can't with hist because what is plotted is the result of an internal aggregation calculation.

There is another PR that I am waiting for a few last polishings that also addresses an issue for log scaling of the results for hist2d(). Might be a good idea to make sure both functions support the same "log" kwarg inputs.

@@ -7946,7 +7946,29 @@ def hist(self, x, bins=10, range=None, normed=False, weights=None,
x += 0.5*(bins[1]-bins[0])
if log:
- y[0],y[-1] = minimum, minimum
+ #in the case of log scale there are three cases
@WeatherGod

WeatherGod Aug 17, 2012

Member

Wrap these comment lines to less than 75 characters

@pelson

pelson Aug 19, 2012

Member

75? I thought we did 80?

@WeatherGod

WeatherGod Aug 20, 2012

Member

Actually, I was wrong, in PEP8, docstrings are recommended to be at most 72 characters.

lib/matplotlib/axes.py
+ #to consider for the y axis limit
+ #1) has zero or negative bin but other bins may be 0<n<1
+ # due to weighting or normalization. We want
+ # (positive minimum)*0.1 to be the minimum
@WeatherGod

WeatherGod Aug 17, 2012

Member

Don't you mean np.any() here?

@piti118

piti118 Aug 17, 2012

Contributor

What do you mean?

@WeatherGod

WeatherGod Aug 17, 2012

Member

strange, my line comments got applied to the wrong lines.

@WeatherGod

WeatherGod Aug 17, 2012

Member

Looks like you have since fixed the np.any() I thought you needed.

+ minimum = np.min(ndata[ndata>0])*0.1 #case 1)
+ else:
+ minimum = 0.1 #case 3)
+
if orientation == 'horizontal':
@WeatherGod

WeatherGod Aug 17, 2012

Member

Set up your editor to strip trailing white spaces.

@WeatherGod

WeatherGod Aug 17, 2012

Member

Looks like you have since fixed the training white spaces.

@@ -7957,7 +7979,11 @@ def hist(self, x, bins=10, range=None, normed=False, weights=None,
for m, c in zip(n, color):
y[1:-1:2], y[2::2] = m, m
if log:
- y[y<minimum]=minimum
+ tiniest = np.finfo(np.float).tiny
@WeatherGod

WeatherGod Aug 17, 2012

Member

we need some of this logic here in the case of multiple calls to hist() or mixed plotting. If there are existing bounds, they should be used (at least, that is what this logic attempts to do. It might need some serious updating, though).

@piti118

piti118 Aug 17, 2012

Contributor

Yes, that's the part I'm not sure about the inner working. This logic seems to start around line 7999.

So, as far as I understand how limit work for multiple plots. The existing desired range is store in self.dataLim.intervalx/y, right? So, all i have to do is make a super range that contains the new limit and old limit, right?

+ ax.hist(d,bins=40,weights=w,histtype='stepfilled',log=True)
+
+
+@image_comparison(baseline_images=['hist2d'])
@WeatherGod

WeatherGod Aug 17, 2012

Member

I think you accidentally added these hist2d tests to this PR. Would be more appropriate over in the hist2d PR... with images.

@piti118

piti118 Aug 17, 2012

Contributor

hmmm bad rebasing I think. It's actually in master or official repo. If I make a commit to remove them will it be gone? #826

@WeatherGod

WeatherGod Aug 17, 2012

Member

Don't bother. When we merge, it should see that the same change has shown
up twice and handle it ok. Odd rebasing indeed...

Member

WeatherGod commented Aug 17, 2012

Great job with the additional tests. Only one request here about them. Think we could have corresponding tests for the horizontal case?

@@ -7970,22 +7996,14 @@ def hist(self, x, bins=10, range=None, normed=False, weights=None,
# adopted from adjust_x/ylim part of the bar method
if orientation == 'horizontal':
- xmin0 = max(_saved_bounds[0]*0.9, minimum)
+ xmin = minimum
@WeatherGod

WeatherGod Aug 17, 2012

Member

My comments above wrt multiple calls to hist() goes here.

@piti118

piti118 Aug 17, 2012

Contributor

This is kind of the part I don't fully understand. What function does 0.9 serve and where should I get the old bound. What's the difference between self.datalim.bound and self.datalim.intervalx/y?

@WeatherGod

WeatherGod Aug 17, 2012

Member

I don't have time for a full explanation as I am about to head off on
vacation, but perhaps I can provide some clues for you. I think this code
part has been neglected. Other functions like plot() and scatter() utilize
pads for the data limits so that the axes wouldn't be extremely tight
around the data being plotted. This value used to be a hard-coded 0.05 on
each endside (for a total of 0.1). In plot() and scatter() (and possibly
bar()) we now handle this a lot more intelligently, but I think hist() was
left behind in this regard.

Note that data limits don't necessarially mean the same as view limits. I
am not sure what the distinction is between data limits and data interval
is, though.

I hope this helps!
Ben Root

Contributor

lpsinger commented Jan 1, 2013

Does this pull request need any more work? histtype='stepfilled' also produces pretty unsightly results on master.

Contributor

piti118 commented Jan 1, 2013

I think it needs testing on multiple plots being drawn together. I'm not 100% sure what _save_bounds does.

Member

dmcdougall commented Jan 27, 2013

I just saw this. #1684 was merged and #196 was closed. Does this PR also need to be merged? I.e., does it fix a separate issue not addressed in #1684?

@ghost ghost assigned dmcdougall Jan 27, 2013

Contributor

piti118 commented Jan 27, 2013

They seems to fix the same thing. I'll close this.

@piti118 piti118 closed this Jan 27, 2013

Member

dmcdougall commented Jan 27, 2013

@piti118 Judging by the title of this PR, this patch seems to address behaviour that can be seen when using a linear scale rather than a log scale. PR #1684 only addresses an issue with log scaling. Therefore it might be the case that this may need to be reopened.

Thanks for taking the time and the effort to write a patch for the codebase, @piti118.

Again, if you see unexpected behaviour with linear scaling then please feel free to re-open this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment