Erroneous gaps in the histogram when column type is integer #21652

Closed
qgib opened this issue Oct 15, 2015 · 13 comments
Labels: Bug, Feedback, Symbology

Comments

@qgib
Contributor

qgib commented Oct 15, 2015

Author Name: Johannes Kroeger (Johannes Kroeger)
Original Redmine Issue: 13614
Affected QGIS version: master
Redmine category: symbology


I noticed erroneous gaps in the histogram in many cases.

Attached are examples from Natural Earth's countries name_len.

Also attached is a comparison between a histogram in R and the same data in QGIS. I believe both use the same number of bins: I used Freedman-Diaconis in R, and I think QGIS uses that too.
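For reference, the Freedman-Diaconis rule sets the bin width to 2 · IQR / n^(1/3) and derives the bin count from the data range. The following is a minimal Python sketch of that rule only; whether QGIS actually uses it is the reporter's guess and is not confirmed in this thread. The function name `freedman_diaconis_bins` and the `sample` data are hypothetical, chosen for illustration.

```python
import math
import statistics


def freedman_diaconis_bins(values):
    """Suggest a bin count using the Freedman-Diaconis rule."""
    # Quartiles by linear interpolation between data points ("inclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Heavily tied data can have a zero IQR, in which case the rule
        # yields no usable bin width.
        raise ValueError("IQR is zero; the rule gives no usable bin width")
    width = 2 * iqr / len(values) ** (1 / 3)
    return math.ceil((max(values) - min(values)) / width)


# Hypothetical sample data, not the Natural Earth column.
sample = list(range(1, 101))
print(freedman_diaconis_bins(sample))
```

Note that the rule knows nothing about whether the data is integer-valued, so it can suggest more bins than there are distinct values.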

I am on ae1250b on Archlinux, not totally up to date with master.


@qgib
Contributor Author

qgib commented Oct 15, 2015

Author Name: Johannes Kroeger (Johannes Kroeger)


  • 9197 was configured as histogram_gaps.jpg

@qgib
Contributor Author

qgib commented Oct 15, 2015

Author Name: Nyall Dawson (@nyalldawson)


Can you share your data? Just a CSV with the values from the column used is enough


  • status_id was changed from Open to Feedback

@qgib
Contributor Author

qgib commented Oct 16, 2015

Author Name: Johannes Kroeger (Johannes Kroeger)


Open the Shapefile from http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip

Choose a graduated style and classify by "name_len". Switch to the Histogram tab and play around with the number of bins.

@qgib
Contributor Author

qgib commented Nov 10, 2015

Author Name: Nyall Dawson (@nyalldawson)


  • subject was changed from Erroneous gaps in the histogram to Erroneous gaps in the histogram when column type is integer

@qgib
Contributor Author

qgib commented Dec 19, 2015

Author Name: Giovanni Manghi (@gioman)


  • category_id was configured as Symbology
  • status_id was changed from Feedback to Open

@qgib
Contributor Author

qgib commented Apr 30, 2017

Author Name: Giovanni Manghi (@gioman)


  • regression was configured as 0
  • easy_fix was configured as 0

@qgib
Contributor Author

qgib commented Mar 9, 2019

Author Name: Giovanni Manghi (@gioman)


End of life notice: QGIS 2.18 LTR

Source:
http://blog.qgis.org/2019/03/09/end-of-life-notice-qgis-2-18-ltr/


  • resolution was changed from to end of life
  • status_id was changed from Open to Closed

@qgib qgib closed this as completed Mar 9, 2019
@qgib qgib added the Bug and Symbology labels May 25, 2019
@kannes
Contributor

kannes commented Jul 7, 2019

This is still the case in 3.8, please re-open.

@gioman gioman reopened this Jul 8, 2019
@pjshelton

pjshelton commented Jan 3, 2021

Using the supplied data, I have the following comments to make.

NAME_LEN is count data: it counts the number of characters in the name. It is therefore, by definition, an integer.

The histogram shows two things: the colour coding by class, and the count of data points per histogram bin.
Concentrating on the histogram bins, they are created so as to fit the requested number of bins between the extremes of the data.

The data we are looking at here ranges from 4 to 25. There are only 22 distinct values the data can take.

When you ask for enough bins (more than 42, so that each bin is narrower than 0.5), every pair of bins whose ranges bracket an integer value is separated by at least one bin whose range brackets no integer. For a bin that does not bracket an integer value, the count of data points has to be zero, and you get a gap in the histogram.
image

For this dataset you have to reduce the number of bins to 22 or fewer before every bin brackets an integer.
image

For this dataset the histogram behaves exactly as expected (so long as you stay above the number of classes in the classification).
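The arithmetic above can be checked with a minimal Python sketch. It assumes equal-width bins spanning the minimum to the maximum of the data, with the maximum counted in the last bin; that binning scheme is an assumption for illustration and is not taken from the QGIS source. `name_len` here is a stand-in containing each integer from 4 to 25 once, not the real Natural Earth column, and the helper names are hypothetical.

```python
def equal_width_counts(values, n_bins):
    """Count values per bin for n_bins equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def empty_bins(values, n_bins):
    """Number of bins that receive no values, i.e. gaps in the histogram."""
    return sum(1 for c in equal_width_counts(values, n_bins) if c == 0)


# Stand-in data (assumption): every integer from 4 to 25, once each.
name_len = list(range(4, 26))

for n in (21, 22, 23, 44):
    print(n, "bins ->", empty_bins(name_len, n), "empty")
```

With 22 distinct integer values, no more than 22 bins can ever be non-empty, so any bin count above 22 must produce at least one gap regardless of how the real values are distributed.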

@gioman gioman added the Feedback label Jan 4, 2021
@kannes
Contributor

kannes commented Jan 4, 2021

Ha, that makes a lot of sense. Thank you so much!

Sadly I do not remember what dataset I used for the other example. Your explanation matches what we see for name_len, but the BEVENT example still seems worth investigating. There might be a similarly logical explanation, but the gaps in a normal-ish distribution like that surprised me.

@gioman
Contributor

gioman commented Jan 4, 2021

@kannes closing?

@kannes
Contributor

kannes commented Jan 4, 2021

Let me generate some synthetic data and test first.

@kannes
Contributor

kannes commented Jan 4, 2021

Yes, any gaps I see even with float data can be explained logically.

  • I created 1000 float values between 0 and 1000 in a normal distribution
  • I looked at the data in the histogram and increased the number of classes until there was a gap
    image
  • I looked at the breaks around that gap and at the data

...
1870.3386544665275
1988.5187405557824
2020.7498321992264
...

  • Where QGIS shows the gap, there are also no values in the data
  • I increased the class number even more to get more gaps and looked at another one, same result
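The steps above can be reproduced outside QGIS with a rough Python sketch. The distribution parameters (mean 500, standard deviation 150), the seed, the starting bin count, and the helper name `find_gaps` are all assumptions for illustration; the original test data is not available, and the equal-width binning is an assumed scheme rather than the QGIS implementation.

```python
import random


def find_gaps(values, n_bins):
    """Return (start, end) ranges of equal-width bins that contain no values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width)
            for i, c in enumerate(counts) if c == 0]


random.seed(0)  # arbitrary seed, only to make the run repeatable
values = [random.gauss(500, 150) for _ in range(1000)]

# Increase the number of bins until the first gap appears.
n_bins = 10
while not find_gaps(values, n_bins):
    n_bins += 1

# For each gap, confirm that the data really has no values in that range.
for start, end in find_gaps(values, n_bins):
    inside = [v for v in values if start <= v < end]
    print(f"{n_bins} bins: gap {start:.2f}..{end:.2f} holds {len(inside)} values")
```

The loop always terminates, because once there are more bins than values at least one bin must be empty.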

So yes, there is no issue. It was a PEBKAC on my side. The other example data must have had some funny characteristics with those regular gaps (or some code was fixed in the meantime).

Thank you @pjshelton for looking into this!

@gioman, please close :)
