Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raster Singleband Pseudocolor with Quantile mode inconsistencies #35465

Open
PedroVenancio opened this issue Mar 30, 2020 · 25 comments · Fixed by #35718 or #35852
Open

Raster Singleband Pseudocolor with Quantile mode inconsistencies #35465

PedroVenancio opened this issue Mar 30, 2020 · 25 comments · Fixed by #35718 or #35852
Assignees
Labels
Bug Either a bug report, or a bug fix. Let's hope for the latter! Rasters Related to general raster layer handling (not specific data formats) Symbology Related to vector layer symbology or renderers

Comments

@PedroVenancio
Copy link
Contributor

PedroVenancio commented Mar 30, 2020

I'm seeing some inconsistencies in classes breaks, when using Quantile mode on a raster with singleband pseudocolor symbology.

To reproduce:

  1. Download this sample raster and load it on QGIS:
    https://cld.pt/dl/download/7161caab-5ebe-4e94-8c4a-574a80821a97/perigosidade.tif

  2. Properties -> Symbology -> Render type: Singleband pseudocolor -> Interpolation: Discrete -> Mode: Quantile -> Classes: 5. You will get:
    imagem

  3. Run GRASS r.quantile from Processing on this raster, with Number of Quantiles: 5. Leave all other options as default. You will get:

0:20.000000:144.000000
1:40.000000:720.000000
2:60.000000:2400.000000
3:80.000000:4500.000000 

These are the correct classes breaks, and so the symbology should be:
imagem

I have done some tests to be sure about the correct classification.

[A] I've run Raster Pixels to Points to get the center point of all pixels. If you want to check, here is the output geopackage:
https://cld.pt/dl/download/3764537c-2504-4326-be77-74e26c0d2a49/perigosidade.gpkg

Load it in QGIS and go to Properties -> Symbology -> Graduated -> Mode: Equal Count (Quantile) -> Classes: 5 -> Classify. You will get:
imagem

[B] I've run GRASS r.stats to get all pixel values. Here is the CSV with all values:
https://cld.pt/dl/download/b730c614-40b3-40b8-887f-2b985e753d96/perigosidade_r_stats.csv

I've tested this csv in R:

> table_perigosidade <- read.csv("D:\\Testes\\quantile\\perigosidade_r_stats.csv")
> quantile(table_perigosidade$raster_value, probs = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0))
   0%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100% 
   40    96   144   288   720  1632  2400  3000  4500  6000 46721

[C] Then I extracted the absolute frequencies of raster values:

write.csv(count(table_perigosidade, 'raster_value'), file = "D:\\Testes\\quantile\\perigosidade_r_stats_abs_freq.csv")

https://cld.pt/dl/download/af265a0a-024f-40c4-a04c-77eaa9f09173/perigosidade_r_stats_abs_freq.csv

and did the manual analysis of it, getting the cumulative relative frequencies and the result is the same:
https://cld.pt/dl/download/9b014dbd-f991-42e9-880e-baec555a6d18/perigosidade_r_stats_abs_freq_cum.ods

So, it seems that Singleband Pseudocolor Quantile is not returning the correct breaks.

@PedroVenancio PedroVenancio added the Bug Either a bug report, or a bug fix. Let's hope for the latter! label Mar 30, 2020
@gioman gioman added Symbology Related to vector layer symbology or renderers Rasters Related to general raster layer handling (not specific data formats) labels Mar 30, 2020
@PedroVenancio
Copy link
Contributor Author

This issue is not new, I see that it was already present in QGIS 2.18.28.

@elpaso elpaso self-assigned this Apr 11, 2020
elpaso added a commit to elpaso/QGIS that referenced this issue Apr 11, 2020
Fixes qgis#35465 but it is still and arbitrary value,
better approaches would require to calculate other
dispersion indexes and they seem impractical (inefficient)
in this case.

See for example: https://en.wikipedia.org/wiki/Freedman%E2%80%93Diaconis_rule
@PedroVenancio
Copy link
Contributor Author

Hi @elpaso

Thanks for the intermediate solution, however, I've tested it in master, and it still does not solve the sample I've put here.

Do you agree it is better to keep this issue open until it can be completely solved?

Best regards

@elpaso
Copy link
Contributor

elpaso commented Apr 19, 2020

I tested your same dataset and the problem was solved.

@elpaso
Copy link
Contributor

elpaso commented Apr 19, 2020

But let me check it again ...

@elpaso
Copy link
Contributor

elpaso commented Apr 19, 2020

@PedroVenancio you are right, there was another change required to get exact results, which is to not restrict the sample size to 25000 like we are doing now, I removed that before the commit because I was (wrongly) convinced that the other fix was sufficient to fix this issue.
I will investigate if there is a possibility to increase the sample size without affecting performance.

This is what I get if the sample size is augmented ten times (set to 250000*10 )

immagine

@elpaso elpaso reopened this Apr 19, 2020
@elpaso
Copy link
Contributor

elpaso commented Apr 19, 2020

Ok I tested it with my biggest rasters and it obviously freezes QGIS unless a maximum is set. A better approach for exact results would involve a more complex solution with a separate thread.

elpaso added a commit to elpaso/QGIS that referenced this issue Apr 19, 2020
@PedroVenancio
Copy link
Contributor Author

This is what I get if the sample size is augmented ten times (set to 250000*10 )

immagine

Hi @elpaso

Yes, this is the right answer for my sample! But if that increase freezes QGIS with huge rasters, it can't be applied.

Quantile for vectors code is newer, and it seems it follows another aproach:

https://github.com/qgis/QGIS/blob/d147a8dffa9e4a143a7ab0016099c3eaabde346b/src/core/classification/qgsclassificationquantile.cpp

Can't be somehow used in rasters? Or the problem is upstream, when collecting the raster statistics?

Thank you very much Alessandro!

@elpaso
Copy link
Contributor

elpaso commented Apr 22, 2020

@PedroVenancio we (me and @gioman) decided to go with the current solution (which is to increase both bin count and sample size). It does not impact performances because there is still a limit, even if that is way bigger than it was before (the original code was written 10 years ago, when RAM and CPUs were much more limited than today).

elpaso added a commit to elpaso/QGIS that referenced this issue Apr 22, 2020
Fixes qgis#35465 but it is still and arbitrary value,
better approaches would require to calculate other
dispersion indexes and they seem impractical (inefficient)
in this case.

See for example: https://en.wikipedia.org/wiki/Freedman%E2%80%93Diaconis_rule
elpaso added a commit to elpaso/QGIS that referenced this issue Apr 22, 2020
elpaso added a commit to elpaso/QGIS that referenced this issue Apr 22, 2020
Fixes qgis#35465 but it is still and arbitrary value,
better approaches would require to calculate other
dispersion indexes and they seem impractical (inefficient)
in this case.

See for example: https://en.wikipedia.org/wiki/Freedman%E2%80%93Diaconis_rule
elpaso added a commit to elpaso/QGIS that referenced this issue Apr 22, 2020
@PedroVenancio
Copy link
Contributor Author

Perfect @elpaso ! I've tested here and it works!

Do you think this changes can be backported to 3.12 and 3.10?

@gioman
Copy link
Contributor

gioman commented Apr 22, 2020

Do you think this changes can be backported to 3.12 and 3.10?

@PedroVenancio backports done minutes ago.

@PedroVenancio
Copy link
Contributor Author

@elpaso @gioman

Can you please test quantile in this raster with 3.14?

PERIGOSIDADE.zip

It does not show any class to me, both in Linux or Win.

With QGIS 3.10.7 it works as expected.

@PedroVenancio
Copy link
Contributor Author

Another issue I'm seeing now, with a Float64 raster:

RISCO.zip

With this raster, quantiles are not correctly calculated (in this case, in QGIS 3.10.7, as it does not work in 3.14).

Rounding that raster and converting to a Int32, it already calculate quantiles correctly (image on the right):

quantile_float64

Using Raster Pixels to Points algorithm to export Float64 pixel values to points, and calculating quantiles in the resulting vector layer, the output is the correct (as seen on the left image).

Running r.quantile in the Float64 raster, the output is correct:

0:20.000000:120.000000
1:40.000000:1320.000000
2:60.000000:2400.000000
3:80.000000:4500.000000 

Also correct exporting to CSV and calculating in R:

> tabela_risco <- read.csv("D:\\RISCO_RASTER_PIXELS_TO_POINTS.csv")
> quantile(tabela_risco$VALUE, probs = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0))
      0%      10%      20%      30%      40%      50%      60%      70%      80%      90%     100% 
    36.0     72.0    120.0    792.0   1320.0   1920.0   2400.0   3000.0   4500.0   6000.0 218030.4

So, it seems that quantile classification is still failing with some kind of data.

@gioman
Copy link
Contributor

gioman commented Jun 27, 2020

@elpaso @gioman

Can you please test quantile in this raster with 3.14?

@PedroVenancio @elpaso confirmed, new ticket filed here: #37448

@gioman
Copy link
Contributor

gioman commented Jun 27, 2020

Another issue I'm seeing now, with a Float64 raster:

@PedroVenancio and #37449

rouault added a commit to rouault/QGIS that referenced this issue Sep 15, 2020
Commits 98261bc and ebdb546
for qgis#35465 have changed the logic to compute the default number of bins
for the histogram. While appropriate in the context of qgis#35465, that tends
to augment the number of bins, which is inappropriate for the raster histogram
chart, where beyond 1000, the chart becomes unreadable.

However I think the logic for plotting the raster histogram should probably be
revised, but I'm not clear how: ask the user for a number of bins and/or take
into account the width in pixels of the chart to determine the number of bins
(should that depend if the user has zoomed in... ?)
rouault added a commit that referenced this issue Sep 17, 2020
[GUI] Raster histogram: restore behavior before #35465 changes
qgis-bot pushed a commit to qgis-bot/QGIS that referenced this issue Sep 17, 2020
@PedroVenancio
Copy link
Contributor Author

Hi @elpaso and @gioman

It keeps failing with some rasters. See this one:

Raster_Quantiles

Here is the project with the sample raster:

Raster_Quantiles.zip

Quantiles calculated from Vector Symbology are always correct, just like r.quantile output:

0:20.000000:1.500000
1:40.000000:2.625000
2:60.000000:5.250000
3:80.000000:13.500000

@gioman gioman reopened this Sep 28, 2020
@elpaso
Copy link
Contributor

elpaso commented Sep 28, 2020

Maybe related to #39058

@elpaso
Copy link
Contributor

elpaso commented Sep 28, 2020

@PedroVenancio what QGIS version are you testing?

@PedroVenancio
Copy link
Contributor Author

@PedroVenancio what QGIS version are you testing?

@elpaso I'm testing with

QGIS version 3.15.0-Master QGIS code revision 8d2a0d1
Compiled against Qt 5.11.2 Running against Qt 5.11.2
Compiled against GDAL/OGR 3.2.0dev Running against GDAL/OGR 3.2.0dev
Compiled against GEOS 3.8.1-CAPI-1.13.3 Running against GEOS 3.8.1-CAPI-1.13.3
Compiled against SQLite 3.29.0 Running against SQLite 3.29.0
PostgreSQL Client Version 11.5 SpatiaLite Version 4.3.0
QWT Version 6.1.3 QScintilla2 Version 2.10.8
Compiled against PROJ 7.2.0 Running against PROJ Rel. 7.2.0, November 1st, 2020
OS Version Windows 10 (10.0) This copy of QGIS writes debugging output.

and

QGIS version 3.14.16-Pi QGIS code revision 2b2f966
Compiled against Qt 5.11.2 Running against Qt 5.11.2
Compiled against GDAL/OGR 3.2.0dev Running against GDAL/OGR 3.2.0dev
Compiled against GEOS 3.8.1-CAPI-1.13.3 Running against GEOS 3.8.1-CAPI-1.13.3
Compiled against SQLite 3.29.0 Running against SQLite 3.29.0
PostgreSQL Client Version 11.5 SpatiaLite Version 4.3.0
QWT Version 6.1.3 QScintilla2 Version 2.10.8
Compiled against PROJ 7.2.0 Running against PROJ Rel. 7.2.0, November 1st, 2020
OS Version Windows 10 (10.0) This copy of QGIS writes debugging output.

and I'm getting the same results.

And I also believe that #39058 is somehow related.

@elpaso
Copy link
Contributor

elpaso commented Sep 30, 2020

This always the same issue: quantile for rasters is calculated on an histogram, which uses a sample of values and not the whole raster and a limited number of bins.

Unless we re-implement it completely using an exact approach similar to the one used by vector layers and GRASS (and like I've done with unique/paletted) you will always get approximated results, that might lead to a wrong classification in some cases.

Differently from paletted/quantile, where the calculations were already implemented in a separate thread, for singleband/pseudocolr that is not the case, quantile requires a non-trivial implementation of a new class to gather the values and build the histogram.

I'll see what I can do but I cannot make any promise.

@PedroVenancio
Copy link
Contributor Author

I'll see what I can do but I cannot make any promise.

Great @elpaso , thank you very much for your effort!

@elpaso
Copy link
Contributor

elpaso commented Jan 19, 2021

@PedroVenancio can we close this?

@PedroVenancio
Copy link
Contributor Author

Hi @elpaso

Do you think this needs a complete overhaul of Singleband pseudocolor implementation, right? Yes, I think we can close it, at least until we can think on a way to support that re-implementation.

Do you agree @gioman ?

@gioman
Copy link
Contributor

gioman commented Jan 19, 2021

@PedroVenancio can we close this?

@elpaso @PedroVenancio I re-read this ticket and other related. To me there is still an issue (the one described at the end of this ticket), so I don't see why close it. I agree that raster symbology should be overhauled, but until then this do not seems fully solved. Just my opinion @elpaso feel free to close.

@elpaso
Copy link
Contributor

elpaso commented Jan 20, 2021

@PedroVenancio @gioman the way I see it is that in order to get exact results from a raster classification we cannot use samples but read the whole raster.

Due to the size that a typical raster can get we cannot do this in the GUI thread without freezing QGIS, this means to take a different approach and probably split the classification methods to add a list of "exact" (and potentially very slow) methods.

I actually have got an idea and I've started an implementation back in october that exposes a raster layer as a vector layer and allows to use all vector classification methods already implemented for vectors in QGIS for free.

But...

  1. this has to executed in a separate thread
  2. requires more tests and development
  3. the GUI needs to be updated to make place for these new methods and to allow for cancellation

so, the bottom line is that this is quite a big refactoring that would probably classify as a feature request.

I would be more than happy to work on this topic because I've already spent quite a lot of time researching and experimenting on this but I cannot do it without funds and it does not seem fair to me to use all the bugfixing funds for this development.

There is another potentially promising approach to speed up these calculations which is to use the GPU with OpenCL or OpenGL compute shaders but that is also quite a refactoring.

IMHO the route to go is a QEP followed by a grant, a crowdfunding, a generous donor or a combination of these.

@gioman
Copy link
Contributor

gioman commented Jan 20, 2021

IMHO the route to go is a QEP followed by a grant, a crowdfunding, a generous donor or a combination of these.

@elpaso I will be more than happy to chip in. And probably there are others interested. Ping me privately so we can get an idea of the amount that is necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Either a bug report, or a bug fix. Let's hope for the latter! Rasters Related to general raster layer handling (not specific data formats) Symbology Related to vector layer symbology or renderers
Projects
None yet
3 participants