Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serious problem with rasters statistics calculated by QGIS with estimated option #27345

Closed
qgib opened this issue Jul 31, 2018 · 7 comments
Closed
Labels
Bug Either a bug report, or a bug fix. Let's hope for the latter! Crash/Data Corruption High Priority Rasters Related to general raster layer handling (not specific data formats)

Comments

@qgib
Copy link
Contributor

qgib commented Jul 31, 2018

Author Name: Pedro Venâncio (Pedro Venâncio)
Original Redmine Issue: 19517
Affected QGIS version: 3.3(master)
Redmine category:rasters
Assignee: Matthias Kuhn


I think the problem described in these tickets can be more serious than just a visualization question: #20181, #22806 and #22789

Please try the following workflow.

  1. Download this sample raster:
    https://cld.pt/dl/download/de81b6ce-38a8-4539-be16-24a3e3ef72d1/perigosidade_int.tif

  2. Run gdalinfo on it and see the output:

gdalinfo perigosidade_int.tif

Driver: GTiff/GeoTIFF
Files: perigosidade_int.tif
Size is 1039, 1701
Coordinate System is:
PROJCS["ETRS89 / Portugal TM06",
(...)
Origin = (73626.458811498901923,145193.972505581419682)
Pixel Size = (25.002694985400002,-25.002694985400002)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (   73626.459,  145193.973) (  7d15'30.17"W, 40d58'21.05"N)
Lower Left  (   73626.459,  102664.388) (  7d15'48.20"W, 40d35'22.48"N)
Upper Right (   99604.259,  145193.973) (  6d56'59.28"W, 40d58'11.13"N)
Lower Right (   99604.259,  102664.388) (  6d57'23.68"W, 40d35'12.70"N)
Center      (   86615.359,  123929.180) (  7d 6'25.36"W, 40d46'47.22"N)
Band 1 Block=1039x3 Type=Int16, ColorInterp=Gray
  NoData Value=32767

  1. Then run gdalinfo with stats option, and the output is:
gdalinfo -stats perigosidade_int.tif

Driver: GTiff/GeoTIFF
Files: perigosidade_int.tif
Size is 1039, 1701
Coordinate System is:
PROJCS["ETRS89 / Portugal TM06",
(...)
Origin = (73626.458811498901923,145193.972505581419682)
Pixel Size = (25.002694985400002,-25.002694985400002)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (   73626.459,  145193.973) (  7d15'30.17"W, 40d58'21.05"N)
Lower Left  (   73626.459,  102664.388) (  7d15'48.20"W, 40d35'22.48"N)
Upper Right (   99604.259,  145193.973) (  6d56'59.28"W, 40d58'11.13"N)
Lower Right (   99604.259,  102664.388) (  6d57'23.68"W, 40d35'12.70"N)
Center      (   86615.359,  123929.180) (  7d 6'25.36"W, 40d46'47.22"N)
Band 1 Block=1039x3 Type=Int16, ColorInterp=Gray
  Minimum=2.000, Maximum=835.000, Mean=37.025, StdDev=71.526
  NoData Value=32767
  Metadata:
    STATISTICS_MAXIMUM=835
    STATISTICS_MEAN=37.024768992184
    STATISTICS_MINIMUM=2
    STATISTICS_STDDEV=71.52569727799

  1. gdalinfo also creates a XML file with statistics info in the file folder:
<PAMDataset>
  <PAMRasterBand band="1">
    <Metadata>
      <MDI key="STATISTICS_MAXIMUM">835</MDI>
      <MDI key="STATISTICS_MEAN">37.024768992184</MDI>
      <MDI key="STATISTICS_MINIMUM">2</MDI>
      <MDI key="STATISTICS_STDDEV">71.52569727799</MDI>
    </Metadata>
  </PAMRasterBand>
</PAMDataset>

  1. After this, delete the XML file created by gdalinfo.

  2. Load the perigosidade_int.tif raster file in QGIS.

  3. You don't need to do anything inside QGIS, just remove the raster from QGIS TOC.

  4. If you go to the folder where the raster is stored, QGIS created a new XML file with:

<PAMDataset>
  <PAMRasterBand band="1">
    <Histograms>
      <HistItem>
        <HistMin>1.500685871056241</HistMin>
        <HistMax>730.4993141289438</HistMax>
        <BucketCount>729</BucketCount>
        <IncludeOutOfRange>0</IncludeOutOfRange>
        <Approximate>1</Approximate>
        <HistCounts>(...)</HistCounts>
      </HistItem>
    </Histograms>
    <Metadata>
      <MDI key="STATISTICS_MAXIMUM">730</MDI>
      <MDI key="STATISTICS_MEAN">34.375557670573</MDI>
      <MDI key="STATISTICS_MINIMUM">2</MDI>
      <MDI key="STATISTICS_STDDEV">67.853826109685</MDI>
    </Metadata>
  </PAMRasterBand>
</PAMDataset>

  1. So, QGIS marks the STATISTICS_MAXIMUM as 730.

  2. Even defining in Settings -> Options -> Rendering -> Rasters -> Limits (minimum/maximum): Minimum/Maximum

QGIS always uses 730, because the Accuracy is, by default, "Estimate (faster)", and this option is not configurable at Options, as said in the related tickets.

But what is really serious here, is that as soon as the XML file is created, Processing tools use these statistics for the calculations.

  1. For instance, if run r.quantile with "generate recode rules" flag, the output is:
2.000000:6.000000:1
6.000000:8.000000:2
8.000000:12.000000:3
12.000000:20.000000:4
20.000000:730.000000:5

  1. Classifying the raster based on these rules with r.recode leaves pixels with value 835 as NULL.

And every subsequent process gives errors.

The question is.. how I never had realized this problem? The fact is that QGIS only generates the XML file when the raster layer is removed from QGIS project. And so, if Processing modules are runned using the layers loaded in TOC, there are no problems.

If these modules are runned from the python console (processing.runalg / processing.run), from a plugin or using raster layers that are not loaded in QGIS, the problem show up.

This was tested in QGIS 2.18.22 and 3.2.1 (OSGeo4W64).

It seems a really serious problem to me.

@qgib
Copy link
Contributor Author

qgib commented Aug 5, 2018

Author Name: Giovanni Manghi (@gioman)


I'm tagging this as "corrupts" data, because wrong results are never acceptable.


  • crashes_corrupts_data was changed from 0 to 1

@qgib
Copy link
Contributor Author

qgib commented Aug 10, 2018

Author Name: Jürgen Fischer (@jef-n)


  • description was changed from I think the problem described in these tickets can be more serious than just a visualization question:

https://issues.qgis.org/issues/11974
https://issues.qgis.org/issues/14853
https://issues.qgis.org/issues/14835

Please try the following workflow.

  1. Download this sample raster:
    https://cld.pt/dl/download/de81b6ce-38a8-4539-be16-24a3e3ef72d1/perigosidade_int.tif

  2. Run gdalinfo on it and see the output:

gdalinfo perigosidade_int.tif

Driver: GTiff/GeoTIFF
Files: perigosidade_int.tif
Size is 1039, 1701
Coordinate System is:
PROJCS["ETRS89 / Portugal TM06",
(...)
Origin = (73626.458811498901923,145193.972505581419682)
Pixel Size = (25.002694985400002,-25.002694985400002)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (   73626.459,  145193.973) (  7d15'30.17"W, 40d58'21.05"N)
Lower Left  (   73626.459,  102664.388) (  7d15'48.20"W, 40d35'22.48"N)
Upper Right (   99604.259,  145193.973) (  6d56'59.28"W, 40d58'11.13"N)
Lower Right (   99604.259,  102664.388) (  6d57'23.68"W, 40d35'12.70"N)
Center      (   86615.359,  123929.180) (  7d 6'25.36"W, 40d46'47.22"N)
Band 1 Block=1039x3 Type=Int16, ColorInterp=Gray
  NoData Value=32767
  1. Then run gdalinfo with stats option, and the output is:
gdalinfo -stats perigosidade_int.tif

Driver: GTiff/GeoTIFF
Files: perigosidade_int.tif
Size is 1039, 1701
Coordinate System is:
PROJCS["ETRS89 / Portugal TM06",
(...)
Origin = (73626.458811498901923,145193.972505581419682)
Pixel Size = (25.002694985400002,-25.002694985400002)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (   73626.459,  145193.973) (  7d15'30.17"W, 40d58'21.05"N)
Lower Left  (   73626.459,  102664.388) (  7d15'48.20"W, 40d35'22.48"N)
Upper Right (   99604.259,  145193.973) (  6d56'59.28"W, 40d58'11.13"N)
Lower Right (   99604.259,  102664.388) (  6d57'23.68"W, 40d35'12.70"N)
Center      (   86615.359,  123929.180) (  7d 6'25.36"W, 40d46'47.22"N)
Band 1 Block=1039x3 Type=Int16, ColorInterp=Gray
  Minimum=2.000, Maximum=835.000, Mean=37.025, StdDev=71.526
  NoData Value=32767
  Metadata:
    STATISTICS_MAXIMUM=835
    STATISTICS_MEAN=37.024768992184
    STATISTICS_MINIMUM=2
    STATISTICS_STDDEV=71.52569727799
  1. gdalinfo also creates a XML file with statistics info in the file folder:
  
    
      835
      37.024768992184
      2
      71.52569727799
    
  

  1. After this, delete the XML file created by gdalinfo.

  2. Load the perigosidade_int.tif raster file in QGIS.

  3. You don't need to do anything inside QGIS, just remove the raster from QGIS TOC.

  4. If you go to the folder where the raster is stored, QGIS created a new XML file with:

  
    
      
        1.500685871056241
        730.4993141289438
        729
        0
        1
        (...)
      
    
    
      730
      34.375557670573
      2
      67.853826109685
    
  

  1. So, QGIS marks the STATISTICS_MAXIMUM as 730.

  2. Even defining in Settings -> Options -> Rendering -> Rasters -> Limits (minimum/maximum): Minimum/Maximum

QGIS always uses 730, because the Accuracy is, by default, "Estimate (faster)", and this option is not configurable at Options, as said in the related tickets.

But what is really serious here, is that as soon as the XML file is created, Processing tools use these statistics for the calculations.

  1. For instance, if run r.quantile with "generate recode rules" flag, the output is:
2.000000:6.000000:1
6.000000:8.000000:2
8.000000:12.000000:3
12.000000:20.000000:4
20.000000:730.000000:5
  1. Classifying the raster based on these rules with r.recode leaves pixels with value 835 as NULL.

And every subsequent process gives errors.

The question is.. how I never had realized this problem? The fact is that QGIS only generates the XML file when the raster layer is removed from QGIS project. And so, if Processing modules are runned using the layers loaded in TOC, there are no problems.

If these modules are runned from the python console (processing.runalg / processing.run), from a plugin or using raster layers that are not loaded in QGIS, the problem show up.

This was tested in QGIS 2.18.22 and 3.2.1 (OSGeo4W64).

It seems a really serious problem to me. to I think the problem described in these tickets can be more serious than just a visualization question: #20181, #22806 and #22789

Please try the following workflow.

  1. Download this sample raster:
    https://cld.pt/dl/download/de81b6ce-38a8-4539-be16-24a3e3ef72d1/perigosidade_int.tif

  2. Run gdalinfo on it and see the output:

gdalinfo perigosidade_int.tif

Driver: GTiff/GeoTIFF
Files: perigosidade_int.tif
Size is 1039, 1701
Coordinate System is:
PROJCS["ETRS89 / Portugal TM06",
(...)
Origin = (73626.458811498901923,145193.972505581419682)
Pixel Size = (25.002694985400002,-25.002694985400002)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (   73626.459,  145193.973) (  7d15'30.17"W, 40d58'21.05"N)
Lower Left  (   73626.459,  102664.388) (  7d15'48.20"W, 40d35'22.48"N)
Upper Right (   99604.259,  145193.973) (  6d56'59.28"W, 40d58'11.13"N)
Lower Right (   99604.259,  102664.388) (  6d57'23.68"W, 40d35'12.70"N)
Center      (   86615.359,  123929.180) (  7d 6'25.36"W, 40d46'47.22"N)
Band 1 Block=1039x3 Type=Int16, ColorInterp=Gray
  NoData Value=32767
  1. Then run gdalinfo with stats option, and the output is:
gdalinfo -stats perigosidade_int.tif

Driver: GTiff/GeoTIFF
Files: perigosidade_int.tif
Size is 1039, 1701
Coordinate System is:
PROJCS["ETRS89 / Portugal TM06",
(...)
Origin = (73626.458811498901923,145193.972505581419682)
Pixel Size = (25.002694985400002,-25.002694985400002)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (   73626.459,  145193.973) (  7d15'30.17"W, 40d58'21.05"N)
Lower Left  (   73626.459,  102664.388) (  7d15'48.20"W, 40d35'22.48"N)
Upper Right (   99604.259,  145193.973) (  6d56'59.28"W, 40d58'11.13"N)
Lower Right (   99604.259,  102664.388) (  6d57'23.68"W, 40d35'12.70"N)
Center      (   86615.359,  123929.180) (  7d 6'25.36"W, 40d46'47.22"N)
Band 1 Block=1039x3 Type=Int16, ColorInterp=Gray
  Minimum=2.000, Maximum=835.000, Mean=37.025, StdDev=71.526
  NoData Value=32767
  Metadata:
    STATISTICS_MAXIMUM=835
    STATISTICS_MEAN=37.024768992184
    STATISTICS_MINIMUM=2
    STATISTICS_STDDEV=71.52569727799
  1. gdalinfo also creates a XML file with statistics info in the file folder:
  
    
      835
      37.024768992184
      2
      71.52569727799
    
  

  1. After this, delete the XML file created by gdalinfo.

  2. Load the perigosidade_int.tif raster file in QGIS.

  3. You don't need to do anything inside QGIS, just remove the raster from QGIS TOC.

  4. If you go to the folder where the raster is stored, QGIS created a new XML file with:

  
    
      
        1.500685871056241
        730.4993141289438
        729
        0
        1
        (...)
      
    
    
      730
      34.375557670573
      2
      67.853826109685
    
  

  1. So, QGIS marks the STATISTICS_MAXIMUM as 730.

  2. Even defining in Settings -> Options -> Rendering -> Rasters -> Limits (minimum/maximum): Minimum/Maximum

QGIS always uses 730, because the Accuracy is, by default, "Estimate (faster)", and this option is not configurable at Options, as said in the related tickets.

But what is really serious here, is that as soon as the XML file is created, Processing tools use these statistics for the calculations.

  1. For instance, if run r.quantile with "generate recode rules" flag, the output is:
2.000000:6.000000:1
6.000000:8.000000:2
8.000000:12.000000:3
12.000000:20.000000:4
20.000000:730.000000:5
  1. Classifying the raster based on these rules with r.recode leaves pixels with value 835 as NULL.

And every subsequent process gives errors.

The question is.. how I never had realized this problem? The fact is that QGIS only generates the XML file when the raster layer is removed from QGIS project. And so, if Processing modules are runned using the layers loaded in TOC, there are no problems.

If these modules are runned from the python console (processing.runalg / processing.run), from a plugin or using raster layers that are not loaded in QGIS, the problem show up.

This was tested in QGIS 2.18.22 and 3.2.1 (OSGeo4W64).

It seems a really serious problem to me.

@qgib
Copy link
Contributor Author

qgib commented Sep 14, 2018

Author Name: Pedro Venâncio (Pedro Venâncio)


This seems a really tricky issue and not trivial to solve.

But maybe in the meanwhile, it could be better to change the default Accuracy option to "Actual (slower)". It can slower QGIS a litle bit with huge rasters, but prevent issues with geoprocessing tools and wrong results.

@qgib
Copy link
Contributor Author

qgib commented Oct 14, 2018

Author Name: Giovanni Manghi (@gioman)


  • version was changed from 2.18.21 to 3.2.1

@qgib
Copy link
Contributor Author

qgib commented Oct 18, 2018

Author Name: Matthias Kuhn (@m-kuhn)


It looks like GDAL writes this file when GDALClose is called.
I am considering to delete it right after if two conditions are met: the file was not there before and the statistics are estimated.

A real fix would probably involve GDAL (I guess it shouldn't permanently write estimated data at all?) but this should at least help for a bandaid fix.


  • assigned_to_id was configured as Matthias Kuhn
  • version was changed from 3.2.1 to 3.3(master)

@qgib
Copy link
Contributor Author

qgib commented Oct 18, 2018

Author Name: Matthias Kuhn (@m-kuhn)


Pull request pending #8230

@qgib
Copy link
Contributor Author

qgib commented Oct 25, 2018

Author Name: Anónimo (Anónimo)


Applied in changeset 70d4a27.


  • done_ratio was changed from 0 to 100
  • status_id was changed from Open to Closed

@qgib qgib closed this as completed Oct 25, 2018
@qgib qgib added Bug Either a bug report, or a bug fix. Let's hope for the latter! High Priority Rasters Related to general raster layer handling (not specific data formats) Crash/Data Corruption labels May 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Either a bug report, or a bug fix. Let's hope for the latter! Crash/Data Corruption High Priority Rasters Related to general raster layer handling (not specific data formats)
Projects
None yet
Development

No branches or pull requests

1 participant