"Internal Proj Error: [...] database disk image is malformed" when multiprocessing since pyproj 2.3 #426

Closed
TimoRoth opened this issue Sep 4, 2019 · 9 comments · Fixed by #412

TimoRoth commented Sep 4, 2019

Code Sample, a copy-pastable example if possible

Unfortunately, it is not possible to produce a minimal example: this only happens in the full setup of our project, but there it is 100% reproducible.
See for example: https://travis-ci.org/OGGM/OGGM-Anaconda/jobs/580670196#L1406

I tried to trigger this by calling pyproj.Proj() in a lot of parallel processes, but that alone did not reproduce the error; everything worked fine.

Problem description

Ever since pyproj 2.3, calling pyproj.Proj("+init=EPSG:4326", preserve_units=True) in our concurrent multiprocessing setup fails with:
pyproj.exceptions.CRSError: Invalid projection: +init=epsg:4326 +type=crs: (Internal Proj Error: proj_create: SQLite error on SELECT auth_name FROM authority_list: database disk image is malformed)

Turning off multiprocessing and running things sequentially works around the issue.

Downgrading pyproj to <2.3 also fixes it. Note that I did not downgrade the underlying proj4 binary library, so downgrading pyproj alone is enough to stop this from happening.

Environment Information

System:
    python: 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 21:52:21)  [GCC 7.3.0]
executable: /home/users/timo/miniconda3/envs/projtest_env/bin/python
   machine: Linux-4.19.64-gentoo-x86_64-Intel-R-_Xeon-R-_CPU_E5-2623_v4_@_2.60GHz-with-gentoo-2.6

PROJ:
      PROJ: 6.1.1
  data dir: /home/users/timo/miniconda3/envs/projtest_env/share/proj

Python deps:
    pyproj: 2.3.1
       pip: 19.2.3
setuptools: 41.2.0
    Cython: 0.29.13

Installation method

  • conda

Conda environment information (if you installed with conda):


Environment (conda list):
$ conda list | grep -E "proj|aenum"
proj4                     6.1.1                hc80f0dc_1    conda-forge
pyproj                    2.3.1            py37h2fd02e8_0    conda-forge

Details about conda and system ( conda info ):
$ conda info
     active environment : projtest_env
    active env location : /home/users/timo/miniconda3/envs/projtest_env
            shell level : 1
       user config file : /home/users/timo/.condarc
 populated config files : /home/users/timo/.condarc
          conda version : 4.7.11
    conda-build version : 3.18.9
         python version : 3.7.4.final.0
       virtual packages :
       base environment : /home/users/timo/miniconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/users/timo/miniconda3/pkgs
                          /home/users/timo/.conda/pkgs
       envs directories : /home/users/timo/miniconda3/envs
                          /home/users/timo/.conda/envs
               platform : linux-64
             user-agent : conda/4.7.11 requests/2.22.0 CPython/3.7.4 Linux/4.19.64-gentoo gentoo/2.6 glibc/2.29
                UID:GID : 10000:10000
             netrc file : None
           offline mode : False
TimoRoth added the bug label Sep 4, 2019
Member

snowman2 commented Sep 4, 2019

If you don't mind trying PROJ 6.2, you could give this a go: #412

Member

snowman2 commented Sep 4, 2019

Also might be interested in #386

Author

TimoRoth commented Sep 4, 2019 via email

Member

snowman2 commented Sep 4, 2019

Side note: on conda-forge, the proj4 package has been renamed to proj.


coroa commented Jan 21, 2020

Even with PROJ 6.2+, we are encountering this bug, or a very similar one, where using PROJ from multiple processes seems to corrupt the database file:

ERROR 1: PROJ: proj_create_operations: SQLite error on SELECT v1.table_name as table1, v1.auth_name AS auth_name1, v1.code AS code1, v1.accuracy AS accuracy1, v2.table_name as table2, v2.auth_name AS auth_name2, v2.code AS code2, v2.accuracy as accuracy2, a1.south_lat AS south_lat1, a1.west_lon AS west_lon1, a1.north_lat AS north_lat1, a1.east_lon AS east_lon1, a2.south_lat AS south_lat2, a2.west_lon AS west_lon2, a2.north_lat AS north_lat2, a2.east_lon AS east_lon2, ss1.replacement_auth_name AS replacement_auth_name1, ss1.replacement_code AS replacement_code1, ss2.replacement_auth_name AS replacement_auth_name2, ss2.replacement_code AS replacement_code2 FROM coordinate_operation_view v1 JOIN coordinate_operation_view v2 ON v1.target_crs_auth_name = v2.source_crs_auth_name AND v1.target_crs_code = v2.source_crs_code LEFT JOIN supersession ss1 ON ss1.superseded_table_name = v1.table_name AND ss1.superseded_auth_name = v1.auth_name AND ss1.superseded_code = v1.code AND ss1.superseded_table_name = ss1.replacement_table_name LEFT JOIN supersession ss2 ON ss2.superseded_table_name = v2.table_name AND ss2.superseded_auth_name = v2.auth_name AND ss2.superseded_code = v2.code AND ss2.superseded_table_name = ss2.replacement_table_name JOIN area a1 ON v1.area_of_use_auth_name = a1.auth_name AND v1.area_of_use_code = a1.code JOIN area a2 ON v2.area_of_use_auth_name = a2.auth_name AND v2.area_of_use_code = a2.code WHERE v1.source_crs_auth_name = ? AND v1.source_crs_code = ? AND v2.target_crs_auth_name = ? AND v2.target_crs_code = ? AND v1.deprecated = 0 AND v2.deprecated = 0 AND intersects_bbox(south_lat1, west_lon1, north_lat1, east_lon1, south_lat2, west_lon2, north_lat2, east_lon2) == 1 AND v1.auth_name = ? AND v2.auth_name = ? ORDER BY (CASE WHEN accuracy1 is NULL THEN 1 ELSE 0 END) + (CASE WHEN accuracy2 is NULL THEN 1 ELSE 0 END), accuracy1 + accuracy2: database disk image is malformed

ERROR 1: PROJ: proj_create_operations: SQLite error on SELECT v1.table_name as table1, v1.auth_name AS auth_name1, v1.code AS code1, v1.accuracy AS accuracy1, v2.table_name as table2, v2.auth_name AS auth_name2, v2.code AS code2, v2.accuracy as accuracy2, a1.south_lat AS south_lat1, a1.west_lon AS west_lon1, a1.north_lat AS north_lat1, a1.east_lon AS east_lon1, a2.south_lat AS south_lat2, a2.west_lon AS west_lon2, a2.north_lat AS north_lat2, a2.east_lon AS east_lon2, ss1.replacement_auth_name AS replacement_auth_name1, ss1.replacement_code AS replacement_code1, ss2.replacement_auth_name AS replacement_auth_name2, ss2.replacement_code AS replacement_code2 FROM coordinate_operation_view v1 JOIN coordinate_operation_view v2 ON v1.target_crs_auth_name = v2.source_crs_auth_name AND v1.target_crs_code = v2.source_crs_code LEFT JOIN supersession ss1 ON ss1.superseded_table_name = v1.table_name AND ss1.superseded_auth_name = v1.auth_name AND ss1.superseded_code = v1.code AND ss1.superseded_table_name = ss1.replacement_table_name LEFT JOIN supersession ss2 ON ss2.superseded_table_name = v2.table_name AND ss2.superseded_auth_name = v2.auth_name AND ss2.superseded_code = v2.code AND ss2.superseded_table_name = ss2.replacement_table_name JOIN area a1 ON v1.area_of_use_auth_name = a1.auth_name AND v1.area_of_use_code = a1.code JOIN area a2 ON v2.area_of_use_auth_name = a2.auth_name AND v2.area_of_use_code = a2.code WHERE v1.source_crs_auth_name = ? AND v1.source_crs_code = ? AND v2.target_crs_auth_name = ? AND v2.target_crs_code = ? AND v1.deprecated = 0 AND v2.deprecated = 0 AND intersects_bbox(south_lat1, west_lon1, north_lat1, east_lon1, south_lat2, west_lon2, north_lat2, east_lon2) == 1 AND v1.auth_name = ? AND v2.auth_name = ? ORDER BY (CASE WHEN accuracy1 is NULL THEN 1 ELSE 0 END) + (CASE WHEN accuracy2 is NULL THEN 1 ELSE 0 END), accuracy1 + accuracy2: database disk image is malformed

ERROR 6: Cannot find coordinate operations from `EPSG:4326' to `EPSG:3035'

Could not transform between the following SRS:

SOURCE:

GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]

TARGET:

PROJCS["ETRS89-extended / LAEA Europe",GEOGCS["ETRS89",DATUM["European_Terrestrial_Reference_System_1989",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6258"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4258"]],PROJECTION["Lambert_Azimuthal_Equal_Area"],PARAMETER["latitude_of_center",52],PARAMETER["longitude_of_center",10],PARAMETER["false_easting",4321000],PARAMETER["false_northing",3210000],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Northing",NORTH],AXIS["Easting",EAST],AUTHORITY["EPSG","3035"]]

as can be observed, for instance, in this Travis CI job: https://travis-ci.org/PyPSA/pypsa-eur/jobs/639710706?utm_medium=notification&utm_source=email

We are not calling pyproj ourselves; we call into osgeo.gdal.Warp, which seems to rely on the PROJ library. That leads me to suspect I am wrong to report this here. :/

Thanks for any help, though.

Member

snowman2 commented Jan 21, 2020

We are not calling pyproj ourselves, but call into osgeo.gdal.Warp, which seems to rely on the proj library; which leads me to suspect I am wrong to report this here. :/

@coroa, you are correct. osgeo.gdal.Warp does not use pyproj. I would recommend opening an issue here: https://github.com/osgeo/gdal


coroa commented Jan 21, 2020

Thanks for the fast answer! I managed to find a solution in the meantime.

For future reference, for other people who find this issue via the "database disk image is malformed" error in conjunction with multiprocessing (and maybe also for @TimoRoth):

In my case, making sure that gdal is imported only after forking into multiple processes fixed the issue. Importing gdal creates a GDAL context which (I suspect) also holds an SQLite database handle to the proj.db database, and that handle gets corrupted when multiple processes write to it. Another working alternative is to use multiprocessing.set_start_method('spawn') so that forking is not used at all.
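
A minimal sketch of the first workaround, assuming a Linux system where the fork start method is available. GDAL itself is not imported here: a pool initializer opens a stdlib sqlite3 connection as a stand-in for whatever `from osgeo import gdal` sets up (the proj.db handle), so each worker creates its own handle after the fork instead of inheriting one from the parent. The toy table and the names `_init_worker`, `lookup_name` and `run` are hypothetical.

```python
import multiprocessing as mp
import os
import sqlite3
import tempfile

# Per-process handle. The initializer creates it inside each worker, i.e.
# after the fork, so no worker ever uses a handle opened by the parent.
_conn = None


def _init_worker(db_path):
    # In the real setup, the `from osgeo import gdal` import would go here
    # (or inside the worker function) instead of at module level in the parent.
    global _conn
    _conn = sqlite3.connect(db_path)


def lookup_name(code):
    # Query a toy table that stands in for proj.db.
    (name,) = _conn.execute("SELECT name FROM crs WHERE code = ?", (code,)).fetchone()
    return name


def run(codes):
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "toy_proj.db")
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE crs (code INTEGER PRIMARY KEY, name TEXT)")
        setup.executemany(
            "INSERT INTO crs VALUES (?, ?)",
            [(4326, "WGS 84"), (3035, "ETRS89-extended / LAEA Europe")],
        )
        setup.commit()
        setup.close()  # the parent holds no open handle when the pool forks

        ctx = mp.get_context("fork")
        with ctx.Pool(processes=2, initializer=_init_worker, initargs=(db_path,)) as pool:
            return pool.map(lookup_name, codes)
```

The `spawn` alternative avoids inheriting any handle because every worker starts a fresh interpreter, but it re-imports the main module in each worker and therefore needs the usual `if __name__ == "__main__":` guard.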

@fmaussion

Doesn't this need reporting to GDAL? The set_start_method('spawn') approach creates other sorts of problems, and it seems that multiprocessing is something that should work.

Author

TimoRoth commented Feb 3, 2020

After some further analysis: this is caused by GDAL using the PROJ C API itself to create PROJ contexts that do not automatically close their database connection.
Changing this in GDAL seems very daunting, and it would also take forever for such a fix to reach production systems via the distros.
So I'm not sure what the correct course of action here is.

In the long run, a way to control the PROJ behaviour globally would be ideal: for example, an environment variable that forces PROJ to always close the database automatically, or at least a change of the autoclose default from false to true.
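
Until such a global switch exists, the underlying idea (a database handle must never be reused across a fork) can be illustrated in application code. The class below is a hypothetical sketch, not an API of GDAL, PROJ or pyproj: it remembers which process opened its SQLite connection and opens a fresh one whenever it is used from a different process. It assumes a POSIX system with the fork start method.

```python
import multiprocessing as mp
import os
import sqlite3
import tempfile


class PerProcessConnection:
    """Hypothetical helper: hands out an SQLite connection owned by the calling process."""

    def __init__(self, path):
        self._path = path
        self._conn = None
        self._owner_pid = None
        # Handles inherited through fork are parked here so the child neither
        # uses nor closes them (closing would also run SQLite code on them).
        self._inherited = []

    def get(self):
        pid = os.getpid()
        if self._conn is not None and self._owner_pid != pid:
            self._inherited.append(self._conn)
            self._conn = None
        if self._conn is None:
            self._conn = sqlite3.connect(self._path)
            self._owner_pid = pid
        return self._conn

    def owned_by_current_process(self):
        return self._owner_pid == os.getpid()

    def inherited_count(self):
        return len(self._inherited)


_handle = None


def _query_in_child(_):
    conn = _handle.get()
    (one,) = conn.execute("SELECT 1").fetchone()
    return one, _handle.owned_by_current_process(), _handle.inherited_count()


def demo(n_tasks=4):
    global _handle
    with tempfile.TemporaryDirectory() as tmp:
        _handle = PerProcessConnection(os.path.join(tmp, "toy.db"))
        parent_conn = _handle.get()  # opened in the parent before forking
        ctx = mp.get_context("fork")
        with ctx.Pool(processes=2) as pool:
            child_results = pool.map(_query_in_child, range(n_tasks))
        # The parent keeps using its own connection, which no child touched.
        parent_ok = _handle.get() is parent_conn
        parent_conn.close()
        return child_results, parent_ok
```

Each worker reports one parked (inherited) handle and a connection it owns itself, while the parent's connection is left alone.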

4 participants