Skip to content

Latest commit

 

History

History
185 lines (126 loc) · 8.13 KB

README.md

File metadata and controls

185 lines (126 loc) · 8.13 KB

cj2pgsql

MIT badge   PyPI

cj2pgsql is a Python based importer of CityJSONL files to a PostgreSQL database. It requires PostGIS extension for geometry types.

Table of Contents


1. CLI usage


CLI instructions:

https://leoleonsio.github.io/cjdb/#cj2pgsql-cli-usage

Quickstart example

Sample CityJSON data can be downloaded from 3DBAG download service.

Then, having the CityJSON file, a combination of cjio (external CityJSON processing library) and cj2pgsql is needed to import it to a specified schema in a database.

  1. Convert CityJSON to CityJSONL
cjio --suppress_msg tile_901.json export jsonl stdout > tile_901.jsonl 
  1. Import CityJSONL to the database
PGPASSWORD=postgres cj2pgsql -H localhost -U postgres -d postgres -s cjdb -o tile_901.jsonl   

Alternatively steps 1 and 2 in a single command:

cjio --suppress_msg tile_901.json export jsonl stdout | cj2pgsql -H localhost -U postgres -d postgres -s cjdb -o

The metadata and the objects can then be found in the tables in the specified schema (cjdb in this example).

Password can be specified in the PGPASSWORD environment variable. If not specified, the app will prompt for the password.

2. cj2pgsql explanations


Model assumptions

The cj2pgsql importer loads the data in accordance with a specific data model, which is also shared with the cjdb_api.

Model documentation: model/README

Indexes

Some indexes are created by default (refer to model/README).

Additionally, the user can specify which CityObject attributes are to be indexed with the -x/--attr-index or -px/--partial-attr-index flag. The second option uses a partial index with a not null condition on the attribute. This saves disk space when indexing an attribute that is not present among all the imported CityObjects. This is often the case with CityJSON, because in a single dataset there can be different object types, with different attributes.

What is a City Model? How to organize CityJSON data from various sources?

The definition and scope of the City Model are for the user to decide. It is recommended to group together semantically coherent objects, by importing them to the same database schema.

While the static table structure (columns don't change) does support loading any type of CityJSON objects together, the data becomes harder to manage for the user. Example of this would be having different attributes for the same CityObject type (which should be consistent for data coming from the same source).

Types of input

The importer works only on CityJSONL files. Instructions on how to obtain such a file from a CityJSON file: https://cjio.readthedocs.io/en/latest/includeme.html#stdin-and-stdout

The importer supports 3 kinds of input:

  • a single CityJSONL file
  • a directory of CityJSONL files (all files with jsonl extensions are located and imported)
  • STDIN using the pipe operator:
cat file.jsonl | cj2pgsql ...

Coordinate Reference Systems

The cj2pgsql importer does not allow inconsistent CRS (coordinate reference systems) within the same database schema. For storing data in separate CRS using multiple schemas is required.

The data needs to be either harmonized beforehand, or the -I/--srid flag can be used upon import, to reproject all the geometries to the one specified CRS. Specifying a 2D CRS (instead of a 3D one) will cause the Z-coordinates to remain unchanged.

Note: reprojections slow down the import significantly.

Note: Source data with missing "metadata"/"referenceSystem" cannot be reprojected due to unknown source reference system.

3D reprojections

Pyproj is used for CRS reprojections. It supports 3D CRS transformations between different systems. However, sometimes downloading additional grids is required. The importer will attempt to download the grids needed for the reprojection, with the following message:

Attempting to download additional grids required for CRS transformation.
This can also be done manually, and the files should be put in this folder:
        {pyproj_directory}

If that fails, the user will have to download the required grids and put them in the printed {pyproj_directory} themselves.

CityJSON Extensions

If CityJSON Extensions were present in the imported file, they can be found listed in the extensions column in the import_meta table.

The CityJSON specifications mention 3 different extendable features, and the cj2pgsql importer deals with them as follows:

  1. Complex attributes

No action is taken. These attributes end up in the attributes JSONB column. Querying by complex attributes values is not supported in the cjdb_api as of v0.0.7a.

  1. Additional root properties

Additional root properties are placed in the extra properties JSONB column in the import_meta table.

  1. Additional CityObject type

Additional CityObject types are appended to the list of allowed CityJSON objects.

CityJSON GeometryTemplates

Geometry templates are resolved for each object geometry, so that the object in the table ends up with its real-world coordinates (instead of vertex references or relative template coordinates).

Data validation

The importer does not validate the structure of the file. It is assumed that the input file is schema-valid (CityJSON validator). It sends out warnings when:

  • there appear CityObject types defined neither in the main CityJSON specification nor any of the supplied extensions.
  • the specified target CRS does not have the Z-axis defined
  • the source dataset does not have a CRS defined at all

Repeated object IDs

By default, the importer does not check if an object with a given ID exists already in the database. This is because such an operation for every inserted object results in a performance penalty.

The user can choose to run the import with either the -e/--skip-existing option to skip existing objects or -u, --update-existing to update existing objects. This will slow down the import, but it will also ensure that repeated object cases are handled.

3. Running with local code

Create pipenv environment in repository root:

pipenv install

Run the importer:

PYTHONPATH=$PWD pipenv run python cj2pgsql/main.py --help

4. Running tests


Test cases for Pytest are generated based on the CityJSONL files in:

  • cj2pgsql/test/files

And the argument sets defined in the file:

  • cj2pgsql/test/inputs/arguments

Where each line is a separate argument set.

The tests are run for each combination of a file and argument set. To run them locally, the cj2pgsql/test/inputs/arguments file has to be modified.

Install pytest first.

pip3 install pytest

Then, in repository root:

pytest cj2pgsql -v

or, to see the importer output:

pytest cj2pgsql -s