diff --git a/.github/workflows/deploy-book.yml b/.github/workflows/deploy-book.yml new file mode 100644 index 0000000..2831931 --- /dev/null +++ b/.github/workflows/deploy-book.yml @@ -0,0 +1,36 @@ +# Based on https://github.com/rust-lang/mdBook/wiki/Automated-Deployment%3A-GitHub-Actions +name: Deploy mdbook +on: + push: + branches: + - doc-book + +jobs: + deploy: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + with: + fetch-depth: 0 + - name: Install mdbook + run: | + mkdir mdbook + curl -sSL https://github.com/rust-lang/mdBook/releases/download/v0.4.14/mdbook-v0.4.14-x86_64-unknown-linux-gnu.tar.gz | tar -xz --directory=./mdbook + echo `pwd`/mdbook >> $GITHUB_PATH + - name: Deploy GitHub Pages + run: | + # This assumes your book is in the root of your repository. + # Just add a `cd` here if you need to change to another directory. + cd docs + mdbook build + git worktree add gh-pages + git config user.name "Deploy from CI" + git config user.email "" + cd gh-pages + # Delete the ref to avoid keeping history. + git update-ref -d refs/heads/gh-pages + rm -rf * + mv ../book/* . + git add . + git commit -m "Deploy $GITHUB_SHA to gh-pages" + git push --force --set-upstream origin gh-pages diff --git a/.gitignore b/.gitignore index b7e29d9..8a2616c 100644 --- a/.gitignore +++ b/.gitignore @@ -2,4 +2,5 @@ tests/tmp/* output/ **/.coverage **/__pycache__ -pgosm-data/* \ No newline at end of file +pgosm-data/* +docs/book/* diff --git a/README.md b/README.md index 4c85021..4313e16 100644 --- a/README.md +++ b/README.md @@ -1,210 +1,12 @@ # PgOSM Flex PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS using the -[osm2pgsql Flex output](https://osm2pgsql.org/doc/manual.html#the-flex-output). -This project provides a curated set of Lua and SQL scripts to clean and organize -the most commonly used OpenStreetMap data, such as roads, buildings, and points of interest (POIs). +osm2pgsql Flex output. 
+See [https://pgosm-flex.com/](https://pgosm-flex.com/) for the main project +documentation. -The recommended way to use PgOSM Flex is via the PgOSM Docker image -[hosted on Docker Hub](https://hub.docker.com/repository/docker/rustprooflabs/pgosm-flex). -Basic usage instructions are included in this README.md file, full Docker -usage instructions are available in [docs/DOCKER-RUN.md](docs/DOCKER-RUN.md). -## Project decisions - -A few decisions made in this project: - -* ID column is `osm_id` -* Geometry stored in SRID 3857 (customizable) -* Geometry column named `geom` -* Defaults to same units as OpenStreetMap (e.g. km/hr, meters) -* Data not included in a dedicated column goes into the `osm.tags` table's `JSONB` column -* Points, Lines, and Polygons are not mixed in a single table -* Tracks latest Postgres, PostGIS, and osm2pgsql versions - -This project's approach is to do as much processing in the Lua styles -passed along to osm2pgsql, with post-processing steps creating indexes, -constraints and comments. - -## Quick start - -See the [Docker Usage](#docker-usage) section below for an explanation of -these commands. - -```bash -mkdir ~/pgosm-data -export POSTGRES_USER=postgres -export POSTGRES_PASSWORD=mysecretpassword - -docker run --name pgosm -d --rm \ - -v ~/pgosm-data:/app/output \ - -v /etc/localtime:/etc/localtime:ro \ - -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ - -p 5433:5432 -d rustprooflabs/pgosm-flex - -docker exec -it \ - pgosm python3 docker/pgosm_flex.py \ - --ram=8 \ - --region=north-america/us \ - --subregion=district-of-columbia -``` - - -## Versions Supported - -Minimum versions supported: - -* Postgres 12 -* PostGIS 3.0 -* osm2pgsql 1.8.0 - -Defining [Postgres indexes in the Lua styles](https://osm2pgsql.org/doc/manual.html#defining-indexes) -bumps osm2pgsql minimum requirement to 1.8.0. - - -## Minimum Hardware - -### RAM - -osm2pgsql requires [at least 2 GB RAM](https://osm2pgsql.org/doc/manual.html#main-memory). 
- -### Storage - -Fast SSD drives are strongly recommended. It should work on slower storage devices (HDD, -SD, etc), -however the [osm2pgsql-tuner](https://github.com/rustprooflabs/osm2pgsql-tuner) -package used to determine the best osm2pgsql command assumes fast SSDs. - - -## PgOSM via Docker - -The PgOSM Flex -[Docker image](https://hub.docker.com/r/rustprooflabs/pgosm-flex) -is hosted on Docker Hub. -The image includes all the pre-requisite software and handles all of the options, -logic, an post-processing steps required. Features include: - -* Automatic data download from Geofabrik and validation against checksum -* Custom Flex layers built in Lua -* Mix and match layers using Layersets -* Loads to Docker-internal Postgres, or externally defined Postgres -* Supports `osm2pgsql-replication` and `osm2pgsql --append` mode -* Export processed data via `pg_dump` for loading into additional databases - - -### Docker usage - -This section outlines a typical import using Docker to run PgOSM Flex. -See the full Docker instructions in [docs/DOCKER-RUN.md](docs/DOCKER-RUN.md). - -Create directory for the `.osm.pbf` file, output `.sql` file, log output, and -the osm2pgsql command ran. - - -```bash -mkdir ~/pgosm-data -``` - -Set environment variables for the temporary Postgres connection in Docker. -These are required for the Docker container to run. - - -```bash -export POSTGRES_USER=postgres -export POSTGRES_PASSWORD=mysecretpassword -``` - -Start the `pgosm` Docker container. At this point, Postgres / PostGIS -is available on port `5433`. - -```bash -docker run --name pgosm -d --rm \ - -v ~/pgosm-data:/app/output \ - -v /etc/localtime:/etc/localtime:ro \ - -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ - -p 5433:5432 -d rustprooflabs/pgosm-flex -``` - -Use `docker exec` to run the processing for the Washington D.C subregion. -This example uses three (3) parameters to specify the total system RAM (8 GB) -along with a region/subregion. 
- -* Total RAM for osm2pgsql, Postgres and OS (`8`) -* Region (`north-america/us`) -* Sub-region (`district-of-columbia`) (Optional) - - -```bash -docker exec -it \ - pgosm python3 docker/pgosm_flex.py \ - --ram=8 \ - --region=north-america/us \ - --subregion=district-of-columbia -``` - - -The above command takes roughly 1 minute to run if the PBF for today -has already been downloaded. -If the PBF is not downloaded it will depend on how long -it takes to download the 17 MB PBF file + ~ 1 minute processing. - - -### After processing - -The processed OpenStreetMap data is also available in the Docker container on port `5433`. -You can connect and query directly in the Docker container. - -```bash -psql -h localhost -p 5433 -d pgosm -U postgres -c "SELECT COUNT(*) FROM osm.road_line;" - -┌───────┐ -│ count │ -╞═══════╡ -│ 39865 │ -└───────┘ -``` - - -The `~/pgosm-data` directory has two (2) files from a typical single run. -The PBF file and its MD5 checksum have been renamed with the date in the filename. -This enables loading the file downloaded today -again in the future, either with the same version of PgOSM Flex or the latest version. The `docker exec` command uses the `PGOSM_DATE` environment variable -to load these historic files. - - -If `--pg-dump` option is used the output `.sql` is also saved in -the `~/pgosm-data` directory. -This `.sql` file can be loaded into any other database with PostGIS and the proper -permissions. - - -```bash -ls -alh ~/pgosm-data/ - --rw-r--r-- 1 root root 18M Jan 21 03:45 district-of-columbia-2023-01-21.osm.pbf --rw-r--r-- 1 root root 70 Jan 21 04:39 district-of-columbia-2023-01-21.osm.pbf.md5 --rw-r--r-- 1 root root 163M Jan 21 16:14 north-america-us-district-of-columbia-default-2023-01-21.sql -``` - - -## Layer Sets - - -PgOSM Flex includes a few layersets and makes it easy to customize your own. -See [docs/LAYERSETS.md](docs/LAYERSETS.md) for details. 
- - - -## QGIS Layer Styles - -If you use QGIS to visualize OpenStreetMap, there are a few basic -styles using the `public.layer_styles` table created by QGIS. -This data is loaded by default and can be excluded with `--data-only`. - -See [the QGIS Style README.md](https://github.com/rustprooflabs/pgosm-flex/blob/main/db/qgis-style/README.md) -for more information. - ## Explore data loaded @@ -240,29 +42,6 @@ SELECT s_name, t_name, rows, size_plus_indexes -## Meta table - -PgOSM Flex tracks processing metadata in the ``osm.pgosm_flex`` table. The initial import -has `osm2pgsql_mode = 'create'`, the subsequent update has -`osm2pgsql_mode = 'append'`. - - -```sql -SELECT osm_date, region, srid, - pgosm_flex_version, osm2pgsql_version, osm2pgsql_mode - FROM osm.pgosm_flex -; -``` - -```bash -┌────────────┬───────────────────────────┬──────┬────────────────────┬───────────────────┬────────────────┐ -│ osm_date │ region │ srid │ pgosm_flex_version │ osm2pgsql_version │ osm2pgsql_mode │ -╞════════════╪═══════════════════════════╪══════╪════════════════════╪═══════════════════╪════════════════╡ -│ 2022-11-04 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ create │ -│ 2022-11-25 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ append │ -└────────────┴───────────────────────────┴──────┴────────────────────┴───────────────────┴────────────────┘ -``` - ## Query examples @@ -270,70 +49,6 @@ For example queries with data loaded by PgOSM-Flex see [docs/QUERY.md](docs/QUERY.md). -## Points of Interest (POIs) - -PgOSM Flex loads an range of tags into a materialized view (`osm.poi_all`) for -easily searching POIs. -Line and polygon data is forced to point geometry using -`ST_Centroid()`. This layer duplicates a bunch of other more specific layers -(shop, amenity, etc.) to provide a single place for simplified POI searches. - -Special layer included by layer sets `run-all` and `run-no-tags`. -See `style/poi.lua` for logic on how to include POIs. 
-The topic of POIs is subject and likely is not inclusive of everything that probably should be considered -a POI. If there are POIs missing -from this table please submit a [new issue](https://github.com/rustprooflabs/pgosm-flex/issues/new) -with sufficient details about what is missing. -Pull requests also welcome! [See CONTRIBUTING.md](CONTRIBUTING.md). - - -Counts of POIs by `osm_type`. - -```sql -SELECT osm_type, COUNT(*) - FROM osm.vpoi_all - GROUP BY osm_type - ORDER BY COUNT(*) DESC; -``` - -Results from Washington D.C. subregion (March 2020). - -``` -┌──────────┬───────┐ -│ osm_type │ count │ -╞══════════╪═══════╡ -│ amenity │ 12663 │ -│ leisure │ 2701 │ -│ building │ 2045 │ -│ shop │ 1739 │ -│ tourism │ 729 │ -│ man_made │ 570 │ -│ landuse │ 32 │ -│ natural │ 19 │ -└──────────┴───────┘ -``` - -Includes Points (`N`), Lines (`L`) and Polygons (`W`). - - -```sql -SELECT geom_type, COUNT(*) - FROM osm.vpoi_all - GROUP BY geom_type - ORDER BY COUNT(*) DESC; -``` - -``` -┌───────────┬───────┐ -│ geom_type │ count │ -╞═══════════╪═══════╡ -│ W │ 10740 │ -│ N │ 9556 │ -│ L │ 202 │ -└───────────┴───────┘ -``` - - ## One table to rule them all diff --git a/db/qgis-style/README.md b/db/qgis-style/README.md index ee01ab9..ec9d593 100644 --- a/db/qgis-style/README.md +++ b/db/qgis-style/README.md @@ -1,118 +1 @@ -# QGIS Styles for PgOSM Flex - -QGIS can save its styling information directly in a table in the Postgres database -using a table `public.layer_styles`. - - -## Prepare - -The `create_layer_styles.sql` script creates the `public.layer_styles` table defined in QGIS 3.16 along with an additional `public.layer_styles_staging` table used to prepare -data before loading. - -``` -psql -d pgosm -f create_layer_styles.sql -``` - -Load styles to staging. - -``` -psql -d pgosm -f layer_styles.sql -``` - - -To use these styles as defaults, update the `f_table_catalog` and -`f_table_schema` values in the staging table. 
The defaults are -`f_table_catalog='pgosm'` and `f_table_schema='osm'`. - - -```sql -UPDATE public.layer_styles_staging - SET f_table_catalog = 'your_db', - f_table_schema = 'osm' -; -``` - -## Add/Update existing records - -The QGIS table does not include `UNIQUE` constraints, so using Postgres' `UPSERT` is -not available by default. - -Add new records from staging, based on object names. - -```sql -INSERT INTO public.layer_styles - (f_table_catalog, f_table_schema, f_table_name, - f_geometry_column, stylename, styleqml, stylesld, - useasdefault, description, "owner", ui, update_time) -SELECT new.f_table_catalog, new.f_table_schema, new.f_table_name, - new.f_geometry_column, new.stylename, new.styleqml, new.stylesld, - new.useasdefault, new.description, new."owner", new.ui, new.update_time - FROM public.layer_styles_staging new - LEFT JOIN public.layer_styles ls - ON new.f_table_catalog = ls.f_table_catalog - AND new.f_table_schema = ls.f_table_schema - AND new.f_table_name = ls.f_table_name - AND new.stylename = ls.stylename - WHERE ls.id IS NULL -; -``` - -To update existing styles. - -```sql -UPDATE public.layer_styles ls - SET f_geometry_column = new.f_geometry_column, - styleqml = new.styleqml, - stylesld = new.stylesld, - useasdefault = new.useasdefault, - description = new.description, - "owner" = new."owner", - ui = new.ui, - update_time = new.update_time - FROM public.layer_styles_staging new - WHERE new.f_table_catalog = ls.f_table_catalog - AND new.f_table_schema = ls.f_table_schema - AND new.f_table_name = ls.f_table_name - AND new.stylename = ls.stylename -; -``` - - -Cleanup the staging table. - -```sql -DELETE FROM public.layer_styles_staging; -``` - - -## Updating Style .sql - -To update (or create new) the .sql file with styles. - -Load into `_staging` table so restoring the data puts it back in the same place. -Optionally add a `WHERE` clause to only export certain styles. - -You may want to update the `owner` field. 
- -```sql -INSERT INTO public.layer_styles_staging -SELECT * FROM public.layer_styles; - -UPDATE public.layer_styles_staging - SET owner = 'rustprooflabs' - WHERE owner != 'rustprooflabs' -; -``` - - -```bash -pg_dump --no-owner --no-privileges --data-only --table=public.layer_styles_staging \ - -d pgosm \ - -f layer_styles.sql -``` - -Cleanup the staging table. - -```sql -DELETE FROM public.layer_styles_staging; -``` +Documentation moved to https://pgosm-flex.com diff --git a/docs/CNAME b/docs/CNAME new file mode 100644 index 0000000..3e05273 --- /dev/null +++ b/docs/CNAME @@ -0,0 +1 @@ +pgosm-flex.com \ No newline at end of file diff --git a/docs/DOCKER-RUN.md b/docs/DOCKER-RUN.md deleted file mode 100644 index eda7892..0000000 --- a/docs/DOCKER-RUN.md +++ /dev/null @@ -1,494 +0,0 @@ -# Using PgOSM Flex - -This README provides details about running PgOSM Flex using the image defined -in `Dockerfile` and the script loaded from `docker/pgosm_flex.py`. - - -## Directory for data files - -Create directory for the `.osm.pbf` file and the output `.sql` file. The PBF and MD5 files -downloaded from Geofabrik are stored in this directory. -This directory location is assumed in subsequent `docker run` commands. -If you change the data file path be sure to adjust `-v ~/pgosm-data:/app/output` -appropriately to link your path. - -```bash -mkdir ~/pgosm-data -``` - - - -## Run PgOSM Flex Container - - -Set environment variables for the temporary Postgres connection in Docker. - - -### Internal Postgres instance - -The Postgres username and password are the minimum required parameters to use -the internal Postgres database instance. - -```bash -export POSTGRES_USER=postgres -export POSTGRES_PASSWORD=mysecretpassword -``` - - -Start the `pgosm` Docker container to make PostgreSQL/PostGIS available. -This command exposes Postgres inside Docker on port 5433 and establishes links -to the local directory created above (`~/pgosm-data`). 
If your data is stored in a -different location, update this value. - -Using `-v /etc/localtime:/etc/localtime:ro` allows the Docker image to use -the host machine's timezone instead of UTC. This is important when determining if the data -to load should be the latest file (download) or a historic (local) file. - - -```bash -docker run --name pgosm -d --rm \ - -v ~/pgosm-data:/app/output \ - -v /etc/localtime:/etc/localtime:ro \ - -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ - -p 5433:5432 -d rustprooflabs/pgosm-flex -``` - -Ensure the docker container is running. - -```bash -docker ps -a | grep pgosm -``` - -> The most common reason the Docker container fails to run is not setting the `$POSTGRES_PASSWORD` env var. - -Run the processing with `docker exec`. - -```bash -docker exec -it \ - pgosm python3 docker/pgosm_flex.py \ - --ram=8 \ - --region=north-america/us \ - --subregion=district-of-columbia -``` - - - -### External Postgres instance - -The PgOSM Flex Docker image can be used with Postgres instance outside the -Docker container. - -Prepare the database and permissions as described in -[POSTGRES-PERMISSIONS.md](POSTGRES-PERMISSIONS.md). - - -Set environment variables to define the connection. Create a file with the -configuration options. - -```bash -touch ~/.pgosm-db-myproject -chmod 0700 ~/.pgosm-db-myproject -nano ~/.pgosm-db-myproject -``` - -Put in the contents. - -```bash -export POSTGRES_USER=your_login_role -export POSTGRES_PASSWORD=mysecretpassword -export POSTGRES_HOST=your-host-or-ip -export POSTGRES_DB=your_db_name -export POSTGRES_PORT=5432 -``` - -Env vars can be loaded using. - -```bash -source ~/.pgosm-db-myproject -``` - ----- - -Note: The `POSTGRES_HOST` value is in relation to the Docker container. -Using `localhost` refers to the Docker container and will use the Postgres instance -within the Docker container, not your host running the Docker container. -Use `ip addr` to find your local host's IP address and provide that. 
- ----- - -Run the container with the additional environment variables. - -```bash -docker run --name pgosm -d --rm \ - -v ~/pgosm-data:/app/output \ - -v /etc/localtime:/etc/localtime:ro \ - -e POSTGRES_USER=$POSTGRES_USER \ - -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ - -e POSTGRES_HOST=$POSTGRES_HOST \ - -e POSTGRES_DB=$POSTGRES_DB \ - -e POSTGRES_PORT=$POSTGRES_PORT \ - -p 5433:5432 -d rustprooflabs/pgosm-flex -``` - -> Note: Setting `POSTGRES_HOST` to anything but `localhost` disables the drop/create database step. This means the target database must be created prior to running PgOSM Flex. - - -The `docker exec` command is the same as when using the internal Postgres instance. - -```bash -docker exec -it \ - pgosm python3 docker/pgosm_flex.py \ - --ram=8 \ - --region=north-america/us \ - --subregion=district-of-columbia -``` - - -## Use `--replication` to keep data fresh - -> The `--replication` mode seems to be stable as of 0.7.0. It was added as an experimental feature in 0.4. (originally under the --append option). - - -PgOSM Flex's `--replication` mode wraps around the `osm2pgsql-replication` package -included with `osm2pgsql`. The first time running an import with `--replication` -mode runs osm2pgsql normally, with `--slim` mode and without `--drop`. -After osm2pgsql completes, `osm2pgsql-replication init ...` is ran to setup -the DB for updates. -This mode of operation results in larger database as the intermediate osm2pgsql -tables (`--slim`) must be left in the database (no `--drop`). - - -> Important: The original `--append` option is now under `--replication`. The `--append` option was removed in PgOSM Flex 0.7.0. See [#275](https://github.com/rustprooflabs/pgosm-flex/issues/275) for context. - - -When using replication you need to pin your process to a specific PgOSM Flex version -in the `docker run` command. When upgrading to new versions, -be sure to check the release notes for manual upgrade steps for `--replication`. 
-The release notes for -[PgOSM Flex 0.6.1](https://github.com/rustprooflabs/pgosm-flex/releases/tag/0.6.1) -are one example. -The notes discussed in the release notes have reference SQL scripts -under `db/data-migration` folder. - ----- - -**WARNING - Due to the ability to configure custom layersets these data-migration -scripts need manual review, and possibly manual adjustments for -your specific database and process.** - ----- - - -The other important change when using replication is to increase Postgres' `max_connections`. -See [this discussion on osm2pgsql](https://github.com/openstreetmap/osm2pgsql/discussions/1650) -for why this is necessary. - -If using the Docker-internal Postgres instance this is done with `-c max_connections=300` -in the `docker run` command. External database connections must update this -in the appropriate `postgresql.conf` file. - - -```bash -docker run --name pgosm -d --rm \ - -v ~/pgosm-data:/app/output \ - -v /etc/localtime:/etc/localtime:ro \ - -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ - -p 5433:5432 \ - -d rustprooflabs/pgosm-flex:0.7.0 \ - -c max_connections=300 -``` - - -Run the `docker exec` step with `--replication`. - -```bash -docker exec -it \ - pgosm python3 docker/pgosm_flex.py \ - --ram=8 \ - --region=north-america/us \ - --subregion=district-of-columbia \ - --pgosm-date 2022-12-30 \ - --replication -``` - -Running the above command a second time will detect that the target database -has `osm2pgsql-replication` setup and load data via the defined replication -service. - -> Note: The `--pgosm-date` parameter is ignored during subsequent imports using `--replication`. - - - -## Run PgOSM-Flex - -The following `docker exec` command runs PgOSM Flex to load the District of Columbia -region. -The command `python3 docker/pgosm_flex.py` runs the full process. The -script uses a region (`--region=north-america/us`) and -sub-region (`--subregion=district-of-columbia`). 
-The region/subregion values must the URL pattern used by the Geofabrik download server, -see the [Regions and Subregions](#regions-and-subregions) section. - -The `--ram=8` parameter defines the total system RAM available and is used by -internal logic to determine the best osm2pgsql options to use. -When running on hardware dedicated to this process it is safe to define the total -system RAM. If the process is on a computer with other responsibilities, such -as your laptop, feel free to lower this value. - - -```bash -docker exec -it \ - pgosm python3 docker/pgosm_flex.py \ - --ram=8 \ - --region=north-america/us \ - --subregion=district-of-columbia -``` - -For the best in-Docker performance you will need to -[tune the internal Postgres config](#configure-postgres-in-docker) appropriately -for your hardware. -See the [osm2pgsql documentation](https://osm2pgsql.org/doc/manual.html#tuning-the-postgresql-server) for more on tuning Postgres for this -process. - - -## Regions and Subregions - -The `--region` and `--subregion` definitions must match -the Geofabrik URL scheme. This can be a bit confusing -as larger subregions can contain smaller subregions. - -The example above to process the `district-of-columbia` subregion defines -`--region=north-america/us`. You cannot, unfortunately, drop off -the `--subregion` to load the U.S. subregion. Attempting this results -in a `ValueError`. - -To load the U.S. subregion, the `us` portion drops out of `--region` -and moves to `--subregion`. - -```bash -docker exec -it pgosm python3 docker/pgosm_flex.py \ - --ram=8 \ - --region=north-america \ - --subregion=us -``` - - -## Customize PgOSM-Flex - -See full set of options via `--help`. The required option (`--ram`) and the -commonly used `--region` and `--subregion` are listed first. The remainder -of the options are listed in alphabetical order. 
- - -```bash -docker exec -it pgosm python3 docker/pgosm_flex.py --help -``` - -```bash -Usage: pgosm_flex.py [OPTIONS] - - Run PgOSM Flex within Docker to automate osm2pgsql flex processing. - -Options: - --ram FLOAT Amount of RAM in GB available on the machine - running the Docker container. This is used to - determine the appropriate osm2pgsql command via - osm2pgsql-tuner recommendation engine. [required] - --region TEXT Region name matching the filename for data sourced - from Geofabrik. e.g. north-america/us. Optional - when --input-file is specified, otherwise - required. - --subregion TEXT Sub-region name matching the filename for data - sourced from Geofabrik. e.g. district-of-columbia - --data-only When set, skips running Sqitch and importing QGIS - Styles. - --debug Enables additional log output - --input-file TEXT Set filename or absolute filepath to input osm.pbf - file. Overrides default file handling, archiving, - and MD5 checksum validation. Filename is assumed - under /app/output unless absolute path is used. - --layerset TEXT Layerset to load. Defines name of included - layerset unless --layerset-path is defined. - [required] - --layerset-path TEXT Custom path to load layerset INI from. Custom - paths should be mounted to Docker via docker run - -v ... - --language TEXT Set default language in loaded OpenStreetMap data - when available. e.g. 'en' or 'kn'. - --pg-dump Uses pg_dump after processing is completed to - enable easily load OpenStreetMap data into a - different database - --pgosm-date TEXT Date of the data in YYYY-MM-DD format. If today - (default), automatically downloads when files not - found locally. Set to historic date to load - locally archived PBF/MD5 file, will fail if both - files do not exist. - --replication EXPERIMENTAL - Replication mode enables updates - via osm2pgsql-replication. - --schema-name TEXT Change the final schema name, defaults to 'osm'. - --skip-nested When set, skips calculating nested admin polygons. 
- Can be time consuming on large regions. - --srid TEXT SRID for data loaded by osm2pgsql to PostGIS. - Defaults to 3857 - --sp-gist When set, builds SP-GIST indexes on geom column - instead of the default GIST indexes. - --update [append|create] EXPERIMENTAL - Wrap around osm2pgsql create v. - append modes, without using osm2pgsql-replication. - --help Show this message and exit. -``` - -An example of running with many of the current options. - -```bash -docker exec -it \ - pgosm python3 docker/pgosm_flex.py \ - --layerset=poi \ - --layerset-path=/custom-layerset/ \ - --ram=8 \ - --region=north-america/us \ - --subregion=district-of-columbia \ - --schema-name=osm_dc \ - --pgosm-date="2021-03-11" \ - --language="en" \ - --srid="4326" \ - --data-only \ - --pg-dump \ - --skip-nested \ - --sp-gist \ - --debug -``` - -## Use custom layersets - -See [LAYERSETS.md](LAYERSETS.md) for details about creating custom layersets. - -To use the `--layerset-path` option for custom layerset -definitions, link the directory containing custom styles -to the Docker container in the `docker run` command. -The custom styles will be available inside the container under -`/custom-layerset`. - - -```bash -docker run --name pgosm -d --rm \ - -v ~/pgosm-data:/app/output \ - -v /etc/localtime:/etc/localtime:ro \ - -v ~/custom-layerset:/custom-layerset \ - -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ - -p 5433:5432 -d rustprooflabs/pgosm-flex -``` - -Define the layerset name (`--layerset=poi`) and path -(`--layerset-path`) to the `docker exec`. - - -```bash -docker exec -it \ - pgosm python3 docker/pgosm_flex.py \ - --layerset=poi \ - --layerset-path=/custom-layerset/ \ - --ram=8 \ - --region=north-america/us \ - --subregion=district-of-columbia -``` - - -## Skip nested place polygons - -The nested place polygon calculation -([explained in this post](https://blog.rustprooflabs.com/2021/01/pgosm-flex-improved-openstreetmap-places-postgis)) -adds minimal overhead to smaller regions, e.g. 
Colorado with a 225 MB PBF input file. -Larger regions, such as North America (12 GB PBF), -are impacted more severely as a difference in processing time. -Calculating nested place polygons for Colorado adds less than 30 seconds on an 8 minute process, -taking about 5% longer. -A larger region, such as North America, can take 33% longer adding more than -an hour and a half to the total processing time. -See [docs/PERFORMANCE.md](PERFORMANCE.md) for more details. - - -Use `--skip-nested` to bypass the calculation of nested admin polygons. - - -## Use `--pg-dump` to export data - -> The `--pg-dump` option was added in 0.7.0. Prior versions defaulted to using `pg_dump` and provided a `--skip-dump` option to override. The default now is to only use `pg_dump` when requested. See [#266](https://github.com/rustprooflabs/pgosm-flex/issues/266) for more. - - -A `.sql` file can be created using `pg_dump` as part of the processing -for easy loading into one or more external Postgres databases. -Add `--pg-dump` to the `docker exec` command to enable this feature. - -The following example -creates an empty `myosm` database to load the processed and dumped OpenStreetMap -data. - - -```bash -psql -d postgres -c "CREATE DATABASE myosm;" -psql -d myosm -c "CREATE EXTENSION postgis;" - -psql -d myosm \ - -f ~/pgosm-data/pgosm-flex-north-america-us-district-of-columbia-default-2023-01-21.sql -``` - -> The above assumes a database user with `superuser` permissions is used. See [docs/POSTGRES-PERMISSIONS.md](POSTGRES-PERMISSIONS.md) for a more granular approach to permissions. - - -## Configure Postgres inside Docker - -Add customizations with the `-c` switch, e.g. `-c shared_buffers=1GB`, -to customize Postgres' configuration at run-time in Docker. -See the [osm2pgsql documentation](https://osm2pgsql.org/doc/manual.html#preparing-the-database) -for recommendations on a server with 64 GB of RAM. 
- -This `docker run` command has been tested with 16GB RAM and 4 CPU (8 threads) with the Colorado -subregion. Configuring Postgres in-Docker runs 7-14% faster than the default -Postgres in-Docker configuration. - - -```bash -docker run --name pgosm -d --rm \ - -v ~/pgosm-data:/app/output \ - -v /etc/localtime:/etc/localtime:ro \ - -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ - -p 5433:5432 -d rustprooflabs/pgosm-flex \ - -c shared_buffers=512MB \ - -c work_mem=50MB \ - -c maintenance_work_mem=4GB \ - -c checkpoint_timeout=300min \ - -c max_wal_senders=0 -c wal_level=minimal \ - -c max_wal_size=10GB \ - -c checkpoint_completion_target=0.9 \ - -c random_page_cost=1.0 -``` - - -The `docker exec` command used for the timings. - -```bash -time docker exec -it \ - pgosm python3 docker/pgosm_flex.py \ - --ram=8 \ - --region=north-america/us \ - --subregion=colorado \ - --layerset=basic \ - --pgosm-date=2021-10-08 -``` - - -## Monitoring the import - -You can track the query activity in the database being loaded using the -`pg_stat_activity` view from `pg_catalog`. Database connections use -`application_name = 'pgosm_flex'`. - - -```sql -SELECT * - FROM pg_catalog.pg_stat_activity - WHERE application_name = 'pgosm-flex' -; -``` - - diff --git a/docs/DUMP-AND-LOAD.md b/docs/DUMP-AND-LOAD.md deleted file mode 100644 index a720957..0000000 --- a/docs/DUMP-AND-LOAD.md +++ /dev/null @@ -1,49 +0,0 @@ -# PgOSM Flex: Dump and reload data - -> These manual procedures (outside of Docker) are not regularly tested or reviewed. The recommended way to use PgOSM Flex is through the Docker image. The Docker image is capable of renaming the schema and running pg_dump when desired. - ----- - -To move data loaded on one Postgres instance to another, use `pg_dump`. -The import from PBF to PostGIS is far more taxing on resources than general -querying of the data requires. 
One common approach is to use a temporary cloud -server with additional resources to process and prepare the data, then dump -and restore the data onto a production Postgres instance for use. - -## (optional) Rename schema - -If the desired schema name is different from `osm` the schema can be renamed -at this point. If the schema is renamed, adjust the following `pg_dump` -to change `--schema=` as well. - - -```sql -ALTER SCHEMA osm RENAME TO some_other_schema_name; -``` - - -## pg_dump - -Create a directory to export. Using `-Fd` for directory format to allow using -`pg_dump`/`pg_restore` with multiple processes (`-j 4`). For the small data set for -Washington D.C. used here this isn't necessary, though can seriously speed up with larger areas, e.g. Europe or North America. - -```bash -mkdir -p ~/tmp/osm_dc -pg_dump --schema=osm --schema=pgosm \ - -d pgosm \ - -Fd -j 4 \ - -f ~/tmp/osm_dc -tar -cvf osm_dc.tar -C ~/tmp osm_dc -``` - -## pg_restore - -Move the `.tar` if needed. Untar and restore. - - -```bash -tar -xvf osm_eu.tar -pg_restore -j 4 -d pgosm_eu -Fd osm_eu/ -``` - diff --git a/docs/LAYERSETS.md b/docs/LAYERSETS.md deleted file mode 100644 index aa4cc02..0000000 --- a/docs/LAYERSETS.md +++ /dev/null @@ -1,37 +0,0 @@ -# PgOSM Flex layersets - -A layerset defines one or more layers, where each layer includes -one or more tables and/or views. -Layers are defined by a matched pair of Lua and SQL scripts. For example, -the road layer is defined by `flex-config/style/road.lua` and -`flex-config/sql/road.sql`. - - -Layersets are defined in `.ini` files. - - -## Included layersets - -PgOSM Flex includes a few layersets. These are defined under `flex-config/layerset/`. -If the `--layerset` is not defined, the `default` layerset is used. 
- -* `basic` -* `default` -* `everything` -* `minimal` - - -## Custom layerset - - -A layerset including the `poi` and `road_major` layers would look -like: - -```ini -[layerset] -poi=true -road_major=true -``` - -Layers not listed in the layerset `.ini` are not included. - diff --git a/docs/QUERY.md b/docs/QUERY.md deleted file mode 100644 index 06ac3cc..0000000 --- a/docs/QUERY.md +++ /dev/null @@ -1,98 +0,0 @@ -# Querying with PgOSM Flex - -## Nested admin polygons - -Nested admin polygons are stored in the table `osm.place_polygon_nested`. -The `osm.build_nested_admin_polygons()` to populate the table is defined in `flex-config/place.sql`, -the Docker process automatically runs it. -Can run quickly on small areas (Colorado), takes significantly longer on larger -areas (North America). - - -The Python script in the Docker image has a `--skip-nested` option to skip -running the function to populate the table. It can always be populated -at a later time manually using the function. - -```sql -CALL osm.build_nested_admin_polygons(); -``` - -When this process is running for a while it can be monitored with this query. - -```sql -SELECT COUNT(*) AS row_count, - COUNT(*) FILTER (WHERE nest_level IS NOT NULL) AS rows_processed - FROM osm.place_polygon_nested -; -``` - - -# Quality Control Queries - -## Features not Loaded - -The process of selectively load specific features and not others always has the chance -of accidentally missing important data. - -Running and examine tags from the SQL script `db/qc/features_not_in_run_all.sql`. -Run within `psql` (using `\i db/qc/features_not_in_run_all.sql`) or a GUI client -to explore the temp table used to return the aggregated results, `osm_missing`. -The table is a `TEMP TABLE` so will disappear when the session ends. - -Example results from initial run (v0.0.4) showed some obvious omissions from the -current layer definitions. 
- -```bash -┌────────────────────────────────────────┬────────┐ -│ jsonb_object_keys │ count │ -╞════════════════════════════════════════╪════════╡ -│ landuse │ 110965 │ -│ addr:street │ 89482 │ -│ addr:housenumber │ 89210 │ -│ name │ 47151 │ -│ leisure │ 25351 │ -│ addr:state │ 19051 │ -│ power │ 16933 │ -│ addr:unit │ 13973 │ -│ building:part │ 13773 │ -│ golf │ 13427 │ -│ railway │ 13032 │ -│ addr:city │ 12426 │ -│ addr:postcode │ 12358 │ -│ height │ 12113 │ -│ building:colour │ 11124 │ -│ roof:colour │ 11115 │ -``` - -## Unroutable routes - -The `helpers.lua` methods are probably not perfect. - -* `routable_foot()` -* `routable_cycle()` -* `routable_motor()` - - - -```sql -SELECT * FROM osm.road_line - WHERE NOT route_foot AND NOT route_motor AND NOT route_cycle -; -``` -> Not all rows returned are errors. `highway = 'construction'` is not necessarily determinate... - - -## Relations missing from unitable - -```sql -SELECT t.* - FROM osm.tags t - WHERE t.geom_type = 'R' - AND NOT EXISTS ( - SELECT 1 - FROM osm.unitable u - WHERE u.geom_type = t.geom_type AND t.osm_id = u.osm_id -); -``` - - diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..efb9b5a --- /dev/null +++ b/docs/README.md @@ -0,0 +1,16 @@ +# Building doc book + +The book is built [using `mdbook`](https://rust-lang.github.io/mdBook/index.html). + +Install mdbook. + +```bash +cargo install mdbook +``` + +Serve the book locally and open your default browser. 
+ +```bash +cd docs +mdbook serve --open +``` \ No newline at end of file diff --git a/docs/book.toml b/docs/book.toml new file mode 100644 index 0000000..4851da5 --- /dev/null +++ b/docs/book.toml @@ -0,0 +1,15 @@ +[book] +authors = ["Ryan Lambert", "PgOSM Flex Contributors"] +language = "en" +multilingual = false +src = "src" +title = "PgOSM Flex User Guide" +description = "PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS using the osm2pgsql Flex output" + +[output.html] +default-theme = "rust" +preferred-dark-theme = "navy" +git-repository-url = "https://github.com/rustprooflabs/pgosm-flex" +git-repository-icon = "fa-github" +edit-url-template = "https://github.com/rustprooflabs/pgosm-flex/edit/main/docs/{path}" + diff --git a/docs/APPEND-MODE.md b/docs/src/APPEND-MODE.md similarity index 100% rename from docs/APPEND-MODE.md rename to docs/src/APPEND-MODE.md diff --git a/docs/src/COMMON-CUSTOMIZATION.md b/docs/src/COMMON-CUSTOMIZATION.md new file mode 100644 index 0000000..cefe6a3 --- /dev/null +++ b/docs/src/COMMON-CUSTOMIZATION.md @@ -0,0 +1,136 @@ +# Common Customizations + +A major goal of PgOSM Flex is to support a wide range of use cases for using +OpenStreetMap data in PostGIS. This chapter explores a few ways PgOSM Flex +can be customized. + + +## Selecting region and subregion + +The most commonly used customization is the region and subregion selection. +The examples throughout this project's documentation use +`--region=north-america/us` and `--subregion=district-of-columbia` +because it is a small region that downloads and imports quickly. + +```bash +docker exec -it \ + pgosm python3 docker/pgosm_flex.py \ + --ram=8 \ + --region=north-america/us \ + --subregion=district-of-columbia +``` + +By default PgOSM Flex will attempt to download the necessary data files +from [Geofabrik's download server](https://download.geofabrik.de/).
+Navigate the Region/Sub-region structure on Geofabrik to determine +exactly what `--region` and `--subregion` options to choose. +This can be a bit confusing as larger subregions can contain smaller subregions. +Feel free to [start a discussion](https://github.com/rustprooflabs/pgosm-flex/discussions/new/choose) if you need help figuring this part out! + +If you want to load the entire United States subregion, instead of +the District of Columbia subregion, the `docker exec` command is changed to the +following. + +```bash +docker exec -it \ + pgosm python3 docker/pgosm_flex.py \ + --ram=8 \ + --region=north-america \ + --subregion=us +``` + +For top-level regions, such as North America, leave off the `--subregion` option. + +```bash +docker exec -it \ + pgosm python3 docker/pgosm_flex.py \ + --ram=8 \ + --region=north-america +``` + +## Specific input file + +The automatic Geofabrik download can be overridden by providing PgOSM Flex +with the path to a valid `.osm.pbf` file using `--input-file`. +This option overrides the default file handling, archiving, and MD5 +checksum validation. With `--input-file` you can use a custom `osm.pbf` +you created, or use it to simply remove the need for an internet connection +from the instance running the processing. + +> Note: The `--region` option is always required, the `--subregion` option can be used with `--input-file` to put the information in the `subregion` column of `osm.pgosm_flex`. + + +## Customize load to PostGIS + +There are a few ways to customize exactly how data is loaded to PostGIS / Postgres. + +### SRID + +PgOSM Flex defaults to SRID 3857 matching the default osm2pgsql behavior. +This can be customized using `--srid 4326` or any other SRID supported by +osm2pgsql and PostGIS. + + + +### Language + +The `--language` option enables defining a preferred language for OpenStreetMap +names. If `--language=en` is defined, PgOSM Flex's `helper.get_name()` +function will use `name:en` if it exists. 
The usage and effect +of this option is shown in [this comment](https://github.com/rustprooflabs/pgosm-flex/issues/93#issuecomment-818271870). + +Using `-e PGOSM_LANGUAGE=kn` for U.S. West results in most state labels picking +up the Kannada language option. The states without a `name:kn` default +to the standard name selection logic. + +![](https://user-images.githubusercontent.com/3085224/114467942-ecd29700-9ba7-11eb-980a-10a127fd3c97.png) + + + +### Data only + +The `--data-only` option skips creating optional data structures in the target +database. This includes the helper tables in the `pgosm` schema and the +QGIS layer style table. + + +## Use `--help` + +The PgOSM Docker image can provide command line help. +The Python script that controls PgOSM Flex's behavior is built using the +`click` module, providing built-in `--help`. +Use `docker exec` to show the full help. + + +```bash +docker exec -it pgosm python3 docker/pgosm_flex.py --help +``` + +The first portion of the `--help` output is shown here. + +```bash +Usage: pgosm_flex.py [OPTIONS] + + Run PgOSM Flex within Docker to automate osm2pgsql flex processing. + +Options: + --ram FLOAT Amount of RAM in GB available on the machine + running the Docker container. This is used to + determine the appropriate osm2pgsql command via + osm2pgsql-tuner recommendation engine. [required] + --region TEXT Region name matching the filename for data sourced + from Geofabrik. e.g. north-america/us. Optional + when --input-file is specified, otherwise + required. + --subregion TEXT Sub-region name matching the filename for data + sourced from Geofabrik. e.g. district-of-columbia + --data-only When set, skips running Sqitch and importing QGIS + Styles. 
+ +``` + + + + + + diff --git a/docs/DOCKER-BUILD.md b/docs/src/DOCKER-BUILD.md similarity index 100% rename from docs/DOCKER-BUILD.md rename to docs/src/DOCKER-BUILD.md diff --git a/docs/src/DOCKER-RUN.md b/docs/src/DOCKER-RUN.md new file mode 100644 index 0000000..82d6a94 --- /dev/null +++ b/docs/src/DOCKER-RUN.md @@ -0,0 +1,111 @@ +# Using PgOSM Flex + +This README provides details about running PgOSM Flex using the image defined +in `Dockerfile` and the script loaded from `docker/pgosm_flex.py`. + + + +## Use custom layersets + +See [LAYERSETS.md](LAYERSETS.md) for details about creating custom layersets. + + +## Skip nested place polygons + +The nested place polygon calculation +([explained in this post](https://blog.rustprooflabs.com/2021/01/pgosm-flex-improved-openstreetmap-places-postgis)) +adds minimal overhead to smaller regions, e.g. Colorado with a 225 MB PBF input file. +Larger regions, such as North America (12 GB PBF), +are impacted more severely as a difference in processing time. +Calculating nested place polygons for Colorado adds less than 30 seconds on an 8 minute process, +taking about 5% longer. +A larger region, such as North America, can take 33% longer adding more than +an hour and a half to the total processing time. +See [docs/PERFORMANCE.md](PERFORMANCE.md) for more details. + + +Use `--skip-nested` to bypass the calculation of nested admin polygons. + + +## Use `--pg-dump` to export data + +> The `--pg-dump` option was added in 0.7.0. Prior versions defaulted to using `pg_dump` and provided a `--skip-dump` option to override. The default now is to only use `pg_dump` when requested. See [#266](https://github.com/rustprooflabs/pgosm-flex/issues/266) for more. + + +A `.sql` file can be created using `pg_dump` as part of the processing +for easy loading into one or more external Postgres databases. +Add `--pg-dump` to the `docker exec` command to enable this feature. 
+ +The following example +creates an empty `myosm` database to load the processed and dumped OpenStreetMap +data. + + +```bash +psql -d postgres -c "CREATE DATABASE myosm;" +psql -d myosm -c "CREATE EXTENSION postgis;" + +psql -d myosm \ + -f ~/pgosm-data/pgosm-flex-north-america-us-district-of-columbia-default-2023-01-21.sql +``` + +> The above assumes a database user with `superuser` permissions is used. See [docs/POSTGRES-PERMISSIONS.md](POSTGRES-PERMISSIONS.md) for a more granular approach to permissions. + + +## Configure Postgres inside Docker + +Add customizations with the `-c` switch, e.g. `-c shared_buffers=1GB`, +to customize Postgres' configuration at run-time in Docker. +See the [osm2pgsql documentation](https://osm2pgsql.org/doc/manual.html#preparing-the-database) +for recommendations on a server with 64 GB of RAM. + +This `docker run` command has been tested with 16 GB RAM and 4 CPU (8 threads) with the Colorado +subregion. With this configuration the import runs 7-14% faster than with the default +Postgres in-Docker configuration. + + +```bash +docker run --name pgosm -d --rm \ + -v ~/pgosm-data:/app/output \ + -v /etc/localtime:/etc/localtime:ro \ + -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ + -p 5433:5432 -d rustprooflabs/pgosm-flex \ + -c shared_buffers=512MB \ + -c work_mem=50MB \ + -c maintenance_work_mem=4GB \ + -c checkpoint_timeout=300min \ + -c max_wal_senders=0 -c wal_level=minimal \ + -c max_wal_size=10GB \ + -c checkpoint_completion_target=0.9 \ + -c random_page_cost=1.0 +``` + + +The following `docker exec` command was used for the timings. + +```bash +time docker exec -it \ + pgosm python3 docker/pgosm_flex.py \ + --ram=8 \ + --region=north-america/us \ + --subregion=colorado \ + --layerset=basic \ + --pgosm-date=2021-10-08 +``` + + +## Monitoring the import + +You can track the query activity in the database being loaded using the +`pg_stat_activity` view from `pg_catalog`. Database connections use +`application_name = 'pgosm-flex'`.
+ + +```sql +SELECT * + FROM pg_catalog.pg_stat_activity + WHERE application_name = 'pgosm-flex' +; +``` + + diff --git a/docs/src/LAYERSETS.md b/docs/src/LAYERSETS.md new file mode 100644 index 0000000..7c9ebd4 --- /dev/null +++ b/docs/src/LAYERSETS.md @@ -0,0 +1,98 @@ +# PgOSM Flex layersets + + +A layerset defines one or more layers, where each layer includes +one or more tables and/or views. +Layers are defined by a matched pair of Lua and SQL scripts. For example, +the road layer is defined by `flex-config/style/road.lua` and +`flex-config/sql/road.sql`. + + +Layersets are defined in `.ini` files. + + +## Included layersets + +PgOSM Flex includes a few layersets to get started as examples. +These layersets are defined under `flex-config/layerset/`. +If the `--layerset` is not defined, the `default` layerset is used. + +* `basic` +* `default` +* `everything` +* `minimal` + +Using a built-in layerset other than `default` is done by defining +the `--layerset` option. The following example uses the `minimal` layerset, +including the `place`, `poi`, and `road_major` layers. + + +```bash +docker exec -it \ + pgosm python3 docker/pgosm_flex.py \ + --layerset=minimal \ + --ram=8 \ + --region=north-america/us \ + --subregion=district-of-columbia +``` + +The output from running PgOSM Flex indicates which layers are being loaded. + +``` +2023-01-29 08:47:12,191:INFO:pgosm-flex:helpers:Including place +2023-01-29 08:47:12,192:INFO:pgosm-flex:helpers:Including poi +2023-01-29 08:47:12,192:INFO:pgosm-flex:helpers:Including road_major +``` + + + +## Custom layerset + + +A layerset including the `poi` and `road_major` layers would look +like: + +```ini +[layerset] +poi=true +road_major=true +``` + +Layers not listed in the layerset `.ini` are not included. + + + + +## Using custom layersets + +To use the `--layerset-path` option for custom layerset +definitions, link the directory containing custom styles +to the Docker container in the `docker run` command. 
+The custom styles will be available inside the container under +`/custom-layerset`. + + +```bash +docker run --name pgosm -d --rm \ + -v ~/pgosm-data:/app/output \ + -v /etc/localtime:/etc/localtime:ro \ + -v ~/custom-layerset:/custom-layerset \ + -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ + -p 5433:5432 -d rustprooflabs/pgosm-flex +``` + +Define the layerset name (`--layerset=poi`) and path +(`--layerset-path`) to the `docker exec`. + + +```bash +docker exec -it \ + pgosm python3 docker/pgosm_flex.py \ + --layerset=poi \ + --layerset-path=/custom-layerset/ \ + --ram=8 \ + --region=north-america/us \ + --subregion=district-of-columbia +``` + + diff --git a/docs/MANUAL-STEPS-REPLICATION.md b/docs/src/MANUAL-STEPS-REPLICATION.md similarity index 100% rename from docs/MANUAL-STEPS-REPLICATION.md rename to docs/src/MANUAL-STEPS-REPLICATION.md diff --git a/docs/MANUAL-STEPS-RUN.md b/docs/src/MANUAL-STEPS-RUN.md similarity index 100% rename from docs/MANUAL-STEPS-RUN.md rename to docs/src/MANUAL-STEPS-RUN.md diff --git a/docs/PERFORMANCE.md b/docs/src/PERFORMANCE.md similarity index 95% rename from docs/PERFORMANCE.md rename to docs/src/PERFORMANCE.md index 4b0a5c4..e1c7fed 100644 --- a/docs/PERFORMANCE.md +++ b/docs/src/PERFORMANCE.md @@ -1,6 +1,6 @@ -# PgOSM Flex Performance +# Processing Time -This page provides timings for how long PgOSM-Flex runs for various region sizes. +This page provides timings for how long PgOSM Flex runs for various region sizes. The server used to host these tests has 8 vCPU and 64 GB RAM to match the target server size [outlined in the osm2pgsql manual](https://osm2pgsql.org/doc/manual.html#preparing-the-database). @@ -10,6 +10,9 @@ server size [outlined in the osm2pgsql manual](https://osm2pgsql.org/doc/manual. Versions used for testing: PgOSM Flex 0.4.7 Docker image, based on the official PostGIS image with Postgres 14 / PostGIS 3.2. 
+Note: Postgres 15 [made GIST indexes faster](https://osm2pgsql.org/news/2023/01/22/faster-with-postgresql15.html) +to create. These timings will be updated in the future with the latest versions. + ## Layerset: Minimal diff --git a/docs/src/POSTGRES-EXTERNAL.md b/docs/src/POSTGRES-EXTERNAL.md new file mode 100644 index 0000000..f40e962 --- /dev/null +++ b/docs/src/POSTGRES-EXTERNAL.md @@ -0,0 +1,77 @@ +# Using External Postgres Connection + + +The PgOSM Flex Docker image can be used with a Postgres instance outside the +Docker container. + +Prepare the database and permissions as described in +[Postgres Permissions](POSTGRES-PERMISSIONS.md). + + +Set environment variables to define the connection. Create a file with the +configuration options. + +```bash +touch ~/.pgosm-db-myproject +chmod 0700 ~/.pgosm-db-myproject +nano ~/.pgosm-db-myproject +``` + +Add the contents specific to your database connection. + +```bash +export POSTGRES_USER=your_login_role +export POSTGRES_PASSWORD=mysecretpassword +export POSTGRES_HOST=your-host-or-ip +export POSTGRES_DB=your_db_name +export POSTGRES_PORT=5432 +``` + +The environment variables can be loaded using `source`. + +```bash +source ~/.pgosm-db-myproject +``` + + +Run the container with the additional environment variables. + +```bash +docker run --name pgosm -d --rm \ + -v ~/pgosm-data:/app/output \ + -v /etc/localtime:/etc/localtime:ro \ + -e POSTGRES_USER=$POSTGRES_USER \ + -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ + -e POSTGRES_HOST=$POSTGRES_HOST \ + -e POSTGRES_DB=$POSTGRES_DB \ + -e POSTGRES_PORT=$POSTGRES_PORT \ + -p 5433:5432 -d rustprooflabs/pgosm-flex +``` + + + +The `docker exec` command is the same as when using the internal Postgres instance. + +```bash +docker exec -it \ + pgosm python3 docker/pgosm_flex.py \ + --ram=8 \ + --region=north-america/us \ + --subregion=district-of-columbia +``` + + + + +## Notes + + +The `POSTGRES_HOST` value is relative to the Docker container.
+Using `localhost` refers to the Docker container and will use the Postgres instance +within the Docker container, not your host running the Docker container. +Use `ip addr` to find your local host's IP address and provide that. + + + +Setting `POSTGRES_HOST` to anything but `localhost` disables the drop/create database step. This means the target database must be created prior to running PgOSM Flex. + diff --git a/docs/POSTGRES-PERMISSIONS.md b/docs/src/POSTGRES-PERMISSIONS.md similarity index 100% rename from docs/POSTGRES-PERMISSIONS.md rename to docs/src/POSTGRES-PERMISSIONS.md diff --git a/docs/PROJECTS.md b/docs/src/PROJECTS.md similarity index 100% rename from docs/PROJECTS.md rename to docs/src/PROJECTS.md diff --git a/docs/src/QGIS-STYLES-DEV.md b/docs/src/QGIS-STYLES-DEV.md new file mode 100644 index 0000000..3d74c5f --- /dev/null +++ b/docs/src/QGIS-STYLES-DEV.md @@ -0,0 +1,88 @@ +# Developing QGIS Styles + +This page explains how to maintain QGIS layer styles. + +## Add/Update existing records + +The QGIS table does not include `UNIQUE` constraints, so using Postgres' `UPSERT` is +not available by default. + +Add new records from staging, based on object names. + +```sql +INSERT INTO public.layer_styles + (f_table_catalog, f_table_schema, f_table_name, + f_geometry_column, stylename, styleqml, stylesld, + useasdefault, description, "owner", ui, update_time) +SELECT new.f_table_catalog, new.f_table_schema, new.f_table_name, + new.f_geometry_column, new.stylename, new.styleqml, new.stylesld, + new.useasdefault, new.description, new."owner", new.ui, new.update_time + FROM public.layer_styles_staging new + LEFT JOIN public.layer_styles ls + ON new.f_table_catalog = ls.f_table_catalog + AND new.f_table_schema = ls.f_table_schema + AND new.f_table_name = ls.f_table_name + AND new.stylename = ls.stylename + WHERE ls.id IS NULL +; +``` + +To update existing styles. 
+ +```sql +UPDATE public.layer_styles ls + SET f_geometry_column = new.f_geometry_column, + styleqml = new.styleqml, + stylesld = new.stylesld, + useasdefault = new.useasdefault, + description = new.description, + "owner" = new."owner", + ui = new.ui, + update_time = new.update_time + FROM public.layer_styles_staging new + WHERE new.f_table_catalog = ls.f_table_catalog + AND new.f_table_schema = ls.f_table_schema + AND new.f_table_name = ls.f_table_name + AND new.stylename = ls.stylename +; +``` + + +Cleanup the staging table. + +```sql +DELETE FROM public.layer_styles_staging; +``` + + +## Updating Style .sql + +To update (or create new) the .sql file with styles. + +Load into `_staging` table so restoring the data puts it back in the same place. +Optionally add a `WHERE` clause to only export certain styles. + +You may want to update the `owner` field. + +```sql +INSERT INTO public.layer_styles_staging +SELECT * FROM public.layer_styles; + +UPDATE public.layer_styles_staging + SET owner = 'rustprooflabs' + WHERE owner != 'rustprooflabs' +; +``` + + +```bash +pg_dump --no-owner --no-privileges --data-only --table=public.layer_styles_staging \ + -d pgosm \ + -f layer_styles.sql +``` + +Cleanup the staging table. + +```sql +DELETE FROM public.layer_styles_staging; +``` diff --git a/docs/src/QGIS-STYLES.md b/docs/src/QGIS-STYLES.md new file mode 100644 index 0000000..279ec93 --- /dev/null +++ b/docs/src/QGIS-STYLES.md @@ -0,0 +1,40 @@ +# QGIS Styles for PgOSM Flex + + +If you use QGIS to visualize OpenStreetMap, there are a few basic +styles using the `public.layer_styles` table created by QGIS. +This data is loaded by default and can be excluded with `--data-only`. + + +QGIS can save its styling information directly in a table in the Postgres database +using a table `public.layer_styles`. 
+ + +## Prepare + +The `create_layer_styles.sql` script creates the `public.layer_styles` table defined in QGIS 3.16 along with an additional `public.layer_styles_staging` table used to prepare +data before loading. + +```bash +psql -d pgosm -f create_layer_styles.sql +``` + +Load styles to staging. + +```bash +psql -d pgosm -f layer_styles.sql +``` + + +To use these styles as defaults, update the `f_table_catalog` and +`f_table_schema` values in the staging table. The defaults are +`f_table_catalog='pgosm'` and `f_table_schema='osm'`. + + +```sql +UPDATE public.layer_styles_staging + SET f_table_catalog = 'your_db', + f_table_schema = 'osm' +; +``` + diff --git a/docs/src/QUERY.md b/docs/src/QUERY.md new file mode 100644 index 0000000..bdc5f34 --- /dev/null +++ b/docs/src/QUERY.md @@ -0,0 +1,184 @@ +# Querying with PgOSM Flex + +## Nested admin polygons + +Nested admin polygons are stored in the table `osm.place_polygon_nested`. +The `osm.build_nested_admin_polygons()` procedure that populates the table is defined in `flex-config/place.sql`, +and the Docker process runs it automatically. +It runs quickly on small areas (Colorado) and takes significantly longer on larger +areas (North America). + + +The Python script in the Docker image has a `--skip-nested` option to skip +running the procedure that populates the table. The table can always be populated +manually at a later time by calling the procedure. + +```sql +CALL osm.build_nested_admin_polygons(); +``` + +While this process is running, it can be monitored with this query. + +```sql +SELECT COUNT(*) AS row_count, + COUNT(*) FILTER (WHERE nest_level IS NOT NULL) AS rows_processed + FROM osm.place_polygon_nested +; +``` + + +# Quality Control Queries + +## Features not Loaded + +Selectively loading specific features and not others always carries the risk +of accidentally missing important data. + +Run the SQL script `db/qc/features_not_in_run_all.sql` and examine the tags it reports.
+Run within `psql` (using `\i db/qc/features_not_in_run_all.sql`) or a GUI client +to explore the temp table used to return the aggregated results, `osm_missing`. +The table is a `TEMP TABLE` so it will disappear when the session ends. + +Example results from the initial run (v0.0.4) showed some obvious omissions from the +current layer definitions. + +```bash +┌────────────────────────────────────────┬────────┐ +│ jsonb_object_keys │ count │ +╞════════════════════════════════════════╪════════╡ +│ landuse │ 110965 │ +│ addr:street │ 89482 │ +│ addr:housenumber │ 89210 │ +│ name │ 47151 │ +│ leisure │ 25351 │ +│ addr:state │ 19051 │ +│ power │ 16933 │ +│ addr:unit │ 13973 │ +│ building:part │ 13773 │ +│ golf │ 13427 │ +│ railway │ 13032 │ +│ addr:city │ 12426 │ +│ addr:postcode │ 12358 │ +│ height │ 12113 │ +│ building:colour │ 11124 │ +│ roof:colour │ 11115 │ +``` + +## Unroutable routes + +The `helpers.lua` methods are probably not perfect. + +* `routable_foot()` +* `routable_cycle()` +* `routable_motor()` + + + +```sql +SELECT * FROM osm.road_line + WHERE NOT route_foot AND NOT route_motor AND NOT route_cycle +; +``` +> Not all rows returned are errors. `highway = 'construction'` is not necessarily determinate... + + +## Relations missing from unitable + +```sql +SELECT t.* + FROM osm.tags t + WHERE t.geom_type = 'R' + AND NOT EXISTS ( + SELECT 1 + FROM osm.unitable u + WHERE u.geom_type = t.geom_type AND t.osm_id = u.osm_id +); +``` + + + +## Points of Interest (POIs) + +PgOSM Flex loads a range of tags into a materialized view (`osm.poi_all`) for +easily searching POIs. +Line and polygon data is forced to point geometry using +`ST_Centroid()`. This layer duplicates a number of other more specific layers +(shop, amenity, etc.) to provide a single place for simplified POI searches. + +This special layer is included by the layersets `run-all` and `run-no-tags`. +See `style/poi.lua` for logic on how to include POIs.
+The topic of POIs is subjective, and this layer likely does not include everything that should be considered +a POI. If there are POIs missing +from this table please submit a [new issue](https://github.com/rustprooflabs/pgosm-flex/issues/new) +with sufficient details about what is missing. +Pull requests are also welcome! [See CONTRIBUTING.md](CONTRIBUTING.md). + + +Counts of POIs by `osm_type`. + +```sql +SELECT osm_type, COUNT(*) + FROM osm.vpoi_all + GROUP BY osm_type + ORDER BY COUNT(*) DESC; +``` + +Results from the Washington D.C. subregion (March 2020). + +``` +┌──────────┬───────┐ +│ osm_type │ count │ +╞══════════╪═══════╡ +│ amenity │ 12663 │ +│ leisure │ 2701 │ +│ building │ 2045 │ +│ shop │ 1739 │ +│ tourism │ 729 │ +│ man_made │ 570 │ +│ landuse │ 32 │ +│ natural │ 19 │ +└──────────┴───────┘ +``` + +Includes Points (`N`), Lines (`L`) and Polygons (`W`). + + +```sql +SELECT geom_type, COUNT(*) + FROM osm.vpoi_all + GROUP BY geom_type + ORDER BY COUNT(*) DESC; +``` + +``` +┌───────────┬───────┐ +│ geom_type │ count │ +╞═══════════╪═══════╡ +│ W │ 10740 │ +│ N │ 9556 │ +│ L │ 202 │ +└───────────┴───────┘ +``` + +## Meta table + +PgOSM Flex tracks processing metadata in the `osm.pgosm_flex` table. The initial import +has `osm2pgsql_mode = 'create'`, the subsequent update has +`osm2pgsql_mode = 'append'`.
+ + +```sql +SELECT osm_date, region, srid, + pgosm_flex_version, osm2pgsql_version, osm2pgsql_mode + FROM osm.pgosm_flex +; +``` + +```bash +┌────────────┬───────────────────────────┬──────┬────────────────────┬───────────────────┬────────────────┐ +│ osm_date │ region │ srid │ pgosm_flex_version │ osm2pgsql_version │ osm2pgsql_mode │ +╞════════════╪═══════════════════════════╪══════╪════════════════════╪═══════════════════╪════════════════╡ +│ 2022-11-04 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ create │ +│ 2022-11-25 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ append │ +└────────────┴───────────────────────────┴──────┴────────────────────┴───────────────────┴────────────────┘ +``` diff --git a/docs/src/QUICK-START.md b/docs/src/QUICK-START.md new file mode 100644 index 0000000..4becd09 --- /dev/null +++ b/docs/src/QUICK-START.md @@ -0,0 +1,161 @@ +# Quick Start + + + +See the [Docker Usage](#docker-usage) section below for an explanation of +these commands. + +```bash +mkdir ~/pgosm-data +export POSTGRES_USER=postgres +export POSTGRES_PASSWORD=mysecretpassword + +docker run --name pgosm -d --rm \ + -v ~/pgosm-data:/app/output \ + -v /etc/localtime:/etc/localtime:ro \ + -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ + -p 5433:5432 -d rustprooflabs/pgosm-flex + +docker exec -it \ + pgosm python3 docker/pgosm_flex.py \ + --ram=8 \ + --region=north-america/us \ + --subregion=district-of-columbia +``` + + + +## PgOSM via Docker + +The PgOSM Flex +[Docker image](https://hub.docker.com/r/rustprooflabs/pgosm-flex) +is hosted on Docker Hub. +The image includes all the prerequisite software and handles all of the options, +logic, and post-processing steps required.
Features include: + +* Automatic data download from Geofabrik and validation against checksum +* Custom Flex layers built in Lua +* Mix and match layers using Layersets +* Loads to Docker-internal Postgres, or externally defined Postgres +* Supports `osm2pgsql-replication` and `osm2pgsql --append` mode +* Export processed data via `pg_dump` for loading into additional databases + + +## Docker usage + +This section outlines a typical import using Docker to run PgOSM Flex. +See the full Docker instructions in [docs/DOCKER-RUN.md](docs/DOCKER-RUN.md). + +Create a directory for the `.osm.pbf` file, output `.sql` file, log output, and +the osm2pgsql command that was run. + + +```bash +mkdir ~/pgosm-data +``` + +Set environment variables for the temporary Postgres connection in Docker. +These are required for the Docker container to run. + + +```bash +export POSTGRES_USER=postgres +export POSTGRES_PASSWORD=mysecretpassword +``` + +Start the `pgosm` Docker container. At this point, Postgres / PostGIS +is available on port `5433`. + +```bash +docker run --name pgosm -d --rm \ + -v ~/pgosm-data:/app/output \ + -v /etc/localtime:/etc/localtime:ro \ + -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ + -p 5433:5432 -d rustprooflabs/pgosm-flex +``` + +Use `docker exec` to run the processing for the Washington D.C. subregion. +This example uses three (3) parameters to specify the total system RAM (8 GB) +along with a region/subregion. + +* Total RAM for osm2pgsql, Postgres and OS (`8`) +* Region (`north-america/us`) +* Sub-region (`district-of-columbia`) (Optional) + + +```bash +docker exec -it \ + pgosm python3 docker/pgosm_flex.py \ + --ram=8 \ + --region=north-america/us \ + --subregion=district-of-columbia +``` + + +The above command takes roughly 1 minute to run if the PBF for today +has already been downloaded. +If the PBF has not been downloaded, the total time depends on how long +it takes to download the 17 MB PBF file, plus about 1 minute of processing.
+ + +## After processing + +The processed OpenStreetMap data is also available in the Docker container on port `5433`. +You can connect and query directly in the Docker container. + +```bash +psql -h localhost -p 5433 -d pgosm -U postgres -c "SELECT COUNT(*) FROM osm.road_line;" + +┌───────┐ +│ count │ +╞═══════╡ +│ 39865 │ +└───────┘ +``` + + +The `~/pgosm-data` directory has two (2) files from a typical single run. +The PBF file and its MD5 checksum have been renamed with the date in the filename. +This enables loading the file downloaded today +again in the future, either with the same version of PgOSM Flex or the latest version. The `docker exec` command uses the `PGOSM_DATE` environment variable +to load these historic files. + + +If `--pg-dump` option is used the output `.sql` is also saved in +the `~/pgosm-data` directory. +This `.sql` file can be loaded into any other database with PostGIS and the proper +permissions. + + +```bash +ls -alh ~/pgosm-data/ + +-rw-r--r-- 1 root root 18M Jan 21 03:45 district-of-columbia-2023-01-21.osm.pbf +-rw-r--r-- 1 root root 70 Jan 21 04:39 district-of-columbia-2023-01-21.osm.pbf.md5 +-rw-r--r-- 1 root root 163M Jan 21 16:14 north-america-us-district-of-columbia-default-2023-01-21.sql +``` + + + +## Meta table + +PgOSM Flex tracks processing metadata in the ``osm.pgosm_flex`` table. The initial import +has `osm2pgsql_mode = 'create'`, the subsequent update has +`osm2pgsql_mode = 'append'`. 
+ + +```sql +SELECT osm_date, region, srid, + pgosm_flex_version, osm2pgsql_version, osm2pgsql_mode + FROM osm.pgosm_flex +; +``` + +```bash +┌────────────┬───────────────────────────┬──────┬────────────────────┬───────────────────┬────────────────┐ +│ osm_date │ region │ srid │ pgosm_flex_version │ osm2pgsql_version │ osm2pgsql_mode │ +╞════════════╪═══════════════════════════╪══════╪════════════════════╪═══════════════════╪════════════════╡ +│ 2022-11-04 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ create │ +│ 2022-11-25 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ append │ +└────────────┴───────────────────────────┴──────┴────────────────────┴───────────────────┴────────────────┘ +``` diff --git a/docs/src/README.md b/docs/src/README.md new file mode 100644 index 0000000..e0e6749 --- /dev/null +++ b/docs/src/README.md @@ -0,0 +1,69 @@ +# PgOSM Flex + +PgOSM Flex ([GitHub](https://github.com/rustprooflabs/pgosm-flex)) +provides high quality OpenStreetMap datasets in PostGIS using the +[osm2pgsql Flex output](https://osm2pgsql.org/doc/manual.html#the-flex-output). +This project provides a curated set of Lua and SQL scripts to clean and organize +the most commonly used OpenStreetMap data, such as roads, buildings, and points of interest (POIs). + +Running PgOSM Flex is easy via the PgOSM Docker image +[hosted on Docker Hub](https://hub.docker.com/repository/docker/rustprooflabs/pgosm-flex). + + +1. The [quick start](QUICK-START.md) shows how easy it is to get started +1. Change how PgOSM Flex runs with [common customizations](COMMON-CUSTOMIZATION.md) +1. [Customize layersets](LAYERSETS.md) to change what data you load + + +## Project goals + +* High quality spatial data +* Reliable +* Easy to customize +* Easy to use + + +## Project decisions + +A few decisions made in this project: + +* ID column is `osm_id` +* Geometry column named `geom` +* Defaults to same units as OpenStreetMap (e.g. 
km/hr, meters) +* Data not included in a dedicated column is available from `osm.tags.tags` (`JSONB`) +* Points, Lines, and Polygons are not mixed in a single table +* Tracks latest Postgres, PostGIS, and osm2pgsql versions + +This project's approach is to do as much processing as possible in the Lua styles +passed along to osm2pgsql, with post-processing steps creating indexes, +constraints and comments. + + +## Versions Supported + +Minimum versions supported: + +* Postgres 12 +* PostGIS 3.0 +* osm2pgsql 1.8.0 + +Defining [Postgres indexes in the Lua styles](https://osm2pgsql.org/doc/manual.html#defining-indexes) +bumps osm2pgsql minimum requirement to 1.8.0. + +This project will attempt, but not guarantee, to support PostgreSQL 12 until it +reaches its EOL. + + +## Minimum Hardware + +### RAM + +osm2pgsql requires [at least 2 GB RAM](https://osm2pgsql.org/doc/manual.html#main-memory). + +### Storage + +Fast SSD drives are strongly recommended. It should work on slower storage devices (HDD, +SD, etc), +however the [osm2pgsql-tuner](https://github.com/rustprooflabs/osm2pgsql-tuner) +package used to determine the best osm2pgsql command assumes fast SSDs. + diff --git a/docs/src/REPLICATION.md b/docs/src/REPLICATION.md new file mode 100644 index 0000000..c0c1d44 --- /dev/null +++ b/docs/src/REPLICATION.md @@ -0,0 +1,77 @@ +# Stay Updated with Replication + +The `--replication` option of PgOSM Flex enables `osm2pgsql-replication` +to provide an easy and quick way to keep your OpenStreetMap data refreshed. + + +> The `--replication` mode is stable as of 0.7.0. It was added as an experimental feature in 0.4, originally under the `--append` option. + + +PgOSM Flex's `--replication` mode wraps around the `osm2pgsql-replication` package +included with `osm2pgsql`. The first import run with `--replication` +mode runs osm2pgsql normally, with `--slim` mode and without `--drop`.
+After osm2pgsql completes, `osm2pgsql-replication init ...` is run to set up +the DB for updates. +This mode of operation results in a larger database as the intermediate osm2pgsql +tables (`--slim`) must be left in the database (no `--drop`). + + +> Important: The original `--append` option is now under `--replication`. The `--append` option was removed in PgOSM Flex 0.7.0. See [#275](https://github.com/rustprooflabs/pgosm-flex/issues/275) for context. + + +When using replication you need to pin your process to a specific PgOSM Flex version +in the `docker run` command. When upgrading to new versions, +be sure to check the release notes for manual upgrade steps for `--replication`. +The release notes for +[PgOSM Flex 0.6.1](https://github.com/rustprooflabs/pgosm-flex/releases/tag/0.6.1) +are one example. +The steps discussed in the release notes have reference SQL scripts +under the `db/data-migration` folder. + +---- + +**WARNING - Due to the ability to configure custom layersets these data-migration +scripts need manual review, and possibly manual adjustments for +your specific database and process.** + +---- + + +The other important change when using replication is to increase Postgres' `max_connections`. +See [this discussion on osm2pgsql](https://github.com/openstreetmap/osm2pgsql/discussions/1650) +for why this is necessary. + +If using the Docker-internal Postgres instance this is done with `-c max_connections=300` +in the `docker run` command. External database connections must update this +in the appropriate `postgresql.conf` file. + + +```bash +docker run --name pgosm -d --rm \ + -v ~/pgosm-data:/app/output \ + -v /etc/localtime:/etc/localtime:ro \ + -e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \ + -p 5433:5432 \ + -d rustprooflabs/pgosm-flex:0.7.0 \ + -c max_connections=300 +``` + + +Run the `docker exec` step with `--replication`.
+ +```bash +docker exec -it \ + pgosm python3 docker/pgosm_flex.py \ + --ram=8 \ + --region=north-america/us \ + --subregion=district-of-columbia \ + --pgosm-date 2022-12-30 \ + --replication +``` + +Running the above command a second time will detect that the target database +has `osm2pgsql-replication` set up and load data via the defined replication +service. + +> Note: The `--pgosm-date` parameter is ignored during subsequent imports using `--replication`. + diff --git a/docs/ROUTING.md b/docs/src/ROUTING.md similarity index 100% rename from docs/ROUTING.md rename to docs/src/ROUTING.md diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md new file mode 100644 index 0000000..9576a10 --- /dev/null +++ b/docs/src/SUMMARY.md @@ -0,0 +1,39 @@ +# Summary + +[About PgOSM Flex](README.md) + +# User Guide + +- [Quick Start](./QUICK-START.md) +- [Common Customizations](./COMMON-CUSTOMIZATION.md) +- [Customize Layersets](./LAYERSETS.md) +- [One Big Guide of Docker - Being replaced with more targeted pages](./DOCKER-RUN.md) +- [Processing Time](./PERFORMANCE.md) +- [Query examples](./QUERY.md) +- [Routing](./ROUTING.md) + + +# Production usages + + +- [Using External Postgres Connection](./POSTGRES-EXTERNAL.md) +- [Postgres Permissions](./POSTGRES-PERMISSIONS.md) +- [Stay Updated with Replication](./REPLICATION.md) +- [Using Update Mode](./UPDATE-MODE.md) +- [QGIS Styles](./QGIS-STYLES.md) + +# Developers + +- [Build and Push Docker Images](./DOCKER-BUILD.md) +- [Testing PgOSM Flex](./TESTS.md) +- [Developing QGIS Styles](./QGIS-STYLES-DEV.md) + + +# PgOSM Flex w/out Docker + +These steps are not regularly tested or explicitly supported.
+ +- [MANUAL-STEPS-RUN](./MANUAL-STEPS-RUN.md) +- [MANUAL-STEPS-REPLICATION](./MANUAL-STEPS-REPLICATION.md) + + diff --git a/docs/src/TESTS.md b/docs/src/TESTS.md new file mode 100644 index 0000000..8764276 --- /dev/null +++ b/docs/src/TESTS.md @@ -0,0 +1,76 @@ +# Testing PgOSM Flex + +The `Makefile` at the root of this project tests many core aspects of +PgOSM Flex's functionality. It builds the Docker image, tests a few usage +scenarios (including `--input-file`) and runs both Python unit tests +and Data Import tests. The data import tests verify row counts by +`osm_type` and `osm_subtype` of many tables. + + +## Run all tests + +To run all tests run `make` from the project's root directory. + +```bash +make +``` + +A simplified usage for quicker testing during development. + +```bash +make docker-exec-default unit-tests +``` + + +## Python unit tests + +The Python unit tests are under `pgosm-flex/docker/tests/`. These tests use +Python's `unittest` module. The `make` process runs these using +`coverage run ...` and `coverage report ...`. +See the [Makefile](https://github.com/rustprooflabs/pgosm-flex/blob/main/Makefile) +for exact implementation. + + +## Data import tests + +Under `pgosm-flex/tests`. The `run-output-tests.sh` script is run by +running `make`. The script loops over the `.sql` scripts under +`pgosm-flex/tests/sql/`, runs the queries via `psql` using +`--no-psqlrc -tA` and compares the output from the query against the +expected output saved under `pgosm-flex/tests/expected`. + + + + +> FIXME: At this time the `run-extra-loads.sh` script is not run automatically. There are not any usage notes covering those random side tests. + + +## What is not tested + +Functionality of `osm2pgsql-replication` to actually update data. The challenge +is that testing this requires having a recent `.osm.pbf` file for the initial +import. Attempting to use the test D.C.
file used for all other testing +(from January 13, 2021), the initial import works, however a subsequent +refresh fails. + +```bash +2023-01-29 08:11:35,553:INFO:pgosm-flex:helpers:2023-01-29 08:11:35 [INFO]: Using replication service 'http://download.geofabrik.de/north-america/us/district-of-columbia-updates'. Current sequence 2856 (2021-01-13 14:42:03-07:00). +2023-01-29 08:11:36,866:INFO:pgosm-flex:helpers:Traceback (most recent call last): +2023-01-29 08:11:36,866:INFO:pgosm-flex:helpers:File "/usr/local/bin/osm2pgsql-replication", line 556, in +2023-01-29 08:11:36,866:INFO:pgosm-flex:helpers:sys.exit(main()) +2023-01-29 08:11:36,866:INFO:pgosm-flex:helpers:File "/usr/local/bin/osm2pgsql-replication", line 550, in main +2023-01-29 08:11:36,867:INFO:pgosm-flex:helpers:return args.handler(conn, args) +2023-01-29 08:11:36,867:INFO:pgosm-flex:helpers:File "/usr/local/bin/osm2pgsql-replication", line 402, in update +2023-01-29 08:11:36,867:INFO:pgosm-flex:helpers:endseq = repl.apply_diffs(outhandler, seq + 1, +2023-01-29 08:11:36,867:INFO:pgosm-flex:helpers:File "/usr/local/lib/python3.9/dist-packages/osmium/replication/server.py", line 177, in apply_diffs +2023-01-29 08:11:36,868:INFO:pgosm-flex:helpers:diffs = self.collect_diffs(start_id, max_size) +2023-01-29 08:11:36,868:INFO:pgosm-flex:helpers:File "/usr/local/lib/python3.9/dist-packages/osmium/replication/server.py", line 143, in collect_diffs +2023-01-29 08:11:36,868:INFO:pgosm-flex:helpers:left_size -= rd.add_buffer(diffdata, self.diff_type) +2023-01-29 08:11:36,868:INFO:pgosm-flex:helpers:RuntimeError: gzip error: inflate failed: incorrect header check +2023-01-29 08:11:36,890:WARNING:pgosm-flex:pgosm_flex:Failure. Return code: 1 +2023-01-29 08:11:36,890:INFO:pgosm-flex:pgosm_flex:Skipping pg_dump +2023-01-29 08:11:36,890:WARNING:pgosm-flex:pgosm_flex:PgOSM Flex completed with errors. 
Details in output +``` + + + diff --git a/docs/UPDATE-MODE.md b/docs/src/UPDATE-MODE.md similarity index 100% rename from docs/UPDATE-MODE.md rename to docs/src/UPDATE-MODE.md diff --git a/docs/dc-example-route-start-end-vertices.png b/docs/src/dc-example-route-start-end-vertices.png similarity index 100% rename from docs/dc-example-route-start-end-vertices.png rename to docs/src/dc-example-route-start-end-vertices.png diff --git a/docs/dc-example-route-start-motor-access-control.png b/docs/src/dc-example-route-start-motor-access-control.png similarity index 100% rename from docs/dc-example-route-start-motor-access-control.png rename to docs/src/dc-example-route-start-motor-access-control.png diff --git a/docs/dc-example-route-start-no-access-control.png b/docs/src/dc-example-route-start-no-access-control.png similarity index 100% rename from docs/dc-example-route-start-no-access-control.png rename to docs/src/dc-example-route-start-no-access-control.png