Merged
21 commits
027485d
Beginning of restructuring, non-functional
rustprooflabs Dec 15, 2022
d8f4ebc
Hard coding a fix back to functional state
rustprooflabs Dec 16, 2022
c9943fa
Renaming internal varibles throughout to make functionality more clea…
rustprooflabs Dec 26, 2022
1822683
Start moving import mode logic to new module
rustprooflabs Dec 27, 2022
5155b41
Fix broken unit tests. Add new unit tests for ImportMode. Minor clea…
rustprooflabs Dec 27, 2022
672d9fd
Testing additional recommended osm2pgsql commands for new run-time sc…
rustprooflabs Dec 27, 2022
4cf7be7
Add logic to skip post-processing SQL with --update append. Add basic…
rustprooflabs Dec 27, 2022
5d66a09
Merge pull request #285 from rustprooflabs/add--for-update--and--upda…
rustprooflabs Dec 27, 2022
4676948
Adjust example of update mode to use much smaller extracts via osmium
rustprooflabs Dec 28, 2022
b30b73d
Bump osm2pgsql-tuner requirement. Adding notes to unmaintained docume…
rustprooflabs Dec 30, 2022
50924db
Clean up readme files
rustprooflabs Dec 31, 2022
7536991
Merge pull request #288 from rustprooflabs/cleanup-and-docs
rustprooflabs Dec 31, 2022
898585c
Update runtime options to use --pg-dump option instead of --skip-dump…
rustprooflabs Dec 31, 2022
897aff6
Flip logic to avoid using not
rustprooflabs Dec 31, 2022
6ff2b93
Merge pull request #289 from rustprooflabs/switch-from-skip-dump-to-p…
rustprooflabs Dec 31, 2022
5d30d9f
Add upgrade step to Dockerfile
rustprooflabs Jan 13, 2023
9617e24
Finish removing --append and --skip-dump options
rustprooflabs Jan 15, 2023
741c2e4
Merge pull request #291 from rustprooflabs/parameter-cleanup
rustprooflabs Jan 15, 2023
db10840
Follow up to prior, removing options from function call
rustprooflabs Jan 15, 2023
1552fed
Merge pull request #292 from rustprooflabs/really-remove-options
rustprooflabs Jan 15, 2023
321edb2
Bump license year. Add docker build notes
rustprooflabs Jan 15, 2023
1 change: 1 addition & 0 deletions Dockerfile
@@ -5,6 +5,7 @@ LABEL maintainer="PgOSM Flex - https://github.com/rustprooflabs/pgosm-flex"
ARG OSM2PGSQL_BRANCH=master

RUN apt-get update \
&& apt-get upgrade -y \
&& apt-get install -y --no-install-recommends \
sqitch wget ca-certificates \
git make cmake g++ \
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2020-2022 Ryan Lambert
Copyright (c) 2020-2023 Ryan Lambert

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
9 changes: 5 additions & 4 deletions Makefile
@@ -83,7 +83,8 @@ docker-exec-default: build-run-docker
--layerset=everything \
--ram=$(RAM) \
--region=north-america/us \
--subregion=district-of-columbia
--subregion=district-of-columbia \
--pg-dump # pg_dump is not part of the default. Added here to ensure this usage is tested


.PHONE: docker-exec-input-file
@@ -110,7 +111,7 @@ docker-exec-input-file: build-run-docker
--layerset=minimal \
--ram=$(RAM) \
--input-file=/app/output/$(INPUT_FILE_NAME) \
--data-only --skip-dump --skip-nested # Make this test run faster
--data-only --skip-nested # Make this test run faster



@@ -149,7 +150,7 @@ docker-exec-replication-w-input-file: build-run-docker
--ram=$(RAM) \
--replication \
--input-file=/app/output/$(INPUT_FILE_NAME) \
--data-only --skip-dump --skip-nested # Make this test run faster
--data-only --skip-nested # Make this test run faster


.PHONE: docker-exec-region
@@ -179,7 +180,7 @@ docker-exec-region: build-run-docker
--layerset=minimal \
--ram=$(RAM) \
--region=$(REGION_FILE_NAME) \
--data-only --skip-dump --skip-nested # Make this test run faster
--data-only --skip-nested # Make this test run faster


.PHONY: unit-tests
95 changes: 44 additions & 51 deletions README.md
@@ -5,9 +5,10 @@ PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS using the
This project provides a curated set of Lua and SQL scripts to clean and organize
the most commonly used OpenStreetMap data, such as roads, buildings, and points of interest (POIs).

The easiest way to use PgOSM Flex is via [the Docker image](docs/DOCKER-RUN.md).
For ultimate control and customization,
there are [instructions for installing and running manually](docs/MANUAL-STEPS-RUN.md).
The recommended way to use PgOSM Flex is via the PgOSM Docker image
[hosted on Docker Hub](https://hub.docker.com/repository/docker/rustprooflabs/pgosm-flex).
Basic usage instructions are included in this README.md file; full Docker
usage instructions are available in [docs/DOCKER-RUN.md](docs/DOCKER-RUN.md).


## Project decisions
@@ -17,13 +18,14 @@ A few decisions made in this project:
* ID column is `osm_id`
* Geometry stored in SRID 3857 (customizable)
* Geometry column named `geom`
* Default to same units as OpenStreetMap (e.g. km/hr, meters)
* Data not deemed worthy of a dedicated column goes in side table `osm.tags`. Raw key/value data stored in `JSONB` column
* Defaults to same units as OpenStreetMap (e.g. km/hr, meters)
* Data not included in a dedicated column goes into the `osm.tags` table's `JSONB` column
* Points, Lines, and Polygons are not mixed in a single table
* Tracks latest Postgres, PostGIS, and osm2pgsql versions

This project's approach is to do as much processing in the Lua styles
passed along to osm2pgsql, with post-processing steps creating indexes, constraints and comments.
passed along to osm2pgsql, with post-processing steps creating indexes,
constraints and comments.



@@ -66,9 +68,9 @@ The PBF/MD5 source files are archived by date on your local storage
with the ability to easily reload them at a later date.


### Basic Docker usage
### Docker usage

This section outlines the basic operations for using Docker to run PgOSM-Flex.
This section outlines a typical import using Docker to run PgOSM-Flex.
See [the full Docker instructions in docs/DOCKER-RUN.md](docs/DOCKER-RUN.md).

Create directory for the `.osm.pbf` file, output `.sql` file, log output, and
@@ -100,7 +102,7 @@ docker run --name pgosm -d --rm \
```

Use `docker exec` to run the processing for the Washington D.C subregion.
This example uses three (3) parameters to specify the totaol system RAM (8 GB)
This example uses three (3) parameters to specify the total system RAM (8 GB)
along with a region/subregion.

* Total RAM for osm2pgsql, Postgres and OS (`8`)
@@ -127,14 +129,29 @@ it takes to download the 17 MB PBF file + ~ 1 minute processing.
### After processing


The `~/pgosm-data` directory has three (3) files from a single run.
The processed OpenStreetMap data is also available in the Docker container on port `5433`.
You can connect and query directly in the Docker container.

```bash
psql -h localhost -p 5433 -d pgosm -U postgres -c "SELECT COUNT(*) FROM osm.road_line;"

┌───────┐
│ count │
╞═══════╡
│ 39865 │
└───────┘
```


The `~/pgosm-data` directory has two (2) files from a typical single run.
The PBF file and its MD5 checksum have been renamed with the date in the filename.
This enables loading the file downloaded today
again in the future, either with the same version of PgOSM Flex or the latest version. The `docker exec` command uses the `PGOSM_DATE` environment variable
to load these historic files.


The output `.sql` is also saved in the `~/pgosm-data` directory.
If the optional `--pg-dump` option is used, the output `.sql` is also saved in
the `~/pgosm-data` directory.


```bash
@@ -143,10 +160,8 @@ ls -alh ~/pgosm-data/
-rw-r--r-- 1 root root 17M Nov 2 19:57 district-of-columbia-2021-11-03.osm.pbf
-rw-r--r-- 1 root root 70 Nov 2 19:59 district-of-columbia-2021-11-03.osm.pbf.md5
-rw-r--r-- 1 root root 156M Nov 3 19:10 pgosm-flex-north-america-us-district-of-columbia-default-2021-11-03.sql

```


This `.sql` file can be loaded into a PostGIS enabled database. The following example
creates an empty `myosm` database to load the processed OpenStreetMap data into.

@@ -160,23 +175,7 @@ psql -d myosm \
```


The processed OpenStreetMap data is also available in the Docker container on port `5433`.
You can connect and query directly in the Docker container.

```bash
psql -h localhost -p 5433 -d pgosm -U postgres -c "SELECT COUNT(*) FROM osm.road_line;"

┌───────┐
│ count │
╞═══════╡
│ 39865 │
└───────┘
```



See [more in docs/DOCKER-RUN.md](docs/DOCKER-RUN.md) about other ways to customize
how PgOSM Flex runs.
> See [more in docs/DOCKER-RUN.md](docs/DOCKER-RUN.md) about other ways to customize how PgOSM Flex runs.


## Layer Sets
@@ -191,12 +190,11 @@ See [docs/LAYERSETS.md](docs/LAYERSETS.md) for details.

If you use QGIS to visualize OpenStreetMap, there are a few basic
styles using the `public.layer_styles` table created by QGIS.
This data is loaded by default and can be excluded with `--data-only`.

See [the QGIS Style README.md](https://github.com/rustprooflabs/pgosm-flex/blob/main/db/qgis-style/README.md)
for more information.

This data is loaded by default and can be excluded with `--data-only`.


## Explore data loaded

@@ -264,8 +262,8 @@ For example queries with data loaded by PgOSM-Flex see

## Points of Interest (POIs)


Loads an range of tags into a materialized view (`osm.poi_all`) for easy searching POIs.
PgOSM Flex loads a range of tags into a materialized view (`osm.poi_all`) for
easily searching POIs.
Line and polygon data is forced to point geometry using
`ST_Centroid()`. This layer duplicates a bunch of other more specific layers
(shop, amenity, etc.) to provide a single place for simplified POI searches.
@@ -330,10 +328,10 @@ SELECT geom_type, COUNT(*)
## One table to rule them all

From the perspective of database design, the `osm.unitable` option is the **worst**!
This violates all sorts of best practices established in this project
This table violates all sorts of best practices established in this project
by shoving all features into a single unstructured table.

> This style included in PgOSM-Flex is intended to be used for troubleshooting and quality control. It is not intended to be used for real production workloads! This table is helpful for exploring the full data set when you don't really know what you are looking for, but you know **where** you are looking.
> This style included in PgOSM Flex is intended to be used for troubleshooting and quality control. It is not intended to be used for real production workloads! This table is helpful for exploring the full data set when you don't really know what you are looking for, but you know **where** you are looking.

Unitable is loaded with the `everything` layerset. Feel free to create your own
customized layerset if needed.
@@ -355,24 +353,19 @@ docker exec -it \

## JSONB support

PgOSM-Flex uses `JSONB` in Postgres to store the raw OpenSteetMap
PgOSM-Flex uses `JSONB` in Postgres to store the raw OpenStreetMap
key/value data (`tags` column) and relation members (`member_ids`).
The `tags` column only exists in the `osm.tags` and `osm.unitable` tables.
The `member_ids` column is included in:

Current `JSONB` columns:

* `osm.tags.tags`
* `osm.unitable.tags`
* `osm.place_polygon.member_ids`
* `osm.vplace_polygon.member_ids`
* `osm.poi_polygon.member_ids`



## On-server import
* `osm.place_polygon`
* `osm.poi_polygon`
* `osm.public_transport_line`
* `osm.public_transport_polygon`
* `osm.road_line`
* `osm.road_major`
* `osm.road_polygon`

Don't want to use the Docker process?
See [docs/MANUAL-STEPS-RUN.md](docs/MANUAL-STEPS-RUN.md) for prereqs and steps
for running without Docker.



24 changes: 16 additions & 8 deletions docker/db.py
@@ -181,25 +181,33 @@ def pg_isready():
return True


def prepare_pgosm_db(data_only, db_path, replication):
def prepare_pgosm_db(data_only, db_path, import_mode):
"""Runs through series of steps to prepare database for PgOSM.

Parameters
--------------------------
data_only : bool
db_path : str
replication : bool
import_mode : import_mode.ImportMode
"""

if pg_conn_parts()['pg_host'] == 'localhost':
drop_it = True
LOGGER.debug('Running standard database prep for in-Docker operation. Includes DROP/CREATE DATABASE')
if replication:
LOGGER.debug('Skipping DB drop b/c of append (osm2pgsql-replication) mode')
else:
LOGGER.debug('Dropping database')
LOGGER.debug(f'import_mode: {import_mode}')
if import_mode.slim_no_drop:
if not import_mode.append_first_run:
drop_it = False
if import_mode.replication_update:
drop_it = False

if drop_it:
LOGGER.debug('Dropping local database if exists')
drop_pgosm_db()
else:
LOGGER.debug('Not dropping local DB. This is expected with subsequent import via --replication OR --update=append.')

create_pgosm_db()

else:
LOGGER.info('Using external database. Ensure the target database is setup properly with proper permissions.')
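
The new drop logic in `prepare_pgosm_db` can be read as a single predicate over three `ImportMode` flags. The following is a minimal sketch of that reading, not code from this PR: the helper name `should_drop_db` is hypothetical, and `SimpleNamespace` objects stand in for real `ImportMode` instances.

```python
from types import SimpleNamespace


def should_drop_db(import_mode) -> bool:
    """Hypothetical helper: True when the local database should be dropped.

    Mirrors the `drop_it` branches shown in the diff above.
    """
    # Slim/no-drop imports keep the database on every run after the first.
    if import_mode.slim_no_drop and not import_mode.append_first_run:
        return False
    # A replication update always works against the existing database.
    if import_mode.replication_update:
        return False
    return True


# Stand-ins for ImportMode instances (assumed attribute names match the diff).
cases = {
    'standard import': SimpleNamespace(
        slim_no_drop=False, append_first_run=True, replication_update=False),
    'first slim run': SimpleNamespace(
        slim_no_drop=True, append_first_run=True, replication_update=False),
    'subsequent append': SimpleNamespace(
        slim_no_drop=True, append_first_run=False, replication_update=False),
    'replication update': SimpleNamespace(
        slim_no_drop=True, append_first_run=False, replication_update=True),
}

for name, mode in cases.items():
    print(f'{name}: drop={should_drop_db(mode)}')
```

Keeping the decision in one pure function like this would make it straightforward to unit test without a running Postgres instance, which matters given the warning in `ImportMode` about these flags controlling when the local DB is dropped.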

Expand All @@ -214,7 +222,7 @@ def prepare_pgosm_db(data_only, db_path, replication):


def pg_version_check():
"""Checks Postgres machine-readible server_version_num.
"""Checks Postgres machine-readable server_version_num.

Sends to logs and returns value.

82 changes: 82 additions & 0 deletions docker/import_mode.py
@@ -0,0 +1,82 @@
"""Import Mode provides class to ease logic related to various import modes.
"""
import logging


class ImportMode():
    """Determines logical variables used to control program flow.

    WARNING: The values for `append_first_run` and `replication_update`
    are used to determine when to drop the local DB. Be careful with any
    changes to these values.
    """
    def __init__(self, replication, replication_update, update):
        """Computes two variables, slim_no_drop and append_first_run
        based on inputs.

        Parameters
        --------------------------
        replication : bool
        replication_update : bool
        update : str or None
            Valid options are 'create' or 'append', lining up with osm2pgsql's
            `--create` and `--append` modes.
        """
        self.logger = logging.getLogger('pgosm-flex')
        self.replication = replication
        self.replication_update = replication_update

        # The input via click should enforce this, still worth checking here
        valid_update_options = ['append', 'create', None]

        if update not in valid_update_options:
            raise ValueError(f'Invalid option for --update. Valid options: {valid_update_options}')

        self.update = update
        self.set_slim_no_drop()
        self.set_append_first_run()
        self.set_run_post_sql()


    def set_append_first_run(self):
        """Uses `replication_update` and `update` to determine value for
        `self.append_first_run`
        """
        if self.replication_update:
            self.append_first_run = False
        else:
            self.append_first_run = True

        if self.update is not None:
            if self.update == 'create':
                self.append_first_run = True
            else:
                self.append_first_run = False

    def set_slim_no_drop(self):
        """Uses `replication` and `update` to determine value for
        `self.slim_no_drop`
        """
        self.slim_no_drop = False

        if self.replication:
            self.slim_no_drop = True

        if self.update is not None:
            self.slim_no_drop = True

    def set_run_post_sql(self):
        """Uses `update` value to determine value for
        `self.run_post_sql`. This value determines if the post-processing SQL
        should be executed.

        Note: Not checking replication/replication_update because subsequent
        imports use osm2pgsql-replication, which does not attempt to run
        the post-processing SQL scripts.
        """
        self.run_post_sql = True

        if self.update is not None:
            if self.update == 'append':
                self.run_post_sql = False
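
To make the three derived flags easier to review, the sketch below condenses the `set_*` methods above into one function and prints the result for a few input combinations. It is an illustration only: `resolve_flags` is a hypothetical name that does not exist in this PR, and the function is intended to be equivalent to `ImportMode`, not a replacement for it.

```python
def resolve_flags(replication, replication_update, update):
    """Hypothetical condensed form of ImportMode's derived flags."""
    if update not in ('append', 'create', None):
        raise ValueError(f'Invalid option for --update: {update!r}')

    # Either --replication or any --update value implies slim mode without drop.
    slim_no_drop = bool(replication) or update is not None

    # An explicit --update value overrides the replication-based default.
    if update is not None:
        append_first_run = update == 'create'
    else:
        append_first_run = not replication_update

    # Post-processing SQL is skipped only for --update append.
    run_post_sql = update != 'append'

    return {
        'slim_no_drop': slim_no_drop,
        'append_first_run': append_first_run,
        'run_post_sql': run_post_sql,
    }


# (replication, replication_update, update) combinations to inspect.
combos = [
    (False, False, None),
    (True, False, None),
    (True, True, None),
    (False, False, 'create'),
    (False, False, 'append'),
]

for combo in combos:
    print(combo, resolve_flags(*combo))
```

A table like this could also serve as the basis for parametrized unit tests of `ImportMode`, since each row is one expected combination of inputs and flags.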
