consolidate_intersections makes non-integer node IDs which result in invalid osm xml #647

parkertimmins · 2021-01-27T18:00:25Z

Problem description

What did you do?
Loaded place graph, projected graph, consolidated intersections, saved to osm xml file.
What did you expect to happen?
Resulting xml file should have integer node IDs, per https://wiki.openstreetmap.org/wiki/Elements
And what actually happened instead?
Resulting xml has some node IDs of the form: "123-0", "1-1", etc..

Environment information

What operating system are you using? Ubuntu 18.04
What Python version are you using? 3.9.1
What OSMnx version are you using? 1.0.1

# Name Version Build Channel _libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 1_gnu conda-forge alabaster 0.7.12 py_0 anaconda appdirs 1.4.4 py_0 anaconda attrs 20.3.0 pyhd3deb0d_0 conda-forge babel 2.8.0 py_0 anaconda black 19.10b0 py_0 anaconda boost-cpp 1.74.0 hc6e9bd1_2 conda-forge branca 0.4.2 pyhd8ed1ab_0 conda-forge brotlipy 0.7.0 py39h3811e60_1001 conda-forge bzip2 1.0.8 h7f98852_4 conda-forge c-ares 1.17.1 h36c2ea0_0 conda-forge ca-certificates 2020.10.14 0 anaconda cairo 1.16.0 h7979940_1007 conda-forge certifi 2020.12.5 py39hf3d152e_1 conda-forge cffi 1.14.4 py39he32792d_1 conda-forge cfitsio 3.470 hb418390_7 conda-forge chardet 4.0.0 py39hf3d152e_1 conda-forge click 7.1.2 pyh9f0ad1d_0 conda-forge click-plugins 1.1.1 py_0 conda-forge cligj 0.7.1 pyhd8ed1ab_0 conda-forge coverage 5.4 py39h3811e60_0 conda-forge cryptography 3.3.1 py39h3da14fd_1 conda-forge curl 7.71.1 he644dc0_8 conda-forge cycler 0.10.0 py_2 conda-forge decorator 4.4.2 py_0 conda-forge descartes 1.1.0 py_4 conda-forge docutils 0.16 py39hf3d152e_3 conda-forge expat 2.2.9 he1b5a44_2 conda-forge fiona 1.8.18 py39h3573629_0 conda-forge flake8 3.8.4 py_0 anaconda folium 0.12.0 pyhd8ed1ab_0 conda-forge fontconfig 2.13.1 hba837de_1004 conda-forge freetype 2.10.4 h0708190_1 conda-forge freexl 1.0.5 h516909a_1002 conda-forge gdal 3.1.4 py39h3f36f43_2 conda-forge geopandas 0.8.1 py_0 conda-forge geos 3.8.1 he1b5a44_0 conda-forge geotiff 1.6.0 h5d11630_3 conda-forge gettext 0.19.8.1 h0b5b191_1005 conda-forge giflib 5.2.1 h36c2ea0_2 conda-forge hdf4 4.2.13 h10796ff_1004 conda-forge hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge icu 68.1 h58526e2_0 conda-forge idna 2.10 pyh9f0ad1d_0 conda-forge imagesize 1.2.0 py_0 anaconda importlib-metadata 2.0.0 py_1 anaconda importlib_metadata 2.0.0 1 anaconda iniconfig 1.1.1 py_0 anaconda isort 5.7.0 pyhd8ed1ab_0 conda-forge jinja2 2.11.2 pyh9f0ad1d_0 conda-forge joblib 1.0.0 pyhd8ed1ab_0 conda-forge jpeg 9d h36c2ea0_0 conda-forge json-c 0.13.1 hbfbb72e_1002 conda-forge kealib 1.4.14 h0042707_0 conda-forge kiwisolver 1.3.1 py39h1a9c180_1 conda-forge krb5 1.17.2 h926e7f8_0 conda-forge lcms2 2.11 hcbb858e_1 conda-forge ld_impl_linux-64 2.35.1 hea4e1c9_1 conda-forge libblas 3.9.0 7_openblas conda-forge libcblas 3.9.0 7_openblas conda-forge libcurl 7.71.1 hcdd3856_8 conda-forge libdap4 3.20.6 h1d1bd15_1 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libev 4.33 h516909a_1 conda-forge libffi 3.3 h58526e2_2 conda-forge libgcc-ng 9.3.0 h2828fa1_18 conda-forge libgdal 3.1.4 h02eeb80_2 conda-forge libgfortran-ng 9.3.0 hff62375_18 conda-forge libgfortran5 9.3.0 hff62375_18 conda-forge libglib 2.66.4 h164308a_1 conda-forge libgomp 9.3.0 h2828fa1_18 conda-forge libiconv 1.16 h516909a_0 conda-forge libkml 1.3.0 h74f7ee3_1012 conda-forge liblapack 3.9.0 7_openblas conda-forge libnetcdf 4.7.4 nompi_h56d31a8_107 conda-forge libnghttp2 1.41.0 h8cfc5f6_2 conda-forge libopenblas 0.3.12 pthreads_h4812303_1 conda-forge libpng 1.6.37 h21135ba_2 conda-forge libpq 12.3 h255efa7_3 conda-forge libspatialindex 1.9.3 he1b5a44_3 conda-forge libspatialite 5.0.0 h4dde289_0 conda-forge libssh2 1.9.0 hab1572f_5 conda-forge libstdcxx-ng 9.3.0 h6de172a_18 conda-forge libtiff 4.2.0 hdc55705_0 conda-forge libuuid 2.32.1 h7f98852_1000 conda-forge libwebp-base 1.1.0 h36c2ea0_3 conda-forge libxcb 1.13 h7f98852_1003 conda-forge libxml2 2.9.10 h72842e0_3 conda-forge lz4-c 1.9.3 h9c3ff4c_0 conda-forge markupsafe 1.1.1 py39h3811e60_3 conda-forge matplotlib-base 3.3.3 py39h2fa2bec_0 conda-forge mccabe 0.6.1 py_1 conda-forge more-itertools 8.5.0 py_0 anaconda munch 2.5.0 py_0 conda-forge mypy_extensions 0.4.3 py39hf3d152e_3 conda-forge ncurses 6.2 h58526e2_4 conda-forge networkx 2.5 py_0 conda-forge numpy 1.19.5 py39hdbf815f_1 conda-forge olefile 0.46 pyh9f0ad1d_1 conda-forge openjpeg 2.4.0 hf7af979_0 conda-forge openssl 1.1.1i h7f98852_0 conda-forge osmnx 1.0.1 pyhd3deb0d_0 conda-forge packaging 20.4 py_0 anaconda pandas 1.2.1 py39hde0f152_0 conda-forge pathspec 0.7.0 py_0 anaconda pcre 8.44 he1b5a44_0 conda-forge pillow 8.1.0 py39h2bb83ca_1 conda-forge pip 20.3.3 pyhd8ed1ab_0 conda-forge pixman 0.40.0 h36c2ea0_0 conda-forge pluggy 0.12.0 py_0 anaconda poppler 0.89.0 h2de54a5_5 conda-forge poppler-data 0.4.10 0 conda-forge postgresql 12.3 hc2f5b80_3 conda-forge proj 7.1.1 h966b41f_3 conda-forge pthread-stubs 0.4 h36c2ea0_1001 conda-forge py 1.9.0 py_0 anaconda pycodestyle 2.6.0 py_0 anaconda pycparser 2.20 pyh9f0ad1d_2 conda-forge pydocstyle 5.1.1 py_0 anaconda pyflakes 2.2.0 py_0 anaconda pygments 2.7.1 py_0 anaconda pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge pyproj 2.6.1.post1 py39h7ce32b3_3 conda-forge pysocks 1.7.1 py39hf3d152e_3 conda-forge pytest 6.2.1 py39hf3d152e_1 conda-forge python 3.9.1 hffdb5ce_3_cpython conda-forge python-dateutil 2.8.1 py_0 conda-forge python_abi 3.9 1_cp39 conda-forge pytz 2020.5 pyhd8ed1ab_0 conda-forge readline 8.0 he28a2e2_2 conda-forge regex 2020.11.13 py39h3811e60_1 conda-forge requests 2.25.1 pyhd3deb0d_0 conda-forge rtree 0.9.7 py39hb102c33_1 conda-forge scikit-learn 0.24.1 py39h4dfa638_0 conda-forge scipy 1.6.0 py39hee8e79c_0 conda-forge setuptools 49.6.0 py39hf3d152e_3 conda-forge shapely 1.7.1 py39hfa2dc8b_1 conda-forge six 1.15.0 pyh9f0ad1d_0 conda-forge snowballstemmer 2.0.0 py_0 anaconda sphinx 3.2.1 py_0 anaconda sphinxcontrib-applehelp 1.0.2 py_0 anaconda sphinxcontrib-devhelp 1.0.2 py_0 anaconda sphinxcontrib-htmlhelp 1.0.3 py_0 anaconda sphinxcontrib-jsmath 1.0.1 py_0 anaconda sphinxcontrib-qthelp 1.0.3 py_0 anaconda sphinxcontrib-serializinghtml 1.1.4 py_0 anaconda sqlite 3.34.0 h74cdb3f_0 conda-forge threadpoolctl 2.1.0 pyh5ca1d4c_0 conda-forge tiledb 2.1.6 h91fcb0e_1 conda-forge tk 8.6.10 h21135ba_1 conda-forge toml 0.10.1 py_0 anaconda tornado 6.1 py39h3811e60_1 conda-forge typed-ast 1.4.2 py39h3811e60_0 conda-forge typing_extensions 3.7.4.3 py_0 anaconda tzcode 2020f h7f98852_0 conda-forge tzdata 2020f he74cb21_0 conda-forge urllib3 1.26.2 pyhd8ed1ab_0 conda-forge wheel 0.36.2 pyhd3deb0d_0 conda-forge xerces-c 3.2.3 h9d8b166_2 conda-forge xorg-kbproto 1.0.7 h7f98852_1002 conda-forge xorg-libice 1.0.10 h516909a_0 conda-forge xorg-libsm 1.2.3 h84519dc_1000 conda-forge xorg-libx11 1.6.12 h516909a_0 conda-forge xorg-libxau 1.0.9 h7f98852_0 conda-forge xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge xorg-libxext 1.3.4 h516909a_0 conda-forge xorg-libxrender 0.9.10 h516909a_1002 conda-forge xorg-renderproto 0.11.1 h14c3975_1002 conda-forge xorg-xextproto 7.3.0 h7f98852_1002 conda-forge xorg-xproto 7.0.31 h7f98852_1007 conda-forge xz 5.2.5 h516909a_1 conda-forge zipp 3.3.1 py_0 anaconda zlib 1.2.11 h516909a_1010 conda-forge zstd 1.4.8 ha95c52a_1 conda-forge

Provide a complete minimal reproducible example

import osmnx as ox

ox.__version__
ox.config(all_oneway=True, log_console=True, use_cache=True)

G = ox.graph_from_place('Canton, North Carolina', network_type='all')
G_proj = ox.project_graph(G)
G2 = ox.consolidate_intersections(G_proj)
ox.osm_xml.save_graph_xml(G2, 'canton.osm')

# display the problematic node ids
xmllint --format canton.osm | grep -P ' id="\d+-\d+"'

Possible Solution
When consolidated intersections are clustered geometrically, and it turns out there are actually multiple weakly connected component, each component is given an ID made from the original cluster ID and an integer index:

osmnx/osmnx/simplification.py

Line 533 in c2f55a3

gdf.loc[wcc, "cluster"] = f"{cluster_label}-{suffix}"

After separating the weakly connected component, these multi-part string IDs can be replaced with new integer ID as follows

 # reassign integer cluster labels since wccs with suffixes have string labels
 gdf["cluster"] = gdf["cluster"].factorize()[0]

I couldn't find any existing code that depends on the format of these IDs, but I could certainly be missing it.

gboeing · 2021-01-27T19:23:46Z

@parkertimmins good point. The .osm formatted XML should have integer node IDs. The question is how to best implement that, given topological node consolidation. I see two options:

have the save_graph_xml function handle such graphs more robustly to assign integer IDs there
have the consolidate_intersections function renumber node IDs with integers (such as using pandas's factorize method)

I couldn't find any existing code that depends on the format of these IDs, but I could certainly be missing it.

The purpose of this format is to retain meaningful information about subclusters, which may be useful in subsequent analytics. For example, it indicates that nodes 123-0 and 123-1 are topological subclusters of an original geometric cluster. So the key question for me becomes: must node IDs be integers for all use cases, or only for XML saving? I lean toward option 1 above because I don't think node IDs must be integers for all use cases. But I'm open to ideas.

parkertimmins · 2021-01-27T22:10:43Z

I agree with that. Since there's useful information in the IDs, and the integer requirement is specific to .osm xml, it makes sense to enforce this in save_graph_xml.

What do you think about something like the following

    if not gdf_nodes["id"].astype(str).str.isdigit().all():
        gdf_nodes["int_id"] = gdf_nodes["id"].astype(str).factorize()[0]
        gdf_edges["u"] = pd.merge(gdf_edges, gdf_nodes, how="left", left_on="u", right_on="id")["int_id"]
        gdf_edges["v"] = pd.merge(gdf_edges, gdf_nodes, how="left", left_on="v", right_on="id")["int_id"]
        gdf_nodes["id"] = gdf_nodes["int_id"]
        gdf_nodes.drop(columns=["int_id"], inplace=True)

in osm_xml at:

osmnx/osmnx/osm_xml.py

Line 199 in c2f55a3

Are there any fields that I'm missing that need to be upated?

Alternatively, in an effort to diff an xml with the multipart ID and an integer ID I wrote the following. It keeps the IDs as is when possible, by replacing "45-0" with "45" and all suffixes >=1 getting new IDs. It's a dumb amount of complexity just to keep the IDs the same, so probably not a good idea.

    if not gdf_nodes["id"].astype(str).str.isdigit().all():
        gdf_nodes[['cluster_id', 'suffix']] = gdf_nodes['id'].astype(str).str.split("-", expand=True)
        gdf_nodes['suffix'] = gdf_nodes['suffix'].fillna(0).astype(int)
        gdf_nodes['cluster_id'] = gdf_nodes['cluster_id'].astype(int)
        gdf_nodes.sort_values(by=['suffix', 'cluster_id'], kind='mergesort', inplace=True)
        gdf_nodes["int_id"] = gdf_nodes["id"].factorize(sort=False)[0]
        gdf_edges["u"] = pd.merge(gdf_edges, gdf_nodes, how="left", left_on="u", right_on="id")["int_id"]
        gdf_edges["v"] = pd.merge(gdf_edges, gdf_nodes, how="left", left_on="v", right_on="id")["int_id"]
        gdf_nodes["id"] = gdf_nodes["int_id"]
        gdf_nodes.drop(columns=["int_id"], inplace=True)

gboeing · 2021-01-28T05:07:45Z

Probably more efficient to do the node relabeling directly in NetworkX prior to converting the graph to GeoDataFrames:

try:
    np.array(G.nodes).astype(np.int64)
except ValueError:
    G = nx.convert_node_labels_to_integers(G)

Though this does make me wonder now if maybe we should just do this universally in the consolidate_intersections function. It may be confusing if you're working with a consolidated graph in OSMnx that has one set of node IDs, but when you dump it to XML it suddenly and silently has a different set of arbitrary node IDs. The benefit would be universally consistent node IDs after consolidating intersections, and the drawback would be a slight loss of information (by losing the subcluster suffixes). The benefit may outweigh the drawback.

parkertimmins · 2021-01-28T18:50:33Z

More efficient and way prettier!
I don't disagree that silently changing the IDs in save_graph_xml could cause confusion.
What do you think about adding the cluster id as an attribute, similar to the original_osmid field? That way the information wont be lost.

gboeing · 2021-01-29T19:56:20Z

What do you think about adding the cluster id as an attribute, similar to the original_osmid field? That way the information wont be lost.

The more I think about it, the more I lean toward keeping it simple here (ie, not retaining additional attributes). This information may have been more useful to me in testing than to the public for analytics. And, anyone interested in the geometric vs topological consolidation difference can diff the results with rebuild_graph True and False. So perhaps the subcluster information is unnecessary at this point.

parkertimmins · 2021-02-01T16:02:19Z

That make sense that it may not be worth the additional complexity. In that case, I think just

 # reassign integer cluster labels since wccs with suffixes have string labels
 gdf["cluster"] = gdf["cluster"].factorize()[0]

after separating the weakly connected components will be adequate.

gboeing · 2021-02-01T17:58:05Z

Is there much difference in runtime or outcome if we do this via

gdf["cluster"] = gdf["cluster"].factorize()[0]

versus rebuilding the graph and then relabeling with

G = nx.convert_node_labels_to_integers(G)

I guess as long as they both produce the same graph in the end, we can just use whichever option is faster.

gboeing · 2021-02-11T18:36:26Z

@parkertimmins would you like to contribute this fix as a PR?

parkertimmins · 2021-02-11T22:09:49Z

@gboeing Yes, sorry I never got back to this. I'll test the options we discussed above and make a PR for the faster one.

parkertimmins · 2021-02-11T22:30:18Z

Ran consolidate_intersections on a graph with 21724 nddes and 43290 edges:
Using:
gdf["cluster"] = gdf["cluster"].factorize()[0]
the time to factorize was: 0.0018s, and total time to consolidate was 15.4s
and with:
G = nx.convert_node_labels_to_integers(G)
the time to factorize was: 0.37s, and total time to consolidate was 16.5s

Based on this, I'll make a PR with the first option

gboeing · 2021-02-11T23:49:02Z

Thanks!

- as per gboeing/osmnx#647 saving consolidated clusters gives invlaid integers as nodes when reloading graphs so... just make sure that you either preserve what's there, or recalculate integers. Thanks gboeing. - cut nearly minute out of the loadtime for the standard graph load.

gboeing · 2024-03-02T21:40:47Z

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consolidate_intersections makes non-integer node IDs which result in invalid osm xml #647

consolidate_intersections makes non-integer node IDs which result in invalid osm xml #647

parkertimmins commented Jan 27, 2021 •

edited

gboeing commented Jan 27, 2021

parkertimmins commented Jan 27, 2021 •

edited

gboeing commented Jan 28, 2021

parkertimmins commented Jan 28, 2021 •

edited

gboeing commented Jan 29, 2021

parkertimmins commented Feb 1, 2021

gboeing commented Feb 1, 2021

gboeing commented Feb 11, 2021

parkertimmins commented Feb 11, 2021

parkertimmins commented Feb 11, 2021

gboeing commented Feb 11, 2021

gboeing commented Mar 2, 2024

consolidate_intersections makes non-integer node IDs which result in invalid osm xml #647

consolidate_intersections makes non-integer node IDs which result in invalid osm xml #647

Comments

parkertimmins commented Jan 27, 2021 • edited

gboeing commented Jan 27, 2021

parkertimmins commented Jan 27, 2021 • edited

gboeing commented Jan 28, 2021

parkertimmins commented Jan 28, 2021 • edited

gboeing commented Jan 29, 2021

parkertimmins commented Feb 1, 2021

gboeing commented Feb 1, 2021

gboeing commented Feb 11, 2021

parkertimmins commented Feb 11, 2021

parkertimmins commented Feb 11, 2021

gboeing commented Feb 11, 2021

gboeing commented Mar 2, 2024

parkertimmins commented Jan 27, 2021 •

edited

parkertimmins commented Jan 27, 2021 •

edited

parkertimmins commented Jan 28, 2021 •

edited