Tesselation fails at Generating input point array... with ValueError: need at least one array to concatenate #297

Closed
bramson opened this issue Sep 6, 2021 · 9 comments · Fixed by #298

@bramson

bramson commented Sep 6, 2021

As the title says, Tessellation fails at the "Generating input point array..." step with ValueError: need at least one array to concatenate.

My code is simply:

import momepy as mm
tess = mm.Tessellation(edgeDF5, 'communityID', limit=wardsPolygon, shrink=0)

Here wardsPolygon is a <class 'shapely.geometry.multipolygon.MultiPolygon'> of Tokyo's 23 wards, and
'communityID' is a column of unique IDs for each polygon in my GeoDataFrame, edgeDF5, which contains neighborhood Polygons in the 'geometry' column (and no other columns).
I don't want space between adjacent polygons; I want a partition (every point is in some polygon), so I set shrink to 0.

Here's the output:

Traceback (most recent call last):

  File "...3938314878.py", line 1, in <module>
    tess = mm.Tessellation(edgeDF5, 'communityID', limit=wardsPolygon, shrink=0)

  File "...\lib\site-packages\momepy\elements.py", line 252, in __init__
    self.tessellation = self._morphological_tessellation(

  File "...\lib\site-packages\momepy\elements.py", line 276, in _morphological_tessellation
    points, ids = self._dense_point_array(

  File "...\lib\site-packages\momepy\elements.py", line 334, in _dense_point_array
    points = np.vstack(points)

  File "<__array_function__ internals>", line 5, in vstack

  File "...\lib\site-packages\numpy\core\shape_base.py", line 282, in vstack
    return _nx.concatenate(arrs, 0)

  File "<__array_function__ internals>", line 5, in concatenate

ValueError: need at least one array to concatenate

The result of checking the input is:

print(mm.CheckTessellationInput(edgeDF5, shrink=0))
Collapsed features  : 0
Split features      : 0
Overlapping features: 0

So, why isn't this working and what do I need to do to make it work?

@bramson
Author

bramson commented Sep 6, 2021

While debugging the source code, I found that the geoms array passed to _dense_point_array, which a comment says should be an "array of pygeos lines", is actually an array of pygeos polygons.

These are converted to linestrings within _dense_point_array via lines = pygeos.boundary(geoms), so no problem there.

Ah ha!!! I found that the distance parameter is set to 0.5 by the segment parameter in _morphological_tessellation.
But my data is in standard lat/lon coords, so the lengths of the boundary linestrings are all less than 0.1.
Points are only added if length > distance, so no points are added to the points array.
So I set segment = 0.01 in the Tessellation call, but...

After fixing that, I get another error:

  File "...\lib\site-packages\momepy\elements.py", line 337, in _dense_point_array
    np.linspace(0.1, length - 0.1, num=int((length - 0.1) // distance)),

  File "<__array_function__ internals>", line 5, in linspace

  File "...\lib\site-packages\numpy\core\function_base.py", line 122, in linspace
    raise ValueError("Number of samples, %s, must be non-negative." % num)

ValueError: Number of samples, -6, must be non-negative.

So, essentially, I have come to realize that using this function requires the coordinates to be in some meters-based CRS.
I can do that, but I did not expect that.
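
A minimal sketch of the arithmetic behind both errors, assuming only what is reported above: points are added when length > distance, and the sample count is the num= expression from the second traceback. The function names and the 0.04 boundary length are hypothetical and are not momepy code.

```python
def has_points(length, distance):
    # Per the description above: a boundary is only densified if it is
    # longer than `distance` (the `segment` argument).
    return length > distance


def sample_count(length, distance):
    # Same expression as the `num=` argument shown in the second traceback.
    return int((length - 0.1) // distance)


# Hypothetical boundary length in degrees ("all less than 0.1").
boundary_length = 0.04

# Default segment of 0.5: nothing is densified, so np.vstack gets an empty list.
print(has_points(boundary_length, 0.5))

# segment = 0.01: the boundary qualifies, but the sample count goes negative.
print(has_points(boundary_length, 0.01), sample_count(boundary_length, 0.01))

# A 400 m boundary in a metre-based CRS behaves as intended.
print(has_points(400.0, 0.5), sample_count(400.0, 0.5))
```

The hard-coded 0.1 offset in that expression only makes sense when a coordinate unit is on the order of a metre, which is why lowering segment alone does not help with lat/lon data.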

@martinfleis
Member

Hi @bramson,

thank you for the report.

Yes, a lot of operations in Tessellation (including the Voronoi tessellation) are distance-based. With geometries in lat/lon, distance has no meaning, so momepy expects a projected CRS.

The second error is still related to this: segment = 0.01 is still way too large (by several orders of magnitude).

I would recommend reprojecting your geometries to some local projection, like EPSG:2451.

> I did not expect that.

This is a good point. I'll add a note to the documentation and a check to the code, so it emits a warning.
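
The reprojection recommended above could look roughly like the following sketch. It assumes GeoPandas and Shapely are installed; the single box polygon is a hypothetical stand-in for edgeDF5, and only to_crs and EPSG:2451 come from this thread.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for the neighborhood polygons:
# one small lon/lat box in central Tokyo, labelled as WGS 84.
gdf = gpd.GeoDataFrame(
    {"communityID": [1]},
    geometry=[box(139.75, 35.68, 139.76, 35.69)],
    crs="EPSG:4326",
)

# Transform the coordinates to a local metre-based projection.
projected = gdf.to_crs("EPSG:2451")

print(gdf.crs.is_projected)        # geographic input
print(projected.crs.is_projected)  # projected output

# Perimeter of the box, now expressed in metres.
perimeter_m = projected.geometry.length.iloc[0]
print(perimeter_m)
```

The limit geometry has to be in the same CRS, so wardsPolygon would need the same transformation before being passed to Tessellation.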

@torresanton

torresanton commented Sep 7, 2023

ValueError: need at least one array to concatenate

Hi, I am getting the same error, but in my case the dataset is in a projected CRS, EPSG:3857. I am working with the Open Buildings dataset from Google. It works with polygons above 85% confidence (around 300k polygons), but when I lower the confidence threshold, and thereby increase the number of polygons (up to about 2.3 million for my research area), I start to get this error.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[13], line 1
----> 1 tessellation = momepy.Tessellation(Buildings_Lima[:50], "uID", limit, verbose=True, segment=0.1)
      2 tessellation = tessellation.tessellationls

File ~/anaconda3/envs/momepy_env/lib/python3.11/site-packages/momepy/elements.py:259, in Tessellation.__init__(self, gdf, unique_id, limit, shrink, segment, verbose, enclosures, enclosure_id, threshold, use_dask, n_chunks)
    252     # add convex hull buffered large distance to eliminate infinity issues
    253     limit = (
    254         gpd.GeoSeries(limit, crs=gdf.crs)
    255         .translate(xoff=-centre_x, yoff=-centre_y)
    256         .array[0]
    257     )
--> 259     self.tessellation = self._morphological_tessellation(
    260         gdf, unique_id, limit, shrink, segment, verbose
    261     )
    263 self.tessellation["geometry"] = self.tessellation["geometry"].translate(
    264     xoff=centre_x, yoff=centre_y
    265 )

File ~/anaconda3/envs/momepy_env/lib/python3.11/site-packages/momepy/elements.py:285, in Tessellation._morphological_tessellation(self, gdf, unique_id, limit, shrink, segment, verbose, check)
    282 objects = objects.set_index(unique_id)
    284 print("Generating input point array...") if verbose else None
--> 285 points, ids = self._dense_point_array(
    286     objects.geometry.array, distance=segment, index=objects.index
    287 )
    289 hull = shapely.convex_hull(limit)
    290 bounds = shapely.bounds(hull)

File ~/anaconda3/envs/momepy_env/lib/python3.11/site-packages/momepy/elements.py:343, in Tessellation._dense_point_array(self, geoms, distance, index)
    340         points.append(shapely.get_coordinates(pts))
    341         ids += [ix] * len(pts)
--> 343 points = np.vstack(points)
    345 return points, ids

File <__array_function__ internals>:200, in vstack(*args, **kwargs)

File ~/anaconda3/envs/momepy_env/lib/python3.11/site-packages/numpy/core/shape_base.py:296, in vstack(tup, dtype, casting)
    294 if not isinstance(arrs, list):
    295     arrs = [arrs]
--> 296 return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)

File <__array_function__ internals>:200, in concatenate(*args, **kwargs)

ValueError: need at least one array to concatenate

As you can see, I even tried with a small sample (50 polygons). The confidence of an object being a polygon is related to its size, as I have learned by exploring the dataset.

Any idea how to get past this error?

PS: I can work with only the high-confidence polygons, but then the number of polygons decreases and my dataset becomes less representative of the city.

@martinfleis
Member

@torresanton Could you dump Buildings_Lima[:50] to a file and share it?

@torresanton

torresanton commented Sep 8, 2023

I'm sharing a pickle file, and I have pypickle 1.1.0 installed.
Sample of the dataset that triggers the issue: open_building_65_conf_issue.txt

A similar error happened with momepy.CheckTessellationInput:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 momepy.CheckTessellationInput(Lima_buil)

File ~/anaconda3/envs/momepy_env/lib/python3.11/site-packages/momepy/preprocessing.py:376, in CheckTessellationInput.__init__(self, gdf, shrink, collapse, split, overlap)
    369 sindex = shrink.sindex
    370 hits = shrink.bounds.apply(
    371     lambda row: list(sindex.intersection(row)), axis=1
    372 )
    373 od_matrix = pd.DataFrame(
    374     {
    375         "origin": np.repeat(hits.index, hits.apply(len)),
--> 376         "dest": np.concatenate(hits.values),
    377     }
    378 )
    379 od_matrix = od_matrix[od_matrix.origin != od_matrix.dest]
    380 duplicated = pd.DataFrame(np.sort(od_matrix, axis=1)).duplicated()

File <__array_function__ internals>:200, in concatenate(*args, **kwargs)

ValueError: need at least one array to concatenate

Thanks very much!!

@martinfleis
Member

@torresanton you have very tiny geometries. The mean perimeter length is 0.000349 and the mean area is 1.0670127274172174e-08. If these are supposed to represent buildings, that is not correct.

In any case, segment=0.1 is far longer than each perimeter, which leads to this issue. Dimensions like these are not expected.
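
A dependency-free sketch of the sanity check implied above: flag every footprint whose perimeter does not exceed segment. The helper names and the two square footprints are hypothetical; rings are plain coordinate lists rather than GeoPandas geometries.

```python
import math


def perimeter(ring):
    """Perimeter of a closed ring given as [(x, y), ...], first vertex repeated last."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:]))


def shorter_than_segment(rings, segment):
    """Indices of rings whose perimeter does not exceed `segment`."""
    return [i for i, ring in enumerate(rings) if perimeter(ring) <= segment]


def square(side):
    """Hypothetical footprint: an axis-aligned square with the given side length."""
    return [(0, 0), (side, 0), (side, side), (0, side), (0, 0)]


# A degree-sized footprint next to a metre-sized one.
rings = [square(0.0001), square(10.0)]

# Only the degree-sized footprint is too short for segment=0.1.
print(shorter_than_segment(rings, segment=0.1))
```

With a GeoDataFrame, gdf.geometry.length gives the same per-polygon perimeter, so the equivalent filter is gdf[gdf.geometry.length <= segment].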

@torresanton

I suspected something similar. Perhaps the best idea is to remove very tiny polygons and work only with those above a certain area and perimeter threshold. At the moment, momepy.Tessellation works with polygons from #OpenBuildingGoogle with a confidence of 0.85 or higher. Thanks very much.

@martinfleis
Member

I mean, the dataset contains nothing but those tiny polygons. Google's Open Buildings surely does not ship data like this. There has been some erroneous coordinate transformation or something similar.

@torresanton

Yes, @martinfleis is right. The downloaded open_building_dataset.csv carries no CRS information, and I couldn't set_crs() properly, even after figuring out the right one (<Geographic 2D CRS: EPSG:4326> Name: WGS 84).
Fortunately, there is a CLI tool, apparently developed very recently, that handles open_building_dataset.csv appropriately and converts it to gpkg or other formats with the proper CRS: https://opengeos.github.io/open-buildings. With this tool the CRS is assigned correctly from the start, so you can then reproject to a projected CRS and perform geometry calculations. Thanks again.
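
One plausible way to end up with such tiny "projected" geometries is to label lon/lat coordinates as EPSG:3857 instead of transforming them; the thread does not confirm that this is what happened. The sketch below, assuming GeoPandas and Shapely, contrasts the two. The footprint is hypothetical, and EPSG:32718 (UTM zone 18S) is an assumed metric CRS for Lima, not one named in this thread.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical building footprint near central Lima, in lon/lat degrees,
# loaded without any CRS metadata (as from a plain CSV).
raw = gpd.GeoDataFrame(
    {"uID": [0]},
    geometry=[box(-77.0430, -12.0465, -77.0429, -12.0464)],
)

# set_crs only attaches a label; the coordinates are still degrees.
mislabelled = raw.set_crs("EPSG:3857")

# Declare the true CRS first, then transform the coordinates with to_crs.
# EPSG:32718 is an assumption chosen for Lima.
projected = raw.set_crs("EPSG:4326").to_crs("EPSG:32718")

bad_perimeter = mislabelled.geometry.length.iloc[0]
good_perimeter = projected.geometry.length.iloc[0]
print(bad_perimeter)   # degree-sized number treated as metres
print(good_perimeter)  # metres
```

The mislabelled perimeter is of the same order as the 0.000349 mean reported earlier, while the transformed one is in the tens of metres expected for a building.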
