# UPPP 135 - Week 5

<a target="_blank" href="https://colab.research.google.com/github/knaaptime/uppp135-winter26-assn/blob/main/week5/graphs.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
import pandas as pd
import geopandas as gpd
from fsspec import filesystem
from libpysal.graph import Graph

In [None]:
fs = filesystem("https")

cities = gpd.read_file(
    "https://github.com/knaaptime/uppp135-winter26-assn/raw/refs/heads/main/week4/OCTraffic_Cities.zip"
)
cities = cities.set_geometry(cities.buffer(0))
crs = cities.estimate_utm_crs()
cities = cities.to_crs(crs)

tracts = gpd.read_parquet(
    "https://github.com/oturns/example_datasets/raw/refs/heads/main/acs/ca_tracts_2021.pq",
    filesystem=fs,
)
tracts = tracts.to_crs(crs)
tracts = tracts[
    tracts.centroid.intersects(cities[cities["city"] == "Irvine"].union_all())
]


In [None]:
tracts=tracts.set_index('geoid')

In [None]:
tracts.explore(tooltip=False)

## Contiguity Graphs

### Queen

In [None]:
g_queen = Graph.build_contiguity(tracts, rook=False)

In [None]:
g_queen.summary()

In [None]:
g_queen.adjacency

In [None]:
tracts.shape

In [None]:
g_queen.cardinalities

In [None]:
g_queen.cardinalities.hist()

In [None]:
g_queen.plot(tracts)

In [None]:
g_queen.explore(tracts)

In [None]:

# it can be useful to plot the tract boundaries along with the Graph itself. How would we do that?

### Rook

In [None]:
g_rook = Graph.build_contiguity(tracts, rook=True)

In [None]:
g_rook.summary()

Which graph, rook or queen, will be *denser* (i.e. have more edges, meaning more connections between observations)?
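
One way to build intuition before checking the real tracts is a toy example. The sketch below assumes a 3x3 grid of square cells rather than our Irvine data, and `grid_neighbors` is a hypothetical helper written for illustration (it is not part of libpysal). It counts neighbor links under each contiguity rule.

```python
from itertools import product


def grid_neighbors(nrows, ncols, rook):
    """Return {cell: set of neighboring cells} for a regular grid of square cells."""
    if rook:
        # rook contiguity: neighbors share an edge
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        # queen contiguity: neighbors share an edge or a corner
        offsets = [
            (dr, dc)
            for dr, dc in product((-1, 0, 1), repeat=2)
            if (dr, dc) != (0, 0)
        ]
    neighbors = {}
    for r, c in product(range(nrows), range(ncols)):
        neighbors[(r, c)] = {
            (r + dr, c + dc)
            for dr, dc in offsets
            if 0 <= r + dr < nrows and 0 <= c + dc < ncols
        }
    return neighbors


toy_rook = grid_neighbors(3, 3, rook=True)
toy_queen = grid_neighbors(3, 3, rook=False)

# total number of (focal, neighbor) links under each rule
toy_rook_links = sum(len(v) for v in toy_rook.values())
toy_queen_links = sum(len(v) for v in toy_queen.values())
print(toy_rook_links, toy_queen_links)  # 24 40

# the center cell has 4 rook neighbors and 8 queen neighbors
print(len(toy_rook[(1, 1)]), len(toy_queen[(1, 1)]))  # 4 8
```

On this grid every rook neighbor is also a queen neighbor. Real tract boundaries are irregular, so the gap between the two graphs is usually much smaller than on a perfect grid.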

In [None]:
# how can we look at this in code?

In [None]:
# let's plot the rook graph and tracts onto the same map object

## Distance

When we have large polygons that are mutually exclusive and *exhaust* the study area, contiguity graphs can be a good choice for representing spatial connectivity between regions. They are also conceptually and computationally simple, and they scale well to large datasets.

For other data, though, contiguity may not be the best option. What if we're working with building-level observations? 

First, let's collect building data from [overturemaps](https://overturemaps.org/). We need to

1. install the python package because it's not preinstalled on google colab,
2. define the region we want to download buildings in (the *bounding box* of Irvine)
    - subset the buildings inside the bounding box to the actual city boundary
3. make sure our Coordinate Reference Systems are ok
4. convert from arrow representation to geopandas
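
Why step 2 needs the extra subsetting: a bounding box is a rectangle, so it usually covers more ground than the boundary it wraps. A minimal sketch with made-up coordinates (a triangular "city" and three point "buildings"); `toy_total_bounds`, `in_bbox`, and `in_polygon` are hypothetical helpers for illustration, not geopandas functions, although the bounds follow the same `(minx, miny, maxx, maxy)` order as `total_bounds`.

```python
def toy_total_bounds(coords):
    """Bounding box of a list of (x, y) vertices as (minx, miny, maxx, maxy)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def in_bbox(pt, bbox):
    """True if the point falls inside the rectangle."""
    minx, miny, maxx, maxy = bbox
    return minx <= pt[0] <= maxx and miny <= pt[1] <= maxy


def in_polygon(pt, coords):
    """Ray-casting point-in-polygon test for a simple polygon."""
    x, y = pt
    inside = False
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        # does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


toy_city = [(0, 0), (4, 0), (0, 4)]  # a triangle
toy_points = {"a": (1, 1), "b": (3, 3), "c": (5, 5)}

toy_bbox = toy_total_bounds(toy_city)
in_box = [k for k, p in toy_points.items() if in_bbox(p, toy_bbox)]
in_city = [k for k in in_box if in_polygon(toy_points[k], toy_city)]

print(toy_bbox)  # (0, 0, 4, 4)
print(in_box)    # ['a', 'b']
print(in_city)   # ['a']
```

Point `b` is inside the bounding box but outside the city, which is the kind of building the second filter removes.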

In [None]:
!pip install "git+https://github.com/OvertureMaps/overturemaps-py"

In [None]:
import overturemaps

In [None]:
overturemaps.record_batch_reader?

The package is new, so it's not documented very well yet, but the `bbox` parameter means we need to pass in the coordinates of the bounding box we care about (in lat/long coordinates, i.e. EPSG:4326)

In [None]:
tracts.to_crs(4326).total_bounds

In [None]:
irvine_bounds = list(tracts.to_crs(4326).total_bounds)

In [None]:
irvine_bounds

In [None]:
irvine_bldgs = overturemaps.record_batch_reader(
    "building", irvine_bounds, stac=True
).read_all()

irvine_bldgs = gpd.GeoDataFrame.from_arrow(irvine_bldgs)

In [None]:
irvine_bldgs.crs

what's going on?

In [None]:
irvine_bldgs = irvine_bldgs.set_crs(4326)

I happen to know that Overture gives data back in lat/long (and it would be a good guess anyway, since we passed our bounding box to collect data in lat/long), so we can just `set_crs` to the correct system. Note that `set_crs` only labels the data with a CRS; it does not change any coordinates.

In [None]:
irvine_bldgs.crs

and now we can convert into the same UTM system as the rest of our data

In [None]:
irvine_bldgs = irvine_bldgs.to_crs(crs)

In [None]:
irvine_bldgs = irvine_bldgs[irvine_bldgs.intersects(tracts.union_all())]

In [None]:
irvine_bldgs.explore(tooltip=False)

In [None]:
irvine_bldgs.shape

In [None]:
!pip install osmnx

In [None]:
import osmnx as ox

In [None]:
campus =  ox.geocode_to_gdf('university of california, Irvine')

In [None]:
campus.explore()

In [None]:
campus = campus.to_crs(crs)

In [None]:
uci_bldgs = irvine_bldgs[irvine_bldgs.intersects(campus.union_all())]

In [None]:
uci_bldgs.shape

In [None]:
uci_bldgs.explore()

In [None]:
bldgs_queen = Graph.build_contiguity(uci_bldgs, rook=False)

In [None]:
bldgs_queen.summary()

anything look weird about the summary?

using `explore` with this much data can be tricky because it's built on older browser technology. It will be a little slow, but we can do it

In [None]:
bldgs_queen.explore(uci_bldgs)

In [None]:
# expected to raise an error: distance-band graphs need point geometries, not polygons
g_bldgs_1km = Graph.build_distance_band(uci_bldgs, threshold=1000)

but the distance between polygons is undefined!

since the buildings are pretty small polygons, we can reasonably use the distance between their centers as our measure of proximity. Thus when building the graph we can temporarily set the geometry column to the centroid of each polygon (its mathematical center)

since we're not saving `uci_bldgs.set_geometry(uci_bldgs.centroid)` into a variable, the change won't persist beyond this function call
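
The idea behind a distance-band graph on centroids can be sketched without any spatial libraries. The example below assumes three made-up rectangular footprints with coordinates in meters; `rect_centroid` and `toy_distance_band` are hypothetical helpers, and the "within or equal to the threshold" rule is an assumption of this sketch rather than a statement about how libpysal handles exact ties.

```python
from math import dist

# made-up rectangular building footprints (corner coordinates in meters)
toy_footprints = {
    "A": [(0, 0), (20, 0), (20, 10), (0, 10)],
    "B": [(60, 0), (80, 0), (80, 10), (60, 10)],
    "C": [(300, 0), (320, 0), (320, 10), (300, 10)],
}


def rect_centroid(corners):
    """Center of a rectangle: the mean of its corner coordinates."""
    xs, ys = zip(*corners)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def toy_distance_band(points, threshold):
    """Link every pair of points whose distance is within the threshold."""
    return {
        i: {j for j in points if j != i and dist(points[i], points[j]) <= threshold}
        for i in points
    }


toy_centers = {name: rect_centroid(c) for name, c in toy_footprints.items()}

toy_band_100 = toy_distance_band(toy_centers, 100)
toy_band_250 = toy_distance_band(toy_centers, 250)

print(toy_band_100)  # {'A': {'B'}, 'B': {'A'}, 'C': set()}
print(toy_band_250["B"] == {"A", "C"})  # True
```

At 100 meters, building `C` has no neighbors at all (an *isolate*). Widening the band to 250 meters connects it to `B`, so the choice of threshold changes both the density of the graph and whether every observation is connected.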

### 100 meters

In [None]:
g_bldgs_100m = Graph.build_distance_band(uci_bldgs.set_geometry(uci_bldgs.centroid), threshold=100)

In [None]:
g_bldgs_100m.summary()

In [None]:
g_bldgs_100m.explore(uci_bldgs)

In [None]:
m=uci_bldgs.explore(style_kwds={'weight':0.5})
g_bldgs_100m.explore(uci_bldgs, edge_kws=dict(style_kwds=(dict(weight=0.5))),m=m)

### 200 meters

In [None]:
g_bldgs_200m = Graph.build_distance_band(uci_bldgs.set_geometry(uci_bldgs.centroid), threshold=200)

In [None]:
g_bldgs_200m.summary()

In [None]:
g_bldgs_200m.cardinalities.rename('200m').hist(legend=True)

g_bldgs_100m.cardinalities.rename('100m').hist(legend=True, alpha=0.6)

In [None]:
m=uci_bldgs.explore(style_kwds={'weight':0.5})
g_bldgs_200m.explore(uci_bldgs, edge_kws=dict(style_kwds=(dict(weight=0.5))),m=m)

## K-Nearest-Neighbors (KNN)

In [None]:
# expected to raise an error for the same reason as before: KNN needs point geometries
g_knn5 = Graph.build_knn(uci_bldgs, k=5)

In [None]:
g_knn5 = Graph.build_knn(uci_bldgs.set_geometry(uci_bldgs.centroid), k=5)

is knn5 going to be sparser or denser than the 100m Graph?
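
A toy sketch can help frame the comparison. It assumes four made-up points on a line, and `toy_knn` is a hypothetical helper (not the libpysal implementation). The thing to notice is that KNN fixes the number of neighbors per observation, whereas a distance band fixes the search radius and lets the neighbor count vary.

```python
from math import dist

# four made-up points; "d" sits far away from the others
toy_pts = {"a": (0, 0), "b": (1, 0), "c": (3, 0), "d": (10, 0)}


def toy_knn(points, k):
    """For each point, list its k closest other points (nearest first)."""
    out = {}
    for i, p in points.items():
        others = [j for j in points if j != i]
        out[i] = sorted(others, key=lambda j: dist(p, points[j]))[:k]
    return out


toy_knn1 = toy_knn(toy_pts, 1)
toy_knn2 = toy_knn(toy_pts, 2)

print(toy_knn1)  # {'a': ['b'], 'b': ['a'], 'c': ['b'], 'd': ['c']}

# every observation has exactly k neighbors, so there are n * k links in total
print(sum(len(v) for v in toy_knn1.values()))  # 4
print(sum(len(v) for v in toy_knn2.values()))  # 8

# KNN links are not necessarily mutual: d's nearest neighbor is c, but not vice versa
print("c" in toy_knn1["d"], "d" in toy_knn1["c"])  # True False
```

Because every observation gets exactly `k` neighbors, a KNN graph has no isolates, even for a remote point like `d`. Whether it ends up sparser or denser than a distance-band graph depends on how many neighbors typically fall inside the band.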

In [None]:
g_knn5.cardinalities.mean()

In [None]:
g_knn5.pct_nonzero

In [None]:
g_bldgs_100m.cardinalities.mean()

In [None]:
g_bldgs_100m.pct_nonzero

In [None]:
g_knn5.summary()

In [None]:
m=uci_bldgs.explore(style_kwds={'weight':0.5})
g_knn5.explore(uci_bldgs, edge_kws=dict(style_kwds=(dict(weight=0.5))),m=m)

is KNN-5 going to be sparser or denser than KNN-10?

In [None]:
g_knn10 = Graph.build_knn(uci_bldgs.set_geometry(uci_bldgs.centroid), k=10)

In [None]:
g_knn10.pct_nonzero

In [None]:
g_knn10.pct_nonzero > g_knn5.pct_nonzero

In [None]:
m=uci_bldgs.explore(style_kwds={'weight':0.5})
g_knn10.explore(uci_bldgs, edge_kws=dict(style_kwds=(dict(weight=0.5))),m=m)

## Bonus

In [None]:
#!pip install lonboard

In [None]:
from lonboard import viz

In [None]:
viz(uci_bldgs, polygon_kwargs={'get_elevation':uci_bldgs['height'].fillna(6)})