# Obtain Hex Addresses

We will use reverse geocoding to find the nearest address to each hex centre.

In [1]:
import geopandas as gpd
import numpy as np
import plotly.express as px
from diskcache import Cache
from geopy import Nominatim
from geopy.distance import distance
from ratelimiter import RateLimiter
from tqdm.notebook import tqdm

In [2]:
df_hex = gpd.read_file('../data/BBMP_hex.geojson')
print(df_hex.shape)
df_hex.head()

(942, 9)


Unnamed: 0,id,hex_id,ward_no,centre_lat,centre_lon,resolution,pop_total,ward_name,geometry
0,8861892db3fffff,8861892db3fffff,1,13.113109,77.60952,8,1413.220043,Kempegowda Ward,"POLYGON ((77.61369 13.11060, 77.61378 13.11559..."
1,886016975dfffff,886016975dfffff,1,13.128071,77.609773,8,1762.379434,Kempegowda Ward,"POLYGON ((77.61395 13.12557, 77.61403 13.13055..."
2,8860169759fffff,8860169759fffff,1,13.120601,77.605432,8,1786.718829,Kempegowda Ward,"POLYGON ((77.60961 13.11810, 77.60969 13.12308..."
3,8860169645fffff,8860169645fffff,1,13.090701,77.596498,8,2635.117082,Kempegowda Ward,"POLYGON ((77.60067 13.08820, 77.60076 13.09318..."
4,886016962dfffff,886016962dfffff,1,13.120644,77.588573,8,1853.947643,Kempegowda Ward,"POLYGON ((77.59275 13.11814, 77.59283 13.12313..."


## Reverse Geocoding

The hex data loaded above comes from the GeoJSON file created earlier.

We obtain the address closest to each hex centre using Nominatim reverse geocoding. Because querying Nominatim is slow, we cache every response received so that no request has to be repeated on later runs.
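The cache-or-fetch idea can be sketched without any external services. This is only an illustration under stated assumptions: a plain `dict` stands in for the on-disk cache, and `fake_reverse` and `cached_reverse` are hypothetical helpers, not part of geopy or diskcache.

```python
# Sketch of the cache-or-fetch pattern. A dict stands in for the on-disk cache
# and fake_reverse is a hypothetical stand-in for a Nominatim request.
calls = 0  # number of simulated requests actually sent


def fake_reverse(lat, lon):
    """Pretend to query the geocoding service and count each call."""
    global calls
    calls += 1
    return f"address near ({lat}, {lon})"


def cached_reverse(cache, lat, lon):
    """Return the cached response for a point, fetching it only on a miss."""
    key = str((lat, lon))  # one unique string key per query point
    if key not in cache:
        cache[key] = fake_reverse(lat, lon)
    return cache[key]


cache = {}
# The third point repeats the first, so it should be served from the cache
points = [(13.113109, 77.60952), (13.128071, 77.609773), (13.113109, 77.60952)]
addresses = [cached_reverse(cache, lat, lon) for lat, lon in points]
print(f"{calls} new queries made for {len(points)} points.")
```

Only cache misses reach the (simulated) service, which is why a re-run over already cached points should report zero new queries.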

In [3]:
# Set cache location for Nominatim requests
nom_cache = '../data/cache/nominatim'

# Create Nominatim geocoder object with custom user agent as required by their terms of service
geocoder = Nominatim(user_agent = 'coursera_capstone')

# Create RateLimiter object to ensure we don't exceed Nominatim's limit of 1 request per second
limiter = RateLimiter(max_calls = 1, period = 1)
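To make the effect of the rate limiter concrete, here is a minimal standard-library sketch of a context manager that spaces out successive entries. `SimpleLimiter` is a hypothetical class written for illustration; it is not how the `ratelimiter` package is implemented, and it only handles the one-call-per-period case.

```python
import time


class SimpleLimiter:
    """Context manager that keeps successive entries at least `period` seconds apart."""

    def __init__(self, period=1.0):
        self.period = period
        self._last = None  # time of the previous entry; None before first use

    def __enter__(self):
        if self._last is not None:
            wait = self.period - (time.monotonic() - self._last)
            if wait > 0:
                time.sleep(wait)  # block until the period has elapsed
        self._last = time.monotonic()
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions raised inside the block


demo_limiter = SimpleLimiter(period=0.2)  # short period so the demo runs quickly
start = time.monotonic()
for _ in range(3):
    with demo_limiter:
        pass  # a request would be sent here
elapsed = time.monotonic() - start
print(f"3 calls took {elapsed:.2f} s")
```

The first entry passes immediately and each later one waits for the remainder of the period, so three calls take roughly two periods in total.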

In [4]:
# Empty lists to store responses
address = []
addr_coord = []

points = df_hex[['centre_lat', 'centre_lon']].to_records()

queries = 0 # Count of new requests sent to Nominatim (i.e. cache misses)

with Cache(nom_cache) as cache:
    for p in tqdm(points):
        query = (p.centre_lat, p.centre_lon)
        key = str(query) #! key must be a unique string
        
        if key in cache:
            response = cache[key] # Read cached value
        else:
            with limiter:
                response = geocoder.reverse(query, timeout = 30, addressdetails=True)
                cache[key] = response # Set cache value
                queries += 1
            
        address.append(response.address)
        addr_coord.append((response.latitude, response.longitude))
print('{} new queries made.'.format(queries))

0 new queries made.


## Calculating Distances

We use geopy to calculate the geodesic distance between each hexagon centre and the coordinates of the address returned for it. This lets us check whether the addresses are actually close to the hex centres.
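As a rough cross-check of such distances, the haversine formula can be written with the standard library alone. Note the assumption: haversine treats the Earth as a sphere, whereas geopy's `distance` uses an ellipsoidal model, so the two results will differ slightly. `haversine_m` and `EARTH_RADIUS_M` are names introduced here for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical approximation)


def haversine_m(p1, p2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, p1)
    lat2, lon2 = map(radians, p2)
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Two hex centres taken from the first rows of the table shown earlier
d = haversine_m((13.113109, 77.60952), (13.128071, 77.609773))
print(round(d), "meters")
```

For short distances like these the spherical approximation is usually adequate as a sanity check, but the notebook's own figures come from geopy.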

In [5]:
centre_coord = df_hex[['centre_lat', 'centre_lon']].to_records(index=False)

address_error = [] # Distances (in meters) between hex centres and returned addresses

for centre, addr in zip(centre_coord, addr_coord):
    dist = distance(centre, addr).meters # Geodesic distance via geopy
    address_error.append(dist)

# Plot histogram
address_err_fig = px.histogram(
    x = address_error,
    histnorm = 'percent',
    #cumulative = True,
    template = 'plotly',
)

address_err_fig.update_layout(
    title = "Error in address locations",
    title_x = 0.5,
    xaxis_title = 'Distance between Hex centre & Address (meters)',
    yaxis_title = 'Percentage of hexes',
    bargap = 0.01,
)

The reverse geocoding is accurate enough for our purposes: almost all of the addresses returned are within a few hundred meters of the hexagon centre. In any case, later steps will rely mainly on the hex centres, and the address serves mainly as a reference.
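A statement like "almost all within a few hundred meters" can be quantified by computing the share of distances under a chosen threshold. The sketch below uses a hypothetical helper, `share_within`, and placeholder values that are made up for illustration; they are not results from this notebook. In practice the `address_error` list would be passed in instead.

```python
def share_within(errors_m, threshold_m):
    """Percentage of distances that are at or below threshold_m."""
    if not errors_m:
        raise ValueError("errors_m is empty")
    hits = sum(1 for e in errors_m if e <= threshold_m)
    return 100 * hits / len(errors_m)


# Placeholder values for illustration only, NOT results from this notebook
sample_errors = [35.0, 80.0, 120.0, 240.0, 310.0, 950.0]
for limit in (100, 300, 500):
    print(f"within {limit} m: {share_within(sample_errors, limit):.1f}%")
```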

In [6]:
# Add the addresses as a new column
df_hex['address'] = address
df_hex.to_feather('../data/bangalore_hex_addresses.feather') # Save file
df_hex.head() # Display final table


this is an initial implementation of Parquet/Feather file support and associated metadata.  This is tracking version 0.1.0 of the metadata specification at https://github.com/geopandas/geo-arrow-spec

This metadata specification does not yet make stability promises.  We do not yet recommend using this in a production setting unless you are able to rewrite your Parquet/Feather files.




Unnamed: 0,id,hex_id,ward_no,centre_lat,centre_lon,resolution,pop_total,ward_name,geometry,address
0,8861892db3fffff,8861892db3fffff,1,13.113109,77.60952,8,1413.220043,Kempegowda Ward,"POLYGON ((77.61369 13.11060, 77.61378 13.11559...","Yelahanka, Kempegowda, Yelahanka Zone, Bengalu..."
1,886016975dfffff,886016975dfffff,1,13.128071,77.609773,8,1762.379434,Kempegowda Ward,"POLYGON ((77.61395 13.12557, 77.61403 13.13055...","Kempegowda, Yelahanka Zone, Bengaluru, Bangalo..."
2,8860169759fffff,8860169759fffff,1,13.120601,77.605432,8,1786.718829,Kempegowda Ward,"POLYGON ((77.60961 13.11810, 77.60969 13.12308...","Kempegowda, Yelahanka Zone, Bengaluru, Bangalo..."
3,8860169645fffff,8860169645fffff,1,13.090701,77.596498,8,2635.117082,Kempegowda Ward,"POLYGON ((77.60067 13.08820, 77.60076 13.09318...","Bellary Road, Amruthnagar, Byatarayanapura, Ye..."
4,886016962dfffff,886016962dfffff,1,13.120644,77.588573,8,1853.947643,Kempegowda Ward,"POLYGON ((77.59275 13.11814, 77.59283 13.12313...","Chowdeswari Ward, Yelahanka Zone, Bengaluru, B..."
