This is a script for geocoding and plotting spatial information from an EXCEL spreadsheet with an "Addresses" column. The API used is **Google.**

For this service, you need to create a so-called API key (a unique string of characters) via the Google developers platform.

The first step is to get COLAB working:


In [None]:
## mount drive
from google.colab import drive
drive.mount("/content/drive")

A file path needs to be defined for storing input or output files linked with this script:

In [None]:
directory="/content/drive/My Drive/Colab_DigiKAR/"

Now we can install packages that are not part of Python's standard distribution but are necessary for geocoding and plotting maps. There will most likely be a dependency error for NumPy, but the script should still work.

In [None]:
## install packages that are not part of Python's standard distribution

!pip install geocoder
!pip install basemap
!pip install ipyleaflet
!pip install geojson
!pip install googlemaps
!pip install gmaps
!pip install keplergl
!pip install geopandas

Now that all packages are installed, we can read the input data (in this case from Github or Google Drive) and display the content in a table.

In [20]:
## import relevant packages for geocoding as well as reading and writing data
import pandas as pd
import geocoder
# command needed for correct plotting in Jupyter Notebooks:
%matplotlib inline 
from pandas_profiling import ProfileReport
from googlemaps import Client as GoogleMaps
import googlemaps
import gmaps
from keplergl import KeplerGl
import geopandas as gpd
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import os
import json
from geojson import Feature, FeatureCollection, Point

## geocode data from spreadsheet

# enter Google API key
gmaps = googlemaps.Client(key='YOURKEY')

# input addresses in EXCEL format and read
infile='https://github.com/ieg-dhr/DigiKAR/blob/main/OntologyFiles/2023-01_Places_AP3_MASTER_Github.xlsx?raw=true'  # alternative input from Google Drive
addresses_df = pd.read_excel(infile)
display(addresses_df)

Unnamed: 0,place_old,places_new,suffix,community,region_1,region_2,continent,variant_1,variant_2,variant_3,count,ID,Source
0,Bayrisch Hof,Bayrisch Hof,,,,,Europe,Stadtamhof;Stadt am Hof,,,1.0,B1,Bavariae (Homann 1752)
1,Kuta Lingga,Kuta Lingga,,,,,Europe,,,,1.0,11,Broek
2,Kuta Waringin,Kuta Waringin,,,,,Europe,,,,1.0,12,Broek
3,Sambas,Sambas,,,,,Europe,,,,1.0,13,Broek
4,Sampit,Sampit,,,,,Europe,,,,1.0,10,Broek
...,...,...,...,...,...,...,...,...,...,...,...,...,...
4278,Mühlhausen,#,,,,,Europe,,,,,,Universitätsmatrikeln
4279,Mühlhausen/Thüringen,#,,,,,Europe,,,,,,Universitätsmatrikeln
4280,Oberwesaliensis,#,,,,,Europe,,,,,,Universitätsmatrikeln
4281,Olpensis,#,,,,,Europe,,,,,,Universitätsmatrikeln


Now we will use the Pandas package to read the content of the address column to a so-called DataFrame. A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet. This 2-dimensional structure is often used to manipulate data with programming languages. Our "manipulation" is the act of geocoding.

In [29]:
# read information from address column to dataframe
addresses = addresses_df["places_new"].values.tolist()

latitudes = []
longitudes = []
google_addresses = []

try:
	# geocode each address in file
	for x in addresses:
		try:
			g = gmaps.geocode(x)
			g_lat = g[0]['geometry']['location']['lat']
			g_lng = g[0]['geometry']['location']['lng']
			g_address = g[0]['formatted_address']

		except IndexError:
			#print("No coordinates found for: ", x)
			g_lat = "0"
			g_lng = "0"
			g_address = "0"

		except ValueError:
			print("No places found.")
			g_lat = "0"
			g_lng = "0"
			g_address = "0"
			continue
		
		print(x, g_lat, g_lng, g_address)

	# add information to lists
		google_addresses.append(g_address)
		latitudes.append(g_lat)
		longitudes.append(g_lng)
except ValueError:
	print("No other data found.")
	
# write information to new columns in dataframe
print(len(latitudes))
print(len(longitudes))
print(len(google_addresses))
addresses_df["lat"] = latitudes
addresses_df["lng"] = longitudes
addresses_df["Google address"] = google_addresses

print("All addresses geocoded!")

Bayrisch Hof 50.3135391 11.9127814 Hof, Germany
Kuta Lingga 3.4186851 97.86807859999999 Kuta Lingga, Bukit Tusam, Southeast Aceh, Aceh, Indonesia
Kuta Waringin -6.9739667 107.5340794 Kutawaringin, Bandung Regency, West Java, Indonesia
Sambas 0 0 0
Sampit -2.5394654 112.9586863 Sampit, Mentawa Baru Hulu, Mentawa Baru Ketapang, East Kotawaringin Regency, Central Kalimantan, Indonesia
Eluinghen 0 0 0
Lossen 0 0 0
Marienborch 51.8249976 5.884582 Mariënbosch, Mariënboomseweg, 6523 Nijmegen, Netherlands
Maubach,  Herzogenrath 50.8577238 6.082865 Maubacher Str., 52134 Herzogenrath, Germany
Gemünd,  Schleiden 50.5751918 6.4931518 Gemünd, 53937 Schleiden, Germany
Wolframs-Eschenbach 49.2262035 10.7232303 Wolframs-Eschenbach, Germany
Rothenfels 49.89293379999999 9.585955199999999 Rothenfels, Germany
Dittelsheim-Heßloch 49.7410972 8.245784500000001 Dittelsheim-Heßloch, Germany
Lauda,  Lauda-Königshofen 49.56799160000001 9.7047677 Lauda, 97922 Lauda-Königshofen, Germany
Mainz-Mombach 50.0201939 8.

If all addresses have been successfully geocoded, the next step is to check the geocoding and write the results to a new EXCEL file. 


In [30]:
# view geocoded data
display(addresses_df)

# write geocoded places to new file
addresses_df.to_excel(directory+"GEOCODING_AP3/Addresses_Geocoded_withGoogle.xlsx")

Unnamed: 0,place_old,places_new,suffix,community,region_1,region_2,continent,variant_1,variant_2,variant_3,count,ID,Source,lat,lng,Google address
0,Bayrisch Hof,Bayrisch Hof,,,,,Europe,Stadtamhof;Stadt am Hof,,,1.0,B1,Bavariae (Homann 1752),50.313539,11.912781,"Hof, Germany"
1,Kuta Lingga,Kuta Lingga,,,,,Europe,,,,1.0,11,Broek,3.418685,97.868079,"Kuta Lingga, Bukit Tusam, Southeast Aceh, Aceh..."
2,Kuta Waringin,Kuta Waringin,,,,,Europe,,,,1.0,12,Broek,-6.973967,107.534079,"Kutawaringin, Bandung Regency, West Java, Indo..."
3,Sambas,Sambas,,,,,Europe,,,,1.0,13,Broek,0,0,0
4,Sampit,Sampit,,,,,Europe,,,,1.0,10,Broek,-2.539465,112.958686,"Sampit, Mentawa Baru Hulu, Mentawa Baru Ketapa..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4278,Mühlhausen,#,,,,,Europe,,,,,,Universitätsmatrikeln,0,0,0
4279,Mühlhausen/Thüringen,#,,,,,Europe,,,,,,Universitätsmatrikeln,0,0,0
4280,Oberwesaliensis,#,,,,,Europe,,,,,,Universitätsmatrikeln,0,0,0
4281,Olpensis,#,,,,,Europe,,,,,,Universitätsmatrikeln,0,0,0


Our geocoded data have been written to a new EXCEL file, which is handy for further (manual) data cleaning and data enrichment. But EXCEL is unfortunately not a file format which GIS applications can handle. This is why we also need to export our geocoded data to GeoJSON.

The conversion of a DataFrame to GeoJSON follows the instructions in the following tutorial by Geoff Boeing:

https://notebook.community/captainsafia/nteract/applications/desktop/example-notebooks/pandas-to-geojson

In [33]:
# convert coordinates to floats

addresses_df['lat'] = addresses_df['lat'].astype(float)
addresses_df['lng'] = addresses_df['lng'].astype(float)

# ignore places that have not been geocoded

df_geo = addresses_df.dropna(subset=['lat', 'lng'], axis=0, inplace=False)

# combine information in GeoJSON fromat

def df_to_geojson(df, properties, lat='lat', lon='lng'):
    # create a new python dict to contain our geojson data, using geojson format
    geojson = {'type':'FeatureCollection', 'features':[]}

    # loop through each row in the dataframe and convert each row to geojson format
    for _, row in df.iterrows():
        # create a feature template to fill in
        feature = {'type':'Feature',
                   'properties':{},
                   'geometry':{'type':'Point',
                               'coordinates':[]}}

        # fill in the coordinates
        feature['geometry']['coordinates'] = [row[lon],row[lat]]

        # for each column, get the value and add it as a new feature property
        for prop in properties:
            feature['properties'][prop] = row[prop]
        
        # add this feature (aka, converted dataframe row) to the list of features inside our dict
        geojson['features'].append(feature)
    
    return geojson

cols = ['places_new', 'Google address']
geojson = df_to_geojson(df_geo, cols)

with open(directory+'AP3.geojson', 'w', encoding='utf-8') as f:
    json.dump(geojson, f, ensure_ascii=False)

Your Google Drive should now contain a file with the "geojson" file ending. We can check if this file has been created and if it is well-formed.

In [34]:
## double-check if GeoJSON file has been created and is well-formed

# load GeoJSON data

with open(directory+'GEOCODING_AP3/AP3.geojson', 'r') as f2:
    data = json.load(f2)
    print(data)

{'type': 'FeatureCollection', 'features': [{'type': 'Feature', 'properties': {'Addresses': 'Mainz', 'Google address': 'Aachen, Germany'}, 'geometry': {'type': 'Point', 'coordinates': [8.2472526, 49.9928617]}}, {'type': 'Feature', 'properties': {'Addresses': 'Meißen', 'Google address': 'Aachen, Germany'}, 'geometry': {'type': 'Point', 'coordinates': [13.4976592, 51.1617842]}}, {'type': 'Feature', 'properties': {'Addresses': 'Wiesbaden', 'Google address': 'Aachen, Germany'}, 'geometry': {'type': 'Point', 'coordinates': [8.239760799999999, 50.0782184]}}, {'type': 'Feature', 'properties': {'Addresses': 'Köln', 'Google address': 'Aachen, Germany'}, 'geometry': {'type': 'Point', 'coordinates': [6.9602786, 50.937531]}}, {'type': 'Feature', 'properties': {'Addresses': 'Paris ', 'Google address': 'Aachen, Germany'}, 'geometry': {'type': 'Point', 'coordinates': [2.3522219, 48.856614]}}, {'type': 'Feature', 'properties': {'Addresses': 'Bonn', 'Google address': 'Aachen, Germany'}, 'geometry': {'ty

Now we can plot the geocoded data to an interactive map. The code below is partly based on an Ipyleaflet Tutorial provided by the *Carpentries Incubator*:

https://carpentries-incubator.github.io/jupyter_maps/01-introduction/index.html

In [35]:
## plot geocoded data on interactive map

# initialise interactive map

from ipyleaflet import Map, basemaps, GeoJSON, LayersControl
import random

# customise map

map = Map(center = (55, 7), zoom = 5, min_zoom = 1, max_zoom = 20, 
    basemap=basemaps.Stamen.Terrain)

# add functionality to add or remove layers to map itself

map.add_control(LayersControl())

def random_color(feature):
    return {
        'color': 'black',
        'fillColor': random.choice(['red', 'yellow', 'green', 'orange']),
    }

geo_json = GeoJSON(
    data=data,
    style={
        'opacity': 1, 'dashArray': '7', 'fillOpacity': 0.1, 'weight': 2
    },
    hover_style={
        'color': 'red', 'dashArray': '0', 'fillOpacity': 0.5
    },
    style_callback=random_color
)

# add geocoded data to map

map.add_layer(geo_json)

map


Map(center=[55, 7], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_text…

Congratulations, you have just plotted a new map! At the moment, the map only has markers for the point geometries but no pop-up labels. To embed those, other Python packages will need to be imported first. I will add pop-ups in the next development step. 

Notebook created by: Monika Barget

Latest update: 26 January 2023