##### STA 141B Data & Web Technologies for Data Analysis

### Lecture 15, 02/26/2026, Cartography + Geocoding

### Announcements

- Midterm exams were uploaded to Canvas.
- Third homework due February 27.
- Reminder: Use of AI for the homeworks is strictly forbidden.

### Discussion section
- Details on the final project
- PyPlot (Matplotlib)

### Final project
Demonstrate your proficiency in __at least two major topics__ of this lecture. 
These topics include:
- Database handling (SQL)
- Web scraping (API / manual scraping / Selenium)
- Visualisation (Static / Maps / Choropleth Maps / Interactive)
- Natural Language Processing
- Data processing (__exhaustive__ use of Numpy/Pandas/Concurrency)

### Today's topics

Interactive plots:
- Cartography
- Geocoding

## Maps

The __folium__ package uses the Leaflet JavaScript library to make interactive maps.

The function to create a map is `folium.Map()`. The function's parameters control the position, style, and initial zoom of the map.

If you want to change the size of the map, you first need to create a `folium.Figure()`, and then add the map to the figure with `.add_child()`.

In [None]:
import folium
import folium.plugins

In [None]:
folium.Map()

In [None]:
folium.Map(location=[38, -122], zoom_start=12)

In [None]:
folium.Map(location=[38, -122], zoom_start=8, width = 600, height = 400)

In [None]:
folium.Map(location = [38.54, -121.75], zoom_start = 15)

In [None]:
m = folium.Map(width = 500, height = 500) # not ideal
m

In [None]:
# Make a map.
m = folium.Map(location = [38.54000, -121.74771], zoom_start = 18)
# Davis: 38.5449, -121.7405

# optional: set up a Figure to control the size of the map
fig = folium.Figure(width = 600, height = 400)
fig.add_child(m)

In [51]:
from IPython.display import display
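def show_map(m, w = 800, h = 500):
    """Display map m inside a Figure of width w and height h (in pixels)."""
    fig = folium.Figure(width = w, height = h)
    fig.add_child(m)
    display(fig) # show the sized figure that wraps the map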

In [None]:
show_map(m, 500, 500)

We can change the map tiles. For more details, see the [folium documentation on tiles](https://python-visualization.github.io/folium/latest/user_guide/raster_layers/tiles.html).

In [None]:
m = folium.Map(tiles = "cartodbpositron") # change tile
show_map(m)

In [None]:
import folium.plugins
m = folium.plugins.DualMap(zoom_start=8)

folium.TileLayer("openstreetmap").add_to(m.m1)
folium.TileLayer("cartodbpositron").add_to(m.m2)
folium.TileLayer(
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Terrain_Base/MapServer/tile/{z}/{y}/{x}',
    attr='Esri',
    name='Esri World Terrain'
).add_to(m.m2)
folium.LayerControl(collapsed=False).add_to(m)

m

Some Plugins:

In [None]:
m = folium.Map(width = 800, height = 600, location = [0,0], zoom_start = 5)
folium.plugins.Fullscreen(position = 'topleft', force_separate_button=False,).add_to(m) # add the fullscreen button
folium.plugins.Geocoder().add_to(m) # find a location via https://nominatim.org/ LIMITED to 1 request/second!

folium.plugins.LocateControl(auto_start=False).add_to(m) # let your browser find your location
folium.plugins.MiniMap(zoom_level_offset=-7, toggle_display=True).add_to(m) # minimap
folium.plugins.Terminator().add_to(m) # daylight/shadow
m.add_child(
    folium.LatLngPopup() # if you click somewhere, you'll see a Popup with your location
)
show_map(m, 800, 600)

In [None]:
import folium
from folium.plugins import Draw

m = folium.Map()

Draw(export=True).add_to(m)

show_map(m)

The [Yolo County Restaurants Dataset](http://anson.ucdavis.edu/~nulle/yolo_food.feather) contains locations and health inspector scores for all restaurants in Yolo County, California.

Let's use __folium__ to display the restaurants on a map.

In [None]:
import pandas as pd 

food = pd.read_feather("../data/yolo_food.feather")
food.head()

Unlike most of the plotting packages we used before, __folium__ does not automatically handle missing values. So in order to make our map, we first need to remove the missing values from our dataset.

In [None]:
food_cp = food.copy()

In [None]:
food_cp = food_cp[food_cp.lat.notna() & food_cp.lng.notna()]

In [None]:
food_cp.shape

In [None]:
food.shape

Now we can make the map. For each restaurant, we have to create a circle and add it to the map.

In [None]:
m = folium.Map(location = [38.5449, -121.7405], zoom_start = 15)

cols = ["FacilityName", "lat", "lng"]
for name, lat, lng in food_cp[cols].itertuples(index = False):
    popup = folium.Popup(name, parse_html = True)
    circle = folium.Circle([float(lat), float(lng)], color = "blue", radius = 10, popup = popup)
    m.add_child(circle)

folium.plugins.LocateControl(auto_start=False).add_to(m) # let your browser find your location
folium.plugins.Fullscreen(position = 'topleft', force_separate_button=False,).add_to(m) # add the fullscreen button
folium.plugins.Geocoder().add_to(m) # find a location via https://nominatim.org/ LIMITED to 1 request/second!

fig = folium.Figure(width = 900, height = 600)
fig.add_child(m)
fig.save("../output/yolo_food_map.html")

In [None]:
m

### END OF FOURTEENTH LECTURE

<img src="../images/golden_gate.jpg" alt="Picture of the Golden Gate Bridge" style="height: 500px;"/>

## GEOCODING

The folium package can be very useful in combination with geocoding, that is, getting the coordinates for a specific address.

Read the [documentation](https://nominatim.org/release-docs/develop/api/Overview/) of the Nominatim API.

The public Nominatim server has strict usage rules: send at most 1 request per second, specify a User-Agent, and cache your results.
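
A minimal sketch of how the rate limit and the caching rule could be enforced in plain Python, assuming some `geocode` callable performs the actual request. The names `PoliteGeocoder`, `min_delay`, and `fake_geocode` are made up for this illustration and are not part of Nominatim or any library.

```python
import time


class PoliteGeocoder:
    """Wrap a geocoding function with an in-memory cache and a minimum delay.

    Hypothetical helper for illustration; `geocode` is any callable that
    maps a query string to a result.
    """

    def __init__(self, geocode, min_delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self._geocode = geocode
        self._min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_call = None  # time of the most recent real request

    def __call__(self, query):
        # Cached queries are answered without contacting the server.
        if query in self._cache:
            return self._cache[query]
        # Wait until at least `min_delay` seconds have passed since the last request.
        if self._last_call is not None:
            remaining = self._min_delay - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
        self._last_call = self._clock()
        result = self._geocode(query)
        self._cache[query] = result
        return result


# Stand-in for a real request, so the sketch runs offline.
calls = []

def fake_geocode(query):
    calls.append(query)
    return (0.0, 0.0)

polite = PoliteGeocoder(fake_geocode, min_delay=0.01)
for city in ["Oslo", "St. Moritz", "St. Moritz"]:
    polite(city)
print(calls)  # the repeated query is served from the cache
```

The cache here lives only in memory; a cache that survives a kernel restart needs to be stored on disk.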

#### (Forward) Geocoding
Convert address to coordinates.

https://nominatim.openstreetmap.org/ui/search.html

#### Reverse Geocoding
Convert coordinates to (human readable) address.

https://nominatim.openstreetmap.org/ui/reverse.html

Example coordinates (Golden Gate Bridge): 37.817615499999995,-122.4783123
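
As a rough sketch, a reverse-geocoding request for these coordinates could be assembled as follows. The `/reverse` endpoint with the `lat`, `lon`, and `format` parameters is described in the Nominatim documentation; the helper name `reverse_url` is made up for this example, and the URL is only built, not sent.

```python
from urllib.parse import urlencode


def reverse_url(lat, lon, base="https://nominatim.openstreetmap.org/reverse"):
    """Build (but do not send) a reverse-geocoding URL."""
    params = {"lat": lat, "lon": lon, "format": "json"}
    return base + "?" + urlencode(params)


print(reverse_url(37.817615499999995, -122.4783123))
```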

#### Geocoding (the clumsy way)

Try to send a request manually:

In [1]:
import requests

In [2]:
url = 'https://nominatim.openstreetmap.org/search'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:145.0) Gecko/20100101 Firefox/145.0'
}

In [10]:
import requests_cache

# Install cache, specifying a cache file name and optional expiration time
requests_cache.install_cache('../data/geocoding2', expire_after=3600)

In [22]:
response = requests.get(url, headers=headers, params={
    'q': 'Golden Gate Bridge'})

In [23]:
response.raise_for_status()

In [24]:
response.text

'<!DOCTYPE html>\n<html lang="en">\n<head>\n  <meta charset=\'utf-8\'>\n  <meta name=\'viewport\' content=\'width=device-width,initial-scale=1\'>\n\n  <title>Nominatim Demo</title>\n\n  <link rel="icon" type="image/png" href="theme/favicon-194x194.png" sizes="194x194">\n  <link rel="icon" type="image/png" href="theme/favicon-32x32.png" sizes="32x32">\n\n  <link rel=\'stylesheet\' href=\'build/bundle.css\'>\n  <link rel=\'stylesheet\' href=\'theme/style.css\'>\n\n  <script src=\'config.defaults.js\'></script>\n  <script src=\'theme/config.theme.js\'></script>\n\n  <script>\n    if (Nominatim_Config.Reverse_Only) {\n      window.location.pathname = window.location.pathname.replace(\'search.html\', \'reverse.html\');\n    }\n  </script>\n  <script defer src=\'build/bundle.js\'></script>\n</head>\n\n<body>\n</body>\n</html>\n'

This does not seem to work: instead of data, we get the HTML of the Nominatim demo page. However, if we add the parameter `format=json`, we get the desired result:

In [25]:
response = requests.get(url, headers=headers, params={
    'q': 'Golden Gate Bridge',
    'format': 'json'})

In [26]:
response.raise_for_status()

In [27]:
response.text

'[{"place_id":300012887,"licence":"Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright","osm_type":"way","osm_id":370672707,"lat":"37.8176155","lon":"-122.4783123","class":"man_made","type":"bridge","place_rank":30,"importance":0.5587403180126761,"addresstype":"man_made","name":"Golden Gate Bridge","display_name":"Golden Gate Bridge, Presidio Parkway, San Francisco, California, 94129, United States","boundingbox":["37.8080000","37.8323502","-122.4809672","-122.4763955"]},{"place_id":297482376,"licence":"Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright","osm_type":"way","osm_id":595194543,"lat":"37.8202408","lon":"-122.4785700","class":"highway","type":"motorway","place_rank":26,"importance":0.5587403180126761,"addresstype":"road","name":"Golden Gate Bridge","display_name":"Golden Gate Bridge, San Francisco, Marin County, California, 94129, United States","boundingbox":["37.8081298","37.8323305","-122.4807209","-122.4764734"]},{"place_id":299365430
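
The response body is a JSON list of candidate places, so it can be parsed instead of being read as raw text. A small sketch, using an abbreviated copy of the first hit above as a stand-in for `response.text` (with a live response, `response.json()` performs the same parsing):

```python
import json

# Abbreviated stand-in for response.text, copied from the first hit above.
text = '[{"lat": "37.8176155", "lon": "-122.4783123", "name": "Golden Gate Bridge", "type": "bridge"}]'

results = json.loads(text)
best = results[0]  # first candidate in the list
# The coordinates arrive as strings, so convert them to floats.
lat, lon = float(best["lat"]), float(best["lon"])
print(best["name"], lat, lon)
```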

#### Geocoding (the easy way)

As usual, there is a Python package for this. However, you must adhere to some rules if you want to use Nominatim:
- Provide a User-Agent
- Send no more than one request/second (seriously!)

In [30]:
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="my_geocoding_app")

In [31]:
address = "1600 Amphitheatre Parkway, Mountain View, CA"
location = geolocator.geocode(address)

if location:
    print(f"Address: {location.address}")
    print(f"Latitude: {location.latitude}")
    print(f"Longitude: {location.longitude}")
else:
    print("Location not found.")

Address: Google Building 41, 1600, Amphitheatre Parkway, Mountain View, Santa Clara County, California, 94043, United States
Latitude: 37.4224857
Longitude: -122.0855846


#### Application

Let's find the coordinates of all Olympic Games' host cities!

In [33]:
import pandas as pd

In [34]:
df = pd.read_csv('../data/Winter.csv')

In [35]:
df

,index,Year,Host_country,Host_city,Country_Name,Country_Code,Gold,Silver,Bronze
0,0,1924,France,Chamonix,United States,USA,1,2,1
1,1,1924,France,Chamonix,Great Britain,GBR,1,1,2
2,2,1924,France,Chamonix,Austria,AUT,2,1,0
3,3,1924,France,Chamonix,Norway,NOR,4,7,6
4,4,1924,France,Chamonix,Finland,FIN,4,4,3
...,...,...,...,...,...,...,...,...,...
404,404,2018,South Korea,Pyeongchang,Slovakia,SVK,1,2,0
405,405,2018,South Korea,Pyeongchang,China,CHN,1,6,2
406,406,2018,South Korea,Pyeongchang,Hungary,HUN,1,0,0
407,407,2018,South Korea,Pyeongchang,Poland,POL,1,0,1


In [36]:
cities = df['Host_city'].unique()

In [37]:
cities

array(['Chamonix', 'St. Moritz', 'Lake Placid', 'Garmisch-Partenkirchen',
       'Oslo', "Cortina d'Ampezzo", 'Squaw Valley', 'Innsbruck',
       'Grenoble', 'Sapporo', 'Sarajevo', 'Calgary', 'Albertville',
       'Lillehammer', 'Nagano', 'Salt Lake City', 'Turin', 'Vancouver',
       'Sochi', 'Pyeongchang'], dtype=object)

In [38]:
wgames = df[['Host_city', 'Year']].drop_duplicates()

In [39]:
wgames

,Host_city,Year
0,Chamonix,1924
10,St. Moritz,1928
22,Lake Placid,1932
32,Garmisch-Partenkirchen,1936
43,St. Moritz,1948
56,Oslo,1952
69,Cortina d'Ampezzo,1956
82,Squaw Valley,1960
96,Innsbruck,1964
110,Grenoble,1968


In [40]:
wgames.set_index('Host_city', inplace=True)

In [41]:
wgames

Host_city,Year
Chamonix,1924
St. Moritz,1928
Lake Placid,1932
Garmisch-Partenkirchen,1936
St. Moritz,1948
Oslo,1952
Cortina d'Ampezzo,1956
Squaw Valley,1960
Innsbruck,1964
Grenoble,1968


In [42]:
wgames['lat'] = None
wgames['lon'] = None

In [43]:
wgames

Host_city,Year,lat,lon
Chamonix,1924,,
St. Moritz,1928,,
Lake Placid,1932,,
Garmisch-Partenkirchen,1936,,
St. Moritz,1948,,
Oslo,1952,,
Cortina d'Ampezzo,1956,,
Squaw Valley,1960,,
Innsbruck,1964,,
Grenoble,1968,,


In [44]:
import time

In [45]:
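for city in wgames.index.unique(): # geocode each distinct city only once
    add = geolocator.geocode(city)
    time.sleep(1) # wait for one second!
    if add is None: # skip cities that Nominatim cannot find
        continue
    # .loc fills every row that carries this city name
    wgames.loc[city, 'lat'] = add.latitude
    wgames.loc[city, 'lon'] = add.longitude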

In [46]:
wgames

Host_city,Year,lat,lon
Chamonix,1924,45.92467,6.872751
St. Moritz,1928,46.497896,9.839243
Lake Placid,1932,44.283119,-73.982832
Garmisch-Partenkirchen,1936,47.492374,11.096281
St. Moritz,1948,46.497896,9.839243
Oslo,1952,59.91333,10.73897
Cortina d'Ampezzo,1956,46.538333,12.137351
Squaw Valley,1960,36.70593,-119.200118
Innsbruck,1964,47.26543,11.392769
Grenoble,1968,45.18756,5.735782
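
One caveat with name-only queries: the result for Squaw Valley above (36.7, -119.2) lies in Fresno County, whereas the 1960 Games took place near Lake Tahoe. A rough sketch of a sanity check based on the haversine distance follows; the reference coordinates for the Olympic site are approximate and are an assumption of this example, as are the function name and the 100 km threshold.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius


geocoded = (36.70593, -119.200118)  # value returned for "Squaw Valley" above
expected = (39.197, -120.235)       # approximate site of the 1960 Games (assumption)

distance = haversine_km(*geocoded, *expected)
if distance > 100:  # arbitrary threshold for this sketch
    print(f"Suspicious result: {distance:.0f} km from the expected location")
```

A more specific query string (for example, one that adds the state or country) is one way to reduce this kind of ambiguity.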


Geocoding is especially useful when combined with folium.
Let's visualise the locations!

In [47]:
import folium

In [48]:
m = folium.Map()

In [49]:
for index, row in wgames.iterrows():
    folium.Marker(
        location=[row['lat'], row['lon']],
        popup=index + ": " + str(row['Year']),
    ).add_to(m)

In [52]:
from IPython.display import display
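def show_map(m, w = 800, h = 500):
    """Display map m inside a Figure of width w and height h (in pixels)."""
    fig = folium.Figure(width = w, height = h)
    fig.add_child(m)
    display(fig) # show the sized figure that wraps the map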

In [53]:
show_map(m, 800, 500)
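
`folium.Map()` without arguments opens at a world view. One option for framing the markers is to compute their bounding box and pass it to the map's `fit_bounds` method. A sketch of the bounding-box step in plain Python, using three coordinate pairs from the table above; `bounding_box` is a hypothetical helper name, and the sketch ignores the special case of points on both sides of the 180° meridian.

```python
def bounding_box(points):
    """Return [[south, west], [north, east]] for an iterable of (lat, lon) pairs."""
    lats, lons = zip(*points)
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


points = [
    (45.92467, 6.872751),     # Chamonix
    (44.283119, -73.982832),  # Lake Placid
    (59.91333, 10.73897),     # Oslo
]
bounds = bounding_box(points)
print(bounds)
# With a folium map `m`, the view could then be adjusted via m.fit_bounds(bounds).
```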

Now, let's do the same for the Summer Games!

In [87]:
summer = pd.read_csv('../data/Summer_Olympic_Medals.csv')

In [88]:
sgames = summer[['Host_city', 'Year']].drop_duplicates()

In [89]:
sgames

,Host_city,Year
0,Athens,1896
11,Paris,1900
32,St. Louis,1904
45,London,1908
64,Stockholm,1912
83,Antwerp,1920
105,Paris,1924
132,Amsterdam,1928
165,Los Angeles,1932
192,Berlin,1936


In [90]:
sgames[sgames['Year'] == 1956]

,Host_city,Year
304,Melbourne/Stockholm,1956


In [91]:
sgames.loc[sgames['Year'] == 1956, 'Host_city']

304    Melbourne/Stockholm
Name: Host_city, dtype: object

We change this entry to a single city, because the geocoder cannot find a location called Melbourne/Stockholm. Let's choose Melbourne, since Stockholm already hosted the Olympic Games in 1912!

In [92]:
sgames[sgames['Host_city'] == 'Stockholm']

,Host_city,Year
64,Stockholm,1912


In [93]:
sgames.loc[sgames['Year'] == 1956, 'Host_city'] = 'Melbourne'

In [94]:
sgames[sgames['Year'] == 1956]

,Host_city,Year
304,Melbourne,1956
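
If several entries had this combined form, the manual fix could be automated. A minimal sketch in plain Python, assuming the name before the slash is the one to keep (which matches the choice of Melbourne here); `primary_city` is a hypothetical helper.

```python
def primary_city(name):
    """Keep only the first city of a combined entry such as 'A/B'."""
    return name.split("/")[0].strip()


for raw in ["Melbourne/Stockholm", "Athens"]:
    print(raw, "->", primary_city(raw))
```

Whether the first name is the right one to keep depends on the dataset, so the result is worth checking by hand.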


In [95]:
sgames.set_index('Host_city', inplace=True)

In [96]:
sgames

Host_city,Year
Athens,1896
Paris,1900
St. Louis,1904
London,1908
Stockholm,1912
Antwerp,1920
Paris,1924
Amsterdam,1928
Los Angeles,1932
Berlin,1936


In [97]:
sgames['lat'] = None
sgames['lon'] = None

In [98]:
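for city in sgames.index.unique(): # geocode each distinct city only once
    add = geolocator.geocode(city)
    time.sleep(1) # wait for one second!
    if add is None: # skip cities that Nominatim cannot find
        continue
    # .loc fills every row that carries this city name
    sgames.loc[city, 'lat'] = add.latitude
    sgames.loc[city, 'lon'] = add.longitude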

In [99]:
m = folium.Map()

In [103]:
for index, row in sgames.iterrows():
    folium.Marker(
        location=[row['lat'], row['lon']],
        popup=folium.Popup(index + ": " + str(row['Year']) + " (Summer)", max_width=200),
        icon=folium.Icon(color='orange')
    ).add_to(m)
for index, row in wgames.iterrows():
    folium.Marker(
        location=[row['lat'], row['lon']],
        popup=folium.Popup(index + ": " + str(row['Year']) + " (Winter)", max_width=200),
        icon=folium.Icon(color='darkblue')
    ).add_to(m)

In [106]:
show_map(m, 900, 600)

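But what if we want to show or hide the Summer Games and the Winter Games separately? We can put each set of markers into its own `folium.FeatureGroup` and add a `folium.LayerControl` to toggle them.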

In [108]:
import folium.plugins

m = folium.Map(location = [45, 0], zoom_start = 2) # zoomed-out world view

fig = folium.Figure(width = 1100, height = 700)
fig.add_child(m)

# create groups for both Olympic Game types.
winter_group = folium.FeatureGroup(name='Winter Games') 
summer_group = folium.FeatureGroup(name='Summer Games')

for index, row in sgames.iterrows():
    folium.Marker(
        location=[row['lat'], row['lon']],
        popup=folium.Popup(index + ": " + str(row['Year']) + " (Summer)", max_width=200),
        icon=folium.Icon(color="orange", icon="sun", prefix="fa")
    ).add_to(summer_group)
for index, row in wgames.iterrows():
    folium.Marker(
        location=[row['lat'], row['lon']],
        popup=folium.Popup(index + ": " + str(row['Year']) + " (Winter)", max_width=200),
        icon=folium.Icon(color="blue", icon="snowflake", prefix="fa")
    ).add_to(winter_group)

winter_group.add_to(m)
summer_group.add_to(m)
folium.plugins.Fullscreen(position = 'topleft', force_separate_button=False,).add_to(m) # add the fullscreen button

folium.LayerControl(collapsed=False).add_to(m)

show_map(m, 1100, 700)

In [109]:
m.save("../output/olympic_games.html")