# MapReader Workshop @ ADHO DH 2025
## Text Spotting with IIIF Resources


**For use in Google Colab**
 

Written by Rosie Wood and Katherine McDonough.
Reviewed and tested by Kaspar Beelen and Daniel Wilson.

Learn more about the MapReader team at https://github.com/maps-as-data/MapReader?tab=readme-ov-file#contributors. 

Run the next cell while we go through the slides.

In [None]:
# set up for google colab - this cell will take a while to run!
!git clone https://github.com/maps-as-data/workshop-dh2025
!pip install -r workshop-dh2025/requirements.txt

!git clone https://github.com/maps-as-data/MapTextPipeline.git

In [None]:
# enable custom widgets in colab
from google.colab import output
output.enable_custom_widget_manager()

# Download

In [None]:
from mapreader import IIIFDownloader

from piffle.load_iiif import load_iiif_image, load_iiif_presentation

# Georeferenced Map Example

Leventhal Map & Education Center, Boston Public Library

*Path map of the eastern part of Mount Desert Island, Maine* (1903)

https://collections.leventhalmap.org/search/commonwealth:cj82m682d

- IIIF Manifest - https://collections.leventhalmap.org/search/commonwealth:cj82m682d/manifest
- Allmaps viewer - https://viewer.allmaps.org/?url=https%3A%2F%2Fannotations.allmaps.org%2Fimages%2Ff29ad52e4d2477a2
- Allmaps georeference annotation - https://annotations.allmaps.org/images/f29ad52e4d2477a2


In [None]:
# Download

downloader = IIIFDownloader(
    "https://annotations.allmaps.org/images/f29ad52e4d2477a2",
    iiif_versions=3,
    iiif_uris="https://annotations.allmaps.org/images/f29ad52e4d2477a2"
)

In [None]:
# save the maps as geotiffs
downloader.save_georeferenced_maps()

# Load

https://mapreader.readthedocs.io/en/latest/using-mapreader/step-by-step-guide/2-load.html

In [None]:
from mapreader import loader

In [None]:
from PIL import Image
Image.MAX_IMAGE_PIXELS = None  # Disable limit on image size


# change this path to the saved IIIF image you want to load
my_files = loader("./maps/ea5a3e20e44cea9c_masked.tif")

In [None]:
# len() shows the total number of images currently read (or sliced, see below)
print(f"Number of images: {len(my_files)}")

In [None]:
print(my_files)

In [None]:
my_files.add_metadata("./maps/metadata.csv")

In [None]:
parent_list = my_files.list_parents()

## Patchify map
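
To make the idea of patchifying concrete, here is a minimal, self-contained sketch of how an image can be divided into square patches of a given pixel size. It is only an illustration: `patch_bounds` is a hypothetical helper (not part of MapReader), the image size is made up, and it assumes that patches at the right and bottom edges are simply clipped to the image.

```python
def patch_bounds(width, height, patch_size):
    """Return (min_x, min_y, max_x, max_y) pixel bounds for each patch.

    Hypothetical helper for illustration only, not MapReader's implementation.
    """
    bounds = []
    for min_y in range(0, height, patch_size):
        for min_x in range(0, width, patch_size):
            # Clip patches that would run past the image edge (an assumption)
            max_x = min(min_x + patch_size, width)
            max_y = min(min_y + patch_size, height)
            bounds.append((min_x, min_y, max_x, max_y))
    return bounds


# Made-up image size: 2500 x 1800 pixels, cut into 1000-pixel patches
example = patch_bounds(width=2500, height=1800, patch_size=1000)
print(len(example))
print(example[-1])
```

With a patch size of 1000 pixels, this made-up image yields a 3 x 2 grid, and the edge patches are smaller than the rest.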

In [None]:
my_files.patchify_all(patch_size=1000, path_save="./patches_1000_pixel")  # in pixels

In [None]:
my_files.show_sample(num_samples=12, tree_level="patch")

In [None]:
my_files.show_patches(
    parent_id=parent_list[0],
    figsize=(15, 15)
)

For georeferenced maps, add coordinate increments before text spotting. These relate pixels to geographic coordinates (1 degree latitude = X pixels).
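
As a rough sketch of what a coordinate increment is, the snippet below divides the geographic extent of an image by its size in pixels. The function name, the bounding box, and the image size are all made up for illustration, and the sketch assumes a north-up image whose bounds are given in degrees; it is not MapReader's own code.

```python
def coord_increments(bounds, image_width, image_height):
    """Return (dlon, dlat): degrees of longitude and latitude covered by one pixel.

    Hypothetical helper; bounds are (min_lon, min_lat, max_lon, max_lat).
    """
    min_lon, min_lat, max_lon, max_lat = bounds
    dlon = abs(max_lon - min_lon) / image_width
    dlat = abs(max_lat - min_lat) / image_height
    return dlon, dlat


# Made-up bounds and image size, for illustration only
dlon, dlat = coord_increments((-68.5, 44.0, -68.0, 44.5), 5000, 4000)
pixels_per_degree_lat = 1 / dlat
print(dlon, dlat, pixels_per_degree_lat)
```

The inverse of `dlat` gives the "1 degree latitude = X pixels" figure for this made-up image.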

In [None]:
my_files.add_coord_increments()

In [None]:
parent_df, patch_df = my_files.convert_images(save=True)

In [None]:
parent_df.head()

In [None]:
patch_df.head()

In [None]:
patch_list = my_files.list_patches()

## Detect Text

Here, we show how to load an already fine-tuned text spotting (detection & recognition) model and run the model inference on your patches.

Download 'rumsey-finetune.pth' from https://drive.google.com/drive/folders/1AEURUafbgx8tnA83uvIwq8_hxae0U008?usp=sharing.

Add it to the 'MapTextPipeline' folder in your Google Colab environment.
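
Before building the runner, it can help to confirm that the weights and config files are where the notebook expects them. The check below is an optional sketch using only the standard library; `missing_files` is a hypothetical helper, and the two paths are the ones used in the cells that follow.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Paths used later in this notebook
expected = [
    "./MapTextPipeline/rumsey-finetune.pth",
    "./MapTextPipeline/final_rumsey.yaml",
]
for path in missing_files(expected):
    print(f"Missing: {path}")
```

If nothing is printed, both files were found.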

In [None]:
# SEE NOTE ABOVE ABOUT DOWNLOADING THE MODEL WEIGHTS

# Then change this to your own path; see the README for more details on how to get the weights
MAPTEXT_MODEL_PATH = "./MapTextPipeline/rumsey-finetune.pth"

In [None]:
# https://github.com/maps-as-data/MapTextPipeline

cfg_file = "./MapTextPipeline/final_rumsey.yaml"
weights_file = MAPTEXT_MODEL_PATH


In [None]:
# Set parameters for MapTextPipeline model

from mapreader import MapTextRunner

my_runner = MapTextRunner(
    patch_df,
    parent_df,
    cfg_file=cfg_file,
    weights_file=weights_file,
    device="cpu",
)

### Run on all patches in the patch dataframe

In [None]:
# Takes approx. 25 mins to run on an M1 MacBook Pro
# If this is too long, uncomment the line below to run on just the first 8 patches

# my_runner.patch_df = my_runner.patch_df[:8]

patch_predictions = my_runner.run_all(return_dataframe=True)

In [None]:
my_runner.show_predictions(
    patch_list[0],
    figsize=(15, 15),
    border_color="r",
    text_color="b",
)

## Scale up to parent images
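
Predictions are made per patch, so their pixel coordinates are relative to the top-left corner of that patch. Scaling up to the parent image amounts to shifting each vertex by the patch's offset within the parent. The sketch below shows this idea with a hypothetical helper and made-up numbers; it is not the MapReader implementation.

```python
def to_parent_pixels(patch_polygon, patch_pixel_bounds):
    """Shift patch-relative (x, y) vertices into parent-image pixel space.

    Hypothetical helper; patch_pixel_bounds is (min_x, min_y, max_x, max_y)
    of the patch within its parent image.
    """
    min_x, min_y, _, _ = patch_pixel_bounds
    return [(x + min_x, y + min_y) for x, y in patch_polygon]


# Made-up text box inside the patch that starts at (2000, 1000) in the parent
box_in_patch = [(10, 20), (110, 20), (110, 60), (10, 60)]
box_in_parent = to_parent_pixels(box_in_patch, (2000, 1000, 3000, 2000))
print(box_in_parent)
```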

In [None]:
parent_predictions = my_runner.convert_to_parent_pixel_bounds(return_dataframe=True)

In [None]:
parent_predictions.head()

In [None]:
my_runner.show_predictions(
    parent_list[0],
    figsize=(15, 15),
    border_color="r",
    text_color="b",
)

## Convert pixel bounds to coordinates
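
The conversion from pixels to coordinates can be pictured as a linear mapping from the image's pixel grid onto its geographic bounds. The sketch below assumes a north-up, unrotated image with bounds in degrees, where pixel `y` increases downwards; the helper name and all values are made up for illustration and are not taken from MapReader.

```python
def pixel_to_lonlat(x, y, bounds, dlon, dlat):
    """Map a parent-image pixel (x, y) to (lon, lat).

    Hypothetical helper; bounds are (min_lon, min_lat, max_lon, max_lat),
    dlon and dlat are degrees per pixel.
    """
    min_lon, _, _, max_lat = bounds
    lon = min_lon + x * dlon
    lat = max_lat - y * dlat  # pixel rows run from north to south
    return lon, lat


# Made-up bounds for a 5000 x 4000 pixel image
bounds = (-68.5, 44.0, -68.0, 44.5)
dlon, dlat = 0.5 / 5000, 0.5 / 4000
top_left = pixel_to_lonlat(0, 0, bounds, dlon, dlat)
bottom_right = pixel_to_lonlat(5000, 4000, bounds, dlon, dlat)
print(top_left, bottom_right)
```

The top-left pixel maps to the north-west corner of the bounds and the bottom-right pixel to the south-east corner.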

In [None]:
geo_predictions = my_runner.convert_to_coords(return_dataframe=True)

Saving these outputs gives you a GeoJSON file that you can load into GIS software.
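
For orientation, a GeoJSON file is a JSON document made up of features, each pairing a geometry with some properties. The sketch below builds a one-feature example with the standard library; the polygon, the label, and the `text` property name are illustrative assumptions rather than the exact output of `to_geojson`.

```python
import json


def polygon_feature(ring, text):
    """Build one GeoJSON Feature from (lon, lat) vertices and a text label."""
    coords = [list(pt) for pt in ring]
    if coords[0] != coords[-1]:
        coords.append(coords[0])  # GeoJSON polygon rings must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [coords]},
        "properties": {"text": text},
    }


# Made-up polygon and label, for illustration only
ring = [(-68.30, 44.35), (-68.29, 44.35), (-68.29, 44.36), (-68.30, 44.36)]
collection = {
    "type": "FeatureCollection",
    "features": [polygon_feature(ring, "Example label")],
}
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```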

In [None]:
my_runner.to_geojson("./example_output.geojson")