# MapReader Workshop @ ADHO DH 2025
## Patch Classification with IIIF Resources


**For use in Google Colab**
 

Written by Rosie Wood and Katherine McDonough.
Reviewed and tested by Kaspar Beelen and Daniel Wilson.

Learn more about the MapReader team at https://github.com/maps-as-data/MapReader?tab=readme-ov-file#contributors. 

In [None]:
# set up for google colab - this cell will take a while to run!
!pip install mapreader[dev]

In [None]:
# enable custom widgets in colab
from google.colab import output
output.enable_custom_widget_manager()

# Download

https://mapreader.readthedocs.io/en/latest/using-mapreader/step-by-step-guide/1-download.html

In [None]:
from mapreader import IIIFDownloader

In [None]:
downloader = IIIFDownloader(
    "https://annotations.allmaps.org/manifests/a0d6d3379cfd9f0a",
    iiif_versions=3,
    iiif_uris="https://annotations.allmaps.org/manifests/a0d6d3379cfd9f0a"
)

In [None]:
downloader.save_georeferenced_maps(path_save="maps_iiif")

## Load maps and patchify

https://mapreader.readthedocs.io/en/latest/using-mapreader/step-by-step-guide/2-load.html

Now that we have saved our IIIF maps, we can follow the same steps as in the previous notebook.

We will now load both maps and their metadata using the `loader`.

From here, we can patchify our maps, visualise metadata and add further information about our maps/patches.

In [None]:
from mapreader import loader

In [None]:
my_maps = loader("./maps_iiif/*masked.tif")

In [None]:
my_maps.add_metadata("./maps_iiif/metadata.csv")

In [None]:
print(my_maps) # see which maps you have loaded

In [None]:
parent_df = my_maps.convert_images()[0]
parent_df

### Patchify maps

Choosing a patch size is an important part of using MapReader.

Before patchifying our maps, we need to think about which visual features we want to find in our maps. This will help us pick a suitable patch size.

Types of features we might want to label are:

- Continuous features (e.g. roads, rivers, lines/patterns/shading)
- Discrete features (e.g. buildings, trees, other symbology)
- Abstract or composite features (e.g. farmland, urban/rural areas)

The patch size should be large enough to distinguish our visual features but small enough to get useful results.

Since we have added metadata to our maps, we have information about their coordinates and so can use the "meters" method to patchify our maps.

We will be annotating buildings and will slice our maps into 100x100 meter patches for this workshop.
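
To get a feel for what a 100 meter patch size means in practice, here is a rough standalone sketch. It does not use MapReader; the map dimensions are hypothetical and the helper functions are made up for illustration only.

```python
import math


def patches_per_side(map_extent_m: float, patch_size_m: float) -> int:
    """Number of patches needed to cover one side of a map (partial edge patches count)."""
    return math.ceil(map_extent_m / patch_size_m)


def patch_size_in_pixels(map_extent_m: float, map_extent_px: int, patch_size_m: float) -> float:
    """Approximate patch side length in pixels for a given ground distance."""
    meters_per_pixel = map_extent_m / map_extent_px
    return patch_size_m / meters_per_pixel


# Hypothetical map sheet: 2,000 m x 1,500 m on the ground, scanned at 8,000 x 6,000 pixels
n_cols = patches_per_side(2000, 100)
n_rows = patches_per_side(1500, 100)
patch_px = patch_size_in_pixels(2000, 8000, 100)

print(f"{n_cols * n_rows} patches, each roughly {patch_px:.0f} pixels wide")
```

Halving the patch size roughly quadruples the number of patches, so smaller patches mean more to annotate and classify.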

In [None]:
my_maps.patchify_all(method="meters", patch_size=100, skip_blank_patches=True)

> If you now look in your files you will see a `patches_100_meters` directory which contains all the patches of your two maps.

In [None]:
print(my_maps)

In [None]:
# show a sample of the patches
my_maps.show_sample(num_samples=3, tree_level="patch")

Since our model will be looking at pixel values to classify our patches, it can be useful to know some statistics about these.

- The mean pixel value for a patch gives an idea of the average "brightness" of the patch. Higher values indicate lighter patches, and so more empty/white space.
- The standard deviation of pixel values gives an idea of the variation in pixel values across the patch. Higher values indicate more variation in pixel values.

We can calculate these statistics using the `calc_pixel_stats()` method:

In [None]:
my_maps.calc_pixel_stats()
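
To build intuition for these two statistics, here is a small standalone sketch using toy pixel values. It is only an illustration: the pixel lists are invented, and `calc_pixel_stats()` may compute its values differently (for example per colour channel or on a different scale).

```python
from statistics import mean, pstdev

# Toy greyscale "patches" as flat lists of pixel values (0 = black, 255 = white)
mostly_blank = [250, 252, 255, 251, 253, 254, 255, 250, 252]  # almost all white
busy = [20, 240, 35, 250, 10, 245, 30, 255, 15]  # mix of dark lines and white space

for name, pixels in [("mostly_blank", mostly_blank), ("busy", busy)]:
    # A high mean suggests a light patch; a high standard deviation suggests lots of variation
    print(name, "mean:", round(mean(pixels), 1), "std:", round(pstdev(pixels), 1))
```

The mostly blank patch has a high mean and a low standard deviation, whereas the busy patch has a lower mean and a much higher standard deviation.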

We can look at what information we have about each of our parent images and patches (including the pixel statistics we just calculated).

The easiest way to do this is to create dataframes containing parent and patch information using the `convert_images()` method:

In [None]:
parent_df, patch_df = my_maps.convert_images()

In [None]:
parent_df.head() # parent information

In [None]:
patch_df.head() # patch information (showing only first 5 rows)

## Annotate

https://mapreader.readthedocs.io/en/latest/using-mapreader/step-by-step-guide/3-annotate.html

In [None]:
from mapreader import Annotator

Before we begin annotating, we need to set up our annotation task by specifying labels, a task name and a username for the person annotating.

There are two options for picking good labels:
- Binary labels: e.g. "building" and "not building"/"no"
- Multi-class labels (but these must be mutually exclusive!): e.g. "building", "road", "building and road" and "neither building nor road"/"no"

>__*NOTE*__: You can change the labels in the cell below if you'd like to annotate something else!!

In [None]:
task_name = "buildings" # rename if you want to try a different task
labels = ["building", "no"] # change these to the labels you want to use
username = "rosie" # change this to your username

In [None]:
annotator = Annotator(
    patch_df, # the information about our patches
    parent_df, # the information about our parent images
    task_name=task_name,
    labels=labels,
    username=username,
    resize_to=300, # resize the patches to 300x300 pixels in the annotation interface
)

First, we will annotate with no context image.
This is representative of what the model sees during training, so it can be helpful for understanding which visual features are a good choice for labelling.

> Do as many annotations as you want here (we will do more annotations in the cell below).

In [None]:
annotator.annotate()

In [None]:
print(len(annotator.get_labelled_data())) # see number of annotations
annotator.get_labelled_data() # see annotations

To make annotating easier, it can be helpful to see the patch in its surrounding context.
This is done by setting `show_context=True`.

We will now have a go at this.

In [None]:
annotator.annotate(show_context=True, resize_to=600) # show the context of the patch and resize to 600x600 pixels to make it easier to see

In [None]:
print(len(annotator.get_labelled_data())) # see number of annotations
annotator.get_labelled_data() # new annotations are added to the existing ones

> If you now look in your files you will see an `annotations` directory containing a CSV file with your annotations.
This file is auto-saved and updated each time you add new annotations.

In [None]:
annotator.annotations_file # see the path to the annotations file

## Train your model

https://mapreader.readthedocs.io/en/latest/using-mapreader/step-by-step-guide/4-classify/train.html

There is no definite answer to how many annotations you need to train a model.
However, the more annotations you have, the better your model will likely be - e.g. for our railspace/building models we had 62K annotations!

> __*NOTE:*__ It is very unlikely you will have time to do enough annotations in the workshop to create a great model, but you should have enough to see some results.

Here are some tips for getting the most out of your annotations:

- Start by annotating a representative sample of your data.
- In most experiments, patch-level classification is quick, so we can iteratively check the performance of our models after training: do some annotations, train your model, visually inspect the results to identify systematic errors, do some targeted annotations to fix these errors, then repeat the training process.
- Use contextual information: if you expect a certain pattern in neighboring patches, you can use that information to identify and locate possible errors.
- Use external datasets (e.g., StopsGB and railway patches).

### Load annotations

In [None]:
from mapreader import AnnotationsLoader

In [None]:
annotations = AnnotationsLoader() # initialise

annotations.load(
    annotator.annotations_file, # path to the annotations file
)

During model training, labels must be integers instead of strings, so the `AnnotationsLoader` will create a mapping between labels and label indices.

In [None]:
annotations.labels_map # the mapping between the labels and label indices
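
The idea of mapping string labels to integers can be sketched in plain Python as below. This is an assumption-laden illustration, not the `AnnotationsLoader` implementation: the toy annotations are invented and the ordering rule (order of first appearance) is a guess, so always check `annotations.labels_map` for the mapping actually used.

```python
# Toy annotations (invented for illustration)
annotated = ["no", "building", "no", "no"]

# Hypothetical mapping: index -> label, in order of first appearance
labels_map = {index: label for index, label in enumerate(dict.fromkeys(annotated))}

# Reverse lookup: label -> index, used to encode the string labels as integers
label_to_index = {label: index for index, label in labels_map.items()}
encoded = [label_to_index[label] for label in annotated]

print(labels_map)
print(encoded)
```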

To fine-tune our model, we will split our annotations into train (70%), validation (15%) and test (15%) datasets.

- The train dataset is used to train the model (i.e. to update the model parameters).
- The validation set is used to evaluate the model's performance during training (but not to update the model parameters).
- The test set is unseen data reserved for us to use to evaluate the model's performance after training.

e.g. if you have 100 annotations, the train dataset will have 70 annotations, the validation dataset will have 15 annotations and the test dataset will have 15 annotations.
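
The arithmetic of this split can be sketched in a few lines of plain Python. This is a simplified illustration with a hypothetical helper, `split_annotations`; it does a simple random split and is not necessarily how `create_datasets()` works internally (which may, for example, balance labels across the datasets).

```python
import random


def split_annotations(items, frac_train=0.70, frac_val=0.15, seed=42):
    """Shuffle items and split them into train/val/test lists; the remainder goes to test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * frac_train)
    n_val = round(len(shuffled) * frac_val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# 100 toy "annotations", represented here by the integers 0 to 99
train_set, val_set, test_set = split_annotations(range(100))
print(len(train_set), len(val_set), len(test_set))
```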

In [None]:
annotations.create_datasets() # create the datasets

You will need to have __*at least*__ one instance of each label in each dataset (ideally you'd want a lot more than this, but for the workshop it should be fine).

You can check this using the code below:

In [None]:
for set_name, dataset in annotations.datasets.items():
    print(f"Number of instances of each label in '{set_name}':\n{dataset.patch_df['label'].value_counts().to_dict()}\n")

> Continue to training the model if you feel you have enough annotations. If not, go back and add some more using the cells above.

In [None]:
dataloaders = annotations.create_dataloaders() # create the dataloaders

### Set up and train the model

In [None]:
from mapreader import ClassifierContainer

In [None]:
my_classifier = ClassifierContainer(
    "resnet18", # the model architecture, choose from https://pytorch.org/vision/0.8/models.html
    labels_map=annotations.labels_map,
    dataloaders=dataloaders,
)

In [None]:
my_classifier.add_loss_fn("cross-entropy") # add the loss function

In [None]:
my_classifier.initialize_optimizer("adam") # add the optimizer

In [None]:
my_classifier.initialize_scheduler() # add the scheduler

Now we can actually train the model. We will start with 10 epochs (1 epoch = 1 full pass through the training data):

In [None]:
# train the model
my_classifier.train(num_epochs=10)

> If you now look in your files you will see a `models` directory containing your model files.

### Visualize progress

MapReader logs a number of common metrics during model training/evaluation and saves them in a dictionary ``my_classifier.metrics``.
For example:
- loss, calculated using the loss function we defined earlier (i.e. cross-entropy)
- f-scores
- precision scores
- recall scores

[This page](https://cohere.com/blog/classification-eval-metrics) provides a good overview of what each of these scores means.
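
As a quick illustration of how precision, recall and f-score relate to each other, here is a minimal sketch for a binary task. The function and the toy labels are made up for this example, and the values MapReader logs may be averaged differently (e.g. across classes), so treat this as a conceptual aid only.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data: 1 = "building", 0 = "no"
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here the model finds three of the four true buildings (recall) and three of its four "building" predictions are correct (precision).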

In [None]:
my_classifier.plot_metric(
    metrics="loss", # choose the metric to plot
    phases=["train", "val"], # choose the phases to plot
)

## Infer

https://mapreader.readthedocs.io/en/latest/using-mapreader/step-by-step-guide/4-classify/infer.html

We first need to create a new dataset containing all our patches, including the ones from the map we didn't annotate.

In [None]:
from mapreader import PatchDataset

In [None]:
patch_dataset = PatchDataset(
    patch_df,
    transform="test", # apply the test transform on the patches
)

In [None]:
my_classifier.load_dataset(patch_dataset, set_name="all_patches") # load the dataset

Now we can use our fine-tuned model to predict the labels on the rest of our patches:

In [None]:
my_classifier.inference(set_name="all_patches")

In [None]:
my_classifier.save_predictions("all_patches") # save the predictions

> If you now look in your files you will see a file called ``all_patches_predictions_patch_df.csv`` which contains predictions for each patch.
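
Once you have a predictions file, a quick sanity check is to count how many patches were assigned to each label. The sketch below uses a tiny in-memory CSV as a stand-in: its contents and the `image_id` column name are assumptions for illustration, and the real file will contain more columns. Only the `pred` column name is taken from this notebook.

```python
import csv
import io
from collections import Counter

# Toy stand-in for a predictions CSV (invented values, hypothetical "image_id" column)
toy_csv = io.StringIO(
    "image_id,pred\n"
    "patch-0,0\n"
    "patch-1,1\n"
    "patch-2,0\n"
    "patch-3,0\n"
    "patch-4,1\n"
)

# Count how often each predicted label index occurs (values are read as strings)
counts = Counter(row["pred"] for row in csv.DictReader(toy_csv))
print(dict(counts))
```

A very lopsided count, such as every patch getting the same label, is often a sign that more (or more balanced) annotations are needed.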

### Visualize results

We can load the predictions as metadata in our `my_maps` object.
This makes it easy to visualize our predictions.

In [None]:
my_maps.add_metadata(
    "./all_patches_predictions_patch_df.csv",
    tree_level="patch" # add the predictions as patch metadata
)

In [None]:
# Visualise predicted labels on a map (change the index in parent_list[...] to 0 or 1 to view each parent map)

# Yellow = 1 (building)
# Purple = 0 (no building)

parent_list = my_maps.list_parents()
my_maps.explore_patches(
    parent_list[1],
    column_to_plot="pred",
)

We can save our results as CSVs, or in GeoJSON format for use in GIS software:

In [None]:
parent_df, patch_df = my_maps.convert_images(save=True, save_format="csv") # as CSV

In [None]:
my_maps.save_patches_to_geojson("predicted_outputs_iiif.geojson", rewrite=True) # as GeoJSON

### You are done! 

Next, you can:
- add more annotations to improve your model performance,
- try out your model on other map sheets, or
- test different patch sizes and labels.
