# MapReader Workshops 2024

----

First check you have the correct version of MapReader: v1.3.2

This can be installed from PyPI using `pip install mapreader==1.3.2` or by checking out the repo at [this release tag](https://github.com/Living-with-machines/MapReader/releases/tag/v1.3.2).

In [1]:
import mapreader
assert mapreader.__version__ == '1.3.2'
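
If the bare `assert` above fails, it does not say which version is actually installed. The sketch below is one possible alternative that uses only the standard library; the helper name `check_version` is our own and is not part of MapReader.

```python
from importlib.metadata import PackageNotFoundError, version


def check_version(package: str, expected: str) -> str:
    """Return a short status message comparing the installed version to `expected`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed; try: pip install {package}=={expected}"
    if installed == expected:
        return f"{package} {installed} is installed, as expected"
    return f"{package} {installed} is installed, but {expected} is expected"


print(check_version("mapreader", "1.3.2"))
```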

-------------

# Annotate

MapReader's ``Annotate`` subpackage is used to annotate images/patches.

Today, we will annotate our 100x100 meter patches.

In [None]:
from mapreader import Annotator

__**YOUR TURN**__: Set up your `annotator`

See [here](https://mapreader.readthedocs.io/en/latest/User-guide/Annotate.html#annotate-your-images) in docs.

Before you begin annotating your images, you must tell MapReader:

- which labels you'd like to use (``labels``)
- who is doing the annotations (``username``)
- which task you are running (``task_name``)

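We will also use the ``sortby="mean_pixel_R"`` option, so that the patches are shown in order of their mean red (R) pixel intensity. The ``ascending`` argument controls the direction of the sort, i.e. whether the patches with the lowest or the highest values are shown first.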

In [None]:
# labels = []
# username = ""
# task_name = ""

In [None]:
# annotator = Annotator(
#     patch_paths="./patches_100_meters/*png",
#     parent_paths="./maps/*png",
#     metadata_path="./maps/metadata.csv",
#     labels=labels,
#     username=username,
#     task_name=task_name,
#     sortby="mean_pixel_R",
#     ascending=True,
# )
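
To get a feel for what sorting with an ascending flag does, here is a minimal plain-Python sketch. It assumes conventional sort semantics (ascending means lowest values first) and uses made-up patch records; it is an illustration only, not MapReader's implementation, and `annotation_order` is a hypothetical helper.

```python
# Hypothetical patches with made-up mean red-channel intensities (0 to 1).
patches = [
    {"patch_id": "patch_a.png", "mean_pixel_R": 0.82},
    {"patch_id": "patch_b.png", "mean_pixel_R": 0.35},
    {"patch_id": "patch_c.png", "mean_pixel_R": 0.61},
]


def annotation_order(records, sortby, ascending=True):
    """Return patch ids sorted by `sortby`; lowest values first when ascending."""
    ordered = sorted(records, key=lambda r: r[sortby], reverse=not ascending)
    return [r["patch_id"] for r in ordered]


print(annotation_order(patches, "mean_pixel_R", ascending=True))
print(annotation_order(patches, "mean_pixel_R", ascending=False))
```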

__**YOUR TURN**__: Annotate some patches.

See [here](https://mapreader.readthedocs.io/en/latest/User-guide/Annotate.html#annotate-your-images) in docs.

In [None]:
annotator.annotate(show_context=True)

As you're progressing through the patches to annotate them, you'll see they are being saved to a file.

In [None]:
annotator.annotations_file

----

# Classify

MapReader's ``Classify`` subpackage is used to 1) train or fine-tune a CV (computer vision) model to recognize visual features based on your annotated patches and 2) use your model to predict the labels of patches across entire datasets.

It contains two important classes:

- ``AnnotationsLoader`` - This is used to load and review your annotations and to create datasets and dataloaders which are used to train your model.
- ``ClassifierContainer`` - This is used to set up your model, train/fine-tune it using your datasets and to infer labels on new datasets.

## Load annotations

In [None]:
from mapreader import AnnotationsLoader

In [None]:
annotated_images = AnnotationsLoader()

__**YOUR TURN**__: Load your annotations. They are saved in your ``"./annotations/"`` directory as a ``.csv`` file. You'll need to look in your files to see the exact file name.

See [here](https://mapreader.readthedocs.io/en/latest/User-guide/Classify/Train.html#load-and-check-annotations) in docs.

In [None]:
# annotated_images.load()

Running ``annotated_images.labels_map`` will show you the indexing of your labels. This is so they can be treated as numbers instead of strings in the model.

In [None]:
annotated_images.labels_map
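
As a rough picture of what such a mapping looks like, the sketch below builds an index-to-label dictionary and its inverse from a made-up list of labels. The label names are examples only, and the actual indices you see will depend on your own annotations.

```python
# Hypothetical labels, used here purely for illustration.
labels = ["no_railspace", "railspace"]

# Index -> label name, the same shape as a labels map.
labels_map = dict(enumerate(labels))

# Label name -> index, handy for converting annotations to numbers.
label_to_index = {name: index for index, name in labels_map.items()}

print(labels_map)
print(label_to_index["railspace"])
```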

### Review labels

Before training your model, you should check your annotations and ensure you are happy with your labels.

This can be done using the ``.review_labels()`` method.

For example, to re-label the image with ``id: 5``, type "5" into the text box and press enter.
A text box will then show the possible labels (e.g. ``['no_railspace', 'railspace']``).
You should then type the new label you'd like for that patch (e.g. ``railspace``) and press enter again to confirm.

> _**NOTE**_: type ``exit`` to quit!

__**YOUR TURN**__: Review your annotations.

See [here](https://mapreader.readthedocs.io/en/latest/User-guide/Classify/Train.html#load-and-check-annotations) in docs.

In [None]:
# annotated_images.review_labels()

### Create datasets and dataloaders

Before using your annotated images to train your model, you will first need to:

1. Split your annotated images into “train”, “val” and, optionally, “test” datasets.
2. Define some transforms which will be applied to your images to ensure they are in the right format.
3. Create dataloaders which can be used to load small batches of your dataset during training/inference and apply the transforms to each image in the batch.

> __**NOTE**__: Go to the [Classify/Train](https://mapreader.readthedocs.io/en/latest/User-guide/Classify/Train.html#prepare-datasets-and-dataloaders) section of the user-guide for more information.

The ``.create_dataloaders()`` method carries out these three steps. 

> __**NOTE**__: The default train/val/test split, image transforms and sampler will be used if no arguments are supplied to the ``.create_dataloaders()`` method.

In [None]:
dataloaders = annotated_images.create_dataloaders()
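
To make the first step (splitting) more concrete, here is a minimal sketch of a random train/val/test split in plain Python. The function name, the 70/15/15 fractions and the seed are assumptions chosen for illustration; this is not how ``.create_dataloaders()`` is implemented, and the real method may, for example, balance labels across the splits.

```python
import random


def split_indices(n_items, frac_train=0.70, frac_val=0.15, seed=0):
    """Shuffle item indices and slice them into train/val/test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * frac_train)
    n_val = round(n_items * frac_val)
    return {
        "train": indices[:n_train],
        "val": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],  # whatever is left over
    }


splits = split_indices(100)
print({name: len(ids) for name, ids in splits.items()})
```

With only a handful of annotations, the "val" and "test" slices can end up with very few (or no) examples of a label, which is one reason to annotate generously.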

The code below can be used to see the number of instances of each label in each dataset.

This shows the importance of having enough annotations so that each dataset contains a good sample of patches for training, validating and testing your model.

In [None]:
for set_name, dataset in annotated_images.datasets.items():
    print(f'Number of instances of each label in "{set_name}":')
    value_counts = dataset.patch_df["label"].value_counts()
    for i in range(len(annotated_images.labels_map)):
        # .get() avoids an error if a label is missing from this dataset
        print(f"{annotated_images.labels_map[i]}:\t{value_counts.get(i, 0)}")

## Train your model

### Set up your ``my_classifier`` object

In [None]:
from mapreader import ClassifierContainer

The code below will make sure that the model training/inference runs as fast as possible on your machine by using CUDA (GPU) or MPS if they are available.

This ``device`` variable can then be fed into the ``ClassifierContainer``.

In [None]:
import torch

device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'  
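
The one-line expression above can be hard to read at a glance. The sketch below spells out the same order of preference (CUDA, then MPS, then CPU) as a small function. The name `pick_device` is our own, and the fallbacks for a missing `torch` install or a missing `mps` backend are assumptions added for robustness.

```python
def pick_device() -> str:
    """Return "cuda", "mps" or "cpu", in that order of preference."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch only: fall back to the CPU if torch is missing.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older torch builds have no `mps` backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)
```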

In [None]:
my_classifier = ClassifierContainer(
    "resnet18", 
    labels_map=annotated_images.labels_map,
    dataloaders=dataloaders,
    device=device,
)

In [None]:
my_classifier.add_criterion("cross-entropy")

In [None]:
my_classifier.initialize_optimizer()

In [None]:
my_classifier.initialize_scheduler()

### Train your model using your "train" and "val" datasets

__**YOUR TURN**__: Train your model for 10 epochs.

See [here](https://mapreader.readthedocs.io/en/latest/User-guide/Classify/Train.html#train-fine-tune-your-model) in docs.

In [None]:
# my_classifier.train(num_epochs=10)

### Visualize results

MapReader logs a number of common metrics during model training/evaluation and saves them in a dictionary ``my_classifier.metrics``.
For example:
- loss, calculated using the loss function we defined earlier (i.e. cross-entropy)
- f-scores
- precision scores
- recall scores

[This page](https://medium.com/@priyankads/beyond-accuracy-recall-precision-f1-score-roc-auc-6ef2ce097966) provides a good overview of what each of these scores means.
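
If these scores are new to you, the following sketch computes precision, recall and the F1 score for a single class by hand, using made-up labels. It is a teaching aid only: MapReader computes its metrics internally and may aggregate them differently (e.g. averaged across classes).

```python
def precision_recall_fscore(y_true, y_pred, positive):
    """Compute precision, recall and F1 for one class from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, fscore


# Made-up labels for six patches (1 = "railspace", 0 = "no_railspace").
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(precision_recall_fscore(y_true, y_pred, positive=1))
```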

For each metric, a value is logged once per epoch, either on the training dataset ("train") or the validation dataset ("val").
You can see a complete list of the metrics by running ``list(my_classifier.metrics.keys())``.

In [None]:
list(my_classifier.metrics.keys())

To plot a metric (or multiple metrics), we can use MapReader's ``plot_metric()`` method, passing the metrics we'd like to plot as the ``y_axis`` argument.


In [None]:
my_classifier.plot_metric(
    y_axis=["epoch_loss_train", "epoch_loss_val"],
    y_label="loss",
    legends=["train loss", "valid loss"],
)

__**YOUR TURN**__: Try visualizing another metric.

See [here](https://mapreader.readthedocs.io/en/latest/User-guide/Classify/Train.html#plot-metrics) in docs.

In [None]:
# my_classifier.plot_metric()

Alternatively, you can just use ``print`` to view the metrics. 

For example, the code below prints the f-scores for each class in your labels map. Each number is the f-score after one pass through the validation dataset (i.e. one value per epoch).

In [None]:
for label_id, label_name in annotated_images.labels_map.items():
    print(label_name, my_classifier.metrics['epoch_fscore_'+str(label_id)+'_val'])
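
Since each metric is a list with one value per epoch, it is easy to pick out where a score peaked. The sketch below does this on a made-up metrics dictionary that mimics the ``epoch_fscore_<label_id>_val`` key pattern used above; the numbers are invented and the variable names are our own.

```python
# Hypothetical metrics dictionary, mimicking the key pattern used above.
metrics = {
    "epoch_fscore_0_val": [0.90, 0.93, 0.95],
    "epoch_fscore_1_val": [0.40, 0.72, 0.68],
}
labels_map = {0: "no_railspace", 1: "railspace"}

best = {}
for label_id, label_name in labels_map.items():
    scores = metrics[f"epoch_fscore_{label_id}_val"]
    # Position in the list of the highest score (0 = first logged value).
    position = max(range(len(scores)), key=scores.__getitem__)
    best[label_name] = (position, scores[position])

for label_name, (position, score) in best.items():
    print(f"{label_name}: best f-score {score:.2f} at list position {position}")
```

A score that peaks early and then drops, as for the second label here, can be a sign that the model is starting to overfit.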

### Test

The "test" dataset can be used to test out your model on previously unseen images. 

As these are already annotated, it makes it easy to understand whether the model is performing as expected.

__**YOUR TURN**__: Run inference on the ``"test"`` dataset.

See [here](https://mapreader.readthedocs.io/en/latest/User-guide/Classify/Train.html#testing) in docs.

In [None]:
# my_classifier.inference()

In [None]:
label = annotated_images.labels_map[1]
print(label)

In [None]:
my_classifier.show_inference_sample_results(label=label, min_conf=0.8)
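
The ``min_conf`` argument restricts the sample to predictions the model is fairly sure about. As a rough illustration of that kind of filtering, here is a plain-Python sketch over made-up predictions; the field names and the helper `confident_patches` are hypothetical and are not taken from MapReader.

```python
# Made-up predictions: predicted label name and model confidence per patch.
predictions = [
    {"patch": "patch_a.png", "predicted_label": "railspace", "conf": 0.97},
    {"patch": "patch_b.png", "predicted_label": "railspace", "conf": 0.55},
    {"patch": "patch_c.png", "predicted_label": "no_railspace", "conf": 0.91},
]


def confident_patches(rows, label, min_conf):
    """Keep patches predicted as `label` with confidence of at least `min_conf`."""
    return [
        row["patch"]
        for row in rows
        if row["predicted_label"] == label and row["conf"] >= min_conf
    ]


print(confident_patches(predictions, label="railspace", min_conf=0.8))
```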

Remember to save your predictions!

In [None]:
my_classifier.save_predictions("test")

# Infer 

The fine-tuned model can now be used to infer, or predict, the labels of "unseen" patches.

To show how inference works, we will predict the labels on patches from just one parent image. 

We will do this by creating a ``subset_patch_df`` from our previously saved ``patch_df.csv``.
Our new ``subset_patch_df`` will only contain the information of patches from ``map_75650661.png``.

In [None]:
import pandas as pd

patch_df = pd.read_csv("./patch_df.csv", index_col=0)  # load our patch_df.csv file

subset_patch_df = patch_df[
    patch_df["parent_id"] == "map_75650661.png"
]  # filter for our chosen parent image
subset_patch_df.head()

> __**NOTE**__: MapReader can be used to predict the labels on entire datasets and so creating a ``subset_patch_df`` is not needed in most use cases.

### Create a dataset (``infer``) from our ``subset_patch_df``

In [None]:
from mapreader import PatchDataset

In [None]:
infer = PatchDataset(subset_patch_df, transform="val", patch_paths_col="image_path")

### Load dataset into ``my_classifier``

In [None]:
my_classifier.load_dataset(infer, "infer")

### Run model inference

__**YOUR TURN**__: Run inference on your ``"infer"`` dataset

See [here](https://mapreader.readthedocs.io/en/latest/User-guide/Classify/Train.html#infer-predict) in docs.

In [None]:
# my_classifier.inference()

Save results!

In [None]:
my_classifier.save_predictions("infer")

### Save results to metadata

To add the predictions back into a ``MapImages`` object, we simply need to load our predictions csv file as metadata.

Since we have started a new notebook, we can create a new ``MapImages`` object by loading our patches.

> **NOTE** : Since we've only run inference on one parent map (``map_75650661.png``), we are only going to load patches from that map by using a wildcard pattern that matches ``75650661`` in the file names.

In [None]:
from mapreader import load_patches

In [None]:
my_maps = load_patches(
    "./patches_100_meters/*75650661*png", parent_paths="./maps/map_75650661.png"
)
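
To see which file names a wildcard pattern like ``*75650661*png`` would pick up, you can try it on a few names with Python's standard library. The file names below are invented for illustration; your real patch names will differ.

```python
from fnmatch import fnmatchcase

# Hypothetical file names; real patch names will differ.
file_names = [
    "patch-0-0-100-100-map_75650661.png",
    "patch-0-0-100-100-map_74488700.png",
    "map_75650661_notes.txt",
]

# `*` matches any run of characters, so this keeps names that
# contain "75650661" and end in "png".
pattern = "*75650661*png"
matches = [name for name in file_names if fnmatchcase(name, pattern)]
print(matches)
```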

In [None]:
my_maps.add_metadata("./infer_predictions_patch_df.csv", ignore_mismatch=True, tree_level="patch")

In [None]:
my_maps.add_shape()

We can use the ``.show_parent()`` method to see how our predictions look on our parent map sheet (``map_75650661.png``).

In [None]:
my_maps.show_parent(
    "map_75650661.png",
    column_to_plot="pred",
    vmin=0,
    vmax=1,
    alpha=0.5,
    patch_border=False,
)

We can then use the ``.convert_images()`` method to save our results.

In [None]:
parent_df, patch_df = my_maps.convert_images(save=True, save_format="xlsx") # here we are saving to xlsx so we don't change our "*.csv" files from before!

We can also save our outputs as a ``geojson`` file using the ``.save_patches_to_geojson()`` method.
> _**NOTE**_: This will require you to convert your patch coordinates into a polygon format. If these aren't already available, they can be added using the ``.add_patch_polygons()`` method.

In [None]:
my_maps.add_patch_polygons()
my_maps.save_patches_to_geojson()
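
If you have not worked with GeoJSON before, the sketch below shows roughly what a single patch could look like as a GeoJSON feature: a rectangular polygon plus some properties. The coordinates, property names and the helper `patch_to_feature` are made up for illustration, and the file MapReader writes may be structured differently.

```python
import json


def patch_to_feature(patch_id, min_x, min_y, max_x, max_y, pred):
    """Build a GeoJSON Feature with a rectangular polygon for one patch."""
    ring = [
        [min_x, min_y],
        [max_x, min_y],
        [max_x, max_y],
        [min_x, max_y],
        [min_x, min_y],  # GeoJSON rings repeat the first point to close the shape
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"patch_id": patch_id, "pred": pred},
    }


# Made-up bounding box (longitude/latitude) and prediction.
feature = patch_to_feature("patch_a.png", -4.01, 55.99, -4.00, 56.00, pred=1)
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection, indent=2))
```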

Beyond MapReader, these outputs can be used to generate interesting visualizations in other tools.

For example, here are two visualizations of the rail space data from [our paper]:

- https://felt.com/map/MapReader-Launch-Event-map-Urban-Areas-and-Rail-space-9AqftKrvPTlWfwOGkdkCGkD
- https://maps.nls.uk/projects/mapreader/index.html#zoom=6.0&lat=56.00000&lon=-4.00000

# Documentation

Please refer to the [MapReader documentation](https://mapreader.readthedocs.io/en/latest/) for more information.