# MapReader Autumn Workshop (2024)


> **NOTE**: Before you begin, change your runtime to GPU to speed things up!

In [None]:
# set up for Google Colab - this cell will take a while to run!
# clone the workshop repository (maps, metadata and saved predictions)
!git clone https://github.com/maps-as-data/mapreader-autumn-workshop-2024.git
!pip install "mapreader[dev]"
# install detectron2, which DeepSolo depends on
!CC=clang CXX=clang++ python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
# clone DeepSolo (for its config files) and install it as a package
!git clone https://github.com/maps-as-data/DeepSolo.git
!python -m pip install 'git+https://github.com/maps-as-data/DeepSolo.git'
# download the pre-trained DeepSolo weights
!wget https://huggingface.co/rwood-97/DeepSolo_ic15_res50/resolve/main/ic15_res50_finetune_synth-tt-mlt-13-15-textocr.pth

In [None]:
# enable custom widgets in colab
from google.colab import output
output.enable_custom_widget_manager()
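
To confirm that the GPU runtime is active, a quick check along the following lines can help. This is a minimal sketch that assumes PyTorch is importable in the runtime; `gpu_available` is a hypothetical helper name used for illustration and is not part of MapReader.

```python
def gpu_available() -> bool:
    """Return True if PyTorch can see a CUDA device, False otherwise."""
    try:
        import torch  # assumption: PyTorch is installed in this runtime
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


print("GPU available:", gpu_available())
```

If this prints `False`, change the runtime type to GPU and re-run the set-up cells.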

## Download maps

https://mapreader.readthedocs.io/en/latest/using-mapreader/step-by-step-guide/1-download.html

For this workshop, we have pre-selected and downloaded two maps from the OS 6-inch 1st edition map series.
These were downloaded from a tile layer hosted by the National Library of Scotland (NLS). You can find more information about NLS tile layers [here](https://maps.nls.uk/).

The two maps and their metadata are saved in the `maps` directory of the `mapreader-autumn-workshop-2024` repository (which we cloned earlier).

We will use one to demonstrate annotating and training, and the other to demonstrate inference.


## Load maps and patchify

https://mapreader.readthedocs.io/en/latest/using-mapreader/step-by-step-guide/2-load.html

We will now load one map and its metadata using the `loader`.

From here, we can patchify our map, visualise metadata and add further information about our map/patches.

In [None]:
from mapreader import loader

In [None]:
my_maps = loader("./mapreader-autumn-workshop-2024/maps/map_74427695.png") # load just one map

In [None]:
my_maps.add_metadata("./mapreader-autumn-workshop-2024/maps/metadata.csv", ignore_mismatch=True)

In [None]:
print(my_maps) # see which maps you have loaded

To run text spotting, we will slice our maps into 1000x1000 pixel patches:

In [None]:
my_maps.patchify_all(method="pixel", patch_size=1000) 

> If you now look in your files, you will see a `patches_1000_pixel` directory, which contains all the patches of your map.
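
To build some intuition for what patchifying by pixel does, the sketch below computes the pixel bounds of each patch on a regular grid. It is an illustration only, not MapReader's implementation: `patch_bounds` is a hypothetical helper, the image size is made up, and clipping the edge patches to the image boundary is an assumption.

```python
def patch_bounds(width, height, patch_size=1000):
    """Return (min_x, min_y, max_x, max_y) for each patch, row by row."""
    bounds = []
    for min_y in range(0, height, patch_size):
        for min_x in range(0, width, patch_size):
            max_x = min(min_x + patch_size, width)   # clip at the right edge
            max_y = min(min_y + patch_size, height)  # clip at the bottom edge
            bounds.append((min_x, min_y, max_x, max_y))
    return bounds


# hypothetical 2500 x 1200 pixel image
example = patch_bounds(2500, 1200, patch_size=1000)
print(len(example))   # number of patches
print(example[0])     # bounds of the first patch
print(example[-1])    # bounds of the last (clipped) patch
```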

In [None]:
print(my_maps)

In [None]:
# show a sample of the patches
my_maps.show_sample(num_samples=3, tree_level="patch")

Once we have these patches, we can create dataframes containing parent and patch information using the `convert_images()` method:

In [None]:
parent_df, patch_df = my_maps.convert_images()

In [None]:
parent_df.head() # parent information

In [None]:
patch_df.head() # patch information (showing only first 5 rows)

## Spot text

https://mapreader.readthedocs.io/en/latest/using-mapreader/step-by-step-guide/6-spot-text.html

MapReader offers three different text spotting frameworks:

1. DPText-DETR (detection only)
2. DeepSolo (detection and recognition)
3. MapTextPipeline (detection and recognition)

For this workshop, we will use the DeepSolo framework to spot text on our patches.

In [None]:
from mapreader import DeepSoloRunner

In [None]:
# paths to our config and weights files for the text spotting model
cfg_file = "./DeepSolo/configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml"
weights_file = "./ic15_res50_finetune_synth-tt-mlt-13-15-textocr.pth"

In [None]:
my_runner = DeepSoloRunner(
    patch_df,
    parent_df = parent_df,
    cfg_file = cfg_file,
    weights_file = weights_file,
)

### Run the model

Due to time constraints, we will run the DeepSolo model on just one patch for the workshop.
This is done using the `run_on_image()` method.

We will also set `return_dataframe=True` so that the results are returned as a dataframe, which makes them easier to inspect.

In [None]:
patch_predictions = my_runner.run_on_image("./patches_1000_pixel/patch-0-0-1000-1000-#map_74427695.png#.png", return_dataframe=True)

In [None]:
patch_predictions.head() # view the predictions

We can view the results using the `show()` method:

In [None]:
my_runner.show(
    "patch-0-0-1000-1000-#map_74427695.png#.png", # patch id
)

We've saved the predictions for the rest of our patches in the `text_predictions` directory of the `mapreader-autumn-workshop-2024` repository.

These are saved as a `pkl` file, which we can load using the `pickle` library:

In [None]:
import pickle

with open("./mapreader-autumn-workshop-2024/text_predictions/patch_predictions.pkl", "rb") as f:
    my_runner.patch_predictions = pickle.load(f) # load these as the patch predictions attribute

Now that we've loaded these, we can use the `show()` method to visualise the predictions for any patch we like:

> **NOTE**: You can change the patch id to view the predictions for different patches.

In [None]:
my_runner.show(
    "patch-0-4000-1000-5000-#map_74427695.png#.png", # patch id
)
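
When choosing a different patch, it helps to know how to read a patch id. Judging from the ids used in this notebook, they appear to follow the pattern `patch-<min_x>-<min_y>-<max_x>-<max_y>-#<parent id>#.png`. The sketch below parses an id under that assumption; `parse_patch_id` is a hypothetical helper, not a MapReader function.

```python
import re

# assumed naming pattern, inferred from the patch ids used in this notebook
PATCH_ID_PATTERN = re.compile(r"^patch-(\d+)-(\d+)-(\d+)-(\d+)-#(.+)#\.png$")


def parse_patch_id(patch_id):
    """Split a patch id into its parent id and pixel bounds."""
    match = PATCH_ID_PATTERN.match(patch_id)
    if match is None:
        raise ValueError(f"Unrecognised patch id: {patch_id}")
    min_x, min_y, max_x, max_y = (int(g) for g in match.groups()[:4])
    return {
        "parent_id": match.group(5),
        "pixel_bounds": (min_x, min_y, max_x, max_y),
    }


info = parse_patch_id("patch-0-4000-1000-5000-#map_74427695.png#.png")
print(info)
```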

We can convert the patch pixel bounds to parent pixel bounds using the `convert_to_parent_pixel_bounds()` method.

This rescales the pixel bounds of the text predictions to the parent image.

In [None]:
parent_predictions = my_runner.convert_to_parent_pixel_bounds(return_dataframe=True)

In [None]:
parent_predictions.head() # view the predictions
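
Conceptually, if the patches were cut at the parent's own resolution (as with `method="pixel"` here), this conversion can be pictured as adding the patch's top-left offset to every point of a prediction. The sketch below illustrates that idea on a made-up polygon; `to_parent_pixels` is a hypothetical helper and this is not necessarily how MapReader implements the conversion.

```python
def to_parent_pixels(points, patch_min_x, patch_min_y):
    """Shift (x, y) points from patch pixel space into parent pixel space."""
    return [(x + patch_min_x, y + patch_min_y) for x, y in points]


# made-up text polygon inside the patch whose top-left corner is at (0, 4000)
patch_polygon = [(10, 20), (110, 20), (110, 60), (10, 60)]
parent_polygon = to_parent_pixels(patch_polygon, patch_min_x=0, patch_min_y=4000)
print(parent_polygon)
```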

Now, we can view the results for a whole parent image using the `show()` method:

In [None]:
my_runner.show(
    "map_74427695.png", # parent id
)

Since we added metadata to our map earlier, we can also convert pixel bounds to geographic coordinates using the `convert_to_coords()` method. 

Once we have this, we can export our results to a GeoJSON file and load them into GIS software to visualise them.

In [None]:
geo_predictions = my_runner.convert_to_coords(return_dataframe=True)

In [None]:
geo_predictions.head() # view the predictions
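
One simple way to think about this conversion is as a linear interpolation between the parent map's corner coordinates. The sketch below shows that idea under simplifying assumptions: the image size and coordinate bounds are made up, the map is treated as an axis-aligned rectangle in coordinate space, and `pixel_to_coords` is a hypothetical helper rather than MapReader's own code.

```python
def pixel_to_coords(x, y, image_size, coord_bounds):
    """Linearly map a parent pixel position to (lon, lat)."""
    width, height = image_size
    min_lon, min_lat, max_lon, max_lat = coord_bounds
    lon = min_lon + (x / width) * (max_lon - min_lon)
    # pixel y grows downwards, whereas latitude grows upwards
    lat = max_lat - (y / height) * (max_lat - min_lat)
    return lon, lat


# hypothetical 4000 x 2000 pixel map covering lon -1..0 and lat 50..51
lon, lat = pixel_to_coords(2000, 500, (4000, 2000), (-1.0, 50.0, 0.0, 51.0))
print(lon, lat)
```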

In [None]:
my_runner.save_to_geojson("deepsolo_text_predictions.geojson")
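
For readers unfamiliar with GeoJSON, the sketch below builds a minimal `FeatureCollection` containing one text polygon using only the standard library. The coordinates are made up, and the property names (`text`, `score`) are assumptions for illustration; the file written by `save_to_geojson()` may use different fields.

```python
import json


def make_feature(polygon, text, score):
    """Build a GeoJSON Feature for one text prediction."""
    ring = [list(point) for point in polygon]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON polygon rings must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"text": text, "score": score},  # hypothetical fields
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        make_feature(
            [(-0.50, 50.75), (-0.49, 50.75), (-0.49, 50.74), (-0.50, 50.74)],
            text="Church",
            score=0.9,
        )
    ],
}
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```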

### Search in results

We can search our results using the `search_preds()` method. 

This searches the parent predictions, not the georeferenced predictions, so we will need to convert our search results to coordinates later.

> **NOTE**: This method accepts regex patterns. Feel free to try this out if you are familiar with regex.

In [None]:
search_results = my_runner.search_preds("church", ignore_case=True, return_dataframe=True)

In [None]:
search_results.head() # view the search results
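
If you are new to regex, the sketch below shows how a few patterns behave on a made-up list of strings, using Python's built-in `re` module. `search_texts` is a hypothetical helper, and matching anywhere in the string (`re.search`) is an assumption about how the search behaves.

```python
import re


def search_texts(texts, pattern, ignore_case=True):
    """Return the strings in which the regex pattern is found."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [text for text in texts if regex.search(text)]


# made-up examples of recognised text
texts = ["St. Mary's Church", "CHURCH", "Chapel", "School", "Churchyard"]

anywhere = search_texts(texts, "church")            # matches anywhere in the string
exact = search_texts(texts, r"^church$")            # the whole string must match
either = search_texts(texts, r"church|chapel")      # either word
print(anywhere)
print(exact)
print(either)
```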

We can also view these results on our parent images using the `show_search_results()` method.

We need to specify which parent map to show results for; here we use the map we loaded earlier.

In [None]:
my_runner.show_search_results(
    "map_74427695.png", # parent id
)

And lastly, we can convert our search results to coordinates and save them using the `save_search_results_to_geojson()` method:

In [None]:
geo_search_results = my_runner.save_search_results_to_geojson("search_results.geojson")