# Using and reusing notebooks in high-performance computing environments

For centuries, maps have played a prominent role in Western society, serving as a valuable resource in strategic fields such as astronomy, geometry, printing, geodetic measurement, and optics, as well as aerospace technology and computer and information science. This use case addresses the following questions: (1) To what extent are historical maps useful for visually demonstrating how territories, borders, and place names in Europe have changed over the centuries? (2) How did different cultures influence each other through trade and exploration? (3) In education, how do maps represent different interpretations of the past?

This use case also demonstrates the potential of combining existing European cloud and HPC infrastructures with notebooks, large language models (LLMs), and other AI models to apply computer vision, identify relevant entities, and generate new metadata that enables new ways of browsing the data.

### Available Datasets 

This section describes existing map collections made available by European libraries.

For instance, [KU Leuven Libraries](https://bib.kuleuven.be/english/heritage/heritagecollections/types-of-material/maps-and-atlases) hosts more than 20,000 maps and atlases, focusing on Belgian territory from the 16th century to the present. These materials show how the landscape used to look and thus offer essential information for geographers and historians. The library also provides [documentation](https://bib.kuleuven.be/english/heritage/how-to-search/how-to-search-maps) on how to access and search the maps.

<img width="30%" src="https://bib.kuleuven.be/bijzondere-collecties/images/erfgoed_oude_drukken/r3a-19840-000332.jpg/@@images/image-400-aaaa9986e11af72ae305e9b4e829de13.jpeg">

The [Royal Danish Library](https://www.kb.dk/en/find-materials/collections/map-collection) also provides access to a collection of maps. The oldest maps in the collection date from the 16th century. For example, by using the [following link](https://soeg.kb.dk/discovery/search?query=any,contains,danmark&tab=Everything&search_scope=MyInst_and_CI&vid=45KBDK_KGL:KGL&facet=rtype,include,maps&lang=en&offset=10&came_from=pagination_1_2), we can retrieve maps related to Denmark. The following image shows an example.

<img width="50%" src="https://kb-images.kb.dk/DAMJP2/DAM/Maps/0000/069/459/DK003600/full/full/0/native.jpg">
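The Royal Danish Library link above appears to be an Ex Libris Primo discovery URL, so other searches can be built by changing the query term. The sketch below rebuilds such a URL with Python's standard library, keeping the parameters visible in the original link (`tab`, `search_scope`, `vid`, the `rtype,include,maps` facet, `lang`, `offset`) and dropping `came_from`, which looks like a user-interface artefact. The function name `kb_map_search_url` is hypothetical, and the resulting URL opens a search page in a browser; it is not a documented data API.

```python
from urllib.parse import urlencode

# Base search endpoint taken from the example link above.
KB_SEARCH_URL = "https://soeg.kb.dk/discovery/search"


def kb_map_search_url(term: str, offset: int = 0, lang: str = "en") -> str:
    """Build a Royal Danish Library search URL restricted to maps (hypothetical helper)."""
    params = {
        "query": f"any,contains,{term}",  # free-text search over any field
        "tab": "Everything",
        "search_scope": "MyInst_and_CI",
        "vid": "45KBDK_KGL:KGL",
        "facet": "rtype,include,maps",  # keep only records whose resource type is "maps"
        "lang": lang,
        "offset": offset,  # index of the first result to display
    }
    return f"{KB_SEARCH_URL}?{urlencode(params)}"
```

For example, `kb_map_search_url("danmark", offset=10)` reproduces the query of the link above; `urlencode` percent-encodes the commas and colon, which should decode to the same values on the server side.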

The [National Library of Spain](https://bnedigital.bne.es/bd/es/results?y=s&o=&o=&w=mapa&w=&f=ficha&f=texto_ficha&g=ws&f4=Material+cartogr%C3%A1fico+manuscrito) provides access to a collection of maps together with textual metadata, which can be exported as a plain-text (.txt) file. The [following link](https://bnedigital.bne.es/bd/es/export?y=s&o=&o=&w=mapa&w=&f=ficha&f=texto_ficha&g=ws&f4=Material+cartogr%C3%A1fico+manuscrito&x=adefadbf-b10b-4a34-a0d7-98513056a7b3) exports the metadata of this collection (manuscript cartographic material) as a text file. Note that most of the records are released under a CC0 licence.

<img width="30%" src="https://bnedigital.bne.es/bd/es/medium?id=d5a51609-f45a-48f0-836e-1bffe32430f7">

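The record below shows an overview of the metadata exported by the National Library of Spain. The metadata is limited to the title, authors, dates, and a few descriptive notes and identifiers; in this record, the authorship (*Autoría*) and subject (*Materia*) fields are even empty. The field labels are in Spanish: *Registro* (record), *Título* (title), *Tipo de documento* (document type), *Fecha* (date), *Descripción física* (physical description), *Signatura* (shelfmark), *Identificador corto* (short identifier), and *CDU* (Universal Decimal Classification).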

```
Registro 1

    Título:              [Mapa itinerario de Guipúzcoa]

    Tipo de documento:   Material cartográfico manuscrito

    Autoría:             

    Fecha:               [18­-]

    Materia:             

    Descripción física:  1 mapa : ms., col.

    Signatura:           MR/42/471

    MMS ID:              991000586059708606

    Identificador corto: 0174194456

    CDU:                 (466.2)

    URL:                 https://bnedigital.bne.es/bd/es/card?id=2d3c9301-a1d1-45de-ae3c-8d7c5c02221c
------------------------------------------------------------------------------------------
```
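Because the export is plain text with one `Field: value` pair per indented line, it can be turned into Python dictionaries with a few regular expressions. The sketch below assumes the layout shown in the sample: a `Registro N` header, indented fields, and a dashed separator between records. It also assumes that each value fits on one line; multi-line notes, if the export contains any, would need extra handling. The function name `parse_bne_export` is hypothetical.

```python
import re
from typing import Dict, List

RECORD_HEADER = re.compile(r"^Registro\s+(\d+)\s*$")  # e.g. "Registro 1"
FIELD_LINE = re.compile(r"^\s+([^:]+?):\s*(.*?)\s*$")  # e.g. "    Título:   [Mapa ...]"
SEPARATOR = re.compile(r"^-{10,}\s*$")  # dashed line closing a record


def parse_bne_export(text: str) -> List[Dict[str, str]]:
    """Parse a BNE Digital text export into one dict per record (Spanish keys kept)."""
    records: List[Dict[str, str]] = []
    current = None
    for line in text.lstrip("\ufeff").splitlines():  # drop a UTF-8 BOM if present
        header = RECORD_HEADER.match(line)
        if header:
            current = {"Registro": header.group(1)}
            records.append(current)
            continue
        if current is None or SEPARATOR.match(line):
            continue
        field = FIELD_LINE.match(line)
        if field:
            key, value = field.groups()  # split at the first colon, so URLs stay intact
            current[key] = value  # empty fields such as "Autoría:" are kept as ""
    return records


# Example (hypothetical file name):
# records = parse_bne_export(open("bne_export.txt", encoding="utf-8").read())
```

Keeping the original Spanish keys avoids committing to a translation at parsing time; mapping to English or to another vocabulary can happen in a later step.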

## Using the National Library of Spain as an example

### Retrieve Metadata
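The metadata can be retrieved through the export link of the National Library of Spain given earlier. The sketch below downloads it with the standard library and caches it on disk, so that repeated notebook runs (or many HPC jobs) do not query the server again. It assumes the export is served as UTF-8 text and that the `x=` parameter in the copied link, which may identify a stored search, is still valid; the names `fetch_text`, `save_export`, and the cache path in the example are hypothetical.

```python
from __future__ import annotations

import urllib.request
from pathlib import Path

# Export link copied from the text above (manuscript cartographic material).
# Assumption: the "x=" parameter may identify a stored search and could expire.
BNE_EXPORT_URL = (
    "https://bnedigital.bne.es/bd/es/export?y=s&o=&o=&w=mapa&w=&f=ficha"
    "&f=texto_ficha&g=ws&f4=Material+cartogr%C3%A1fico+manuscrito"
    "&x=adefadbf-b10b-4a34-a0d7-98513056a7b3"
)


def fetch_text(url: str, timeout: float = 60.0, encoding: str = "utf-8") -> str:
    """Download a text resource; UTF-8 is an assumed encoding for the export."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode(encoding)


def save_export(url: str, path: str | Path) -> Path:
    """Fetch the export once and cache it locally so reruns do not hit the server."""
    path = Path(path)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(fetch_text(url), encoding="utf-8")
    return path


# Example (hypothetical cache location):
# export_file = save_export(BNE_EXPORT_URL, "data/bne_manuscript_maps.txt")
```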

### Extract text from images (OCR)

For text extraction, we use an LLM-based approach: vision-capable models served locally through Ollama and called via the `ollama-ocr` package. We plan to test additional models such as LLaVA, DeepSeek, Mistral, and Qwen.
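Since several models are to be compared, a small harness that runs the same images through each model keeps the outputs side by side. The sketch below is a hypothetical helper, `compare_models`; it relies only on the `process_image(image_path=..., format_type=...)` call used in the notebook cell, and receives a factory function so that it can be tested without an Ollama server. It runs sequentially; on an HPC system, one could instead submit one job per model or per batch of images.

```python
from typing import Any, Callable, Dict, Iterable


def compare_models(
    image_paths: Iterable[str],
    model_names: Iterable[str],
    make_processor: Callable[[str], Any],
    format_type: str = "markdown",
) -> Dict[str, Dict[str, str]]:
    """Run every image through every model; returns {model: {image_path: output}}."""
    images = list(image_paths)  # materialise so every model sees the same list
    results: Dict[str, Dict[str, str]] = {}
    for model in model_names:
        # One processor per model, e.g. make_processor = lambda m: OCRProcessor(model_name=m)
        processor = make_processor(model)
        results[model] = {
            path: processor.process_image(image_path=path, format_type=format_type)
            for path in images
        }
    return results


# Example (requires ollama-ocr and the models pulled into the local Ollama server):
# from ollama_ocr import OCRProcessor
# results = compare_models(["imgs/Isle-Saint-Domingue.jpg"],
#                          ["llama3.2-vision:11b", "llava:7b", "deepseek-ocr"],
#                          lambda name: OCRProcessor(model_name=name))
```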

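```python
# Installs the ollama-ocr client; the Ollama server itself must be installed and running separately
%pip install ollama-ocr
```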

```python
from ollama_ocr import OCRProcessor

# Initialize the OCR processor with a vision model served by the local Ollama instance.
# Any vision model available on Ollama can be used; pull it first, e.g. `ollama pull deepseek-ocr`.
# Alternatives to compare (enable one at a time):
# ocr = OCRProcessor(model_name='llama3.2-vision:11b')
# ocr = OCRProcessor(model_name='llava:7b')
ocr = OCRProcessor(model_name='deepseek-ocr')

# Process an image
result = ocr.process_image(
    image_path="imgs/Isle-Saint-Domingue.jpg",
    format_type="markdown"  # Options: markdown, text, json, structured, key_value
)
print(result)
```

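Running the cell with `deepseek-ocr` produced the following output:

```
Using default prompt: Extract all text content from this image in en **exactly as it appears**, without modification, summarization, or omission.
                                Format the output in markdown:
                                - Use headers (#, ##, ###) **only if they appear in the image**
                                - Preserve original lists (-, *, numbered lists) as they are
                                - Maintain all text formatting (bold, italics, underlines) exactly as seen
                                - **Do not add, interpret, or restructure any content**
                            
Error processing image: 404 Client Error: Not Found for url: http://localhost:11434/api/generate
```

The Ollama server was reachable (otherwise the request would have failed with a connection error), but it answered with a 404. This usually means that the requested model is not available locally; pulling it with `ollama pull deepseek-ocr` and rerunning the cell should resolve the error.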
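To catch this kind of failure before processing a large batch of maps, one can ask the Ollama server which models are installed. The sketch below uses Ollama's `/api/tags` endpoint, which lists local models with their `name:tag`; the helper names are hypothetical, and the base URL is the default local address seen in the error message. Untagged names are compared as `:latest`, which is how Ollama stores models pulled without an explicit tag.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # default local address, as in the error above


def fetch_tags(base_url: str = OLLAMA_BASE_URL, timeout: float = 5.0) -> dict:
    """Return the parsed JSON of Ollama's /api/tags endpoint (locally installed models)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.load(response)


def installed_models(tags: dict) -> set:
    """Collect the model names listed in a /api/tags response."""
    return {model["name"] for model in tags.get("models", [])}


def is_available(model: str, installed: set) -> bool:
    """Check a model name, treating an untagged name as ':latest' like Ollama does."""
    name = model if ":" in model else f"{model}:latest"
    return name in installed
```

In the notebook, `names = installed_models(fetch_tags())` followed by a loop that prints `ollama pull <model>` for every model where `is_available(model, names)` is false gives a quick checklist before running OCR.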


### Transform metadata
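One transformation the exported records need is turning catalogue dates such as `[18--]` into machine-readable year ranges, so that maps can be filtered or browsed by period. The sketch below assumes the usual cataloguing convention in which trailing hyphens stand for unknown digits (`[186-]` for the 1860s, `[18--]` for the 19th century), and it removes invisible soft hyphens that can survive copying or exporting. The function name is hypothetical, and explicit ranges such as `1750-1760` are not handled: only the first year is read.

```python
import re
from typing import Optional, Tuple

SOFT_HYPHEN = "\u00ad"  # invisible character that can survive copy/paste or export
# Two to four digits followed by optional hyphens standing for unknown digits,
# e.g. "1750", "[175-]", "[17--]" (assumed cataloguing convention).
DATE_PATTERN = re.compile(r"(\d{2,4})(-*)")


def normalize_date(raw: str) -> Optional[Tuple[int, int]]:
    """Convert a catalogue date string into an (earliest, latest) year range."""
    match = DATE_PATTERN.search(raw.replace(SOFT_HYPHEN, ""))
    if match is None:
        return None
    digits, hyphens = match.groups()
    unknown = 4 - len(digits)  # number of trailing year digits that are not known
    if unknown == 0:
        year = int(digits)
        return year, year
    if not hyphens:
        return None  # e.g. a bare "18" is too ambiguous to interpret
    start = int(digits) * 10 ** unknown
    return start, start + 10 ** unknown - 1
```

For example, `normalize_date("[186-]")` returns `(1860, 1869)` and `normalize_date("[18--]")` returns `(1800, 1899)`.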

### Integration with the European Collaborative Cloud for Cultural Heritage (ECCCH)

[ECHOES](https://www.echoes-eccch.eu/faq/) is building a federated Knowledge Graph to enable high-level integration of resources. The Knowledge Graph will also serve as the entry point for all queries and requests about any kind of information available in the Cultural Heritage Cloud. It will use the proposed Heritage Digital Twin Ontology (HDTO) to unify descriptions and facilitate querying and navigation. The current version of the ECHOES HDTO is available [as a PDF](https://www.echoes-eccch.eu/wp-content/uploads/2025/06/ECHOES_HDT_Ontology.pdf).

The main vocabulary employed to describe the resources is [CIDOC-CRM](https://cidoc-crm.org/).
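To illustrate what mapping the exported records to CIDOC-CRM could look like, the sketch below turns one record (a dictionary with the Spanish field names of the export) into N-Triples. It is a simplified mapping under stated assumptions, not the HDTO model: the map is typed as `E22_Human-Made_Object`, its title and shelfmark become `E35_Title` and `E42_Identifier` nodes, and the raw date is attached as a label of an `E52_Time-Span` of its `E12_Production`. Class and property names follow the CIDOC-CRM 7.x RDFS naming, and the `https://example.org/bne/` namespace used to mint URIs is hypothetical.

```python
from typing import Dict, List

CRM = "http://www.cidoc-crm.org/cidoc-crm/"  # assumption: CRM 7.x RDFS names
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
BASE = "https://example.org/bne/"  # hypothetical namespace for the URIs we mint


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping the mandatory characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_ntriples(record: Dict[str, str]) -> List[str]:
    """Map one parsed BNE record (Spanish field names) to CIDOC-CRM triples."""
    key = record["MMS ID"]  # stable catalogue identifier used to mint URIs
    obj = f"{BASE}object/{key}"
    triples = [(iri(obj), iri(RDF_TYPE), iri(CRM + "E22_Human-Made_Object"))]

    # Título -> E35 Title, Signatura (shelfmark) -> E42 Identifier
    for field, kind, prop, cls in (
        ("Título", "title", "P102_has_title", "E35_Title"),
        ("Signatura", "identifier", "P1_is_identified_by", "E42_Identifier"),
    ):
        text = record.get(field, "").strip()
        if text:
            node = f"{BASE}{kind}/{key}"
            triples += [
                (iri(obj), iri(CRM + prop), iri(node)),
                (iri(node), iri(RDF_TYPE), iri(CRM + cls)),
                (iri(node), iri(CRM + "P190_has_symbolic_content"), literal(text)),
            ]

    # Fecha -> E12 Production with an E52 Time-Span labelled by the raw date
    date = record.get("Fecha", "").strip()
    if date:
        prod, span = f"{BASE}production/{key}", f"{BASE}timespan/{key}"
        triples += [
            (iri(obj), iri(CRM + "P108i_was_produced_by"), iri(prod)),
            (iri(prod), iri(RDF_TYPE), iri(CRM + "E12_Production")),
            (iri(prod), iri(CRM + "P4_has_time-span"), iri(span)),
            (iri(span), iri(RDF_TYPE), iri(CRM + "E52_Time-Span")),
            (iri(span), iri(RDFS_LABEL), literal(date)),
        ]

    if record.get("URL", "").strip():  # link back to the BNE Digital record page
        triples.append((iri(obj), iri(RDFS_SEE_ALSO), iri(record["URL"].strip())))

    return [f"{s} {p} {o} ." for s, p, o in triples]
```

Writing N-Triples by hand keeps the example dependency-free; in practice, an RDF library would handle serialisation and validation.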

### Publication

Example records in BNE Digital:

- https://bnedigital.bne.es/bd/card?oid=0000001402
- https://bnedigital.bne.es/bd/card?oid=0000000735