Proof-of-concept code demonstrating the use of the [kraken OCR/HTR software](https://kraken.re/) via its API.
This notebook follows the [kraken tutorial](https://kraken.re/main/api.html) closely.

Peter Stokes, EPHE-PSL, March 2025

# Setup

First, we set up the relevant libraries and create a couple of generic helper functions.

We begin by installing kraken in our Colab environment so that we can use it. **If you are doing this on a system with kraken already installed (e.g. your own computer) then you should skip this step.**

In [None]:
# Only needed if kraken is not already installed (e.g. in Google Colab)
# Kraken is pinned to 5.2.9 because later versions seem to conflict with the Colab setup, but this is likely to change.

#!pip install kraken==5.2.9

In [None]:
import kraken
from kraken import blla, serialization
from kraken.lib import vgsl
from PIL import Image

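# urllib.request must be imported explicitly: `import urllib` alone does not guarantee the submodule is loaded
import io, json, urllib.request
import requests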

In [None]:
# Download a sample image to test. Note that we don't need a very high resolution image, so let's be good citizens and
# download a version reduced to 25% (note pct:25 in the URL).
# For further information see the IIIF Image API: https://iiif.io/api/image/3.0/#4-image-requests

img_url = 'https://iiif.bodleian.ox.ac.uk/iiif/image/671d12e9-e014-417d-bba1-c3f16ff447f1/full/pct:25/0/default.jpg'

fd = urllib.request.urlopen(img_url)
image_file = io.BytesIO(fd.read())
im = Image.open(image_file)
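The structure of the IIIF URL used above can be sketched with a small helper. This is only an illustration: the function name `iiif_image_url` and its defaults are hypothetical rather than part of kraken or any IIIF library, and it simply joins the path segments that the IIIF Image API describes (region, size, rotation, quality and format).

```python
def iiif_image_url(base, region="full", size="max", rotation=0,
                   quality="default", fmt="jpg"):
    """Assemble an IIIF Image API URL from its path segments.

    Pattern: {base}/{region}/{size}/{rotation}/{quality}.{format}
    """
    return f"{base.rstrip('/')}/{region}/{size}/{rotation}/{quality}.{fmt}"


# The base URL is the Bodleian image identifier used in the cell above
base = "https://iiif.bodleian.ox.ac.uk/iiif/image/671d12e9-e014-417d-bba1-c3f16ff447f1"
quarter_size_url = iiif_image_url(base, size="pct:25")
print(quarter_size_url)
```

Building the URL from its parts makes it easier to change just the region or the size later without editing the string by hand.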

In [None]:
# Generic function to download a file and store locally
# TODO: if file already exists then could simply exit, or have a flag to replace or not

def download_file(url, filename):
    try:
        response = requests.get(url, stream=True)
        response.raise_for_status()  # Raise an exception for bad status codes

        with open(filename, 'wb') as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)
        print(f"File downloaded successfully to {filename}")
        return filename
    except requests.exceptions.RequestException as e:
        print(f"Error downloading file: {e}")
        return None
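The TODO above (skip the download when the file already exists, with a flag to replace it) could be handled along the following lines. This is a minimal sketch, not a definitive implementation: `fetch_if_missing` and its `overwrite` flag are hypothetical names, and the actual download is delegated to whatever function is passed in as `downloader`, which is assumed to take `(url, filename)` like `download_file`.

```python
import os


def fetch_if_missing(url, filename, downloader, overwrite=False):
    """Download only when the file is absent, unless overwrite is True.

    `downloader` is any callable taking (url, filename) and returning the
    filename on success, e.g. the download_file function defined above.
    """
    if os.path.exists(filename) and not overwrite:
        print(f"{filename} already exists, skipping download")
        return filename
    return downloader(url, filename)


# Intended use in this notebook (not executed here):
# model_path = fetch_if_missing(model_url, 'model.mlmodel', downloader=download_file)
```

Passing the downloader as an argument keeps the existence check separate from the network code, so the check can be tried out without downloading anything.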

# Segmentation

Let's try downloading some freely-available segmentation models from the Zenodo repository and GitHub and see how they differ.

In [None]:
# Segment using the default blla model for comparison and have a look at the resulting data structure

baseline_seg = blla.segment(im)

In [None]:
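# Let's see how many regions and lines are detected.
# Note that `regions` is a dictionary mapping each region type to a list of regions,
# so we add up the lists rather than taking len() of the dictionary itself.

n_regions = sum(len(regs) for regs in (baseline_seg.regions or {}).values())
print(n_regions, "regions detected")
print(len(baseline_seg.lines), "lines detected")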

In [None]:
# Let's try some more specialised models. This one is designed to find interlinear glosses

interlinear_url = 'https://github.com/malamatenia/Eutyches/raw/refs/heads/main/kraken-YALTAi/models/interlinear_BL.mlmodel'
interlinear_path = download_file(url=interlinear_url, filename='interlinear_BL.mlmodel')
interlinear_model = vgsl.TorchVGSLModel.load_model(interlinear_path)

In [None]:
# Run the segmentation and see how many interlinear additions it found

interlinear_seg = blla.segment(im, model=interlinear_model)
print(len(interlinear_seg.lines), "interlinear additions detected on this page")

# Recognition

Here we download a model for recognition (automatic transcription) and test it on another image.

In [None]:
# Download an appropriate model. If you change the test image then be sure to change the model if necessary.
from kraken.lib import models

recmodel_url = 'https://zenodo.org/records/15030337/files/catmus-medieval-1.6.0.mlmodel?download=1'
recmodel_path = download_file(url=recmodel_url, filename='catmus-medieval-1.6.0.mlmodel')
recmodel = models.load_any(recmodel_path)

In [None]:
# Download a new image...

#img_url = 'https://bl.digirati.io/images/ark:/81055/vdc_100059910515.0x00006d/full/pct:25/0/default.jpg'
img_url = 'https://stacks.stanford.edu/image/iiif/pg511wq8230%252F520_034_R_TC_46/full/pct:25/0/default.jpg'

# NB this is a large image even at 25%, so be patient!
#img_url = 'https://iiif.durham.ac.uk/iiif/trifle/32150/t2/mc/z3/t2mcz30ps641/f7a5ce05416d134803625dcdddc84339.jp2/full/pct:25/0/default.jpg'

fd = urllib.request.urlopen(img_url)
image_file = io.BytesIO(fd.read())
im = Image.open(image_file)

In [None]:
# Now segment it...

baseline_seg = blla.segment(im)

In [None]:
# Now run the recognition, given the recognition model and the results of our segmentation

from kraken.rpred import rpred

pred_it = rpred(network=recmodel,
                im=im,
                bounds=baseline_seg)

# Print the raw transcription
for record in pred_it:
    print(record)

Note the data structure of the prediction results. From the kraken tutorial:

> The output isn’t just a sequence of characters but, depending on the type of segmentation supplied, a kraken.containers.BaselineOCRRecord or kraken.containers.BBoxOCRRecord record object containing the character prediction, cuts (approximate locations), and confidences.

Let's have a look at it:

In [None]:
record.prediction

In [None]:
record.confidences

In [None]:
record.cuts
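One way to use these fields together is to flag the characters that the model was least sure about. The sketch below assumes that the prediction string and the list of confidences are aligned one-to-one; the helper name `low_confidence_chars`, the threshold of 0.8 and the sample values are all made up for illustration and do not come from kraken.

```python
def low_confidence_chars(prediction, confidences, threshold=0.8):
    """Return (index, character, confidence) for characters below threshold."""
    return [(i, ch, conf)
            for i, (ch, conf) in enumerate(zip(prediction, confidences))
            if conf < threshold]


# Made-up values standing in for record.prediction and record.confidences
sample_prediction = "et in"
sample_confidences = [0.99, 0.62, 0.97, 0.91, 0.45]
print(low_confidence_chars(sample_prediction, sample_confidences))
# → [(1, 't', 0.62), (4, 'n', 0.45)]
```

With a real record, the call would be `low_confidence_chars(record.prediction, record.confidences)`.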

# Palaeographical analysis using Cuts

Although kraken is designed for transcription, it also gives approximate information about the likely location of each character on the image. This isn't perfect, but we can use it to extract an image of each character automatically. Because the locations are only approximate, we can enlarge the area around each one to increase the likelihood of capturing the full letter. As you will see, this does not work particularly well, but we will see a slightly more sophisticated approach in the next notebook.

We could create image files, but since we're working with IIIF, let's instead generate an IIIF URL for each character. This means that we need to convert the kraken coordinates into the region format that IIIF expects (`x,y,width,height`). We also need to take into account that we downloaded the image scaled to 25%: according to IIIF the scaling happens *after* the region is extracted, so the region must be given in full-size pixels and we need to multiply our coordinates by 4.
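The arithmetic can be illustrated on a single made-up cut. This is a sketch under stated assumptions: `cut_to_iiif_region` is a hypothetical helper, the cut is assumed to be a sequence of `(x, y)` points measured on the scaled image, and the bounding box is taken from the minimum and maximum of those points so that the order of the points does not matter.

```python
def cut_to_iiif_region(cut, pct=25, x_margin=0, y_margin=0):
    """Convert one cut, measured on an image scaled to `pct` percent,
    into an IIIF 'x,y,w,h' region string in full-size pixels."""
    scale = 100 / pct  # e.g. 4 for an image downloaded at pct:25
    xs = [point[0] for point in cut]
    ys = [point[1] for point in cut]
    # Expand the bounding box by the margins, without going below zero
    left = max(min(xs) - x_margin, 0)
    top = max(min(ys) - y_margin, 0)
    right = max(xs) + x_margin
    bottom = max(ys) + y_margin
    values = (left, top, right - left, bottom - top)
    return ",".join(str(round(v * scale)) for v in values)


# A made-up cut: a 20 x 30 pixel box on the 25% image
sample_cut = [(100, 50), (100, 80), (120, 80), (120, 50)]
print(cut_to_iiif_region(sample_cut, x_margin=30, y_margin=10))
# → 280,160,320,200
```

Here the box grows from 20 x 30 to 80 x 50 pixels once the margins are added, and every value is then multiplied by 4 to reach full-size coordinates.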

In [None]:
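# Note that we are using the `record` variable left over from the loop above, which means we will be looking at the last-detected line on the page.

x_marg = 30 # Margin of error added on each side horizontally
y_marg = 10 # The vertical is usually fairly accurate (at least for this type of script)
scale = 4   # The cuts were measured on the 25% image; IIIF regions use full-size pixels
char_urls = []

for c in record.cuts:
    xy1, xy2, xy3, xy4 = c
    # Top-left corner, moved out by the margin but never below zero
    start_x = max(xy1[0] - x_marg, 0)
    start_y = max(xy1[1] - y_marg, 0)
    # IIIF regions are x,y,width,height, so add the margin on both sides
    width = xy3[0] - xy1[0] + 2 * x_marg
    height = xy3[1] - xy1[1] + 2 * y_marg

    region = f"{start_x*scale},{start_y*scale},{width*scale},{height*scale}"
    char_urls.append(img_url.replace("/full/pct:25/0", f"/{region}/full/0"))

print(char_urls)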

In [None]:
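search_char = 'e'

# Positions in the transcription where the character occurs
char_indexes = [i for i, x in enumerate(record.prediction) if x == search_char]
print(char_indexes)
# The cuts are taken to be aligned one-to-one with the predicted characters, so the same index is used for the URLs
print([char_urls[i] for i in char_indexes])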
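Rather than searching for one character at a time, the URLs could be grouped by character in a single pass. The following is a minimal sketch assuming one URL per predicted character, in the same order; `urls_by_character` is a hypothetical helper and the sample values are placeholders rather than real IIIF URLs.

```python
from collections import defaultdict


def urls_by_character(prediction, urls):
    """Group image URLs by the character they are predicted to show."""
    index = defaultdict(list)
    for ch, url in zip(prediction, urls):
        if not ch.isspace():  # spaces have no letter-form to inspect
            index[ch].append(url)
    return dict(index)


# Placeholder values standing in for record.prediction and char_urls
sample_index = urls_by_character("et e", ["u0", "u1", "u2", "u3"])
print(sample_index)
# → {'e': ['u0', 'u3'], 't': ['u1']}
```

With the variables from this notebook, `urls_by_character(record.prediction, char_urls)['e']` would give the list of URLs for every predicted "e" on the line.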