# Experiment with recognizing times from Mario Kart 8 Deluxe (MK8DX) screenshots

In [21]:
from transformers import AutoModel, AutoTokenizer
import tempfile
import PIL.Image  # "import PIL" alone does not load the Image submodule used below
from pathlib import Path
import warnings
import os

In [27]:
tokenizer = AutoTokenizer.from_pretrained('ucaslcl/GOT-OCR2_0', trust_remote_code=True, resume_download=None)

In [29]:
model = AutoModel.from_pretrained('ucaslcl/GOT-OCR2_0', 
                                  trust_remote_code=True, 
                                  low_cpu_mem_usage=True, 
                                  device_map='cuda', 
                                  use_safetensors=True, 
                                  pad_token_id=tokenizer.eos_token_id,
                                  resume_download=None)

In [32]:
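# eval() switches off training-only behavior such as dropout. .cuda() is
# redundant with device_map='cuda' above, but harmless.
model = model.eval().cuda()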


In [13]:
image_file = 'data/test_images/2019102710534900_c.jpg'

In [33]:
res = model.chat(tokenizer, image_file, ocr_type='ocr')

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


In [34]:
print(res)

And e Solo Race 1:24.739 0:28.703 Race against Ghost 0:28.330 0:27.706 View Ghost Biddybuggy Roller Online Ghosts Gold Glider Upload Ghost Data GC N Yoshi Circuit A OK B
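In the output above, the race times appear as `m:ss.mmm` strings mixed in with menu text. Assuming that format holds, here is a minimal sketch that pulls them out with a regular expression and converts them to integer milliseconds. Integers avoid floating-point rounding when times are compared or summed. `parse_times` is a hypothetical helper, not part of the notebook. On this sample, the first time (1:24.739) is exactly the sum of the other three, which suggests it is the total and the rest are lap splits.

```python
import re

# Matches race times such as "1:24.739" (minutes:seconds.milliseconds).
TIME_PATTERN = re.compile(r"\b(\d{1,2}):(\d{2})\.(\d{3})\b")


def parse_times(text: str) -> list[int]:
    """Return every m:ss.mmm time found in text, in milliseconds, in order."""
    times = []
    for minutes, seconds, millis in TIME_PATTERN.findall(text):
        times.append(int(minutes) * 60_000 + int(seconds) * 1_000 + int(millis))
    return times


sample = ("And e Solo Race 1:24.739 0:28.703 Race against Ghost 0:28.330 "
          "0:27.706 View Ghost Biddybuggy Roller Online Ghosts Gold Glider "
          "Upload Ghost Data GC N Yoshi Circuit A OK B")
times = parse_times(sample)
print(times)                       # [84739, 28703, 28330, 27706]
print(times[0] == sum(times[1:]))  # True
```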


## Identify which kind of image we have

We need a way to distinguish among at least three image formats:
1. Blue background, track name at bottom, combo listed with text.
2. Track background, racing alone (no ghost).
3. Track background, racing against ghost.
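The kinds could also be named explicitly in code instead of being tracked as the bare values `1`, `2`, `3`, and `'other'` that `ImageToOCR.kind` returns below. Here is a minimal sketch with a hypothetical `ImageKind` enum. The member names are illustrative, and the values match the numbering in the list above.

```python
from enum import Enum


class ImageKind(Enum):
    """Hypothetical names for the screenshot formats listed above."""
    BLUE_SUMMARY = 1   # blue background, track name at bottom, combo as text
    SOLO = 2           # track background, racing alone (no ghost)
    VS_GHOST = 3       # track background, racing against a ghost
    OTHER = "other"    # anything that can't be positively identified


# Look up members from the raw values the notebook's classifier produces.
print(ImageKind(3))        # ImageKind.VS_GHOST
print(ImageKind("other"))  # ImageKind.OTHER
```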

We also need a way to identify how many laps there are in the race. This can vary between two and five, with three being the most common.
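One possible way to get the lap count assumes that the OCR text yields the total time followed by the lap splits, as in the sample above where 1:24.739 = 0:28.703 + 0:28.330 + 0:27.706. Under that assumption, the lap count is the number of leading splits whose running sum equals the total. `count_laps` is a hypothetical helper that works on times already converted to milliseconds. It limits the search to two to five laps and returns `None` when nothing matches, for example when OCR misread a digit.

```python
from itertools import accumulate
from typing import Optional


def count_laps(total_ms: int, splits_ms: list[int],
               min_laps: int = 2, max_laps: int = 5) -> Optional[int]:
    """
    Return the number of leading splits that add up exactly to total_ms,
    or None if no count between min_laps and max_laps works.
    """
    # Running sums: time after 1 lap, after 2 laps, ...
    for laps, running in enumerate(accumulate(splits_ms), start=1):
        if laps > max_laps:
            break
        if laps >= min_laps and running == total_ms:
            return laps
    return None


print(count_laps(84739, [28703, 28330, 27706]))  # 3
print(count_laps(84739, [28703, 28330, 27700]))  # None (a misread digit)
```

Because the sums are checked exactly, a single misread digit yields `None` rather than a wrong lap count. That makes OCR errors visible instead of silently accepting them.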

### Determine kind of image

In [53]:
class ImageToOCR:
    def __init__(self, image_file_name):
        self.image_file_name = image_file_name

    @property
    def ocr(self) -> str:
        """Text OCR'd from the entire image."""
        return self.ocr_box(box='')

    def ocr_box(self, box='') -> str:
        """
        Return the text OCR'd from a rectangular region of the image.

        box is a string of four integer pixel coordinates, '[x1, y1, x2, y2]',
        giving opposite corners (x1, y1) and (x2, y2) of the region.
        If empty, the whole image is used.
        """
        text = model.chat(tokenizer, self.image_file_name, ocr_type='ocr', ocr_box=box)
        # Strip stray whitespace so comparisons like == 'B' are reliable.
        return text.strip()

    @property
    def kind(self):
        with PIL.Image.open(self.image_file_name) as f:
            image_size = f.size

        # Check the size first, and run each (slow) OCR call only when the
        # earlier checks have failed.
        if image_size != (1280, 720):
            # Wrong size; can't be a screen capture from the Switch.
            self.image_kind = 'other'
        elif self.ocr_box(box='[109, 647, 138, 676]') == 'B':
            # Looks for the B button in the lower left corner.
            self.image_kind = 1
        elif self.ocr_box(box='[1020, 456, 1044, 480]') == '2':
            # Looks for the second lap in the lower box of times.
            # Assumes that the course has three laps.
            self.image_kind = 3
        elif self.ocr_box(box='[1162, 220, 1183, 253]').isdigit():
            # If there's no lower box of times, looks for the last digit of the
            # second lap time in the upper box of times. (The previous test,
            # int(...) is not None, was always true or raised ValueError.)
            self.image_kind = 2
        else:
            # Can't positively identify as one of the defined kinds.
            self.image_kind = 'other'
        return self.image_kind
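Because `kind` calls the GPU model directly, its decision logic is hard to test on its own. The sketch below separates that logic out. A hypothetical `classify` function takes the image size and any callable with the same shape as `ocr_box`, so it can be exercised with a fake OCR function. The box coordinates and expected strings are copied from `kind`, and only the structure is new. The third check uses `str.isdigit()` because `int(text)` never returns `None` (it raises `ValueError` on non-numeric text), so an `is not None` test cannot fail.

```python
from typing import Callable, Union

# Boxes copied from ImageToOCR.kind (pixel coordinates on a 1280x720 capture).
B_BUTTON_BOX = '[109, 647, 138, 676]'
LOWER_LAP_2_BOX = '[1020, 456, 1044, 480]'
UPPER_LAP_2_LAST_DIGIT_BOX = '[1162, 220, 1183, 253]'


def classify(image_size: tuple[int, int],
             ocr_box: Callable[[str], str]) -> Union[int, str]:
    """Return 1, 2, 3, or 'other', calling ocr_box only when needed."""
    if image_size != (1280, 720):
        return 'other'  # wrong size: not a Switch screen capture
    if ocr_box(B_BUTTON_BOX).strip() == 'B':
        return 1  # B button in the lower left corner
    if ocr_box(LOWER_LAP_2_BOX).strip() == '2':
        return 3  # second lap shown in the lower box of times
    if ocr_box(UPPER_LAP_2_LAST_DIGIT_BOX).strip().isdigit():
        return 2  # a digit where the upper box's lap-2 time should be
    return 'other'


def fake_ocr(canned: dict[str, str]) -> Callable[[str], str]:
    """Hypothetical stand-in for ImageToOCR.ocr_box that returns canned text."""
    return lambda box: canned.get(box, '')


print(classify((1280, 720), fake_ocr({B_BUTTON_BOX: 'B'})))                # 1
print(classify((1280, 720), fake_ocr({LOWER_LAP_2_BOX: '2'})))             # 3
print(classify((1280, 720), fake_ocr({UPPER_LAP_2_LAST_DIGIT_BOX: '7'})))  # 2
print(classify((1280, 720), fake_ocr({})))                                 # other
```

In the notebook, the real OCR could be plugged in as `classify(image_size, ImageToOCR(image_file).ocr_box)`.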

In [55]:
test_images = [
    'data/test_images/2019102710534900_c.jpg',  # kind 1
    'data/test_images/2023062418352400_s.jpg',  # kind 2
    'data/test_images/2023070714422000_s.jpg',  # kind 3
    'data/test_images/dog.jpg',                 # kind other
]
for image_file in test_images:
    print(image_file + ': ' + str(ImageToOCR(image_file).kind))

data/test_images/2019102710534900_c.jpg: 1


data/test_images/2023062418352400_s.jpg: 2


data/test_images/2023070714422000_s.jpg: 3


data/test_images/dog.jpg: other
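Each `ocr_box` call reruns the model, so querying the same box twice (for example, by reading `.kind` twice) costs two generations. If that becomes a bottleneck, memoizing on `(image_file_name, box)` avoids the repeated work. The sketch below uses `functools.lru_cache`. `run_ocr` is a hypothetical stand-in for the `model.chat(...)` call, so the example runs without a GPU.

```python
from functools import lru_cache

calls = []  # records how often the expensive OCR actually runs


def run_ocr(image_file_name: str, box: str) -> str:
    """Stand-in for model.chat(tokenizer, image_file_name, ocr_type='ocr', ocr_box=box)."""
    calls.append((image_file_name, box))
    return 'B'


@lru_cache(maxsize=None)
def cached_ocr_box(image_file_name: str, box: str = '') -> str:
    # Both arguments are strings, so they are hashable cache keys.
    return run_ocr(image_file_name, box)


cached_ocr_box('a.jpg', '[109, 647, 138, 676]')
cached_ocr_box('a.jpg', '[109, 647, 138, 676]')  # served from the cache
cached_ocr_box('b.jpg', '[109, 647, 138, 676]')  # different file: runs again
print(len(calls))  # 2
```

The cache key is only the file name and box. If an image file is overwritten on disk, `cached_ocr_box.cache_clear()` is needed to avoid stale text.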


In [None]:
# Inline version of the same checks. Note that the boxes for kinds 2 and 3
# differ from those in ImageToOCR.kind.
ocr = ImageToOCR(image_file_name=image_file)  # one instance serves every box
with PIL.Image.open(image_file) as f:
    image_size = f.size

if image_size != (1280, 720):
    image_kind = 'other'
elif ocr.ocr_box(box='[109, 647, 138, 676]').strip() == 'B':
    image_kind = 1
elif ocr.ocr_box(box='[1022, 456, 1044, 480]').strip() == '2':
    image_kind = 3
elif ocr.ocr_box(box='[1022, 226, 1044, 248]').strip() == '2':
    image_kind = 2
else:
    image_kind = 'other'

print(image_kind)
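The boxes in these checks are typed by hand as strings, and the two versions of the check use slightly different coordinates. A small helper could build the `'[x1, y1, x2, y2]'` string from integers and catch typos early. The sketch below uses a hypothetical `make_box` helper. It assumes, as the boxes in this notebook do, that the top-left corner comes first, and it rejects boxes that fall outside a 1280x720 capture.

```python
def make_box(x1: int, y1: int, x2: int, y2: int,
             width: int = 1280, height: int = 720) -> str:
    """
    Build an ocr_box string '[x1, y1, x2, y2]' after checking that (x1, y1)
    is the top-left corner, (x2, y2) the bottom-right, and both lie inside
    a width x height image.
    """
    if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
        raise ValueError(f"box {(x1, y1, x2, y2)} is invalid for a {width}x{height} image")
    return f"[{x1}, {y1}, {x2}, {y2}]"


print(make_box(109, 647, 138, 676))  # [109, 647, 138, 676]
```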