# OCR Tutorial

In this tutorial, you will learn how to use OCR (EasyOCR) to detect text in frames from an Aria recording.


### Notebook stuck?
Because of Jupyter issues, the code may sometimes get stuck at visualization. We recommend that you **restart the kernel** and try again to see if the issue is resolved.

## Step 1. Install Project Aria Tools
Run the following cell to install Project Aria Tools, which is used to read Aria recordings in `.vrs` format.

In [None]:
# Specifics for Google Colab
google_colab_env = 'google.colab' in str(get_ipython())
if google_colab_env:
    print("Running from Google Colab, installing projectaria_tools")
    !pip install projectaria-tools

## Step 2. Prepare an Aria recording

We will set `vrsfile` to the path of your collected Aria recording.

Upload your Aria recording to your Google Drive before running the cell.

Here, we assume it is uploaded to **`My Drive/Fridge/sample.vrs`**

*(You can check the content of the mounted drive by running `!ls "/content/drive/My Drive/"` in a cell.)*



In [None]:
from google.colab import drive
import os
drive.flush_and_unmount()
drive.mount('/content/drive/')
my_vrs_file_path = 'Fridge/sample.vrs'
vrsfile = "/content/drive/My Drive/" + my_vrs_file_path
print(f"INFO: vrsfile set to {vrsfile}")
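A wrong path is easy to miss at this point, because nothing reads the file until Step 3. The sketch below shows one way to fail early with a clear message. The helper name `build_vrs_path` is hypothetical and not part of the notebook or of projectaria_tools; it relies only on the standard library.

```python
import os


def build_vrs_path(drive_root, relative_path):
    """Join the Drive root and the relative path, failing early if the file is missing."""
    path = os.path.join(drive_root, relative_path)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No VRS file at {path}; check the upload location")
    return path


# Example usage in Colab, with the paths assumed in this step:
# vrsfile = build_vrs_path("/content/drive/My Drive", "Fridge/sample.vrs")
```

If the file is missing, the error appears here rather than as an invalid data provider later on.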

## Step 3. Create data provider

Create a projectaria_tools data provider so you can load the content of the VRS file.

In [None]:
from projectaria_tools.core import data_provider, calibration
from projectaria_tools.core.sensor_data import TimeDomain, TimeQueryOptions
from projectaria_tools.core.stream_id import RecordableTypeId, StreamId
import numpy as np
from matplotlib import pyplot as plt

print(f"Creating data provider from {vrsfile}")
provider = data_provider.create_vrs_data_provider(vrsfile)
if not provider:
    print("Invalid vrs data provider")

## Step 4. Display VRS RGB content in thumbnail images

Goals:
- Summarize a VRS recording as `sample_count` thumbnail images placed side by side, to visually inspect the collected data.

Key learnings:
- Image streams are identified by a unique identifier: `stream_id`
- Image frames are identified by timestamps
- PIL images can be created from NumPy arrays

Customization:
- To change the number of sampled images, change the variable `sample_count` to a desired number.
- To change the thumbnail size, change the variable `resize_ratio` to a desired value.

In [None]:
from PIL import Image
from tqdm import tqdm

sample_count = 30
resize_ratio = 10

rgb_stream_id = StreamId("214-1")

# Query options used when fetching frames by timestamp
time_domain = TimeDomain.DEVICE_TIME  # query data based on device time
option = TimeQueryOptions.CLOSEST  # get data whose time [in TimeDomain] is CLOSEST to query time

# Retrieve Start and End time for the given Sensor Stream Id
start_time = provider.get_first_time_ns(rgb_stream_id, time_domain)
end_time = provider.get_last_time_ns(rgb_stream_id, time_domain)

# Retrieve image size for the RGB stream
image_config = provider.get_image_configuration(rgb_stream_id)
width = image_config.image_width
height = image_config.image_height

# Blank canvas wide enough to hold all thumbnails side by side
thumbnail = Image.new(
    "RGB", (int(width * sample_count / resize_ratio), int(height / resize_ratio))
)
current_width = 0


# Sample `sample_count` evenly spaced timestamps between start and end
sample_timestamps = np.linspace(start_time, end_time, sample_count)
for sample in tqdm(sample_timestamps):
    image_tuple = provider.get_image_data_by_time_ns(rgb_stream_id, int(sample), time_domain, option)
    image_array = image_tuple[0].to_numpy_array()
    image = Image.fromarray(image_array)
    new_size = (
        int(image.size[0] / resize_ratio),
        int(image.size[1] / resize_ratio),
    )
    image = image.resize(new_size).rotate(-90)
    thumbnail.paste(image, (current_width, 0))
    current_width = int(current_width + width / resize_ratio)

# Import display explicitly; importing Image from IPython here would shadow PIL's Image
from IPython.display import display
display(thumbnail)
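To see how `sample_count` and `resize_ratio` shape the result, the arithmetic from the cell above can be isolated in a small standalone sketch. The two helper names are hypothetical, the integer-based sampling is a pure-Python stand-in for `np.linspace`, and the image size and time range in the usage lines are made-up values, not properties of any particular recording.

```python
def evenly_spaced_timestamps(start_ns, end_ns, sample_count):
    """Return sample_count integer timestamps from start_ns to end_ns, endpoints included."""
    if sample_count < 2:
        return [start_ns]
    span = end_ns - start_ns
    return [start_ns + span * i // (sample_count - 1) for i in range(sample_count)]


def thumbnail_strip_size(width, height, sample_count, resize_ratio):
    """Return (strip_width, strip_height) in pixels for the side-by-side strip."""
    return int(width * sample_count / resize_ratio), int(height / resize_ratio)


# Hypothetical values: a 1408x1408 stream and a 9-second time range
print(evenly_spaced_timestamps(0, 9_000_000_000, 4))
# [0, 3000000000, 6000000000, 9000000000]
print(thumbnail_strip_size(1408, 1408, sample_count=30, resize_ratio=10))
# (4224, 140)
```

Raising `sample_count` widens the strip, while raising `resize_ratio` shrinks every thumbnail in both dimensions.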

## Step 5. Install EasyOCR


In [None]:
# Install EasyOCR
!pip install easyocr

## Step 6. Run OCR

Run OCR on each timestamp sampled in Step 4.

- The detected text will be stored in `ocr_dict`.

- You can set the image size using the `imsize` variable.

- You can pass a list of languages to be parsed as follows:
```
reader = easyocr.Reader(['en', 'fr', 'ch_sim'])
```
For all supported languages in EasyOCR, see https://www.jaided.ai/easyocr/.

- You can set `confidence_thres` to keep only the detected texts whose confidence is above the threshold.

- The output is a list in which each item contains a bounding box, the detected text, and the confidence level, respectively.
```
[ ([[226, 170], [414, 170], [414, 220], [226, 220]], 'Yuyuan Rd.', 0.8261902332305908),
 ([[79, 173], [125, 173], [125, 213], [79, 213]], 'W', 0.9848111271858215),
 ([[529, 173], [569, 173], [569, 213], [529, 213]], 'E', 0.8405593633651733)]
 ```
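The thresholding described above can be tried on the sample output without loading a model. This is a minimal sketch: `filter_by_confidence` is a hypothetical helper, and the threshold of 0.83 is chosen only so that one of the three sample detections is dropped.

```python
def filter_by_confidence(result, threshold):
    """Keep (bbox, text, confidence) items whose confidence exceeds threshold."""
    return [item for item in result if item[2] > threshold]


# Sample output copied from the list above
result = [
    ([[226, 170], [414, 170], [414, 220], [226, 220]], 'Yuyuan Rd.', 0.8261902332305908),
    ([[79, 173], [125, 173], [125, 213], [79, 213]], 'W', 0.9848111271858215),
    ([[529, 173], [569, 173], [569, 213], [529, 213]], 'E', 0.8405593633651733),
]

kept = filter_by_confidence(result, threshold=0.83)
print([text for _, text, _ in kept])  # ['W', 'E']
```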


In [None]:
import easyocr
from PIL import Image

imsize = 3072
confidence_thres = 0.2

ocr_dict = {
    'timestamps': [],
    'texts': [],
    'bboxes': [],
}

reader = easyocr.Reader(['en'])  # Load the EasyOCR model. This only needs to be called once.

for sample in tqdm(sample_timestamps):

    # Fetch the frame closest to the sampled timestamp
    image_tuple = provider.get_image_data_by_time_ns(rgb_stream_id, int(sample), time_domain, option)
    image_array = image_tuple[0].to_numpy_array()
    image = Image.fromarray(image_array)

    # Resize to imsize x imsize and rotate by -90 degrees, as in Step 4
    new_size = (imsize, imsize)
    image = np.asarray(image.resize(new_size).rotate(-90))
    result = reader.readtext(image)
    print(f"result: {result}")

    # Keep only the detections above the confidence threshold
    if result is not None:
        ocr_dict['timestamps'].append(sample)
        ocr_dict['bboxes'].append([res[0] for res in result if res[2] > confidence_thres])
        ocr_dict['texts'].append([res[1] for res in result if res[2] > confidence_thres])
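Each entry stored in `ocr_dict['bboxes']` is a list of four `[x, y]` corner points. For cropping or drawing, an axis-aligned box is often easier to work with. The sketch below shows one possible conversion; `corners_to_box` is a hypothetical helper, and the coordinates refer to the resized and rotated image passed to `readtext`, not to the original frame.

```python
def corners_to_box(corners):
    """Convert four [x, y] corner points into (x_min, y_min, x_max, y_max)."""
    xs = [point[0] for point in corners]
    ys = [point[1] for point in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Corner points taken from the sample output in this step
print(corners_to_box([[226, 170], [414, 170], [414, 220], [226, 220]]))
# (226, 170, 414, 220)
```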


## Step 7. Display and save detected text lists

### We can get OCR results for each timestamp.

In [None]:
ocr_save_path = '/content/ocr_results.json'

import pandas as pd
df = pd.DataFrame(ocr_dict)
df.to_json(ocr_save_path)
# Set the maximum width of each column
pd.set_option('display.max_colwidth', None)  # Replace None with a number if needed
display(df[['timestamps', 'texts']])

| | timestamps | texts |
|---|---|---|
| 0 | 1641293000000.0 | [] |
| 1 | 1642534000000.0 | [] |
| 2 | 1643776000000.0 | [] |
| 3 | 1645017000000.0 | [] |
| 4 | 1646258000000.0 | [AcTiVESMART, %67, 0] |
| 5 | 1647500000000.0 | [ActiveshaRT, TECHNOLOGY, %7, Stghattng;, CAMPARI', Tomatoes] |
| 6 | 1648741000000.0 | [ActineskaRt, 9, 3] |
| 7 | 1649982000000.0 | [Harn, Guan, Baby Bella Mushrooms, bebe bella cham, ACTIvESMART, Technology, ladelphia, Reproductte] |
| 8 | 1651224000000.0 | [WA, Han, and, Cancer, %7, ActiveshaRT, TECHNOLOGT, RNING:, Reproducute ] |
| 9 | 1652465000000.0 | [MdELpha, ACTIvESMART, TECHNOLOGY] |
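Because `timestamps` and `texts` are parallel lists, the results can be searched for the frames in which a given word was seen. The following is a rough sketch using a few rows copied from the results above; `find_timestamps` is a hypothetical helper, and the case-insensitive substring match is just one simple way to tolerate OCR casing errors.

```python
def find_timestamps(ocr_dict, keyword):
    """Return timestamps whose detected texts contain keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [
        timestamp
        for timestamp, texts in zip(ocr_dict['timestamps'], ocr_dict['texts'])
        if any(keyword in text.lower() for text in texts)
    ]


# A few rows copied from the results above
ocr_dict = {
    'timestamps': [1645017000000.0, 1646258000000.0, 1649982000000.0],
    'texts': [
        [],
        ['AcTiVESMART', '%67', '0'],
        ['Harn', 'Guan', 'Baby Bella Mushrooms', 'bebe bella cham',
         'ACTIvESMART', 'Technology', 'ladelphia', 'Reproductte'],
    ],
}

print(find_timestamps(ocr_dict, 'mushroom'))     # [1649982000000.0]
print(find_timestamps(ocr_dict, 'activesmart'))  # [1646258000000.0, 1649982000000.0]
```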
