# Kili Tutorial: Import OCR pre-annotations in Kili

In this tutorial, we will see how to import OCR pre-annotations into Kili using the [Google Vision API](https://cloud.google.com/vision/docs/ocr). Pre-annotating your data saves significant time when performing [OCR](https://cloud.kili-technology.com/docs/text-pdf-interfaces/image-transcription-ocr/#docsNav) in Kili.

The data we use comes from [The Street View Text Dataset](http://www.iapr-tc11.org/mediawiki/index.php?title=The_Street_View_Text_Dataset).

## Loading an image from The Street View Dataset in Kili

You can obtain the image for this tutorial using the following link (https://drive.google.com/uc?export=view&id=1ceNwCgLwIyyjPwU42xIoz6mMT3enLewW):
<img src="https://drive.google.com/uc?export=view&id=1ceNwCgLwIyyjPwU42xIoz6mMT3enLewW" width="800">

We will use the Google Vision API to perform optical character recognition (OCR) on the different texts in the image.

We can now create the interface we will be using in our project. For OCR, the interface to use is an object detection job (with the rectangle tool) that has a nested transcription job for each category.

In [2]:
json_interface =  {
    "jobs": {
        "JOB_0": {
            "mlTask": "OBJECT_DETECTION",
            "tools": [
                "rectangle"
            ],
            "instruction": "Categories",
            "required": 1,
            "isChild": False,
            "content": {
                "categories": {
                    "STORE_INFORMATIONS": {
                        "name": "Store informations",
                        "children": [
                            "JOB_1"
                        ]
                    },
                    "PRODUCTS": {
                        "name": "Products",
                        "children": [
                            "JOB_2"
                        ]
                    }
                },
                "input": "radio"
            }
        },
        "JOB_1": {
            "mlTask": "TRANSCRIPTION",
            "instruction": "Transcription of store informations",
            "required": 1,
            "isChild": True
        },
        "JOB_2": {
            "mlTask": "TRANSCRIPTION",
            "instruction": "Transcription of products",
            "required": 1,
            "isChild": True
        }
    }
}
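Since each category points to its transcription job by name, a typo in a `children` entry is easy to miss. The following is a minimal sketch of a consistency check, assuming the interface layout shown above; `check_nested_jobs` and `example_interface` are hypothetical names introduced for illustration and are not part of the Kili SDK.

```python
def check_nested_jobs(interface):
    """Return a list of problems found in the parent/child job links."""
    jobs = interface["jobs"]
    problems = []
    for job_name, job in jobs.items():
        categories = job.get("content", {}).get("categories", {})
        for category_name, category in categories.items():
            for child in category.get("children", []):
                if child not in jobs:
                    problems.append(
                        f"{job_name}/{category_name}: unknown child job {child}"
                    )
                elif not jobs[child].get("isChild", False):
                    problems.append(
                        f"{job_name}/{category_name}: {child} is not marked isChild"
                    )
    return problems


# Trimmed-down copy of the interface above, used only to exercise the check.
example_interface = {
    "jobs": {
        "JOB_0": {
            "mlTask": "OBJECT_DETECTION",
            "isChild": False,
            "content": {
                "categories": {
                    "STORE_INFORMATIONS": {"name": "Store informations", "children": ["JOB_1"]},
                    "PRODUCTS": {"name": "Products", "children": ["JOB_2"]},
                },
                "input": "radio",
            },
        },
        "JOB_1": {"mlTask": "TRANSCRIPTION", "isChild": True},
        "JOB_2": {"mlTask": "TRANSCRIPTION", "isChild": True},
    }
}

print(check_nested_jobs(example_interface))  # prints []
```

An empty list means every category refers to an existing job that is flagged as a child.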

In [3]:
!pip install google-cloud-vision
import os
import io

from google.cloud import vision
from google.oauth2 import service_account
from kili.client import Kili


In [7]:
# Authenticate to Kili Technology
api_key = os.getenv('KILI_USER_API_KEY')
api_endpoint = os.getenv('KILI_API_ENDPOINT')
kili = Kili(api_key=api_key, api_endpoint=api_endpoint)

# Create an OCR project
project = kili.create_project(
    description='OCR street view',
    input_type='IMAGE',
    json_interface=json_interface,
    title='Street text annotation'
)
project_id = project['id']
users = kili.users(api_key=api_key, fields=['email'])
kili.append_to_roles(
    project_id=project_id,
    user_email=users[0]['email'],
    role='ADMIN'
)

{'id': 'ckwxngdqd007lal9kfj3ae1rg',
 'jsonInterface': {'jobs': {'JOB_0': {'mlTask': 'OBJECT_DETECTION',
    'tools': ['rectangle'],
    'instruction': 'Categories',
    'required': 1,
    'isChild': False,
    'content': {'categories': {'STORE_INFORMATIONS': {'name': 'Store informations',
       'children': ['JOB_1']},
      'PRODUCTS': {'name': 'Products', 'children': ['JOB_2']}},
     'input': 'radio'}},
   'JOB_1': {'mlTask': 'TRANSCRIPTION',
    'instruction': 'Transcription of store informations',
    'required': 1,
    'isChild': True},
   'JOB_2': {'mlTask': 'TRANSCRIPTION',
    'instruction': 'Transcription of products',
    'required': 1,
    'isChild': True}}},
 'title': 'Street text annotation',
 'roles': [{'user': {'id': 'user-6',
    'email': 'test+github@kili-technology.com'},
   'role': 'ADMIN'}]}

## Creating OCR annotations using Google Vision API

We will now see how to perform OCR on our image using Google Vision API.

First, you will need to create an account on https://cloud.google.com:
  - create a project (or use an existing one)
  - then go to "APIs & Services"/Library and search for "Vision API"
  - enable the API for your project (you might need to add billing information if you haven't already)
  
Now that the API is enabled, we need to get an API key that we will use later in our project:
  - go to "APIs & Services"/Credentials
  - create a service account with authorization to use the Vision API
  
On the service account details page:
  - click on "Add key"
  - download the key in JSON format
  - place the key in the folder of the project
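
Before calling the API, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch using only the standard library; `check_service_account_key` and `EXPECTED_FIELDS` are hypothetical names, and the field list is an assumption based on the usual layout of these key files, not an exhaustive schema.

```python
import json

# Fields assumed to be present in a Google service account key file.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in the JSON key file at `path`."""
    with open(path, encoding="utf-8") as key_file:
        key = json.load(key_file)
    problems = [
        f"missing field: {field}" for field in EXPECTED_FIELDS if field not in key
    ]
    # A key downloaded for a service account declares this type.
    if "type" in key and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

Calling `check_service_account_key` on the path to your key should return an empty list. Any other result suggests the wrong file was downloaded.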



Install the Google Cloud Storage client library, which the `implicit` helper below relies on, using: `pip install --upgrade google-cloud-storage`

We can now start to code to add OCR annotations to the asset metadata! (You can also perform OCR on remote images using a URL: [detect text in images](https://cloud.google.com/vision/docs/ocr#vision_text_detection-python))

In [9]:
# Declare the path to your API_KEY
PATH_API_KEY = ''

In [10]:
def implicit():
    from google.cloud import storage

    # If you don't specify credentials when constructing the client, the
    # client library will look for credentials in the environment.
    storage_client = storage.Client()

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)

In [1]:
def detect_text(path):
    """Detect text in a local image file with the Google Vision API."""
    credentials = service_account.Credentials.from_service_account_file(PATH_API_KEY)
    client = vision.ImageAnnotatorClient(credentials=credentials)

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    # google-cloud-vision 2.x exposes Image directly (vision.types was removed)
    image = vision.Image(content=content)

    response = client.text_detection(image=image)

    # Fail early if the API reported an error
    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))

    # Keep only what Kili needs: the text and its bounding polygon
    text_annotations = []
    for text in response.text_annotations:
        vertices = [{"x": vertex.x, "y": vertex.y}
                    for vertex in text.bounding_poly.vertices]
        text_annotations.append({
            "description": text.description,
            "boundingPoly": {"vertices": vertices},
        })

    return text_annotations

In [17]:
# Local path to the image downloaded above (fill in your own path)
PATH_TO_IMG = ''
text_annotations = detect_text(PATH_TO_IMG)



We now need to format the OCR results to fit Kili's asset metadata.

In [23]:
IMG_WIDTH = 1680
IMG_HEIGHT = 1050

full_text_annotations = {
    "fullTextAnnotation": {
        "pages": [{"height": IMG_HEIGHT, "width": IMG_WIDTH}],
    },
    "textAnnotations": text_annotations,
}

We follow the Google Vision API [`AnnotateImageResponse`](https://cloud.google.com/vision/docs/reference/rest/v1/AnnotateImageResponse) format. In the end, the OCR data to insert into Kili as JSON metadata contains:

- [Full text annotation](https://cloud.google.com/vision/docs/reference/rest/v1/AnnotateImageResponse#TextAnnotation): a list of pages in the document with their respective heights and widths.
- [A list of text annotations](https://cloud.google.com/vision/docs/reference/rest/v1/AnnotateImageResponse#EntityAnnotation) with:
  - the text content;
  - coordinates of the bounding box.

```
{
  "fullTextAnnotation": { "pages": [{ "height": 914, "width": 813 }] },
  "textAnnotations": [
    {
      "description": "7SB75",
      "boundingPoly": {
        "vertices": [
          { "x": 536, "y": 259 },
          { "x": 529, "y": 514 },
          { "x": 449, "y": 512 },
          { "x": 456, "y": 257 }
        ]
      }
    },
    {
      "description": "09TGG",
      "boundingPoly": {
        "vertices": [
          { "x": 436, "y": 256 },
          { "x": 435, "y": 515 },
          { "x": 360, "y": 515 },
          { "x": 361, "y": 256 }
        ]
      }
    }
  ]
}
```
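
The layout above can be assembled and sanity-checked before uploading. The following is a minimal sketch, not part of the Kili SDK or the Vision client; `build_ocr_metadata` is a hypothetical helper. The four-vertex rule and the bounds check are assumptions that may be too strict for some images, for example when a detected box extends slightly past the image edge.

```python
def build_ocr_metadata(text_annotations, width, height):
    """Build the JSON metadata dict after checking each text annotation."""
    for index, annotation in enumerate(text_annotations):
        if not isinstance(annotation.get("description"), str):
            raise ValueError(f"annotation {index}: missing text description")
        vertices = annotation.get("boundingPoly", {}).get("vertices", [])
        # Assumption: every bounding polygon is a quadrilateral.
        if len(vertices) != 4:
            raise ValueError(
                f"annotation {index}: expected 4 vertices, got {len(vertices)}"
            )
        # Assumption: coordinates are in pixels and lie inside the image.
        for vertex in vertices:
            if not (0 <= vertex["x"] <= width and 0 <= vertex["y"] <= height):
                raise ValueError(f"annotation {index}: vertex {vertex} outside image")
    return {
        "fullTextAnnotation": {"pages": [{"height": height, "width": width}]},
        "textAnnotations": text_annotations,
    }


# First annotation from the example above, on an 813 x 914 image.
sample = [
    {
        "description": "7SB75",
        "boundingPoly": {
            "vertices": [
                {"x": 536, "y": 259},
                {"x": 529, "y": 514},
                {"x": 449, "y": 512},
                {"x": 456, "y": 257},
            ]
        },
    }
]
metadata = build_ocr_metadata(sample, width=813, height=914)
```

Raising on the first malformed annotation is a design choice made here to surface problems before the asset is created. Filtering out bad entries instead would also be reasonable.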

In [24]:
# Add asset with pre-annotations to project

external_id = 'store'
content = 'https://drive.google.com/uc?export=view&id=1ceNwCgLwIyyjPwU42xIoz6mMT3enLewW'

kili.append_many_to_dataset(
    project_id=project_id,
    content_array=[content],
    external_id_array=[external_id],
    json_metadata_array=[full_text_annotations]
)

{'id': 'ckhc6pwtr020v0785m7adiare'}

## Annotate in Kili

You can now annotate your images, and you will see the automatically extracted text.

<img src="https://raw.githubusercontent.com/kili-technology/kili-playground/master/recipes/img/store_with_ocr.png" width="800">