<a href="https://colab.research.google.com/github/siwarnasri/Snippet-Library/blob/main/handwritten_text_with_azure.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Recognizing and extracting handwritten text from images, using Microsoft Azure and Python

This notebook contains the necessary code to send an image to Azure, get Azure to recognize and extract handwritten text, and send that text back to us.

You are currently looking at a copy. To **run** this code, you need to be signed into a Google account; then choose 'File' and 'Save a copy in Drive'. You'll be able to run your **copy**.

## How this notebook works

Each cell below contains an individual part of the complete program. We are going to run each cell in turn. A cell is **active** if it appears with a shadow around it (as if it was popped up a bit). If you mouse over an active cell where you see `[ ]`, a little 'run' button appears. You click that run button, and a little wait indicator appears (how long it appears depends on the complexity of the operation). **When the code is finished running** a number will appear between the brackets, like so: `[1]`. This tells you that this was cell number 1 to run.

**Important:** Let a cell finish before running the next cell below it. If you run cells out of order, or try to run two cells at the same time, things can get screwed up.

Got that? Ready? Then let's go!

## 1. Get the packages we need, import the bits we want.

We do this each time we start a new session. It only has to be done once per session.


In [None]:
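# First we go get the sdk. The cells below use the older method names (batch_read_file,
# get_read_operation_result), which newer releases replaced with read and get_read_result.
# Assumption: 0.5.0 is the last release that still has the older names.
!pip install azure-cognitiveservices-vision-computervision==0.5.0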

In [None]:
# https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts-sdk/python-sdk
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import TextOperationStatusCodes
from azure.cognitiveservices.vision.computervision.models import TextRecognitionMode
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

from array import array
import os
from PIL import Image
import sys
import time

## 2. Now we connect up with Azure

We only do this during our start up, after we've done the bit above. **Notice lines 4 and 5 in the block below**, where it says `COMPUTER_VISION_SUBSCRIPTION_KEY` and `COMPUTER_VISION_ENDPOINT`. These are two variables that you need to fill in with the information from the computer vision service you created in your Azure portal (a reminder of what this looks like is at [figure 7](https://shawngraham.github.io/dhmuse/detecting-handwriting/#a-shortcut)).

The key will be a long string of letters and numbers separated by hyphens; paste this **between** the `'` where it says `xxxx`. The endpoint will be a full url **with** the https bit. Paste that **between** the `'` where it says `yyyy`.

Once you've done that, run the three blocks of code below in order.

**PS** If you share this notebook, _delete_ the key and endpoint and put `xxxx` and `yyyy` back in their place before you share it. You don't want this info lying around.
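A quick sanity check can catch a forgotten placeholder before any call goes to Azure. The sketch below is not part of the Azure SDK: `check_credentials` is a hypothetical helper, and the only rules it applies are the ones described above (the `xxxx` and `yyyy` placeholders must be replaced, and the endpoint must be a full URL with the https bit). It cannot tell whether Azure will actually accept the values.

```python
from urllib.parse import urlparse

PLACEHOLDERS = {"", "xxxx", "yyyy"}  # the stand-in values used in this notebook


def check_credentials(key, endpoint):
    """Return a list of problems found; an empty list means both values look filled in.

    Hypothetical helper: it only checks the shape of the values,
    it cannot tell whether Azure will accept them.
    """
    problems = []
    if key.strip() in PLACEHOLDERS:
        problems.append("key is still a placeholder")
    if endpoint.strip() in PLACEHOLDERS:
        problems.append("endpoint is still a placeholder")
    else:
        parts = urlparse(endpoint.strip())
        if parts.scheme != "https" or not parts.netloc:
            problems.append("endpoint should be a full https:// URL")
    return problems


# Made-up values, for illustration only
print(check_credentials("xxxx", "yyyy"))
print(check_credentials("0123abcd", "https://example.cognitiveservices.azure.com/"))
```

In the notebook, the same function could be given `os.environ['COMPUTER_VISION_SUBSCRIPTION_KEY']` and `os.environ['COMPUTER_VISION_ENDPOINT']` once they have been set.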

In [None]:
# You need an Azure account (a free trial works): https://portal.azure.com
# Then create a Computer Vision resource under 'Cognitive Services'
# to get the key and endpoint that go on the next two lines.
os.environ['COMPUTER_VISION_SUBSCRIPTION_KEY']='xxxx'
os.environ['COMPUTER_VISION_ENDPOINT']='yyyy'

In [None]:
# Add your Computer Vision subscription key to your environment variables.
if 'COMPUTER_VISION_SUBSCRIPTION_KEY' in os.environ:
    subscription_key = os.environ['COMPUTER_VISION_SUBSCRIPTION_KEY']
else:
    print("\nSet the COMPUTER_VISION_SUBSCRIPTION_KEY environment variable.\n**Restart your shell or IDE for changes to take effect.**")
    sys.exit()
# Add your Computer Vision endpoint to your environment variables.
if 'COMPUTER_VISION_ENDPOINT' in os.environ:
    endpoint = os.environ['COMPUTER_VISION_ENDPOINT']
else:
    print("\nSet the COMPUTER_VISION_ENDPOINT environment variable.\n**Restart your shell or IDE for changes to take effect.**")
    sys.exit()

In [None]:
computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))

## 3. Set up and load our target image.

Now the fun begins. The two blocks below do two things.

The first creates a variable called `remote_image_url` and gives it the information of where our image lives.

The second hands that location to Azure, which fetches the image and starts reading it.

Easiest solution: put your images in a GitHub repository. Once your image is uploaded to GitHub, right-click it and select 'copy image url'. The resulting URL will follow this pattern: https://raw.githubusercontent.com/shawngraham/demo/master/frontenac-card.png. The `raw.githubusercontent.com/your-name/your-repo/master/file-name.png` form is what you want.
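If you end up with the address of the file's page on github.com instead of the raw address, the pattern above can be rebuilt by hand. The sketch below is an optional helper, not one of the notebook's cells: `github_raw_url` is a hypothetical name, and it assumes the page address has the usual `github.com/<user>/<repo>/blob/<branch>/<path>` shape with a branch name that contains no slashes.

```python
from urllib.parse import urlparse


def github_raw_url(page_url):
    """Turn a github.com file-page URL into its raw.githubusercontent.com form.

    Expects https://github.com/<user>/<repo>/blob/<branch>/<path>.
    URLs that are already raw are returned unchanged.
    """
    parts = urlparse(page_url)
    if parts.netloc == "raw.githubusercontent.com":
        return page_url
    pieces = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(pieces) < 5 or pieces[2] != "blob":
        raise ValueError("not a github.com file page: " + page_url)
    user, repo, _blob, branch = pieces[:4]
    file_path = "/".join(pieces[4:])
    return "https://raw.githubusercontent.com/{}/{}/{}/{}".format(user, repo, branch, file_path)


# Illustrative page address, built from the pattern shown above
print(github_raw_url("https://github.com/shawngraham/demo/blob/master/frontenac-card.png"))
```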

Run the two code cells below in order.

In [None]:
remote_image_url = "http://data2.archives.ca/ap/c/c147108.jpg"

In [None]:
'''
Batch Read File - remote
This example sends the image at remote_image_url to Azure for text extraction.
The same API call reads both printed text and handwriting.
'''
print("===== Batch Read File - remote =====")
# Use the image URL set in the previous cell
remote_image_printed_text_url = remote_image_url
# remote_image_printed_text_url = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/printed_text.jpg"
# Call API with URL and raw response (allows you to get the operation location)
recognize_printed_results = computervision_client.batch_read_file(remote_image_printed_text_url, raw=True)

===== Batch Read File - remote =====


## 4. Now OCR that handwriting!

The next bit of code collects the results from Azure: it waits until Azure has finished reading the image we sent in step 3, then prints the text it has identified. Just run it. Copy the text to a file somewhere; then, for your next image, go back to step 3, change `remote_image_url`, run those two blocks, and run this one again.

ta da!
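The cell below checks in with Azure once a second until the job is no longer 'NotStarted' or 'Running', and that loop has no upper limit. Here is a sketch of the same waiting pattern with a time limit added. It does not talk to Azure at all: `wait_for_result` is a hypothetical helper, the operation URL is made up, and the list of statuses is a stand-in for what the service would report.

```python
import time


def wait_for_result(get_status, poll_seconds=1.0, max_wait_seconds=60.0, sleep=time.sleep):
    """Call get_status() until it stops returning 'NotStarted' or 'Running'.

    Returns the final status, or raises TimeoutError after max_wait_seconds
    of waiting. `sleep` can be swapped out so the example runs instantly.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status not in ("NotStarted", "Running"):
            return status
        if waited >= max_wait_seconds:
            raise TimeoutError("gave up after {:.0f} seconds".format(waited))
        sleep(poll_seconds)
        waited += poll_seconds


# The operation ID is simply the last piece of the operation URL (made-up URL here).
operation_location = "https://example.invalid/vision/read/operations/abc123"
operation_id = operation_location.split("/")[-1]
print(operation_id)

# Stand-in for Azure: pretends the job needs three checks before it is done.
fake_statuses = iter(["NotStarted", "Running", "Running", "Succeeded"])
final = wait_for_result(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final)
```

With the real client, `get_status` would be a small function that asks Azure for the operation's current status and returns it.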

In [None]:
# Get the operation location (URL with an ID at the end) from the response
operation_location_remote = recognize_printed_results.headers["Operation-Location"]
# Grab the ID from the URL
operation_id = operation_location_remote.split("/")[-1]

# Call the "GET" API and wait for it to retrieve the results
while True:
    get_printed_text_results = computervision_client.get_read_operation_result(operation_id)
    if get_printed_text_results.status not in ['NotStarted', 'Running']:
        break
    time.sleep(1)

# Print the detected text, line by line
if get_printed_text_results.status == TextOperationStatusCodes.succeeded:
    for text_result in get_printed_text_results.recognition_results:
        for line in text_result.lines:
            print(line.text)
            #print(line.bounding_box)
print()

Complete. So stripped la a bath . I found the
briden had burst and there was no hot water In
bathing . so of shaved smoothly - scrubbed my lice
and neck and arms - brushed my teeth, and .. .
bat down to speak to you and live with my
Children , Bless you all - make yourallies as
pure and sweet as flowers specked with crystal
tar . Love with all your hearts . everything and
beryone . What a game it is ! All of us , individually
Going our way - a different outlooks in nearly all
things and yet so Similar - Hour grand it is for
them who have companions near to hem - I'm
awful sorry for the lonely mes . They have it
half the chance, but then they are given often more
spirit to fight and their vision is very lived.
hell . my sweet little wife . (I've cultwated a sense of
Smell, and think of you as an odown . the same with
my children ) the clocks have just struck midnight . I'm
going to bed - to be guiltly with my own thoughts.
to he content - Is make the most of time between work
it's all w
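Rather than copying the text by hand, the lines could be gathered into a list and written to a file named after the image. This is only a sketch: `text_filename_for` and `save_lines` are hypothetical helpers, and the choice to save into the temporary folder is an assumption made so the example runs anywhere.

```python
import os
import tempfile
from urllib.parse import urlparse


def text_filename_for(image_url):
    """Name the text file after the image: .../c147108.jpg becomes c147108.txt."""
    image_name = os.path.basename(urlparse(image_url).path)
    stem, _extension = os.path.splitext(image_name)
    return stem + ".txt"


def save_lines(lines, image_url, folder):
    """Write one recognized line per row and return the path of the new file."""
    path = os.path.join(folder, text_filename_for(image_url))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


# In the notebook, `lines` could be built by appending line.text inside the
# printing loop. Here we use two lines copied from the output above.
lines = ["Complete. So stripped la a bath . I found the",
         "briden had burst and there was no hot water In"]
saved_path = save_lines(lines, "http://data2.archives.ca/ap/c/c147108.jpg", tempfile.gettempdir())
print(saved_path)
```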

## 5. And of course, do other things too.

You don't have to run these blocks, but you're welcome to try them out on other images (again, change the `remote_image_url` in Step 3).

All of this is free, as long as you make no more than 20 calls per minute and no more than 5,000 calls per month. Every time you see a comment in the code saying 'Call API', that's one call: we contact the Azure computer through its API, in essence saying, 'hey, here's another image, what text is in it?'.
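If you ever loop over many images, one way to stay under 20 calls per minute is to leave at least 60 / 20 = 3 seconds between calls. The sketch below shows that idea. `CallPacer` is a hypothetical helper, not part of the Azure SDK; it only spaces out calls made through it and does not keep track of the monthly total. The demonstration uses a fake clock so that nothing actually sleeps.

```python
import time


class CallPacer:
    """Keep calls at least 60 / calls_per_minute seconds apart.

    Hypothetical helper, not part of the Azure SDK. It only paces calls made
    through this one object and does not track the monthly total.
    """

    def __init__(self, calls_per_minute=20, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = 60.0 / calls_per_minute  # 20 per minute -> 3 seconds apart
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Sleep just long enough to keep the gap, then record the call time."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_gap - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


# Demonstration with a fake clock so nothing actually sleeps.
fake_now = [0.0]
naps = []


def fake_sleep(seconds):
    naps.append(seconds)
    fake_now[0] += seconds


pacer = CallPacer(calls_per_minute=20, clock=lambda: fake_now[0], sleep=fake_sleep)
for _ in range(3):
    pacer.wait()  # in the notebook, one 'Call API' line would follow each wait()
print(naps)
```

With the default `clock` and `sleep`, calling `pacer.wait()` before each 'Call API' line would pause for real.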

In [None]:
'''
Describe an Image - remote
This example describes the contents of an image with the confidence score.
'''
print("===== Describe an image - remote =====")
# Call API
description_results = computervision_client.describe_image(remote_image_url)

# Get the captions (descriptions) from the response, with confidence level
print("Description of remote image: ")
if len(description_results.captions) == 0:
    print("No description detected.")
else:
    for caption in description_results.captions:
        print("'{}' with confidence {:.2f}%".format(caption.text, caption.confidence * 100))

===== Describe an image - remote =====
Description of remote image: 
'a close up of text on a white background' with confidence 82.97%


In [None]:
'''
Categorize an Image - remote
This example extracts (general) categories from a remote image with a confidence score.
'''
print("===== Categorize an image - remote =====")
# Select the visual feature(s) you want.
remote_image_features = ["categories"]
# Call API with URL and features
categorize_results_remote = computervision_client.analyze_image(remote_image_url, remote_image_features)

# Print results with confidence score
print("Categories from remote image: ")
if len(categorize_results_remote.categories) == 0:
    print("No categories detected.")
else:
    for category in categorize_results_remote.categories:
        print("'{}' with confidence {:.2f}%".format(category.name, category.score * 100))

===== Categorize an image - remote =====
Categories from remote image: 
'abstract_' with confidence 0.78%
'others_' with confidence 0.39%
'text_' with confidence 36.33%
'text_menu' with confidence 22.27%


In [None]:
'''
Tag an Image - remote
This example returns a tag (key word) for each thing in the image.
'''
print("===== Tag an image - remote =====")
# Call API with remote image
tags_result_remote = computervision_client.tag_image(remote_image_url)

# Print results with confidence score
print("Tags in the remote image: ")
if len(tags_result_remote.tags) == 0:
    print("No tags detected.")
else:
    for tag in tags_result_remote.tags:
        print("'{}' with confidence {:.2f}%".format(tag.name, tag.confidence * 100))

===== Tag an image - remote =====
Tags in the remote image: 
'text' with confidence 99.93%
'handwriting' with confidence 96.85%
'letter' with confidence 92.40%
'menu' with confidence 73.72%
'document' with confidence 69.52%


In [None]:
'''
Detect Objects - remote
This example detects different kinds of objects with bounding boxes in a remote image.
'''
print("===== Detect Objects - remote =====")
# Use the image URL set in step 3
remote_image_url_objects = remote_image_url
# Call API with URL
detect_objects_results_remote = computervision_client.detect_objects(remote_image_url_objects)

# Print detected objects results with bounding boxes
print("Detecting objects in remote image:")
if len(detect_objects_results_remote.objects) == 0:
    print("No objects detected.")
else:
    # 'detected' avoids shadowing Python's built-in name 'object'
    for detected in detect_objects_results_remote.objects:
        print("object at location {}, {}, {}, {}".format(
            detected.rectangle.x, detected.rectangle.x + detected.rectangle.w,
            detected.rectangle.y, detected.rectangle.y + detected.rectangle.h))

===== Detect Objects - remote =====
Detecting objects in remote image:
No objects detected.


In [None]:
# Call API with content type (landmarks) and URL
detect_domain_results_landmarks = computervision_client.analyze_image_by_domain("landmarks", remote_image_url)
print()

print("Landmarks in the remote image:")
if len(detect_domain_results_landmarks.result["landmarks"]) == 0:
    print("No landmarks detected.")
else:
    for landmark in detect_domain_results_landmarks.result["landmarks"]:
        print(landmark["name"])


Landmarks in the remote image:
No landmarks detected.


In [None]:
'''
Detect Color - remote
This example detects aspects of a remote image's color scheme.
'''
print("===== Detect Color - remote =====")
# Select the feature(s) you want
remote_image_features = ["color"]
# Call API with URL and features
detect_color_results_remote = computervision_client.analyze_image(remote_image_url, remote_image_features)

# Print results of color scheme
print("Getting color scheme of the remote image: ")
print("Is black and white: {}".format(detect_color_results_remote.color.is_bw_img))
print("Accent color: {}".format(detect_color_results_remote.color.accent_color))
print("Dominant background color: {}".format(detect_color_results_remote.color.dominant_color_background))
print("Dominant foreground color: {}".format(detect_color_results_remote.color.dominant_color_foreground))
print("Dominant colors: {}".format(detect_color_results_remote.color.dominant_colors))

===== Detect Color - remote =====
Getting color scheme of the remote image: 
Is black and white: False
Accent color: 3B2417
Dominant background color: Grey
Dominant foreground color: Grey
Dominant colors: ['Grey']
