# Try Azure Form Recognizer with the Python SDK

You can use Azure's Form Recognizer service to analyze and extract data from:
* Forms
* Invoices
* Receipts
* Business cards
* ID documents (e.g. passports and licences)

In this notebook, you'll learn how to use the Form Recognizer Python SDK to call the service and handle the response, all in 10 minutes or less. Each of the examples below reads an image or document from a data store on GitHub. However, with the Form Recognizer SDK you can also analyze local files or files stored in Azure Blob Storage. See the [Python SDK reference](https://docs.microsoft.com/en-us/python/api/azure-ai-formrecognizer/azure.ai.formrecognizer.formrecognizerclient?view=azure-python) for more info.

**Note**: This notebook doesn't cover custom models, but we have a notebook for that as well. You can use custom models with any of the operations that you learn about and test in this notebook.


## Before you get started

You'll need:

* An [Azure subscription](https://azure.microsoft.com/en-us/free/cognitive-services/)
* An [Azure Form Recognizer resource](https://ms.portal.azure.com/#create/Microsoft.CognitiveServicesFormRecognizer) in the Free (F0) or Standard (S0) pricing tier. Both will work for this notebook.
* The Form Recognizer client library installed in your environment. We strongly recommend that you run all of these notebooks in a virtual environment (virtualenv, venv, pyenv, pipenv, etc.). Run this command from your terminal/command line: `pip install azure-ai-formrecognizer --pre`.

## Data sources

In this notebook, we're going to analyze images stored in a GitHub repository. However, with Form Recognizer you can also read files from Azure Blob Storage, from another URI location, or from your local machine.

* To read from an Azure Blob Storage SAS URL or another URI location, you can use the same methods we go over in this notebook.
* To read from a local file, see [Reference](#reference).
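
As a rough illustration of the local-file case, here is a minimal sketch. It assumes the client's `begin_recognize_content()` method, which takes a byte stream rather than a URL; the helper name `analyze_local_layout` is ours and not part of the SDK.

```python
def analyze_local_layout(client, path):
    """Send a local file to the layout operation and return the recognized pages.

    `client` is expected to be a FormRecognizerClient (or any object that
    exposes a compatible `begin_recognize_content` method).
    """
    # Open in binary mode: the service expects the raw bytes of the document.
    with open(path, "rb") as document:
        poller = client.begin_recognize_content(form=document)
        # Wait for the result while the file is still open.
        return poller.result()
```

Once you've created `form_recognizer_client` later in this notebook, you could call it as `analyze_local_layout(form_recognizer_client, "my_form.jpg")`, where `my_form.jpg` is a placeholder for your own file.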

## Import modules

The first thing we need to do is import a few modules. Here's what they are and what you'll use them for:

1. `os` - This module is used to interact with the operating system using Python.
2. `azure.core.exceptions`
   * `ResourceNotFoundError` - An error response, typically triggered by a 412 response (for update) or 404 (for get/post).
3. `azure.ai.formrecognizer` - This module includes all Form Recognizer classes and components, like the `FormRecognizerClient` and `FormTrainingClient`.
   * `FormRecognizerClient` - Used to create a Form Recognizer client which you'll use to interact with the service.
4. `azure.core.credentials` - This module is used to manage Azure credentials. Specifically, we are using `AzureKeyCredential`.
   * `AzureKeyCredential` - Provides the ability to update the key without creating a new client.

In [None]:
import os
from azure.core.exceptions import ResourceNotFoundError
from azure.ai.formrecognizer import FormRecognizerClient
from azure.ai.formrecognizer import FormTrainingClient
from azure.core.credentials import AzureKeyCredential

## Create a Form Recognizer client

Let's create a Form Recognizer client, which we'll use to send requests and get responses. 

Before you continue, you'll need to add the endpoint and key from your Form Recognizer resource. Both of these are available in the Azure portal, in the **Keys and Endpoint** blade of the resource you've created.

* `endpoint` - The endpoint URL for your Form Recognizer resource. For example: "https://YOUR-NAME.cognitiveservices.azure.com/"
* `key` - The key for your Form Recognizer resource. 

In [None]:
endpoint = 'PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT'
key = 'PASTE_YOUR_FORM_RECOGNIZER_KEY'
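
If you'd rather not paste secrets into the notebook, one option is to read them from environment variables with the `os` module imported above. This is only a sketch: the variable names `FORM_RECOGNIZER_ENDPOINT` and `FORM_RECOGNIZER_KEY` are our own choice, not something the service or SDK defines.

```python
import os


def load_form_recognizer_settings(env=os.environ):
    """Return (endpoint, key) read from environment variables.

    The variable names below are hypothetical; use whatever names you
    set in your own environment.
    """
    endpoint = env.get("FORM_RECOGNIZER_ENDPOINT")
    key = env.get("FORM_RECOGNIZER_KEY")
    if not endpoint or not key:
        # Fail early with a clear message instead of sending a bad request.
        raise ValueError(
            "Set FORM_RECOGNIZER_ENDPOINT and FORM_RECOGNIZER_KEY first."
        )
    return endpoint, key
```

With both variables set, `endpoint, key = load_form_recognizer_settings()` would replace the two assignments above.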

In [None]:
form_recognizer_client = FormRecognizerClient(endpoint, AzureKeyCredential(key))

## Analyze the layout of a document

Here we're going to analyze a purchase order document. With the `begin_recognize_content_from_url()` method, we can pass an image to the Form Recognizer service, and get a response that includes:

* Tables identified in the document
* Text identified in each cell of a table
* The location of each cell
* A confidence score for each cell, which indicates how accurate the service estimates its recognition to be. The closer a score is to `1`, the higher the estimated accuracy of the result.

We're going to analyze an [image in the Azure samples repository](https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/forms/Form_1.jpg), but feel free to test with your own documents.


In [None]:
sample_form = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/forms/Form_1.jpg"

poller = form_recognizer_client.begin_recognize_content_from_url(sample_form)
pages = poller.result()  # a list with one entry per page in the document

table = pages[0].tables[0]  # first table on the first page
print("Table found on page {}:".format(table.page_number))
for cell in table.cells:
    print("Cell text: {}".format(cell.text))
    print("Location: {}".format(cell.bounding_box))
    print("Confidence score: {}\n".format(cell.confidence))
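
The cell above prints only the first table, one cell at a time. As a rough sketch, the helpers below walk every page and table and arrange the cells into rows using each cell's `row_index` and `column_index`, together with the table's `row_count` and `column_count`. The helper names are ours, and cells that span several rows or columns are placed only at their starting position.

```python
def table_to_rows(table):
    """Arrange a recognized table's cells into a list of rows of cell text."""
    # Start with an empty grid so positions without a cell stay as "".
    grid = [
        ["" for _ in range(table.column_count)]
        for _ in range(table.row_count)
    ]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.text
    return grid


def print_all_tables(pages):
    """Print every table on every page, one row per line."""
    for page in pages:
        for table_idx, table in enumerate(page.tables):
            print("Page {}, table {}:".format(page.page_number, table_idx + 1))
            for row in table_to_rows(table):
                print(" | ".join(row))
```

Calling `print_all_tables(poller.result())` on the response above should give a quick overview of the table layout.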

## Extract data from invoices

Here we're going to analyze an invoice. With the `begin_recognize_invoices_from_url()` method, we can pass an image to the Form Recognizer service, and get a response that includes:

* Vendor name
* Vendor address
* Customer name
* Customer address
* Customer address recipient
* Invoice ID
* Invoice date
* Invoice total
* Due date
* Confidence score

We're going to analyze an [invoice in the Azure samples repository](https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png), but feel free to test with your own invoice.

In [None]:
sample_invoice = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png"

poller = form_recognizer_client.begin_recognize_invoices_from_url(sample_invoice)
invoices = poller.result()

for idx, invoice in enumerate(invoices):
    print("--------Recognizing invoice #{}--------".format(idx+1))
    vendor_name = invoice.fields.get("VendorName")
    if vendor_name:
        print("Vendor Name: {} has confidence: {}".format(vendor_name.value, vendor_name.confidence))
    vendor_address = invoice.fields.get("VendorAddress")
    if vendor_address:
        print("Vendor Address: {} has confidence: {}".format(vendor_address.value, vendor_address.confidence))
    customer_name = invoice.fields.get("CustomerName")
    if customer_name:
        print("Customer Name: {} has confidence: {}".format(customer_name.value, customer_name.confidence))
    customer_address = invoice.fields.get("CustomerAddress")
    if customer_address:
        print("Customer Address: {} has confidence: {}".format(customer_address.value, customer_address.confidence))
    customer_address_recipient = invoice.fields.get("CustomerAddressRecipient")
    if customer_address_recipient:
        print("Customer Address Recipient: {} has confidence: {}".format(customer_address_recipient.value, customer_address_recipient.confidence))
    invoice_id = invoice.fields.get("InvoiceId")
    if invoice_id:
        print("Invoice Id: {} has confidence: {}".format(invoice_id.value, invoice_id.confidence))
    invoice_date = invoice.fields.get("InvoiceDate")
    if invoice_date:
        print("Invoice Date: {} has confidence: {}".format(invoice_date.value, invoice_date.confidence))
    invoice_total = invoice.fields.get("InvoiceTotal")
    if invoice_total:
        print("Invoice Total: {} has confidence: {}".format(invoice_total.value, invoice_total.confidence))
    due_date = invoice.fields.get("DueDate")
    if due_date:
        print("Due Date: {} has confidence: {}".format(due_date.value, due_date.confidence))
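
Because each field above is handled the same way, the lookups can be driven by a list of field names. The following is a sketch rather than the SDK's own approach: `describe_fields` is a helper we made up, and it labels each line with the raw field name (for example `VendorName`) instead of a friendly label.

```python
# Field names used in the cell above.
INVOICE_FIELDS = [
    "VendorName",
    "VendorAddress",
    "CustomerName",
    "CustomerAddress",
    "CustomerAddressRecipient",
    "InvoiceId",
    "InvoiceDate",
    "InvoiceTotal",
    "DueDate",
]


def describe_fields(fields, names):
    """Return one summary line for each named field that was recognized."""
    lines = []
    for name in names:
        field = fields.get(name)
        if field:  # skip fields the service did not return
            lines.append(
                "{}: {} has confidence: {}".format(
                    name, field.value, field.confidence
                )
            )
    return lines
```

For example, inside the loop you could run `for line in describe_fields(invoice.fields, INVOICE_FIELDS): print(line)`.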

## Extract data from receipts

Here we're going to analyze a receipt. With the `begin_recognize_receipts_from_url()` method, we can pass a receipt to the Form Recognizer service, and get a response that includes:

* Confidence score
* Merchant name
* Merchant address
* Merchant phone number
* Transaction date
* Transaction time
* Receipt items
* Subtotal
* Tax
* Total

We're going to analyze a [receipt in the Azure samples repository](https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/receipt/contoso-receipt.png), but feel free to test with your own receipts.


In [None]:
sample_receipt = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/receipt/contoso-receipt.png"

poller = form_recognizer_client.begin_recognize_receipts_from_url(sample_receipt)
result = poller.result()

for receipt in result:
    for name, field in receipt.fields.items():
        if name == "Items":
            print("Receipt Items:")
            for idx, items in enumerate(field.value):
                print("...Item #{}".format(idx + 1))
                for item_name, item in items.value.items():
                    print("......{}: {} has confidence {}".format(item_name, item.value, item.confidence))
        else:
            print("{}: {} has confidence {}".format(name, field.value, field.confidence))
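
If you want to work with the line items rather than print them, one possibility is to flatten the `Items` field into plain dictionaries. This sketch relies only on the structure used in the cell above (a list-valued `Items` field whose entries hold a dictionary of sub-fields); the function name is hypothetical.

```python
def receipt_items(receipt):
    """Return the receipt's line items as a list of {sub-field name: value} dicts."""
    items_field = receipt.fields.get("Items")
    if not items_field:
        return []  # the service found no line items
    rows = []
    for item in items_field.value:
        # Each item holds a dictionary of named sub-fields.
        rows.append(
            {name: sub_field.value for name, sub_field in item.value.items()}
        )
    return rows
```

The resulting list can be passed to other tools, for instance to compute totals or to build a table.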

## Extract data from business cards

Here we're going to extract data from a business card. With the `begin_recognize_business_cards_from_url()` method, we can pass a business card to the Form Recognizer service, and get a response that includes:

* Contact First Name
* Contact Last Name
* Company Name
* Department
* Job Title
* Email
* Website
* Address
* Mobile phone number
* Fax number
* Work phone number
* Other phone number

We're going to analyze a [business card in the Azure samples repository](https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg), but feel free to test with your own business card image.

In [None]:
sample_biz_card = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg"

poller = form_recognizer_client.begin_recognize_business_cards_from_url(sample_biz_card)
business_cards = poller.result()

for idx, business_card in enumerate(business_cards):
    print("--------Recognizing business card #{}--------".format(idx+1))
    contact_names = business_card.fields.get("ContactNames")
    if contact_names:
        for contact_name in contact_names.value:
            print("Contact First Name: {} has confidence: {}".format(
                contact_name.value["FirstName"].value, contact_name.value["FirstName"].confidence
            ))
            print("Contact Last Name: {} has confidence: {}".format(
                contact_name.value["LastName"].value, contact_name.value["LastName"].confidence
            ))
    company_names = business_card.fields.get("CompanyNames")
    if company_names:
        for company_name in company_names.value:
            print("Company Name: {} has confidence: {}".format(company_name.value, company_name.confidence))
    departments = business_card.fields.get("Departments")
    if departments:
        for department in departments.value:
            print("Department: {} has confidence: {}".format(department.value, department.confidence))
    job_titles = business_card.fields.get("JobTitles")
    if job_titles:
        for job_title in job_titles.value:
            print("Job Title: {} has confidence: {}".format(job_title.value, job_title.confidence))
    emails = business_card.fields.get("Emails")
    if emails:
        for email in emails.value:
            print("Email: {} has confidence: {}".format(email.value, email.confidence))
    websites = business_card.fields.get("Websites")
    if websites:
        for website in websites.value:
            print("Website: {} has confidence: {}".format(website.value, website.confidence))
    addresses = business_card.fields.get("Addresses")
    if addresses:
        for address in addresses.value:
            print("Address: {} has confidence: {}".format(address.value, address.confidence))
    mobile_phones = business_card.fields.get("MobilePhones")
    if mobile_phones:
        for phone in mobile_phones.value:
            print("Mobile phone number: {} has confidence: {}".format(phone.value, phone.confidence))
    faxes = business_card.fields.get("Faxes")
    if faxes:
        for fax in faxes.value:
            print("Fax number: {} has confidence: {}".format(fax.value, fax.confidence))
    work_phones = business_card.fields.get("WorkPhones")
    if work_phones:
        for work_phone in work_phones.value:
            print("Work phone number: {} has confidence: {}".format(work_phone.value, work_phone.confidence))
    other_phones = business_card.fields.get("OtherPhones")
    if other_phones:
        for other_phone in other_phones.value:
            print("Other phone number: {} has confidence: {}".format(other_phone.value, other_phone.confidence))
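
Most business card fields are lists, so a small helper can keep only the entries the service is reasonably sure about. This is a sketch under two assumptions: the `0.8` threshold is an arbitrary example value, and both helper names are ours. The second helper uses `.get()` so that a card with no recognized first or last name doesn't raise a `KeyError`.

```python
def confident_values(field, min_confidence=0.8):
    """Return values from a list-valued field that meet the confidence threshold."""
    if not field:
        return []
    return [
        entry.value
        for entry in field.value
        if entry.confidence is not None and entry.confidence >= min_confidence
    ]


def contact_full_names(contact_names):
    """Join the FirstName and LastName sub-fields of each contact, when present."""
    names = []
    if not contact_names:
        return names
    for contact in contact_names.value:
        parts = []
        for key in ("FirstName", "LastName"):
            part = contact.value.get(key)  # may be missing on some cards
            if part and part.value:
                parts.append(part.value)
        names.append(" ".join(parts))
    return names
```

For example, `confident_values(business_card.fields.get("Emails"))` would return only the email addresses with a confidence of at least `0.8`.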

## Reference

There is a lot more that you can do with Azure Form Recognizer. For a deeper dive into specific operations, see the [Python SDK reference](https://docs.microsoft.com/en-us/python/api/azure-ai-formrecognizer/?view=azure-python).

## Sample code

* The sample code used in this notebook is also available in the [Python SDK repository](https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/formrecognizer/azure-ai-formrecognizer).
* Learn how to get started in C#, Java, and JavaScript on [Microsoft Docs](https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/client-library).

## Learn more about customization

Coming soon... 