# libpostal Tutorial

The pre-built vertical analyzers ([receipt](https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/concept-receipts) and [business card](https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/concept-business-cards)) of the [Form Recognizer](https://azure.microsoft.com/en-us/services/cognitive-services/form-recognizer/) service extract information from documents, including addresses. The addresses are returned as raw text, but sometimes we want more detailed address information, such as street names, cities, and ZIP Codes.

We can use the open source tool _libpostal_ to parse full addresses into structured components. The core library of _libpostal_ is written in C, and language bindings are available for Python, Ruby, Java, PHP, and NodeJS.

This tutorial demonstrates how to use the _libpostal_ Python binding to extract structured address information from analyze results.

### Reference

* libpostal - https://github.com/openvenues/libpostal

## Installation

### Python 3

Install Python 3.4 or above, as _libpostal_ suggests.

### libpostal

To install _libpostal_ on Windows, follow [these instructions](https://github.com/openvenues/libpostal#installation-windows). On macOS or Ubuntu, follow [these instructions](https://github.com/openvenues/libpostal#installation-maclinux) instead. After you have installed the dependencies and compiled _libpostal_, run the following command to install the _libpostal_ Python package.

In [1]:
# You only need to execute this cell once.
# If both Python 2 and Python 3 are on your machine, please try with `pip3 install postal`.
# If a permission error occurs, try running the command as administrator.
#  - On Windows, open the Command Prompt as Administrator, and then run `pip install postal` or `pip3 install postal`.
#  - On macOS or Ubuntu, try `sudo pip install postal` or `sudo pip3 install postal`.
# After running this command, please restart your IPython kernel.

%pip install postal

Note: you may need to restart the kernel to use updated packages.
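
After restarting the kernel, a quick way to confirm that the package is visible to the interpreter is to look it up on the import path. The snippet below is a minimal sketch using only the standard library; the helper name `is_package_available` is hypothetical and not part of _libpostal_. Note that it only checks that the `postal` package can be found, not that the underlying C library loads correctly.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if a top-level package called `name` is on the import path."""
    # find_spec returns None when the package cannot be located.
    return importlib.util.find_spec(name) is not None


if is_package_available("postal"):
    print("postal package found")
else:
    print("postal package not found; re-run the install cell and restart the kernel")
```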


## Get Started

Execute the `parse_address` cell below. _libpostal_ is installed successfully if the cell returns this output:

```python
[('the book club', 'house'),
 ('100-106', 'house_number'),
 ('leonard st', 'road'),
 ('shoreditch', 'suburb'),
 ('london', 'city'),
 ('ec2a 4rh', 'postcode'),
 ('united kingdom', 'country')]
```

In [2]:
from postal.parser import parse_address
parse_address('The Book Club 100-106 Leonard St Shoreditch London EC2A 4RH, United Kingdom')

[('the book club', 'house'),
 ('100-106', 'house_number'),
 ('leonard st', 'road'),
 ('shoreditch', 'suburb'),
 ('london', 'city'),
 ('ec2a 4rh', 'postcode'),
 ('united kingdom', 'country')]
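
`parse_address` returns a list of `(value, label)` pairs, which is awkward when you only need one field such as the postcode. The sketch below shows one possible way to turn that list into a dictionary keyed by label. The helper `components_to_dict` is a hypothetical name introduced here for illustration, and joining repeated labels with a space is an assumption about how you might want to handle a label that appears more than once. The input is the output shown above, copied in as a literal so the snippet runs without _libpostal_.

```python
from collections import defaultdict


def components_to_dict(parsed):
    """Group (value, label) pairs by label and return a plain dict."""
    grouped = defaultdict(list)
    for value, label in parsed:
        grouped[label].append(value)
    # Assumption: values sharing a label are joined with a single space.
    return {label: " ".join(values) for label, values in grouped.items()}


# Output of parse_address from the cell above, copied as a literal.
parsed = [
    ("the book club", "house"),
    ("100-106", "house_number"),
    ("leonard st", "road"),
    ("shoreditch", "suburb"),
    ("london", "city"),
    ("ec2a 4rh", "postcode"),
    ("united kingdom", "country"),
]

fields = components_to_dict(parsed)
print(fields["road"])      # leonard st
print(fields["postcode"])  # ec2a 4rh
```

In a live session you would pass the return value of `parse_address(...)` directly to the helper instead of the literal list.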

## Parse Address from Analyze Result of Business Cards

The image below is a sample en-US business card for Chris Smith of the Cloud & AI Department at Contoso. The address on the card is "4001 1st Ave NE Redmond, WA 98052". The JSON file `samples/bizcard_regular_en_us.json` holds the analyze result returned for the image `samples/bizcard_regular_en_us.jpg` by the `pre-built businessCard v2.1.1 API`.

![Business Card Sample](samples/bizcard_regular_en_us.small.jpg)

The following script parses the extracted addresses and prints them out.

In [3]:
import json
from postal.parser import parse_address

# Load the analyze result
with open('samples/bizcard_regular_en_us.json') as fp:
    data = json.load(fp)

addresses = data['analyzeResult']['documentResults'][0]['fields']['Addresses']
for address in addresses['valueArray']:
    # Get the full address
    full_address = address['valueString']
    # Parse the address
    parsed_address = parse_address(full_address)
    print('Full address:\t', full_address)
    print('Parsed address:\t', parsed_address)

Full address:	 4001 1st Ave NE Redmond, WA 98052
Parsed address:	 [('4001', 'house_number'), ('1st ave ne', 'road'), ('redmond', 'city'), ('wa', 'state'), ('98052', 'postcode')]
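
The script above indexes straight into `documentResults[0]` and `fields['Addresses']`, so it raises an error if a card has no detected address. The sketch below shows one possible defensive variant that walks the same keys with `dict.get` and skips anything missing. The helper name `iter_address_strings` is hypothetical, and the `sample` dictionary is a trimmed, hand-written stand-in that contains only the keys the script above reads; it is not the full analyze result.

```python
def iter_address_strings(analyze_result):
    """Yield every address string found in a business card analyze result."""
    documents = analyze_result.get("analyzeResult", {}).get("documentResults", [])
    for document in documents:
        addresses = document.get("fields", {}).get("Addresses")
        if not addresses:
            # No address was detected on this card; skip it.
            continue
        for item in addresses.get("valueArray", []):
            text = item.get("valueString")
            if text:
                yield text


# Hand-written stand-in with only the keys used by the script above.
sample = {
    "analyzeResult": {
        "documentResults": [
            {
                "fields": {
                    "Addresses": {
                        "valueArray": [
                            {"valueString": "4001 1st Ave NE Redmond, WA 98052"}
                        ]
                    }
                }
            },
            {"fields": {}},  # a document without any address
        ]
    }
}

print(list(iter_address_strings(sample)))
# ['4001 1st Ave NE Redmond, WA 98052']
```

Each yielded string can then be passed to `parse_address` exactly as in the script above.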
