### PDF Layout analyzer with Form Recognizer Container

<img src="https://i.imgur.com/XBP3utx.png" width="240" height="240" align="left" style="padding-right: 20px;"/>

[**Sample PDF file from GitHub**](https://github.com/mdrakiburrahman/form-recognizer/blob/main/artifacts/mortgage.pdf)

We'll have the container analyze the PDF in two ways: <br>
**1** - Using the container's outbound internet access to fetch the file from the GitHub repo (for demo) <br>
**2** - Reading the file client side (from local disk) and passing its bytes in as `octet-stream` (no outbound access required)
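Both options call the same `layout/syncAnalyze` route; they differ only in the `Content-Type` header and in how the body is sent. A minimal sketch of that difference, assuming a hypothetical helper `build_layout_request` (not part of any SDK) whose result is unpacked into `requests.post`:

```python
from typing import Union


def build_layout_request(source: Union[str, bytes]) -> dict:
    """Return keyword arguments for requests.post (hypothetical helper).

    A str is treated as a URL the container must download itself
    (needs outbound access); bytes are sent directly as the body
    (no outbound access needed).
    """
    headers = {"accept": "application/json"}
    if isinstance(source, bytes):
        # Option 2: raw file bytes in the request body
        headers["Content-Type"] = "application/octet-stream"
        return {"headers": headers, "data": source}
    # Option 1: JSON body pointing the container at a URL
    headers["Content-Type"] = "application/json"
    return {"headers": headers, "json": {"source": source}}
```

Usage would look like `requests.post(endpoint, **build_layout_request(file))` for a URL, or the same call with the file's bytes for the local case.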

### 1. Using outbound internet access on the container to hit the GitHub repo (for demo)

In [1]:
import requests
import json

In [2]:
endpoint = 'http://localhost:5000/formrecognizer/v2.1/layout/syncAnalyze?language=en&readingOrder=natural'
file = 'https://raw.githubusercontent.com/mdrakiburrahman/form-recognizer/main/artifacts/mortgage.pdf'

# The container downloads the PDF itself from the URL passed as "source"
response = requests.post(
    endpoint,
    headers={'accept': 'application/json', 'Content-Type': 'application/json'},
    json={'source': file},
)
response.raise_for_status()  # fail loudly on HTTP errors instead of printing an error body

print(json.dumps(response.json(), indent=4, sort_keys=True))

{
    "analyzeResult": {
        "pageResults": [
            {
                "page": 1,
                "tables": []
            },
            {
                "page": 2,
                "tables": []
            }
        ],
        "readResults": [
            {
                "angle": 0,
                "height": 11,
                "language": "en",
                "lines": [
                    {
                        "appearance": {
                            "style": {
                                "confidence": 1,
                                "name": "other"
                            }
                        },
                        "boundingBox": [
                            3.0818,
                            1.0345,
                            5.4234,
                            1.0345,
                            5.4234,
                            1.1572,
                            3.0818,
                            1.1572
                        ],
  
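Each `boundingBox` is a flat list of eight numbers: the (x, y) coordinates of the line's four corners. As I understand the v2.1 API reference, points run clockwise from the top-left, the origin is the page's top-left corner, and for PDFs the unit is inches (consistent with the page `height` of 11 above, a US Letter page). A minimal sketch that turns the first line's box into corner points and a size; `box_to_points` and `box_size` are hypothetical helpers:

```python
def box_to_points(box):
    """Group a flat 8-number boundingBox into four (x, y) corner points."""
    if len(box) != 8:
        raise ValueError("expected 8 numbers")
    return [(box[i], box[i + 1]) for i in range(0, 8, 2)]


def box_size(box):
    """Width and height of the axis-aligned rectangle enclosing the box."""
    xs = box[0::2]  # even positions are x coordinates
    ys = box[1::2]  # odd positions are y coordinates
    return max(xs) - min(xs), max(ys) - min(ys)


# boundingBox of the first line in the output above
first_line_box = [3.0818, 1.0345, 5.4234, 1.0345, 5.4234, 1.1572, 3.0818, 1.1572]
print(box_to_points(first_line_box))
width, height = box_size(first_line_box)
print(f"{width:.4f} x {height:.4f} inches")  # 2.3416 x 0.1227 inches
```

Taking min/max rather than assuming corner order keeps the size correct even if a line is slightly rotated.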

### 2. Converting the content of the file client side (from local) and passing it in as `octet-stream` (no outbound access required)

In [3]:
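# Read the PDF as raw bytes; despite the variable name, nothing is base64-encoded here
with open(r"C:\Users\mdrrahman\Documents\GitHub\form-recognizer\artifacts\mortgage.pdf", "rb") as pdf_file:
    encoded_string = pdf_file.read()

print(encoded_string[:100])  # peek at the first 100 bytes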

b'%PDF-1.4\r%\xe2\xe3\xcf\xd3\r\n9 0 obj<</H[516 157]/Linearized 1/E 4284/L 8947/N 2/O 12/T 8721>>\rendobj\r           '
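The bytes start with `%PDF-1.4`, the PDF header. Because the `octet-stream` request sends whatever bytes it is given, a cheap client-side check can catch the wrong file before the round trip. A minimal sketch; `looks_like_pdf` is a hypothetical helper, and a matching header does not guarantee the rest of the file is valid:

```python
def looks_like_pdf(data: bytes) -> bool:
    """Strict check: True if the bytes begin with the PDF signature '%PDF-'."""
    return data.startswith(b"%PDF-")


# First bytes from the output above, and a PNG signature for contrast
print(looks_like_pdf(b"%PDF-1.4\r%\xe2\xe3\xcf\xd3\r\n"))  # True
print(looks_like_pdf(b"\x89PNG\r\n\x1a\n"))               # False
```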


In [4]:
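endpoint = 'http://localhost:5000/formrecognizer/v2.1/layout/syncAnalyze?language=en&readingOrder=natural'

# Send the PDF bytes directly in the request body; the container needs no outbound access
response = requests.post(
    endpoint,
    headers={'accept': 'application/json', 'Content-Type': 'application/octet-stream'},
    data=encoded_string,
)
response.raise_for_status()  # fail loudly on HTTP errors instead of printing an error body

print(json.dumps(response.json(), indent=4, sort_keys=True))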

{
    "analyzeResult": {
        "pageResults": [
            {
                "page": 1,
                "tables": []
            },
            {
                "page": 2,
                "tables": []
            }
        ],
        "readResults": [
            {
                "angle": 0,
                "height": 11,
                "language": "en",
                "lines": [
                    {
                        "appearance": {
                            "style": {
                                "confidence": 1,
                                "name": "other"
                            }
                        },
                        "boundingBox": [
                            3.0818,
                            1.0345,
                            5.4234,
                            1.0345,
                            5.4234,
                            1.1572,
                            3.0818,
                            1.1572
                        ],
  
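Both calls return the same result; the printout is cut off after the first line's `boundingBox`. In the v2.1 layout response, each entry in `readResults[*].lines` also carries the recognized `text` (plus its `words`), and each `readResults` entry has a `page` number, so the document text can be pulled out in reading order. A minimal sketch; `extract_lines` is a hypothetical helper, and `fake_result` is a hand-made stand-in for `response.json()`, not real service output:

```python
def extract_lines(result: dict) -> list:
    """Collect (page, text) pairs for every line in a v2.1 layout result."""
    lines = []
    for page in result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            lines.append((page.get("page"), line.get("text", "")))
    return lines


# Hypothetical, hand-made stand-in for response.json(); not real output
fake_result = {
    "analyzeResult": {
        "readResults": [
            {"page": 1, "lines": [{"text": "first line"}, {"text": "second line"}]},
            {"page": 2, "lines": [{"text": "third line"}]},
        ]
    }
}

for page, text in extract_lines(fake_result):
    print(page, text)
```

Against the live container, pass `response.json()` instead of `fake_result`.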