# Experiment with the Top 2 OCR Tools in Jupyter

Description: 

Figure out the interface between Jupyter and the APIs/models of the top 2 OCR tools.
Consider using a separate notebook for each OCR tool.

Learn how the Python requests module works to interface with APIs.

Deliverables:

Successfully interact with an API via Jupyter and receive a valid JSON response.

Successfully generate predictions in Jupyter for the top OCR models. In a notebook, be able to feed a static image into each tool and receive a LaTeX string.

Possibly render a LaTeX string as math in the notebook.

# Mathpix OCR


## useful links
https://medium.com/swlh/using-and-calling-an-api-with-python-494a18cb1f44

In [3]:
# pip install requests
# pip install --upgrade requests

import base64
import json
import sys

import requests

from IPython.display import Markdown as md

# Working with API requests

In [2]:
r = requests.get('https://www.romexchange.com/')

In [3]:
r.status_code

406

We get a 406 (Not Acceptable), so the server rejected the request as sent.

We can try sending headers that the server is more likely to accept, rather than a bare request.
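As a quick way to decode status codes like this one, the standard library's http.HTTPStatus enum maps a numeric code to its reason phrase. This is a minimal sketch; the helper name describe_status is only for illustration and is not part of requests.

```python
from http import HTTPStatus


def describe_status(code):
    """Return 'code phrase' for an HTTP status code such as r.status_code."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not define
        return f"{code} (unknown status)"


print(describe_status(406))
print(describe_status(200))
```

In the notebook, describe_status(r.status_code) could be called right after a request to get a readable label.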

In [4]:
url = 'https://www.romexchange.com/'

headers = { 'Content-type': 'application/json'}

In [5]:
r = requests.get(url, headers = headers)
r.status_code

406

This still does not work, but it's closer...
The problem is that the default Python user agent, 'python-requests/2.21.0', is likely being blocked, so we'll set a different one.

In [6]:
url = 'https://www.romexchange.com/'

headers = {'User-Agent': 'XY', 'Content-type':'application/json'}

r = requests.get(url, headers=headers)

r.status_code

200

This returns a 200, so the request was valid.

To look at the content, we can read the .text attribute of the response.

In [7]:
r.text

'<!doctype html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width,initial-scale=1,shrink-to-fit=no"><meta name="theme-color" content="#000000"><meta name="description" content="Track price changes of any item in Ragnarok M on the Global and SEA servers. Easily see which item prices are rising or falling the fastest and compare between servers to gain an advantage in trading."><meta name="keywords" content="Ragnarok, Online, Mobile, Eternal Love, Exchange, History, ROM, Ragnarok M, RO, Price, Market, Tracker, Global, SEA"><meta property="og:site_name" content="ROM Exchange"><meta property="og:title" content="ROM Exchange - Ragnarok M: Eternal Love Exchange Price History"><meta property="og:description" content="Track price changes of any item in Ragnarok M on the Global and SEA servers. Easily see which item prices are rising or falling the fastest and compare between servers to gain an advantage in trading."><meta property="og:image" content=
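To reproduce the User-Agent effect without depending on an external site, here is a sketch with a toy local server that answers 406 to default Python client agents and 200 to anything else. It uses the standard library's urllib instead of requests so that it runs with no extra installs. PickyHandler and fetch_status are hypothetical names, and the blocking rule is an assumption about how such a site might behave, not a description of the real server.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Toy handler: rejects default Python client user agents with 406."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "").lower()
        blocked = agent.startswith(("python-requests", "python-urllib"))
        body = b"" if blocked else b"hello"
        self.send_response(406 if blocked else 200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # keep the notebook output quiet
        pass


def fetch_status(url, headers=None):
    """GET the url and return the HTTP status code, even for error responses."""
    request = urllib.request.Request(url, headers=headers or {})
    # an empty ProxyHandler makes sure the local request bypasses any proxy
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


# port 0 lets the operating system pick a free port
server = HTTPServer(("127.0.0.1", 0), PickyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}/"

default_status = fetch_status(base_url)                       # default urllib agent
custom_status = fetch_status(base_url, {"User-Agent": "XY"})  # custom agent

server.shutdown()
server.server_close()
print(default_status, custom_status)
```

The only difference between the two calls is the User-Agent header, which mirrors the change between cells In [5] and In [6].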

# Applying this to MathPix

https://mathpix.com/docs/ocr/examples

https://docs.mathpix.com/#process-image-v3-text

https://api.mathpix.com/v3/text

## Write the Code

Send an API request and get the response back.
Transform this response.

Start with a picture of a hand-written equation.

In [9]:
# example request body with optional fields (this bare literal is not sent below)
{
    "src": "data:image/jpeg;base64,...",
    "formats": ["text", "data", "html"],
    "data_options": {
        "include_asciimath": True,
        "include_latex": True
    }
}

import os

# put desired file path here
file_path = 'static math/integral_smpl_1.jpg'
with open(file_path, "rb") as image_file:
    image_uri = "data:image/jpeg;base64," + base64.b64encode(image_file.read()).decode()

# credentials are read from environment variables rather than hard-coded;
# the variable names MATHPIX_APP_ID / MATHPIX_APP_KEY are an assumption
r = requests.post("https://api.mathpix.com/v3/text",
    data=json.dumps({'src': image_uri}),
    headers={"app_id": os.environ["MATHPIX_APP_ID"],
             "app_key": os.environ["MATHPIX_APP_KEY"],
             "Content-type": "application/json"})

print(json.dumps(json.loads(r.text), indent=4, sort_keys=True))

{
    "auto_rotate_confidence": 0.9951534127435329,
    "auto_rotate_degrees": 90,
    "confidence": 1,
    "confidence_rate": 1,
    "is_handwritten": true,
    "is_printed": false,
    "latex_styled": "\\int \\frac{1}{x^{2}+5^{2}} d x",
    "request_id": "02f26e73a1930265cb734a433d5266c2",
    "text": "\\( \\int \\frac{1}{x^{2}+5^{2}} d x \\)"
}
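The data-URI and request-body steps can be factored into small helpers so that every test image goes through the same path. This is a sketch under assumptions: the function names are hypothetical, and the optional fields simply mirror the keys shown in the example dictionary (src, formats, data_options); check them against the Mathpix documentation before relying on them.

```python
import base64
import json
import mimetypes


def image_to_data_uri(file_path):
    """Read an image file and return it as a base64 data URI."""
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None:
        raise ValueError(f"cannot guess MIME type for {file_path!r}")
    # the context manager closes the file even if reading fails
    with open(file_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_payload(image_uri, formats=None, include_latex=False):
    """Return the JSON request body as a string; optional keys are added only when asked for."""
    payload = {"src": image_uri}
    if formats:
        payload["formats"] = list(formats)
    if include_latex:
        payload["data_options"] = {"include_latex": True}
    return json.dumps(payload)
```

With these, the request body becomes build_payload(image_to_data_uri(file_path)), which can be passed as the data argument of requests.post.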


## Process LaTeX

Take the initial response and render its LaTeX to the screen.

Beyond this, produce several test cases with varying image inputs.

In [10]:
json_return = json.loads(r.text)
latex_return = json_return.get("latex_styled")

# expected: \int \frac{1}{x^{2}+5^{2}} d x
print(latex_return)

\int \frac{1}{x^{2}+5^{2}} d x
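Because .get() silently returns None when a key is missing, a slightly stricter extraction step can make failures easier to spot. The sketch below works on an already parsed response dictionary. The helper name extract_latex is hypothetical, and treating an "error" key as a failed request is an assumption about the API's error format.

```python
def extract_latex(result):
    """Return the LaTeX string from a parsed response dictionary."""
    # assumption: a failed request carries an "error" key instead of results
    if "error" in result:
        raise RuntimeError(f"OCR request failed: {result['error']}")
    latex = result.get("latex_styled")
    if latex is None:
        raise KeyError("response has no 'latex_styled' field")
    return latex


# sample copied from the response printed earlier in this notebook
sample = {
    "confidence": 1,
    "is_handwritten": True,
    "latex_styled": "\\int \\frac{1}{x^{2}+5^{2}} d x",
    "text": "\\( \\int \\frac{1}{x^{2}+5^{2}} d x \\)",
}
print(extract_latex(sample))
```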


We can render the result from a code cell. The Stack Overflow thread below discusses ways to use variables in Markdown output.

https://stackoverflow.com/questions/18878083/can-i-use-variables-on-an-ipython-notebook-markup-cell/43911937

In [4]:
# md is IPython.display.Markdown (see the imports above);
# latex_return must already be defined by the earlier cell
md("$ %s $" % latex_return)

NameError: name 'latex_return' is not defined
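Inline `$ ... $` delimiters suit a one-line formula, while multi-line environments such as array usually read better as display math with `$$ ... $$`. The sketch below picks the delimiter and rejects an empty value before it reaches the renderer. The name to_markdown_math and the auto-detection rule are illustrative assumptions.

```python
def to_markdown_math(latex, display=None):
    """Wrap a LaTeX string in Markdown math delimiters."""
    latex = (latex or "").strip()
    if not latex:
        raise ValueError("no LaTeX string to render")
    if display is None:
        # heuristic: environments and multi-line input go in display mode
        display = "\\begin{" in latex or "\n" in latex
    delimiter = "$$" if display else "$"
    return f"{delimiter}{latex}{delimiter}"


print(to_markdown_math("\\int \\frac{1}{x^{2}+5^{2}} d x"))
```

In the notebook the returned string can be passed to md(...), provided the import cell and the cell that defines latex_return have been run first.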

## how well did it do?

<img style="transform: rotate(-90deg); width:200px" src="static math/integral_smpl_1.jpg" />

The one-line equation was translated correctly.

# Test Case 1

Try submitting a block of hand-written equations.

## Submit an API request

In [10]:
import os

# harder input: a block of equations
# put desired file path here
file_path = 'static math/u_substitution_smpl_1.jpg'
with open(file_path, "rb") as image_file:
    image_uri = "data:image/jpeg;base64," + base64.b64encode(image_file.read()).decode()

# credentials are read from environment variables rather than hard-coded;
# the variable names MATHPIX_APP_ID / MATHPIX_APP_KEY are an assumption
r = requests.post("https://api.mathpix.com/v3/text",
    data=json.dumps({'src': image_uri}),
    headers={"app_id": os.environ["MATHPIX_APP_ID"],
             "app_key": os.environ["MATHPIX_APP_KEY"],
             "Content-type": "application/json"})

print(json.dumps(json.loads(r.text), indent=4, sort_keys=True))

{
    "auto_rotate_confidence": 0.9968418261936909,
    "auto_rotate_degrees": 90,
    "confidence": 0.9912392695173367,
    "confidence_rate": 0.9912392695173367,
    "is_handwritten": true,
    "is_printed": false,
    "latex_styled": "\\begin{array}{l}\nx=5 \\tan \\theta \\\\\nd x=5 \\sec ^{2} \\theta d \\theta \\\\\n\\frac{5^{2}}{x^{2}+5^{2}}=\\cos ^{2} \\theta \\\\\n\\frac{\\cos ^{2} \\theta}{5^{2}}=\\frac{1}{x^{2}+5^{2}}\n\\end{array}",
    "request_id": "c06dc207bf84186b9acda71511f17571",
    "text": "\\( x=5 \\tan \\theta \\)\n\\( d x=5 \\sec ^{2} \\theta d \\theta \\)\n\\( \\frac{5^{2}}{x^{2}+5^{2}}=\\cos ^{2} \\theta \\)\n\\( \\frac{\\cos ^{2} \\theta}{5^{2}}=\\frac{1}{x^{2}+5^{2}} \\)"
}
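The metadata fields in this response (confidence, is_handwritten, auto_rotate_degrees) can be condensed into a short summary for comparing test cases. The sketch below uses only keys that appear in the output above. The 0.95 review threshold and the name summarize_result are arbitrary assumptions.

```python
def summarize_result(result, threshold=0.95):
    """Condense the metadata of a parsed OCR response into a small dict."""
    confidence = result.get("confidence", 0.0)
    return {
        "kind": "handwritten" if result.get("is_handwritten") else "printed",
        "confidence": round(confidence, 3),
        "rotated_degrees": result.get("auto_rotate_degrees", 0),
        # flag results below the (assumed) threshold for a manual check
        "needs_review": confidence < threshold,
    }


# metadata copied from the response printed above
sample = {
    "auto_rotate_degrees": 90,
    "confidence": 0.9912392695173367,
    "is_handwritten": True,
    "is_printed": False,
}
print(summarize_result(sample))
```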


## produce the LaTeX

In [11]:
json_return = json.loads(r.text)
latex_return = json_return.get("latex_styled")

print(latex_return)

\begin{array}{l}
x=5 \tan \theta \\
d x=5 \sec ^{2} \theta d \theta \\
\frac{5^{2}}{x^{2}+5^{2}}=\cos ^{2} \theta \\
\frac{\cos ^{2} \theta}{5^{2}}=\frac{1}{x^{2}+5^{2}}
\end{array}


## print the LaTeX to the screen

In [2]:
md("$ %s $"%(latex_return))

NameError: name 'md' is not defined

'latex_return'

## how well did we do?

The image below was converted perfectly.

<img style="transform: rotate(-90deg); width:250px" src="static math/u_substitution_smpl_1.jpg" />
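To extend this to several test images, the request and extraction steps can be wrapped in one function and run over a list of paths. The sketch below takes the OCR step as a parameter so that the loop can be tried without network access. run_test_cases and fake_ocr are hypothetical names; in the notebook, fake_ocr would be replaced by a function that posts the image and returns latex_styled.

```python
def run_test_cases(file_paths, ocr):
    """Apply ocr(path) to each path and collect results without stopping on errors."""
    results = []
    for path in file_paths:
        try:
            results.append({"path": path, "latex": ocr(path), "error": None})
        except Exception as exc:
            # record the failure and continue with the next image
            results.append({"path": path, "latex": None, "error": str(exc)})
    return results


def fake_ocr(path):
    """Stand-in for the real API call, used only to exercise the loop."""
    if path.endswith(".jpg"):
        return "x=1"
    raise ValueError("unsupported file type")


for row in run_test_cases(["a.jpg", "b.txt"], fake_ocr):
    print(row)
```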