# Experimenting with Top OCR Tools via Jupyter Notebooks

This notebook explores the MathPix OCR tool and how to interface with it from Jupyter.
Along the way, we learn how the Python `requests` library is used to call APIs.

We send a JSON request and evaluate the returned predictions for accuracy by hand.

For each prediction, we see how to use Markdown to render the LaTeX that is returned.

# Mathpix OCR

See this link for calling an API:  
https://medium.com/swlh/using-and-calling-an-api-with-python-494a18cb1f44

In [2]:
import sys
import base64
import requests
import json

from IPython.display import Markdown as md

# Working with API Requests

In [3]:
r = requests.get('https://www.romexchange.com/')
r.status_code

406

We get a 406 (Not Acceptable): the server refuses to serve the request as sent.
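As a small aside, the standard library can translate a numeric status code into its standard name, which helps when a code such as 406 is unfamiliar. This is a minimal sketch; the helper name `describe_status` is made up for illustration.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short, human-readable label for an HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Codes outside the standard registry have no known phrase.
        return f"{code} (unknown status)"
    return f"{status.value} {status.phrase}"


print(describe_status(406))  # 406 Not Acceptable
print(describe_status(200))  # 200 OK
```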

One thing to try is sending headers the server might expect, rather than a bare request.

In [4]:
url = 'https://www.romexchange.com/'
headers = { 'Content-type': 'application/json'}

In [5]:
r = requests.get(url, headers = headers)
r.status_code

406

This still does not work, but it is closer.
The likely problem is that the default `requests` user agent, `python-requests/2.21.0`, is being blocked, so we set our own.
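To see what `requests` sends by default, the headers can be inspected without touching the network. The sketch below assumes only that `requests` is installed. It reuses the URL from the cells above, and nothing is actually sent: the request is only prepared locally.

```python
import requests

session = requests.Session()

# A Session carries the library's default headers, including the user agent.
print(session.headers["User-Agent"])  # python-requests/<installed version>

# Preparing a request merges our headers over the session defaults.
# No connection is opened at this point.
request = requests.Request(
    "GET",
    "https://www.romexchange.com/",
    headers={"User-Agent": "XY", "Content-type": "application/json"},
)
prepared = session.prepare_request(request)

print(prepared.headers["User-Agent"])  # our value replaces the default
print(prepared.headers["Accept"])      # the default is left untouched
```

Inspecting the prepared request this way shows exactly which headers the server will see, which is often quicker than guessing from status codes.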

In [6]:
url = 'https://www.romexchange.com/'
headers = {'User-Agent': 'XY', 'Content-type':'application/json'}
r = requests.get(url, headers=headers)
r.status_code

200

The request returns a 200, so it was accepted.

To look at the content, we can read the response's `.text` attribute.

In [7]:
r.text

'<!doctype html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width,initial-scale=1,shrink-to-fit=no"><meta name="theme-color" content="#000000"><meta name="description" content="Track price changes of any item in Ragnarok M on the Global and SEA servers. Easily see which item prices are rising or falling the fastest and compare between servers to gain an advantage in trading."><meta name="keywords" content="Ragnarok, Online, Mobile, Eternal Love, Exchange, History, ROM, Ragnarok M, RO, Price, Market, Tracker, Global, SEA"><meta property="og:site_name" content="ROM Exchange"><meta property="og:title" content="ROM Exchange - Ragnarok M: Eternal Love Exchange Price History"><meta property="og:description" content="Track price changes of any item in Ragnarok M on the Global and SEA servers. Easily see which item prices are rising or falling the fastest and compare between servers to gain an advantage in trading."><meta property="og:image" content=

## Sending A Request To MathPix

https://mathpix.com/docs/ocr/examples

https://docs.mathpix.com/#process-image-v3-text

https://api.mathpix.com/v3/text

Send an API request and get the response back.
Then transform the response.

Start with a picture of a hand-written equation.
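The image has to be sent as a base64 data URI. A minimal sketch of a reusable helper follows; the name `image_to_data_uri` is hypothetical. It derives the MIME type from the file extension, which is a convenience and not something the notebook shows MathPix requiring.

```python
import base64
import mimetypes


def image_to_data_uri(path: str) -> str:
    """Encode an image file as a base64 data URI."""
    # Guess the MIME type from the extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image type from {path!r}")
    # The context manager closes the file even if reading fails.
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Calling `image_to_data_uri('figures/integral_smpl_1.png')` would produce a string starting with `data:image/png;base64,`.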

The values `app_id` and `app_key` come from an individual MathPix account. 

In [None]:
# Example JSON request body, shown for reference only (it is not sent below).
{
    "src": "data:image/jpeg;base64,...",
    "formats": ["text", "data", "html"],
    "data_options": {
        "include_asciimath": True,
        "include_latex": True
    }
}

# Specify the file path to apply OCR on.
file_path = 'figures/integral_smpl_1.png'
image_uri = "data:image/jpg;base64," + base64.b64encode(open(file_path, "rb").read()).decode()

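r = requests.post("https://api.mathpix.com/v3/text",
    data=json.dumps({'src': image_uri}),
    headers={"app_id": "YOUR_APP_ID",    # placeholder: use your own MathPix app_id
             "app_key": "YOUR_APP_KEY",  # placeholder: use your own MathPix app_key
             "Content-type": "application/json"})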

print(json.dumps(json.loads(r.text), indent=4, sort_keys=True))

<img src="figures/mathpix_return.png" width = 500 />
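The request in the cell above sends only `src`. As a sketch, the optional fields from the example dictionary (`formats` and `data_options`) could be serialized alongside it as follows. The helper name `build_request_body` is hypothetical, and the field names are copied from that dictionary, not checked against the current MathPix documentation.

```python
import json


def build_request_body(image_uri, formats=("text", "data", "html"),
                       include_asciimath=True, include_latex=True):
    """Serialize the image and the optional output settings to JSON."""
    body = {
        "src": image_uri,
        "formats": list(formats),
        "data_options": {
            "include_asciimath": include_asciimath,
            "include_latex": include_latex,
        },
    }
    # json.dumps converts Python's True/False into JSON's true/false.
    return json.dumps(body)


print(build_request_body("data:image/jpeg;base64,..."))
```

The returned string can be passed as the `data=` argument of `requests.post` in place of `json.dumps({'src': image_uri})`.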

## Process The LaTeX

Take the response from the initial request and print the returned LaTeX to the screen.

Beyond this, produce several test cases with varying image inputs.

In [None]:
json_return = json.loads(r.text)
latex_return = json_return.get("latex_styled")

# The expected return is: \int \frac{1}{x^{2}+5^{2}} d x. 
print(latex_return)

<img src="figures/text_predict.png" width = 400 />
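Note that `dict.get` returns `None` when the key is absent, for example if the request failed. A defensive sketch is shown below. The `error` key and the fallback to `text` are assumptions about the response shape that the output above does not confirm, and `extract_latex` is a hypothetical helper.

```python
def extract_latex(response_json: dict) -> str:
    """Pull the LaTeX string out of a parsed OCR response."""
    # Assumption: a failed request reports its problem under an "error" key.
    if "error" in response_json:
        raise RuntimeError(f"OCR request failed: {response_json['error']}")
    # Prefer "latex_styled"; fall back to "text" (assumed field) if it is missing.
    latex = response_json.get("latex_styled") or response_json.get("text")
    if not latex:
        raise KeyError("response has neither 'latex_styled' nor 'text'")
    return latex


sample = {"latex_styled": r"\int \frac{1}{x^{2}+5^{2}} d x"}
print(extract_latex(sample))
```

Failing loudly here avoids passing `None` on to the rendering step, where the cause would be harder to spot.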

We can render the LaTeX from a code cell. The thread below discusses ways to get variables into Markdown output, including the widgets available for this.

https://stackoverflow.com/questions/18878083/can-i-use-variables-on-an-ipython-notebook-markup-cell/43911937

In [None]:
# Use a python cell to call a markdown command.
# A raw string keeps the backslash in \Huge from being read as an escape sequence.
md(r"$$ \Huge %s $$" % latex_return)

<img src="figures/ren_predict.png" width = 500 />
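A small helper can keep this formatting in one place. This is only a sketch: the name `latex_to_display_markdown` and the `size` parameter are illustrative, and the raw-string default avoids Python treating `\H` as an escape sequence.

```python
def latex_to_display_markdown(latex: str, size: str = r"\Huge") -> str:
    """Wrap a LaTeX expression in $$ ... $$ with a size command."""
    return f"$$ {size} {latex.strip()} $$"


source = latex_to_display_markdown(r"\int \frac{1}{x^{2}+5^{2}} d x")
print(source)  # $$ \Huge \int \frac{1}{x^{2}+5^{2}} d x $$
```

In the notebook, `md(source)` would then render the expression.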

## How Well Did MathPix Do?
MathPix translated the one-line equation correctly.

<img style="transform: rotate(-90deg); width:400px" src="figures/integral_smpl_1.png" />
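The comparison here was done by eye. A crude automated check is to compare the prediction against the expected string with all whitespace removed, so that `d x` and `dx` count as the same. The function names below are hypothetical, and this is not a test of mathematical equivalence: it can miss equivalent forms and, in rare cases, conflate a command with the letters that follow it.

```python
import re


def normalize_latex(latex: str) -> str:
    """Drop all whitespace so spacing differences are ignored."""
    return re.sub(r"\s+", "", latex)


def latex_matches(predicted: str, expected: str) -> bool:
    """Return True if the two strings agree up to whitespace."""
    return normalize_latex(predicted) == normalize_latex(expected)


expected = r"\int \frac{1}{x^{2}+5^{2}} d x"
print(latex_matches(r"\int\frac{1}{x^{2}+5^{2}}dx", expected))  # True
print(latex_matches(r"\int\frac{1}{x^{2}+5} d x", expected))    # False
```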

# Test Case 1

Try submitting a block of hand-written equations.

## Submit an API request
When submitting the request, be sure to specify `app_id` and `app_key`, which give you access to the API. They may be read from a config file or supplied directly in the `headers` dictionary.
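One way to keep credentials out of the notebook is to read them from environment variables. The sketch below assumes the variable names `MATHPIX_APP_ID` and `MATHPIX_APP_KEY`, which are invented here and are not a MathPix convention; `mathpix_headers` is likewise a hypothetical helper.

```python
import os


def mathpix_headers(env=None) -> dict:
    """Build the request headers from credentials stored in the environment."""
    env = os.environ if env is None else env
    try:
        # Hypothetical variable names; use whatever your setup defines.
        app_id = env["MATHPIX_APP_ID"]
        app_key = env["MATHPIX_APP_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"Missing credential: set {missing.args[0]}") from None
    return {
        "app_id": app_id,
        "app_key": app_key,
        "Content-type": "application/json",
    }
```

The result can be passed as `headers=mathpix_headers()` to `requests.post`, so the keys never appear in a saved notebook.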

In [None]:
file_path = 'figures/u_substitution_smpl_1.png'
image_uri = "data:image/jpg;base64," + base64.b64encode(open(file_path, "rb").read()).decode()

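r = requests.post("https://api.mathpix.com/v3/text",
    data=json.dumps({'src': image_uri}),
    headers={"app_id": "YOUR_APP_ID",    # placeholder: use your own MathPix app_id
             "app_key": "YOUR_APP_KEY",  # placeholder: use your own MathPix app_key
             "Content-type": "application/json"})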

print(json.dumps(json.loads(r.text), indent=4, sort_keys=True))

<img src="figures/text_predict_2.png" width=700 />

## Produce The Predicted LaTeX

In [None]:
json_return = json.loads(r.text)
latex_return = json_return.get("latex_styled")

print(latex_return)

<img src="figures/text_predict_2_extracted.png" width = 600/>

Print the LaTeX in a cell.

In [None]:
# A raw string keeps the backslash in \Huge from being read as an escape sequence.
md(r"$$ \Huge %s $$" % latex_return)

<img src="figures/ren_predict_2.png" width =400 />

## How Well Did MathPix Do?

The image below was converted perfectly.

<img style="transform: rotate(-90deg); width:400px" src="figures/u_substitution_smpl_1.png" />

## Conclusion

We successfully built the backend code to send an API request to MathPix. Both test images, the one-line integral and the block of hand-written equations, were transcribed correctly.