# SKEMA-TA2-UAZ Incremental Structural Alignment Demo (2023-04-28)

**Authors**: Liang Zhang, Adarsh Pyarelal, Clayton Morrison

Pipeline: PDF document -> Equation images -> *MathML representation -> *Graph representation -> *Equation alignment

\* Steps that ISA covers in this pipeline

The overall goal of this demo is to align two equations represented in presentation MathML and return the matching ratio together with a visualization of the aligned result.

Swagger docs for the REST API can be found at http://localhost:8080/docs/

## Incremental structural alignment

**A quick review**: We proposed using seeded graph matching (SGM) to achieve incremental structural alignment (ISA) of equations. At a high level, the procedure is as follows:

1. Create a graph representation based on the presentation MathML input.
2. Construct the adjacency matrices corresponding to the above graph representations. 
3. Apply the SGM algorithm with the two adjacency matrices as inputs.
4. Return the matching ratio and the visualization of the aligned result.
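Steps 1 and 2 can be illustrated with a minimal sketch. The graph construction used by the ISA service is not shown in this notebook, so the version below simply treats the XML element tree of the MathML as the graph (one node per element, one directed edge per parent-child relation). The function names `mathml_to_graph` and `adjacency_matrix` and the toy equation are hypothetical and only meant to show the shape of the data that a graph matcher would receive in step 3.

```python
import xml.etree.ElementTree as ET


def mathml_to_graph(mathml):
    """Return (labels, edges) for a presentation MathML string.

    Assumption: every XML element becomes a node and every parent-child
    relation becomes a directed edge. The real ISA service may build its
    graph differently.
    """
    root = ET.fromstring(mathml)
    labels, edges = [], []

    def visit(element, parent_index):
        index = len(labels)
        tag = element.tag.split('}')[-1]  # drop an XML namespace prefix, if any
        text = (element.text or '').strip()
        labels.append(f"{tag}:{text}" if text else tag)
        if parent_index is not None:
            edges.append((parent_index, index))
        for child in element:
            visit(child, index)

    visit(root, None)
    return labels, edges


def adjacency_matrix(node_count, edges):
    """Build a dense 0/1 adjacency matrix (list of lists) from an edge list."""
    matrix = [[0] * node_count for _ in range(node_count)]
    for source, target in edges:
        matrix[source][target] = 1
    return matrix


# Toy input (not one of the SIDARTHE equations): x = 1
example = "<math><mrow><mi>x</mi><mo>=</mo><mn>1</mn></mrow></math>"
labels, edges = mathml_to_graph(example)
adj = adjacency_matrix(len(labels), edges)
print(labels)
print(adj)
```

Two such matrices, one per equation, are what step 3 would pass to the SGM algorithm.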


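* Run `pip install requests pydot graspologic graphviz` if these packages are not installed on your machine. Rendering the union graph with the `graphviz` Python package also requires the Graphviz executables to be installed.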

We will align the core equations from SIDARTHE and SIDARTHE+V as the first example.

In [None]:
from IPython.display import Image

In [None]:
Image(url="data/SIDARTHE_core_equations.png", width=400, height=400) # Human in the loop: identify the core of SIDARTHE

In [None]:
Image(url="data/SIDARTHEV_core_equations.png", width=400, height=400) # Human in the loop: identify the core of SIDARTHE+V

In [None]:
# -*- coding: utf-8 -*-
import warnings
warnings.filterwarnings('ignore')
import requests
from graphviz import Source

In [None]:
file1_path = './data/SIDARTHE_eq1.xml'
file2_path = './data/SIDARTHE_V_eq1.xml'

In [None]:
# Read the MathML files for Equation 1 in SIDARTHE and Equation 1 in SIDARTHE+V
with open(file1_path, 'r') as f1, open(file2_path, 'r') as f2:
    file1_content = f1.read()
    file2_content = f2.read()

Call the ISA API

In [None]:
# Send a PUT request to the service endpoint
response = requests.put('http://localhost:3002/align-eqns', params={'file1': file1_content, 'file2': file2_content})

In [None]:
# Check the response status code
if response.status_code == 200:
    # Parse the response content as a JSON object
    json_response = response.json()
    # Access the matching ratio and union graph properties
    matching_ratio = json_response['matching_ratio']
    union_graph = json_response['union_graph']
    print(f"Matching ratio: {matching_ratio}\nUnion graph: {union_graph}")
    src = Source(union_graph)
    src.render('data/union_graph_ex1', format='png', view=False)
else:
    # Handle the error response
    print(f"Error: {response.status_code} - {response.text}")

In the union graph, the blue portion represents the overlap between the two equations, while the red and green portions are exclusive to equations 1 and 2, respectively. We can see that in SIDARTHE+V the authors added φ to Equation 1.
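The color coding can also be summarized numerically. The sketch below counts `color=` attributes in a DOT string. It assumes that the `union_graph` returned by the service marks nodes and edges with plain color names such as `color=blue`; the notebook does not show the actual DOT output, so both that assumption and the toy DOT string are hypothetical.

```python
import re
from collections import Counter


def count_colors(dot_source):
    """Count `color=<name>` attributes in a DOT string.

    Assumption: colors are written as plain names (e.g. color=blue or
    color="blue"). Attributes such as fontcolor or fillcolor are ignored.
    """
    pattern = r'\bcolor\s*=\s*"?([A-Za-z]+)"?'
    return Counter(name.lower() for name in re.findall(pattern, dot_source))


# Hypothetical DOT source, NOT actual output of the ISA service
toy_dot = '''
digraph G {
    a [label="S", color=blue];
    b [label="alpha", color=blue];
    c [label="phi", color=green];
    a -> b [color=blue];
    a -> c [color=green];
}
'''

counts = count_colors(toy_dot)
print(f"shared: {counts['blue']}, only eq. 1: {counts['red']}, only eq. 2: {counts['green']}")
```

If the real DOT output uses a different attribute or hex color codes, the regular expression would need to be adjusted accordingly.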

In [None]:
Image(url="data/union_graph_ex1.png", width=600, height=600) # Visualization of the alignment result
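The read, request, and parse steps are repeated for every example in this notebook, so they could be wrapped in a small helper. The sketch below is one possible way to do that. The function names, the `timeout` value, and the use of `raise_for_status()` are choices made here for illustration, not part of the ISA service; the endpoint URL and the `file1`/`file2` parameters are the ones used in the cells above.

```python
from pathlib import Path

ISA_URL = 'http://localhost:3002/align-eqns'  # endpoint used in this notebook


def build_params(file1_path, file2_path):
    """Read two MathML files and return the query parameters for the request."""
    return {
        'file1': Path(file1_path).read_text(),
        'file2': Path(file2_path).read_text(),
    }


def parse_alignment(payload):
    """Return (matching_ratio, union_graph) from the decoded JSON response."""
    missing = {'matching_ratio', 'union_graph'} - payload.keys()
    if missing:
        raise KeyError(f"Response is missing fields: {sorted(missing)}")
    return payload['matching_ratio'], payload['union_graph']


def align_equations(file1_path, file2_path, url=ISA_URL, timeout=30):
    """Send both equations to the ISA endpoint and return the parsed result."""
    import requests  # imported here so the helpers above work without it

    response = requests.put(
        url, params=build_params(file1_path, file2_path), timeout=timeout
    )
    response.raise_for_status()  # raise an exception on non-2xx status codes
    return parse_alignment(response.json())
```

With the service running, a call such as `matching_ratio, union_graph = align_equations(file1_path, file2_path)` would replace the separate read, request, and parse cells; the returned DOT string can then be rendered with `Source(union_graph)` as before.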

In the second example, the equations in the SIDARTHE paper are compared to Code Version A, which contains some intentionally introduced errors.

In [None]:
file1_path = './data/SIDARTHE_eq1.xml'
file2_path = './data/SIDARTHE_Code_A_eq1.xml'

In [None]:
# Read the MathML files for Equation 1 in SIDARTHE and Equation 1 in SIDARTHE Code Version A
with open(file1_path, 'r') as f1, open(file2_path, 'r') as f2:
    file1_content = f1.read()
    file2_content = f2.read()

Call the ISA API

In [None]:
# Send a PUT request to the service endpoint
response = requests.put('http://localhost:3002/align-eqns', params={'file1': file1_content, 'file2': file2_content})

In [None]:
# Check the response status code
if response.status_code == 200:
    # Parse the response content as a JSON object
    json_response = response.json()
    # Access the matching ratio and union graph properties
    matching_ratio = json_response['matching_ratio']
    union_graph = json_response['union_graph']
    print(f"Matching ratio: {matching_ratio}\nUnion graph: {union_graph}")
    src = Source(union_graph)
    src.render('data/union_graph_ex2', format='png', view=False)
else:
    # Handle the error response
    print(f"Error: {response.status_code} - {response.text}")

We can see that, for Equation 1, the code implementation is identical to the equation in the paper.

In [None]:
Image(url="data/union_graph_ex2.png", width=600, height=600) # Visualization of the alignment result

In [None]:
file1_path = './data/SIDARTHE_eq2.xml'
file2_path = './data/SIDARTHE_Code_A_eq2.xml'

In [None]:
# Read the MathML files for Equation 2 in SIDARTHE and Equation 2 in SIDARTHE Code Version A
with open(file1_path, 'r') as f1, open(file2_path, 'r') as f2:
    file1_content = f1.read()
    file2_content = f2.read()

Call the ISA API

In [None]:
# Send a PUT request to the service endpoint
response = requests.put('http://localhost:3002/align-eqns', params={'file1': file1_content, 'file2': file2_content})

In [None]:
# Check the response status code
if response.status_code == 200:
    # Parse the response content as a JSON object
    json_response = response.json()
    # Access the matching ratio and union graph properties
    matching_ratio = json_response['matching_ratio']
    union_graph = json_response['union_graph']
    print(f"Matching ratio: {matching_ratio}\nUnion graph: {union_graph}")
    src = Source(union_graph)
    src.render('data/union_graph_ex3', format='png', view=False)
else:
    # Handle the error response
    print(f"Error: {response.status_code} - {response.text}")

It can be seen that ε has been removed from the code, and β and γ have switched positions.

In [None]:
Image(url="data/union_graph_ex3.png", width=600, height=600) # Visualization of the alignment result
