# Pipeline

1. Load the **list of math boxes** from ```<PDF name>/pdf_coor.txt```.
    
    ```
    -> Result: Math box list [{'id', 'x', 'y', 'width', 'height'},...]
    ```

2. Load the **cell data** from ```<PDF name>/<PDF name>.json```, then add an ```id``` to each cell (used later to mark which cells are overlapped, cut, or left unchanged).

    ```
    -> Result: Cells list [{'x', 'y', 'width', 'height', 'text',...,'id'},...]
    ```

3. For **each math box** in the math box list:
    
    3.1 Find all cells that **interact** (intersect) with the math box. **All interacting cells are collected first and the second stage runs afterwards; the two stages are not merged into one pass, because the box grows during the second stage and could otherwise pick up cells that did not initially interact with it.**

    3.2 Classify each interacting cell:
    
    - If the interaction area is **> 40%** of the cell's area, the cell is marked as ```overlap``` and added to ```overlap_list```. The math box is later enlarged to cover it completely (```box_increasement```, see ```step 3.3```).

    - If the interaction area is **<= 40%** of the cell's area, the cell is marked as ```non_overlap``` and added to ```non_overlap_list```. It is later trimmed with ```cut_cells_box``` (in ```step 4```), which looks up the text inside the interaction area and rebuilds the cell's content and bounding box.

    3.3 For each cell in ```overlap_list```, apply ```box_increasement``` to the current math box.

    3.4 Return three items:

    - **math_box**: the newly constructed (enlarged) math box.

        ```
        -> Result: [x,y,width,height]
        ```

    - **overlap_list**: list of cells that are overlapped by the ```math_box```; they will not be used when constructing the new PDF.

        ```
        -> Result: [{'x', 'y', 'width', 'height', 'text',...},...]
        ```

    - **non_overlap_list**: list of cells that are not overlapped by the ```math_box``` but still share an interaction area with it. Each cell is paired with that interaction rectangle, and ```cut_cells_box``` later removes the text (if any) that lies inside it.

        ```
        -> Result: [[{'x', 'y', 'width', 'height', 'text',...},[x,y,width,height]],...]
        ```
    
4. After performing ```step 3``` for all math boxes, we have:

    - List of new math boxes

    - List of cells to cut (not yet cut)

    - List of overlapped cells

    Using the ```id``` attribute of each cell, we then find the cells in the Cells list that belong to neither the cut list nor the overlapped list, resulting in:

    - List of remaining cells

    Next, we run ```cut_cells_box``` on every cell in the list of cells to cut, resulting in:

    - List of cut cells

    Finally, we merge ```List of cut cells``` and ```List of remaining cells``` into a single list called ```all_cells```.

    In summary, we end up with two lists:

    - ```merged_boxes``` (list of new math boxes):

        ```
        -> Result: [{'id', 'x', 'y', 'width', 'height'},...]
        ```

    - ```all_cells``` (list of reconstructed and remaining cells):

        ```
        -> Result: [{'x', 'y', 'width', 'height', 'text',...,'text_vi','id'},...]
        ```

5. Create a new folder named ```<PDF name>_processed``` to store ```merged_boxes``` and ```all_cells```:

    - ```pdf_coor.txt```: one box per line in the format ```id x y width height``` (values separated by single spaces, no commas).

    - ```<PDF name>.json```: list of dicts, one per cell, with the attributes ```'x'```, ```'y'```, ```'width'```, ```'height'```, ```'text'```, ```'font'```, ```'text_vi'```, ```'id'```.
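
As a minimal sketch of the 40% rule from step 3.2, the ratio can be read as the interaction area divided by the cell's own area. The helper name `classify_cell` and the sample coordinates are illustrative assumptions, not part of the notebook, and degenerate (zero-area) cells are simply treated as non-interacting here:

```python
def classify_cell(cell, math_box, threshold=40.0):
    """Return 'overlap', 'non_overlap' or None for a cell dict and a box [x, y, w, h]."""
    bx, by, bw, bh = math_box
    # Intersection rectangle of the cell and the math box
    left = max(cell['x'], bx)
    right = min(cell['x'] + cell['width'], bx + bw)
    top = max(cell['y'], by)
    bottom = min(cell['y'] + cell['height'], by + bh)
    if right <= left or bottom <= top:
        return None  # the cell does not intersect the box
    cell_area = cell['width'] * cell['height']
    if cell_area == 0:
        return None  # degenerate cell, ignored in this sketch
    ratio = (right - left) * (bottom - top) / cell_area * 100
    return 'overlap' if ratio > threshold else 'non_overlap'


math_box = [100, 10, 50, 12]
small_cell = {'x': 120, 'y': 10, 'width': 10, 'height': 12}   # fully inside the box
long_cell = {'x': 140, 'y': 10, 'width': 100, 'height': 12}   # only 10 of 100 units inside

print(classify_cell(small_cell, math_box))  # overlap
print(classify_cell(long_cell, math_box))   # non_overlap
```

The small cell is covered 100% and would be absorbed by the math box, while the long text line is covered only 10% and would be kept and trimmed.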

In [271]:
import fitz
import os
import json

In [272]:
def convert_point_to_box(coor):
    '''
    Convert two points (x1, y1) as top-left and (x2, y2) as bottom-right to a box (x, y, width, height).
    
    Args:
    coor = [x1,y1,x2,y2]
        x1 (float): x-coordinate of the top-left point
        y1 (float): y-coordinate of the top-left point
        x2 (float): x-coordinate of the bottom-right point
        y2 (float): y-coordinate of the bottom-right point
    
    Returns:
        list: A list [x, y, width, height] representing the box
    
    Raises:
        ValueError: If x2 < x1 or y2 < y1, indicating an invalid rectangle
    '''
    x1,y1,x2,y2 = coor

    # Validate that bottom-right is to the right and below top-left
    if x2 < x1 or y2 < y1:
        raise ValueError("Invalid rectangle: x2 must be >= x1 and y2 must be >= y1")
    
    # Top-left corner is (x1, y1)
    x = x1
    y = y1
    
    # Width = x2 - x1, Height = y2 - y1
    width = x2 - x1
    height = y2 - y1
    
    return [x, y, width, height]

In [273]:
def check_overlap(cell, math_box):
    '''
    Checking the overlap of the math box and cell in cell list

    Args: 
    cell: {'x', 'y', 'width', 'height',...}
    math_box: [x_box, y_box, width_box, height_box]

    Returns:
    True/False
    '''
    x1, y1, width_1, height_1 = [cell['x'], cell['y'], cell['width'], cell['height']]
    x2, y2, width_2, height_2 = math_box
    
    # No overlap if one box lies entirely to the left of the other
    if x1 + width_1 < x2 or x2 + width_2 < x1:
        return False

    # No overlap if one box lies entirely above the other
    if y1 + height_1 < y2 or y2 + height_2 < y1:
        return False

    # Otherwise the boxes intersect (boxes that only share an edge also count)
    return True

In [274]:
def overlap_ratio(cell, math_box):
    '''
    Calculate the overlap ratio and return it with the coordinates of the overlapping rectangle.
    
    Args:
        cell: {'x', 'y', 'width', 'height',...} - coordinates and dimensions of the cell
        math_box (list): [x_box, y_box, width_box, height_box] - coordinates and dimensions of the math box
    
    Returns:
        list: [ratio, x, y, width, height] where:
              - ratio (float): Percentage of the cell's area that overlaps with the math box
              - x, y (float): Top-left corner of the overlapping rectangle
              - width, height (float): Dimensions of the overlapping rectangle
              Returns [0.0, 0, 0, 0, 0] if no overlap or if cell has zero area
    '''
    # Extract coordinates and dimensions
    x1, y1, width_cell, height_cell = [cell['x'], cell['y'], cell['width'], cell['height']]
    x2, y2, width_box, height_box = math_box
    
    # Calculate the coordinates of the overlapping rectangle
    x_left = max(x1, x2)
    x_right = min(x1 + width_cell, x2 + width_box)
    y_top = max(y1, y2)
    y_bottom = min(y1 + height_cell, y2 + height_box)
    
    # Check if there is an overlap
    if x_right <= x_left or y_bottom <= y_top:
        return [0.0, 0, 0, 0, 0]  # No overlap
    
    # Calculate the area of the overlapping rectangle
    overlap_width = x_right - x_left
    overlap_height = y_bottom - y_top
    overlap_area = overlap_width * overlap_height
    
    # Calculate the area of the cell
    cell_area = width_cell * height_cell
    
    # Avoid division by zero (if cell has zero area)
    if cell_area == 0:
        return [0.0, 0, 0, 0, 0]
    
    # Calculate the overlap ratio as a percentage
    ratio = (overlap_area / cell_area) * 100
    
    # Return the ratio and overlap coordinates
    return [ratio, x_left, y_top, overlap_width, overlap_height]
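
One detail worth keeping in mind: `check_overlap` compares with strict `<`, so two boxes that only share an edge count as interacting, while `overlap_ratio` reports a ratio of 0 for them. The sketch below reproduces both tests in compact form on plain `[x, y, w, h]` lists; the helper names are illustrative assumptions, not functions from this notebook.

```python
def boxes_touch_or_overlap(a, b):
    """Same test as check_overlap, for two boxes given as [x, y, w, h]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return not (ax + aw < bx or bx + bw < ax or ay + ah < by or by + bh < ay)


def intersection_area(a, b):
    """Area of the intersection rectangle, 0 if the boxes do not truly overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    width = min(ax + aw, bx + bw) - max(ax, bx)
    height = min(ay + ah, by + bh) - max(ay, by)
    return width * height if width > 0 and height > 0 else 0


cell = [0, 0, 10, 10]
neighbour = [10, 0, 5, 10]   # starts exactly where the cell ends

print(boxes_touch_or_overlap(cell, neighbour))  # True
print(intersection_area(cell, neighbour))       # 0
```

In `get_box_overlap`, such an edge-touching cell therefore ends up in the cut list with a ratio of 0 and an empty rectangle `[0, 0, 0, 0]`.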

In [275]:
def get_box_overlap(cells, math_box):
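    '''
    Split the cells that interact with a given math box into two groups.

    Args:
        cells: list of cells [{'x','y','width','height',...},...]
        math_box: [x,y,width,height]

    Returns:
        not_overlap_list: [[cell, [x,y,width,height]],...] - cells with an overlap
            ratio <= 40%, each paired with its interaction rectangle
        overlap_list: [cell,...] - cells with an overlap ratio > 40%
    '''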
    overlap_list = []

    not_overlap_list = []

    # Very important: need to extract

    for cell in cells:
        if (check_overlap(cell, math_box)):
            overlap_info = overlap_ratio(cell, math_box)
            overlap_rate = overlap_info[0]
            overlap_box = overlap_info[1:]

            #print('Overlap ratio:', overlap_rate, 'Overlap box:', overlap_box)

            if overlap_rate > 40.0:
                overlap_list.append(cell)
            else:
                not_overlap_list.append([cell, overlap_box])

    return not_overlap_list, overlap_list

In [276]:
def box_increasement(cell, math_box):
    '''
    Enlarge the math box so that it also covers the given cell
    (i.e. return the bounding box of the union of the two rectangles).

    Args:
        cell: {'x', 'y', 'width', 'height',...} - coordinates and dimensions of the cell
        math_box : [x_box, y_box, width_box, height_box] - coordinates and dimensions of the math box
    
    Return:
        [min(x_cell, x_box), min(y_cell, y_box), new_width, new_height]
    '''
    x1, y1, w1, h1 = [cell["x"], cell["y"], cell["width"], cell["height"]]
    x2, y2, w2, h2 = math_box

    # Top-left corner of the union: the smaller x and y of the two boxes
    x_new = min(x1, x2)
    y_new = min(y1, y2)

    # Bottom-right corner of the union: the larger right and bottom edges
    right = max(x1 + w1, x2 + w2)
    bottom = max(y1 + h1, y2 + h2)

    increasement_box = [x_new, y_new, right - x_new, bottom - y_new]

    return increasement_box


In [277]:
def box_increasement_from_list(overlap_list, math_box):
    '''
    Enlarge a given math box so that it covers every cell in the overlap list.

    Args:
        overlap_list: [{'x', 'y', 'width', 'height',...},...]
        math_box: [x,y,w,h]

    Returns:
        The enlarged math box [x,y,w,h]
    '''

    for cell in overlap_list:
        math_box = box_increasement(cell, math_box)

    return math_box
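
A small worked example may help here. Each enlargement step amounts to taking the bounding box of a union, so folding it over the overlap list gives the same result in any order. The sketch below uses a hypothetical `union_box` helper on plain `[x, y, w, h]` lists with made-up integer coordinates; it is an illustration, not the notebook's own function.

```python
from functools import reduce


def union_box(a, b):
    """Bounding box [x, y, w, h] that covers both boxes a and b."""
    x = min(a[0], b[0])
    y = min(a[1], b[1])
    right = max(a[0] + a[2], b[0] + b[2])
    bottom = max(a[1] + a[3], b[1] + b[3])
    return [x, y, right - x, bottom - y]


math_box = [100, 50, 40, 12]
cells = [[95, 52, 10, 8], [130, 48, 20, 12]]   # cells given as [x, y, w, h]

# Fold the union over all cells, starting from the original math box
grown = reduce(union_box, cells, math_box)
print(grown)  # [95, 48, 55, 14]
```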

In [278]:
def box_overlap_list(cells, math_box):
    '''
    Classify the cells that interact with a math box and enlarge the box.

    Args:
        cells: list of cells, each a dict {'x', 'y', 'width', 'height', 'text',...}
        math_box: [x,y,width,height]

    Returns:
        math_box: the enlarged box [x,y,width,height]
        not_overlap_list: [[{'x', 'y', 'width', 'height', 'text',...},[x,y,width,height]],...]
            (cells to cut, each paired with its interaction rectangle)
        overlap_list: [{'x', 'y', 'width', 'height',...},...]
            (cells absorbed by the enlarged box)
    '''

    not_overlap_list, overlap_list = get_box_overlap(cells, math_box)

    math_box = box_increasement_from_list(overlap_list, math_box)

    return math_box, not_overlap_list, overlap_list

In [279]:
def cut_cells_box(pdf ,cut_cells, no_cutted_cell_id):
    '''
    Cut the part that overlaps a math box out of each given cell.

    Args:
        pdf: path to the PDF the cells were extracted from (only the first page is read)
        cut_cells: list of cells, each paired with the rectangle to cut out
            [[{'x', 'y', 'width', 'height', ...},[x, y, width, height]],...]
        no_cutted_cell_id: list of ids of cells that must not be cut

    Return:
        List of cells with updated text and bounding box
            [{'x', 'y', 'width', 'height', 'text',...},...]
    '''

    # Open the PDF
    doc = fitz.open(pdf)

    # Choose the page
    page = doc[0]  # first page (0-based index)

    def get_new_cell(page, cut_cell):
        # Step 1: Get the text from the region to cut
        x1, y1, w, h = cut_cell[1]
        x2 = x1 + w
        y2 = y1 + h

        # Define the snipping rectangle (x0, y0, x1, y1)
        # Units are in points (1/72 inch)
        rect = fitz.Rect(x1, y1, x2, y2)

        # Extract text from the defined region
        text = page.get_textbox(rect)

        #print(text)

        # Step 2: Reconstruct the cell

        new_cell = cut_cell[0].copy()

        #print(new_cell["text"])

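        # Remove the cut text from the cell, then move the left edge right by the cut width.
        # Note: shifting 'x' this way matches a cut region at the left end of the cell.
        new_cell['text'] = new_cell['text'].replace(text.strip(), '')
        new_cell['x'] = new_cell['x'] + w
        new_cell['width'] = new_cell['width'] - w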

        return new_cell
    
    cutted_cell_list = []

    for cell in cut_cells:
        # Skip cells that are flagged as "do not cut"
        if cell[0]['id'] not in no_cutted_cell_id:
            cutted_cell_list.append(get_new_cell(page, cell))

    # Release the file handle once all regions have been read
    doc.close()

    return cutted_cell_list
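
The geometry update in `get_new_cell` always moves the cell's left edge, which fits a cut region at the left end of the cell. As a hedged sketch of how the box update could distinguish a left cut from a right cut, here is a hypothetical `trim_cell` helper. It works on the geometry only (no PDF access, no text handling) and assumes the cut rectangle touches one end of the cell.

```python
def trim_cell(cell, cut_rect):
    """Return a copy of cell with the horizontal span of cut_rect removed.

    cut_rect is [x, y, w, h]; it is assumed to touch the left or right end of the cell.
    """
    cut_x, _, cut_w, _ = cut_rect
    trimmed = dict(cell)
    if cut_x <= cell['x']:
        # Cut at the left end: move the left edge right (what get_new_cell does)
        trimmed['x'] = cell['x'] + cut_w
    # In both cases the cell becomes narrower by the cut width
    trimmed['width'] = cell['width'] - cut_w
    return trimmed


cell = {'x': 100, 'y': 10, 'width': 60, 'height': 12}
print(trim_cell(cell, [100, 10, 15, 12]))  # left cut:  x becomes 115, width 45
print(trim_cell(cell, [145, 10, 15, 12]))  # right cut: x stays 100, width 45
```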

In [280]:
import os

def load_math_boxes(root_folder):
    '''
    Load the .txt file which is called 'pdf_coor.txt' in the given folder.

    The 'pdf_coor.txt' in the folder contains many rows, each with the format:
        1 12.5 13.4 50.3 60.6
    These are 5 values respectively:
        - id
        - x_left
        - y_left
        - x_right
        - y_right

    Converts each row to (id, x_left, y_left, width, height)

    Args:
    root_folder: str - Path to the folder (e.g., './Math_notation')

    Returns:
    List of dicts with format: 
    {'id': id, 'x': x_left, 'y': y_left, 'width': width, 'height': height}
    '''
    file_path = os.path.join(root_folder, 'pdf_coor.txt')
    boxes = []

    with open(file_path, 'r') as f:
        for line in f:
            parts = line.strip().split()
            if len(parts) != 5:
                continue  # Skip malformed lines
            try:
                id_, x_left, y_left, x_right, y_right = map(float, parts)
                box = {
                    'id': int(id_),
                    'x': x_left,
                    'y': y_left,
                    'width': x_right - x_left,
                    'height': y_right - y_left
                }
                boxes.append(box)
            except ValueError:
                continue  # Skip lines with invalid values

    return boxes
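
To make the row conversion concrete, here is a minimal sketch of the per-line parsing done above, pulled out into a hypothetical `parse_box_line` helper (the name and the sample values are illustrative assumptions):

```python
def parse_box_line(line):
    """Parse 'id x_left y_left x_right y_right' into a box dict, or None if malformed."""
    parts = line.split()
    if len(parts) != 5:
        return None
    try:
        id_, x_left, y_left, x_right, y_right = map(float, parts)
    except ValueError:
        return None
    # Convert the two corner points into top-left corner plus width and height
    return {'id': int(id_), 'x': x_left, 'y': y_left,
            'width': x_right - x_left, 'height': y_right - y_left}


print(parse_box_line('1 12.5 13.5 50.0 60.5'))
# {'id': 1, 'x': 12.5, 'y': 13.5, 'width': 37.5, 'height': 47.0}
print(parse_box_line('2 12.5 13.5'))  # None
```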

In [281]:
def insert_cell_id(cells):
    '''
    Insert unique Id for each cell

    And

    Cell original overlap list (as many box may resi)

    Args:
        cells: List of cells
            [{'x','y', 'width', 'height', 'text',..},...]

    Output:
        new_cells: List of cells with id:
            [{'id', 'x', 'y', 'width', 'height', 'text',...},...]
    '''

    # Ids start at 1; the cell dicts are updated in place and the same list is returned
    for cnt, cell in enumerate(cells, start=1):
        cell['id'] = cnt

    return cells

In [282]:
def reconstruct_text_cell(cells, math_box_list):
    '''
    For each math_box in the math_box_list:
        1. Find all the cells that overlap (actually interact with) the box
        2.
            - Merge the math_box with every overlapped cell with ratio > 40%
            - Mark every overlapped cell with ratio <= 40% for cutting
        3. Collect the merged box and the ids of the affected cells

    Args:
        cells: list of cells (each must already have an 'id')
            [{'x', 'y', 'width', 'height', 'text', ..., 'id'},...]
        math_box_list: list of boxes
            [{'id', 'x', 'y', 'width', 'height'},...]

    Return:
        merged_boxes: [{'id', 'x', 'y', 'width', 'height'},...]
        overlap_id_list: ids of the cells absorbed by a merged box
        cut_cell_list: [[cell, [x, y, width, height]],...] - cells to cut
        remain_cell_list_id: ids of the cells left unchanged
    '''

    merged_boxes = []

    overlap_id_list = []

    cut_cell_list = [] # not overlaped and will perform the cut

    cut_cell_list_id = []

    for box in math_box_list:
        original_box = box
        
        original_box_coor = [original_box['x'], 
                             original_box['y'], 
                             original_box['width'],
                             original_box['height']]

        #print(original_box)
        
        merge_box, cut_cell_list_box, overlap_list_box = box_overlap_list(cells, original_box_coor)  

        #print('Overlap_list:', overlap_list_box, '\n')

        if (len(overlap_list_box) > 0):
            overlap_id_list.extend([tmp['id'] for tmp in overlap_list_box ])

        if (len(cut_cell_list_box) > 0):
            #print(cut_cell_list_box)
            cut_cell_list.extend(cut_cell_list_box)

            cut_cell_list_id.extend(tmp[0]['id'] for tmp in cut_cell_list_box)

        box_holder = {
            'id' : original_box['id'],
            'x' : merge_box[0],
            'y' : merge_box[1],
            'width' : merge_box[2],
            'height' : merge_box[3]
        }

        merged_boxes.append(box_holder)

        # print('Merge_box:', merge_box)
        # print('Cut_cell_list:', cut_cell_list_box)
        #print('Overlap_list:', overlap_list_box)

        # print('-----------------------')
    
    remain_cell_list_id = [tmp['id'] for tmp in cells if tmp['id'] not in overlap_id_list and tmp['id'] not in cut_cell_list_id]

    #print('Overlap_list (will be not used):', overlap_id_list)
    #print('Reconstruct the text cell (re OCR and cut cell):' ,cut_cell_list_id)
    #print('The remaing cells kept normal:',remain_cell_list_id)


    return merged_boxes, overlap_id_list, cut_cell_list, remain_cell_list_id
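
The id bookkeeping at the end can be pictured with a small example. The sketch below uses a hypothetical `remaining_ids` helper and made-up ids; it builds a set once instead of scanning two lists per cell, which is an optional variation rather than the code used above.

```python
def remaining_ids(all_ids, overlap_ids, cut_ids):
    """Ids of cells that are neither absorbed by a math box nor scheduled for cutting."""
    excluded = set(overlap_ids) | set(cut_ids)
    return [cell_id for cell_id in all_ids if cell_id not in excluded]


all_ids = [1, 2, 3, 4, 5, 6]
overlap_ids = [2, 3]      # absorbed by a merged math box
cut_ids = [4, 4]          # a cell cut by two different boxes is listed twice

print(remaining_ids(all_ids, overlap_ids, cut_ids))  # [1, 5, 6]
```

Because the cut list is extended once per math box, a cell that interacts with several boxes can appear in it more than once, as id 4 does here.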

In [283]:
def reconstruct_text_cell_from_file(pdf_name):

    def load_layout_cells(file_name):
        # Open the JSON file and load the data
        with open(file_name, 'r') as json_file:
            data = json.load(json_file)
        return data

    math_box_list = load_math_boxes(pdf_name)

    cells = load_layout_cells(os.path.join(pdf_name, pdf_name + '.json'))

    cells = insert_cell_id(cells)

    merged_boxes, overlap_id_list, cut_cell_list, remain_cell_list_id = reconstruct_text_cell(cells, math_box_list)

    cutted_cells = cut_cells_box(pdf_name + '.pdf', cut_cell_list, remain_cell_list_id)

    remain_cell = [cell for cell in cells if cell['id'] in remain_cell_list_id]

    all_cells = cutted_cells + remain_cell

    def export_math_boxes_and_text_cells(cells_text, math_box, folder_name):
        """
        Creates a new folder '<folder_name>_processed' and writes two files:
        - '<pdf_name>.json' with the full list of cell dicts
        - 'pdf_coor.txt' with space-separated values: id x y width height

        Args:
            cells_text (list of dict): List of text cell dictionaries
            math_box (list of dict): List of math box dictionaries
            folder_name (str): Base name of the folder to create
        """
        folder_name = folder_name + '_processed'

        os.makedirs(folder_name, exist_ok=True)  # Create folder if it doesn't exist

        # Write JSON file
        json_path = os.path.join(folder_name, pdf_name + '.json')
        with open(json_path, 'w') as json_file:
            json.dump(cells_text, json_file, indent=4)

        # Write txt file
        txt_path = os.path.join(folder_name, 'pdf_coor.txt')
        with open(txt_path, 'w') as txt_file:
            for item in math_box:
                line = f"{item['id']} {item['x']} {item['y']} {item['width']} {item['height']}"
                txt_file.write(line + '\n')

        print(f"Files saved in: {folder_name}")
    
    export_math_boxes_and_text_cells(all_cells, merged_boxes, pdf_name)

In [284]:
reconstruct_text_cell_from_file('Math_notation')

Files saved in: Math_notation_processed
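
Note that the exported `pdf_coor.txt` stores `id x y width height`, whereas `load_math_boxes` treats the last two columns as the bottom-right corner and subtracts. Reading a processed folder back with `load_math_boxes` would therefore give wrong sizes. A minimal sketch of a reader for the processed format, assuming a hypothetical helper name `load_processed_boxes` and a temporary folder standing in for `<PDF name>_processed`:

```python
import os
import tempfile


def load_processed_boxes(folder):
    """Read 'pdf_coor.txt' lines of the form 'id x y width height' (no conversion)."""
    boxes = []
    with open(os.path.join(folder, 'pdf_coor.txt')) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 5:
                continue  # skip malformed lines
            id_, x, y, width, height = map(float, parts)
            boxes.append({'id': int(id_), 'x': x, 'y': y,
                          'width': width, 'height': height})
    return boxes


# Round trip through a temporary folder
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, 'pdf_coor.txt'), 'w') as f:
        f.write('1 95 48 55 14\n')
    print(load_processed_boxes(folder))
# [{'id': 1, 'x': 95.0, 'y': 48.0, 'width': 55.0, 'height': 14.0}]
```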


In [285]:
# cells = {'cells': [{'x': 72.0,
#    'y': 72.99362182617188,
#    'width': 466.31976318359375,
#    'height': 11.967155456542969,
#    'text': 'dominated by the unitary Hagedorn states is altered to be dominated by the orthogonal',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'dominated by the unitary Hagedorn states is altered to be dominated by the orthogonal'},
#   {'x': 72.0,
#    'y': 93.87332153320312,
#    'width': 466.12030029296875,
#    'height': 11.967155456542969,
#    'text': 'Hagedorn states when the dilute nuclear matter is heated up to higher temperatures. Since',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'Hagedorn states when the dilute nuclear matter is heated up to higher temperatures. Since'},
#   {'x': 72.0,
#    'y': 114.87332153320312,
#    'width': 466.25439453125,
#    'height': 11.967155456542969,
#    'text': 'the mass spectral exponent for orthogonal Hagedorns (i.e. colorless orthogonal states) is',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'the mass spectral exponent for orthogonal Hagedorns (i.e. colorless orthogonal states) is'},
#   {'x': 72.0,
#    'y': 135.75344848632812,
#    'width': 59.320587158203125,
#    'height': 11.967147827148438,
#    'text': 'found to be',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'found to be'},
#   {'x': 131.32058715820312,
#    'y': 135.75344848632812,
#    'width': 11.475357055664062,
#    'height': 11.9552001953125,
#    'text': 'α',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': 'α'},
#   {'x': 142.80003356933594,
#    'y': 140.54225158691406,
#    'width': 4.2321319580078125,
#    'height': 7.9701080322265625,
#    'text': '1',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR8', 'size': 7},
#    'text_vi': '1'},
#   {'x': 147.03216552734375,
#    'y': 135.75344848632812,
#    'width': 391.151123046875,
#    'height': 13.767135620117188,
#    'text': '= 3, it is likely that the orthogonal Hagedorn matter undergoes third order',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '= 3, it is likely that the orthogonal Hagedorn matter undergoes third order'},
#   {'x': 72.00018310546875,
#    'y': 156.63357543945312,
#    'width': 466.32012939453125,
#    'height': 11.967147827148438,
#    'text': 'phase transition to quark-gluon plasma. Furthermore, it is possible that the orthogonal',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'phase transition to quark-gluon plasma. Furthermore, it is possible that the orthogonal'},
#   {'x': 72.00018310546875,
#    'y': 177.63357543945312,
#    'width': 198.53466796875,
#    'height': 11.967147827148438,
#    'text': 'Hagedorn states are altered to colorless',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'Hagedorn states are altered to colorless'},
#   {'x': 270.53485107421875,
#    'y': 177.63357543945312,
#    'width': 11.37554931640625,
#    'height': 11.9552001953125,
#    'text': 'U',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': 'U'},
#   {'x': 283.08013916015625,
#    'y': 177.63357543945312,
#    'width': 14.98297119140625,
#    'height': 11.967147827148438,
#    'text': '(1)',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '(1)'},
#   {'x': 298.08013916015625,
#    'y': 176.3024444580078,
#    'width': 6.71881103515625,
#    'height': 7.9701080322265625,
#    'text': 'N',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI8', 'size': 7},
#    'text_vi': 'N'},
#   {'x': 304.800048828125,
#    'y': 178.6456298828125,
#    'width': 3.341461181640625,
#    'height': 5.9775848388671875,
#    'text': 'c',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI6', 'size': 5},
#    'text_vi': 'c'},
#   {'x': 312.4801330566406,
#    'y': 177.63357543945312,
#    'width': 225.77499389648438,
#    'height': 11.967147827148438,
#    'text': 'states when the very dilute nuclear matter is',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'states when the very dilute nuclear matter is'},
#   {'x': 72.00013732910156,
#    'y': 198.51370239257812,
#    'width': 466.21287536621094,
#    'height': 11.967147827148438,
#    'text': 'further heated up to higher temperatures. The very dilute nuclear matter might be created',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'further heated up to higher temperatures. The very dilute nuclear matter might be created'},
#   {'x': 72.00015258789062,
#    'y': 219.39340209960938,
#    'width': 30.64043426513672,
#    'height': 11.967147827148438,
#    'text': 'in the',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'in the'},
#   {'x': 102.64058685302734,
#    'y': 219.39340209960938,
#    'width': 16.38970184326172,
#    'height': 11.9552001953125,
#    'text': 'pp',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': 'pp'},
#   {'x': 119.03028869628906,
#    'y': 219.39340209960938,
#    'width': 419.0613250732422,
#    'height': 11.967147827148438,
#    'text': 'collisions at LHC besides the heavy ion collisions. The Hagedorn matter which',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'collisions at LHC besides the heavy ion collisions. The Hagedorn matter which'},
#   {'x': 72.00006103515625,
#    'y': 240.39340209960938,
#    'width': 154.614501953125,
#    'height': 11.967147827148438,
#    'text': 'is dominated by the colorless',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'is dominated by the colorless'},
#   {'x': 226.61456298828125,
#    'y': 240.39340209960938,
#    'width': 13.53570556640625,
#    'height': 11.9552001953125,
#    'text': 'U',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': 'U'},
#   {'x': 241.31997680664062,
#    'y': 240.39340209960938,
#    'width': 14.983001708984375,
#    'height': 11.967147827148438,
#    'text': '(1)',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '(1)'},
#   {'x': 256.3199768066406,
#    'y': 239.06227111816406,
#    'width': 6.71881103515625,
#    'height': 7.9701080322265625,
#    'text': 'N',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI8', 'size': 7},
#    'text_vi': 'N'},
#   {'x': 263.0398864746094,
#    'y': 241.40582275390625,
#    'width': 3.341461181640625,
#    'height': 5.9775848388671875,
#    'text': 'c',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI6', 'size': 5},
#    'text_vi': 'c'},
#   {'x': 272.880126953125,
#    'y': 240.39334106445312,
#    'width': 166.30264282226562,
#    'height': 11.967147827148438,
#    'text': 'has the mass spectral exponent',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'has the mass spectral exponent'},
#   {'x': 439.1827697753906,
#    'y': 240.39334106445312,
#    'width': 13.09246826171875,
#    'height': 11.9552001953125,
#    'text': 'α',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': 'α'},
#   {'x': 452.2752380371094,
#    'y': 240.39334106445312,
#    'width': 26.96978759765625,
#    'height': 11.967147827148438,
#    'text': '= 3',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '= 3'},
#   {'x': 479.2789001464844,
#    'y': 240.39334106445312,
#    'width': 5.846099853515625,
#    'height': 11.9552001953125,
#    'text': '/',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': '/'},
#   {'x': 485.1589660644531,
#    'y': 240.39334106445312,
#    'width': 9.119842529296875,
#    'height': 11.967147827148438,
#    'text': '2.',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '2.'},
#   {'x': 504.11895751953125,
#    'y': 240.39334106445312,
#    'width': 33.9600830078125,
#    'height': 11.967147827148438,
#    'text': 'Hence,',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'Hence,'},
#   {'x': 71.99996948242188,
#    'y': 261.2734680175781,
#    'width': 466.1206359863281,
#    'height': 11.9671630859375,
#    'text': 'the nuclear matter that is dominated by these states does not undergo direct abrupt phase',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'the nuclear matter that is dominated by these states does not undergo direct abrupt phase'},
#   {'x': 71.99996948242188,
#    'y': 282.153564453125,
#    'width': 466.2405090332031,
#    'height': 11.9671630859375,
#    'text': 'transition to quark-gluon plasma but rather smooth cross-over phase transition. When the',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'transition to quark-gluon plasma but rather smooth cross-over phase transition. When the'},
#   {'x': 71.99996948242188,
#    'y': 303.153564453125,
#    'width': 466.2113952636719,
#    'height': 11.9671630859375,
#    'text': 'medium is further heated up to higher temperature these states (i.e. Hagedorn states with',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'medium is further heated up to higher temperature these states (i.e. Hagedorn states with'},
#   {'x': 71.99996948242188,
#    'y': 324.0336608886719,
#    'width': 141.463134765625,
#    'height': 11.9671630859375,
#    'text': 'the mass spectral exponent',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'the mass spectral exponent'},
#   {'x': 213.46310424804688,
#    'y': 324.0336608886719,
#    'width': 12.253005981445312,
#    'height': 11.9552001953125,
#    'text': 'α',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': 'α'},
#   {'x': 225.7161102294922,
#    'y': 324.0336608886719,
#    'width': 24.32989501953125,
#    'height': 11.9671630859375,
#    'text': '= 3',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '= 3'},
#   {'x': 250.07997131347656,
#    'y': 324.0336608886719,
#    'width': 5.846099853515625,
#    'height': 11.9552001953125,
#    'text': '/',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': '/'},
#   {'x': 255.96005249023438,
#    'y': 324.0336608886719,
#    'width': 282.2514343261719,
#    'height': 11.9671630859375,
#    'text': '2) may be mutated to metastable colored quark-gluon',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '2) may be mutated to metastable colored quark-gluon'},
#   {'x': 72.00004577636719,
#    'y': 345.0336608886719,
#    'width': 188.6228790283203,
#    'height': 11.9671630859375,
#    'text': 'bags with the mass spectral exponent',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'bags with the mass spectral exponent'},
#   {'x': 260.6229248046875,
#    'y': 345.0336608886719,
#    'width': 10.813262939453125,
#    'height': 11.9552001953125,
#    'text': 'α',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': 'α'},
#   {'x': 271.4361877441406,
#    'y': 345.0336608886719,
#    'width': 21.6900634765625,
#    'height': 11.9671630859375,
#    'text': '= 1',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '= 1'},
#   {'x': 293.1602478027344,
#    'y': 345.0336608886719,
#    'width': 5.846099853515625,
#    'height': 11.9552001953125,
#    'text': '/',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': '/'},
#   {'x': 299.0403137207031,
#    'y': 345.0336608886719,
#    'width': 239.02279663085938,
#    'height': 11.9671630859375,
#    'text': '2. Since the states with mass spectral exponent',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '2. Since the states with mass spectral exponent'},
#   {'x': 72.00032043457031,
#    'y': 365.91375732421875,
#    'width': 7.436134338378906,
#    'height': 11.9552001953125,
#    'text': 'α',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': 'α'},
#   {'x': 79.43645477294922,
#    'y': 365.91375732421875,
#    'width': 21.690078735351562,
#    'height': 11.9671630859375,
#    'text': '= 1',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '= 1'},
#   {'x': 101.1605224609375,
#    'y': 365.91375732421875,
#    'width': 5.846092224121094,
#    'height': 11.9552001953125,
#    'text': '/',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': '/'},
#   {'x': 107.04060363769531,
#    'y': 365.91375732421875,
#    'width': 431.1598358154297,
#    'height': 11.9671630859375,
#    'text': '2 do not pass direct explosive deconﬁnement phase transition to quark-gluon plasma,',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '2 do not pass direct explosive deconﬁnement phase transition to quark-gluon plasma,'},
#   {'x': 72.0006103515625,
#    'y': 386.7938537597656,
#    'width': 466.12054443359375,
#    'height': 11.9671630859375,
#    'text': 'the colored quark-gluon bags expand smoothly and the system undergoes smooth phase',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'the colored quark-gluon bags expand smoothly and the system undergoes smooth phase'},
#   {'x': 72.0006103515625,
#    'y': 407.7938537597656,
#    'width': 212.03964233398438,
#    'height': 11.9671630859375,
#    'text': 'transition to colored quark-gluon plasma.',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'transition to colored quark-gluon plasma.'},
#   {'x': 87.0006103515625,
#    'y': 428.6735534667969,
#    'width': 317.9353332519531,
#    'height': 11.9671630859375,
#    'text': 'The orthogonal Hagedorn states are mutated to the colorless',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'The orthogonal Hagedorn states are mutated to the colorless'},
#   {'x': 404.9359436035156,
#    'y': 428.6735534667969,
#    'width': 12.694854736328125,
#    'height': 11.9552001953125,
#    'text': 'U',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': 'U'},
#   {'x': 418.800537109375,
#    'y': 428.6735534667969,
#    'width': 14.982940673828125,
#    'height': 11.9671630859375,
#    'text': '(1)',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '(1)'},
#   {'x': 433.800537109375,
#    'y': 427.34246826171875,
#    'width': 6.71881103515625,
#    'height': 7.9700927734375,
#    'text': 'N',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI8', 'size': 7},
#    'text_vi': 'N'},
#   {'x': 440.52044677734375,
#    'y': 429.6860046386719,
#    'width': 3.341461181640625,
#    'height': 5.977569580078125,
#    'text': 'c',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI6', 'size': 5},
#    'text_vi': 'c'},
#   {'x': 449.640380859375,
#    'y': 428.6735534667969,
#    'width': 88.6148681640625,
#    'height': 11.9671630859375,
#    'text': 'quark-gluon bags',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'quark-gluon bags'},
#   {'x': 72.0003662109375,
#    'y': 449.55364990234375,
#    'width': 417.11981201171875,
#    'height': 11.9671630859375,
#    'text': 'due to the high thermal excitations in the hot and very dilute nuclear matter (i.e.',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'due to the high thermal excitations in the hot and very dilute nuclear matter (i.e.'},
#   {'x': 489.12017822265625,
#    'y': 449.55364990234375,
#    'width': 12.201995849609375,
#    'height': 11.9552001953125,
#    'text': 'µ',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': 'µ'},
#   {'x': 501.36102294921875,
#    'y': 454.34246826171875,
#    'width': 6.384063720703125,
#    'height': 7.9700927734375,
#    'text': 'B',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI8', 'size': 7},
#    'text_vi': 'B'},
#   {'x': 507.7450866699219,
#    'y': 449.2547607421875,
#    'width': 13.464935302734375,
#    'height': 22.542266845703125,
#    'text': '≈',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMSY10', 'size': 11},
#    'text_vi': '≈'},
#   {'x': 524.6407470703125,
#    'y': 449.55364990234375,
#    'width': 13.6798095703125,
#    'height': 11.9671630859375,
#    'text': '0).',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '0).'},
#   {'x': 72.000732421875,
#    'y': 470.55364990234375,
#    'width': 360.29461669921875,
#    'height': 11.9671630859375,
#    'text': 'Since the new nuclear matter turns to be dominated by the colorless',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'Since the new nuclear matter turns to be dominated by the colorless'},
#   {'x': 432.29534912109375,
#    'y': 470.55364990234375,
#    'width': 12.815582275390625,
#    'height': 11.9552001953125,
#    'text': 'U',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': 'U'},
#   {'x': 446.2806701660156,
#    'y': 470.55364990234375,
#    'width': 14.982940673828125,
#    'height': 11.9671630859375,
#    'text': '(1)',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '(1)'},
#   {'x': 461.2806701660156,
#    'y': 469.1026306152344,
#    'width': 6.71881103515625,
#    'height': 7.9700927734375,
#    'text': 'N',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI8', 'size': 7},
#    'text_vi': 'N'},
#   {'x': 468.0005798339844,
#    'y': 471.44580078125,
#    'width': 3.341461181640625,
#    'height': 5.977569580078125,
#    'text': 'c',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI6', 'size': 5},
#    'text_vi': 'c'},
#   {'x': 477.1205139160156,
#    'y': 470.55364990234375,
#    'width': 61.091705322265625,
#    'height': 11.9671630859375,
#    'text': 'quark-gluon',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'quark-gluon'},
#   {'x': 72.00051879882812,
#    'y': 491.4337463378906,
#    'width': 466.1817932128906,
#    'height': 11.9671630859375,
#    'text': 'bags, it does not likely undergo direct phase transition to explosive quark-gluon plasma. But',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'bags, it does not likely undergo direct phase transition to explosive quark-gluon plasma. But'},
#   {'x': 72.00051879882812,
#    'y': 512.3138427734375,
#    'width': 466.1374206542969,
#    'height': 11.9671630859375,
#    'text': 'instead, the resultant Hagedorn states are gradually altered to metastable colored quark-',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'instead, the resultant Hagedorn states are gradually altered to metastable colored quark-'},
#   {'x': 72.00051879882812,
#    'y': 533.3138427734375,
#    'width': 75.599853515625,
#    'height': 11.9671630859375,
#    'text': 'gluon bubbles.',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'gluon bubbles.'},
#   {'x': 157.9202423095703,
#    'y': 533.3138427734375,
#    'width': 380.1720428466797,
#    'height': 11.9671630859375,
#    'text': 'The metastable colored quark-gluon bags expand gradually and overlap',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'The metastable colored quark-gluon bags expand gradually and overlap'},
#   {'x': 72.00051879882812,
#    'y': 554.1934814453125,
#    'width': 466.1986389160156,
#    'height': 11.9671630859375,
#    'text': 'each other smoothly until the entire space is ﬁlled by giant colored (non-singlet) bags.',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'each other smoothly until the entire space is ﬁlled by giant colored (non-singlet) bags.'},
#   {'x': 72.00051879882812,
#    'y': 575.0736083984375,
#    'width': 466.1996154785156,
#    'height': 11.9671630859375,
#    'text': 'The resultant matter have an initial neutral color charge aftermath the phase transition.',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'The resultant matter have an initial neutral color charge aftermath the phase transition.'},
#   {'x': 72.00051879882812,
#    'y': 596.0736083984375,
#    'width': 465.9831848144531,
#    'height': 11.9671630859375,
#    'text': 'Therefore, the constraints of the conserved color charges must be embedded in the system',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'Therefore, the constraints of the conserved color charges must be embedded in the system'},
#   {'x': 72.00051879882812,
#    'y': 616.9537353515625,
#    'width': 466.1343688964844,
#    'height': 11.9671630859375,
#    'text': 'through the color chemical potentials. This kind of (color-non-singlet) matter with the mass',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'through the color chemical potentials. This kind of (color-non-singlet) matter with the mass'},
#   {'x': 72.00051879882812,
#    'y': 637.8338623046875,
#    'width': 91.90286254882812,
#    'height': 11.96710205078125,
#    'text': 'spectral exponent',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'spectral exponent'},
#   {'x': 163.90338134765625,
#    'y': 637.8338623046875,
#    'width': 13.212982177734375,
#    'height': 11.9552001953125,
#    'text': 'α',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
#    'text_vi': 'α'},
#   {'x': 177.12045288085938,
#    'y': 642.6226196289062,
#    'width': 14.372756958007812,
#    'height': 7.97015380859375,
#    'text': 'non',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMMI8', 'size': 7},
#    'text_vi': 'non'},
#   {'x': 191.4932098388672,
#    'y': 637.8338623046875,
#    'width': 346.5076446533203,
#    'height': 13.76708984375,
#    'text': 'undergoes a smooth cross-over phase transition to non-explosive',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'undergoes a smooth cross-over phase transition to non-explosive'},
#   {'x': 72.0003662109375,
#    'y': 658.8338623046875,
#    'width': 466.13812255859375,
#    'height': 11.96710205078125,
#    'text': 'quark-gluon plasma. The multi-processes mechanism in the phase transition from the low-',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'quark-gluon plasma. The multi-processes mechanism in the phase transition from the low-'},
#   {'x': 72.0003662109375,
#    'y': 679.7138671875,
#    'width': 466.24041748046875,
#    'height': 11.9671630859375,
#    'text': 'lying hadronic phase to the quark-gluon plasma strongly indicates the ﬂuid behaviour for the',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'lying hadronic phase to the quark-gluon plasma strongly indicates the ﬂuid behaviour for the'},
#   {'x': 72.0003662109375,
#    'y': 700.5936279296875,
#    'width': 466.31964111328125,
#    'height': 11.9671630859375,
#    'text': 'quark-gluon plasma. The color-singlet states for the quark-gluon bag with an orthogonal',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'quark-gluon plasma. The color-singlet states for the quark-gluon bag with an orthogonal'},
#   {'x': 72.0003662109375,
#    'y': 721.5936279296875,
#    'width': 466.0914306640625,
#    'height': 11.9671630859375,
#    'text': 'color representation rather than the unitary one can be interpreted as a gas of Coulomb',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': 'color representation rather than the unitary one can be interpreted as a gas of Coulomb'},
#   {'x': 293.400390625,
#    'y': 755.9139404296875,
#    'width': 11.72613525390625,
#    'height': 11.96710205078125,
#    'text': '52',
#    'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
#    'text_vi': '52'}]}["cells"]

In [286]:
# name_root = 'Math_notation'

In [287]:
# math_box_list = load_math_boxes(name_root)

In [288]:
# cells = insert_cell_id(cells)

In [289]:
# merged_boxes, overlap_id_list, cut_cell_list, remain_cell_list_id = reconstruct_text_cell(cells, math_box_list)

In [290]:
# cutted_cells = cut_cells_box(name_root + '.pdf', cut_cell_list, remain_cell_list_id)

In [291]:
# remain_cell = [cell for cell in cells if cell['id'] in remain_cell_list_id]

In [292]:
# all_cells = cutted_cells + remain_cell

In [293]:
# import os
# import json

# def export_math_boxes_and_text_cells(cells_text, math_box, folder_name):
#     """
#     Creates the folder '<folder_name>_processed' (if needed) and writes two files into it:
#     - 'reconstruct_translated_pdf.json' with the full list of text-cell dicts
#     - 'pdf_coor.txt' with one math box per line, as space-separated values: id x y width height

#     Args:
#         cells_text (list of dict): Text cells to dump as JSON
#         math_box (list of dict): Math boxes, each with keys 'id', 'x', 'y', 'width', 'height'
#         folder_name (str): Base name of the output folder; '_processed' is appended to it
#     """
#     folder_name = folder_name + '_processed'

#     os.makedirs(folder_name, exist_ok=True)  # Create folder if it doesn't exist

#     # Write JSON file
#     json_path = os.path.join(folder_name, 'reconstruct_translated_pdf.json')
#     with open(json_path, 'w') as json_file:
#         json.dump(cells_text, json_file, indent=4)

#     # Write txt file
#     txt_path = os.path.join(folder_name, 'pdf_coor.txt')
#     with open(txt_path, 'w') as txt_file:
#         for item in math_box:
#             line = f"{item['id']} {item['x']} {item['y']} {item['width']} {item['height']}"
#             txt_file.write(line + '\n')

#     print(f"Files saved in: {folder_name}")


In [294]:
# export_math_boxes_and_text_cells(all_cells, merged_boxes, name_root)
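
The `pdf_coor.txt` layout written by the export function (one math box per line, `id x y width height`) can be read back with a few lines of Python. The sketch below is only an illustration: `format_box_line` and `parse_box_line` are hypothetical helper names that do not exist in this notebook, and it assumes the id is an integer and the other four fields are floats.

```python
def format_box_line(box):
    """Serialize one math box as 'id x y width height' (same layout as pdf_coor.txt)."""
    return f"{box['id']} {box['x']} {box['y']} {box['width']} {box['height']}"


def parse_box_line(line):
    """Parse one 'id x y width height' line back into a dict.

    Assumption: the id is an integer and the other four fields are floats.
    """
    box_id, x, y, width, height = line.split()
    return {'id': int(box_id), 'x': float(x), 'y': float(y),
            'width': float(width), 'height': float(height)}


# Round trip on a made-up box (values chosen for illustration only)
box = {'id': 0, 'x': 131.25, 'y': 135.75, 'width': 15.5, 'height': 13.0}
restored = parse_box_line(format_box_line(box))
```

A round trip like this is a cheap check that the exported file stays compatible with whatever loader reads it in the next run.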

# Visualization

In [295]:
import fitz  # PyMuPDF
from PIL import Image
import os
import ast  # To safely parse the tuple from size.txt
from PyPDF2 import PdfReader, PdfWriter, Transformation
import copy

In [296]:
def visualize_to_pdf_from_cells(cells_text, cells_box, pdf_path):
    '''
    Draw the text cells (red) and the math boxes (green) on the first page of the given PDF

    Args:
        cells_text: List of text cells, each a dict with keys 'x', 'y', 'width', 'height'
        cells_box: List of math boxes, each a dict with keys 'x', 'y', 'width', 'height'
        pdf_path: Path of the PDF to visualize

    Return:
        None. A new PDF named '<pdf name>_vis.pdf' is saved in the current working directory
    '''

    # Open the PDF and derive the output file name from it
    pdf_doc = fitz.open(pdf_path)
    pdf_name = os.path.splitext(os.path.basename(pdf_path))[0]
    output_pdf = f"{pdf_name}_vis.pdf"

    # Only the first page is annotated
    page = pdf_doc[0]

    # Text cells are outlined in red, math boxes in green
    for cells, color in ((cells_text, (1, 0, 0)), (cells_box, (0, 1, 0))):
        for cell in cells:
            x, y, w, h = map(float, (cell['x'], cell['y'], cell['width'], cell['height']))
            rect = fitz.Rect(x, y, x + w, y + h)
            page.draw_rect(rect, color=color, fill=None, width=1)

    # Save PDF
    pdf_doc.save(output_pdf)
    pdf_doc.close()
    print(f"Saved output PDF as {output_pdf}")

In [297]:
# visualize_to_pdf_from_cells(all_cells, merged_boxes, 'Math_notation.pdf')
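
`fitz.Rect` takes corner coordinates `(x0, y0, x1, y1)`, whereas the cells store one corner plus a width and a height. The conversion done inside `visualize_to_pdf_from_cells` can be sketched without PyMuPDF. `cell_to_corners` is a hypothetical helper name introduced only for this illustration, and the sample cell is made up.

```python
def cell_to_corners(cell):
    """Convert a {'x', 'y', 'width', 'height'} cell into (x0, y0, x1, y1) corners."""
    x, y, w, h = map(float, (cell['x'], cell['y'], cell['width'], cell['height']))
    return (x, y, x + w, y + h)


# Made-up cell, with values chosen so the sums are exact in floating point
sample_cell = {'x': 72.0, 'y': 72.5, 'width': 466.25, 'height': 12.0}
corners = cell_to_corners(sample_cell)
```

The resulting tuple can be passed straight to `fitz.Rect(*corners)` when PyMuPDF is available.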

# Original Box

In [298]:
cells_original = {'cells': [{'x': 72.0,
   'y': 72.99362182617188,
   'width': 466.31976318359375,
   'height': 11.967155456542969,
   'text': 'dominated by the unitary Hagedorn states is altered to be dominated by the orthogonal',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'dominated by the unitary Hagedorn states is altered to be dominated by the orthogonal'},
  {'x': 72.0,
   'y': 93.87332153320312,
   'width': 466.12030029296875,
   'height': 11.967155456542969,
   'text': 'Hagedorn states when the dilute nuclear matter is heated up to higher temperatures. Since',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'Hagedorn states when the dilute nuclear matter is heated up to higher temperatures. Since'},
  {'x': 72.0,
   'y': 114.87332153320312,
   'width': 466.25439453125,
   'height': 11.967155456542969,
   'text': 'the mass spectral exponent for orthogonal Hagedorns (i.e. colorless orthogonal states) is',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'the mass spectral exponent for orthogonal Hagedorns (i.e. colorless orthogonal states) is'},
  {'x': 72.0,
   'y': 135.75344848632812,
   'width': 59.320587158203125,
   'height': 11.967147827148438,
   'text': 'found to be',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'found to be'},
  {'x': 131.32058715820312,
   'y': 135.75344848632812,
   'width': 11.475357055664062,
   'height': 11.9552001953125,
   'text': 'α',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': 'α'},
  {'x': 142.80003356933594,
   'y': 140.54225158691406,
   'width': 4.2321319580078125,
   'height': 7.9701080322265625,
   'text': '1',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR8', 'size': 7},
   'text_vi': '1'},
  {'x': 147.03216552734375,
   'y': 135.75344848632812,
   'width': 391.151123046875,
   'height': 13.767135620117188,
   'text': '= 3, it is likely that the orthogonal Hagedorn matter undergoes third order',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '= 3, it is likely that the orthogonal Hagedorn matter undergoes third order'},
  {'x': 72.00018310546875,
   'y': 156.63357543945312,
   'width': 466.32012939453125,
   'height': 11.967147827148438,
   'text': 'phase transition to quark-gluon plasma. Furthermore, it is possible that the orthogonal',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'phase transition to quark-gluon plasma. Furthermore, it is possible that the orthogonal'},
  {'x': 72.00018310546875,
   'y': 177.63357543945312,
   'width': 198.53466796875,
   'height': 11.967147827148438,
   'text': 'Hagedorn states are altered to colorless',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'Hagedorn states are altered to colorless'},
  {'x': 270.53485107421875,
   'y': 177.63357543945312,
   'width': 11.37554931640625,
   'height': 11.9552001953125,
   'text': 'U',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': 'U'},
  {'x': 283.08013916015625,
   'y': 177.63357543945312,
   'width': 14.98297119140625,
   'height': 11.967147827148438,
   'text': '(1)',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '(1)'},
  {'x': 298.08013916015625,
   'y': 176.3024444580078,
   'width': 6.71881103515625,
   'height': 7.9701080322265625,
   'text': 'N',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI8', 'size': 7},
   'text_vi': 'N'},
  {'x': 304.800048828125,
   'y': 178.6456298828125,
   'width': 3.341461181640625,
   'height': 5.9775848388671875,
   'text': 'c',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI6', 'size': 5},
   'text_vi': 'c'},
  {'x': 312.4801330566406,
   'y': 177.63357543945312,
   'width': 225.77499389648438,
   'height': 11.967147827148438,
   'text': 'states when the very dilute nuclear matter is',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'states when the very dilute nuclear matter is'},
  {'x': 72.00013732910156,
   'y': 198.51370239257812,
   'width': 466.21287536621094,
   'height': 11.967147827148438,
   'text': 'further heated up to higher temperatures. The very dilute nuclear matter might be created',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'further heated up to higher temperatures. The very dilute nuclear matter might be created'},
  {'x': 72.00015258789062,
   'y': 219.39340209960938,
   'width': 30.64043426513672,
   'height': 11.967147827148438,
   'text': 'in the',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'in the'},
  {'x': 102.64058685302734,
   'y': 219.39340209960938,
   'width': 16.38970184326172,
   'height': 11.9552001953125,
   'text': 'pp',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': 'pp'},
  {'x': 119.03028869628906,
   'y': 219.39340209960938,
   'width': 419.0613250732422,
   'height': 11.967147827148438,
   'text': 'collisions at LHC besides the heavy ion collisions. The Hagedorn matter which',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'collisions at LHC besides the heavy ion collisions. The Hagedorn matter which'},
  {'x': 72.00006103515625,
   'y': 240.39340209960938,
   'width': 154.614501953125,
   'height': 11.967147827148438,
   'text': 'is dominated by the colorless',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'is dominated by the colorless'},
  {'x': 226.61456298828125,
   'y': 240.39340209960938,
   'width': 13.53570556640625,
   'height': 11.9552001953125,
   'text': 'U',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': 'U'},
  {'x': 241.31997680664062,
   'y': 240.39340209960938,
   'width': 14.983001708984375,
   'height': 11.967147827148438,
   'text': '(1)',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '(1)'},
  {'x': 256.3199768066406,
   'y': 239.06227111816406,
   'width': 6.71881103515625,
   'height': 7.9701080322265625,
   'text': 'N',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI8', 'size': 7},
   'text_vi': 'N'},
  {'x': 263.0398864746094,
   'y': 241.40582275390625,
   'width': 3.341461181640625,
   'height': 5.9775848388671875,
   'text': 'c',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI6', 'size': 5},
   'text_vi': 'c'},
  {'x': 272.880126953125,
   'y': 240.39334106445312,
   'width': 166.30264282226562,
   'height': 11.967147827148438,
   'text': 'has the mass spectral exponent',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'has the mass spectral exponent'},
  {'x': 439.1827697753906,
   'y': 240.39334106445312,
   'width': 13.09246826171875,
   'height': 11.9552001953125,
   'text': 'α',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': 'α'},
  {'x': 452.2752380371094,
   'y': 240.39334106445312,
   'width': 26.96978759765625,
   'height': 11.967147827148438,
   'text': '= 3',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '= 3'},
  {'x': 479.2789001464844,
   'y': 240.39334106445312,
   'width': 5.846099853515625,
   'height': 11.9552001953125,
   'text': '/',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': '/'},
  {'x': 485.1589660644531,
   'y': 240.39334106445312,
   'width': 9.119842529296875,
   'height': 11.967147827148438,
   'text': '2.',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '2.'},
  {'x': 504.11895751953125,
   'y': 240.39334106445312,
   'width': 33.9600830078125,
   'height': 11.967147827148438,
   'text': 'Hence,',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'Hence,'},
  {'x': 71.99996948242188,
   'y': 261.2734680175781,
   'width': 466.1206359863281,
   'height': 11.9671630859375,
   'text': 'the nuclear matter that is dominated by these states does not undergo direct abrupt phase',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'the nuclear matter that is dominated by these states does not undergo direct abrupt phase'},
  {'x': 71.99996948242188,
   'y': 282.153564453125,
   'width': 466.2405090332031,
   'height': 11.9671630859375,
   'text': 'transition to quark-gluon plasma but rather smooth cross-over phase transition. When the',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'transition to quark-gluon plasma but rather smooth cross-over phase transition. When the'},
  {'x': 71.99996948242188,
   'y': 303.153564453125,
   'width': 466.2113952636719,
   'height': 11.9671630859375,
   'text': 'medium is further heated up to higher temperature these states (i.e. Hagedorn states with',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'medium is further heated up to higher temperature these states (i.e. Hagedorn states with'},
  {'x': 71.99996948242188,
   'y': 324.0336608886719,
   'width': 141.463134765625,
   'height': 11.9671630859375,
   'text': 'the mass spectral exponent',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'the mass spectral exponent'},
  {'x': 213.46310424804688,
   'y': 324.0336608886719,
   'width': 12.253005981445312,
   'height': 11.9552001953125,
   'text': 'α',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': 'α'},
  {'x': 225.7161102294922,
   'y': 324.0336608886719,
   'width': 24.32989501953125,
   'height': 11.9671630859375,
   'text': '= 3',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '= 3'},
  {'x': 250.07997131347656,
   'y': 324.0336608886719,
   'width': 5.846099853515625,
   'height': 11.9552001953125,
   'text': '/',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': '/'},
  {'x': 255.96005249023438,
   'y': 324.0336608886719,
   'width': 282.2514343261719,
   'height': 11.9671630859375,
   'text': '2) may be mutated to metastable colored quark-gluon',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '2) may be mutated to metastable colored quark-gluon'},
  {'x': 72.00004577636719,
   'y': 345.0336608886719,
   'width': 188.6228790283203,
   'height': 11.9671630859375,
   'text': 'bags with the mass spectral exponent',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'bags with the mass spectral exponent'},
  {'x': 260.6229248046875,
   'y': 345.0336608886719,
   'width': 10.813262939453125,
   'height': 11.9552001953125,
   'text': 'α',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': 'α'},
  {'x': 271.4361877441406,
   'y': 345.0336608886719,
   'width': 21.6900634765625,
   'height': 11.9671630859375,
   'text': '= 1',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '= 1'},
  {'x': 293.1602478027344,
   'y': 345.0336608886719,
   'width': 5.846099853515625,
   'height': 11.9552001953125,
   'text': '/',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': '/'},
  {'x': 299.0403137207031,
   'y': 345.0336608886719,
   'width': 239.02279663085938,
   'height': 11.9671630859375,
   'text': '2. Since the states with mass spectral exponent',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '2. Since the states with mass spectral exponent'},
  {'x': 72.00032043457031,
   'y': 365.91375732421875,
   'width': 7.436134338378906,
   'height': 11.9552001953125,
   'text': 'α',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': 'α'},
  {'x': 79.43645477294922,
   'y': 365.91375732421875,
   'width': 21.690078735351562,
   'height': 11.9671630859375,
   'text': '= 1',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '= 1'},
  {'x': 101.1605224609375,
   'y': 365.91375732421875,
   'width': 5.846092224121094,
   'height': 11.9552001953125,
   'text': '/',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': '/'},
  {'x': 107.04060363769531,
   'y': 365.91375732421875,
   'width': 431.1598358154297,
   'height': 11.9671630859375,
   'text': '2 do not pass direct explosive deconﬁnement phase transition to quark-gluon plasma,',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '2 do not pass direct explosive deconﬁnement phase transition to quark-gluon plasma,'},
  {'x': 72.0006103515625,
   'y': 386.7938537597656,
   'width': 466.12054443359375,
   'height': 11.9671630859375,
   'text': 'the colored quark-gluon bags expand smoothly and the system undergoes smooth phase',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'the colored quark-gluon bags expand smoothly and the system undergoes smooth phase'},
  {'x': 72.0006103515625,
   'y': 407.7938537597656,
   'width': 212.03964233398438,
   'height': 11.9671630859375,
   'text': 'transition to colored quark-gluon plasma.',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'transition to colored quark-gluon plasma.'},
  {'x': 87.0006103515625,
   'y': 428.6735534667969,
   'width': 317.9353332519531,
   'height': 11.9671630859375,
   'text': 'The orthogonal Hagedorn states are mutated to the colorless',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'The orthogonal Hagedorn states are mutated to the colorless'},
  {'x': 404.9359436035156,
   'y': 428.6735534667969,
   'width': 12.694854736328125,
   'height': 11.9552001953125,
   'text': 'U',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': 'U'},
  {'x': 418.800537109375,
   'y': 428.6735534667969,
   'width': 14.982940673828125,
   'height': 11.9671630859375,
   'text': '(1)',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '(1)'},
  {'x': 433.800537109375,
   'y': 427.34246826171875,
   'width': 6.71881103515625,
   'height': 7.9700927734375,
   'text': 'N',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI8', 'size': 7},
   'text_vi': 'N'},
  {'x': 440.52044677734375,
   'y': 429.6860046386719,
   'width': 3.341461181640625,
   'height': 5.977569580078125,
   'text': 'c',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI6', 'size': 5},
   'text_vi': 'c'},
  {'x': 449.640380859375,
   'y': 428.6735534667969,
   'width': 88.6148681640625,
   'height': 11.9671630859375,
   'text': 'quark-gluon bags',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'quark-gluon bags'},
  {'x': 72.0003662109375,
   'y': 449.55364990234375,
   'width': 417.11981201171875,
   'height': 11.9671630859375,
   'text': 'due to the high thermal excitations in the hot and very dilute nuclear matter (i.e.',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'due to the high thermal excitations in the hot and very dilute nuclear matter (i.e.'},
  {'x': 489.12017822265625,
   'y': 449.55364990234375,
   'width': 12.201995849609375,
   'height': 11.9552001953125,
   'text': 'µ',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': 'µ'},
  {'x': 501.36102294921875,
   'y': 454.34246826171875,
   'width': 6.384063720703125,
   'height': 7.9700927734375,
   'text': 'B',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI8', 'size': 7},
   'text_vi': 'B'},
  {'x': 507.7450866699219,
   'y': 449.2547607421875,
   'width': 13.464935302734375,
   'height': 22.542266845703125,
   'text': '≈',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMSY10', 'size': 11},
   'text_vi': '≈'},
  {'x': 524.6407470703125,
   'y': 449.55364990234375,
   'width': 13.6798095703125,
   'height': 11.9671630859375,
   'text': '0).',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '0).'},
  {'x': 72.000732421875,
   'y': 470.55364990234375,
   'width': 360.29461669921875,
   'height': 11.9671630859375,
   'text': 'Since the new nuclear matter turns to be dominated by the colorless',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'Since the new nuclear matter turns to be dominated by the colorless'},
  {'x': 432.29534912109375,
   'y': 470.55364990234375,
   'width': 12.815582275390625,
   'height': 11.9552001953125,
   'text': 'U',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': 'U'},
  {'x': 446.2806701660156,
   'y': 470.55364990234375,
   'width': 14.982940673828125,
   'height': 11.9671630859375,
   'text': '(1)',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '(1)'},
  {'x': 461.2806701660156,
   'y': 469.1026306152344,
   'width': 6.71881103515625,
   'height': 7.9700927734375,
   'text': 'N',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI8', 'size': 7},
   'text_vi': 'N'},
  {'x': 468.0005798339844,
   'y': 471.44580078125,
   'width': 3.341461181640625,
   'height': 5.977569580078125,
   'text': 'c',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI6', 'size': 5},
   'text_vi': 'c'},
  {'x': 477.1205139160156,
   'y': 470.55364990234375,
   'width': 61.091705322265625,
   'height': 11.9671630859375,
   'text': 'quark-gluon',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'quark-gluon'},
  {'x': 72.00051879882812,
   'y': 491.4337463378906,
   'width': 466.1817932128906,
   'height': 11.9671630859375,
   'text': 'bags, it does not likely undergo direct phase transition to explosive quark-gluon plasma. But',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'bags, it does not likely undergo direct phase transition to explosive quark-gluon plasma. But'},
  {'x': 72.00051879882812,
   'y': 512.3138427734375,
   'width': 466.1374206542969,
   'height': 11.9671630859375,
   'text': 'instead, the resultant Hagedorn states are gradually altered to metastable colored quark-',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'instead, the resultant Hagedorn states are gradually altered to metastable colored quark-'},
  {'x': 72.00051879882812,
   'y': 533.3138427734375,
   'width': 75.599853515625,
   'height': 11.9671630859375,
   'text': 'gluon bubbles.',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'gluon bubbles.'},
  {'x': 157.9202423095703,
   'y': 533.3138427734375,
   'width': 380.1720428466797,
   'height': 11.9671630859375,
   'text': 'The metastable colored quark-gluon bags expand gradually and overlap',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'The metastable colored quark-gluon bags expand gradually and overlap'},
  {'x': 72.00051879882812,
   'y': 554.1934814453125,
   'width': 466.1986389160156,
   'height': 11.9671630859375,
   'text': 'each other smoothly until the entire space is ﬁlled by giant colored (non-singlet) bags.',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'each other smoothly until the entire space is ﬁlled by giant colored (non-singlet) bags.'},
  {'x': 72.00051879882812,
   'y': 575.0736083984375,
   'width': 466.1996154785156,
   'height': 11.9671630859375,
   'text': 'The resultant matter have an initial neutral color charge aftermath the phase transition.',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'The resultant matter have an initial neutral color charge aftermath the phase transition.'},
  {'x': 72.00051879882812,
   'y': 596.0736083984375,
   'width': 465.9831848144531,
   'height': 11.9671630859375,
   'text': 'Therefore, the constraints of the conserved color charges must be embedded in the system',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'Therefore, the constraints of the conserved color charges must be embedded in the system'},
  {'x': 72.00051879882812,
   'y': 616.9537353515625,
   'width': 466.1343688964844,
   'height': 11.9671630859375,
   'text': 'through the color chemical potentials. This kind of (color-non-singlet) matter with the mass',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'through the color chemical potentials. This kind of (color-non-singlet) matter with the mass'},
  {'x': 72.00051879882812,
   'y': 637.8338623046875,
   'width': 91.90286254882812,
   'height': 11.96710205078125,
   'text': 'spectral exponent',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'spectral exponent'},
  {'x': 163.90338134765625,
   'y': 637.8338623046875,
   'width': 13.212982177734375,
   'height': 11.9552001953125,
   'text': 'α',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI12', 'size': 11},
   'text_vi': 'α'},
  {'x': 177.12045288085938,
   'y': 642.6226196289062,
   'width': 14.372756958007812,
   'height': 7.97015380859375,
   'text': 'non',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMMI8', 'size': 7},
   'text_vi': 'non'},
  {'x': 191.4932098388672,
   'y': 637.8338623046875,
   'width': 346.5076446533203,
   'height': 13.76708984375,
   'text': 'undergoes a smooth cross-over phase transition to non-explosive',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'undergoes a smooth cross-over phase transition to non-explosive'},
  {'x': 72.0003662109375,
   'y': 658.8338623046875,
   'width': 466.13812255859375,
   'height': 11.96710205078125,
   'text': 'quark-gluon plasma. The multi-processes mechanism in the phase transition from the low-',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'quark-gluon plasma. The multi-processes mechanism in the phase transition from the low-'},
  {'x': 72.0003662109375,
   'y': 679.7138671875,
   'width': 466.24041748046875,
   'height': 11.9671630859375,
   'text': 'lying hadronic phase to the quark-gluon plasma strongly indicates the ﬂuid behaviour for the',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'lying hadronic phase to the quark-gluon plasma strongly indicates the ﬂuid behaviour for the'},
  {'x': 72.0003662109375,
   'y': 700.5936279296875,
   'width': 466.31964111328125,
   'height': 11.9671630859375,
   'text': 'quark-gluon plasma. The color-singlet states for the quark-gluon bag with an orthogonal',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'quark-gluon plasma. The color-singlet states for the quark-gluon bag with an orthogonal'},
  {'x': 72.0003662109375,
   'y': 721.5936279296875,
   'width': 466.0914306640625,
   'height': 11.9671630859375,
   'text': 'color representation rather than the unitary one can be interpreted as a gas of Coulomb',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': 'color representation rather than the unitary one can be interpreted as a gas of Coulomb'},
  {'x': 293.400390625,
   'y': 755.9139404296875,
   'width': 11.72613525390625,
   'height': 11.96710205078125,
   'text': '52',
   'font': {'color': [0, 0, 0, 255], 'name': 'CMR12', 'size': 11},
   'text_vi': '52'}]}["cells"]

In [299]:
# Pipeline step 1: load the math box list ([{'id', 'x', 'y', 'width', 'height'}, ...])
math_box_list_original = load_math_boxes(name_root)

In [300]:
# Disabled: draws the original cells and math boxes into a PDF for inspection
# visualize_to_pdf_from_cells(cells_original, math_box_list_original, 'Math_notation.pdf')
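
To show how the 40% rule from step 3.2 applies to cells like the ones dumped above, here is a minimal sketch. The helper names `intersection_area` and `classify_cell` are hypothetical (they are not the notebook's own functions), and the math box coordinates are invented for illustration. The three cells are copied from the dump: `'α'`, `'spectral exponent'`, and the page number `'52'`.

```python
def intersection_area(box, cell):
    """Area shared by two axis-aligned rectangles given as x/y/width/height dicts."""
    left = max(box['x'], cell['x'])
    top = max(box['y'], cell['y'])
    right = min(box['x'] + box['width'], cell['x'] + cell['width'])
    bottom = min(box['y'] + box['height'], cell['y'] + cell['height'])
    return max(0.0, right - left) * max(0.0, bottom - top)


def classify_cell(box, cell, threshold=0.4):
    """Return 'overlap', 'non_overlap', or None if the cell does not touch the box.

    The ratio is the shared area divided by the area of the cell (not the box),
    following step 3.2: > 40% means 'overlap', otherwise 'non_overlap'.
    """
    cell_area = cell['width'] * cell['height']
    if cell_area <= 0:
        return None
    ratio = intersection_area(box, cell) / cell_area
    if ratio == 0:
        return None
    return 'overlap' if ratio > threshold else 'non_overlap'


# Hypothetical math box around the 'α' symbol (coordinates are made up).
math_box = {'id': 0, 'x': 160.0, 'y': 636.0, 'width': 32.0, 'height': 16.0}

# Cells copied from the dump above (only the geometry and text are kept).
alpha = {'x': 163.90338134765625, 'y': 637.8338623046875,
         'width': 13.212982177734375, 'height': 11.9552001953125, 'text': 'α'}
spectral = {'x': 72.00051879882812, 'y': 637.8338623046875,
            'width': 91.90286254882812, 'height': 11.96710205078125,
            'text': 'spectral exponent'}
page_number = {'x': 293.400390625, 'y': 755.9139404296875,
               'width': 11.72613525390625, 'height': 11.96710205078125,
               'text': '52'}

for cell in (alpha, spectral, page_number):
    print(repr(cell['text']), classify_cell(math_box, cell))
```

With this box, the `'α'` cell lies entirely inside it and would go to `overlap_list`, the `'spectral exponent'` cell shares only a thin strip at its right edge and would go to `non_overlap_list`, and the page number does not interact with the box at all.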