# latex.convert

> Convert LaTeX files into Obsidian.md notes (or vice versa)

This module contains functions and methods to automatically make Obsidian notes from LaTeX files of mathematical papers, most notably those on arXiv.

See the [Potential Problems](#potential-problems) section below for some common errors that arise from this module and how to circumvent them.

In [None]:
#| default_exp latex.convert

In [None]:
#| export
import os
from os import PathLike
from pathlib import Path
import re
from typing import Optional, Union


import bs4
from pathvalidate import sanitize_filename


from trouver.helper.files_and_folders import (
    text_from_file
)
from trouver.helper.html import remove_html_tags_in_text
from trouver.helper.regex import replace_string_by_indices
from trouver.markdown.markdown.file import (
    MarkdownFile, MarkdownLineEnum
)

from trouver.latex.divide import (
    divide_preamble, divide_latex_text, get_node_from_simple_text, _is_section_node, _section_title
)
from trouver.latex.folders import (
    section_and_subsection_titles_from_latex_parts, UNTITLED_SECTION_TITLE, _part_starts_section, _part_starts_subsection
)
from trouver.latex.formatting import (
    adjust_common_syntax_to_markdown, custom_commands, remove_dollar_signs_around_equationlike_envs, replace_commands_in_text
)

from trouver.markdown.obsidian.vault import VaultNote
from trouver.markdown.obsidian.personal.index_notes import (
    convert_title_to_folder_name
)
from trouver.markdown.obsidian.personal.note_processing import process_standard_information_note
from trouver.markdown.obsidian.personal.reference import setup_folder_for_new_reference

In [None]:
#| export
DEFAULT_NUMBERED_ENVIRONMENTS = ['theorem', 'corollary', 'lemma', 'proposition',
                                 'definition', 'conjecture', 'remark', 'example',
                                 'question']

In [None]:
import glob
import shutil
import tempfile


from fastcore.test import ExceptionExpected, test_eq
from pathvalidate import validate_filename

from trouver.helper.tests import _test_directory# , non_utf8_chars_in_file
from trouver.markdown.obsidian.personal.reference import (
    delete_reference_folder
)


## Potential problems

The following are some problems that frequently arise when using this module:


#### UnicodeDecodeErrors arise when reading LaTeX files

By default, the `text_from_file` function in `trouver.helper` reads a file and attempts to decode it as `utf-8`. If a LaTeX file contains characters that cannot be decoded as `utf-8`, then a `UnicodeDecodeError` may be raised. In this case, one can identify these characters using the `trouver.helper.non_utf8_chars_in_file` function and modify the LaTeX file manually. It may be useful to use a text editor to jump to the positions of these characters and to change the encoding of the LaTeX file to `utf-8`; for example, the author of `trouver` has opened some `ANSI`-encoded LaTeX documents in `Notepad++` and converted their encoding to `UTF-8`.
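If `non_utf8_chars_in_file` is not at hand, the following standalone sketch (standard library only; `non_utf8_byte_positions` and `reencode_to_utf8` are hypothetical names, not part of `trouver`) reports the line, the column (counted in bytes), and the value of the first byte of each run that fails to decode as UTF-8. The re-encoding helper assumes that the file is otherwise in a Windows "ANSI" code page, taken here to be `cp1252`; that is a guess about the source encoding that should be checked for each file.

```python
from pathlib import Path


def non_utf8_byte_positions(data: bytes) -> list[tuple[int, int, int]]:
    """Return (line, column, byte) for the first byte of each undecodable run.

    Lines and columns are 1-based; columns count bytes, not characters.
    """
    positions = []
    offset = 0
    while offset < len(data):
        try:
            data[offset:].decode('utf-8')
            break  # The rest of the data is valid UTF-8.
        except UnicodeDecodeError as err:
            bad = offset + err.start  # err.start is relative to data[offset:].
            line = data.count(b'\n', 0, bad) + 1
            column = bad - (data.rfind(b'\n', 0, bad) + 1) + 1
            positions.append((line, column, data[bad]))
            offset += err.end  # Resume right after the undecodable run.
    return positions


def reencode_to_utf8(path: Path, source_encoding: str = 'cp1252') -> None:
    """Rewrite `path` as UTF-8, ASSUMING it is currently in `source_encoding`."""
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode('utf-8'))


# A UTF-8 first line followed by a cp1252-encoded second line.
sample = 'café\n'.encode('utf-8') + 'naïve'.encode('cp1252')
print(non_utf8_byte_positions(sample))  # [(2, 3, 239)]
```

Reading and writing bytes (rather than text) avoids any newline translation, so only the encoding of the file changes.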

#### `NoDocumentNodeErrors` arise even though the LaTeX file has a document environment (i.e. `\begin{document}...\end{document}`)

The `find_document_node` function in this module is sometimes unable to detect the document environment of a LaTeX file. This error is known to arise when
- macros (including commands) are defined that expand to text containing `\begin{...}... \end{...}`. For example

In [None]:
# TODO in the above explanation, include an example.
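The following standalone sketch flags macros of this kind in a preamble so that they can be expanded or removed by hand before conversion. The helper `macros_expanding_to_environments` is hypothetical (it is not part of `trouver`) and is only a heuristic: it recognizes `\newcommand`, `\renewcommand`, and `\def` definitions written in the usual forms, ignores `\newenvironment`, and reports the macros whose bodies contain `\begin{` or `\end{`.

```python
import re

# Start of a macro definition, up to and including the `{` that opens its body.
# Group 1: name in `\newcommand`/`\renewcommand`; group 2: name in `\def`.
_DEFINITION_START = re.compile(
    r'\\(?:re)?newcommand\*?\s*\{?\s*\\([A-Za-z@]+)\s*\}?'  # \newcommand{\name} or \newcommand\name
    r'(?:\s*\[\d\])?(?:\s*\[[^\]]*\])?\s*\{'                 # optional [n] and [default], then {
    r'|\\def\s*\\([A-Za-z@]+)[^{]*\{'                        # \def\name#1#2{
)


def _braced_body(text: str, start: int) -> str:
    """Return the text from `start` up to the `}` matching an already-opened `{`."""
    depth = 1
    for i in range(start, len(text)):
        char = text[i]
        if char in '{}' and text[i - 1] == '\\':
            continue  # Escaped brace such as \{ or \}.
        if char == '{':
            depth += 1
        elif char == '}':
            depth -= 1
            if depth == 0:
                return text[start:i]
    return text[start:]  # Unbalanced braces: return everything that follows.


def macros_expanding_to_environments(preamble: str) -> list[str]:
    """Names of macros whose definitions contain `\\begin{` or `\\end{`."""
    names = []
    for match in _DEFINITION_START.finditer(preamble):
        name = match.group(1) or match.group(2)
        body = _braced_body(preamble, match.end())
        if '\\begin{' in body or '\\end{' in body:
            names.append(name)
    return names


preamble = '\n'.join([
    r'\newcommand{\beq}{\begin{equation}}',
    r'\newcommand{\R}{\mathbb{R}}',
    r'\def\bd{\begin{document}}',
])
print(macros_expanding_to_environments(preamble))  # ['beq', 'bd']
```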

## Set up an Obsidian vault reference

In [None]:
#| export
def _replace_custom_commands_in_parts(
        parts: list[tuple[str, str]],
        custom_commands: list[tuple[str, int, Union[str, None], str]],
        repeat_replacing_custom_commands: int
        ) -> list[tuple[str, str]]:
    return [
        (title, replace_commands_in_text(
                text, custom_commands, repeat=repeat_replacing_custom_commands))
        for title, text in parts]


In [None]:
#| hide
# TODO: test _adjust_common_syntax_to_markdown_in_parts

text = r'Let $\tConf_n$ be the universal cover of $\Conf_n$.'
parts = [('1', text)]
cust_comms = [
    ('tConf', 0, None, '\\widetilde{Conf}'),
    ('Conf', 0, None, '\\Conf'),
    ]
test_eq(
    _replace_custom_commands_in_parts(parts, cust_comms, repeat_replacing_custom_commands=1),
    [('1', 'Let $\\widetilde{Conf}_n$ be the universal cover of $\\Conf_n$.')])



In [None]:
#| export
def _adjust_common_section_titles_in_parts(
        parts: list[tuple[str, str]],
        reference_name: str):
    """Adjust common section titles in `parts`

    Common section titles include, but are not limited to,
    `Introduction`, `Notations`,
    `Conventions`, `Preliminaries`, and `Notations and Conventions`    
    
    This is a helper function for `setup_reference_from_latex_parts`.

    """
    return [(_adjusted_title(title, text, reference_name), text)
            for title, text in parts]


# TODO: also adjust title if the title is of the form
# <section_number>_<common_section_title>, e.g.
# 7_acknowledgements
# 8_references
COMMON_SECTION_TITLES = [
    'introduction', 'notations', 'conventions', 'preliminaries',
    'notations and conventions', 'definitions', 'definitions and notations',
    'references', 'acknowledgements']


def _adjusted_title(
        title: str,
        text: str,
        reference_name: str):
    """Adjust the given title"""
    node = get_node_from_simple_text(text)
    if not _is_section_node(node):
        return title
    _, section_title = _section_title(text)
    if section_title.lower() in COMMON_SECTION_TITLES:
        return f'{title}_{reference_name}'
    else:
        return title 

In [None]:
#| hide
test_eq(_adjusted_title('1. Introduction', r'\section {Introduction}', 'reference_name'), '1. Introduction_reference_name')
test_eq(_adjusted_title('2. Not a common name', r'\section{Not a common name}', 'reference_name'), '2. Not a common name')
# test_eq(_adjusted_title(UNTITLED_SECTION_TITLE))
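The suffix that `_adjusted_title` appends to common section titles is what keeps index file names from colliding across references (cf. the docstring of `setup_reference_from_latex_parts`). The sketch below restates the idea in standalone form. `index_file_name` and its folder-name conversion are hypothetical stand-ins: `trouver` itself uses `convert_title_to_folder_name`, whose exact output may differ.

```python
_COMMON_TITLES = {
    'introduction', 'notations', 'conventions', 'preliminaries',
    'notations and conventions', 'definitions', 'definitions and notations',
    'references', 'acknowledgements'}


def index_file_name(numbered_title: str, section_title: str, reference_name: str) -> str:
    """Hypothetical: name of the index file for a section of a reference."""
    if section_title.lower() in _COMMON_TITLES:
        # Same rule as `_adjusted_title`: append the reference name.
        numbered_title = f'{numbered_title}_{reference_name}'
    # Stand-in for `convert_title_to_folder_name`; NOT its actual behavior.
    folder = numbered_title.lower().replace('. ', '_').replace(' ', '_')
    return f'_index_{folder}.md'


print(index_file_name('1. Introduction', 'Introduction', 'smith_2020'))  # _index_1_introduction_smith_2020.md
print(index_file_name('1. Introduction', 'Introduction', 'jones_2021'))  # _index_1_introduction_jones_2021.md
print(index_file_name('2. Main results', 'Main results', 'smith_2020'))  # _index_2_main_results.md
```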


In [None]:
#| export
def _create_notes_from_parts(
        parts: list[tuple[str, str]],
        chapters: list[list[str]],
        index_note: VaultNote, # The index note of the reference that was created.
        vault: Path,
        reference_folder: Path,
        reference_name: str,
        template_mf: MarkdownFile, # The template of the reference that was created.
        ):
    """Create notes for the vault from `parts`."""
    title_numbering_folder_map = {
        title: convert_title_to_folder_name(title)
        for section in chapters
        for title in section}

    current_section, current_subsection = chapters[0][0], '' # section/subsection titles
    # Dict of dict of list of str. The top-level keys
    # are section titles and the corresponding values are dicts whose
    # keys are subsection titles and whose values are lists of bulleted link texts
    # of the form `- [[link_to_note]], Title/identifying information` to add.
    links_to_make = {current_section: {current_subsection: []}}  
    for part in parts:
        current_section, current_subsection = _create_part_or_update(
            part, title_numbering_folder_map, vault, reference_folder,
            reference_name, template_mf, current_section, current_subsection,
            links_to_make)

    _make_links_in_index_notes(
        links_to_make, title_numbering_folder_map, vault,
        reference_folder, reference_name)
    

def _make_links_in_index_notes(
        links_to_make: dict[str, dict[str, list[str]]],
        title_numbering_folder_map: dict[str, tuple[str, str]],
        vault: Path,
        reference_folder: Path,
        reference_name: str,
        ):
    """Helper function of `_create_notes_from_parts`. """
    for section_title, section_dict in links_to_make.items():
        section_folder = title_numbering_folder_map[section_title]
        _make_links_in_index_note_for_section(
            section_title, section_dict, section_folder,
            vault, reference_folder, reference_name)


def _make_links_in_index_note_for_section(
        section_title: str,
        section_dict: dict[str, list[str]],
        section_folder: str,
        vault: Path,
        reference_folder: Path,
        reference_name: str):
    """Helper function of `_create_notes_from_parts`. """
    rel_path = reference_folder / section_folder / f'_index_{section_folder}.md'
    section_index_note = VaultNote(vault, rel_path=rel_path)
    mf = MarkdownFile.from_vault_note(section_index_note)
    for subsection_title, links_to_make_in_header in section_dict.items():
        mf.add_line_in_section(
            subsection_title,
            {'type': MarkdownLineEnum.UNORDERED_LIST,
             'line': '\n'.join(links_to_make_in_header)})
    mf.write(section_index_note)


def _create_part_or_update(
        part: tuple[str, str],
        title_numbering_folder_map: dict[str, tuple[str, str]],
        vault: Path,
        reference_folder: Path,
        reference_name: str,
        template_mf: MarkdownFile,
        current_section: str,
        current_subsection: str,
        links_to_make: dict[str, dict[str, list[str]]],
        ) -> tuple[str, str]:
    """
    Consider `part` for creating a new note or for updating
    `current_section` and `current_subsection`

    Also append to `links_to_make` for each note that is created.

    Helper function of `_create_notes_from_parts`
    """
    if _part_starts_section(part):
        current_section = part[0]
        current_subsection = ''
        links_to_make[current_section] = {'': []}
        folder = title_numbering_folder_map[current_section]
        # Uncomment the `return` statements in this block to skip creating notes for parts that only start a section/subsection.
        # return current_section, current_subsection
    elif _part_starts_subsection(part):
        current_subsection = part[0]
        links_to_make[current_section][current_subsection] = []
        # return current_section, current_subsection

    created_note = _create_note_for_part(
        part, title_numbering_folder_map, vault, reference_folder,
        reference_name, template_mf, current_section, current_subsection)

    _update_links_to_make(
        part, current_section, current_subsection, links_to_make,
        created_note)
    return current_section, current_subsection


def _create_note_for_part(
        part: tuple[str, str],
        title_numbering_folder_map: dict[str, tuple[str, str]],
        vault: Path,
        reference_folder: Path,
        reference_name: str,
        template_mf: MarkdownFile,
        current_section: str,
        current_subsection: str
        ) -> VaultNote: # The created VaultNote.
    """Create a note for the part"""
    note_title = sanitize_filename(part[0])
    # TODO: test the removal of these characters.
    for char in r'[]|#^\~%':
        note_title = note_title.replace(char, '')
    note_contents = part[1]
    mf = template_mf.copy(True)
    mf.add_line_in_section(
        'Topic[^1]', {'type': MarkdownLineEnum.DEFAULT, 'line': note_contents})
    mf.parts[-1]['line'] += note_title
    section_folder = title_numbering_folder_map[current_section]
    # TODO: Make it so that unique_note_name indicates an unnumbered
    # note as unnumbered.
    unique_note_name = VaultNote.unique_name(
        f"{reference_name}_{note_title}", vault)
    if current_subsection == '':
        rel_path = (
            reference_folder / section_folder / f"{unique_note_name}.md")
    else:
        subsection_folder = title_numbering_folder_map[current_subsection]
        rel_path = (
            reference_folder / section_folder / subsection_folder / f"{unique_note_name}.md")
    vn = VaultNote(vault, rel_path=rel_path)
    vn.create()
    mf.write(vn)
    return vn



def _update_links_to_make(
        part: tuple[str, str],
        current_section: str,
        current_subsection: str,
        links_to_make: dict[str, dict[str, list[str]]],
        created_note: VaultNote
        ) -> None:
    """Update `links_to_make` after note is created.
    
    Helper function of `_create_part_or_update`.
    """
    # if current_subsection is not None:
    #     current_subsection_key = current_subsection
    # else:
    #     current_subsection_key = ''
    note_title = part[0]
    links_to_make[current_section][current_subsection].append(
        f'- [[{created_note.name}]], {note_title}'
    )
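The nested dictionary `links_to_make` can be hard to picture from the comments alone. The standalone sketch below mimics how `_create_part_or_update` and `_update_links_to_make` fill it in. The `(kind, title, note_name)` tuples are a hypothetical simplification of `parts`: in `trouver`, whether a part starts a section or subsection is detected from its LaTeX by `_part_starts_section` and `_part_starts_subsection`, and note names come from `VaultNote.unique_name`.

```python
def build_links_to_make(
        parts: list[tuple[str, str, str]],  # (kind, title, note_name); kind is 'section', 'subsection', or 'text'
        first_section: str,
        ) -> dict[str, dict[str, list[str]]]:
    """Group bulleted links by section title, then by subsection title ('' = no subsection)."""
    current_section, current_subsection = first_section, ''
    links_to_make = {current_section: {current_subsection: []}}
    for kind, title, note_name in parts:
        if kind == 'section':
            current_section, current_subsection = title, ''
            links_to_make[current_section] = {'': []}
        elif kind == 'subsection':
            current_subsection = title
            links_to_make[current_section][current_subsection] = []
        # As in `_create_part_or_update`, parts that start a (sub)section also get a note.
        links_to_make[current_section][current_subsection].append(
            f'- [[{note_name}]], {title}')
    return links_to_make


sample_parts = [
    ('section', '1. Intro', 'ref_1. Intro'),
    ('text', 'Theorem 1.1', 'ref_Theorem 1.1'),
    ('subsection', '1.1 Setup', 'ref_1.1 Setup'),
    ('text', 'Lemma 1.2', 'ref_Lemma 1.2'),
]
for section, subsections in build_links_to_make(sample_parts, '1. Intro').items():
    for subsection, links in subsections.items():
        print(repr(section), repr(subsection), links)
```

Each inner list corresponds to what `_make_links_in_index_note_for_section` joins with newlines and adds under the matching subsection heading of the section's index note.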
    




In [None]:
#| export

# TODO: test parts without a subsection.
# TODO: somehow contents before a section are not included. Fix this bug.
# TODO: If section titles are completely empty, e.g. https://arxiv.org/abs/math/0212208,
# make section titles based on reference name.
def setup_reference_from_latex_parts(
        parts: list[tuple[str, str]], # Output of `divide_latex_text`
        custom_commands: list[tuple[str, int, Union[str, None], str]], # Output of `custom_commands` applied to the preamble of the LaTeX document.
        vault: PathLike, # An Obsidian.md vault.
        location: PathLike, # The path to make the new reference folder. Relative to `vault`.
        reference_name: PathLike, # The name of the new reference.
        authors: Union[str, list[str]], # Each str is the family name of each author.
        author_folder: PathLike = '_mathematicians', # The directory where the author files are stored in. Relative to `vault`.
        create_reference_file_in_references_folder: bool = True, # If `True`, then the reference file creation is attempted within `references_folder`. Otherwise, the reference file creation is attempted at the base of the newly set up folder for the reference.
        references_folder: PathLike = '_references', # The directory where the references files are stored in. Relative to `vault`.
        create_template_file_in_templates_folder: bool = True, # If `True`, then the template file creation is attempted within `templates_folder`. Otherwise, the template file creation is attempted at the base of the newly set up folder for the reference.
        templates_folder: PathLike = '_templates', # The directory where the template files are stored in. Relative to `vault`.
        template_file_name: str = '_template_common', # The template file from which to base the template file of the new reference.
        notation_index_template_file_name: str = '_template_notation_index', # The template file from which to base the notation index file of the new reference.
        glossary_template_file_name: str = '_template_glossary', # The template file from which to base the glossary file of the new reference.
        setup_temp_folder: bool = True, # If `True`, creates a `_temp` folder with an index file. This folder serves to house notes auto-created from LaTeX text files before moving them to their correct directories. Defaults to `True`.
        make_second_template_file_in_reference_directory: bool = True, # If `True`, creates a copy of the template note within the directory for the reference.
        copy_obsidian_configs: Optional[PathLike] = '.obsidian', # The folder relative to `vault` from which to copy obsidian configs.  If `None`, then no obsidian configs are copied to the reference folder. Defaults to `.obsidian`. 
        overwrite: Union[str, None] = None, # Specifies if and how to overwrite the reference folder if it already exists.  - If `'w'`, then deletes the contents of the existing reference folder, as well as the template and reference files, before creating the new reference folder.  - If `'a'`, then overwrites the contents of the reference folder, but does not remove existing files/folders.  - If `None`, then does not modify the existing reference folder and raises a `FileExistsError`.
        confirm_overwrite: bool = True, # Specifies whether or not to confirm the deletion of the reference folder if it already exists and if `overwrite` is `'w'`. Defaults to `True`.
        verbose: bool = False,
        replace_custom_commands: bool = True, # If `True`, replace the custom commands in the text of `parts` when making the notes.
        adjust_common_latex_syntax_to_markdown: bool = True, # If `True`, apply `adjust_common_syntax_to_markdown` to the text in `parts` when making the notes.
        repeat_replacing_custom_commands: int = 1, # The number of times to repeat replacing the custom commands throughout the text; note that some custom commands could be "nested", i.e. the custom commands are defined in terms of other custom commands. Defaults to `1`, in which case custom commands are replaced throughout the entire document once. If set to any negative number (e.g. `-1`), then this function attempts to replace custom commands until no commands to replace are found.
        ) -> None:
    """Set up a reference folder in `vault` using an output of `divide_latex_text`, create
    notes from `parts`, and link notes in index files in the reference folder.

    Assumes that

    - `parts` is derived from a LaTeX document in which
        - all of the text belongs to sections.
        - all of the sections/subsections are uniquely named
    - The template file has a section `# Topic`
    - The last line of the template file is a footnote indicating where the note comes from.
    - There is at most one reference folder in the vault whose name is given by
      `reference_name`.

    `parts` itself is not modified, even if `replace_custom_commands` and/or
    `adjust_common_latex_syntax_to_markdown` are set to `True`.

    cf. `setup_folder_for_new_reference` for how the reference folder is set up.

    The names for the subfolders of the reference folder are the section titles, except
    that for sections with common titles such as `Introduction`, `Notations`, `Conventions`,
    `Preliminaries`, and `Notations and Conventions`, the reference name is appended to the
    title. This ensures that the index files for sections in different reference folders
    do not have the same name.

    Text/parts that precede explicitly given sections are included in the 
    first section's folder and are linked in the first section's index file.
    """
    parts = _adjust_common_section_titles_in_parts(parts, reference_name)
    chapters = section_and_subsection_titles_from_latex_parts(parts)
    if chapters[0][0] == UNTITLED_SECTION_TITLE:
        chapters[0][0] = f'{reference_name} {UNTITLED_SECTION_TITLE}'
    setup_folder_for_new_reference(
        reference_name, location, authors, vault, author_folder,
        create_reference_file_in_references_folder, references_folder,
        create_template_file_in_templates_folder, templates_folder,
        template_file_name, notation_index_template_file_name, 
        glossary_template_file_name, chapters, setup_temp_folder,
        make_second_template_file_in_reference_directory,
        copy_obsidian_configs, overwrite, confirm_overwrite, verbose)
    index_note = VaultNote(
        vault, rel_path=Path(location) / reference_name / f'_index_{reference_name}.md')
    template_note = VaultNote(vault, name=f'_template_{reference_name}')
    template_mf = MarkdownFile.from_vault_note(template_note)

    if replace_custom_commands:
        parts = _replace_custom_commands_in_parts(
            parts, custom_commands, repeat_replacing_custom_commands)
    if adjust_common_latex_syntax_to_markdown:
        parts = [(title, adjust_common_syntax_to_markdown(text))
                 for title, text in parts]
    
    reference_folder = Path(location) / reference_name
    _create_notes_from_parts(
        parts,
        chapters, 
        index_note,
        vault,
        reference_folder,
        reference_name,
        template_mf)
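The `repeat_replacing_custom_commands` parameter matters when custom commands are defined in terms of other custom commands. The toy sketch below illustrates the idea with zero-argument macros only; `replace_zero_arg_commands` is a hypothetical stand-in for `replace_commands_in_text`, which also handles arguments and default values. The `max_passes` cap is an addition of this sketch that guards against self-referential definitions when `repeat` is negative.

```python
import re

# A LaTeX control word: a backslash followed by letters.
_CONTROL_WORD = re.compile(r'\\([A-Za-z]+)')


def _replace_once(text: str, commands: dict[str, str]) -> str:
    # With a function as the replacement, its return value is inserted literally,
    # so backslashes in the definitions are kept as-is.
    return _CONTROL_WORD.sub(lambda m: commands.get(m.group(1), m.group(0)), text)


def replace_zero_arg_commands(
        text: str, commands: dict[str, str], repeat: int = 1, max_passes: int = 100) -> str:
    """Replace `\\name` by `commands[name]`, `repeat` times or, if `repeat` is
    negative, until nothing changes (at most `max_passes` passes)."""
    limit = max_passes if repeat < 0 else repeat
    for _ in range(limit):
        new_text = _replace_once(text, commands)
        if new_text == text:
            break  # Nothing left to replace.
        text = new_text
    return text


commands = {'RR': r'\mathbb{R}', 'Rn': r'\RR^n'}  # `\Rn` is defined via `\RR`.
print(replace_zero_arg_commands(r'$\Rn$', commands, repeat=1))   # $\RR^n$
print(replace_zero_arg_commands(r'$\Rn$', commands, repeat=-1))  # $\mathbb{R}^n$
```

With a single pass, the nested `\RR` survives; repeating until nothing changes expands it as well.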
    


In [None]:
with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir):
    temp_vault = Path(temp_dir) / 'test_vault_5'
    shutil.copytree(_test_directory() / 'test_vault_5', temp_vault)

    dir = _test_directory() / 'latex_examples' / 'latex_example_with_untitled_subsections_setup_to_a_vault'
    sample_latex_file = dir / 'main.tex' 
    sample_latex_text = text_from_file(sample_latex_file)
    preamble, _ = divide_preamble(sample_latex_text)
    parts = divide_latex_text(sample_latex_text, dir)
    cust_comms = custom_commands(preamble)
    
    setup_reference_from_latex_parts(
        parts, cust_comms, temp_vault, Path(''),
        'test_ref',
        ['Kim', 'Kim', 'Kim'])
    # os.startfile(temp_vault)
    # input()

In [None]:
# TODO: give an example for a LaTeX document with a multiline section
# TODO: give an example for a LaTeX document with a section that must be sanitized first, e.g.
# in banwait_et_al_cnpgrg2c, there is a section of the string
# `\section{Exceptional maximal subgroups of 
# \texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}}`

The following example demonstrates setting up a reference folder from a LaTeX document with significant content before any explicitly specified sections. In particular, the reference folder contains a subfolder dedicated to the content that comes before the explicitly specified sections.

In [None]:
with (tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir):
    temp_vault = Path(temp_dir) / 'test_vault_5'
    shutil.copytree(_test_directory() / 'test_vault_5', temp_vault)

    dir = _test_directory() / 'latex_examples' / 'latex_example_with_content_before_sections'
    sample_latex_file = dir / 'main.tex' 
    sample_latex_text = text_from_file(sample_latex_file)
    preamble, _ = divide_preamble(sample_latex_text)
    parts = divide_latex_text(sample_latex_text, dir)
    cust_comms = custom_commands(preamble)
    
    setup_reference_from_latex_parts(
        parts, cust_comms, temp_vault, Path(''),
        'test_ref',
        ['Kim', 'Kim', 'Kim'])
    reference_folder = temp_vault / 'test_ref'

    subdirectories = list(reference_folder.glob('**'))
    relative_subdirectories = [
        os.path.relpath(subdirectory, reference_folder)
        for subdirectory in subdirectories]
    print("The following are the subdirectories of `reference_folder` (as paths relative to `reference_folder`):")
    print(relative_subdirectories)
    assert convert_title_to_folder_name(f'test_ref {UNTITLED_SECTION_TITLE}') in relative_subdirectories

    # os.startfile(temp_vault)
    # input()

The following are the subdirectories of `reference_folder` (as paths relative to `reference_folder`):
['.', '.obsidian', '.obsidian\\plugins', '.obsidian\\plugins\\fast-link-edit', '.obsidian\\plugins\\obsidian-vimrc-support', '1_proof_of_theorem~refthmain', '1_proof_of_theorem~refthmain\\11_this_is_a_subsection', '1_proof_of_theorem~refthmain\\12_this_is_another_subsection', 'test_ref_untitled_section', '_temp']


## Compile `Obsidian.md` vault notes into LaTeX code

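As a side note, the `remove_dollar_signs_around_equationlike_envs` function is one of the functions used to convert Markdown-formatted math back into code better suited for LaTeX: it removes the `$$` delimiters that surround equation-like environments such as `align*`, which LaTeX does not accept inside display math.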

In [None]:
output = remove_dollar_signs_around_equationlike_envs(
r'''$$\begin{align*}asdf\end{align*}$$''')
print(output)
assert '$' not in output

\begin{align*}asdf\end{align*}
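A rough standalone sketch of this kind of clean-up is below. The list of equation-like environments is an assumption of the sketch, not necessarily the list that `remove_dollar_signs_around_equationlike_envs` uses. Environments such as `array`, which need the surrounding `$$`, are deliberately left alone.

```python
import re

# Environments that start display math themselves, so `$$` around them is an error.
# This list is an ASSUMPTION of the sketch.
_EQUATIONLIKE = r'(?:equation|align|alignat|flalign|gather|multline|eqnarray)\*?'

_DOLLARS_AROUND_ENV = re.compile(
    r'\$\$\s*(\\begin\{(?P<env>' + _EQUATIONLIKE + r')\}.*?\\end\{(?P=env)\})\s*\$\$',
    re.DOTALL)


def strip_dollars_around_envs(text: str) -> str:
    """Remove `$$` delimiters that directly surround an equation-like environment."""
    return _DOLLARS_AROUND_ENV.sub(r'\1', text)


print(strip_dollars_around_envs(r'$$\begin{align*}asdf\end{align*}$$'))
# \begin{align*}asdf\end{align*}
print(strip_dollars_around_envs(r'$$\begin{array}{cc} a & b \end{array}$$'))
# Unchanged, since `array` is not in the list.
```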


In [None]:
#| export
def _highlight_latex_math(latex_str):
    """Wrap the LaTeX math in `latex_str` in commands that highlight it.

    Display math (`$$...$$`) is wrapped in the custom `\\mathcolorbox` command,
    inline math (`$...$`) in the `\\hl` command of the `soul` package, and
    anything else (e.g. an equation-like environment) in a `\\colorbox`
    containing a full-width `minipage`.
    """
    # Case 1: Double dollar signs. Checked first, since `$$...$$` also
    # starts and ends with a single dollar sign.
    if latex_str.startswith('$$') and latex_str.endswith('$$'):
        return f'$$\\mathcolorbox{{lightgray}}{{{latex_str[2:-2]}}}$$'
    # Case 2: Single dollar signs
    elif latex_str.startswith('$') and latex_str.endswith('$'):
        return f'\\hl{{${latex_str[1:-1]}$}}'
    # Case 3: Everything else, e.g. equation-like environments
    else:
        # An alternative (unused) approach highlights only the contents of the environment:
        # env_pattern = r'\\begin\{(equation|align|displaymath|eqnarray)\}(.*?)\\end\{\1\}'
        # match = re.search(env_pattern, latex_str, re.DOTALL)
        # if match:
        #     env, content = match.groups()
        #     highlighted = f'\\begin{{{env}}}\\mathcolorbox{{lightgray}}{{{content.strip()}}}\\end{{{env}}}'
        #     return latex_str.replace(match.group(0), highlighted)
        highlighted = rf'''\colorbox{{lightgray}}{{
    \begin{{minipage}}{{\dimexpr\textwidth-2\fboxsep}}
    {latex_str}
    \end{{minipage}}
}}'''
        return highlighted

In [None]:
#| hide
# Test cases
print(_highlight_latex_math('$x^2 + y^2 = z^2$'))
print(_highlight_latex_math('$$\\frac{a}{b} + c = d$$'))
print(_highlight_latex_math('\\begin{equation}E = mc^2\\end{equation}'))
print(_highlight_latex_math('\\begin{align}a &= b \\\\ c &= d\\end{align}'))

\hl{$x^2 + y^2 = z^2$}
$$\mathcolorbox{lightgray}{\frac{a}{b} + c = d}$$
\colorbox{lightgray}{
    \begin{minipage}{\dimexpr\textwidth-2\fboxsep}
    \begin{equation}E = mc^2\end{equation}
    \end{minipage}
}
\colorbox{lightgray}{
    \begin{minipage}{\dimexpr\textwidth-2\fboxsep}
    \begin{align}a &= b \\ c &= d\end{align}
    \end{minipage}
}


In [None]:
#| export
def convert_notes_to_latex_code(
        notes: list[VaultNote],
        vault: PathLike,
        preamble: str,
        ) -> str:
    """
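    Compile the contents of the `VaultNote` objects in `notes` into the code of a LaTeX file,
    converting HTML definition/notation markings into LaTeX highlighting commands and
    placing `preamble` before the document body.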
    """
    tex_code = []
    for note in notes:
        body = process_standard_information_note(
            MarkdownFile.from_vault_note(note), vault, remove_html_tags=False)
        body = str(body)
        text = _replace_html_with_latex_command_markings(body)
        tex_code.append(text)
    main_code = '\n\n'.join(tex_code)
    output_code = rf'''
% AUTOGENERATED:
% 
% The LaTeX code in this document is autogenerated using `trouver`.
% It is essentially the original LaTeX document, modified by changing
% some formatting and by "marking" definitions and notations where
% they are introduced throughout the document. However, what constitutes a
% definition/notation introduced in a mathematical text is a subjective matter
% and the marking process uses machine learning models, so the markings may
% be inaccurate. Manual corrections to inaccurate markings or syntax errors
% may be made.
%
% IT MAY BE USEFUL TO RENAME THIS DOCUMENT to have the name of the original document
% so that the bibliographical information from the `.bbl` file (if available) can be
% incorporated upon compilation.
%
% The markings are done with the `\hl` command of the `soul` package, a custom `\mathcolorbox` command
% defined using the `\colorbox` command of the `xcolor` package, and the `\colorbox` command in tandem with the `minipage`
% environment. These markings may cause syntax errors when this autogenerated document
% is compiled. If such syntax errors occur, they may be corrected manually; if all
% attempts to correct such a syntax error fail, it is recommended to remove the offending \hl or \colorbox
% command.
%
% Note that specific LaTeX compilers (e.g. `LaTeX`, `pdfLaTeX`, `XeLaTeX`, `LuaLaTeX`)
% may be needed to properly compile this autogenerated document depending on pre-existing LaTeX code
% or packages used by the original document. Compiling this autogenerated document
% on an updated version of the compiler may also help in rendering the markings.
%
% Files that accompanied the original source files (e.g. image files, other `.tex` files)
% may also be needed to properly compile this autogenerated document. As such, it is recommended
% to compile this autogenerated document after placing it in the same folder as the original source
% file.
%

{preamble}

\usepackage{{soul}}
\usepackage{{xcolor}}
\sethlcolor{{lightgray}}
\newcommand{{\mathcolorbox}}[2]{{\colorbox{{#1}}{{$\displaystyle #2$}}}}

\begin{{document}}

{main_code}

\end{{document}}'''
    return output_code


def _replace_html_with_latex_command_markings(
        body: str) -> str:
    """
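    Helper function to replace the HTML tags in `body` with LaTeX commands that mark
    the tagged text, after removing the dollar signs around equation-like environments.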
    """
    body = remove_dollar_signs_around_equationlike_envs(body)
    text, tag_data = remove_html_tags_in_text(body)
    replace_ranges = [(data_point[1], data_point[2]) for data_point in tag_data]
    replace_text = [_text_to_replace_html_with(data_point) for data_point in tag_data]
    text = replace_string_by_indices(text, replace_ranges, replace_text)
    return text


def _text_to_replace_html_with(
        tag_data: tuple[bs4.element.Tag, int, int]
        ) -> str:
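    """
    Return the LaTeX code that replaces an HTML tag: the text of a tag with a
    `definition` attribute is wrapped in `\\colorbox`, the text of a tag with a
    `notation` attribute is highlighted with `_highlight_latex_math`, and any
    other tag is replaced by its text.
    """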
    tag = tag_data[0]
    if 'definition' in tag.attrs:
        return f'\\colorbox{{lightgray}}{{{tag.text}}}'
    elif 'notation' in tag.attrs:
        return _highlight_latex_math(tag.text)
    else:
        return tag.text
                

In [None]:
body = r"""
Let us explain why this point of view is useful for proving homological stability for Hurwitz spaces.  In most situations where homological stability is understood, one has a sequence of (usually connected) spaces $X_n$ and stabilization maps $f_n: X_n \to X_{n+1}$; the goal is to show that each $f_n$ induces homology isomorphisms in a range of dimensions.  Let $X= \sqcup_n X_n$, and consider the homology

<span notation="" style="border-width:1px;border-style:solid;padding:3px">$$M_p = H_p(X) =\oplus_n H_p(X_n)$$</span>

Give $M_p$ the structure of a $k[x]$-module by making the indeterminate $x$ act via the stabilization map.  $M_p$ admits a grading by the number $n$, and $x$ acts as a degree 1 operator.  Homological stability is rephrased as the statement that $x$ is an isomorphism in sufficiently high degree.  Equivalently, we need the quotient and $x$-torsion

$$\begin{array}{ccc} Tor_0^{k[x]}(k, M_p) = M_p/xM_p & {\rm and} & Tor_1^{k[x]}(k, M_p) = M_p[x] \end{array}$$

to be concentrated in low degrees.
"""
print(_replace_html_with_latex_command_markings(body))


Let us explain why this point of view is useful for proving homological stability for Hurwitz spaces.  In most situations where homological stability is understood, one has a sequence of (usually connected) spaces $X_n$ and stabilization maps $f_n: X_n \to X_{n+1}$; the goal is to show that each $f_n$ induces homology isomorphisms in a range of dimensions.  Let $X= \sqcup_n X_n$, and consider the homology

$$\mathcolorbox{lightgray}{M_p = H_p(X) =\oplus_n H_p(X_n)}$$

Give $M_p$ the structure of a $k[x]$-module by making the indeterminate $x$ act via the stabilization map.  $M_p$ admits a grading by the number $n$, and $x$ acts as a degree 1 operator.  Homological stability is rephrased as the statement that $x$ is an isomorphism in sufficiently high degree.  Equivalently, we need the quotient and $x$-torsion

$$\begin{array}{ccc} Tor_0^{k[x]}(k, M_p) = M_p/xM_p & {\rm and} & Tor_1^{k[x]}(k, M_p) = M_p[x] \end{array}$$

to be concentrated in low degrees.
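One way to carry out this kind of replacement is to record each tagged span's start and end index in the tag-free text, then replace the spans from last to first so that earlier indices stay valid. The standalone sketch below illustrates this with the standard library only. It is a simplification under stated assumptions: it uses a regular expression instead of `bs4`, handles only non-nested `<span>` tags, and only marks definitions. `strip_spans`, `replace_by_indices`, and `mark_definitions` are hypothetical stand-ins for `remove_html_tags_in_text`, `replace_string_by_indices`, and `_replace_html_with_latex_command_markings`, whose exact behavior may differ.

```python
import re

# A non-nested <span ...>...</span>; group 'attrs' holds the attribute text.
_SPAN = re.compile(r'<span(?P<attrs>[^>]*)>(?P<inner>.*?)</span>', re.DOTALL)


def strip_spans(body: str) -> tuple[str, list[tuple[str, int, int]]]:
    """Remove <span> tags; return the tag-free text and, for each span,
    (attrs, start, end), where start/end index the span's text in the tag-free text."""
    pieces, tag_data = [], []
    cursor = 0      # Position in `body`.
    plain_len = 0   # Length of the tag-free text built so far.
    for match in _SPAN.finditer(body):
        before = body[cursor:match.start()]
        inner = match.group('inner')
        pieces += [before, inner]
        start = plain_len + len(before)
        tag_data.append((match.group('attrs'), start, start + len(inner)))
        plain_len = start + len(inner)
        cursor = match.end()
    pieces.append(body[cursor:])
    return ''.join(pieces), tag_data


def replace_by_indices(text: str, ranges: list[tuple[int, int]], replacements: list[str]) -> str:
    """Replace text[start:end] for each range; ranges must not overlap."""
    # Work from the last range to the first so that earlier indices stay valid.
    for (start, end), new in sorted(zip(ranges, replacements), reverse=True):
        text = text[:start] + new + text[end:]
    return text


def mark_definitions(body: str) -> str:
    """Wrap the text of spans with a `definition` attribute in \\colorbox."""
    text, tag_data = strip_spans(body)
    ranges = [(start, end) for _, start, end in tag_data]
    replacements = [
        f'\\colorbox{{lightgray}}{{{text[start:end]}}}' if 'definition' in attrs
        else text[start:end]
        for attrs, start, end in tag_data]
    return replace_by_indices(text, ranges, replacements)


body = 'A <span definition="">group</span> is a set; <span style="x">plain</span>.'
print(mark_definitions(body))  # A \colorbox{lightgray}{group} is a set; plain.
```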

