# latex.convert

> Convert LaTeX files into Obsidian.md notes

This module contains functions and methods to automatically make Obsidian notes from LaTeX files of mathematical papers, most notably those on arXiv.

See the [Potential Problems](#potential-problems) section below for some common errors that arise from this module and how to circumvent them.

In [None]:
#| default_exp latex.convert

In [None]:
#| export
from collections import OrderedDict
from itertools import product
import os
from os import PathLike
from pathlib import Path
import re
from typing import Optional, Union

from pathvalidate import sanitize_filename
from pylatexenc import latexwalker, latex2text
from pylatexenc.latexwalker import (
    LatexWalker, LatexEnvironmentNode, get_default_latex_context_db,
    LatexNode, LatexSpecialsNode, LatexMathNode, LatexMacroNode, LatexCharsNode,
    LatexGroupNode, LatexCommentNode
)
from pylatexenc.latex2text import (
    MacroTextSpec, EnvironmentTextSpec)
from pylatexenc.macrospec import (
    MacroSpec, LatexContextDb, EnvironmentSpec
)
import regex

from trouver.helper import (
    find_regex_in_text, dict_with_keys_topologically_sorted,
    containing_string_priority, replace_string_by_indices, text_from_file
)
from trouver.markdown.markdown.file import (
    MarkdownFile, MarkdownLineEnum
)
from trouver.markdown.obsidian.links import ObsidianLink
from trouver.markdown.obsidian.vault import VaultNote
from trouver.markdown.obsidian.personal.index_notes import (
    correspond_headings_with_folder, convert_title_to_folder_name
)
from trouver.markdown.obsidian.personal.reference import setup_folder_for_new_reference
import warnings

In [None]:
#| export
DEFAULT_NUMBERED_ENVIRONMENTS = ['theorem', 'corollary', 'lemma', 'proposition',
                                 'definition', 'conjecture', 'remark', 'example',
                                 'question']

In [None]:
import glob
import shutil
import tempfile


from fastcore.test import ExceptionExpected, test_eq
from pathvalidate import validate_filename
from trouver.helper import _test_directory  # , non_utf8_chars_in_file



## Potential problems

The following are some frequent problems that arise when using this module:


#### UnicodeDecodeErrors arise when reading LaTeX files

By default, the `text_from_file` function in `trouver.helper` reads a file and attempts to decode it as `utf-8`. If a LaTeX file has characters that cannot be decoded as `utf-8`, then a `UnicodeDecodeError` may be raised. In this case, one can identify these characters using the `trouver.helper.non_utf8_chars_in_file` function and modify the LaTeX file manually. It may be useful to use a text editor to jump to the positions of these characters and to change the encoding of the LaTeX file to `utf-8`; for example, the author of `trouver` has opened some `ANSI`-encoded LaTeX documents in `Notepad++` and converted their encoding to `UTF-8`.
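For locating the offending bytes with only the standard library, here is a minimal sketch. The name `non_utf8_byte_offsets` is hypothetical and is not the API of `trouver.helper.non_utf8_chars_in_file`; it reads the file as raw bytes and reports the byte offset, line number, and bytes of each run that does not decode as UTF-8:

```python
def non_utf8_byte_offsets(data: bytes) -> list[tuple[int, int, bytes]]:
    """Return `(byte_offset, line_number, bad_bytes)` for each run of bytes
    in `data` that cannot be decoded as UTF-8."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode('utf-8')
            break  # the rest of the data decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice data[pos:]
            start, end = pos + err.start, pos + err.end
            line_number = data.count(b'\n', 0, start) + 1
            problems.append((start, line_number, data[start:end]))
            pos = end
    return problems

# Hypothetical usage:
# with open('main.tex', 'rb') as f:
#     print(non_utf8_byte_offsets(f.read()))
```

If the file turns out to be Windows-1252 ("ANSI") encoded, one possible conversion, assuming that encoding is correct, is `path.write_text(path.read_bytes().decode('cp1252'), encoding='utf-8')` for a `pathlib.Path` object `path`.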

#### `NoDocumentNodeError`s arise even though the LaTeX file has a document environment (i.e. `\begin{document}...\end{document}`)

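The `find_document_node` function in this module is sometimes unable to detect the document environment of a LaTeX file. This error is known to arise when
- there are macros (which include commands) defined that expand to text including `\begin{...}...\end{...}`. For example, the preamble of the test file `example_with_a_command_with_begin.tex` contains `\begin{displaymath}`, and `find_document_node` raises a `NoDocumentNodeError` on this file (see the examples below).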

In [None]:
# TODO in the above explanation, include an example.
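One way to look for definitions of the kind described above, i.e. macros whose definitions contain `\begin{...}` or `\end{...}`, is a rough, line-based scan of the preamble. This is a sketch under assumptions: the helper name `macros_defining_environments` is hypothetical, only `\newcommand`, `\renewcommand`, and `\def` are recognized, and only the line on which a definition starts is inspected, so multi-line definitions may be missed:

```python
import re

# Captures the macro name in `\newcommand{\foo}`, `\renewcommand*\foo`, or `\def\foo`.
_DEFINITION = re.compile(r'\\(?:(?:re)?newcommand\*?\s*\{?\s*|def\s*)(\\[A-Za-z@]+)')
_BEGIN_OR_END = re.compile(r'\\(?:begin|end)\s*\{')


def macros_defining_environments(preamble: str) -> list[str]:
    r"""Return the names of macros whose definition line contains `\begin{` or `\end{`."""
    names = []
    for line in preamble.splitlines():
        line = re.sub(r'(?<!\\)%.*', '', line)  # drop the comment, if any
        match = _DEFINITION.search(line)
        if match is not None and _BEGIN_OR_END.search(line):
            names.append(match.group(1))
    return names
```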

## LaTeX comments

In [None]:
#| export
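def remove_comments(text: str) -> str:
    r"""Remove LaTeX comments, i.e. each unescaped `%` and the rest of its line.

    An escaped percent sign `\%` is kept, while a `%` that follows a line
    break `\\` still starts a comment.
    """
    # Group 1 keeps the character before the `%` (or the start of a line)
    # together with any `\\` pairs, so that an escaped `\%` is never matched.
    return re.sub(r"((?:^|[^\\])(?:\\\\)*)%[^\n]*", r"\1", text, flags=re.MULTILINE)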

In [None]:
text = r"""% Commands with parameters
\newcommand{\field}[1]{\mathbb{#1}}
\newcommand{\mat}[4]{\left[\begin{array}{cc}#1 & #2 \\
                                         #3 & #4\end{array}\right]}
\newcommand{\dual}[1]{#1^{\vee}}
\newcommand{\compl}[1]{\hat{#1}}
"""
assert '%' not in remove_comments(text)
print(remove_comments(text))

text = r"""Hi. I'm not commented. %But I am!"""
test_eq(remove_comments(text), "Hi. I'm not commented. ")


\newcommand{\field}[1]{\mathbb{#1}}
\newcommand{\mat}[4]{\left[\begin{array}{cc}#1 & #2 \\
                                         #3 & #4\end{array}\right]}
\newcommand{\dual}[1]{#1^{\vee}}
\newcommand{\compl}[1]{\hat{#1}}



## Divide LaTeX file into parts

To make Obsidian notes from a LaTeX file, I use sections/subsections and environments as the places at which to make new notes.

Things to think about:
- sections/subsections
- environments, including theorems, corollaries, propositions, lemmas, definitions, and notations
- citations
- macros defined in the preamble?

LatexMacroNodes include: sections/subsections, citations, references, and labels, e.g.

```latex
\section{Introduction}
\cite{ellenberg2nilpotent}
\subsection{The section conjecture}
\'e
\ref{fundamental-exact-sequence}
\cite{stix2010period}
\ref{fundamental-exact-sequence}
\cite{stix2012rational}
\cite[Appendix C]{stix2010period}
\subsection{The tropical section conjecture}
\label{subsec:tropical-section-conjecture}
```
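As a rough illustration of using sections/subsections as note boundaries, the following sketch splits the body of a document at `\section`, `\subsection`, and `\subsubsection` commands with a regular expression. It is a hypothetical, regex-only approximation (`split_at_sectioning_commands` is not a `trouver` function and does not use `pylatexenc`); titles containing braces, such as `\section{The $\mathbb{Q}$ case}`, are not recognized:

```python
import re

# Matches `\section{...}`, `\subsection*{...}`, etc.; titles with braces are not matched.
_SECTIONING = re.compile(r'\\(section|subsection|subsubsection)\*?\s*\{([^{}]*)\}')


def split_at_sectioning_commands(body: str) -> list[tuple[str, str, str]]:
    """Split `body` into `(command, title, text)` chunks, one per sectioning command.

    Text before the first sectioning command is returned with command and title `''`.
    """
    parts = []
    command, title, last_end = '', '', 0
    for match in _SECTIONING.finditer(body):
        # Close the chunk that started at the previous sectioning command.
        parts.append((command, title, body[last_end:match.start()]))
        command, title, last_end = match.group(1), match.group(2), match.end()
    parts.append((command, title, body[last_end:]))
    return parts
```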

#### Divide the preamble from the rest of the document

Some macros and commands defined in the preamble seem to prevent the `pylatexenc` functions from properly identifying the document environment/node in a LaTeX document. To circumvent this, we define a function that divides the preamble from the rest of the document.

In [None]:
#| export
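def divide_preamble(
        text: str, # LaTeX document
        document_environment_name: str = "document" # The name of the document environment.
        ) -> tuple[str, str]: # The preamble and the rest of the document, which starts with `\begin{<document_environment_name>}`.
    r"""Divide the preamble from the rest of a LaTeX document.

    **Raises**
    - ValueError
        - If `text` does not contain `\begin{<document_environment_name>}`.
    """
    begin_environment_str = rf'\begin{{{document_environment_name}}}'
    # Split at the first literal occurrence of the `\begin{...}` string.
    start_match = text.find(begin_environment_str)
    if start_match == -1:
        raise ValueError(f"The text does not contain {begin_environment_str}.")
    return text[:start_match], text[start_match:]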

In [None]:
latex_file_path = _test_directory() / 'latex_examples' / 'example_with_a_command_with_begin.tex'
text = text_from_file(latex_file_path)

preamble, document = divide_preamble(text)
assert r'\begin{displaymath}' in preamble
assert r'Hyun Jong Kim' in preamble

assert r'Hyun Jong Kim' not in document
assert document.startswith(r'\begin{document}')
assert document.endswith('\\end{document}')
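`divide_preamble` splits at the first literal occurrence of `\begin{document}`, so a commented-out occurrence earlier in the preamble would be chosen instead. A hypothetical comment-aware variant (not part of `trouver`) only accepts an occurrence that is not preceded on its line by an unescaped `%`:

```python
import re

# Accept `\begin{document}` only if everything before it on its line is
# ordinary characters or escaped pairs such as `\%`, i.e. no comment has started.
_UNCOMMENTED_BEGIN_DOCUMENT = re.compile(
    r'^(?:[^%\n\\]|\\.)*?(?=\\begin\s*\{document\})', re.MULTILINE)


def divide_preamble_skipping_comments(text: str) -> tuple[str, str]:
    """Split `text` just before its first uncommented `\\begin{document}`."""
    match = _UNCOMMENTED_BEGIN_DOCUMENT.search(text)
    if match is None:
        raise ValueError('No uncommented \\begin{document} found.')
    return text[:match.end()], text[match.end():]
```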

#### Get the Document Node

In [None]:
#| export
class NoDocumentNodeError(Exception):
    """Exception raised when a LatexEnvironmentNode corresponding to the document 
    environment is expected in a LaTeX string, but no such node exists.
    
    **Attributes**
    - text - str
        - The text in which the document environment is not found.
    """
    
    def __init__(self, text):
        self.text = text
        super().__init__(
            f"The following text does not contain a document environment:\n{text}")



In [None]:
#| export
def find_document_node(
        text: str, # LaTeX str
        document_environment_name: str = "document" # The name of the document environment.
        ) -> LatexEnvironmentNode:
    """Find the `LatexNode` object for the main document in `text`.
    
    **Raises**
    - NoDocumentNodeError
        - If document environment node is not detected.
    """
    w = LatexWalker(text)
    nodelist, _, _ = w.get_latex_nodes(pos=0)
    for node in nodelist:
        if node.isNodeType(LatexEnvironmentNode)\
                and node.environmentname == document_environment_name:
            return node
    raise NoDocumentNodeError(text)

The main content of virtually all LaTeX math articles belongs to a document environment, which pylatexenc can often detect. The `find_document_node` function returns this `LatexEnvironmentNode` object:

In [None]:
latex_file_path = _test_directory() / 'latex_examples' / 'latex_example_1' / 'main.tex'
text = text_from_file(latex_file_path)
document_node = find_document_node(text)

If the LaTeX file has no `document` environment, then a `NoDocumentNodeError` is raised:

In [None]:
# This latex document has its `document` environment commented out.
latex_file_path = _test_directory() / 'latex_examples' / 'latex_example_2' / 'main.tex'
text = text_from_file(latex_file_path)
with ExceptionExpected(NoDocumentNodeError):
    document_node = find_document_node(text)

At the time of this writing, a `NoDocumentNodeError` may be raised even if the LaTeX file has a proper `document` environment:

In [None]:
latex_file_path = _test_directory() / 'latex_examples' / 'example_with_a_command_with_begin.tex'
text = text_from_file(latex_file_path)

# Perhaps in the future, pylatexenc will be able to find the document node for this file.
# When that time comes, delete this example.
with ExceptionExpected(NoDocumentNodeError):
    find_document_node(text)



The `divide_preamble` function can be used to circumvent this problem:

In [None]:
preamble, document = divide_preamble(text)
document_node = find_document_node(document)
test_eq(document_node.environmentname, 'document')
assert document_node.isNodeType(LatexEnvironmentNode)

In [None]:
# hide
# Find no document node error causes

# latex_file_path = r'_tests\latex_full\litt_cfag\main.tex'
# text = text_from_file(latex_file_path)
# document_node = find_document_node(text)

### Detect environment names used in a file

In [None]:
#| export
def environment_names_used(
        text: str # LaTeX document
        ) -> set[str]: # The set of all environment names used in the main document.
    """Return the set of all environment names used in the main document
    of the latex code.
    """
    document_node = find_document_node(text)
    return {node.environmentname for node in document_node.nodelist
            if node.isNodeType(LatexEnvironmentNode)}        

Different writers use different environment names. For example, writers use `theorem`, `thm`, or `theo` for theorem environments and `lemma` or `lem` for lemma environments. The `environment_names_used` function returns the environment names actually used in the main document of a tex file.

In the example below, note that only the environments that are actually used are returned. For instance, the preamble of the document defines the theorem environments `problem` and `lemma` (among other things), but these are not used in the document itself.

In [None]:
latex_file_path = _test_directory() / 'latex_examples' / 'has_fully_written_out_environment_names.tex'
sample_text_1 = text_from_file(latex_file_path)
sample_output_1 = environment_names_used(sample_text_1)
test_eq({'corollary', 'proof', 'maincorollary', 'abstract', 'proposition'}, sample_output_1)

The document in the example below uses shorter names for theorem environments:

In [None]:
latex_file_path = _test_directory() / 'latex_examples' / 'has_shorter_environment_names.tex'
sample_text_2 = text_from_file(latex_file_path)
sample_output_2 = environment_names_used(sample_text_2)
test_eq({'conj', 'notation', 'corollary', 'defn'}, sample_output_2)
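`environment_names_used` only inspects the nodes directly inside the document environment (`document_node.nodelist`), so an environment nested inside another one, e.g. an `align` inside a `theorem`, is not reported. As a rough cross-check, here is a regex-based sketch (the name `environment_name_counts` is hypothetical) that counts every `\begin{...}` in the text after stripping comments; unlike `pylatexenc`, it does not understand macros or verbatim text:

```python
import re
from collections import Counter


def environment_name_counts(text: str) -> Counter:
    r"""Count every `\begin{...}` in `text`, including nested environments."""
    # Drop comments: an unescaped `%` through the end of its line.
    text = re.sub(r'(?<!\\)%[^\n]*', '', text)
    return Counter(re.findall(r'\\begin\s*\{\s*([^{}\s]+)\s*\}', text))
```

`set(environment_name_counts(text))` can then be compared with the output of `environment_names_used`; it additionally contains `document` itself and any nested environments.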

#### Identify the numbering convention of a LaTeX document

LaTeX documents follow various numbering conventions. Here are some examples of papers on the arXiv and notes on their numbering schemes. Note that the source code of these articles is publicly available on the arXiv.

- Ellenberg, Venkatesh, and Westerland, *[Homological stability for Hurwitz spaces and the Cohen-Lenstra conjecture over function fields](https://arxiv.org/abs/0912.0325)*
    - The subsections and theorem-like environments of each section share a numbering scheme, e.g. section 1 contains `1.1 The Cohen-Lenstra heuristics`, `1.2 Theorem`, and `1.3 Hurwitz spaces`. This is accomplished by defining theorem-like environments using the `subsection` counter, e.g.

        ```latex
        \theoremstyle{plain}
        \newtheorem{thm}[subsection]{Theorem}
        \newtheorem{prop}[subsection]{Proposition}
        \newtheorem{cor}[subsection]{Corollary}
        \newtheorem{remark}{Remark}
        \newtheorem{conj}[subsection]{Conjecture}
        \newtheorem*{conj*}{Conjecture}
        ```

        defines the `thm`, `prop`, `cor`, and `conj` environments to be numbered using the `subsection` counter, the `remark` environment to be numbered with its own `remark` counter, and the `conj*` environment (via `\newtheorem*`) to be an unnumbered environment whose environment name differs from `conj` but whose display name is also `Conjecture`.

    - The `\swapnumbers` command is included in the preamble to change the way that theorems are numbered in the document, e.g. the article has `1.2 Theorem` as opposed to `Theorem 1.2`.
    - The equations are numbered within the subsections; this is accomplished by the lines

        ```latex
        \numberwithin{equation}{subsection}
        \renewcommand{\theequation}{\thesubsection.\arabic{equation}}
        ```

        in the preamble.
- Hoyois, *[A quadratic refinement of the Grothendieck-Lefschetz-Verdier Trace Formula](https://arxiv.org/abs/1309.6147)*
    - The theorem-like environments are numbered `Theorem 1.1, Theorem 1.3, Corollary 1.4, Theorem 1.5`, etc.
        - The numbered theorem-like environments are assigned the `equation` counter. In particular, the equation environments share their numbering with the theorem-like environments. For example, section 1 has Equation `(1.2)`.
        - This equation counter is reset at the beginning of each section, and the section number is included in the numbering, via

            ```latex
            \numberwithin{equation}{section}
            ```
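To make the two schemes above concrete, here is a toy simulation of LaTeX counters. It is a simplified model, not how `trouver` or LaTeX implement numbering: each numbered name steps one counter, stepping a counter resets every counter numbered within it, and a label is built by walking up the `within` chain. The function name and its arguments are hypothetical, and `within` is assumed to contain no cycles:

```python
def simulate_numbering(
        events: list[str],           # environment or sectioning-command names, in document order
        counter_of: dict[str, str],  # name -> counter it steps; names not listed are unnumbered
        within: dict[str, str],      # counter -> counter that resets it and prefixes its label
        ) -> list[str]:
    """Return the label for each event ('' if unnumbered) in a toy counter model."""
    values: dict[str, int] = {}
    labels = []
    for name in events:
        counter = counter_of.get(name)
        if counter is None:
            labels.append('')
            continue
        values[counter] = values.get(counter, 0) + 1
        # Stepping a counter resets every counter numbered within it, transitively.
        stack = [counter]
        while stack:
            parent = stack.pop()
            for child, child_parent in within.items():
                if child_parent == parent:
                    values[child] = 0
                    stack.append(child)
        # Build a label such as "1.2" by walking up the `within` chain.
        parts = [values[counter]]
        parent = within.get(counter)
        while parent is not None:
            parts.append(values.get(parent, 0))
            parent = within.get(parent)
        labels.append('.'.join(str(part) for part in reversed(parts)))
    return labels
```

With `within={'subsection': 'section'}` and `thm` stepping the `subsection` counter, the events `section, subsection, thm, subsection` yield the labels `1`, `1.1`, `1.2`, `1.3`, matching the Ellenberg-Venkatesh-Westerland numbering described above.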

In [None]:
# TODO: consider different arxiv articles to see how they are numbered

In [None]:
#| export
def _search_counters_by_pattern(
        preamble: str,
        newtheorem_regex: re.Pattern, # This is supposed to be a regex that detects and captures parameters of `\newtheorem` commands.
        counter_group: int # This depends on which `newtheorem_regex` is used, and is either 3 or 4. 
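        ) -> dict[str, str]: # Maps each environment name to the counter captured by `counter_group`, or to the environment name itself if nothing was captured.
    """
    Capture the newly defined theorem-like environment names as well as the
    counters that they belong to.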
    
    This is a helper function for `numbered_newtheorems_counters_in_preamble`.
    
    """
    counters = {}
    for match in newtheorem_regex.finditer(preamble):
        env_name = match.group(1)
        counter = match.group(counter_group)
        # If no counter was specified, use the environment name as the counter
        if counter is None:
            counter = env_name
        counters[env_name] = counter
    return counters

In [None]:
#| hide

# Test that the regex patterns used in the `numbered_newtheorems_counters_in_preamble`
# function detect the defined theorem-like environments correctly.
text = text_from_file(_test_directory() / 'latex_examples' / 'newtheorem_example.tex') 
preamble, _ = divide_preamble(text)
second_parameter_pattern = re.compile(
    # r'\\newtheorem\s*\{\s*(\w+)\s*\}\s*(\[\s*(\w+)\s*\])?\s*\{\s*(.*)\s*\}')
    r'\\newtheorem\s*\{\s*(\w+)\s*\}\s*(\[\s*(\w+)\s*\])?\s*\{\s*(.*)\s*\}(?!\s*\[\s*(\w+)\s*\])')
third_parameter_pattern = re.compile(
    r'\\newtheorem\s*\{\s*(\w+)\s*\}\s*\{\s*(.*)\s*\}\s*(\[\s*(\w+)\s*\])?')
second_results = _search_counters_by_pattern(preamble, second_parameter_pattern, 3)
third_results = _search_counters_by_pattern(preamble, third_parameter_pattern, 4)
assert 'remark' not in second_results
assert 'remark' in third_results

In [None]:
#| hide
preamble = text = r"""
\theoremstyle{plain}
\newtheorem{thm}[subsection]{Theorem}
\newtheorem{prop}[subsection]{Proposition}
\newtheorem{cor}[subsection]{Corollary}
\newtheorem{remark}{Remark}
\newtheorem{conj}[subsection]{Conjecture}
\newtheorem*{conj*}{Conjecture}
"""

second_parameter_pattern = re.compile(
    # r'\\newtheorem\s*\{\s*(\w+)\s*\}\s*(\[\s*(\w+)\s*\])?\s*\{\s*(.*)\s*\}')
    r'\\newtheorem\s*\{\s*(\w+)\s*\}\s*(\[\s*(\w+)\s*\])?\s*\{\s*(.*)\s*\}(?!\s*\[\s*(\w+)\s*\])')
third_parameter_pattern = re.compile(
    r'\\newtheorem\s*\{\s*(\w+)\s*\}\s*\{\s*(.*)\s*\}\s*(\[\s*(\w+)\s*\])?')
second_results = _search_counters_by_pattern(preamble, second_parameter_pattern, 3)
third_results = _search_counters_by_pattern(preamble, third_parameter_pattern, 4)

second_results
# third_results

{'thm': 'subsection',
 'prop': 'subsection',
 'cor': 'subsection',
 'remark': 'remark',
 'conj': 'subsection'}

In [None]:
#| export
def numbered_newtheorems_counters_in_preamble(
        document: str # The LaTeX document
        ) -> dict[str, tuple[str, Union[str, None]]]: # The keys are the names of the environments. The value of a key is a tuple `(<counter>, <reset_by_counter>)`, where `<counter>` is the counter that the environment belongs to, which can be custom defined or predefined in LaTeX, and `<reset_by_counter>` is a counter whose incrementation resets the counter of the environment, if available.
    r"""Return the dict specifying the numbered `\newtheorem` command invocations

    Assumes that

    - invocations of the `\newtheorem` command are exclusively in the
    preamble of the LaTeX document.
    - theorem-like environments are defined using the `\newtheorem` command.
    - no environments of the same name are defined twice.

    This function does not take into account invocations of the `\numberwithin` command.

    This function uses two separate regex patterns, one to detect the invocations
    of `\newtheorem` in which the optional parameter is the second parameter and
    one to detect those in which the optional parameter is the third parameter.


    """
    preamble, _ = divide_preamble(document)
    preamble = remove_comments(preamble)
    # TODO: maybe use the `regex` package instead of `re` with a recursive
    # balanced-curly braces detecting regex.

    # matches `\newtheorem{theorem}{Theorem}`, `\newtheorem{proposition}[theorem]{Proposition}`
    # does not match `\newtheorem{theorem}{Theorem}[Section]`
    second_parameter_pattern = re.compile(
        # In this case, the optional parameter (if any) should not follow the newtheorem.
        r'\\newtheorem\s*\{\s*(\w+)\s*\}\s*(\[\s*(\w+)\s*\])?\s*\{\s*(.*)\s*\}(?!\s*\[\s*(\w+)\s*\])')
    # matches `\newtheorem{theorem}{Theorem}`, `\newtheorem{theorem}{Theorem}[Section]`,
    # does not match `\newtheorem{proposition}[theorem]{Proposition}`
    third_parameter_pattern = re.compile(
        r'\\newtheorem\s*\{\s*(\w+)\s*\}\s*\{\s*(.*)\s*\}\s*(\[\s*(\w+)\s*\])?')

    second_results = _search_counters_by_pattern(preamble, second_parameter_pattern, 3)
    third_results = _search_counters_by_pattern(preamble, third_parameter_pattern, 4)
    to_return = {}
    for environment_name, counter in second_results.items():
        to_return[environment_name] = (counter, None)
    for environment_name, reset_counter in third_results.items():
        if environment_name in to_return:
            continue
        to_return[environment_name] = (environment_name, reset_counter)
    return to_return        

The `numbered_newtheorems_counters_in_preamble` function parses the preamble of a LaTeX document for invocations of the `\newtheorem` command and returns the counter that each numbered theorem-like environment belongs to, along with the counter that resets it, if any.

In [None]:
text = text_from_file(_test_directory() / 'latex_examples' / 'newtheorem_example.tex') 
print(text)

counters = numbered_newtheorems_counters_in_preamble(text)
test_eq(counters,
    {'theorem': ('theorem', None), 'lemma': ('theorem', None), 'definition': ('theorem', None), 'corollary': ('corollary', None), 'remark': ('remark', 'theorem')})

\documentclass{article}
\usepackage{amsthm}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition} % Note that `theorem`, `lemma`, and `definition` all have `theorem` as their counter.
\newtheorem{corollary}{Corollary} % Note that `corollary` has its own counter.
\newtheorem{remark}{Remark}[theorem] % `remark` has `theorem` as its counter
\newtheorem*{conjecture*}{Conjecture} % `conjecture*` has no counter

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.
\end{theorem}

\begin{lemma}
This is Lemma 2.
\end{lemma}

\begin{definition}
This is Definition 3.
\end{definition}

\end{document}


In [None]:
text = r"""
\theoremstyle{plain}
\newtheorem{thm}[subsection]{Theorem}
\newtheorem{prop}[subsection]{Proposition}
\newtheorem{cor}[subsection]{Corollary}
\newtheorem{remark}{Remark}
\newtheorem{conj}[subsection]{Conjecture}
\newtheorem*{conj*}{Conjecture}
\begin{document}
\end{document}
"""
counters = numbered_newtheorems_counters_in_preamble(text)
test_eq(
    counters,
    {'thm': ('subsection', None), 'prop': ('subsection', None), 'cor': ('subsection', None), 'remark': ('remark', None), 'conj': ('subsection', None)})

`numbered_newtheorems_counters_in_preamble` ignores commented out text:

In [None]:
text = r"""
\theoremstyle{plain}
\newtheorem{thm}[subsection]{Theorem}
\newtheorem{prop}[subsection]{Proposition}
\newtheorem{cor}[subsection]{Corollary}
% \newtheorem{remark}{Remark}
\newtheorem{conj}[subsection]{Conjecture}
\newtheorem*{conj*}{Conjecture} %\newtheorem{fakeenv}{This won't be picked up!}
\begin{document}
\end{document}
"""
counters = numbered_newtheorems_counters_in_preamble(text)
test_eq(
    counters,
    {'thm': ('subsection', None), 'prop': ('subsection', None), 'cor': ('subsection', None), 'conj': ('subsection', None)})

`numbered_newtheorems_counters_in_preamble` does not account for `\numberwithin` command invocations:

In [None]:
text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section' / 'main.tex')
print(text)
# So `numbered_newtheorems_counters_in_preamble` only considers the theorem-like
# environments as being counted by 'equation', even though the command
# `\numberwithin{equation}{section}` resets the equation counter
# every time the `section` counter is incremented.
test_eq(numbered_newtheorems_counters_in_preamble(text), 
        {'theorem': ('equation', None), 'proposition': ('equation', None), 'lemma': ('equation', None), 'corollary': ('equation', None), 'definition': ('equation', None), 'example': ('equation', None), 'remark': ('equation', None)})

\documentclass{amsart}
\usepackage[utf8]{inputenc}
\usepackage{amsmath, amsfonts, amssymb, amsthm, amsopn}

\numberwithin{equation}{section}

\theoremstyle{plain}
\newtheorem*{theorem*}{Theorem}
\newtheorem*{theoremA}{Theorem A}
\newtheorem*{theoremB}{Theorem B}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{proposition}[equation]{Proposition}
\newtheorem{lemma}[equation]{Lemma}
\newtheorem{corollary}[equation]{Corollary}

\theoremstyle{definition}
\newtheorem{definition}[equation]{Definition}
\newtheorem{example}[equation]{Example}
\newtheorem*{acknowledgements}{Acknowledgements}
\newtheorem*{conventions}{Conventions}

\theoremstyle{remark}
\newtheorem{remark}[equation]{Remark}

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.1. This is because the \verb|\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\
\verb|\newtheorem{theorem}[equation]{Theorem}| command makes the environment \verb|theorem

The `\newtheorem` command can also specify that the counter of the newly defined theorem-like environment be reset whenever another counter is incremented. For example, `\newtheorem{theorem}{Theorem}[section]` defines a new environment named `theorem` (with display text `Theorem`) whose counter is reset whenever the `section` counter is incremented.
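The TODO in the next cell asks for this case to be handled. One possible approach, sketched here under stated assumptions, is a single pattern that reads both the shared-counter argument (`\newtheorem{lemma}[theorem]{Lemma}`) and the reset argument (`\newtheorem{theorem}{Theorem}[section]`) and returns the same `(counter, reset_by_counter)` tuples. The name `newtheorem_counters_sketch` is hypothetical; the sketch assumes display names contain no nested braces and skips starred (unnumbered) `\newtheorem*` definitions:

```python
import re
from typing import Optional

_NEWTHEOREM = re.compile(
    r'\\newtheorem(?!\*)\s*\{\s*(\w+)\s*\}'  # environment name; `\newtheorem*` is skipped
    r'\s*(?:\[\s*(\w+)\s*\])?'              # optional shared counter, e.g. [theorem]
    r'\s*\{([^{}]*)\}'                      # display name, assumed to contain no braces
    r'(?:\s*\[\s*(\w+)\s*\])?'              # optional reset counter, e.g. [section]
)


def newtheorem_counters_sketch(preamble: str) -> dict[str, tuple[str, Optional[str]]]:
    """Map each numbered environment to `(counter, reset_by_counter)`."""
    preamble = re.sub(r'(?<!\\)%[^\n]*', '', preamble)  # drop comments
    counters = {}
    for match in _NEWTHEOREM.finditer(preamble):
        env_name, shared_counter, _display_name, reset_by = match.groups()
        # Without a shared counter, the environment gets its own counter.
        counters[env_name] = (shared_counter or env_name, reset_by)
    return counters
```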

In [None]:
# TODO: reimplement the numbered_newtheorems_counters_in_preamble function to
# account for this example.
text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_7_newtheorem_command_restarts_counter_by_section' / 'main.tex') 
print(text)
# In this example, `theorem` and `proposition` are reset by the `section` counter,
# `corollary` is reset by the `theorem` counter, and `lemma` shares the `theorem`
# counter.

test_eq(numbered_newtheorems_counters_in_preamble(text), 
        {'lemma': ('theorem', None), 'theorem': ('theorem', 'section'), 'corollary': ('corollary', 'theorem'), 'proposition': ('proposition', 'section')})


% Based on an example from https://www.overleaf.com/learn/latex/Theorems_and_proofs#Numbered_theorems.2C_definitions.2C_corollaries_and_lemmas

\documentclass[12 pt]{amsart}

\newtheorem{theorem}{Theorem}[section]
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}
% Note that the below invocation of \newtheorem is invalid:
% \newtheorem{proposition}[theorem]{Proposition}[section]
\newtheorem{proposition}{Proposition}[section]

\begin{document}
\section{Introduction}
Theorems can easily be defined:

\begin{theorem}
Let \(f\) be a function whose derivative exists in every point, then \(f\) is 
a continuous function.
\end{theorem}

\begin{theorem}[Pythagorean theorem]
\label{pythagorean}
This is a theorem about right triangles and can be summarised in the next 
equation 
\[ x^2 + y^2 = z^2 \]
\end{theorem}

And a consequence of theorem \ref{pythagorean} is the statement in the next 
corollary.

\begin{corollary}
There's no right rectangle whose sides measure 3c

In [None]:
#| hide
# I found a bug where the section numbering cannot handle the theorem-like environment defined like
# \newtheorem{theorem}{Theorem}[section], cf. https://arxiv.org/abs/2106.10586 and the example in
# https://www.overleaf.com/learn/latex/Theorems_and_proofs#Numbered_theorems.2C_definitions.2C_corollaries_and_lemmas
# TODO: resolve this bug and delete this cell.

In [None]:
#| export
def numberwithins_in_preamble(
        document: str # The LaTeX document
    ) -> dict[str, str]: # The keys are the first arguments of `\numberwithin` invocations and the values are the second arguments of `\numberwithin` invocations.
    r"""Return the dict describing `numberwithin` commands invoked
    in the preamble of `document`."""
    preamble, _ = divide_preamble(document)
    preamble = remove_comments(preamble)
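    # amsmath's `\numberwithin` also accepts an optional format argument,
    # e.g. `\numberwithin[\roman]{equation}{section}`, which is skipped here.
    pattern = re.compile(
        r'\\numberwithin\s*(?:\[[^\]]*\])?\s*\{\s*(\w+)\s*\}\s*\{\s*(\w+)\s*\}')
    numberwithins = {}
    for match in pattern.finditer(preamble):
        counter_to_number = match.group(1)  # e.g. `equation`
        parent_counter = match.group(2)  # e.g. `section`
        numberwithins[counter_to_number] = parent_counter
    return numberwithins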

In [None]:
text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section' / 'main.tex')
print(text)
test_eq(numberwithins_in_preamble(text), {'equation': 'section'})

\documentclass{amsart}
\usepackage[utf8]{inputenc}
\usepackage{amsmath, amsfonts, amssymb, amsthm, amsopn}

\numberwithin{equation}{section}

\theoremstyle{plain}
\newtheorem*{theorem*}{Theorem}
\newtheorem*{theoremA}{Theorem A}
\newtheorem*{theoremB}{Theorem B}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem{proposition}[equation]{Proposition}
\newtheorem{lemma}[equation]{Lemma}
\newtheorem{corollary}[equation]{Corollary}

\theoremstyle{definition}
\newtheorem{definition}[equation]{Definition}
\newtheorem{example}[equation]{Example}
\newtheorem*{acknowledgements}{Acknowledgements}
\newtheorem*{conventions}{Conventions}

\theoremstyle{remark}
\newtheorem{remark}[equation]{Remark}

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.1. This is because the \verb|\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\
\verb|\newtheorem{theorem}[equation]{Theorem}| command makes the environment \verb|theorem
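Since `numbered_newtheorems_counters_in_preamble` ignores `\numberwithin`, its output can be combined with that of `numberwithins_in_preamble` afterwards. A minimal sketch, assuming that an explicit reset counter from `\newtheorem` takes precedence over a `\numberwithin` for the same counter (the function name `apply_numberwithins` is hypothetical):

```python
from typing import Optional


def apply_numberwithins(
        counters: dict[str, tuple[str, Optional[str]]],  # e.g. output of `numbered_newtheorems_counters_in_preamble`
        numberwithins: dict[str, str],  # e.g. output of `numberwithins_in_preamble`
        ) -> dict[str, tuple[str, Optional[str]]]:
    """Fill in the reset counter of each environment from `\\numberwithin` data."""
    combined = {}
    for env_name, (counter, reset_by) in counters.items():
        if reset_by is None:
            # Assumption: only consult `\numberwithin` if `\newtheorem` gave no reset counter.
            reset_by = numberwithins.get(counter)
        combined[env_name] = (counter, reset_by)
    return combined
```

For the example above, every environment counted by `equation` would then be reported as reset by `section`.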

#### Getting the display names of environments

For example, `\newtheorem{theorem}{Theorem}` defines a theorem-like environment called `theorem` whose display name is `Theorem`.

In [None]:
#| export
def display_names_of_environments(
        document: str # The LaTeX document
        ) -> dict[str, str]:  
    r"""Return the dict specifying the display names for each theorem-like
    environment.

    This function uses two separate regex patterns, one to detect the invocations
    of `\newtheorem`
    in which the optional parameter is the second parameter and one to detect
    those in which the optional parameter is the third parameter.

    Assumes that
    - invocations of the `\newtheorem` command are exclusively in the
    preamble of the LaTeX document.
    - theorem-like environments are defined using the `\newtheorem` command.
    - no environments of the same name are defined twice.

    """
    preamble, _ = divide_preamble(document)
    preamble = remove_comments(preamble)
    # matches `\newtheorem{theorem}{Theorem}`, `\newtheorem{proposition}[theorem]{Proposition}`
    # does not match `\newtheorem{theorem}{Theorem}[Section]`
    second_parameter_pattern = re.compile(
        # In this case, the optional parameter (if any) should not follow the newtheorem.
        r'\\newtheorem\*?\s*\{\s*(\w+\*?)\s*\}\s*(\[\s*(\w+)\s*\])?\s*\{\s*(.*)\s*\}(?!\s*\[\s*(\w+)\s*\])')
    # matches `\newtheorem{theorem}{Theorem}`, `\newtheorem{theorem}{Theorem}[Section]`,
    # does not match `\newtheorem{proposition}[theorem]{Proposition}`
    third_parameter_pattern = re.compile(
        r'\\newtheorem\*?\s*\{\s*(\w+\*?)\s*\}\s*\{\s*(.*)\s*\}\s*(\[\s*(\w+)\s*\])?')
    second_results = _search_display_names_by_pattern(preamble, second_parameter_pattern, 4)
    third_results = _search_display_names_by_pattern(preamble, third_parameter_pattern, 2)
    return second_results | third_results
    

def _search_display_names_by_pattern(
        preamble: str,
        newtheorem_regex: re.Pattern,
        display_name_group: int # This depends on which `newtheorem_regex` is used, and is either 2 or 4.
        ) -> dict[str, str]:
    """
    Capture the newly defined theorem-like environment names as well as
    their display names."""
    display_names = {}
    for match in newtheorem_regex.finditer(preamble):
        env_name = match.group(1)
        display_name = match.group(display_name_group)
        display_names[env_name] = display_name
    return display_names

In [None]:
text = text_from_file(_test_directory() / 'latex_examples' / 'newtheorem_example.tex') 
print(text)
display_names = display_names_of_environments(text)
test_eq(display_names, {'theorem': 'Theorem',
 'lemma': 'Lemma',
 'definition': 'Definition',
 'corollary': 'Corollary',
 'conjecture*': 'Conjecture',
 'remark': 'Remark'})

\documentclass{article}
\usepackage{amsthm}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition} % Note that `theorem`, `lemma`, and `definition` all have `theorem` as their counter.
\newtheorem{corollary}{Corollary} % Note that `corollary` has its own counter.
\newtheorem{remark}{Remark}[theorem] % `remark` has `theorem` as its counter
\newtheorem*{conjecture*}{Conjecture} % `conjecture*` has no counter

\begin{document}

\section{Introduction}

\begin{theorem}
This is Theorem 1.
\end{theorem}

\begin{lemma}
This is Lemma 2.
\end{lemma}

\begin{definition}
This is Definition 3.
\end{definition}

\end{document}


In [None]:
text = text_from_file(_test_directory() / 'latex_examples' / 'numbering_example_7_newtheorem_command_restarts_counter_by_section' / 'main.tex') 
print(text)
display_names = display_names_of_environments(text)
test_eq(display_names,
{'theorem': 'Theorem',
 'corollary': 'Corollary',
 'lemma': 'Lemma',
 'proposition': 'Proposition',})


% Based on an example from https://www.overleaf.com/learn/latex/Theorems_and_proofs#Numbered_theorems.2C_definitions.2C_corollaries_and_lemmas

\documentclass[12 pt]{amsart}

\newtheorem{theorem}{Theorem}[section]
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{lemma}[theorem]{Lemma}
% Note that the below invocation of \newtheorem is invalid:
% \newtheorem{proposition}[theorem]{Proposition}[section]
\newtheorem{proposition}{Proposition}[section]

\begin{document}
\section{Introduction}
Theorems can easily be defined:

\begin{theorem}
Let \(f\) be a function whose derivative exists in every point, then \(f\) is 
a continuous function.
\end{theorem}

\begin{theorem}[Pythagorean theorem]
\label{pythagorean}
This is a theorem about right triangles and can be summarised in the next 
equation 
\[ x^2 + y^2 = z^2 \]
\end{theorem}

And a consequence of theorem \ref{pythagorean} is the statement in the next 
corollary.

\begin{corollary}
There's no right rectangle whose sides measure 3c

### Divide latex text into parts

In [None]:
#| export
def _setup_counters(
        numbertheorem_counters: dict[str, tuple[str, Union[str, None]]], # An output of `numbered_newtheorems_counters_in_preamble`
        ) -> dict[str, int]:
    r"""
    Return a dict whose keys are the counters in the LaTeX document and whose
    values are all `0`. These key-value pairs are used to keep track of
    the numberings of `parts`.

    One special key is the empty string `''`, which counts the parts that
    do not get a numbering, i.e. most text that lies outside of (numbered)
    environments.

    """
    # TODO: replace enumerated environments with markdown enumerated lists
    # and itemizes with markdown bulleted lists

    # cf. https://www.overleaf.com/learn/latex/Counters#Default_counters_in_LaTeX
    predefined_counters = [
        'part', # Incremented each time the `\part` command is used. It is not reset automatically and can only be reset by the user
        'chapter', # Incremented each time the `\chapter` command is used.
        'section', # Incremented whenever a new `\section` command is encountered
        'subsection', # Incremented whenever a new `\subsection` command is encountered, reset whenever a new `\section` command is encountered
        'subsubsection', # Incremented whenever a new `\subsubsection` command is encountered, reset whenever a new `\subsection` or `\section` command is encountered
        'paragraph', # Incremented whenever a new `\paragraph` command is encountered. Reset whenever a new `\subsubsection`, `\subsection`, or `\section` command is encountered
        'subparagraph', # Incremented each time the `\subparagraph` command is used and reset at the beginning of a new `\paragraph`
        'page', # Incremented each time a new page is started in the document
        'equation', # Incremented whenever the `equation` environment is used.
        'figure', # Incremented whenever a new `figure` environment is encountered
        'table', # Incremented whenever a new `table` environment is encountered
        'footnote', 
        'mpfootnote',
        'enumi',
        'enumii',
        'enumiii',
        'enumiv']

    counters = {counter: 0 for _, (counter, reset_counter) in numbertheorem_counters.items()}
    for counter in predefined_counters:
        counters[counter] = 0

    counters[''] = 0
    return counters

In [None]:
sample_counters = _setup_counters(
    {'thm': ('subsection', None), 'prop': ('subsection', None), 'cor': ('subsection', None), 'remark': ('remark', None), 'conj': ('subsection', None)})
assert 'remark' in sample_counters
test_eq(sample_counters['remark'], 0)
assert 'thm' not in sample_counters  # 'thm' is an environment name, but not a counter.
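
The `numbertheorem_counters` argument maps each numbered theorem-like environment to a pair `(counter, reset_counter)`. As a rough illustration of where such pairs come from, the sketch below parses the numbered forms of `\newtheorem` with Python's standard `re` module. The name `parse_newtheorem_counters` is hypothetical, and it is only an assumption that its output has the same shape as that of `numbered_newtheorems_counters_in_preamble`; it skips `\newtheorem*` and does not handle nested braces in the display name.

```python
import re
from typing import Optional

# Matches \newtheorem{env}[shared]{Display}[reset]; the starred form
# \newtheorem*{...} is excluded because it defines unnumbered environments.
NEWTHEOREM = re.compile(
    r'\\newtheorem(?!\*)\s*\{([^{}]+)\}\s*'   # environment name
    r'(?:\[([^\]]+)\])?\s*'                   # optional shared counter
    r'\{[^{}]*\}\s*'                          # display name (not needed here)
    r'(?:\[([^\]]+)\])?'                      # optional reset-by counter
)

def parse_newtheorem_counters(preamble: str) -> dict[str, tuple[str, Optional[str]]]:
    """Hypothetical sketch: map each numbered environment to (counter, reset_counter)."""
    counters = {}
    for match in NEWTHEOREM.finditer(preamble):
        env, shared, reset = match.groups()
        # An environment either shares another counter or gets its own counter.
        counters[env] = (shared if shared else env, reset)
    return counters

preamble = r"""
\newtheorem{theorem}{Theorem}[section]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem*{conjecture*}{Conjecture}
"""
print(parse_newtheorem_counters(preamble))
```

Fed such a dict, `_setup_counters` would collect the distinct first entries (here `'theorem'` and `'corollary'`) as counters, alongside LaTeX's predefined counters and `''`.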


In [None]:
#| export
def _setup_numberwithins(
        explicit_numberwithins: dict[str, str],
        numbertheorem_counters: dict[str, tuple[str, Union[str, None]]], # An output of `numbered_newtheorems_counters_in_preamble`.
        ) -> dict[str, str]: # The keys are counters and each value is the counter that its key is immediately numbered within.
    """
    Extracts information of counters that are reset when other counters are
    incremented.

    This is a helper function of `_setup_all_numberwithins` as well as
    `divide_latex_text`.
    """
    builtin_numberwithins = {
        'subsection': 'section',
        'subsubsection': 'subsection',
        'paragraph': 'subsubsection',
        'subparagraph': 'paragraph',
        'enumii': 'enumi',
        'enumiii': 'enumii',
        'enumiv': 'enumiii',
        'part': 'chapter',
        'appendix': 'chapter'
    }
    numberwithins = explicit_numberwithins | builtin_numberwithins

    for environmentname, (counter, reset_by_counter) in numbertheorem_counters.items():
        if reset_by_counter is None:
            continue
        numberwithins[environmentname] = reset_by_counter
    return numberwithins

    

def _setup_all_numberwithins(
        explicit_numberwithins: dict[str, str],
        numbertheorem_counters: dict[str, tuple[str, Union[str, None]]], # An output of `numbered_newtheorems_counters_in_preamble`.
        ) -> dict[str, list[str]]: # The keys are counters and the values are all counters that the key is numbered within.
    """
    This is a helper function of `divide_latex_text`.
    """
    numberwithins = _setup_numberwithins(explicit_numberwithins, numbertheorem_counters)
    all_counters = set()
    for key, value in numberwithins.items():
        all_counters.add(key)
        all_counters.add(value)
    all_numbered_withins = {counter: [] for counter in all_counters}
    for counter_1, counter_2 in product(all_counters, all_counters):
        if _is_numberedwithin(counter_1, counter_2, numberwithins):
            all_numbered_withins[counter_1].append(counter_2)
    return all_numbered_withins


def _is_numberedwithin(
        counter_1, counter_2, numberwithins: dict[str, str]
        ) -> bool:
    """Return `True` if `counter_1` is (directly or transitively) numbered within `counter_2`."""
    if counter_1 not in numberwithins:
        return False
    elif numberwithins[counter_1] == counter_2:
        return True
    return _is_numberedwithin(
        numberwithins[counter_1], counter_2, numberwithins)
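
`_setup_all_numberwithins` checks every ordered pair of counters with `_is_numberedwithin`. Since each counter has at most one immediate parent in the `numberwithins` dict, the same ancestor lists can also be obtained by walking up the parent chain once per counter. The sketch below is a hypothetical alternative (`ancestor_counters` is not part of this module); it assumes the parent relation has no cycles, and unlike the pairwise version it lists ancestors from nearest to farthest.

```python
def ancestor_counters(numberwithins: dict[str, str]) -> dict[str, list[str]]:
    """Hypothetical sketch: map each counter to every counter it is numbered within.

    Assumes `numberwithins` maps a counter to the single counter that
    immediately resets it, and that this parent relation is acyclic.
    """
    all_counters = set(numberwithins) | set(numberwithins.values())
    ancestors = {}
    for counter in all_counters:
        chain = []
        parent = numberwithins.get(counter)
        while parent is not None:
            chain.append(parent)              # nearest ancestor first
            parent = numberwithins.get(parent)
        ancestors[counter] = chain
    return ancestors

sample = ancestor_counters({'subsection': 'section',
                            'subsubsection': 'subsection',
                            'equation': 'section'})
print(sample['subsubsection'])  # ['subsection', 'section']
print(sample['section'])        # []
```

This visits each parent link once per counter instead of testing all pairs, which only matters for documents with unusually many counters.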


In [None]:
#| hide
sample_output = _setup_all_numberwithins({'equation': 'section'}, {})
test_eq(sample_output['section'], [])
test_eq(sample_output['subsection'], ['section'])
test_eq(sample_output['equation'], ['section'])

sample_output = _setup_all_numberwithins({'theorem': 'section'}, {})
test_eq(sample_output['theorem'], ['section'])

# In case that there is a `\newtheorem` invocation that also numbers the
# theorem-like environment within some counter (e.g. `\newtheorem{theorem}{Theorem}[section]`),
# we need to make sure that it is set up like a numberwithin:
sample_output = _setup_all_numberwithins({'equation': 'section'}, {'theorem': ('theorem', 'section')})
test_eq(sample_output['theorem'], ['section'])
test_eq(sample_output['equation'], ['section'])

# In this example, let's say that we have a `\newtheorem{theorem}{Theorem}` instead
sample_output = _setup_all_numberwithins({'equation': 'section'}, {'theorem': ('theorem', None)})
assert 'theorem' not in sample_output
test_eq(sample_output['equation'], ['section'])

In [None]:
#| export
def _unnumbered_environments(
        numbertheorem_counters: dict[str, tuple[str, Union[str, None]]], # An output of `numbered_newtheorems_counters_in_preamble`
        display_names: dict[str, str]) -> set[str]:
    r"""Return the set of unnumbered theorem-like environments defined by
    `\newtheorem`.

    This is a helper function of `divide_latex_text`.
    """
    return {environment for environment in display_names
            if environment not in numbertheorem_counters}

    

In [None]:
#| hide
sample_unnumbered_environments = _unnumbered_environments(
    {'theorem': ('theorem', None), 'lemma': ('theorem', None), 'definition': ('theorem', None), 'corollary': ('corollary', None), 'remark': ('theorem', None)},
    {'theorem': 'Theorem', 'lemma': 'Lemma', 'definition': 'Definition', 'corollary': 'Corollary', 'conjecture*': 'Conjecture', 'remark': 'Remark'} 
    )
test_eq(sample_unnumbered_environments, {'conjecture*'})

In [None]:
#| export
def _section_title(
        text: str
        ) -> tuple[bool, str]: # The bool is `True` if the section/subsection is numbered (i.e. is `section` or `subsection` as opposed to `section*` or `subsection*`). The `str` is the title of the section or subsection
    """Return the title of a section or subsection from a latex str
    and whether or not the section/subsection is numbered"""

    # Note that the `section` command has the optional argument `toc-title` which appears
    # in the table of contents, cf.
    # http://latexref.xyz/_005csection.html
    pattern = regex.compile(
        r'\\(?:section|subsection)\s*(?:\[.*\])?(\*)?\s*'
        r'\{((?>[^{}]+|\{(?2)\})*)\}',
        regex.MULTILINE
    )
    regex_search = regex.search(pattern, text)
    is_numbered = regex_search.group(1) is None
    title = regex_search.group(2)
    return is_numbered, title
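
The pattern in `_section_title` relies on the recursive group `(?2)`, which the third-party `regex` package supports but Python's standard `re` module does not. As a hedged illustration of what that recursion achieves, the stdlib-only sketch below (the name `section_title_sketch` is hypothetical) locates the opening brace of the title and then scans forward, tracking brace depth, until the matching closing brace. Like the original pattern, it assumes an optional `[toc-title]` without nested brackets and does not treat escaped braces `\{`/`\}` specially.

```python
import re

# Command name, optional star, optional [toc-title], then the title's opening brace.
SECTION_START = re.compile(r'\\(?:sub)?section\s*(\*)?\s*(?:\[[^\]]*\])?\s*\{')

def section_title_sketch(text: str) -> tuple[bool, str]:
    """Hypothetical sketch: return (is_numbered, title) for the first
    section or subsection command in `text`."""
    match = SECTION_START.search(text)
    if match is None:
        raise ValueError('No section or subsection command found.')
    start = match.end()          # index just after the title's opening brace
    depth = 1
    for i in range(start, len(text)):
        if text[i] == '{':
            depth += 1
        elif text[i] == '}':
            depth -= 1
            if depth == 0:       # this brace closes the title
                return match.group(1) is None, text[start:i]
    raise ValueError('Unbalanced braces in section title.')

print(section_title_sketch(r'\section [Short] {Long title}'))  # (True, 'Long title')
print(section_title_sketch(r'\subsection*{The field \mathcal{F}_p}'))
```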


In [None]:
#| hide

# subsection, no extraneous spaces
sample_section = _section_title(r"\subsection{I am a subsection}")
test_eq(sample_section, (True, 'I am a subsection'))

# section, with extraneous spaces
sample_section = _section_title(r"\section {Generating series of special divisors}")
test_eq(sample_section, (True, 'Generating series of special divisors'))

# section, unnumbered
sample_section = _section_title(r"\section*{I am an unnumbered section}")
test_eq(sample_section, (False, 'I am an unnumbered section'))

# Subsection, unnumbered, extraneous spaces
sample_section = _section_title(r"\subsection*    {I am an unnumbered section and I have extraneous spaces}")
test_eq(sample_section, (False, 'I am an unnumbered section and I have extraneous spaces'))

# Multiline section
sample_section = _section_title(
    r"""\section*    {I am a section and I span
    multiple lines}""")
test_eq(sample_section, (False, 'I am a section and I span\n    multiple lines'))

# Section with curly braces
sample_section = _section_title(
    r"""\section{ Can I talk about the finite field \mathcal{F}_p in this title?
        Can I also have multiple lines? Yes I can!}"""
)
test_eq(sample_section, (True, r""" Can I talk about the finite field \mathcal{F}_p in this title?
        Can I also have multiple lines? Yes I can!"""))

# Section with table of contents
sample_section = _section_title(
    r"\section [This is a Table of contents title] {This is the section title}"
)
test_eq(sample_section, (True, r"""This is the section title"""))

# # Section, also multiline
# sample_section = _section_title(
#     r"""\section{Exceptional maximal subgroups of 
# \texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}}"""
# )
# sample_section[1]

In [None]:
#| export
def _is_section_node(node):
    return (node.isNodeType(LatexMacroNode)
            and node.macroname == 'section')

def _is_subsection_node(node):
    return (node.isNodeType(LatexMacroNode)
            and node.macroname == 'subsection')

def _is_environment_node(node):
    return node.isNodeType(LatexEnvironmentNode)

In [None]:
#| hide
text = r"""
\documentclass{article}

\theoremstyle{plain}
\newtheorem{theorem}{Theorem}

\begin{document}

\section{This is section 1}

\subsection{This is subsection 1.1}

\begin{theorem}
\end{theorem}

\end{document}
"""
document_node = find_document_node(text)
assert _is_section_node(document_node.nodelist[1])
assert not _is_section_node(document_node.nodelist[3])
assert not _is_section_node(document_node.nodelist[5])

assert not _is_subsection_node(document_node.nodelist[1])
assert _is_subsection_node(document_node.nodelist[3])
assert not _is_subsection_node(document_node.nodelist[5])

assert not _is_environment_node(document_node.nodelist[1])
assert not _is_environment_node(document_node.nodelist[3])
assert _is_environment_node(document_node.nodelist[5])

# for node in document_node.nodelist:
#     print('\n')
#     print(node)
#     if node.isNodeType(LatexMacroNode):
#         print(node.macroname)
#     elif node.isNodeType(LatexEnvironmentNode):
#         print(node.environmentname)


In [None]:
#| export
def _is_numbered(
        node: LatexNode,
        numbertheorem_counters: dict[str, str]
        ) -> bool:
    if _is_section_node(node) or _is_subsection_node(node):
        is_numbered, _ = _section_title(node.latex_verbatim())
        return is_numbered
    elif _is_environment_node(node):
        return node.environmentname in numbertheorem_counters
    else:
        return False

In [None]:
#| hide
text = r"""
\documentclass{article}

\theoremstyle{plain}
\newtheorem{theorem}{Theorem}
\newtheorem*{theorem*}{Theorem}

\begin{document}
\begin{theorem}
\end{theorem}
\begin{theorem*}
\end{theorem*}
\end{document}
"""
document_node = find_document_node(text)
environments_to_counters = {'theorem': 'theorem'}

assert _is_numbered(document_node.nodelist[1], environments_to_counters)
assert not _is_numbered(document_node.nodelist[3], environments_to_counters)


# Example with numberwithin specified.
text = r"""
\documentclass{article}

\numberwithin{equation}{section}

\theoremstyle{plain}
\newtheorem{theorem}[equation]{Theorem}
\newtheorem*{theorem*}{Theorem}

\begin{document}
\begin{theorem}
\end{theorem}
\begin{theorem*}
\end{theorem*}
\end{document}
"""

document_node = find_document_node(text)
environments_to_counters = {'theorem': 'equation'}

assert _is_numbered(document_node.nodelist[1], environments_to_counters)
assert not _is_numbered(document_node.nodelist[3], environments_to_counters)

# Example for sections and subsections
text = r"""
\begin{document}
\section{Section 1}
\subsection*{Unnumbered section}
\end{document}
"""
document_node = find_document_node(text)
environments_to_counters = {}

assert _is_numbered(document_node.nodelist[1], environments_to_counters)
assert not _is_numbered(document_node.nodelist[3], environments_to_counters)


In [None]:
#| export
def _change_counters(
        node,
        counters,
        numbertheorem_counters: dict[str, tuple[str, Union[str, None]]], # An output of `numbered_newtheorems_counters_in_preamble`
        all_numberwithins: dict[str, list[str]]
        ):
    # identify which counter to change
    # TODO
    # Take into consideration unnumbered non-environment node
    # Take into consideration unnumbered environment node
    if _is_environment_node(node):
        if node.environmentname in numbertheorem_counters:
            counter = numbertheorem_counters[node.environmentname][0]
        else:
            counter = None
    elif _is_section_node(node):
        counter = 'section'
    elif _is_subsection_node(node):
        counter = 'subsection'
    else:
        counter = None

    # Section counters seem to only reset subsection counters
    # when the section is numbered, etc., cf. `numbering_example_4...`
    # and `numbering_example_5...` in `nbs\_tests\latex_examples`
    is_numbered = _is_numbered(node, numbertheorem_counters)
    # e.g. `\numberwithin{equation}{section}` means that `equation` is
    # numbered within `section`, i.e. `equation` is reset whenever
    # `section` is incremented

    # if counter is None and not _is_environment_node(node):
    #     counters[''] += 1 
    #     return

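    if is_numbered:
        counters[counter] += 1
        # Reset every counter that is (directly or transitively) numbered
        # within the counter that was just incremented. Unnumbered nodes
        # (e.g. `\section*`) do not step any counter, so they reset nothing.
        for numbered_counter, within_counters in all_numberwithins.items():
            if counter in within_counters:
                counters[numbered_counter] = 0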



In [None]:
#| export
def get_node_from_simple_text(
        text: str) -> LatexNode:
    """Return the (first) `LatexNode` object from a str."""
    w = LatexWalker(text)
    nodelist, _, _ = w.get_latex_nodes(pos=0)
    return nodelist[0]

In [None]:
text = r"""\begin{thm}This is a theorem. \end{thm}"""
node = get_node_from_simple_text(text)
assert isinstance(node, LatexEnvironmentNode)
test_eq(node.environmentname, 'thm')


text = r"""\begin{thm}This is a theorem. \end{thm} \begin{proof} This is a proof. It is not captured by the `get_node_from_simple_text` function \end{proof}"""
node = get_node_from_simple_text(text)
assert isinstance(node, LatexEnvironmentNode)
test_eq(node.environmentname, 'thm')

In [None]:
#| hide

text = r"""\begin{thm}This is a theorem. \end{thm}"""
node = get_node_from_simple_text(text)
# Test a theorem being counted by its own counter.
numbertheorem_counters = {'thm': ('thm', None)}
all_numberwithins = {}
counters = {'thm': 1}
_change_counters(node, counters, numbertheorem_counters, all_numberwithins)
test_eq(counters, {'thm': 2})
# Test a theorem being counted by the equation counter.
numbertheorem_counters = {'thm': ('equation', None)}
all_numberwithins = {}
counters = {'equation': 2}
_change_counters(node, counters, numbertheorem_counters, all_numberwithins)
test_eq(counters, {'equation': 3})

text = r"""\begin{corollary}This is a corollary. \end{corollary}"""
node = get_node_from_simple_text(text)
# Test a theorem-like environment being counted by the counter of
# another theorem-like environment
numbertheorem_counters = {'corollary': ('theorem', None), 'theorem': ('theorem', None)}
all_numberwithins = {}
counters = {'theorem': 0}
_change_counters(node, counters, numbertheorem_counters, all_numberwithins)
test_eq(counters, {'theorem': 1})


# Test a theorem-like environment whose counter is numbered within
# the section counter.
# First, see what happens when a theorem is called
text = r"""\begin{theorem}This is a theorem. \end{theorem}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'theorem': ('theorem', None)}
all_numberwithins = {'theorem': ['section']}
counters = {'section': 1, 'theorem': 0}
_change_counters(node, counters, numbertheorem_counters, all_numberwithins)
test_eq(counters, {'section': 1, 'theorem': 1})
# Next, see what happens when a new section is invoked:
text = r"""\section{New section! The theorem counter should be reset}"""
node = get_node_from_simple_text(text)
_change_counters(node, counters, numbertheorem_counters, all_numberwithins)
test_eq(counters, {'section': 2, 'theorem': 0})

# Test a theorem-like environment sharing a counter with equation
# and in turn equation is numbered within section.
text = r"""\begin{theorem}This is a theorem. \end{theorem}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'theorem': ('equation', None)}
all_numberwithins = {'equation': ['section']}
counters = {'section': 1, 'equation': 0}
_change_counters(node, counters, numbertheorem_counters, all_numberwithins)
test_eq(counters, {'section': 1, 'equation': 1})
# Next, see what happens when a new section is invoked:
text = r"""\section{New section! The theorem counter should be reset}"""
node = get_node_from_simple_text(text)
_change_counters(node, counters, numbertheorem_counters, all_numberwithins)
test_eq(counters, {'section': 2, 'equation': 0})

# Test an unnumbered theorem-like environment counter
text = r"""\begin{thm*}This is a theorem. \end{thm*}"""
node = get_node_from_simple_text(text)
# `thm*` is not registered in `numbertheorem_counters`, so no counter should change.
numbertheorem_counters = {'thm': ('thm', None)}
all_numberwithins = {}
counters = {'thm': 1}
_change_counters(node, counters, numbertheorem_counters, all_numberwithins)
test_eq(counters, {'thm': 1})

# Test a theorem-like environment sharing a counter with equation
# and in turn equation is numbered within section, but the 
# environment is unnumbered.
text = r"""\begin{theorem*}This is a theorem. \end{theorem*}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'theorem': ('equation', None)}
all_numberwithins = {'equation': ['section']}
counters = {'section': 1, 'equation': 0}
_change_counters(node, counters, numbertheorem_counters, all_numberwithins)
test_eq(counters, {'section': 1, 'equation': 0})
# Next, see what happens when an unnumbered new section is invoked:
text = r"""\section*{New section! The theorem counter should be reset}"""
node = get_node_from_simple_text(text)
_change_counters(node, counters, numbertheorem_counters, all_numberwithins)
test_eq(counters, {'section': 1, 'equation': 0})

# Test the counter for text that does not belong to an environment
# In the current implementation of _change_counters, the '' counter
# is not actually changed.
text = r"""Just some text."""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'theorem': ('equation', None)}
all_numberwithins = {'equation': ['section']}
counters = {'section': 1, 'equation': 0, '': 0}
_change_counters(node, counters, numbertheorem_counters, all_numberwithins)
test_eq(counters, {'section': 1, 'equation': 0, '': 0})
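
The `_change_counters` tests follow LaTeX's rule that stepping a counter resets every counter numbered within it. The self-contained simulation below replays that rule without pylatexenc; the `step` function and the list of node kinds are hypothetical illustrations, not part of this module. Unnumbered commands such as `\section*` would be modelled by simply not stepping anything.

```python
def step(counter: str, counters: dict[str, int],
         all_numberwithins: dict[str, list[str]]) -> None:
    """Hypothetical sketch: increment `counter` and reset every counter
    that is (directly or transitively) numbered within it."""
    counters[counter] += 1
    for child, ancestors in all_numberwithins.items():
        if counter in ancestors:
            counters[child] = 0

# `theorem` shares the `equation` counter, which is numbered within `section`.
all_numberwithins = {'equation': ['section']}
counters = {'section': 0, 'equation': 0}
numbers = []
for kind in ['section', 'theorem', 'theorem', 'section', 'theorem']:
    counter = 'equation' if kind == 'theorem' else kind
    step(counter, counters, all_numberwithins)
    if kind == 'theorem':
        numbers.append(f"{counters['section']}.{counters['equation']}")
print(numbers)  # ['1.1', '1.2', '2.1']
```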

In [None]:
#| export
def _node_numbering(
        node: LatexNode,
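        numbertheorem_counters: dict[str, tuple[str, Union[str, None]]], # An output of `numbered_newtheorems_counters_in_preamble`
        numberwithins: dict[str, str],
        counters: dict[str, int]
        ) -> str: # Just the numbering of the node, no "section/subsection" or displayname
    """Return the numbering (e.g. `'2.1'`) of a section, subsection, or
    numbered theorem-like environment node based on the counts in `counters`."""
    if _is_section_node(node):
        counter = 'section'
    elif _is_subsection_node(node):
        counter = 'subsection'
    elif _is_environment_node(node):
        counter = numbertheorem_counters[node.environmentname][0]
    else:
        raise ValueError(
            'Expected a section, subsection, or environment node.')
    return _numbering_helper('', counter, numberwithins, counters)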


def _numbering_helper(
        trailing_numbering: str,
        counter: str,
        numberwithins: dict[str, str],
        counters: dict[str, int]
        ) -> str:
    """Recursive helper function of `_node_numbering`."""
    if counter not in numberwithins and counter not in counters:
        return trailing_numbering
    if counter not in numberwithins and counter in counters and trailing_numbering:
        return f'{counters[counter]}.{trailing_numbering}'
    if counter not in numberwithins and counter in counters and not trailing_numbering:
        return f'{counters[counter]}'

    parent_counter = numberwithins[counter]
    current_count = counters[counter]
    if not trailing_numbering:
        to_pass_to_trailing_numbering = str(current_count)
    else:
        to_pass_to_trailing_numbering = f'{current_count}.{trailing_numbering}'

    return _numbering_helper(
        to_pass_to_trailing_numbering,
        parent_counter,
        numberwithins,
        counters)
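
`_numbering_helper` builds a numbering such as `2.3.1` by recursing from a counter up through the counters it is numbered within, prepending each count along the way. An equivalent iterative sketch (the name `numbering_sketch` is hypothetical) makes the order explicit; it assumes every counter on the chain has an entry in `counters` and that `numberwithins` has no cycles.

```python
def numbering_sketch(counter: str, numberwithins: dict[str, str],
                     counters: dict[str, int]) -> str:
    """Hypothetical sketch: join the counts from the outermost counter inward."""
    parts = []
    current = counter
    while current is not None:
        parts.append(str(counters[current]))
        current = numberwithins.get(current)   # None once the top is reached
    return '.'.join(reversed(parts))

numberwithins = {'theorem': 'subsection', 'subsection': 'section'}
counters = {'section': 2, 'subsection': 3, 'theorem': 1}
print(numbering_sketch('theorem', numberwithins, counters))  # 2.3.1
```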
    

In [None]:
text = r"""\begin{thm}This is a theorem. \end{thm}"""
node = get_node_from_simple_text(text)
# Test a theorem being counted by its own counter.
numbertheorem_counters = {'thm': ('thm', None)}
numberwithins = {}
counters = {'thm': 1}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1')
# Test a theorem being counted by the equation counter.
numbertheorem_counters = {'thm': ('equation', None)}
numberwithins = {}
counters = {'equation': 2}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '2')

text = r"""\begin{corollary}This is a corollary. \end{corollary}"""
node = get_node_from_simple_text(text)
# Test a theorem-like environment being counted by the counter of
# another theorem-like environment
numbertheorem_counters = {'corollary': ('theorem', None), 'theorem': ('theorem', None)}
numberwithins = {}
counters = {'theorem': 0}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '0')

# Test a theorem-like environment whose counter is numbered within
# the section counter.
# First, see what happens when a theorem is called
text = r"""\begin{theorem}This is a theorem. \end{theorem}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'theorem': ('theorem', None)}
numberwithins = {'theorem': 'section'}
counters = {'section': 1, 'theorem': 0}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1.0')

# Next, see what happens when a new section is invoked:
text = r"""\section{New section! The theorem counter should be reset}"""
node = get_node_from_simple_text(text)
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1')

# Test a theorem-like environment sharing a counter with equation
# and in turn equation is numbered within section.
text = r"""\begin{theorem}This is a theorem. \end{theorem}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'theorem': ('equation', None)}
numberwithins = {'equation': 'section'}
counters = {'section': 1, 'equation': 0}
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1.0')
# Next, see what happens when a new section is invoked:
text = r"""\section{New section! The theorem counter should be reset}"""
node = get_node_from_simple_text(text)
sample_numbering = _node_numbering(
    node, numbertheorem_counters, numberwithins, counters)
test_eq(sample_numbering, '1')

In [None]:
#| export

def _title(
        node: LatexNode,
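        numbertheorem_counters: dict[str, tuple[str, Union[str, None]]], # An output of `numbered_newtheorems_counters_in_preamble`
        numberwithins: dict[str, str], # An output of `_setup_numberwithins`
        all_numberwithins: dict[str, list[str]], # An output of `_setup_all_numberwithins`; currently unused
        display_names: dict[str, str],
        counters: dict[str, int],
        swap_numbers: bool
        ) -> Optional[str]:
    """Return the title of a section, subsection, or environment node based
    on the counts in `counters`; return `None` for any other kind of node."""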
    numbered = _is_numbered(node, numbertheorem_counters)
    if _is_section_node(node) and numbered:
        _, title = _section_title(node.latex_verbatim())
        return f"{counters['section']}. {title}"
    if _is_section_node(node) and not numbered:
        _, title = _section_title(node.latex_verbatim())
        return title 

    if _is_subsection_node(node) and numbered:
        _, title = _section_title(node.latex_verbatim())
        return f"{counters['section']}.{counters['subsection']}. {title}"
    if _is_subsection_node(node) and not numbered:
        _, title = _section_title(node.latex_verbatim())
        return title

    if _is_environment_node(node):
        return _title_for_environment_node(
            node, numbertheorem_counters, numberwithins,
            display_names, counters, swap_numbers)


def _title_for_environment_node(
        node: LatexNode,
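        numbertheorem_counters: dict[str, tuple[str, Union[str, None]]], # An output of `numbered_newtheorems_counters_in_preamble`
        numberwithins: dict[str, str],
        display_names: dict[str, str],
        counters: dict[str, int],
        swap_numbers: bool
        ) -> str:
    """Return the title of an environment node.

    If the node is not that of a numbered theorem-like environment, then
    the display name of the environment (or, failing that, the environment
    name itself, e.g. `'abstract'`) is returned without a numbering.
    """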
    numbered = _is_numbered(node, numbertheorem_counters)
    # TODO: see what happens when environments are numbered within
    # sections vs. subsections
    if not numbered:
        numbering = None
    else:
        numbering = _node_numbering(
            node, numbertheorem_counters, numberwithins, counters)
    
    environment = node.environmentname
    if environment in display_names:
        display_name = display_names[environment]
    else:
        display_name = environment
    if not numbered:
        return display_name
    elif swap_numbers:
        return f'{numbering}. {display_name}.'
    else:
        return f'{display_name} {numbering}.'
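
With `\swapnumbers` (from `amsthm`), the number is printed before the theorem name. The final formatting step of `_title_for_environment_node` can be summarized by the standalone sketch below; `format_title` is a hypothetical name, and the trailing periods simply mirror the strings this module returns (e.g. `'Theorem 2.1.'` and `'2.1. Theorem.'`).

```python
from typing import Optional

def format_title(display_name: str, numbering: Optional[str],
                 swap_numbers: bool) -> str:
    """Hypothetical sketch of how a theorem-like environment's title is formatted."""
    if numbering is None:            # unnumbered environment, e.g. `conjecture*`
        return display_name
    if swap_numbers:                 # \swapnumbers puts the number first
        return f'{numbering}. {display_name}.'
    return f'{display_name} {numbering}.'

print(format_title('Theorem', '2.1', swap_numbers=False))   # Theorem 2.1.
print(format_title('Theorem', '2.1', swap_numbers=True))    # 2.1. Theorem.
print(format_title('Conjecture', None, swap_numbers=True))  # Conjecture
```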
        

In [None]:
#| hide

# Theorem that is not numbered within anything
text = r"""\begin{thm}This is a theorem. \end{thm}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'thm': ('thm', None)}
numberwithins = {}
all_numberwithins = {}
display_names = {'thm': 'Theorem'}
counters = {'thm': 1}
swap_numbers = False
sample_title = _title(
    node, numbertheorem_counters, numberwithins, all_numberwithins,
    display_names, counters, swap_numbers)
test_eq(sample_title, 'Theorem 1.')

swap_numbers = True
sample_title = _title(
    node, numbertheorem_counters, numberwithins, all_numberwithins,
    display_names, counters, swap_numbers)
test_eq(sample_title, '1. Theorem.')

# Theorem that is counted by equation, which in turn is numbered within
# section
text = r"""\begin{thm}This is a theorem. \end{thm}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'thm': ('equation', None)}
numberwithins = {'equation': 'section'}
all_numberwithins = {'equation': ['section']}
display_names = {'thm': 'Theorem'}
counters = {'equation': 1, 'section': 2}
swap_numbers = False
sample_title = _title(
    node, numbertheorem_counters, numberwithins, all_numberwithins,
    display_names, counters, swap_numbers)
test_eq(sample_title, 'Theorem 2.1.')

swap_numbers = True
sample_title = _title(
    node, numbertheorem_counters, numberwithins, all_numberwithins,
    display_names, counters, swap_numbers)
test_eq(sample_title, '2.1. Theorem.')

# Section
text = r"""\section{This is a section}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'thm': ('equation', None)}
numberwithins = {'equation': 'section'}
all_numberwithins = {'equation': ['section']}
display_names = {'thm': 'Theorem'}
counters = {'equation': 1, 'section': 2}
swap_numbers = False
sample_title = _title(
    node, numbertheorem_counters, numberwithins, all_numberwithins,
    display_names, counters, swap_numbers)
test_eq(sample_title, '2. This is a section')

swap_numbers = True
sample_title = _title(
    node, numbertheorem_counters, numberwithins, all_numberwithins,
    display_names, counters, swap_numbers)
test_eq(sample_title, '2. This is a section')

# Subsection
text = r"""\subsection{This is a subsection}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'thm': ('equation', None)}
numberwithins = {'equation': 'section', 'subsection': 'section'}
all_numberwithins = {'equation': ['section'], 'subsection': ['section']}
display_names = {'thm': 'Theorem'}
counters = {'equation': 1, 'section': 2, 'subsection': 3}
swap_numbers = False
sample_title = _title(
    node, numbertheorem_counters, numberwithins, all_numberwithins,
    display_names, counters, swap_numbers)
test_eq(sample_title, '2.3. This is a subsection')

swap_numbers = True
sample_title = _title(
    node, numbertheorem_counters, numberwithins, all_numberwithins,
    display_names, counters, swap_numbers)
test_eq(sample_title, '2.3. This is a subsection')

# In the case that an environment is not a theorem-like environment.
text = r"""\begin{abstract} This is an abstract \end{abstract}"""
node = get_node_from_simple_text(text)
numbertheorem_counters = {'thm': ('equation', None)}
numberwithins = {'equation': 'section', 'subsection': 'section'}
all_numberwithins = {'equation': ['section'], 'subsection': ['section']}
display_names = {'thm': 'Theorem'}
counters = {'equation': 1, 'section': 2, 'subsection': 3}
swap_numbers = False
sample_title = _title(
    node, numbertheorem_counters, numberwithins, all_numberwithins,
    display_names, counters, swap_numbers)
test_eq(sample_title, 'abstract')

# # In the case a section has multilines
# text = r"""\section{Exceptional maximal subgroups of 
# \texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}}"""
# node = get_node_from_simple_text(text)
# numbertheorem_counters = {'thm': ('equation', None)}
# numberwithins = {'equation': 'section', 'subsection': 'section'}
# all_numberwithins = {'equation': ['section'], 'subsection': ['section']}
# display_names = {'thm': 'Theorem'}
# counters = {'equation': 1, 'section': 2, 'subsection': 3}
# swap_numbers = False
# sample_title = _title(
#     node, numbertheorem_counters, numberwithins, all_numberwithins,
#     display_names, counters, swap_numbers)
# sample_title
# test_eq(sample_title, '2.3. This is a subsection')

In [None]:
#| export
def swap_numbers_invoked(
        preamble: str
        ) -> bool:
    r"""Return `True` if `\swapnumbers` is in the preamble.

    Assume that a mention of `\swapnumbers` outside of a comment is an
    actual invocation.
    """
    preamble = remove_comments(preamble)
    return r'\swapnumbers' in preamble

In [None]:
assert swap_numbers_invoked(r'\swapnumbers')
assert not swap_numbers_invoked(r'''
\documentclass{article}
\usepackage{amsthm}
%\usepackage{amsmath}

\newtheorem{theorem}{Theorem} % \swapnumbers
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{definition}[theorem]{Definition}
\newtheorem*{remark*}{Remark}''')
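`swap_numbers_invoked` calls `remove_comments` (defined elsewhere in this module) first, so a commented-out `\swapnumbers`, as in the second test above, does not count. The stdlib sketch below illustrates the same idea. `strip_latex_comments` and `swap_numbers_invoked_sketch` are hypothetical stand-ins, and the sketch assumes that any `%` not preceded by a backslash starts a comment running to the end of the line; it does not handle edge cases such as `\\%` or a `%` inside `\verb|...|`.

```python
import re

# Hypothetical stand-in for `remove_comments`: drop everything from an
# unescaped `%` to the end of its line; `\%` (a literal percent sign) is kept.
_COMMENT_PATTERN = re.compile(r'(?<!\\)%.*$', flags=re.MULTILINE)


def strip_latex_comments(text: str) -> str:
    return _COMMENT_PATTERN.sub('', text)


def swap_numbers_invoked_sketch(preamble: str) -> bool:
    # Like `swap_numbers_invoked`, treat any uncommented mention as an invocation.
    return r'\swapnumbers' in strip_latex_comments(preamble)


print(swap_numbers_invoked_sketch(r'\swapnumbers'))                                 # True
print(swap_numbers_invoked_sketch(r'\newtheorem{theorem}{Theorem} % \swapnumbers'))  # False
print(swap_numbers_invoked_sketch(r'50\% of papers use \swapnumbers'))             # True
```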

In [None]:
#| export
def _node_warrants_own_part(
        node, environments_to_not_divide_along: list[str],
        accumulation: str, parts: list[tuple[str, str]]) -> bool:
    """Return `True` if `node` warrants making a new part to be added in `parts`.

    This is a helper function for `_process_node`. When this function returns
    `true`, the `accumulation` should be considered for appending to `parts`
    and the node should also be appended to `parts
    and 
    """
    if _is_section_node(node) or _is_subsection_node(node):
        return True
    elif not _is_environment_node(node):
        return False
    # Is environment node from here and below.
    if len(parts) == 0 and accumulation.strip() == '':
        return True
    # elif len(parts) == 0:
    #     return node.environmentname not in environments_to_not_divide_along
    # previous_node = get_node_from_simple_text(parts[-1][1])
    # if (accumulation.strip() == ''
    #         and (_is_section_node(previous_node)
    #              or _is_subsection_node(previous_node))):
    #     return True
    return node.environmentname not in environments_to_not_divide_along

In [None]:
#| hide

# These examples are based on `numbering_example_1_consecutive_numbering_scheme`
# in `nbs\_tests\latex_examples`.

# Test the case of accumulating text at the very beginning before any section
node = get_node_from_simple_text('\nFor this document, the `theorem` counter is not reset whenever a new section begins.\n\nA similar numbering scheme can be accomplished by importing ')
environments_to_not_divide_along = ['equation', 'equation*', 'proof', 'align', 'align*']
accumulation = ''
parts = []
assert not _node_warrants_own_part(node, environments_to_not_divide_along, accumulation, parts)

node = get_node_from_simple_text('\\verb|amsmath|')
environments_to_not_divide_along = ['equation', 'equation*', 'proof', 'align', 'align*']
accumulation = '\nFor this document, the `theorem` counter is not reset whenever a new section begins.\n\nA similar numbering scheme can be accomplished by importing '
parts = []
assert not _node_warrants_own_part(node, environments_to_not_divide_along, accumulation, parts)

node = get_node_from_simple_text(' and invoking the code ')
environments_to_not_divide_along = ['equation', 'equation*', 'proof', 'align', 'align*']
accumulation = '\nFor this document, the `theorem` counter is not reset whenever a new section begins.\n\nA similar numbering scheme can be accomplished by importing \\verb|amsmath|'
parts = []
assert not _node_warrants_own_part(node, environments_to_not_divide_along, accumulation, parts)

# Now a new section comes in, which warrants a new part.
node = get_node_from_simple_text('\\section{Introduction}')
environments_to_not_divide_along = ['equation', 'equation*', 'proof', 'align', 'align*']
accumulation = '\nFor this document, the `theorem` counter is not reset whenever a new section begins.\n\nA similar numbering scheme can be accomplished by importing \\verb|amsmath| and invoking the code \\verb|\\numberwithin{theorem}{part}| in the preamble.\n\n' 
parts = []
assert _node_warrants_own_part(node, environments_to_not_divide_along, accumulation, parts)

# Now a new theorem comes in, which also warrants a new part.
node = get_node_from_simple_text('\\begin{theorem}\nThis is Theorem 1.\n\\end{theorem}')
environments_to_not_divide_along = ['equation', 'equation*', 'proof', 'align', 'align*']
accumulation = '\n\n' 
parts = [['1', 'For this document, the `theorem` counter is not reset whenever a new section begins.\n\nA similar numbering scheme can be accomplished by importing \\verb|amsmath| and invoking the code \\verb|\\numberwithin{theorem}{part}| in the preamble.'], ['1. Introduction', '\\section{Introduction}']] 
assert _node_warrants_own_part(node, environments_to_not_divide_along, accumulation, parts)

# Test the case where text that does not belong to an environment occurs at the
# very beginning, even before any sections, and an environment node then makes
# an appearance.
# cf. divide_latex_example_2 in `nbs\_tests\latex_examples`.
node = get_node_from_simple_text(r"""\begin{abstract}
    This is an abstract
    \end{abstract}""")
environments_to_not_divide_along = ['equation', 'equation*', 'proof', 'align', 'align*']
accumulation = r'\maketitle\n'
parts = []
assert _node_warrants_own_part(node, environments_to_not_divide_along, accumulation, parts)

# Now test the same case except `abstract` is included in `environments_to_not_divide_along`
node = get_node_from_simple_text(r"""\begin{abstract}
    This is an abstract
    \end{abstract}""")
environments_to_not_divide_along = ['equation', 'equation*', 'proof', 'align', 'align*', 'abstract']
accumulation = r'\maketitle\n'
parts = []
assert not _node_warrants_own_part(node, environments_to_not_divide_along, accumulation, parts)

# Test the case at the beginning of a section with an enumerate node.
node = get_node_from_simple_text('\\begin{enumerate}\n  \\item Introduction 2\n\n  \\item Preliminaries $\\quad 7$\n\n\\end{enumerate}')
environments_to_not_divide_along = ['equation', 'equation*', 'proof', 'align', 'align*', 'enumerate', 'itemize']
accumulation = r'\n'
parts = [['1. CONTENTS', '\\section{CONTENTS}']]
assert not _node_warrants_own_part(node, environments_to_not_divide_along, accumulation, parts)

# Test the case where a section immediately follows another section
node = get_node_from_simple_text('\\section{Section 2}')
environments_to_not_divide_along = ['equation', 'equation*', 'proof', 'align', 'align*', 'enumerate', 'itemize']
accumulation = r'\n'
parts = [['1. Section 1', '\\section{Section 1}']]
assert _node_warrants_own_part(node, environments_to_not_divide_along, accumulation, parts)

In [None]:
#| export
def _node_is_proof_immediately_following_a_theorem_like_environment(
        node, accumulation, parts, display_names) -> bool:
    """Return `True` if `node` is that of a proof environment that immediately
    follows a theorem-like environment.
    
    This is a helper function for `_process_node`.
    """
    if not _is_environment_node(node):
        return False
    if not node.environmentname == 'proof':
        return False
    if not len(parts) > 0:
        return False
    if accumulation.strip() != '':
        return False
    previous_node = get_node_from_simple_text(parts[-1][1])
    if not _is_environment_node(previous_node):
        return False
    return previous_node.environmentname in display_names


In [None]:
#| hide

# Test basic case where node is proof node following a theorem node.
node = get_node_from_simple_text(r'\begin{proof} This is a proof \end{proof}')
accumulation = '\n\n'
parts = [['1. Section', '\\section{Section}'], ['Theorem 1.', '\\begin{thm} This is a theorem \\end{thm}']]
display_names = {'thm': 1}
assert _node_is_proof_immediately_following_a_theorem_like_environment(node, accumulation, parts, display_names)

# Test basic case where node is not a proof node
node = get_node_from_simple_text('\\begin{thm} This is a theorem \\end{thm}')
accumulation = '\n\n'
parts = [['1. Section', '\\section{Section}']]
display_names = {'thm': 0}
assert not _node_is_proof_immediately_following_a_theorem_like_environment(node, accumulation, parts, display_names)

# Test when node is proof node at the very beginning of a document.
node = get_node_from_simple_text(r'\begin{proof} This is a proof \end{proof}')
accumulation = '\n\n'
parts = []
display_names = {'thm': 0}
assert not _node_is_proof_immediately_following_a_theorem_like_environment(node, accumulation, parts, display_names)

# Test when node is proof node at the beginning of a section.
node = get_node_from_simple_text(r'\begin{proof} This is a proof \end{proof}')
accumulation = '\n\n'
parts = [['1. Section', '\\section{Section}']]
display_names = {'thm': 0}
assert not _node_is_proof_immediately_following_a_theorem_like_environment(node, accumulation, parts, display_names)

# Test when node is proof node following a remark.
node = get_node_from_simple_text(r'\begin{proof} This is a proof \end{proof}')
accumulation = '\n\n'
parts = [['1. Section', '\\section{Section}'], ['Theorem 1.', '\\begin{thm} This is a theorem \\end{thm}'], ['Remark', '\\begin{rem} This is an unnumbered remark \\end{rem}']]
display_names = {'thm': 1}
assert not _node_is_proof_immediately_following_a_theorem_like_environment(node, accumulation, parts, display_names)

# Test when node is proof node following some nonempty text
node = get_node_from_simple_text(r'\begin{proof} This is a proof \end{proof}')
accumulation = '\n\nSome things are being said before the proof but after the theorem.'
parts = [['1. Section', '\\section{Section}'], ['Theorem 1.', '\\begin{thm} This is a theorem \\end{thm}']]
display_names = {'thm': 1}
assert not _node_is_proof_immediately_following_a_theorem_like_environment(node, accumulation, parts, display_names)

In [None]:
#| export
DEFAULT_ENVIRONMENTS_TO_NOT_DIVIDE_ALONG = [
    'equation', 'equation*', 'proof', 'align', 'align*', 'enumerate', 'itemize', 'label',
    'eqnarray', 'quote', 'tabular', 'table']
def divide_latex_text(
        document: str, 
        # environments_to_divide_along: list[str], # A list of the names of environments that warrant a new note
        # numbered_environments: list[str], # A list of the names of environments which are numbered in the latex code. 
        environments_to_not_divide_along: list[str] = DEFAULT_ENVIRONMENTS_TO_NOT_DIVIDE_ALONG, # A list of the names of the environments along which not to make a new note, unless the environment starts the entire document.
        ) -> list[tuple[str, str]]: # Each tuple is of the form `(note_title, text)`, where `note_title` often encapsulates the note type (i.e. section/subsection/display text of a theorem-like environment) along with the numbering and `text` is the text of the part. Sometimes `note_title` is just a number, which means that `text` is not of a `\section` or `\subsection` command and not of a theorem-like environment.
    r"""Divide LaTeX text to convert into Obsidian.md notes.

    Assumes that the counters in the LaTeX document are either the
    predefined ones or specified by the `\newtheorem` command.

    This function does not divide out `\subsubsection`s.

    Proof environments are assigned to the same parts as their preceding
    theorem-like environments, if available.

    TODO: Implement counters specified by `\newcounter`, cf. 
    https://www.overleaf.com/learn/latex/Counters#LaTeX_commands_for_working_with_counters.
    """
    numbertheorem_counters = numbered_newtheorems_counters_in_preamble(document)
    explicit_numberwithins = numberwithins_in_preamble(document)
    numberwithins = _setup_numberwithins(explicit_numberwithins, numbertheorem_counters)
    all_numberwithins = _setup_all_numberwithins(explicit_numberwithins, numbertheorem_counters)
    # environments_to_counters = counters_for_environments(document)
    display_names = display_names_of_environments(document)
    counters = _setup_counters(numbertheorem_counters)
    unnumbered_environments = _unnumbered_environments(
        numbertheorem_counters, display_names)
    preamble, main_document = divide_preamble(document)
    document_node = find_document_node(main_document)
    swap_numbers = swap_numbers_invoked(preamble)
    # Eventually gets returned
    parts = []
    # "Accumulates" a "part" until text that should comprise a new part is encountered
    accumulation = '' 
    for node in document_node.nodelist:
        accumulation = _process_node(
            node, environments_to_not_divide_along, accumulation,
            numbertheorem_counters,
            numberwithins, all_numberwithins, counters,
            display_names, swap_numbers, parts)
    _append_non_environment_accumulation_to_parts_if_non_empty(
        accumulation, counters, parts)
    return parts


def _process_node(
        node, environments_to_not_divide_along, accumulation,
        numbertheorem_counters,
        numberwithins, all_numberwithins, counters,
        display_names, swap_numbers, parts) -> str:
    """
    Update `accumulation`, `counters`, and `parts` based on the contents of `node`.

    Also return the updated `accumulation`, since reassigning the string
    inside this function does not change the caller's variable.

    This is a helper function for `divide_latex_text`.

    """
    # If node is a proof immediately following a theorem-like environment
    # Then add it to said theorem-like environment
    if _node_is_proof_immediately_following_a_theorem_like_environment(
            node, accumulation, parts, display_names):
        parts[-1][1] += node.latex_verbatim()
        return accumulation

    _change_counters(node, counters, numbertheorem_counters, numberwithins)
    if _node_warrants_own_part(
            node, environments_to_not_divide_along, accumulation, parts):
        accumulation = _append_non_environment_accumulation_to_parts_if_non_empty(
            accumulation, counters, parts)
        
        title = _title(
            node, numbertheorem_counters, numberwithins, all_numberwithins,
            display_names, counters, swap_numbers).strip()
        title = title.replace('\n', '') 
        parts.append([title, node.latex_verbatim()])
    else:
        accumulation += node.latex_verbatim()
        # In `_change_counters`, the '' counter is incremented by default.
        # This offsets the incorrect incrementation.
    return accumulation


def _append_non_environment_accumulation_to_parts_if_non_empty(
        accumulation: str, counters, parts):
    """Append accumulation to `parts` if `accumulation` is nonempty
    and return the updated `accumulation` """
    if accumulation.strip() != '':
        counters[''] += 1
        parts.append([str(counters['']).strip(), accumulation.strip()])
        return ''
    else:
        return accumulation.strip()
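The loop in `divide_latex_text` and `_process_node` is an accumulate-and-flush scheme: ordinary nodes are concatenated into `accumulation`; a node that warrants its own part first flushes any non-blank accumulation as a numbered untitled part (the `''` counter) and is then appended with its own title; and a `proof` that directly follows a theorem-like part is merged into that part. The sketch below reproduces only this control flow, under simplifying assumptions: nodes are modelled by a hypothetical `Token` tuple instead of `pylatexenc` nodes, titles are supplied rather than computed from counters, and `theorem_like` stands in for the keys of `display_names`.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical stand-in for a `pylatexenc` node, already classified."""
    kind: str   # 'section', 'subsection', 'environment', or 'text'
    name: str   # environment name such as 'theorem' or 'proof'; '' otherwise
    title: str  # precomputed title (stands in for `_title`)
    text: str   # verbatim LaTeX (stands in for `node.latex_verbatim()`)


def divide_tokens(tokens, theorem_like, not_divide_along):
    parts = []          # list of [title, text], like `divide_latex_text`
    accumulation = ''
    untitled_count = 0  # plays the role of `counters['']`
    last_env = None     # environment name of the most recent part, if any

    def flush():
        # Turn non-blank accumulated text into a numbered untitled part.
        nonlocal accumulation, untitled_count, last_env
        if accumulation.strip():
            untitled_count += 1
            parts.append([str(untitled_count), accumulation.strip()])
            last_env = None
        accumulation = ''

    for tok in tokens:
        # A proof that directly follows a theorem-like part (with only
        # whitespace in between) is merged into that part.
        if (tok.kind == 'environment' and tok.name == 'proof' and parts
                and not accumulation.strip() and last_env in theorem_like):
            parts[-1][1] += tok.text
            continue
        # Same decision as `_node_warrants_own_part`.
        if tok.kind in ('section', 'subsection'):
            own_part = True
        elif tok.kind != 'environment':
            own_part = False
        elif not parts and not accumulation.strip():
            own_part = True
        else:
            own_part = tok.name not in not_divide_along
        if own_part:
            flush()
            parts.append([tok.title, tok.text])
            last_env = tok.name if tok.kind == 'environment' else None
        else:
            accumulation += tok.text
    flush()
    return parts


tokens = [
    Token('text', '', '', 'Some preliminary remarks. '),
    Token('section', '', '1. Introduction', r'\section{Introduction}'),
    Token('text', '', '', '\n\n'),
    Token('environment', 'theorem', 'Theorem 1.', r'\begin{theorem}T\end{theorem}'),
    Token('text', '', '', '\n'),
    Token('environment', 'proof', '', r'\begin{proof}P\end{proof}'),
    Token('text', '', '', '\nClosing words.'),
]
sketch_parts = divide_tokens(tokens, theorem_like={'theorem'},
                             not_divide_along={'proof', 'equation'})
for part in sketch_parts:
    print(part)
```

This yields an untitled part `'1'`, the section, the theorem with its proof appended, and a trailing untitled part `'2'`.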






In [None]:
# TODO: explain examples

In [None]:
file = _test_directory() / 'latex_examples' / 'numbering_example_6' / 'main.tex'
sample_latex_text = text_from_file(file)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(sample_latex_text)

In [None]:
file = _test_directory() / 'latex_examples' / 'divide_latex_example_proof_preceded_by_theorem' / 'main.tex'
sample_latex_text = text_from_file(file)
parts = divide_latex_text(sample_latex_text)
print(parts)
test_eq(len(parts), 2)

[['1. Some section', '\\section{Some section}'], ['Theorem 1.', '\\begin{theorem}\nThis is a theorem.\n\\end{theorem}\\begin{proof}\nThis is a proof\n\\end{proof}']]


In [None]:
# sample_latex_file = Path(r'C:\Users\hyunj\Documents\Math\latex_image_data\latex_full\ellenberg_venkatesh_westerland_hshsclhff\main.tex')
# sample_latex_text = text_from_file(sample_latex_file)
# preamble, document = divide_preamble(sample_latex_text)
# parts = divide_latex_text(sample_latex_text)

In [None]:
file = _test_directory() / 'latex_examples' / 'divide_latex_example_text_preceded_by_undivided_environment' / 'main.tex'
sample_latex_text = text_from_file(file)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(document)
print(parts)
test_eq(len(parts), 2)

[['1. CONTENTS', '\\section{CONTENTS}'], ['1', "\\begin{enumerate}\n  \\item Introduction 2\n\n  \\item Preliminaries $\\quad 7$\n\n\\end{enumerate}\n2.1. Categorical preliminaries $\\quad 7$\n\n2.2. On the motivic Spanier-Whitehead category and Milnor-Witt K-theory 8\n\n2.3. $\\mathbb{A}^{1}$-derived category and $\\mathbb{A}^{1}$-homology 9\n\n\\begin{enumerate}\n  \\setcounter{enumi}{3}\n  \\item $\\mathbb{A}^{1}$-Spanier-Whitehead category of cellular smooth schemes 11\n\\end{enumerate}\n3.1. Cellular schemes 12\n\n3.2. Cellular Spanier-Whitehead category 13\n\n\\begin{enumerate}\n  \\setcounter{enumi}{4}\n  \\item The cellular homology of Morel-Sawant on cellular Thom spaces 14\n\n  \\item Spanier-Whitehead cellular complex 18\n\n\\end{enumerate}\n5.1. Definitions and basic properties 18\n\n5.2. Endomorphisms, traces, and characteristic polynomials 19\n\n5.3. Cellular Grothendieck-Lefschetz Trace Formula 21\n\n\\begin{enumerate}\n  \\setcounter{enumi}{6}\n  \\item Rationality of t

In [None]:
file = _test_directory() / 'latex_examples' / 'divide_latex_example_2' / 'main.tex'
sample_latex_text = text_from_file(file)
preamble, document = divide_preamble(sample_latex_text)
parts = divide_latex_text(document)
print(parts)

[['1', '\\maketitle'], ['abstract', '\\begin{abstract}\nThis is an abstract\n\\end{abstract}']]


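The `divide_latex_text` function divides LaTeX text into a list of parts, each of the form `[note_title, text]`. The examples below show how sections, theorem-like environments, and the text between them are divided, and how the titles reflect each document's numbering scheme.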
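According to the return annotation of `divide_latex_text`, a title that is just a number marks text that is neither a `\section`/`\subsection` command nor a theorem-like environment. The sketch below uses that convention, plus the leading command of the text, to sort parts by kind, for instance before deciding which parts become standalone notes. `classify_part` is a hypothetical helper, not part of this module; the sample parts are copied from outputs shown in this notebook.

```python
def classify_part(part) -> str:
    """Hypothetical helper: label one `[note_title, text]` part from `divide_latex_text`."""
    title, text = part
    if title.isdigit():
        # Accumulated text that is neither a section command nor a
        # theorem-like environment.
        return 'untitled'
    if text.startswith(r'\section'):
        return 'section'      # includes unnumbered `\section*`
    if text.startswith(r'\subsection'):
        return 'subsection'
    return 'environment'      # e.g. a theorem-like environment or `abstract`


sample_parts = [
    ['1', r'\maketitle'],
    ['abstract', '\\begin{abstract}\nThis is an abstract\n\\end{abstract}'],
    ['1. Introduction', r'\section{Introduction}'],
    ['Theorem 1.', '\\begin{theorem}\nThis is Theorem 1.\n\\end{theorem}'],
]
print([classify_part(p) for p in sample_parts])
# ['untitled', 'environment', 'section', 'environment']
```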

In [None]:
file = _test_directory() / 'latex_examples' / 'numbering_example_1_consecutive_numbering_scheme' / 'main.tex'
text = text_from_file(file)
sample_output = divide_latex_text(text)
print(sample_output)
assert sample_output[0][0] == '1'
assert sample_output[1][0] == '1. Introduction'
assert sample_output[2][0] == 'Theorem 1.'
assert sample_output[3][0] == 'Corollary 2.'
assert sample_output[4][0] == 'Remark'


[['1', 'For this document, the `theorem` counter is not reset whenever a new section begins.\n\nA similar numbering scheme can be accomplished by importing \\verb|amsmath| and invoking the code \\verb|\\numberwithin{theorem}{part}| in the preamble.'], ['1. Introduction', '\\section{Introduction}'], ['Theorem 1.', '\\begin{theorem}\nThis is Theorem 1.\n\\end{theorem}'], ['Corollary 2.', '\\begin{corollary}\nThis is Corollary 2.\n\\end{corollary}'], ['Remark', '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'], ['Definition 3.', '\\begin{definition}\nThis is Definition 3.\n\\end{definition}'], ['2. Another Section', '\\section{Another Section}'], ['Theorem 4.', '\\begin{theorem}\nThis is Theorem 4.\n\\end{theorem}'], ['2', 'And we might get a corollary!'], ['Corollary 5.', '\\begin{corollary}\nThis is Corollary 5.\n\\end{corollary}'], ['Definition 6.', '\\begin{definition}\nThis is Definition 6.\n\\end{defi

In [None]:
file = _test_directory() / 'latex_examples' / 'numbering_example_2_numbering_scheme_reset_at_each_section' / 'main.tex'
text = text_from_file(file)
print(divide_latex_text(text))

[['1', 'This document resets its `theorem` counter whenever a new section begins.'], ['1. Introduction', '\\section{Introduction}'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1.\n\\end{theorem}'], ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.2.\n\\end{corollary}'], ['Remark', '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'], ['Definition 1.3.', '\\begin{definition}\nThis is Definition 1.3.\n\\end{definition}'], ['2. Another Section', '\\section{Another Section}'], ['Theorem 2.1.', '\\begin{theorem}\nThis is Theorem 2.1.\n\\end{theorem}'], ['Corollary 2.2.', '\\begin{corollary}\nThis is Corollary 2.2.\n\\end{corollary}'], ['Definition 2.3.', '\\begin{definition}\nThis is Definition 2.3.\n\\end{definition}']]
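The two examples above differ in their counter resets. In the first, the `theorem` counter runs through the whole document (Theorem 4 sits in section 2); in the second, it is reset at every `\section`, giving Theorem 1.1, Corollary 1.2, and then Theorem 2.1. The sketch below models this with a `within` mapping shaped like the `numberwithins` dictionaries used earlier (child counter to parent counter). It is an illustration under assumptions, not this module's implementation: `CounterModel` is hypothetical, resets are applied recursively to descendants, and a label joins the values of a counter and its ancestors with dots.

```python
class CounterModel:
    """Hypothetical model of LaTeX counters with numberwithin-style resets."""

    def __init__(self, within: dict[str, str]):
        self.within = within              # child counter -> parent counter
        self.values: dict[str, int] = {}

    def step(self, counter: str) -> None:
        self.values[counter] = self.values.get(counter, 0) + 1
        # Reset every counter numbered within `counter`.
        for child, parent in self.within.items():
            if parent == counter:
                self._reset(child)

    def _reset(self, counter: str) -> None:
        # Zero `counter` and, recursively, the counters numbered within it.
        self.values[counter] = 0
        for child, parent in self.within.items():
            if parent == counter:
                self._reset(child)

    def label(self, counter: str) -> str:
        # Join the values of `counter` and its ancestors, outermost first.
        chain = [counter]
        while chain[-1] in self.within:
            chain.append(self.within[chain[-1]])
        return '.'.join(str(self.values.get(c, 0)) for c in reversed(chain))


model = CounterModel({'theorem': 'section'})
labels = []
for command in ['section', 'theorem', 'theorem', 'section', 'theorem']:
    model.step(command)
    if command == 'theorem':
        labels.append(model.label('theorem'))
print(labels)  # ['1.1', '1.2', '2.1']
```

With an empty mapping, as in the consecutive scheme of the first example, `label('theorem')` is simply the running count.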


In [None]:
file = _test_directory() / 'latex_examples' / 'numbering_example_3_theorem_like_environments_share_counter_with_equation_and_reset_at_each_section' / 'main.tex'
text = text_from_file(file)
print(divide_latex_text(text))

[['1. Introduction', '\\section{Introduction}'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1. This is because the \\verb|\\numberwithin{equation}{section}| makes the section number included in the equation counter and because the \\\\\n\\verb|\\newtheorem{theorem}[equation]{Theorem}| command makes the environment \\verb|theorem| be counted by the equation counter.\n\\end{theorem}'], ['1', 'The following makes an equation labeled 1.2; \n\\begin{equation}\n5 + 7 = 12\n\\end{equation}'], ['Theorem', '\\begin{theorem*}\nThis Theorem is unnumbered\n\\end{theorem*}'], ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.3.\n\\end{corollary}'], ['2. Another section', '\\section{Another section}'], ['Theorem 2.1.', '\\begin{theorem}\nThis is theorem 2.1\n\\end{theorem}'], ['2', 'The following is labeled 2.2:\n\\begin{equation}\n3+5 = 8.\n\\end{equation}']]


In [None]:
file = _test_directory() / 'latex_examples' / 'numbering_example_4_unnumbered_section' / 'main.tex'
text = text_from_file(file)
print(divide_latex_text(text))

[['1. This is section 1', '\\section{This is section 1}'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1.\n\\end{theorem}'], ['1.1. This is a subsection 1.1', '\\subsection{This is a subsection 1.1}'], ['1', 'The following makes an equation labeled 1; \n\\begin{equation}\n5 + 7 = 12\n\\end{equation}'], ['Theorem', '\\begin{theorem*}\nThis Theorem is unnumbered\n\\end{theorem*}'], ['1.2. This is subsection 1.2', '\\subsection{This is subsection 1.2}'], ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.2.\n\\end{corollary}'], ['Unnumbered section', '\\section*{Unnumbered section}'], ['1.1. This is subsection 1.3', '\\subsection{This is subsection 1.3}'], ['2', '\\subsubsection{This is subsubsection 1.3.1}'], ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.3.\n\\end{theorem}'], ['2. Another section', '\\section{Another section}'], ['2.1. This is subsection 2.1', '\\subsection{This is subsection 2.1}'], ['Theorem 2.1.', '\\begin{theorem}\nThis is Theorem 2.1\n\\en

In [None]:
file = _test_directory() / 'latex_examples' / 'numbering_example_5_subsections_and_theorem_like_environments_share_counter' / 'main.tex'
text = text_from_file(file)
sample_output = divide_latex_text(text)
print(sample_output)
test_eq(sample_output[4][0], '1. Remark.')
test_eq(sample_output[5][0], 'Remark')


[['1. This is section 1', '\\section{This is section 1}'], ['1.1. Theorem.', '\\begin{thm}\nThis is 1.1. Theorem. Note that the \\verb|\\swapnumbers| command is invoked in the preamble.\n\\end{thm}'], ['1.2. This is 1.2. subsection.', '\\subsection{This is 1.2. subsection.}'], ['1', 'Note that the equation counter is numbered within the subsection counter and that the theorem-like environments are numbered with the equation counter.\n\n\\subsubsection{This is 1.2.1. Subsubsection}'], ['1. Remark.', '\\begin{remark}\nThis is an 1. Remark. Note that \\verb|\\remark| has a counter separate from those of many of the other theorem-like environments.\n\\end{remark}'], ['Remark', '\\begin{rem*}\nThis is an unnumbered Remark.\n\\end{rem*}'], ['1.3. Proposition.', '\\begin{prop}\nThis is 1.3. Proposition.\n\\end{prop}'], ['Unnumbered section', '\\section*{Unnumbered section}'], ['1.1. Theorem.', '\\begin{thm}\nThis is 1.4. Theorem.\n\\end{thm}'], ['2. This is Section 2', '\\section{This is Sect

Note that part titles are stripped and are single-lined:
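`_process_node` normalizes each title with `.strip()` followed by `.replace('\n', '')`. The sketch below applies the same two steps to made-up titles; `normalize_title` is a hypothetical name. Because newlines are deleted rather than replaced by spaces, words stay separated only when the line break is preceded by a space, as the second call shows.

```python
def normalize_title(title: str) -> str:
    # Same two steps as `_process_node`: strip outer whitespace, then drop newlines.
    return title.strip().replace('\n', '')


print(repr(normalize_title('  3. A section title that \nspans two lines \n')))
# '3. A section title that spans two lines'
print(repr(normalize_title('3. No space before the\nbreak')))
# '3. No space before thebreak'
```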

In [None]:
# TODO: fill in the following example
# part = parts[...]
# assert part[0].strip() == part[0]

In [None]:
# TODO: example with a multilined section title forced to single-lined:
# e.g. `\section{Exceptional maximal subgroups of 
# \texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}}`


In [None]:
# TODO: Find a list of environment names commonly used.

In [None]:
# TODO: examples with different numbering convention and different numbered environments

In [None]:
# TODO: make numbering_convention work correctly.
# Here are some latex files with different conventions:
# - All subsections in a section share numbering, 
#   - achter_pries_imht https://arxiv.org/abs/math/0608038: e.g. Lemmas 2.1, 2.2, 2.3 are in subsection 2.2 and Lemma 2.4 and Remark 2.5 are in subsection 2.4.
#   - pauli_wickelgren https://arxiv.org/abs/2010.09374: e.g. Examples 3.5 and 3.11 are in subsubsection 3.3.2, Exercise 4.1 and Remark 4.2 are in subsection 4.1, Theorem 4.3 is in subsection 4.2, and Theorem 4.4 is in subsection 4.3.
# - Different environment types have different counts and the counts do not show the section number.
#   - vankataramana_imbrd https://arxiv.org/abs/1205.6543: 
#       - e.g. section 1 has Theorem 1, Remark 1, Remark 2, Remark 3, subsection 1.1.3 has Remark 4, Subsection 2.2 has Definition 1

## Identify sections and subsections to make folders for a reference.

In [None]:
#| export
def _part_starts_section(
        part: tuple[str, str]):
    """
    Return `True` if `part` starts a section (explicitly),
    cf. `divide_latex_text`.
    """
    return part[1].startswith(r'\section')
    # node = get_node_from_simple_text(part[1])
    # return _is_section_node(node)


def _part_starts_subsection(
        part: tuple[str, str]):
    """Return `True` if `part` starts a subsection, cf. `divide_latex_text`."""
    return part[1].startswith(r'\subsection')
    # node = get_node_from_simple_text(part[1])
    # return _is_subsection_node(node)

In [None]:
#| hide
part = ['1. This is section 1', '\\section{This is section 1}']
assert _part_starts_section(part)
assert not _part_starts_subsection(part)
part = ['1.2. This is 1.2 subsection.', '\\subsection{This is 1.2 subsection.}']
assert not _part_starts_section(part)
assert _part_starts_subsection(part)
part = ['1', 'Note that the equation counter is numbered within the subsection counter and that the theorem-like environments are numbered with the equation counter.\n\n\\subsubsection{This is 1.2.1 Subsubsection}']
assert not _part_starts_section(part)
assert not _part_starts_subsection(part)
part = ['1. Remark.', '\\begin{remark}\nThis is an unnumbered remark.\n\\end{remark}']
assert not _part_starts_section(part)
assert not _part_starts_subsection(part)
part = ['Remark', '\\begin{rem*}\nThis is an unnumbered Remark.\n\\end{rem*}']
assert not _part_starts_section(part)
assert not _part_starts_subsection(part)
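Because `_part_starts_section` is a plain prefix test, it also accepts `\section*{...}`, which is why unnumbered sections get their own lists in `section_and_subsection_titles_from_latex_parts`. A prefix test would, however, also accept any other macro whose name begins with `section`, such as `\sectionmark`. If that ever matters, a stricter check could require `*`, `[`, or `{` right after the command name. The regexes and function names below are a hypothetical alternative, not what this module uses.

```python
import re

# Hypothetical stricter variant: the command name, an optional `*`,
# then an optional argument `[` or a mandatory argument `{`.
_SECTION_START = re.compile(r'\\section\*?\s*[\[{]')
_SUBSECTION_START = re.compile(r'\\subsection\*?\s*[\[{]')


def part_starts_section_strict(part) -> bool:
    return _SECTION_START.match(part[1]) is not None


def part_starts_subsection_strict(part) -> bool:
    return _SUBSECTION_START.match(part[1]) is not None


for text in [r'\section{Intro}', r'\section*{Unnumbered}', r'\section[Short]{Long}',
             r'\sectionmark{Intro}', r'\subsection{Details}']:
    print(text, part_starts_section_strict(('', text)))
```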

In [None]:
#| export
UNTITLED_SECTION_TITLE = 'Untitled Section'
def section_and_subsection_titles_from_latex_parts(
        parts: list[tuple[str, str]], # An output of `divide_latex_text`
        # verbose_sections: bool = False, # 
        # short_subsections: bool = False,
        # section_name: str = 'section',
        # subsection_name: str = 'subsection')\
        ) -> list[list[str]]: # Each list corresponds to a section. The first entry of the list is the title of the section and the other entries are the titles of the subsections. 
    """
    Return a list of lists of titles for the sections and subsections in `parts`.

    Unnumbered sections get their own list. Unnumbered subsections are also included in lists.
    All the titles are stripped (of leading and trailing whitespace).
    """
    sections_and_subsections = []
    for part in parts:
        _consider_part_to_add(part, sections_and_subsections)
    return sections_and_subsections


def _consider_part_to_add(
        part: tuple[str, str],
        sections_and_subsections: list[list[str]]):
    """Add the title of `part` to `sections_and_subsections`
    if `part` starts a section or subsection."""
    title = part[0].strip()
    if _part_starts_section(part):
        sections_and_subsections.append([title])
        return
    if not sections_and_subsections:
        # If `sections_and_subsections` is empty and the very first `part`
        # does not explicitly start a section, then we are in an untitled
        # section. This also covers a subsection that precedes every section.
        sections_and_subsections.append([UNTITLED_SECTION_TITLE])
    if _part_starts_subsection(part):
        sections_and_subsections[-1].append(title)


In the following example, the environments are numbered Theorem 1, Corollary 2, Definition 3, etc.
Also note that there is some content before the very first (explicitly defined) section, so there is a section given by the `UNTITLED_SECTION_TITLE` constant.

In [None]:
parts = [
    ['1', 'For this document, the `theorem` counter is not reset whenever a new section begins.\n\nA similar numbering scheme can be accomplished by importing \\verb|amsmath| and invoking the code \\verb|\\numberwithin{theorem}{part}| in the preamble.'],
    ['1. Introduction', '\\section{Introduction}'],
    ['Theorem 1.', '\\begin{theorem}\nThis is Theorem 1.\n\\end{theorem}'],
    ['Corollary 2.', '\\begin{corollary}\nThis is Corollary 2.\n\\end{corollary}'],
    ['Remark', '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'],
    ['Definition 3.', '\\begin{definition}\nThis is Definition 3.\n\\end{definition}'],
    ['2. Another Section', '\\section{Another Section}'],
    ['Theorem 4.', '\\begin{theorem}\nThis is Theorem 4.\n\\end{theorem}'], ['2', 'And we might get a corollary!'],
    ['Corollary 5.', '\\begin{corollary}\nThis is Corollary 5.\n\\end{corollary}'],
    ['Definition 6.', '\\begin{definition}\nThis is Definition 6.\n\\end{definition}']]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [[UNTITLED_SECTION_TITLE], ['1. Introduction'], ['2. Another Section']])

In contrast, the following example has environments numbered by sections:

In [None]:
parts = [
    ['1', 'This document resets its `theorem` counter whenever a new section begins.'], 
    ['1. Introduction', '\\section{Introduction}'],
    ['Theorem 1.1.', '\\begin{theorem}\nThis is Theorem 1.1.\n\\end{theorem}'],
    ['Corollary 1.2.', '\\begin{corollary}\nThis is Corollary 1.2.\n\\end{corollary}'],
    ['Remark', '\\begin{remark*}\nThis is a remark. It is unnumbered and it does not affect the numberings of other environments.\n\\end{remark*}'],
    ['Definition 1.3.', '\\begin{definition}\nThis is Definition 1.3.\n\\end{definition}'],
    ['2. Another Section', '\\section{Another Section}'],
    ['Theorem 2.1.', '\\begin{theorem}\nThis is Theorem 2.1.\n\\end{theorem}'],
    ['Corollary 2.2.', '\\begin{corollary}\nThis is Corollary 2.2.\n\\end{corollary}'],
    ['Definition 2.3.', '\\begin{definition}\nThis is Definition 2.3.\n\\end{definition}']]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [[UNTITLED_SECTION_TITLE], ['1. Introduction'], ['2. Another Section']])

The example below is derived from a LaTeX document in which significant content is present before any explicitly defined section; see the `nbs\_tests\latex_examples\latex_example_with_content_before_sections` folder. Also see https://arxiv.org/abs/1111.3607 for an example of a paper with significant content prior to any explicitly defined sections.

In [None]:
parts = [
    ['abstract', "\\begin{abstract}\nI'm an abstract\n\\end{abstract}"],
    ['1', '\\maketitle\n\nI want to talk about things but notice that this part does not belong to a section!'],
    ['Theorem 1.', "\\begin{theorem}\\label{th:some_theorem}\nI'm a theorem.\n\\end{theorem}"],
    ['2', 'Blah blah blah'],
    ['Theorem 2.', '\\begin{theorem}\\label{th:some_other_theorem}\nImpart me with mathematical knowledge!\n\\end{theorem}'],
    ['3', 'Maybe a corollary'],
    ['Corollary 3.', '\\begin{corollary}\\label{cor:a_corollary}\nI immediately follow from the above theorem.\n\\end{corollary}'],
    ['4', 'More stuff!'],
    ['Corollary 4.', '\\begin{corollary}\\label{cor:another_corollary}\nMore delicious mathematical knowledge.\n\\end{corollary}'],
    ['5', 'Maybe you could describe how we demonstrate this corollary.'],
    ['1. Proof of Theorem~\\ref{th:main}', '\\section{Proof of Theorem~\\ref{th:main}}'],
    ['6', 'Now this is finally in a section.'],
    ['Lemma 5.', '\\begin{lemma}\nSome lemma\n\\end{lemma}\\begin{proof}\nMaximum effort!\n\\end{proof}'],
    ['7', 'Blah blah blah.'],
    ['1.1. This is a subsection', '\\subsection{This is a subsection}'],
    ['8', "I'm about one thing."],
    ['1.2. This is another subsection', '\\subsection{This is another subsection}'],
    ['9', "I'm about another thing."]] 
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [[UNTITLED_SECTION_TITLE], ['1. Proof of Theorem~\\ref{th:main}', '1.1. This is a subsection', '1.2. This is another subsection']])

The below example is derived from a LaTeX document with a `\numberwithin{equation}{subsection}` in which the theorem-like environments are numbered with the `equation` counter. In particular, theorem-like environments and subsections are counted together.

Also note that the example below starts with an explicitly defined section, so the output contains no section titled with the `UNTITLED_SECTION_TITLE` constant.

In [None]:
parts = [
    ['1. This is section 1', '\\section{This is section 1}'],
    ['1.1. Theorem.', '\\begin{thm}\nThis is 1.1. Theorem. Note that the \\verb|\\swapnumbers| command is invoked in the preamble.\n\\end{thm}'],
    ['1.2. This is 1.2. subsection.', '\\subsection{This is 1.2. subsection.}'],
    ['1', 'Note that the equation counter is numbered within the subsection counter and that the theorem-like environments are numbered with the equation counter.\n\n\\subsubsection{This is 1.2.1. Subsubsection}'],
    ['1. Remark.', '\\begin{remark}\nThis is an 1. Remark. Note that \\verb|\\remark| has a counter separate from those of many of the other theorem-like environments.\n\\end{remark}'],
    ['Remark', '\\begin{rem*}\nThis is an unnumbered Remark.\n\\end{rem*}'],
    ['1.3. Proposition.', '\\begin{prop}\nThis is 1.3. Proposition.\n\\end{prop}'],
    ['Unnumbered section', '\\section*{Unnumbered section}'],
    ['1.1. Theorem.', '\\begin{thm}\nThis is 1.4. Theorem.\n\\end{thm}'],
    ['2. This is Section 2', '\\section{This is Section 2}'],
    ['2.1. Theorem.', '\\begin{thm}\nThis is 2.1. Theorem\n\\end{thm}']]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [['1. This is section 1', '1.2. This is 1.2. subsection.'], ['Unnumbered section'], ['2. This is Section 2']])


Section and subsection titles are stripped of leading and trailing whitespace.

In [None]:
# The below example makes sure that titles are stripped
parts = [
    ['   1. Section with an unnumbered subsection   ', '\\section{Section with an unnumbered subsection}'],
    ['1', 'This is a section with an unnumbered subsection'],
    ['1.1. ', '\\subsection{}']
]
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output, [['1. Section with an unnumbered subsection', '1.1.']])
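The tests above pin down how `section_and_subsection_titles_from_latex_parts` groups titles. Below is a minimal standalone sketch of that grouping, not the module's implementation: it assumes a part starts a section when its text begins with `\section` or `\section*` and a subsection when it begins with `\subsection`, and it uses a hypothetical `UNTITLED` placeholder in place of the module's `UNTITLED_SECTION_TITLE` constant.

```python
import re

# Hypothetical stand-in for the module's UNTITLED_SECTION_TITLE constant.
UNTITLED = "<untitled section>"

# A part starts a (sub)section if its text begins with the command;
# `\*?` also accepts unnumbered variants such as \section*{...}.
SECTION_RE = re.compile(r"\s*\\section\*?\s*\{")
SUBSECTION_RE = re.compile(r"\s*\\subsection\*?\s*\{")


def group_section_titles(parts: list[list[str]]) -> list[list[str]]:
    """Group [title, text] parts into lists of the form
    [section_title, subsection_title, ...], stripping each title."""
    chapters: list[list[str]] = []
    for title, text in parts:
        if SECTION_RE.match(text):
            chapters.append([title.strip()])
            continue
        if not chapters:
            # Content (or a subsection) before the first section goes
            # into an untitled section.
            chapters.append([UNTITLED])
        if SUBSECTION_RE.match(text):
            chapters[-1].append(title.strip())
    return chapters


parts = [
    ['abstract', '\\begin{abstract}\nI am an abstract\n\\end{abstract}'],
    ['  1. Intro  ', '\\section{Intro}'],
    ['1.1. Setup', '\\subsection{Setup}'],
    ['Theorem 1.', '\\begin{theorem}\nA theorem.\n\\end{theorem}'],
    ['Appendix', '\\section*{Appendix}'],
]
print(group_section_titles(parts))
# [['<untitled section>'], ['1. Intro', '1.1. Setup'], ['Appendix']]
```

Because the sketch only inspects the start of each part, a `\subsubsection` appearing later inside a part is ignored, consistent with the `numberwithin` test above.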



In [None]:
file = _test_directory() / 'latex_examples' / 'latex_example_with_plenty_of_sections_and_subsections' / 'main.tex'
text = text_from_file(file)
parts = divide_latex_text(text) 
print(parts)
sample_output = section_and_subsection_titles_from_latex_parts(parts)
test_eq(sample_output,
        [['1. This is section 1', '1.1. This is section 1.1', '1.2. This is section 1.2'],
         ['2. This is section 2'],
         ['3. This is section 3', '3.1. This is section 3.1', '3.2. This is section 3.2', '3.3. This is section 3.3', '3.4. This is section 3.4']])

[['1. This is section 1', '\\section{This is section 1}'], ['1.1. This is section 1.1', '\\subsection{This is section 1.1}'], ['1.2. This is section 1.2', '\\subsection{This is section 1.2}'], ['2. This is section 2', '\\section{This is section 2}'], ['3. This is section 3', '\\section{This is section 3}'], ['3.1. This is section 3.1', '\\subsection{This is section 3.1}'], ['3.2. This is section 3.2', '\\subsection{This is section 3.2}'], ['3.3. This is section 3.3', '\\subsection{This is section 3.3}'], ['3.4. This is section 3.4', '\\subsection{This is section 3.4}']]


## Formatting modifications

### Identify macros and commands to replace

Authors usually define many custom commands and macros in their LaTeX files. These customizations vary from author to author, and Obsidian does not recognize most of them.

See `nbs/_tests/latex_examples/commands_example/main.tex` for some examples of custom commands.

In [None]:
#| export
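def _argument_detection(group_num: int) -> str:
    """
    Helper function for `regex_pattern_detecting_command` and `_commands_from_def`.

    Return a pattern (for the `regex` module, which supports recursion) that
    matches a balanced pair of curly braces and captures their contents. The
    pattern contains one capturing group, which must be group number
    `group_num` in the final pattern because the recursive call
    `(?group_num)` refers to it.
    """
    return r"\{((?>[^{}]+|\{(?1)\})*)\}".replace("1", str(group_num))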
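The recursive pattern relies on the third-party `regex` module; Python's built-in `re` has no `(?1)` recursion. To illustrate what the pattern captures, here is a minimal standard-library sketch (a hand-written scanner, not the module's approach; the name `balanced_brace_contents` is hypothetical) that returns the contents of the balanced brace group starting at a given index. Like the pattern, it treats every `{` and `}` as a brace, including escaped `\{` and `\}`.

```python
from typing import Optional


def balanced_brace_contents(text: str, start: int = 0) -> Optional[tuple[str, int]]:
    """Return (contents, end) for the brace group opening at text[start].

    `end` is the index just past the matching closing brace. Returns None if
    text[start] is not '{' or the braces never balance.
    """
    if start >= len(text) or text[start] != "{":
        return None
    depth = 0
    for index in range(start, len(text)):
        if text[index] == "{":
            depth += 1
        elif text[index] == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:index], index + 1
    return None


text = r"\mat{{123}}{asdfasdf{}{}}"
print(balanced_brace_contents(text, 4))   # ('{123}', 11)
print(balanced_brace_contents(text, 11))  # ('asdfasdf{}{}', 25)
```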

In [None]:
#| export
def custom_commands(
        preamble: str, # The preamble of a LaTeX document.
        ) -> list[tuple[str, int, Union[str, None], str]]: # Each tuple consists of 1. the name of the custom command 2. the number of parameters 3. The default argument if specified or `None` otherwise, and 4. the display text of the command.
    r"""
    Return a list describing the commands (and math operators) defined in
    `preamble`: the name, number of parameters, default argument, and display
    text of each command.

    Assumes that each `\newcommand` has at most one parameter with a default
    value (a `\newcommand` with multiple default parameters is not valid LaTeX).

    Ignores all commented-out command definitions.
    """
    preamble = remove_comments(preamble)
    latex_commands = _commands_from_newcommand_and_declaremathoperator(preamble)
    tex_commands = _commands_from_def(preamble)
    return latex_commands + tex_commands


def _commands_from_newcommand_and_declaremathoperator(
        preamble: str, # The preamble of a LaTeX document
        ) -> list[tuple[str, int, Union[str, None], str]]: # Each tuple consists of 1. the name of the custom command 2. the number of parameters 3. The default argument if specified or `None` otherwise, and 4. the display text of the command.
    r"""
    Get custom commands from invocations of `\newcommand`, `\renewcommand`,
    and `\DeclareMathOperator` in the preamble.

    Helper function to `custom_commands`.
    """
    # newcommand_regex = regex.compile(
    #     r'(?<!%)\s*\\(?:(?:re)?newcommand|DeclareMathOperator)\s*\{\\\s*(\w+)\s*\}\s*(?:\[(\d+)\]\s*(?:\[(\w+)\])?)?\s*\{((?>[^{}]+|\{(?4)\})*)\}', re.MULTILINE)
    newcommand_regex = regex.compile(
        r'(?<!%)\s*\\(?:(?:re)?newcommand|DeclareMathOperator)\s*(?:\{\\\s*(\w+)\s*\}|\\\s*(\w+))\s*(?:\[(\d+)\]\s*(?:\[(\w+)\])?)?\s*\{((?>[^{}]+|\{(?5)\})*)\}', re.MULTILINE)

    commands = []
    for match in newcommand_regex.finditer(preamble):
        name_in_braces = match.group(1) # e.g. \newcommand{\A}
        name_without_braces = match.group(2) # e.g. \newcommand\A
        num_args = match.group(3)
        optional_default_arg = match.group(4)
        definition = match.group(5)

        if name_in_braces is not None:
            name = name_in_braces
        else:
            name = name_without_braces

        # Convert the number of arguments to an integer, if it was specified
        if num_args is not None:
            num_args = int(num_args)
        else:
            num_args = 0

        commands.append((name, num_args, optional_default_arg, definition))
    return commands


def _commands_from_def(
        preamble: str
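        ) -> list[tuple[str, int, Union[str, None], str]]: # Each tuple consists of 1. the name of the custom command 2. the number of parameters 3. The default argument if specified or `None` otherwise, and 4. the display text of the command.
    r"""
    Get custom commands defined with `\def` in the preamble.

    Only parameterless definitions of the form `\def\name{...}` are detected;
    each is reported with `0` parameters and no default argument.

    Helper function to `custom_commands`.
    """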
    def_command_identifying = r'(?<!%)\s*\\def\s*'
    command_name_identifying = r'\\\s*(\w+)\s*'
    command_def = _argument_detection(2)
    def_regex = regex.compile(
        f"{def_command_identifying}{command_name_identifying}{command_def}"
    )
    return [(match.group(1), 0, None, match.group(2))
            for match in def_regex.finditer(preamble)]
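`custom_commands` calls `remove_comments` (defined elsewhere in this module) before searching the preamble, which is why commented-out definitions are ignored. Below is a minimal standard-library sketch of comment stripping, assuming a comment runs from an unescaped `%` to the end of its line; the name `strip_latex_comments` is hypothetical, this is not necessarily how `remove_comments` works, and it mishandles edge cases such as `\\%` (a line break followed by a comment).

```python
import re

# A `%` not preceded by a backslash starts a comment that runs to the end
# of the line; `.` does not match newlines, so line breaks are kept.
COMMENT_RE = re.compile(r"(?<!\\)%.*")


def strip_latex_comments(text: str) -> str:
    """Remove LaTeX comments from `text`, keeping escaped percent signs."""
    return COMMENT_RE.sub("", text)


preamble = "% \\newcommand{\\con}{\\mathcal{C}}\n\\newcommand{\\pct}{50\\%} % note"
print(strip_latex_comments(preamble))
```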


In [None]:
#| hide
text = r"\def\calh{{\mathcal H}}"
test_eq(_commands_from_def(text), [('calh', 0, None, '{\\mathcal H}')])

In [None]:
# Basic
text_1 = r'\newcommand{\con}{\mathcal{C}}'
test_eq(custom_commands(text_1), [('con', 0, None, r'\mathcal{C}')])

# With a parameter
text_2 = r'\newcommand{\field}[1]{\mathbb{#1}}'
test_eq(custom_commands(text_2), [('field', 1, None, r'\mathbb{#1}')]) 

# With multiple parameters, the first of which has a default value of `2`
text_3 = r'\newcommand{\plusbinomial}[3][2]{(#2 + #3)^#1}'
test_eq(custom_commands(text_3), [('plusbinomial', 3, '2', r'(#2 + #3)^#1')])

# The display text has backslashes `\` and curly braces `{}`
text_4 = r'\newcommand{\beq}{\begin{displaymath}}'
test_eq(custom_commands(text_4), [('beq', 0, None, '\\begin{displaymath}')])


# Basic with spaces in the newcommand declaration
text_6 = r'\newcommand {\con}  {\mathcal{C}}'
test_eq(custom_commands(text_6), [('con', 0, None, r'\mathcal{C}')])

# With a parameter and spaces in the newcommand declaration
text_7 = r'\newcommand   {\field}   [1] {\mathbb{#1}}'
test_eq(custom_commands(text_7), [('field', 1, None, r'\mathbb{#1}')])

# With multiple parameters, a default value, and spaces in the newcommand declaration
text_8 = r'\newcommand {\plusbinomial} [3] [2] {(#2 + #3)^#1}'
test_eq(custom_commands(text_8), [('plusbinomial', 3, '2', r'(#2 + #3)^#1')]) 

# With a comment `%`; commented-out command declarations should not be detected.
text_9 = r'% \newcommand{\con}{\mathcal{C}}'
test_eq(custom_commands(text_9), [])


# Spanning multiple lines
text_10 = r'''\newcommand{\mat}[4]{\left[\begin{array}{cc}#1 & #2 \\
                                         #3 & #4\end{array}\right]}'''
test_eq(
    custom_commands(text_10),
    [('mat', 4, None,
             '\\left[\\begin{array}{cc}#1 & #2 \\\\\n                                         #3 & #4\\end{array}\\right]')])

# Math operator
text_11 = r'\DeclareMathOperator{\Hom}{Hom}'
test_eq(custom_commands(text_11), [('Hom', 0, None, 'Hom')])

text_12 = r'\DeclareMathOperator{\tConf}{\widetilde{Conf}}'
test_eq(custom_commands(text_12), [('tConf', 0, None, r'\widetilde{Conf}')])

# `\def` commands
# \def is a bit complicated because arguments can either be provided with []
# or can be provided with {}.
text_13 = r'\def\A{{\cO_{K}}}'
test_eq(custom_commands(text_13), [('A', 0, None, r'{\cO_{K}}')])

# newcommand and renewcommand don't require {} for the
# command name, cf. https://arxiv.org/abs/1703.05365
text_14 = r'\newcommand\A{{\mathbb A}}'
test_eq(custom_commands(text_14), [('A', 0, None, r'{\mathbb A}')])

# A test for https://arxiv.org/abs/0902.4637
text_15 = r'\newcommand{\til}[1]{{\widetilde{#1}}}'
test_eq(custom_commands(text_15), [('til', 1, None, '{\\widetilde{#1}}')])
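The declarations tested above differ mainly in their header: the name may or may not be wrapped in braces, and the parameter count and default argument are optional. Here is a simplified standard-library sketch of parsing just that header. The function `parse_command_header` is hypothetical, its regular expression is not the module's, and it leaves extracting the definition body (which needs balanced-brace matching) to the caller.

```python
import re
from typing import Optional

# Simplified header of \newcommand, \renewcommand or \DeclareMathOperator,
# stopping just before the opening brace of the definition body.
HEADER_RE = re.compile(
    r"\\(?:(?:re)?newcommand|DeclareMathOperator)\s*"
    r"(?:\{\s*\\(\w+)\s*\}|\\(\w+))\s*"    # {\name} or \name
    r"(?:\[(\d+)\]\s*(?:\[(\w+)\])?)?\s*"  # optional [num_args][default]
    r"(?=\{)"                              # the definition body follows
)


def parse_command_header(
        declaration: str) -> Optional[tuple[str, int, Optional[str], int]]:
    """Return (name, num_args, default, body_start), or None if no header.

    `body_start` is the index of the '{' that opens the definition body.
    """
    match = HEADER_RE.search(declaration)
    if match is None:
        return None
    name = match.group(1) or match.group(2)
    num_args = int(match.group(3)) if match.group(3) else 0
    return name, num_args, match.group(4), match.end()


print(parse_command_header(r"\newcommand\A{{\mathbb A}}"))  # ('A', 0, None, 13)
```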


In [None]:
#| export
def regex_pattern_detecting_command(
        command_tuple: tuple[str, int, Union[None, str], str], # Consists of 1. the name of the custom command 2. the number of parameters 3. The default argument if specified or `None` otherwise, and 4. the display text of the command.
        ) -> regex.Pattern:
    """Return a `regex.Pattern` object (not a `re.Pattern` object) detecting
    invocations of the command with the specified number of parameters and
    optional argument.

    Assumes that the curly braces used to write the invocations of the commands
    are balanced and properly nested. Assumes that there are no two commands
    of the same name.
    """
    command_name, num_parameters, optional_arg, _ = command_tuple
    backslash_name = fr"\\{command_name}"
    optional_argument_detection = fr"(?:\[(.*?)\])?" if optional_arg is not None else ""
    if optional_arg is not None:
        trailing_arguments = [_argument_detection(i) for i in range(2, 1+num_parameters)]
        trailing_args_pattern = "\\s*".join(trailing_arguments)
        pattern = (f"{backslash_name}\\s*{optional_argument_detection}\\s*{trailing_args_pattern}")
    elif num_parameters > 0:
        arguments = [_argument_detection(i) for i in range(1, 1+num_parameters)]
        args_pattern = "\\s*".join(arguments)
        pattern = f"{backslash_name}\\s*{args_pattern}"
    else:
        # Match the command name exactly, without a letter or digit
        # immediately following (an underscore following is okay).
        pattern = fr"{backslash_name}(?![^\W_])"
    return regex.compile(pattern)
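For illustration, here is a simplified standard-library analogue of `regex_pattern_detecting_command` (the name `simple_command_pattern` is hypothetical). Without recursion, it assumes arguments contain no nested braces, and it ignores optional arguments; it does show the same lookahead trick for parameterless commands.

```python
import re


def simple_command_pattern(name: str, num_args: int) -> re.Pattern:
    """Compile a stdlib pattern for invocations of the command `\\name`.

    Arguments are assumed to contain no nested braces.
    """
    backslash_name = re.escape("\\" + name)
    if num_args == 0:
        # Reject a following letter or digit, so \del does not match \delta,
        # but allow an underscore, so \del_1 still matches.
        return re.compile(backslash_name + r"(?![^\W_])")
    args = r"\s*".join(r"\{([^{}]*)\}" for _ in range(num_args))
    return re.compile(backslash_name + r"\s*" + args)


pattern = simple_command_pattern("del", 0)
print(pattern.search(r"\delta") is None)   # True
print(pattern.search(r"\del_1").group(0))  # \del
```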

    

In [None]:
# Basic
pattern = regex_pattern_detecting_command(('Sur', 0, None, r'\mathrm{Sur}'))
text = r'The number of element of $\Sur(\operatorname{Cl} \mathcal{O}_L, A)$ is ...'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], r'\Sur')

# One parameter
pattern = regex_pattern_detecting_command(('field', 1, None, r'\mathbb{#1}'))
text = r'\field{Q}'
# print(pattern.pattern)
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)

# Multiple parameters
pattern = regex_pattern_detecting_command(('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]'))
text = r'\mat{{123}}{asdfasdf{}{}}{{{}}}{{asdf}{asdf}{}}' # This is a balanced str.
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)
test_eq(match.group(1), r'{123}')

# Multiple parameters, the first of which is optional
pattern = regex_pattern_detecting_command(('plusbinomial', 3, '2', r'(#2 + #3)^#1'))
# When the optional argument is omitted (so the default `2` would be used)
text = r'\plusbinomial{x}{y}'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)

# When the optional argument is supplied
text = r'\plusbinomial[4]{x}{y}'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)

# One parameter that is optional.
pattern = regex_pattern_detecting_command(('greet', 1, 'world', r'Hello #1!'))
# When the optional argument is omitted (so the default `world` would be used)
text = r'\greet'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)

# When the optional argument is supplied
text = r'\greet[govna]'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], text)

# In the following example, `\del` is a command defined as `\delta`.
# Invocations of `\delta` should NOT be detected as invocations of `\del`.
command_tuple = (r'del', 0, None, r'\delta')
pattern = regex_pattern_detecting_command(command_tuple)
text = r'\del should be detected.'
match = pattern.search(text)
start, end = match.span()
test_eq(text[start:end], r'\del')
text = r'\delta should not be detected.'
match = pattern.search(text)
assert match is None
# test_eq(replace_command_in_text(text, command_tuple), r'\delta should be replaced. \delta should not.')

# In the following example, the command takes one argument, but it is invoked
# without curly braces around that argument. Such invocations are currently
# not detected.
command_tuple = ('til', 1, None, '{\\widetilde{#1}}')
pattern = regex_pattern_detecting_command(command_tuple)
text = r'\til \calh_g'
match = pattern.search(text)
assert match is None


In [None]:
#| export
def replace_command_in_text(
        text: str,
        command_tuple: tuple[str, int, Union[None, str], str], # Consists of 1. the name of the custom command 2. the number of parameters 3. The default argument if specified or `None` otherwise, and 4. the display text of the command.
    ):
    r"""
    Replace all invocations of the specified command in `text` with the
    display text, substituting the arguments of each invocation for `#1`,
    `#2`, etc. in the display text.

    Assumes that '\1', '\2', '\3', etc. are not part of the display text.
    """
    command_name, num_parameters, optional_arg, display_text = command_tuple
    command_pattern = regex_pattern_detecting_command(command_tuple)
    replace_pattern = display_text.replace('\\', r'\\')
    # if optional_arg is not None:
    #     replace_pattern = replace_pattern.replace('#1', optional_arg)
    replace_pattern = re.sub(r'#(\d)\b', r'\\\1', replace_pattern)
    text = regex.sub(
        command_pattern,
        lambda match: _replace_command(match, command_tuple, command_pattern, replace_pattern),
        text)
    return text


def _replace_command(
        match: regex.Match,
        command_tuple: tuple[str, int, Union[None, str], str],
        command_pattern: regex.Pattern,
        replace_pattern: str) -> str:
    """
    Replace the matched command with the display text.

    This is a helper function to `replace_command_in_text`.
    """
    command_name, num_parameters, optional_arg, display_text = command_tuple
    start, end = match.span()
    matched_string_to_replace = match.string[start:end]
    if len(match.groups()) > 0 and match.group(1) is None:
        replace_pattern = replace_pattern.replace(r'\1', optional_arg)
        replaced_string = regex.sub(command_pattern, replace_pattern, matched_string_to_replace)
        return replaced_string
    else:
        return regex.sub(command_pattern, replace_pattern, matched_string_to_replace)


# def _replace_nonexplicit_instances_of_command(
#         text: str,
#         command_tuple: tuple[str, int, Union[None, str], str], # Consists of 1. the name of the custom command 2. the number of parameters 3. The default argument if specified or `None` otherwise, and 4. the display text of the command.
#     ) -> str:
#     """
#     Replace the nonexplicit instances of a custom command.

#     Sometimes, a LaTeX command is used nonexplicitly, i.e. the arguments are not
#     explicitly typed with surrounding curly braces `{}`.  An example of this phenomenon
#     is a command named `\til` defined by `\newcommand{\til}[1]{{\widetilde{#1}}}`
#     that is later invoked using `$\til \calh_g$`.

#     This function is only a workaround.

#     This is a helper function to `replace_command_in_text`.
#     """
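`replace_command_in_text` turns `#1`, `#2`, ... into regex backreferences `\1`, `\2`, ... and escapes the backslashes in the display text so that the replacement template is read literally. A different approach, sketched below with the standard library, substitutes the arguments through a replacement function, whose return value `re.sub` inserts as-is, so no escaping is needed. The function `expand_invocation` and its parameters are hypothetical, and it assumes the arguments of the invocation have already been extracted.

```python
import re
from typing import Optional


def expand_invocation(
        display_text: str,
        num_params: int,
        default: Optional[str],
        required_args: list[str],
        optional_arg: Optional[str] = None) -> str:
    """Substitute arguments for #1, #2, ... in `display_text`.

    If the command has a default (i.e. an optional first parameter), `#1`
    is `optional_arg` when given and `default` otherwise; the required
    arguments then fill #2, #3, ...
    """
    if default is not None:
        first = optional_arg if optional_arg is not None else default
        args = [first, *required_args]
    else:
        args = list(required_args)
    if len(args) != num_params:
        raise ValueError(f"expected {num_params} arguments, got {len(args)}")
    # The function's return value is inserted as-is, so backslashes in the
    # arguments or display text need no escaping.
    return re.sub(r"#(\d)", lambda m: args[int(m.group(1)) - 1], display_text)


print(expand_invocation(r"(#2 + #3)^#1", 3, "2", ["x", "y"]))       # (x + y)^2
print(expand_invocation(r"(#2 + #3)^#1", 3, "2", ["x", "y"], "4"))  # (x + y)^4
print(expand_invocation(r"\mathbb{#1}", 1, None, ["Q"]))            # \mathbb{Q}
```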




In [None]:
# Basic
command_tuple = ('Sur', 0, None, r'\mathrm{Sur}')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'The number of element of $\Sur(\operatorname{Cl} \mathcal{O}_L, A)$ is ... Perhaps $\Sur$ is nonempty.'
test_eq(replace_command_in_text(text, command_tuple), r'The number of element of $\mathrm{Sur}(\operatorname{Cl} \mathcal{O}_L, A)$ is ... Perhaps $\mathrm{Sur}$ is nonempty.')


# One parameter
command_tuple = ('field', 1, None, r'\mathbb{#1}')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'$\field{Q}$ is the field of rational numbers. $\field{C}$ is the field of complex numbers'
test_eq(replace_command_in_text(text, command_tuple), r'$\mathbb{Q}$ is the field of rational numbers. $\mathbb{C}$ is the field of complex numbers')

# Multiple parameters
command_tuple = ('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'\mat{{123}}{asdfasdf{}{}}{{{}}}{{asdf}{asdf}{}}' # This is a balanced str.
test_eq(replace_command_in_text(text, command_tuple), r'\left[\begin{array}{cc}{123} & asdfasdf{}{} \\ {{}} & {asdf}{asdf}{}\end{array}\right]')

# Multiple parameters, the first of which is optional
command_tuple = ('plusbinomial', 3, '2', r'(#2 + #3)^#1')
# pattern = regex_pattern_detecting_command(command_tuple)
# When the optional argument is omitted (so the default `2` is used)
text = r'\plusbinomial{x}{y}'
test_eq(replace_command_in_text(text, command_tuple), r'(x + y)^2')

# When the optional argument is supplied
text = r'\plusbinomial[4]{x}{y}'
test_eq(replace_command_in_text(text, command_tuple), r'(x + y)^4')


# One parameter that is optional.
command_tuple = ('greet', 1, 'world', r'Hello #1!')
# pattern = regex_pattern_detecting_command(command_tuple)
# When the optional argument is omitted (so the default `world` is used)
text = r'\greet'
test_eq(replace_command_in_text(text, command_tuple), r'Hello world!')

# When the optional argument is supplied
text = r'\greet[govna]'
test_eq(replace_command_in_text(text, command_tuple), r'Hello govna!')

# In the following example, `\del` is a command defined as `\delta`.
# Invocations of `\delta` should NOT be replaced (e.g. into `\deltata`).
command_tuple = (r'del', 0, None, r'\delta')
# pattern = regex_pattern_detecting_command(command_tuple)
text = r'\del should be replaced. \delta should not.'
test_eq(replace_command_in_text(text, command_tuple), r'\delta should be replaced. \delta should not.')


In [None]:
#| export
def replace_commands_in_text(
        text: str, # The text in which to replace the commands. This should not include the preamble of a LaTeX document.
        command_tuples: list[tuple[str, int, Union[None, str], str]], # An output of `custom_commands`. Each tuple consists of 1. the name of the custom command 2. the number of parameters 3. The default argument if specified or `None` otherwise, and 4. the display text of the command.
        repeat: int = 1 # The number of times to repeat replacing the commands throughout the text. Defaults to `1`, in which case custom commands are replaced throughout the entire document once. If set to -1, then this function attempts to replace custom commands until no commands to replace are found.
    ) -> str:
    r"""
    Replace all invocations of the specified commands in `text` with the
    display text, substituting the arguments of each invocation into the
    display text.

    Assumes that '\1', '\2', '\3', etc. are not part of the display text.

    If `repeat` is set to `-1`, then this function attempts to replace
    custom commands until no commands to replace are found. However, this
    might cause infinite loops for some documents, e.g. if a command's
    display text keeps growing with each replacement.
    """
    while repeat != 0:
        previous_text = text
        for command_tuple in command_tuples:
            text = replace_command_in_text(text, command_tuple)
        if text == previous_text:
            # No further replacements were made, so further passes are pointless.
            break
        repeat -= 1
    return text

The `replace_commands_in_text` function replaces invocations of custom commands in (the main part of) a LaTeX document.

In [None]:
text = r'''Here is a matrix over $\field{Q}$: $\mat{1/2}{2}{-1}{5/7}$.
           Note that it is not over $\field{F}_7$ and not over $\field{F}_2$.'''

command_tuples = [
    ('field', 1, None, r'\mathbb{#1}'),
    ('mat', 4, None, r'\left[\begin{array}{cc}#1 & #2 \\ #3 & #4\end{array}\right]')]

sample_output = replace_commands_in_text(text, command_tuples)

test_eq(sample_output, 
        r'''Here is a matrix over $\mathbb{Q}$: $\left[\begin{array}{cc}1/2 & 2 \\ -1 & 5/7\end{array}\right]$.
           Note that it is not over $\mathbb{F}_7$ and not over $\mathbb{F}_2$.''')


Note that some writers define custom commands using other custom commands. By default, the `replace_commands_in_text` function replaces custom commands just once. In the following example, there is a custom command that is defined using another custom command, and the function replaces only the "outer" custom command:

In [None]:
# TODO: continue this example
text = r''''''
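As a self-contained illustration of nested definitions, restricted to parameterless commands, the following sketch expands commands round by round and stops when a round changes nothing, which is the behavior described for `repeat=-1`. The function name `expand_until_stable` and its `max_rounds` cap are hypothetical; the cap guards against definitions that never stabilize.

```python
import re


def expand_until_stable(text: str, commands: dict[str, str], max_rounds: int = 10) -> str:
    """Expand parameterless commands repeatedly until the text stops changing.

    `commands` maps command names (without the backslash) to display text.
    `max_rounds` bounds the work for self-referential definitions.
    """
    for _ in range(max_rounds):
        previous = text
        for name, display in commands.items():
            # The lookahead skips longer names such as \Sigma when expanding \S.
            pattern = re.escape("\\" + name) + r"(?![^\W_])"
            # A replacement function inserts `display` literally.
            text = re.sub(pattern, lambda _match, replacement=display: replacement, text)
        if text == previous:
            break
    return text


# \circle is defined using another custom command, \S.
commands = {"S": r"\mathbb{S}", "circle": r"\S^1"}
print(expand_until_stable(r"$\circle$", commands, max_rounds=1))  # $\S^1$
print(expand_until_stable(r"$\circle$", commands))                # $\mathbb{S}^1$
```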

In [None]:
#| export 
def replace_commands_in_latex_document(
        document: str
        ) -> str:
    """Return the LaTeX document (without the preamble) with invocations
    of custom commands/operators replaced with their display text.

    Assumes that all custom commands and operators are defined in the
    preamble.

    Assumes that, if commands with the same name are defined multiple times,
    only the last definition is used.

    Even replaces these invocations in commented-out text.
    """
    preamble, document = divide_preamble(document)
    commands = custom_commands(preamble)
    # Note that `command_tuple[0]` is the name of the command.
    unique_commands = {command_tuple[0]: command_tuple for command_tuple in commands} 
    for _, command_tuple in unique_commands.items():
        document = replace_command_in_text(document, command_tuple)
    return document
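Two steps of `replace_commands_in_latex_document` can be illustrated in isolation. The sketch below is hypothetical: `split_at_begin_document` stands in for `divide_preamble` (defined elsewhere in this module) with a plain split at `\begin{document}`, and `last_definitions` mirrors the dictionary comprehension in the function, where a later definition of a name overwrites an earlier one.

```python
def split_at_begin_document(document: str) -> tuple[str, str]:
    """Split a LaTeX document into (preamble, body); body starts at \\begin{document}."""
    index = document.find("\\begin{document}")
    if index == -1:
        raise ValueError("no \\begin{document} found")
    return document[:index], document[index:]


def last_definitions(commands: list[tuple]) -> dict[str, tuple]:
    """Keep only the last tuple for each command name (the tuple's first entry)."""
    return {command[0]: command for command in commands}


preamble, body = split_at_begin_document(
    "\\newcommand{\\S}{x}\n\\begin{document}\n$\\S$\n\\end{document}")
print(body.startswith("\\begin{document}"))  # True
commands = [("S", 0, None, r"\mathbb{S}"), ("S", 0, None, r"\mathbb{S}1")]
print(last_definitions(commands)["S"][3])    # \mathbb{S}1
```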
    

In [None]:
file = _test_directory() / 'latex_examples' / 'commands_recursive_example' / 'main.tex'
document = text_from_file(file)
commands_replaced = replace_commands_in_latex_document(document)
assert commands_replaced.startswith(r'\begin{document}')
assert commands_replaced.endswith(r'\end{document}')
assert r'\S' not in commands_replaced
assert r'\mathbb{S}1' in commands_replaced  # Note that $\S$ is defined twice in the preamble; only the latter definition is used.
assert r'\field{Q}$' not in commands_replaced
assert r'\mathbb{Q}$' in commands_replaced
assert r'\commentedout' not in commands_replaced
assert r'This is actually a command that is commented out, but it is also replaced!' in commands_replaced
print(commands_replaced)

\begin{document}

$\mathbb{S}1$
%$\mathbf{Q}$
%$\mathbf{Q}$
%This is actually a command that is commented out, but it is also replaced!
$\mathbb{Q}$

\end{document}


### Replace commonly used syntax

Obsidian does not render all LaTeX syntax. For example:

- `\( \)` and `\[ \]` are not recognized as math mode delimiters.
- `\begin{equation} \end{equation}`, `\begin{align} \end{align}`, and `\begin{eqnarray} \end{eqnarray}` (as well as their unnumbered versions with `*`) require surrounding `$$`.

In [None]:
#| export
# TODO: give the option to replace quotations ``'' and `enquote`, e.g. ```unlikely intersections''` into `"unlikely intersections"`
# TODO: give the option to replace emph with `****`, e.g. ``\emph{special}``.
def adjust_common_syntax_to_markdown(
        text) -> str:
    r"""
    Adjust some common syntax, such as math mode delimiters and
    equation/align/eqnarray environments, for Markdown.

    Assumes that the tokens for math mode delimiters (e.g. `\( \)` and `\[ \]`)
    are not used otherwise.
    """
    # TODO: see if I need to add more substitutions.
    text = re.sub(r'\\\(|\\\)', '$', text)
    text = re.sub(r'\\\[|\\]', '$$', text)
    text = re.sub(r'(\\begin\{(?:align|equation|eqnarray)\*?\})', r'$$\1', text)
    text = re.sub(r'(\\end\{(?:align|equation|eqnarray)\*?\})', r'\1$$', text)
    return text
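The docstring's assumption matters in practice: a line break with extra spacing such as `\\[2pt]` inside an `align` environment contains `\[`, which the substitution above turns into `$$`. A possible refinement, sketched here with a hypothetical function name, adds a negative lookbehind so that a `\[` or `\]` directly preceded by another backslash is left alone.

```python
import re

# Match \[ or \] unless the backslash is itself preceded by a backslash,
# so the line-break syntax \\[<length>] is left untouched.
DISPLAY_DELIMITER_RE = re.compile(r"(?<!\\)\\[\[\]]")


def convert_display_delimiters(text: str) -> str:
    """Replace LaTeX display-math delimiters with Obsidian's $$."""
    return DISPLAY_DELIMITER_RE.sub("$$", text)


print(convert_display_delimiters(r"\[ x^2 \]"))              # $$ x^2 $$
print(convert_display_delimiters(r"a &= b \\[2pt] c &= d"))  # unchanged
```

The trade-off is that a display-math `\[` written directly after a `\\` line break would also be skipped.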

In [None]:
text = r'''
I want to talk about \(\mathbb{Z}[i]\). It is the ring whose elements are of the form $a+bi$ where $a,b \in \mathhbb{Z}$.
It has a multiplication structure:
\[ (a+bi) (c+di) = (ac-bd) + (ad+bc)i.\]

Here is an equation:
\begin{equation}
5+7 = 12
\end{equation}

Here is another:
\begin{equation*}
5+6 = 11
\end{equation*}

Here is an align:
\begin{align}
5+7 = 12
\end{align}

Here is another:
\begin{align*}
5+6 = 11
\end{align*}

\begin{eqnarray}
asdf
\end{eqnarray}
'''
sample_output = adjust_common_syntax_to_markdown(text)
print(sample_output)
assert r'\(' not in sample_output
assert r'\)' not in sample_output
assert r'\[' not in sample_output
assert r'\]' not in sample_output
assert r'$$\begin{align}' in sample_output
assert r'\end{align}$$' in sample_output
assert r'$$\begin{equation}' in sample_output
assert r'\end{equation}$$' in sample_output
assert r'$$\begin{align*}' in sample_output
assert r'\end{align*}$$' in sample_output
assert r'$$\begin{equation*}' in sample_output
assert r'\end{equation*}$$' in sample_output
assert r'$$\begin{eqnarray}' in sample_output
assert r'\end{eqnarray}$$' in sample_output


I want to talk about $\mathbb{Z}[i]$. It is the ring whose elements are of the form $a+bi$ where $a,b \in \mathhbb{Z}$.
It has a multiplication structure:
$$ (a+bi) (c+di) = (ac-bd) + (ad+bc)i.$$

Here is an equation:
$$\begin{equation}
5+7 = 12
\end{equation}$$

Here is another:
$$\begin{equation*}
5+6 = 11
\end{equation*}$$

Here is an align:
$$\begin{align}
5+7 = 12
\end{align}$$

Here is another:
$$\begin{align*}
5+6 = 11
\end{align*}$$

$$\begin{eqnarray}
asdf
\end{eqnarray}$$



## Set up an Obsidian vault reference

In [None]:
#| export
def _replace_custom_commands_in_parts(
        parts: list[tuple[str, str]],
        custom_commands: list[tuple[str, int, Union[str, None], str]]
        ) -> list[tuple[str, str]]:
    return [(title, replace_commands_in_text(text, custom_commands))
            for title, text in parts]


def _adjust_common_syntax_to_markdown_in_parts(
        parts: list[tuple[str, str]]
        ) -> list[tuple[str, str]]:
    return [(title, adjust_common_syntax_to_markdown(text))
            for title, text in parts]

In [None]:
#| hide
# TODO: test _adjust_common_syntax_to_markdown_in_parts

text = r'Let $\tConf_n$ be the universal cover of $\Conf_n$.'
parts = [('1', text)]
cust_comms = [
    ('tConf', 0, None, '\\widetilde{Conf}'),
    ('Conf', 0, None, '\\Conf'),
    ]
test_eq(
    _replace_custom_commands_in_parts(parts, cust_comms),
    [('1', 'Let $\\widetilde{Conf}_n$ be the universal cover of $\\Conf_n$.')])



In [None]:
#| export
def _adjust_common_section_titles_in_parts(
        parts: list[tuple[str, str]],
        reference_name: str):
    """Adjust common section titles in `parts`.

    Appends `_` followed by `reference_name` to the title of each part that
    starts a section with a common title (see `COMMON_SECTION_TITLES`), such
    as `Introduction`, `Notations`, `Conventions`, `Preliminaries`, or
    `Notations and Conventions`.

    This is a helper function for `setup_reference_from_latex_parts`.
    """
    return [(_adjusted_title(title, text, reference_name), text)
            for title, text in parts]


# TODO: also adjust title if the title is of the form
# <section_number>_<common_section_title>, e.g.
# 7_acknowledgements
# 8_references
COMMON_SECTION_TITLES = [
    'introduction', 'notations', 'conventions', 'preliminaries',
    'notations and conventions', 'definitions', 'definitions and notations',
    'references', 'acknowledgements']


def _adjusted_title(
        title: str,
        text: str,
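        reference_name: str):
    """Return `title` with `_` and `reference_name` appended if `text` is a
    section whose title is in `COMMON_SECTION_TITLES` (case-insensitively);
    otherwise return `title` unchanged."""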
    node = get_node_from_simple_text(text)
    if not _is_section_node(node):
        return title
    _, section_title = _section_title(text)
    if section_title.lower() in COMMON_SECTION_TITLES:
        return f'{title}_{reference_name}'
    else:
        return title 
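For illustration, here is a standalone sketch of the same check, assuming the section title can be read off with a simple regular expression rather than the module's `get_node_from_simple_text` and `_section_title` helpers; the names `COMMON_TITLES`, `SECTION_TITLE_RE`, and `adjusted_title` are hypothetical, and the sketch also strips whitespace around the title. The reference name is presumably appended so that common titles such as `Introduction` do not collide across references in one vault.

```python
import re

COMMON_TITLES = {
    'introduction', 'notations', 'conventions', 'preliminaries',
    'notations and conventions', 'definitions', 'definitions and notations',
    'references', 'acknowledgements'}

# \section{Title} or \section*{Title}, possibly with spaces before the brace.
SECTION_TITLE_RE = re.compile(r"\s*\\section\*?\s*\{(.*)\}\s*\Z", re.DOTALL)


def adjusted_title(title: str, text: str, reference_name: str) -> str:
    """Append `_<reference_name>` to `title` if `text` is a common section."""
    match = SECTION_TITLE_RE.match(text)
    if match is None:
        return title
    if match.group(1).strip().lower() in COMMON_TITLES:
        return f"{title}_{reference_name}"
    return title


print(adjusted_title('1. Introduction', r'\section {Introduction}', 'ref'))
# 1. Introduction_ref
```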

In [None]:
#| hide
test_eq(_adjusted_title('1. Introduction', r'\section {Introduction}', 'reference_name'), '1. Introduction_reference_name')
test_eq(_adjusted_title('2. Not a common name', r'\section{Not a common name}', 'reference_name'), '2. Not a common name')
# test_eq(_adjusted_title(UNTITLED_SECTION_TITLE))


In [None]:
#| export
def _create_notes_from_parts(
        parts: list[tuple[str, str]],
        chapters: list[list[str]],
        index_note: VaultNote, # The index note of the reference that was created.
        vault: Path,
        reference_folder: Path,
        reference_name: str,
        template_mf: MarkdownFile, # The template of the reference that was created.
        ):
    """Create notes for the vault from `parts`."""
    # headings_folder_correspondence = correspond_headings_with_folder(
    #     index_note, vault)
    title_numbering_folder_map = {
        title: convert_title_to_folder_name(title)
        for section in chapters
        for title in section}

    current_section, current_subsection = chapters[0][0], '' # section/subsection titles
    # Dict of dict of list of str. The top level keys
    # are section titles and the corresponding values are dicts whose
    # keys are subsection titles and values are lists of bulleted links texts
    # of the form `- [[link_to_note]], Title/identifying information` to add.
    links_to_make = {current_section: {current_subsection: []}}  
    for part in parts:
        current_section, current_subsection = _create_part_or_update(
            part, title_numbering_folder_map, vault, reference_folder,
            reference_name, template_mf, current_section, current_subsection,
            links_to_make)

    _make_links_in_index_notes(
        links_to_make, title_numbering_folder_map, vault,
        reference_folder, reference_name)
    

def _make_links_in_index_notes(
        links_to_make: dict[str, dict[str, list[str]]],
        title_numbering_folder_map: dict[str, tuple[str, str]],
        vault: Path,
        reference_folder: Path,
        reference_name: str,
        ):
    for section_title, section_dict in links_to_make.items():
        section_folder = title_numbering_folder_map[section_title]
        _make_links_in_index_note_for_section(
            section_title, section_dict, section_folder,
            vault, reference_folder, reference_name)


def _make_links_in_index_note_for_section(
        section_title: str,
        section_dict: dict[str, list[str]],
        section_folder: str,
        vault: Path,
        reference_folder: Path,
        reference_name: str):
    rel_path = reference_folder / section_folder / f'_index_{section_folder}.md'
    section_index_note = VaultNote(vault, rel_path=rel_path)
    mf = MarkdownFile.from_vault_note(section_index_note)
    for subsection_title, links_to_make_in_header in section_dict.items():
        mf.add_line_in_section(
            subsection_title,
            {'type': MarkdownLineEnum.UNORDERED_LIST,
             'line': '\n'.join(links_to_make_in_header)})
    mf.write(section_index_note)


def _create_part_or_update(
        part: tuple[str, str],
        title_numbering_folder_map: dict[str, tuple[str, str]],
        vault: Path,
        reference_folder: Path,
        reference_name: str,
        template_mf: MarkdownFile,
        current_section: str,
        current_subsection: str,
        links_to_make: dict[str, dict[str, list[str]]],
        ) -> tuple[str, str]:
    """
    If `part` starts a section or subsection, update `current_section` and
    `current_subsection` accordingly; otherwise, create a note for `part`.

    Also append to `links_to_make` for each note that is created.
    """
    if _part_starts_section(part):
        current_section = part[0]
        current_subsection = ''
        links_to_make[current_section] = {'': []}
        folder = title_numbering_folder_map[current_section]
        return current_section, current_subsection
    elif _part_starts_subsection(part):
        current_subsection = part[0]
        links_to_make[current_section][current_subsection] = []
        return current_section, current_subsection

    created_note = _create_note_for_part(
        part, title_numbering_folder_map, vault, reference_folder,
        reference_name, template_mf, current_section, current_subsection)

    _update_links_to_make(
        part, current_section, current_subsection, links_to_make,
        created_note)
    return current_section, current_subsection


def _create_note_for_part(
        part: tuple[str, str],
        title_numbering_folder_map: dict[str, tuple[str, str]],
        vault: Path,
        reference_folder: Path,
        reference_name: str,
        template_mf: MarkdownFile,
        current_section: str,
        current_subsection: str
        ) -> VaultNote: # The created VaultNote.
    """Create a note for the part"""
    note_title = sanitize_filename(part[0])
    note_contents = part[1]
    mf = template_mf.copy(True)
    mf.add_line_in_section(
        'Topic[^1]', {'type': MarkdownLineEnum.DEFAULT, 'line': note_contents})
    mf.parts[-1]['line'] += note_title
    section_folder = title_numbering_folder_map[current_section]
    # TODO: Make it so that unique_note_name indicates an unnumbered
    # note as unnumbered.
    unique_note_name = VaultNote.unique_name(
        f"{reference_name}_{note_title}", vault)
    if current_subsection == '':
        rel_path = (
            reference_folder / section_folder / f"{unique_note_name}.md")
    else:
        subsection_folder = title_numbering_folder_map[current_subsection]
        rel_path = (
            reference_folder / section_folder / subsection_folder / f"{unique_note_name}.md")
    vn = VaultNote(vault, rel_path=rel_path)
    vn.create()
    mf.write(vn)
    return vn



def _update_links_to_make(
        part: tuple[str, str],
        current_section: str,
        current_subsection: str,
        links_to_make: dict[str, dict[str, list[str]]],
        created_note: VaultNote
        ) -> None:
    """Update `links_to_make` after note is created."""
    # if current_subsection is not None:
    #     current_subsection_key = current_subsection
    # else:
    #     current_subsection_key = ''
    note_title = part[0]
    links_to_make[current_section][current_subsection].append(
        f'- [[{created_note.name}]], {note_title}'
    )
    




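The helpers above walk through `parts` once, tracking the current section and subsection and collecting one bullet line per created note for the index notes. The following is a minimal, self-contained sketch of that bookkeeping under stated assumptions: each part is tagged with a hypothetical `kind` (`'section'`, `'subsection'`, or `'note'`) instead of being classified by `_part_starts_section`/`_part_starts_subsection`, the note name is a simple stand-in for `VaultNote.unique_name`, and notes before the first section go under a hypothetical placeholder section title. No files are created.

```python
from typing import Iterable


def collect_index_links(
        parts: Iterable[tuple[str, str]],  # hypothetical (kind, title) pairs
        reference_name: str,
        untitled_section: str = 'Untitled',  # hypothetical placeholder title
        ) -> dict[str, dict[str, list[str]]]:
    """Group index-note bullet lines by section and then by subsection.

    The subsection key `''` holds notes that sit directly under a section.
    """
    current_section, current_subsection = untitled_section, ''
    links: dict[str, dict[str, list[str]]] = {current_section: {'': []}}
    for kind, title in parts:
        if kind == 'section':
            # A new section resets the current subsection.
            current_section, current_subsection = title, ''
            links[current_section] = {'': []}
        elif kind == 'subsection':
            current_subsection = title
            links[current_section][current_subsection] = []
        else:
            # Stand-in for the unique note name; the helpers above use
            # `sanitize_filename` and `VaultNote.unique_name` instead.
            note_name = f'{reference_name}_{title}'.replace(' ', '_')
            links[current_section][current_subsection].append(
                f'- [[{note_name}]], {title}')
    return links


links = collect_index_links(
    [('section', 'Intro'), ('note', 'Theorem 1.1'),
     ('subsection', 'Setup'), ('note', 'Lemma 1.2')],
    'ref')
print('\n'.join(links['Intro']['Setup']))  # prints: - [[ref_Lemma_1.2]], Lemma 1.2
```

As in `_make_links_in_index_note_for_section`, the bullet lines of each subsection can then be joined with `'\n'` and written under the matching heading of the section's index note.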
In [None]:
#| export
# TODO: test parts without a subsection.
# TODO: somehow contents before a section are not included. Fix this bug.
# TODO: If section titles are completely empty, e.g. https://arxiv.org/abs/math/0212208,
# make section titles based on reference name.
# TODO: give the option to not include commented-out content from latex files.
def setup_reference_from_latex_parts(
        parts: list[tuple[str, str]], # Output of `divide_latex_text`
        custom_commands: list[tuple[str, int, Union[str, None], str]], # Output of `custom_commands` applied to the preamble of the LaTeX document.
        vault: PathLike, # An Obsidian.md vault.
        location: PathLike, # The path to make the new reference folder. Relative to `vault`.
        reference_name: PathLike, # The name of the new reference.
        authors: Union[str, list[str]], # Each str is the family name of each author.
        author_folder: PathLike = '_mathematicians', # The directory where the author files are stored in. Relative to `vault`.
        references_folder: PathLike = '_references', # The directory where the references files are stored in. Relative to `vault`.
        templates_folder: PathLike = '_templates', # The directory where the template files are stored in. Relative to `vault`.
        template_file_name: str = '_template_common', # The template file from which to base the template file of the new reference.
        notation_index_template_file_name: str = '_template_notation_index', # The template file from which to base the notation index file of the new reference.
        glossary_template_file_name: str = '_template_glossary', # The template file from which to base the glossary file of the new reference.
        setup_temp_folder: bool = True, # If `True`, creates a `_temp` folder with an index file. This folder serves to house notes auto-created from LaTeX text files before moving them to their correct directories. Defaults to `True`.
        make_second_template_file_in_reference_directory: bool = True, # If `True`, creates a copy of the template note within the directory for the reference.
        copy_obsidian_configs: Optional[PathLike] = '.obsidian', # The folder relative to `vault` from which to copy obsidian configs. If `None`, then no obsidian configs are copied to the reference folder. Defaults to `.obsidian`.
        overwrite: Union[str, None] = None, # Specifies if and how to overwrite the reference folder if it already exists.  - If `'w'`, then deletes the contents of the existing reference folder, as well as the template and reference files, before creating the new reference folder.  - If `'a'`, then overwrites the contents of the reference folder, but does not remove existing files/folders.  - If `None`, then does not modify the existing reference folder and raises a `FileExistsError`.
        confirm_overwrite: bool = True, # Specifies whether or not to confirm the deletion of the reference folder if it already exists and if `overwrite` is `'w'`. Defaults to `True`.
        verbose: bool = False,
        replace_custom_commands: bool = True, # If `True`, replace the custom commands in the text of `parts` when making the notes.
        adjust_common_latex_syntax_to_markdown: bool = True, # If `True`, apply `adjust_common_syntax_to_markdown` to the text in `parts` when making the notes.
        ) -> None:
    """Set up a reference folder in `vault` using an output of `divide_latex_text`, create
    notes from `parts`, and link notes in index files in the reference folder.

    Assumes that

    - `parts` is derived from a LaTeX document in which
        - all of the text belongs to sections.
        - all of the sections/subsections are uniquely named
    - The template file has a section `# Topic[^1]`
    - The last line of the template file is a footnote indicating where the note comes from.
    - There is at most one reference folder in the vault whose name is given by
      `reference_name`.

    `parts` itself is not modified, even if `replace_custom_commands` and/or
    `adjust_common_latex_syntax_to_markdown` are set to `True`.

    cf. `setup_folder_for_new_reference` for how the reference folder is set up.

    The names for the subfolders of the reference folder are the section titles, except
    for sections with common titles such as `Introduction`, `Notations`, `Conventions`,
    `Preliminaries`, and `Notations and Conventions`, whose titles are first adjusted
    using `reference_name`. This ensures that index files for sections in different
    reference folders do not share a name.

    Text/parts that precede the explicitly given sections are placed in a separate first
    section folder, titled `f'{reference_name} {UNTITLED_SECTION_TITLE}'`, and are linked
    in that section's index file.
    """
    parts = _adjust_common_section_titles_in_parts(parts, reference_name)
    chapters = section_and_subsection_titles_from_latex_parts(parts)
    if chapters[0][0] == UNTITLED_SECTION_TITLE:
        chapters[0][0] = f'{reference_name} {UNTITLED_SECTION_TITLE}'
    setup_folder_for_new_reference(
        reference_name, location, authors, vault, author_folder,
        references_folder, templates_folder, template_file_name,
        notation_index_template_file_name, 
        glossary_template_file_name, chapters, setup_temp_folder,
        make_second_template_file_in_reference_directory,
        copy_obsidian_configs, overwrite, confirm_overwrite, verbose)
    index_note = VaultNote(
        vault, rel_path=Path(location) / reference_name / f'_index_{reference_name}.md')
    template_note = VaultNote(vault, name=f'_template_{reference_name}')
    template_mf = MarkdownFile.from_vault_note(template_note)

    if replace_custom_commands:
        parts = _replace_custom_commands_in_parts(parts, custom_commands)
    if adjust_common_latex_syntax_to_markdown:
        parts = [(title, adjust_common_syntax_to_markdown(text))
                 for title, text in parts]
    
    reference_folder = Path(location) / reference_name
    _create_notes_from_parts(
        parts,
        chapters, 
        index_note,
        vault,
        reference_folder,
        reference_name,
        template_mf)
    


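The docstring above says that common section titles such as `Introduction` are not used verbatim for subfolder names, and that `parts` itself is not modified. The actual renaming rule lives in `_adjust_common_section_titles_in_parts`, which is not shown here. The following sketch assumes, hypothetically, that the reference name is prefixed to such titles, and it builds a new list rather than mutating its input; `adjust_common_titles_sketch` is an illustrative name, not part of this module.

```python
# Common titles listed in the docstring of `setup_reference_from_latex_parts`.
COMMON_SECTION_TITLES = frozenset({
    'Introduction', 'Notations', 'Conventions', 'Preliminaries',
    'Notations and Conventions'})


def adjust_common_titles_sketch(
        parts: list[tuple[str, str]],  # (title, text) pairs
        reference_name: str,
        ) -> list[tuple[str, str]]:
    """Return a new list in which common titles are prefixed by `reference_name`.

    Hypothetical renaming rule; the input list is left unchanged.
    """
    return [
        (f'{reference_name} {title}' if title in COMMON_SECTION_TITLES else title,
         text)
        for title, text in parts]


sample_parts = [('Introduction', 'We study...'), ('Main results', 'Theorem...')]
adjusted = adjust_common_titles_sketch(sample_parts, 'kim_ref')
print(adjusted[0][0])  # prints: kim_ref Introduction
```

Building a new list (rather than assigning into `parts`) is also the pattern `setup_reference_from_latex_parts` uses when it applies `adjust_common_syntax_to_markdown`, which is why the caller's `parts` survives unchanged.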
In [None]:
with tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir:
    temp_vault = Path(temp_dir) / 'test_vault_5'
    shutil.copytree(_test_directory() / 'test_vault_5', temp_vault)

    sample_latex_file = _test_directory() / 'latex_examples' / 'latex_example_with_untitled_subsections_setup_to_a_vault' / 'main.tex' 
    sample_latex_text = text_from_file(sample_latex_file)
    preamble, _ = divide_preamble(sample_latex_text)
    parts = divide_latex_text(sample_latex_text)
    cust_comms = custom_commands(preamble)
    
    setup_reference_from_latex_parts(
        parts, cust_comms, temp_vault, Path(''),
        'test_ref',
        ['Kim', 'Kim', 'Kim'])
    # os.startfile(temp_vault)
    # input()

In [None]:
# TODO: give an example for a LaTeX document with a multiline section title.
# TODO: give an example for a LaTeX document with a section title that must be sanitized first, e.g.
# in banwait_et_al_cnpgrg2c, there is the section
# `\section{Exceptional maximal subgroups of 
# \texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}}`

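The TODO above concerns section titles that must be sanitized before they can serve as note titles or folder names, such as titles that contain `\texorpdfstring` or span several lines. One possible approach, sketched below and not part of this module, replaces each `\texorpdfstring{<tex>}{<plain>}` with its plain-text argument by matching balanced braces, then collapses all whitespace. The helper names are hypothetical, and escaped braces such as `\{` are not handled.

```python
MACRO = '\\texorpdfstring'


def _read_braced_group(s: str, i: int) -> tuple[str, int]:
    """Return the contents of the brace group starting at or after `s[i]`, and the index after it."""
    while i < len(s) and s[i].isspace():
        i += 1  # allow whitespace between macro arguments
    if i >= len(s) or s[i] != '{':
        raise ValueError(f'expected an opening brace at position {i}')
    depth = 0
    for j in range(i, len(s)):
        if s[j] == '{':
            depth += 1
        elif s[j] == '}':
            depth -= 1
            if depth == 0:
                return s[i + 1:j], j + 1
    raise ValueError('unbalanced braces')


def plain_section_title(title: str) -> str:
    """Replace `\\texorpdfstring{tex}{plain}` by `plain` and collapse whitespace."""
    pieces = []
    i = 0
    while (k := title.find(MACRO, i)) != -1:
        pieces.append(title[i:k])
        _, j = _read_braced_group(title, k + len(MACRO))  # skip the TeX argument
        plain, i = _read_braced_group(title, j)           # keep the plain-text argument
        pieces.append(plain)
    pieces.append(title[i:])
    return ' '.join(''.join(pieces).split())


sample_title = ('Exceptional maximal subgroups of \n'
                r'\texorpdfstring{\(\GSp_4(\ff_\ell)\)}{GSp4Fell}')
print(plain_section_title(sample_title))  # prints: Exceptional maximal subgroups of GSp4Fell
```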
The following example sets up a reference folder from a LaTeX document with significant content before any explicitly specified sections. In particular, the reference folder contains a subfolder dedicated to the content that comes before the explicitly specified sections.

In [None]:
with tempfile.TemporaryDirectory(prefix='temp_dir', dir=os.getcwd()) as temp_dir:
    temp_vault = Path(temp_dir) / 'test_vault_5'
    shutil.copytree(_test_directory() / 'test_vault_5', temp_vault)

    sample_latex_file = _test_directory() / 'latex_examples' / 'latex_example_with_content_before_sections' / 'main.tex' 
    sample_latex_text = text_from_file(sample_latex_file)
    preamble, _ = divide_preamble(sample_latex_text)
    parts = divide_latex_text(sample_latex_text)
    cust_comms = custom_commands(preamble)
    
    setup_reference_from_latex_parts(
        parts, cust_comms, temp_vault, Path(''),
        'test_ref',
        ['Kim', 'Kim', 'Kim'])
    reference_folder = temp_vault / 'test_ref'

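    # Keep only directories: from Python 3.13 on, a glob pattern ending in `**` also matches files.
    subdirectories = [path for path in reference_folder.glob('**') if path.is_dir()]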
    relative_subdirectories = [
        os.path.relpath(subdirectory, reference_folder)
        for subdirectory in subdirectories]
    print("The following are the subdirectories of `reference_folder`, relative to `reference_folder`:")
    print(relative_subdirectories)
    assert convert_title_to_folder_name(f'test_ref {UNTITLED_SECTION_TITLE}') in relative_subdirectories

    # os.startfile(temp_vault)
    # input()

The following are the subdirectories of `reference_folder`, relative to `reference_folder`:
['.', '.obsidian', '.obsidian\\plugins', '.obsidian\\plugins\\fast-link-edit', '1_proof_of_theorem~refthmain', '1_proof_of_theorem~refthmain\\11_this_is_a_subsection', '1_proof_of_theorem~refthmain\\12_this_is_another_subsection', 'test_ref_untitled_section', '_temp']
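The `test_ref_untitled_section` folder in the output above comes from the step in `setup_reference_from_latex_parts` that retitles an untitled first chapter as `f'{reference_name} {UNTITLED_SECTION_TITLE}'`. The sketch below imitates that step on plain lists, assuming each chapter entry lists a section title followed by its subsection titles. It also assumes, hypothetically, that `UNTITLED_SECTION_TITLE` is `'Untitled Section'` and that a folder name is obtained by lowercasing and replacing whitespace with underscores; the folder names produced by the module can carry more, such as the section numbering visible in `1_proof_of_theorem~refthmain` above.

```python
import re

UNTITLED = 'Untitled Section'  # hypothetical stand-in for UNTITLED_SECTION_TITLE


def retitle_untitled_first_chapter(
        chapters: list[list[str]], reference_name: str) -> list[list[str]]:
    """Return a copy of `chapters` whose untitled first section carries the reference name."""
    chapters = [list(chapter) for chapter in chapters]  # do not mutate the input
    if chapters and chapters[0][0] == UNTITLED:
        chapters[0][0] = f'{reference_name} {UNTITLED}'
    return chapters


def folder_name_sketch(title: str) -> str:
    """Hypothetical, simplified title-to-folder-name conversion."""
    return re.sub(r'\s+', '_', title.strip()).lower()


sample_chapters = [[UNTITLED], ['Proof of theorem', 'This is a subsection']]
retitled = retitle_untitled_first_chapter(sample_chapters, 'test_ref')
print(folder_name_sketch(retitled[0][0]))  # prints: test_ref_untitled_section
```

Prefixing the reference name keeps the untitled section's folder and index file distinct across reference folders, for the same reason common titles such as `Introduction` are adjusted.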
