Unknown latex macros do not include arguments? #60
Comments
Thanks for the feedback. This behavior is by design: if a macro is unknown, there is no way to know whether a brace that follows it is an argument or ordinary LaTeX content. So we assume that it does not accept any arguments. Say you write:
\begin{equation}
\unknownsymbol [A, B] + \anotherunknownsymbol {\textstyle \frac{1}{2}} = 0
\end{equation}
then there is no way to know whether [A, B] and {\textstyle \frac{1}{2}} are arguments of the unknown macros or ordinary content that follows them. The ambiguity is not limited to math mode. Think about code like
In my opinion, the best solution here is to identify which macros are unknown and add their signatures to the latex context database. You can nevertheless customize the behavior of the
from pylatexenc import macrospec, latexwalker, latex2text
class AbsorbAllDetectedPossibleMacroArgumentsParser(macrospec.MacroStandardArgsParser):
    def parse_args(self, w, pos, parsing_state=None):
        argspec = ''
        argnlist = []
        origpos = pos
        while True:
            # inspect the following token at the given position (should skip
            # spaces if necessary)
            try:
                tok = w.get_token(pos)
            except latexwalker.LatexWalkerEndOfStream:
                break
            if tok.tok == 'char' and tok.arg.startswith('*'):
                # starred form, e.g. \macro*
                argspec += '*'
                argnlist.append(
                    w.make_node(latexwalker.LatexCharsNode,
                                parsing_state=parsing_state,
                                chars='*', pos=tok.pos, len=1)
                )
                pos = tok.pos + 1
            elif tok.tok == 'char' and tok.arg.startswith('['):
                # looks like an optional argument [...]
                (node, np, nl) = w.get_latex_maybe_optional_arg(
                    pos,
                    parsing_state=parsing_state
                )
                pos = np + nl
                argspec += '['
                argnlist.append(node)
            elif tok.tok == 'brace_open':
                # looks like a mandatory argument {...}
                (node, np, nl) = w.get_latex_expression(
                    pos,
                    strict_braces=False,
                    parsing_state=parsing_state,
                )
                pos = np + nl
                argspec += '{'
                argnlist.append(node)
            else:
                # something else -- we're guessing that it's not a macro
                # argument
                break
        # ParsedMacroArgs lives in the macrospec module
        parsed = macrospec.ParsedMacroArgs(
            argspec=argspec,
            argnlist=argnlist,
        )
        return (parsed, origpos, pos-origpos)

lw_db = latexwalker.get_default_latex_context_db()
lw_db.set_unknown_macro_spec(
    macrospec.MacroSpec("", AbsorbAllDetectedPossibleMacroArgumentsParser())
)
output = latex2text.LatexNodes2Text().latex_to_text(r"""
\documentclass{article}
\usepackage{times}
\definecolor{gray}
\RequirePackage{fixltx2e}
""", latex_context=lw_db)
# output.strip() == ""

Hope this helps!
Thanks @phfaist, this is very helpful and makes sense. I think inside of the document, this behavior is reasonable (because \textbf{blah} should parse to blah, for example), but before \begin{document} I think we can be reasonably confident (or certain? I'm not sure how LaTeX works) that \macro{argument} should not produce any text in the output document. The workaround I've been using is just to delete all of the content that appears before \begin{document} before using pylatexenc, and to handle separately the fact that this sometimes removes the title/author. I'm not sure if this is just a hack or should be incorporated into the general behavior of pylatexenc, but just as an FYI. Thanks again.
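The workaround described in this comment could be sketched as below, using only the Python standard library. The function names split_preamble and preamble_field and the sample document are hypothetical, and the regex is an assumption that only handles \title{...} and \author{...} values without nested braces.

```python
import re

DOC_START = r"\begin{document}"

def split_preamble(source):
    """Split LaTeX source at the first \\begin{document}.

    Returns (preamble, body); if the marker is absent, the preamble is empty
    and the whole source is treated as the body.
    """
    head, sep, tail = source.partition(DOC_START)
    if not sep:
        return "", source
    return head, sep + tail

def preamble_field(preamble, name):
    """Return the argument of \\<name>{...} in the preamble, or None.

    Simplification: the argument must not contain nested braces.
    """
    match = re.search(r"\\" + re.escape(name) + r"\{([^{}]*)\}", preamble)
    return match.group(1) if match else None

sample = r"""\documentclass{article}
\usepackage{times}
\title{A Short Note}
\author{A. Writer}
\begin{document}
\maketitle
Hello \textbf{world}.
\end{document}
"""

preamble, body = split_preamble(sample)
title = preamble_field(preamble, "title")
author = preamble_field(preamble, "author")
# `body` can now be passed to pylatexenc; title/author are recovered separately.
```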
Yes, you're right that usually in a LaTeX document before
There are inherent limitations in
Another strategy (outlined in issue #48) can be to start parsing the document from the top, one node at a time, while ignoring errors along the way, and while inspecting the nodes that you get for information you might be interested in (such as
Thanks again for the feedback!
That makes sense. Thanks again for the tips.
Hi, thanks for this useful utility, and sorry for the usage question. I'm finding that the following code
will output
Basically, the arguments to the first two macros get removed, whereas the arguments for the second two don't. I think this may be because the first two macros are built-in LaTeX macros, whereas the latter two are custom ones, so maybe pylatexenc doesn't know how many arguments there should be, so it defaults to assuming "no arguments". If I run this:
then indeed I get
Is there any way to make pylatexenc automatically try to grab as many arguments as possible from unknown macros? Thanks a lot!