Allow for exception when converting #62

Closed
RossWilliamson opened this issue Apr 27, 2021 · 3 comments

@RossWilliamson

It would be good to have an exception list when doing the conversion. For example, I would like to keep \ref as \ref in order to put label markers in prior to LaTeX. Right now that gets printed just as \ref.

@phfaist
Owner

phfaist commented Apr 28, 2021

Hi, thanks for the issue. You can already achieve the desired result by specifying custom text conversions (see https://pylatexenc.readthedocs.io/en/latest/latex2text/):

```python
from pylatexenc import latex2text

l2t_context = latex2text.get_default_latex_context_db()
l2t_context.add_context_category('preserve-custom-macros', prepend=True, macros=[
    latex2text.MacroTextSpec('ref', simplify_repl=r'\ref{%(1)s}'),
])
l2t = latex2text.LatexNodes2Text(latex_context=l2t_context)

latex = r'\emph{For the definition of $\alpha$, see also:} \ref{eq:a} \& \ref{eq:b}'
converted = l2t.latex_to_text(latex)
print(converted)
# outputs →  For the definition of α, see also: \ref{eq:a} & \ref{eq:b}
```
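To keep several macros rather than just \ref (the "exception list" asked for above), one option is to generate one template per macro name. This is a plain-Python sketch that assumes every preserved macro takes exactly one mandatory argument; the helper name `preserve_templates` is hypothetical. It also shows how a `%(1)s` placeholder, like the one in the `simplify_repl` string above, expands under Python %-formatting with argument-number keys.

```python
def preserve_templates(macro_names):
    """Map each macro name to a template that re-emits the macro verbatim.

    Hypothetical helper: assumes every macro takes exactly one mandatory
    argument, referenced in the template as %(1)s.
    """
    # e.g. 'ref' -> \ref{%(1)s}
    return {name: '\\' + name + '{%(1)s}' for name in macro_names}


templates = preserve_templates(['ref', 'eqref', 'label'])
print(templates['ref'])                     # \ref{%(1)s}
# %-formatting with a dict keyed by the argument number fills the placeholder.
print(templates['eqref'] % {'1': 'eq:a'})   # \eqref{eq:a}
```

Each entry could then be passed as `latex2text.MacroTextSpec(name, simplify_repl=template)` inside the `macros=[...]` list given to `add_context_category`.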

I'm closing this issue; feel free to reopen it if I'm missing anything.

phfaist closed this as completed Apr 28, 2021
@RossWilliamson
Author

RossWilliamson commented Apr 29, 2021

Thanks! I was wondering how to do the same with latexencode rather than latex2text. I have a string containing a deliberate \ref that I need to preserve. I tried the following:

```python
import re
from pylatexenc import latexencode

cr = [
    latexencode.UnicodeToLatexConversionRule(latexencode.RULE_REGEX, [
        (re.compile(r'\\ref'), r'\\ref'),
    ], replacement_latex_protection='none'),
    'defaults'
]

u_to_l = latexencode.UnicodeToLatexEncoder(conversion_rules=cr)

u_to_l.unicode_to_latex(r'\ref{sec:pp:qq}')
```

but it returns \ref\{sec:pp:qq\}, i.e. it escapes the curly braces, which is not what I want.
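The behaviour described above can be mimicked with a toy model. The sketch below is not pylatexenc's implementation; it assumes, as a simplification, that regex rules are tried in order at each position of the string and the first match wins, with a tiny hypothetical escape table standing in for the 'defaults' rules. A pattern matching only `\ref` consumes just the macro name, so the braces fall through to the default escaping; a pattern that also consumes the braced argument keeps them intact.

```python
import re

# Hypothetical stand-in for the library's default escaping of special characters.
DEFAULT_ESCAPES = {'{': r'\{', '}': r'\}', '&': r'\&', '_': r'\_'}


def toy_encode(text, regex_rules):
    """Toy left-to-right encoder (not pylatexenc's code): at each position,
    try the regex rules in order; if none matches, escape the single
    character via DEFAULT_ESCAPES."""
    out = []
    pos = 0
    while pos < len(text):
        for pattern, repl in regex_rules:
            m = pattern.match(text, pos)
            if m and m.end() > pos:
                # Expand the replacement template (\\, \1, ...) against the match.
                out.append(m.expand(repl))
                pos = m.end()
                break
        else:
            ch = text[pos]
            out.append(DEFAULT_ESCAPES.get(ch, ch))
            pos += 1
    return ''.join(out)


name_only = [(re.compile(r'\\ref'), r'\\ref')]
with_arg = [(re.compile(r'\\ref\{([^\}]+)\}'), r'\\ref{\1}')]

print(toy_encode(r'\ref{sec:pp:qq}', name_only))  # \ref\{sec:pp:qq\}
print(toy_encode(r'\ref{sec:pp:qq}', with_arg))   # \ref{sec:pp:qq}
```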

@phfaist
Owner

phfaist commented Apr 30, 2021

Try:

```python
import re
from pylatexenc import latexencode

cr = [
    latexencode.UnicodeToLatexConversionRule(latexencode.RULE_REGEX, [
        (re.compile(r'\\ref\{([^\}]+)\}'), r'\\ref{\1}'),
    ], replacement_latex_protection='none'),
    'defaults'
]

u_to_l = latexencode.UnicodeToLatexEncoder(conversion_rules=cr)

print(u_to_l.unicode_to_latex(r'See \ref{sec:pp:qq} for α=β'))
# prints: See \ref{sec:pp:qq} for \ensuremath{\alpha}=\ensuremath{\beta}
```

Also, using this regular expression rule, no escaping will happen within the argument of the \ref macro.
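If several macros need to pass through unchanged, one option is to build a single pattern from a list of names. This is a sketch under the same assumption as the rule above (one mandatory argument without nested braces); the helper name `passthrough_rule` is hypothetical. The returned `(pattern, replacement)` pair has the same shape as the entry in the `RULE_REGEX` list above, so it could be placed in that list; only the regex itself is exercised here, with plain `re`.

```python
import re


def passthrough_rule(macro_names):
    """Return a (compiled_regex, replacement) pair matching \\name{arg} for
    any listed name and re-emitting it verbatim.

    Hypothetical helper: assumes one mandatory argument without nested braces.
    """
    alternatives = '|'.join(re.escape(name) for name in macro_names)
    pattern = re.compile(r'\\(' + alternatives + r')\{([^\}]+)\}')
    # In the replacement, \1 is the macro name and \2 its argument.
    return pattern, r'\\\1{\2}'


pattern, repl = passthrough_rule(['ref', 'eqref', 'label'])
print(pattern.sub(repl, r'See \eqref{eq:a} and \ref{sec:b}'))
# See \eqref{eq:a} and \ref{sec:b}
print(pattern.match(r'\cite{x}'))  # None: \cite is not in the list
```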
