Feature request - option in utf8tolatex to maintain capitalisation for BibTeX #21

tvercaut · 2019-06-08T17:49:16Z

It would be great to have an option to keep custom capitalisation for BibTeX.

For example

TCP: The Capitalisation Example

would be encoded as

{TCP}: The Capitalisation Example

For now, I am using code borrowed from https://openreview-py.readthedocs.io/en/latest/_modules/tools.html#get_bibtex in combination with utf8tolatex:

import re
from pylatexenc.latexencode import utf8tolatex

def capitalize_title(title):
    # In each word, wrap the first run of two or more capitals in braces
    capitalization_regex = re.compile(r'[A-Z]{2,}')
    words = re.split(r'(\W)', title)
    for idx, word in enumerate(words):
        m = capitalization_regex.search(word)
        if m:
            new_word = '{' + m.group(0) + '}'
            words[idx] = word.replace(m.group(0), new_word)
    return ''.join(words)

orig_title = "TCP: The Capitalisation Example"
bibtex_title = capitalize_title(utf8tolatex(orig_title))
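One caveat: because `capitalize_title` runs after `utf8tolatex`, it also sees LaTeX macro names made of capital letters. With a macro such as `\AA` (a common LaTeX encoding of Å), the fragment `{\AA}` is split into `{`, `\`, `AA`, `}` and comes out as the unbalanced `{\{AA}}`. A minimal sketch of a variant that skips any capital run preceded by a backslash or another word character, assuming the input is already LaTeX-encoded; `protect_acronyms` and `ACRONYM_RE` are hypothetical names, not part of pylatexenc:

```python
import re

# Hypothetical pattern: a run of 2+ capitals (plus the rest of the word) that is
# not preceded by a backslash or a word character, so macros like \AA are skipped.
ACRONYM_RE = re.compile(r"(?<![\\\w])([A-Z]{2,}\w*)")


def protect_acronyms(latex_title):
    """Wrap acronyms in an already LaTeX-encoded title in protective braces."""
    return ACRONYM_RE.sub(r"{\1}", latex_title)


print(protect_acronyms(r'{\AA}ngstr{\"o}m-scale NMR'))  # {\AA}ngstr{\"o}m-scale {NMR}
```

Unlike `capitalize_title`, this wraps the whole word (e.g. `{TCPs}` rather than `{TCP}s`), which matches the `\b([A-Z]{2,}\w*)\b` rule that appears later in this thread.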


furutaka · 2019-06-08T23:44:29Z

Hi,

I don't understand your point. You want an option to surround/protect some parts with braces? My understanding is that the BibTeX entry itself should be protected...

tvercaut · 2019-06-09T08:34:09Z

Sorry if this wasn't completely clear. I am trying to create BibTeX files programmatically. Depending on the bibliography style, BibTeX may render titles in title case or sentence case, irrespective of the capitalisation used in the .bib file. To avoid interfering with mandatory capitalisation, the parts of a title that must stay capitalised (e.g. acronyms) should be protected by braces in the .bib file.

Let's consider this example title

AET: An exposé of titles

In the bibliography of a compiled LaTeX document, this might be displayed as

AET: An Exposé of Titles

The corresponding part of the bibtex file should be

title={{AET}: An expos{\'e} of titles}

Note that wrapping the whole title in double braces would prevent BibTeX from applying the capitalisation of the bibliography style, and is thus not wanted:

title={{AET: An expos{\'e} of titles}}

would produce "AET: An exposé of titles" in the PDF rather than "AET: An Exposé of Titles".
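To see why the braces matter, here is a simplified Python emulation of what BibTeX's `change.case$` does with the `"t"` option (which styles such as `plain` apply to titles to get sentence case): text inside braces is left alone, and outside braces everything is lowercased except the first character and a character that follows a colon and whitespace. This is a sketch under simplifying assumptions; `bibtex_sentence_case` is a hypothetical name, and real BibTeX still lowercases "special characters" such as `{\'e}` instead of treating every brace group as fully protected.

```python
def bibtex_sentence_case(title):
    r"""Simplified emulation of BibTeX's change.case$ with the "t" option.

    Assumptions: braces are balanced, and every brace group is treated as fully
    protected (real BibTeX still lowercases special characters like {\'e}).
    """
    out = []
    depth = 0  # current brace nesting level
    prev_colon = False  # a ':' was seen at depth 0, with only whitespace since
    for i, ch in enumerate(title):
        if ch == "{":
            depth += 1
            out.append(ch)
        elif ch == "}":
            depth -= 1
            out.append(ch)
        elif depth > 0:
            out.append(ch)  # brace-protected text keeps its case
        else:
            # Keep the first character, and a character preceded by whitespace after a colon.
            keep = i == 0 or (prev_colon and title[i - 1].isspace())
            out.append(ch if keep else ch.lower())
            if ch == ":":
                prev_colon = True
            elif not ch.isspace():
                prev_colon = False
    return "".join(out)


for t in ["AET: An Exposé of Titles",
          r"{AET}: An Expos{\'e} of Titles",
          r"{AET: An Expos{\'e} of Titles}"]:
    print(bibtex_sentence_case(t))
```

The unprotected title comes out as "Aet: An exposé of titles", the brace-protected acronym survives as "{AET}: An expos{\'e} of titles", and the double-braced title is returned unchanged, so the style can no longer adjust its case at all.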

Below is a small Python script to help explore this question:

#!/usr/bin/env python3

## Import statements
import sys
import re
from pylatexenc.latexencode import utf8tolatex

## Function to surround acronyms with braces
def capitalize_title(title):
    capitalization_regex = re.compile(r'[A-Z]{2,}')
    words = re.split(r'(\W)', title)
    for idx, word in enumerate(words):
        m = capitalization_regex.search(word)
        if m:
            new_word = '{' + m.group(0) + '}'
            words[idx] = word.replace(m.group(0), new_word)
    return ''.join(words)

## Encode non-ASCII characters, then protect acronyms
def utf8tobibtex_title(title):
    return capitalize_title(utf8tolatex(title))

orig_titles = [ "AET: An Exposé of Titles", "AET: An exposé of titles" ]

## Extra titles can be passed as command-line arguments
orig_titles.extend(sys.argv[1:])

for orig_title in orig_titles:
    bibtex_title = utf8tobibtex_title(orig_title)
    print("===")
    print("orig_title\n" + orig_title + "\n")
    print("utf8tolatex(orig_title)\n" + utf8tolatex(orig_title) + "\n")
    print("utf8tobibtex_title(orig_title)\n" + bibtex_title + "\n")

    print("Title in bibtex context")
    print("title={" + bibtex_title + "},\n")

phfaist · 2019-06-10T13:14:38Z

Thanks for your feedback. My impression is that the functionality that you're suggesting is a bit orthogonal to the purpose of pylatexenc.latexencode, which is meant to provide a lightweight and straightforward conversion of non-ASCII characters into corresponding LaTeX encoding sequences. It sounds like your suggestion would only target a rather specific use case, namely the protection of acronyms in the generation of BibTeX entries.

However, I've been meaning to make utf8tolatex() extensible so that it can perform some smarter encodings, like transforming "..." (three dots) into "\ldots". The idea would be to provide a way to specify custom rules, in the same spirit as the MacroDefs in pylatexenc.latex2text. I think this would be a good way to solve your problem: you could specify a rule whereby any word with two or more capital letters is output with surrounding protective braces.
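The rule-based idea described here can be sketched in plain Python: at each position, try a list of regex rules first and fall back to a per-character table otherwise. This only illustrates the concept and is not pylatexenc's implementation; `encode_with_rules`, `RULES` and the three-entry `FALLBACK` table are hypothetical.

```python
import re

# Hypothetical per-character fallback table (a real encoder has many more entries).
FALLBACK = {"é": r"{\'e}", "&": r"\&", "%": r"\%"}

# Hypothetical rules: (compiled regex, replacement template expanded with Match.expand).
RULES = [
    (re.compile(r"\.\.\."), r"\\ldots{}"),          # "..." -> \ldots{}
    (re.compile(r"\b([A-Z]{2,}\w*)\b"), r"{\1}"),   # protect acronyms with braces
]


def encode_with_rules(text, rules=RULES, fallback=FALLBACK):
    """Encode text, trying regex rules at each position before the fallback table."""
    out = []
    pos = 0
    while pos < len(text):
        for pattern, repl in rules:
            m = pattern.match(text, pos)
            if m and m.end() > pos:  # skip empty matches so the loop always advances
                out.append(m.expand(repl))
                pos = m.end()
                break
        else:
            # No rule matched here: encode a single character via the table.
            ch = text[pos]
            out.append(fallback.get(ch, ch))
            pos += 1
    return "".join(out)


print(encode_with_rules("AET: An Exposé of Titles..."))  # {AET}: An Expos{\'e} of Titles\ldots{}
```

Because rules are tried before the character table, an acronym rule can emit braces without them being escaped, while everything else still goes through the ordinary character mapping.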

Hopefully I'll be able to get to this soon.

tvercaut · 2019-06-11T08:19:11Z

OK thanks. As this is not on the roadmap, I will close this ticket.

As an off-topic side note, I also tried pylatex.utils.escape_latex and saw that it encodes line breaks as \newline, which was not what I wanted here but is probably useful in some contexts.

phfaist · 2019-07-26T12:24:40Z

Hi again. I'm working on a pylatexenc 2.0 release that would allow you to do what you were suggesting. Could you test the new version and see if it meets your needs? I'm happy to hear your feedback.

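import re
from pylatexenc import latexencode
from pylatexenc.latexencode import UnicodeToLatexEncoder

u = UnicodeToLatexEncoder(
    conversion_rules=[
        latexencode.UnicodeToLatexConversionRule(
            latexencode.RULE_REGEX,
            [ (re.compile(r'([{}])'), r'\1'), # keep existing braces
              (re.compile(r'\b([A-Z]{2,}\w*)\b'), r'{\1}'), ] # protect acronyms
        ),
    ] + latexencode.get_builtin_conversion_rules('defaults')
)
input_string = "AET: An Exposé of Titles"
result = u.unicode_to_latex(input_string)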

See the updated documentation: https://pylatexenc.readthedocs.io/en/latest/latexencode/

To install the development version, clone the git repo, then in the cloned directory run the commands:

python setup.py sdist
pip install dist/pylatexenc-2.0b0.tar.gz

tvercaut closed this as completed Jun 11, 2019

tvercaut mentioned this issue Jun 14, 2019

Improved BibTeX generation openreview/openreview-py#317 (Open)
