BibTeX export: Output non-ascii characters wrapped in curly braces: {\AA} instead of \AA{} #620

Closed
skjaeve opened this issue Dec 29, 2016 · 10 comments


@skjaeve

skjaeve commented Dec 29, 2016

BBT version 1.6.89, Firefox 50.1.0, Fedora Linux 25

When exporting from Zotero using the "Better BibTeX" exporter, non-ASCII characters are converted to TeX commands, e.g. "Å" is converted to "\AA" and "æ" to "\ae". But these commands are not wrapped in curly braces, so bibtex/bibtex8 don't treat them as single characters. This causes problems when generating initials from first names.

Example: Exported entry in .bib file:

@article{skjaeveland2011,
  title = {On the Relationship between Flux Transfer Events, Temperature Enhancements and Ion Upflow Events in the Cusp Ionosphere},
  doi = {10.1029/2011JA016480},
  timestamp = {2016-05-02T13:20:15Z},
  urldate = {2011-08-23},
  journal = {J. Geophys. Res.},
  author = {Skj\ae{}veland, \AA{}smund and Moen, J\o{}ran Idar and Carlson, Herbert C.},
  year = {2011},
  keywords = {2407 Auroral ionosphere,2475 Polar cap ionosphere,2704 Auroral phenomena,2706 Cusp,anisotropy,Cusp,EISCAT,heating,upflow},
  pages = {A10305}
}

becomes this in .bbl file:


\bibitem[{\textit{Skj\ae{}veland et~al.}(2011)\textit{Skj\ae{}veland, Moen, and
  Carlson}}]{skjaeveland2011}
Skj\ae{}veland, A., J.~I. Moen, and H.~C. Carlson (2011), On the relationship
  between flux transfer events, temperature enhancements and ion upflow events
  in the cusp ionosphere, \textit{J. Geophys. Res.}, p. A10305,
  \doi{10.1029/2011JA016480}.

That is, "\AA{}smund" is abbreviated to "A." rather than to "\AA{}.". If I manually edit the .bib file and change the author line to:
author = {Skj\ae{}veland, {\AA}smund and Moen, J\o{}ran Idar and Carlson, Herbert C.},

the .bbl file has the correct initial:
Skj\ae{}veland, {\AA}., J.~I. Moen, and H.~C. Carlson (2011), On the

and the final bibliography looks correct.
(\ae{} and \o{} should probably also be wrapped in curly braces, but it doesn't cause any issues for me.)

References:
http://tex.stackexchange.com/questions/57743/how-to-write-%C3%A4-and-other-umlauts-and-accented-letters-in-bibliography
http://tex.stackexchange.com/questions/62522/capital-%C3%98-scandinavian-letter-in-bibtex

@retorquere
Owner

@skjaeve a quick workaround would be to add an end-of-guarded-area character just after the Å, which should cause BBT to output it as Skj\ae{}veland, {\relax \AA{}}smund. I think that will force the proper initial handling.

Are you using Better BibTeX or Better BibLaTeX BTW? Not that it will make much difference unless you're on a very new version of biblatex (biber 2.7+/biblatex 3.5+). If you are, you can enable extended name format in the settings, add the end of guarded area marker, and that should take care of the initials problem.

@njbart, opinions? That A is deduced to be the initial from \AA{}smund sounds like a biblatex bug to me.

@skjaeve
Author

skjaeve commented Dec 29, 2016

This is the "Better BibTeX" exporter. It's not an issue with the "Better BibLaTeX" exporter, which exports without converting characters and understands Unicode. Unfortunately, switching to BibLaTeX is not an option as the journal wants BibTeX format. The "extended name format" seems to make no difference for BibTeX.

I'm using biblatex 3.7/biber 2.6 from TeX Live 2016 when I use BibLaTeX, and bibtex8 3.71.

(I've corrected confusion of bibtex/biblatex in the text.)

@retorquere
Owner

@njbart sorry to bother you again but I really can't move forward on this without your input.

@njbart
Contributor

njbart commented Jan 25, 2017

So this is about bibtex, not biblatex? With all due caveats (I'm not using bibtex myself any longer, and I haven't done any testing at all): My guess is that accented characters do need to be wrapped in braces for bibtex to handle them properly.

See Patashnik, Oren. 1988. ‘BibTeXing’. http://mirrors.ctan.org/biblio/bibtex/base/btxdoc.pdf, p. 3 f.:

  1. BibTeX now handles accented characters. For example if you have an entry with the two fields
     author = "Kurt G{\"o}del",
     year = 1931,

and if you’re using the alpha bibliography style, then BibTeX will construct the label [Göd31] for this entry, which is what you’d want. To get this feature to work you must place the entire accented character in braces; in this case either {\"o} or {\"{o}} will do. Furthermore these braces must not themselves be enclosed in braces (other than the ones that might delimit the entire field or the entire entry); and there must be a backslash as the very first character inside the braces. Thus neither {G{\"{o}}del} nor {G\"{o}del} will work for this example.
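To make the quoted rule concrete, here is a simplified Python model of how BibTeX picks a first-name initial. It is a sketch under stated assumptions, not BibTeX's actual `format.name$` code: all function names are hypothetical, it only implements the "special character" rule quoted above and otherwise takes the first plain ASCII letter, and it joins initials with spaces instead of `~`. Under this model, `\AA{}smund` yields `A.` (as reported in this issue), while `{\AA}smund` and the suggested `{\relax \AA{}}smund` workaround keep the whole brace group.

```python
from __future__ import annotations


def _brace_group(token: str, start: int) -> str:
    """Return the balanced brace group that opens at token[start]."""
    depth = 0
    for j in range(start, len(token)):
        if token[j] == "{":
            depth += 1
        elif token[j] == "}":
            depth -= 1
            if depth == 0:
                return token[start : j + 1]
    return token[start:]  # unbalanced input: keep the rest of the token


def first_initial(token: str) -> str:
    """Simplified model of the initial BibTeX keeps for one first-name token."""
    depth = 0  # brace depth inside the token (field delimiters excluded)
    for i, ch in enumerate(token):
        if ch == "{":
            # "Special character": a top-level group whose first char is a backslash.
            if depth == 0 and token[i + 1 : i + 2] == "\\":
                return _brace_group(token, i)
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch.isascii() and ch.isalpha():
            # Otherwise the first plain ASCII letter wins, so the backslash
            # of an unbraced \AA{} is skipped and "A" is taken.
            return ch
    return ""


def split_tokens(names: str) -> list[str]:
    """Split on whitespace that is not inside braces."""
    tokens, current, depth = [], [], 0
    for ch in names:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch.isspace() and depth == 0:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def abbreviate(first_names: str) -> str:
    """Abbreviate every first-name token to its initial followed by a period."""
    return " ".join(first_initial(t) + "." for t in split_tokens(first_names))


for name in [r"\AA{}smund", r"{\AA}smund", r"{\relax \AA{}}smund", r"J\o{}ran Idar"]:
    print(name, "->", abbreviate(name))
```

The depth counter is what enforces the "must not themselves be enclosed in braces" condition: a group like `{{\"o}}` is not treated as a special character in this model.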

@retorquere
Owner

OK, thanks. Technically it's not a big change to switch from \"o{} to {\"o} (it's just a precedence rule order change), but it messes with the post-translation cleanup algorithm.

BBT works on a char-by-char basis, so Gææst is first translated to G\ae{}\ae{}st; the post-translation cleanup then removes every {} that is followed by a character that is not a letter or a number, yielding the cleaner G\ae\ae{}st.
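A minimal Python sketch of the translate-then-clean-up step described above, assuming a tiny hypothetical character table (BBT's real tables are much larger and are not reproduced here). The cleanup regex drops a `{}` only when another character that is not a letter or digit follows it, which is the rule as stated in this comment; how BBT handles a `{}` at the very end of a field is not covered by this sketch.

```python
import re

# Hypothetical excerpt of a Unicode-to-LaTeX table; BBT's real tables differ.
LATEX = {"æ": r"\ae{}", "ø": r"\o{}", "Å": r"\AA{}"}


def translate(text: str) -> str:
    """Char-by-char pass: mapped characters become \\cmd{}, the rest pass through."""
    return "".join(LATEX.get(ch, ch) for ch in text)


def cleanup(latex: str) -> str:
    """Drop a {} that is followed by a character that is not a letter or digit."""
    return re.sub(r"\{\}(?=[^A-Za-z0-9])", "", latex)


raw = translate("Gææst")
print(raw)           # G\ae{}\ae{}st
print(cleanup(raw))  # G\ae\ae{}st
```

The `{}` before a letter has to stay, because without it TeX would read `\aest` as a single (undefined) control sequence.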

Changing the precedence rules to favor outer braces would give me G{\ae}{\ae}st, and I can't safely rewrite }{<not a letter or a number> to <not a letter or a number> in the same way. So both bibtex and biblatex output would get a lot more verbose, as they use the same translation tables.

I'm trying to find a way to still do the cleanup safely, but I don't see one yet.

@retorquere
Copy link
Owner

Can you try https://github.com/retorquere/zotero-better-bibtex/releases/download/builds/zotero-better-bibtex-1.6.91-br620-3484.xpi? It should do what you want at minimal damage to the cleanup process.

@skjaeve
Author

skjaeve commented Jan 29, 2017

This works as expected; the problem appears to be solved from my point of view.

@retorquere
Copy link
Owner

OK, super. I have tests running on the merge, but I don't see why they shouldn't just pass (famous last words). This change uses the {\...} form for accented characters (for which I use the rule: character code in 0x00C0 .. 0x024F, excluding 0x00D7 and 0x00F7), and the \...{} form for everything else.
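The form-selection rule described above can be sketched as follows. The command table is a hypothetical excerpt, and `is_accented`/`to_latex`/`translate` are illustrative names, not BBT functions. The `{}` cleanup step is deliberately left out here, so with this rule `Gææst` comes out as `G{\ae}{\ae}st`, the more verbose form mentioned earlier in the thread.

```python
# Hypothetical excerpt of a character-to-command table; BBT's real table differs.
COMMANDS = {
    "Å": r"\AA",
    "æ": r"\ae",
    "ø": r"\o",
    "ö": r'\"o',
    "×": r"\texttimes",
    "§": r"\S",
}


def is_accented(ch: str) -> bool:
    """Rule from the thread: U+00C0..U+024F, except U+00D7 (×) and U+00F7 (÷)."""
    cp = ord(ch)
    return 0x00C0 <= cp <= 0x024F and cp not in (0x00D7, 0x00F7)


def to_latex(ch: str) -> str:
    command = COMMANDS.get(ch)
    if command is None:
        return ch  # not in the table: pass through unchanged
    if is_accented(ch):
        return "{" + command + "}"  # {\AA}: BibTeX sees one special character
    return command + "{}"  # \texttimes{}: the empty group terminates the command


def translate(text: str) -> str:
    return "".join(to_latex(ch) for ch in text)


print(translate("Åsmund"))  # {\AA}smund
print(translate("Gödel"))   # G{\"o}del
print(translate("Gææst"))   # G{\ae}{\ae}st
```

Excluding U+00D7 and U+00F7 makes sense because those two code points sit inside the Latin-1 letter range but are the multiplication and division signs, not letters.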

@retorquere
Copy link
Owner

OK, it's been merged to master and will be released sometime before tomorrow evening. I'm trying to get a few more changes into the next release.

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 17, 2021

3 participants