Capitalization: Capitalize all title-fields for language "en" #383
Comments
Do you mean for CSL JSON or for BBT? I'm not entirely certain about capitalizing user input; my idea is that BBT discloses user intent as best as possible given the impedance mismatch between the formats. User intent for capitalization is, I think, best expressed by the user capitalizing titles as desired. |
If users enter To get the same in bibtex and biblatex, there is no other option than to convert the title to This, I would argue, respects user intent as best as possible. |
Interesting. Which processor renders it that way? Not BibTeX then. I'm still not entirely convinced. Adding braces around If you input |
BTW, how should this interact with caps preservation? Surely you wouldn't want |
Or would you want non-capitalised non-filler words to be capitalised, and capitalised non-filler words to be braced? How about something like iPod? This would be capitalised in this scheme. I'm not too keen on the |
Ugh, I can't add things to the reference edit pane without some crazy shady monkey patching. That is going to be too brittle. On the whole reference is not a problem though. |
Why specifically for english though? Doesn't this apply to other languages equally? |
I have some ideas on how to get this to work, but I'll probably put it behind a preference |
Is the list of fields that should be capitalised the same as the list that should get preserve caps? |
To clarify your earlier question, this doesn't need to be applied to CSL JSON, since citeproc handles the capitalization already; Zotero recommends that all titles be stored sentence-case. |
That was my earlier point actually. Why not have the user store the titles sentence-cased in the first place? |
That is the official recommendation. |
I've taken a few stabs at it but it gets increasingly messy and fragile. I'm sorry, but I'm not going to honor this one. |
No, only English has both title-case and sentence-case styles. |
Exactly.
EDIT: “iPod” shouldn’t be capitalised by BBT, and it should be protected. |
Yes. bib(la)tex needs titles in title case, and those words that must not be lowercased again by sentence-case styles such as biblatex-apa need protection. |
That’d be a pity. It’s necessary since the conventions of bib(la)tex and CSL are incompatible: bib(la)tex expects titles in title-case, and words that must not be lowercased must be protected, but CSL expects titles in sentence-case, and words that must not be uppercased must be protected. (The latter doesn’t happen so very often, but without protection CSL title-case styles would turn, e.g., “nm” (nanometer) into “Nm” (Newtonmeter), something that should really be avoided.)
Over at pandoc, we’ve been through this whole exercise when writing pandoc-citeproc’s biblatex -> CSL converter (the inverse of what I’d like BBT to do), but it’s not that complicated after all, and seems to work great. |
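To make the direction discussed above concrete (sentence-case input, title-case plus brace protection for bib(la)tex), here is a minimal sketch. It is not BBT's or pandoc-citeproc's code; the function name `protect_and_capitalise` and its rules are assumptions for illustration only. It braces any word that carries its own capitals (such as "iPod" or a proper noun after the first word) and capitalises the remaining all-lowercase words. Small-word handling and markup are deliberately left out.

```python
def protect_and_capitalise(title: str) -> str:
    """Hypothetical sketch: sentence-case title -> title case with braces."""
    out = []
    for i, word in enumerate(title.split(" ")):
        # Capitals after the first character (iPod, LaTeX) are always intentional.
        has_inner_upper = any(c.isupper() for c in word[1:])
        # A leading capital is only meaningful when it is not the first word.
        starts_upper = word[:1].isupper()
        if has_inner_upper or (starts_upper and i > 0):
            # Brace the word so sentence-case styles cannot lowercase it.
            out.append("{" + word + "}")
        elif word.islower():
            # Plain lowercase word: capitalise it for title case.
            out.append(word[:1].upper() + word[1:])
        else:
            # First word with a leading capital, numbers, and so on: keep as-is.
            out.append(word)
    return " ".join(out)


print(protect_and_capitalise("the iPod in Europe"))
```

Note that this naive version also capitalises "in"; a real converter would skip small words.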
But how would I know that iPod should be excluded from capitalization? And why would it not be better to assume title-case and convert title-case to sentence-case for CSL? That seems to be a lot simpler to me. |
This one is not going to be easy. It will require rethinking of the way I convert the HTML-ish input to LaTeX. |
Sweet, that's simply the current behavior. |
There's still a fair number of cases where I think the title caser doesn't do the right thing: https://bitbucket.org/fbennett/citeproc-js/issues/191/a-is-uppercased-in-the-title-caser |
I've worked around most of those by feeding the titlecaser just plain text. So we're getting close on this one. What should be done with |
Is there a list of words that biblatex expects to be lowercase in title case? I know "and" and "or" are supposed to stay lowercase, but what about words like "after"? |
I’m still puzzled why you seem to be having such difficulties with Still,
Hmm, in Zotero, from a title
Again, I think that’s a citeproc-js bug.
There’s no official bib(la)tex list; bib(la)tex expects the user to enter titles in correct title case (which some styles then convert to sentence case; never the other way around). Style manuals differ a little here, but the citeproc-js list of small words is a good approximation. |
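As a rough illustration of the small-words approach, the sketch below title-cases plain text while leaving a short list of small words lowercase, except in first or last position. The `SMALL_WORDS` set here is a made-up sample, not the actual citeproc-js list, and `title_case` is a hypothetical helper, not part of any of the tools mentioned in this thread.

```python
# Assumed sample list for illustration; the real citeproc-js list differs.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for",
               "in", "of", "on", "or", "the", "to"}


def title_case(text: str) -> str:
    """Hypothetical sketch: capitalise words, keeping small words lowercase."""
    words = text.split()
    last = len(words) - 1
    result = []
    for i, word in enumerate(words):
        if not word.islower():
            # The word already carries capitals (or has no letters): leave it alone.
            result.append(word)
        elif 0 < i < last and word in SMALL_WORDS:
            # Small words stay lowercase unless they open or close the title.
            result.append(word)
        else:
            result.append(word[0].upper() + word[1:])
    return " ".join(result)


print(title_case("war and peace after the fall"))
```

With this sample list "after" gets capitalised, which is exactly the kind of case where style manuals differ.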
BTW, citeproc-js is currently changing some of the titlecaser’s details, and from what it looks like neither quotes nor parentheses, nor HTML-like markup will protect against case conversion from now on. See |
It does, but sometimes it just dies when I feed it valid input; other times, it just doesn't title-case right; it seems the
The official recommendation is however to enter titles in sentence-case, right? So that would have to be
OK, so I could just wait this one out.
But then what is "correct title case"? I'm going with the smallwords from the CSL titlecaser, in any case, .... wow, that thread is active! The progression there seems promising, so I'll just wait for the results of that, but there's another reason I may want to feed only plaintext to the title caser; BBT supports |
BTW the title caser doesn't deal with words in quotes consistently;
I've added both cases to the citeproc-js issue tracker, but it looks like I can't post to the xbiblio thread you linked to. |
You have to subscribe at https://lists.sourceforge.net/lists/listinfo/xbiblio-devel. |
Ah, mailing list, not forum. Looks like a lot of these issues were in fact already handled; I've pulled in the latest citeproc and things look near perfect. Tests running again. |
OK, so just 6 or so more title caser problems and this feature should be finished. |
I've released the other recent changes we concocted as part of 1.6.6; I'll release this one when the tests go green, pending changes in the citeproc titlecaser. You seem to be in the loop on this -- can you alert me when you think something has changed? I'm also watching the citeproc-js issues list. |
Activity on the citeproc title caser has been a little low lately, so I've given another one a shot; only these cases do not pass, and if I remove "that" from the shortwords list (the CSL title caser does have it, but it seems to be smart about "that is") I get this. Neither is perfect, but the first seems preferable over the existing title caser. What do you think? |
Sorry, that should have been this for the version that doesn't have "that" in the smallWords list. |
Adding "their" to the smallwords list leaves a single failing case but one that also fails in the same way with the CSL title caser. |
I see no activity on citeproc-js currently, and the alternative titlecaser passes all my tests, so I've merged to master. Next release will have the feature. |
@nickbart1980 says: