Refine titlecasing #1247
@moewew Sorry I've changed the plan a bit. This one I think would not require any immediate |
@gmilde Hopefully more to your taste: no immediate change to existing functions. |
Note that I've not at present added a 'word exception' mechanism, though that is trivial if we stick with the 'word = between two spaces' approximation. Thoughts on this aspect welcome. |
So just to be sure: This implements Does "capitalise" here mean roughly (barring special cases like Dutch "IJ") uppercase the first letter and leave the rest alone, or does it mean uppercase the first letter and lowercase the rest?
|
Is it necessary to introduce "titlecase_once"? |
@car222222 I was thinking of the existing |
I think the argument for using consistent names is compelling, even if "firstword" is slightly clearer when seen without context or without knowledge of expl3. We always use "once" in the sense of "first occurrence", and thus using the standard name is on the whole better in my opinion. |
@FrankMittelbach I am happy that "once" always means the first occurrence, but here there is nothing to indicate clearly what is occurring! Using "wordonce" (better "oneword") and "allwords" would be clearer. In fact replace_first would have been much better :-). |
In the case of replace, it is very likely that the one to be replaced (once) is the last occurrence rather than the first. |
I'm happy if we want to go with |
@josephwright Do you mean other occurrences of "once", Note also that in this case we still need to indicate the object affected, i.e., "word", since the term "titlecase" does not obviously have much relationship with words. |
Correct.
Broadly: uppercase the first character, leave the rest alone. @gmilde suggested that, on balance, when you look across different languages, leaving non-initial characters alone is likely best. (Also, Unicode only describes titlecase in terms of the action at the start of a word, and does not mention the effect on the remainder of the input.)
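To make the distinction concrete, here is a minimal sketch of the two behaviours under discussion. It assumes the function names that appear later in this thread (`\text_titlecase_first:n` and `\text_lowercase:n`); the wrapper commands `\FirstOnly` and `\FirstThenLower` are hypothetical and exist only for this illustration. Whether the two lines actually differ depends on the kernel version, since the thread suggests that the pre-PR function also lowercased the remainder.

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical wrapper: titlecase the start, leave the remainder alone
% (the behaviour proposed in this PR).
\NewDocumentCommand \FirstOnly { m }
  { \text_titlecase_first:n {#1} }
% Hypothetical wrapper: lowercase everything, then titlecase the start.
\NewDocumentCommand \FirstThenLower { m }
  { \text_titlecase_first:n { \text_lowercase:n {#1} } }
\ExplSyntaxOff
\begin{document}
\FirstOnly{lorem IPSUM dolor}\par
\FirstThenLower{lorem IPSUM dolor}
\end{document}
```

Keeping the lowercasing step as a separate, explicit call is what lets callers choose between the two readings of "titlecase".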
Current suggestion would retain |
But "capitalise" still does not convey that it is applied to words! |
@car222222 On 'titlecasing', I think the Unicode FAQ is clear it's a word-based property:
I'm not sure how one could describe 'capitalisation' in a way that didn't link to words. |
To me, capitalisation means much the same as uppercasing. |
capitalize: write or print (a word or letter) in capital letters But also: to write a letter of the alphabet as a capital, or to write the first letter of a word as a capital And: To capitalize a word is to make its first letter a capital letter So again, it seems that getting "word" in there will remove all ambiguities. |
Something completely different, try: _initialcap_allwords _initialcap_firstword |
@moewew I wonder if it's best for me to set up to internally-optimise the |
Hmmm, with my My personal opinion is that it would be cool to have the option to stack |
@moewew I'm thinking of using a marker so text expansion only needs to happen once. If you are happy with a 'two part' approach, I think this looks cleaner: we simplify the naming, etc. Now I need to see if the PR is signed off: as we are at TUG from Thursday p.m., I'll ask the team in person. |
How can it get Signed Off now? When the code and names are decided, then it will need something to fill this void:
|
Code is finished at this point: I am not planning to include the 'skip words' idea at this stage.
The documentation for all of the |
So should you remove this empty environment from this file? Or put therein the reference to l3text.dtx (where I shall now check). |
Here is some further information on use of the term 'titlecase' in and around Unicode (and a little more generally). @josephwright wrote (elsewhere): the Unicode FAQ does use 'titlecase' for the idea of 'capitalising': https://unicode.org/faq/casemap_charprop.html#4
The above referenced FAQ does indeed also allude (without a Elsewhere, the term is used, occasionally and inconsistently, for the somewhat ill-defined idea of transforming “a title” by uppercasing (or, in sophisticated cases, titlecasing) the start of some more or less well-defined selection of the words. I could not find any examples of its use in relation to the capitalisation of all words in multi-word text. |
Right, rebasing, etc. for a merge (@moewew) |
3fd6554 to 5095b47
There are a couple of references to I don't think I'll manage to get the current GitHub Can you please tell me what
\documentclass{article}
\ExplSyntaxOn
\def\test#1{\text_titlecase_first:n{\text_lowercase:n{#1}}}
\ExplSyntaxOff
\begin{document}
\test{Lorem ipsum Dolor sit Amet A}
\test{lorem ipsum Dolor sit Amet A}
\end{document}
produces with this PR merged? |
No - I'll tidy those up - we had a bit of back-and-forth about
No pressure - we just did a release, I can sit on this for a while.
Both will give |
Can you tell me what
\documentclass{article}
\usepackage{csquotes}
\ExplSyntaxOn
\bool_set_false:N \l_text_titlecase_check_letter_bool
\def\test#1{%
  \text_titlecase_first:n{\text_lowercase:n{#1}}\par
  \text_titlecase:n{#1}}
\ExplSyntaxOff
\begin{document}
\test{\enquote{lorem ipsum}}
\test{\enquote*{lorem ipsum}}
\end{document}
gives with the PR merged? I get
where the "‘l’orem ipsum" is not what I want. That means that at least in the current version of the kernel the nested |
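For readers unfamiliar with the switch used in the test file above, here is a minimal sketch of what `\l_text_titlecase_check_letter_bool` is meant to toggle. As far as the l3text documentation describes it, the boolean decides whether leading non-letters are skipped when looking for the character to titlecase (true) or whether the very first codepoint is used regardless (false). The wrapper names `\TitleDefault` and `\TitleNoCheck` are hypothetical, and this plain-text input deliberately avoids the `\enquote` nesting that causes the problem reported here.

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical wrapper: default setting of the boolean.
\NewDocumentCommand \TitleDefault { m }
  { \text_titlecase_first:n {#1} }
% Hypothetical wrapper: switch the letter check off locally.
\NewDocumentCommand \TitleNoCheck { m }
  {
    \group_begin:
      \bool_set_false:N \l_text_titlecase_check_letter_bool
      \text_titlecase_first:n {#1}
    \group_end:
  }
\ExplSyntaxOff
\begin{document}
\TitleDefault{(lorem ipsum)}\par
\TitleNoCheck{(lorem ipsum)}
\end{document}
```

Comparing the two output lines on a given kernel shows directly which character that release treats as "the first" for titlecasing.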
@moewew |
Hmmm. To be honest, I'm a bit lost here. I just thought |
@moewew We never remove any functions, although here there are some edge-case changes in behaviour, as deprecation goes along with shifting to emulation. If you need a function that lowercases first, works both before and after the update, and never gives a deprecation warning, something like
\cs_if_exist:NTF \text_titlecase_all:n
  {
    \cs_new:Npn \__mypkg_titlecase:n #1
      { \text_titlecase_first:n { \text_lowercase:n {#1} } }
  }
  { \cs_new_eq:NN \__mypkg_titlecase:n \text_titlecase_first:n }
would work. However, one of the questions I was trying to sort out early on is to what extent that is actually required. The suggestion was that lowercasing 'the remainder' is likely not that useful, as the typical pattern is only to worry about the first character ( |
We need it to turn titles into sentence case. Entries in the |
@moewew what is the biblatex approach currently to handle uppercase letters that supposed to stay unchanged, say, something like IBM in the title? |
@moewew Is the main concern the performance of |
@FrankMittelbach The default is to use curly braces thanks to some very clever code by @josephwright and @blefloch that essentially turns @josephwright Performance is a secondary concern, though I will admit that I have wondered how much the new two-pass scheme will affect performance. My primary concern is backwards compatibility. We need to retain the documented behaviour of I don't really care how this is implemented either on the If using the deprecated function will not change behaviour significantly, will work in the foreseeable future and will not generate warnings that users could complain about, I'm perfectly fine doing that. If using deprecated functions is generally frowned upon or may cause warnings (e.g. with stricter |
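On the question of keeping something like IBM unchanged during sentence casing, the following is a minimal sketch at the l3text level. It is not biblatex's brace-based mechanism, whose details are not shown in this thread. It assumes the documented exclusion list `\l_text_case_exclude_arg_tl`; the marker `\KeepCase` and the wrapper `\SentenceCase` are hypothetical names invented for this illustration.

```latex
\documentclass{article}
\ExplSyntaxOn
% \KeepCase is a hypothetical marker command: it just typesets its argument.
\NewDocumentCommand \KeepCase { m } {#1}
% Ask l3text to leave the first mandatory argument of \KeepCase untouched
% when changing case.
\tl_put_right:Nn \l_text_case_exclude_arg_tl { \KeepCase }
% Hypothetical sentence-case wrapper: lowercase, then titlecase the start.
\NewDocumentCommand \SentenceCase { m }
  { \text_titlecase_first:n { \text_lowercase:n {#1} } }
\ExplSyntaxOff
\begin{document}
\SentenceCase{The Rise Of \KeepCase{IBM} Mainframes}
\end{document}
```

The intent is that the acronym inside `\KeepCase` survives the lowercasing pass while the rest of the title is sentence-cased.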
@moewew I think I can solve the
\cs_gset:Npn \__text_change_case_break:w #1 \q__text_recursion_stop
  {
    \__text_change_case_break_aux:w ? #1
  }
\cs_gset:Npn \__text_change_case_break_aux:w #1 \q__text_recursion_tail
  {
    \__text_change_case_store:o { \use_none:n #1 }
    \__text_change_case_end:w
  }
With that, you should be able to use The issue I've been having is that 'titlecase' really means just changing the first char, and depending on the exact usage it may or may not be expected to lowercase the remainder of the input. I really would rather split the two concepts. If there is a performance issue, I can see a way to do a lookahead and shortcut some of the code that would be repetitive. |
@josephwright random optimisation note: Use |
Sure. As I said, I don't really care what exact code we use, all I care about is that it does what we need to do and we use supported code/follow best practices. From what I gather so far this should give us what we need. (I don't have time to test this at the moment. Hopefully over the weekend...)
Fair enough. It felt a bit odd to me to use a macro with
Performance improvements would be cool, but I have no idea if this would really impact users significantly. (After all, |
@moewew OK, I will hold off from a release until at least the start of next week - probably will do one soon-ish as this is a non-trivial change. |
If a new |
@moewew Ah, right, yes: I'll still wait to make sure there's nothing unexpected |
we no longer use \text_titlecase:n for sentence casing. See <latex3/latex3#1247>.
Prepared plk/biblatex#1310 for |
@moewew Sounds good: I will do a release today |
Follows from discussion in #1240. Here, a clearer split occurs between titlecasing one and multiple words without any additional naming needed. It also avoids any change to formal behaviour of the existing functions: note though that the new functions do not lowercase the remainder of their input.
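A minimal sketch of the one-word versus all-words split described above, assuming a kernel that already provides `\text_titlecase_all:n` (the name used in the compatibility test earlier in this thread). The wrappers `\TitleFirst` and `\TitleAll` are hypothetical names for this illustration only. As noted in the conversation, "word" here is approximated as the text between two spaces, and no word-exception mechanism is included.

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical wrapper: titlecase only the start of the input.
\NewDocumentCommand \TitleFirst { m }
  { \text_titlecase_first:n {#1} }
% Hypothetical wrapper: titlecase the start of every space-delimited word.
\NewDocumentCommand \TitleAll { m }
  { \text_titlecase_all:n {#1} }
\ExplSyntaxOff
\begin{document}
\TitleFirst{lorem ipsum dolor}\par
\TitleAll{lorem ipsum dolor}
\end{document}
```

Neither wrapper lowercases the remainder of its input; a caller who wants that wraps the argument in `\text_lowercase:n` first.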