Add `\tl_lowercase:nn` and deprecate (non-expandable) `\tl_to_lowercase:n` #141

blefloch · 2013-02-03T16:31:03Z

Unlike all the other \<type>_to_<thing> functions, \tl_to_lowercase:n and \tl_to_uppercase:n, which are wrappers around the corresponding TeX primitives, are not expandable. It would be better to provide \tl_lowercase:nn, analogous to \tl_rescan:nn, with a first argument to hold the setup:

```
\cs_new_protected:Npn \tl_lowercase:nn #1#2
  { \group_begin: #1 \tex_lowercase:D { \group_end: #2 } }
```
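To illustrate the proposed interface, here is a minimal sketch of one call, assuming a current LaTeX format. The definition is copied from the proposal but renamed to the hypothetical `\my_lowercase:nn` (the helper `\my_strip_p:w` is likewise made up), so nothing clashes with kernel names. Both arguments are tokenized before the setup runs, so the setup can remap character codes but cannot change how the second argument is read.

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% The proposed \tl_lowercase:nn under a hypothetical name.
% #1: setup (e.g. \char_set_lccode:nn calls), run inside a group;
% #2: code lowercased with those codes, then run after the group ends.
\cs_new_protected:Npn \my_lowercase:nn #1#2
  { \group_begin: #1 \tex_lowercase:D { \group_end: #2 } }

% Lowercase "+" into a catcode-12 "p", so that the helper is delimited
% by an "other" p rather than by a letter p.
\my_lowercase:nn
  { \char_set_lccode:nn { `\+ } { `\p } }
  { \cs_new:Npn \my_strip_p:w #1 + {#1} }

% \tl_to_str:n gives catcode-12 characters, matching the delimiter.
\tl_set:Nx \l_tmpa_tl { \tl_to_str:n { 12p } }
\tl_set:Nx \l_tmpb_tl { \exp_after:wN \my_strip_p:w \l_tmpa_tl }
\tl_show:N \l_tmpb_tl % should show 12
\ExplSyntaxOff
\end{document}
```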

Later, we can deprecate \tl_to_lowercase:n, and even later, rename \tl_expandable_lowercase:n to \tl_to_lowercase:n. Similarly for upper case.


josephwright · 2013-02-04T08:08:21Z

I'm reasonably happy with this, provided we feel that requiring an understanding of TeX's case-changing stuff is OK. I suspect that this is the realistic position: Will's earlier attempt at \tl_transform:nn (or something similar) did not really work. I guess the only question is about naming (as I think @FrankMittelbach is not so keen on \tl_rescan:nn anyway!).

I'm also keen we do provide some form of expandable case-change, even if we know it's very slow, as this is something that people want to be able to do and we do have the code.

blefloch · 2013-02-05T08:26:49Z

I don't think that we can abstract away TeX's case changing, unless we set all lccodes or all uccodes to 0, except those requested by users. What are lc and uc codes used for in TeX, apart from \lowercase and \uppercase?

On naming, would you be happier with \tl_use_rescanned:nn (and \tl_set_rescanned:Nnn) and \tl_use_lowercased:nn, or similar \tl_use_...?

wspr · 2013-04-02T11:37:37Z

I'd sort of forgotten the exact syntax for \tl_transform for case changing; I guess it was probably "lost" in the big bang; I can't remember if it turned into anything more than what's attached below. The gist was indeed to set all lccodes to 0 (whether locally or globally I guess) and then provide a wrapper macro to transform the contents as appropriate.

I don't know about "did not really work": it was an abstraction, but the question would be whether it was useful. One can probably argue that this style of programming doesn't much belong in expl3, but to me it falls into the same grey zone as \tl_rescan, which is useful for some but not usually for most.

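```
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn

% Only the setup (#1) is grabbed here; the code to transform is read
% afterwards by \tl_transform_aux:n, once the setup has run, so catcode
% changes made in the setup apply to it.
\cs_new_protected:Npn \tl_transform:nn #1
  {
    \group_begin:
      % Set the lccode of every upper-case letter to 0, so that the
      % \lowercase pass leaves letters alone unless the setup maps them
      \tl_map_function:nN
        { \A\B\C\D\E\F\G\H\I\J\K\L\M\N\O\P\Q\R\S\T\U\V\W\X\Y\Z }
        \char_protect_uppercase:N
      \cs_set_eq:NN \char_transform:NN \char_transform_hidden:NN
      #1
      \tl_transform_aux:n
  }

% \tl_to_lowercase:n has since been removed, so use the primitive directly
\cs_new_protected:Npn \tl_transform_aux:n #1
  { \tex_lowercase:D { \group_end: #1 } }

\cs_new_protected:Npn \char_protect_uppercase:N #1
  { \char_set_lccode:nn { `#1 } { 0 } }

% \char_transform:NN <from> <to>: lowercase <from> into <to>
\cs_new_protected:Npn \char_transform_hidden:NN #1#2
  { \char_set_lccode:nn { `#1 } { `#2 } }

\tl_transform:nn
  {
    \char_transform:NN \a \b
    \char_transform:NN \A \B
    \char_set_catcode_active:N \~
    \char_transform:NN \~ \!
  }
  {
    a b c A B C % typesets bbcBBC (spaces are ignored here)
    \cs_set:Npn ~ {BANG} % defines the active "!" as BBNG: A maps to B
  }

\par
\char_set_catcode_active:N \!
!

\end{document}
```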

josephwright · 2013-04-03T11:49:51Z

To be clear, by 'did not really work' I meant in terms of a fit with expl3 (indeed, we might decide the same about \tl_transform). At a technical level it did of course work!

I'd like to move on @blefloch's suggestion if we are all agreed: \tl_to_lowercase:n as it stands is a poor fit for expl3 and is one of the few rough edges in the code we have in l3kernel.

josephwright · 2013-04-04T13:03:12Z

I've added some code: see the comments there. May be worth raising on LaTeX-L.

I wonder if we might actually be better off if the first argument is a mapping

```
\tl_lowercase:nn
  {
     { `\A } { `\A }
     { `\B } { `\c }
  }
  { <stuff> }
```

but then this has the issue that there are different ways of giving a charcode, and if you go with the above you can only accept one (presumably, as I've done above, by number). Longer-term, the above might suggest we don't need `\char_set_lccode:nn`, etc., at all, but I'm wary of that as there is an interaction between lccode/uccode and, for example, end-of-sentence spacing. Thoughts on all of this most welcome!

git-svn-id: http://www.latex-project.org/svnroot/experimental/trunk@4478 de43f980-851b-0410-b2f7-c40aca1f87e0
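Entries like `{ `\A }` are only one way of writing a character code. If each entry of such a mapping were handed to `\char_set_lccode:nn`, whose arguments are integer expressions, every standard TeX notation would be accepted. A quick check, assuming a current l3kernel (for `\int_show:n`):

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% Each line should report 65, the character code of "A".
\int_show:n { `A }   % backtick followed by the character
\int_show:n { `\A }  % backtick followed by a one-character control sequence
\int_show:n { 65 }   % decimal
\int_show:n { "41 }  % hexadecimal
\int_show:n { '101 } % octal
\ExplSyntaxOff
\end{document}
```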

blefloch · 2013-04-06T14:32:53Z

(Oops, wall of text ahead.) I think we need to think about why we need analogs of \uppercase or \lowercase. Right now, I can think of three distinct tasks for which one could want case-changing:

1. Building weird character tokens. This can be done in two ways: either with \lowercase/\uppercase in the usual way, or by defining a temporary helper. For instance, a function to strip the trailing catcode-12 "pt" from a token list such as "1pt" could be defined in at least two ways:

   ```
   \group_begin:
   \char_set_uccode:nn { `\+ } { `\p }
   \char_set_uccode:nn { `\- } { `\t }
   \tex_uppercase:D { \group_end:
     \cs_new:Npn \@@_strip_pt:w #1 + - {#1} }

   \cs_set_protected:Npn \@@_tmp:w #1
     { \cs_new:Npn \@@_strip_pt:w ##1 #1 {##1} }
   \exp_args:No \@@_tmp:w { \tl_to_str:n { pt } }
   ```

   The first way is more general, and allows one to build almost any weird catcode/charcode combination, with the following exceptions: one cannot get a character code 0 from \tex_uppercase:D, and one cannot get catcode-10 characters after control sequences (except single-character csnames), or several such characters in a row, or two catcode-10 characters with two distinct character codes, because TeX normalizes all catcode-10 characters to character code 32 upon input. In particular, the auxiliaries we use to strip spaces from token lists or comma-lists are defined in the second way. This second way is less general but suffices for the great majority of cases (e.g., in all those weird conditionals in l3token).

2. Uppercasing a title or other piece of text. Doing this properly requires a better understanding of the structure of the text being uppercased, so as to avoid uppercasing environment names, mathematics, etc. Also, a general approach should allow for title-casing, which opens a whole new can of worms. All this, I believe, should not be done when operating on token lists, but rather at some slightly later stage of processing.

3. Lowercasing some piece of text, for instance, to canonicalize names or words for sorting, or to work with case-insensitive file systems. Simply forgetting case is not enough to sort properly anyway: for indexes we need to think quite a lot about giving users the option to coalesce names, and we need to take regional differences in alphabetical order into account. When working with the file system, I would say that we want a fixed dictionary between upper and lower case, which should not be affected by the lccode and uccode. Also, this should happen when working on strings of characters, since the OS does not know what a token is.
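The "second way" from the first item can be exercised end to end. This is a minimal sketch assuming a current LaTeX format; `\my_tmp:w` and `\my_strip_pt:w` are hypothetical stand-ins for the kernel's `\@@` names, which only resolve inside DocStrip-processed sources. The delimiter is a catcode-12 "pt", which is what `\dim_use:N` produces inside an x-type expansion.

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% Build the delimiter with \tl_to_str:n so that "p" and "t" have
% category code 12.
\cs_set_protected:Npn \my_tmp:w #1
  { \cs_new:Npn \my_strip_pt:w ##1 #1 {##1} }
\exp_args:No \my_tmp:w { \tl_to_str:n { pt } }

\dim_set:Nn \l_tmpa_dim { 1.5pt }
% Store the characters "1.5pt" (all catcode 12) in \l_tmpa_tl.
\tl_set:Nx \l_tmpa_tl { \dim_use:N \l_tmpa_dim }
% \exp_after:wN expands \l_tmpa_tl once before \my_strip_pt:w
% looks for its delimiter.
\tl_set:Nx \l_tmpb_tl { \exp_after:wN \my_strip_pt:w \l_tmpa_tl }
\tl_show:N \l_tmpb_tl % should show 1.5
\ExplSyntaxOff
\end{document}
```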

I might be reducing the questions of uppercasing and lowercasing to very small special cases; if so, correct me. My current impression, though, is that we need two functions: one to lowercase a string of characters in a controlled way, and one to produce weird tokens. It does not make sense to me for a function that builds weird tokens to have "lowercase" or "uppercase" explicitly in its name. Thus I am quite fond of Will's \tl_transform:nn, at least as a rough starting point.

On the question of how to implement it, I've asked on TeX.sx when TeX uses each of these codes. It may be possible to set uccodes to 0 during the whole TeX run, so that the function only has to apply the setup requested by the user.

From Joseph's commit:

> ```
> \tl_lowercase:nn
>   {
>      { `\A } { `\A }
>      { `\B } { `\c }
>   }
>   { <stuff> }
> ```
>
> but then this has the issue that there are different ways of
> giving a charcode and if you go with the above you can only accept
> one (presumably as I've done above by number).

I am not sure whether to go with Will's version of \tl_transform:nn,

```
\tl_transform:nn
  {
    \char_transform:nn { `A } { `A }
    \char_transform:nn { `B } { `c }
    \char_set_catcode_active:N \@
    \char_transform:nn { `\@ } { `\% }
  }
  { \cs_set_protected:Npn @ { BANG } }
```

or with

```
\group_begin:
\char_set_catcode_active:N \@
\tl_transform:nn
  { { `A } { `A }  { `B } { `c }  { `\@ } { `\% } }
  { \group_end: \cs_set_protected:Npn @ { BANG } }
```

or with one argument for character code changes and one for other setups,

```
\tl_transform:nnn
  { \char_set_catcode_active:N \@ }
  { { `A } { `A }  { `B } { `c }  { `\@ } { `\% } }
  { \cs_set_protected:Npn @ { BANG } }
```

or perhaps in such cases one should combine it with \tl_rescan:nn (which does something a little bit different):

```
\tl_rescan:nn
  { \char_set_catcode_active:N \@ }
  {
    \tl_transform:nn
      { { `A } { `A }  { `B } { `c }  { `\@ } { `\% } }
      { \cs_set_protected:Npn @ { BANG } }
  }
```

or, the other way round:

```
\tl_transform:nn
  { { `A } { `A }  { `B } { `c }  { `\@ } { `\% } }
  {
    \tl_rescan:nn
      { \char_set_catcode_active:N \% }
      { \cs_set_protected:Npn @ { BANG } }
  }
```

> Longer-term, the above might suggest we don't need \char_set_lccode:nn,
> etc., at all, but I'm wary of that as there is an interaction between
> lccode/uccode and for example end-of-sentence spacing.

There is no such interaction with end-of-sentence spacing (that is governed by \sfcode), but there is one with hyphenation, at least. See the TeX.sx question linked above for details, if anyone answers.

wspr · 2013-04-07T07:06:25Z

> I might be reducing the questions of uppercasing and lowercasing to very small special cases, and if so, correct me. My current impression, though, is that we need two functions: one to lowercase in a controlled way a string of characters, and one to produce weird tokens. It does not make sense to me to define such a function to define weird tokens with lowercase or uppercase explicitly in its name. Thus I am quite fond of Will's \tl_transform:nn, at least as a rough starting point.

I like the thought process behind the different syntaxes here, and I agree with you that we're basically talking about two different things and we should design the syntax for these to be either appropriate to both or have two separate commands for the two separate ideas.

I still lean towards a generic "setup" argument, since you never know what else you'll want to do in there, such as redefine macros or provide "local-only" definitions.

If you wanted a shorthand for input char mapping, instead of

```
\char_transform:nn {`\A}{`\B}
\char_transform:nn {`\C}{`\D}
```

etc., a wrapper would be fairly tidy, I guess:

```
\tl_transform:nn
 { \char_transform:n { {`\A}{`\B} {`\C}{`\D} } ... }
 { ... }
```
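A minimal sketch of such a wrapper, assuming a current l3kernel. It is named `\my_char_transform:n` here (a hypothetical name, as is its auxiliary), it only sets lccodes, and it is shown with a bare `\group_begin:` plus `\tex_lowercase:D` pass rather than inside a `\tl_transform:nn` call. The pair list is consumed two items at a time until a recursion quark is reached.

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% Read { <from> } { <to> } pairs of character codes and set
% \lccode<from> = <to> for each pair, stopping at the quark.
\cs_new_protected:Npn \my_char_transform:n #1
  {
    \__my_char_transform:nn #1
      \q_recursion_tail \q_recursion_tail \q_recursion_stop
  }
\cs_new_protected:Npn \__my_char_transform:nn #1#2
  {
    \quark_if_recursion_tail_stop:n {#1}
    \char_set_lccode:nn {#1} {#2}
    \__my_char_transform:nn
  }

% Usage: map A to B and C to D for one \tex_lowercase:D pass.
\group_begin:
  \my_char_transform:n { {`\A}{`\B} {`\C}{`\D} }
  \tex_lowercase:D { \group_end: \tl_set:Nn \l_tmpa_tl { AC } }
\tl_show:N \l_tmpa_tl % should show BD
\ExplSyntaxOff
\end{document}
```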

josephwright · 2013-04-09T08:22:51Z

My concern with a generic 'setup' argument is that you are then mixing things up. I can see an argument for this, as some effects are otherwise tricky to achieve, but I do want to be sure that's what we are after.

I'd agree with Bruno's analysis that there are distinct cases. This suggests to me that we shouldn't add anything new with 'lower/uppercase' in the name at the moment, so I'm going to back out the additions.

As discussed in issue #141, there are separate use cases for the primitives here, and it's likely we want to cover the 'odd category code' case with a name which reflects this.

git-svn-id: http://www.latex-project.org/svnroot/experimental/trunk@4481 de43f980-851b-0410-b2f7-c40aca1f87e0

josephwright · 2015-07-23T06:27:08Z

As agreed at TUG2015, do the deprecation and ask for real use cases looking forward. Talk to Ulrike Fischer.

josephwright · 2015-11-09T13:53:02Z

Progress update: most of the use cases will be removed from expl3 this week, although a second round will be needed later to get rid of some \tex_lowercase:D uses that might go in favour of \char_generate:nn but are awkward at present (mainly due to XeTeX).
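For comparison, a minimal sketch (assuming a current l3kernel, where `\char_generate:nn` works on all supported engines) of building catcode-12 "pt" tokens with `\char_generate:nn` instead of a `\tex_lowercase:D` trick. The check against `\tl_to_str:n` output is only there to confirm that the generated tokens have the expected category codes.

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% \char_generate:nn { <charcode> } { <catcode> } is expandable, so an
% x-type assignment stores the generated character tokens directly.
\tl_set:Nx \l_tmpa_tl
  { \char_generate:nn { `p } { 12 } \char_generate:nn { `t } { 12 } }
% \tl_to_str:n also yields catcode-12 characters, so the two should match.
\tl_set:Nx \l_tmpb_tl { \tl_to_str:n { pt } }
\tl_if_eq:NNTF \l_tmpa_tl \l_tmpb_tl
  { \iow_term:n { same~tokens } }
  { \iow_term:n { different~tokens } }
\ExplSyntaxOff
\end{document}
```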

josephwright · 2017-11-16T15:49:03Z

This was done a while ago.

blefloch added feature request labels Jul 25, 2014

josephwright self-assigned this Jul 15, 2015

josephwright closed this as completed Nov 16, 2017
