url package is tagged as Formula if the math code is loaded #5

Closed
u-fischer opened this issue Jul 24, 2023 · 27 comments
Labels
currently incompatible package or class package or class that doesn't work with current version of tagging code fixed in release issue is fixed and will be deployed in the next release of package or kernel

Comments

@u-fischer
Member

u-fischer commented Jul 24, 2023

\url internally uses math mode, and this is grabbed and tagged as Formula. The problem is that url does set \m@th, but inside the math, at the end:

\def\Url@FormatString{%
 \UrlFont \Url@MathSetup 
 $\fam\z@ \textfont\z@\font
 \expandafter\UrlLeft\Url@String\UrlRight
 \m@th$% <--------------  
}%

This can be corrected by moving \m@th into \Url@MathSetup, but I wonder if a dedicated command to mark "fake math" is needed, and if the code should detect \m@th inside dollars?

\DocumentMetadata{uncompress,testphase={phase-III,math}}
\documentclass{article}
\usepackage{url}
\begin{document}

\url{https://www.latex-project.org} %tagged as formula
\makeatletter
\AddToHook{cmd/Url@MathSetup/before}{\m@th}
\makeatother

\url{https://www.latex-project.org} %ok only text.

$a=b$
\end{document}
@car222222

@davidcarlisle once commented that putting \m@th at the end of the math is quite common.

Please can you tell me where to find the code you are using to "grab math stuff" : file, location and branch. Thanks.
Then I can perhaps look into how we might detect such cases.
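As background to the remark that \m@th often sits at the end of the math: TeX reads \mathsurround when it reaches the closing $, so in plain TeX terms it does not matter where inside the formula \m@th is placed. The following is a minimal sketch for an ordinary document without the tagging code loaded; the exaggerated 10pt value is only there to make the spacing visible.

```latex
\documentclass{article}
\begin{document}
\makeatletter
\mathsurround=10pt % exaggerated so that the effect is visible

A$x$B \par        % 10pt of extra space on each side of the formula

A$\m@th x$B \par  % \m@th first: no extra space

A$x\m@th$B \par   % \m@th last: same result, the value is read at the closing $

A{\m@th$x$}B \par % \m@th outside: same result, but needs an extra group

A$x$B             % the setting was local each time, so 10pt applies again
\makeatother
\end{document}
```

For spacing purposes the second and third forms are equivalent, which is why code that only inspects the first token after $ would miss the url.sty pattern.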

@car222222

Is it obvious that the use of mathmode is a necessary, or wise, method by which to format a URL?

@FrankMittelbach
Member

FrankMittelbach commented Jul 24, 2023

@davidcarlisle once commented that putting \m@th at the end of the math is quite common.

Please can you tell me where to find the code you are using to "grab math stuff" : file, location and branch. Thanks. Then I can perhaps look into how we might detect such cases.

develop branch latex-lab/latex-lab-math.dtx (or something like that)

@FrankMittelbach
Member

FrankMittelbach commented Jul 24, 2023

Is it obvious that the use of mathmode is a necessary, or wise, method by which to format a URL?

may be historic and Donald's style of coding, you can do some tricks if you pretend you are in math

@davidcarlisle
Member

Is it obvious that the use of mathmode is a necessary, or wise, method by which to format a URL?

it's not for math as such but for \mathcode"8000 math-active characters, which could, perhaps, be replaced by \scantokens and real active characters these days but url.sty math mode processing has a very long history.

@u-fischer
Member Author

Is it obvious that the use of mathmode is a necessary, or wise, method by which to format a URL?

I hope to convince someone (@josephwright) at some time to write a replacement; the current implementation, for example, can't handle Unicode properly. But for now we have to take what is there.

@josephwright
Member

@u-fischer OK, I'll add it to the to-do list. Could we start a formal spec list somewhere so I don't start on the wrong path?

@FrankMittelbach FrankMittelbach changed the title url is tagged as Formula if the math code is loaded url package is tagged as Formula if the math code is loaded Jul 24, 2023
@FrankMittelbach FrankMittelbach added the currently incompatible package or class package or class that doesn't work with current version of tagging code label Jul 24, 2023
@josephwright
Member

I think Donald is using math mode to get fine control of line breaking: not sure if a non-math mode solution is really viable as a result.

@car222222

Yet another strange way to make use of TeX's abilities!

@FrankMittelbach
Member

Sure, we all came up with tricks like that to make things work in limited space (this is why I said using math mode for tricks, because it was a bit more than just the math-active characters). @josephwright I'm not sure that it would be that hard conceptually to parse through the URL token by token. On the other hand, it might be simpler to just accept that math mode is sometimes misused for its technical possibilities, and all we would need to do for this is to have a reasonably simple flag to ensure that no tagging happens (which could be as simple as requiring \m@th as the first token after $, or some dedicated \NotMath).
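A rough sketch of what a "first token after $" test could look like in expl3, using \tl_if_head_eq_meaning:nNTF. The helper name \example_report_fake_math:n is invented for illustration and is not part of latex-lab; the sketch only reports on the terminal and does no tagging.

```latex
\documentclass{article}
\begin{document}
\makeatletter
\ExplSyntaxOn
% Hypothetical helper (not part of latex-lab): checks whether the
% first token of the grabbed math has the same meaning as \m@th.
\cs_new_protected:Npn \example_report_fake_math:n #1
  {
    \tl_if_head_eq_meaning:nNTF {#1} \m@th
      { \iow_term:n { fake~math:~skip~tagging } }
      { \iow_term:n { real~math:~tag~as~Formula } }
  }
\example_report_fake_math:n { \m@th c=d } % \m@th is the first token
\example_report_fake_math:n { e=f \m@th } % \m@th is the last token
\ExplSyntaxOff
\makeatother
See the terminal output.
\end{document}
```

Under this convention the url.sty pattern, with \m@th at the end, would still be treated as real math unless the package is changed or a separate flag is provided.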

@josephwright
Member

@FrankMittelbach I think on reflection you are right. My feeling is we really should look to 'fix' these uses but we also need to at least try to 'handle' them. So yes, some form of flag to say 'not maths' is a good idea, but we should also try over time to re-implement the 'abuses' so we don't need math mode: I suspect a 'not maths' flag will have edge cases where it fails.

@u-fischer
Member Author

So yes, some form of flag to say 'not maths' is a good idea

Yes, a flag to say "not math" is a good idea, and also a flag "this is math", to override the \m@th detection. Perhaps the private boolean currently used inside \m@th should be made public?

@car222222

car222222 commented Jul 25, 2023

I just checked and: @josephwright did fix the \m@th detector, using \tl_if_in:nnF, so that it should now be effective anywhere within the grabbed math.

So there must be some other reason why it grabs the math in this case.

@FrankMittelbach
Member

FrankMittelbach commented Jul 25, 2023

Why would you need to override the \m@th detection? If the mechanism is that \m@th has to be the first token after $, then a simple correction is to use \relax\m@th for existing code. Of course you could give this \relax a name such as \IsMath :-)

@FrankMittelbach
Member

FrankMittelbach commented Jul 25, 2023

I just checked and: @josephwright did fix the \m@th detector, using \tl_if_in:nnF so that it should now be effective anywhere within any math.

So there must be some other reason why it grabs the math in this case.

interesting ... some grouping that interferes, perhaps?

@car222222

Not sure: I did not check the details of what \tl_if_in:nnF actually does if the tl contains brace-groups:-).
I just assumed that it recurses into them.
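The brace-group question can be checked directly. My understanding is that \tl_if_in:nnTF is built on delimited-argument matching, which only sees tokens at the outer brace level, so a \m@th inside a brace group would be expected to go unnoticed; this is an assumption worth confirming by running the test. The helper name \example_search_report:nn is invented for illustration.

```latex
\documentclass{article}
\begin{document}
\makeatletter
\ExplSyntaxOn
% Hypothetical helper (not part of latex-lab): reports on the terminal
% whether \tl_if_in:nnTF finds \m@th in #2; #1 is only a label.
\cs_new_protected:Npn \example_search_report:nn #1#2
  {
    \tl_if_in:nnTF {#2} { \m@th }
      { \iow_term:n { #1 :~found } }
      { \iow_term:n { #1 :~not~found } }
  }
\example_search_report:nn { top~level }     { e=f \m@th }
\example_search_report:nn { inside~braces } { e=f \hbox { \m@th } }
\ExplSyntaxOff
\makeatother
See the terminal output.
\end{document}
```

Either way, a \m@th that only appears after macro expansion is invisible to any scan of the grabbed tokens.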

@u-fischer
Member Author

I just checked and: @josephwright did fix the \m@th detector, using \tl_if_in:nnF, so that it should now be effective anywhere within the grabbed math.

So there must be some other reason why it grabs the math in this case.

If I do the following the first three are tagged as math, only the last one is normal text:

\DocumentMetadata{uncompress,testphase={phase-III,math}}
\documentclass{article}
\begin{document}
\makeatletter
$ a=b$

$\m@th c=d $

$e=f \m@th$

{\m@th $g=h$ }
\end{document}

@josephwright
Member

@u-fischer We had some back-and-forth about \m@th, with the result being at the time we decided it only meant 'not maths' if it came immediately before the $. That was because it also shows up in 'real' maths (amsmath, etc.).

@FrankMittelbach
Member

I'm surprised you say "before" and also surprised if that is what amsmath does, because it means you have to add an extra (unnecessary) group to keep the change local, which you get for free if you put it inside the dollars. So I always thought we implemented $\m@th as the legacy indicator.

However, seeing how it is used in amsmath I guess you are right and we should not look for \m@th inside, because quite often it is in fact used there to produce math but with mathsurround forced to zero. So that does in fact mean we need some flag (probably outside of the $...$) that signals whether the next $...$ is math or not.

@josephwright
Member

As a reminder, we find in amsmath for example both

\def\@mathmeasure#1#2#3{\setbox#1\hbox{\frozen@everymath\@emptytoks
    \m@th$#2#3$}}

and

\def\plainroot@#1\of#2{\setbox\rootbox\hbox{%
 $\m@th\scriptscriptstyle{#1}$}%
\def\r@@t#1#2{\setboxz@h{$\m@th#1\sqrtsign{#2}$}%
 \dimen@\ht\z@\advance\dimen@-\dp\z@
 \setbox\@ne\hbox{$\m@th#1\mskip\uproot@ mu$}%
 \advance\dimen@ by1.667\wd\@ne
 \mkern-\leftroot@ mu\mkern5mu\raise.6\dimen@\copy\rootbox
 \mkern-10mu\mkern\leftroot@ mu\boxz@}

@car222222

@josephwright wrote:
We had some back-and-forward about \m@th, with the result being
at the time we decided it only meant 'not maths' if it came immediately before the $.

Maybe we did, but then later it got changed, as I just explained, to not grab any math that contains it. I did actually check
the code for this.

Also, which $: the first or the last? Anyway, that is certainly not what the current implementation does.

That was because it also shows up in 'real' maths (amsmath, etc.).

I once believed that too, but I have not yet found any examples of this, at least not in amsmath.

Where it does occur is in connection with math that is itself contained in "faketext", such as when an alignment or an hbox is used within math to format some purely mathematical construct,
with no real text involved.

Some time ago I wrote an essay to describe a reasonably robust method of distinguishing (within math) between such faketext and real text within the math. This should be implemented so that real math within real text should be grabbed in cases where the current setup does not grab it.

I wonder where that file is now?

@car222222

For emphasis:
I believe that the convention that "it must occur immediately before a $" was never on the table.
I am reasonably sure that at one stage the code checked only whether it came immediately after the opening $ or $$ (or
before the math, of course).

@car222222

@FrankMittelbach wrote:

However, seeing how it is used in amsmath . . . because quite
often there it is in fact used to produce math but with
mathsurround forced to zero.

Please can you point to any example of this that is not immediately inside "faketext", such as an hbox or vbox+array. I really need to find any such examples if they exist.

@car222222

@josephwright All of those are examples of "faketext", as I explained.

@car222222

car222222 commented Jul 25, 2023

Also, of course, if the \m@th is buried in a definition like these examples, then it will not be found by just scanning the top-level math contents.

@u-fischer
Member Author

@u-fischer We had some back-and-forth about \m@th, with the result being at the time we decided it only meant 'not maths' if it came immediately before the $.

Well actually that is not true. After looking at the emails and code I think @car222222 is quite right. Math is not processed if \m@th is detected.

\DocumentMetadata{uncompress,testphase={phase-III,math}}
\documentclass{article}

\ExplSyntaxOn
\math_processor:n{XXXX}
\ExplSyntaxOff
\begin{document}
\makeatletter
$ a=b$

$\m@th c=d $

$e=f \m@th$

{\m@th $g=h$ }
\end{document}

gives

(image not included)

The problem is only with the tagging code, which sits outside the processor in \__math_grab_dollar:w and so is applied unconditionally (unless \__math_grab_dollar:w is itself not executed when the boolean is false):

\cs_new_protected:Npn \__math_grab_dollar:w % $
  #1 $
  {
    \tl_if_blank:nF {#1}
      {
        \__math_process:nn { math } {#1} % $
        \tagmcend %end P-chunk, in code: \tag_mc_end_push:
        \@kernel@math@begin
        #1 $
        \@kernel@math@end
        \tagmcbegin{}  % restart P-chunk (whatsits in pdftex)
      }
  }

@car222222

Well, that is all Frank's (or maybe Ulrike's?) code, which has probably not been checked. I have not seen it before today.

Logically, it would seem reasonable to me that the tagging
should be done (or at least set up, or not) by the processor:
then it would not get done if the math did contain a \m@th.
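One way to picture this suggestion is a variant of the grabber quoted above in which the tagging calls are skipped whenever the grabbed tokens contain \m@th. This is only a sketch and not the fix that was eventually released: the name \__example_grab_dollar:w is invented, the fragment is a definition only and is not wired into the math code, and whether the \@kernel@math@begin and \@kernel@math@end calls are still wanted in the untagged branch is left open.

```latex
\makeatletter
\ExplSyntaxOn
% Hypothetical variant of \__math_grab_dollar:w; the name is invented
% for this sketch and nothing here is hooked into latex-lab.
\cs_new_protected:Npn \__example_grab_dollar:w % $
  #1 $
  {
    \tl_if_blank:nF {#1}
      {
        \tl_if_in:nnTF {#1} { \m@th }
          {
            % "fake math": put the material back untouched, no tagging
            #1 $
          }
          {
            % real math: same steps as in the grabber quoted above
            \__math_process:nn { math } {#1} % $
            \tagmcend
            \@kernel@math@begin
            #1 $
            \@kernel@math@end
            \tagmcbegin{}
          }
      }
  }
\ExplSyntaxOff
\makeatother
```

A top-level scan like this would catch the url.sty case, where \m@th is literally present in the grabbed tokens, but it would also skip tagging for real math that uses \m@th only to zero \mathsurround, which is the concern raised earlier in the thread.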

@u-fischer u-fischer added the fixed in release issue is fixed and will be deployed in the next release of package or kernel label Sep 14, 2023