ConTeXt writer: tag paragraphs #7885

Merged: 1 commit, Jan 15, 2023


@tarleb tarleb commented Feb 2, 2022

Paragraphs are enclosed by \startparagraph and \stopparagraph
commands. This ensures better tagging results in PDF output.

@tarleb
Collaborator Author

tarleb commented Feb 2, 2022

Demonstrating the difference (extracted from the PDF with pdfinfo -struct-text):

before

Div
  "Term Paper TitleBird2022-01-23"
  Sect "section"
    Div
      H (block)
        "Knuth"
    Div
      "Thus, I came to the conclusion that the designer of a new system must not onlybe the implementer and first large–scale user; the designer should also write thefirst user manual.The separation of any of these four components would have hurt TeX significantly. IfI had not participated fully in all these activities, literally hundreds of improvementswould never have been made, because I would never have thought of them orperceived why they were important.But a system cannot be successful if it is too strongly influenced by a single person.Once the initial design is complete and fairly robust, the real test begins as peoplewith many different viewpoints undertake their own experiments."

after

  "Term Paper TitleBird2022-01-23"
  Sect "section"
    Div
      H (block)
        "Knuth"
    Div
      P (block)
        "Thus, I came to the conclusion that the designer of a new system must not onlybe the implementer and first large–scale user; the designer should also write thefirst user manual."
      P (block)
        "The separation of any of these four components would have hurt TeX significantly. IfI had not participated fully in all these activities, literally hundreds of improvementswould never have been made, because I would never have thought of them orperceived why they were important."
      P (block)
        "But a system cannot be successful if it is too strongly influenced by a single person.Once the initial design is complete and fairly robust, the real test begins as peoplewith many different viewpoints undertake their own experiments."

I'm just not sure if the additional verbosity is worth it.

CC: @denismaier @klpn

@jgm
Owner

jgm commented Feb 2, 2022

Sounds like a win to me, but let's see what the ConTeXt experts say.

@denismaier
Contributor

In general that's a useful addition, especially when going directly to PDF. Maybe, if you convert to ConTeXt sources, some might prefer the less verbose alternative. Maybe a new command-line option could be useful?

@jgm
Owner

jgm commented Feb 3, 2022

I suppose we could add an extension like tags or paragraph_tags. But I'm not sure how important this would be. If someone is using pandoc to generate ConTeXt that will then be hand-edited, and they don't want these things, they could always pipe the output through

sed -E -e '/\\(start|stop)paragraph/d'

Interested in more feedback on this from ConTeXt users...
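
To try the sed filter above on something small, here is a minimal sketch. The sample fragment is made up for illustration (it only mimics the \startparagraph/\stopparagraph wrapping proposed in this PR), and the variable names are hypothetical.

```shell
# Hypothetical ConTeXt fragment, shaped like the output proposed in this PR.
sample='\startparagraph
First paragraph.
\stopparagraph

\startparagraph
Second paragraph.
\stopparagraph'

# Delete every line that contains \startparagraph or \stopparagraph.
stripped=$(printf '%s\n' "$sample" | sed -E -e '/\\(start|stop)paragraph/d')
printf '%s\n' "$stripped"
```

This leaves the two paragraph texts separated by the original blank line, which ConTeXt still treats as a paragraph break.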

@jgm
Owner

jgm commented Feb 3, 2022

Actually, the ConTeXt writer has access to variables, so why don't we just activate this feature if the pdfa variable is set? Would that be sensible?

@tarleb
Collaborator Author

tarleb commented Feb 3, 2022

Checking the pdfa variable would make sense, IMHO.

It seems that there are a number of additional cases where we could improve tagging, e.g., in lists or for emphasized text: the ConTeXt wiki recommends defining \definehighlight[emph][style={\em}] and using \emph{text} instead of the normal {\em text}, as the former produces better tagging. The Export page in the wiki has a couple of additional examples. The end result looks quite different from "normal" ConTeXt, so yet another extension would be justifiable, too.
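
For anyone who wants to compare the two emphasis styles side by side, a rough sketch: the snippet writes a tiny ConTeXt file that uses only the commands quoted above plus \starttext/\stoptext. The file name highlight-demo.tex is hypothetical, and nothing here reflects what the writer currently emits.

```shell
# Hypothetical file name; the ConTeXt body uses only the commands quoted above.
cat > highlight-demo.tex <<'EOF'
\definehighlight[emph][style={\em}]
\starttext
Plain font switch: {\em text}.

Named highlight: \emph{text}.
\stoptext
EOF

# List the lines that define or use the named highlight.
grep -n 'emph' highlight-demo.tex
```

If ConTeXt is installed, running context highlight-demo.tex should typeset both lines the same way visually; the difference discussed here is only in the tagging.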

@jgm
Owner

jgm commented Feb 3, 2022

Checking the pdfa variable is easy but slightly unprincipled. (Variables are supposed to be for template inclusion, so it's always a bit odd when they affect the body too.) So maybe adding a tagging extension would make sense. Not sure.

@tarleb
Collaborator Author

tarleb commented Feb 3, 2022

I'm tempted to leave things as they are, but to use tagging as motivation for the new writer style and make_variant function that you suggested. Tagging-friendly ConTeXt would be a prime use case for this.

@tarleb
Collaborator Author

tarleb commented Jun 4, 2022

I just found out about the effect of --section-divs on ConTeXt output (#2609). I think it might make sense to hide the suggested behavior behind that switch.

@jgm
Owner

jgm commented Jun 4, 2022

I think having a separate tagging extension might be more principled. It wouldn't really be obvious why --section-divs ALSO puts tags around paragraphs.

Or maybe it wouldn't be that bad just to do the paragraph tagging by default for ConTeXt? It's the way of the future, presumably.

@tarleb
Collaborator Author

tarleb commented Jun 4, 2022

I see, that's true. If we merge this, would it make sense to let the --section-divs behavior be the default? It seems like that would be the most consistent.

@jgm
Owner

jgm commented Jun 5, 2022

> If we merge this, would it make sense to let the --section-divs behavior be the default?

Agreed. I guess that would mean that we're only targeting ConTeXt MkIV, since older versions don't support \startsection/\stopsection. But at this point that's probably quite sensible. I think I'd be in favor of the simplest solution, and this is probably it.

The question @denismaier raised above is about the increased verbosity. I don't know how much of an issue that is for ConTeXt users.

@tarleb
Collaborator Author

tarleb commented Jun 5, 2022

I was informed on the ConTeXt mailing list that using \startparagraph ... \stopparagraph leads to problems in some cases, e.g. in list items. The workaround is to use \bpar ... \epar instead. I don't understand yet whether it's preferable to always use those commands, or to use them only where \startparagraph would lead to unexpected results.

@tarleb
Collaborator Author

tarleb commented Jan 14, 2023

I went ahead with the additional extension: if tagging is enabled, all paragraphs are wrapped in \bpar/\epar commands. Furthermore, we then generate \definehighlight commands for all used emphasis types and inject them via the emphasis-commands template variable.

Docs haven't been updated yet.
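
A quick way to see the effect might look like the sketch below. It assumes pandoc is installed and that the extension is spelled tagging on the command line (the name used in this thread); releases that predate this PR will reject it, which the snippet tolerates. The input text and variable names are made up.

```shell
# Assumption: the ConTeXt writer accepts an extension named "tagging";
# pandoc releases that predate this PR will not.
input='A paragraph with *emphasis*.

A second paragraph.'

if command -v pandoc >/dev/null 2>&1; then
  # Render the same input with and without the extension for comparison.
  plain=$(printf '%s\n' "$input" | pandoc -f markdown -t context)
  tagged=$(printf '%s\n' "$input" | pandoc -f markdown -t context+tagging) || tagged=''
  printf '%s\n' "--- default ---" "$plain" "--- tagging ---" "$tagged"
else
  echo "pandoc not found; skipping" >&2
fi
```

With the extension enabled, the second listing should show the \bpar/\epar wrapping described above; adding -s would also pull in the template, where the generated \definehighlight commands end up.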

Paragraphs are enclosed by `\bpar` and `\epar` commands, and `highlight`
commands are used for emphasis. This results in much better tagging in
PDF output.
Comment on lines +3587 to +3589
emphasized text. The `emphasis-command` template variable is set
if the extension is enabled. Combine this with the `pdfa` variable
to generate accessible PDFs.
Owner

Note: maybe we should have a section of the manual (near the end I guess) about producing accessible PDFs?

Collaborator Author

Agreed. I realize now that the last sentence should be dropped here, as the pdfa variable only affects the metadata, not tagging.

Collaborator Author

My tests indicate that context will always produce tagged PDF, but there are small errors in the tags unless the tagging extension is enabled. I will clean that up.

@jgm jgm merged commit c71d476 into jgm:main Jan 15, 2023
@tarleb tarleb deleted the context-paragraph-tagging branch January 15, 2023 19:38