Ignore table of contents padding marks in Acrobat when reading PDF files #15845

Closed
Qchristensen opened this issue Nov 27, 2023 · 10 comments · Fixed by #16141
Labels
p4 https://github.com/nvaccess/nvda/blob/master/projectDocs/issues/triage.md#priority triaged Has been triaged, issue is waiting for implementation.
Milestone

Comments

@Qchristensen
Member

Steps to reproduce:

Reported by an organisation in the NVDA user email group: https://nvda.groups.io/g/nvda/message/113359

  1. Create a file in Word with a generated table of contents, using Word's table of contents feature with the standard default TOC type, which uses "....." as a padding mark between each entry and its page number
  2. Export the file as a PDF file
  3. Open the file in Acrobat (in fact this reproduced in most PDF readers / browser viewers)
  4. Read the TOC

Actual behavior:

At step 4, reading the TOC, NVDA reads something similar to:

Link heading dot, link one hundred dot, link 48 dot 1.

Expected behavior:

It would be good if these repeated dots weren't read, at least at the default punctuation level.

NVDA logs, crash dumps and other attachments:

System configuration

NVDA installed/portable/running from source:

NVDA version:

NVDA 2023.3 installed

Windows version:

Windows 11 (64-bit) Version: 22H2, Build: 22621.2715

Name and version of other software in use when reproducing the issue:

Adobe Acrobat Pro 2020, but also replicates in Foxit PDF reader version 12.1.1.15289, and browsers - I believe this is more to do with how and when NVDA reads multiple characters like this rather than the programs in use.

Other information about your system:

Other questions

Does the issue still occur after restarting your computer?

Have you tried any other versions of NVDA? If so, please report their behaviors.

If NVDA add-ons are disabled, is your problem still occurring?

Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?

@Qchristensen
Member Author

There are several workarounds for this:

  • From the document creation side, use a TOC template which does not add repeated dots as a padding character (there is one or you can customise it)
  • From the user side, change the punctuation level

But as noted in that user group thread, this is a common way of presenting tables of contents and it would be useful if they were read out without extraneous information as the default.

@CyrilleB79
Collaborator

Is the issue also present in other screen readers such as JAWS or Narrator?

@seanbudd
Member

seanbudd commented Dec 7, 2023

Unfortunately this is a "can't fix" situation, the job of the screen reader is to read the text.

@seanbudd seanbudd closed this as not planned Dec 7, 2023
@CyrilleB79
Collaborator

These lines of dots are here for presentation and, in the first place, it would be the job of the PDF converter to keep these dots from being rendered as text in the PDF.

@seanbudd wrote:

Unfortunately this is a "can't fix" situation, the job of the screen reader is to read the text.

I would not be so categorical.
Narrator does not read it at its default punctuation level.

A solution could be, for example, to create a new complex symbol consisting of four or more dots (regexp = r"\.{4,}"), which is not reported at the default or lower punctuation levels.
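As a minimal sketch of the pattern idea, here is how a "four or more dots" rule behaves in plain Python. It is only an illustration using the standard `re` module, not NVDA's symbol processing; the helper name `strip_padding` and the sample strings are hypothetical. Note that the dot has to be escaped (`\.`) to match a literal dot, since an unescaped `.` matches any character.

```python
import re

# Escaped dot: a run of four or more literal "." characters.
PADDING_DOTS = re.compile(r"\.{4,}")
# Unescaped dot: "." means "any character", so this matches any 4+ characters.
ANY_FOUR = re.compile(r".{4,}")


def strip_padding(text: str) -> str:
    """Replace each run of padding dots with a single space (hypothetical helper)."""
    return PADDING_DOTS.sub(" ", text)


toc_line = "Introduction" + "." * 10 + "12"
print(strip_padding(toc_line))         # Introduction 12
print(strip_padding("Wait... what?"))  # an ellipsis of three dots is left alone
print(ANY_FOUR.fullmatch("abcd") is not None)  # True: the unescaped form is too broad
```

The threshold of four keeps an ordinary three-dot ellipsis out of the rule.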

Reopening so that NV Access indicates if they could accept such workaround.

@CyrilleB79 CyrilleB79 reopened this Dec 7, 2023
@Adriani90
Collaborator

Reopening so that NV Access indicates if they could accept such workaround.

I second this. In fact, the community will definitely accept this workaround, because it is indeed annoying to hear the number of dots displayed on the screen every time. At least there would be an option to suppress that announcement by choosing another punctuation level.

Unfortunately this is a "can't fix" situation, the job of the screen reader is to read the text.

@seanbudd I wonder if you guys talked about this internally at all? I mean, not every piece of visible text on the screen makes sense to report in every case. That's why we have different punctuation levels, different symbol reporting levels, etc.
It would be much appreciated if you gave some more input on how you came to this decision before closing such issues.

Note that before closing such issues, it is important to test the behavior with other screen readers, especially when you make a general statement about screen readers.

@Adriani90
Collaborator

Adriani90 commented Dec 7, 2023

These lines of dots are here for presentation and, in the first place, it would be the job of the PDF converter to keep these dots from being rendered as text in the PDF.

That's not entirely true. Dots are also displayed in table of contents even in printed documents which makes it easier for sighted people to follow the line until the page numbers. So for sighted people it makes indeed sense to have these dots on the display, and even more for people who have a bit of vision ability.
I don't know how the printing mechanisms work in the backend of a pdf viewer, but I think it is not that trivial to present dots in another way having in mind that people want to print the documents as well. As far as I know from PDFs which are not fully accessible, if you are presenting stuff that's expected to be text as non text, you fail the accessibility tests.

@CyrilleB79
Collaborator

These lines of dots are here for presentation and, in the first place, it would be the job of the PDF converter to keep these dots from being rendered as text in the PDF.

That's not entirely true. Dots are also displayed in table of contents even in printed documents which makes it easier for sighted people to follow the line until the page numbers. So for sighted people it makes indeed sense to have these dots on the display, and even more for people who have a bit of vision ability.

Sorry, maybe my previous statement was not very clear. I mean:

These lines of dots are here only for visual presentation and, in the first place, it would be the job of the PDF converter to keep these dots from being passed to ATs via the accessibility APIs. However, I do not know whether the PDF specification has something equivalent to HTML, where text can be visually displayed but hidden from screen readers.

@Adriani90
Collaborator

On the topic about hiding content in PDFs but still displaying them visually, the PDF community is quite strict about this and doesn't like behavior like this:
https://community.adobe.com/t5/acrobat-discussions/how-to-force-acrobat-screen-reader-to-skip-ignore-an-entire-page/m-p/9744360

Indeed if you want to pass the accessibility tests, you have to provide an alt text for all non text elements.

@seanbudd seanbudd added the blocked/needs-product-decision A product decision needs to be made. Decisions about NVDA UX or supported use-cases. label Dec 12, 2023
@CyrilleB79
Collaborator

@seanbudd, this issue is labeled "blocked/needs-product-decision".

Have you been able to discuss this at NV Access?
The complex symbol fix described in #15845 (comment) would be quite simple and should not cause undesirable side effects.

Thanks.

@seanbudd
Member

seanbudd commented Feb 7, 2024

@CyrilleB79 - that solution seems appropriate

@seanbudd seanbudd added p4 https://github.com/nvaccess/nvda/blob/master/projectDocs/issues/triage.md#priority triaged Has been triaged, issue is waiting for implementation. and removed blocked/needs-product-decision A product decision needs to be made. Decisions about NVDA UX or supported use-cases. needs-triage labels Feb 7, 2024
seanbudd pushed a commit that referenced this issue Feb 7, 2024
Closes #15845

Summary of the issue:
Padding dots in table of contents are reported even at low punctuation levels.

Description of user facing changes
Padding dots are not reported anymore at low punctuation levels.

Description of development approach
In the symbol file, define a complex symbol that identifies padding dots as four or more consecutive dots. Assign it the level "all", e.g. like "end of sentence dot". Set "send real symbol to synthesizer" to "always" so that a pause is kept between the text before the dots and the text after them.

In character processing, change the order of symbol processing as follows:

  1. complex symbol rules
  2. repetition rules
  3. simple symbol rules

Previously, the repetition rule came first. The order was changed so that the repetition rule does not override the new rule for the padding dots complex symbol.
In any case, I do not think that there was any use case for the repetition rule being used with complex symbols.
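
The effect of the rule order can be sketched with a toy processor in Python. This is an illustration under stated assumptions, not NVDA's actual character processing code: the `RULES` table, the `NAMES` mapping, the `speak` function and the spoken output format are all hypothetical. The one real mechanism it relies on is that a regular expression alternation tries its branches left to right at each position, so the first rule listed wins.

```python
import re

# Hypothetical rule table (illustrative only, not NVDA's real symbol data).
RULES = {
    # Complex symbol: padding dots, i.e. four or more literal dots.
    "complex": r"(?P<complex>\.{4,})",
    # Repetition: the same symbol four or more times in a row.
    "repeated": r"(?P<repeated>(?P<ch>[.!\-])(?P=ch){3,})",
    # Simple symbol: a single occurrence.
    "simple": r"(?P<simple>[.!\-])",
}
NAMES = {".": "dot", "!": "bang", "-": "dash"}

NEW_ORDER = ("complex", "repeated", "simple")
OLD_ORDER = ("repeated", "complex", "simple")


def speak(text: str, order: tuple) -> str:
    """Return the text as it would be 'spoken' with rules tried in the given order."""
    pattern = re.compile("|".join(RULES[name] for name in order))

    def repl(m: re.Match) -> str:
        if m.group("complex") is not None:
            return " "  # padding dots are silent, but a gap (pause) is kept
        if m.group("repeated") is not None:
            return f" {len(m.group('repeated'))} {NAMES[m.group('ch')]} "
        return f" {NAMES[m.group('simple')]} "

    # Collapse the extra spaces introduced by the replacements.
    return " ".join(pattern.sub(repl, text).split())


line = "Intro" + "." * 10 + "12"
print(speak(line, OLD_ORDER))  # Intro 10 dot 12
print(speak(line, NEW_ORDER))  # Intro 12
print(speak("Wow!!!!", NEW_ORDER))  # Wow 4 bang
```

With the repetition rule first, the run of dots is consumed as "10 dot" before the padding dots rule is ever tried. Putting complex symbols first lets the padding rule claim the run, while other repeated symbols still fall through to the repetition rule.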
@nvaccessAuto nvaccessAuto added this to the 2024.2 milestone Feb 7, 2024
Nael-Sayegh pushed a commit to Nael-Sayegh/nvda that referenced this issue Feb 15, 2024