Reduce garbage text from graphics #1989

nvaccessAuto opened this Issue Dec 8, 2011 · 26 comments



Reported by ianr on 2011-12-08 19:28
I propose that nvda only outputs text for a graphic in virtual buffers if the graphic has alt text or other appropriate accessibility information and does not output the filename of the graphic.

I find that many websites have advertisements as graphics whose URLs are in excess of 200 characters. I find it very frustrating to use "say all" and then come across a URL that takes 10 to 20 seconds to read through all the garbage parameters of the graphic's URL.

I think the experience would be much better if graphics without alt text were replaced with some short text such as "no alt".

Then I could probably leave say all running instead of needing to manually down arrow past all the garbage so that I can start say all again.

Comment 1 by ianr on 2011-12-08 19:46
Perhaps the replacement text could be "graphic". I do not have strong feelings on the replacement just that the garbage text isn't read. In fact I'd be fine if nothing was said for the graphic except that in some instances people would want to be able to right click on the graphic and graphics that are links should still be clickable in some way.

As an aside I checked one of the graphics that annoyed me and it was 748 characters.

Comment 2 by briang1 on 2011-12-08 22:02
Perhaps you could put a link to a page which is annoying in this respect.
Also, are you sure they are always the file name? I've come across adverts that seem to be just garbled bits of text which sighted folk say are pictures of some sort. How would you know?
What about huge javascript links as well? Some are semi understandable, some not at all.
Maybe you just want this in say all, not anywhere else?

Comment 3 by jteh on 2011-12-08 22:08
This would make way too many sites inaccessible. Often, NVDA's algorithm strips many graphic URLs down to something useful, since often the URL is actually quite descriptive. You probably wouldn't be aware of how often this happens. Configurability would be nice here, but is really difficult to implement for this part of the code.

Can you provide an example URL? It might be that we can adapt the algorithm to strip ridiculously long URLs on the grounds that they aren't useful.

Comment 4 by ianr on 2011-12-09 01:47
After a closer look it is actually a graphic that is a link and the text displayed is most of the href of the link not the image src as I previously thought.

Here is a page with an example. It is the first graphic you are taken to when hitting G. It is within an iframe and I would guess is an advertisement.

Here is the text that nvda displays for the graphic:
aclk sa=l&ai=BU6kTY2ThToj3G6bGsQfG4rC4Buil_fUB8OaJqhvAjbcB4KLkARABGAEg8t2AAjgAUPr2-IL______wFgycaXjeikjBigAfSFnf0DsgEXd3d3LnNlZWluZ3dpdGhzb3VuZC5jb226AQk3Mjh4OTBfYXPIAQLaASpodHRwOi8vd3d3LnNlZWluZ3dpdGhzb3VuZC5jb20vdm9pY2Vzci5odG2AAgHIAois_RmoAwHIAxXoA6UF6AMD6APICugD1wH1AwAIAMT1AwAAABCgBgI&num=1&sig=AOD64_0H69VIxY_qY45WoMtK1iQ4Rss1xw&client=ca-pub-5522445984992839&adurl=

It may vary between views as ads usually get rotated.

Comment 5 by briang1 on 2011-12-09 05:43
OK, this is an advert, and the site is using detection of adblock software to not allow one to view it with adblocking enabled. It does say on the site that if you feel any particular ad is intrusive you should report it. They are right in saying it's used to help with costs, but myself I feel this is all too intrusive and they need to specify that only textual ads are allowed from their advertisers. Strangely, simple adblock in IE is not detected and so I can view the site with that enabled with no issues.
Thus your problem could be fixed for you by using an adblocker.

In the case of this particular site, I'd suggest a word in their ear.
Graphic ads have no real place if their audience is blind.
Try some adblockers and see if this cures your issues elsewhere.

Comment 6 by ianr on 2011-12-09 17:05
I installed firefox AdBlock Plus and now the graphic on that site just says "AdChoices". Perhaps this has solved it for me, I guess we'll see after I've browsed a lot more. Thanks for the suggestion.

Comment 7 by ianr on 2011-12-09 21:56
I still think there could be some improvements made to the information output for graphics, or if there were a way to make it configurable that would be nice too.
It seems the problem is really about graphics that are links or clickable that have no alt text because most of the link href gets output.

Here's another example:

There are 6 graphics with garbage text on that page.

Comment 8 by briang1 on 2011-12-10 11:02
Did someone mention webkit... grin

Actually, I was also thinking that a new part of the userguide or quick help file in nvda could contain, for example, all the windows standard shortcut keys, and advice on things like using ad blockers to stop graphic adverts making such a mess of a page for blind users. There seem to be few places where this stuff is actually available.

Comment 9 by jteh on 2011-12-13 06:26
The current thinking is to just render a space for graphics if our URL-based graphic labelling algorithm returns more than 30 characters, as anything longer than this is likely to be useless.
Milestone changed from None to 2012.1

Comment 10 by ianr on 2011-12-13 17:25
Your solution sounds good to me.

Comment 11 by mdcurran on 2011-12-21 05:11
Fixed in 13a4352. As Jamie suggested, we now no longer render guessed names for graphics and links generated by URLs, that result in a name 30 characters or longer. I tested with that github link (I no longer see garbage text for those graphics) and I can still see some shorter url-based names for graphics on Please however report any useful graphic/link labels that may have gone missing because of this commit -- though the logic is pretty simple really.
State: closed
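
A minimal sketch of the rule described in comment 11, for illustration only. It assumes the URL-based labelling algorithm has already produced a guessed name as a string; the function name `render_guessed_name` and the constant are hypothetical and are not taken from NVDA's source (the real logic lives in the virtual buffer backend).

```python
# Hypothetical illustration of the "30 characters or longer" rule.
# None of these names come from NVDA itself.
MAX_GUESSED_NAME_LEN = 30  # threshold mentioned in the thread


def render_guessed_name(guessed_name: str, limit: int = MAX_GUESSED_NAME_LEN) -> str:
    """Return the text to render for a name guessed from a URL.

    Names of `limit` characters or more are replaced by a single space,
    so the graphic or link still occupies a position in the buffer.
    """
    if len(guessed_name) >= limit:
        return " "
    return guessed_name
```

Short guessed names pass through unchanged, while anything at or above the limit collapses to a space rather than disappearing entirely.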

Comment 12 by Palacee_hun on 2012-01-30 23:02
I experience severe accessibility issues arising from this simple trimming solution. I know hundreds of webpages that do not label graphics and URLs properly, which results in the guessing algorithm coming into play. I may say this is common web authoring practice and I might even risk saying it is on the increase together with other "bad" or "inaccessible" web authoring practices. In these times of economic disturbances, market competition becomes harsher and that puts visual appearance above all in web authoring, making other aspects like accessibility less important. And I think it is not really wise to try to fight these trends because it is quite impossible, but to try to adapt our accessibility products (like screen readers) to them when possible at all.
The guessing feature of NVDA is especially important for all kinds of webshops where improper naming and labelling is very frequent and useful content can only be accessed through such improperly authored links. But webshops are not the only example; such important but improperly formed links and graphics can be found on all sorts of websites. I have also found such links in HTML-format emails, like Ebay promo e-mails. I cannot activate links leading to the promotions any more since the trimming because they are not shown; only blank lines appear instead. And that is the worst. I agree that guessed URLs are uncomfortable (especially when using sayall) and cumbersome, but I experience that most of the time they are vital, because otherwise we cannot know that there is content there. Furthermore the guessed longish URLs often contain keywords from which one can guess the type of the content. E.g. it is common practice on webshops to put an increase/decrease amount button (authored as a link) around items in basket. These are often malformed, but thanks to URL guessing I heard something like " ...&order_inc_button& ..." in NVDA 2011.3 from which I could find out what they were and activate them if needed. But they get killed by trimming, so this webshop feature gets killed too with recent NVDA builds. Even the order and put in basket links can be totally killed by this trimming solution if they are not authored properly (happens often). I anticipate dozens of webpages I use regularly would become unusable if this "fix" got into official 2012.1. I can gather HTML extract examples if required but I preferred reopening this ticket and reporting the serious drawbacks of this solution as soon as possible.
I feel algorithmic URL guessing in NVDA is and will be a must for the indefinite future and I experience it provides useful if not crucial information in 90% of cases, albeit being cumbersome of course. One cannot find a character limit upon which to decide whether guessed information is garbage or not. That is impossible. So I see three alternatives from which to choose:
-to revert this change entirely, because it does much more harm than it does good. Uncomfortableness is always better than totally suppressing potentially useful information without any clues;
-to make this configurable in browse mode settings together with the trimming limit perhaps;
-to restrict this solution exclusively to sayall.
I feel this should be evaluated and decided upon fast before the publishing of 2012.1 official, otherwise I fear a major fallback in the usability of NVDA with lots of websites.

State: reopened

Comment 13 by jteh on 2012-01-30 23:21
Please provide real world URLs. I understand the concern, but I honestly have never seen any examples where guessed graphic names are any use at all if longer than 30 chars.

Also, you should not be just hearing a blank line. The link should still be present, but indicated by a space. NVDA should still say "link graphic". Therefore, you should still be able to move to this and activate it. In other words, you might lose the guessed name, but it should still be possible to activate it when this happens.

Comment 14 by Palacee_hun (in reply to comment 13) on 2012-01-31 00:52
An affected link from a recent Ebay promo e-mail:
(This links to the deals of a given week, which is the most important link in that e-mail.)
NVDA 2011.3 presents this undoubtedly crazy link as "link graphic 8?eecl=3&eesi=UK&i=13297f8539aII1f1914II26ee53II132c1023c4e&eepc="
This does not mean much to a human of course except the letters "UK". But the promo e-mail contains other such crazy links as well, and over the months I have learned and got used to the situation that if a link in such an e-mail began with "8?eecl=3&eesi=UK&" then this is the link to go for, because it leads to the current deals. Other crazy links in the promo lead to messy Ebay places, and since the trimming it is not possible to distinguish the important link in such an undoubtedly crazy, but working way. By the way, it was the letters "UK" in this messy link that hinted me to try that one for the deals when I first encountered such a promo e-mail (I am registered at Ebay UK).

An order link at one of my mainly used food providers (
NVDA 2011.3 begins reading this at "index.php?page=order_menu" according to its rules. Now it goes away completely. The clue in this link is "order_menu" which I now lose.
An example of a quantity increase link near a product in basket from the same webshop ( ``````
The clues here are "order_menu_basket" and "act=inc" (action=increase). This also disappears in recent NVDA builds, as does its decreasing counterpart, which has the same structure only with "act=dec" in it. Now I hear only two unnamed graphic links above each other. How can I know what they are? It is not so safe on a webshop interface to just click links at random and go by trial and error, is it?
Final note: such very long links with some eventual clues in them began to appear in the wild with the dawn of Web 2.0 and with the immense popularity of various script engines (PHP, JSP, ASP, etc.). And they are definitely not fading away, are they? On the contrary in fact ...

Comment 15 by mdcurran on 2012-01-31 23:21
Perhaps rather than removing the text altogether, we could just keep the first 30 characters? Although of course this reintroduces some garbage text again, I feel it may be a better middle road to go for 2012.1. All the examples cited here contain their useful part in the first 30 characters. And I think this is probably true in most cases that I've seen.
We could indeed even lower the character count to 20 or something.
What are other people's thoughts on this?

Comment 16 by benjaminhawkeslewis on 2012-02-01 00:40
How about keeping segments that look like words, short numbers (5 digits or less), or short codes (5 characters or less) and dropping long segments (e.g. 6 characters or more) that are either just digits or a mix of digits and letters?
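
A rough sketch of how this segment heuristic might look, purely as an illustration of the proposal. It assumes segments are separated by any non-alphanumeric character and treats an all-letter segment as "looking like a word"; the function name `filter_segments` and both assumptions are hypothetical, not anything NVDA implements.

```python
import re


def filter_segments(guessed_name: str, short_limit: int = 5) -> str:
    """Keep word-like and short segments; drop long digit-bearing ones.

    Assumption: a "segment" is a run of ASCII letters and digits, and a
    segment made only of letters counts as a word regardless of length.
    """
    segments = [s for s in re.split(r"[^A-Za-z0-9]+", guessed_name) if s]
    kept = []
    for seg in segments:
        if seg.isalpha():
            kept.append(seg)  # looks like a word
        elif len(seg) <= short_limit:
            kept.append(seg)  # short number or short code
        # Otherwise: 6+ characters containing digits, so it is dropped.
    return " ".join(kept)
```

Applied to the Ebay label quoted in comment 14, this keeps the short pieces such as "UK" and drops the long hexadecimal-looking run. A long garbage segment made only of letters would still slip through under these assumptions.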

Comment 17 by briang1 on 2012-02-01 08:17
I'd imagine that would not be that easy to do, and would result in a lot of processing even if it were possible, which would slow things down.

Maybe a simple toggle for the garbage link is all you need, but somehow you would need to know they had been encountered.

Comment 18 by Palacee_hun on 2012-02-01 15:31
Considering arguments I read here I suggest two alternatives depending on whether there is already a strings freeze for NVDA 2012.1:
-if no, then I suggest introducing a new option in "Browse Mode Settings" where the user could enter how many chars he/she wanted to keep from guessed URLs (starting from the beginning of string) and 0 would mean leaving guessed URLs alone. I suggest "Chars to keep from unlabelled link URLs (0 no processing)" as the English option label;
-if yes then I vote for the middle of the road solution in Comment 15 (keeping 30 chars from the left of guessed URLs) with putting an ellipsis sign (three dots) at the end of the rendered text to signal to the user that it's trimmed.
If both seem too much in this state of 2012.1 then I suggest reverting this change and putting this topic off to 2012.2. Garbage texts in guessed URLs don't bother me much personally and I feel this is a typical topic of personal tastes.

Comment 19 by ianr on 2012-02-01 16:39
I think the ultimate solution would be to make it configurable. Whether that is the option to enable or disable the hiding of garbage text or the ability to enter the number of characters that is too many for guessed urls, either would solve the problem for both sides of the argument. Jamie previously said it would be quite difficult to make this configurable. Is this the case with both of these configuration ideas?

Though garbage text does really annoy me I hate to stop someone else from getting at the information they need.

Is the difficulty in configuring due to the virtual buffer code being in C++?

Could a regex be added in Python that filters text handed back from the C++ code?

Comment 20 by ianr on 2012-02-01 17:06
I should also mention that 30 characters with 3 dots at the end would not bother me nearly as much as 500 characters. If this would give a reasonable solution for other users and can make it into the next release then that's fine with me too.

Comment 21 by ianr on 2012-02-01 20:40
It is also good to know about workarounds to see the link text. For instance, in Firefox you can use object navigation to move to the next object, which is the status bar and will display the current link. You can also right click and choose copy link location to get it in your clipboard.
The copy link location also works in Thunderbird.
In Internet Explorer you can use the context menu and "Copy shortcut" menu item.

I'm not saying these are a solution. Just putting them out there so people are aware some workarounds are available to get the information they need.

Comment 22 by jteh on 2012-02-01 22:28
We're not in string freeze yet, but adding configuration options to customise the text generated by browse mode backends requires some fairly serious code change and can't be managed for 2012.1. To be honest, even after 2012.1, I don't think it is worthwhile just for this option, especially as this won't make any sense to most users and will thus introduce confusion.

This is mostly to do with the cross-process nature of this code, though being in C++ does make it a bit painful too. It's not possible to filter the text at the Python level because the text is fetched from the backend on the fly (e.g. each time you move the cursor). Doing this would result in differences in output when reading by character versus reading by line, for example.

Comment 23 by mdcurran on 2012-02-02 02:31
We're going to go with truncation at 30 with an ellipsis.
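
For readers following along, truncation at 30 with an ellipsis could be sketched as below. This is an assumption-laden illustration: the function name `truncate_guessed_name` is hypothetical, the ellipsis is written as three dots (as suggested in comment 18), and the thread does not say whether a name of exactly 30 characters is left alone, so the sketch assumes it is.

```python
def truncate_guessed_name(guessed_name: str, limit: int = 30) -> str:
    """Keep the first `limit` characters and mark the cut with three dots.

    Assumption: names of `limit` characters or fewer are returned unchanged.
    """
    if len(guessed_name) <= limit:
        return guessed_name
    return guessed_name[:limit] + "..."
```

With this rule the "index.php?page=order_menu" label from comment 14 stays intact, and the long Ebay label keeps its "8?eecl=3&eesi=UK&" prefix.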

Comment 24 by mdcurran on 2012-02-02 02:53
Done in 24f44ca.
State: closed

Comment 25 by jteh on 2012-02-02 03:11
This didn't push. I think you need to rebase. :)

Comment 26 by jteh on 2012-02-02 03:31
Err, my bad. It pushed just fine.
