Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

handle non-breaking spaces for indentation indication #5610

wants to merge 2 commits into


None yet
3 participants

camlorn commented Dec 18, 2015

I've seen this a lot, only just figured out what it is. if you're in the web browser and indentation indication is on, non-breaking spaces cause it to blow up annoyingly. For an example, see this thread, post 6. It causes indentation to be read in Firefox, Thunderbird, etc. as "space space space space..."
If someone says where the indentation function is, I can probably submit a PR. I imagine this isn't hard.


jcsteh commented Dec 15, 2015


camlorn commented Dec 17, 2015

There's two things I'd like to do to this that I think make it better, though it looks like best may not be achievable:

  • Replace all non-breaking space with spaces. This will break when in locales that mix it with non-space characters, but will at least begin doing something sane when it's mixed with spaces. I'm obviously biased, but this is the most common place I see them.
  • Group indentation characters by type. "Space tab space tab space tab" is bad, "3 tab 3 space" is better but comes with the downside that you need to check for where the spaces are manually. Of these, I think this one is the most controversial, and it might be worth dropping, but it would at least do something somewhat sane for locales where the default indentation isn't space (if those exist).

jcsteh commented Dec 18, 2015

  • Replace all non-breaking space with spaces.

I guess we could do this; it'd just be two passes instead of one, which is fine. We can do this just for indentation text so we're not doing it over the all text; it's only useful for indentation reporting.

  • Group indentation characters by type. "Space tab space tab space tab" is bad, "3 tab 3 space" is better

It can be quite different visually. At least in decent editors (Notepad++ certainly does this), tab doesn't always just get mapped to a fixed number of spaces. Rather, it moves to the next point evenly divisible by that number of spaces. If you do space tab, it will indent the same distance as just tab; i.e. the tab consumes the equivalent of 3 spaces. So, tab space tab space tab looks the same as 3 tabs, but not the same as 3 tab 2 space.


camlorn commented Dec 18, 2015

Everyone says mixing them is a bad idea. If we said "3 space 3 tab mixed" and you care, you could look. That's my thinking anyway. I'd want to stick a pause after mixed if we did it, but honestly replacing non-breaking spaces is probably enough anyway. I'll go ahead and prepare a PR for the first part of this at least.

@camlorn camlorn added a commit to camlorn/nvda that referenced this pull request Dec 18, 2015

@camlorn camlorn Replace non-breaking spaces with spaces before computing indentation …
…announcement. Work for #5610.

camlorn commented Dec 18, 2015

This works for the first part. I left it open because I'm not sure if we want to do the second or not.
Sorry about the history, I'm not sure why the pull request is including both commits. I used hub and it says that pull-request to issue conversion is deprecated...but only after it does it. I'm assuming this is the bug, and suffice it to say I will not be doing that again.

@jcsteh jcsteh commented on an outdated diff Dec 20, 2015

@@ -421,6 +422,8 @@ def getIndentationSpeech(indentation):
# Translators: This is spoken when the given line has no indentation.
return _("no indent")
+ #The non-breaking space is semantically a space, so we replace it here.
+ indentation = RE_NBSP_CONVERT.sub(" ", indentation)

jcsteh Dec 20, 2015


Obviously totally cool with using regexps if it allows you to do multiple string passes in one step and thus improve performance, but in this case, is there any advantage of using a regexp over a simple:

indentation = indentation.replace(u"\xa0", " ")

camlorn commented Dec 21, 2015

Yeah, actually. "\xa0".replace(u"\xa0", " ") fails because replace tries to decode the unicode literal to Ascii. Passing the ascii representation to the Unicode version of replace also breaks; in that case, I suspect that it's trying to decode the argument to ascii first, but it's hard to tell. The function's docstring doesn't explicitly say that the input is going to be a Unicode string, so you either have to force it (create 2 temporary strings in quick succession) or do an even uglier if statement as far as I can see. There might be a cleaner way, but a simple replace call seems to break on one of the alternatives no matter how I write it, the if statement switching on types seemed ugly, and you seemed concerned that this be absolutely as fast as possible.
I don't discount the possibility that I missed something, me and Unicode do not mix.


jcsteh commented Dec 21, 2015

ah, very good point. However, it seems this fails already for Unicode:

>>> speech.getIndentationSpeech(" \xa0")
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "C:\Users\jamie\src\nvda\source\speech.py", line 438, in getIndentationSpeech
    return " ".join(res)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

Given this and the fact that I can't think of a case when it would ever be passed an ANSI string, just change the docstring to say unicode instead of basestring for the parameter type, which should also make the unicode.replace safe.


camlorn commented Dec 21, 2015

That should do it.


jcsteh commented Dec 22, 2015

I didn't see this before, but I see you've based this on (and are requesting merge into) next. For future reference, all pull requests should be based on (and submitted against) master; see the Contributing guidelines. No need to resubmit or anything for this one; I'll take care of it.

@jcsteh jcsteh added a commit that referenced this pull request Dec 22, 2015

@camlorn @jcsteh camlorn + jcsteh When speaking line indentation, non-breaking spaces are now treated a…
…s normal spaces. Previously, this could cause announcements such as "space space space" instead of "3 space".

Fixes #5610.

Incubated in 67f080d.

@jcsteh jcsteh added this to the 2016.1 milestone Dec 22, 2015

@jcsteh jcsteh added the enhancement label Dec 22, 2015

@jcsteh jcsteh closed this in e11a6eb Jan 6, 2016

@nvaccessAuto nvaccessAuto removed the incubating label Jan 6, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment