handle non-breaking spaces for indentation indication #5610

Closed
wants to merge 2 commits into from

3 participants

@camlorn

I've seen this a lot, only just figured out what it is. if you're in the web browser and indentation indication is on, non-breaking spaces cause it to blow up annoyingly. For an example, see this thread, post 6. It causes indentation to be read in Firefox, Thunderbird, etc. as "space space space space..."
If someone says where the indentation function is, I can probably submit a PR. I imagine this isn't hard.

@jcsteh
@camlorn

There's two things I'd like to do to this that I think make it better, though it looks like best may not be achievable:

  • Replace all non-breaking space with spaces. This will break when in locales that mix it with non-space characters, but will at least begin doing something sane when it's mixed with spaces. I'm obviously biased, but this is the most common place I see them.

  • Group indentation characters by type. "Space tab space tab space tab" is bad, "3 tab 3 space" is better but comes with the downside that you need to check for where the spaces are manually. Of these, I think this one is the most controversial, and it might be worth dropping, but it would at least do something somewhat sane for locales where the default indentation isn't space (if those exist).

@jcsteh
  • Replace all non-breaking space with spaces.

I guess we could do this; it'd just be two passes instead of one, which is fine. We can do this just for indentation text so we're not doing it over the all text; it's only useful for indentation reporting.

  • Group indentation characters by type. "Space tab space tab space tab" is bad, "3 tab 3 space" is better

It can be quite different visually. At least in decent editors (Notepad++ certainly does this), tab doesn't always just get mapped to a fixed number of spaces. Rather, it moves to the next point evenly divisible by that number of spaces. If you do space tab, it will indent the same distance as just tab; i.e. the tab consumes the equivalent of 3 spaces. So, tab space tab space tab looks the same as 3 tabs, but not the same as 3 tab 2 space.

@camlorn

Everyone says mixing them is a bad idea. If we said "3 space 3 tab mixed" and you care, you could look. That's my thinking anyway. I'd want to stick a pause after mixed if we did it, but honestly replacing non-breaking spaces is probably enough anyway. I'll go ahead and prepare a PR for the first part of this at least.

@camlorn camlorn added a commit to camlorn/nvda that referenced this pull request Dec 18, 2015
@camlorn camlorn Replace non-breaking spaces with spaces before computing indentation …
…announcement. Work for #5610.
204f3dc
@camlorn

This works for the first part. I left it open because I'm not sure if we want to do the second or not.
Sorry about the history, I'm not sure why the pull request is including both commits. I used hub and it says that pull-request to issue conversion is deprecated...but only after it does it. I'm assuming this is the bug, and suffice it to say I will not be doing that again.

@jcsteh jcsteh commented on an outdated diff Dec 20, 2015
source/speech.py
@@ -421,6 +422,8 @@ def getIndentationSpeech(indentation):
# Translators: This is spoken when the given line has no indentation.
return _("no indent")
+ #The non-breaking space is semantically a space, so we replace it here.
+ indentation = RE_NBSP_CONVERT.sub(" ", indentation)
@jcsteh
jcsteh added a note Dec 20, 2015

Obviously totally cool with using regexps if it allows you to do multiple string passes in one step and thus improve performance, but in this case, is there any advantage of using a regexp over a simple:

indentation = indentation.replace(u"\xa0", " ")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@camlorn

Yeah, actually. "\xa0".replace(u"\xa0", " ") fails because replace tries to decode the unicode literal to Ascii. Passing the ascii representation to the Unicode version of replace also breaks; in that case, I suspect that it's trying to decode the argument to ascii first, but it's hard to tell. The function's docstring doesn't explicitly say that the input is going to be a Unicode string, so you either have to force it (create 2 temporary strings in quick succession) or do an even uglier if statement as far as I can see. There might be a cleaner way, but a simple replace call seems to break on one of the alternatives no matter how I write it, the if statement switching on types seemed ugly, and you seemed concerned that this be absolutely as fast as possible.
I don't discount the possibility that I missed something, me and Unicode do not mix.

@jcsteh

ah, very good point. However, it seems this fails already for Unicode:

>>> speech.getIndentationSpeech(" \xa0")
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "C:\Users\jamie\src\nvda\source\speech.py", line 438, in getIndentationSpeech
    return " ".join(res)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

Given this and the fact that I can't think of a case when it would ever be passed an ANSI string, just change the docstring to say unicode instead of basestring for the parameter type, which should also make the unicode.replace safe.

@camlorn

That should do it.

@jcsteh

I didn't see this before, but I see you've based this on (and are requesting merge into) next. For future reference, all pull requests should be based on (and submitted against) master; see the Contributing guidelines. No need to resubmit or anything for this one; I'll take care of it.

@jcsteh jcsteh added a commit that referenced this pull request Dec 22, 2015
@camlorn camlorn When speaking line indentation, non-breaking spaces are now treated a…
…s normal spaces. Previously, this could cause announcements such as "space space space" instead of "3 space".

Fixes #5610.
5899822
@nvaccessAuto

Incubated in 67f080d.

@jcsteh jcsteh was assigned by nvaccessAuto Dec 22, 2015
@jcsteh jcsteh added this to the 2016.1 milestone Dec 22, 2015
@jcsteh jcsteh added the enhancement label Dec 22, 2015
@jcsteh jcsteh added a commit that closed this pull request Jan 6, 2016
@camlorn camlorn When speaking line indentation, non-breaking spaces are now treated a…
…s normal spaces. Previously, this could cause announcements such as "space space space" instead of "3 space".

Fixes #5610.
e11a6eb
@jcsteh jcsteh closed this in e11a6eb Jan 6, 2016
@nvaccessAuto nvaccessAuto removed the incubating label Jan 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment