Skip to content

Fixed ANSI formatting on UTF-8 source code. #20

merged 2 commits into from Aug 8, 2011

3 participants

pvaret commented Aug 8, 2011


As said in my email, I've fixed the bug we've been discussing this morning. Hope this will work for you.

@inducer inducer merged commit 327668f into inducer:master Aug 8, 2011
inducer commented Aug 8, 2011

Sorry, I jumped the gun a bit on this one. I reverted 7841 because it broke drawing in my terminal in a weird way--highlights would not show up on the full line, and empty lines would mirror the sidebar within the source view. Even after rereading your email, I'm not quite sure what issue you're fixing here. Can you please explain--in particular, why is urwid's apply_target_encoding() not good enough?

Thanks for your contribution,

pvaret commented Aug 8, 2011

Ah, dang it. I feared something like that -- I can't claim to fully understand just what the heck goes on in the deepest crevices of urwid.

apply_target_encoding() is pretty good for what it does, as far as I can tell; the issue would be that what it does is not what we need. If I understand correctly, it takes strings like "some unicode SOME DEC CONTROL CODES some more unicode" and turns it into "some bytes SOME DEC CONTROL CODES STILL WORKING some more bytes". I worked under the assumption that DEC control codes would not matter for us, what with them being from aaaages long gone. Apparently the assumption was incorrect. Darn it.

Where apply_target_encoding() is lacking with regards to our need is that bytestrings and unicode strings can have different lengths and so we need to move the format markers accordingly, and apply_target_encoding() completely doesn't do that.

Would you have an example of source code that presents the display issue you noticed when viewed in PuDB? So I can try to make a better batch that doesn't break thinks.

Thanks. :)

inducer commented Aug 8, 2011

I was just using the ./ script that comes with pudb. My terminal is 194 characters wide--not sure that matters.


I believe this partially solve the issue:

diff --git a/pudb/ b/pudb/
index 322244d..be99a28 100644
--- a/pudb/
+++ b/pudb/
@@ -95,6 +95,7 @@ class SourceLine(urwid.FlowWidget):
         bytestr = ''
         pos = 0

+        last_attr_len = attr[-1][1]
         for i, (token, length) in enumerate(attr):

             fragment = text[pos:pos+length]
@@ -104,6 +105,7 @@ class SourceLine(urwid.FlowWidget):

             bytestr += fragment_bytes
             pos += length
+        attr[-1] = (token, last_attr_len + (len(fragment_bytes) - len(fragment)))

         cs = [(None, len(bytestr))]

pvaret's code doesn't make the last fragment's length to include the space between EOL and the right edge of canvas, that's why the highlight only ends at EOL.

I found if there is any fullwidth chars (e.g. CJK) in text, then it seems that you need to minus the number of fullwidth chars. (fullwidth = 2, halfwidth = 1, 2 - 1 = 1?) I am not sure why and no idea how to get such information about a char is fullwidth char or not. (It didn't resolve)

pvaret commented Aug 8, 2011

D'OH. I indeed assumed that the formatting ended at the string's end. Where it really ends at the window's edge, apparently. I'll be bashfully fixing this posthaste.

As to the additional issue you report, livibetter, well, D'OH again. I didn't take the byte order mark into account. Bashful, posthaste, etc.

Thanks for the heads up.

pvaret commented Aug 8, 2011

... And more d'oh-ing still as I realize it's not (only) a BOM issue. Right, I'll keep working on this, but this may take longer than I hoped. Darn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.