Use unicode for internal processing? #1

ms8r · 2015-04-11T09:52:17Z

Firstly, many thanks for putting this up - I think it's brilliant. In addition to code refactoring, this is tremendously useful for people editing and revising books (with LaTeX, Markdown and/or reStructuredText sources). For this purpose it would be very helpful to have repren also handle non-ASCII characters (e.g. accented characters in foreign words).

Would you consider adding an --encoding option so users can specify the file encoding? repren could then decode the input read from pattern and input files to Unicode, do all the internal processing in Unicode, and then encode again when writing output.
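A minimal sketch of the decode/process/encode round trip described above, in Python. The function name `replace_in_decoded_text` and its parameters are hypothetical and only illustrate the idea; this is not repren's actual code or API.

```python
import re


def replace_in_decoded_text(raw: bytes, pattern: str, replacement: str,
                            encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: decode bytes, substitute on Unicode text, re-encode.

    Raises UnicodeDecodeError if `raw` is not valid in the given encoding.
    """
    text = raw.decode(encoding)                    # bytes -> Unicode string
    new_text = re.sub(pattern, replacement, text)  # all matching happens on Unicode
    return new_text.encode(encoding)               # Unicode string -> bytes


# Example with a Latin-1 encoded input containing accented characters.
raw = "café naïve".encode("latin-1")
result = replace_in_decoded_text(raw, "café", "coffee", encoding="latin-1")
```

The caller has to know the encoding up front, which is what an --encoding option would supply.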

ms8r · 2015-04-11T10:10:52Z

Never mind... it already works fine for non-ASCII - I forgot to switch off expandtabs when editing my pattern file.

jlevy · 2015-04-11T22:10:38Z

I actually avoided giving the program knowledge of encodings, since then binary files, weird encodings, malformed UTF-8, etc. would cause random Python encoding exceptions. By working at the byte level it should work on anything. (Well, case-insensitive matching on non-ASCII chars would require encoding knowledge -- but it doesn't handle that.) I've used it on Unicode files without problems, but if you have issues with encodings -- or ideas/PRs to improve it in general -- do let me know.

And thanks for the kind words -- glad it's useful!
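To illustrate the byte-level approach described in this reply, here is a small Python sketch. The helper `replace_bytes` is a hypothetical name for illustration, not repren's actual implementation. It relies only on the standard `re` module, which accepts bytes patterns and, with `re.IGNORECASE` on bytes, folds case for ASCII letters only.

```python
import re


def replace_bytes(data: bytes, pattern: bytes, replacement: bytes,
                  ignore_case: bool = False) -> bytes:
    """Hypothetical helper: regex substitution directly on raw bytes."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(pattern, replacement, data, flags=flags)


# UTF-8 text followed by a stray byte that makes the input malformed UTF-8.
data = "Café".encode("utf-8") + b"\xff"

# The byte-level substitution never decodes, so the malformed byte is
# passed through untouched instead of raising an exception.
fixed = replace_bytes(data, "é".encode("utf-8"), b"e")

# Decoding the same input would fail, which is the problem being avoided.
try:
    data.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# The caveat: case-insensitive matching on bytes covers ASCII only, so a
# lowercase "é" pattern does not match an uppercase "É" in the input.
upper = "É".encode("utf-8")
unchanged = replace_bytes(upper, "é".encode("utf-8"), b"x", ignore_case=True)
```

The trade-off is the one noted in the parenthetical: no decoding errors on arbitrary input, at the cost of case folding that stops at ASCII.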

jlevy closed this as completed Apr 11, 2015

jlevy added a commit that referenced this issue Sep 3, 2015

Add tests.

c608015

Good start at #1. Still could use a lot more.
