turn off unicode #77

Closed
sqlalchemy-bot opened this Issue Feb 26, 2008 · 13 comments


sqlalchemy-bot commented Feb 26, 2008

Migrated issue, originally created by Anonymous

If the input and output are not unicode, then decoding and encoding cause some overhead. Adding an option to turn unicode off could improve performance a bit.

Add an argument to Lookup and Template:
... ,using_unicode = True, ...

When unicode is turned off, the compiled module source is saved with the proper charset, with

# -*- encoding:charset -*-

added at the head, so escaping is not needed.
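
The overhead in question can be illustrated with a minimal sketch. It is written for Python 3, where `bytes` stands in for the byte strings of the Python 2 era. The function names are hypothetical and none of this is Mako code; it only contrasts a decode/encode round trip with a direct pass-through of already-encoded data.

```python
def render_via_unicode(chunks, encoding="utf-8"):
    """Decode every byte chunk, join the text, then encode the result again."""
    text = "".join(chunk.decode(encoding) for chunk in chunks)
    return text.encode(encoding)


def render_bytes_passthrough(chunks):
    """Join the byte chunks directly, skipping the decode/encode round trip."""
    return b"".join(chunks)


# Input that is already encoded as UTF-8 (an assumption of this sketch).
chunks = ["我们".encode("utf-8"), b" hello ", "mouton".encode("utf-8")]

# Both paths produce the same bytes; the second one does less work.
assert render_via_unicode(chunks) == render_bytes_passthrough(chunks)
```

The pass-through is only safe when every chunk is known to share one encoding, which is the condition the proposal relies on.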


Attachments: unicode.patch

sqlalchemy-bot commented Mar 1, 2008

Michael Bayer (@zzzeek) wrote:

hi there -

I'm reviewing your patches, thanks for them! So far I can't accept this particular one:

  • the primary method to turn off the "unicode" conversion step on expressions, which is certainly fairly expensive, is to redefine the default_filters of the template: http://www.makotemplates.org/docs/filtering.html#filtering_expression_defaultfilters

  • the explicit kwargs in Template are to allow checking for valid arguments.

  • I don't exactly understand the point of the use_unicode flag. If it's that you're trying to have a template which contains multibyte characters and you'd like it to go straight through and generate a Python file with a "coding" attribute at the top, it's not that simple. See #11 for reference. Example (fails with the patch, as well as without):

          from mako.template import Template

          template = Template("""Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »""", input_encoding='utf-8')
          assert template.render() == """Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »"""
    

sqlalchemy-bot commented Mar 1, 2008

Changes by Michael Bayer (@zzzeek):

  • changed status to closed

sqlalchemy-bot commented Mar 4, 2008

Anonymous wrote:

Because strings in the compiled source code are unicode, like u'\xxxx', just removing "unicode" from default_filters does not work: it causes a DecodeError if the data is a multibyte string.

So strings must stay as they are in the template source code, such as "我们", and "# -*- encoding:utf-8 -*-" must be added to the compiled source code.

lexer.py tries to decode all source code into Unicode, so we need a parameter to turn that off. Then removing "unicode" from default_filters will not cause a DecodeError.

Not using Unicode is more complicated, but it speeds things up a bit. I have used it this way and it works fine. If you are interested, I will refine the code and submit it again.
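
The DecodeError mentioned above can be reproduced in isolation. Python 2 implicitly decoded byte strings as ASCII whenever they were combined with unicode strings. The sketch below is written for Python 3 and makes that decode explicit; it is an illustration of the failure mode only, not Mako code.

```python
# UTF-8 bytes for a multibyte string, as they would appear in raw template data.
data = "我们".encode("utf-8")

# Decoding as ASCII (what Python 2 did implicitly) fails on multibyte input.
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
else:
    error = None

assert isinstance(error, UnicodeDecodeError)

# Decoding with the encoding the data was actually written in succeeds.
assert data.decode("utf-8") == "我们"
```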

sqlalchemy-bot commented Mar 4, 2008

Changes by Anonymous:

  • changed status to reopened

sqlalchemy-bot commented Mar 4, 2008

Michael Bayer (@zzzeek) wrote:

Can you please attach a template file illustrating what you're referring to? If the idea is just "unicode is too slow, just pass through utf-8 directly without processing", that historically has not worked with our particular approach (we tried). As I pointed out in my example, the patch does not work.

sqlalchemy-bot commented Mar 6, 2008

Anonymous wrote:

I have updated the patch, and it passes all the test cases, including two Chinese templates: one using unicode, the other using utf-8 directly for better performance.

If unicode is not necessary, can Mako turn unicode off by default, or not use unicode at all?

sqlalchemy-bot commented Mar 7, 2008

Michael Bayer (@zzzeek) wrote:

this part of the patch:

@@ -563,7 +566,7 @@
             "try:")
         self.write_source_comment(node)
         self.printer.writelines(
-                "context.write(unicode(%s))" % node.attributes['expr'],
+                "context.write(%s)" % node.attributes['expr'],
             "finally:",
                 "context.caller_stack.nextcaller = None",
             None

should be calling upon the default_filters in the way that visitExpression does, since a %call approximates saying ${foo()}. So we wouldn't hardcode unicode(), but would instead pull from default_filters. It's a bug on my part; can you work that into the patch?
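
As a rough sketch of the idea, the generated `context.write(...)` line could be built by wrapping the expression in whatever filter names are configured, rather than in a hardcoded `unicode(...)`. The helper `build_write_line` is hypothetical and is not Mako's actual code generator; it only shows the shape of the change.

```python
def build_write_line(expr, default_filters):
    """Wrap expr in each configured filter name and emit a context.write() line.

    Hypothetical helper: filters are applied in list order, so the first
    name ends up innermost in the generated source.
    """
    wrapped = expr
    for name in default_filters:
        wrapped = "%s(%s)" % (name, wrapped)
    return "context.write(%s)" % wrapped


# The default behaviour, matching the line removed in the diff.
assert build_write_line("foo()", ["unicode"]) == "context.write(unicode(foo()))"

# With filters cleared, this matches the line added in the diff.
assert build_write_line("foo()", []) == "context.write(foo())"
```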

sqlalchemy-bot commented Mar 7, 2008

Michael Bayer (@zzzeek) wrote:

this will also resolve #11. I do not recall what was causing AST parsing to fail over there since it does not seem to be happening now.

sqlalchemy-bot commented Mar 7, 2008

Michael Bayer (@zzzeek) wrote:

oh also can we call the flag "disable_unicode=True"

sqlalchemy-bot commented Mar 7, 2008

Michael Bayer (@zzzeek) wrote:

...which would also replace default filters with [str()]. The point of the default filter of unicode() or str() is so that people can say ${5 + 7} and it renders. It of course can be cleared entirely for performance reasons.
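
A small sketch of why a default filter is useful at render time, assuming a simplified model in which rendered pieces are collected in a list and joined at the end. The function `apply_default_filters` is hypothetical and is not part of Mako.

```python
def apply_default_filters(value, default_filters):
    """Pass an expression value through each filter callable in order."""
    for filter_fn in default_filters:
        value = filter_fn(value)
    return value


# With str as the default filter, an integer expression renders as text.
rendered = "".join([apply_default_filters(5 + 7, [str])])
assert rendered == "12"

# With the filters cleared, the raw integer reaches the join and fails,
# so every expression must already evaluate to a string.
try:
    "".join([apply_default_filters(5 + 7, [])])
except TypeError:
    failed = True
else:
    failed = False
assert failed
```

Clearing the filters saves one call per expression, at the cost of requiring every expression to produce a string itself.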

sqlalchemy-bot commented Mar 10, 2008

Anonymous wrote:

I have updated the patch:

Add default filters to the %call tag.

Rename the flag to "disable_unicode".

Set default_filters to ["str"] when disable_unicode is True.
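
The last item could look roughly like the following. The function `resolve_default_filters` is a hypothetical name, and letting an explicitly passed filter list take precedence over the flag is an assumption of this sketch, not something stated in the thread.

```python
def resolve_default_filters(disable_unicode=False, default_filters=None):
    """Pick the default filter names for a template.

    Assumption: an explicit default_filters list wins over the flag.
    """
    if default_filters is not None:
        return list(default_filters)
    # "str" when unicode handling is disabled, "unicode" otherwise.
    return ["str"] if disable_unicode else ["unicode"]


assert resolve_default_filters() == ["unicode"]
assert resolve_default_filters(disable_unicode=True) == ["str"]
assert resolve_default_filters(disable_unicode=True, default_filters=[]) == []
```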

sqlalchemy-bot commented Mar 21, 2008

Michael Bayer (@zzzeek) wrote:

thanks. Committed a modified version in d5f83e6 which retains identical Mako behavior if the flag is off, which is the default setting for both Template and TemplateLookup. Also added new documentation for this mode. Since not using unicode is against Mako's general philosophy, the docs warn against using this flag unless users are absolutely sure they want it (if anyone reports UnicodeDecode errors with this flag, they're using it wrong and will be urged to stop using it), and it's almost certain that this feature will not be available in the Python 3000 version since Py3K standardizes on unicode strings everywhere.

sqlalchemy-bot commented Mar 21, 2008

Changes by Michael Bayer (@zzzeek):

  • changed status to closed