allow usage of non-ascii bytestring literals in templates #11

Closed
sqlalchemy-bot opened this Issue Jan 19, 2007 · 14 comments

sqlalchemy-bot commented Jan 19, 2007

Migrated issue, originally created by Anonymous

The Mako template parser has a problem, or a quirk, depending on your view: it is not possible to compile any template that contains non-ASCII characters inside the ${} code. The problem traces back to the inability of Python's built-in compiler to compile non-ASCII unicode source. Fixing it would need some kind of encoding juggling inside ast.py (the 'parse' function?) as well as adding a #-*- prefix to the code being compiled there. Alas, I haven't been able to fix this myself (mysterious body-snatcher exceptions pop out), nor do I have enough time to work on it, but I'm sure you get the idea.

To replicate the problem, just compile "${f('\u0142')}" as a mako template.

I should add that the problem is serious, at least for us, and a showstopper for Mako adoption in our project.


Attachments: alternate_unicode.patch

sqlalchemy-bot commented Jan 19, 2007

Michael Bayer (@zzzeek) wrote:

I've added backslash replacing for non-ASCII characters to expressions sent for AST parsing within expressions, Python code blocks, and control lines in [changeset:189]. Check the unit tests added to that changeset to get the idea. Note that using non-ASCII characters anywhere in templates requires that the encoding of the template be specified at the top via a "magic encoding comment".
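
As a rough illustration of what "backslash replacing" means here, the sketch below uses Python 3 and only the standard library. It is not Mako's actual code; the helper name `escape_non_ascii` is made up for this example. It turns the non-ASCII character into a `\uXXXX` escape so the parser only ever sees ASCII text.

```python
import ast


def escape_non_ascii(expression):
    """Replace every non-ASCII character with a backslash escape (hypothetical helper)."""
    return expression.encode("ascii", "backslashreplace").decode("ascii")


source = "f('\u0142')"               # the expression text contains the character U+0142
escaped = escape_non_ascii(source)   # pure ASCII: the character is now spelled \u0142
tree = ast.parse(escaped, mode="eval")
result = eval(compile(tree, "<expr>", "eval"), {"f": lambda x: x})

print(ascii(escaped))
print(ascii(result))
```

In Python 3 the escape sits inside a text literal, so evaluation restores the original character. In a Python 2 plain `'...'` literal the same escape is not interpreted and stays as six literal characters.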

sqlalchemy-bot commented Jan 19, 2007

Changes by Michael Bayer (@zzzeek):

  • changed status to closed

sqlalchemy-bot commented Jan 22, 2007

Anonymous wrote:

I'm afraid it's still wrong. Test case:

import mako.template
t = u"#-*- encoding:utf-8\n${f('\u0142')}".encode('utf-8')
te = mako.template.Template(t)
te.render_unicode(f=lambda x:x)

returns u'\\u0142', should return u'\u0142' (tested on svn rev 190).

sqlalchemy-bot commented Jan 22, 2007

Changes by Anonymous:

  • changed status to reopened

sqlalchemy-bot commented Jan 22, 2007

Michael Bayer (@zzzeek) wrote:

I'm sorry, I don't understand at this point. Test case:

import mako.template


t = u"#-*- encoding:utf-8\n${f('\u0142')}".encode('utf-8')
te = mako.template.Template(t)
print te.code
f = lambda x:x

assert f('\u0142') == te.render_unicode(f=f)
print repr(unicode(f('\u0142')))
print repr(te.render_unicode(f=lambda x:x))

Generated code (if you believe this is incorrect, tell me what it should say. Note that all expressions are expected to be str()-able or unicode expressions, since they get passed to unicode() unconditionally; use context.write() to bypass this):

from mako import runtime, filters, cache
UNDEFINED = runtime.UNDEFINED
_magic_number = 1
_modified_time = 1169479868.3539629
_template_filename=None
_template_uri='memory:0x63f30'
_template_cache=cache.Cache(__name__, _modified_time)
_exports = []


def render_body(context,**pageargs):
    __locals = dict(pageargs=pageargs)
    f = context.get('f', UNDEFINED)
    # SOURCE LINE 2
    context.write(unicode(f('\u0142')))
    return ''

Program output (the assertion passes):

u'\\u0142'
u'\\u0142'

Also observe the unit tests added within the changeset, which embed literal multibyte expressions that come out identical to the original.

sqlalchemy-bot commented Jan 22, 2007

Michael Bayer (@zzzeek) wrote:

Also, try out the attached patch. It breaks all the current unit tests, but I think it's what you are looking for: it basically passes the string straight through and adds the "coding" comment to the top of the generated file. It seems I would essentially have to throw out the whole way Mako does unicode and rewrite it to take this approach.

sqlalchemy-bot commented Jan 22, 2007

Michael Bayer (@zzzeek) wrote:

OK, it was using cStringIO. This one passes most tests. Again, the basic idea is just spitting out the generated module in the same encoding as what was given. Not sure if it's working all the way, though. I know what you're looking for: the total "straight through" without using u"" at all. Not sure if I can get this working totally.

sqlalchemy-bot commented Jan 22, 2007

Michael Bayer (@zzzeek) wrote:

Also, I'm being told that Genshi requires non-ASCII strings to be sent as u'' as well, so I'm not sure this issue is limited to Mako.

sqlalchemy-bot commented Jan 23, 2007

Anonymous wrote:

I guess I introduced confusion with '\u0142' which should actually be u'\u0142' - a subtle but important difference :)

Now, this assertion should hold, but doesn't:

assert f(u'\u0142') == te.render_unicode(f=f)

where te = Template(u"#-*- encoding:utf-8\n${f('\u0142')}".encode('utf-8'))

I'm currently reviewing your code and the patch attached and looking for a way to implement what I want. Will keep you updated.
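
To make the "subtle but important difference" concrete, here is a small Python 3 sketch of what the two Python 2 literals hold. The variable names are only illustrative, and the bytes object stands in for Python 2's plain `'\u0142'` string, where the escape is not interpreted.

```python
# Stand-in for the Python 2 literal '\u0142': six ASCII bytes, no escape processing.
byte_literal = b"\\u0142"
# Stand-in for the Python 2 literal u'\u0142': a single code point.
text_literal = "\u0142"

print(len(byte_literal), len(text_literal))

# Interpreting the escape by hand gives the character the reporter expected.
print(byte_literal.decode("unicode_escape") == text_literal)

# The UTF-8 encoding of that character is two bytes.
print(text_literal.encode("utf-8"))
```

This is why the assertion with `f(u'\u0142')` can only hold if the template expression itself yields the single character rather than the six-character escape text.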

sqlalchemy-bot commented May 1, 2007

Michael Bayer (@zzzeek) wrote:

Ultimately, to make everyone no longer notice that you have to say u'foo' and not 'foo', we have to make it so that generated modules are in the same encoding as the source file. A lot of weird problems arise when you do this, including that the AST parsing needs to be passed bytestrings instead of unicode objects, which then breaks other stuff, and so on. I don't think it's high priority now, since I'd prefer people to just use unicode objects.
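
The idea of a generated module that keeps the source file's encoding can be sketched with the standard library alone. This is an assumption-laden Python 3 illustration, not how Mako generates modules: the encoding choice, the `generated_source` text, and the `"<generated>"` filename are invented for the example. It relies on `compile()` accepting bytes and honouring a PEP 263 coding comment on the first line.

```python
# Hypothetical generated module, written in the same (non-UTF-8) encoding
# as the template and announced by a coding comment on its first line.
template_encoding = "iso-8859-2"
generated_source = (
    "# -*- coding: iso-8859-2 -*-\n"
    "value = '\u0142'\n"
).encode(template_encoding)

# compile() decodes the bytes according to the coding comment.
namespace = {}
exec(compile(generated_source, "<generated>", "exec"), namespace)

print(ascii(namespace["value"]))
```

The literal reaches the compiler as raw bytes in the template's encoding, which is the "straight through" behaviour discussed above. Under Python 2 a plain `'...'` literal would keep those bytes as a byte string instead of decoding them.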

sqlalchemy-bot commented May 1, 2007

Changes by Michael Bayer (@zzzeek):

  • removed labels: bug
  • added labels: easy, feature
  • changed title from "non-ASCII code problem" to "allow usage of non-ascii bytestring literals in te"

sqlalchemy-bot commented Mar 7, 2008

Michael Bayer (@zzzeek) wrote:

Someone has posted a working patch for this in #77, so let's move over to there.

sqlalchemy-bot commented Mar 7, 2008

Changes by Michael Bayer (@zzzeek):

  • changed status to closed

sqlalchemy-bot commented Mar 21, 2008

Michael Bayer (@zzzeek) wrote:

in d5f83e6:

from mako.template import Template

f = lambda x:x
te = Template(u"#-*- encoding:utf-8\n${f('\u0142')}".encode('utf-8'), disable_unicode=True)
assert f(u'\u0142'.encode('utf-8')) == te.render(f=f)

passes.
