This adds a --encoding command line option that lets you force a particular file encoding when reading template files. AFAICT Cheetah already supports this via the standard -*- encoding: -*- directive, but this was an easier change for me to make than changing all of our templates (currently I'm using Cheetah 2.0.1, which happens to work but is super broken in other ways).
The unicode parts were a bit over my head, but I think this is correct. Thoughts?
make it possible to force file encoding via the command line
Cheetah 2.0.1 isn't really supported; in fact, later releases of Cheetah do a much better job of dealing with funky encoded templates, because Cheetah now uses nothing but unicode objects internally.
Can you not upgrade? :-/
I'll poke at this a bit tomorrow. The original issue I'm having is that at Yelp (hi Slide!) we're still on 2.0.1 for production stuff, and I'm trying to get things working on a more modern version of Cheetah, etc. A while back another developer changed the default sigils to Unicode characters; namely, we have some code like this in our build system:
'COMPILER_SETTINGS': ','.join(CHEETAH_OPTS + ["cheetahVarStartToken='≈'", "directiveStartToken='∆'"])
I'm not a frontend developer, but IIRC the reason here was that we use $ all over the place in JS code so it was annoying to always have to escape $ (and easy to make mistakes), and the Unicode characters above are easy to enter in OS X, and have no possibility of conflicting with anything else. Due to some weird stuff I don't fully understand, Cheetah 2.0.1 doesn't really understand that these are Unicode characters (when cheetah-compile is invoked, it doesn't actually know that it's getting UTF-8 arguments), but some combination of a total lack of unicode-awareness everywhere constructively interfered to cause things to work.
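One plausible reading of the "constructive interference", sketched below under a loud assumption: that 2.0.1 handles both the sigil argument and the template contents as raw byte strings. This has not been checked against the 2.0.1 source, and the template text is made up for illustration. If both sides are the same UTF-8 bytes, a plain byte search finds the token without anything ever decoding it.

```python
# Assumption (not verified against Cheetah 2.0.1): the sigil from the command
# line and the template file contents are both treated as raw UTF-8 bytes.
token = u'\u2248'.encode('utf-8')                   # bytes for the variable sigil
template = u'var x = \u2248name;'.encode('utf-8')   # made-up template contents

# A byte-level search matches because both sides use the same encoding,
# even though nothing in the code knows what that encoding is.
position = template.find(token)
print(position)  # 8
```

Once one side becomes a unicode object and the other stays as bytes, that accidental agreement is gone, which would fit the breakage on newer releases.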
The new Cheetah seems way more Unicode-aware, so the old code wasn't jibing with it (I'm still not really sure how it currently works!), but I was able to fix this by passing in the arguments as ASCII escape sequences, i.e. by changing the above to:
'COMPILER_SETTINGS': ','.join(CHEETAH_OPTS + ["cheetahVarStartToken=u'\u2248'", "directiveStartToken=u'\u2206'"])
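As a quick sanity check on those escapes, here is a small sketch in Python 3 syntax (on Python 2 the literals would need a u'' prefix). The variable names are just for illustration. It shows which code points the escapes name, that the escaped setting string is pure ASCII, and that each token starts with byte 0xe2 in UTF-8.

```python
import unicodedata

# Python 3 syntax; on Python 2 these literals would need a u'' prefix.
var_token = '\u2248'        # value for cheetahVarStartToken
directive_token = '\u2206'  # value for directiveStartToken

print(unicodedata.name(var_token))        # ALMOST EQUAL TO
print(unicodedata.name(directive_token))  # INCREMENT

# The escaped setting string itself contains only ASCII characters, so it
# does not depend on how the command line arguments are encoded.
setting = "cheetahVarStartToken=u'\\u2248'"
print(all(ord(ch) < 128 for ch in setting))  # True

# In UTF-8 each token is three bytes, and the first byte is 0xe2.
print(var_token.encode('utf-8'))        # b'\xe2\x89\x88'
print(directive_token.encode('utf-8'))  # b'\xe2\x88\x86'
```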
This mostly works, but then in the bowels of cheetah/Compiler.py the input file is created using something like this (pseudo-code):
data = open(path_to_file).read()  # raw bytes from disk
data = unicode(data)              # implicit decode using the default (ASCII) codec
That doesn't work for our templates:
evan@dev1sv:~/pg/main (evan_fedora) $ python
Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> data = open('./yelp/web/sites/admin/js/yelp/pages/admin/biz_updates.js.tmpl').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2622: ordinal not in range(128)
This is because of the embedded Unicode characters in these template files, and the fact that none of the templates have an encoding specified. The patch I sent you lets me work around this by just changing our SCons code to pass in --encoding='utf-8', so that the codecs module reads the template with the right encoding the first time around.
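To make the difference concrete, here is a minimal sketch of the idea, not the actual patch. `read_template` is a hypothetical helper, the template contents are made up, and the 'ascii' default only stands in for the implicit decode that fails in the traceback above. It is written so that it runs on Python 3 as well as later 2.x releases.

```python
import codecs
import os
import tempfile


def read_template(path, encoding='ascii'):
    # Hypothetical helper, not Cheetah's real code. The 'ascii' default
    # mimics the implicit decode; passing encoding='utf-8' mimics what
    # forcing the encoding from the command line is meant to achieve.
    with codecs.open(path, 'r', encoding=encoding) as handle:
        return handle.read()


# Write a tiny made-up template containing the non-ASCII sigil, as UTF-8.
fd, template_path = tempfile.mkstemp(suffix='.tmpl')
with os.fdopen(fd, 'wb') as handle:
    handle.write(u'var x = \u2248name;\n'.encode('utf-8'))

# Decoding as ASCII fails on the first 0xe2 byte.
try:
    read_template(template_path)
    decoded_as_ascii = True
except UnicodeDecodeError:
    decoded_as_ascii = False

# Decoding as UTF-8 yields a unicode string with the sigil intact.
text = read_template(template_path, encoding='utf-8')
os.remove(template_path)

print(decoded_as_ascii)       # False
print(u'\u2248' in text)      # True
```

Decoding once at read time means everything downstream only ever sees unicode text, so no later step has to guess the encoding.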
Maybe our approach to this is broken, but this seems to work around it for now. I'm curious if you have any ideas for how to work around this better, or if this is something that you guys have tackled differently at Slide.
Just to make things clear, in case they weren't from the above: we're already on 2.0.1 (plus some patches that we won't need after upgrading), and everything is working. The SCons change that I mentioned above, which changes how we pass arguments into Cheetah, plus the patch I sent you, would let us upgrade to Cheetah 2.4.x.