non-ascii characters in demo gallery image filenames cause problems #463

TierraDelFuego opened this Issue Nov 5, 2012 · 3 comments




I see that a similar issue was raised and closed, but the real fix is to not use non-ASCII characters. See RFC 3986.

(NB: This analysis is attributable to the support from

This might look like the proper filename-part of a URI

However, this didn't work. Some investigation revealed that this is
because there are two separate ways to write the acute "A" character in
Unicode that look identical on screen.

It can be written as an ASCII "A" followed by a "COMBINING ACUTE ACCENT"
(U+0301), or as the single precomposed character "LATIN CAPITAL LETTER A
WITH ACUTE" (U+00C1).

The trouble is that the UTF-8 hexadecimal representations of these two
glyphs are completely different. In the first case, it looks like this
in hex:

41 cc 81

And in the second case, it looks like this:

c3 81
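As a small illustration (Python 3, not part of the original analysis), the two forms and their UTF-8 bytes can be reproduced with the standard `unicodedata` module. The helper name `utf8_hex` is made up for this sketch.

```python
import unicodedata

decomposed = "A\u0301"   # ASCII "A" followed by COMBINING ACUTE ACCENT
precomposed = "\u00c1"   # LATIN CAPITAL LETTER A WITH ACUTE


def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


print(utf8_hex(decomposed))    # 41 cc 81
print(utf8_hex(precomposed))   # c3 81

# The strings look the same on screen but compare unequal until both
# are normalized to the same form (NFC composes, NFD decomposes).
print(decomposed == precomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```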

That really matters, because there is actually no such thing as a UTF-8
HTTP URL. Instead, browsers translate whatever you type to what they
believe is the correct hex representation and send that to the server.
But the "what they believe is the correct hex representation" part can
go wrong: the reason that
Ávila%2C%20Spain.jpg didn't seem to work was
that the two browsers we tested prefer the second form of the encoding,
and send that to the server as:


But that's not how you saved the filename -- instead, you saved it as
the first form, an ASCII "A" followed by hex "cc 81":


Which, as you can see, works fine if you force the browser to send that

And this discussion doesn't even contemplate the possibility that
someone might be using a browser with a non-UTF-8 character set that
sends the accented "A" character as a completely different byte
representation in another charset. For example, the character is simply
the hex byte c1 in ISO-8859-1, so a browser using that charset may send
it as:


The server has no way to know that these three different representations
are all supposed to be the same file; it's just matching bytes.
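To make the byte mismatch concrete, here is a rough Python 3 sketch that percent-encodes the same visible filename in the three representations discussed above. The strings are computed by `urllib.parse.quote` for illustration; they are not copied from the URLs in the original report, and the variable names are hypothetical.

```python
import unicodedata
from urllib.parse import quote

# The filename as stored on disk in this report: decomposed form.
decomposed_name = "A\u0301vila, Spain.jpg"
# The same visible name in precomposed (NFC) form.
precomposed_name = unicodedata.normalize("NFC", decomposed_name)

# safe="" so that the comma and space are escaped as well.
as_stored = quote(decomposed_name.encode("utf-8"), safe="")
as_nfc_utf8 = quote(precomposed_name.encode("utf-8"), safe="")
as_latin1 = quote(precomposed_name.encode("iso-8859-1"), safe="")

print(as_stored)    # A%CC%81vila%2C%20Spain.jpg
print(as_nfc_utf8)  # %C3%81vila%2C%20Spain.jpg
print(as_latin1)    # %C1vila%2C%20Spain.jpg
```

Only the first of these matches the bytes of the filename on disk, which is consistent with the behaviour described in the analysis.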

So we really have two choices about how to make this work reliably. The
first is to avoid non-ASCII URLs entirely; that's our advice.
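One way the "avoid non-ASCII" option could look in practice is sketched below in Python 3. This is an assumption about how a filename might be sanitized, not Mezzanine's or Django's actual behaviour; `ascii_filename` is a hypothetical helper, and it simply drops characters that have no ASCII decomposition.

```python
import re
import unicodedata


def ascii_filename(name):
    """Return an ASCII-only approximation of name (hypothetical helper)."""
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping everything outside ASCII removes the combining marks.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse anything that is awkward in a URL into underscores.
    return re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_only).strip("_")


# Both Unicode forms of the name map to the same ASCII result.
print(ascii_filename("A\u0301vila, Spain.jpg"))  # Avila_Spain.jpg
print(ascii_filename("\u00c1vila, Spain.jpg"))   # Avila_Spain.jpg
```

A name made up entirely of characters without an ASCII decomposition would come back empty, so a real implementation would need a fallback.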

The second is to determine the actual hex filename representation (UTF-8
byte sequence), then force that in URLs, as in the last example above.
To find the filename byte representation, you could try something like
this in the shell:

ls | perl -pe 's/([\x20\x2c\x5c\x80-\xff])/"%" . uc sprintf "%02x",ord($1)/eg;'

Which produces this helpful encoding that would always work in any
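For readers who prefer Python, a rough equivalent of the Perl one-liner might look like the following (Python 3). It escapes the same bytes as the Perl character class: space, comma, backslash, and everything from 0x80 to 0xff. The function names are made up for this sketch.

```python
import os

# Same byte set as the Perl character class [\x20\x2c\x5c\x80-\xff].
ESCAPED = {0x20, 0x2C, 0x5C} | set(range(0x80, 0x100))


def escape_filename_bytes(raw):
    """Percent-encode the selected bytes of a raw filename."""
    return "".join(f"%{b:02X}" if b in ESCAPED else chr(b) for b in raw)


def list_escaped(directory="."):
    """Return the escaped form of every filename in directory."""
    return [
        escape_filename_bytes(os.fsencode(name))
        for name in sorted(os.listdir(directory))
    ]


if __name__ == "__main__":
    for line in list_escaped():
        print(line)
```

`os.fsencode` is used so that the escaping works on the bytes the filesystem actually stores rather than on a decoded string.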


Consider that I am new to Python, Django, and Mezzanine simultaneously.

The thumb_url looks correct to me but the image_url does not.

Local vars shown in the browser with DEBUG = True:

in thumbnail

244 raise FileSystemEncodingChanged()


u'A\u0301vila, Spain-75x75.jpg'
u'A\u0301vila, Spain'
u'A\u0301vila, Spain.jpg'
u'/path/static/media/uploads/gallery/.thumbnails/A\u0301vila, Spain-75x75.jpg'
u'uploads/gallery/A\u0301vila, Spain.jpg'

This is generally the result of the locale for your filesystem and/or database not supporting Unicode.

Can you check those?
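A quick way to check this from Python is sketched below. It is written for Python 3 (the thread itself predates that and shows Python 2 `u''` literals), and the function names are hypothetical. It only reports what the interpreter believes the encodings are, so treat it as a first diagnostic rather than proof that the server locale is correct.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python will use for filenames and text."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
        "default": sys.getdefaultencoding(),
    }


def can_roundtrip(name="A\u0301vila, Spain.jpg"):
    """Check whether name survives encoding to the filesystem encoding."""
    encoding = sys.getfilesystemencoding()
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeError:
        # For example, an ASCII-only locale cannot represent U+0301.
        return False


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
    print("round trip ok:", can_roundtrip())
```

If the filesystem encoding is reported as ASCII (or `ANSI_X3.4-1968`), that would point to the locale problem described here.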


This is the Mezzanine demo using sqlite3. The filesystem can't be the problem: the files are on disk with those filenames. No doubt you are busy and may not have had time to read my entire long report. I also noticed that your demo site will not allow renaming of files to these kinds of filenames. The install was a slam dunk, so that stage passed, but things then stalled at this one. These "first impressions" matter. It seems converting to ASCII is all that's needed. Would you accept a patch for this?
FWIW I couldn't even find the gallery on your demo page.


I did read through it; it's just that this is an issue that was ironed out many months ago, and it almost always turned out to be a locale issue. It can also manifest itself when the demo data is installed without a correct locale set and the correct locale is then set afterwards, or vice versa. You mentioned you had some help from your hosting company, so I suspect that this may have occurred as they investigated it.

The demo site doesn't necessarily contain the demo data since it's open to anyone to edit - I've reset the demo data and right at this moment at least, you can see the gallery working fine with unicode filenames in it: - in fact the reason the demo data contains filenames with unicode characters is entirely to raise this issue when it does come up.

Also for reference you can see in the bundled fabfile that we define a locale that supports utf here:

which we then first apply when provisioning a server and log back in before continuing, to ensure everything is correct:

If it is possible that your hosting provider changed the locale after you'd installed the demo data in order to try and rectify this, you might be able to get it working by reinstalling now that the locale is correct. Keep in mind though that this is only demo data, and a quicker path might be simply to remove the demo gallery, given that the locale may be correctly configured now.

stephenmcd closed this May 14, 2013