Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #272 (wrong slugs for non-latin chars) #273

Closed
wants to merge 2 commits into from

Conversation

spapas
Copy link
Contributor

@spapas spapas commented Oct 11, 2014

This PR resolves #272.

@spapas
Copy link
Contributor Author

spapas commented Oct 19, 2014

I'd forgoten to include Unidecode to install_requires of setup.py - now the tests pass.

@fizista
Copy link

fizista commented Dec 12, 2014

License Package "Unidecode" would require changing the license taggit package under the GPL. And besides, I have a problem after applying this patch.

@spapas
Copy link
Contributor Author

spapas commented Dec 12, 2014

Using a GPL program (without including it anywhere) will require to change the license of django-taggit ? Are you sure about that ?

Also, what problem do you have ?

@fizista
Copy link

fizista commented Dec 12, 2014

Read the second point of the license.
http://www.gnu.org/licenses/gpl-2.0.html

If the library was under the LGPL license you do not need to change the current library license.
http://fosswire.com/post/2007/04/the-differences-between-the-gpl-lgpl-and-the-bsd/

@fizista
Copy link

fizista commented Dec 14, 2014

A good solution would be to implement this library here:
https://github.com/un33k/django-uuslug

@spapas
Copy link
Contributor Author

spapas commented Dec 15, 2014

@fizista django-uuslug seems like a very nice solution, however if you check its requirements, it uses python-slugify (https://github.com/un33k/python-slugify/) which ... (guess what)

also uses Unidecode: https://github.com/un33k/python-slugify/blob/master/requirements.txt

Also, for reference, django uses a simple javascript search-and-replace algorithm in the admin prepopulate_fields which contains all the replacements it will do to create unicode slugs (https://github.com/django/django/blob/master/django/contrib/admin/static/admin/js/urlify.js) , for example

var GREEK_MAP = {
    'α':'a', 'β':'b', 'γ':'g', 'δ':'d', 'ε':'e', 'ζ':'z', 'η':'h', 'θ':'8',

However, this is in javascript!

I think that Unidecode is the only way to create unicode-compliant slugs in python because it contains a complete list of transliterations of unicode characters to latin ones (much more complete than the one django-admin uses) - for example, here are the greek letters: https://github.com/iki/unidecode/blob/master/unidecode/x003.py (from 0x91 = Α, Β, Γ etc)

What could we do to enable unicode slugs ?

Here's a thought: If I removed unidecode from the dependencies of setup.py (so unidecode wouldn't be a requirement of djangoi-taggit) and just do the following in models.py:

slug = default_slugify(tag)
try:
    from unidecode import unidecode
    slug = default_slugify(unidecode(tag))
except:
    slug = default_slugify(tag)

(meaning if the user has installed unidecode then use it to transliterate the tag, if not fail silently and use tag as is). This way, django-taggit would just use the public API of unidecode if the user has installed it himself and wouldn't require it at all. Would this also violate the GPL ?

@fizista
Copy link

fizista commented Dec 15, 2014

You're right. Licenses sometimes make me crazy.

I found some other solution:
https://github.com/mozilla/unicode-slugify/

The only question is whether we can in URLs, place a non-ASCII characters, which, however, are letters.

If the Mozilla Foundation use this library in your own projects, so I guess there is no problem?

@spapas
Copy link
Contributor Author

spapas commented Dec 15, 2014

The unicode-slugify library seems to be ok concerning the license. However, I don't really like the way it works:

From what I've seen, it just converts the unicode strings to unicode slugs by removing spaces, dots etc. It won't do any transliteration of unicode characters to latin (for instance α το a or γ to g etc). Django-admin does not work like this (it works as I've explained in my previous answer) since it transliterates the characters to their latin counterparts.

Also, another reason against the use of unicode characters in slugs is that when you copy the slug from the browser, it will copy the unicode entities representation of the slug. What I mean is that if I go to the url: http://example.com/δοκιμη and then try to copy that URL, I will receive http://example.com/%CE%B4%CE%BF%CE%BA%CE%B9%CE%BC%CE%B7 when I paste it (!)

Of course, some sites do use unicode characters in URLs (for instance, the greek version of wikipedia http://el.wikipedia.org/wiki/%CE%86%CE%BB%CF%86%CE%B1) however I still think that we need to be consistent with django. So I recommend finding out a different solution...

@yohanboniface
Copy link

Is there any official status of this PR from the maintainers?

I'm interested too to have it merged. And I also think that using unidecode only if installed (as suggested in #273 (comment)) could be a nice way to keep the default installation light and without deps, and to prevent any licence change issue.

yohanboniface added a commit to ideascube/ideascube that referenced this pull request Mar 30, 2015
It's possible to override the Tag model used by taggit, but this
means customizing two models, and so having new table in database
(or hardcoding their name to match the current taggit ones). And I
prefer not to go such a complex way for only overriding a slugify
method.
Let's wait for jazzband/django-taggit#273 or
any other option without impact on DB.
@frewsxcv
Copy link
Collaborator

frewsxcv commented Apr 1, 2015

It'd be great if this could be a feature one could opt-in to instead of having a library that everyone has to install. Maybe some way for the user to specify a function they want to use to generate a slug? Also, I'm not really a fan of adding a GPL dependency to a BSD licensed library.

@spapas
Copy link
Contributor Author

spapas commented Apr 1, 2015

@frewsxcv I proposed on my 15 December 2014 comment (#273 (comment)) a solution that doesn't require install of unidecode (use a try/catch block to check if unidecode is installed and fall back to normal slugify if it's not been installed) however I don't want to update my PR until I know it'll be merged ☺️

@frewsxcv
Copy link
Collaborator

Anyone have opinions on #315 before I merge it?

@spapas
Copy link
Contributor Author

spapas commented Jun 23, 2015

@frewsxcv 👍

@frewsxcv
Copy link
Collaborator

#315 has been merged

0.15.0 has been released which includes it https://github.com/alex/django-taggit/blob/develop/CHANGELOG.txt#L4

@frewsxcv frewsxcv closed this Jun 24, 2015
@frewsxcv
Copy link
Collaborator

Also, if anyone wants to go above and beyond, we'd probably eventually want a regression test and also a mentioning of the optional dependency (unidecode) in the documentation. If no one else gets around to it, I can take care of it later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants