Merge pull request #797 from jazzband/unicode-slugify
Preserve unicode when slugifying by default
rtpg committed Apr 25, 2022
2 parents 5dfc48d + ddb5ce6 commit 191d727
Showing 6 changed files with 98 additions and 4 deletions.
21 changes: 20 additions & 1 deletion CHANGELOG.rst
@@ -4,7 +4,26 @@ Changelog
 (Unreleased)
 ~~~~~~~~~~~~

-* Drop Django 2.2 support.
+* **Backwards incompatible:** Tag slugification used to silently strip non-ASCII characters
+  from the tag name to make the slug. This led to a lot of confusion for anyone using
+  languages with non-Latin alphabets, as well as weird performance issues.
+
+  Tag slugification will now, by default, maintain unicode characters as-is during
+  slugification. This will lead to fewer surprises, but might cause issues for you if you are
+  expecting all of your tag slugs to fit within a regex like ``[a-zA-Z0-9]`` (for example in
+  URL routing configurations).
+
+  Generally speaking, this should not require action on your part as a library user, as
+  existing tag slugs are persisted in the database, and only new tags will receive the
+  enhanced unicode-compatible slug.
+
+  If you wish to maintain the old stripping behavior, set the setting
+  ``TAGGIT_STRIP_UNICODE_WHEN_SLUGIFYING`` to ``True``.
+
+  As a reminder, custom tag models can easily customize slugification behavior by overriding
+  the ``slugify`` method to suit your business needs.
+
+* Drop Django 2.2 support.

 2.1.0 (2022-01-24)
 ~~~~~~~~~~~~~~~~~~
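
The URL-routing caveat in the changelog entry can be illustrated with a small standard-library sketch. The two patterns and the `matches` helper are hypothetical names chosen for illustration and are not part of django-taggit. The ASCII-only pattern is, to the best of my knowledge, the same shape as the one behind Django's built-in `slug` path converter.

```python
import re

# Hypothetical route patterns, named here for illustration only.
ASCII_SLUG = re.compile(r"[-a-zA-Z0-9_]+")  # accepts ASCII slugs only
UNICODE_SLUG = re.compile(r"[-\w]+")        # \w covers Unicode word characters for str patterns


def matches(pattern, slug):
    """Return True when the whole slug satisfies the pattern."""
    return pattern.fullmatch(slug) is not None


old_style = "hello"        # the kind of slug the ASCII-stripping behavior produces
new_style = "あい-うえお"    # the kind of slug the new default produces

print(matches(ASCII_SLUG, old_style), matches(ASCII_SLUG, new_style))
print(matches(UNICODE_SLUG, old_style), matches(UNICODE_SLUG, new_style))
```

A route that only accepts the ASCII pattern would fail to resolve newly created unicode slugs, so a project relying on such a pattern would need either a wider pattern or the opt-out setting.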
3 changes: 2 additions & 1 deletion docs/faq.rst
@@ -13,7 +13,8 @@ Frequently Asked Questions


 One way to handle this is with post-generation hooks::
-    class ProductFactory(DjangoModelFactory):
+
+    class ProductFactory(DjangoModelFactory):
         # Rest of the stuff

         @post_generation
20 changes: 20 additions & 0 deletions docs/getting_started.rst
@@ -27,3 +27,23 @@ And then to any model you want tagging on do the following::
 If you want ``django-taggit`` to be **CASE-INSENSITIVE** when looking up existing tags, you'll have to set ``TAGGIT_CASE_INSENSITIVE`` (in ``settings.py`` or wherever you have your Django settings) to ``True`` (``False`` by default)::

     TAGGIT_CASE_INSENSITIVE = True
+
+
+Settings
+--------
+
+The following Django-level settings affect the behavior of the library:
+
+* ``TAGGIT_CASE_INSENSITIVE``
+
+  When set to ``True``, tag lookups will be case insensitive. This defaults to ``False``.
+
+* ``TAGGIT_STRIP_UNICODE_WHEN_SLUGIFYING``
+  When this is set to ``True``, tag slugs will be limited to ASCII characters. In this case, if you also have ``unidecode`` installed,
+  then tag slugification will transform a tag like ``あい うえお`` to ``ai-ueo``.
+  If you do not have ``unidecode`` installed, then you will usually be outright stripping unicode, meaning that something like ``helloあい`` will be slugified as ``hello``.
+
+  This value defaults to ``False``, meaning that unicode is preserved in slugification.
+
+  Because the behavior when ``True`` is set leads to situations where
+  slugs can be entirely stripped to an empty string, we recommend not activating this.
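
A settings fragment that opts back into the old behavior might look like the following. The file location is an assumption; only the two setting names come from the documentation in this diff.

```python
# settings.py (assumed location; use wherever your Django settings live)

# Match existing tags regardless of case when looking them up.
TAGGIT_CASE_INSENSITIVE = True

# Opt back into ASCII-only slugs. The documentation in this diff recommends
# leaving this at its default of False, because slugs can end up empty.
TAGGIT_STRIP_UNICODE_WHEN_SLUGIFYING = True
```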
20 changes: 20 additions & 0 deletions taggit/migrations/0005_auto_20220424_2025.py
@@ -0,0 +1,20 @@
+# Generated by Django 2.2.26 on 2022-04-24 20:25
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ("taggit", "0004_alter_taggeditem_content_type_alter_taggeditem_tag"),
+    ]
+
+    operations = [
+        migrations.AlterField(
+            model_name="tag",
+            name="slug",
+            field=models.SlugField(
+                allow_unicode=True, max_length=100, unique=True, verbose_name="slug"
+            ),
+        ),
+    ]
11 changes: 9 additions & 2 deletions taggit/models.py
@@ -1,3 +1,4 @@
+from django.conf import settings
 from django.contrib.contenttypes.fields import GenericForeignKey
 from django.contrib.contenttypes.models import ContentType
 from django.db import IntegrityError, models, router, transaction
@@ -19,7 +20,10 @@ class TagBase(models.Model):
         verbose_name=pgettext_lazy("A tag name", "name"), unique=True, max_length=100
     )
     slug = models.SlugField(
-        verbose_name=pgettext_lazy("A tag slug", "slug"), unique=True, max_length=100
+        verbose_name=pgettext_lazy("A tag slug", "slug"),
+        unique=True,
+        max_length=100,
+        allow_unicode=True,
     )

     def __str__(self):
@@ -71,7 +75,10 @@ def save(self, *args, **kwargs):
             return super().save(*args, **kwargs)

     def slugify(self, tag, i=None):
-        slug = slugify(unidecode(tag))
+        if getattr(settings, "TAGGIT_STRIP_UNICODE_WHEN_SLUGIFYING", False):
+            slug = slugify(unidecode(tag))
+        else:
+            slug = slugify(tag, allow_unicode=True)
         if i is not None:
             slug += "_%d" % i
         return slug
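
To see the effect of the new branch without a Django project, the following standard-library sketch approximates both code paths. `sketch_slugify` is modeled on Django's `slugify`, and `tag_slug` on the `slugify` method in this hunk. Both names, and the `strip_unicode` parameter standing in for the setting, are assumptions made for illustration. The sketch behaves as if `unidecode` were not installed.

```python
import re
import unicodedata


def sketch_slugify(value, allow_unicode=False):
    """Approximate Django's slugify using only the standard library."""
    value = str(value)
    if allow_unicode:
        value = unicodedata.normalize("NFKC", value)
    else:
        # Decompose, then drop every character that is not ASCII.
        value = (
            unicodedata.normalize("NFKD", value)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
    # Remove anything that is not a word character, whitespace, or hyphen.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def tag_slug(tag, i=None, strip_unicode=False):
    """Mirror the branch in the hunk; strip_unicode stands in for the setting."""
    if strip_unicode:
        slug = sketch_slugify(tag)
    else:
        slug = sketch_slugify(tag, allow_unicode=True)
    if i is not None:
        slug += "_%d" % i
    return slug


print(tag_slug("あい うえお"))
print(tag_slug("あい うえお", strip_unicode=True))
print(tag_slug("helloあい", i=2, strip_unicode=True))
```

With stripping enabled, a tag made up entirely of non-ASCII characters collapses to an empty slug, which is the situation the documentation warns about.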
27 changes: 27 additions & 0 deletions tests/test_models.py
@@ -0,0 +1,27 @@
+from django.test import TestCase, override_settings
+
+from tests.models import TestModel
+
+
+class TestSlugification(TestCase):
+    def test_unicode_slugs(self):
+        """
+        Confirm the preservation of unicode in slugification by default
+        """
+        sample_obj = TestModel.objects.create()
+        # the space in the tag is replaced by a hyphen, but
+        # unicode characters are kept by default
+        sample_obj.tags.add("あい うえお")
+        self.assertEqual([tag.slug for tag in sample_obj.tags.all()], ["あい-うえお"])
+
+    def test_old_slugs(self):
+        """
+        Test that the setting that gives us the old slugification behavior
+        is in place
+        """
+        with override_settings(TAGGIT_STRIP_UNICODE_WHEN_SLUGIFYING=True):
+            sample_obj = TestModel.objects.create()
+            # with the setting enabled, non-ASCII characters are stripped;
+            # the expected empty slug assumes unidecode is not installed
+            sample_obj.tags.add("あい うえお")
+            self.assertEqual([tag.slug for tag in sample_obj.tags.all()], [""])
