UnicodeEncodeError in middleware.py #1601

Closed
carstenfuchs opened this issue Mar 29, 2022 · 12 comments

@carstenfuchs

Hello,

using Django Debug Toolbar 3.2.4 with Django 4.0.3, I get the following stack trace:

Traceback (most recent call last):
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/debug_toolbar/middleware.py", line 93, in __call__
    response.content = insert_before.join(bits)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/template/response.py", line 143, in content
    HttpResponse.content.fset(self, value)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 387, in content
    content = self.make_bytes(value)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 296, in make_bytes
    return bytes(value.encode(self.charset))
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 181020-181021: surrogates not allowed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/wsgi.py", line 132, in __call__
    response = self.get_response(request)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/base.py", line 140, in get_response
    response = self._middleware_chain(request)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/exception.py", line 57, in inner
    response = response_for_exception(request, exc)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/exception.py", line 139, in response_for_exception
    response = handle_uncaught_exception(
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/exception.py", line 180, in handle_uncaught_exception
    return debug.technical_500_response(request, *exc_info)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/views/debug.py", line 67, in technical_500_response
    return HttpResponse(html, status=status_code, content_type="text/html")
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 355, in __init__
    self.content = content
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 387, in content
    content = self.make_bytes(value)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 296, in make_bytes
    return bytes(value.encode(self.charset))
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 5995-5996: surrogates not allowed

The related section around line 93 in debug_toolbar/middleware.py is:

        # Always render the toolbar for the history panel, even if it is not
        # included in the response.
        rendered = toolbar.render_toolbar()

        # …

        # Insert the toolbar in the response.
        content = response.content.decode(response.charset)
        insert_before = dt_settings.get_config()["INSERT_BEFORE"]
        pattern = re.escape(insert_before)
        bits = re.split(pattern, content, flags=re.IGNORECASE)
        if len(bits) > 1:
            bits[-2] += rendered
            response.content = insert_before.join(bits)
            if "Content-Length" in response:
                response["Content-Length"] = len(response.content)
        return response

If I replace line

            bits[-2] += rendered

with

            bits[-2] += rendered.encode('ascii', 'replace').decode()

in order to get rid of any problematic characters, it works.

Unfortunately, I have no idea what might cause this, and I'm not sure how to proceed from here.

@tim-schilling
Contributor

Do you know which characters are being rendered and in which panels they are coming from?

@carstenfuchs
Author

I modified the above to replace the German umlauts (äöüÄÖÜß) with something safe. Not elegant at all, but:

            bits[-2] += rendered.replace('ä', 'XXX').replace('Ä', 'XXX').replace('ö', 'XXX').replace('Ö', 'XXX').replace('ü', 'XXX').replace('Ü', 'XXX').replace('ß', 'XXX').encode('ascii', 'backslashreplace').decode()

Replacing the umlauts alone was not enough; the encode/decode step is still necessary. With backslashreplace as the error handler for the remaining Unicode characters, this yields the attached screenshot. Note the \xbb near the top:
[Screenshot: grafik]
However, I'm not sure if this is actually the culprit; the Unicode characters that cause the trouble might still be elsewhere.

Grepping the page HTML source for occurrences of \x, I found these fragments:

  • <a id="djHideToolBarButton" href="#" title="Toolbar ausblenden">Ausblenden \xbb</a></li>
  • <button type="button" class="djDebugClose">\xd7</button>
  • (&#x27;nb&#x27;, &#x27;Norwegian Bokm\xe5l&#x27;),

Maybe it's one of these? If I can figure out what the original Unicode characters are for these, I can try to replace them as well.

@carstenfuchs
Author

I made some progress and eventually managed to find the surrogates that are mentioned in the stack trace:

In one of my apps, I have static files with German umlaut characters in their names, e.g.
Handbuch/images/kalendereinträge.png

These files are listed in the "Static files" panel in section django.contrib.staticfiles.finders.AppDirectoriesFinder, where they cause the reported error.

My current work-around is to replace only the surrogate characters:

            rendered = rendered.replace('\udcc3\udcbc', '___???___')
            rendered = rendered.replace('\udcc3\udca4', '___???___')
            bits[-2] += rendered

However, I still have no idea how the surrogate characters come up in the first place: my system is Ubuntu 20.04 LTS, and there is nothing special about the above-mentioned files at all.
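The two replace() calls above cover only `ü` and `ä`. A more general sketch is shown below. It assumes the lone surrogates were produced by Python's `surrogateescape` error handler, so that each one stands for a single undecodable byte, and that the original bytes were UTF-8. The helper name `repair_surrogates` is hypothetical and not part of django-debug-toolbar.

```python
def repair_surrogates(text: str) -> str:
    """Undo surrogateescape decoding, assuming the original bytes were UTF-8.

    Hypothetical helper for illustration; not part of django-debug-toolbar.
    """
    # Ordinary characters are encoded as UTF-8; each escaped surrogate
    # (U+DC80..U+DCFF) is written back as the single raw byte it stands for.
    raw = text.encode("utf-8", errors="surrogateescape")
    # Decode the recovered bytes; anything that is still not valid UTF-8
    # becomes U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


# Example input modelled on the fragments quoted in this thread.
rendered = "Ausblenden \u00bb Handbuch/images/kalendereintr\udcc3\udca4ge.png"
fixed = repair_surrogates(rendered)
assert fixed.endswith("kalendereintr\u00e4ge.png")
```

Surrogates outside the range U+DC80..U+DCFF would still raise UnicodeEncodeError in the first step, so this only addresses the case described here.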

Can anyone reproduce this?

@matthiask
Member

Is it possible that your filesystem encoding (in Python) isn't set to UTF-8? My systemd user units always contain the following environment variables: Environment=LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_CTYPE=en_US.UTF-8

I remember that we had many problems in the past without this; the server process would basically crash each time someone uploaded files containing umlauts. I'm not 100% sure if this happened only with Python 2 or also with Python 3 though so this could be a dead end.

@matthiask
Member

... that being said, maybe the toolbar should expect filenames which cannot be properly converted to UTF-8...

@carstenfuchs
Author

@matthiask thanks for your hints! You're right, my filesystem encoding in Python was indeed set to ascii and I could resolve the problem by setting LANG=de_DE.UTF-8 in the Apache /etc/apache2/envvars config file. (I used to use LANG=C for over 10 years, so now I'm mildly worried that the change might introduce subtle side effects elsewhere. Maybe I'll still switch to en_US.UTF-8, after all.)
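How an ASCII filesystem encoding leads to the surrogates in the traceback can be sketched in a few lines. This assumes the file name is stored on disk as UTF-8 bytes and that Python decodes it with the `surrogateescape` error handler, which is what `os.fsdecode()` uses on POSIX systems. The file name is the one mentioned earlier in this thread.

```python
# Bytes as stored on disk, assuming the file system holds UTF-8 names.
raw_name = "kalendereintr\u00e4ge.png".encode("utf-8")

# With an ASCII filesystem encoding, every non-ASCII byte is smuggled
# into the str as a lone surrogate in the range U+DC80..U+DCFF.
decoded = raw_name.decode("ascii", errors="surrogateescape")
assert decoded == "kalendereintr\udcc3\udca4ge.png"

# Lone surrogates cannot be encoded as UTF-8, which matches the error
# raised in the make_bytes frame of the traceback.
try:
    decoded.encode("utf-8")
except UnicodeEncodeError as exc:
    error_reason = exc.reason
else:
    error_reason = None

# Encoding with the same handler restores the original bytes.
assert decoded.encode("ascii", errors="surrogateescape") == raw_name
```

The pair `\udcc3\udca4` is the same one targeted by the work-around above: it corresponds to the two UTF-8 bytes of `ä`.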

Although the problem was eventually caused by my Apache config alone, this seems to be a very complicated topic. Maybe it would be possible for djdt to warn about non-UTF-8 filesystem encodings? (Or better about filenames that cannot be properly decoded?)
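Such a warning could look roughly like the sketch below. It is only an illustration under assumptions: both function names are hypothetical, nothing like this exists in django-debug-toolbar, and it uses only the standard library rather than Django's check framework.

```python
import codecs
import sys


def filesystem_encoding_is_utf8() -> bool:
    """Return True if Python decodes file names as UTF-8 (hypothetical helper)."""
    # codecs.lookup() normalizes aliases such as "UTF8" or "utf_8".
    return codecs.lookup(sys.getfilesystemencoding()).name == "utf-8"


def find_undecodable_names(names):
    """Return the names that cannot be encoded as UTF-8 (hypothetical helper).

    Names containing lone surrogates are the ones that would break the
    response encoding described in this issue.
    """
    bad = []
    for name in names:
        try:
            name.encode("utf-8")
        except UnicodeEncodeError:
            bad.append(name)
    return bad


# Example: one clean name and one with escaped bytes.
suspicious = find_undecodable_names(
    ["logo.png", "kalendereintr\udcc3\udca4ge.png"]
)
```

Checking the file names themselves, rather than only the encoding, also catches the case where a single file was created with a different encoding than the rest.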

Thank you!

@matthiask
Member

Oh yeah, it's complicated and very annoying.

I'm unsure what we should do. On the one hand, django-debug-toolbar shouldn't crash; on the other hand, it's documented that Django expects a UTF-8 environment (not C): https://docs.djangoproject.com/en/4.0/ref/unicode/#files So maybe the somewhat strange behavior is to be expected?

I didn't even know that this was documented; this section was added recently (in 2015 😅).

@carstenfuchs
Author

Thanks for the link! My project started in 2011 and even though I regularly work through all Release Notes very carefully, it is easy to miss such updates in the docs, useful and worthwhile as they are.

Imho, it would be ideal if this could be covered by a Django system check, but that would be of little use here, given that command-line shells and web servers tend to have different environments.

@matthiask
Member

In the past, we already discussed adding checks for issues which are (arguably) only surfaced but not really caused by django-debug-toolbar; the last time it was about static files as well (#1318).

Such things are really hard to debug sometimes if you don't already know where to look so I think it may be time to revisit my stance on this. I wrote that I am slightly against adding checks for other apps (even if those other apps are bundled with Django) but I'm not so sure anymore.

Here would probably be the place for such a new check:

def run_checks(cls):
    """
    Check that the integration is configured correctly for the panel.
    Specifically look for static files that haven't been collected yet.
    Return a list of :class: `django.core.checks.CheckMessage` instances.
    """
    errors = []
    for finder in finders.get_finders():
        try:
            for path, finder_storage in finder.list([]):
                finder_storage.path(path)
        except OSError:
            errors.append(
                Warning(
                    "debug_toolbar requires the STATICFILES_DIRS directories to exist.",
                    hint="Running manage.py collectstatic may help uncover the issue.",
                    id="debug_toolbar.staticfiles.W001",
                )
            )
    return errors

@carstenfuchs
Author

This is looking great! :-)

If I understand things correctly though, it might be possible that an error that raises an exception (such as here with the surrogates) dominates the checks, as it never gives them a chance to be displayed as part of the normal output.

@carstenfuchs
Author

Oh, and this would be a check for the Django core, not for a Django app, not even a built-in.

@tim-schilling
Contributor

Since we haven't seen any thank yous or emojis in this thread, I'm closing this issue rather than implementing a check.
