Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode patch email rejected #502

Open
siddhesh opened this issue Oct 5, 2022 · 6 comments
Open

Unicode patch email rejected #502

siddhesh opened this issue Oct 5, 2022 · 6 comments

Comments

@siddhesh
Copy link
Contributor

siddhesh commented Oct 5, 2022

This patch on libc-alpha did not appear in patchwork as it crashed during parsing:

https://sourceware.org/pipermail/libc-alpha/2022-October/142506.html

Here's the backtrace at the point of the crash:

Error when parsing incoming email: OperationalError(1366, "Incorrect string value: '\\xF0\\x91\\x82\\x80 0...' for column `patchwork`.`patchwork_patch`.`content` at row 1")
Traceback (most recent call last):
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/path/to/patchwork/lib/python3/lib64/python3.6/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/path/to/patchwork/lib/python3/lib64/python3.6/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/path/to/patchwork/lib/python3/lib64/python3.6/site-packages/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (1366, "Incorrect string value: '\\xF0\\x91\\x82\\x80 0...' for column `patchwork`.`patchwork_patch`.`content` at row 1")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/path/to/patchwork/patchwork.sourceware.org/patchwork/management/commands/parsemail.py", line 60, in handle
    result = parse_mail(mail, options['list_id'])
  File "/path/to/patchwork/patchwork.sourceware.org/patchwork/parser.py", line 1191, in parse_mail
    state=find_state(mail),
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/models/query.py", line 447, in create
    obj.save(force_insert=True, using=self.db)
  File "/path/to/patchwork/patchwork.sourceware.org/patchwork/models.py", line 554, in save
    super(Patch, self).save(**kwargs)
  File "/path/to/patchwork/patchwork.sourceware.org/patchwork/models.py", line 385, in save
    super(EmailMixin, self).save(*args, **kwargs)
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/models/base.py", line 754, in save
    force_update=force_update, update_fields=update_fields)
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/models/base.py", line 792, in save_base
    force_update, using, update_fields,
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/models/base.py", line 895, in _save_table
    results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/models/base.py", line 935, in _do_insert
    using=using, raw=raw,
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/models/query.py", line 1254, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1397, in execute_sql
    cursor.execute(sql, params)
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/path/to/patchwork/lib/python3/lib/python3.6/site-packages/django/db/backends/mysql/base.py", line 73, in execute
    return self.cursor.execute(query, args)
  File "/path/to/patchwork/lib/python3/lib64/python3.6/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/path/to/patchwork/lib/python3/lib64/python3.6/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/path/to/patchwork/lib/python3/lib64/python3.6/site-packages/MySQLdb/connections.py", line 254, in query
    _mysql.connection.query(self, query)
django.db.utils.OperationalError: (1366, "Incorrect string value: '\\xF0\\x91\\x82\\x80 0...' for column `patchwork`.`patchwork_patch`.`content` at row 1")

The backtrace was attained using the following change to the sourceware instance, maybe something like this could be useful for general purpose logging too:

 
 PYTHONPATH="${PATCHWORK_BASE}:${PATCHWORK_BASE}/lib/python:$PYTHONPATH" \
     DJANGO_SETTINGS_MODULE="$DJANGO_SETTINGS_MODULE" \
-    "$PW_PYTHON" "$PATCHWORK_BASE/manage.py" parsemail "$@"
+    "$PW_PYTHON" "$PATCHWORK_BASE/manage.py" parsemail \
+    "$@" >> $PATCHWORK_BASE/patchwork-mailstat.log 2>&1
 
 # NOTE(stephenfin): We must return 0 here. When parsemail is used as a
 # delivery command from a mail server like postfix (as it is intended
@codonell
Copy link

codonell commented Oct 5, 2022

The Unicode updates are certainly going to exercise UTF-8 parsing in the entire pipeline. We are not going to have invalid UTF-8, but we may have UTF-8 code points that cannot be displayed correctly because we are updating to code points that have just been added to the standard.

@stephenfin
Copy link
Member

stephenfin commented Oct 10, 2022

We assume that the database can store UTF-8. I suspect you didn't set the encoding for the table before running manage.py migrate on day 0? This is called out in the documentation as a requirement:

Note

As noted in the Django documentation, Django expects databases to be configured with an encoding of UTF-8 or UTF-16. If using MySQL, you may need to configure this this explicitly as older versions defaulted to latin1 encoding. Refer to the MySQL documentation for more information.

You can validate this by running SHOW FULL COLUMNS FROM patchwork_patch. For example, running this against a local database where I did not configure the charset before running manage.py migrate:

mysql> SHOW FULL COLUMNS FROM patchwork_patch;
+--------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
| Field        | Type                 | Collation         | Null | Key | Default | Extra          | Privileges                      | Comment |
+--------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
| id           | int(11)              | NULL              | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
| msgid        | varchar(255)         | latin1_swedish_ci | NO   | MUL | NULL    |                | select,insert,update,references |         |
| date         | datetime(6)          | NULL              | NO   |     | NULL    |                | select,insert,update,references |         |
| headers      | longtext             | latin1_swedish_ci | NO   |     | NULL    |                | select,insert,update,references |         |
| content      | longtext             | latin1_swedish_ci | YES  |     | NULL    |                | select,insert,update,references |         |
| name         | varchar(255)         | latin1_swedish_ci | NO   |     | NULL    |                | select,insert,update,references |         |
| project_id   | int(11)              | NULL              | NO   | MUL | NULL    |                | select,insert,update,references |         |
| submitter_id | int(11)              | NULL              | NO   | MUL | NULL    |                | select,insert,update,references |         |
| archived     | tinyint(1)           | NULL              | NO   | MUL | NULL    |                | select,insert,update,references |         |
| commit_ref   | varchar(255)         | latin1_swedish_ci | YES  |     | NULL    |                | select,insert,update,references |         |
| delegate_id  | int(11)              | NULL              | YES  | MUL | NULL    |                | select,insert,update,references |         |
| diff         | longtext             | latin1_swedish_ci | YES  |     | NULL    |                | select,insert,update,references |         |
| hash         | char(40)             | latin1_swedish_ci | YES  |     | NULL    |                | select,insert,update,references |         |
| number       | smallint(5) unsigned | NULL              | YES  |     | NULL    |                | select,insert,update,references |         |
| pull_url     | varchar(255)         | latin1_swedish_ci | YES  |     | NULL    |                | select,insert,update,references |         |
| related_id   | int(11)              | NULL              | YES  | MUL | NULL    |                | select,insert,update,references |         |
| series_id    | int(11)              | NULL              | YES  | MUL | NULL    |                | select,insert,update,references |         |
| state_id     | int(11)              | NULL              | YES  | MUL | NULL    |                | select,insert,update,references |         |
+--------------+----------------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
18 rows in set (0.01 sec)

As risk of stating the obvious, you can fix this after the fact but it'll take some time. First, modify the default:

ALTER DATABASE {database} DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

...followed by the following for each table:

ALTER TABLE {table} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

@stephenfin
Copy link
Member

stephenfin commented Oct 10, 2022

Ah, wait, I think I misunderstood the issue. So you're submitting content that is currently recognized as invalid unicode? Interesting. 🤔 Should we consider storing patches as binary strings? If so, are we going to be able to decode them on pull if they are "invalid" (to Python) code points? I'm not sure what to do here....

@codonell
Copy link

@stephenfin No, if we need to exercise invalid Unicode then we need to create it by hand carefully and it shouldn't show up in a patch. I don't think that patchwork should need to handle invalid UTF-8.

@stephenfin
Copy link
Member

Okay, then what (if anything) is the ask here? I don't know for sure, but I'm guessing MySQL/MariaDB rejected that patch content because it thought it was invalid unicode?

@siddhesh
Copy link
Contributor Author

It's unlikely to be a table encoding issue since they're at utf8mb4. I altered patchwork_patch to use utf8mb4_unicode_ci instead of the utf8mb4_general_ci that we have elsewhere and it continues to reject that patch.

AFAICT, the email doesn't have invalid unicode either, so not sure what's going on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants