Skip to content
Browse files

Create Post “”

  • Loading branch information
laymonage committed Aug 27, 2019
1 parent c34e0c6 commit 8dcbd050b677e86f8c4196160ce75b1b95a4e8a1
Showing with 182 additions and 0 deletions.
  1. +182 −0 content/posts/
@@ -0,0 +1,182 @@
title: "Revisions"
date: 2019-08-26T10:08:29+07:00
draft: false
toc: false
comments: true
slug: revisions

Mistakes happen, and that's part of a learning process. In a large project like
Django, it can be hard to spot a mistake. Thanks to it being open source,
anyone can see the code and fix the mistakes they see. In this post, I'll
explain how I found a vulnerability in `contrib.postgres.fields.JSONField` that
allowed SQL injections to be performed.

If you're familiar with Python's [DB-API], you may have used the [`.execute()`]
method. It's a method that can be used to prepare and execute a database
operation (query or command). Parameters can be passed into the method and will
be bound to the variables in the operation.

In general, the parameters should be separated from the operation string. That
means instead of formatting the parameters directly into the string, we should
use a placeholder specified by the [`paramstyle`]. This is to avoid SQL
injection attacks that exploit the quotation marks around values. The Django
[docs] give a quick explanation regarding this.

So, instead of doing something like the following:

>>> query = 'SELECT * FROM myapp_dog WHERE name = %s' % name
>>> connection.cursor.execute(query)

or the following:

>>> query = "SELECT * FROM myapp_dog WHERE name = '%s'"

We should do:

>>> query = 'SELECT * FROM myapp_dog WHERE name = %s'
>>> connection.cursor.execute(query, [name])

On Django, this process is very crucial in the implementation of lookups and
transforms. In fact, Django breaks it down into `process_lhs` and `process_rhs`
methods to process the query string and the parameters of the left-hand-side
(LHS) and right-hand-side (RHS) of a query's `WHERE` clause. There's a
[section] on Django docs that explains this. It doesn't really go in depth, but
it should give you the gist of what's happening under the hood. That's where I
spent most of my time working on this project, but I don't think I can explain
it any clearer than the docs. You just need to read the docs and get your hands
dirty with some hands-on if you really want to understand.

While working on the [`KeyTransform`], I noticed [something off] in the
`as_sql()` method. As you can see there, if there isn't more than one key
transform applied and the string raises `ValueError` when casted into `int`,
the code incorrectly **quotes** the `key_name` and **formats** the lookup
**directly** into the string. Basically, it does exactly the two things I
explained above that shouldn't be done. So, I tried to find a way to exploit
this behavior.

As explained before, we need to exploit the quotation marks. Normally, the
following lookup:

>>> query = Dog.objects.filter(data__breed='labrador')

gets translated into the following SQL query:

SELECT * FROM myapp_dog WHERE (data -> 'breed') = 'labrador'

We need to come up with a special string for the `KeyTransform` on the `data`
`JSONField`. So, the string that follows `data__` should be special such that
it meets the following conditions:

- It raises a `ValueError` when casted into `int`. This is easy, just make sure
it's not an index lookup (`data__1`, `data__42`, etc. don't meet this
- It only counts as one `KeyTransform`: it can't contain double underscore
(`__`) that's followed by a string that's not registered as a lookup.
So, something like `data__breed`, `data__breed__startswith`, or
`data__breed__contains` meet the condition, but `data__owner__name` doesn't.
- It should escape the quotation mark and contain another condition that
evaluates to true. So, it should contain something like `OR 1=1`. This
should be... "impossible".

Keyword arguments must be valid identifiers, therefore we can't include a quote
nor a space. We [can't] do something like:

>>> query = Dog.objecs.filter(data__breed' something OR 1=1 something'='labrador')

as that would obviously raise a `SyntaxError`.


We *can* pass `kwargs` using a dictionary. Any string is a valid key in a

So, it's possible to do this:

>>> kwargs = {"data__breed' = 'a') OR 1 = 1 OR ('d": 'x'}
>>> query = Dog.objects.filter(**kwargs)

Which will be translated into the following SQL query:

SELECT * FROM myapp_dog WHERE (data -> 'breed' = 'a') OR 1 = 1 OR ('d') = '"x"'

As you expect, the query will return **all** `Dog` objects because the
condition `1 = 1` is always true.

In order for all this to happen in real life, someone had to use
`contrib.postgres.JSONField` and provide a way for users to construct their
own key-value pair that will be made into a dictionary and passed as
`kwargs` in the query. It is unlikely, but it's possible. We can never be too
careful when it comes to security. Besides, it's best practice to pass values
using parameters when we're using DB-API anyway.

I contacted the Django security team (``) and they
were pretty quick to respond. One and a half week later, security releases
were [issued] which included a [fix] for the vulnerability. I'm very proud to
see my name on it. Yay!

However, it turns out that the fix wasn't perfect. It spawned a bug that's
split into two tickets ([one] because it's a regression, the [other]'s not).
The regression happened because the fix was based on the code that handles
nested `KeyTransform` and that code already had the bug. The usage of
`KeyTransform` that's affected by this bug is actually undocumented, though.

The fixes ([single] and [nested]) were quite simple. The parameters were in
the wrong order, so the fixes were just to flip them. Funnily, I already did
so when I implemented `KeyTransform` for other database backends. It didn't
mean that the undocumented usage was already working out of the box, though.

While fixing the implementation so that the new tests pass, I had to change my
approach of using `JSON_EXTRACT` and `JSON_UNQUOTE` on MySQL and MariaDB for
the `KeyTransform` implementation. In the end, it's very simple and cleaner
than my previous implementation.

For MariaDB, I always use `JSON_UNQUOTE(JSON_EXTRACT(...))`, while it's just
`JSON_EXTRACT` on MySQL unless it's chained with a lookup expecting a text LHS.
Previously on MariaDB, it's only wrapped with `JSON_UNQUOTE` if the RHS is a
string. However, doing `JSON_UNQUOTE` on MariaDB doesn't do any harm if the RHS
isn't a string because every JSON value is a string anyway (unlike MySQL, it
doesn't have a native JSON type). So, I decided to always use `JSON_UNQUOTE`.
This eliminates the need to check whether the RHS is a string. It also unifies
the code for the `exact` lookup between MySQL and MariaDB.

I think that's the last major revision I did before GSoC finishes today. There
may be some other ones, but that depends on whether other Django folks find
mistakes in my PR or not. For now, my GSoC project is pretty much completed and
I'll probably only have one more post coming to recap the whole journey. The
final announcement is on September 4 (or maybe 3 in where you live), so let's
see if I'm going to pass the final evaluation.

Until then, see you!

[`KeyTransform`]: {{< ref "" >}}
[something off]:

0 comments on commit 8dcbd05

Please sign in to comment.
You can’t perform that action at this time.