Skip to content

Commit

Permalink
Security fix: Anonymous users must not leak submitter names
Browse files Browse the repository at this point in the history
The arbitrary jsonquery feature now needs to be explicitly enabled for
routes and is only ever allowed for authenticated users (because they
currently have access to all data anyway).

For the /api/documents route, we manually parse and validate the query
and build it from scratch for unauthenticated users, *without*
reverting to jsonquery.

Before, it was possible to efficiently extract submitted_by names by
applying restrictions in the form of `submitted_by LIKE '<prefix>%' to
the /api/documents route. A proof of concept exploit has been developed
using a bisection algorithm (interval halfing) that extracts submitter
names of arbitrary protocols (even those that have not been validated
yet) in a matter of seconds.
  • Loading branch information
pcworld committed Aug 18, 2020
1 parent 748e064 commit e3e7323
Show file tree
Hide file tree
Showing 5 changed files with 70 additions and 17 deletions.
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,9 @@ Here's a small map of the api side of the project:
* `db/*.py`
Declarative definition of the database tables we use. These include parts of the `garfield` database (into which we save all documents and their associated data) and parts of the `fsmi` database (which we query for user auth)
* `api_utils.py`
We're using Flask, Flask-login, SQLAlchemy, jsonquery and marshmallow. That means a lot of work is done for us and what's left of most API endpoints is mostly boilerplate for stringing all of this together. `api_utils.py` is where we abstract all of that away, too, so most API endpoints can be generated with a simple call to `api_utils.endpoint`. Also found there: grab bag of small utility functions that would only clutter routes.py.
We're using Flask, Flask-login, SQLAlchemy, jsonquery and marshmallow. That means a lot of work is done for us and what's left of most API endpoints is mostly boilerplate for stringing all of this together. `api_utils.py` is where we abstract all of that away, too, <s>so most API endpoints can be generated with a simple call to `api_utils.endpoint`</s>.
_As it turned out years later, this idea was rather flawed: Allowing arbitrary queries by anonymous users (in particular it allows the `WHERE` clauses of `SELECT` queries to be arbitrarily set) can have unintended security consequences, because even columns that are not directly output [can be efficiently leaked](https://github.com/fsmi/odie-server/pull/168) by bruteforcing valid restrictions to these columns._
Also found in `api_utils.py`: grab bag of small utility functions that would only clutter routes.py.
* `barcode/barcode.py`
This is where barcode scanner support lives. It also handles putting barcodes on transcripts.
* `admin/`
Expand Down
20 changes: 16 additions & 4 deletions api_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -83,9 +83,12 @@ class PaginatedResultSchema(Schema):
total = fields.Int(attribute='pagination.total')


def filtered_results(query, schema, paginate=True):
# `allow_insecure` is *insecure* -- see comment in endpoint() regarding the `allow_insecure_authenticated`
# parameter. Note that in this function, `allow_insecure` does also apply to anonymous users.
def filtered_results(query, schema, paginate=True, allow_insecure=False):
q = json.loads(request.args.get('q', '{}'))
query = jsonquery(query, q) if q else query
if allow_insecure:
query = jsonquery(query, q) if q else query
# ensure deterministic ordering
# (we need this for paginated results with queries involving subqueries)
# We assume that all queriable tables have an 'id' column
Expand Down Expand Up @@ -142,7 +145,7 @@ def wrapped_f(*f_args, **f_kwargs):
ROUTE_ID = 0


def endpoint(query_fn, schemas=None, allow_delete=False, paginate_many=True):
def endpoint(query_fn, schemas=None, allow_delete=False, paginate_many=True, allow_insecure_authenticated=False):
"""Creates and returns an API endpoint handler
Can create both SINGLE-style and MANY-style endpoints. The generated route simply
Expand All @@ -156,6 +159,14 @@ def endpoint(query_fn, schemas=None, allow_delete=False, paginate_many=True):
query_fn: A callable returning a Query object. Mustn't be None for GET-enabled endpoints.
paginate_many: whether to return paginated results (default:True)
The 'page' GET-parameter selects the page
allow_insecure_authenticated: If true, the 'q' parameter (JSON) will be parsed into an SQLAlchemy Query using the
jsonquery library if the user is logged in. If the user is not logged in, this feature
will stay disabled.
As the name implies, this is insecure as it allows near-arbitrary user-controlled
WHERE clauses in SELECT queries. This feature should be avoided as much as possible
and only still exists for legacy compatibility reasons.
It allows for non-obvious data leaks even when restricting the selected columns;
see https://github.com/fsmi/odie-server/pull/168 for details.
"""
if schemas is None:
schemas = {}
Expand All @@ -167,7 +178,8 @@ def handle_get(instance_id=None):
if instance_id is None: # GET MANY
assert 'GET' in schemas, "GET schema missing"
schema = schemas['GET']
return filtered_results(query_fn(), schema, paginate_many)
allow_insecure = allow_insecure_authenticated and get_user()
return filtered_results(query_fn(), schema, paginate_many, allow_insecure=allow_insecure)
else: # GET SINGLE
result = query_fn().get(instance_id)
obj = serialize(result, schema)
Expand Down
2 changes: 2 additions & 0 deletions db/documents.py
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,8 @@ def __str__(self):
**config.documents_table_args)


# Note that if you extend this enum, make sure to adjust the validation in routes/documents.py's documents_query()
# accordingly.
document_type = sqla.Enum('oral', 'written', 'oral reexam', 'mock exam', name='document_type', inherit_schema=True)


Expand Down
49 changes: 41 additions & 8 deletions routes/documents.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
from flask import request, send_file
from marshmallow import Schema, fields
from marshmallow.validate import OneOf
from sqlalchemy import asc, desc
from sqlalchemy.orm import subqueryload, undefer
from sqlalchemy.orm.exc import NoResultFound
from pytz import reference
Expand Down Expand Up @@ -70,18 +71,50 @@ class ExaminantSchema(IdSchema):


def documents_query():
q = Document.query.options(subqueryload('lectures'), subqueryload('examinants'))
filters = json.loads(request.args.get('filters', '{}'))
for id in filters.get('includes_lectures', []):
q = q.filter(Document.lectures.any(Lecture.id == id))
for id in filters.get('includes_examinants', []):
q = q.filter(Document.examinants.any(Examinant.id == id))
return q
query = Document.query.options(subqueryload('lectures'), subqueryload('examinants'))
param_filters = json.loads(request.args.get('filters', '{}'))
for id_ in param_filters.get('includes_lectures', []):
query = query.filter(Document.lectures.any(Lecture.id == id_))
for id_ in param_filters.get('includes_examinants', []):
query = query.filter(Document.examinants.any(Examinant.id == id_))

# From the documentselection view (which can be accessed anonymously) we usually get a GET parameter 'q' like this:
# {"operator":"order_by_desc","column":"date","value":{"operator":"and","value":[{"column":"document_type","operator":"in_","value":["written","oral","oral reexam","mock exam"]}]},"page":1}
# We need to parse and create the SQLAlchemy Query manually to avoid data leaks.
# (If we were instead using jsonquery, anonymous users were able to efficiently extract submitted_by names by
# performing bisection with the submitted_by parameter and `like` operator.
# See https://github.com/fsmi/odie-server/pull/168 for details.)
# For logged in users, we have set allow_insecure_authenticated=True on the /api/documents view to allow
# the `submitted_by` filter in the depositreturn view; logged in users have access to all the data anyway.
# Note that the `page` parameter is taken into account by endpoint()/filtered_results() in either case.
if not get_user():
param_q = json.loads(request.args.get('q', '{}'))
# While it looks ugly, we validate all parameters to ensure that the JSON structure is as expected and that only
# whitelisted values work.
if param_q.get('operator') in ('order_by_desc', 'order_by_asc') and \
param_q.get('column') in ('date', 'number_of_pages', 'validation_time'):
sort_fn = asc if param_q.get('operator') == 'order_by_asc' else desc
query = query.order_by(sort_fn(param_q['column']))

if isinstance(param_q.get('value'), dict) and \
param_q['value'].get('operator') == 'and' and \
isinstance(param_q['value'].get('value'), list) and \
len(param_q['value']['value']) == 1 and \
param_q['value']['value'][0].get('column') == 'document_type' and \
param_q['value']['value'][0].get('operator') == 'in_' and \
isinstance(param_q['value']['value'][0].get('value'), list) and \
all(value in ('written', 'oral', 'oral reexam', 'mock exam') for value in param_q['value']['value'][0]['value']):
query = query.filter(Document.document_type.in_(param_q['value']['value'][0]['value']))

return query

api_route('/api/documents')(
endpoint(
schemas={'GET': DocumentDumpSchema},
query_fn=documents_query)
query_fn=documents_query,
# Allow insecure JSON queries for logged in users so that the `submitted_by` filter in the depositreturn.html
# view works.
allow_insecure_authenticated=True)
)

# aggregate values of unpaginated source data
Expand Down
12 changes: 8 additions & 4 deletions routes/misc.py
Original file line number Diff line number Diff line change
Expand Up @@ -63,11 +63,13 @@ class OrderDumpSchema(IdSchema):

api_route('/api/orders', methods=['GET'])(
login_required(
endpoint(
endpoint(
schemas={
'GET': OrderDumpSchema,
},
query_fn=lambda: Order.query.options(subqueryload('items.document.lectures'), subqueryload('items.document.examinants')))
query_fn=lambda: Order.query.options(subqueryload('items.document.lectures'), subqueryload('items.document.examinants')),
allow_insecure_authenticated=True
)
))


Expand Down Expand Up @@ -111,7 +113,9 @@ class DepositDumpSchema(IdSchema):

api_route('/api/deposits')(
login_required(
endpoint(
endpoint(
schemas={'GET': DepositDumpSchema},
query_fn=lambda: Deposit.query.options(subqueryload('lectures')))
query_fn=lambda: Deposit.query.options(subqueryload('lectures')),
allow_insecure_authenticated=True
),
))

0 comments on commit e3e7323

Please sign in to comment.