New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OEP-0049 Django App Patterns #181
Changes from 5 commits
8a72fa8
349c49e
bbcf213
e2336f6
d861220
0e9799a
5ce0f7c
960ff97
00dafb5
8c6f7b2
fb8ea19
e370ca8
45345f9
54f0bf9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,211 @@ | ||
|
||
============================= | ||
OEP-0049: Django App Patterns | ||
============================= | ||
|
||
.. list-table:: | ||
:widths: 25 75 | ||
|
||
* - OEP | ||
- :doc:`OEP-0045 </oeps/oep-0049>` | ||
* - Title | ||
- Django App Patterns | ||
* - Last Modified | ||
- 2021-01-29 | ||
* - Authors | ||
- Matt Tuchfarber <mtuchfarber@edx.org> | ||
* - Arbiter | ||
- Jinder Singh | ||
* - Status | ||
- Under Review | ||
* - Type | ||
- Architecture | ||
* - Created | ||
- 2021-01-29 | ||
* - Review Period | ||
- 2021-02-22 - 2021-03-10 | ||
|
||
Abstract | ||
-------- | ||
Proposes a common set of code patterns for Open edX Django apps. | ||
|
||
Motiviation | ||
----------- | ||
As our number of Django apps continue to grow in our many services, we want to coalesce around a couple of standard design patterns to both make switching between codebases easier and to help untangle some of the links between codebases we have today. These decisions should be considered "best practices" or the default patterns, and should only be violated if the code base requires it. | ||
|
||
Decision | ||
-------- | ||
All of our Django apps should have a common structure. This structure consists of a combination of default Django-style code and Open edX-style code. This document will only attempt to detail the common Open edX patterns that we would like to see everywhere, ignoring Django-default items (e.g. ``admin.py``, ``urls.py``, etc) and situation-specific items (e.g. a separate ``constants.py`` file). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not a blocking comment but I wasn't able to find a good example of this but it would be nice to link to some docs that explicitly spell out the django side of our layout. If you're new to django, you won't really know what those are and if you come here for answers, it might be nice to give them a place to go. The closest I found was this: https://docs.djangoproject.com/en/3.1/intro/tutorial01/#creating-the-polls-app There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added a table of the things we won't have opinions on with links to their individual docs. Lemme know if this covers your concern. |
||
|
||
Listed below are each of the files or folders your app should contain and what they should consist of. | ||
|
||
|
||
README.rst | ||
++++++++++ | ||
Each app should contain a README.rst to explain it's use. See full details of what should go in the README.rst in OEP-0019_ | ||
|
||
.. _OEP-0019: https://open-edx-proposals.readthedocs.io/en/latest/oep-0019-bp-developer-documentation.html#readmes | ||
|
||
.. ___init__.py: | ||
__init__.py | ||
+++++++++++ | ||
The ``__init__.py`` file should either be blank if the app is a plugin, or contain a single line defining the ``default_app_config`` for the app. This ``default_app_config`` should point to the ``AppConfig`` located in ``<app_name>/apps.py``. | ||
tuchfarber marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Unlike many packages, ``__init__.py`` should *not* be used to as the way to export the app's public methods. These should be exported using ``api.py`` (and thus imported as ``from path.to.app.api import public_function``). See api.py_ below. | ||
|
||
For example: | ||
|
||
.. code-block:: python | ||
|
||
default_app_config = "service_name.apps.app_name.apps.CustomAppConfig" | ||
|
||
|
||
.. _apps.py: | ||
apps.py | ||
+++++++ | ||
The ``apps.py`` file should contain a subclass of a Django ``AppConfig``. The AppConfig should set the app's name to it's full path (e.g. ``name = "service_name.apps.app_name"``) and should (optionally) have an overriding ``ready()`` function which initializes the app. This initialization also often includes setting up Django signals. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Long ago, the standard used to be to just import the signal handlers at the top of the file, which caused many circular import dependencies - hence this new pattern. Not sure if the text should mention that, but history and all... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 I added a warning that all imports should happen in the ready function else ➿ |
||
|
||
For example: | ||
|
||
.. code-block:: python | ||
|
||
class MyAppConfig(AppConfig): | ||
""" | ||
Application Configuration for MyApp. | ||
""" | ||
name = "service_name.apps.app_name" | ||
|
||
# (optional) Set up plugin. See https://github.com/edx/edx-django-utils/tree/master/edx_django_utils/plugins | ||
|
||
def ready(self): | ||
""" | ||
Connect handlers to recalculate grades. | ||
""" | ||
from .signals import handlers | ||
|
||
.. _api.py: | ||
api.py | ||
++++++ | ||
This should be single point of entry for other Python code to talk to your app. This is *not* a Rest API, this is a Python API (see rest_api_). Some rules for ``api.py`` are as follows: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍🏼 for differentiating between Python API and REST API here. Sometimes "API" is used loosely and interchangeably and it's confusing. |
||
|
||
1. API methods defined in ``api.py`` should be well-named, self-consistent, and relevant to its own domain (without exposing technical and implementation details) | ||
2. An app's Django models and other internal data structures should not be exposed via its Python APIs. | ||
3. Ideally, tests should use only Python APIs declared in other apps' ``api.py`` files. However, if an app's API is needed *only* for testing, then test-relevant Python APIs should be defined/exported in an intentional Python module called ``api_for_tests.py``. | ||
tuchfarber marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
Not exposing an app's data structures can be tricky because it's very easy to expose them without meaning to. Therefore there are a couple common strategies we employ. | ||
|
||
1. When importing internal app code to be used in the ``api.py`` file, prefix it with an underscore so it's clear it's for internal use only. | ||
|
||
2. Create a ``data.py`` file to house simple data objects that can be passed from your app's function to the calling app. By creating these objects, we can avoid both passing Django model objects or querysets directly and having to serialize data. Other apps may import data classes from ``data.py`` in additional to functionality from ``api.py``. See data.py_ for more details. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm concerned about not exposing QuerySets, due to future performance concerns, but also due to extensibility. For instance, I was just discussing w/ @MichaelRoytman about the course_enrollment api in edx-platform which doesn't currently expose a QuerySet, and as a result he is unable to additionally filter those enrollments inside the database (and instead had to pull the data into memory and then filter in python). That pattern, of pulling into memory and filtering, makes pagination difficult, because to find a page of N users, you may have to filter through the entire data set in python (in the worst case). Even exposing generators just mitigates the problem by only pulling users until you fill up the filtered page, but that doesn't prevent you from needing to iterate the users in python, rather than filtering in the database. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (Context: this was mainly pulled from https://github.com/edx/edx-platform/blob/master/docs/decisions/0002-inter-app-apis.rst#decisions.) I've also been hesitant about performance on this point. There are ways around the performance issue, such as allowing There was a recent discussion with about cc: @nasthagiri since you wrote the referenced ADR and had that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Even exposing **kwargs doesn't actually help that much. For instance, you couldn't use Sigh... I wish we had a better way of being able to use the Python standard of "we're all consenting adults" to feel more comfortable exposing honest-to-goodness querysets. Like, can we just say that non-public fields on a model will be prefixed with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So coincidentally, I had something like this come up in the ticket that I'm working on today, and my mind also wandered to trying to address it with
So I settled on a new addition to the def get_course_keys_with_outlines() -> QuerySet:
"""Queryset of ContextKeys, iterable as a flat list."""
return LearningContext.objects.values_list('context_key', flat=True) So iterating through it simply goes like you'd expect: for course_key in get_course_keys_with_outlines():
print(course_key) Yields:
But you can also use it as a qset: all_course_keys_qs = CourseOverview.objects.values_list('id', flat=True)
missing_outlines_qs = all_course_keys_qs.exclude(
id__in=get_course_keys_with_outlines()
) The query it uses is:
Not that this will solve every problem, but I think there is something we can do here that is to encourage some predictable subset of the Queryset API in some disciplined way. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean, would it work if we subclassed QuerySet to make something that maps into attrs over it when you actually do the iteration? I'll experiment a little... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So for the last part, something might look like: tasks = api.get_tasks_for_user(user_id)
urgent_tasks = tasks.overdue().high_priority() There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (Sorry, still doing this in bits and pieces where I can fit in the time...) I think the kind of interface I posted above could work for the "there is a system out there that I want answers from" type of query, but we'd need something more for the "I want to efficiently add my own data to this" type of query. Say I was attaching course-specific survey data to enrollments (I am picking totally at random). There's an enrollments API built in the fashion of this OEP, and I have survey objects that have a reverse relationship name of "survey". The approach I suggested above would obliterate the idea of a model relation, and I'm super-skittish about reimplementing that type of logic. But a client should be able to do something like: tasks = task_api.get_tasks_for_user(user_id)
urgent_tasks = tasks.overdue().high_priority()
jira_tickets = jira_api.tickets_for_tasks(urgent_tasks) Implementing the JIRA API There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (FYI I extended the review date by a month so we could hash this out. New date is April 10.) I really like the "have your cake and eat it too" design. I have a lot of hesitations on adding any of it in here though. Reasons being:
TL;DR: I want this; I want to be in on the discussion; I don't think this is where we decide it. I'm thinking this section might become more of a "use attrs if possible, querysets are okay for performance reasons". When we have a working use case, we can update this section to use it. That seem fair to you? I'll update the PR with that language if so. P.S. I hate stopping a brainstorm, so know that I feel bad about it. Haha. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry I didn't reply to this. I agree that this idea is too rough to be part of this OEP, and it makes sense to experiment with it first. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! I've updated the language to be less strict about not passing querysets. If you have any thoughts on it, let me know. If not, I think this was the last major question, so we'll just wait it out until April 10 :) |
||
|
||
Using Django's Paginator class can help keep cross-app retrievals performant without passing Querysets around. | ||
|
||
For example: | ||
|
||
.. code-block:: python | ||
|
||
from django.conf.settings import UNSUPPORTED_PROGRAM_UUIDS | ||
|
||
from .data import ProgramData | ||
from .models_api import get_programs as _get_programs | ||
|
||
def get_supported_programs(page_size=None, page=None): | ||
""" | ||
Gets all programs that aren't in UNSUPPORTED_PROGRAM_UUIDS settings | ||
|
||
Returns a page of results if page_size is specified | ||
""" | ||
# _get_programs() returns a queryset | ||
q_supported_programs = _get_programs().exclude( | ||
uuid__in=UNSUPPORTED_PROGRAM_UUIDS | ||
) | ||
|
||
if page_size: | ||
# passing a queryset to a paginator allows it to stay performant on large tables | ||
program_paginator = Paginator(q_supported_programs, page_size) | ||
# get_page returns the first page if page is None | ||
supported_programs = program_paginator.get_page(page) | ||
else: | ||
supported_programs = q_supported_programs | ||
|
||
return [ | ||
ProgramData( | ||
uuid=program.uuid, | ||
title=program.title, | ||
status=program.status | ||
) | ||
for program in supported_programs | ||
] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is interesting. I think I need to think about it some more, but a few reflex reactions:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thought about 3 a lot. The best way I can think of to keep performance and app model isolation is to pass the paginator directly and lazily transform the responses into data classes. Leaving this code in a comment for now because it's it involves creating custom classes that don't exist yet. I'm curious 1) What you think of the pattern and 2) If it's worth putting in here since without the shared generic code, this would be cumbersome code for an OEP. # Things we'd probably throw in edx-django-utils
class TransformingPage(Page):
def __init__(self, *args, **kwargs):
self.transformer = kwargs.get("transformer")
del kwargs["transformer"]
super().__init__(*args, **kwargs)
def __getitem__(self, index):
# Transform the data at the last possible moment so we don't evaluate the queryset
obj = super().__getitem__(index)
return self.transformer(obj)
class TransformingPaginator(Paginator):
def __init__(self, *args, **kwargs):
self.transformer = kwargs.get("transformer")
del kwargs["transformer"]
super().__init__(self, *args, **kwargs)
# Suggested method to override on Paginator
def _get_page(self, *args, **kwargs):
return TransformingPage(*args, **kwargs, transformer=self.transform_function)
###############################
# in api.py
from django.conf.settings import UNSUPPORTED_PROGRAM_UUIDS
from edx_django_utils import TransformingPage, TransformingPaginator
from .data import ProgramData
from .models_api import get_programs as _get_programs
def get_supported_programs(page_size=None):
"""
Gets all programs that aren't in UNSUPPORTED_PROGRAM_UUIDS settings
Returns a Paginator of programs
"""
DEFAULT_PAGE_SIZE = 20
# _get_programs() returns a queryset
q_supported_programs = _get_programs().exclude(
uuid__in=UNSUPPORTED_PROGRAM_UUIDS
)
# transformer would probably be a common function elsewhere, but putting it inline for brevity
return TransformingPaginator(
q_supported_programs,
page_size or DEFAULT_PAGE_SIZE,
transformer=(lambda program: ProgramData(
uuid=program.uuid,
title=program.title,
status=program.status
))
) |
||
|
||
.. _data.py: | ||
data.py | ||
+++++++ | ||
This file should include the public data structures for the app that can be passed between apps without exposing internal features. These should be used instead of sending Django model objects or querysets to apps that call the functions in ``api.py``. This file should not import anything other than stdlib modules, so that it may be imported by any other app without issue. These data objects should be simple objects with all business logic handled by ``api.py``. They may however perform simple validation, as long as it is self-contained (doesn't reach out to database, network, or any code outside of the class) | ||
tuchfarber marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
For example: | ||
|
||
.. code-block:: python | ||
|
||
|
||
from dataclasses import dataclass | ||
from enum import Enum | ||
|
||
import attr | ||
|
||
def ProgramStatus(Enum): | ||
ACTIVE = "active" | ||
RETIRED = "retired" | ||
|
||
@attr.attrs(frozen=True) | ||
class ProgramData: | ||
uuid: str = attr.attrib( | ||
validator=[attr.validators.instance_of(str)] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it too much to hope for mypy running, so we don't have to do runtime checking of this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd vastly prefer mypy, but we haven't followed through enabling it and I forget why what it's being up up for. I was mostly using it as a simple example until we live in a static type checking world though. If you think it'd be harmful during the in-between time, let me know though. |
||
) | ||
title: str = attr.attrib( | ||
validator=[attr.validators.instance_of(str)] | ||
) | ||
status: str = attr.attrib( | ||
validator=[ | ||
attr.validators.instance_of(str), | ||
attr.validators.in_(ProgramStatus) | ||
] | ||
) | ||
|
||
# Alternatively if you don't need field validation, conversion, or other attrs benefits: | ||
|
||
@dataclass(frozen=True) | ||
class ProgramData: | ||
uuid: str | ||
title: str | ||
status: str | ||
|
||
.. _rest_api: | ||
rest_api/ | ||
+++++++++ | ||
If an app will have it's own REST API, it should live in a folder called ``rest_api`` to distinguish it from the ``api.py`` file used for intra-app communication. | ||
|
||
APIs should be versioned and the serializers and permissions associated with that version should be kept inside that version's folder. This prevents breakages when an API needs to be updated. | ||
|
||
An example of a common folder structure for a versioned REST API:: | ||
|
||
app_name | ||
├── rest_api | ||
│ ├── v1 | ||
│ │ ├── permissions.py | ||
│ │ ├── serializers.py | ||
│ │ ├── urls.py | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there anything useful to state about standards for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have thoughts on URLs, namespacing, and such, but that feels like an argument left for a different OEP. So I just added a note that this doesn't attempt to go into it. |
||
│ │ └── views.py | ||
│ └── urls.py | ||
├── urls.py | ||
└── views.py # existing legacy non-REST APIs | ||
|
||
|
||
.. _signals: | ||
signals/ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Whoops. I had two conflicting thoughts while writing this and got my wires crossed. I've updated it to show that I added a separate entry for I think the change fixes the registration side effect concerns, but let me know if it doesn't. |
||
+++++++++ | ||
If an app is consuming Django Signals from other apps in the service, it should include a ``signals.py`` file which includes all of it's signal handlers. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
Consequences | ||
------------ | ||
At this time, there is no plan to enforce any of these guidelines. The vast majority of current Open edX code doesn't yet meet these guidelines, and there will always be exceptions to the rule. The hope is that as developers write new code or refactor existing code, they follow these patterns as best they can. We also hope that code reviewers will ensure these guidelines are followed in the code they approve. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Motiviation
->Motivation