Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OEP-0049 Django App Patterns #181

Merged
merged 14 commits into from Apr 12, 2021
Merged

Conversation

tuchfarber
Copy link
Contributor

No description provided.

@tuchfarber tuchfarber force-pushed the tuchfarber/add_first_style_guide branch from 3ae90de to 8a72fa8 Compare February 2, 2021 21:49
oeps/oep-0049-django-app-patterns.rst Outdated Show resolved Hide resolved
oeps/oep-0049-django-app-patterns.rst Outdated Show resolved Hide resolved
oeps/oep-0049-django-app-patterns.rst Show resolved Hide resolved
oeps/oep-0049-django-app-patterns.rst Outdated Show resolved Hide resolved
.. _rest_api:
rest_api/
+++++++++
Each app should self-contain it's related REST API, meaning it should live within the app and not in a separate "api" app where it would mingle with other apps' REST API code. Since the Python API is using ``api.py``, the REST API code (often Django Rest Framework) should live in a ``rest_api`` folder.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's worthwhile to deviate from having these in views.py. It's where people will expect to look for views code, including DRF views. And we're not really building any other kind of view.

Also, I suspect if we really go all-in on this pattern, we'll find that REST APIs do not align 1:1 with apps and their internal APIs. For instance, with learning_sequences, I hope that eventually many other REST APIs query it to get course outline and sequence metadata, marked up with whatever additional information makes sense for them. It's kind of like Netflix's backend-for-frontend, where instead of having a bunch of backend services that get aggregated into different REST APIs, we get a bunch of small in-process API calls that get composed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting... I would only look for Django server-render-pages views in views.py. The vast majority of apps I've touched have either an api or rest_api folder with a structure similar to the one below. This is what I've considered "the right way" of added a DRF API to your app since I started. (see course_modes, entitlements, grades, etc)

myapp/
    rest_api/ (or api/)
        /v1
            serializers.py
            urls.py
            views.py
        permissions.py
        urls.py
    urls.py
    views.py
    ...the rest of the app code

I'm not saying this is the right way going forward, but it's definitely a pattern that's very consistent throughout our codebase currently. I also find it's easier to distinguish and separate the existing non-REST APIs / views from the REST ones in this patterns. I've added the above suggested structure to solicit more feedback rather than only suggest a folder name.

I've updated in response to the second paragraph. It was my understanding that self-contained APIs was an architectural decision that was made prior, but I agree that composition can be beneficial.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I haven't been paying enough attention–I knew the rest_api practice existed, but I didn't realize it was that widespread. FWIW, it's not something I feel strongly about, particularly since there's a views.py in that directory anyhow.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I would point out that we're not really making any other kind of view these days.

for program in all_programs
if program.uuid not in UNSUPPORTED_PROGRAM_UUIDS
]
return supported_programs
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your thoughts on pagination for this kind of request? I know the paginators in DRF have some kind of ability to accept something that's not a queryset, but I don't remember what that looked like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Django has a built in Paginator, so I've updated the example to use it. Let me know what you think.

Changes include: addition of attrs example, removal of models_api,
expansion of rest_api, and additon of signals.
@tuchfarber tuchfarber marked this pull request as ready for review February 19, 2021 21:48
tuchfarber and others added 2 commits February 22, 2021 14:30
Co-authored-by: Manjinder Singh <49171515+jinder1s@users.noreply.github.com>
Co-authored-by: Manjinder Singh <49171515+jinder1s@users.noreply.github.com>
Copy link

@doctoryes doctoryes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to write out these standards! 🧁

Please ensure that these agreed-upon standards end up being represented in this cookiecutter upon merging this OEP:

https://github.com/edx/edx-cookiecutters/tree/master/cookiecutter-django-app

Because good work is always rewarded with more work. 😉

--------
Proposes a common set of code patterns for Open edX Django apps.

Motiviation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Motiviation -> Motivation

.. _apps.py:
apps.py
+++++++
The ``apps.py`` file should contain a subclass of a Django ``AppConfig``. The AppConfig should set the app's name to it's full path (e.g. ``name = "service_name.apps.app_name"``) and should (optionally) have an overriding ``ready()`` function which initializes the app. This initialization also often includes setting up Django signals.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long ago, the standard used to be to just import the signal handlers at the top of the file, which caused many circular import dependencies - hence this new pattern. Not sure if the text should mention that, but history and all...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I added a warning that all imports should happen in the ready function else ➿

.. _api.py:
api.py
++++++
This should be single point of entry for other Python code to talk to your app. This is *not* a Rest API, this is a Python API (see rest_api_). Some rules for ``api.py`` are as follows:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 for differentiating between Python API and REST API here. Sometimes "API" is used loosely and interchangeably and it's confusing.

│   ├── v1
│   │ ├── permissions.py
│   │ ├── serializers.py
│   │ ├── urls.py

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything useful to state about standards for urls.py files? (This one and the two below.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have thoughts on URLs, namespacing, and such, but that feels like an argument left for a different OEP. So I just added a note that this doesn't attempt to go into it.

.. _signals:
signals/
+++++++++
If an app is consuming Django Signals from other apps in the service, it should include a ``signals.py`` file which includes all of it's signal handlers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's -> its

status=program.status
)
for program in supported_programs
]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting. I think I need to think about it some more, but a few reflex reactions:

  1. I do think some sort of Paginator usage makes sense, though I'm not really that familiar with the options and requirements there.
  2. Something like this makes sense for a DRF query or something like Django admin, and making use of Django's builtins to accomplish that would be great.
  3. We'd probably need some different approach (generator?) if this was being run by a management command and wanted to iterate through a whole bunch of records. Otherwise, this would do a lot of unnecessary computation on the database side to recompute a bunch of queries instead of having one big one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought about 3 a lot. The best way I can think of to keep performance and app model isolation is to pass the paginator directly and lazily transform the responses into data classes. Leaving this code in a comment for now because it's it involves creating custom classes that don't exist yet. I'm curious 1) What you think of the pattern and 2) If it's worth putting in here since without the shared generic code, this would be cumbersome code for an OEP.

# Things we'd probably throw in edx-django-utils
class TransformingPage(Page):
    def __init__(self, *args, **kwargs):
        self.transformer = kwargs.get("transformer")
        del kwargs["transformer"]
        super().__init__(*args, **kwargs)

    def __getitem__(self, index):
        # Transform the data at the last possible moment so we don't evaluate the queryset
        obj = super().__getitem__(index)
        return self.transformer(obj)

class TransformingPaginator(Paginator):
    def __init__(self, *args, **kwargs):
        self.transformer = kwargs.get("transformer")
        del kwargs["transformer"]
        super().__init__(self, *args, **kwargs)

    # Suggested method to override on Paginator
    def _get_page(self, *args, **kwargs):
        return TransformingPage(*args, **kwargs, transformer=self.transform_function)

###############################
# in api.py
from django.conf.settings import UNSUPPORTED_PROGRAM_UUIDS

from edx_django_utils import TransformingPage, TransformingPaginator

from .data import ProgramData
from .models_api import get_programs as _get_programs

def get_supported_programs(page_size=None):
    """
    Gets all programs that aren't in UNSUPPORTED_PROGRAM_UUIDS settings

    Returns a Paginator of programs
    """
    DEFAULT_PAGE_SIZE = 20

    # _get_programs() returns a queryset
    q_supported_programs = _get_programs().exclude(
        uuid__in=UNSUPPORTED_PROGRAM_UUIDS
    )

    # transformer would probably be a common function elsewhere, but putting it inline for brevity
    return TransformingPaginator(
        q_supported_programs, 
        page_size or DEFAULT_PAGE_SIZE,
        transformer=(lambda program: ProgramData(
            uuid=program.uuid,
            title=program.title,
            status=program.status
        ))
    )

@attr.attrs(frozen=True)
class ProgramData:
uuid: str = attr.attrib(
validator=[attr.validators.instance_of(str)]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it too much to hope for mypy running, so we don't have to do runtime checking of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vastly prefer mypy, but we haven't followed through enabling it and I forget why what it's being up up for. I was mostly using it as a simple example until we live in a static type checking world though. If you think it'd be harmful during the in-between time, let me know though.



.. _signals:
signals/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Did you mean to give signals its own directory?
  2. It sounds like signal handlers are supposed to go in signals.py. If I have a long running task that is triggered by a signal handler, should it go in in the same module, or a separate tasks.py file?
  3. Where is the right place to define a custom signal that your app will emit? It sounds like we probably don't want it in signals.py, since that has celery registration side-effects. Maybe data.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. I had two conflicting thoughts while writing this and got my wires crossed. I've updated it to show that signals should be a directory, with signals.py and handlers.py files. That lines it up with my from .signals import handlers in the apps.py example.

I added a separate entry for tasks.py because I think that's important to call out.

I think the change fixes the registration side effect concerns, but let me know if it doesn't.


1. When importing internal app code to be used in the ``api.py`` file, prefix it with an underscore so it's clear it's for internal use only.

2. Create a ``data.py`` file to house simple data objects that can be passed from your app's function to the calling app. By creating these objects, we can avoid both passing Django model objects or querysets directly and having to serialize data. Other apps may import data classes from ``data.py`` in additional to functionality from ``api.py``. See data.py_ for more details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about not exposing QuerySets, due to future performance concerns, but also due to extensibility. For instance, I was just discussing w/ @MichaelRoytman about the course_enrollment api in edx-platform which doesn't currently expose a QuerySet, and as a result he is unable to additionally filter those enrollments inside the database (and instead had to pull the data into memory and then filter in python). That pattern, of pulling into memory and filtering, makes pagination difficult, because to find a page of N users, you may have to filter through the entire data set in python (in the worst case). Even exposing generators just mitigates the problem by only pulling users until you fill up the filtered page, but that doesn't prevent you from needing to iterate the users in python, rather than filtering in the database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Context: this was mainly pulled from https://github.com/edx/edx-platform/blob/master/docs/decisions/0002-inter-app-apis.rst#decisions.)

I've also been hesitant about performance on this point. There are ways around the performance issue, such as allowing kwargs that get passed to the initial queryset, but that makes the hard line we're trying to draw between apps a lot more hazy. It also defeats the purpose of not exposing the internal data structure, but it'd give us a performant API that also only returns plain data classes.

There was a recent discussion with about models_api.py (which I had pulled from this doc), which set it up as the "when you need to break the API contract for performance reasons" API to use. I'm not sure if having a separate file, or having kwargs (above) would be less hazy.

cc: @nasthagiri since you wrote the referenced ADR and had that models_api.py discussion with me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even exposing **kwargs doesn't actually help that much. For instance, you couldn't use .annotate or .aggregate if the result isn't returning a QuerySet. I wonder if there's a path forward where we advocate returning values() or values_list() querysets. Even that, though, means that clients can't choose which columns they care about, which isn't great from performance purposes.

Sigh... I wish we had a better way of being able to use the Python standard of "we're all consenting adults" to feel more comfortable exposing honest-to-goodness querysets. Like, can we just say that non-public fields on a model will be prefixed with _ on the object (and then we can name the column whatever we want on the field definition)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So coincidentally, I had something like this come up in the ticket that I'm working on today, and my mind also wandered to trying to address it with values_list. The problem was:

  • I wanted to return an iterable of course contexts that didn't have any learning_sequence outline data associated with them.
  • Because I want to punt learning_sequences out of edx-platform and into its own repo, I didn't want to make it know about CourseOverview.
  • The actual pushing of data into learning_sequences happens in contentstore (i.e. Studio), since learning_sequences likewise doesn't know about modulestore.

So I settled on a new addition to the learning_sequences API that was:

def get_course_keys_with_outlines() -> QuerySet:
    """Queryset of ContextKeys, iterable as a flat list."""
    return LearningContext.objects.values_list('context_key', flat=True)

So iterating through it simply goes like you'd expect:

for course_key in get_course_keys_with_outlines():
    print(course_key)

Yields:

course-v1:TNL+100+2021_01_21
course-v1:TNL+100+ABC
course-v1:TNL+LibContentTest+2021
course-v1:edX+DemoX+Demo_Course

But you can also use it as a qset:

all_course_keys_qs = CourseOverview.objects.values_list('id', flat=True)
missing_outlines_qs = all_course_keys_qs.exclude(
    id__in=get_course_keys_with_outlines()
)

The query it uses is:

SELECT `course_overviews_courseoverview`.`id` FROM `course_overviews_courseoverview`
WHERE NOT (`course_overviews_courseoverview`.`id` IN (SELECT U0.`context_key` FROM `learning_sequences_learningcontext` U0))

Not that this will solve every problem, but I think there is something we can do here that is to encourage some predictable subset of the Queryset API in some disciplined way.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, would it work if we subclassed QuerySet to make something that maps into attrs over it when you actually do the iteration? I'll experiment a little...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for the last part, something might look like:

tasks = api.get_tasks_for_user(user_id)

urgent_tasks = tasks.overdue().high_priority()

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Sorry, still doing this in bits and pieces where I can fit in the time...)

I think the kind of interface I posted above could work for the "there is a system out there that I want answers from" type of query, but we'd need something more for the "I want to efficiently add my own data to this" type of query.

Say I was attaching course-specific survey data to enrollments (I am picking totally at random). There's an enrollments API built in the fashion of this OEP, and I have survey objects that have a reverse relationship name of "survey". The approach I suggested above would obliterate the idea of a model relation, and I'm super-skittish about reimplementing that type of logic. But a client should be able to do something like:

tasks = task_api.get_tasks_for_user(user_id)
urgent_tasks = tasks.overdue().high_priority()

jira_tickets = jira_api.tickets_for_tasks(urgent_tasks)

Implementing the JIRA API tickets_for_tasks by taking a queryset, doing a select_related, and returning a matching qset with JIRA information corresponding to the qset that was passed in–that general pattern should be a common use case, right? The JIRA app here gets to determine the reverse relationship name in their model...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(FYI I extended the review date by a month so we could hash this out. New date is April 10.)

I really like the "have your cake and eat it too" design. I have a lot of hesitations on adding any of it in here though. Reasons being:

  1. Whatever we decide on will be an idea and not tested in multiple places. I think this should be an ideate -> implement -> ADR -> OEP flow, rather than tagged on here. (Very interested in this topic though, so I'd be happy to be a part of the future work)
  2. Sort of in regard to "reimplementing that type of logic", this feels like it's straddling the line between a customization and a contradiction to how Django does things. I don't think we should bend to Django's will, but as one of the larger Django projects, I think this may be a good space for cross-community collaboration to see if this is something someone's already tried, or if they have any thoughts. Honestly it feels like it'd be a rabbit hole to dive into.

TL;DR: I want this; I want to be in on the discussion; I don't think this is where we decide it.

I'm thinking this section might become more of a "use attrs if possible, querysets are okay for performance reasons". When we have a working use case, we can update this section to use it.

That seem fair to you? I'll update the PR with that language if so.

P.S. I hate stopping a brainstorm, so know that I feel bad about it. Haha.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't reply to this. I agree that this idea is too rough to be part of this OEP, and it makes sense to experiment with it first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've updated the language to be less strict about not passing querysets. If you have any thoughts on it, let me know. If not, I think this was the last major question, so we'll just wait it out until April 10 :)


Decision
--------
All of our Django apps should have a common structure. This structure consists of a combination of default Django-style code and Open edX-style code. This document will only attempt to detail the common Open edX patterns that we would like to see everywhere, ignoring Django-default items (e.g. ``admin.py``, ``urls.py``, etc) and situation-specific items (e.g. a separate ``constants.py`` file).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocking comment but I wasn't able to find a good example of this but it would be nice to link to some docs that explicitly spell out the django side of our layout. If you're new to django, you won't really know what those are and if you come here for answers, it might be nice to give them a place to go. The closest I found was this: https://docs.djangoproject.com/en/3.1/intro/tutorial01/#creating-the-polls-app

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a table of the things we won't have opinions on with links to their individual docs. Lemme know if this covers your concern.

.. ___init__.py:
__init__.py
+++++++++++
The ``__init__.py`` file should contain a line for the ``default_app_config`` for the app, or o. This ``default_app_config`` should point to the ``AppConfig`` located in ``<app_name>/apps.py``. It may also contain small app details such as a version. However, unlike many packages, ``__init__.py`` should *not* be used to as the way to export the app's public methods. These should be exported using ``api.py`` (and thus imported as ``from path.to.app.api import public_function``). See api.py_ below.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the app or o. ? Is this a typo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, typo. Fixed!

oeps/oep-0049-django-app-patterns.rst Show resolved Hide resolved
Copy link
Contributor

@jinder1s jinder1s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the doc string correction, everything here looks good! Let's get this through to the other side.

"""
Gets all programs that aren't in UNSUPPORTED_PROGRAM_UUIDS settings

Returns a page of results if page_size is specified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this doc string incorrect? It seems like copy pasta from docstring of get_supported_programs_paged function

@tuchfarber tuchfarber merged commit bfccb0d into master Apr 12, 2021
@tuchfarber tuchfarber deleted the tuchfarber/add_first_style_guide branch April 12, 2021 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants