# Tutorial on serialization in Django Rest Framework

In [his blog post](https://www.caktusgroup.com/blog/2019/02/01/creating-api-endpoint-django-rest-framework/),
my colleague Dmitriy gave a good example of starting to use 
[Django Rest Framework](https://www.django-rest-framework.org/) (DRF) with Django models.

Now you might want to start learning more about DRF from its documentation. I
found that starting with the documentation was rather challenging for me,
because it assumes some basic concepts that I wasn't aware of. I thought I'd go
over some of them in this post. Then I found I was getting deeper and
deeper into serialization, so this post ended up being all about serialization.

And to state some of my own assumptions up front: for this post, I'll be
assuming an API that uses JSON to transport data, and Python 3 (3.7 or later,
since the examples use `dataclasses`).

(If you've seen the
[first Tutorial on the DRF site](https://www.django-rest-framework.org/tutorial/1-serialization/),
there is some overlap in what this post covers, but then this post digs deeper
into serialization while the DRF tutorial moves on to other aspects of building
an API.)

## Serialization

In our Django application, we're working with Python objects, but for our
API we want to use a format called JSON to transport information over the
network. Serializing is the process of converting Python objects
to JSON, and deserializing is the process of converting JSON to Python objects
again.

(It helps me to think of "serializing" as creating a "serial" stream of bytes
that can flow over a network connection, and "deserializing" as consuming a
serial stream of bytes and turning it into something more useful again.)
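As a rough illustration of the two directions, here is a plain-Python sketch that uses only the standard library's `json` module, not DRF. The `Thing` class and the `serialize`/`deserialize` helpers are hypothetical stand-ins. Note that `deserialize` trusts its input completely, which is exactly the gap that DRF's validation fills.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Thing:
    id: int
    b: str


def serialize(thing: Thing) -> bytes:
    # Python object -> dict of primitives -> JSON text -> UTF-8 bytes
    return json.dumps(asdict(thing)).encode('utf-8')


def deserialize(raw: bytes) -> Thing:
    # UTF-8 bytes -> JSON text -> dict -> Python object (no validation at all)
    return Thing(**json.loads(raw.decode('utf-8')))


raw = serialize(Thing(1, 'foo'))
print(raw)               # b'{"id": 1, "b": "foo"}'
print(deserialize(raw))  # Thing(id=1, b='foo')
```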

In DRF, serialization and deserialization are handled by the same class.
This kind of makes sense, given that you want deserialization to be the
reverse of serialization, but it can also be confusing.

It also turns out that writing a serializer class that can just serialize
is pretty trivial, but when you start wanting it to deserialize, all sorts
of complications appear.

## Minimal Django environment

While I won't be doing much with Django directly in this post, Django Rest Framework does assume that it's running in a configured Django environment; without one, useful things like error messages don't work.

The following snippet at the top of a file will set up a very minimal Django environment, enough for us to experiment with DRF without having to set up a complete Django project. This is adapted from [_Lightweight Django_](http://shop.oreilly.com/product/0636920032502.do), by Julia Elman and Mark Lavin
(both past co-workers of mine), O'Reilly, 2014.

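```python
from django.apps import apps
from django.conf import settings

# Configure just enough of Django for DRF serializers to work.
if not settings.configured:
    settings.configure(
        DEBUG=True,
        SECRET_KEY='thisisthesecretkey',
        # ROOT_URLCONF=__name__,
        # Current Django reads MIDDLEWARE; MIDDLEWARE_CLASSES was removed in Django 2.0.
        MIDDLEWARE=(
            'django.middleware.common.CommonMiddleware',
            'django.middleware.csrf.CsrfViewMiddleware',
            'django.middleware.clickjacking.XFrameOptionsMiddleware',
        ),
    )
# Load the (empty) app registry so Django considers itself ready.
apps.populate([])
```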

## Python class for examples

I'll use this tiny Python class for my first examples:

```python
from dataclasses import dataclass

@dataclass
class Thing:
    id: int
    b: str

    def __str__(self):
        return '<Thing(%d, "%s")>' % (self.id, self.b)
```

## How to serialize

The serializer class for Thing can be pretty simple.

```python
from rest_framework import serializers

class ThingSerializer(serializers.Serializer):
    id = serializers.IntegerField()
    b = serializers.CharField()
```

Now suppose we want to use it to serialize something. In this case, we want to start with a Thing
object, and end up with JSON.

To do this, we construct an instance of ThingSerializer, passing in our Thing
as the `instance` argument.
Then we can get the serialized data from the `.data` property of the serializer.

Example:

```python
a_thing = Thing(1, 'foo')
serializer = ThingSerializer(instance=a_thing)
data = serializer.data
print(data)
```

```
{'id': 1, 'b': 'foo'}
```


The output of a DRF serializer is actually not serialized bytes yet, but a Python dictionary of simple values, ready to be rendered
as JSON or, with third-party renderers, formats such as YAML or XML. Let's see that last step once, using a [DRF Renderer](https://www.django-rest-framework.org/api-guide/renderers/), though we'll omit it in most of our examples.

```python
from rest_framework import renderers

renderer = renderers.JSONRenderer()

print(renderer.render(data))
```

```
b'{"id":1,"b":"foo"}'
```


Now we have raw bytes, ready to be sent over a TCP connection, saved to a file, or whatever
you need. If we had non-ASCII text in our data, it would now be encoded, using UTF-8 by default.
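To see what that encoding step means, here is a standard-library approximation rather than DRF's own code. As I understand it, DRF's `JSONRenderer` with its default `UNICODE_JSON` and `COMPACT_JSON` settings writes non-ASCII characters directly and uses compact separators; the `json.dumps` options below are an assumption chosen to mimic that.

```python
import json

data = {'id': 2, 'b': 'café'}

# Compact separators and ensure_ascii=False approximate the default
# JSONRenderer output; the resulting text is then encoded as UTF-8.
rendered = json.dumps(data, ensure_ascii=False, separators=(',', ':')).encode('utf-8')
print(rendered)  # b'{"id":2,"b":"caf\xc3\xa9"}'

# 'é' (U+00E9) became the two bytes 0xC3 0xA9, so the byte count grows.
print(len('café'), len('café'.encode('utf-8')))  # 4 5
```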

Note that no validation is involved when serializing. DRF assumes that the object
you are serializing is valid, and it's up to you to ensure that before you
serialize it.
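Here is a plain-Python sketch of what "no validation" means in practice. As far as I know, DRF's `IntegerField` and `CharField` produce their output values by calling `int()` and `str()`, so a wrong value is either silently coerced or fails with an ordinary Python exception rather than a validation error. The `to_representation` function below is a hypothetical stand-in that mimics only that behavior.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    id: int
    b: str


def to_representation(thing):
    # Hypothetical stand-in: convert each attribute the way an integer
    # field and a string field would, with no validation step at all.
    return {'id': int(thing.id), 'b': str(thing.b)}


# Wrong types are quietly coerced...
print(to_representation(Thing(id='7', b=42)))  # {'id': 7, 'b': '42'}

# ...or fail with a plain Python error instead of a ValidationError.
try:
    to_representation(Thing(id='seven', b='x'))
except ValueError as exc:
    print('ValueError:', exc)
```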

## How to deserialize

If we have a serialized form of a Thing and we want to get a Thing object from
it, we again use our ThingSerializer class, but in a different way.

To do this, we construct an instance of ThingSerializer, passing in the parsed
data (Python primitives such as dicts, strings, and numbers, not raw bytes) as the
`data` argument. Then we check the validity of that data using `.is_valid()`.
If it's valid, then we can get the deserialized data from the `.validated_data`
attribute.

Here's an example, in which we start from raw bytes, parse them from JSON
using a [DRF Parser](https://www.django-rest-framework.org/api-guide/parsers/), 
and deserialize them.

```python
from io import BytesIO
from rest_framework import parsers
parser = parsers.JSONParser()

bits = b'{"id": 1, "b": "foo"}'
data = parser.parse(BytesIO(bits))
print(data)

serializer = ThingSerializer(data=data)
serializer.is_valid(raise_exception=True)
print(serializer.validated_data)
```

```
{'id': 1, 'b': 'foo'}
OrderedDict([('id', 1), ('b', 'foo')])
```


It is important to notice that in this case, validation is mandatory. DRF
won't let us do much until after we've called `.is_valid()`.

Also note that in this case, the serializer returned an `OrderedDict` instead
of a plain dictionary. Its keys follow the order in which the fields are
declared on the serializer, not necessarily the order in which they appeared
in the incoming data. That's mostly moot anyway, since key ordering is not
(supposed to be) significant in JSON objects. From here on, we'll just consider
an OrderedDict the equivalent of a dict in our examples.
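A small standard-library sketch of what that ordering does and doesn't mean, assuming (as I understand DRF's `Serializer.to_internal_value` to work) that the serializer walks its declared fields in order while building the result. The `declared_fields` list is a hypothetical stand-in for the fields on `ThingSerializer`.

```python
from collections import OrderedDict

# Parsed input that happens to list 'b' before 'id'.
parsed = {'b': 'foo', 'id': 1}

# Hypothetical stand-in for the fields declared on ThingSerializer.
declared_fields = ['id', 'b']

# Build the result by walking the declared fields, so the key order
# comes from the serializer, not from the incoming data.
validated = OrderedDict((name, parsed[name]) for name in declared_fields)
print(list(validated))  # ['id', 'b']

# Comparing an OrderedDict with a plain dict ignores order.
print(validated == parsed)  # True
```

Because an `OrderedDict` compares equal to a plain `dict` with the same items, treating the two as equivalent is safe for tests and comparisons.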

But wait a minute. We ended up here with basically the same dictionary that we
started with. We were expecting a Thing object, but
it'll take a little more work on our part to get there.

For now, notice that what we got corresponds to how we defined the fields in
our serializer. `id` was an `IntegerField` and we got an integer, while
`b` was a `CharField` and we got a string. DRF has deserialized the
fields for us individually. What's missing is
putting all of them together into a Thing object, and DRF doesn't know how to
do that yet. We'll have to add some code for it.

## How does it know?

How does the serializer instance know whether it's supposed to serialize or
deserialize? It's entirely based on what was passed in when it was constructed:
if `data` was passed in, it will deserialize; otherwise, it will serialize.
(It's also possible to pass in both data and an instance of an object if we
want to update an existing object. We'll get to that.)
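To make that decision concrete, here is a toy, DRF-free sketch. `MiniSerializer` is hypothetical and far simpler than DRF's `Serializer`: its `.data` only handles the serializing case, `is_valid()` only checks Python types, and it ignores the "both instance and data" case. It does mirror two things described above: the direction comes purely from the constructor arguments, and `validated_data` is off-limits until `is_valid()` has run.

```python
from types import SimpleNamespace


class MiniSerializer:
    """Hypothetical toy serializer for objects with `id` and `b` attributes."""

    # Expected Python type for each field.
    fields = {'id': int, 'b': str}

    def __init__(self, instance=None, data=None):
        # The direction depends only on what is passed in here.
        self.instance = instance
        self.initial_data = data
        self._validated_data = None  # None means is_valid() was never called
        self.errors = {}

    @property
    def data(self):
        # Serializing: read attributes off the instance, with no validation.
        return {name: getattr(self.instance, name) for name in self.fields}

    def is_valid(self):
        # Deserializing: check every field and collect errors per field.
        result, self.errors = {}, {}
        for name, expected_type in self.fields.items():
            value = self.initial_data.get(name)
            if isinstance(value, expected_type):
                result[name] = value
            else:
                self.errors[name] = ['Expected %s.' % expected_type.__name__]
        self._validated_data = {} if self.errors else result
        return not self.errors

    @property
    def validated_data(self):
        # Refuse to hand out data that has not been through is_valid().
        if self._validated_data is None:
            raise AssertionError('Call .is_valid() before using .validated_data')
        return self._validated_data


print(MiniSerializer(instance=SimpleNamespace(id=1, b='foo')).data)
# {'id': 1, 'b': 'foo'}

incoming = MiniSerializer(data={'id': 'one', 'b': 'foo'})
print(incoming.is_valid(), incoming.errors)
# False {'id': ['Expected int.']}
```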

## How this is used in an API

At a very high level, if an API client submits a GET request to our application,
we'll end up finding the object they want, serializing it, and sending a response
with the serialized data as its body.
The URI path of the GET request tells us what kind of thing we want,
and where to find it.

Similarly, if an API client wants to create an object, it'll submit a POST request
whose body contains the JSON data representing the object it wants to create.
Our app will validate the data, deserialize it, and store the object.
The URI path of the POST request tells us what kind of thing it is.

And if an API client wants to change an existing object, it'll submit a PUT request,
using the same URL it would use to GET the existing object, but the PUT will
contain in its request body the serialized data for the updated object.

An API client can even submit a PATCH request the same way, and only provide
in the request body the data for the fields it wants to change. Other fields will
be left unchanged.
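To tie those four requests together, here is a toy, framework-free sketch of the flow. Everything in it (`STORE`, `render`, `handle`) is hypothetical, and it skips validation entirely; in a real DRF app, views and serializers handle these steps, with the serializer's `is_valid()` guarding every write.

```python
import json
from itertools import count

# Hypothetical in-memory "database" of Things, keyed by id.
STORE = {}
_ids = count(1)


def render(obj):
    # Serialize a dict to compact JSON bytes for the response body.
    return json.dumps(obj, separators=(',', ':')).encode('utf-8')


def handle(method, thing_id=None, body=b''):
    """Toy request handler returning (status_code, response_body)."""
    if method not in ('GET', 'POST', 'PUT', 'PATCH'):
        return 405, b''
    if method == 'POST':
        # Create: parse the body; the server, not the client, picks the id.
        incoming = json.loads(body)
        new_id = next(_ids)
        STORE[new_id] = {'id': new_id, 'b': incoming['b']}
        return 201, render(STORE[new_id])
    if thing_id not in STORE:
        return 404, b''
    if method == 'PUT':
        # Full update: the body carries the complete new representation.
        STORE[thing_id] = {'id': thing_id, 'b': json.loads(body)['b']}
    elif method == 'PATCH':
        # Partial update: only the supplied fields change; id stays fixed.
        changes = json.loads(body)
        changes.pop('id', None)
        STORE[thing_id].update(changes)
    # GET, PUT and PATCH all respond with the current serialized object.
    return 200, render(STORE[thing_id])


print(handle('POST', body=b'{"b": "foo"}'))  # (201, b'{"id":1,"b":"foo"}')
print(handle('PATCH', 1, b'{"b": "bar"}'))   # (200, b'{"id":1,"b":"bar"}')
print(handle('GET', 1))                      # (200, b'{"id":1,"b":"bar"}')
```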

## Creating a new object

Let's go into a little more detail about how serializers are used when creating
an object. DRF will handle a lot of this for us if we use its ModelSerializer and
ViewSet classes, but it's good to understand this for writing serializer tests and
to better understand what's happening when you start customizing serializers more.

We'll need to expand our serializer class a bit, and when we're done, we will be
able to get a Thing object from our serialized data. 

The updated class:

```python
from rest_framework import serializers

class ThingSerializer(serializers.Serializer):
    id = serializers.IntegerField()
    b = serializers.CharField()

    def create(self, validated_data):
        return Thing(**validated_data)
```

We added a `create` method, which is given the validated data,
and must return the final Python object that corresponds to
that data.

If this were a Django application and Thing were a model, then `create`
would also be expected to save the new Thing before returning.
We might change `return Thing(**validated_data)` to
`return Thing.objects.create(**validated_data)`.

And here's how we use it to create a Thing:

```python
data = {'id': 1, 'b': 'foo'}
serializer = ThingSerializer(data=data)
serializer.is_valid(raise_exception=True)
a_thing = serializer.save()
print(str(a_thing))
```

```
<Thing(1, "foo")>
```


So the full process is to construct a serializer passing the data as the
`data` argument, validate it, and call `save` to create and return the
final, deserialized Python object.
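It may help to know how `save` decides what to call. Roughly, DRF's `Serializer.save()` merges any keyword arguments into the validated data, then calls `create(validated_data)` when the serializer has no instance and `update(instance, validated_data)` when it does. The plain-Python class below is a simplified sketch of that dispatch only; the name `SketchSerializer` is hypothetical, and it leaves out DRF's checks that `is_valid()` was called and succeeded.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    id: int
    b: str


class SketchSerializer:
    """Hypothetical, heavily simplified model of what save() does."""

    def __init__(self, instance=None, data=None):
        self.instance = instance
        self.validated_data = data  # pretend is_valid() already succeeded

    def create(self, validated_data):
        return Thing(**validated_data)

    def update(self, instance, validated_data):
        for name, value in validated_data.items():
            setattr(instance, name, value)
        return instance

    def save(self, **kwargs):
        # Extra keyword arguments are merged into the validated data.
        validated_data = {**self.validated_data, **kwargs}
        # No instance means "create"; an existing instance means "update".
        if self.instance is None:
            self.instance = self.create(validated_data)
        else:
            self.instance = self.update(self.instance, validated_data)
        return self.instance


created = SketchSerializer(data={'id': 1, 'b': 'foo'}).save()
print(created)  # Thing(id=1, b='foo')

updated = SketchSerializer(instance=created, data={'b': 'bar'}).save()
print(updated is created, updated)  # True Thing(id=1, b='bar')
```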

But this is a bit too simple. Thinking about Django for a minute, when we create
a new record, we don't provide the value for `id`; we expect the database to
do that for us. But as we've written this, the API client must provide an `id`,
and if we were storing these in a database, it'd probably be forced to figure out
an `id` that's not already in use. We want this to work more like Django, so
let's make a few more changes.

First, we'll modify our example Thing class to behave a bit more as if it were
a Django model, generating its own `id` value if one is not provided:

```python
from dataclasses import field
from random import randint

def random_id():
    return randint(1, 99999)

@dataclass
class Thing:
    b: str
    id: int = field(default_factory=random_id)

    def __str__(self):
        return '<Thing(%d, "%s")>' % (self.id, self.b)
```

```python
print(Thing(b='test creating an id'))
```

```
<Thing(13240, "test creating an id")>
```


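Now we'll modify our serializer: `id` becomes optional (`required=False`), and a
`validate_id` method rejects an `id` when we're creating a new Thing (that is,
when the serializer has no `instance`).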

```python
class ThingSerializer(serializers.Serializer):
    id = serializers.IntegerField(required=False)
    b = serializers.CharField()

    def validate_id(self, value):
        # Are we trying to create a new thing?
        if not self.instance and value:
            raise serializers.ValidationError('Cannot specify id when creating new thing')
        return value

    def create(self, validated_data):
        return Thing(**validated_data)
```
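DRF finds `validate_id` by naming convention: during validation, for each field that was supplied, it looks for a method called `validate_<field_name>` on the serializer and, if there is one, calls it with that field's value. Here is a stripped-down plain-Python sketch of that lookup. The class name, the `check_fields` method, and the local `ValidationError` are all hypothetical, and unlike DRF, it treats every field as optional.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for serializers.ValidationError."""


class HookedSerializer:
    """Toy serializer that finds per-field hooks by their names."""

    field_names = ('id', 'b')

    def __init__(self, instance=None, data=None):
        self.instance = instance
        self.initial_data = data

    def validate_id(self, value):
        # Same rule as ThingSerializer.validate_id above.
        if not self.instance and value:
            raise ValidationError('Cannot specify id when creating new thing')
        return value

    def check_fields(self):
        validated, errors = {}, {}
        for name in self.field_names:
            if name not in self.initial_data:
                # Treat every field as optional here, so unsupplied fields
                # (and their hooks) are simply skipped.
                continue
            value = self.initial_data[name]
            # Look for a method named validate_<field name>, if any.
            hook = getattr(self, 'validate_' + name, None)
            try:
                validated[name] = hook(value) if hook else value
            except ValidationError as exc:
                errors[name] = [str(exc)]
        return validated, errors


print(HookedSerializer(data={'id': 1, 'b': 'bad data'}).check_fields())
# ({'b': 'bad data'}, {'id': ['Cannot specify id when creating new thing']})
print(HookedSerializer(data={'b': 'ok'}).check_fields())
# ({'b': 'ok'}, {})
```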

Let's try providing an id when we create a thing and see what happens:

```python
data = {'id': 1, 'b': 'bad data'}
serializer = ThingSerializer(data=data)
print("Valid: %s" % serializer.is_valid())
print(serializer.errors)
```

```
Valid: False
{'id': [ErrorDetail(string='Cannot specify id when creating new thing', code='invalid')]}
```


Good, we get a ValidationError. Now, let's try it the right way.

```python
data = {'b': 'create a new thing with its own id'}
serializer = ThingSerializer(data=data)
serializer.is_valid(raise_exception=True)
thing = serializer.save()
print(thing)
```

```
<Thing(55144, "create a new thing with its own id")>
```


If that seems like more work than we ought to have to do, it is.
DRF provides a `ModelSerializer` variant of a serializer class that
has most of this behavior built in. But I think it's important to
understand what is going on "behind the scenes".