Merge pull request #30 from swistakm/feature/bulk-resource-creation
Feature: bulk resource creation
swistakm committed Nov 9, 2016
2 parents dd74e9a + fb67ae9 commit 4d8469c
Showing 6 changed files with 403 additions and 20 deletions.
278 changes: 265 additions & 13 deletions docs/guide/generic-resources.rst
@@ -199,12 +199,14 @@ ListCreateAPI

:class:`ListCreateAPI` extends :class:`ListAPI` with capability to
create new objects with data from resource representation provided in
POST request body.
POST or PATCH request body.

It expects you to implement the same handlers as for :class:`ListAPI`
and also new ``.create(self, params, meta, validated, **kwargs)`` method handler
that creates single object (e.g. in some storage). Created object may or may
not be returned in response 'content' section (this is optional)
and also new ``.create(self, params, meta, validated, **kwargs)``
and (optionally) ``.create_bulk(self, params, meta, validated, **kwargs)``
method handlers that are able to create single and multiple objects
(e.g. in some storage). Created objects may or may not be returned in the
response 'content' section (this is optional).

``create()`` accepts the following arguments:

@@ -214,16 +216,39 @@ not be returned in response 'content' section (this is optional)
  to this dict will be later included in response
  'meta' section. This can already be prepopulated by method
  that calls this handler.
* **validated** *(dict):* dictionary of internal object fields values
  after converting from representation with full validation performed
* **validated** *(dict):* a **single dictionary** of internal object fields
  values after converting from representation with full validation performed
  according to definition contained within serializer instance.
* **kwargs** *(dict):* dictionary of values retrieved from route url
  template by falcon. This is suggested way for providing
  resource identifiers.

If ``create()`` will return any value it should have same form as return value
of ``retrieve()`` because it will be again translated into representation
with serializer.
``create_bulk()`` accepts the following arguments:

* **params** *(dict):* dictionary of parsed parameters according
  to definitions provided as resource class attributes.
* **meta** *(dict):* dictionary of meta parameters. Anything added
  to this dict will be later included in response
  'meta' section. This can already be prepopulated by method
  that calls this handler.
* **validated** *(list):* a **list of multiple dictionaries** of internal
  objects' field values after converting from representation with
  full validation performed according to definition contained within
  serializer instance.
* **kwargs** *(dict):* dictionary of values retrieved from route url
  template by falcon. This is suggested way for providing
  resource identifiers.


If ``create()`` or ``create_bulk()`` returns any value then it should have
a form compatible with the return value of ``retrieve()`` because it will
be translated back into representation with serializer. Of course ``create()``
should return a single resource instance while ``create_bulk()`` should return
a collection of resources.

Note that the default implementation of :any:`ListCreateAPI.create_bulk()` is
very simple and may not be suited for every use case. If you want to use it
please refer to :ref:`bulk-creation-guide`.

Example usage:

@@ -309,7 +334,234 @@ Generic resources without serialization
If you don't like how serializers work there are also two very basic generic
resources that do not rely on serializers: :class:`Resource` and
:class:`ListResource`. They can be extended with mixins found in
:any:`graceful.resources.mixins` module and provide same method handlers like
generic resources that utilize serializers (``list()``, ``retrieve()``,
``update()`` etc.) but do not perform anything more beyond content-type level
serialization.
:any:`graceful.resources.mixins` module and provide the same method handlers
like the generic resources that utilize serializers (i.e. ``list()``,
``retrieve()``, ``update()`` and so on). Note that they do not perform anything
beyond content-type level serialization.


.. _bulk-creation-guide:

Guide for creating resources in bulk
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


:class:`ListCreateAPI` ships with a default implementation of the
``create_bulk()`` method that calls the ``create()`` method separately for
every resource instance retrieved from the request payload. The actual code is
as follows:

.. code-block:: python

    def create_bulk(self, params, meta, **kwargs):
        validated = kwargs.pop('validated')
        return [self.create(params, meta, validated=item) for item in validated]

This approach to bulk resource creation may not be the most performant one if
you save a resource instance to your storage on every ``create()`` call.
The other concern is whether you care about data consistency in your storage
and want to ensure "all or nothing" semantics. With the default bulk creation
handler it may be hard to enforce such constraints. Fortunately, you can easily
override this method to suit your own needs.
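
To make the cost of the default handler concrete, here is a minimal sketch
that does not import graceful at all. ``FakeStorage`` and ``SketchAPI`` are
hypothetical stand-ins (they are not part of graceful), and ``create_bulk()``
only mirrors the default implementation quoted above:

```python
class FakeStorage:
    """Hypothetical in-memory storage that counts save requests."""

    def __init__(self):
        self.items = []
        self.requests = 0

    def add(self, documents):
        # one call stands for one round trip to the storage backend
        self.requests += 1
        self.items.extend(documents)


class SketchAPI:
    """Stand-in for a ListCreateAPI subclass (assumption, not graceful code)."""

    def __init__(self, storage):
        self.storage = storage

    def create(self, params, meta, validated, **kwargs):
        self.storage.add([validated])
        return validated

    def create_bulk(self, params, meta, **kwargs):
        # same logic as the default handler: one create() call per item
        validated = kwargs.pop('validated')
        return [self.create(params, meta, validated=item) for item in validated]


storage = FakeStorage()
api = SketchAPI(storage)
created = api.create_bulk(
    {}, {}, validated=[{"text": "a"}, {"text": "b"}, {"text": "c"}]
)
```

With three documents in the payload the sketch issues three separate save
requests, and a failure on the last one would leave the first two already
saved.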

There are at least three ways you can handle bulk resource creation in graceful:

* *Completely separate bulk and single resource creation*: allow ``create()``
  and ``create_bulk()`` handlers to have their own separate code responsible
  for saving data in the storage.
* *Deferred saves*: allow your ``create()`` handler to skip saves if a specific
  keyword parameter is set and then do your saves in the ``create_bulk()``
  handler.
* *Utilize your storage transactions*: wrap your data processing in a
  per-request transaction to ensure "all or nothing" semantics on the database
  level.


Completely separate bulk and single resource creation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This approach is the simplest to implement but only makes sense if the process
of your resource creation is very simple and heavily relies on serializers
to validate and prepare your data before save.

Assume your API allows creating and retrieving simple documents in some simple
storage that may not even be a real database. A good example would be an API
dealing with the Solr search engine:

.. code-block:: python

    from pysolr import Solr

    from graceful.serializers import BaseSerializer
    from graceful.fields import StringField
    from graceful.resources.generic import ListCreateAPI

    # note: pysolr takes a single URL that already includes host, port
    #       and core path, so there is no separate port argument
    solr = Solr("<solr url>")


    class DocumentSerializer(BaseSerializer):
        text = StringField("Document content")
        author = StringField(
            "Document author",
            # note: Assume that due to legacy reasons this field
            #       is stored under different name in Solr.
            #       graceful is great in dealing with such problems!
            source="autor_name_t"
        )


    class DocumentsAPI(ListCreateAPI):
        serializer = DocumentSerializer()

        def list(self, params, meta, **kwargs):
            return solr.search("*:*")

        def create(self, params, meta, validated, **kwargs):
            solr.add([validated])
            # note: return document back so its representation
            #       can be included in response body
            return validated

The Solr search engine is an especially good example here because it does not
handle multiple single-document save requests well and the best approach is to
batch them. The ``pysolr`` module (a popular library for integration with Solr)
allows you to save multiple documents with a single ``Solr.add()`` call.
Actually, it even encourages you to batch documents using a single call because
it accepts only a list as the input argument.

Let's override the default ``create_bulk()`` so it will save all the documents
it receives as the ``validated`` argument without calling ``create()`` handler:

.. code-block:: python

    class DocumentsAPI(ListCreateAPI):
        serializer = DocumentSerializer()

        def list(self, params, meta, **kwargs):
            return solr.search("*:*")

        def create(self, params, meta, validated, **kwargs):
            solr.add([validated])
            # note: return document back so its representation
            #       can be included in the response body
            return validated

        def create_bulk(self, params, meta, validated, **kwargs):
            solr.add(validated)
            # note: return documents back so their representation
            #       can be included in the response body
            return validated

Note that the above technique works best for simple use cases where the
``validated`` argument represents complete data that can be easily saved
directly to your storage without any further modification.

If you need any additional processing of resources in your custom ``create()``
and ``create_bulk()`` methods before saving them to your storage,
the code can quickly become hard to maintain. Anyway, you can start with this
approach and refactor it later into the *deferred saves* pattern as these two
are very alike and offer similar advantages.


Deferred saves
^^^^^^^^^^^^^^

In the previous section we said that having separate code that independently
saves a *single resource* and *resources in bulk* may not be the best approach
if you need to do some additional data processing before saves. No matter
if you do non-serializer-based data validation or talk to some other external
services, you will need to duplicate this additional processing code in both
handlers. You can limit the code duplication by extracting your resource
processing procedures to additional methods but it will eventually
make things unnecessarily complex and will still be hard to maintain.

A little improvement to previous code is to reuse single resource creation
handler in your custom ``create_bulk()`` implementation but allow the
``create()`` handler to skip saving data to storage on the caller's demand.
Thus any per-resource processing will always stay in the ``create()`` handler
code and the ``create_bulk()`` will be responsible only for saving the data in
bulk:

.. code-block:: python

    import time


    class DocumentsAPI(ListCreateAPI):
        serializer = DocumentSerializer()

        def list(self, params, meta, **kwargs):
            return solr.search("*:*")

        def create(self, params, meta, validated, skip_save=False, **kwargs):
            # do some additional processing like adding defaults etc.
            validated['created_at'] = time.time()

            # note: skip_save defaults to False, so on ordinary POST
            #       requests ``create()`` runs in single-resource mode
            #       and saves the document immediately
            if not skip_save:
                solr.add([validated])

            # note: return document back so its representation
            #       can be included in the response body
            return validated

        def create_bulk(self, params, meta, validated, **kwargs):
            # note: ``validated`` is already an explicit argument here,
            #       so it must not be popped from kwargs
            processed = [
                self.create(params, meta, item, skip_save=True)
                for item in validated
            ]
            solr.add(processed)
            return processed

This way you can be sure that anything you add to the ``create()`` handler
will also affect the resources created in bulk. Additionally your API is more
efficient because it can save the data in bulk with single request to your
storage backend instead of making multiple requests.
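
The effect of the deferred saves pattern can be checked with a small
self-contained sketch. It uses neither graceful nor pysolr:
``CountingStorage`` and ``DeferredSaveAPI`` are hypothetical stand-ins that
only mimic the handler signatures used in this guide.

```python
import time


class CountingStorage:
    """Hypothetical storage stand-in that counts save requests."""

    def __init__(self):
        self.documents = []
        self.requests = 0

    def add(self, documents):
        self.requests += 1
        self.documents.extend(documents)


class DeferredSaveAPI:
    """Stand-in for a ListCreateAPI subclass (assumption, not graceful code)."""

    def __init__(self, storage):
        self.storage = storage

    def create(self, params, meta, validated, skip_save=False, **kwargs):
        # per-resource processing lives only in this handler
        validated['created_at'] = time.time()
        if not skip_save:
            self.storage.add([validated])
        return validated

    def create_bulk(self, params, meta, validated, **kwargs):
        processed = [
            self.create(params, meta, item, skip_save=True)
            for item in validated
        ]
        # a single request to the storage for the whole batch
        self.storage.add(processed)
        return processed


storage = CountingStorage()
api = DeferredSaveAPI(storage)
created = api.create_bulk({}, {}, [{"text": "a"}, {"text": "b"}])
```

Both documents get the ``created_at`` default from ``create()``, yet the
storage stand-in records only one save request for the whole batch.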


Utilize your storage transactions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sometimes you may not be concerned about the performance of multiple small
saves but only want to have the "all or nothing" semantics of the bulk creation
method. If the integration with your storage backend allows you to enforce
transactions on a block of code you can easily use such a feature to make sure
that all the separate saves done with the ``create()`` handler will take effect
in the "all or nothing" manner. A good use case for such an approach could be
working with any RDBMS that supports transactions.

Let's assume you have a per-request ``session`` object that wraps the
integration with the storage backend and allows you to set savepoints and
commit/rollback transactions. Many ORM layers (e.g. SQLAlchemy) offer such
kind of object, so code for such a technique may look very similar for different
storage providers:

.. code-block:: python

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    # note: example SQLAlchemy integration could work that way
    engine = create_engine("...")
    Session = sessionmaker(bind=engine)


    class MyAPI(ListCreateAPI):
        def on_post(self, req, resp, **kwargs):
            # inject session object into kwargs so it can be later
            # used by ``create()`` handler to manipulate storage
            # and manage transaction
            session = Session()

            try:
                super().on_post(req, resp, session=session, **kwargs)
            except:
                session.rollback()
                raise
            else:
                session.commit()

        def on_patch(self, req, resp, **kwargs):
            # inject session object into kwargs so it can be later
            # used by ``create_bulk()`` handler to manipulate storage
            # and manage transaction
            session = Session()

            try:
                super().on_patch(req, resp, session=session, **kwargs)
            except:
                session.rollback()
                raise
            else:
                session.commit()

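
The same "all or nothing" idea can be tried out without any ORM. The sketch
below is only an illustration under stated assumptions: it uses the standard
library ``sqlite3`` module instead of SQLAlchemy, and the ``documents`` table
as well as the module-level ``create()`` and ``create_bulk()`` functions are
hypothetical names, not graceful handlers.

```python
import sqlite3


def create(conn, validated):
    # single-resource save; deliberately does not commit on its own
    conn.execute(
        "INSERT INTO documents (text) VALUES (?)", (validated["text"],)
    )
    return validated


def create_bulk(conn, validated):
    # commit only if every single save succeeded, roll back otherwise
    try:
        created = [create(conn, item) for item in validated]
    except Exception:
        conn.rollback()
        raise
    else:
        conn.commit()
        return created


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (text TEXT NOT NULL)")
conn.commit()

# the second item violates NOT NULL, so the first insert is rolled back too
try:
    create_bulk(conn, [{"text": "ok"}, {"text": None}])
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```

After the failed bulk call the table is still empty, because the insert that
succeeded was undone together with the one that failed.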
28 changes: 22 additions & 6 deletions src/graceful/resources/base.py
@@ -139,6 +139,7 @@ def allowed_methods(self):
('GET', hasattr(self, 'on_get')),
('POST', hasattr(self, 'on_post')),
('PUT', hasattr(self, 'on_put')),
('PATCH', hasattr(self, 'on_patch')),
('DELETE', hasattr(self, 'on_delete')),
('HEAD', hasattr(self, 'on_head')),
('OPTIONS', hasattr(self, 'on_options')),
@@ -331,7 +332,7 @@ def require_representation(self, req):
            description="only JSON supported, got: {}".format(content_type)
        )

    def require_validated(self, req, partial=False):
    def require_validated(self, req, partial=False, bulk=False):
        """Require fully validated internal object dictionary.

        Internal object dictionary creation is based on content-decoded
@@ -340,20 +341,35 @@ def require_validated(self, req, partial=False):
        Args:
            req (falcon.Request): request object
            partial (bool): self to True if partially complete representation
            partial (bool): set to True if partially complete representation
                is accepted (e.g. for patching instead of full update). Missing
                fields in representation will be skiped.
            bulk (bool): set to True if request payload represents multiple
                resources instead of single one.

        Returns:
            dict: dictionary of fields and values representing internal object.
                Each value is a result of ``field.from_representation`` call.

        """
        representation = self.require_representation(req)
        representations = [
            self.require_representation(req)
        ] if not bulk else self.require_representation(req)

        if bulk and not isinstance(representations, list):
            raise ValidationError(
                "Request payload should represent a list of resources."
            ).as_bad_request()

        object_dicts = []

        try:
            object_dict = self.serializer.from_representation(representation)
            self.serializer.validate(object_dict, partial)
            for representation in representations:
                object_dict = self.serializer.from_representation(
                    representation
                )
                self.serializer.validate(object_dict, partial)
                object_dicts.append(object_dict)

        except DeserializationError as err:
            # when working on Resource we know that we can finally raise
@@ -365,4 +381,4 @@ def require_validated(self, req, partial=False):
            # so we also are prepared to catch it
            raise err.as_bad_request()

        return object_dict
        return object_dicts if bulk else object_dicts[0]
