"Serialization" is a common short-hand for two important concepts: serialization and de-serialization.
To over-simplify, serialization is the process of taking an object in one programming language (say, Python) that can be understood by one program (say, yours) and converting it into a format that can be understood by a different program, often written in a different programming language (say, JavaScript).
De-serialization is the exact same process - but in reverse. It takes data in another format, often from another program, and converts it into an object that your program can understand.
So why is this important? Because modern programming is all about communication: in this context, communication between different programs, APIs, microservices, and the like. These various programs and microservices may be written in vastly different programming languages and take (often different) approaches to security, yet they still need to be able to talk to each other appropriately.
That is where serialization and de-serialization come in: they are the processes that make that communication possible.
In a very typical example, you can imagine a modern web application. The back-end might be written in Python, maybe using a framework like Flask or Django. The back-end exposes a variety of RESTful APIs that handle the business logic of your web app. But you've got an entirely separate front-end, probably written in JavaScript using React/Redux, AngularJS, or something similar.
In most web applications, at some point your back-end will need to retrieve data from a database (which an ORM <Object Relational Mapper (ORM)> like SQLAlchemy <sqlalchemy:core/api_basics> is great at), and will want to hand it off to your front-end. A typical example might be if you want to list users, or in a social media-style app, list a user's friends. So once your Python back-end has gotten a list of users, how does it communicate that list to your JavaScript front-end? Most likely by exchanging JSON <JavaScript Object Notation (JSON)> objects.
Which means that your Python back-end needs to take the list of users it retrieved, convert their data into JSON format, and transmit it to the front-end. That's serialization.
But now let's say a user in your front-end changes their email address. The front-end will need to let the back-end know, and your back-end will need to update the relevant database record with the latest change. So how does the front-end communicate that change to the back-end? Again, by sending a JSON object to the back-end. But your back-end needs to parse that data, validate it, and then reflect the change in the underlying database. The process of parsing that data? That's de-serialization.
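The round trip described above can be sketched in plain Python using the standard library's json module. The user fields here are made up purely for illustration:

```python
import json

# Serialization: convert a Python object into a JSON string for the front-end.
user = {'id': 1, 'username': 'alice', 'email': 'alice@example.com'}
payload = json.dumps(user)

# De-serialization: parse JSON sent back by the front-end (say, an updated
# email address) into a Python object the back-end can work with.
inbound = json.loads('{"id": 1, "email": "alice@new-domain.com"}')
user['email'] = inbound['email']

print(user['email'])
```

Of course, this sketch leaves out everything SQLAthanor handles for you: validation, per-format configuration, and mapping the parsed data back onto a database model.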
So if serialization and de-serialization are so important, how does this relate to SQLAthanor? Well, serialization and de-serialization can be complicated:
- Different programs may need to serialize and de-serialize into and from multiple formats.
- Some data (like passwords) should only be de-serialized, but for security reasons should never be serialized.
- Serialization and de-serialization may need various pre-processing steps to validate the data or coerce it to/from different data types...and that validation/coercion logic may be different depending on the data format.
- The (fantastic) SQLAlchemy ORM <sqlalchemy:orm/tutorial> handles database read/write operations amazingly, but does not include any serialization/de-serialization support.
This leads to a labor-intensive process of writing custom serialization/de-serialization logic for multiple (different) models and repeating that process across multiple applications. Better, we think, to package that functionality into a library.
Which is what SQLAthanor is.
It is designed to extend the functionality of the SQLAlchemy ORM <sqlalchemy:orm/tutorial> with support for serialization and de-serialization into/from:
- JSON <JavaScript Object Notation (JSON)>
- CSV <Comma-Separated Value (CSV)>
- YAML <YAML Ain't a Markup Language (YAML)>
- Python dict <python:dict>
Which should hopefully save some effort when building applications that need to talk to the outside world (and really, don't all apps need to these days?).
- Configure serialization and de-serialization support when defining your SQLAlchemy models <Model Class>.
- Automatically include serialization methods in your SQLAlchemy model instances <model instance>.
- Automatically include de-serialization "creator" methods in your SQLAlchemy models <model class>.
- Automatically include de-serialization "updater" methods in your SQLAlchemy model instances <model instance>.
- Support serialization and de-serialization across the most-common data exchange formats: JSON <JavaScript Object Notation (JSON)>, CSV <Comma-Separated Value (CSV)>, YAML <YAML Ain't a Markup Language (YAML)>, and Python dict <python:dict>.
- Support pre-processing before serialization/de-serialization for data validation or coercion.
- Support serialization and de-serialization for complex models <model class> that may include: relationships <relationship>, hybrid properties <hybrid property>, association proxies <association proxy>, or standard Python @property <python:property>.
- Maintain all of the existing APIs, methods, functions, and functionality of SQLAlchemy Core <sqlalchemy:core/api_basics>.
- Maintain all of the existing APIs, methods, functions, and functionality of SQLAlchemy ORM <sqlalchemy:orm/tutorial>.
- Maintain all of the existing APIs, methods, functions, and functionality of SQLAlchemy Declarative ORM <sqlalchemy:orm/extensions/declarative/index>.
SQLAthanor is designed to extend the fantastic SQLAlchemy library, providing it with seamless serialization and de-serialization support. What do we mean by seamless? Well, in an ideal world we want serialization and de-serialization to work like this:
# To create serialized output from a model instance, just use:
as_json = model_instance.to_json()
as_csv = model_instance.to_csv()
as_yaml = model_instance.to_yaml()
as_dict = model_instance.to_dict()
# To create a new model instance from serialized data, just use:
new_as_instance = ModelClass.new_from_json(as_json)
new_as_instance = ModelClass.new_from_csv(as_csv)
new_as_instance = ModelClass.new_from_yaml(as_yaml)
new_as_instance = ModelClass.new_from_dict(as_dict)
# To update an existing model instance from serialized data, just use:
model_instance.update_from_json(as_json)
model_instance.update_from_csv(as_csv)
model_instance.update_from_yaml(as_yaml)
model_instance.update_from_dict(as_dict)
Even knowing nothing about SQLAlchemy or SQLAthanor, it should be pretty easy to figure out what's happening in that code, right?
Well, that's exactly what SQLAthanor does for you. So how? Let's break that down.
SQLAthanor is a drop-in replacement for SQLAlchemy. What does this mean? It means that it's designed to seamlessly replace some of your SQLAlchemy import statements. Then, you can continue to define your models <model class> just as you would using the SQLAlchemy ORM <sqlalchemy:orm/tutorial> - but now they'll support serialization and de-serialization.
In other words, the process of using SQLAthanor is very simple:
- Install SQLAthanor. (see here <installation>)
- Import the components used to define your model classes <model class>. (see here <importing>)
- Define your model classes <model class>, just as you would in SQLAlchemy. (see here <defining_models>)
- Configure which model attributes <model attribute> should be serialized (output) and de-serialized (input). (see here <configuration>)
- Configure any pre/post-processing for serialization and de-serialization, respectively. (see here <extra_processing>)
- Serialize your model instances <model instance> as needed. (see here <serialization>)
- Create new model instances <model instance> using de-serialization. (see here <deserialization>)
- Update existing model instances <model instance> using de-serialization. (see here <deserialization>)
And that's it! Once you've done the steps above, you can easily serialize data from your models and de-serialize data into your models using simple methods.
Tip
Because SQLAthanor inherits from and extends SQLAlchemy, your existing SQLAlchemy models <model class> will work with no change.
By default, serialization and de-serialization are disabled for every model attribute unless explicitly enabled.
Standard Approach
Because SQLAthanor is a drop-in replacement for SQLAlchemy and its Declarative ORM <sqlalchemy:orm/extensions/declarative/index>, you can define your models the exact same way you would normally:
- SQLAlchemy Declarative ORM <sqlalchemy:orm/extensions/declarative/index>
- SQLAlchemy ORM Tutorial <sqlalchemy:orm/tutorial>
- Flask-SQLAlchemy: Declaring Models <flask_sqlalchemy:models>
Declarative Reflection
SQLAlchemy supports the use of reflection with the SQLAlchemy Declarative ORM <sqlalchemy:orm/extensions/declarative/index>. This is a process where SQLAlchemy automatically constructs a Declarative <sqlalchemy:orm/extensions/declarative/index> model class based on what it reads from the table definition stored in your SQL database, or from a corresponding Table <sqlalchemy:sqlalchemy.schema.Table> instance already defined and registered with a MetaData <sqlalchemy:sqlalchemy.schema.MetaData> object.
SQLAthanor also supports the same pattern. For details, please see:
- Using Declarative Reflection with SQLAthanor <using_reflection>
- SQLAlchemy: Reflecting Database Objects <sqlalchemy:core/reflection>
- SQLAlchemy: Using Reflection with Declarative
Using Automap
New in version 0.2.0.
The Automap Extension <sqlalchemy:orm/extensions/automap> is an incredibly useful tool for modeling existing databases with minimal effort. It reads your existing database's metadata and automatically constructs SQLAlchemy Declarative ORM <sqlalchemy:orm/extensions/declarative/index> model classes <model class> populated with your tables' columns.
Neat, right? Saves a ton of effort.
Using SQLAthanor, you can ensure that your automapped (automatically generated) models support serialization and de-serialization. For more details, please see: Using Automap with SQLAthanor <using_automap>.
- Using Automap with SQLAthanor <using_automap>
- SQLAlchemy: Automap Extension <sqlalchemy:orm/extensions/automap>
explicit is better than implicit
-- PEP 20 - The Zen of Python
By default (for security reasons), serialization and de-serialization are disabled on all of your model attributes <model attribute>. However, SQLAthanor exists to let you explicitly enable serialization and/or de-serialization for specific model attributes <model attribute>.
Here are some important facts to understand for context:
- You can enable serialization separately from de-serialization. This allows you to do things like de-serialize a password field (expect it as an input), but never include it in the output that you serialize.
- You can configure serialization/de-serialization differently for different formats. SQLAthanor supports CSV <Comma-Separated Value (CSV)>, JSON <JavaScript Object Notation (JSON)>, YAML <YAML Ain't a Markup Language (YAML)>, and Python dict <python:dict>.
- You can serialize or de-serialize any model attribute that is bound to your model class. This includes:
  - Attributes that correspond to columns in the underlying SQL table.
  - Attributes that represent a relationship to another model <model class>.
  - Attributes that are defined as hybrid properties <hybrid property>.
  - Attributes that are defined as association proxies <association proxy>.
  - Instance attributes <instance attribute> defined using Python's built-in @property <python:property> decorator.
In general, SQLAthanor supports two different mechanisms to configure serialization/de-serialization: Declarative Configuration <declarative_configuration> and Meta Configuration <meta_configuration>.
The Declarative Configuration approach is modeled after the SQLAlchemy Declarative ORM <sqlalchemy:orm/extensions/declarative/index> itself. It allows you to configure a model attribute's <model attribute> serialization and de-serialization behavior when defining the model attribute.
Here's a super-simplified example of how it works:
from sqlathanor import declarative_base, Column, relationship
from sqlalchemy import Integer, String

BaseModel = declarative_base()

class User(BaseModel):
    __tablename__ = 'users'

    id = Column('id',
                Integer,
                primary_key = True,
                supports_csv = True,
                csv_sequence = 1,
                supports_json = True,
                supports_yaml = True,
                supports_dict = True,
                on_serialize = None,
                on_deserialize = None)
This example defines a model class called User, which corresponds to a SQL database table named users. The User class defines one model attribute named id (which corresponds to a database Column <sqlathanor.schema.Column> named id). The database column is an integer, and it operates as the primary key for the database table.
So far, this is all exactly what you would normally see in the SQLAlchemy Declarative ORM <sqlalchemy:orm/extensions/declarative/index>.
But there are some additional arguments supplied to Column <sqlathanor.schema.Column>: supports_csv, csv_sequence, supports_json, supports_yaml, supports_dict, on_serialize, and on_deserialize. As you can probably guess, these arguments are what configure serialization and de-serialization in SQLAthanor when using declarative configuration.
Here's what these arguments do:
SQLAthanor Configuration Arguments
- param supports_csv
  Determines whether the column can be serialized <serialization> to or de-serialized <de-serialization> from CSV <Comma-Separated Value (CSV)> format.
  If True, the column can be serialized to CSV and de-serialized from CSV. If False, it will not be included when serializing to CSV and will be ignored if present in de-serialized CSV input.
  Can also accept a 2-member tuple <python:tuple> (inbound / outbound) which determines de-serialization and serialization support respectively.
  Defaults to False, which means the column will not be serialized to CSV or de-serialized from CSV.
- type supports_csv
  bool <python:bool> or tuple <python:tuple> of form (inbound: bool <python:bool>, outbound: bool <python:bool>)
- param csv_sequence
  Indicates the numbered position that the column should occupy in a valid CSV version of the object. Defaults to None <python:None>.
  Note: if not specified, the column will be placed after any columns that do have a csv_sequence assigned, sorted alphabetically. If two columns have the same csv_sequence, they will be sorted alphabetically.
- type csv_sequence
  int <python:int> or None <python:None>
- param supports_json
  Determines whether the column can be serialized <serialization> to or de-serialized <de-serialization> from JSON <JavaScript Object Notation (JSON)> format.
  If True, the column can be serialized to JSON and de-serialized from JSON. If False, it will not be included when serializing to JSON and will be ignored if present in de-serialized JSON input.
  Can also accept a 2-member tuple <python:tuple> (inbound / outbound) which determines de-serialization and serialization support respectively.
  Defaults to False, which means the column will not be serialized to JSON or de-serialized from JSON.
- type supports_json
  bool <python:bool> or tuple <python:tuple> of form (inbound: bool <python:bool>, outbound: bool <python:bool>)
- param supports_yaml
  Determines whether the column can be serialized <serialization> to or de-serialized <de-serialization> from YAML <YAML Ain't a Markup Language (YAML)> format.
  If True, the column can be serialized to YAML and de-serialized from YAML. If False, it will not be included when serializing to YAML and will be ignored if present in de-serialized YAML input.
  Can also accept a 2-member tuple <python:tuple> (inbound / outbound) which determines de-serialization and serialization support respectively.
  Defaults to False, which means the column will not be serialized to YAML or de-serialized from YAML.
- type supports_yaml
  bool <python:bool> or tuple <python:tuple> of form (inbound: bool <python:bool>, outbound: bool <python:bool>)
- param supports_dict
  Determines whether the column can be serialized <serialization> to or de-serialized <de-serialization> from a Python dict <python:dict>.
  If True, the column can be serialized to a dict <python:dict> and de-serialized from a dict <python:dict>. If False, it will not be included when serializing to a dict <python:dict> and will be ignored if present in a de-serialized dict <python:dict>.
  Can also accept a 2-member tuple <python:tuple> (inbound / outbound) which determines de-serialization and serialization support respectively.
  Defaults to False, which means the column will not be serialized to a dict <python:dict> or de-serialized from a dict <python:dict>.
- type supports_dict
  bool <python:bool> or tuple <python:tuple> of form (inbound: bool <python:bool>, outbound: bool <python:bool>)
- param on_deserialize
  A function that will be called when attempting to assign a de-serialized value to the column. It is intended either to coerce the value being assigned into a form that is acceptable to the column, or to raise an exception if it cannot be coerced. If None <python:None>, the data type's default on_deserialize function will be called instead.
  Tip: if you need to execute different on_deserialize functions for different formats, you can supply a dict <python:dict>:
  on_deserialize = { 'csv': csv_on_deserialize_callable, 'json': json_on_deserialize_callable, 'yaml': yaml_on_deserialize_callable, 'dict': dict_on_deserialize_callable }
  Defaults to None <python:None>.
- type on_deserialize
  callable, or dict <python:dict> with formats as keys and callables as values
- param on_serialize
  A function that will be called when attempting to serialize a value from the column. If None <python:None>, the data type's default on_serialize function will be called instead.
  Tip: if you need to execute different on_serialize functions for different formats, you can supply a dict <python:dict>:
  on_serialize = { 'csv': csv_on_serialize_callable, 'json': json_on_serialize_callable, 'yaml': yaml_on_serialize_callable, 'dict': dict_on_serialize_callable }
  Defaults to None <python:None>.
- type on_serialize
  callable, or dict <python:dict> with formats as keys and callables as values
When using Declarative Configuration, the exact same arguments can be applied when defining a relationship using the relationship() <sqlathanor.schema.relationship> function. Let's look at a somewhat more complicated example:
from sqlathanor import declarative_base, Column, relationship
from sqlalchemy import Integer, String

BaseModel = declarative_base()

class User(BaseModel):
    __tablename__ = 'users'

    id = Column('id',
                Integer,
                primary_key = True,
                supports_csv = True,
                csv_sequence = 1,
                supports_json = True,
                supports_yaml = True,
                supports_dict = True,
                on_serialize = None,
                on_deserialize = None)

    addresses = relationship('Address',
                             backref = 'user',
                             supports_json = True,
                             supports_yaml = (True, True),
                             supports_dict = (True, False),
                             on_serialize = None,
                             on_deserialize = None)
This example is (obviously) very similar to the previous one, but now we have added a relationship defined using the SQLAthanor relationship() <sqlathanor.schema.relationship> function. This operates exactly like the built-in sqlalchemy.relationship() <sqlalchemy:sqlalchemy.relationship> function, but accepts the same set of declarative SQLAthanor configuration arguments.
So in this example, we define a relationship to a different model class called Address, and assign that relationship to the model attribute User.addresses. Given the configuration above, the User model will:
- support serializing the id attribute to CSV, JSON, YAML, and dict <python:dict>
- support de-serializing the id attribute from CSV, JSON, YAML, and dict <python:dict>
- support serializing related addresses to JSON and YAML, but will not include the addresses attribute when serializing to CSV or dict <python:dict>
- support de-serializing related addresses from JSON, YAML, and dict <python:dict>, but not from CSV.
The Meta Configuration approach is a bit more robust than the Declarative <declarative_configuration> approach, because it supports more model attribute types, including hybrid properties <hybrid property>, association proxies <association proxy>, and Python @property instance attributes <instance attribute>.
The Meta Configuration approach relies on a special model attribute that you define for your model class: __serialization__. This attribute should contain a list of AttributeConfiguration <sqlathanor.attributes.AttributeConfiguration> objects, which indicate the explicit serialization/de-serialization configuration for your model attributes <model attribute>.
Here's how you would configure an example User model using the Meta Configuration approach:
from sqlathanor import declarative_base, Column, relationship, AttributeConfiguration
from sqlalchemy import Integer, String

BaseModel = declarative_base()

class User(BaseModel):
    __tablename__ = 'users'

    __serialization__ = [AttributeConfiguration(name = 'id',
                                                supports_csv = True,
                                                csv_sequence = 1,
                                                supports_json = True,
                                                supports_yaml = True,
                                                supports_dict = True,
                                                on_serialize = None,
                                                on_deserialize = None),
                         AttributeConfiguration(name = 'addresses',
                                                supports_json = True,
                                                supports_yaml = (True, True),
                                                supports_dict = (True, False),
                                                on_serialize = None,
                                                on_deserialize = None)]

    id = Column('id',
                Integer,
                primary_key = True)

    addresses = relationship('Address',
                             backref = 'user')
The __serialization__ attribute contains an explicit configuration for both the id and addresses attributes. Each AttributeConfiguration <sqlathanor.attributes.AttributeConfiguration> object supports the same configuration arguments as the declarative <declarative_configuration> approach, with one addition: it needs a name argument that explicitly indicates the name of the model attribute being configured.
Note
The __serialization__ attribute accepts both AttributeConfiguration <sqlathanor.attributes.AttributeConfiguration> instances and dict <python:dict> representations of those instances. If you supply dict <python:dict> configurations, SQLAthanor will automatically convert them to AttributeConfiguration <sqlathanor.attributes.AttributeConfiguration> instances.
The __serialization__ configuration below is identical to the one above:
__serialization__ = [
{
'name': 'id',
'supports_csv': True,
'csv_sequence': 1,
'supports_json': True,
'supports_yaml': True,
'supports_dict': True,
'on_serialize': None,
'on_deserialize': None
},
{
'name': 'addresses',
'supports_json': True,
'supports_yaml': (True, True),
'supports_dict': (True, False),
'on_serialize': None,
'on_deserialize': None
}
]
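The dict-to-object conversion described in the note above can be sketched in plain Python. The AttributeConfigurationSketch class below is a hypothetical stand-in written for this guide; SQLAthanor's real AttributeConfiguration class is considerably richer:

```python
# A rough sketch of promoting dict configurations to configuration objects.
class AttributeConfigurationSketch:
    def __init__(self, name, **kwargs):
        self.name = name
        self.supports_json = kwargs.get('supports_json', False)
        self.supports_dict = kwargs.get('supports_dict', False)

    @classmethod
    def from_dict(cls, config):
        # Each dict's keys map directly onto the constructor's arguments.
        return cls(**config)

configs = [
    {'name': 'id', 'supports_json': True, 'supports_dict': True},
    {'name': 'addresses', 'supports_json': True, 'supports_dict': (True, False)},
]
normalized = [AttributeConfigurationSketch.from_dict(item) for item in configs]
print([item.name for item in normalized])  # ['id', 'addresses']
```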
- AttributeConfiguration <sqlathanor.attributes.AttributeConfiguration>
- SQLAthanor Configuration Arguments <configuration_arguments>
- Declarative Configuration <declarative_configuration>
Unlike the declarative <declarative_configuration> approach, you can use the __serialization__ attribute to configure serialization and de-serialization for more complex types of model attributes <model attribute>, including hybrid properties <hybrid property>, association proxies <association proxy>, and Python @property <python:property> attributes. Using the meta configuration approach, you configure these more complex attributes in exactly the same way:
from sqlathanor import declarative_base, Column, relationship, AttributeConfiguration
from sqlalchemy import Integer, String
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.ext.associationproxy import association_proxy

BaseModel = declarative_base()

class User(BaseModel):
    __tablename__ = 'users'

    __serialization__ = [AttributeConfiguration(name = 'id',
                                                supports_csv = True,
                                                csv_sequence = 1,
                                                supports_json = True,
                                                supports_yaml = True,
                                                supports_dict = True,
                                                on_serialize = None,
                                                on_deserialize = None),
                         AttributeConfiguration(name = 'addresses',
                                                supports_json = True,
                                                supports_yaml = (True, True),
                                                supports_dict = (True, False),
                                                on_serialize = None,
                                                on_deserialize = None),
                         AttributeConfiguration(name = 'hybrid',
                                                supports_csv = True,
                                                csv_sequence = 2,
                                                supports_json = True,
                                                supports_yaml = True,
                                                supports_dict = True,
                                                on_serialize = None,
                                                on_deserialize = None),
                         AttributeConfiguration(name = 'keywords',
                                                supports_csv = False,
                                                supports_json = True,
                                                supports_yaml = True,
                                                supports_dict = True,
                                                on_serialize = None,
                                                on_deserialize = None),
                         AttributeConfiguration(name = 'python_property',
                                                supports_csv = (False, True),
                                                csv_sequence = 3,
                                                supports_json = (False, True),
                                                supports_yaml = (False, True),
                                                supports_dict = (False, True),
                                                on_serialize = None,
                                                on_deserialize = None)]

    id = Column('id',
                Integer,
                primary_key = True)

    addresses = relationship('Address',
                             backref = 'user')

    _hybrid = 1

    @hybrid_property
    def hybrid(self):
        return self._hybrid

    @hybrid.setter
    def hybrid(self, value):
        self._hybrid = value

    @hybrid.expression
    def hybrid(cls):
        return False

    keywords = association_proxy('keywords', 'keyword')

    @property
    def python_property(self):
        return self._hybrid * 2
This more complicated pattern extends the earlier example with a hybrid property named hybrid, an association proxy named keywords, and an instance attribute (defined using @property <python:property>) named python_property.
The __serialization__ configuration shown ensures that:
- hybrid can be serialized to and de-serialized from CSV, JSON, YAML, and dict <python:dict>.
- keywords can be serialized to and de-serialized from JSON, YAML, and dict <python:dict>, but cannot be serialized to or de-serialized from CSV.
- python_property can be serialized to all formats, but cannot be de-serialized (which makes sense, since it is defined without a setter).
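The python_property behavior (serializable, but not de-serializable) mirrors how plain Python treats a @property defined without a setter, which can be shown without any SQLAlchemy involvement. The Example class below is purely illustrative:

```python
class Example:
    _value = 1

    @property
    def doubled(self):
        # Read-only: there is no corresponding @doubled.setter.
        return self._value * 2

obj = Example()
print(obj.doubled)  # 2: reading (and thus serializing) works fine

try:
    obj.doubled = 10  # writing (and thus de-serializing) cannot work
except AttributeError as error:
    print('cannot assign:', error)
```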
Warning
A configuration found in __serialization__ always takes precedence. This means that if you mix the declarative <declarative_configuration> and meta <meta_configuration> approaches, the configuration in __serialization__ will be applied.
Tip
For security reasons, if you don't explicitly configure serialization/de-serialization for a model attribute using the meta <meta_configuration> or declarative <declarative_configuration> approach, that attribute will by default not be serialized and will be ignored when de-serializing.
How should you choose between the meta configuration <meta_configuration> and declarative configuration <declarative_configuration> approaches?
Well, to some extent it's a question of personal preference. For example, I always use the meta <meta_configuration> approach because I believe it is cleaner, easier to maintain, and more extensible. But as you can probably tell if you look at my code, I tend to be a bit pedantic about such things. There are plenty of times when the declarative <declarative_configuration> approach will make sense and "feel" right.
Here's a handy flowchart to help you figure out which you should use:
The Zen of Python <20> holds that: There should be one-- and preferably only one --obvious way to do it.
And that is a very wise principle. But there are times when it makes sense to diverge from that principle. I made a conscious choice to support two different configuration mechanisms for several practical reasons:
- SQLAlchemy has been around for a long time, and is very popular. There are many existing codebases that might benefit from integrating with SQLAthanor. The meta configuration <meta_configuration> approach lets users make minimal changes to their existing codebases at minimal risk.
- SQLAlchemy is often used in quick-and-dirty projects where the additional overhead of defining an explicit meta configuration <meta_configuration> in the __serialization__ class attribute would interrupt a programmer's "flow". The SQLAlchemy Declarative ORM <sqlalchemy:orm/extensions/declarative/index> already provides a great API for defining a model with minimal overhead, so piggybacking on that familiar API enhances the programmer experience.
- The internal mechanics of how SQLAlchemy implements hybrid properties <hybrid property> and association proxies <association proxy> are very complicated, and can be mutated at various points in a model class or model instance lifecycle. As a result, supporting them in the declarative configuration <declarative_configuration> approach would have been very complicated, with a lot of exposure to potential edge case errors. The meta configuration <meta_configuration> approach neatly avoids those risks.
- The only way for the declarative configuration <declarative_configuration> approach to support serialization and de-serialization of model attributes <model attribute> defined using Python's built-in @property <python:property> decorator would be to extend a feature of the standard library...which I consider an anti-pattern that should be done rarely (if ever).
So given those arguments, you might ask: why not just use the meta configuration <meta_configuration> approach and call it a day? It clearly has major advantages. And yes, you'd be right to ask the question. It does have major advantages, and it is the one I use almost exclusively in my own code.
But! I've spent close to twenty years in the world of data science and analytics, and I know what the coding practices in that community look like and how SQLAlchemy often gets used "in the wild". That experience tells me that the declarative <declarative_configuration> approach will just "feel more natural" to a large number of data scientists and developers who have a particular workflow, particular coding style, specific coding conventions, and who often don't need SQLAlchemy's more complicated features like hybrid properties <hybrid property> or association proxies <association proxy>.
And since SQLAthanor can provide benefits to both "application developers" and "data scientists", I've tried to design an interface that will feel "natural" to both communities.
When serializing <serialization> and de-serializing <deserialization> objects, it is often necessary to either convert data to a more-compatible format, or to validate inbound data to ensure it meets your expectations. SQLAthanor supports this using serialization pre-processing and de-serialization post-processing, configured via the on_serialize and on_deserialize configuration arguments <configuration_arguments>.
The on_serialize configuration argument <configuration_arguments> allows you to assign a serialization function to a particular model attribute.
If assigned, the serialization function will be called before the model attribute's value gets serialized into its target format. This is particularly useful when you need to convert a value from its native (in Python) type/format into a type/format that is supported by the serialized data type.
A typical example might be converting a None <python:None> value in Python to an empty string '' that can be included in a CSV <Comma-Separated Value (CSV)> record, or ensuring a decimal value is appropriately rounded.
The on_serialize argument <configuration_arguments> expects to receive either a callable (a Python function), or a dict <python:dict> whose keys correspond to SQLAthanor's supported formats and whose values are the callables to use for each format. Thus:
...
on_serialize = my_serialization_function,
...
will call the my_serialization_function()
whenever serializing the value, but
...
on_serialize = {
'csv': my_csv_serialization_function,
'json': my_json_serialization_function,
'yaml': my_yaml_serialization_function,
'dict': my_dict_serialization_function
},
...
will call a different function when serializing the value to each of the four formats given above.
Tip
If on_serialize is None <python:None>, or if its value for a particular format is None <python:None>, then SQLAthanor will default to a Default Serialization Function <default_serialization_functions> based on the data type of the model attribute.
If defining your own custom serializer function, please bear in mind that a valid serializer function will:
- accept one positional argument, which is the value of the
model attribute
to be serialized, and
- return one value, which is the value that will be included in the serialized output.
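A serializer matching this contract might look like the following sketch (the function name and behavior are purely illustrative, not part of SQLAthanor):

```python
# A hypothetical serializer function. It follows the contract above: it
# accepts one positional argument (the model attribute's value) and returns
# one value (what gets included in the serialized output). Here, None becomes
# an empty string and floats are rounded to two decimal places -- a typical
# preparation step for CSV output.
def my_csv_serialization_function(value):
    if value is None:
        return ''
    if isinstance(value, float):
        return round(value, 2)
    return value
```

This function could then be passed directly as on_serialize = my_csv_serialization_function, or as the value of the 'csv' key in an on_serialize dict.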
Default Serialization Functions <default_serialization_functions>
SQLAthanor Configuration Arguments <configuration_arguments>
sqlathanor.declarative.BaseModel
The on_deserialize
configuration argument <configuration_arguments>
allows you to assign a de-serialization function
to a particular model attribute
.
If assigned, the de-serialization function will be called after your serialized object is parsed, but before the value is assigned to your Python model attribute
. This is particularly useful when you need to:
- convert a value from its serialized format (e.g. a string) into the type supported by your
model class
(e.g. an integer),
- validate that a serialized value is "correct" (matches your expectations), and
- do something to the value (e.g. hash and salt it) before persisting it to the database.
A typical example of this might be:
- converting an empty string
''
in a CSV <Comma-Separated Value (CSV)>
record into None <python:None>
- validating that the serialized value is a proper telephone number
- hashing an inbound password.
The on_deserialize
argument <configuration_arguments>
expects to receive either a callable (a Python function), or a dict <python:dict>
where keys correspond to SQLAthanor's supported formats and values are the callables to use for that format. Thus:
...
on_deserialize = my_deserialization_function,
...
will call the my_deserialization_function()
whenever de-serializing the value, but
...
on_deserialize = {
'csv': my_csv_deserialization_function,
'json': my_json_deserialization_function,
'yaml': my_yaml_deserialization_function,
'dict': my_dict_deserialization_function
},
...
will call a different function when de-serializing the value to each of the four formats given above.
Tip
If on_deserialize
is None <python:None>
, or if its value for a particular format is None <python:None>
, then SQLAthanor will default to a Default De-serialization Function <default_deserialization_functions>
based on the data type of the model attribute
being de-serialized.
If defining your own custom deserializer function, please bear in mind that a valid deserializer function will:
- accept one positional argument, which is the value for the
model attribute
as found in the serialized input, and
- return one value, which is the value that will be assigned to the
model attribute
(and thus probably persisted to the underlying database).
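A de-serializer matching this contract might look like the following sketch (the function name and behavior are illustrative only):

```python
# A hypothetical de-serializer function. It follows the contract above: one
# positional argument (the value found in the serialized input), one return
# value (what gets assigned to the model attribute). An empty CSV field
# becomes None; anything else must parse as an integer, so malformed inbound
# data raises ValueError instead of being silently persisted.
def my_csv_deserialization_function(value):
    if value == '':
        return None
    return int(value)
```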
Default De-serialization Functions <default_deserialization_functions>
SQLAthanor Configuration Arguments <configuration_arguments>
sqlathanor.declarative.BaseModel
Once you've configured <configuration>
your model class
, you can now easily serialize it to the formats you have enabled. Your model instance
will have one serialization method for each of the formats, named to_<format>
where <format>
corresponds to csv
, json
, yaml
, and dict
:
to_csv() <sqlathanor.BaseModel.to_csv>
to_json() <sqlathanor.BaseModel.to_json>
to_yaml() <sqlathanor.BaseModel.to_yaml>
to_dict() <sqlathanor.BaseModel.to_dict>
SQLAthanor automatically supports nesting complex structures in JSON, YAML, and dict <python:dict>
. However, to prevent the risk of infinite recursion, those formats' serialization methods all accept a max_nesting
argument. By default, it is set to 0
which prevents model attributes <model attribute>
that resolve to another model class
from being included in a serialized output.
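To see why a nesting budget matters, consider two models that reference each other (e.g. a user and their address, where the address points back to its resident). The guard can be sketched in plain Python; this is an illustration of the idea, not SQLAthanor's actual implementation:

```python
# Simplified illustration (NOT SQLAthanor's real code) of how a max_nesting
# budget prevents infinite recursion between mutually-related models. Nested
# dicts stand in for related model classes.
def to_dict_sketch(obj, max_nesting=0, current_nesting=0):
    result = {}
    for key, value in obj.items():
        if isinstance(value, dict):  # stands in for a related model
            if current_nesting >= max_nesting:
                continue  # nesting budget exhausted: drop the related model
            result[key] = to_dict_sketch(value, max_nesting, current_nesting + 1)
        else:
            result[key] = value
    return result

user = {'id': 1, 'address': {'city': 'Oslo', 'resident': {'id': 1}}}
to_dict_sketch(user)                 # -> {'id': 1}
to_dict_sketch(user, max_nesting=1)  # -> {'id': 1, 'address': {'city': 'Oslo'}}
```

With the default max_nesting of 0, related models are dropped immediately; raising the budget admits one more level of related objects per increment.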
Unlike other supported formats, CSV <Comma-Separated Value (CSV)>
works best with "flat" structures where each output column contains one and only one simple value, and so to_csv()
does not include any max_nesting
or current_nesting
arguments.
Tip
As a general rule of thumb, we recommend that you avoid enabling CSV serialization on relationships <relationship>
unless using a custom serialization function
to structure nested data.
sqlathanor.BaseModel.to_csv
sqlathanor.BaseModel.to_json
sqlathanor.BaseModel.to_yaml
sqlathanor.BaseModel.to_dict
Once you've configured <configuration>
your model class
, you can now easily de-serialize <de-serialization>
it from the formats you have enabled.
However, unlike serializing <serialization>
your data, there are actually two types of de-serialization method to choose from:
- The
new_from_<format>
methods operate on your model class
directly and create a new model instance
whose properties are set based on the data you are de-serializing.
- The
update_from_<format>
methods operate on a model instance
, and update that instance's properties based on the data you are de-serializing.
Creating New:
new_from_csv() <sqlathanor.BaseModel.new_from_csv>
new_from_json() <sqlathanor.BaseModel.new_from_json>
new_from_yaml() <sqlathanor.BaseModel.new_from_yaml>
new_from_dict() <sqlathanor.BaseModel.new_from_dict>
Updating:
update_from_csv() <sqlathanor.BaseModel.update_from_csv>
update_from_json() <sqlathanor.BaseModel.update_from_json>
update_from_yaml() <sqlathanor.BaseModel.update_from_yaml>
update_from_dict() <sqlathanor.BaseModel.update_from_dict>
sqlathanor.BaseModel.new_from_csv
sqlathanor.BaseModel.new_from_json
sqlathanor.BaseModel.new_from_yaml
sqlathanor.BaseModel.new_from_dict
sqlathanor.BaseModel.update_from_csv
sqlathanor.BaseModel.update_from_json
sqlathanor.BaseModel.update_from_yaml
sqlathanor.BaseModel.update_from_dict
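The difference between the two method families can be sketched in plain Python (illustrative only; SQLAthanor's real methods also apply your serialization configuration and any on_deserialize functions):

```python
import json

# Illustrative sketch of the two de-serialization patterns (NOT SQLAthanor's
# actual implementation).
class UserSketch:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    @classmethod
    def new_from_json(cls, serialized):
        # Operates on the model class: builds a brand-new instance.
        return cls(**json.loads(serialized))

    def update_from_json(self, serialized):
        # Operates on an existing instance: overwrites matching properties,
        # leaving everything else untouched.
        self.__dict__.update(json.loads(serialized))

user = UserSketch.new_from_json('{"id": 1, "email": "old@example.com"}')
user.update_from_json('{"email": "new@example.com"}')
```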
SQLAlchemy supports the use of reflection with the SQLAlchemy Declarative ORM <sqlalchemy:orm/extensions/declarative/index>
.
This is a process where SQLAlchemy automatically constructs a Declarative <sqlalchemy:orm/extensions/declarative/index>
model class
based on what it reads from the table definition stored in your SQL database or a corresponding Table <sqlalchemy:sqlalchemy.schema.Table>
instance already defined and registered with a MetaData <sqlalchemy:sqlalchemy.schema.MetaData>
object.
- SQLAlchemy:
Reflecting Database Objects <sqlalchemy:core/reflection>
- SQLAlchemy: Using Reflection with Declarative
Using Automap with SQLAthanor <using_automap>
SQLAthanor is also compatible with this pattern. In fact, it works just as you might expect. At a minimum:
from sqlathanor import declarative_base, Column, relationship, AttributeConfiguration
from sqlalchemy import create_engine, Integer, String, Table
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.ext.associationproxy import association_proxy
engine = create_engine('... ENGINE CONFIGURATION GOES HERE ...')
# NOTE: Because reflection relies on a specific SQLAlchemy Engine existing, presumably
# you would know how to configure / instantiate your database engine using SQLAlchemy.
# This is just here for the sake of completeness.
BaseModel = declarative_base()
class ReflectedUser(BaseModel):
__table__ = Table('users',
BaseModel.metadata,
autoload = True,
autoload_with = engine)
will read the structure of your users
table, and populate your ReflectedUser
model class with model attributes <model attribute>
that correspond to the table's columns as defined in the underlying SQL table.
Caution
By design, SQLAlchemy's reflection ONLY reflects Column
definitions. It does NOT reflect relationships <relationship>
that you may otherwise model using SQLAlchemy.
Because the ReflectedUser
class inherits from the SQLAthanor base model, it gains the __serialization__
attribute and the to_csv() <sqlathanor.BaseModel.to_csv>
, to_json() <sqlathanor.BaseModel.to_json>
, to_yaml() <sqlathanor.BaseModel.to_yaml>
, and to_dict() <sqlathanor.BaseModel.to_dict>
methods.
When working with a reflected model class, you can configure serialization/deserialization using either the declarative <declarative_configuration>
or meta <meta_configuration>
approach as you normally would.
Warning
In the example above, if the database table named users
already has a Table <sqlalchemy:sqlalchemy.schema.Table>
associated with it, ReflectedUser
will inherit the Column
definitions from the "original" Table <sqlalchemy:sqlalchemy.schema.Table>
object.
If those column definitions are defined using sqlathanor.schema.Column
with declarative <declarative_configuration>
, their serialization/deserialization will also be reflected (inherited).
However, the ReflectedUser
model class will NOT inherit any serialization/deserialization configuration defined using the meta <meta_configuration>
approach.
Just as with standard SQLAlchemy reflection, you can override your Column <sqlathanor.schema.Column>
definitions in your reflecting class (ReflectedUser
), or add additional relationship <sqlathanor.schema.relationship>
model attributes <model attribute>
, hybrid properties <hybrid property>
, or association proxies <association proxy>
to the reflecting class.
New in version 0.2.0
Caution
Automap <sqlalchemy:orm/extensions/automap>
was introduced in SQLAlchemy v.0.9.1. If you are using SQLAthanor with SQLAlchemy v.0.9.0, then attempting to use automap_base() <sqlathanor.automap.automap_base>
will raise a SQLAlchemySupportError <sqlathanor.errors.SQLAlchemySupportError>
.
The Automap Extension <sqlalchemy:orm/extensions/automap>
is an incredibly useful tool for modeling existing databases with minimal effort. It reads your existing database's metadata and automatically constructs SQLAlchemy Declarative ORM <sqlalchemy:orm/extensions/declarative/index>
model classes <model class>
populated with your tables' columns.
Neat, right? Saves a ton of effort.
Using SQLAthanor you can ensure that your automapped (automatically generated) models support serialization and de-serialization.
First, you need to create your automapped classes. This works just like in Automap <sqlalchemy:orm/extensions/automap>
, only you import automap_base() <sqlathanor.automap.automap_base>
from SQLAthanor instead as shown below:
from sqlathanor.automap import automap_base
from sqlalchemy import create_engine
# Create your Automap Base
Base = automap_base()
engine = create_engine('... DATABASE CONNECTION GOES HERE ...')
# Prepare your automap base. This reads your database and creates your models.
Base.prepare(engine, reflect = True)
# And here you can create a "User" model class and an "Address" model class.
User = Base.classes.users
Address = Base.classes.addresses
In the example above, we create User
and Address
model classes <model class>
which will be populated with the columns and relationships <relationship>
from the users
and addresses
tables in the database.
Both User
and Address
will have all of the standard SQLAthanor methods and functionality. BUT! They won't have any serialization/de-serialization configured.
Before you start working with your models, you can configure their serialization/de-serialization using either a declarative <declarative_configuration>
approach using .set_attribute_serialization_config() <sqlathanor.declarative.BaseModel.set_attribute_serialization_config>
or a meta approach <meta_configuration>
by setting the __serialization__
attribute directly:
Declarative Approach
User.set_attribute_serialization_config('email_address',
supports_csv = True,
supports_json = True,
supports_yaml = True,
supports_dict = True)
User.set_attribute_serialization_config('password',
supports_csv = (True, False),
supports_json = (True, False),
supports_yaml = (True, False),
supports_dict = (True, False),
on_deserialize = my_encryption_function)
Meta Approach
User.__serialization__ = [
{
'name': 'email_address',
'supports_csv': True,
'supports_json': True,
'supports_yaml': True,
'supports_dict': True
},
{
'name': 'password',
'supports_csv': (True, False),
'supports_json': (True, False),
'supports_yaml': (True, False),
'supports_dict': (True, False),
'on_deserialize': my_encryption_function
}
]
Both snippets of code above tell the User
model to include users.email_address
in serialized output and to expect it in de-serialized input. They also tell the User
model to never serialize the users.password
column, but to expect it in inbound data to de-serialize.
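The my_encryption_function placeholder used above could be any callable following the de-serializer contract. One illustrative stand-in (the name and algorithm are assumptions, not part of SQLAthanor):

```python
import hashlib

# An illustrative stand-in for the my_encryption_function placeholder: it
# hashes an inbound plain-text password before the value is assigned to the
# model attribute (and thus before it is persisted). A real application
# should use a salted, deliberately slow KDF such as bcrypt or argon2
# rather than a bare SHA-256 digest.
def my_encryption_function(value):
    return hashlib.sha256(value.encode('utf-8')).hexdigest()
```

Because the password column is configured as de-serializable but not serializable, this hash is applied to inbound data only and the hashed value never appears in serialized output.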
- SQLAlchemy:
Automap Extension <sqlalchemy:orm/extensions/automap>
Using Declarative Reflection with SQLAthanor <using_reflection>