[#3020] Docs: Code architecture tidy up
Add a couple of tips to the CKAN Code Architecture docs and tidy it up a
bit, including moving the Creating a new migration script section into
its own page.
Sean Hammond committed Nov 17, 2012
1 parent bf8f3ce commit 520ccc6
Showing 2 changed files with 156 additions and 132 deletions.
186 changes: 54 additions & 132 deletions doc/architecture.rst
@@ -6,9 +6,6 @@ This section tries to give some guidelines for writing code that is consistent
with the intended, overall design and architecture of CKAN.


``ckan.model``
--------------

Encapsulate SQLAlchemy in ``ckan.model``
````````````````````````````````````````

@@ -33,107 +30,66 @@ Database Migrations
When changes are made to the model classes in ``ckan.model`` that alter CKAN's
database schema, a migration script has to be added to migrate old CKAN
databases to the new database schema when they upgrade their copies of CKAN.
These migration scripts are kept in ``ckan.migration.versions``.

When you upgrade a CKAN instance, as part of the upgrade process you run any
necessary migration scripts with the ``paster db upgrade`` command::

paster --plugin=ckan db upgrade --config={.ini file}

Creating a new migration script
```````````````````````````````
A migration script should be checked into CKAN at the same time as the model
changes it is related to. Before pushing the changes, ensure the tests pass
when running against the migrated model, which requires the
``--ckan-migration`` setting.

To create a new migration script, create a Python file in
``ckan/migration/versions/`` and name it with a prefix numbered one higher than
the previous one and some words describing the change.

You need to use the special engine provided by SQLAlchemy Migrate. Here is
the standard header for your migrate script: ::

    from sqlalchemy import *
    from migrate import *

The migration operations go in the upgrade function: ::

    def upgrade(migrate_engine):
        metadata = MetaData()
        metadata.bind = migrate_engine

Follow the process below when writing a migration. It makes the work easier
and helps to catch any mistakes that have been made:
See :doc:`migration`.

1. Get a dump of the database schema before you add your new migrate scripts. ::
Always go through the Action Functions
``````````````````````````````````````

paster --plugin=ckan db clean --config={.ini file}
paster --plugin=ckan db upgrade --config={.ini file}
pg_dump -h host -s -f old.sql dbname
Whenever some code, for example in ``ckan.lib`` or ``ckan.controllers``, wants
to get, create, update or delete an object from CKAN's model it should do so by
calling a function from the ``ckan.logic.action`` package, and *not* by
accessing ``ckan.model`` directly.

2. Get a dump of the database as you have specified it in the model. ::

paster --plugin=ckan db clean --config={.ini file}
Action Functions are Exposed in the API
```````````````````````````````````````

# This creates the database as defined in the model
paster --plugin=ckan db create-from-model --config={.ini file}
pg_dump -h host -s -f new.sql dbname
The functions in ``ckan.logic.action`` are exposed to the world as the
:doc:`apiv3`. The API URL for an action function is automatically generated
from the function name, for example
``ckan.logic.action.create.package_create()`` is exposed at
``/api/action/package_create``. See `Steve Yegge's Google platforms rant
<https://plus.google.com/112678702228711889851/posts/eVeouesvaVX>`_ for some
interesting discussion about APIs.

3. Get apgdiff (install it with apt-get). It produces the SQL that it thinks
you need to run on the database in order to bring it to the updated schema. ::

apgdiff old.sql new.sql > upgrade.diff

(Or, if you don't want to install Java, use http://apgdiff.startnet.biz/diff_online.php)

4. The resulting ``upgrade.diff`` file will contain all the changes needed, in SQL.
Delete the drop index lines, as they are not created in the model.

5. Put the resulting SQL in your migrate script, e.g. ::

migrate_engine.execute('''update table .........; update table ....''')
**All** publicly visible functions in the
``ckan.logic.action.{create,delete,get,update}`` namespaces will be exposed
through the :doc:`apiv3`. **This includes functions imported** by those
modules, **as well as any helper functions** defined within those modules. To
prevent inadvertent exposure of non-action functions through the action api,
care should be taken to:

6. Dump the schema again, then diff again, to check that the only things left are drop index statements.
1. Import modules correctly (see `Imports`_). For example: ::

7. Run nosetests with the ``--ckan-migration`` flag.
import ckan.lib.search as search

It's that simple. Well almost.
search.query_for(...)

* If you are renaming any tables or fields, add the renaming to your new
migrate script first and use this as the base for your diff (i.e. add a migrate
script with these renamings before step 1). This way the resulting SQL won't
try to drop and recreate the field/table!
2. Hide any locally defined helper functions: ::

* The generated SQL sometimes drops the foreign key constraints in the wrong
order, causing an error, so you may need to rearrange the order in the
resulting upgrade.diff.
    def _a_useful_helper_function(x, y, z):
        '''This function is not exposed because it is marked as private'''
        return x+y+z

* If you need to do any data transfer in the migrations then do it between the
dropping of the constraints and adding of new ones.
3. Bring imported convenience functions into the module namespace as private
members: ::

* You may need to add some tests if you are doing data migrations.
_get_or_bust = logic.get_or_bust
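
As a rough illustration of why the leading underscore matters, the sketch below scans a module's namespace and keeps only the public callables. The helper ``public_callables()`` and the module ``fake_action_module`` are hypothetical names invented for this example, and CKAN's actual discovery logic may differ in its details.

```python
import types


def public_callables(module):
    """Return the sorted names of callables whose names do not start with '_'."""
    return sorted(
        name for name, obj in vars(module).items()
        if callable(obj) and not name.startswith('_')
    )


def package_show(context, data_dict):
    """A stand-in for a real action function."""
    return {}


def get_or_bust(data_dict, key):
    """A stand-in for an imported convenience function."""
    return data_dict[key]


def _a_useful_helper_function(x, y, z):
    """A private helper, hidden by its leading underscore."""
    return x + y + z


# Build a throwaway module object to play the role of an action module.
fake = types.ModuleType('fake_action_module')
fake.package_show = package_show
fake._a_useful_helper_function = _a_useful_helper_function
fake.get_or_bust = get_or_bust    # public alias: would be picked up
fake._get_or_bust = get_or_bust   # private alias: hidden

exposed = public_callables(fake)
```

In this sketch ``exposed`` contains ``get_or_bust`` alongside ``package_show``, which is the kind of accidental exposure that the private alias ``_get_or_bust`` avoids.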

An example of a script doing it this way is ``034_resource_group_table.py``.
This script copies the definitions of the original tables in order to rename
the tables/fields.

In order to do some basic data migration testing, extra assertions should be
added to the migration script. Examples of this can also be found in
``034_resource_group_table.py``.
Use ``get_action()``
````````````````````

This statement is run at the top of the migration script to get the count of
rows: ::
Don't call ``logic.action`` functions directly; instead use ``get_action()``.
This allows plugins to override action functions using the ``IActions`` plugin
interface. For example::

package_count = migrate_engine.execute('''select count(*) from package''').first()[0]
ckan.logic.get_action('group_activity_list_html')(...)

And the following is run after to make sure that row count is the same: ::
Instead of ::

resource_group_after = migrate_engine.execute('''select count(*) from resource_group''').first()[0]
assert resource_group_after == package_count
ckan.logic.action.get.group_activity_list_html(...)
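
The difference between the two calls can be sketched with a toy registry. Everything here (``register_default()``, ``register_override()`` and the two dictionaries) is hypothetical and only mimics the idea of a name-based lookup; it is not CKAN's implementation of ``get_action()`` or of the ``IActions`` interface.

```python
# Hypothetical, simplified stand-in for an action registry.
_default_actions = {}
_plugin_overrides = {}


def register_default(func):
    """Record a core action function under its own name."""
    _default_actions[func.__name__] = func
    return func


def register_override(name, func):
    """Record a plugin-supplied replacement for the named action."""
    _plugin_overrides[name] = func


def get_action(name):
    """Return the plugin override if one exists, else the core function."""
    if name in _plugin_overrides:
        return _plugin_overrides[name]
    return _default_actions[name]


@register_default
def package_show(context, data_dict):
    return {'id': data_dict['id'], 'source': 'core'}


def custom_package_show(context, data_dict):
    return {'id': data_dict['id'], 'source': 'plugin'}


register_override('package_show', custom_package_show)

# Looking the function up by name picks up the plugin's override...
result_via_lookup = get_action('package_show')({}, {'id': 'abc'})
# ...whereas calling the core function directly bypasses it.
result_direct = package_show({}, {'id': 'abc'})
```

Under these assumptions only the lookup by name sees the plugin's version, which is why calling action functions directly would silently ignore overrides.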

``ckan.logic``
--------------

Auth Functions and ``check_access()``
`````````````````````````````````````
@@ -172,57 +128,23 @@ which will raise ``ValidationError`` if ``"id"`` is not in ``data_dict``. The
response and an error message explaining the problem.


Action Functions are Automatically Exposed in the API
`````````````````````````````````````````````````````

**All** publicly visible functions in the
``ckan.logic.action.{create,delete,get,update}`` namespaces will be exposed
through the :doc:`apiv3`. **This includes functions imported** by those
modules, **as well as any helper functions** defined within those modules. To
prevent inadvertent exposure of non-action functions through the action api,
care should be taken to:

1. Import modules correctly (see `Imports`_). For example: ::

import ckan.lib.search as search

search.query_for(...)

2. Hide any locally defined helper functions: ::

    def _a_useful_helper_function(x, y, z):
        '''This function is not exposed because it is marked as private'''
        return x+y+z

3. Bring imported convenience functions into the module namespace as private
members: ::

_get_or_bust = logic.get_or_bust

Action Function Docstrings
``````````````````````````

See :ref:`Action API Docstrings`.

``get_action()``
````````````````

Don't call ``logic.action`` functions directly; instead use ``get_action()``.
This allows plugins to override action functions using the ``IActions`` plugin
interface. For example::

ckan.logic.get_action('group_activity_list_html')(...)

Instead of ::

ckan.logic.action.get.group_activity_list_html(...)
Validation and ``ckan.logic.schema``
````````````````````````````````````

Logic action functions can use schemas defined in ``ckan.logic.schema`` to
validate the contents of the ``data_dict`` parameters that users pass to them.

``ckan.lib``
------------
An action function should first check for a custom schema provided in the
context, and failing that should retrieve its default schema directly, and
then call ``_validate()`` to validate and convert the data. For example, here
is the validation code from the ``user_create()`` action function::

Code in ``ckan.lib`` should not access ``ckan.model`` directly; it should go
through the action functions in ``ckan.logic.action`` instead.
    schema = context.get('schema') or ckan.logic.schema.default_user_schema()
    session = context['session']
    validated_data_dict, errors = _validate(data_dict, schema, context)
    if errors:
        session.rollback()
        raise ValidationError(errors)
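
To show the same flow end to end without a CKAN installation, here is a minimal, self-contained sketch. The ``FakeSession`` class, the ``not_empty()`` validator, and the simplified ``_validate()``, ``default_user_schema()`` and ``ValidationError`` below are all stand-ins written for this example; they only imitate the shape of the real CKAN helpers.

```python
class ValidationError(Exception):
    """Stand-in exception carrying a dict of per-field error messages."""

    def __init__(self, errors):
        super().__init__(errors)
        self.error_dict = errors


class FakeSession:
    """Stand-in for a database session that records rollbacks."""

    def __init__(self):
        self.rolled_back = False

    def rollback(self):
        self.rolled_back = True


def not_empty(value):
    """Hypothetical validator: reject missing or empty values."""
    if not value:
        raise ValueError('Missing value')
    return value


def default_user_schema():
    """Hypothetical default schema: field name -> list of validators."""
    return {'name': [not_empty], 'email': [not_empty]}


def _validate(data_dict, schema, context):
    """Run each field's validators; return (validated data, errors)."""
    validated, errors = {}, {}
    for key, validators in schema.items():
        value = data_dict.get(key)
        try:
            for validator in validators:
                value = validator(value)
            validated[key] = value
        except ValueError as e:
            errors[key] = [str(e)]
    return validated, errors


def user_create(context, data_dict):
    # Prefer a custom schema from the context, else fall back to the default.
    schema = context.get('schema') or default_user_schema()
    session = context['session']
    validated_data_dict, errors = _validate(data_dict, schema, context)
    if errors:
        session.rollback()
        raise ValidationError(errors)
    return validated_data_dict
```

A caller that wants looser validation could pass its own schema in the context, for example ``user_create({'session': FakeSession(), 'schema': {'name': [not_empty]}}, {'name': 'alice'})``.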


Controller & Template Helper Functions
102 changes: 102 additions & 0 deletions doc/migration.rst
@@ -0,0 +1,102 @@
Creating a new migration script
```````````````````````````````

When changes are made to the model classes in ``ckan.model`` that alter CKAN's
database schema, a migration script has to be added to migrate old CKAN
databases to the new database schema when they upgrade their copies of CKAN.
These migration scripts are kept in ``ckan.migration.versions``.

When you upgrade a CKAN instance, as part of the upgrade process you run any
necessary migration scripts with the ``paster db upgrade`` command::

    paster --plugin=ckan db upgrade --config={.ini file}

A migration script should be checked into CKAN at the same time as the model
changes it is related to. Before pushing the changes, ensure the tests pass
when running against the migrated model, which requires the
``--ckan-migration`` setting.

To create a new migration script, create a Python file in
``ckan/migration/versions/`` and name it with a prefix numbered one higher than
the previous one and some words describing the change.

You need to use the special engine provided by SQLAlchemy Migrate. Here is
the standard header for your migrate script: ::

    from sqlalchemy import *
    from migrate import *

The migration operations go in the upgrade function: ::

    def upgrade(migrate_engine):
        metadata = MetaData()
        metadata.bind = migrate_engine

Follow the process below when writing a migration. It makes the work easier
and helps to catch any mistakes that have been made:

1. Get a dump of the database schema before you add your new migrate scripts. ::

    paster --plugin=ckan db clean --config={.ini file}
    paster --plugin=ckan db upgrade --config={.ini file}
    pg_dump -h host -s -f old.sql dbname

2. Get a dump of the database as you have specified it in the model. ::

    paster --plugin=ckan db clean --config={.ini file}

    # This creates the database as defined in the model
    paster --plugin=ckan db create-from-model --config={.ini file}
    pg_dump -h host -s -f new.sql dbname

3. Get apgdiff (install it with apt-get). It produces the SQL that it thinks
   you need to run on the database in order to bring it to the updated schema. ::

    apgdiff old.sql new.sql > upgrade.diff

   (Or, if you don't want to install Java, use http://apgdiff.startnet.biz/diff_online.php)

4. The resulting ``upgrade.diff`` file will contain all the changes needed, in SQL.
   Delete the drop index lines, as they are not created in the model.

5. Put the resulting SQL in your migrate script, e.g. ::

    migrate_engine.execute('''update table .........; update table ....''')

6. Dump the schema again, then diff again, to check that the only things left
   are drop index statements.

7. Run nosetests with the ``--ckan-migration`` flag.

It's that simple. Well almost.

* If you are renaming any tables or fields, add the renaming to your new
  migrate script first and use this as the base for your diff (i.e. add a
  migrate script with these renamings before step 1). This way the resulting
  SQL won't try to drop and recreate the field/table!

* The generated SQL sometimes drops the foreign key constraints in the wrong
  order, causing an error, so you may need to rearrange the order in the
  resulting upgrade.diff.

* If you need to do any data transfer in the migrations, do it between the
  dropping of the constraints and the adding of the new ones.

* You may need to add some tests if you are doing data migrations.

An example of a script doing it this way is ``034_resource_group_table.py``.
This script copies the definitions of the original tables in order to rename
the tables/fields.

In order to do some basic data migration testing, extra assertions should be
added to the migration script. Examples of this can also be found in
``034_resource_group_table.py``.

This statement is run at the top of the migration script to get the count of
rows: ::

    package_count = migrate_engine.execute('''select count(*) from package''').first()[0]

And the following is run afterwards to make sure that the row count is the same: ::

    resource_group_after = migrate_engine.execute('''select count(*) from resource_group''').first()[0]
    assert resource_group_after == package_count
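
The count-before, count-after pattern can be tried out in isolation. The sketch below uses Python's built-in ``sqlite3`` module with an in-memory database instead of the SQLAlchemy Migrate engine, and the table layouts and the ``'rg-'`` id prefix are simplified assumptions made for the example, not the real CKAN schema.

```python
import sqlite3

# An in-memory database stands in for a real CKAN database.
conn = sqlite3.connect(':memory:')
conn.execute('create table package (id text primary key)')
conn.executemany('insert into package values (?)', [('a',), ('b',), ('c',)])


def upgrade(conn):
    """Create one resource_group row per package and check the row counts."""
    # Count the source rows before touching anything.
    package_count = conn.execute(
        'select count(*) from package').fetchone()[0]

    # The data migration itself (simplified, hypothetical columns).
    conn.execute(
        'create table resource_group (id text primary key, package_id text)')
    conn.execute(
        "insert into resource_group (id, package_id) "
        "select 'rg-' || id, id from package")

    # Count the migrated rows and make sure nothing was lost or duplicated.
    resource_group_after = conn.execute(
        'select count(*) from resource_group').fetchone()[0]
    assert resource_group_after == package_count
    return resource_group_after


migrated_rows = upgrade(conn)
```

If the insert statement skipped or duplicated rows, the assertion inside ``upgrade()`` would fail and stop the migration at that point.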
