Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON decoding/encoding, dumpers/loaders, extensions and system fields #225

Merged
merged 20 commits into from
Sep 16, 2020

Conversation

lnielsen
Copy link
Member

@lnielsen lnielsen commented Sep 3, 2020

  • Adds support for JSON encoding/decoding to/from the database. This allows e.g. have records with complex data types such as datetime objects. JSONSchema validation happens on the JSON encoded version of the record.

  • Adds dumpers to support dumping and loading records from secondary copies (e.g. records stored in an Elasticsearch index).

  • Adds support record extensions as a more strict replacement of signals. Allows writing extensions (like the system fields), that integrate into the Records API.

  • Adds support for system fields, that are Python data descriptors on the Record which allows for managed access to the Record's dictionary.

  • Adds support for disabling signals. Some signals can never have been used as they don't make sense (e.g. for revert()).

Backward compatibility note

  • All changes are optional add-ons. The Record class behaves as previously. If you have subclassed Record, and in particular if you have overwritten the validate() method, you may need to make slight changes to your record.

A TL;DR example:

# Don't try to run it :-) 
# - Imports missing 
# - Some classes are imaginary (e.g. GeoLocationsExt)

class DateEncoder:
    def __init__(self, *keys):
        self.keys = keys

    def encode(self, data):
        for k in self.keys:
            if k in data:
                data[k] = data[k].isoformat()

    def decode(self, data):
        for k in self.keys:
            if k in data:
                s = data[k]
                year, month, day = int(s[0:4]), int(s[5:7]), int(s[8:10])
                data[k] = date(year, month, date)

class MyRecordMetadata(RecordMetadataBase):
     __table_name__ = 'myrecord_metadata'
     encoder = DateEncoder('pubdate')
     expires_at = expires_at = db.Column(db.DateTime(), nullable=False)

class MyRecord(Record, SystemFieldsMixin):
    schema = ConstantField('$schema', 'https://.../records/record-v1.0.0.json')
    expires_at = ModelField()
    metadata = DictField(clear_none=True)
    model_cls = MyRecordMetadata
    send_signals = False
    enable_jsonref = False
    dumper = ElasticsearchDumper(
        extensions=[
            GeoLocationExt('locations')
        ],
        models_fields={
            'expires_at': ('expires_at', datetime),
        }
    )

# JSON Encoding
record = MyRecord.create({'pubdate': date(2020, 9, 7)})

# Dumping/loading
dump = MyRecord({}).dumps(dumper=ElasticsearchDumper())
MyRecord.loads(dump, loader=ElasticsearchDumper())

# System fields:
record = MyRecord({})
record.schema == record['$schema']

r = MyRecord.create({}, metadata={'title': 'a', 'a':None}, expires_at=datetime(2020,9,7,0,0))
r.expires_at == r.model.expires_at
r == {'metadata': {'title': 'a'}}

# Cleaning a record:
record.clear_none(key='metadata')

* Changes e.g. "pytest" to "python -m pytest". This ensures that all
  commands are using the version installed inside the virtualenv and not
  accidentally a globally installed version.
* Clean up package dependencies to align with coordinator packages
  released in Invenio v3.3.

* Bump version of dependent packages jsonref, jsonresolver, jsonschema
  and jsonpatch to use more recently released versions. Some of the
  minimal versions of packages where dating back to 2013 (7 years ago).
* Adds usage of docker-services-cli to have more reproducible ci tests.
@lnielsen lnielsen self-assigned this Sep 3, 2020
@lnielsen lnielsen changed the title WIP: Dumpers/loaders, encoding and more... JSON decoding/encoding, dumpers/loaders, extensions and system fields Sep 7, 2020
@lnielsen lnielsen marked this pull request as ready for review September 7, 2020 14:59
* Adds support for encoding/decoding the JSON data store in the
  database. This allow e.g. providing an encoder that converts
  ISO-formatted timestamps to Python datetime objects.

* Adds new hybrid SQLAlchemy property "is_deleted" on the model
  to determine if deleted or not. This makes it more clear that
  a null value in the json column means that a record has been
  soft-deleted.
* Adds a new feature to dump and load a record. This will be used by
  to e.g. harmonize access to records loaded via e.g. the database,
  Elasticsearch or third-party systems.
* System fields provides managed access to the record's dictionary.
  By default this behavior is not enabled on the records.

* Adds extension capabilities of the record API, so that it's possible
  to make e.g. a system fields extension that can hook into the record
  API. In many ways, this is similar to the signals, but provides a
  more strict and extensible API.
* Adds a method "clear_none()" which helps removing None and empty
  list/dict values from the dictionary.

* Adds a method "dict_lookup()" to support dot key string notation for
  looking up a key in a dictionary.
* Adds feature that allow system fields to take arguments from the
  initialization/creation of records. For instance,
  Record.create({}, metadata={...}).

* System fields now saves the class attribute name on the class they
  are installed. This allow using the name for the above feature.

* Changes create() method to initialize the database model before
  running the validation. This makes sense because the validation has to
  run on the JSON and not on the python dictionary.

* Fixes arguments that are passed the JSONSchema validator, as only the
  format_checker and cls parameters can be passed. We do this, because
  we want to use the keyword arguments for initializing system fields.

* Adds possibility to specify a default validator as was also possible
  with the format_checker. It does not makes sense to pass a validator
  in the create/commit methods.

* Fix a couple of bugs and improve test coverage.
* Adds possibility to not add a key name and instead use the attribute
  name as default.
* Adds support for automatically determine which data type a given
  model field should be dumped as. This avoids having to specify
  the dump type ModelField system fields.
* Fixes a bug in the ConstantField that would cause an error when an
  deleted record was loaded.
* Adds a new method to the API to undelete a record after it's been
  soft-deleted. This enables that you can again call record.commit(),
  and create a new version of the record.
@lnielsen
Copy link
Member Author

Merging as discussed in the architects meeting. Releasing as Invenio 1.4.0a1

@lnielsen lnielsen merged commit 019f4d9 into inveniosoftware:master Sep 16, 2020


class Record(RecordBase):
"""Define API for metadata creation and manipulation."""

model_cls = RecordMetadata
send_signals = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also for backward compat? Otherwise I would personally try to abandon signals.

"""Called after a record is created."""

def pre_commit(self, record):
"""Called before a record is committed."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a post_commit? In ILS, we have a use case that we want to re-fetch book covers from the external provider after a record has been updated.
I guess any way that such use cases will be implemented in invenio-records-resources...


Uses the attribute name if the key is not defined.
"""
return self._key or self._attr_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can it be that both are None?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
InvenioRDM September Board
  
Awaiting triage
Development

Successfully merging this pull request may close these issues.

None yet

2 participants