Export bagit #27

Merged
merged 7 commits into from Jul 12, 2017

Conversation

remileduc
Member

Integrates the work from @JavierDelgadoFernandez in #10 into the sipstore (closes #10).
Refactors the code to fit the new philosophy: the sipstore does nothing by itself; it is managed by invenio-archivematica, which creates the exports of the SIPs. Thus, there is no need for a task or anything similar.

Also made the archiver more generic: there is a base class so we can create whatever export we want.
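
To make the base-class idea concrete, here is a minimal sketch, not the code from this PR. The names `BaseArchiver`, `SimpleBagItArchiver`, and the `files` argument are hypothetical; only the return shape (a dict with relative paths as keys and content as values) follows the docstring shown in the diff. The BagIt layout is deliberately simplified.

```python
import hashlib
from abc import ABC, abstractmethod


class BaseArchiver(ABC):
    """Hypothetical base class: each subclass decides the export layout."""

    def __init__(self, files):
        # Assumption for this sketch: `files` maps a file name to its bytes.
        self.files = files

    @abstractmethod
    def get_all_files(self):
        """Return a dict mapping relative export paths to content (bytes)."""


class SimpleBagItArchiver(BaseArchiver):
    """Simplified BagIt-like layout: payload under data/ plus a manifest."""

    def get_all_files(self):
        export = {
            "bagit.txt": (
                b"BagIt-Version: 0.97\n"
                b"Tag-File-Character-Encoding: UTF-8\n"
            )
        }
        manifest_lines = []
        for name, content in sorted(self.files.items()):
            path = "data/" + name
            export[path] = content
            checksum = hashlib.sha256(content).hexdigest()
            manifest_lines.append("{} {}".format(checksum, path))
        manifest = "\n".join(manifest_lines) + "\n"
        export["manifest-sha256.txt"] = manifest.encode("utf-8")
        return export
```

Another export format would then only need a new subclass that overrides `get_all_files()`.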

IMPORTANT NOTE: This PR should be merged AFTER #26, as it depends on it.
If you look at the changelog, you'll see the diff of both PRs. To see only the changes relative to #26, see https://github.com/remileduc/invenio-sipstore/pull/1/files

@remileduc remileduc force-pushed the export_bagit branch 4 times, most recently from c20a4c1 to 34dd3ee Compare July 4, 2017 10:01
@@ -71,24 +65,35 @@ class SIP(db.Model, Timestamp):
agent = db.Column(JSONType, default=lambda: dict(), nullable=False)
"""Agent information regarding given SIP."""

archivable = db.Column(
db.Boolean(name='ck_sipstore_archivable'),
Member

Constraint name in sqlite3:
CONSTRAINT ck_sipstore_sip_ck_sipstore_archivable CHECK (archivable IN (0, 1)),
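
For reference, a small sketch of what this constraint enforces, using only the standard-library `sqlite3` module. The table name `sipstore_sip` and the reduced column list are assumptions inferred from the constraint name; the doubled `ck_sipstore_` prefix presumably comes from a naming convention that prepends the table name to the explicit name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified table: only the columns needed for the illustration.
conn.execute(
    "CREATE TABLE sipstore_sip ("
    " id INTEGER PRIMARY KEY,"
    " archivable BOOLEAN NOT NULL,"
    " CONSTRAINT ck_sipstore_sip_ck_sipstore_archivable"
    " CHECK (archivable IN (0, 1)))"
)


def try_insert(value):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO sipstore_sip (archivable) VALUES (?)", (value,)
        )
        return True
    except sqlite3.IntegrityError:
        return False


# 0 and 1 satisfy the CHECK; any other value violates it.
accepted = [try_insert(v) for v in (0, 1, 2)]
```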

db.ForeignKey(SIP.id, name='fk_sipmetadata_sip_id'))
"""Id of SIP."""

format = db.Column(db.String(7), nullable=False)
Member

Should this be a longer string, given that a 7-character string would not save us anything w.r.t. the implementation in the DB?

@@ -155,6 +175,31 @@ def validate_key(self, filepath, filepath_):
"""Relation to the SIP along which given file was submitted."""


class SIPMetadata(db.Model, Timestamp):
Member

Is this object meant to store different metadata formats per SIP, or should we also use it to store metadata edits on the record? If so, how do we keep track of versions?

Member Author

I didn't change the content here, just added a way to be able to store multiple metadata. @lnielsen may have an answer?

:return: a dict with final relative path as keys and content as value.
:rtype: dict
"""
def get_extention(format):
Member

get_extention -> get_extension

@@ -71,24 +65,35 @@ class SIP(db.Model, Timestamp):
agent = db.Column(JSONType, default=lambda: dict(), nullable=False)
"""Agent information regarding given SIP."""

archivable = db.Column(
Member

Is the `archivable` flag needed? Should this instead be a state = {new, non_archivable, archived}?
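
A minimal sketch of what such a state could look like, as an illustration of the suggestion rather than code from this PR. The names `SIPState` and `should_archive` are hypothetical, and the rule that only new SIPs get archived is an assumption.

```python
from enum import Enum


class SIPState(Enum):
    """Hypothetical replacement for the boolean `archivable` flag."""

    NEW = "new"
    NON_ARCHIVABLE = "non_archivable"
    ARCHIVED = "archived"


def should_archive(state):
    """Assumed rule: only SIPs that are new still need to be archived."""
    return state is SIPState.NEW
```

Compared with a boolean, the extra `archived` value would let the sipstore distinguish SIPs that still have to be exported from those already handled.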

:param bool create_sip_files: If True the SIPFiles will be created.
:returns: RecordSIP object.
:rtype: :py:class:`invenio_sipstore.api.RecordSIP`
"""
files = record.files if create_sip_files else None
metadata = {'json': json.dumps(record.dumps())}
mtype = SIPMetadataType.get_from_schema(record['$schema'])
metadata = {mtype.name: json.dumps(record.dumps())}
Member

TODO: use the future "filename" field from the metadatatype table instead of mtype.name

Member

@slint slint Jul 11, 2017

Maybe some other combination of fields would be more intuitive (technically, filename is used as a slug/tag). Proposals, taking "Invenio JSON Record Metadata v1.0.0" - `invenio-record-json-v1.0.0` as an example:

  1. name - tag/slug/code
  2. title/description - name

"""ID of the SIPMetadataType object."""

name = db.Column(db.String(255), nullable=False, unique=True)
"""The name of type of metadata (i.e. 'zenodo-json-1.0.0')."""
Member

We should probably make it Zenodo-agnostic: ...(i.e. 'invenio-record-v1.0.0')

"""

schema = db.Column(db.String(1024), nullable=True, unique=True)
"""Path to a schema that describes the metadata (json or xml schema)."""
Member

we can probably say "URI to a schema..."

* Adds a celery task that generates a BagIt file with the SIP content.

Signed-off-by: Javier Delgado <javier.delgado.fernandez@cern.ch>
- added class `SIPMetadata` to be able to have multiple metadata
- added attribute `archivable` (boolean) to say if the content should
  be archived or not
- updated the model views to integrate the new fields
- added the model view for SIPMetadata with links to SIP
- add a signal when a SIP is created from the API
- 2 API classes: SIP and RecordSIP to manage the models
  based on Zenodo:
  https://github.com/zenodo/zenodo/blob/master/zenodo/modules/sipstore/api.py
- add a function to automatically find the current storage
  location of a SIPFile
- add a config variable to generate agent of the SIPs
- updated tests
- base class for archivers
- refactor of BagItArchiver
- Added SIPMetadataType for the type of metadata.
  This class describes the format of metadata, with an optional
  schema to validate it if it exists.
- refactor the code to integrate this change
* Removes unique constraint from SIPMetadataType.title and sets it on
  SIPMetadataType.name.

Signed-off-by: Alexander Ioannidis <a.ioannidis@cern.ch>
@slint
Member

slint commented Jul 12, 2017

@krzysztof Can you take a last look and merge?

@krzysztof krzysztof merged commit d79aacf into inveniosoftware:master Jul 12, 2017

4 participants