Document the data model #20

RKrahl · 2020-02-03T09:22:13Z

Reformat and sanitize the data model documentation.

Changes for now:

Reformat description with tables.
Make the relationships explicit, including their cardinality.
Remove the Role class. It was not in the diagram on the Confluence page and I'm not sure what it was supposed to be.
Add the missing description of the Sample class.
Add several properties missing in the description.
Sanitize identifiers with a clear distinction between (internal, ephemeral) ids and (persistent) pids. See also the discussion of ids in Document API calls #19.
Various cleanup: make Affiliation.name mandatory; rename Person.name and Person.surname to the less ambiguous Person.givenName and Person.familyName and make them mandatory; add some comments.
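
To illustrate the id/pid distinction and the renamed Person properties, here is a minimal Python sketch. It is not part of the specification: the class layout, the Person-to-Affiliation cardinality, the assumption that Dataset carries a mandatory pid, and the helper `stable_reference` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Affiliation:
    name: str                 # mandatory after this change
    id: Optional[str] = None  # internal, ephemeral identifier


@dataclass
class Person:
    givenName: str            # formerly Person.name, now mandatory
    familyName: str           # formerly Person.surname, now mandatory
    id: Optional[str] = None  # internal, ephemeral identifier
    # Hypothetical cardinality for illustration: zero or more affiliations.
    affiliations: List[Affiliation] = field(default_factory=list)


@dataclass
class Dataset:
    pid: str                  # persistent identifier (assumed mandatory in this sketch)
    id: Optional[str] = None  # internal, ephemeral identifier


def stable_reference(obj) -> str:
    """Return an identifier that may be stored outside the catalogue.

    Only a pid is persistent; an id may change, so it is never returned here.
    """
    pid = getattr(obj, "pid", None)
    if pid is None:
        raise ValueError(f"{type(obj).__name__} has no persistent identifier")
    return pid
```

The point of the helper is that clients keeping references across API calls should only ever rely on pids.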

There are several other issues that need discussion:

Parameter values: some of our parameters have string values, not only numbers. How should we deal with that?
The relation from Dataset to Instrument is restricted to a single Instrument. We have more than one Instrument involved in a single measurement: at BESSY II, we need to distinguish between beam lines and experimental stations, as the latter may be moved between beam lines.
The relations from Dataset to Technique and Sample: the dataset is supposed to be "information about an experimental run". Does it make sense for a single experimental run to have several Samples and experimental Techniques?
If we follow the argument from the last bullet, we might drop the Technique class and make that a property of Dataset.
Why is there a relationship between Document and Parameter? What are document parameters supposed to be?
There are several unclear or dubious properties:
- what is Person.publication supposed to be and what is the value to have that property?
- what is Document.internal supposed to be and what is the value to have that property?
- what is Dataset.isPublic supposed to be and what is the value to have that property?
- what is Document.type?
- what is the value to have File.path? It only makes sense in the context of a given file system. But we do not have any file system at the level of the federated catalogue.
Parameter.units: "mandatory where applicable" doesn't really make sense from an API specification point of view. Either it must be present, in which case it is mandatory, or it is not mandatory.
We might want to add a keywords property to Document, either a list of strings or a string of comma separated words.

RKrahl · 2020-02-03T12:16:58Z

I need to add two items after internal discussion:

Yes, it does make sense to have several Techniques for a single Dataset. So I withdraw my question from above in that point.
We would also need sample parameters, i.e. a relationship between Sample and Parameter.
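
A minimal sketch of how these two points might look in the model, using Python dataclasses for illustration only. The `name` properties of Sample and Technique, the zero-or-more multiplicity of samples, and the helper `sample_parameters` are hypothetical; only the multiple techniques per Dataset and the Sample-to-Parameter relationship come from this comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Parameter:
    name: str
    value: Union[float, str]
    units: Optional[str] = None


@dataclass
class Technique:
    name: str  # hypothetical property


@dataclass
class Sample:
    name: str  # hypothetical property
    # Proposed relationship: a Sample has zero or more Parameters.
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    pid: str
    # Several techniques per dataset, as discussed; sample multiplicity is assumed.
    techniques: List[Technique] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)


def sample_parameters(ds: Dataset) -> List[Tuple[str, str, Union[float, str], Optional[str]]]:
    """Flatten the parameters of all samples of a dataset into tuples."""
    return [
        (s.name, p.name, p.value, p.units)
        for s in ds.samples
        for p in s.parameters
    ]
```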

garethcmurphy · 2020-02-05T05:44:15Z

Hi Rolf,

Thank you for your changes! I have the following responses to your bullet points:

Parameters with strings - we could add a Parameter.type, either string or numerical. The primary use case envisaged (until your comment) was numerical values.
Dataset to Instrument - this could be made one-to-many
Document parameters - an example could be a proposal (Document) that studies a particular wavelength; the parameter would then be relevant to all datasets contained in the proposal. It could also be a summary, i.e. the min/max or average value over the datasets.
Document.internal - boolean flag for internal-only proposals, e.g. calibration runs
Document.type - would be either proposal or paper, i.e. a full proposal published after embargo expiry, or a subset selected by the author of a journal article.
File.path - for logged-in users at a PaN institute to access data via Jupyter, we must provide them with a path.
Parameter.units - units should be mandatory except for dimensionless quantities
Keywords - should be added as a list of strings
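
The proposed rules for Parameter and Document could be checked roughly as follows. This is a sketch under assumptions: the `NUMBER`/`STRING` encoding of Parameter.type, the `dimensionless` flag used to express the exception for dimensionless quantities, and the `validate` function are hypothetical, not part of the documented model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Union

NUMBER = "number"   # hypothetical encoding of "numerical"
STRING = "string"


@dataclass
class Parameter:
    name: str
    type: str                     # proposed Parameter.type: NUMBER or STRING
    value: Union[float, str]
    units: Optional[str] = None
    dimensionless: bool = False   # hypothetical flag, not in the documented model


def validate(p: Parameter) -> None:
    """Raise ValueError if p violates the rules proposed in this thread."""
    if p.type == NUMBER:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(p.value, bool) or not isinstance(p.value, (int, float)):
            raise ValueError(f"{p.name}: numerical parameter needs a numeric value")
        if not p.dimensionless and not p.units:
            raise ValueError(f"{p.name}: units are mandatory unless dimensionless")
    elif p.type == STRING:
        if not isinstance(p.value, str):
            raise ValueError(f"{p.name}: string parameter needs a string value")
    else:
        raise ValueError(f"{p.name}: unknown parameter type {p.type!r}")


class DocumentType(Enum):
    PROPOSAL = "proposal"
    PAPER = "paper"


@dataclass
class Document:
    type: DocumentType
    internal: bool = False        # True for internal-only proposals, e.g. calibration runs
    keywords: List[str] = field(default_factory=list)
```

An explicit flag like this is one way to answer the concern that "mandatory where applicable" is not a checkable rule: with it, a validator can decide mechanically whether units are required.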

RKrahl added 4 commits January 31, 2020 14:52

Reformat the data model documentation, trying to faithfully retain the content where possible. Changes:
- Made the relationships explicit including cardinality.
- Remove Role. Not sure what it is supposed to mean.
- Add missing Sample.
- Added several missing properties.

c04efe5

Consistent use of identifiers.

1cc5443

Remove the definition of the field in relationships, unless the related object may be included in instances of this object in the return of API calls.

403dbdd

Various cleanup.

047f4b0

RKrahl mentioned this pull request Feb 3, 2020

Document the search API #21

Closed

garethcmurphy marked this pull request as ready for review February 5, 2020 11:42

garethcmurphy merged commit 0086817 into panosc-eu:master Feb 5, 2020

RKrahl deleted the doc-data branch February 5, 2020 12:12

RKrahl mentioned this pull request Feb 19, 2020

Change relation from dataset to instrument to many-to-many. #32

Closed
