Document the data model #20

Merged
merged 4 commits into from
Feb 5, 2020

Conversation

RKrahl
Contributor

@RKrahl RKrahl commented Feb 3, 2020

Reformat and sanitize the data model documentation.

Changes for now:

  • Reformat description with tables.
  • Made the relationships explicit including their cardinality.
  • Remove the Role class. It was not in the diagram on the Confluence page and I'm not sure what it was supposed to be.
  • Add the missing description of the Sample class.
  • Add several properties missing in the description.
  • Sanitize identifiers with a clear distinction between (internal, ephemeral) ids and (persistent) pids. Ref. also discussion in Document API calls #19 concerning ids.
  • Various cleanup: made Affiliation.name mandatory, renamed Person.name and Person.surname to the less ambiguous Person.givenName and Person.familyName and made them mandatory, added some comments.
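
To make the id/pid distinction and the renamed Person properties concrete, here is a minimal Python sketch. It is not the schema from this pull request: the field types, the optional `pid`, and the `missing_mandatory` helper are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Affiliation:
    id: int                    # internal, ephemeral identifier (integer type is assumed)
    name: str                  # mandatory, per the cleanup above
    pid: Optional[str] = None  # persistent identifier (optionality is assumed)


@dataclass
class Person:
    id: int                    # internal, ephemeral identifier (integer type is assumed)
    givenName: str             # replaces the ambiguous Person.name
    familyName: str            # replaces Person.surname
    pid: Optional[str] = None  # persistent identifier (optionality is assumed)


def missing_mandatory(obj: object, fields: List[str]) -> List[str]:
    """Hypothetical helper: return the names of mandatory fields that are empty."""
    return [name for name in fields if not getattr(obj, name)]


person = Person(id=1, givenName="Ada", familyName="")
problems = missing_mandatory(person, ["givenName", "familyName"])
```

In this sketch `problems` lists `familyName`, because that mandatory property was left empty.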

There are several other issues that need discussion:

  • Parameter values: we also have string values in some parameters, not only numbers. How to deal with that?
  • The relation from Dataset to Instrument is restricted to only one Instrument. We have more than one Instrument involved in a single measurement. At BESSY II, we need to distinguish between beam lines and experimental stations, as the latter may be moved between beam lines.
  • The relations from Dataset to Technique and Sample: the dataset is supposed to be "information about an experimental run". Does it make sense for a single experimental run to have several Samples and experimental Techniques?
  • If we follow the argument from the last bullet, we might drop the Technique class and make that a property of Dataset.
  • Why is there a relationship between Document and Parameter? What are document parameters supposed to be?
  • There are several unclear or dubious properties:
    • what is Person.publication supposed to be and what is the value of having that property?
    • what is Document.internal supposed to be and what is the value of having that property?
    • what is Dataset.isPublic supposed to be and what is the value of having that property?
    • what is Document.type?
    • what is the value of having File.path? It only makes sense in the context of a given file system. But we do not have any file system at the level of the federated catalogue.
  • Parameter.units: "mandatory where applicable" doesn't really make sense from an API specification point of view. Either it must be present, in which case it is mandatory, or it is not mandatory.
  • We might want to add a keywords property to Document, either a list of strings or a string of comma separated words.

content where possible.

Changes:
- Made the relationships explicit including cardinality.
- Remove Role.  Not sure what it is supposed to mean.
- Add missing Sample.
- Added several missing properties.
related object may be included in instances of this object in the
return of API calls.
@RKrahl RKrahl mentioned this pull request Feb 3, 2020
@RKrahl
Contributor Author

RKrahl commented Feb 3, 2020

I need to add two items after internal discussion:

  • Yes, it does make sense to have several Techniques for a single Dataset. So I withdraw my question above on that point.
  • We would also need sample parameters, e.g. a relationship between Sample and Parameter.
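
The two additions above could be sketched as follows in Python. This only illustrates the relationships: all property names (`name`, `title`, `value`, `parameters`, `techniques`, `samples`) and the example values are hypothetical and are not taken from the documented data model.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Parameter:
    name: str
    value: Union[float, str]   # numbers or strings, as raised in the open issues
    units: str = ""


@dataclass
class Technique:
    name: str


@dataclass
class Sample:
    name: str
    # Proposed new relationship: a Sample may carry its own Parameters.
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    title: str
    # A single Dataset may reference several Techniques and Samples.
    techniques: List[Technique] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)


sample = Sample(
    name="example sample",
    parameters=[Parameter(name="temperature", value=300.0, units="K")],
)
dataset = Dataset(
    title="example run",
    techniques=[Technique("technique A"), Technique("technique B")],
    samples=[sample],
)
```

Using list-valued fields keeps both relationships one-to-many without a separate link object; whether the real API would need such a link object is left open here.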

@garethcmurphy
Contributor

Hi Rolf,

Thank you for your changes! I have the following responses to your bullet points

  • Parameter with strings - We could add a Parameter.type, string or numerical. The primary use case envisaged (until your comment) is for numerical values
  • Dataset to Instrument - this could be made one-to-many
  • Document parameters, an example could be a proposal/Document which studies a particular wavelength, then the parameter would be relevant to all datasets contained in the proposal. Could also be a summary, i.e. min/max, average value over the datasets
  • Document.internal - boolean flag for internal-only proposals, e.g. calibration runs
  • Document.type - would be either {proposal, paper}, i.e. a full proposal published after embargo expiry or a subset selected by the author of a journal article.
  • File.path - for logged in users at a PaN institute to access data via jupyter, we must provide them with a path.
  • Parameter.units - units should be mandatory except for dimensionless quantities
  • Keywords - should be added as a list of strings
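
A rough Python sketch of how these responses might fit together is given below. It is an illustration under stated assumptions, not an agreed design: the `dimensionless` flag, the `validate` method, deriving `Parameter.type` from the value, and exempting string values from the units rule are all hypothetical choices.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Union


class DocumentType(Enum):
    PROPOSAL = "proposal"
    PAPER = "paper"


@dataclass
class Parameter:
    name: str
    value: Union[float, str]
    units: Optional[str] = None
    dimensionless: bool = False  # hypothetical flag marking unit-free quantities

    @property
    def type(self) -> str:
        # Assumption: the type is derived from the value rather than stored.
        return "string" if isinstance(self.value, str) else "numerical"

    def validate(self) -> None:
        # Assumption: units are required only for numerical values
        # that are not explicitly marked as dimensionless.
        if self.type == "numerical" and not self.dimensionless and not self.units:
            raise ValueError(f"Parameter {self.name!r} requires units")


@dataclass
class Document:
    title: str
    type: DocumentType
    internal: bool = False  # e.g. internal-only proposals such as calibration runs
    keywords: List[str] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


doc = Document(
    title="example proposal",
    type=DocumentType.PROPOSAL,
    keywords=["example", "keywords"],
    parameters=[Parameter(name="wavelength", value=1.5, units="angstrom")],
)
for parameter in doc.parameters:
    parameter.validate()  # raises ValueError if units are missing
```

An explicit `dimensionless` flag is one way to make the units rule checkable by an API; another option, not shown here, would be a reserved units value for dimensionless quantities.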

@garethcmurphy garethcmurphy marked this pull request as ready for review February 5, 2020 11:42
@garethcmurphy garethcmurphy merged commit 0086817 into panosc-eu:master Feb 5, 2020
@RKrahl RKrahl deleted the doc-data branch February 5, 2020 12:12