Add MassSpectrometryConfiguration and Configuration classes #130
and has_configuration slot
Presumably we will need a |
src/schema/nmdc.yaml
Configuration:
  abstract: true
  class_uri: nmdc:Configuration
  description: A platform that enables the user to configure the appearance, actions, and other usage preferences on a process.
In what way is a configuration a "platform"?
Generally we were thinking of a computer platform. We struggled to write this description (and the associated ones) and are open to other suggestions. Maybe "A framework that enables..."?
I've updated the definition of the Configuration class to "A programmable set of parameters that define the actions of a process and is shared among multiple instances of the process." I think that addresses some of the outstanding questions about when we would use this class to hold configuration information.
In the case of MassSpectrometryConfiguration, this would be 1:1 with a method file that we use to run the mass spectrometer.
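Read together, this thread and the PR description suggest a class hierarchy along the lines of the LinkML sketch below. It is not the merged schema: the `Configuration` description is the updated wording quoted above, `is_a: Configuration` follows the PR description's abstract parent class, and the `class_uri` and description of `MassSpectrometryConfiguration` are assumptions (the description paraphrases the 1:1 method-file point).

```yaml
classes:
  Configuration:
    abstract: true            # never instantiated directly
    class_uri: nmdc:Configuration
    description: >-
      A programmable set of parameters that define the actions of a process
      and is shared among multiple instances of the process.

  MassSpectrometryConfiguration:
    is_a: Configuration       # concrete subclass of the abstract parent
    # assumption: class_uri follows the same nmdc: pattern as Configuration
    class_uri: nmdc:MassSpectrometryConfiguration
    # assumption: illustrative wording only, not the PR's text
    description: >-
      The instrument settings used to run a mass spectrometer, corresponding
      one-to-one with an instrument method file.
```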
src/schema/nmdc.yaml
@@ -686,6 +704,12 @@ slots:
    range: FluidHandling
    description: how a processed sample is introduced into a mass spectrometer, for example liquid chromatography or direct infusion through a syringe

  has_configuration:
    any_of:
Why would the range be Configuration in some cases and DataObject in other cases?
We should go to great lengths to avoid any_of ranges. They are complex to reason over and diagram. That may change in the future.
@brynnz22 and I were thinking ahead to accommodate workflow configuration files (DataObjects) that will be associated with a has_configuration slot on the WorkflowExecution class. See microbiomedata#1912.
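The two range designs under discussion can be sketched in LinkML as follows: a union range via `any_of` (accepting `Configuration` or, for the workflow configuration files mentioned in microbiomedata#1912, `DataObject`) versus a single `range`. This is an illustrative sketch of standard LinkML syntax rather than the PR's exact lines; descriptions are omitted, and the two alternatives are shown as separate YAML documents.

```yaml
# Option A: union range via any_of, as first proposed in this PR.
# A value may be either a Configuration or a DataObject, which is
# the kind of range that is harder to reason over and diagram.
slots:
  has_configuration:
    any_of:
      - range: Configuration
      - range: DataObject
---
# Option B: a single range. Only Configuration instances
# (including subclasses such as MassSpectrometryConfiguration)
# are valid values.
slots:
  has_configuration:
    range: Configuration
```

The single-range form is what the later revision of `has_configuration` in this PR uses; under it, a `DataObject` would not validate as a value of this slot.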
I would like to make sure it's very clear when we
- embed configuration-like data in an instance of a PlannedProcess
- associate configuration-like data with a process via a DataObject, which is a pointer to a configuration file (but doesn't actually manifest the settings in the instance)
- embed configuration-like data in instances of the new Configuration class
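The three options listed above could look roughly like this in instance data. This is a hypothetical sketch: every identifier is made up, `example_setting_a` and `example_setting_b` are placeholder slot names rather than real NMDC slots, and Option 2 assumes the `DataObject` link goes through `has_configuration`, as in the `any_of` proposal.

```yaml
# Option 1: settings embedded directly in a PlannedProcess instance
# (here a MassSpectrometry record); repeated on every such record.
id: nmdc:example-msproc-1               # hypothetical id
example_setting_a: value-a              # placeholder slot, not a real NMDC slot
example_setting_b: value-b              # placeholder slot, not a real NMDC slot
---
# Option 2: the process points to a DataObject describing a configuration
# file; the settings live in the file, not in this record.
id: nmdc:example-msproc-2               # hypothetical id
has_configuration: nmdc:example-dobj-1  # hypothetical DataObject id
---
# Option 3: the process points to an instance of the new Configuration
# class, which holds the settings once and can be shared.
id: nmdc:example-msproc-3               # hypothetical id
has_configuration: nmdc:example-mscon-1 # hypothetical configuration id
---
# The shared MassSpectrometryConfiguration record referenced above.
id: nmdc:example-mscon-1                # hypothetical id
example_setting_a: value-a              # placeholder slot
example_setting_b: value-b              # placeholder slot
```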
A review of those configuration embedding/linking options may shed some additional light on the idea of configurations as platforms.
Sorry, I do not quite understand this.

> I would like to make sure it's very clear when we
> - embed configuration-like data in an instance of a PlannedProcess
> - associate configuration-like data with a process via a DataObject, which is a pointer to a configuration file (but doesn't actually manifest the settings in the instance)
> - embed configuration-like data in instances of the new Configuration class

Are there actions we can take in the PR to meet this?
Since this PR adds a new way of capturing configuration, it needs to include documentation (or additional schema constraints) on how it relates to the existing ways of capturing configuration information.
We have a few follow-up issues related to configurations (microbiomedata#1912 and microbiomedata#1920), and we plan to make a configuration_set with one of those. Would that work?
@cmungall @sierra-moxon and I just started discussing this today. This PR has a lot of good features, like a small size, inclusion of examples and the abstract We can help more if
If it's currently necessary for processes to indicate that they were configured by either a |
We will also have to think about the life cycle of |
also make a better definition for Configuration and related slots
I think this is outside of the scope of the modeling at this moment, but |
expand comments on Configuration and similar
I've made the following updates after PRs #142 and #141
@turbomam, this is ready for re-review. @SamuelPurvine, please take a look, especially the example file https://github.com/microbiomedata/berkeley-schema-fy24/blob/mass_spec_config/src/data/valid/Database-mass-spectrometry.yaml which holds two example |
@SamuelPurvine gave me his review via Teams, and I've incorporated his changes into https://github.com/microbiomedata/berkeley-schema-fy24/blob/mass_spec_config/src/data/valid/Database-mass-spectrometry.yaml.
has_output:
  - nmdc:dobj-00-9n9n9n
has_configuration: nmdc:mscon-99-oW43DzG0
eluent_introduction: nmdc:cspro-99-hello00
Do you know if we have any example data files that populate an eluent_introduction instance? Will that still be relevant after you do some chromatography configuration modeling?
Yes, this will still be relevant. I will extend these MassSpectrometry examples alongside the parallel work I've been doing on ChromatographicSeparationConfiguration.
@@ -727,6 +754,10 @@ slots:
    range: FluidHandling
    description: how a processed sample is introduced into a mass spectrometer, for example liquid chromatography or direct infusion through a syringe

  has_configuration:
    range: Configuration
    description: a set of parameters that define the actions of the process
I am going to merge this but make a follow-up PR, with you as an approver, to change this description. As a matter of principle, we can't use near-identical descriptions for has_configuration and Configuration.
Sounds good, thanks. I could definitely use support with proper, ontologically correct descriptions.
This PR addresses microbiomedata#1913.
In the current model, MassSpectrometry instances (applicable to about 2000 processed samples at present) correspond to only 4 configurations in which the mass spectrometry instrument was actually used. We propose a MassSpectrometryConfiguration class that houses the configuration information, which would 1) greatly reduce repeated data on MassSpectrometry instances, 2) allow workflows to be configured based on a MassSpectrometryConfiguration instance, and 3) allow users to query samples that were run with similar configurations for downstream comparisons.
This would move several slots from MassSpectrometry to MassSpectrometryConfiguration and associate the configuration through a new has_configuration slot (see the diagram below). Note that all slots inherited from PlannedProcess (like start_date and has_input) will remain on MassSpectrometry.
![Screenshot 2024-04-12 at 12 17 42 PM](https://private-user-images.githubusercontent.com/10502759/322115448-3f4485c2-f4c6-432e-b96b-3e206272bf63.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjI0Mjg4OTEsIm5iZiI6MTcyMjQyODU5MSwicGF0aCI6Ii8xMDUwMjc1OS8zMjIxMTU0NDgtM2Y0NDg1YzItZjRjNi00MzJlLWI5NmItM2UyMDYyNzJiZjYzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MzElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzMxVDEyMjMxMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTBlNDI4ZGZkZGFkYmViNzYxYTljYzM3MWQ2Yzg5NzdjYjVkYjFmOGRiNzJiMmI1OGQ4YzViMGMzN2YwYjNiMWYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.0bUfuXAdusg4ewPGWM1Uz9epKZ0Ni0A35_iGE2GFAh8)
This same configuration abstraction applies to ChromatographicSeparationProcess (see microbiomedata#1920), so we've also created an abstract Configuration parent class.
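A minimal sketch of the intended effect in instance data: MassSpectrometry records keep their own PlannedProcess slots (such as has_input and start_date) but share a single MassSpectrometryConfiguration by reference. Only `nmdc:mscon-99-oW43DzG0` comes from the PR's example file; the other identifiers and the dates are hypothetical, and the configuration's own settings slots are omitted here.

```yaml
# Two of the ~2000 MassSpectrometry records. Each keeps its own
# PlannedProcess slots but references the same configuration, so the
# instrument settings are stored once instead of on every record.
- id: nmdc:example-msproc-1               # hypothetical id
  has_input:
    - nmdc:example-processed-sample-1     # hypothetical id
  start_date: "2021-01-01"                # hypothetical value
  has_configuration: nmdc:mscon-99-oW43DzG0
- id: nmdc:example-msproc-2               # hypothetical id
  has_input:
    - nmdc:example-processed-sample-2     # hypothetical id
  start_date: "2021-01-02"                # hypothetical value
  has_configuration: nmdc:mscon-99-oW43DzG0
```

Selecting every record whose has_configuration equals `nmdc:mscon-99-oW43DzG0` would then return all samples run under that configuration, which is the downstream-comparison use case in point 3) above.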