Skip to content
This repository has been archived by the owner on May 30, 2022. It is now read-only.

Bump ord-schema version #56

Merged
merged 58 commits into from Dec 9, 2020
Merged

Bump ord-schema version #56

merged 58 commits into from Dec 9, 2020

Conversation

skearnes
Copy link
Contributor

@skearnes skearnes commented Nov 24, 2020

Migration to support proto changes in open-reaction-database/ord-schema#496 (reaction.proto diff)

Tasks:

  • Remove Features from Compound (replaced with Data)
  • Rename ReactionAnalysis -> Analysis
  • Rename Reaction.workup -> Reaction.workups
  • Support the new Amount message
  • Support the new Source message
  • Update Analysis fields
  • Add ProductMeasurement to Product and remove deprecated fields
  • Update test data
  • Fix editor tests
  • Test this PR extensively
  • Migrate user data

Note that this version of ord-schema includes the old reaction.proto as reaction_old.proto to aid in migrating existing data.

@skearnes skearnes marked this pull request as ready for review November 28, 2020 19:34
@skearnes
Copy link
Contributor Author

@connorcoley @antonkast-google @n8kim1 Tests are passing. PTAL and test the GUI.

@skearnes
Copy link
Contributor Author

skearnes commented Dec 8, 2020

Thanks @connorcoley for all the good fixes. PTAL.

Copy link
Contributor

@connorcoley connorcoley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Workup "Input"s have huge font
  • Opening a new reaction, adding an outcome product, and using Ketcher causes one POST error to be printed in the console upon opening the window (reaction.js:51558 POST http://localhost:5000/ketcher/molfile 404 (NOT FOUND)) and upon hitting "done", another TypeError (reaction.js:54389 Uncaught TypeError: Cannot read property 'getElementsByTagName' of undefined). This seems to happen for all compound identifier fields, not just products.

@skearnes
Copy link
Contributor Author

skearnes commented Dec 9, 2020

Thanks!

  • Workup "Input"s have huge font

Done. Although note that compounds added later will have slightly mismatched heading sizes since we don't have access to them here. We could be fancy about this but it's not jarring like this was.

  • Opening a new reaction, adding an outcome product, and using Ketcher causes one POST error to be printed in the console upon opening the window (reaction.js:51558 POST http://localhost:5000/ketcher/molfile 404 (NOT FOUND)) and upon hitting "done", another TypeError (reaction.js:54389 Uncaught TypeError: Cannot read property 'getElementsByTagName' of undefined). This seems to happen for all compound identifier fields, not just products.

Good catch. Multiple issues were fixed here:

  1. Added a ReactionRole field to ProductCompound in the form (that was the source of the getElementsByTagName error).
  2. Change the 404 response from /ketcher/molfile to 204 since a new compound has no structural identifiers and AFAICT 404 is always logged to the console regardless of any extra handling that might happen in JS.

@connorcoley
Copy link
Contributor

Added a ReactionRole field to ProductCompound in the form (that was the source of the getElementsByTagName error).

Ah, that's very important!

  1. For aesthetics, I would suggest promoting "is_desired" to that upper-most level as well (above the identifiers).
  2. Perhaps the default reaction role for a product compound should be PRODUCT? Or do you think it's safer to keep it unspecified?
  3. After playing around with it a bit, I think that some of the custom hide/show for measurement types might be too restrictive. For example, "counts" for an LCMS analysis could be referring to an integrated mass spec peak at a specific retention time, so we'd still want the retention time field. I could also see value in having "identity" allow a retention time to be defined, although it's not essential I suppose. PURITY could theoretically be calculated from MS but I have never seen that (nor would I want to). I'm not sure if I'm being too cautious here. I am a huge fan of this kind of interactivity but I don't want us to miss any edge cases and have a user get frustrated because they can't define their data the way they want to. @michaelmaser ?

@skearnes
Copy link
Contributor Author

skearnes commented Dec 9, 2020

Added a ReactionRole field to ProductCompound in the form (that was the source of the getElementsByTagName error).

Ah, that's very important!

  1. For aesthetics, I would suggest promoting "is_desired" to that upper-most level as well (above the identifiers).

Done. I also made it so that this field is only visible if the role is PRODUCT (like we do with is_limiting for REACTANT)

  1. Perhaps the default reaction role for a product compound should be PRODUCT? Or do you think it's safer to keep it unspecified?

I sort of like having it unspecified so users see the full list and can consider INTERNAL_STANDARD, etc. Do you agree?

  1. After playing around with it a bit, I think that some of the custom hide/show for measurement types might be too restrictive. For example, "counts" for an LCMS analysis could be referring to an integrated mass spec peak at a specific retention time, so we'd still want the retention time field. I could also see value in having "identity" allow a retention time to be defined, although it's not essential I suppose. PURITY could theoretically be calculated from MS but I have never seen that (nor would I want to). I'm not sure if I'm being too cautious here. I am a huge fan of this kind of interactivity but I don't want us to miss any edge cases and have a user get frustrated because they can't define their data the way they want to. @michaelmaser ?

I broadened this a bit (added retention_time to IDENTITY and COUNTS). I agree that we should err on the side of showing more fields if they might be used. Also added some vertical spacing between sections.

Copy link
Contributor

@connorcoley connorcoley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good enough to me to merge and start the data migration. I'd also be okay waiting until @michaelmaser has had a chance to stress test it a bit more, just in case both of us missed something obvious.

@michaelmaser
Copy link

  1. Perhaps the default reaction role for a product compound should be PRODUCT? Or do you think it's safer to keep it unspecified?

I sort of like having it unspecified so users see the full list and can consider INTERNAL_STANDARD, etc. Do you agree?

I agree. I think it could help prevent unintended species from being added to the product list/reaction SMILES.

  1. After playing around with it a bit, I think that some of the custom hide/show for measurement types might be too restrictive. For example, "counts" for an LCMS analysis could be referring to an integrated mass spec peak at a specific retention time, so we'd still want the retention time field. I could also see value in having "identity" allow a retention time to be defined, although it's not essential I suppose. PURITY could theoretically be calculated from MS but I have never seen that (nor would I want to). I'm not sure if I'm being too cautious here. I am a huge fan of this kind of interactivity but I don't want us to miss any edge cases and have a user get frustrated because they can't define their data the way they want to. @michaelmaser ?

I broadened this a bit (added retention_time to IDENTITY and COUNTS). I agree that we should err on the side of showing more fields if they might be used. Also added some vertical spacing between sections.

I agree with broadening flexibility to avoid restricting use. I agree with adding retention_time to both IDENTITY and COUNTS for the cases @connorcoley mentioned. I don't think there's a need for PURITY either.

I think it looks good enough to me to merge and start the data migration. I'd also be okay waiting until @michaelmaser has had a chance to stress test it a bit more, just in case both of us missed something obvious.

I took a look yesterday and had basically the same comments @connorcoley had, which have been addressed. I think it's okay to merge and start migration. Would minor adjustments later on be tolerable or require re-migration? Pardon my naivety, I am less familiar with the editor than the schema.

@skearnes
Copy link
Contributor Author

skearnes commented Dec 9, 2020

Thanks @michaelmaser

Would minor adjustments later on be tolerable or require re-migration? Pardon my naivety, I am less familiar with the editor than the schema.

Nope, the migration is only required for changes to the schema. One additional test we should do before deploying is making sure that the migrated data successfully round-trips through the editor; this will help us identify anything that we missed in updating the editor to match the new schema.

@skearnes skearnes merged commit 0f4ba4c into main Dec 9, 2020
@skearnes skearnes deleted the migration branch December 9, 2020 21:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants