Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue1031 input sample mass #1109

Merged
merged 8 commits into from
Sep 15, 2023
Merged

Issue1031 input sample mass #1109

merged 8 commits into from
Sep 15, 2023

Conversation

brynnz22
Copy link
Contributor

@brynnz22 brynnz22 commented Sep 12, 2023

Issue: #1031

Updated the sample_mass slot to be called input_mass as it is being used in the Extraction class (a process) in order to clarify if the mass is consumed or generated. This process involved:

  • updating the nmdc.yaml file to change slot name and add additional annotations.
  • updating the neon_nmdc_term_mapping.tsv file to change the mapping to input_mass. Changed from sample_mass
  • updating the valid extraction_set example data files to validate with the newly named input_mass slot
  • Including a new invalid extraction_set sample data file with the old slot name of sample_mass
  • Updating the migration_recursion.py file to include a function for migrating the extraction_set collections to change the slot name. Note that the function uses the word NEXT in the name to specify the next version number, which will need to be updated when we know the release version.
  • Copy and paste Eric's jupyter notebook in the nmdc-runtime repo and add migration changes to this migration. See PR Implement v7.8.0 to v8.0.0 migration notebook and refactor notebook folder nmdc-runtime#305

@brynnz22
Copy link
Contributor Author

@sujaypatil96 @eecavanna

@turbomam
Copy link
Member

looking now

@turbomam
Copy link
Member

It's a little unusual (but not a big problem per se) to request reviews on a draft PR.

@brynnz22 brynnz22 marked this pull request as ready for review September 13, 2023 16:38
@brynnz22
Copy link
Contributor Author

Oops! Forgot to make it official.

@turbomam
Copy link
Member

turbomam commented Sep 13, 2023

make squeaky-clean all test gives one relevant warning:

WARNING:root:Unrecognized prefix: AFO

I see that you are asserting AFO:/result#AFR_0001982 as an exact_mappings for input_mass

AFO = CurieNamespace('AFO', 'http://example.org/UNKNOWN/AFO/') appears in a couple of places now like nmdc.py, but I can't tell whether that expansion is something you asserted or something that was autogenerated.

AFO:/result#AFR_0001982 is a pretty unusual CURIe syntax too. I'm not familiar with that ontology either. So I'm going to look into that.

@brynnz22
Copy link
Contributor Author

Hmm... I think that must have been autogenerated. I just added the mapping to the nmdc.py file.

@brynnz22
Copy link
Contributor Author

Do I fix this by adding AFO as a prefix?

@turbomam
Copy link
Member

OK, you can't edit nmdc.py. It's autogenerated from nmdc.yaml and it's inputs on every build.

@turbomam
Copy link
Member

Do I fix this by adding AFO as a prefix?

Yes, but lte's reassess whether we even want to use that mapping. One should always be hesitant to start using a new namespace.

@turbomam
Copy link
Member

https://bioportal.bioontology.org/ontologies/AFO

Allotrope is a high profile organization and the say that they are based on the Basic Formal Ontology (which is good), but I have no familiarity with their ontology at all.

@turbomam
Copy link
Member

turbomam commented Sep 13, 2023

http://purl.allotrope.org/ontologies/result#AFR_0001982 is the full URI for that term

Here's the OLS page if you don't already have it bookmarked

Their axioms look pretty good to me

If we decide to use this mapping and all of their URIs start with http://purl.allotrope.org/ontologies/result#AFR_, then I would suggest making the prefix assertion

AFO: http://purl.allotrope.org/ontologies/result#AFR_

so that the terms' CURIe will be (in our environment at least) AFO:0001982

@turbomam
Copy link
Member

Otherwise this looks pretty good to me. I'm going to run the migration now as part of the PR assessment. That's doesn't mean that I'm going to share the results with anybody but you. I.e. no intention to load this into any MongoDB yet.

@turbomam
Copy link
Member

make rdf reiterated

WARNING:root:Unrecognized prefix: AFO

but also reported several Study.award_dois patterns violations. If we talk our way through the AFO term mapping and address the DOI pattern violations (in a separate issue?), then I'm 99%+ sure that your PR will be mergable.

✗ 'doi:10.255851487763' does not match '^doi:10.\d{2,9}/.$' in $.study_set[0].award_dois[0]
✗ 'doi:10.46936jejc.proj.2014.4848360005501' does not match '^doi:10.\d{2,9}/.
$' in $.study_set[0].award_dois[1]
✗ 'doi:10.255851488099' does not match '^doi:10.\d{2,9}/.$' in $.study_set[1].award_dois[0]
✗ 'doi:10.46936sthm.proj.2016.4927960005943' does not match '^doi:10.\d{2,9}/.
$' in $.study_set[1].award_dois[1]
✗ 'doi:10.255851487765' does not match '^doi:10.\d{2,9}/.$' in $.study_set[2].award_dois[0]
✗ 'doi:10.46936jejc.proj.2014.4847360005497' does not match '^doi:10.\d{2,9}/.
$' in $.study_set[2].award_dois[1]
✗ 'doi:10.255851488160' does not match '^doi:10.\d{2,9}/.$' in $.study_set[4].award_dois[0]
✗ 'doi:10.46936sthm.proj.2017.4980460006181' does not match '^doi:10.\d{2,9}/.
$' in $.study_set[4].award_dois[1]
✗ 'doi:10.4693610.2558560001061' does not match '^doi:10.\d{2,9}/.$' in $.study_set[5].award_dois[0]
✗ 'doi:10.4693610.2558560001198' does not match '^doi:10.\d{2,9}/.
$' in $.study_set[6].award_dois[0]
✗ 'doi:10.4693610.2558560001289' does not match '^doi:10.\d{2,9}/.$' in $.study_set[7].award_dois[0]
✗ 'doi:10.46936sarr.proj.2018.5029860000034' does not match '^doi:10.\d{2,9}/.
$' in $.study_set[7].award_dois[1]
✗ 'doi:10.46936cpcy.proj.2019.5118060006718' does not match '^doi:10.\d{2,9}/.$' in $.study_set[7].award_dois[2]
✗ 'doi:10.255851488224' does not match '^doi:10.\d{2,9}/.
$' in $.study_set[8].award_dois[0]
✗ 'doi:10.46936fics.proj.2017.4999160006232' does not match '^doi:10.\d{2,9}/.$' in $.study_set[8].award_dois[1]
✗ 'doi:10.255851488096' does not match '^doi:10.\d{2,9}/.
$' in $.study_set[9].award_dois[0]
✗ 'doi:10.4693610.2558560000762' does not match '^doi:10.\d{2,9}/.$' in $.study_set[10].award_dois[0]
✗ 'doi:10.4693610.2558560000017' does not match '^doi:10.\d{2,9}/.
$' in $.study_set[11].award_dois[0]
✗ 'doi:10.46936intm.proj.2021.6014160000423' does not match '^doi:10.\d{2,9}/.$' in $.study_set[12].award_dois[0]
✗ 'doi:10.46936sarr.proj.2018.5026760000030' does not match '^doi:10.\d{2,9}/.
$' in $.study_set[16].award_dois[0]
✗ 'doi:10.46936lser.proj.2019.5071860006575' does not match '^doi:10.\d{2,9}/.*$' in $.study_set[17].award_dois[0]

@brynnz22
Copy link
Contributor Author

Sounds good. I guess I am wondering why we would not want to add the AFO mapping. Doesn't it make the schema more interoperable? What are the down sides to adding it?

Its strange because I did not get the study award violations yesterday when I ran it. I know there was a change sheet happening with the award dois, so maybe that went through?

@turbomam
Copy link
Member

turbomam commented Sep 14, 2023

I've given input_mass's AFO mapping more thought and have done more research into our use of mappings and prefix expansions.

I'm going to do some writing about how to navigate this kind of thing and you can contribute your insights and preferences. You are correct that coverage and interoperability are important goals. But we do have to ask ourselves: do we have a goal of being interoperable with AFO?

Please remove the AFO mapping. I created #1113 so we wouldn't forget about it

  • We don't have any holistic experience with AFO
  • At least in the OLS, AFO uses an unusual CURIe <-> URI conversion pattern. The presence of a / and a # in AFO:/result#AFR_0001982 is highly degenerate.
  • You also provided a mapping from input_mass to a MS:1000004. @cmungall and I have more familiarity and trust with MS.

@brynnz22
Copy link
Contributor Author

This sounds good. I'll remove it now.

@turbomam
Copy link
Member

I approve this PR but we shouldn't merge until an apparently new problem validating award_dois in MongoDB has been solved.

@brynnz22
Copy link
Contributor Author

brynnz22 commented Sep 14, 2023

@turbomam I could not find these dois mongoDb and took a close look at the before and after running the migration_recursion.py script and it looks like the normalize_curie removed any / from the curies rendering the above award_dois incorrect. I added a / to the regex pattern in the function and I believe it all should pass now.

@turbomam turbomam merged commit 699a9e4 into main Sep 15, 2023
2 checks passed
@turbomam turbomam deleted the issue1031-input_sample_mass branch September 15, 2023 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants