fix: Handle `anyOf` in JSON Schema property #379

eecavanna · 2023-11-13T19:58:25Z

Fixes #378

eecavanna · 2023-11-13T20:08:38Z

I've drafted this function, but haven't tested it yet. I'm trying to get unit tests or doctests running locally.

dwinston · 2023-11-13T20:14:55Z

Grep the codebase for additional uses of “$ref” in .py files. I think there are a few other instances that we want to convert to more robust access.

eecavanna · 2023-11-13T20:27:41Z

I will do that and add fixes to this PR.

Update: That's turning out to be a bit of a rabbit hole (since some things, and the things that use those things, were written under the assumption that a collection always maps to at most one class).

…llection

…tiple classes

…tions

eecavanna · 2023-11-14T01:22:25Z

@dwinston, I don't know how I can adapt this statement to work in the new "a collection can be associated with multiple classes" situation. It's within a 175-line function in the changesheet code.

nmdc-runtime/nmdc_runtime/api/core/metadata.py

Lines 171 to 172 in 1eacc43

    
           else: 
        
               class_name = class_name_dict[collection_name_to_class_name[collection_name]]

In this PR branch, I have arbitrarily chosen to use the first class in the list of classes. 🤷

        else:
            # FIXME: As of nmdc-schema v9.1.0 (maybe earlier), a collection name can map to multiple class names.
            #        In the case of multiple class names, which one do you want to assign to `class_name`?
            #        Here, I have arbitrarily chosen to use the first class name in the list.
            class_names = collection_name_to_class_names[collection_name]
            class_name = class_name_dict[class_names[0]]

Source: https://github.com/microbiomedata/nmdc-runtime/pull/379/files#diff-a981e2387918aa46e3575eddf8d3b3efd43dc78ca8be2a03f3b974e3887d6368R171-R176

…sses

eecavanna · 2023-11-14T02:51:07Z

@dwinston, I adapted this function and the associated search.html page, to work in the new "a collection can be associated with multiple classes" situation; with the caveat that the web page will ignore any classes after the first one.

nmdc-runtime/nmdc_runtime/api/endpoints/find.py

Lines 190 to 222 in 1eacc43

    
           def documentation_links(jsonschema_dict, collection_names): 
        
               rv = {"Activity": []} 
        
               for cn in collection_names: 
        
                   last_part = jsonschema_dict["$defs"]["Database"]["properties"][cn]["items"][ 
        
                       "$ref" 
        
                   ].split("/")[-1] 
        
                   entity_attrs = list(jsonschema_dict["$defs"][last_part]["properties"]) 
        
                   if last_part in ("Biosample", "Study", "DataObject"): 
        
                       assoc_path = [cn] 
        
                   else: 
        
                       assoc_path = ["activity_set", cn] 
        
                   rv = assoc_in( 
        
                       rv, 
        
                       assoc_path, 
        
                       { 
        
                           "collection_name": cn, 
        
                           "entity_url": "https://microbiomedata.github.io/nmdc-schema/" 
        
                           + last_part, 
        
                           "entity_name": last_part, 
        
                           "entity_attrs": sorted( 
        
                               [ 
        
                                   { 
        
                                       "url": f"https://microbiomedata.github.io/nmdc-schema/{a}", 
        
                                       "attr_name": a, 
        
                                   } 
        
                                   for a in entity_attrs 
        
                               ], 
        
                               key=itemgetter("attr_name"), 
        
                           ), 
        
                       }, 
        
                   ) 
        
               return rv

eecavanna · 2023-11-14T02:56:24Z

Hi @dwinston, I finished making all the changes to this branch that I expect to make. I want to know what you think about what I said in these two comments:

I'm OK with you making additional commits to this branch.

dwinston · 2023-11-15T15:42:01Z

thanks for tackling this, @eecavanna. I don't love the added complexity. I'd like to try and identify a more elegant solution than this band-aid, realizing that this bug is blocking folks, so if I can't identify a better route soon, I'll go with this (after careful review). Again, thank you.

…essing

unauthenticated. closes #385

* Consolidate workflows for building docker images and deploying to Spin into one workflow * Remove docker-build.sh in favor of letting GitHub Actions handle Docker build and push * Update Release Process doc with info about initiating via GitHub Releases * Replace Rancher-Action with generic HTTP call * Replace release event with tag push event, which is required for semver metadata * Remove unnecessary pr event * Add more release instructions

This reverts commit 1b2372d.

dwinston

I can't think of a more straightforward way to handle the appearance of one-to-many collection-to-class relationships, and I think we should shoot for straightforward for now.

* fix: Handle `anyOf` in JSON Schema property * Indicate function return value type * Refactor function and add comments * fix: Prefix class name with `nmdc:` * Implement helper function to process both single-ref and multi-ref specs * Document prefix functionality * Fix punctuation in comment * Update dictionary and function to accommodate multiple classes per collection * WIP: Update doc link maker to accommodate collections that map to multiple classes * Clarify variable names * Add comments in an attempt to clarify code * Delete commented-out code that doesn't accommodate multi-class collections * Add tests covering some corner cases * Fix inaccurate type hint * Clarify docstring * Replace reference to nonexistent dict and implement preliminary patch * Make the collection name bold on the search page * Update search page to account for collections mapping to multiple classes * Remove redundant type hints * style: black format * panic on no-type given * add script and api function * update script * Refactor runtime client methods to raise for status and parse and return results * handle omics processing records * update docstring * update to include correct prefix * update to use use new insdc_bioproject_identifiers slot on omics_processing * style: black format * add typecodes enpoint (#386) unauthenticated. closes #385 * update .gitpod.yml * add sshproxy.sh for nersc tunneling * update make cmd * add gitpod affordance * add gitpod dockerfile * update gitpod stuff * rename * update Makefile * fix * gitpod: pull dev mdb * fix * fix make target * Separate dev and production deployments in GitHub workflow (#382) * Consolidate workflows for building docker images and deploying to Spin into one workflow * Remove docker-build.sh in favor of letting GitHub Actions handle Docker build and push * Update Release Process doc with info about initiating via GitHub Releases * Replace Rancher-Action with generic HTTP call * Replace release event with tag push event, which is required for semver metadata * Remove unnecessary pr event * Add more release instructions * style: fix, and elaborate a bit * Revert stuff This reverts commit 1b2372d. * style: fix, and elaborate a bit --------- Co-authored-by: eecavanna <eecavanna@users.noreply.github.com> Co-authored-by: Donny Winston <donny@polyneme.xyz> Co-authored-by: Michael Thornton <mbthornton@lbl.gov> Co-authored-by: Donny Winston <dwinston@alum.mit.edu> Co-authored-by: Jing Cao <jingcao.me@gmail.com> Co-authored-by: Patrick Kalita <pkalita@lbl.gov>

* add stuff * Update json.dump() to json.dumps() in user.py * Mix in orcid jwt flow with client_credentials flow closes #333 * update .gitpod.yml * add sshproxy.sh for nersc tunneling * update make cmd * add gitpod affordance * add gitpod dockerfile * update gitpod stuff * rename * update Makefile * fix * gitpod: pull dev mdb * fix * fix make target * Separate dev and production deployments in GitHub workflow (#382) * Consolidate workflows for building docker images and deploying to Spin into one workflow * Remove docker-build.sh in favor of letting GitHub Actions handle Docker build and push * Update Release Process doc with info about initiating via GitHub Releases * Replace Rancher-Action with generic HTTP call * Replace release event with tag push event, which is required for semver metadata * Remove unnecessary pr event * Add more release instructions * fix: Handle `anyOf` in JSON Schema property (#379) * fix: Handle `anyOf` in JSON Schema property * Indicate function return value type * Refactor function and add comments * fix: Prefix class name with `nmdc:` * Implement helper function to process both single-ref and multi-ref specs * Document prefix functionality * Fix punctuation in comment * Update dictionary and function to accommodate multiple classes per collection * WIP: Update doc link maker to accommodate collections that map to multiple classes * Clarify variable names * Add comments in an attempt to clarify code * Delete commented-out code that doesn't accommodate multi-class collections * Add tests covering some corner cases * Fix inaccurate type hint * Clarify docstring * Replace reference to nonexistent dict and implement preliminary patch * Make the collection name bold on the search page * Update search page to account for collections mapping to multiple classes * Remove redundant type hints * style: black format * panic on no-type given * add script and api function * update script * Refactor runtime client methods to raise for status and parse and return results * handle omics processing records * update docstring * update to include correct prefix * update to use use new insdc_bioproject_identifiers slot on omics_processing * style: black format * add typecodes enpoint (#386) unauthenticated. closes #385 * update .gitpod.yml * add sshproxy.sh for nersc tunneling * update make cmd * add gitpod affordance * add gitpod dockerfile * update gitpod stuff * rename * update Makefile * fix * gitpod: pull dev mdb * fix * fix make target * Separate dev and production deployments in GitHub workflow (#382) * Consolidate workflows for building docker images and deploying to Spin into one workflow * Remove docker-build.sh in favor of letting GitHub Actions handle Docker build and push * Update Release Process doc with info about initiating via GitHub Releases * Replace Rancher-Action with generic HTTP call * Replace release event with tag push event, which is required for semver metadata * Remove unnecessary pr event * Add more release instructions * style: fix, and elaborate a bit * Revert stuff This reverts commit 1b2372d. * style: fix, and elaborate a bit --------- Co-authored-by: eecavanna <eecavanna@users.noreply.github.com> Co-authored-by: Donny Winston <donny@polyneme.xyz> Co-authored-by: Michael Thornton <mbthornton@lbl.gov> Co-authored-by: Donny Winston <dwinston@alum.mit.edu> Co-authored-by: Jing Cao <jingcao.me@gmail.com> Co-authored-by: Patrick Kalita <pkalita@lbl.gov> * Update build-and-release-to-spin.yml * fix: Update build-and-release-to-spin.yml * Replace illegal variable defined using other variable (#389) Co-authored-by: Donny Winston <donny@polyneme.xyz> * Do full depth checkout so setuptools-scm can detect version * Rever to using env context for variables * Use local path context when running docker build so that it knows about git tags * Actually push the docker image, but not if running in a fork * Fix use of boolean var * Env context only available in `steps.if` * Update .dockerignore * Update main.py to display scm version * Allow study metadata not captured in submission portal to be passed to SubmissionPortalTranslator * Allow passing doi category to SubmissionPortalTranslator, use dataset_doi by default * Connect additional study translator parameters to Dagster op inputs * Allow additional Biosample metadata to be passed in via external CSV file * Add PyPI URL and elaborate on manual publishing process * Add related links section to `README.md` * ensure no w3id.org loop; use data portal API (#403) closes #402 * Mix in orcid jwt flow with client_credentials flow closes #333 * style: rm print statement * style: add docstring * style: add commentary * style: rm print-debugging * fix: rm unneeded import --------- Co-authored-by: Jing Cao <jingcao.me@gmail.com> Co-authored-by: Patrick Kalita <pkalita@lbl.gov> Co-authored-by: eecavanna <134325062+eecavanna@users.noreply.github.com> Co-authored-by: eecavanna <eecavanna@users.noreply.github.com> Co-authored-by: Michael Thornton <mbthornton@lbl.gov>

eecavanna added 2 commits November 13, 2023 11:55

fix: Handle anyOf in JSON Schema property

84b67c8

Indicate function return value type

b752877

eecavanna changed the title ~~fix: Handle anyOf in JSON Schema property~~ fix: Handle anyOf in JSON Schema property Nov 13, 2023

Refactor function and add comments

037b31d

eecavanna marked this pull request as ready for review November 13, 2023 20:08

eecavanna requested a review from dwinston November 13, 2023 20:08

fix: Prefix class name with nmdc:

a26e79f

eecavanna added 12 commits November 13, 2023 12:51

Implement helper function to process both single-ref and multi-ref specs

752eaba

Document prefix functionality

3db0d92

Fix punctuation in comment

e78579e

Update dictionary and function to accommodate multiple classes per co…

e7d5773

…llection

WIP: Update doc link maker to accommodate collections that map to mul…

581db63

…tiple classes

Clarify variable names

93f1393

Add comments in an attempt to clarify code

2f1ab46

Delete commented-out code that doesn't accommodate multi-class collec…

f7c0c06

…tions

Add tests covering some corner cases

0ce48c8

Fix inaccurate type hint

2550172

Clarify docstring

a532559

Replace reference to nonexistent dict and implement preliminary patch

3766889

eecavanna added 2 commits November 13, 2023 17:55

Make the collection name bold on the search page

5d588ea

Update search page to account for collections mapping to multiple cla…

436b56d

…sses

eecavanna and others added 2 commits November 13, 2023 18:59

Remove redundant type hints

737743f

style: black format

f2f1d15

mbthornton-lbl and others added 22 commits November 20, 2023 16:24

handle omics processing records

eb18494

update docstring

f295ecf

update to include correct prefix

ad9e9e8

update to use use new insdc_bioproject_identifiers slot on omics_proc…

dcd6069

…essing

style: black format

96cdea4

add typecodes enpoint (#386)

5bdadae

unauthenticated. closes #385

update .gitpod.yml

29a644d

add sshproxy.sh for nersc tunneling

4af568b

update make cmd

63ec314

add gitpod affordance

efa71ba

add gitpod dockerfile

17cda4a

update gitpod stuff

6ec91e5

rename

587dac0

update Makefile

9f566de

fix

c1df262

gitpod: pull dev mdb

793e36b

fix

5423a33

fix make target

a52a944

style: fix, and elaborate a bit

7267aac

Revert stuff

160c036

This reverts commit 1b2372d.

style: fix, and elaborate a bit

4c84f30

dwinston approved these changes Nov 20, 2023

View reviewed changes

dwinston merged commit 30e3838 into main Nov 20, 2023

eecavanna self-assigned this Nov 20, 2023

eecavanna added the bug Something isn't working label Nov 20, 2023

dwinston mentioned this pull request Nov 21, 2023

Search page (/search) exhibits an HTTP 500 Internal Server Error (KeyError: '$ref') #380

Closed

eecavanna deleted the 378-error-keyerror-ref-occurred-on-dagster-after-using-metadatajsonsubmit-endpoint branch November 22, 2023 00:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Handle `anyOf` in JSON Schema property #379

fix: Handle `anyOf` in JSON Schema property #379

eecavanna commented Nov 13, 2023

eecavanna commented Nov 13, 2023

dwinston commented Nov 13, 2023

eecavanna commented Nov 13, 2023 •

edited

Loading

eecavanna commented Nov 14, 2023 •

edited

Loading

eecavanna commented Nov 14, 2023

eecavanna commented Nov 14, 2023

dwinston commented Nov 15, 2023

dwinston left a comment

fix: Handle anyOf in JSON Schema property #379

fix: Handle anyOf in JSON Schema property #379

Conversation

eecavanna commented Nov 13, 2023

eecavanna commented Nov 13, 2023

dwinston commented Nov 13, 2023

eecavanna commented Nov 13, 2023 • edited Loading

eecavanna commented Nov 14, 2023 • edited Loading

eecavanna commented Nov 14, 2023

eecavanna commented Nov 14, 2023

dwinston commented Nov 15, 2023

dwinston left a comment

Choose a reason for hiding this comment

fix: Handle `anyOf` in JSON Schema property #379

fix: Handle `anyOf` in JSON Schema property #379

eecavanna commented Nov 13, 2023 •

edited

Loading

eecavanna commented Nov 14, 2023 •

edited

Loading