Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump nmdc-schema from 10.3.0 to 10.5.5 #560

Merged
merged 6 commits into from
Jun 21, 2024

Conversation

eecavanna
Copy link
Collaborator

In this branch, I bumped nmdc-schema from 10.3.0 to 10.5.4 in requirements/main.in, then I ran $ make update-deps to synchronize the other requirements files.

@eecavanna eecavanna linked an issue Jun 18, 2024 that may be closed by this pull request
3 tasks
@eecavanna eecavanna self-assigned this Jun 18, 2024
@eecavanna
Copy link
Collaborator Author

The GHA workflow failed. Looks like one of the containers didn't start up "in time":

docker compose --file docker-compose.test.yml run test
time="2024-06-18T20:11:26Z" level=warning msg="/home/runner/work/nmdc-runtime/nmdc-runtime/docker-compose.test.yml: `version` is obsolete"
 Container dagster-postgresql  Running
 Container mongo  Running
 Container dagster-daemon  Running
 Container fastapi  Created
 Container dagster-dagit  Running
 Container fastapi  Starting
 Container fastapi  Started
wait-for-it.sh: waiting 300 seconds for fastapi:8000
wait-for-it.sh: timeout occurred after waiting 300 seconds for fastapi:8000
wait-for-it.sh: strict mode, refusing to execute subprocess
make: *** [Makefile:39: test-run] Error 124
Error: Process completed with exit code 2.

@eecavanna
Copy link
Collaborator Author

eecavanna commented Jun 18, 2024

I've seen this symptom before, in other Runtime PRs. I'll re-run the GHA workflow now (with no changes to the code).

@eecavanna
Copy link
Collaborator Author

eecavanna commented Jun 19, 2024

I reproduced the failure locally and got more information. According to the output of make up-test followed by make test in my local development environment, the Runtime is not booting successfully. Instead, it is outputting this error:

ModuleNotFoundError: No module named 'nmdc_schema.nmdc_schema_accepting_legacy_ids'

The offending import statement is in nmdc_runtime/util.py (line 19).

Locally, I tried changing that line as follows:

- from nmdc_schema.nmdc_schema_accepting_legacy_ids import Database as NMDCDatabase
+ from nmdc_schema.nmdc import Database as NMDCDatabase

That eliminated the original error message (which I'll refer to as the "first-level" error), but then revealed this one (which I'll refer to as the "second-level" error):

fastjsonschema.exceptions.JsonSchemaDefinitionException: Unresolvable ref: WorkflowExecutionActivity

@eecavanna
Copy link
Collaborator Author

eecavanna commented Jun 19, 2024

Hi @turbomam,

I checked the JSON Schema file (nmdc_schema/nmdc.schema.json) in version 10.5.4 of the nmdc-schema repository and saw that it, indeed, does not contain a definition of the WorkflowExecutionActivity class, although it does contain several refs to it (for example, this one).

@aclum mentioned that she expects that definition to be present in that version of the schema.

I looked into the history of that file in an attempt to determine when the definition was last present in the file. I found that—in terms of Git tags—it was present in v10.4.0 and absent in v10.5.0.

I wanted to bring this to your attention. I am curious what you think about it. I want to reconcile @aclum's expectation that the class definition be present, with my observation that it is absent.

@eecavanna eecavanna added help wanted Extra attention is needed upstream-issue The root cause lies upstream from here labels Jun 19, 2024
@pkalita-lbl
Copy link
Collaborator

pkalita-lbl commented Jun 20, 2024

I believe the root cause is this commit which added abstract: true to the WorkflowExecutionActivity class.

As it stands today LinkML's JSON Schema generator omits abstract classes from its output during its squashing of LinkML's class hierarchy-based representation into JSON Schema's flat representation. Normally that doesn't cause a problem, but apparently it does in this case, for some reason.

I wonder if the quickest way forward right now is to remove abstract: true from the WorkflowExecutionActivity class?

@turbomam
Copy link
Member

turbomam commented Jun 20, 2024

I would prefer to use the abstract: true feature as a way of educating schema users and validating data, rather than ensuring which classes are included in the JSON schema file.

I would also prefer that more of our software used the YAML, Python or SchemaView() representations of the schema rather than other derived formats, which may not retain all of the power and expressiveness of the schema

Having said that, I'm not in a position to deliver a good short-term solution for this problem, so I won't object to removing the

WorkflowExecutionActivity:
    abstract: true

as long as we take other steps to ensure that instances of this class aren't allowed in the data. Or if all else fails, an issue about this is opened in the appropriate repo (linkml/linkml?) and the class gets some documentation about the problem, including a see_also link to the GH issue.

@pkalita-lbl
Copy link
Collaborator

an issue about this is opened in the appropriate repo (linkml/linkml?)

Yes I definitely agree with that. Although as I started to try and narrow down a simple example to illustrate the issue I'm coming to realize that there may be more to it than I initially thought. I'll update again here once I'm more confident that's the right path forward.

@eecavanna
Copy link
Collaborator Author

Thanks, @pkalita-lbl and @turbomam!

The commit @pkalita-lbl referenced in his comment was introduced as part of the same repository release (i.e. v10.5.0) in which the definition of the WorkflowExecutionActivity starting being absent from the JSON Schema.

One thing I want to point out is that, while the definition of the class was absent, the JSON Schema file still contained references to that (absent) class, which I think is what the error message from the Runtime was referring to.

@eecavanna
Copy link
Collaborator Author

eecavanna commented Jun 20, 2024

As it stands today LinkML's JSON Schema generator omits abstract classes from its output during its squashing of LinkML's class hierarchy-based representation into JSON Schema's flat representation.

I am wondering whether it omits the class definition, but preserves references (if any) to that class definition. If that is the case, there may be a bug in LinkML's JSON Schema generator (because I don't think references like that would be resolvable).

@pkalita-lbl
Copy link
Collaborator

there may be a bug in LinkML's JSON Schema generator

Yes I believe that's true, but I think it needs some further characterization before even thinking about what a potential fix will be. In the meantime I don't want that work to hold this up.

@pkalita-lbl pkalita-lbl changed the title Bump nmdc-schema from 10.3.0 to 10.5.4 Bump nmdc-schema from 10.3.0 to 10.5.5 Jun 21, 2024
@pkalita-lbl
Copy link
Collaborator

I added a new commit to upgrade to the new nmdc-schema version 10.5.5 which should resolve the JSON Schema issues mentioned above.

@pkalita-lbl pkalita-lbl merged commit 7a26638 into main Jun 21, 2024
2 checks passed
@pkalita-lbl pkalita-lbl deleted the 559-bump-schema-version-from-1030-to-1054 branch June 21, 2024 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed upstream-issue The root cause lies upstream from here
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bump schema version from 10.3.0 to 10.5.4
3 participants