Migrator: Write migrator for NomAnalysisActivity documents #2010

Closed
aclum opened this issue May 28, 2024 · 8 comments

@aclum
Contributor

aclum commented May 28, 2024

We need a migrator that finds NomAnalysisActivity documents whose id does not have a version appended. For each of those documents, it should append .1 to the existing value of the id slot and move the original id value to alternative_identifiers.

Example before:

{
  "_id": {
    "$oid": "649b0095daa6c19f56b2777c"
  },
  "type": "nmdc:NomAnalysisActivity",
  "has_input": [
    "nmdc:dobj-13-hg0x7944"
  ],
  "has_output": [
    "nmdc:dobj-13-48nyp930"
  ],
  "id": "nmdc:wfnom-13-7yf9qj85",
  "ended_at_time": "2021-01-21T23:27:57Z",
  "execution_resource": "EMSL-RZR",
  "git_url": "https://github.com/microbiomedata/enviroMS",
  "started_at_time": "2021-01-21T23:27:57Z",
  "used": "12T_FTICR_B",
  "was_informed_by": "nmdc:omprc-11-3x68c186"
}

Example after:

{
  "_id": {
    "$oid": "649b0095daa6c19f56b2777c"
  },
  "type": "nmdc:NomAnalysisActivity",
  "has_input": [
    "nmdc:dobj-13-hg0x7944"
  ],
  "has_output": [
    "nmdc:dobj-13-48nyp930"
  ],
  "id": "nmdc:wfnom-13-7yf9qj85.1",
  "ended_at_time": "2021-01-21T23:27:57Z",
  "execution_resource": "EMSL-RZR",
  "git_url": "https://github.com/microbiomedata/enviroMS",
  "started_at_time": "2021-01-21T23:27:57Z",
  "used": "12T_FTICR_B",
  "was_informed_by": "nmdc:omprc-11-3x68c186",
  "alternative_identifiers": ["nmdc:wfnom-13-7yf9qj85"]
}
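A minimal Python sketch of the per-document change shown in the two examples, assuming each document is a plain dict shaped like the JSON above. The function name add_version_suffix and the rule that an id counts as "already versioned" when it ends in a dot followed by digits are illustrative assumptions, not the code from the eventual PR.

```python
import copy
import re

# Assumption: an id is treated as already versioned if it ends in "." plus digits.
VERSION_SUFFIX = re.compile(r"\.\d+$")


def add_version_suffix(doc: dict) -> dict:
    """Return a copy of the document with ".1" appended to its id (hypothetical helper).

    The original id is added to alternative_identifiers. A document whose id
    already carries a version suffix is returned unchanged (as a copy).

    >>> before = {"type": "nmdc:NomAnalysisActivity", "id": "nmdc:wfnom-13-7yf9qj85"}
    >>> after = add_version_suffix(before)
    >>> after["id"]
    'nmdc:wfnom-13-7yf9qj85.1'
    >>> after["alternative_identifiers"]
    ['nmdc:wfnom-13-7yf9qj85']
    >>> add_version_suffix(after) == after
    True
    """
    new_doc = copy.deepcopy(doc)  # never mutate the caller's document
    old_id = new_doc["id"]
    if VERSION_SUFFIX.search(old_id):
        return new_doc  # already versioned; nothing to do
    new_doc["id"] = f"{old_id}.1"
    # Preserve any existing alternative identifiers; add the old id only once.
    alt_ids = new_doc.setdefault("alternative_identifiers", [])
    if old_id not in alt_ids:
        alt_ids.append(old_id)
    return new_doc
```

Returning a copy keeps the function easy to check with doctests, and the suffix check makes it safe to run more than once on the same data.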

Example migrators can be found here: https://github.com/microbiomedata/nmdc-schema/tree/main/nmdc_schema/migrators

The target completion date for this is 6/17. This migrator is needed for the 6/24 release; otherwise, these records will be invalid, because that release will enforce stricter pattern matching on IDs. cc @ssarrafan
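The exact ID patterns in the upcoming release are not spelled out in this issue. As a rough pre-release check, one could list documents whose id would fail a versioned pattern. The regular expression below is a hypothetical stand-in, not the pattern defined in nmdc-schema.

```python
import re

# Hypothetical pattern: "nmdc:wfnom-", digits, "-", lowercase alphanumerics,
# then a required ".<digits>" version suffix. The real schema pattern may differ.
VERSIONED_WFNOM_ID = re.compile(r"nmdc:wfnom-\d+-[0-9a-z]+\.\d+")


def find_unversioned_ids(docs):
    """Return the ids of documents that do not fully match the hypothetical pattern."""
    return [d["id"] for d in docs if not VERSIONED_WFNOM_ID.fullmatch(d["id"])]


docs = [
    {"id": "nmdc:wfnom-13-7yf9qj85"},
    {"id": "nmdc:wfnom-13-7yf9qj85.1"},
]
print(find_unversioned_ids(docs))  # ['nmdc:wfnom-13-7yf9qj85']
```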

@eecavanna eecavanna changed the title write migrator for NomAnalysisActivity records Migrator: Write migrator for NomAnalysisActivity documents May 29, 2024
@eecavanna
Contributor

Hi @JamesTessmer, all of the migrators — whether written for the nmdc-schema schema or the berkeley-schema-fy24 schema — can be found in the berkeley-schema-fy24 repository; here: https://github.com/microbiomedata/berkeley-schema-fy24/tree/main/nmdc_schema/migrators

@eecavanna
Contributor

eecavanna commented May 29, 2024

Hi @aclum, I have a question. There are a few places in a migrator where schema version numbers are indicated; for example, each migrator's name has the format migrator_from_{initial_schema_version}_to_{final_schema_version}.py, and each migrator has a variable named _from_version and a variable named _to_version, etc. What are the "from version" and "to version" in this case? In other words, which schema versions will this migrator be used to migrate the database from and to?

@eecavanna
Contributor

eecavanna commented May 29, 2024

@JamesTessmer, when the person writing a migrator doesn't know yet what the specific schema versions will be, I usually recommend that they either (a) make up some nonsensical versions (e.g. 0.0.0) and then mention in the PR that they are placeholder versions that will be updated to match the eventual starting/ending schema versions that go along with the migrator; or (b) specify the starting version as the currently released schema version and specify the ending version as a PR number (the number of the schema repository PR that introduced the relevant schema change).

Here's a (hypothetical) example:

migrator_from_10_3_0_to_PR123.py

The version numbers can remain as placeholders until the migrator is in a PR. In other words, they can remain as placeholders while you write and test the migrator.
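The naming convention above (migrator_from_{initial_schema_version}_to_{final_schema_version}.py, with dots written as underscores, and a PR-number placeholder such as PR123 allowed in place of a version) can be captured in a tiny helper. migrator_filename is a hypothetical illustration, not a function in the schema repository.

```python
def migrator_filename(from_version: str, to_version: str) -> str:
    """Build a file name in the migrator_from_{X}_to_{Y}.py style (hypothetical helper).

    Dots in version strings become underscores; a placeholder such as "PR123"
    passes through unchanged.
    """

    def slug(version: str) -> str:
        return version.replace(".", "_")

    return f"migrator_from_{slug(from_version)}_to_{slug(to_version)}.py"


print(migrator_filename("10.3.0", "PR123"))   # migrator_from_10_3_0_to_PR123.py
print(migrator_filename("10.3.0", "10.4.0"))  # migrator_from_10_3_0_to_10_4_0.py
```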

@aclum
Contributor Author

aclum commented May 29, 2024

It will be from 10.3.0 to whatever version of nmdc-schema is released at the end of June. I propose 10.4.0 unless @turbomam objects.

@eecavanna
Contributor

Thanks, @aclum.

FYI @JamesTessmer, when writing the migrator, I recommend naming it migrator_from_10_3_0_to_10_4_0.py and setting [its class variables] _from_version = "10.3.0" and _to_version = "10.4.0". We can go back and edit those things during the PR review phase, if needed.
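A standalone sketch of how the suggested class variables might sit in a file named migrator_from_10_3_0_to_10_4_0.py. It deliberately does not use the repository's real migrator base class, whose API is not described in this thread; the class name Migrator, the upgrade method, and the in-memory list of documents standing in for the database are all hypothetical.

```python
# Hypothetical contents of migrator_from_10_3_0_to_10_4_0.py (standalone sketch).
import re

# Assumption: "already versioned" means the id ends in "." plus digits.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


class Migrator:
    """Append ".1" to unversioned NomAnalysisActivity ids (illustrative only)."""

    _from_version = "10.3.0"
    _to_version = "10.4.0"

    def upgrade(self, documents: list) -> int:
        """Migrate matching documents in place and return how many were changed.

        >>> docs = [{"type": "nmdc:NomAnalysisActivity", "id": "nmdc:wfnom-13-7yf9qj85"}]
        >>> Migrator().upgrade(docs)
        1
        >>> docs[0]["id"], docs[0]["alternative_identifiers"]
        ('nmdc:wfnom-13-7yf9qj85.1', ['nmdc:wfnom-13-7yf9qj85'])
        """
        changed = 0
        for doc in documents:
            if doc.get("type") != "nmdc:NomAnalysisActivity":
                continue  # leave other document types alone
            old_id = doc["id"]
            if _VERSION_SUFFIX.search(old_id):
                continue  # already versioned
            doc["id"] = f"{old_id}.1"
            alt_ids = doc.setdefault("alternative_identifiers", [])
            if old_id not in alt_ids:
                alt_ids.append(old_id)
            changed += 1
        return changed
```

Keeping the version strings as class variables means they can be edited in one place during PR review, as suggested above.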

@JamesTessmer
Contributor

I added a PR for this issue here: #2059

@aclum @eecavanna What's the best way to test the migrator before marking the PR as ready for review?

@eecavanna
Contributor

Hi @JamesTessmer,

The test approach I consider to be the "lowest-hanging fruit" is to run the doctests. You can do that by running $ poetry run python -m doctest -v /path/to/the/migrator.py.

  • Note: There is also a make target that can be used to run the doctests in all migration-related code (it's $ make migration-doctests), but it can be difficult to spot error messages in its output due to it outputting a large quantity of messages (all of which are the same color). I use the more specific $ poetry run python -m doctest ... command when working on a specific migrator.
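For anyone new to doctests: they are interactive-session examples written inside docstrings, and the doctest command above runs each >>> line and compares the result with the line that follows. The helper strip_version below is hypothetical and only illustrates the syntax; the __main__ guard runs the same checks through doctest.testmod.

```python
import doctest


def strip_version(identifier: str) -> str:
    """Return an identifier without a trailing ".<digits>" version suffix.

    >>> strip_version("nmdc:wfnom-13-7yf9qj85.1")
    'nmdc:wfnom-13-7yf9qj85'
    >>> strip_version("nmdc:wfnom-13-7yf9qj85")
    'nmdc:wfnom-13-7yf9qj85'
    """
    base, dot, suffix = identifier.rpartition(".")
    # Only strip when a dot exists and everything after the last dot is digits.
    return base if dot and suffix.isdigit() else identifier


if __name__ == "__main__":
    # Similar in effect to `python -m doctest -v this_file.py`.
    print(doctest.testmod(verbose=True))
```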

@aclum
Contributor Author

aclum commented Jun 14, 2024

Merged with #2059.

@aclum aclum closed this as completed Jun 14, 2024