
berkeley-schema-fy24: Implement migrator that merges some collections #2045

Closed
eecavanna opened this issue Jun 5, 2024 · 5 comments · Fixed by microbiomedata/berkeley-schema-fy24#196
Labels
berkeley-fy24-refactor Label to describe issues created during the December 2023 hackathon for schema refactor enhancement New feature or request needs_details These issues need more details to adhere to best practices. X SMALL Less than 8 hours, less than 1 day


@eecavanna
Contributor

Hi @aclum, I created this ticket to represent the task that came up during today's metadata meeting.

It sounded to me like you wanted all of the documents in one collection to be moved to another collection, and then the first collection to be deleted. Is there more to it than that (e.g. modifying fields within documents)?

@eecavanna eecavanna added enhancement New feature or request X SMALL Less than 8 hours, less than 1 day berkeley-fy24-refactor Label to describe issues created during the December 2023 hackathon for schema refactor needs_details These issues need more details to adhere to best practices. labels Jun 5, 2024
@eecavanna eecavanna self-assigned this Jun 5, 2024
@eecavanna
Contributor Author

In terms of existing adapter methods, here's what I expect this migrator to do for each of those child classes (written here in pseudocode):

```python
# Move all documents from the "pooling_set" collection into the "material_processing_set" collection,
# then delete the "pooling_set" collection.
self.adapter.do_for_each_document(
    collection_name="pooling_set",
    action=lambda document: self.adapter.insert_document(collection_name="material_processing_set", document=document),
)
self.adapter.delete_collection(collection_name="pooling_set")
```
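To sanity-check the copy-then-delete flow in that pseudocode, here is a minimal sketch that runs it against a hypothetical in-memory adapter. `InMemoryAdapter` and `move_collection` are illustrative names, not part of nmdc-schema, and the method signatures are assumed from the pseudocode; the real adapter is backed by MongoDB and may differ.

```python
from typing import Callable, Dict, List


class InMemoryAdapter:
    """Hypothetical stand-in exposing the three adapter methods used in the
    pseudocode; collections are plain lists of dicts keyed by name."""

    def __init__(self, collections: Dict[str, List[dict]]):
        self.collections = collections

    def do_for_each_document(self, collection_name: str, action: Callable[[dict], None]) -> None:
        # Iterate over a copy so the action may safely modify collections.
        for document in list(self.collections.get(collection_name, [])):
            action(document)

    def insert_document(self, collection_name: str, document: dict) -> None:
        # Create the collection on first insert, like MongoDB does.
        self.collections.setdefault(collection_name, []).append(document)

    def delete_collection(self, collection_name: str) -> None:
        # Deleting a collection that doesn't exist is a no-op.
        self.collections.pop(collection_name, None)


def move_collection(adapter: InMemoryAdapter, source: str, target: str) -> None:
    # Copy every document from the source into the target, then drop the source.
    adapter.do_for_each_document(
        collection_name=source,
        action=lambda document: adapter.insert_document(collection_name=target, document=document),
    )
    adapter.delete_collection(collection_name=source)
```

Copying before deleting means that if the loop fails partway through, the source collection is still intact and no documents are lost, although the target may hold a partial copy.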

@eecavanna eecavanna changed the title berkeley-schema-fy24: Implement migrator that merges some collections (details TBD) berkeley-schema-fy24: Implement migrator that merges some collections Jun 6, 2024
@aclum
Contributor

aclum commented Jun 6, 2024

DataGeneration subclasses are already in a combined collection, so no action is needed there. All existing collections for children of MaterialProcessing should be combined into a new collection called material_processing_set, and the same goes for WorkflowExecution. I'm assuming you want to put this migrator at the end, in which case you can use commit microbiomedata@ca304e4 to determine what the starting Database slot names are. If it runs earlier, you may need to use the nmdc-schema Database slot names instead. Note that some of these subclasses never had a collection in Mongo, so the code should be able to handle that.
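A minimal sketch of that merge, including the case where a subclass never had a collection. It models the database as a plain dict of collection name to document list, and `merge_collections` and `MERGE_PLAN` are hypothetical names. The source collection names and the `workflow_execution_set` target are illustrative assumptions only; the real lists should come from the Database slots at the commit above.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a MongoDB database:
# maps collection name -> list of documents.
Database = Dict[str, List[dict]]

# Illustrative plan only (assumed names): target collection -> source collections.
MERGE_PLAN: Dict[str, List[str]] = {
    "material_processing_set": ["pooling_set", "extraction_set", "library_preparation_set"],
    "workflow_execution_set": ["read_qc_analysis_set", "metagenome_annotation_set"],
}


def merge_collections(db: Database, plan: Dict[str, List[str]]) -> Database:
    """Move every document from each source collection into its target
    collection, then drop the source. Missing sources are skipped."""
    for target_name, source_names in plan.items():
        # Create the (new) target collection if it doesn't exist yet.
        target = db.setdefault(target_name, [])
        for source_name in source_names:
            if source_name not in db:
                # This subclass never had a collection; nothing to move.
                continue
            # Remove the source collection and append its documents to the target.
            target.extend(db.pop(source_name))
    return db
```

Checking membership before moving is what lets the same plan run against databases where only some of the subclass collections were ever created.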

@eecavanna
Contributor Author

eecavanna commented Jun 6, 2024

Thanks! That was very helpful to me. I'm operating under the assumption that this will run after all migrators that have been implemented so far, so I'll refer to that commit you linked to.

P.S. I'll be out until about 9:45pm PT.

@eecavanna
Copy link
Contributor Author

I implemented this migrator. It's in this PR: microbiomedata#196

Projects
Status: Done
2 participants