Updates pk creation & adds merging to process log #69
Merged
- PK is created using both the record ID and the article(/record) ID
  - Assuming rows aren't deleted and new entries created in their place, this should be unique. Ideally, a unique ID should be created in REDCap.
  - The data are sorted prior to UUID creation
- Clearer code
sangeetabhatia03 approved these changes on May 15, 2025
This PR combines two updates (sorry!)
Due to the format of the REDCap data, the only unique identifier is the record_id (which is renamed to Article_ID). Previously, the [table]_access_id PKs were created as auto-incrementing integers, on the assumption that the data were ordered by time. This assumption was incorrect, so if new data are extracted using an existing form, the [table]_access_id values may change. This is a problem because the fixed double extraction files rely on these IDs when they are merged with the matching and single extraction files.
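A minimal sketch of the failure mode, assuming the export is ordered by record rather than by entry time. The row layout (an Article_ID plus a free-text parameter) and the helper `auto_increment_ids` are hypothetical and only illustrate the idea. A position-based ID shifts as soon as a new entry is added to an existing form:

```python
from typing import Dict, List, Tuple

# Hypothetical rows: (Article_ID, parameter description). Field names are illustrative.
Row = Tuple[int, str]


def auto_increment_ids(rows: List[Row]) -> Dict[int, Row]:
    """Assign 1, 2, 3, ... by row position (the old auto-increment approach)."""
    return {i + 1: row for i, row in enumerate(rows)}


# First export: two articles, one parameter each.
first_export = [(1, "R0 = 2.1"), (2, "CFR = 0.3")]

# Later export: a second parameter is added to Article 1's existing form.
# Assumption: the export groups it with record 1 instead of appending it at the end.
second_export = [(1, "R0 = 2.1"), (1, "incubation = 5 d"), (2, "CFR = 0.3")]

before = auto_increment_ids(first_export)
after = auto_increment_ids(second_export)

print(before[2])  # (2, 'CFR = 0.3')
print(after[2])   # (1, 'incubation = 5 d'): ID 2 now points at a different row
```

Any file that stored ID 2 to mean Article 2's CFR would now be merged against the wrong row.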
The solution is to assign an ID based on the unique record ID and the order in which the data were extracted in REDCap, i.e. record_id_extraction_number.
A potential flaw with this approach is that we will not be able to tell if someone deletes a record and then creates a new one in its place; however, this seems like an edge case. The ideal solution would be to assign a primary key to every outbreak, model, and parameter instance in REDCap and to use that instead of the [table]_access_id throughout the code base.
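A sketch of the record_id_extraction_number key, under stated assumptions: each exported row is taken to carry a record_id and a hypothetical extraction_number field giving the entry's extraction order within that record (for example a repeat-instance counter). The helper `make_access_ids` is illustrative, not the code in this PR. Rows are sorted before the keys are built, so the result does not depend on the order the export returns them in:

```python
from typing import Dict, List

# Hypothetical export rows. "record_id" is REDCap's record identifier (renamed to
# Article_ID downstream); "extraction_number" is an assumed per-record extraction order.
Rows = List[Dict[str, int]]


def make_access_ids(rows: Rows) -> List[str]:
    """Return '<record_id>_<extraction_number>' keys in a stable, sorted order."""
    ordered = sorted(rows, key=lambda r: (r["record_id"], r["extraction_number"]))
    ids = [f"{r['record_id']}_{r['extraction_number']}" for r in ordered]
    # The key is only unique if no (record_id, extraction_number) pair repeats.
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate (record_id, extraction_number) pair")
    return ids


first_export = [
    {"record_id": 2, "extraction_number": 1},
    {"record_id": 1, "extraction_number": 1},
]
# Article 1 gains a second entry; the existing keys for both articles are unchanged.
second_export = first_export + [{"record_id": 1, "extraction_number": 2}]

print(make_access_ids(first_export))   # ['1_1', '2_1']
print(make_access_ids(second_export))  # ['1_1', '1_2', '2_1']
```

The sketch also shows the flaw noted above: if a record is deleted and a new one is created with the same record_id and extraction_number, it silently receives the old key.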