Make merging of non-same overlapping variants optional in MongoDB #574

Closed
j-coll opened this issue Apr 21, 2017 · 1 comment

@j-coll (Member) commented Apr 21, 2017

When loading variants into a single database, we may find two kinds of overlapping variants: variants from different files at the same position with the same reference and alternate alleles (the same variant), and variants that are not the same but overlap.

With the two-step load (#288) we started merging not only identical variants but also variants that merely overlap. This feature has been working well, but it may be unnecessary or unwanted in some scenarios, and it increases both the loading time and the final database size.

If this merge is not needed, loading can be faster, and many of the documents in the first (stage) collection can be filtered out.
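The difference between the two kinds of overlap, and what making the second kind optional means for the merge, can be sketched as follows. This is a minimal in-memory illustration, not the MongoDBVariantMerger code: the Variant fields, the assumption of 1-based inclusive coordinates, and the merge_non_same_overlapping flag are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    # Hypothetical minimal model; coordinates assumed 1-based and inclusive.
    chromosome: str
    start: int
    end: int
    reference: str
    alternate: str


def is_same(a: Variant, b: Variant) -> bool:
    """Same position with the same reference and alternate alleles."""
    return (a.chromosome, a.start, a.reference, a.alternate) == (
        b.chromosome, b.start, b.reference, b.alternate)


def overlaps(a: Variant, b: Variant) -> bool:
    """True if the closed intervals [start, end] intersect on one chromosome."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def variants_to_merge(new: Variant, existing: List[Variant],
                      merge_non_same_overlapping: bool = True) -> List[Variant]:
    """Pick the existing variants that `new` would be merged with.

    With the hypothetical flag disabled, only identical variants are merged.
    """
    result = []
    for other in existing:
        if is_same(new, other):
            result.append(other)
        elif merge_non_same_overlapping and overlaps(new, other):
            result.append(other)
    return result


snv = Variant("1", 100, 100, "A", "T")
same_snv = Variant("1", 100, 100, "A", "T")
deletion = Variant("1", 99, 101, "CAG", "C")
far_away = Variant("1", 200, 200, "G", "A")
stored = [same_snv, deletion, far_away]

print(variants_to_merge(snv, stored))
print(variants_to_merge(snv, stored, merge_non_same_overlapping=False))
```

With the flag enabled, the SNV is merged with both the identical SNV and the overlapping deletion; with it disabled, only the identical SNV is selected, which is why the non-same merge can be skipped without reading every overlapping document.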

  • Make the "non-same variants" merge in MongoDBVariantMerger optional. Done in 10430e5
  • Create a new indexed field on the stage collection to filter by files and studies.
    • New indexed field _i
    • An array of strings containing the files to be merged, as "_", and the studies already present in the document, as just "". This will help to filter by study or by file.
  • Fill the _i field in the MongoDBVariantStageLoader
  • Add a "files" parameter to the MongoDBVariantStageReader to allow filtering by file.
    • Filter using the new indexed field (for files and studies).
    • In this situation, sorting by _id will not be needed, as the MongoDBVariantMerger will try to find overlapping variants.
  • Clear loaded files from _i in the MongoDBVariantMergeLoader
  • Ensure the default genotype is the unknown genotype. Since we will not read all the documents from the stage collection, gaps cannot be filled properly.
  • Ensure backward compatibility or create a migration script
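The lifecycle of the _i field described in the list above (filled by the stage loader, queried by the stage reader, cleared by the merge loader) can be sketched with plain dictionaries standing in for stage documents. The issue text does not preserve the exact string format of the entries, so the "study_file" and bare "study" encodings below, as well as every function name, are hypothetical. The query and update documents use the standard MongoDB operators $in and $pullAll; on a real collection the field would also need an index (for example via pymongo's create_index("_i")).

```python
from typing import Dict, List


# Hypothetical encodings: the exact format used by OpenCGA is not in the issue.
def file_key(study_id: str, file_id: str) -> str:
    return f"{study_id}_{file_id}"


def study_key(study_id: str) -> str:
    return study_id


def fill_index(doc: Dict, study_id: str, file_id: str) -> Dict:
    """Stage loader step: record the study and the file still pending merge."""
    index = doc.setdefault("_i", [])
    for key in (study_key(study_id), file_key(study_id, file_id)):
        if key not in index:
            index.append(key)
    return doc


def files_query(study_id: str, file_ids: List[str]) -> Dict:
    """Stage reader step: MongoDB filter selecting docs with any pending file."""
    return {"_i": {"$in": [file_key(study_id, f) for f in file_ids]}}


def clear_update(study_id: str, file_ids: List[str]) -> Dict:
    """Merge loader step: MongoDB update removing merged files from _i."""
    return {"$pullAll": {"_i": [file_key(study_id, f) for f in file_ids]}}


# Tiny in-memory stand-ins for how MongoDB evaluates the two documents above.
def matches(doc: Dict, query: Dict) -> bool:
    wanted = set(query["_i"]["$in"])
    return any(key in wanted for key in doc.get("_i", []))


def apply_update(doc: Dict, update: Dict) -> Dict:
    removed = set(update["$pullAll"]["_i"])
    doc["_i"] = [key for key in doc.get("_i", []) if key not in removed]
    return doc


stage = [fill_index({"_id": "v1"}, "s1", "f1"),
         fill_index({"_id": "v2"}, "s1", "f2")]

query = files_query("s1", ["f1"])
selected = [doc for doc in stage if matches(doc, query)]
for doc in selected:
    apply_update(doc, clear_update("s1", ["f1"]))
print(stage)
```

Only the document holding the requested file is read, and after the merge its file entry is removed while the study entry stays, so the same field can still be used to filter by study.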
@j-coll (Member, Author) commented May 12, 2017

To upgrade from OpenCGA 1.0.x to 1.1.0, this migration script must be executed for all Variant databases.

https://gist.github.com/j-coll/8e9ace0b24c9f65fa99be64ebae5a9bb

j-coll added a commit that referenced this issue May 16, 2017

j-coll closed this May 26, 2017
