Remove files from variants storage #192

Closed
11 tasks done
j-coll opened this issue Aug 17, 2015 · 2 comments

j-coll commented Aug 17, 2015

Currently, it is not possible to delete the data of specific samples. This issue was extracted from issue #160.

Deleting sample data can be implemented either by removing specific samples or by removing indexed files.

  • Delete files
    A file contains a set of samples, and the storage metadata contains a list of all the indexed samples. In addition, BatchFileOperations are available for delete operations.
  • Delete samples
    This feature would require more precise metadata, so that we know exactly which samples are indexed. Currently, the only way to determine whether a sample is indexed is to check whether its file is indexed.
    Deleting samples is out of the scope of this issue.
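Because a sample is currently considered indexed only when one of its files is indexed, the samples that stop being indexed when a file is deleted can be derived from the file-to-samples metadata alone. The following is a minimal sketch under that assumption; the plain dictionary `file_samples` and the helper names are hypothetical and do not correspond to the OpenCGA metadata classes.

```python
from typing import Dict, List, Set


def indexed_samples(file_samples: Dict[str, List[str]], indexed_files: Set[str]) -> Set[str]:
    """A sample is indexed if at least one indexed file contains it."""
    return {s for f in indexed_files for s in file_samples.get(f, [])}


def samples_lost_on_file_removal(
    file_samples: Dict[str, List[str]], indexed_files: Set[str], removed_file: str
) -> Set[str]:
    """Return the samples that are no longer indexed once `removed_file` is removed.

    Samples that also appear in another indexed file remain indexed.
    """
    before = indexed_samples(file_samples, indexed_files)
    after = indexed_samples(file_samples, indexed_files - {removed_file})
    return before - after


# Hypothetical metadata: sample S2 is present in both files.
file_samples = {"f1.vcf": ["S1", "S2"], "f2.vcf": ["S2", "S3"]}
print(sorted(samples_lost_on_file_removal(file_samples, {"f1.vcf", "f2.vcf"}, "f1.vcf")))  # ['S1']
```

Here `S2` survives the removal of `f1.vcf` because `f2.vcf` still contains it.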

Task list:

  • Remove delete methods from the VariantDBAdaptor interface
  • Rename delete and drop methods to remove
  • Delete file for MongoDB
    • Collect all existing genotypes
    • Synchronize delete with the stage collection
    • Detect and remove unused secondary alternates
    • Add BatchFileOperation blocking the study to avoid concurrent operations.
    • Migration script to collect loaded genotypes
  • Delete file for Hadoop
  • Invalidate the stats of cohorts with removed samples
  • Remove deleted samples from cohort ALL.
    Be aware that samples can be in multiple files.
  • Add remove operations to VariantStorageManager
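The cohort-related tasks above (invalidating stats and removing samples from cohort ALL while accounting for samples present in multiple files) can be sketched as follows. This is a minimal, hypothetical model: the `Cohort` class, its `stats_valid` flag and `remove_file_from_cohorts` are illustrative names, not OpenCGA's API, and the sketch assumes that any cohort sharing a sample with the removed file has stale stats.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Cohort:
    samples: Set[str]
    stats_valid: bool = True  # hypothetical flag: are the stored stats up to date?


def remove_file_from_cohorts(
    removed_file: str,
    file_samples: Dict[str, List[str]],
    indexed_files: Set[str],
    cohorts: Dict[str, Cohort],
) -> Set[str]:
    """Invalidate affected cohort stats and drop no-longer-indexed samples from ALL.

    Returns the set of samples removed from cohort ALL.
    """
    file_set = set(file_samples.get(removed_file, []))
    # A sample can be in multiple files: keep it if another indexed file still contains it.
    remaining = indexed_files - {removed_file}
    still_indexed = {s for f in remaining for s in file_samples.get(f, [])}
    removed = file_set - still_indexed

    # Any cohort sharing samples with the removed file loses genotype data,
    # so its stored stats are stale even if none of its samples are fully removed.
    for cohort in cohorts.values():
        if cohort.samples & file_set:
            cohort.stats_valid = False

    if "ALL" in cohorts:
        cohorts["ALL"].samples -= removed
    return removed


file_samples = {"f1.vcf": ["S1", "S2"], "f2.vcf": ["S2", "S3"]}
cohorts = {"ALL": Cohort({"S1", "S2", "S3"}), "controls": Cohort({"S3"})}
removed = remove_file_from_cohorts("f1.vcf", file_samples, {"f1.vcf", "f2.vcf"}, cohorts)
print(sorted(removed), sorted(cohorts["ALL"].samples), cohorts["controls"].stats_valid)
# ['S1'] ['S2', 'S3'] True
```

Stats are invalidated for every cohort that shares a sample with the removed file, not only for cohorts that lose a sample entirely, because a sample kept through another file still loses the genotypes that came from the removed one.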
j-coll commented Jan 9, 2017

See the first approach in drop_file.js.

j-coll modified the milestones: v1.2.0, v1.0.0 Jun 2, 2017
j-coll added the storage label Jun 14, 2017
j-coll changed the title from "Delete variant sample data from OpenCGA-Storage" to "Remove files from variants storage" Jun 14, 2017
j-coll added a commit that referenced this issue Jul 27, 2017
Be aware that samples can be in multiple files.
j-coll commented Aug 3, 2017

Migration script for MongoDB: https://gist.github.com/j-coll/3dec01abc70644943d33de78105c633e
