This repository has been archived by the owner on Jan 24, 2018. It is now read-only.

Sql repo #1166

Merged: 65 commits merged on May 5, 2016


@jeromekelleher (Contributor) commented Apr 28, 2016

This PR changes master to use the SQL repo and removes support for file-system-based data repositories.

Issues closed by this PR:

This is a large change affecting all developers and users, so please review and vote.

jeromekelleher and others added 30 commits April 11, 2016 15:43
Created a new API based on a SQLite DB for the data repository. This is
still WIP, and is incomplete.

Changes to download data script:

- Move the script to the project root so that it can access the ga4gh
  package.
- Changed the NCBI URL used to access sequence data as the existing one
  seems to have been discontinued.
- Added a --force/-f flag to force removal of any existing directories.
- Changed the download directory to contain a flat list of the files, as
  the hierarchy wasn't useful any more.
- Added the repo DB.
- Removed the checkpointing functionality. This would have been very
  complex to maintain now that we are using a DB rather than just
  putting files in a specific location. The main reason for including it
  has gone away in any case, as htslib should be much more reliable now.
Conflicts:
	ga4gh/datamodel/datasets.py
Updated all test data to use the single FASTA for a reference set.
Simplified ontologies to just have a single object representing an
ontology, backed by a single file, with the data repository providing
all the other functionality.

Partial refactor of VariantAnnotations:

The present organisation of the VariantAnnotations code was difficult to
reconcile with the DB based repo refactor. This commit gives an outline
of how it could work, by changing the relationship between VariantSet
and VariantAnnotationSet from "is a" to "has a". Unfortunately I could
not complete this work, and have had to move on to other aspects. I have
therefore disabled the tests that are failing and moved on.
- added read/write semantics for opening the repo manager
- removed repo_manager module and disabled tests.
Also created conditional startup code for the server to keep support for
the file system repo.
Variant annotation sets were stored at the top level in the dataset,
which was awkward and inconsistent. Fixed simulated stack tests.
All VA tests have been re-enabled.
This allows us to be systematic about what we accept into the repo and
ensure that we don't have duplicates. It also gives a neat way of checking
for errors, and tidies up a lot of code.
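As a rough illustration of the kind of duplicate check a SQL-backed repo makes possible, here is a minimal sketch using Python's standard `sqlite3` module. The `Dataset` table, its columns, and the `create_repo` / `insert_dataset` helpers are hypothetical and not taken from the ga4gh schema. The sketch also assumes duplicates are rejected through `PRIMARY KEY` / `UNIQUE` constraints, which the commit message does not specify.

```python
import sqlite3


def create_repo(path=":memory:"):
    """Open a hypothetical repo DB whose Dataset table forbids duplicates."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: both the ID and the name must be unique.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Dataset ("
        " id TEXT PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE,"
        " description TEXT)"
    )
    return conn


def insert_dataset(conn, dataset_id, name, description=None):
    """Insert a dataset; return False if it would create a duplicate."""
    try:
        # The connection context manager commits on success and
        # rolls back if the INSERT raises.
        with conn:
            conn.execute(
                "INSERT INTO Dataset (id, name, description) VALUES (?, ?, ?)",
                (dataset_id, name, description),
            )
    except sqlite3.IntegrityError:
        # A PRIMARY KEY or UNIQUE constraint rejected the row.
        return False
    return True
```

Letting the database enforce uniqueness means every code path that adds data (CLI, tests, scripts) gets the same check for free, rather than each one re-implementing it.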
Also implemented dataset and referenceSet removal in the CLI, and
re-enabled some CLI tests.
Add feature set add / delete to repo manager
The SQL schema and CLI use the term Ontology rather than
OntologyTermMap because it seems that the current approach is quite
limited and will need to be changed. This seems more forward compatible,
since we don't want to affect users by making them change CLI syntax or
to make backwards-incompatible changes to the schema.

Conflicts:
	ga4gh/cli.py
	tests/unit/test_repo_manager.py
- Cache num(Un)AlignedBases for ReadGroupSets to prevent a file access

Issue #1129
Issue #1130

move ReadGroupSets.getStats to abstract

Cache num(Un)AlignedBases for ReadGroupSets
Adds initial support for adding Variants and VariantAnnotations to the
SQL repo and the manager CLI.

Conflicts:
	tests/unit/test_repo_manager.py
@jeromekelleher (Contributor Author)

OK, #1212 has been merged so I think we're good to go. Retracting my previous -1.

@dcolligan, what do you think? Ready to push the button?

@dcolligan (Member)

I am now pushing the button.

@dcolligan dcolligan merged commit fe1d2f1 into master May 5, 2016
@dcolligan (Member)

@jeromekelleher let the great issue-closing begin

@jeromekelleher (Contributor Author)

Woohoo! Thanks @dcolligan, time for a closeathon!

6 participants