Samtools to pysam #5037
Commits on Dec 8, 2017
Use pysam for metadata setting and grooming
There are also some noteworthy changes here:

- We always respect the sort order specified in the header.
- If the sort order is not mentioned in the header, or no header exists, we coordinate-sort the file.
- We do not use indexing to determine whether a file is coordinate sorted, because this does not work reliably with samtools/pysam > 1.X, since arbitrarily sorted files can be indexed now.

This also fixes advanced metadata setting (sort_order, bam_version and more), which appears to have been broken. This probably went unnoticed because of the catch-all try-except-pass.

The downside to fixing this is that I had to (temporarily, hopefully) comment out the reference_names, reference_lengths, bam_header and readgroups attributes, because they led to the following error:

```
galaxy.model.metadata DEBUG 2017-11-19 14:47:30,540 loading metadata from file for: HistoryDatasetAssociation 582
galaxy.jobs.runners.local ERROR 2017-11-19 14:47:30,636 Job wrapper finish method failed
Traceback (most recent call last):
  File "/Users/mvandenb/src/galaxy/lib/galaxy/jobs/runners/local.py", line 130, in queue_job
    job_wrapper.finish(stdout, stderr, exit_code)
  File "/Users/mvandenb/src/galaxy/lib/galaxy/jobs/__init__.py", line 1357, in finish
    self.sa_session.flush()
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 157, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2019, in flush
    self._flush(objects)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2137, in _flush
    transaction.rollback(_capture_exception=True)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2101, in _flush
    flush_context.execute()
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
    rec.execute(self)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
    uow
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj
    mapper, table, update)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 706, in _emit_update_statements
    execute(statement, multiparams)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (psycopg2.OperationalError) index row size 7208 exceeds maximum 2712 for index "ix_history_dataset_association_metadata"
HINT: Values larger than 1/3 of a buffer page cannot be indexed. Consider a function index of an MD5 hash of the value, or use full text indexing.
 [SQL: 'UPDATE history_dataset_association SET update_time=%(update_time)s, blurb=%(blurb)s, peek=%(peek)s, metadata=%(_metadata)s WHERE history_dataset_association.id = %(history_dataset_association_id)s'] [parameters: {'_metadata': <psycopg2.extensions.Binary object at 0x119ed8990>, 'update_time': datetime.datetime(2017, 11, 19, 13, 47, 30, 598058), 'history_dataset_association_id': 582, 'blurb': '3.5 KB', 'peek': 'Binary bam alignments file'}]
```
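The sort-order rule from this commit message can be sketched roughly as follows. This is an illustration only, not the code in the PR: the function names are hypothetical, and it assumes the header is available as a plain dict with an optional `HD` entry carrying the `SO` tag, which is the shape pysam presents.

```python
def declared_sort_order(header):
    """Return the SO value from a SAM/BAM header dict, or None if absent.

    `header` is assumed to look like {"HD": {"VN": "1.0", "SO": "coordinate"}, ...}.
    A missing header is represented here as None or an empty dict.
    """
    if not header:
        return None
    return header.get("HD", {}).get("SO")


def needs_coordinate_sort(header):
    """Coordinate-sort only when the header declares no sort order at all.

    Any declared order is respected as-is, and indexing is deliberately
    not used as a test for coordinate sorting.
    """
    return declared_sort_order(header) is None
```

With this sketch, a file whose header says `SO:queryname` is left alone, while a file with no `@HD` line, or an `@HD` line without `SO`, would be coordinate-sorted.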
-
-
Use subprocess to check if pysam.index succeeds
Checking whether pysam.index succeeds tests whether a file is coordinate sorted. If pysam.index fails, htslib writes to stderr, and that output causes the set meta tool to fail. To prevent this we run the check in a subprocess and discard stderr.
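A minimal sketch of that idea, assuming Python 3 and that pysam is importable in the child interpreter. The helper names are hypothetical and this is not the code from the commit. The check runs in a separate process so that anything htslib prints never reaches the parent's stderr.

```python
import subprocess
import sys


def command_succeeds_quietly(argv):
    """Run argv in a child process and report whether it exited with 0.

    The child's stdout and stderr are both discarded, so noisy failures
    cannot leak into the calling tool's own output streams.
    """
    return subprocess.call(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ) == 0


def index_check_argv(bam_path):
    """Build a command line that calls pysam.index on bam_path in a fresh interpreter.

    An uncaught exception from pysam.index makes the child exit non-zero.
    Note that a successful run leaves an index file next to bam_path.
    """
    code = "import sys, pysam; pysam.index(sys.argv[1])"
    return [sys.executable, "-c", code, bam_path]
```

Usage would look like `command_succeeds_quietly(index_check_argv("input.bam"))`, where `input.bam` is a placeholder path.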
-
-
Drop samtools from metadata and upload tools
We only need samtools for the dataproviders, which shouldn't be used by these tools.
-
-
-
-
-
Tabix indexing fixes. Upstream used 'index' instead of 'index_filename', and 'force' is now required since we precreate the destination location.
-
-
Need force=True in pysam.tabix_index because the index file path exists in the object store
-
-
Symlink tbi index to work around pysam limitation
Until pysam-developers/pysam#586 is merged and a new release is out, we create a symlink to the tbi file, which is required for creating TabixFile instances. Since we want to clean up the symlinks, I turned `get_data_file` into a contextmanager. Along the way I also changed many open()/close() calls to `with` statements.
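The symlink-plus-cleanup pattern can be sketched as a context manager like the one below. This is not the actual `get_data_file`: the name `data_file_with_index`, its parameters, and the choice to link both files into a temporary directory are assumptions made for illustration.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def data_file_with_index(data_path, index_path, index_suffix=".tbi"):
    """Yield a path whose index sits right beside it as '<path>.tbi'.

    Both files are symlinked into a temporary directory, which is removed
    on exit (even on error). Only the links are deleted, never the targets.
    """
    tmp_dir = tempfile.mkdtemp()
    try:
        linked_data = os.path.join(tmp_dir, os.path.basename(data_path))
        os.symlink(os.path.abspath(data_path), linked_data)
        os.symlink(os.path.abspath(index_path), linked_data + index_suffix)
        yield linked_data
    finally:
        shutil.rmtree(tmp_dir)
```

A caller would open the tabix file inside the `with` block, for example `with data_file_with_index(data, index) as path: ...`, so the links exist only for as long as they are needed.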
-
-
Decompose interval_to_tabix converter script
This renames some variables to make it clearer what files they reflect. Also adds a very basic test that this works as intended.