Samtools to pysam #5037

Merged
merged 22 commits into from Dec 10, 2017

Commits on Dec 8, 2017

  1. 3fa608f
  2. Add unsorted test bam

    mvdbeek committed Dec 8, 2017
    16d0d3e
  3. Use pysam for metadata setting and grooming

    Noteworthy changes:
      - We always respect the sort order specified in the header.
      - If the header does not specify a sort order, or no header
        exists, we coordinate-sort the file.
      - We do not use indexing to determine whether a file is coordinate-sorted,
        because this does not work reliably with samtools/pysam > 1.X,
        which can index arbitrarily sorted files.
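
The sort-order rules above can be sketched roughly as follows. This is an illustration only, not the code from this commit: `header_sort_order`, `needs_coordinate_sort` and `groom_bam` are hypothetical names, and the header is assumed to be available as a plain dict with the `@HD` line under `"HD"` and the sort order under `"SO"`.

```python
def header_sort_order(header):
    """Return the @HD SO value from a SAM/BAM header dict, or None if absent."""
    return (header or {}).get("HD", {}).get("SO")


def needs_coordinate_sort(header):
    """Only files whose header states no sort order get coordinate-sorted."""
    return header_sort_order(header) is None


def groom_bam(in_path, out_path):
    """Hypothetical grooming step: coordinate-sort only when the header is silent."""
    import pysam  # imported lazily so the helpers above work without pysam

    with pysam.AlignmentFile(in_path, "rb", check_sq=False) as bam:
        header = bam.header
        # Newer pysam returns an AlignmentHeader object; older versions a dict.
        header = header.to_dict() if hasattr(header, "to_dict") else dict(header)
    if needs_coordinate_sort(header):
        # samtools sort orders by coordinate unless told otherwise.
        pysam.sort("-o", out_path, in_path)
        return True
    return False
```

Keeping the decision in a pure function makes it easy to test without a BAM file; only `groom_bam` touches pysam.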
    
    This also fixes advanced metadata setting (sort_order, bam_version and more),
    which appears to have been broken. This probably went unnoticed because of the
    catch-all try-except-pass.
    The downside of fixing this is that I had to (temporarily, hopefully)
    comment out the reference_names, reference_lengths, bam_header and readgroups attributes,
    because they led to the following error:
    
    ```
    galaxy.model.metadata DEBUG 2017-11-19 14:47:30,540 loading metadata from file for: HistoryDatasetAssociation 582
    galaxy.jobs.runners.local ERROR 2017-11-19 14:47:30,636 Job wrapper finish method failed
    Traceback (most recent call last):
      File "/Users/mvandenb/src/galaxy/lib/galaxy/jobs/runners/local.py", line 130, in queue_job
        job_wrapper.finish(stdout, stderr, exit_code)
      File "/Users/mvandenb/src/galaxy/lib/galaxy/jobs/__init__.py", line 1357, in finish
        self.sa_session.flush()
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 157, in do
        return getattr(self.registry(), name)(*args, **kwargs)
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2019, in flush
        self._flush(objects)
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2137, in _flush
        transaction.rollback(_capture_exception=True)
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
        compat.reraise(exc_type, exc_value, exc_tb)
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2101, in _flush
        flush_context.execute()
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
        rec.execute(self)
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
        uow
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj
        mapper, table, update)
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 706, in _emit_update_statements
        execute(statement, multiparams)
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
        return meth(self, multiparams, params)
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
        return connection._execute_clauseelement(self, multiparams, params)
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
        compiled_sql, distilled_params
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
        context)
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
        exc_info
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
        reraise(type(exception), exception, tb=exc_tb, cause=cause)
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
        context)
      File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
        cursor.execute(statement, parameters)
    OperationalError: (psycopg2.OperationalError) index row size 7208 exceeds maximum 2712 for index "ix_history_dataset_association_metadata"
    HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
    Consider a function index of an MD5 hash of the value, or use full text indexing.
     [SQL: 'UPDATE history_dataset_association SET update_time=%(update_time)s, blurb=%(blurb)s, peek=%(peek)s, metadata=%(_metadata)s WHERE history_dataset_association.id = %(history_dataset_association_id)s'] [parameters: {'_metadata': <psycopg2.extensions.Binary object at 0x119ed8990>, 'update_time': datetime.datetime(2017, 11, 19, 13, 47, 30, 598058), 'history_dataset_association_id': 582, 'blurb': '3.5 KB', 'peek': 'Binary bam alignments file'}]
    ```
    mvdbeek committed Dec 8, 2017
    3c9999d
  4. 369485a
  5. Use subprocess to check if pysam.index succeeds

    A successful pysam.index call indicates that a file is coordinate sorted.
    If pysam.index fails, htslib writes to stderr, which makes the set-metadata
    tool fail. To prevent this, we run the check in a subprocess and discard stderr.
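
A minimal sketch of this idea, assuming that `pysam.index` raises (and the child interpreter therefore exits non-zero) when indexing fails. The name `index_succeeds` and the temporary index location are illustrative choices, not the code from this commit.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Code run in the child process: argv[1] is the BAM file, argv[2] the index to write.
_INDEX_SNIPPET = "import sys, pysam; pysam.index(sys.argv[1], sys.argv[2])"


def index_succeeds(bam_path):
    """Return True if pysam.index works on bam_path, hiding htslib's stderr output."""
    tmp_dir = tempfile.mkdtemp()
    try:
        index_path = os.path.join(tmp_dir, "check.bai")
        with open(os.devnull, "w") as devnull:
            exit_code = subprocess.call(
                [sys.executable, "-c", _INDEX_SNIPPET, bam_path, index_path],
                stdout=devnull,
                stderr=devnull,  # htslib messages never reach the parent's stderr
            )
        return exit_code == 0
    finally:
        # The index is only a probe, so remove it again.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Because htslib writes to the process-level stderr from C, redirecting `sys.stderr` in Python would not be enough; a separate process keeps the parent's stderr clean.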
    mvdbeek committed Dec 8, 2017
    ae72e56
  6. 836aea0
  7. Drop samtools from metadata and upload tools

    We only need samtools for the dataproviders, which shouldn't
    be used by these tools.
    mvdbeek committed Dec 8, 2017
    bac56d6
  8. df35d08
  9. 3f6a59e
  10. Metadata fixes for VcfGz

    mvdbeek committed Dec 8, 2017
    8e580ec
  11. 0ac77ad
  12. 6ca896f
  13. Tabix indexing fixes. Upstream used 'index' instead of 'index_filename', and 'force' is now required since we precreate the destination location.

    dannon authored and mvdbeek committed Dec 8, 2017
    9c4a2f7
  14. one more linting fix

    mvdbeek committed Dec 8, 2017
    2a6e00d
  15. Need force=True in pysam.tabix_index because the index file path exists in the object store

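
As a rough illustration of the call described here: `pysam.tabix_index` accepts an `index` keyword for the index location and a `force` flag that allows overwriting an existing file. The helper names and the `preset="vcf"` default below are assumptions made for the example, not taken from this commit.

```python
def tabix_index_kwargs(index_path, preset="vcf"):
    """Keyword arguments for pysam.tabix_index when index_path already exists."""
    return {
        "preset": preset,
        "index": index_path,  # explicit index location (keyword is 'index')
        "force": True,        # overwrite the pre-created file at index_path
    }


def build_tabix_index(compressed_path, index_path, preset="vcf"):
    """Hypothetical helper: write a tabix index for compressed_path to index_path."""
    import pysam  # lazy import: only needed when an index is actually built

    pysam.tabix_index(compressed_path, **tabix_index_kwargs(index_path, preset))
    return index_path
```

Without `force=True`, pysam refuses to overwrite an index file that is already present, which is always the case when the destination path is created ahead of time.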
    mvdbeek committed Dec 8, 2017
    07158e0
  16. 64deb4f
  17. 90a56ae
  18. eb636c1
  19. Symlink tbi index to work around pysam limitation

    Until pysam-developers/pysam#586 is merged and a new release is out,
    we create a symlink to the tbi file, which is required for creating TabixFile
    instances. Since we want to clean up the symlinks, I turned `get_data_file` into a context manager.
    Along the way I also changed many open()/close() calls to `with` statements.
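
The symlink-plus-cleanup pattern might look roughly like this. It is a sketch under assumptions: `linked_tbi` is a hypothetical stand-in for the reworked `get_data_file`, and the index is assumed to be expected at `<data file>.tbi`.

```python
import contextlib
import os


@contextlib.contextmanager
def linked_tbi(data_path, index_path):
    """Expose index_path as '<data_path>.tbi' for the duration of the with block."""
    link_path = data_path + ".tbi"
    os.symlink(os.path.abspath(index_path), link_path)
    try:
        yield data_path
    finally:
        # Runs on normal exit and on exceptions, so no symlink is left behind.
        os.remove(link_path)
```

A caller would open the file inside the block, for example `with linked_tbi(data, index) as path:` followed by `pysam.TabixFile(path)`, and the symlink disappears once the block ends.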
    mvdbeek committed Dec 8, 2017
    1195183
  20. 4795e49
  21. Decompose interval_to_tabix converter script

    This renames some variables to make it clearer what files they reflect.
    Also adds a very basic test that this works as intended.
    mvdbeek committed Dec 8, 2017
    46371d1

Commits on Dec 10, 2017

  1. 309b717