Skip to content

Latest commit

 

History

History
230 lines (154 loc) · 5.74 KB

api.rst

File metadata and controls

230 lines (154 loc) · 5.74 KB

Introduction

Pysam is a python module that makes it easy to read and manipulate mapped short read sequence data stored in SAM/BAM files. It is a lightweight wrapper of the htslib_ C-API.

This page provides a quick introduction in using pysam followed by the API. See :ref:`usage` for more detailed usage instructions.

To use the module to read a file in BAM format, create a :class:`~pysam.AlignmentFile` object:

import pysam
samfile = pysam.AlignmentFile("ex1.bam", "rb")

Once a file is opened you can iterate over all of the read mapping to a specified region using :meth:`~pysam.AlignmentFile.fetch`. Each iteration returns a :class:`~pysam.AlignedSegment` object which represents a single read along with its fields and optional tags:

for read in samfile.fetch('chr1', 100, 120):
    print(read)

samfile.close()

To give:

EAS56_57:6:190:289:82       0       99      <<<7<<<;<<<<<<<<8;;<7;4<;<;;;;;94<;     69      CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAAA     0       192     1
EAS56_57:6:190:289:82       0       99      <<<<<<;<<<<<<<<<<;<<;<<<<;8<6;9;;2;     137     AGGGGTGCAGAGCCGAGTCACGGGGTTGCCAGCAC     73      64      1
EAS51_64:3:190:727:308      0       102     <<<<<<<<<<<<<<<<<<<<<<<<<<<::<<<844     99      GGTGCAGAGCCGAGTCACGGGGTTGCCAGCACAGG     99      18      1
...

You can also write to a :class:`~pysam.AlignmentFile`:

import pysam
samfile = pysam.AlignmentFile("ex1.bam", "rb")
pairedreads = pysam.AlignmentFile("allpaired.bam", "wb", template=samfile)
for read in samfile.fetch():
    if read.is_paired:
        pairedreads.write(read)

pairedreads.close()
samfile.close()

An alternative way of accessing the data in a SAM file is by iterating over each base of a specified region using the :meth:`~pysam.AlignmentFile.pileup` method. Each iteration returns a :class:`~pysam.PileupColumn` which represents all the reads in the SAM file that map to a single base in the reference sequence. The list of reads are represented as :class:`~pysam.PileupRead` objects in the :attr:`PileupColumn.pileups <pysam.PileupColumn.pileups>` property:

import pysam
samfile = pysam.AlignmentFile("ex1.bam", "rb" )
for pileupcolumn in samfile.pileup("chr1", 100, 120):
    print("\ncoverage at base %s = %s" % (pileupcolumn.pos, pileupcolumn.n))
    for pileupread in pileupcolumn.pileups:
        if not pileupread.is_del and not pileupread.is_refskip:
            # query position is None if is_del or is_refskip is set.
            print('\tbase in read %s = %s' %
                  (pileupread.alignment.query_name,
                   pileupread.alignment.query_sequence[pileupread.query_position]))

samfile.close()

The above code outputs:

coverage at base 99 = 1
    base in read EAS56_57:6:190:289:82 = A

coverage at base 100 = 1
    base in read EAS56_57:6:190:289:82 = G

coverage at base 101 = 1
    base in read EAS56_57:6:190:289:82 = G

coverage at base 102 = 2
    base in read EAS56_57:6:190:289:82 = G
    base in read EAS51_64:3:190:727:308 = G
...

Commands available in `samtools`_ are available as simple function calls. For example:

pysam.sort("-o", "output.bam", "ex1.bam")

corresponds to the command line:

samtools sort -o output.bam ex1.bam

Analogous to :class:`~pysam.AlignmentFile`, a :class:`~pysam.TabixFile` allows fast random access to compressed and tabix indexed tab-separated file formats with genomic data:

import pysam
tabixfile = pysam.TabixFile("example.gtf.gz")

for gtf in tabixfile.fetch("chr1", 1000, 2000):
    print(gtf.contig, gtf.start, gtf.end, gtf.gene_id)

:class:`~pysam.TabixFile` implements lazy parsing in order to iterate over large tables efficiently.

More detailed usage instructions is at :ref:`usage`.

Note

Coordinates in pysam are always 0-based (following the python convention). SAM text files use 1-based coordinates.

Note

The above examples can be run in the :file:`tests` directory of the
installation directory. Type 'make' before running them.
.. seealso::

   https://github.com/pysam-developers/pysam

       The pysam code repository, containing source code and download
       instructions

   http://pysam.readthedocs.org/en/latest/

       The pysam website containing documentation

API

SAM/BAM/CRAM files

Objects of type :class:`~pysam.AlignmentFile` allow working with BAM/SAM formatted files.

.. autoclass:: pysam.AlignmentFile
   :members:

.. autoclass:: pysam.AlignmentHeader
   :members:

An :class:`~pysam.AlignedSegment` represents an aligned segment within a SAM/BAM file.

.. autoclass:: pysam.AlignedSegment
   :members:

.. autoclass:: pysam.PileupColumn
   :members:

.. autoclass:: pysam.PileupRead
   :members:

.. autoclass:: pysam.IndexedReads
   :members:


Tabix files

:class:`~pysam.TabixFile` opens tabular files that have been indexed with tabix_.

.. autoclass:: pysam.TabixFile
   :members:

To iterate over tabix files, use :func:`~pysam.tabix_iterator`:

.. autofunction:: pysam.tabix_iterator

.. autofunction:: pysam.tabix_compress

.. autofunction:: pysam.tabix_index

.. autoclass:: pysam.asTuple
   :members:

.. autoclass:: pysam.asVCF
   :members:

.. autoclass:: pysam.asBed
   :members:

.. autoclass:: pysam.asGTF
   :members:


FASTA files

.. autoclass:: pysam.FastaFile
   :members:

FASTQ files

.. autoclass:: pysam.FastxFile
   :members:


.. autoclass:: pysam.FastqProxy
   :members:


VCF/BCF files

.. autoclass:: pysam.VariantFile
   :members:

.. autoclass:: pysam.VariantHeader
   :members:

.. autoclass:: pysam.VariantRecord
   :members:

.. autoclass:: pysam.VariantHeaderRecord
   :members:

HTSFile

HTSFile is the base class for :class:`pysam.AlignmentFile` and :class:`pysam.VariantFile`.

.. autoclass:: pysam.HTSFile
   :members: