Pysam is a python module that makes it easy to read and manipulate mapped short read sequence data stored in SAM/BAM files. It is a lightweight wrapper of the htslib_ C-API.
This page provides a quick introduction in using pysam followed by the API. See :ref:`usage` for more detailed usage instructions.
To use the module to read a file in BAM format, create a :class:`~pysam.AlignmentFile` object:
import pysam samfile = pysam.AlignmentFile("ex1.bam", "rb")
Once a file is opened you can iterate over all of the read mapping to a specified region using :meth:`~pysam.AlignmentFile.fetch`. Each iteration returns a :class:`~pysam.AlignedSegment` object which represents a single read along with its fields and optional tags:
for read in samfile.fetch('chr1', 100, 120): print(read) samfile.close()
To give:
EAS56_57:6:190:289:82 0 99 <<<7<<<;<<<<<<<<8;;<7;4<;<;;;;;94<; 69 CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAAA 0 192 1 EAS56_57:6:190:289:82 0 99 <<<<<<;<<<<<<<<<<;<<;<<<<;8<6;9;;2; 137 AGGGGTGCAGAGCCGAGTCACGGGGTTGCCAGCAC 73 64 1 EAS51_64:3:190:727:308 0 102 <<<<<<<<<<<<<<<<<<<<<<<<<<<::<<<844 99 GGTGCAGAGCCGAGTCACGGGGTTGCCAGCACAGG 99 18 1 ...
You can also write to a :class:`~pysam.AlignmentFile`:
import pysam samfile = pysam.AlignmentFile("ex1.bam", "rb") pairedreads = pysam.AlignmentFile("allpaired.bam", "wb", template=samfile) for read in samfile.fetch(): if read.is_paired: pairedreads.write(read) pairedreads.close() samfile.close()
An alternative way of accessing the data in a SAM file is by iterating over each base of a specified region using the :meth:`~pysam.AlignmentFile.pileup` method. Each iteration returns a :class:`~pysam.PileupColumn` which represents all the reads in the SAM file that map to a single base in the reference sequence. The list of reads are represented as :class:`~pysam.PileupRead` objects in the :attr:`PileupColumn.pileups <pysam.PileupColumn.pileups>` property:
import pysam samfile = pysam.AlignmentFile("ex1.bam", "rb" ) for pileupcolumn in samfile.pileup("chr1", 100, 120): print("\ncoverage at base %s = %s" % (pileupcolumn.pos, pileupcolumn.n)) for pileupread in pileupcolumn.pileups: if not pileupread.is_del and not pileupread.is_refskip: # query position is None if is_del or is_refskip is set. print('\tbase in read %s = %s' % (pileupread.alignment.query_name, pileupread.alignment.query_sequence[pileupread.query_position])) samfile.close()
The above code outputs:
coverage at base 99 = 1 base in read EAS56_57:6:190:289:82 = A coverage at base 100 = 1 base in read EAS56_57:6:190:289:82 = G coverage at base 101 = 1 base in read EAS56_57:6:190:289:82 = G coverage at base 102 = 2 base in read EAS56_57:6:190:289:82 = G base in read EAS51_64:3:190:727:308 = G ...
Commands available in `samtools`_ are available as simple function calls. For example:
pysam.sort("-o", "output.bam", "ex1.bam")
corresponds to the command line:
samtools sort -o output.bam ex1.bam
Analogous to :class:`~pysam.AlignmentFile`, a :class:`~pysam.TabixFile` allows fast random access to compressed and tabix indexed tab-separated file formats with genomic data:
import pysam tabixfile = pysam.TabixFile("example.gtf.gz") for gtf in tabixfile.fetch("chr1", 1000, 2000): print(gtf.contig, gtf.start, gtf.end, gtf.gene_id)
:class:`~pysam.TabixFile` implements lazy parsing in order to iterate over large tables efficiently.
More detailed usage instructions is at :ref:`usage`.
Note
Coordinates in pysam are always 0-based (following the python convention). SAM text files use 1-based coordinates.
Note
- The above examples can be run in the :file:`tests` directory of the
- installation directory. Type 'make' before running them.
.. seealso:: https://github.com/pysam-developers/pysam The pysam code repository, containing source code and download instructions http://pysam.readthedocs.org/en/latest/ The pysam website containing documentation
Objects of type :class:`~pysam.AlignmentFile` allow working with BAM/SAM formatted files.
.. autoclass:: pysam.AlignmentFile :members:
.. autoclass:: pysam.AlignmentHeader :members:
An :class:`~pysam.AlignedSegment` represents an aligned segment within a SAM/BAM file.
.. autoclass:: pysam.AlignedSegment :members:
.. autoclass:: pysam.PileupColumn :members:
.. autoclass:: pysam.PileupRead :members:
.. autoclass:: pysam.IndexedReads :members:
:class:`~pysam.TabixFile` opens tabular files that have been indexed with tabix_.
.. autoclass:: pysam.TabixFile :members:
To iterate over tabix files, use :func:`~pysam.tabix_iterator`:
.. autofunction:: pysam.tabix_iterator
.. autofunction:: pysam.tabix_compress
.. autofunction:: pysam.tabix_index
.. autoclass:: pysam.asTuple :members:
.. autoclass:: pysam.asVCF :members:
.. autoclass:: pysam.asBed :members:
.. autoclass:: pysam.asGTF :members:
.. autoclass:: pysam.FastaFile :members:
.. autoclass:: pysam.FastxFile :members:
.. autoclass:: pysam.FastqProxy :members:
.. autoclass:: pysam.VariantFile :members:
.. autoclass:: pysam.VariantHeader :members:
.. autoclass:: pysam.VariantRecord :members:
.. autoclass:: pysam.VariantHeaderRecord :members:
HTSFile is the base class for :class:`pysam.AlignmentFile` and :class:`pysam.VariantFile`.
.. autoclass:: pysam.HTSFile :members: