Skip to content

Latest commit

 

History

History
230 lines (154 loc) · 5.74 KB

api.rst

File metadata and controls

230 lines (154 loc) · 5.74 KB

Introduction

Pysam is a python module that makes it easy to read and manipulate mapped short read sequence data stored in SAM/BAM files. It is a lightweight wrapper of the htslib C-API.

This page provides a quick introduction in using pysam followed by the API. See usage for more detailed usage instructions.

To use the module to read a file in BAM format, create a ~pysam.AlignmentFile object:

import pysam
samfile = pysam.AlignmentFile("ex1.bam", "rb")

Once a file is opened you can iterate over all of the read mapping to a specified region using ~pysam.AlignmentFile.fetch. Each iteration returns a ~pysam.AlignedSegment object which represents a single read along with its fields and optional tags:

for read in samfile.fetch('chr1', 100, 120):
    print(read)

samfile.close()

To give:

EAS56_57:6:190:289:82   0   99  <<<7<<<;<<<<<<<<8;;<7;4<;<;;;;;94<; 69  CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAAA 0   192 1
EAS56_57:6:190:289:82   0   99  <<<<<<;<<<<<<<<<<;<<;<<<<;8<6;9;;2; 137 AGGGGTGCAGAGCCGAGTCACGGGGTTGCCAGCAC 73  64  1
EAS51_64:3:190:727:308  0   102 <<<<<<<<<<<<<<<<<<<<<<<<<<<::<<<844 99  GGTGCAGAGCCGAGTCACGGGGTTGCCAGCACAGG 99  18  1
...

You can also write to a ~pysam.AlignmentFile:

import pysam
samfile = pysam.AlignmentFile("ex1.bam", "rb")
pairedreads = pysam.AlignmentFile("allpaired.bam", "wb", template=samfile)
for read in samfile.fetch():
    if read.is_paired:
        pairedreads.write(read)

pairedreads.close()
samfile.close()

An alternative way of accessing the data in a SAM file is by iterating over each base of a specified region using the ~pysam.AlignmentFile.pileup method. Each iteration returns a ~pysam.PileupColumn which represents all the reads in the SAM file that map to a single base in the reference sequence. The list of reads are represented as ~pysam.PileupRead objects in the PileupColumn.pileups <pysam.PileupColumn.pileups> property:

import pysam
samfile = pysam.AlignmentFile("ex1.bam", "rb" )
for pileupcolumn in samfile.pileup("chr1", 100, 120):
    print("\ncoverage at base %s = %s" % (pileupcolumn.pos, pileupcolumn.n))
    for pileupread in pileupcolumn.pileups:
        if not pileupread.is_del and not pileupread.is_refskip:
            # query position is None if is_del or is_refskip is set.
            print('\tbase in read %s = %s' %
                  (pileupread.alignment.query_name,
                   pileupread.alignment.query_sequence[pileupread.query_position]))

samfile.close()

The above code outputs:

coverage at base 99 = 1
    base in read EAS56_57:6:190:289:82 = A

coverage at base 100 = 1
    base in read EAS56_57:6:190:289:82 = G

coverage at base 101 = 1
    base in read EAS56_57:6:190:289:82 = G

coverage at base 102 = 2
    base in read EAS56_57:6:190:289:82 = G
    base in read EAS51_64:3:190:727:308 = G
...

Commands available in samtools are available as simple function calls. For example:

pysam.sort("-o", "output.bam", "ex1.bam")

corresponds to the command line:

samtools sort -o output.bam ex1.bam 

Analogous to ~pysam.AlignmentFile, a ~pysam.TabixFile allows fast random access to compressed and tabix indexed tab-separated file formats with genomic data:

import pysam
tabixfile = pysam.TabixFile("example.gtf.gz")

for gtf in tabixfile.fetch("chr1", 1000, 2000):
    print(gtf.contig, gtf.start, gtf.end, gtf.gene_id)

~pysam.TabixFile implements lazy parsing in order to iterate over large tables efficiently.

More detailed usage instructions is at usage.

Note

Coordinates in pysam are always 0-based (following the python convention). SAM text files use 1-based coordinates.

Note

The above examples can be run in the tests directory of the

installation directory. Type 'make' before running them.

https://github.com/pysam-developers/pysam

The pysam code repository, containing source code and download instructions

http://pysam.readthedocs.org/en/latest/

The pysam website containing documentation

API

SAM/BAM/CRAM files

Objects of type ~pysam.AlignmentFile allow working with BAM/SAM formatted files.

pysam.AlignmentFile

pysam.AlignmentHeader

An ~pysam.AlignedSegment represents an aligned segment within a SAM/BAM file.

pysam.AlignedSegment

pysam.PileupColumn

pysam.PileupRead

pysam.IndexedReads

Tabix files

~pysam.TabixFile opens tabular files that have been indexed with tabix.

pysam.TabixFile

To iterate over tabix files, use ~pysam.tabix_iterator:

pysam.tabix_iterator

pysam.tabix_compress

pysam.tabix_index

pysam.asTuple

pysam.asVCF

pysam.asBed

pysam.asGTF

FASTA files

pysam.FastaFile

FASTQ files

pysam.FastxFile

pysam.FastqProxy

VCF/BCF files

pysam.VariantFile

pysam.VariantHeader

pysam.VariantRecord

pysam.VariantHeaderRecord

HTSFile

HTSFile is the base class for pysam.AlignmentFile and pysam.VariantFile.

pysam.HTSFile