Skip to content

Latest commit

 

History

History
60 lines (43 loc) · 2.59 KB

CODE_EXAMPLES.rst

File metadata and controls

60 lines (43 loc) · 2.59 KB

Code Examples

In addition to command-line utilities, TRTools has a Python library which can be imported in custom scripts.

TR Utilities

The module :pytrtools.utils.utils contains various helpful functions, e.g. for inferring a repeat sequence motif from a string, converting repeat motifs to canonical representations, or computing functions like heterozygosity from allele frequency distributions:

import trtools.utils.utils as trutils  
trutils.InferRepeatSequence('ATATATATATATA', 2) # returns 'AT'
trutils.GetCanonicalMotif('CAG') # returns 'ACG'
afreqs = {10:0.25, 11:0.5, 12: 0.25} # frequency of each length
trutils.GetHeterozygosity(afreqs) # returns 0.625

See here <trtools.utils.utils> for a complete list of utility functions.

TR Harmonization

Note: the interfaces provided by this library are still under active development. While their rough shape will likely stay the same, particular details are likely to change in future releases.

The module trtools.utils.tr_harmonizer is responsible for providing a genotyper agnostic view of a VCF containing TR records. The class that provides this functionality is :pytrtools.utils.tr_harmonizer.TRRecord. There are two coding paradigms for accessing this API. If you just want to iterate through the TRRecords in a vcf, use the TRRecordHarmonizer:

import cyvcf2
import trtools.utils.tr_harmonizer as trh

invcf = cyvcf2.VCF("path/to/my.vcf")
harmonizer = trh.TRRecordHarmonizer(invcf)
for trrecord in harmonizer:
      # do something with the trrecord 
      # here, we print out an array of length genotypes
      # containing entries for each haplotype of each sample
      print(trrecord.GetLengthGenotypes())

If you want to work with the cyvcf2 VCF object yourself and then later convert it to a TRRecord, use the module method HarmonizeRecord:

import cyvcf2
import trtools.utils.tr_harmonizer as trh

invcf = cyvcf2.VCF("path/to/my.vcf")
#make sure to grab the vcf's filetype
vcftype = trh.InferVCFType(invcf)
for record in invcf:
  # do some filtering on the record
  passesFilters = filter(record)

  # later on, convert the record to a trrecord
  if not passesFilters:
     continue

  trrecord = trh.HarmonizeRecord(vcftype, record)
  # do something with the tr record
  # here, we print out an array of string genotypes
  # containing entries for each haplotype of each sample
  print(trrecord.GetStringGenotypes())

See :pytrtools.utils.tr_harmonizer for a complete list of all functions used for inspecting VCFs and TRRecords.