kvector

What is kvector?

kvector is a small utility that converts motifs to k-mer vectors so that motifs of different lengths can be compared.

Installation

To install, clone this GitHub repository and install it with pip:

git clone https://github.com/olgabot/kvector.git
cd kvector
pip install .  # The "." means "install *this*, the folder where I am now"

Features

Check out this notebook for an overview of the features with both inputs and outputs (the examples below show only the inputs).

Count k-mers for each line in a bed file (multithreaded)

For each interval in a bed file, count the k-mers and return an (n_intervals, n_kmers) matrix of k-mer counts, one row per interval.

import kvector

# bedfile, genome_fasta, and threads are assumed to be defined already
kmers = kvector.per_interval_kmers(bedfile, genome_fasta, threads=threads,
    kmer_lengths=(4, 5, 6), residues='ACGT')
csv = bedfile.replace('.bed', '_kmers.csv')
kmers.to_csv(csv)
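To make the shape of that matrix concrete, here is a minimal pure-Python sketch of building one row of overlapping k-mer counts per sequence. It is not kvector's implementation: the helper names (`all_kmers`, `count_kmers_in_sequence`, `kmer_count_matrix`) are hypothetical, and it assumes the sequence for each interval has already been extracted from the genome. Skipping windows that contain characters outside `residues` is also an assumption.

```python
from collections import Counter
from itertools import product


def all_kmers(kmer_lengths=(4, 5, 6), residues="ACGT"):
    """Enumerate every possible k-mer for the given lengths, in a fixed order."""
    return ["".join(p) for k in kmer_lengths for p in product(residues, repeat=k)]


def count_kmers_in_sequence(sequence, kmer_lengths=(4, 5, 6), residues="ACGT"):
    """Count overlapping k-mers, skipping windows with characters outside `residues`."""
    sequence = sequence.upper()
    allowed = set(residues)
    counts = Counter()
    for k in kmer_lengths:
        for start in range(len(sequence) - k + 1):
            kmer = sequence[start:start + k]
            if set(kmer) <= allowed:
                counts[kmer] += 1
    return counts


def kmer_count_matrix(sequences, kmer_lengths=(4, 5, 6), residues="ACGT"):
    """Return (columns, rows) with one row of counts per input sequence."""
    columns = all_kmers(kmer_lengths, residues)
    rows = []
    for sequence in sequences:
        counts = count_kmers_in_sequence(sequence, kmer_lengths, residues)
        rows.append([counts[kmer] for kmer in columns])
    return columns, rows


# Two toy "interval" sequences, counted with k = 2 only to keep the matrix small
columns, rows = kmer_count_matrix(["ACGTAC", "GGGN"], kmer_lengths=(2,))
```

Here `rows` has one entry per sequence and `columns` has 4^2 = 16 entries, which matches the (n_intervals, n_kmers) layout described above.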

Count k-mers for each line in a bed file, intersected (multithreaded)

For each interval in a bed file, intersect it with a second (other) bed file (e.g. one containing only the conserved regions of introns) and count the k-mers in the intersected region. Returns an (n_intervals, n_kmers) matrix of k-mer counts, one row per line of the first bed file after intersection with the other bed file.

kmers = kvector.per_interval_kmers(bedfile, genome_fasta, other, threads=threads,
    kmer_lengths=(4, 5, 6))
csv = bedfile.replace('.bed', '_kmers.csv')
kmers.to_csv(csv)
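The intersection step can be pictured with a small sketch. This is only an illustration under stated assumptions, not how kvector performs the intersection: `intersect_interval` is a hypothetical helper, intervals are plain `(chrom, start, end)` tuples in BED's 0-based, half-open coordinates, and the intervals in the other file are assumed not to overlap each other.

```python
def intersect_interval(interval, others):
    """Return the pieces of `interval` that overlap any interval in `others`.

    Intervals are (chrom, start, end) tuples in 0-based, half-open coordinates.
    """
    chrom, start, end = interval
    pieces = []
    for other_chrom, other_start, other_end in others:
        if other_chrom != chrom:
            continue
        overlap_start = max(start, other_start)
        overlap_end = min(end, other_end)
        # Half-open coordinates: intervals that merely touch do not overlap
        if overlap_start < overlap_end:
            pieces.append((chrom, overlap_start, overlap_end))
    return sorted(pieces)


intron = ("chr1", 100, 200)
conserved = [
    ("chr1", 90, 120),
    ("chr1", 150, 160),
    ("chr1", 200, 250),  # starts where the intron ends, so no overlap
    ("chr2", 100, 200),  # different chromosome
]
pieces = intersect_interval(intron, conserved)
```

Only the sequence under the returned pieces would then be passed on to k-mer counting.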

Count all k-mers in a fasta file

kmer_vector = kvector.count_kmers('kvector/tests/data/example.fasta', kmer_lengths=(3, 4))
kmer_vector.head()
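For readers unfamiliar with the underlying operation, the sketch below reads FASTA records and sums overlapping k-mer counts over the whole file. It is a stand-in written with the standard library only. The function names are hypothetical, and whether kvector reports totals for the file or counts per record is not stated here, so treat the aggregation as an assumption.

```python
from collections import Counter
import io


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA file handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            record_id, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def count_kmers_in_fasta(handle, kmer_lengths=(3, 4)):
    """Sum overlapping k-mer counts over every record; k-mers never span records."""
    totals = Counter()
    for _, sequence in read_fasta(handle):
        sequence = sequence.upper()
        for k in kmer_lengths:
            for start in range(len(sequence) - k + 1):
                totals[sequence[start:start + k]] += 1
    return totals


# In-memory FASTA with a sequence split over two lines, as real files often are
example = io.StringIO(">seq1\nACGT\nAC\n>seq2\nACG\n")
totals = count_kmers_in_fasta(example, kmer_lengths=(3,))
```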

Read HOMER motif file

motifs = kvector.read_motifs("kvector/tests/example.motifs", residues='ACGT')

The output is a pandas Series indexed by the motif IDs from the file. Each value is a DataFrame containing that motif's position weight matrix (PWM).

Create metadata matrix from the ID lines of the motifs

metadata = kvector.create_metadata(motifs)
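As a rough picture of what an ID line holds, the sketch below splits one HOMER motif header into labeled fields. It assumes the tab-separated layout described in the HOMER documentation (consensus sequence, motif name, log odds detection threshold, log P-value of enrichment). The function name and the dictionary keys are hypothetical labels, and the fields that `create_metadata` actually extracts may differ.

```python
def parse_homer_id_line(line):
    """Split one HOMER motif header line (starting with '>') into labeled fields."""
    if not line.startswith(">"):
        raise ValueError("HOMER motif header lines start with '>'")
    fields = line[1:].rstrip("\n").split("\t")
    metadata = {"consensus": fields[0]}
    # Later columns are optional, so only read the ones that are present
    if len(fields) > 1:
        metadata["name"] = fields[1]
    if len(fields) > 2:
        metadata["log_odds_threshold"] = float(fields[2])
    if len(fields) > 3:
        metadata["log_p_value"] = float(fields[3])
    return metadata


# Made-up header line for illustration, not taken from the kvector test data
header = ">ACGTAC\texample_motif\t6.5\t-12.0"
metadata = parse_homer_id_line(header)
```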

Transform the motif PWM to a kmer vector

Keep in mind that the number of possible k-mers grows as 4^k for a four-letter alphabet, so on most computers only k-mers up to about length 8 (4^8 = 65,536 k-mers per motif) fit in memory. You may want to run this on a supercomputer rather than a laptop.

motif_kmer_vectors = kvector.motifs_to_kmer_vectors(motifs, residues='ACGT',
    kmer_lengths=(4, 5, 6))
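To illustrate why a k-mer vector makes motifs of different lengths comparable, here is one plausible scoring scheme. It is an assumption for illustration, not necessarily the one kvector uses: every k-mer is scored by the best ungapped alignment against the PWM, multiplying the per-position probabilities. The function name `pwm_to_kmer_vector` and the list-of-dicts PWM layout are hypothetical.

```python
from itertools import product


def pwm_to_kmer_vector(pwm, k, residues="ACGT"):
    """Score every k-mer against a PWM using its best ungapped alignment.

    `pwm` is a list of dicts, one per motif position, mapping residue -> probability.
    """
    vector = {}
    n_overlap = min(k, len(pwm))
    for letters in product(residues, repeat=k):
        kmer = "".join(letters)
        best = 0.0
        # Slide the shorter of the two (k-mer or motif) along the longer one
        for kmer_offset in range(k - n_overlap + 1):
            for pwm_offset in range(len(pwm) - n_overlap + 1):
                score = 1.0
                for i in range(n_overlap):
                    score *= pwm[pwm_offset + i][kmer[kmer_offset + i]]
                best = max(best, score)
        vector[kmer] = best
    return vector


# Toy 3-position motif: A, then C or G with equal probability, then T
toy_pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.5, "G": 0.5, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
]
vector_k2 = pwm_to_kmer_vector(toy_pwm, k=2)  # k-mer shorter than the motif
vector_k4 = pwm_to_kmer_vector(toy_pwm, k=4)  # k-mer longer than the motif
```

Whatever the motif length, each vector has exactly 4^k entries (16 and 256 here), so two motifs can be compared entry by entry. The same growth is what drives the memory limit mentioned above.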
