kvector

What is `kvector`?

kvector is a small utility for converting motifs to k-mer vectors, so that motifs of different lengths can be compared.
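To make the idea concrete, here is a minimal, standard-library-only sketch of why k-mer vectors let you compare sequences of different lengths. It is not kvector's own code. Once each sequence is turned into counts over the same fixed list of k-mers, the vectors have identical length and can be compared with, for example, cosine similarity. The helper names `kmer_vector` and `cosine` are hypothetical, and plain sequences stand in for motifs.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_vector(seq, k, residues="ACGT"):
    """Count each k-mer of `seq`, reported in a fixed order over all k-mers of `residues`."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(residues, repeat=k)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Two sequences of different lengths map to vectors of the same length (4**3 = 64)
short = kmer_vector("TGACTCA", 3)
long_ = kmer_vector("ATGACTCATC", 3)
print(len(short), len(long_))           # 64 64
print(round(cosine(short, long_), 3))   # 0.791
```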

Free software: BSD license
Documentation: https://olgabot.github.io/kvector

Installation

To install, clone this GitHub repository and install it with pip:

git clone https://github.com/olgabot/kvector.git
cd kvector
pip install .  # The "." means "install *this*, the folder where I am now"

Features

Check out the `overview.ipynb` notebook for an overview of the features with both inputs and outputs (the examples below show only inputs).

Count k-mers for each line in a `bed` file (multithreaded)

For each interval in a bed file, count its k-mers and return an (n_intervals, n_kmers) matrix of k-mer counts, one row per region.

import kvector

# bedfile, genome_fasta and threads are your own inputs (paths and thread count)
kmers = kvector.per_interval_kmers(bedfile, genome_fasta, threads=threads,
    kmer_lengths=(4, 5, 6), residues='ACGT')
csv = bedfile.replace('.bed', '_kmers.csv')
kmers.to_csv(csv)
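The counting step can be sketched in plain Python as below. This is an illustration under stated assumptions, not kvector's implementation: it assumes the interval sequences have already been extracted from `genome_fasta`, and `all_kmers` and `kmer_matrix` are hypothetical helpers. The columns are every k-mer of each requested length, so `kmer_lengths=(4, 5, 6)` gives 4^4 + 4^5 + 4^6 = 5,376 columns.

```python
from collections import Counter
from itertools import product


def all_kmers(kmer_lengths, residues="ACGT"):
    """Column order: every k-mer of each requested length, in the order given."""
    return ["".join(p) for k in kmer_lengths for p in product(residues, repeat=k)]


def kmer_matrix(sequences, kmer_lengths=(4, 5, 6), residues="ACGT"):
    """Return (columns, rows) with one row of k-mer counts per interval sequence."""
    columns = all_kmers(kmer_lengths, residues)
    rows = []
    for seq in sequences:
        seq = seq.upper()
        # k-mers of different lengths never collide, so one Counter holds them all
        counts = Counter(
            seq[i:i + k] for k in kmer_lengths for i in range(len(seq) - k + 1)
        )
        rows.append([counts[kmer] for kmer in columns])
    return columns, rows


# Hypothetical interval sequences standing in for regions pulled from genome_fasta
seqs = ["ACGTACGTAC", "TTTTGGGG"]
columns, rows = kmer_matrix(seqs)
print(len(columns))             # 5376
print(len(rows), len(rows[0]))  # 2 5376
```

Each interval is counted independently of the others, which is what makes this step straightforward to spread across threads or processes.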

Count k-mers for each line in a `bed` file, intersected (multithreaded)

For each interval in a bed file, intersect it with another bed file, `other` (e.g. only the conserved regions of introns), and count k-mers in the intersected region. Returns an (n_intervals, n_kmers) matrix of k-mer counts for each line in the bed file, intersected with `other`.

kmers = kvector.per_interval_kmers(bedfile, genome_fasta, other, threads=threads,
    kmer_lengths=(4, 5, 6))
csv = bedfile.replace('.bed', '_kmers.csv')
kmers.to_csv(csv)
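The intersection step amounts to clipping each interval to its overlaps with `other`. A minimal sketch, assuming BED's 0-based, half-open coordinates and assuming `other` has already been merged so that its own intervals do not overlap (otherwise some bases would be counted twice). `intersect` is a hypothetical helper, not part of kvector.

```python
def intersect(interval, others):
    """Clip one BED-style interval (chrom, start, end) to the parts overlapping `others`.

    BED coordinates are 0-based and half-open, so two intervals overlap
    exactly when max(starts) < min(ends).
    """
    chrom, start, end = interval
    pieces = []
    for o_chrom, o_start, o_end in others:
        if o_chrom != chrom:
            continue
        lo, hi = max(start, o_start), min(end, o_end)
        if lo < hi:
            pieces.append((chrom, lo, hi))
    return sorted(pieces)


intron = ("chr1", 100, 200)
conserved = [("chr1", 50, 120), ("chr1", 150, 160), ("chr1", 300, 400), ("chr2", 100, 200)]
print(intersect(intron, conserved))
# [('chr1', 100, 120), ('chr1', 150, 160)]
```

If k-mers are counted in each clipped piece separately and then summed, no k-mer spans the gap between two pieces; concatenating the pieces first would create such junction k-mers.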

Count all k-mers in a fasta file

kmer_vector = kvector.count_kmers('kvector/tests/data/example.fasta', kmer_lengths=(3, 4))
kmer_vector.head()
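Assuming the counts are pooled across all records in the file, counting k-mers in a FASTA file can be sketched with a small parser and `collections.Counter`. This is not kvector's code: `read_fasta` and `count_fasta_kmers` are hypothetical helpers, and k-mers are counted within each record so that none span two records.

```python
import io
from collections import Counter


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())  # sequence lines may be wrapped
    if name is not None:
        yield name, "".join(chunks)


def count_fasta_kmers(handle, kmer_lengths=(3, 4)):
    """Sum k-mer counts over every record; k-mers never span two records."""
    counts = Counter()
    for _, seq in read_fasta(handle):
        for k in kmer_lengths:
            counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts


fasta = io.StringIO(">seq1\nACGTAC\nGT\n>seq2\nTTTT\n")
counts = count_fasta_kmers(fasta)
print(counts["ACG"], counts["TTT"])  # 2 2
```

`counts.most_common(5)` gives a quick look at the most frequent k-mers.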

Read HOMER motif file

motifs = kvector.read_motifs("kvector/tests/example.motifs", residues='ACGT')

The output is a pandas Series indexed by the motif IDs from the file, where each value is a DataFrame holding that motif's position-weight matrix (PWM).
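For reference, a HOMER motif file lists each motif as a `>` header line with tab-separated fields (consensus sequence, motif name, log-odds detection threshold, and optional statistics) followed by one line per position giving the probabilities of A, C, G and T. The sketch below parses that layout into a plain dict rather than a pandas Series of DataFrames. It assumes the second header field is the motif ID, which may differ from how kvector chooses IDs; `read_homer_motifs` is a hypothetical name and the example motif is a toy.

```python
import io


def read_homer_motifs(handle, residues="ACGT"):
    """Parse HOMER-style motifs into {motif_id: [{residue: probability}, ...]}.

    Assumes each motif starts with a '>' header whose second tab-separated
    field is the motif name, followed by one line per position with one
    probability per residue, in the order of `residues`.
    """
    motifs, current = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split("\t")
            current = fields[1] if len(fields) > 1 else fields[0]
            motifs[current] = []
        else:
            probs = [float(x) for x in line.split()]
            motifs[current].append(dict(zip(residues, probs)))
    return motifs


# Toy motif, not taken from a real HOMER library
example = io.StringIO(
    ">TGA\tAP-1(bZIP)/Example/Homer\t6.0\n"
    "0.1\t0.1\t0.1\t0.7\n"
    "0.1\t0.1\t0.7\t0.1\n"
    "0.7\t0.1\t0.1\t0.1\n"
)
motifs = read_homer_motifs(example)
print(len(motifs["AP-1(bZIP)/Example/Homer"]))  # 3
```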

Create metadata matrix from the ID lines of the motifs

metadata = kvector.create_metadata(motifs)
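HOMER's known-motif names often pack several pieces of information into one ID, following a `factor(family)/experiment/database` pattern. The sketch below shows one hypothetical way to split such IDs into metadata columns. The column names, the parsing rules and the helpers `parse_motif_id` and `metadata_rows` are assumptions for illustration and are not necessarily what `kvector.create_metadata` produces.

```python
import re

# "AP-1(bZIP)" -> factor "AP-1", family "bZIP"
FACTOR_FAMILY = re.compile(r"^(?P<factor>[^(]+)\((?P<family>[^)]*)\)$")


def parse_motif_id(motif_id):
    """Split a 'factor(family)/experiment/database' ID into hypothetical columns."""
    parts = motif_id.split("/")
    match = FACTOR_FAMILY.match(parts[0])
    factor, family = (match["factor"], match["family"]) if match else (parts[0], "")
    return {
        "motif_id": motif_id,
        "factor": factor,
        "family": family,
        "experiment": parts[1] if len(parts) > 1 else "",
        "database": parts[2] if len(parts) > 2 else "",
    }


def metadata_rows(motif_ids):
    """One metadata dict per motif ID; missing fields are left as empty strings."""
    return [parse_motif_id(m) for m in motif_ids]


rows = metadata_rows(["AP-1(bZIP)/Example-ChIP-Seq/Homer", "CTCF"])
print(rows[0]["factor"], rows[0]["family"])  # AP-1 bZIP
```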

Transform the motif PWM to a kmer vector

Keep in mind that the number of possible k-mers grows as 4^k, so on most computers only k-mers up to about length 8 (4^8 = 65,536) can be stored in memory. You may want to run this on a supercomputer rather than on your laptop.
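The arithmetic behind this warning: there are 4^k possible DNA k-mers of length k, so the number of columns grows fourfold with each extra base. A quick sketch with hypothetical helpers, assuming a dense matrix of 8-byte floats:

```python
def kmer_columns(kmer_lengths, alphabet_size=4):
    """Number of distinct k-mers across all requested lengths."""
    return sum(alphabet_size ** k for k in kmer_lengths)


def matrix_bytes(n_motifs, kmer_lengths, bytes_per_value=8):
    """Bytes for a dense (n_motifs, n_kmers) matrix of 8-byte floats."""
    return n_motifs * kmer_columns(kmer_lengths) * bytes_per_value


print(kmer_columns((4, 5, 6)))      # 5376
print(kmer_columns((8,)))           # 65536
print(matrix_bytes(1000, (8,)))     # 524288000 bytes, about 0.5 GB
```

Memory also scales linearly with the number of motifs, and any intermediate arrays built during the transformation come on top of this.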

motif_kmer_vectors = kvector.motifs_to_kmer_vectors(motifs, residues='ACGT',
    kmer_lengths=(4, 5, 6))
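There are several ways to score a k-mer against a PWM. One plausible scheme, shown here only as an assumption and not as kvector's method, scores each k-mer by its highest probability over all ungapped placements inside the motif (the product of the matching PWM entries). Because every motif ends up indexed by the same k-mers, motifs of different lengths become directly comparable vectors. `pwm_kmer_scores` and `motif_to_kmer_vector` are hypothetical names; the PWM is a list of `{residue: probability}` rows, one per position.

```python
from itertools import product
from math import prod


def pwm_kmer_scores(pwm, k, residues="ACGT"):
    """Score every k-mer by its best ungapped placement inside the PWM.

    Returns 0.0 for every k-mer when k is longer than the motif.
    """
    scores = {}
    for letters in product(residues, repeat=k):
        kmer = "".join(letters)
        best = 0.0
        for offset in range(len(pwm) - k + 1):
            p = prod(pwm[offset + i][base] for i, base in enumerate(kmer))
            best = max(best, p)
        scores[kmer] = best
    return scores


def motif_to_kmer_vector(pwm, kmer_lengths=(4, 5, 6), residues="ACGT"):
    """Concatenate the k-mer scores for each requested length into one mapping."""
    vector = {}
    for k in kmer_lengths:
        vector.update(pwm_kmer_scores(pwm, k, residues))
    return vector


# Toy 3-position PWM favouring T, G, A
toy_pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
vector = motif_to_kmer_vector(toy_pwm, kmer_lengths=(2, 4))
print(round(vector["TG"], 2))  # 0.49
print(vector["TGAC"])          # 0.0
```

Under this scheme a k-mer longer than the motif scores 0.0; log-odds scores or summing over placements would be equally reasonable choices.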
