km : a software for RNA-seq investigation using k-mer decomposition




This tool was developed to identify and quantify the occurrence of single nucleotide variants, insertions, deletions and duplications in RNA-seq data. Contrary to most tools, which try to report all variants across a complete genome, km focuses the analysis on small regions of interest.

Given a reference sequence (typically a few hundred base pairs) around a known or suspected mutation in a gene of interest, km reports all possible sequences that can be constructed between the two end k-mers according to the sequenced reads. For each constructed sequence, a ratio of variant allele versus wild type (WT) is computed.
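To make the idea concrete, here is a minimal Python sketch of k-mer decomposition and of a variant-versus-WT ratio. It is not km's implementation: the helper names (`kmers`, `count_kmers`, `min_coverage`, `variant_ratio`) are hypothetical, a plain `Counter` stands in for a Jellyfish count table, and the ratio definition (minimum k-mer count along each path, normalised by the sum) is an assumption chosen for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return the overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count every k-mer seen in the reads (toy stand-in for a Jellyfish table)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def min_coverage(seq, counts, k):
    """Lowest count among the k-mers of seq (0 if any k-mer is absent)."""
    return min(counts[kmer] for kmer in kmers(seq, k))


def variant_ratio(reference, variant, counts, k):
    """Share of the support that goes to the variant path (assumed definition)."""
    ref_cov = min_coverage(reference, counts, k)
    var_cov = min_coverage(variant, counts, k)
    total = ref_cov + var_cov
    return float(var_cov) / total if total else 0.0


# Toy data: three reads carry the reference, one carries a single-base change.
reference = "ACGTTGCAT"
variant = "ACGTAGCAT"
reads = [reference] * 3 + [variant]
counts = count_kmers(reads, 4)
print(variant_ratio(reference, variant, counts, 4))  # 0.25
```

Both sequences share the same first and last k-mer (`ACGT` and `GCAT`), which mirrors the "two end k-mers" described above; only the k-mers overlapping the changed base differ between the two paths.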


Easy install: this installs Jellyfish with Python bindings and km in a virtual environment, then tests the installation. Without modification, all source code is downloaded to your $HOME/software directory and all executables are available in the virtual environment directory: $HOME/.virtualenvs/km.



  • Copy/paste each line into a terminal.
  • The virtual environment needs to be loaded each time you open a new terminal, with this command:
$ source $HOME/.virtualenvs/km/bin/activate

Setup install:

If you don't use the easy install, you need to install Jellyfish with Python bindings before using km.


  • Python 2.7.6 or later
  • Jellyfish 2.2 or later with Python bindings.
  • (Optional) Matplotlib
$ python setup.py install
$ km -h
$ km find_mutation ./data/catalog/GRCh38/NPM1_4ins_exons_10-11utr.fa ./data/jf/02H025_NPM1.jf | km find_report -t ./data/catalog/GRCh38/NPM1_4ins_exons_10-11utr.fa

Without install:

km can be executed directly from source code.


  • Python 2.7.6 or later
  • Jellyfish 2.2 or later with Python bindings.
  • (Optional) Matplotlib
$ python -m km find_mutation ./data/catalog/GRCh38/NPM1_4ins_exons_10-11utr.fa ./data/jf/02H025_NPM1.jf | python -m km find_report -t ./data/catalog/GRCh38/NPM1_4ins_exons_10-11utr.fa

Display help:

From source:

$ cd [your_km_folder]
$ python -m km -h

After install:

$ km -h

Design your target sequence:

(Coming soon)

km's tools overview:

For more detailed documentation click here


find_mutation is the main tool of km. It identifies and quantifies mutations from a target sequence and a Jellyfish k-mer database.

$ km find_mutation -h
$ km find_mutation [your_fasta_targetSeq] [your_jellyfish_count_table]
$ km find_mutation [your_catalog_directory] [your_jellyfish_count_table]


find_report parses the find_mutation output and reformats it into a more user-friendly tabulated file.

$ km find_report -h
$ km find_report -t [your_fasta_targetSeq] [find_mutation_output]
$ km find_mutation [your_fasta_targetSeq] [your_jellyfish_count_table] | km find_report -t [your_fasta_targetSeq]
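If you prefer to drive the same pipeline from Python, a sketch using the standard `subprocess` module could look like the following. It only calls the two commands shown above; the function names `build_commands` and `run_pipeline` are hypothetical, and the sketch assumes `km` is on your `PATH` (for example, after activating the virtual environment).

```python
import subprocess


def build_commands(target_fasta, jf_table):
    """Return the argv lists for `km find_mutation` and `km find_report`."""
    find_mutation = ["km", "find_mutation", target_fasta, jf_table]
    find_report = ["km", "find_report", "-t", target_fasta]
    return find_mutation, find_report


def run_pipeline(target_fasta, jf_table):
    """Run `km find_mutation ... | km find_report -t ...` and return the report text."""
    find_cmd, report_cmd = build_commands(target_fasta, jf_table)
    finder = subprocess.Popen(find_cmd, stdout=subprocess.PIPE)
    reporter = subprocess.Popen(
        report_cmd,
        stdin=finder.stdout,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # Close our copy of the pipe so find_mutation sees a broken pipe
    # if find_report exits early.
    finder.stdout.close()
    report, _ = reporter.communicate()
    finder.wait()
    if finder.returncode != 0 or reporter.returncode != 0:
        raise RuntimeError("km pipeline failed")
    return report
```

Calling `run_pipeline("[your_fasta_targetSeq]", "[your_jellyfish_count_table]")` with real paths should return the same text the shell pipeline prints.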


min_cov displays k-mer coverage statistics for a target sequence across a list of Jellyfish databases.

$ km min_cov -h
$ km min_cov [your_fasta_targetSeq] [[your_jellyfish_count_table]...]
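As a rough illustration of what coverage statistics over a target mean, the sketch below looks up every k-mer of a target in several count tables and summarises the counts. This is not km's code: plain dictionaries stand in for Jellyfish databases, `coverage_stats` is a hypothetical name, and the choice of min, max and mean is an assumption, since the statistics min_cov actually reports are not listed here.

```python
def kmers(seq, k):
    """Return the overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def coverage_stats(target, tables, k):
    """Summarise the k-mer counts of target in each table (dict: k-mer -> count)."""
    stats = {}
    for name, counts in tables.items():
        # A k-mer missing from a table has coverage 0.
        cov = [counts.get(kmer, 0) for kmer in kmers(target, k)]
        stats[name] = {
            "min": min(cov),
            "max": max(cov),
            "mean": float(sum(cov)) / len(cov),
        }
    return stats


# Toy tables for two samples; sample_b lacks the k-mer CGT entirely.
tables = {
    "sample_a": {"ACG": 5, "CGT": 4, "GTA": 6},
    "sample_b": {"ACG": 2, "GTA": 1},
}
stats = coverage_stats("ACGTA", tables, 3)
print(stats["sample_a"])
print(stats["sample_b"])
```

A minimum of 0, as for `sample_b` here, means at least one k-mer of the target is absent from that sample, so the reference path cannot be followed end to end in it.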


Length of k-mers is a central parameter:

  • To produce a linear directed graph from the target sequence.
  • To avoid false positives: find_mutation should not be used on Jellyfish count tables built with k < 21 bp (we recommend k = 31 bp, the default).

The linear_kmin tool is designed to give you the minimum k length that allows a target sequence to be decomposed into a linear graph.

$ km linear_kmin -h
$ km linear_kmin [your_catalog_directory]
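The idea behind a minimum linear k can be sketched in a few lines of Python. The sketch makes a simplifying assumption that may differ from km's actual criterion: a decomposition is treated as linear when no k-mer occurs more than once in the target, so the path through the sequence never revisits a node. The names `is_linear` and `min_linear_k` are hypothetical.

```python
def is_linear(seq, k):
    """True if every k-mer of seq occurs exactly once (assumed linearity test)."""
    kmer_list = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return len(set(kmer_list)) == len(kmer_list)


def min_linear_k(seq, start=1):
    """Smallest k >= start for which seq passes is_linear, or None if there is none."""
    for k in range(start, len(seq) + 1):
        if is_linear(seq, k):
            return k
    return None


# ACG occurs twice in this toy target, so k = 3 still loops; k = 4 does not.
print(min_linear_k("ACGTACGA"))  # 4
```

Targets containing repeats need a larger k before the repeat is spanned by a single k-mer, which is why a target with low-complexity or duplicated regions may require a k above the usual default.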

Running km on a real sample from scratch:

In the example folder you can find a script to help you to run your first km analysis on a Leucegene sample.
