NESS (Neural Sequence Search) is an alignment-free tool for sequence search based on word embeddings and approximate nearest neighbor (ANN) search. The tool is still under development, and the code in this repository is a proof of concept distributed under the GPL v3 license.
$ pip install ness-search
Try NESS on this Google Colab notebook.
Currently, the NESS CLI provides the following commands:
Creates a Word2Vec model from a multi-FASTA file. For DNA sequences, use --both-strands.
$ ness build_model \
--input swissprot.fasta \
--output swissprot.model
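Word2Vec is trained on "sentences" of "words", so a sequence has to be split into tokens first. The sketch below is not NESS's own code: it assumes the words are overlapping k-mers and that --both-strands adds the reverse complement of each DNA sequence as a second sentence. The function names and the default k=3 are hypothetical and chosen only for illustration.

```python
from typing import List

# Hypothetical helpers for illustration; not part of the NESS API.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers (the "words")."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sentences(seq: str, k: int = 3, both_strands: bool = False) -> List[List[str]]:
    """Build one k-mer sentence per strand (assumed meaning of --both-strands)."""
    result = [kmers(seq, k)]
    if both_strands:
        result.append(kmers(reverse_complement(seq), k))
    return result


if __name__ == "__main__":
    for sentence in sentences("ATGCGT", k=3, both_strands=True):
        print(sentence)
```

Under these assumptions, a DNA sequence contributes two sentences to the training corpus and a protein sequence contributes one.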
Similarly to makeblastdb, formats a sequence database with vectors computed using a previously built model. For DNA sequences, use --both-strands.
$ ness build_database \
--input swissprot.fasta \
--model swissprot.model \
--output swissprot.csv
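One common way to turn a sequence into a single vector is to average the embeddings of its k-mers. Whether NESS does exactly this is not stated here, so treat the following as a sketch under that assumption. The tiny lookup table stands in for a trained Word2Vec model, and all names and values are made up for illustration.

```python
from typing import Dict, List

# Toy 2-dimensional k-mer vectors, invented for illustration only.
# In practice these would come from the trained model.
TOY_VECTORS: Dict[str, List[float]] = {
    "MKT": [1.0, 0.0],
    "KTA": [0.0, 1.0],
    "TAY": [1.0, 1.0],
}


def kmers(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sequence_vector(seq: str, vectors: Dict[str, List[float]], k: int = 3) -> List[float]:
    """Average the vectors of all k-mers found in the vocabulary (assumed pooling)."""
    known = [vectors[word] for word in kmers(seq, k) if word in vectors]
    if not known:
        raise ValueError("no k-mer of the sequence is in the vocabulary")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    print(sequence_vector("MKTAY", TOY_VECTORS))
```

Each database entry then reduces to a sequence identifier plus a fixed-length vector, which is the kind of record a tabular file such as swissprot.csv could hold.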
Similarly to the blast* programs, compares a multi-FASTA file with the previously formatted database.
$ ness search \
--input sequences.fasta \
--database swissprot \
--output hits.csv
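Conceptually, a search ranks database vectors by their similarity to each query vector. The sketch below uses an exact, brute-force cosine-similarity ranking as a stand-in for the approximate nearest neighbor index that NESS relies on. The toy database, the names, and the choice of cosine similarity are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_hits(query: List[float], database: Dict[str, List[float]], n: int = 2) -> List[Tuple[str, float]]:
    """Return the n database entries most similar to the query (exact search)."""
    scored = [(name, cosine(query, vec)) for name, vec in database.items()]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)[:n]


# Toy database of sequence vectors, invented for illustration only.
DATABASE: Dict[str, List[float]] = {
    "seq_a": [1.0, 0.0],
    "seq_b": [0.7, 0.7],
    "seq_c": [0.0, 1.0],
}

if __name__ == "__main__":
    for name, score in top_hits([0.9, 0.1], DATABASE):
        print(name, round(score, 3))
```

An exact scan like this costs time proportional to the database size for every query. ANN indexes trade a small loss in recall for much faster lookups, which is what makes this approach practical for large databases.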
Kremer, FS et al. (2021). NESS: an word embedding-based tool for alignment-free sequence search. Available at: https://github.com/omixlab/ness.
NESS was supported by grants from Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS) and is developed in partnership with BiomeHub.