
NESS: embedding-based similarity search tool for biological sequences


NESS

NESS (Neural Sequence Search) is an alignment-free tool for sequence search based on word embeddings and approximate nearest neighbor (ANN) search. The tool is still under development, and the code in this repository is a proof of concept distributed under the GPL v3 license.
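Word-embedding methods need a sequence to be split into "words" first. The sketch below illustrates one common way of doing that for biological sequences. It assumes words are overlapping k-mers with a fixed k; the actual tokenization and k value used by NESS are not specified in this README, and `kmers` is a hypothetical helper.

```python
def kmers(sequence: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers ("words") using a step of 1.

    Hypothetical helper: k=3 and the overlapping scheme are assumptions.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(kmers("MKTAYIAK"))
# ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
```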

Installation

$ pip install ness-search

Try on Google Colab!

Try NESS on this Google Colab notebook.

Usage

Currently, the NESS command-line interface (CLI) provides the following commands:

ness build_model

Creates a Word2Vec model from a multi-FASTA file. For DNA sequences, use --both-strands.

$ ness build_model \
    --input swissprot.fasta \
    --output swissprot.model
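Training a Word2Vec model requires a corpus of tokenized "sentences". As an illustration only, the sketch below reads a multi-FASTA file and turns each sequence into a list of k-mers, adding the reverse complement of each sequence when a `both_strands` option is set (the idea behind --both-strands for DNA). The helper names, k=3, and the one-sentence-per-sequence layout are assumptions, not the NESS implementation.

```python
from typing import Iterator

# DNA complement table; N (any base) maps to itself. Other IUPAC codes are not handled.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_corpus(lines, k=3, both_strands=False) -> list[list[str]]:
    """Return one k-mer "sentence" per sequence (and per reverse complement if requested)."""
    corpus = []
    for _, seq in read_fasta(lines):
        strands = [seq.upper()]
        if both_strands:
            strands.append(reverse_complement(seq))
        for s in strands:
            corpus.append([s[i:i + k] for i in range(len(s) - k + 1)])
    return corpus


fasta = """>seq1
ACGT
AC
>seq2
GGTA
""".splitlines()
print(build_corpus(fasta, k=3, both_strands=True))
```

A list of token lists like this is the input shape accepted by Word2Vec implementations such as gensim's `Word2Vec(sentences=...)`; this README does not say which library NESS uses internally.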

ness build_database

Similar to makeblastdb, formats a sequence database, computing a vector for each sequence with a previously built model. For DNA sequences, use --both-strands.

$ ness build_database \
    --input swissprot.fasta \
    --model swissprot.model \
    --output swissprot.csv
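One simple way to turn word embeddings into a per-sequence vector is to average the vectors of the sequence's k-mers. The sketch below does that and writes one CSV row per sequence (identifier followed by vector components). The averaging strategy, the CSV column layout, and the toy 2-dimensional `toy_embeddings` dictionary (standing in for a trained model) are all assumptions for illustration; the actual format of the file produced by `ness build_database` is not documented here.

```python
import csv
import io


def sequence_vector(seq, embeddings, k=3):
    """Average the embeddings of the k-mers present in `embeddings`.

    Returns a zero vector if no k-mer is known. Assumes `embeddings` is non-empty.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    n = 0
    for i in range(len(seq) - k + 1):
        vec = embeddings.get(seq[i:i + k])
        if vec is None:
            continue  # skip k-mers the model never saw
        total = [t + v for t, v in zip(total, vec)]
        n += 1
    return [t / n for t in total] if n else total


def write_database(records, embeddings, out, k=3):
    """Write one CSV row per (identifier, sequence): identifier, then vector values."""
    writer = csv.writer(out)
    for seq_id, seq in records:
        writer.writerow([seq_id, *sequence_vector(seq, embeddings, k)])


# Toy 2-dimensional embeddings standing in for a trained model (hypothetical values).
toy_embeddings = {"MKT": [1.0, 0.0], "KTA": [0.0, 1.0], "TAY": [1.0, 1.0]}
buffer = io.StringIO()
write_database([("P1", "MKTAY"), ("P2", "WWWW")], toy_embeddings, buffer)
print(buffer.getvalue())
```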

ness search

Similar to the blast* programs, compares the sequences in a multi-FASTA file against the previously formatted database.

$ ness search \
    --input sequences.fasta \
    --database swissprot \
    --output hits.csv
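Conceptually, a search ranks database vectors by their similarity to a query vector. The sketch below uses exhaustive (exact) cosine-similarity search as a stand-in: NESS is described as using approximate nearest neighbor search, which builds an index to avoid comparing the query against every database entry, trading some exactness for speed. The `search` function, the choice of cosine similarity, and `top_k` are illustrative assumptions.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, database, top_k=2):
    """Return the top_k (identifier, similarity) pairs, most similar first.

    Exhaustive scan over all entries: an exact stand-in for an ANN index.
    """
    scored = ((cosine(query_vec, vec), seq_id) for seq_id, vec in database.items())
    return [(seq_id, score) for score, seq_id in heapq.nlargest(top_k, scored)]


database = {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [0.7, 0.7]}
print(search([1.0, 0.1], database))
```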

Cite

Kremer, FS et al. (2021). NESS: a word embedding-based tool for alignment-free sequence search. Available at: https://github.com/omixlab/ness.

Acknowledgements

NESS was supported by grants from Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS) and is developed in partnership with BiomeHub.
