Natix is an opensource (Apache 2.0) library with state of the art methods in the following areas:
- Similarity and metric search
- Exact indexes
- Approximate indexes
- Compressed (and compact) data-structures
- Indexed bitmaps
- Indexed sequences
- Indexed permutations
- Sorted and unsorted lists of integers
- Integer encoders
- Fulltext indexes
- Inverted lists
- Intersection algorithms
- T-Threshold algorithms
- Union-Intersection algorithms
- Searching and sorting algorithms and structures
- Numeric sorting
- Comparison based algorithms - sorting and searching
- Adaptive searching/sorting with SkipLists
Natix is designed for experimentation and testing of algorithms. However, it should be robust enough to be of use in application development.
Natix has few dependencies
- A running Mono/.NET environment. We use the mono framework in Linux.
- JSon.NET from Newtonsoft
- (Optional) An IDE to modify the C# project Monodevelop / XamarinStudio / #Develop / VisualStudio. We use monodevelop in Linux.
New: We added a Makefile that compile libraries and programs. Moreover, it fetches dependencies using nuget.exe Using this Makefile is also of use for environments without an IDE
At the main directory execute
make all
It will fetch nuget.exe
from NuGet and then get Json from Newtonsoft (using nuget.exe). After that, it will build the natix libraries and programs:
natix.dll
andnatix.SimilaritySearch.dll
ExactIndexes.exe
andApproxIndexes.exe
You can copy all *.dll
and *.exe
to your working directory or work directly in the current path
You can write you own programs and scripts (e.g., using IronPython or any other programming language for the CLR).
For similarity search, natix.SimilaritySearch
, we provide two
programs ApproxIndexes.exe
and ExactIndexes
, for aproximate and
exact indexes, with a bunch of indexes to be tested with a number of
parameters.
Both can be used for benchmarking purposes or examples of how to create these
indexes. Of course, you need to check the available options.
A complete example is provided
exp-approx.py
and exp-exact.py
, two Python scripts,
those systematically perform a benchmarking.
Also, for summarize the output of ApproxIndexes.exe
and ExactIndexes.exe
we provide the summary.py
script.
All these tools are not intended to be exhaustive for the available indexes, and should be adjusted as need.
We strongly recommend to use standard databases for testing purposes. We also provide some fixed queries.
- SISAP project
- Histogram of colors
- Hard queries - nearest neighbors are close to mean
- Easy queries - following dataset distribution
- Dictionaries
- English
- Spanish
- Nasa images
- Queries
- Histogram of colors
- Wiktionary
- CoPhIR project
- Synthetic datasets. Randomly generated datasets in the unitary cube, from a wide range of dimensions and sizes. We provide a python script for this purpose. New synthetic queries can be created with this tool.
- Texmex datasets. The one billion dataset. We provide a python script to decode binary files.