⍼ Resin.Search

NuGet version (Resin.Search)

Overview | How to install | User guide

HTTP search engine/embedded library

Launch a Resin HTTP server or use the Resin search library to search through any vector space. With hardware-accelerated vector operations from MathNet, Resin is especially well suited to problems that can be expressed as vector spaces.

Vector spaces are configured by implementing IModel.
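
As a rough illustration of what a model contributes, the sketch below turns a string into one bag-of-characters vector per word (the roadmap lists a bag-of-characters language model as v0.1a). It is written in Python for brevity and is not Resin's `IModel` interface: the function names `embed_word` and `tokenize`, the whitespace splitting, and the sparse dictionary representation are all assumptions made for this example.

```python
from collections import Counter


def embed_word(word):
    """Map a word to a sparse bag-of-characters vector.

    Keys are Unicode code points, values are occurrence counts.
    (Hypothetical helper, not part of Resin.)
    """
    return {ord(ch): float(count) for ch, count in Counter(word).items()}


def tokenize(text):
    """Lowercase the text, split on whitespace, and embed each word.

    (Hypothetical stand-in for the tokenization step of a model.)
    """
    return [embed_word(word) for word in text.lower().split()]


vectors = tokenize("Hello world")
```

Under this scheme character order is discarded, so anagrams such as "listen" and "silent" map to the same vector. That is a known property of bag-of-characters representations in general.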

Document database

Resin stores data as document collections. It applies your preferred IModel to your data as you write and query it. The write pipeline produces a set of indices (graphs), one for each document field, that you can interact with through the Resin web GUI, the Resin read/write JSON HTTP API, or programmatically.

Vector-based indices

Resin indices are binary search trees that, as you populate them with your data, cluster vectors that are similar to each other. Graph nodes are created in the Tokenize method of your model. When a node is added to the graph, its cosine angle to the nodes already there, i.e. its similarity to them, determines its position (path) within the graph.
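
The placement rule can be sketched as follows. This is a minimal Python illustration of a binary tree ordered by cosine similarity, not Resin's implementation: the two thresholds (`IDENTICAL_ANGLE`, `FOLD_ANGLE`), their values, the left/right convention, and the `Node` and `insert` names are assumptions chosen for the example.

```python
import math

# Assumed thresholds, for illustration only.
IDENTICAL_ANGLE = 0.999  # at or above this similarity, merge into the existing node
FOLD_ANGLE = 0.55        # above this similarity, descend left; otherwise descend right


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {dimension: value}."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Node:
    def __init__(self, vector, doc_id):
        self.vector = vector
        self.doc_ids = {doc_id}
        self.left = None
        self.right = None


def insert(root, vector, doc_id):
    """Walk from the root, choosing a branch by similarity at each node."""
    node = root
    while True:
        angle = cosine(vector, node.vector)
        if angle >= IDENTICAL_ANGLE:
            # Near-identical vector: record the document on the existing node.
            node.doc_ids.add(doc_id)
            return node
        branch = "left" if angle > FOLD_ANGLE else "right"
        child = getattr(node, branch)
        if child is None:
            child = Node(vector, doc_id)
            setattr(node, branch, child)
            return child
        node = child


root = Node({"x": 1.0}, doc_id=1)
insert(root, {"x": 1.0}, doc_id=2)            # identical to the root: merged
insert(root, {"x": 1.0, "y": 1.0}, doc_id=3)  # similar to the root: left child
insert(root, {"y": 1.0}, doc_id=4)            # orthogonal to the root: right child
```

With this layout, similar vectors end up along the same path, so a query only needs to compare against the nodes on one root-to-leaf walk rather than against every vector in the index.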

Customizable vector spaces

Resin comes pre-loaded with two IModel vector space configurations: one for text and another for MNIST images. The text model has been tested by validating indices generated from Wikipedia search engine backup files as well as by parsing Common Crawl WAT, WET and WARC files, to determine at what scale Resin can operate and with what accuracy.

The image model is included mostly as an example of how to implement your own preferred machine-learning algorithm for building custom-made search indices. The error rate of the image classifier is ~5%.

Performance

Currently, Wikipedia-sized data sets produce indices capable of sub-second phrase searching.

You may also

  • build, validate and optimize indices using the command-line tool Sir.Cmd
  • read efficiently by specifying which fields to return in the JSON result
  • implement messaging formats such as XML (or any other, really) if JSON is not suitable for your use case
  • construct queries that join between fields and even between collections, which you can post as JSON to the read endpoint or create programmatically
  • construct any type of indexing scheme that produces any type of embeddings with virtually any dimensionality using either sparse or dense vectors
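
To illustrate the last point, the sketch below computes the same cosine similarity over a dense and a sparse representation of the same vectors. It is a Python illustration with hypothetical helper names (`cosine_dense`, `cosine_sparse`, `to_sparse`), not Resin's or MathNet's API.

```python
import math


def cosine_dense(a, b):
    """Cosine similarity of two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def to_sparse(dense):
    """Keep only the non-zero components as {index: value}."""
    return {i: value for i, value in enumerate(dense) if value != 0.0}


def cosine_sparse(a, b):
    """Cosine similarity of two {index: value} dictionaries."""
    dot = sum(value * b.get(i, 0.0) for i, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


a = [1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
b = [0.0, 0.0, 3.0, 2.0, 0.0, 0.0]

dense_score = cosine_dense(a, b)
sparse_score = cosine_sparse(to_sparse(a), to_sparse(b))
```

Both representations give the same score. The sparse form only touches non-zero components, which matters when the dimensionality is large and most components are zero, as is typical for text.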

Applications

Executables

  • Sir.HttpServer: HTTP search service with HTML GUI and HTTP JSON API for reading and writing.
  • Sir.Cmd: Command-line tool that executes commands that implement Sir.ICommand. Write, validate, optimize and more from the command line.

Libraries

  • Sir.CommonCrawl: Command for downloading and indexing Common Crawl WAT and WET files.
  • Sir.Mnist: Command for training and testing the accuracy of an index of MNIST images.
  • Sir.Wikipedia: Command for indexing Wikipedia.
  • Sir.Search: In-process search engine.
  • Sir.Core: Shared interfaces and types, such as IModel, ICommand and IVector.

Roadmap

  • v0.1a - bag-of-characters vector space language model
  • v0.2a - HTTP API
  • v0.3a - query language
  • v0.4 - linear classifier image model
  • v0.5 - semantic language model
  • v1.0 - voice model
  • v2.0 - image-to-voice
  • v2.1 - voice-to-text
  • v2.2 - text-to-image
  • v2.3 - AI