staticseek: A Lightweight, Fast Full-text Search Engine Supporting All Unicode Languages

For detailed instructions on how to use staticseek, please visit the official website.

This package is developed using Node.js v22.

To build staticseek, execute the following command:

git clone https://github.com/osawa-naotaka/staticseek.git
cd staticseek
npm install
npm run build

The JavaScript files will be generated in the dist directory.

You can also run the application and the benchmark with npm run dev. See the comments in index.html for details.

Overview

staticseek is a client-side full-text search engine designed specifically for static websites. It enables searching through arrays of JavaScript objects containing strings or string arrays. By converting your articles into JavaScript objects, you can implement full-text search functionality on static sites without any server-side implementation.
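As a rough illustration of the input shape described above, the sketch below builds an array of plain objects whose fields are strings or string arrays. The field names (slug, title, tags, body) and the isSearchableRecord helper are hypothetical and are not part of staticseek.

```javascript
// Hypothetical article records; the field names are illustrative only.
const array_of_articles = [
  {
    slug: "intro",
    title: "Getting started",
    tags: ["guide", "setup"],
    body: "Install the package and build an index.",
  },
  {
    slug: "faq",
    title: "Frequently asked questions",
    tags: ["faq"],
    body: "Searching runs entirely in the browser.",
  },
];

// Hypothetical sanity check: true when every field of a record is
// a string or an array of strings.
function isSearchableRecord(record) {
  return Object.values(record).every(
    (value) =>
      typeof value === "string" ||
      (Array.isArray(value) && value.every((item) => typeof item === "string")),
  );
}
```

A check like this can run at build time, before the records are handed to the indexer.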

Key Features

  • Simple and intuitive API
  • Support for fuzzy search with customizable edit distance
  • Advanced search operations (AND, OR, NOT)
  • Field-specific search capabilities
  • TF-IDF based scoring with customizable weights
  • Google-like query syntax
  • Unicode support for all languages including CJK characters and emojis
  • Multiple index implementations for different performance needs
  • Seamless integration with popular Static Site Generators (SSG)

Quick Start

First, install staticseek in your environment:

npm install staticseek

Alternatively, you can directly import staticseek using jsDelivr's CDN service.

Next, import staticseek into your project and perform indexing and searching. Here, array_of_articles represents an array of JavaScript objects containing the text to be searched.

import { LinearIndex, createIndex, search, StaticSeekError } from "staticseek";

// Create an index
const index = createIndex(LinearIndex, array_of_articles);
if(index instanceof StaticSeekError) throw index;

// Perform a search
const result = await search(index, "search word");
if(result instanceof StaticSeekError) throw result;
for(const r of result) {
  console.log(array_of_articles[r.id]);
}

The search results are returned as an array, sorted by score (relevance). The id field in each result contains the array index of the matching document.
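Because each result carries only an array index, mapping results back to documents is a small lookup. The helper below is a minimal sketch: its name and the limit parameter are hypothetical, and it assumes nothing about a result beyond the id field described above.

```javascript
// Hypothetical helper: map search results back to the source documents.
// Assumes only that each result has an `id` holding an array index.
function pickDocuments(results, documents, limit = 10) {
  return results
    .slice(0, limit) // results arrive already sorted by relevance
    .map((r) => documents[r.id])
    .filter((doc) => doc !== undefined); // skip ids outside the array
}
```

With the Quick Start variables, pickDocuments(result, array_of_articles, 5) would give the five most relevant articles.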

To accelerate searches using WebGPU, use the following code. The usage after index creation remains the same as above.

import { GPULinearIndex, createIndex, search, StaticSeekError } from "staticseek";

const index = createIndex(GPULinearIndex, array_of_articles);
...

If you experience performance issues, try a speed-optimized index:

import { HybridTrieBigramInvertedIndex, createIndex, search, StaticSeekError } from "staticseek";

const index = createIndex(HybridTrieBigramInvertedIndex, array_of_articles);
...

Search Features

Query Syntax

  • Fuzzy Search: Default behavior with configurable edit distance
    • distance:2 searchterm - allows 2 character edits
  • Exact Match: "exact phrase"
  • AND Search: term1 term2
  • OR Search: term1 OR term2
  • NOT Search: -term1 term2
  • Field-Specific: from:title searchterm
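To make the distance parameter concrete: an edit distance of 2 admits terms that differ from the query by up to two single-character insertions, deletions, or substitutions. The sketch below is a textbook Levenshtein distance, on the assumption that this is the kind of metric meant here. It is not staticseek's implementation, and the function name is hypothetical.

```javascript
// Textbook Levenshtein distance over Unicode code points (illustration only).
function editDistance(a, b) {
  const s = Array.from(a); // split by code point so an emoji counts as one character
  const t = Array.from(b);
  // prev[j] = distance between the first i-1 characters of s and the first j of t
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i]; // turning i characters into the empty string takes i deletions
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[t.length];
}
```

Under this metric, "serach" is two edits away from "search", so a query such as distance:2 serach would be expected to tolerate that typo.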

Index Types

  1. LinearIndex (Default)

    • Best for small to medium-sized content
    • Simple and reliable
    • Good balance of performance and accuracy
  2. GPULinearIndex

    • WebGPU-accelerated fuzzy search
    • 2-10x faster for larger datasets
    • Gracefully falls back to LinearIndex when WebGPU is unavailable
  3. HybridTrieBigramInvertedIndex

    • ~100x faster search performance
    • Ideal for larger datasets
    • Trade-offs:
      • Higher false positive rate for CJK-like languages
      • Less precise fuzzy search for CJK-like languages
      • Limited result metadata

Performance

With an appropriate index type, typical searches complete within a few milliseconds. The figures below are for a 4MB dataset of approximately 100 articles in the worst case (slowest index type):

  • Exact Match: < 5ms
  • Fuzzy Search: < 150ms
  • Index Generation: ~1sec
    • ~30sec for HybridTrieBigramInvertedIndex

For detailed benchmarks across different hardware configurations and index types, see the Benchmarks section of the official website.

Integration with Static Site Generators

Example implementations are available for:
