Single-Table NoSQL-ish In-Memory Database

This Ruby gem manages an in-memory database of facts. A fact is simply an associative array of properties and their values. The values are either atomic literals or non-empty sets of literals. It is possible to delete a fact, but impossible to delete a property from a fact.

Here is how you use it (it's thread-safe, by the way):

require 'factbase'
fb = Factbase.new
f = fb.insert
f.kind = 'book'
f.title = 'Object Thinking'
fb.query('(eq kind "book")').each do |f|
  f.seen = Time.now
end
fb.insert
fb.query('(not (exists seen))').each do |f|
  f.title = 'Elegant Objects'
end

You can save the factbase to the disk and then load it back:

file = '/tmp/simple.fb'
f1 = Factbase.new
f = f1.insert
f.foo = 42
File.binwrite(file, f1.export)
f2 = Factbase.new
f2.import(File.binread(file))
assert(f2.query('(eq foo 42)').each.to_a.size == 1)

You can check the presence of an attribute by name and then set it, also by name:

n = 'foo'
if f[n].nil?
  f.send("#{n}=", 'Hello, world!')
end

You can make a factbase log all operations:

require 'loog'
require 'factbase/logged'
log = Loog::VERBOSE
fb = Factbase::Logged.new(Factbase.new, log)
f = fb.insert

You can also count the number of changes made to a factbase:

require 'loog'
require 'factbase/tallied'
log = Loog::VERBOSE
fb = Factbase::Tallied.new(Factbase.new, log)
f = fb.insert
churn = fb.churn
assert churn.inserted == 1

Properties are accumulative. Setting a property again adds a value instead of overwriting:

f = fb.insert
f.foo = 42
f.foo = 43
assert(f.foo == 42)
assert(f['foo'] == [42, 43])
fb.query('(eq foo 43)').each do |f|
  assert(f.foo == 42)
  assert(f['foo'].include?(43))
end

Deleting while iterating is unsafe and may cause elements to be skipped:

fb = Factbase.new
fb.insert.id = 1
fb.insert.id = 2
fb.query('(always)').each do |f|
  fb.query("(eq id #{f.id})").delete!
end
assert(1 == fb.size)

To safely delete, use a snapshot:

fb = Factbase.new
fb.insert.id = 1
fb.insert.id = 2
fb.query('(always)').to_a.each do |f|
  fb.query("(eq id #{f.id})").delete!
end
assert(0 == fb.size)
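
The same hazard can be reproduced with a plain Ruby Array, which may help build intuition. The sketch below does not use Factbase at all; the variable names are illustrative only.

```ruby
# Deleting from an Array while iterating over it skips elements,
# because the iteration index moves on while the items shift left.
unsafe = [1, 2]
unsafe.each { |x| unsafe.delete(x) }

# Iterating over a snapshot (a shallow copy) visits every element.
safe = [1, 2]
safe.dup.each { |x| safe.delete(x) }

puts unsafe.inspect
puts safe.inspect
```

Calling to_a on the query result plays the role of dup here: the loop walks a separate list while the deletions hit the live one.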

Terms

There are some boolean terms available in a query (they return either true or false):

  • (always) and (never) are true and false
  • (nil v) is true if v is nil
  • (not b) is the inverse of b
  • (or b1 b2 ...) is true if at least one argument is true
  • (and b1 b2 ...) is true if all arguments are true
  • (when b1 b2) is true if b1 and b2 are both true, or if b1 is false
  • (exists p) is true if the p property exists
  • (absent p) is true if the p property is absent
  • (zero v) is true if any v equals zero
  • (eq v1 v2) is true if any v1 equals any v2
  • (lt v1 v2) is true if any v1 is less than any v2
  • (gt v1 v2) is true if any v1 is greater than any v2
  • (many v) is true if v has more than one value
  • (one v) is true if v has exactly one value
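
The "any ... any" wording of eq, lt, and gt can be pictured in plain Ruby. This is a minimal sketch, not the gem's code: values_of, eq_any?, and lt_any? are hypothetical helpers, and treating nil as an empty set is an assumption.

```ruby
# Normalizes nil, a scalar, or an array into an array of values.
def values_of(v)
  v.nil? ? [] : Array(v)
end

# True if at least one left value equals at least one right value.
def eq_any?(v1, v2)
  right = values_of(v2)
  values_of(v1).any? { |l| right.include?(l) }
end

# True if at least one left value is less than at least one right value.
def lt_any?(v1, v2)
  right = values_of(v2)
  values_of(v1).any? { |l| right.any? { |r| l < r } }
end

fact = { 'foo' => [42, 43], 'bar' => [7] }
puts eq_any?(fact['foo'], 43)
puts lt_any?(fact['bar'], fact['foo'])
puts lt_any?(fact['foo'], fact['bar'])
```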

There are string manipulators:

  • (concat v1 v2 v3 ...) concatenates all v
  • (sprintf v v1 v2 ...) creates a string from the format v and the parameters v1, v2, ...
  • (matches v s) is true if any v matches the regular expression s

There are a few terms that return non-boolean values:

  • (at i v) is the i-th value of v
  • (size v) is the cardinality of v (zero if v is nil)
  • (type v) is the type of v ("String", "Integer", "Float", "Time", or "Array")
  • (either v1 v2) is v2 if v1 is nil

It's possible to modify the facts retrieved, on the fly:

  • (as p v) adds property p with the value v
  • (join s t) adds properties named by the s mask with the values retrieved by the t term, for example, (join "x<=foo,y<=bar" (gt x 5)) will add x and y properties, setting them to values found in the foo and bar properties in the facts that match (gt x 5)

Also, some simple arithmetic:

  • (plus v1 v2) is the sum of ∑v1 and ∑v2
  • (minus v1 v2) is ∑v1 minus ∑v2
  • (times v1 v2) is the product of ∏v1 and ∏v2
  • (div v1 v2) is ∏v1 divided by ∏v2

It's possible to add string durations to time values and subtract them, like (plus t '2 days') or (minus t '14 hours').
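
Because every property is a set of values, the arithmetic terms first collapse each argument (by summing or multiplying) and only then combine the two results. A rough plain-Ruby picture of plus, minus, and times follows; the helper names are hypothetical, and how the gem treats nil or mixed types is not covered here.

```ruby
# Normalizes nil, a scalar, or an array into an array of values.
def values_of(v)
  v.nil? ? [] : Array(v)
end

# (plus v1 v2): sum of all v1 plus sum of all v2.
def plus(v1, v2)
  values_of(v1).sum + values_of(v2).sum
end

# (minus v1 v2): sum of all v1 minus sum of all v2.
def minus(v1, v2)
  values_of(v1).sum - values_of(v2).sum
end

# (times v1 v2): product of all v1 times product of all v2.
def times(v1, v2)
  values_of(v1).inject(1, :*) * values_of(v2).inject(1, :*)
end

puts plus([1, 2], 10)
puts minus([10], [1, 2])
puts times([2, 3], 4)
```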

Types may be converted:

  • (to_int v) is an integer of v
  • (to_str v) is a string of v
  • (to_float v) is a float of v

Two terms are for meta-programming:

  • (defn f "self.to_s") defines a new term using Ruby syntax and returns true
  • (undef f) undefines a term (nothing happens if it's not defined yet), returns true

There are terms that are aware of the search history:

  • (prev p) returns the value of p property in the previously seen fact
  • (unique p1 p2 ...) returns true if at least one property value hasn't been seen yet; returns false when all specified properties have duplicate values in this particular combination
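
One way to picture these two terms is a scan that remembers what it has already visited. The sketch below works on plain hashes rather than Factbase facts; returning nil as the "previous" value of the first fact is an assumption, not documented behavior.

```ruby
require 'set'

facts = [
  { 'city' => 'Paris', 'year' => 2020 },
  { 'city' => 'Paris', 'year' => 2021 },
  { 'city' => 'Paris', 'year' => 2020 }
]

# (unique city year): keep a fact only when this combination is new.
# Set#add? returns nil when the element is already in the set.
seen = Set.new
unique = facts.select { |f| seen.add?([f['city'], f['year']]) }

# (prev year): the value of 'year' in the fact visited just before.
prev_years = [nil] + facts[0..-2].map { |f| f['year'] }

puts unique.size
puts prev_years.inspect
```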

The agg term enables sub-queries by evaluating the first argument (term) over all available facts, passing the entire subset to the second argument, and then returning the result as an atomic value:

  • (lt age (agg (eq gender 'F') (max age))) selects all facts where the age is smaller than the maximum age of all women
  • (eq id (agg (always) (max id))) selects the fact with the largest id
  • (eq salary (agg (eq dept $dept) (avg salary))) selects the facts whose salary equals the average salary in their department
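
The first agg example above corresponds roughly to the following two-step computation over plain hashes. This illustrates the semantics only and is not how the gem evaluates the query; the sample data is invented.

```ruby
facts = [
  { 'name' => 'Ann', 'gender' => 'F', 'age' => 40 },
  { 'name' => 'Bob', 'gender' => 'M', 'age' => 35 },
  { 'name' => 'Eve', 'gender' => 'F', 'age' => 28 },
  { 'name' => 'Sam', 'gender' => 'M', 'age' => 52 }
]

# (agg (eq gender 'F') (max age)): filter first, then reduce to one value.
max_age_of_women = facts
  .select { |f| f['gender'] == 'F' }
  .map { |f| f['age'] }
  .max

# (lt age ...): compare every fact against that single value.
younger = facts.select { |f| f['age'] < max_age_of_women }
names = younger.map { |f| f['name'] }

puts names.inspect
```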

There are also terms that match the entire factbase and must be used primarily inside the (agg ..) term:

  • (nth v p) returns the p property of the v-th fact (v must be a positive integer)
  • (first p) returns the p property of the first fact
  • (count) returns the tally of facts
  • (max p) returns the maximum value of the p property in all facts
  • (min p) returns the minimum
  • (sum p) returns the arithmetic sum of all values of the p property

It's also possible to use a sub-query in a shorter form than with the agg:

  • (empty q) is true if the subquery q is empty

It's possible to post-process a list of facts, for agg and join:

  • (sorted p expr) sorts them by the value of p property
  • (inverted expr) reverses them
  • (head n expr) takes only n facts from the head of the list
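
In plain Ruby terms, these three post-processors behave much like sort_by, reverse, and first. The sketch assumes ascending order for sorted, which the text above does not state.

```ruby
facts = [{ 'id' => 3 }, { 'id' => 1 }, { 'id' => 2 }]

# (sorted id expr): order by the value of the id property (ascending assumed).
sorted = facts.sort_by { |f| f['id'] }

# (inverted expr): reverse the list.
inverted = sorted.reverse

# (head 2 expr): take only the first two facts.
head = inverted.first(2)

puts head.map { |f| f['id'] }.inspect
```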

There are some system-level terms:

  • (env v1 v2) returns the value of environment variable v1 or the string v2 if it's not set

Architecture

The entire database is a single flat Ruby Array of Hash objects held in RAM (Factbase#@maps). There are no tables, schemas, or type enforcement beyond four scalar types: Integer, Float, String, and Time. This contrasts with SQLite (fixed-column tables on disk) and MongoDB (typed document collections). New programmers must understand that all data vanishes on process exit unless export/import is called explicitly.

Each property of a fact is a non-empty ordered set of values rather than a single value. Assigning f.foo = 1 then f.foo = 2 produces f['foo'] == [1, 2]; each assignment appends. Reading f.foo returns the first element; f['foo'] returns the full array. This accumulative semantics differs from SQL (one value per column) and most NoSQL stores where assignment overwrites. New programmers must expect f['foo'] to return an array on every read, while f.foo returns only its first element.
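
The accumulative behavior can be imitated with a small stand-alone class. SketchFact is a hypothetical name and this is not the gem's implementation; dropping duplicate values is an assumption based on the word "set".

```ruby
# A toy fact: every property is an array that only grows.
class SketchFact
  def initialize
    @props = {}
  end

  # f['foo'] returns all values, or nil if the property was never set.
  def [](name)
    @props[name.to_s]
  end

  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?('=')
      values = (@props[key.chomp('=')] ||= [])
      values << args.first unless values.include?(args.first)
      return args.first
    end
    return super unless @props.key?(key)
    # f.foo returns only the first value.
    @props[key].first
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?('=') || @props.key?(name.to_s) || super
  end
end

f = SketchFact.new
f.foo = 1
f.foo = 2
puts f.foo
puts f['foo'].inspect
```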

Queries use a custom Lisp-style S-expression language: (and (eq kind 'book') (gt age 10)). Factbase::Syntax tokenizes and parses a query string into an AST of Factbase::Term objects; Factbase::Query#each evaluates that AST against every fact. This differs from SQL, XPath, and JSONPath. New programmers add operators by implementing a term class, not by modifying parser grammar.
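
To make the tokenize-then-parse step concrete, here is a minimal S-expression reader. It returns nested arrays instead of Factbase::Term objects and supports only integers, quoted strings, and bare names, so it should be read as an illustration of the idea rather than a copy of Factbase::Syntax; tokenize, parse, and atom are hypothetical names.

```ruby
# Splits a query into parentheses, quoted strings, and bare atoms.
def tokenize(query)
  query.scan(/[()]|'[^']*'|"[^"]*"|[^\s()]+/)
end

# Converts one token into a string, an integer, or a symbol.
def atom(token)
  return token[1..-2] if token.start_with?("'", '"')
  return token.to_i if token.match?(/\A-?\d+\z/)
  token.to_sym
end

# Recursively consumes tokens and returns a nested array (the AST).
def parse(tokens)
  token = tokens.shift
  raise ArgumentError, 'unexpected end of query' if token.nil?
  raise ArgumentError, 'unexpected )' if token == ')'
  return atom(token) unless token == '('
  list = []
  list << parse(tokens) until tokens.empty? || tokens.first == ')'
  raise ArgumentError, 'missing )' if tokens.empty?
  tokens.shift # drop the closing parenthesis
  list
end

ast = parse(tokenize("(and (eq kind 'book') (gt age 10))"))
puts ast.inspect
```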

Each query operator (eq, gt, agg, join, etc.) is a separate class under lib/factbase/terms/. Factbase::Term holds a dispatch hash (@terms) mapping operator symbols to instances and delegates evaluate and predict calls there. This is not a class hierarchy — adding a new operator requires a new file in terms/ and a registration line in the Factbase::Term constructor. New programmers extending the query language must follow this two-step pattern.
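
The dispatch-hash pattern described above can be sketched as follows. EqTerm, GtTerm, and SketchDispatcher are hypothetical stand-ins; the real classes, their method signatures, and the registration code in Factbase::Term may look different.

```ruby
# Each operator is a small object with an evaluate method.
class EqTerm
  def evaluate(fact, args)
    Array(fact[args[0].to_s]).include?(args[1])
  end
end

class GtTerm
  def evaluate(fact, args)
    Array(fact[args[0].to_s]).any? { |v| v > args[1] }
  end
end

# Looks up the operator in a hash and delegates to it.
class SketchDispatcher
  def initialize
    # Registration: adding an operator means adding one entry here.
    @terms = { eq: EqTerm.new, gt: GtTerm.new }
  end

  def evaluate(op, fact, args)
    term = @terms.fetch(op) { raise ArgumentError, "unknown term '#{op}'" }
    term.evaluate(fact, args)
  end
end

dispatcher = SketchDispatcher.new
fact = { 'kind' => ['book'], 'age' => [12] }
puts dispatcher.evaluate(:eq, fact, [:kind, 'book'])
puts dispatcher.evaluate(:gt, fact, [:age, 10])
```

A lookup table keeps the dispatcher unchanged when operators are added, at the cost of one registration line per operator.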

Transactions are ACID and implemented via lazy copy-on-write journaling. Factbase#txn wraps the array in Factbase::LazyTaped, which defers physical duplication of hash objects until the first write. Inserts, deletes, and property additions are tracked by Ruby object_id. On commit the journal is replayed into the main array; raising Factbase::Rollback discards it. Nesting transactions is explicitly forbidden by Factbase::Light. This differs from SQLite's WAL and PostgreSQL's MVCC.
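
The idea of deferring the copy until the first write can be shown with a much simpler model than the journal described above. In this sketch the whole array is duplicated on first write and swapped in on commit; it tracks no object_id journal, and LazyCopy, SketchStore, and SketchRollback are hypothetical names.

```ruby
class SketchRollback < StandardError; end

# Wraps an array and copies it only when somebody asks to write.
class LazyCopy
  def initialize(origin)
    @origin = origin
    @copy = nil
  end

  # Reads go to the original until a write has happened.
  def read
    @copy || @origin
  end

  # The first write triggers the deep duplication.
  def write
    @copy ||= Marshal.load(Marshal.dump(@origin))
  end

  def dirty?
    !@copy.nil?
  end
end

class SketchStore
  attr_reader :maps

  def initialize
    @maps = []
  end

  # Commits the copy if the block finishes; discards it on SketchRollback.
  # Returns true only when something was changed and committed.
  def txn
    lazy = LazyCopy.new(@maps)
    yield lazy
    @maps = lazy.read if lazy.dirty?
    lazy.dirty?
  rescue SketchRollback
    false
  end
end

store = SketchStore.new
committed = store.txn { |t| t.write << { 'id' => [1] } }
rolled_back = store.txn do |t|
  t.write << { 'id' => [2] }
  raise SketchRollback
end
read_only = store.txn { |t| t.read.size }
puts store.maps.size
```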

Cross-cutting capabilities — thread safety, indexing, constraint validation, logging, and change counting — are added via decorators: Factbase::SyncFactbase, Factbase::IndexedFactbase, Factbase::Rules, Factbase::Logged, and Factbase::Tallied. The decoor gem provides delegation boilerplate. The bare Factbase class is not thread-safe; new programmers must wrap it with SyncFactbase before sharing across threads.
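
A decorator of this kind can be approximated with Ruby's standard SimpleDelegator and a Mutex. The sketch uses the standard library instead of the decoor gem, and SketchBase and SketchSynced are hypothetical classes, not the real Factbase or Factbase::SyncFactbase.

```ruby
require 'delegate'

# A toy store with no locking of its own.
class SketchBase
  def initialize
    @maps = []
  end

  def insert
    fact = {}
    @maps << fact
    fact
  end

  def size
    @maps.size
  end
end

# Adds locking around insert; every other call is passed through as is.
class SketchSynced < SimpleDelegator
  def initialize(origin)
    super
    @mutex = Mutex.new
  end

  def insert
    @mutex.synchronize { __getobj__.insert }
  end
end

fb = SketchSynced.new(SketchBase.new)
threads = Array.new(8) { Thread.new { 100.times { fb.insert } } }
threads.each(&:join)
puts fb.size
```

Decorators stack: a logging or counting wrapper could wrap the synchronized one in the same way, as in the Factbase::Logged.new(Factbase.new, log) example earlier.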

Persistence uses Ruby's Marshal, serializing the internal array of hashes to a binary blob via Marshal.dump. The format is Ruby-version-specific and not portable across major Ruby versions or platforms, unlike JSON or Protocol Buffers. Output-only decorators Factbase::ToJson, Factbase::ToXml, and Factbase::ToYaml exist but do not support round-trip import.
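
The round trip through Marshal can be tried on a plain array of hashes shaped like the one described above. The data is invented for illustration, and the exact structure the gem exports may contain more than this.

```ruby
# An array of hashes whose values are arrays of scalars.
maps = [{ 'foo' => [42], 'seen' => [Time.at(0)] }]

# Marshal.dump produces a binary String; Marshal.load restores the objects.
blob = Marshal.dump(maps)
restored = Marshal.load(blob)

puts restored == maps
puts restored.equal?(maps)
```

The restored array is equal in content but is a different object, which is why binary file I/O (File.binwrite and File.binread) is enough to move a factbase between processes running a compatible Ruby.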

Factbase::IndexedFactbase lazily builds a hash-based inverted index for equality queries, keyed by array object_id, property name, and operator. The index is built incrementally on each query and invalidated entirely on any mutation (delete or property addition). Without this decorator every query#each call performs a full linear scan over all facts. New programmers should add IndexedFactbase whenever the factbase holds more than a few thousand facts.
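
A lazily built inverted index for equality lookups might look like the sketch below. SketchIndex is a hypothetical class keyed by property name only; the real decorator's keys (array object_id, property, operator) and its invalidation triggers are as described above and are not reproduced here.

```ruby
# Maps property -> value -> list of facts, built on first use.
class SketchIndex
  def initialize(maps)
    @maps = maps
    @index = {}
  end

  # Equality lookup: builds the index for a property the first time it is asked for.
  def eq(prop, value)
    by_value = (@index[prop] ||= build(prop))
    by_value.fetch(value, [])
  end

  # Any mutation makes the whole index stale, so it is simply dropped.
  def invalidate!
    @index.clear
  end

  private

  # One linear scan over all facts per property.
  def build(prop)
    @maps.each_with_object({}) do |fact, idx|
      Array(fact[prop]).each { |v| (idx[v] ||= []) << fact }
    end
  end
end

maps = [
  { 'kind' => ['book'] },
  { 'kind' => ['car'] },
  { 'kind' => %w[book gift] }
]
index = SketchIndex.new(maps)
before = index.eq('kind', 'book').size

maps << { 'kind' => ['book'] }
stale = index.eq('kind', 'book').size
index.invalidate!
fresh = index.eq('kind', 'book').size
puts [before, stale, fresh].inspect
```

The stale count shows why mutations must invalidate the index: without that call, the lookup keeps answering from the old scan.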

How to contribute

Read these guidelines. Make sure your build is green before you contribute your pull request. You will need to have Ruby 3.4+ and Bundler installed. Then:

bundle update
bundle exec rake

If it's clean and you don't see any error messages, submit your pull request.

Benchmark

This is the result of the benchmark:

                                                                   user
void scan                                                      0.000986
20k facts: export: 2973KB                                      0.869801
20k facts: import: 2973KB                                      0.997180
50k facts: read                                                0.000174
50k facts: read in txn                                         0.001175
50k facts: insert                                              0.000082
50k facts: insert in txn                                       0.000182
50k facts: modify                                              1.330390
50k facts: modify in txn                                       2.558140
12k facts: large query: match 3k                              12.527501
12k facts: large query: match 3k in txn                       17.404027
12k facts: large query: match zero                            13.257105
12k facts: large query: match zero in txn                     18.218544

The results were calculated in this GHA job on 2026-05-06 at 08:43, on Linux with 4 CPUs.
