A Rails gem that enables searchable blind indexing for PII fields — powered by a Rust extension for performance.
PiiCipher handles the search layer of encrypted PII. It is designed to sit alongside Rails' built-in ActiveRecord::Encryption (encrypts :email), which handles the actual column encryption. Together they give you full GDPR-compliant storage: the real value never touches the database as plaintext, and searching still works.
PiiCipher computes HMAC-SHA256 hashes of the plaintext value before it is encrypted, and stores those hashes in a separate column. Queries are rewritten to search the hashes — the ciphertext column is never scanned.
Two search modes are supported:
| Mode | Column type | Use case |
|---|---|---|
| Partial (default) | jsonb array | LIKE-style substring searches (e.g. searching "smi" matches "Smith") |
| Exact | string | Exact-match lookups (e.g. looking up a full SSN or email) |
For partial search, PiiCipher slides a window across the plaintext and computes an HMAC-SHA256 of each n-gram using your secret key. The window size defaults to 3 (trigrams) and is configurable per attribute with the gram_size option:
"smith" → ["smi", "mit", "ith"] → [hmac("smi"), hmac("mit"), hmac("ith")]
By default values are downcased before hashing, so search is case-insensitive ("smi" matches "Smith"). Set case_sensitive: true to opt out.
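The sliding-window hashing described above can be sketched in plain Ruby with the standard library's OpenSSL bindings. This is only an illustration of the technique, not the gem's Rust implementation: the helper names `ngrams` and `ngram_hashes` and the `DEMO_KEY` constant are made up for this example, and the handling of values shorter than the window follows the behaviour noted later in this README.

```ruby
require "openssl"

# Illustrative key only; the gem reads its real key from PII_SECRET_KEY.
DEMO_KEY = "demo-secret-key"

# Slide a window of gram_size characters across the value.
# Values shorter than the window are kept whole.
def ngrams(value, gram_size: 3, case_sensitive: false)
  text = case_sensitive ? value : value.downcase
  return [text] if text.length < gram_size

  text.each_char.each_cons(gram_size).map(&:join)
end

# HMAC-SHA256 each n-gram and return 64-character hex digests.
def ngram_hashes(value, key: DEMO_KEY, **opts)
  ngrams(value, **opts).map { |gram| OpenSSL::HMAC.hexdigest("SHA256", key, gram) }
end

p ngrams("Smith")              # => ["smi", "mit", "ith"]
p ngram_hashes("Smith").length # => 3
```

Because the search term goes through the same function, every hash produced for "mit" also appears in the array produced for "Smith", which is what makes the containment check work.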
These hashes are stored in a jsonb array column. Querying with where(email: "mit") generates the same hashes for the search term and uses a PostgreSQL @> (contains) check — no plaintext ever touches the database.
Partial search is approximate: @> matches when the stored array contains all of the search term's n-gram hashes, which is occasionally satisfied by values that don't actually contain the term as a contiguous substring. Treat it like a fast candidate filter; if you need exact substring semantics, re-filter the returned (decrypted) records in Ruby.
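A small sketch of why over-matching happens and what the Ruby re-filter looks like. Plain trigram sets stand in for the stored hash arrays here, since hashing each trigram does not change which sets contain which; the helper names and sample values are invented for the example.

```ruby
# Trigram sets stand in for the stored hash arrays.
def trigrams(value)
  value.downcase.each_char.each_cons(3).map(&:join)
end

# Mimics the @> containment check: every search trigram must be present.
def candidate?(stored, term)
  (trigrams(term) - trigrams(stored)).empty?
end

# Exact substring semantics, applied to decrypted values in Ruby.
def substring?(stored, term)
  stored.downcase.include?(term.downcase)
end

values = ["abcd-corp", "abc-bcd"]
term = "abcd"

candidates = values.select { |v| candidate?(v, term) }
confirmed  = candidates.select { |v| substring?(v, term) }

p candidates # => ["abcd-corp", "abc-bcd"]
p confirmed  # => ["abcd-corp"]
```

"abc-bcd" contains both trigrams of "abcd" ("abc" and "bcd") but never as a contiguous run, so it passes the containment check and is dropped only by the second filter.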
For exact match, a single HMAC-SHA256 of the full value is stored in a regular string column. Querying generates the same hash and does a standard equality check.
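As a rough illustration of the exact-match mode, the sketch below computes one HMAC-SHA256 over the whole value and compares digests. The `blind_index` helper and the keys are hypothetical, and it assumes the same default downcasing applies to exact indexes.

```ruby
require "openssl"

# Hypothetical helper; the key values below are demo values only.
def blind_index(value, key:, case_sensitive: false)
  text = case_sensitive ? value : value.downcase
  OpenSSL::HMAC.hexdigest("SHA256", key, text)
end

stored = blind_index("123-45-6789", key: "demo-key")
probe  = blind_index("123-45-6789", key: "demo-key")

p stored == probe                                        # => true
p stored == blind_index("123-45-6780", key: "demo-key")  # => false
p stored == blind_index("123-45-6789", key: "other-key") # => false
p stored.length                                          # => 64
```

The same input and key always yield the same digest, which is what allows a plain equality lookup on the string column; a different key yields an unrelated digest.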
Both hash functions live in a Rust extension (magnus bindings + the hmac and sha2 crates) and are called transparently from Ruby.
PiiCipher only generates the blind indexes — it does not encrypt the column itself. Column encryption is handled by Rails AR Encryption (encrypts). The two work at different layers and do not interfere:
user.save
├─ before_save (pii_cipher) → reads plaintext → writes hashes to email_bidx_array
└─ DB write (Rails AR Enc.) → encrypts plaintext → writes ciphertext to email column
Because Rails AR Encryption works at the DB serialization layer (not a callback), self.email always returns plaintext during before_save — pii_cipher always hashes the real value, never the ciphertext.
- Ruby >= 3.1
- Rails / ActiveRecord >= 7.1 (Active Record Encryption ships in Rails 7.0+)
- PostgreSQL (partial search relies on the jsonb @> operator)
- Rust toolchain (only needed when building the gem from source)
Add to your Gemfile:
gem "pii_cipher"
Then run:
bundle install
Run this once to generate the three keys Rails AR Encryption needs:
bin/rails db:encryption:init
Copy the output into your credentials file:
bin/rails credentials:edit
active_record_encryption:
  primary_key: <generated>
  deterministic_key: <generated>
  key_derivation_salt: <generated>
These keys encrypt and decrypt the column values. Keep them in your secrets manager: losing them means losing access to your data.
PiiCipher reads the HMAC key from the PII_SECRET_KEY environment variable. Add it to your environment (e.g. via credentials, dotenv, or your secrets manager):
PII_SECRET_KEY=your-long-random-secret-here
Generate a secure random value with:
rails secret
Changing this key will invalidate all existing blind indexes.
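If you prefer to generate the key outside of Rails, or want your application to fail fast when the variable is missing, something along these lines may help. Both helpers are hypothetical application-side code, not part of the gem, and whether the gem performs its own check is not covered here.

```ruby
require "securerandom"

# Generate a 128-character hex secret (64 random bytes).
def generate_secret
  SecureRandom.hex(64)
end

# Fail fast when the key is missing or blank.
# Accepts any Hash-like object so it can be exercised without touching ENV.
def fetch_pii_key(env = ENV)
  key = env.fetch("PII_SECRET_KEY", "").strip
  raise KeyError, "PII_SECRET_KEY is not set" if key.empty?

  key
end

puts generate_secret.length # => 128
```

Calling `fetch_pii_key` from an initializer would surface a missing key at boot rather than at the first save.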
For each encrypted attribute, add the corresponding blind index column in a migration.
Partial search (default — stores trigram hashes in a jsonb array):
class AddEmailBidxToUsers < ActiveRecord::Migration[8.1]
  def change
    add_column :users, :email_bidx_array, :jsonb
    add_index :users, :email_bidx_array, using: :gin
  end
end
Exact search (stores a single hash string):
class AddSsnBidxToUsers < ActiveRecord::Migration[8.1]
  def change
    add_column :users, :ssn_bidx, :string
    add_index :users, :ssn_bidx
  end
end
The GIN index on jsonb columns is strongly recommended for performance on partial searches.
Declare encrypts (Rails AR Encryption) first, then use_pii_cipher. Both must be present for full GDPR-compliant searchable encryption.
class User < ApplicationRecord
  encrypts :email                     # Rails: stores ciphertext in DB, decrypts on read
  use_pii_cipher :email               # pii_cipher: generates trigram blind indexes from plaintext
  encrypts :ssn
  use_pii_cipher :ssn, partial: false # exact-match blind index
end
Multiple attributes can be passed to use_pii_cipher in a single call:
encrypts :email, :phone_number
use_pii_cipher :email, :phone_number
No changes to your existing create/update code. Everything happens automatically:
User.create!(email: "alice@example.com", ssn: "123-45-6789")
What happens under the hood:
- before_save (pii_cipher) reads "alice@example.com" as plaintext, generates trigram hashes, and writes them to email_bidx_array
- Rails AR Encryption encrypts "alice@example.com" and writes ciphertext to the email column
user = User.find(1)
# Ruby — always decrypted transparently by Rails
user.email
# => "alice@example.com"
# Raw database row — email column holds ciphertext, blind index holds hashes
# email => {"p":"Wd5LybiwJGPHYI...","h":{"iv":"XJul...","at":"Pk..."}}
# email_bidx_array => ["a3f2c1...", "9b4e7d...", ...]
Nobody with direct database access can read the email. The blind index is just opaque hashes; it reveals nothing about the original value without the PII_SECRET_KEY.
Pass the plaintext value to where exactly as you normally would — PiiCipher intercepts encrypted columns and rewrites the query to search the blind index:
# Partial search — finds any user whose email contains "alice"
User.where(email: "alice")
# Exact search — finds the user with that exact SSN
User.where(ssn: "123-45-6789")
# Mix encrypted and plain columns freely
User.where(email: "alice", status: "active")
The found records have their emails decrypted by Rails on the way out, so callers always receive plaintext. The interceptor only rewrites keys declared with use_pii_cipher; all other where calls pass through to ActiveRecord unchanged.
Benchmarked on a local machine against PostgreSQL 18 with 100,000 rows. The comparison baseline is a plain (unencrypted) column with a standard index — the closest real-world alternative for each search type.
| | Time (100k rows) |
|---|---|
| Plain insert | 1,221 ms |
| Encrypted insert | 2,861 ms (+134%) |
The overhead is not from the Rust hashing — that runs in microseconds. It comes from writing significantly more data per row: each record gains a jsonb array of 64-character HMAC hex strings (one per trigram) and a 64-character blind index string. Both the larger rows and the GIN index maintenance during insert contribute to the slower writes.
| Query type | Plain | Encrypted | Difference |
|---|---|---|---|
| Exact match (B-tree) | 0.121 ms | 0.095 ms | ~within noise |
| Partial match (GIN) | 1.515 ms | 1.865 ms | +23% |
Exact match is effectively identical. Both paths hit a B-tree index; the lookup cost is the same regardless of what the key looks like.
Partial match is ~23% slower. The GIN index sizes end up comparable (see below), but PostgreSQL has to parse the jsonb array and evaluate the @> containment operator on each probe, which adds a small constant overhead that pg_trgm's native GIN operator doesn't pay.
| | Table total | Email index | Name GIN index |
|---|---|---|---|
| Plain | 21 MB | 5 MB | 7.2 MB |
| Encrypted | 89 MB | 12 MB | 7.0 MB |
The table is 4.2× larger. Every stored trigram hash is 64 characters regardless of what the original value looked like — a 5-character name still produces 3 trigrams × 64 chars = 192 bytes of blind index data. At large scale, this is the dominant cost to plan for.
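The arithmetic above generalizes to any value length. The sketch below estimates the raw hex characters stored per value; it ignores JSON quoting, separators and jsonb's own overhead, so treat the numbers as a lower bound. The helper names are made up for this example.

```ruby
HASH_HEX_CHARS = 64 # hex length of one HMAC-SHA256 digest

# Number of n-grams a sliding window produces for a non-empty value.
def ngram_count(length, gram_size: 3)
  return 1 if length < gram_size # short values are hashed whole

  length - gram_size + 1
end

# Raw hex characters stored in the blind index array for one value.
def blind_index_chars(length, gram_size: 3)
  ngram_count(length, gram_size: gram_size) * HASH_HEX_CHARS
end

p blind_index_chars(5)  # => 192
p blind_index_chars(25) # => 1472
```

A 25-character email therefore carries roughly 1.4 KB of hash data, which is why long free-text fields are the most expensive to index this way.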
The email B-tree index is 2.4× larger for the same reason (64-char hash vs ~25-char email). The name GIN index sizes are nearly identical — HMAC hashes repeat across rows the same way plain trigrams do (same input + same key = same hash), so the GIN posting lists compress similarly.
- Reads are fast. Sub-millisecond exact lookups and ~2ms partial searches hold up well even at this row count.
- Writes cost more. If your workload is write-heavy on PII fields, budget for the extra insert time.
- Storage is the main tradeoff. Plan for roughly 4× the total table size of an equivalent unencrypted schema; in this benchmark the B-tree index grew about 2.4× while the GIN index stayed about the same size.
You can reproduce these results yourself:
ruby -I lib benchmarks/run.rb
use_pii_cipher(*attributes, partial: true, gram_size: 3, case_sensitive: false)
| Option | Type | Default | Description |
|---|---|---|---|
| partial | Boolean | true | true → n-gram array in column_bidx_array; false → single hash in column_bidx |
| gram_size | Integer | 3 | Sliding-window size for partial search. Ignored when partial: false. Changing it invalidates existing indexes. |
| case_sensitive | Boolean | false | false downcases values before hashing (case-insensitive search). Must match between stored index and queries; changing it invalidates existing indexes. |
- Query rewriting covers hash-form where. Model.where(email: "x"), scopes, and chained relations (Model.active.where(email: "x")) are all rewritten. Conditions that don't go through where(hash) are not rewritten, including where.not(...), raw string/array conditions (where("email = ?", x)), .or(...) branches, and find_by with string SQL. For those, build the blind index yourself with PiiCipher.generate_ngram_hashes / generate_blind_index.
- Partial search is approximate and may over-match (see "How it works"). Re-filter in Ruby if you need exact substring semantics.
- Search terms shorter than gram_size are hashed whole and only match values that were themselves shorter than gram_size. Prefer search terms at least gram_size characters long.
- PostgreSQL only for partial search: it uses the jsonb @> containment operator.
- Key/option changes invalidate indexes. Changing PII_SECRET_KEY, gram_size, or case_sensitive means existing blind indexes no longer match; you must re-save affected records to regenerate them.
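For conditions that are not rewritten automatically, one possible shape for a hand-built query is sketched below. The exact signatures of the gem's helpers are not shown in this README, so `demo_ngram_hashes` is a stdlib stand-in for them, and `not_containing` is a hypothetical helper that returns a SQL fragment plus its bind value.

```ruby
require "openssl"
require "json"

# Stand-in for the gem's n-gram helper (assumed behaviour: downcase, trigram, HMAC).
def demo_ngram_hashes(term, key:, gram_size: 3)
  text = term.downcase
  grams = text.length < gram_size ? [text] : text.each_char.each_cons(gram_size).map(&:join)
  grams.map { |gram| OpenSSL::HMAC.hexdigest("SHA256", key, gram) }
end

# Returns [sql_fragment, bind_value] for a negated containment check.
# The column name is interpolated, so it must be a trusted identifier.
def not_containing(column, term, key:)
  ["NOT (#{column} @> ?::jsonb)", JSON.generate(demo_ngram_hashes(term, key: key))]
end

sql, bind = not_containing("email_bidx_array", "alice", key: "demo-key")
# In a Rails app this pair could be passed as User.where(sql, bind).
puts sql
```

Note that rows whose blind index column is NULL are excluded by this predicate, because NOT applied to NULL is still NULL in SQL.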
After checking out the repo, run bin/setup to install dependencies (this also compiles the Rust extension). Then run the test suite:
bundle exec rake spec
The Ruby specs include a PostgreSQL-backed integration suite (it builds a temporary table and exercises real @> queries). Set the standard PG* env vars to point at a database, or skip those examples with bundle exec rspec --tag ~integration. The Rust extension also has its own unit tests, runnable from ext/pii_cipher with cargo test.
To open an interactive console with the gem loaded:
bin/console
To build and install the gem locally:
bundle exec rake install
Bug reports and pull requests are welcome on GitHub at https://github.com/selvachezhian/pii_cipher. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.
The gem is available as open source under the terms of the MIT License.