Commit

Add safe mode (base24) (#20)
* Use recommended LCG parameter

* Add base24 mode

* Update dependencies
turiyadev committed Jan 31, 2024
1 parent 3c17085 commit b474b21
Showing 12 changed files with 2,035 additions and 662 deletions.
237 changes: 113 additions & 124 deletions README.md
@@ -13,22 +13,28 @@ random-looking character strings.

Values are first transformed by a Linear Congruential Generator (a fast,
reversible pseudo-random number generator), then the result is encoded using a
Base41 encoding scheme. Minimum output length(s) can be specified, otherwise
the keymask length will scale as needed.
URL-safe character encoding. Minimum output length(s) can be specified,
otherwise the keymask length will scale as needed. Each instance can be
personalized using a unique seed value, resulting in unique keymask mappings.
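
To illustrate what "reversible" means here, the following is a toy sketch of a Linear Congruential Generator step and its inverse. The modulus, multiplier and offset (`M`, `A`, `C`) are hypothetical values chosen for readability; they are not the parameters that `keymask` uses, and the helper names are not part of its API.

```JavaScript
// Toy reversible LCG over a small prime modulus.
// Assumption: M, A and C are illustrative only, not keymask's parameters.
const M = 41n; // prime modulus
const A = 6n;  // multiplier (non-zero modulo M, so it has an inverse)
const C = 7n;  // additive offset

// Modular exponentiation by repeated squaring.
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// Inverse of A modulo M via Fermat's little theorem (valid because M is prime).
const A_INV = modPow(A, M - 2n, M);

// Forward step scrambles the value; backward step recovers it exactly.
const forward = (x) => (A * x + C) % M;
const backward = (y) => (A_INV * ((y - C + M) % M)) % M;

console.log(forward(12n));           // 38n
console.log(backward(forward(12n))); // 12n
```

Because the multiplier has a modular inverse, every input below the modulus maps to a distinct output and can be recovered, which is the property that makes unmasking possible.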

The `Keymask` instance can be personalized using a 256-bit seed value. As long
as this value is kept secret, it will be difficult for anyone who doesn't know
the seed to reverse map the encoded values.
There are two modes of operation. The standard mode encodes each 32 bits to 6
characters, using a base-41 encoding scheme. Since this uses a mix of uppercase
and lowercase letters, it is only suitable when case-sensitivity can be
guaranteed. The more restrictive "safe" mode encodes each 32 bits to 7
characters using a base-24 encoding scheme. This can be used in
case-insensitive settings, such as a hostname or subdomain.
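
The character counts quoted above follow from simple arithmetic: a base needs enough digits for its capacity to reach 2^32 (or 2^64). As a quick sanity check, here is a small sketch; `charsNeeded` is a hypothetical helper written for this illustration, not a `keymask` export.

```JavaScript
// Smallest number of characters whose combined capacity (base ** chars)
// covers every value representable in `bits` bits.
function charsNeeded(base, bits) {
  const limit = 1n << BigInt(bits);
  let capacity = 1n;
  let chars = 0;
  while (capacity < limit) {
    capacity *= BigInt(base);
    chars += 1;
  }
  return chars;
}

console.log(charsNeeded(41, 32)); // 6
console.log(charsNeeded(24, 32)); // 7
console.log(charsNeeded(41, 64)); // 12
console.log(charsNeeded(24, 64)); // 14
```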

## Motivation

Serial numbers and sequential database IDs are extremely useful, however, when
displayed publicly they can leak information about system internals, such as
how old a given record is, or the frequency of record creation (see
["German tank problem"](https://search.brave.com/search?q=german+tank+problem)).
When serial numbers or sequential database IDs are displayed publicly, they can
leak information about system internals, such as how old a given record is, or
the frequency of record creation.

`Keymask` encodes serial numbers in such a way that they can be displayed to
end-users without revealing these kinds of details.
end users without revealing these kinds of details. Since they are meant to be
displayed publicly, measures are taken to avoid potentially offensive character
combinations (by omitting vowels from its encoding alphabet, `Keymask`
generally does not output recognizable words of any kind).

## Installation

@@ -39,70 +45,46 @@ using your preferred package manager (`npm i keymask`, `yarn add keymask`,
## Usage

The module exports three classes, `Keymask`, `KeymaskGenerator` (the LCG) and
`KeymaskEncoder` (the base41 encoder). These can be used independently of each
other, but for simple use cases, the main `Keymask` class is typically all you
need.
`KeymaskEncoder` (the character encoder). These can be used independently of
each other, but for simple use cases, the main `Keymask` class is typically all
you need.

The `Keymask` class constructor can optionally be passed an object containing
various settings. When no settings are provided, the resulting class instance
will encode variable-length outputs (the output length will depend on the
magnitude of the input value), and decoded values will be returned as either a
`number`, a `bigint` or an `ArrayBuffer`, depending again on the magnitude of
the value.
**Example (Default settings)**

**Options**
```JavaScript
import { Keymask } from "keymask";

const keymask = new Keymask();

const masked = keymask.mask(123456789); // "wMjMGR"
const unmask = keymask.unmask("wMjMGR"); // 123456789
```

## Options

```Typescript
type KeymaskOptions = {
seed?: ArrayBuffer;
size?: number | number[];
safe?: boolean;
type?: "number" | "bigint" | "string" | "integer" | "buffer";
encoder?: KeymaskEncoder;
};
```

**Example (Default settings, no seed, variable outputs)**

```JavaScript
import { Keymask } from "keymask";

const keymask = new Keymask();

const masked1 = keymask.mask(12); // "X"
const unmask1 = keymask.unmask("X"); // 12

const masked2 = keymask.mask(123456789); // "wMjMGR"
const unmask2 = keymask.unmask("wMjMGR"); // 123456789

const masked3 = keymask.mask(1234567890123456789n); // "csMvrvQsMdVG"
const unmask3 = keymask.unmask("csMvrvQsMdVG"); // 1234567890123456789n

const masked4 = keymask.mask(new Uint8Array([
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
]).buffer); // "NpRcJcFtscDkyxmQkD"

const unmask4 = keymask.unmask("NpRcJcFtscDkyxmQkD");
// [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] as ArrayBuffer
```

Since `v0.9.2`, the input value to the `mask` function can also be provided as
a `string`. This will be converted internally to a `BigInt` and treated as a
numeric value. For example, in the above example `keymask.mask("123456789")`
would also mask to `"wMjMGR"`.

### `seed`

If a `seed` value is provided, it will be used to initialize LCG offsets
and/or shuffle the Base41 encoding alphabet. This allows different `Keymask`
instances to produce completely different outputs.
and shuffle the encoding alphabet. This allows different `Keymask` instances to
produce completely different outputs.

The `seed` should be provided as an `ArrayBuffer`, and should be either 32 or 8
bytes long depending on whether a preconfigured `KeymaskEncoder` is used (see
`encoder` option below). When no `encoder` is provided, the full `32` bytes
are required.
The `seed` should be provided as an `ArrayBuffer`, and should be 32 bytes long
for standard mode, or 20 bytes for safe mode. Note, however, that if a
preconfigured `KeymaskEncoder` is used (see `encoder` option below), then the
seed supplied to the `Keymask` constructor only needs to be 8 bytes long.

Providing a randomized `seed` is generally recommended, as this makes the
mappings between inputs and outputs highly unpredictable. However, the `seed`
Providing a randomized `seed` is highly recommended, as this makes the mappings
between inputs and outputs relatively unpredictable. Note that the `seed`
should typically not change for the lifetime of your application, as this would
render it impossible to unmask previously masked values.

@@ -118,21 +100,8 @@ const keymask = new Keymask({
]).buffer
});

const masked1 = keymask.mask(12); // "P"
const unmask1 = keymask.unmask("P"); // 12

const masked2 = keymask.mask(123456789); // "GVSYBp"
const unmask2 = keymask.unmask("GVSYBp"); // 123456789

const masked3 = keymask.mask(1234567890123456789n); // "BVTFGGfNNqCX"
const unmask3 = keymask.unmask("BVTFGGfNNqCX"); // 1234567890123456789n

const masked4 = keymask.mask(new Uint8Array([
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
]).buffer); // "FYNGFBkhgnvBChrHQg"

const unmask4 = keymask.unmask("FYNGFBkhgnvBChrHQg");
// [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] as ArrayBuffer
const masked = keymask.mask(123456789); // "ycQXDm"
const unmask = keymask.unmask("ycQXDm"); // 123456789
```

### `size`
@@ -143,19 +112,18 @@ numbers to the `size` option. If a single number is provided, it defines the
automatically, with additional characters added as needed.

If an array of numbers is provided, they represent successive allowable output
lengths. If the highest provided `size` is less than `12`, then longer outputs
will scale automatically, as above. If you do not want this auto-scaling
behaviour, be sure to include `12` as the last value in the `size` array.
lengths. If the highest provided `size` is less than `12` (`14` in safe mode),
then longer outputs will scale automatically, as above. If you do not want
this auto-scaling behavior, be sure to include `12` (or `14` in safe mode) as
the last value in the `size` array.

Note that long inputs (greater than 64 bits) are processed in 64-bit blocks;
the `size` setting applies only to the final block. If your inputs are
Note that long inputs (greater than 64 bits) are processed in 64-bit blocks,
and the `size` setting applies only to the final block. If your inputs are
word-aligned (to some multiple of 64 bits), it is generally recommended to
provide the setting `size: 12`, as this will ensure that unmasked values are
always a multiple of 64 bits long, even when the final block happens to contain
a value that can be encoded in fewer characters.

The `size`(s) should be between `1` and `12`, inclusive. Other values will be
silently ignored.
provide the setting `size: 12` (or `size: 14` in safe mode), as this will
ensure that unmasked values are always a multiple of 64 bits long, even when
the final block happens to contain a value that can be encoded in fewer
characters.

This setting should generally not be changed for the lifetime of your
application, as this may interfere with the ability to unmask previously masked
@@ -170,55 +138,57 @@ const keymask = new Keymask({
size: [5, 10]
});

const masked1 = keymask.mask(12); // "MxHqP"
const unmask1 = keymask.unmask("MxHqP"); // 12
const masked = keymask.mask(123456789); // "xMMJdmtCcf"
const unmask = keymask.unmask("xMMJdmtCcf"); // 123456789
```

### `safe`

Safe mode is triggered using a boolean flag on the options object.

const masked2 = keymask.mask(123456789); // "xMMJdmtCcf"
const unmask2 = keymask.unmask("xMMJdmtCcf"); // 123456789
**Example (Safe mode)**

const masked3 = keymask.mask(1234567890123456789n); // "csMvrvQsMdVG"
const unmask3 = keymask.unmask("csMvrvQsMdVG"); // 1234567890123456789n
```JavaScript
import { Keymask } from "keymask";

const masked4 = keymask.mask(new Uint8Array([
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
]).buffer); // "NpRcJcFtscDkbZZXpWbVyd"
const keymask = new Keymask({
safe: true
});

const unmask4 = keymask.unmask("NpRcJcFtscDkbZZXpWbVyd");
// [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] as ArrayBuffer
const masked = keymask.mask(123456789); // "mfwbdg"
const unmask = keymask.unmask("mfwbdg"); // 123456789
```

### `type`

By default, keymasks between 1 and 10 characters long will unmask to a
`number`, while 11- or 12-character keymasks will unmask to a `BigInt` and
anything longer than 12 characters (=64 bits) will be returned as an
`ArrayBuffer`. Since there is no way of knowing in advance how long the
supplied keymask will be, the return type is a union type:
By default, `Keymask` unmasks values as a `number` when possible, while larger
values, up to 64 bits, will unmask to a `BigInt` and above 64 bits they will be
returned as an `ArrayBuffer`. Since there is no way of knowing in advance how
long the supplied keymask will be, the return type is a union type:

```TypeScript
type KeymaskData = number | bigint | string | ArrayBuffer;
```

There may very well be times when you know the expected return type in advance,
or you want to consistently cast the result to a specified type. In such cases,
you can supply the expected or desired type using the `type` option. If
provided, it must conform to one of the following strings:
There may well be times when you know the expected return type in advance, or
you want to cast the result to a specified type. In such cases, you can supply
the expected or desired type using the `type` option. If provided, it must
conform to one of the following strings (otherwise it will fall back to the
default behavior).

- `"number"` The result will be returned optimistically as a `number` type (no
type conversion is done, so be sure to only use this with short keymasks).
type conversion is done, so type safety is not guaranteed; be sure to only use
this with short keymasks).
- `"bigint"` The result will be converted to a `BigInt` regardless of its
magnitude.
- `"string"` The result will be converted to a `BigInt` then cast to a `string`
regardless of its magnitude.
- `"integer"` Similar to the default behaviour, but values larger than 64 bits
will be returned as a `BigInt` rather than an `ArrayBuffer`.
will be returned as a `BigInt` rather than an `ArrayBuffer`. The return type is
`number | bigint`.
- `"buffer"` The result will be converted to an `ArrayBuffer` regardless of its
magnitude.

These conversions are type-safe, so when calling from TypeScript, there is
generally no need to further cast the result before using it (except for
`"integer"` which returns a `number | bigint` union type).

**Example (Specify the return type)**

```JavaScript
@@ -230,11 +200,11 @@ const bigintKeymask = new Keymask({ type: "bigint" });
const stringKeymask = new Keymask({ type: "string" });
const bufferKeymask = new Keymask({ type: "buffer" });

const unmask1 = defaultKeymask.unmask("GVSYBp"); // 123456789 as KeymaskData
const unmask2 = numberKeymask.unmask("GVSYBp"); // 123456789 as number
const unmask3 = bigintKeymask.unmask("GVSYBp"); // 123456789n as bigint
const unmask4 = stringKeymask.unmask("GVSYBp"); // "123456789"
const unmask5 = bufferKeymask.unmask("GVSYBp");
const unmask1 = defaultKeymask.unmask("wMjMGR"); // 123456789 as KeymaskData
const unmask2 = numberKeymask.unmask("wMjMGR"); // 123456789 as number
const unmask3 = bigintKeymask.unmask("wMjMGR"); // 123456789n as bigint
const unmask4 = stringKeymask.unmask("wMjMGR"); // "123456789"
const unmask5 = bufferKeymask.unmask("wMjMGR");
// [21, 205, 91, 7, 0, 0, 0, 0] as ArrayBuffer
```
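
The byte sequence shown for the `"buffer"` result is consistent with a 64-bit little-endian layout of the value. The sketch below reproduces it with a standard `DataView`; the byte order is inferred from the sample output above, not from the library's source.

```JavaScript
// Write 123456789 as a 64-bit unsigned integer in little-endian order
// and inspect the resulting bytes.
const view = new DataView(new ArrayBuffer(8));
view.setBigUint64(0, 123456789n, true); // true = little-endian

const bytes = Array.from(new Uint8Array(view.buffer));
console.log(bytes); // [21, 205, 91, 7, 0, 0, 0, 0]
```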

@@ -261,30 +231,30 @@ const sharedEncoder = new KeymaskEncoder(new Uint8Array([
1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24
]));
]).buffer);

const keymask1 = new Keymask({
encoder: sharedEncoder,
seed: new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]),
seed: new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]).buffer,
size: 5
});

const keymask2 = new Keymask({
encoder: sharedEncoder,
seed: new Uint8Array([11, 22, 33, 44, 55, 66, 77, 88]),
seed: new Uint8Array([11, 22, 33, 44, 55, 66, 77, 88]).buffer,
size: 5
});

const mask1a = keymask1.mask(1); // "bgLtm"
const mask1b = keymask1.mask(2); // "FVyHP"
const mask1c = keymask1.mask(3); // "vrMZL"
const mask1a = keymask1.mask(1); // "yBtrf"
const mask1b = keymask1.mask(2); // "qTvWY"
const mask1c = keymask1.mask(3); // "ZkgVt"

const mask2a = keymask2.mask(1); // "LjjWh"
const mask2b = keymask2.mask(2); // "RHRkJ"
const mask2c = keymask2.mask(3); // "xmXrp"
const mask2a = keymask2.mask(1); // "tHHCd"
const mask2b = keymask2.mask(2); // "LWLzR"
const mask2c = keymask2.mask(3); // "xfQkX"
```

## Why Base41?
## Why Base 41?

Base41 is a highly efficient encoding for 16-, 32- and 64-bit values,
comparable to Base57 or Base85 in this respect. Whereas Base85 encodes 32 bits
@@ -317,6 +287,25 @@ otherwise offensive character combinations. The encoding is therefore both
URL-safe and "safe for all audiences". In addition, it is free of commonly
confused character sets, including `O` / `0` and `l` / `I` / `1`.

## Why Base 24?

The motivating use case for the "safe" mode is URL subdomains, as this portion
of the URL is case-insensitive. Thus, we only have 36 characters at our
disposal (the lowercase Latin ASCII alphabet and the 10 numerals). Base41 is
therefore not an option.

If we are to maintain the "safe for all audiences" standard, we must remove the
five vowels and any numbers that are commonly substituted for vowels in 1337
speak, pagercode and similar conventions. These include `0` for `o`, `1` for
`i`, `3` for `e`, and, depending on dialect, `4`, `6` or `8` for `a`, leaving
a maximum of 25-27 encoding characters.

Efficient encodings for 64-bit values include Base31 (13 characters per 64
bits) and Base24 (14 characters per 64 bits), and, given the above, the latter
is the most efficient encoding that can be used. Keymask's "safe" encoding
alphabet therefore consists of the lowercase Latin alphabet, minus the five
vowels and the letter `l`, plus the numbers `2`, `5`, `7` and `9`.
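
As a sanity check on the character count, the alphabet described above can be assembled as follows. The ordering of characters here is an assumption made for illustration; the library's internal ordering (and any seed-based shuffling) may differ.

```JavaScript
// Build the safe-mode character set: lowercase letters minus the five
// vowels and "l", plus the digits 2, 5, 7 and 9.
// Assumption: character order is illustrative only.
const excluded = new Set("aeioul");
const letters = [..."abcdefghijklmnopqrstuvwxyz"].filter(
  (ch) => !excluded.has(ch)
);
const safeAlphabet = [...letters, ..."2579"].join("");

console.log(safeAlphabet);        // "bcdfghjkmnpqrstvwxyz2579"
console.log(safeAlphabet.length); // 24
```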

## Performance

On commodity hardware (2020 M1 MacBook Air), a single invocation of
14 changes: 7 additions & 7 deletions package.json
@@ -1,6 +1,6 @@
{
"name": "keymask",
"version": "0.9.2",
"version": "0.10.0",
"description": "Map sequential IDs or serial numbers to random-looking strings",
"type": "module",
"exports": {
@@ -52,17 +52,17 @@
},
"bugs": "https://github.com/keymask/keymask-js/issues",
"devDependencies": {
"@rollup/plugin-typescript": "^11.1.5",
"@rollup/plugin-typescript": "^11.1.6",
"@types/mocha": "^10.0.6",
"@types/node": "^20.10.4",
"@typescript-eslint/eslint-plugin": "^6.14.0",
"@typescript-eslint/parser": "^6.14.0",
"@types/node": "^20.11.13",
"@typescript-eslint/eslint-plugin": "^6.20.0",
"@typescript-eslint/parser": "^6.20.0",
"c8": "^8.0.1",
"eslint": "^8.56.0",
"eslint-plugin-jsdoc": "^46.9.1",
"eslint-plugin-jsdoc": "^46.10.1",
"mocha": "^10.2.0",
"rimraf": "^5.0.5",
"rollup": "^4.9.1",
"rollup": "^4.9.6",
"rollup-plugin-cleanup": "^3.2.1",
"rollup-plugin-dts": "^6.1.0",
"ts-node": "^10.9.2",