This document defines a data format and encryption algorithm agnostic specification for storing, accessing, and deleting end-to-end encrypted user data blobs. The goals being that:
- clients written to this specification should not have to trust the registry provider to keep user data private.
- clients should be able to run on a variety of platforms with minimal integration effort.
- clients can choose from a variety of encryption algorithms to suite their needs.
- power users can run their own instance of the data registry on a platform of their choosing.
This specification makes use of AES for client side data encryption and DID Authentication for generating access tokens to modify storage. It requires all encryption and decryption to be performed by the client and that no plain-text data ever be sent to the registry. The communication between client and server is transmitted in accordance with an HTTP API to facilitate integration with a wide variety of platforms and developer backgrounds.
Bloom will be providing an open source reference implementation of the data registry built with docker. This allows power users to easily deploy their own instances of the registry.
This document also defines some suggested methods for safe storage and transmission of AES keys for the purpose of using the registry to sync data between client devices belonging to the same user as well as recommended libraries and default encryption parameters to use for the AES implementation.
This example uses JSON but the implementation could just as easily be normalized SQL tables
Entity:
{
// DID for this entity
"did": "did:ethr:0xabc...",
"deletedIds": [2,4]
// return to the client as base 64 encoded data blob (the format of which is unknown to the registry)
"data": [
"YXNnYWRnaGFzZGdhZGg0IHVldHNmaGZna2pnaCBka2drbXhnIHNkZmdkZmc",
"YXNnYWRnaGFzZGdhZGg0IHVldHNmaGZna2pnaCBka2drbXhnIHNkZmdkZmc",
null,
"YXNnYWRnaGFzZGdhZGg0IHVldHNmaGZna2pnaCBka2drbXhnIHNkZmdkZmc",
null
]
}
When a new data blob is added to an entity the new data should be appended to the end of data array. When a data item is deleted, the cyphertext for that index should be set to null and the index should be appended to the deletedIds array (these two operations should happen atomically). This helps clients quickly check for new data that needs to be synced by querying for the length of the data array or check which of their cached data need to be deleted by querying with their cached deletedCount.
Initiates an access token request challenge for the requested did. If an entity does not already exist with a matching did then one is created. should be replaced with the did:ethr:0x... formatted DID. Returns the access token which needs to be subsequently signed before access is granted
Example POST url:
https://example.com/auth/request-token?did=did:ethr:0xabc...
Example response (aplication/json):
{
"token": "889a35f4-f4ef-431d-8eb7-629a126f972e"
}
Validates an access token for the requested DID. The access token and a valid signature of the token must be submitted for the token to be marked as validated. Once validated the token can be used until it is expired. To prevent leaking information about which DIDs have data stored in the registry and to mitigate possible future preimage attacks (what attacks?) the registry will process requests in this way.
- Lookup pending access token and linked entity by the posted access token uuid
- If none are found return 404
- Verify a DID exists for the entity and the posted signature is valid against that key
- If either is false return 401
- Else authorize the token and return it
Example POST body (application/json):
// request for a token
{
"accessToken": "889a35f4-f4ef-431d-8eb7-629a126f972e",
"signature": "0x123..."
}
Example response (aplication/json):
{
"expiresAt": 1546883485
}
The REST endpoints require an authorized access token which can be obtained by using the Authorization endpoints above. Then token must be specified in the Authorization header to the HTTP request Authorization: Bearer <token>
Javascript example:
fetch(
`http://example.com/data`,
{
headers: {
'Authorization': `Bearer ${accessToken}`,
'Content-Type': 'application/json',
},
}
)
Returns the entity object associated with the included access token without the data blobs.
Example response (aplication/json):
{
"did": "did:ethr:0xabc...",
// total number of data blobs
"dataCount" : 10,
// number of delete data blobs
"deletedCount": 2,
}
Returns the data objects associated with the included access token and between the start and end ids including the end. if end is omitted only the data with the starting id will be returned
Example GET url:
https://example.com/data/7/9
OR
https://example.com/data/7/9?cypherindex=AES_ENCRYPTED_INDEX_VALUE
OR
https://example.com/data/7/9?cypherindex=AES_ENCRYPTED_INDEX_VALUE_A,AES_ENCRYPTED_INDEX_VALUE_B
Notes on cypherindex:
The cypherindex
is an optional query parameter that can be generated by the client and is used to filter the result set by records that match the encrypted value.
Example response (aplication/json):
[
{
"id": 7,
"cyphertext": "YXNnYWRnaGFzZGdhZGg0IHVldHNmaGZna2pnaCBka2drbXhnIHNkZmdkZmc"
},
{
"id": 8,
"cyphertext": null,
},
{
"id": 9,
"cyphertext": "YXNkZmdlIHJ5NCA1dWV5aiBoIGR0aTM1dXdoZ3R1d3J0aHNhdHdlYWdkaHJ0aGFxcg=="
}
]
Deletes the data objects associated with the included access token and the included ids starting with :start and ending with :and inclusive. Signatures can be included in the body which can be verified upon receiving the deleted data. If signatures are included they must be signatures of the form "delete data id :id" where :id should be replaced with the id of the data being deleted. They also must be in ascending order respective to the ids
// to delete the 10th 11th and 12th data blobs
https://example.com/data/9/11
Example optional request body (aplication/json):
{
"signatures": [
"0x123...",
"0x456...",
"0x789...",
]
}
Example response (aplication/json):
{
"dataCount" : 10,
"deletedCount": 3
}
Create a new data blob for the entity associated with the included access token
Example POST url:
https://example.com/data
Example request body (aplication/json):
{
// optional expected id, request will rollback if new id does not match
"id": 9,
"cyphertext": "YXNnYWRnaGFzZGdhZGg0IHVldHNmaGZna2pnaCBka2drbXhnIHNkZmdkZmc",
// optional, but recommended, see below notes
"cypherindex": "AES_ENCRYPTED_INDEX_VALUE"
}
Example request body (aplication/json) multiple indexes:
{
// optional expected id, request will rollback if new id does not match
"id": 9,
"cyphertext": "YXNnYWRnaGFzZGdhZGg0IHVldHNmaGZna2pnaCBka2drbXhnIHNkZmdkZmc",
// optional, but recommended, see below notes
"cypherindex": ["AES_ENCRYPTED_INDEX_VALUE_A", "AES_ENCRYPTED_INDEX_VALUE_B"]
}
Notes on cypherindex:
The cypherindex is intended as both a private way to improve usability and performance. It's value is up to the client to generate. A recommended implementation for indexing the data as a type phone
attestation would look like this:
-
Create some JSON that includes a private to the client
nonce
value and thetype
of data that's being stored (in our examplephone
):{ "nonce": "SUFFICIENTLY_RANDOM_STRING_VALUE", "type": "phone" }
-
Encrypt the JSON string with the same AES key as was used to create the
cyphertext
The SUFFICIENTLY_RANDOM_STRING_VALUE
should be stored locally, and is used to make common property values like type
of phone
private / not searchable in the vault db and will be required when using for lookup later in the GET /data/:end/:start?cypherindex=AES_ENCRYPTED_INDEX_VALUE
request. The same nonce
value can be used for all indexes, but that's up to the client implementation.
Example response (aplication/json):
{
"id": 9
}
GET /deletions/:start/:end
Returns the ids of the data blobs deleted between the **i***th* and **j***th* deletion inclusive. j is optional
Example GET url:
// to get the third deletion (0 indexed)
https://example.com/data/deletions/2
// to get the third fourth, and fifth deletion
https://example.com/data/deletions/2/4
Example response (aplication/json):
[
{
id: 2,
"signature": "0xab0..."
},
{
id: 50,
"signature": null
},
{
id: 6,
"signature": "0xcd1..."
}
]
A typical scenario might be as follows:
- Client device A adds 3 data blobs for an entity
- Entity decides to delete one of the blobs using device A
- Device A caches the deletedCount as 1
- The same entity syncs his DID to client device B
- Client device B notices that the deletedCount is > 0 and calls /deleted/0
- Entity decides to delete another blob on device B
- Next time device A syncs it notices the deleted count is now 2 and calls /deleted/1 since it already deleted its cached data for the first deleted item
- When posting new data:
- Make use of the "id" request parameter to prevent replay attacks and duplicated data.
- Sign and encrypt your data blobs and verify them upon receipt to reduce risk of a malicious provider returning invalid data.
- Include the data id in the signed/encrypted data to prevent the possibility of a malicious provider re-ordering your data
- When deleting data make use of the "signature" parameter and verify it upon receiving deletions. This makes it difficult for a provider to fake a deletion.
- install node if you havent already (version 14 recommended) *if you have nvm run
nvm use
npm install
npm run docker-debug
use the VSCode debug profiles to attach the debugger to the server or the tests or both
first start up in debug mode using the commands above then
npm run test
By default the above tests use the .env.test
file, but it can easily be overridden by setting the TEST_ENV
to a different file path like so:
TEST_ENV=../.env npm run test
using the VSCode debug profile "Attach to Docker" will enable hot reloading. Or you can run npm run watch
in a separate terminal
first set the required environment variables like so
cp .env.sample .env
nano .env #edit your file
chmod 600 .env
docker-compose -f docker-compose.yml -f docker-db.yml up --build -d
make sure you set the following values in your .env file (see above)
POSTGRES_USER
POSTGRES_HOST
POSTGRES_PORT
POSTGRES_DATABASE
then start up the container
docker-compose up --build -d
copy the root cert to ./pg_ca.crt then
docker-compose -f docker-compose.yml -f docker-pg-ssl.yml up --build
docker-compose -f docker-debug.yml down --volumes
if you want errors to be posted as json to an external logging service set the following environement variables
LOG_URL
LOG_USER
LOG_PASSWORD
the logger will use basic http authentication with the username and password
if you use the included postgres image and want to periodically back up the volume, set the BACKUP_LOCATION
environment variable to a location on the host machine
- the POSTGRES_PASSWORD will be set in the volume the first time running and will not reset between rebuilding the images. If you want to change the password you have to either remove the volume using the command above or connect using a pg client and change it
- On initial boot the debugdb will use the
POSTGRES_DB
env var, not thePOSTGRES_DATABASE
env var, to create a db