
Data Decentralization

Opt-In

By default all of a broker's data is stored in broker-local storage and not in decentralized persistence.

To leverage decentralized persistence, a user has to either opt in all of their data or opt in selected segment-keys.

Regardless of opt-in, the user always has at their disposal client-driven quasi-decentralization as enabled by the client-driven data migration use-cases.

With the exception of the section on client-driven quasi-decentralization and parts of "Broker API Support For Data-Stewardship", this write-up discusses data-stewardship and the use of IPFS as if the user had opted in to decentralized persistence.

Data Stewardship

Data decentralization in overhide is supported by an explicitly managed network of data-stewards.

Data-stewards are the overhide brokers, all on the same distributed persistence network, that a user has authorized: a subset of all overhide brokers on the network.

An active data-steward is an authorized overhide broker that the user interacts with for mutating data. Only one overhide broker can be delegated as an active data-steward (see "Data Mutation Speed Considerations" below).

A passive data-steward is any other authorized overhide broker. There can be many passive data-stewards. These are read-only. These guarantee to make the user's data available as soon as it's shared with them by the active data-steward. A passive data-steward can be made "active" at any time; it's prudent to deactivate any previously active data-steward.

All nodes on the broker's distributed persistence network may receive the user's data and make it available, but the stewards pin the user's data and guarantee its availability.
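The active/passive relationship described above can be sketched as a small data structure. This is only an illustrative model, not part of any overhide API: the `StewardSet` class, its method names, and the broker names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class StewardSet:
    """Hypothetical model of the brokers a user has authorized as stewards."""

    authorized: Set[str] = field(default_factory=set)
    active: Optional[str] = None  # at most one active data-steward

    def authorize(self, broker: str) -> None:
        """Authorize a broker; it starts out as a passive steward."""
        self.authorized.add(broker)

    def activate(self, broker: str) -> None:
        """Make an authorized broker active, replacing any previous one."""
        if broker not in self.authorized:
            raise ValueError(f"{broker} is not an authorized steward")
        self.active = broker

    @property
    def passive(self) -> Set[str]:
        """Every authorized broker other than the active one."""
        if self.active is None:
            return set(self.authorized)
        return self.authorized - {self.active}
```

Because `active` holds a single value, activating a passive steward implicitly demotes the previously active one, which mirrors the advice to deactivate the old steward.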

[Figure: stewards]

Data Mutation Speed Considerations

Ideally, data changes would be immediately consistent across the whole distributed persistence network. This is not feasible in practice.

A user's data mutations go to a specially nominated active data-steward before being propagated to other peers. Each data change progresses through three persistence-levels: working-memory, permanent-memory, and shared-memory.

Each level operates at a different update speed. Data propagation can be superseded at any level by a faster rate of change upstream.

[Figure: memory]

The working-memory and permanent-memory are part of the active data-steward. Only changes that progress to shared-memory can be pinned by passive data-stewards and made available by nodes on the distributed persistence network.
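A minimal sketch of how a change might move through the levels, and how a slower propagation is superseded by a newer upstream value. The level names come from the text; their ordering (working-memory, then permanent-memory, then shared-memory) and the `PersistencePipeline` class are assumptions made for illustration.

```python
from typing import Dict, Optional

# Assumed ordering, fastest to slowest.
LEVELS = ("working-memory", "permanent-memory", "shared-memory")


class PersistencePipeline:
    """Hypothetical model of one segment-key value on an active steward."""

    def __init__(self) -> None:
        self.values: Dict[str, Optional[bytes]] = {lvl: None for lvl in LEVELS}

    def write(self, value: bytes) -> None:
        """A mutation always lands in working-memory first."""
        self.values[LEVELS[0]] = value

    def propagate(self, level_index: int) -> None:
        """Copy whatever is currently upstream into the given level.

        If upstream changed several times since the last propagation,
        only the latest value is copied; earlier ones are superseded.
        """
        if level_index not in (1, 2):
            raise ValueError("only levels 1 and 2 receive propagated data")
        upstream = LEVELS[level_index - 1]
        self.values[LEVELS[level_index]] = self.values[upstream]
```

For example, two quick writes followed by a single propagation leave only the second value in permanent-memory; the first never reaches shared-memory, so passive stewards never see it.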

The SHA-256 content-hash of any segment-key value can be retrieved for each persistence-level with the GET /{segment-key}/persistence-status API call. On an active steward, equality between the shared-memory hash and the working-memory hash means consistency has been achieved. On a passive steward, data is driven by the shared-memory value, so all hashes are always the same. A client can use this API to validate that data has been propagated to, pinned by, and hosted on passive stewards.
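The hash comparison a client might perform can be sketched as follows. The shape of the status data is an assumption: each status is taken to be a mapping from persistence-level name to a hex SHA-256 digest, which may not match the actual response of the persistence-status call. The function names are hypothetical.

```python
import hashlib
from typing import Mapping


def sha256_hex(value: bytes) -> str:
    """Hex SHA-256 digest of a segment-key value."""
    return hashlib.sha256(value).hexdigest()


def is_consistent_on_active(status: Mapping[str, str]) -> bool:
    """On an active steward: shared-memory has caught up with working-memory."""
    return status["shared-memory"] == status["working-memory"]


def is_propagated_to_passive(
    active_status: Mapping[str, str], passive_status: Mapping[str, str]
) -> bool:
    """The passive steward holds the active steward's shared-memory value.

    A passive steward is expected to report one identical hash for
    every level, so that is checked as well.
    """
    passive_hashes = set(passive_status.values())
    return (
        len(passive_hashes) == 1
        and passive_status["shared-memory"] == active_status["shared-memory"]
    )
```

A client could poll both stewards and treat a migration step as complete only once both checks pass.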

Client-Driven Quasi-Decentralization

A user has ultimate control over their data. A core-value of overhide is that users are always able to extract all of their data from any overhide system (provided they know their credentials). The data may be exported from one overhide system and imported into another--presumably with incompatible underlying distributed persistence networks. Users can leverage tooling that uses the import/export functionality of overhide broker APIs to extract all their data so they can do with it as they please. That's a contractual obligation on a proper overhide broker implementation.

[Figure: import/export only]

The next section introduces a couple of client-driven data-migration use-cases; the corresponding API documentation covers them in more depth.

Broker API Support For Data-Stewardship

The data-stewardship mechanism exists to provide control over where data resides, which necessarily includes the ability to move data around. The distributed persistence network and the client-driven quasi-decentralization are but a transport layer. Managing data transfers on top of these features--without service interruption--requires coordination via several APIs. The APIs involved are covered in several sample use cases:

  1. Disassociation from Steward Over Persistence Network With Pour-Active Steward
    • migration of data from one steward to another without interruption
    • typical use-case of migrating service or app data while allowing user access
    • leverages distributed persistence network
    • all migrated data is opted-in
  2. Client Driven Export+Fill With Pour-Active Steward
    • migration of data from one steward to another without interruption
    • similar to the use-case above (migrating data while allowing user access), except that it includes all data, even data not opted-in
    • does not involve a distributed persistence network
  3. Client Driven Export+Import
    • most basic export and import
    • user access to data is interrupted
    • typical use-case of migrating data by a single user who's fine waiting for the migration to occur

Refer to the data-stewardship section of the Broker API for an in-depth look at these data-migration use cases as available to overhide brokers.
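As a rough reading aid, the trade-offs between the three use cases can be expressed as a small decision helper. This is a simplification made for illustration: the `choose_migration` function and its two parameters are hypothetical, and real deployments may weigh other factors.

```python
from enum import Enum


class Migration(Enum):
    OVER_NETWORK = (
        "Disassociation from Steward Over Persistence Network "
        "With Pour-Active Steward"
    )
    EXPORT_FILL = "Client Driven Export+Fill With Pour-Active Steward"
    EXPORT_IMPORT = "Client Driven Export+Import"


def choose_migration(
    needs_uninterrupted_access: bool, all_data_opted_in: bool
) -> Migration:
    """Pick a use case from the two properties the list distinguishes."""
    if not needs_uninterrupted_access:
        # The basic flow suffices when the user is fine waiting.
        return Migration.EXPORT_IMPORT
    if all_data_opted_in:
        # Everything is already on the distributed persistence network.
        return Migration.OVER_NETWORK
    # Some data is broker-local only, so the client must carry it over.
    return Migration.EXPORT_FILL
```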

Overview of overhide Reference Implementation and IPFS

The overhide broker reference implementation uses IPFS for its decentralized persisted data.

All user data and data-specific metadata is stored on the IPFS network.

Broker specific metadata--e.g. configuration and session information--is not stored on the IPFS network.

A broker facilitates a connection to IPFS and enforces an overhide specific data structure on IPFS. It enables current Web 2.0 services to leverage IPFS through the overhide abstraction.

[Figure: overhide broker as facilitator to IPFS. An app uses JSON to push objects into overhide, which stores them as files in a structured layout; IPFS is shown as a cloud that other overhide instances also reference.]

Identity Child Directories

An identity is a token calculated client-side--in overhide.js. To an overhide broker an identity certifies a user owns the public key used for broker remuneration. It is also used to create a directory to contain all the data, metadata, and backchannel queues for segment-keys owned by the user.

Each broker is a steward of its users' data in an instance-local repository (/repo). This repository has a folder for each user, with the user's identity as the directory name. These folders are constantly updated in IPFS and IPNS as users work with their data--this necessarily means that each identity has its own private-public IPNS key-pair. The public key constitutes a GUID (globally unique ID), which is the user's data content-root on the IPNS network. Ideally, on each change of the user's data, the IPFS Merkle DAG is updated and the IPNS snapshot is re-referenced. Realistically, the IPFS network--and especially the slower IPNS reference to the most recent data--is only eventually consistent as seen by other brokers.

[Figure: delay in distributing user data. The user works with data in the broker's repository at full speed; IPFS is updated from the repository with some delay, IPNS points to IPFS, and other broker peers watching IPNS see changes with even more delay.]

As each user modifies their data, the broker updates IPFS, necessarily pins all the content, and re-publishes to IPNS under the user's IPNS public-key hash (see ipfs name publish)--the GUID. The broker makes the user's GUID available to the app (service) via APIs. Armed with the GUID the user can use tooling to directly access their most recent data on IPFS--allowing for publishing delay--or direct their app (service) to use a different broker to work with their data.
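The add, pin, and publish sequence can be sketched as the `ipfs` CLI invocations a broker-like process might run. This is an assumption about one possible approach, not the reference implementation's actual code: the helper names, the per-identity key name, and the example values are hypothetical. The functions only build argument lists so the sketch can be inspected without a running IPFS node.

```python
from typing import List


def add_command(identity_dir: str) -> List[str]:
    """Recursively add a user's folder to IPFS.

    `ipfs add` pins what it adds by default; `-Q` prints only the
    root hash, which is what gets published to IPNS afterwards.
    """
    return ["ipfs", "add", "-r", "-Q", identity_dir]


def publish_command(key_name: str, root_cid: str) -> List[str]:
    """Point the identity's IPNS name at the newly added root.

    `key_name` is a hypothetical name for the identity's IPNS key-pair
    in the node's keystore.
    """
    return ["ipfs", "name", "publish", f"--key={key_name}", f"/ipfs/{root_cid}"]
```

With a local node available, each list could be passed to `subprocess.run(..., capture_output=True, text=True, check=True)`, feeding the hash printed by the first command into the second.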

Data as Files

A broker writes data to local files under /repo/<identity>/<segment-key>/<identity>. All of a user's owned content is under their /repo/<identity>/ subfolder. The data itself sits in a file named with the identity; this distinguishes the owner's data from delegate data.

This data is navigable via IPNS by dereferencing <GUID>/<segment-key>/<identity>, where GUID is the identity's IPNS public-key hash.

The data is encrypted in a usage specific way.

Delegate Data as Files

Delegate data resides in /repo/<owner-identity>/<segment-key>/<delegate-identity>. The owner-identity subfolder is named with the identity of the owner of the referenced segment-key. The delegate-identity file contains the delegate data and is named with the delegate's identity.

The delegate data is navigable via IPNS by dereferencing <GUID>/<segment-key>/<delegate-identity>.

The data is encrypted in a usage specific way.

Messages as Files

All backchannel queues for an identity are filed under /repo/<identity>/<segment-key>/backchannel/<identity>. The queue is stored as a file named with the identity; this distinguishes the owner's backchannel from delegates' backchannels.

Each queue is navigable via IPNS by dereferencing <GUID>/<segment-key>/backchannel/<identity>.

The messages are encrypted in a usage specific way, decipherable by recipient upon receipt.

Delegate Messages as Files

Delegates' messages are filed under /repo/<owner-identity>/<segment-key>/backchannel/<delegate-identity>. The owner-identity subfolder is named with the identity of the owner of the referenced segment-key. The delegate-identity file contains the delegate's backchannel queue and is named with the delegate's identity.

Each queue is navigable via IPNS by dereferencing <GUID>/<segment-key>/backchannel/<delegate-identity>.

The messages are encrypted in a usage specific way, decipherable by recipient upon receipt.

Access Metadata as Files

Metadata for a specific segment-key lives in the /repo/<identity>/<segment-key>/.metadata file.

Each metadata file is navigable via IPNS by dereferencing <GUID>/<segment-key>/.metadata.

This file is not meant for consumption by the user, but by overhide systems; these files are plain-text with pseudonymous content.
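The file layout described in the sections above can be summarized with a few path helpers. This is a sketch for orientation only: the function names are hypothetical, and the `/ipns/` prefix is the usual IPFS path convention rather than something this write-up specifies.

```python
import posixpath


def local_root(owner_identity: str) -> str:
    """Broker-local folder holding everything a user owns."""
    return posixpath.join("/repo", owner_identity)


def ipns_root(guid: str) -> str:
    """The same content-root as seen through IPNS (assumed /ipns/ prefix)."""
    return posixpath.join("/ipns", guid)


def data_file(root: str, segment_key: str, identity: str) -> str:
    """Owner data when `identity` is the owner's, delegate data otherwise."""
    return posixpath.join(root, segment_key, identity)


def backchannel_file(root: str, segment_key: str, identity: str) -> str:
    """Backchannel queue for the owner or for a delegate."""
    return posixpath.join(root, segment_key, "backchannel", identity)


def metadata_file(root: str, segment_key: str) -> str:
    """Access metadata for one segment-key."""
    return posixpath.join(root, segment_key, ".metadata")
```

The same helpers serve both views: pass `local_root(...)` for the broker's file system and `ipns_root(...)` for the path another party would dereference.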
