# The NuCypher Beachhead
By Arjun Hassard, Product Lead & Struggling Writer @ NuCypher

### The Beachhead: a high-level guide to the NuCypher Access Management System 

This resource focuses on the simplest, highest-level functionality available to developers leveraging *nucypher/nucypher*, NuCypher's access management system (hereafter simply referred to as "NuCypher"). Hence, this notebook serves as a technical strategy *beachhead* – an initial foothold for developers to: 

1) understand the fundamental capabilities of NuCypher

2) form sensible plans for integrating the NuCypher system

3) flesh out features in their application involving data sharing and access control 

We recommend using this notebook to get comfortable with relatively common/simple operations, such as *ALICE.grant()* and *BOB.join_policy()*, before moving on to more advanced NuCypher files and libraries, such as *nucypher/pyUmbral*, which afford greater customization, opportunities to economize NuCypher network usage, and more complex delegation workflows.

### Notes on the data sharing 'narrative'
We're going to walk through a typical data sharing journey, or 'narrative', centered around production-ready code snippets. In general terms, this narrative involves a *data controller* who intends for a *designated recipient* to access her data, and so proceeds to connect to the NuCypher network and then create a sharing policy specifically for said recipient. Later, when the recipient requests access to the data, the NuCypher network performs the necessary work for him to gain access. Without spoiling the ending, access is delivered securely, and without any further action required by the data controller.

To better illustrate the relationship between the functionality present here and a real-world application leveraging NuCypher, we will occasionally refer to two end-users of a hypothetical health record management platform: 

- A medical patient (the data controller) 
- A doctor (the designated recipient) 

However, there is nothing special about patients or doctors, other than that the data involved in a medical application is likely to be sensitive. Hence they can be swapped out for any other end-user. These references will mostly appear under the subtitle: "Relevance to example real-world application".

At various stages of the narrative, we will also dive into important lower-level work occurring in the background, plus abstracted opportunities to customize. Please note that these conceptual explanations will not cover everything, and are simply what we deem to be potentially relevant to a typical application's technical strategy. These will appear under the subtitle: "Relevant work/optionality abstracted by *function/method/etc.*".

With all this in mind, let's get started!

### Importing stuff
We'll start by importing two modules from Python's standard library, *datetime* and *sys*. We'll also import *maya*, which makes managing timezones easier – see github.com/kennethreitz/maya.

In [12]:
import datetime
import sys
import maya

Next, we'll import some classes – these bestow specific powers to each 'character' in the data sharing flow.

In [None]:
from nucypher.characters import Alice, Bob, Ursula
from nucypher.data_sources import DataSource

Finally, we'll import the modules which enable the NuCypher network to function properly.

In [None]:
from examples.sandbox_resources import SandboxNetworkyStuff
from nucypher.network.node import NetworkyStuff

### *Ursula*: the proxy & network node
The first character in our narrative that we'll introduce is *Ursula* – she represents the 'proxy' in 'proxy re-encryption'. We can think of a proxy as a third-party, anonymous, remote machine somewhere whose sole purpose is to take encrypted text and transform it – i.e. perform 're-encryption'.

The NuCypher network comprises thousands of proxies, scattered around the world, that collectively perform the necessary re-encryptions that power the data sharing narratives within NuCypher user applications. Every time a permission needs to be updated, or a new data recipient needs their access granted/revoked, multiple Ursulas are called upon. Ursulas provide this service in exchange for fees, paid by developers like yourself. Ursulas are referred to as nodes, network participants, network nodes, miners, re-encryptors, proxies and service-providers in other contexts. 

The code in this notebook is too high-level to directly touch Ursula – we'll be mainly dealing with the data controller and designated recipient. Nonetheless, it's critical that we have a grasp of Ursula's role, at least on a conceptual level – because there is a great deal of behind-the-scenes interaction with Ursulas occurring in virtually all NuCypher data sharing narratives.

### *Alice*: the data controller
Let's now introduce Alice, the second character in our data sharing narrative. Like all NuCypher characters, Alice has a Class which prescribes her various unique abilities/powers. She represents all of the following: 
- Data producer
- Data owner
- Data delegator
- Data controller (hereafter, we will use 'data controller' to signify all of the above)

However, it's perhaps most precise to think of Alice as the device, or devices, that a real-life data controller would use to manage their data. In our hypothetical application, that would be the patient, managing their medical data 'through' the Alice character.

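Let's instantiate Alice and ensure that she is properly connected to the NuCypher network. *network_middleware* works as a 'bridge' between our application and the network – the details of which are not relevant to this narrative. Note that the cell below assumes a *network_middleware* object has already been created from the networking classes imported above; that step is not shown in this notebook.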

In [None]:
ALICE = Alice(network_middleware=network_middleware)

The next step is to decide some basic parameters for a sharing policy, which we'll invoke later when adding a designated recipient to it. For this narrative, we'll choose the following four settings:

1) We set the policy to expire 5 days from the moment it is created. 

*Note: This is an example of policy revocation. This feature is powerful because, with NuCypher, policies can be revoked based on arbitrary conditions. Revocation is also, uniquely, enforced by the network itself. In this case, the policy is time-bounded. However, a policy could also be programmed to expire based on express input from a user, or the fulfilment (or lack of fulfilment) of other conditions – such as payment, receipt of data, receipt of signature, or more complicated rules written into a smart contract, or even determined by the output of an Oracle with regard to the outcome of a real-world event.*

*You may notice that there are no more references to revoking policies after this. That's because the relevant functions for more advanced policy revocation are not complete, and this notebook only contains working code. Look out for updates on this front.*

In [None]:
policy_end_datetime = maya.now() + datetime.timedelta(days=5)

2) We'll set 'n', the number of Ursulas (proxies) to which the forthcoming re-encryption job will be assigned, to 1. 

*Note: Normally, outside of a demo walkthrough like this, a policy's re-encryption job is always assigned to many more Ursulas than just 1. Setting n > 1 splits the re-encryption key into a corresponding number of fragments, each of which is sent to a different Ursula. This significantly increases the security of the policy. We'll explain this in more detail later.* 

In [None]:
n = 1

3) We'll set 'm', the minimum number of Ursulas required to perform the re-encryption for the recipient to access the data, also to 1. 

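*Note: Similarly, outside of a demo, the minimum number of re-encrypting Ursulas, or the 'threshold', would always be set to greater than 1. As this is an 'm-of-n' scheme, m can never exceed n, and in practice n is chosen to be larger than m so that some Ursulas can fail without blocking access. A sensible choice would be m = 10, n = 20.*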


In [None]:
m = 1

4) Finally, we'll choose a specific file path as the 'label' of the data we intend to share.

*Note: the label variable does not refer to the exact file that we intend to share. Rather, it functions more like a tag, where any amount of data can use the same label – a bit like multiple files in a single directory. As we'll see, this means that the data does not necessarily need to exist at this stage, and can be produced later.*

In [None]:
label = b"secret/files/and/stuff"
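
The time-bounded expiry chosen in the first setting can be pictured with a short standard-library sketch. The helper `policy_is_active` is hypothetical and not part of NuCypher; on the real network, expiry is enforced by the Ursulas rather than by application code like this.

```python
import datetime

def policy_is_active(expiration, now=None):
    """Return True while the current time is before the policy's expiration."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now < expiration

# Hypothetical fixed creation time, so the example is reproducible.
created = datetime.datetime(2018, 1, 1, tzinfo=datetime.timezone.utc)
expires = created + datetime.timedelta(days=5)

print(policy_is_active(expires, now=created + datetime.timedelta(days=4)))  # True
print(policy_is_active(expires, now=created + datetime.timedelta(days=6)))  # False
```

Using timezone-aware datetimes throughout avoids comparing naive and aware values, which is the same problem *maya* is imported to sidestep.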

Before we can begin granting permissions to anyone, we need Alice to connect to the NuCypher network and the Ursulas/proxies that will perform the re-encryption service on her behalf. We bootstrap this process by connecting Alice to one known Ursula, who then connects Alice to other Ursulas in their network. Note: there will be an option to start the connection process with Ursulas run by NuCypher, if this is preferable.

In [None]:
ALICE.network_bootstrap([("localhost", 3601)])
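
To give a feel for how one known Ursula can lead to many, here is a toy sketch of discovery from a seed node. The peer table `KNOWN_PEERS` and the function `discover_nodes` are hypothetical; the real discovery logic lives inside the library and is not shown in this notebook.

```python
from collections import deque

# Hypothetical view of which nodes each node knows about, as (host, port) pairs.
KNOWN_PEERS = {
    ("localhost", 3601): [("localhost", 3602), ("localhost", 3603)],
    ("localhost", 3602): [("localhost", 3601), ("localhost", 3604)],
    ("localhost", 3603): [],
    ("localhost", 3604): [("localhost", 3602)],
}

def discover_nodes(seed_nodes, peers=KNOWN_PEERS):
    """Breadth-first walk outward from the seed nodes, collecting every reachable node."""
    discovered = set(seed_nodes)
    queue = deque(seed_nodes)
    while queue:
        node = queue.popleft()
        for peer in peers.get(node, []):
            if peer not in discovered:
                discovered.add(peer)
                queue.append(peer)
    return discovered

print(sorted(discover_nodes([("localhost", 3601)])))
```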

### *Bob*: the designated recipient
Bob is the third character in the data sharing narrative. Conceptually, his role is fairly passive – he is simply the chosen recipient. However, as we shall see, there are a number of more complex actions/requirements this character must perform/fulfill in order to securely gain access to the data. Remember, like Alice, Bob represents the device(s) a real-world data recipient – or in our hypothetical application, the doctor – would use to achieve their goals.

Right now, we'll just instantiate Bob, so we can add him to Alice's sharing policy in the next section.  

In [None]:
BOB = Bob()

### Creating a sharing policy with *Alice*
It's time to create our first sharing policy, using the .*grant()* function. Sharing policies have three distinguishing features, which are defined when a policy is granted: 

 1) The data controller, *Alice*.
 
 2) The designated recipient, *Bob*.
 
 3) The file(s) location, *label*.

In theory we could leave the *label* argument empty at this stage. However, this would mean ceding important security benefits. For example, in the (highly unlikely) situation where the data recipient successfully colludes with all the Ursulas involved in this sharing narrative, they would only gain access to data kept under that single label in this policy, and nothing else belonging to the data controller, or patient. In this way, labels afford us reliable *forward-secrecy*. In addition, labels are a convenient way to choose specific, per-category sharing parameters for specific data.
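
One way to picture why a compromise stays confined to a single label is per-label key derivation. The sketch below is an illustration only: the function `derive_label_key`, the master secret, and the use of HMAC-SHA256 are assumptions made for this example, and NuCypher's actual key derivation may differ.

```python
import hashlib
import hmac

def derive_label_key(master_secret, label):
    """Derive a 32-byte key that is specific to one label."""
    return hmac.new(master_secret, label, hashlib.sha256).digest()

# Hypothetical master secret; a real one would be random and kept private.
master_secret = b"alice-master-secret-for-illustration-only"

heart_key = derive_label_key(master_secret, b"records/cardiology")
blood_key = derive_label_key(master_secret, b"records/blood-tests")

print(heart_key != blood_key)  # different labels give unrelated keys
print(heart_key == derive_label_key(master_secret, b"records/cardiology"))  # deterministic
```

Because each label's key is derived separately, learning one of them reveals nothing useful about the keys for other labels.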

##### Relevance to example real-world application

In our hypothetical real-world application, the granting of a policy is likely to be the first instance in this narrative where the medical patient proactively does something: namely, deciding to add a specific doctor to an approved list for her health records. Note that Alice does not need to encrypt or send the actual data at this moment. In fact, the data doesn't even need to exist yet – for example, not-yet-available results from a medical exam could be shared automatically under this policy (to all the doctors added to it), once they are ready. More on this scenario below, when we introduce *DataSource*.

##### Relevant work/optionality abstracted by the .grant() method

When *.grant()* is run, a great deal of background work is triggered – work which lays a metaphorical path through the NuCypher network for the forthcoming sharing of data. 

Firstly, the number of Ursulas we requested earlier (n) needs to be located. Next, an "arrangement" is proposed to those Ursulas, which contains relevant parameters to help each Ursula decide if they want to participate. For example, the arrangement object specifies the funds available to pay for the re-encryption service (known as the "deposit" – this can be altered to suit the application's economic requirements) as well as the duration of the policy, which we set earlier.

Once the required number (n) of Ursulas have accepted the arrangement, an equivalent number of re-encryption key fragments, known as "KFrags", are generated. A quick reminder: a *re-encryption key* is a special key unique to NuCypher. It is constructed, safely and locally, using Alice's private key and Bob's public key. Ursulas use re-encryption keys to transform/re-encrypt ciphertexts such that they are decryptable by Bobs – specifically, the Bobs who supplied their public key. So, to get KFrags, the re-encryption key is split into multiple fragments. Each of those stands ready to be sent to the participating Ursulas – one KFrag for each Ursula. 
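
The idea of splitting a key into n fragments, any m of which suffice, can be illustrated with Shamir's secret sharing, a classic threshold scheme. This is an analogy rather than the procedure NuCypher uses to produce KFrags, and the names `split_secret` and `recover_secret` are hypothetical.

```python
import secrets

# A Mersenne prime comfortably larger than the toy secret below.
PRIME = 2**127 - 1

def split_secret(secret, m, n):
    """Split `secret` into n shares, any m of which can rebuild it."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    # Random polynomial of degree m - 1 whose constant term is the secret.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = sum(c * pow(x, power, PRIME) for power, c in enumerate(coefficients)) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Rebuild the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (x_i, y_i) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (x_j, _) in enumerate(shares):
            if i != j:
                numerator = (numerator * -x_j) % PRIME
                denominator = (denominator * (x_i - x_j)) % PRIME
        # Modular inverse via Fermat's little theorem, valid because PRIME is prime.
        inverse = pow(denominator, PRIME - 2, PRIME)
        secret = (secret + y_i * numerator * inverse) % PRIME
    return secret

toy_key = 123456789
fragments = split_secret(toy_key, m=3, n=5)
print(recover_secret(fragments[:3]) == toy_key)  # any 3 fragments suffice
print(recover_secret(fragments[2:]) == toy_key)
```

With fewer than m fragments the interpolation yields an unrelated value, which is why no small group of holders can rebuild the key alone.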

Finally, the *.grant()* method also triggers the generation of a "TreasureMap" – this returns the locations of all the participating Ursulas, once they have confirmed their involvement. Bob will use the TreasureMap later, when he wants to retrieve the data Alice has shared with him. 
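
The recruitment steps described above can be sketched as a toy simulation. Every name here (`ToyArrangement`, `ToyUrsula`, `recruit`) and the fee rule are hypothetical and exist only to make the flow concrete; they are not NuCypher classes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyArrangement:
    deposit: int        # funds offered for the re-encryption service
    duration_days: int  # how long the policy should last

@dataclass(frozen=True)
class ToyUrsula:
    address: tuple
    min_fee_per_day: int

    def accepts(self, arrangement):
        """Accept only if the deposit covers this node's fee for the full duration."""
        return arrangement.deposit >= self.min_fee_per_day * arrangement.duration_days

def recruit(ursulas, arrangement, n):
    """Propose the arrangement to each node and return the addresses of n acceptors."""
    accepted = [u.address for u in ursulas if u.accepts(arrangement)][:n]
    if len(accepted) < n:
        raise RuntimeError("only {} of {} required Ursulas accepted".format(len(accepted), n))
    return accepted  # plays the role of a very simplified TreasureMap

candidates = [
    ToyUrsula(("localhost", 3601), min_fee_per_day=2),
    ToyUrsula(("localhost", 3602), min_fee_per_day=10),
    ToyUrsula(("localhost", 3603), min_fee_per_day=1),
]
toy_treasure_map = recruit(candidates, ToyArrangement(deposit=10, duration_days=5), n=2)
print(toy_treasure_map)
```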

Now we fully understand what's happening under the hood, let's go ahead and create a sharing policy: 

In [None]:
policy = ALICE.grant(BOB, 
                     label,
                     m=m, 
                     n=n,
                     expiration=policy_end_datetime)

Our final step involving Alice is to save her public key, such that Bob can easily locate and use it, once the designated recipient is ready to request access. 

*Note: Alice has multiple public keys, that perform different roles in the data sharing process. The public key we are saving in this step is her signing public key. Later, when we want to encrypt the underlying data, we will employ her encrypting public key.*

We can quickly get her *signing* public key via the 'stamp' attribute, then cast it into a convenient, immutable byte sequence for retrieval later.

In [None]:
alices_pubkey_saved_for_posterity = bytes(ALICE.stamp)

This juncture marks the end of Alice's involvement in the data sharing narrative, and by extension, the required involvement of a real-life data controller – or medical patient. 

Of course, if the data controller wished to create another policy, they would need to return online in order to choose the recipient and label, potentially tweak other parameters, and execute the grant function. 

Nevertheless, with NuCypher, a medical patient would be able to go permanently offline at this point, and still see data shared with their doctor, provided that it fell under the correct label – for example, diagnosis data that is yet to be finalized. This affords our medical application, or any other application leveraging NuCypher, a great deal of flexibility.

### Joining a sharing policy with *Bob*
As it stands, we have successfully created a sharing policy for a specific Bob (our recipient's device), and this policy is now sitting on the NuCypher network, waiting for Bob to join it. Once Bob joins the policy, the designated recipient (the doctor) will be able to access future data shared on it.

##### Relevance to example real-world application

Before getting into the joining details, it's worth acknowledging the various ways a recipient would actually join a policy. In this example, we have chosen a situation where some time passes in between the creation of the policy and the first time it is used. There is also a sense that the recipient has *decided* to join the policy. However, one could very well design an application where the opposite occurs – the specified recipient is automatically, and immediately, added to any policy bearing their public key. Anything in between is also possible, including making the joining action contingent on the fulfilment of specified conditions.

##### Relevant work/optionality abstracted by the .join_policy() method

When Bob joins the policy, a few important things occur in the background. Information relating to the policy, including the data controller and recipient's respective public keys, and the data's *label*, is hashed and saved as a variable ('hrac') for record-keeping and other uses. Separately, Bob also connects to the NuCypher network, and, using the TreasureMap and *hrac*, finds the participating Ursulas. This gets Bob, and therefore the recipient, ready to receive data once it is sent.
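
As a rough picture of how such an identifier could be built, the sketch below hashes the two public keys and the label together. The function name `toy_hrac`, the choice of SHA-256, and the length-prefixed encoding are all assumptions made for this example; NuCypher's actual hash function and encoding may differ.

```python
import hashlib

def toy_hrac(alice_pubkey, bob_pubkey, label):
    """Hash the two public keys and the label into one policy identifier."""
    digest = hashlib.sha256()
    for part in (alice_pubkey, bob_pubkey, label):
        # Length-prefix each part so different splits of the same bytes cannot collide.
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()

# Placeholder byte strings standing in for real public keys.
identifier = toy_hrac(b"alice-public-key", b"bob-public-key", b"secret/files/and/stuff")
print(identifier.hex())
```

Anyone holding the same three inputs can recompute the identifier, which is what lets Bob and the Ursulas refer to the same policy without exchanging it directly.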

To join the policy, Bob needs certain information to hand: 

1) The *label* – he needs to know what data he's after.

2) *Alice's signing key* – he needs the data controller's signature.

3) *Signature verification* – he needs to check the signature is legitimate and that he's receiving data from the correct source. 

4) *List of nodes* – he can optionally connect himself to the network at this time, using the same 'bootstrap' method Alice used earlier on in the narrative.

In [None]:
BOB.join_policy(label, #1 
                alices_pubkey_saved_for_posterity,  #2
                verify_sig=True, #3
                node_list=[("localhost", 3601)] #4 
                )

### Preparing the bulk data (this section can be skipped)
We're going to prepare some data to play the role of the data that would actually be shared in a real-world application - the 'underlying' or 'bulk' data. The main reason we include this section is to give readers a sense of the data format we'll be working with next, but there's nothing here that's unique to NuCypher. 

In short, we're going to decompose the novel 'Finnegans Wake' into plaintext lumps, and then share these periodically. We'll also print out some metadata to gauge performance. 

##### Relevance to example real-world application

It's worth pointing out that, in a normal application's architecture, the 'data' shared in a narrative like this is unlikely to be human-readable text. Instead, the literary passage we are sharing here is standing in for a more likely candidate – a symmetric key pertaining to some individual bulk data (e.g. photos, videos, chat message(s), collaborative documents, medical data, etc.), hosted in decentralized (IPFS, Storj etc.) or centralized (S3, etc.) storage somewhere, that the recipient can use to decrypt and view it.

In [None]:
# The path to the text is read from the first command-line argument; when running
# inside a notebook, substitute the path to your local copy.
with open(sys.argv[1], 'rb') as finnegans_wake:
    start_time = datetime.datetime.now()

    # Each line of the file becomes one plaintext 'lump'.
    for counter, plaintext in enumerate(finnegans_wake):
        if counter % 20 == 0:
            seconds = (datetime.datetime.now() - start_time).total_seconds()
            print("********************************")
            print("Performed {} PREs".format(counter))
            print("Elapsed: {}".format(seconds))
            # Guard against a zero elapsed time on the very first iteration.
            if seconds > 0:
                print("PREs per second: {}".format(counter / seconds))
            print("********************************")

### Encrypting and sharing on Alice's behalf with *DataSource*

We're nearly there! We have a sharing policy, a confirmed recipient, and some underlying/bulk data we want to share. However, the bulk data is currently in plaintext format. To securely transport it to the recipient, we need to encrypt it – in such a way that it can remain encrypted until it reaches Bob, and that he alone can decrypt it.

To achieve this, we're going to introduce the final character in our narrative: 'DataSource'. An important reason *DataSource* exists is so that Alice (& the data controller) are not needed at this stage in the narrative to perform the encryption themselves. However, this may be doing DataSource a disservice, as the character can be harnessed for more – including 'producing' data on the data controller's behalf. 

##### Relevance to example real-world application

In general, DataSource affords great flexibility to applications leveraging NuCypher, because it widens the range of possible entities that can encrypt for Bob. 

Imagine a medical scenario where the patient is expecting a regular stream of test results, from a third-party blood-testing lab (i.e. third-party in the sense the lab is not associated with their doctor/hospital), at some point in the future, and wants those results to be immediately shared with the doctors on an existing sharing policy. It would be undesirable to stay online and continually grant access to each test-result as it arrived. 

To avoid this, the application can assign a special role to the blood-testing lab, such that the lab gains the encryption powers of DataSource. Similar to the way 'Alice' represents the patient's device, DataSource can do the same for primary producers of data – in this case, the blood-testing lab. Hence, the lab can write the new data onto the sharing policy, under the specified label. Then, the designated doctors can access it. Thus, DataSource is 'producing' data on the patient's behalf. 

Note: this does not mean that DataSource, or the lab, can access other data on the sharing policy - including their own test results, once they've been saved. Accessing the data would require a *read* permission, which has only been granted to Bob – through the existence of a sharing policy, and re-encryption key, in his name. Rather, DataSource(s) have been solely granted a *write* permission to the policy. 

In general, these write permissions can be recorded directly onto a distributed ledger, such that the application can reliably confirm the correct actor is producing/encrypting data for a given sharing policy. As we will encounter below, this data also comes with a signature, which helps the recipient further verify its authenticity. 
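
The split between *write* and *read* permissions can be pictured with a small registry. This is a toy model with hypothetical names (`ToyPolicyPermissions`, `"blood-lab"`, `"doctor"`); in NuCypher itself, read access is enforced cryptographically through re-encryption rather than by a lookup like this one.

```python
class ToyPolicyPermissions:
    """Tracks who may write to, and who may read from, one sharing policy."""

    def __init__(self, writers, readers):
        self.writers = set(writers)
        self.readers = set(readers)
        self.records = []

    def write(self, actor, ciphertext):
        if actor not in self.writers:
            raise PermissionError("{} may not write to this policy".format(actor))
        self.records.append(ciphertext)

    def read(self, actor):
        if actor not in self.readers:
            raise PermissionError("{} may not read from this policy".format(actor))
        return list(self.records)

permissions = ToyPolicyPermissions(writers={"blood-lab"}, readers={"doctor"})
permissions.write("blood-lab", b"encrypted-test-result")
print(permissions.read("doctor"))

# The lab can add results but cannot read them back.
try:
    permissions.read("blood-lab")
except PermissionError as error:
    print(error)
```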


Let's first create a DataSource specifically for our sharing policy.  

In [None]:
data_source = DataSource(policy_pubkey_enc=policy.public_key)

We're going to use DataSource to encrypt the data we wish to share – the passages from Finnegans Wake. We're also going to generate an artefact known as a 'MessageKit' – this contains a ciphertext (the encrypted version of the passages), plus two unique identifiers: the policy's public key, and the recipient's public key. 

We'll also take this opportunity to generate a signature. The signature is unique to that message, and, when verified, can confirm that the data has not been corrupted or manipulated while in transit, and that the data did indeed come from the expected, correct DataSource. This further mitigates the risk of incorrect data being added to a label from a malicious source that has gotten hold of the relevant public keys. The signature can be sent to Bob via a side-channel, or published publicly as evidence of the manner in which the data was shared.

Let's go ahead and create a tuple for these:  

In [None]:
message_kit, _signature = data_source.encapsulate_single_message(plaintext)

We want Bob to be able to verify that the MessageKit and signature he will soon access came from the right DataSource, so we'll save its public key, in the same way we did Alice's public key earlier:

In [None]:
data_source_public_key = bytes(data_source.stamp)

### Retrieving the data with *Bob*

To give the designated recipient even greater independence with regard to when and how they access the data, we're going to include a snippet which reconstructs the DataSource from Bob's perspective. This may not be necessary, if the DataSource remains online and available, but this means there's no obligation to keep DataSource around.

In [None]:
datasource_as_understood_by_bob = DataSource.from_public_keys(
        policy_public_key=policy.public_key,
        datasource_public_key=data_source_public_key,
        label=label
    )

We're now in the endzone – just one more method to run before our recipient gets their hands on the 'cleartext' – the data that the data controller intended them to see, decrypted and readable. We'll achieve this with *.retrieve()*, which takes three arguments to compute correctly:  

1) The MessageKit – the actual data. 

2) DataSource – where we got the data from.

3) Alice's public key – an identifier for the original delegator of the data. 

##### Relevance to example real-world application

In our hypothetical medical application, this is the stage where the doctor accesses the patient's health records. How the doctor does this is flexible – the application could automatically retrieve the data as soon as it is suitably encrypted by DataSource, or the doctor could be required to proactively decide when they want their access to begin – the latter may be useful if it's desirable to notify the patient that the doctor has now begun their analysis/diagnosis, and/or how much time the doctor spends looking at the data, for example. 

##### Relevant work/optionality abstracted by the .retrieve() method

Although the inputs above are easy to grasp, what happens in the background is more complex. 

The first thing that needs to be checked is that the number of Ursulas who completed a re-encryption is equal to or greater than the minimum we specified right at the start (m). Outside of a demo, m would be set to some fraction of n (the total number of Ursulas involved), to provide redundancy and ensure that, even if some Ursulas fail to perform the re-encryption with the KFrags they received, the data can still reach its destination. We perform this check using the TreasureMap we generated above, which guides us to the participating Ursulas.
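
The threshold check itself is simple to picture. In the sketch below, the function `enough_reencryptions` and the shape of the `responses` dictionary are hypothetical; they only illustrate the comparison against m, not how the library gathers results.

```python
def enough_reencryptions(responses, m):
    """Return True if at least m Ursulas returned a re-encryption result.

    `responses` maps each Ursula's address to her result, or None on failure.
    """
    completed = [address for address, result in responses.items() if result is not None]
    return len(completed) >= m

# Hypothetical outcome for n = 5 Ursulas, two of which failed to respond.
responses = {
    ("localhost", 3601): b"cfrag-1",
    ("localhost", 3602): None,
    ("localhost", 3603): b"cfrag-3",
    ("localhost", 3604): None,
    ("localhost", 3605): b"cfrag-5",
}
print(enough_reencryptions(responses, m=3))  # True: 3 of 5 succeeded
print(enough_reencryptions(responses, m=4))  # False
```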

Now we know that a sufficient number of re-encryptions occurred, it's time to work with the output of that work – "CFrags". In the same way a KFrag is a fragment of key, a CFrag is a fragment of ciphertext. When brought together by Bob, these fragments combine into a complete ciphertext, that can be used to access the underlying data. 

So, the next step is to gather those CFrags. In order to ensure that the number of collectable CFrags is as expected, the figure is checked against something called a WorkOrder, which was generated previously. It's not worth digging into this, other than to say there is an auditable trail that minimizes unexpected behaviour or denial-of-service (DoS) attacks by Ursulas.

The next step may seem a little odd. We're going to attach the gathered CFrags to an artefact called a 'capsule'. The capsule's role, for the purposes of this narrative, is to bring together the MessageKit (the data we want to share) and the collection of ciphertext fragments, such that Bob can get to the cleartext, and hence the designated recipient is able to access the data they were granted. In reality, the capsule fulfils more than just this, but we can think of the capsule as a way of simply protecting the underlying data, for example in the scenario where Bob does not have the correct CFrags.


Now we understand the essential processes taken care of by .retrieve(), let's use it to get our hands on the data we set out to share:

In [None]:
delivered_cleartext = BOB.retrieve(message_kit=message_kit,
                                   data_source=datasource_as_understood_by_bob,
                                   alice_pubkey_sig=alices_pubkey_saved_for_posterity)

We've done it! Let's quickly check it's correct:

In [None]:
assert plaintext == delivered_cleartext
print("Retrieved: {}".format(delivered_cleartext))

Congratulations on making it through the NuCypher Beachhead. This notebook is certainly lengthy, but it exists as an initial point of reference for the entire NuCypher Access Management System. Once you have digested the concepts that we explored here, you are in a great position to plan out a NuCypher integration, and eventually, provide your users with secure, flexible and powerful data sharing functionality.

Whether reading this notebook was your first dive into NuCypher, or you've been familiar with our codebase for a while, we're very keen to hear your feedback. We're particularly interested in whether this notebook has the right depth, detail, clarity and coherence. You can email me – arjun [at] nucypher [dot] com. 