# Introduction to CrypTen

CrypTen is a machine learning framework built on PyTorch that enables you to easily study and develop machine learning models using secure computing techniques. CrypTen allows you to develop models with the PyTorch API while performing computations on encrypted data, without revealing the protected information. CrypTen is designed to keep sensitive or otherwise private data private, while still allowing model inference and training on encrypted data that may be aggregated across various organizations or users.

CrypTen currently uses secure multiparty computation (MPC) (see (1)) as its cryptographic paradigm. To use CrypTen effectively, it is helpful to understand the secure MPC abstraction as well as the semi-honest adversarial model in which CrypTen operates. This introduction explains the basic concepts of secure MPC and the threat model. It then explores a few different use cases in which CrypTen may be applied. 


## Secure Multi-party Computation

Secure multi-party computation is an abstraction that allows multiple parties to compute a function on encrypted data. Its protocols are designed so that each party can access only its own data and any data that all parties agree to reveal.

Let's look at a concrete example to better understand this abstraction. Suppose four parties, A, B, C and D, each have a private number and together want to compute the sum of all four numbers. A secure MPC protocol for this computation allows each party to learn the final sum, but no party learns anything about the other three parties' individual numbers beyond what the sum itself reveals.

In secure MPC, the data owner encrypts its data by using random masks to split it into <i>n</i> random shares that can be combined to reconstruct the original data. These <i>n</i> shares are then distributed among <i>n</i> parties. This process is called <i>secret sharing</i>. The parties can compute functions on the data by operating on the secret shares, and can decrypt the final result by exchanging the resulting shares with each other.
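To make secret sharing concrete, the following sketch runs the four-party sum example with <i>additive</i> secret sharing over integers modulo a public value `Q`. It is an illustration under stated assumptions, not CrypTen's implementation: the modulus `Q = 2**64`, the helper names `share` and `reconstruct`, and the input values are all chosen for this example, and CrypTen's own encoding of real-valued data and its communication layer are not modeled.

```python
import secrets

# Assumed public modulus for all share arithmetic (illustrative choice).
Q = 2 ** 64


def share(secret, n_parties):
    """Split an integer into n_parties additive shares that sum to it mod Q."""
    # The first n_parties - 1 shares are uniformly random masks ...
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    # ... and the last share makes all shares sum to the secret modulo Q.
    shares.append((secret - sum(shares)) % Q)
    return shares


def reconstruct(shares):
    """Recover the shared value by summing all of its shares modulo Q."""
    return sum(shares) % Q


# Hypothetical private numbers held by parties A, B, C and D.
inputs = {"A": 5, "B": 11, "C": 0, "D": 0}
parties = list(inputs)
n = len(parties)

# Each owner secret-shares its number and sends the j-th share to party j.
# held[p] collects the shares party p receives (one from every owner).
held = {p: [] for p in parties}
for value in inputs.values():
    for j, s in enumerate(share(value, n)):
        held[parties[j]].append(s)

# Each party adds its received shares locally, obtaining a share of the total.
sum_shares = [sum(held[p]) % Q for p in parties]

# Only these sum shares are exchanged, so only the total is revealed.
total = reconstruct(sum_shares)
print(total)  # prints 16
```

Any <i>n</i> - 1 shares of a value are uniformly random, so the shares a party receives tell it nothing about the other inputs on their own; addition works share-by-share because it commutes with the modular sum used for reconstruction.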


## Use Cases
In the tutorials, we show how to apply CrypTen to four main scenarios:
<ul>
<li> <b>Feature Aggregation</b>: A common use case for CrypTen is when multiple parties hold distinct sets of features, and want to learn a classifier over the joint feature set without sharing their data. For example, different health providers may each have part of the medical history of patients, and may wish to learn over the patients' combined medical history while protecting patient privacy.</li> 
<li> <b>Data Labeling</b>: Another common use case for CrypTen is when one party holds the features while another party holds the labels, and again, the two parties want to learn a classifier without sharing their data. This use case comes up in online advertising: one company may hold the user characteristics while another company holds information on whether an ad was converted, and the companies wish to jointly learn how to improve advertising.</li>
<li> <b>Data Augmentation</b>: A third use case for CrypTen is when different parties each hold a small number of examples (with identical feature sets), and want to learn a classifier over all the examples without sharing their data. For example, different oncologists may have records of different cancer patients, and they may want to train a classifier over all their pooled patient records while protecting patient privacy.</li>
<li> <b>Model Hiding</b>: A final use case we discuss is when one party has a trained classifier, another party has data that needs to be classified, and neither party can share what it holds. For example, say company A has a lot of relevant proprietary data and can afford to train its own classifiers on it. Another company B, with a small amount of similar proprietary data, may wish to label its data with company A's classifier, but company A cannot simply hand its classifier to B, because the classifier may leak information about A's training data.</li>
</ul>
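The four scenarios differ mainly in how the data is split between the parties. The plain-Python sketch below uses a hypothetical toy table and made-up model weights (no encryption is involved) to show which slice each party would hold; in a CrypTen setting, these private slices would be secret-shared rather than exchanged.

```python
# Hypothetical toy table: one row per example (e.g. a patient or a user),
# three feature columns followed by one binary label column.
table = [
    [0.2, 1.0, 3.5, 1],
    [0.7, 0.0, 2.1, 0],
    [0.1, 1.0, 4.0, 1],
    [0.9, 0.0, 1.2, 0],
]
features = [row[:3] for row in table]
labels = [row[3] for row in table]

# Feature aggregation (vertical split): same examples, different feature columns.
feature_aggregation = {
    "party_1": [row[:2] for row in features],
    "party_2": [row[2:] for row in features],
}

# Data labeling: one party holds all the features, the other holds the labels.
data_labeling = {
    "party_1": features,
    "party_2": labels,
}

# Data augmentation (horizontal split): same feature columns, different examples.
data_augmentation = {
    "party_1": table[:2],
    "party_2": table[2:],
}

# Model hiding: one party holds trained weights (hypothetical values for a
# linear model), the other holds the unlabeled inputs it wants classified.
model_hiding = {
    "party_1": [0.5, -1.0, 0.25],
    "party_2": features,
}
```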
 
The tutorials illustrate how CrypTen can model each of these scenarios.


## Threat Model
When determining whether MPC is appropriate for a certain computation, we must make explicit what we assume about the actions that different parties can take. The <i>threat model</i> specifies these assumptions. CrypTen uses the <i>"honest-but-curious"</i> threat model (see (1, 2)), which is sometimes also referred to as the <i>"semi-honest"</i> threat model. It is specified by the following assumptions:
<ul>
    <li> Every party faithfully follows the protocol: i.e., it performs all the computations specified in the program, and communicates the correct results to the appropriate parties specified in the program.</li>
    <li> The communication channel is secure: no party can see any data that is not directly communicated to it.</li>
    <li> Every party has access to a private source of randomness, e.g., a private coin to toss.</li>
    <li> Parties may use any data they have already seen and perform arbitrary processing to infer information from it.</li>
</ul>

To make these assumptions concrete, let's look at what they mean for our earlier example. First, all parties carry out every step of the specified protocol for computing the sum and faithfully forward the intermediate results. Because the communication channel is secure, parties B and C, for example, cannot see any information that was exchanged only between A and D. On the other hand, if party A broadcasts its number publicly, parties B, C and D will all see it and cannot "forget" it. Each of them may then combine it with the other information it has to privately infer as much as it can about the other parties' data. To see how this can leak information, suppose parties C and D each have 0 as their numbers. Once the sum is decrypted, each party can make the following deductions:
<ul>
    <li> Party B knows the sum of the numbers of parties C and D (the total minus A's broadcast number and its own), so it learns that this sum is 0.</li>
    <li> Party C knows the sum of the numbers of parties B and D, but does <i>not</i> know that the sum of its own number and party D's number is 0.</li>
    <li> Party D knows the sum of the numbers of parties B and C, but does <i>not</i> know that the sum of its own number and party C's number is 0.</li>
    <li> Party A can make no additional deductions: it learns only what the sum itself reveals, since the broadcast gives it no information it did not already have.</li>
</ul>
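These deductions can be checked with plain integer arithmetic. The sketch below uses made-up values for A and B (the example fixes only C and D at 0), treats the revealed sum as the protocol's output, and computes for each party the quantity it can derive from its own number, A's broadcast number and the total.

```python
# Hypothetical values for A and B; C and D hold 0 as in the example.
a, b, c, d = 7, 4, 0, 0

broadcast_a = a          # A publishes its own number
total = a + b + c + d    # the protocol reveals only this sum

# Quantities each party can compute from its own number, A's broadcast
# number, and the revealed total.
deductions = {
    "A": {"b + c + d": total - a},
    "B": {"c + d": total - broadcast_a - b},
    "C": {"b + d": total - broadcast_a - c},
    "D": {"b + c": total - broadcast_a - d},
}

for party, learned in deductions.items():
    print(party, learned)
```

B's result is exactly 0, matching the first deduction, while the sums that C and D obtain still contain B's number, so neither can isolate the other's input.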
        


## References
(1) Oded Goldreich. 2009. Foundations of Cryptography: Volume 2, Basic Applications (1st ed.). Cambridge University Press, New York, NY, USA.<br>
(2) Andrew C. Yao. 1982. Protocols for secure computations. In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science (SFCS '82). IEEE Computer Society, Washington, DC, USA, 160-164.