- **PETs** (Privacy-Enhancing Technologies) are tools or methods that let companies or users process data **while keeping it private and confidential**. They help protect **Personally Identifiable Information (PII)**, such as a name, address, or phone number, when it is handled by online services.
- **Differential Privacy (DP)**: A **mathematical framework** for analyzing datasets while keeping individual data private. It adds calibrated random “noise” to the data or to the query output, so that any single individual’s contribution is masked while the overall results stay approximately accurate.
    - DP has a **privacy budget** (usually written ε): every query spends part of it, and more queries or more precise answers spend more.
    - There’s a **tradeoff**: stronger privacy means less accurate results, and vice versa.
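
The noise and budget ideas above can be sketched with the Laplace mechanism, one standard way to achieve DP for numeric queries. This is a minimal illustration, not a production implementation: the names `private_count` and `PrivacyBudget` and the sample data are made up for this example, and the budget tracker assumes simple sequential composition (the ε values of successive queries add up).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Counting query with Laplace noise.

    A count has sensitivity 1 (one person changes it by at most 1),
    so the noise scale is 1 / epsilon: smaller epsilon means more noise.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


class PrivacyBudget:
    """Hypothetical helper: tracks how much epsilon is left to spend."""

    def __init__(self, total_epsilon: float):
        self.remaining = total_epsilon

    def spend(self, epsilon: float) -> None:
        if epsilon > self.remaining:
            raise ValueError("privacy budget exhausted")
        self.remaining -= epsilon


rng = random.Random(0)                    # fixed seed only to make the demo repeatable
ages = [23, 35, 41, 29, 52, 67, 38, 44]   # made-up data
budget = PrivacyBudget(total_epsilon=1.0)

budget.spend(0.5)                         # this query costs half of the budget
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(noisy)                              # the true count (4) plus random noise
```

Running the query with a larger `epsilon` gives answers closer to 4 but uses up the budget faster, which is the tradeoff described above.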

---

### 3.3 **Cryptographic Techniques**

These techniques use cryptography to keep data protected while it is being processed.

#### 3.3.1 **Homomorphic Encryption (HE)**

* Encryption that lets you **perform operations on encrypted data without decrypting it first**.
* The result, when decrypted, matches the result of performing the same operation on the original data.
* Useful for outsourcing computations to a server you don’t trust.

**Types of HE:**

| Type                        | Operations Allowed                   | Number of Operations Allowed |
| --------------------------- | ------------------------------------ | ---------------------------- |
| Partially Homomorphic (PHE) | Only addition or only multiplication | Unlimited                    |
| Somewhat Homomorphic (SHE)  | Both addition and multiplication     | Limited                      |
| Fully Homomorphic (FHE)     | Both addition and multiplication     | Unlimited                    |

**Example:**

* Encrypt two numbers, add their encrypted forms, then decrypt the result to get the sum of the original numbers.
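
The add-then-decrypt example can be sketched with the Paillier cryptosystem, a well-known additively homomorphic (PHE) scheme. Paillier is chosen here for illustration and is not named in these notes. The primes below are tiny toy values, so this sketch is completely insecure and only shows the mechanics; the function names are made up for the example.

```python
import math
import random


def keygen(p: int, q: int):
    """Toy Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (needs Python 3.8+)
    return n, (lam, mu)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    n2 = n * n
    while True:
        r = rng.randrange(1, n)        # fresh randomness for every ciphertext
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key, c: int) -> int:
    lam, mu = private_key
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


rng = random.Random()
n, private_key = keygen(61, 53)        # toy primes, far too small for real use

c1 = encrypt(n, 17, rng)
c2 = encrypt(n, 25, rng)
c_sum = add_encrypted(n, c1, c2)       # done without ever decrypting c1 or c2

print(decrypt(n, private_key, c_sum))  # prints 42
```

The server that calls `add_encrypted` only needs the public value `n`; it never sees 17, 25, or the private key.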

**Downside:**

* Much slower and more expensive than computing on unencrypted data.
* HE protects confidentiality only: you still have to trust the server to perform the computation correctly.

---

#### 3.3.2 **Multiparty Computation (MPC)**

* Allows **multiple parties who don’t trust each other** to jointly compute a function on their private inputs without revealing those inputs.
* Example: Two millionaires want to find out who is richer without telling their exact wealth.

**Key Points:**

* Requires **at least two parties**.
* Trust is **distributed**; no single party learns everything.
* Multiple rounds of communication are needed.
* Generally cheaper in computation than fully homomorphic encryption, at the price of more communication; still slower than regular computation.

---

## **4. A Closer Look at MPC**

### Security Models in MPC

* **Active security (Malicious adversary):** The attacker can cheat or behave arbitrarily.
* **Passive security (Honest-but-curious adversary):** The attacker follows the protocol but tries to learn extra info.

### Corruption Thresholds

* How many parties an attacker can control without breaking security.

| Threshold          | Meaning                                          |
| ------------------ | ------------------------------------------------ |
| Honest Majority    | Attacker corrupts fewer than half of the parties |
| Dishonest Majority | Attacker may corrupt half or more of the parties |
| Full Threshold     | Attacker may corrupt all but one party           |

### Network Considerations for MPC

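* **Local Area Network (LAN):**

  * Low latency, high bandwidth.
  * Secret Sharing MPC fits well: it transfers little data but needs many communication rounds, which are cheap when latency is low.
* **Wide Area Network (WAN):**

  * Higher latency, less bandwidth.
  * Garbled Circuit MPC fits well: it needs only a constant number of communication rounds, although it sends more data.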

---

### MPC Techniques

* **Secret Sharing MPC:**

  * Data is split into shares distributed among parties.
  * Addition is easy (done locally).
  * Multiplication needs communication rounds.

* **Garbled Circuits:**

  * The function is represented as a circuit.
  * One party “garbles” the circuit (encrypts it).
  * The other party “evaluates” the garbled circuit without learning the garbler’s inputs or any intermediate values.
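
The secret-sharing bullets above can be sketched with additive secret sharing over a prime modulus. This is a minimal sketch covering only sharing, local addition, and reconstruction (garbled circuits are not shown); the modulus and function names are assumptions made for this example.

```python
import secrets

MODULUS = 2**61 - 1  # a prime; any modulus larger than the values involved works here


def share(secret: int, n_parties: int) -> list:
    """Split `secret` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list) -> int:
    """All shares together reveal the secret; any proper subset looks random."""
    return sum(shares) % MODULUS


def add_shares(a_shares: list, b_shares: list) -> list:
    """Each party adds its own two shares locally, with no communication."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


x_shares = share(20, n_parties=3)
y_shares = share(22, n_parties=3)
sum_shares = add_shares(x_shares, y_shares)

print(reconstruct(sum_shares))  # prints 42
```

Addition works share-by-share because the shares are linear in the secret. Multiplication is not linear, which is why it needs the parties to communicate.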

---

### Preprocessing Model in MPC

* **Offline Phase:**

  * Data-independent.
  * Generates random values and prepares data.
  * Can be done anytime before actual inputs are available.

* **Online Phase:**

  * Data-dependent.
  * Actual inputs are processed using preprocessed data.
  * Much faster than running the whole protocol after the inputs arrive, because the expensive work was already done offline.
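
One well-known way to use this offline/online split is Beaver multiplication triples, which these notes do not name. The sketch below is a simplification under stated assumptions: a trusted dealer stands in for the offline phase (real protocols generate triples cryptographically), all parties are simulated in one process, and every function name is made up for the example.

```python
import secrets

P = 2**61 - 1  # prime modulus for all arithmetic


def share(value: int, n: int) -> list:
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def reconstruct(shares: list) -> int:
    return sum(shares) % P


def offline_triple(n: int):
    """Offline phase (data-independent): shares of random a, b and c = a * b.

    Assumption: a trusted dealer produces the triple.
    """
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a, n), share(b, n), share((a * b) % P, n)


def online_multiply(x_sh: list, y_sh: list, triple) -> list:
    """Online phase (data-dependent): multiply shared x and y using one triple."""
    a_sh, b_sh, c_sh = triple
    # One communication round: open the masked values d = x - a and e = y - b.
    d = reconstruct([(x - a) % P for x, a in zip(x_sh, a_sh)])
    e = reconstruct([(y - b) % P for y, b in zip(y_sh, b_sh)])
    # Local step: x * y = c + d*b + e*a + d*e
    z_sh = [(c + d * b + e * a) % P for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % P  # the public term d*e is added by one party only
    return z_sh


triple = offline_triple(3)                          # prepared before inputs exist
x_sh, y_sh = share(6, 3), share(7, 3)               # actual inputs arrive later
print(reconstruct(online_multiply(x_sh, y_sh, triple)))  # prints 42
```

The opened values `d` and `e` reveal nothing about `x` and `y` because `a` and `b` are uniformly random masks. Each triple must be used only once.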

---

## **5. Real-World Applications**

### 5.1 EPIC: Efficient Private Image Classification

* Classify images **privately** using machine learning without revealing the images or the model.
* Uses **Transfer Learning** and MPC.
* Faster and more communication-efficient than previous systems like Gazelle.
* Example: Alice owns a classifier; Bob wants to classify his images. They use MPC so Bob’s image and Alice’s model stay private.

### 5.2 Privacy-Preserving Genome-Wide Association Study (GWAS)

* GWAS needs large genomic data but raises privacy concerns.
* Solutions:

  1. **Somewhat Homomorphic Encryption** approach.
  2. **Secure Multiparty Computation** approach.
* Both approaches compute whether a genetic marker is significant **without revealing individual data or the exact statistics**.
* Efficient and secure, practical for large datasets.

---

## **6. Summary Table of Key Points**

| Topic                  | What it Does                               | Pros                           | Cons/Challenges                     |
| ---------------------- | ------------------------------------------ | ------------------------------ | ----------------------------------- |
| Data Anonymization     | Remove identifiers                         | Simple                         | Vulnerable to re-identification     |
| Differential Privacy   | Add noise to outputs                       | Strong mathematical guarantees | Tradeoff between accuracy & privacy |
| Homomorphic Encryption | Compute on encrypted data                  | No need to decrypt             | High computation cost               |
| Multiparty Computation | Joint computation without revealing inputs | No single trusted party needed | Communication rounds required       |

---

## **7. Additional Concepts**

### Homomorphism in RSA

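* Textbook (unpadded) RSA is **multiplicatively homomorphic**: multiplying two ciphertexts gives a ciphertext that decrypts to the product of the two plaintexts (mod n).
* This “malleability” is a risk because it enables attacks (which is why real deployments add padding), but it is also useful in secure computation.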
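
The multiplicative property can be checked with a small worked example. This sketch uses textbook RSA with tiny, insecure parameters chosen only for illustration.

```python
# Toy textbook RSA parameters (far too small for real use)
p, q = 61, 53
n = p * q                    # 3233
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent, coprime to phi
d = pow(e, -1, phi)          # private exponent (needs Python 3.8+)


def encrypt(m: int) -> int:
    return pow(m, e, n)


def decrypt(c: int) -> int:
    return pow(c, d, n)


c1, c2 = encrypt(6), encrypt(7)
c_product = (c1 * c2) % n    # multiply ciphertexts only

print(decrypt(c_product))    # prints 42, which is 6 * 7
```

This works because (m1^e)(m2^e) = (m1·m2)^e mod n. The result is correct as long as the product of the plaintexts is smaller than n.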

---

## **8. Example: The Millionaires' Problem**

Two millionaires want to know who is richer without revealing their actual wealth.

* Using MPC, they input their amounts privately.
* The protocol outputs who has more money without revealing the amounts.
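
What the protocol is supposed to achieve can be written down as an "ideal functionality": what an imaginary trusted third party would compute. The sketch below is only that specification, not a secure protocol. A real MPC protocol (for example one based on garbled circuits) produces the same output without any party ever seeing both inputs. The function name is made up for the example.

```python
def millionaires_ideal(alice_wealth: int, bob_wealth: int) -> str:
    """Specification only: the single result that both parties are allowed to learn."""
    if alice_wealth > bob_wealth:
        return "Alice"
    if bob_wealth > alice_wealth:
        return "Bob"
    return "tie"


# Both parties learn only the answer, never the amounts themselves.
print(millionaires_ideal(5_000_000, 7_500_000))  # prints Bob
```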

---

## **9. Important Terms to Remember**

* **PII:** Personally Identifiable Information (name, birthdate, etc.)
* **Anonymization:** Hiding personal identifiers
* **Differential Privacy:** Adding noise to protect individual data
* **Homomorphic Encryption:** Computing on encrypted data
* **MPC:** Secure computation among multiple distrustful parties
* **Active Security:** Protection against parties that cheat or deviate from the protocol
* **Passive Security:** Protection against parties that follow the protocol but try to learn extra information
* **Secret Sharing:** Splitting data into shares
* **Garbled Circuits:** Encrypting computation steps for MPC
* **Preprocessing:** Preparing random data ahead of time to speed up MPC

---

## **10. Tips for the Exam**

* Understand the difference between **Data Anonymization** and **Differential Privacy**.
* Be able to explain **Homomorphic Encryption** and its types.
* Know the concept and importance of **Multiparty Computation**.
* Know the **security models** (active vs passive) and corruption thresholds.
* Be familiar with **real-world applications** like EPIC and Privacy-Preserving GWAS.
* Remember the **tradeoffs** between privacy, efficiency, and data utility.

---


**Good luck with your exam preparation!**
