# Introduction

This is a book about differential privacy, for programmers. It introduces the challenges of data privacy, surveys the techniques that have been developed to address those challenges, and helps you understand how to implement some of them.

The book contains numerous examples *as programs*, including implementations of many concepts. Each chapter is generated from a self-contained Jupyter Notebook. You can click on the "download" button at the top-right of the chapter to download the notebook for that chapter, and you'll be able to execute the examples yourself. Many of the examples are generated by code that is hidden (for readability) in the chapters you'll see here. You can show this code by clicking the "Click to show" labels adjacent to these cells.

This book assumes a working knowledge of Python, as well as basic familiarity with the pandas and NumPy libraries. Some background in discrete mathematics and probability will also help; a basic undergraduate course in these topics should be more than sufficient.

This book is open source, and the latest version will always be available online [here](https://uvm-plaid.github.io/programming-dp/notebooks/intro.html). The source code is available [on GitHub](https://github.com/uvm-plaid/programming-dp). If you would like to fix a typo, suggest an improvement, or report a bug, please open an issue on GitHub.

The techniques described in this book grew out of the study of *data privacy*. For our purposes, we will define data privacy this way:

```{admonition} Definition
*Data privacy* techniques have the goal of allowing analysts to learn about *trends* in sensitive data, without revealing information specific to *individuals*.
```
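As an illustration of the definition, here is a minimal sketch using a made-up dataset (the records, field names, and helper functions are all hypothetical). The first query asks about a *trend* across the group; the second asks about one specific *individual*. Data privacy techniques aim to permit answers like the first while preventing answers like the second.

```python
# Hypothetical toy dataset: each record describes one individual.
people = [
    {"name": "Alice", "age": 34, "has_condition": True},
    {"name": "Bob", "age": 51, "has_condition": False},
    {"name": "Carol", "age": 47, "has_condition": True},
    {"name": "Dave", "age": 29, "has_condition": False},
]

def fraction_with_condition(records):
    """A *trend* query: summarizes the whole group."""
    return sum(r["has_condition"] for r in records) / len(records)

def condition_of(records, name):
    """An *individual* query: reveals one specific person's sensitive value."""
    return next(r["has_condition"] for r in records if r["name"] == name)

print(fraction_with_condition(people))  # 0.5
print(condition_of(people, "Carol"))    # True
```

Simply refusing the second kind of query is not enough on its own: carefully chosen combinations of trend queries can still reveal information about individuals.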

This is a broad definition, and many different techniques fall under it. But it's important to note what this definition *excludes*: techniques for ensuring *security*, like encryption. Encrypted data doesn't reveal *anything* to someone without the key, so it fails the first requirement of our definition: analysts cannot learn trends from it. The distinction between security and privacy is an important one. Privacy techniques involve an *intentional* release of information and attempt to control *what can be learned* from that release; security techniques usually *prevent* the release of information and control *who can access* the data. This book covers privacy techniques, and we will discuss security only when it has important implications for privacy.

This book focuses primarily on differential privacy. The first couple of chapters outline why: differential privacy (and its variants) is the only formal approach we know of that appears to provide robust privacy protection. Approaches that have been in common use for decades, such as de-identification and aggregation, have more recently been shown to break down under sophisticated privacy attacks, and even more modern techniques such as $k$-Anonymity are susceptible to certain attacks. For this reason, differential privacy is fast becoming the gold standard in privacy protection.
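To see how aggregation alone can fail, here is a minimal sketch of a *differencing attack*, one simple example of such an attack (the dataset and the `total_salary` helper are hypothetical). Two queries, each returning only a sum, are run over groups that differ by exactly one person; subtracting the answers reveals that person's exact value.

```python
# Hypothetical dataset of (name -> salary); values are made up.
salaries = {
    "Alice": 72_000,
    "Bob": 58_000,
    "Carol": 91_000,
    "Dave": 64_000,
}

def total_salary(names):
    """An aggregate query: returns only a sum, never an individual row."""
    return sum(salaries[n] for n in names)

everyone = list(salaries)
everyone_but_carol = [n for n in everyone if n != "Carol"]

# Each query on its own looks like a harmless aggregate...
q1 = total_salary(everyone)
q2 = total_salary(everyone_but_carol)

# ...but their difference is exactly Carol's salary.
print(q1 - q2)  # 91000
```

Differential privacy is designed to resist attacks of this kind: it adds random noise so that the distribution of a query's output changes only slightly when any single individual is added to or removed from the data.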
