# Example - Introduction to Differential Privacy (DP)

In [1]:
import numpy as np
import random
random.seed(1)

#### We generate 30 random grades (called `notes` in the code) between 5 and 10, drawn from a uniform distribution and rounded to 2 decimal places. We want to compute their average. Assume each grade was sent encrypted, so we cannot see any individual value, only the average.

In [2]:
notes = []
for i in range(30):
    notes.append(round(random.uniform(5, 10), 2))

In [3]:
notes

[5.67,
 9.24,
 8.82,
 6.28,
 7.48,
 7.25,
 8.26,
 8.94,
 5.47,
 5.14,
 9.18,
 7.16,
 8.81,
 5.01,
 7.23,
 8.61,
 6.14,
 9.73,
 9.51,
 5.15,
 5.13,
 7.71,
 9.7,
 6.91,
 6.08,
 7.11,
 5.15,
 6.11,
 7.19,
 7.48]

In [4]:
mean = np.mean(notes)
print(f'Average of all notes: {mean}')

Average of all notes: 7.255000000000001


#### Suppose we now repeat the same study without the first student: 29 students send us their grades, again encrypted, and we compute the average of these values:

In [5]:
mean_new = np.mean(notes[1:])
print(f'Average of all grades except that of the first student: {mean_new}')

Average of all grades except that of the first student: 7.309655172413794


#### Now, knowing the value of the mean of all the grades, and that of all but the first student, it is very easy to infer the grade of that student.

#### Let $N=30$ be the number of students, $n_i$ the grade of student $i$ for all $i \in \{1,...,N\}$, $mean_{N}$ the average of the grades of all $N$ students, and $mean_{N-1}$ the average of the grades $\{n_{2},...,n_{30}\}$, i.e. excluding the first student. Then:

$$
\frac{\sum_{i=1}^{N}n_{i}}{30} = mean_{N} \hspace{0.5cm} \mbox{and} \hspace{0.5cm} \frac{\sum_{i=2}^{N}n_{i}}{29} = mean_{N-1}
$$

#### Clearing the denominators and subtracting the second sum from the first, we get:

$$
\sum_{i=1}^{N}n_{i} = 30 \cdot mean_{N}, \hspace{0.5cm} \sum_{i=2}^{N}n_{i} = 29 \cdot mean_{N-1} \Longrightarrow
n_{1} + \sum_{i=2}^{N}n_{i} = 30 \cdot mean_{N} \Longrightarrow \boxed{n_{1} = 30 \cdot mean_{N} - 29 \cdot mean_{N-1}}
$$

In [6]:
first_note_calculated = 30 * mean - 29 * mean_new
print(f'''True value of the first student's note: {notes[0]}
Calculated value for the first student's note (rounded to 2 decimal places): {round(first_note_calculated,2)}''')

True value of the first student's note: 5.67
Calculated value for the first student's note (rounded to 2 decimal places): 5.67
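
The boxed formula is not specific to the first student: it works for whichever value is left out of the second average. A minimal sketch of this differencing attack follows, assuming the attacker knows both averages and the group size. The function name `recover_excluded_value` and the toy grades are hypothetical and are not part of the notebook above.

```python
def recover_excluded_value(mean_all, mean_without, n):
    """Differencing attack: recover the one value missing from the second mean.

    mean_all     -- mean of all n values
    mean_without -- mean of the n - 1 values left after excluding the target
    """
    return n * mean_all - (n - 1) * mean_without


# Toy data (hypothetical values, not the grades generated above).
grades = [6.5, 8.0, 9.25, 5.75]
n = len(grades)
mean_all = sum(grades) / n

for k, true_grade in enumerate(grades):
    # Average published after dropping student k from the study.
    others = grades[:k] + grades[k + 1:]
    mean_without = sum(others) / (n - 1)
    recovered = recover_excluded_value(mean_all, mean_without, n)
    print(f"student {k}: true = {true_grade}, recovered = {round(recovered, 2)}")
```

Each student's grade is recovered from two aggregate values alone, which is the leak that the noise added in the next step is meant to limit.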


#### To mitigate this privacy leak, we add noise to each grade: a random number drawn uniformly between -10 and 10. We will see that the average of the noisy grades stays close to the true average (without noise), while the true value of an individual grade can no longer be recovered exactly from its noisy version.

In [7]:
notes_noise = []
for note in notes:
    notes_noise.append(round(note + random.uniform(-10, 10), 2))

In [8]:
notes_noise

[0.33,
 3.86,
 3.2,
 5.47,
 3.28,
 -2.32,
 15.01,
 10.07,
 8.32,
 -1.14,
 19.03,
 14.36,
 1.23,
 1.66,
 11.66,
 12.83,
 14.87,
 8.17,
 16.11,
 8.56,
 1.2,
 9.46,
 17.35,
 13.83,
 6.19,
 8.89,
 -4.16,
 0.96,
 13.14,
 5.77]

In [9]:
mean_noise = np.mean(notes_noise)
print(f'Average of all notes: {mean}')
print(f'Average of all notes with noise: {mean_noise}')

Average of all notes: 7.255000000000001
Average of all notes with noise: 7.573000000000001
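
The gap between the two averages above (7.573 versus 7.255, about 0.318) is simply the mean of the 30 noise draws. As a rough guide to how large that gap typically is, the sketch below computes the standard deviation of the mean of $N$ independent draws from a uniform distribution on $[-a, a]$, which is $a / \sqrt{3N}$. It assumes independent draws and ignores the rounding to 2 decimal places; the helper name `noise_mean_std` is hypothetical.

```python
import math


def noise_mean_std(a, n):
    """Standard deviation of the mean of n independent Uniform(-a, a) draws."""
    # Var[Uniform(-a, a)] = (2a)^2 / 12 = a^2 / 3.
    # The variance of the mean of n independent draws is that value divided by n.
    return a / math.sqrt(3 * n)


for n in (30, 300, 3000, 30000):
    print(f"N = {n:>5}: typical error of the noisy average ~ {noise_mean_std(10, n):.3f}")
```

For $a = 10$ and $N = 30$ this gives roughly 1.05, so the observed gap of 0.318 is well within the expected range, and the error shrinks as more students take part.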


In [10]:
mean_noise_new = np.mean(notes_noise[1:])
print(f'Average of all notes except that of the first student: {mean_new}')
print(f'Average of all notes except that of the first student (with noise): {mean_noise_new}')

Average of all notes except that of the first student: 7.309655172413794
Average of all notes except that of the first student (with noise): 7.822758620689656


In [11]:
first_note_noise = 30 * np.mean(notes_noise) - 29 * np.mean(notes_noise[1:])
round(first_note_noise, 2), notes_noise[0]

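(0.33, 0.33)

#### The same attack now recovers only the noisy value (0.33), not the first student's true grade (5.67).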
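
The result above can be reproduced on fresh data. The following self-contained sketch regenerates grades and noise with an arbitrary seed (so the numbers differ from those in the notebook) and applies the differencing formula to the noisy averages. All variable names here are illustrative assumptions.

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility only

true_grades = [round(random.uniform(5, 10), 2) for _ in range(30)]
noisy_grades = [g + random.uniform(-10, 10) for g in true_grades]

n = len(noisy_grades)
mean_all = sum(noisy_grades) / n
mean_without_first = sum(noisy_grades[1:]) / (n - 1)

# Differencing attack applied to the noisy averages.
recovered = n * mean_all - (n - 1) * mean_without_first

print(f"true grade:      {true_grades[0]}")
print(f"noisy grade:     {noisy_grades[0]:.2f}")
print(f"attack recovers: {recovered:.2f}")
```

The attack still works arithmetically, but what it returns is the noisy grade, which can differ from the true grade by up to 10 in either direction.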

#### The noise is drawn from a uniform distribution on $[-10, 10]$, whose expected value is 0, so as the number of draws grows their mean tends to 0 and the noise largely cancels out in the average:

In [12]:
random_unif = []
for _ in range(1000000):
    random_unif.append(random.uniform(-10, 10))
np.mean(random_unif)

0.0032135002616634615

*This follows from the Law of Large Numbers (LLN): the mean of independent draws converges to the expected value of the distribution, which here is 0.*
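
To see the convergence rather than a single end value, the sketch below tracks the running mean of the draws at a few sample sizes. It uses NumPy's `default_rng` generator instead of the `random` module used above, with an arbitrary seed, so its numbers will not match the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
draws = rng.uniform(-10, 10, size=1_000_000)

# Running mean after each draw: cumulative sum divided by the number of draws so far.
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

for n in (10, 1_000, 100_000, 1_000_000):
    print(f"mean of the first {n:>9,} draws: {running_mean[n - 1]: .5f}")
```

The printed means should shrink toward 0 as the sample size grows, although not necessarily monotonically, since each is still a random quantity.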