<a href="https://colab.research.google.com/github/souravs17031999/private-ai/blob/master/encrypted_deep_learning_in_pysyft.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project - XV
## Objective: Part - I: Implement encrypted computations using PySyft functions/methods.
## Objective: Part - II: Create an encrypted database.

### Part - I

In [91]:
pip install syft



In [92]:
import torch as th
import syft as sy

hook = sy.TorchHook(th)
from torch import nn, optim

W0719 09:08:56.250784 139956197033856 hook.py:98] Torch was already hooked... skipping hooking process


We create three virtual workers here, bob, alice and secure_worker, to simulate the process.

In [0]:
bob = sy.VirtualWorker(hook, id = "bob").add_worker(sy.local_worker)
alice = sy.VirtualWorker(hook, id = "alice").add_worker(sy.local_worker)
secure_worker = sy.VirtualWorker(hook, id = "secure_worker").add_worker(sy.local_worker)

Let's create our data.

In [0]:
x = th.tensor([1,2,3,4])
y = th.tensor([2,-1,1,0])

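First, we secret-share each tensor: it is split into randomly generated shares, one held by bob and one by alice, so that neither worker alone can read the values. The crypto_provider (secure_worker here) holds no shares itself; it is used only to generate the large random numbers that some encrypted operations need, which keeps the system efficient for deep learning workloads.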

In [0]:
x = x.share(bob, alice, crypto_provider = secure_worker)
y = y.share(bob, alice, crypto_provider = secure_worker)
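To make the idea concrete, here is a minimal pure-Python sketch of additive secret sharing. It is an illustration only, not PySyft's implementation: the modulus `Q` and the helper names `share` and `reconstruct` are assumptions made for this example.

```python
import secrets

Q = 2**62  # assumed modulus for this sketch; not necessarily the field PySyft uses


def share(secret, n_parties=2, modulus=Q):
    """Split an integer into n additive shares that sum to it modulo `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % modulus  # chosen so that all shares sum to the secret
    return shares + [last]


def reconstruct(shares, modulus=Q):
    """Recombine the shares and map the result back to a signed integer."""
    total = sum(shares) % modulus
    return total - modulus if total > modulus // 2 else total


x_shares = share(3)   # e.g. one share for bob, one for alice
y_shares = share(-1)

print(reconstruct(x_shares))  # 3
print(reconstruct(y_shares))  # -1
```

Each share on its own is a uniformly random number, so a single worker learns nothing about the secret; only the sum of all shares reveals it.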

We can see that pointer tensors to bob and alice have been created for x and y.

In [96]:
x

(Wrapper)>[AdditiveSharingTensor]
	-> (Wrapper)>[PointerTensor | me:30222100825 -> bob:18842362774]
	-> (Wrapper)>[PointerTensor | me:62987771405 -> alice:73491776140]
	*crypto provider: secure_worker*

In [97]:
y

(Wrapper)>[AdditiveSharingTensor]
	-> (Wrapper)>[PointerTensor | me:28378567834 -> bob:89511740206]
	-> (Wrapper)>[PointerTensor | me:91296390059 -> alice:23058094486]
	*crypto provider: secure_worker*

Here we can see the randomly generated shares of both tensors x and y held by each worker; secure_worker holds none.

In [98]:
bob._objects 

{18842362774: tensor([ 682205084225130468, 4246699920549455353, 4285627654817167306,
         3055557465183765950]),
 89511740206: tensor([1435455331595824286, 4293133921423497963, 2005778343057245443,
         2478799071062201470])}

In [99]:
alice._objects


{23058094486: tensor([-1435455331595824284, -4293133921423497964, -2005778343057245442,
         -2478799071062201470]),
 73491776140: tensor([ -682205084225130467, -4246699920549455351, -4285627654817167303,
         -3055557465183765946])}

In [100]:
secure_worker._objects

{}
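As a quick sanity check, the shares printed above can be added by hand. The sketch below copies the numbers from the two outputs into plain Python lists (the variable names are chosen for this example) and shows that bob's and alice's shares sum back to the original tensors.

```python
# Shares copied from the bob._objects and alice._objects outputs above.
bob_x = [682205084225130468, 4246699920549455353, 4285627654817167306, 3055557465183765950]
alice_x = [-682205084225130467, -4246699920549455351, -4285627654817167303, -3055557465183765946]

bob_y = [1435455331595824286, 4293133921423497963, 2005778343057245443, 2478799071062201470]
alice_y = [-1435455331595824284, -4293133921423497964, -2005778343057245442, -2478799071062201470]

# Adding the two shares of each element recovers the plaintext value.
x_rebuilt = [b + a for b, a in zip(bob_x, alice_x)]
y_rebuilt = [b + a for b, a in zip(bob_y, alice_y)]

print(x_rebuilt)  # [1, 2, 3, 4]
print(y_rebuilt)  # [2, -1, 1, 0]
```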

We will now start implementing operations using pysyft.

In [101]:
z = x + y
z
z.get()

tensor([3, 1, 4, 4])
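Addition needs no communication between the workers: each one adds the shares it already holds. The following pure-Python sketch shows why this works; the modulus `Q` and the helpers `share2` and `to_signed` are assumptions for illustration, not PySyft internals.

```python
import secrets

Q = 2**62  # assumed modulus for illustration only


def share2(secret):
    """Split an integer into two additive shares modulo Q."""
    r = secrets.randbelow(Q)
    return r, (secret - r) % Q


def to_signed(value):
    """Map a residue modulo Q back to a signed integer."""
    value %= Q
    return value - Q if value > Q // 2 else value


x_bob, x_alice = share2(3)
y_bob, y_alice = share2(-1)

# Each party adds only the shares it holds; nothing is exchanged.
z_bob = (x_bob + y_bob) % Q
z_alice = (x_alice + y_alice) % Q

z = to_signed(z_bob + z_alice)  # what reconstructing the result would give
print(z)  # 2
```

Subtraction works the same way, with each party subtracting its own shares.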

In [102]:
z = x - y
z.get()

tensor([-1,  3,  2,  4])

In [103]:
z = x > y
z.get()

tensor([0, 1, 1, 1])

In [0]:
z = x < y

In [105]:
z = x == y
z.get()

tensor([0, 0, 0, 0])

In [106]:
z = x * y
z.get()

tensor([ 2, -2,  3,  0])
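Multiplication is where a crypto provider becomes useful, because the product of two secrets is not simply the product of the shares. One standard technique is the Beaver-triple protocol, sketched below in pure Python under stated assumptions: the modulus `Q` and all helper names are made up for this example, and the code is a simplified two-party model, not a copy of what PySyft does internally.

```python
import secrets

Q = 2**62  # assumed modulus for illustration only


def share2(value):
    """Split an integer into two additive shares modulo Q."""
    r = secrets.randbelow(Q)
    return [r, (value - r) % Q]


def open_shares(shares):
    """Publicly reconstruct a value from its shares (still modulo Q)."""
    return sum(shares) % Q


def to_signed(value):
    value %= Q
    return value - Q if value > Q // 2 else value


def beaver_mul(x_sh, y_sh):
    # Crypto provider: sample a random triple with c = a * b and share it.
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)
    a_sh, b_sh, c_sh = share2(a), share2(b), share2(a * b % Q)

    # The parties open the masked values d = x - a and e = y - b.
    d = open_shares([(x_sh[i] - a_sh[i]) % Q for i in range(2)])
    e = open_shares([(y_sh[i] - b_sh[i]) % Q for i in range(2)])

    # Each party computes its share of x * y locally; party 0 adds d * e once.
    z_sh = [(c_sh[i] + d * b_sh[i] + e * a_sh[i]) % Q for i in range(2)]
    z_sh[0] = (z_sh[0] + d * e) % Q
    return z_sh


xs, ys = [1, 2, 3, 4], [2, -1, 1, 0]
products = [to_signed(open_shares(beaver_mul(share2(x), share2(y)))) for x, y in zip(xs, ys)]
print(products)  # [2, -2, 3, 0]
```

The opened values d and e are masked by the random a and b, so they reveal nothing about x and y, while the shares of the result sum to c + d·b + e·a + d·e = x·y.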

This works not only for integers but also for floating-point numbers, by converting them to fixed precision before sharing.

In [0]:
x = th.tensor([1,2,3,4])
y = th.tensor([2,-1,1,0])

x = x.fix_precision().share(bob, alice, crypto_provider=secure_worker)
y = y.fix_precision().share(bob, alice, crypto_provider=secure_worker)
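Fixed precision means a float is stored as a scaled integer, so the integer-only secret sharing still applies. A minimal sketch of the idea follows, assuming base 10 with 3 fractional digits (to my understanding the defaults of `fix_precision()`, but treat that as an assumption); `encode` and `decode` are names chosen for this example.

```python
SCALE = 10 ** 3  # assumed: base 10 with 3 fractional digits


def encode(value):
    """Turn a float into a scaled integer."""
    return int(round(value * SCALE))


def decode(fixed):
    """Turn a scaled integer back into a float."""
    return fixed / SCALE


a, b = encode(0.5), encode(1.25)
print(a, b)  # 500 1250

# Addition works directly on the scaled integers.
total = decode(a + b)
print(total)  # 1.75

# A product carries the scale twice, so it is divided by SCALE once.
product = decode((a * b) // SCALE)
print(product)  # 0.625
```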

In [108]:
x

(Wrapper)>FixedPrecisionTensor>(Wrapper)>[AdditiveSharingTensor]
	-> (Wrapper)>[PointerTensor | me:30138995206 -> bob:61867297547]
	-> (Wrapper)>[PointerTensor | me:20360962498 -> alice:95074218914]
	*crypto provider: secure_worker*

In [109]:
z = x + y
z.get().float_precision()

tensor([3., 1., 4., 4.])

In [110]:
z = x - y
z.get().float_precision()

tensor([-1.,  3.,  2.,  4.])

In [111]:
z = x * y
z.get().float_precision()

tensor([ 2., -2.,  3.,  0.])

In [112]:
z = x > y
z.get().float_precision()

tensor([0., 1., 1., 1.])

In [113]:
z = x < y
z.get().float_precision()

tensor([1., 0., 0., 0.])

In [114]:
z = x == y
z.get().float_precision()

tensor([0., 0., 0., 0.])

### Part - II

## Encrypted Database
### The objective is to build an encrypted database in which all values are represented either as mapped indices or as one-hot encodings, so that database managers do not know what goes in or what comes out; every operation on a query has to be encrypted as well.
### The database is stored in the form of key-value pairs.

First, let's create lookup tables: two dictionaries, one mapping each character to an index and one mapping each index back to its character (used when querying the db).

In [0]:
import string

Create the two dictionaries for the goal described above.

In [0]:
char2index = {} # maps each char to a unique index
index2char = {} # maps each index back to the char chosen for it below

In [117]:
' ' + string.ascii_lowercase + '0123456789' + string.punctuation

' abcdefghijklmnopqrstuvwxyz0123456789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [0]:
for i, char in enumerate(' ' + string.ascii_lowercase + '0123456789' + string.punctuation):
  char2index[char] = i
  index2char[i] = char
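An equivalent and slightly more compact way to build the two tables uses dict comprehensions. The sketch below also checks that encoding and then decoding a string is a round trip; the names `alphabet`, `encoded` and `decoded` are introduced only for this example.

```python
import string

alphabet = ' ' + string.ascii_lowercase + '0123456789' + string.punctuation

char2index = {char: i for i, char in enumerate(alphabet)}
index2char = {i: char for char, i in char2index.items()}

encoded = [char2index[c] for c in "key1"]
decoded = "".join(index2char[i] for i in encoded)

print(len(char2index))  # 69
print(encoded)          # [11, 5, 25, 28]
print(decoded)          # key1
```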

In [119]:
char2index

{' ': 0,
 '!': 37,
 '"': 38,
 '#': 39,
 '$': 40,
 '%': 41,
 '&': 42,
 "'": 43,
 '(': 44,
 ')': 45,
 '*': 46,
 '+': 47,
 ',': 48,
 '-': 49,
 '.': 50,
 '/': 51,
 '0': 27,
 '1': 28,
 '2': 29,
 '3': 30,
 '4': 31,
 '5': 32,
 '6': 33,
 '7': 34,
 '8': 35,
 '9': 36,
 ':': 52,
 ';': 53,
 '<': 54,
 '=': 55,
 '>': 56,
 '?': 57,
 '@': 58,
 '[': 59,
 '\\': 60,
 ']': 61,
 '^': 62,
 '_': 63,
 '`': 64,
 'a': 1,
 'b': 2,
 'c': 3,
 'd': 4,
 'e': 5,
 'f': 6,
 'g': 7,
 'h': 8,
 'i': 9,
 'j': 10,
 'k': 11,
 'l': 12,
 'm': 13,
 'n': 14,
 'o': 15,
 'p': 16,
 'q': 17,
 'r': 18,
 's': 19,
 't': 20,
 'u': 21,
 'v': 22,
 'w': 23,
 'x': 24,
 'y': 25,
 'z': 26,
 '{': 65,
 '|': 66,
 '}': 67,
 '~': 68}

In [120]:
index2char

{0: ' ',
 1: 'a',
 2: 'b',
 3: 'c',
 4: 'd',
 5: 'e',
 6: 'f',
 7: 'g',
 8: 'h',
 9: 'i',
 10: 'j',
 11: 'k',
 12: 'l',
 13: 'm',
 14: 'n',
 15: 'o',
 16: 'p',
 17: 'q',
 18: 'r',
 19: 's',
 20: 't',
 21: 'u',
 22: 'v',
 23: 'w',
 24: 'x',
 25: 'y',
 26: 'z',
 27: '0',
 28: '1',
 29: '2',
 30: '3',
 31: '4',
 32: '5',
 33: '6',
 34: '7',
 35: '8',
 36: '9',
 37: '!',
 38: '"',
 39: '#',
 40: '$',
 41: '%',
 42: '&',
 43: "'",
 44: '(',
 45: ')',
 46: '*',
 47: '+',
 48: ',',
 49: '-',
 50: '.',
 51: '/',
 52: ':',
 53: ';',
 54: '<',
 55: '=',
 56: '>',
 57: '?',
 58: '@',
 59: '[',
 60: '\\',
 61: ']',
 62: '^',
 63: '_',
 64: '`',
 65: '{',
 66: '|',
 67: '}',
 68: '~'}

Here we create a function that converts every character in the given string to its mapped index and returns the resulting list as a tensor.

In [0]:
def string2values(str_input, max_length = 8):
  
  str_input = str_input[:max_length].lower()  # trim strings longer than max_length
   
  if(len(str_input) < max_length):
    str_input = str_input + "." * (max_length - len(str_input))  # pad shorter strings with "." up to max_length
  
  values = []
  for char in str_input:
    values.append(char2index[char])  # map each char to its index and collect it
  # NOTE: `.long` below lacks call parentheses, so this returns a bound method
  # (see the output that follows); the redefinition later in this notebook uses `.long()`.
  return th.tensor(values).long

In [122]:
string2values("sourav")

<bound method TorchHook.get_hooked_method.<locals>.overloaded_native_method of tensor([19, 15, 21, 18,  1, 22, 50, 50])>

Here we create a one-hot encoding function so that each character can be represented as a one-hot tensor.

In [0]:
def onehot(index, length):
  vect = th.zeros(length).long()  # creating zeroes filled tensor 
  vect[index] = 1  # putting 1 at given index 
  return vect  # returning one hot encoded tensor


In [124]:
onehot(1, 8)

tensor([0, 1, 0, 0, 0, 0, 0, 0])

Here we create a function that takes str_input, one-hot encodes each character, and unsqueezes each encoding so that they can be concatenated into a 2-D matrix of shape (8, 69): one one-hot row for every character of the string.

In [0]:
def string2one_hot_matrix(str_input, max_length=8):
  str_input = str_input[:max_length].lower()
  
  if(len(str_input) < max_length):
    str_input = str_input + "." * (max_length - len(str_input))
  
  char_vectors = list()
  for char in str_input:
    char_v = onehot(char2index[char], len(char2index)).unsqueeze(0)  # creating one hot encoding for each char and appending to list
    char_vectors.append(char_v)
  return th.cat(char_vectors, dim = 0)  # return the full matrix of one-hot rows, one per character of the input string

In [0]:
matrix = string2one_hot_matrix('sourav')

In [127]:
matrix.shape

torch.Size([8, 69])

Next we want to find out whether a query matches a key stored in the database. The trick is to one-hot encode both strings as matrices, multiply them elementwise, and sum each row: the result has a 1 at every position where the characters match and a 0 where they differ.
We then multiply all the values of that vector together. If the strings match completely, the vector is all 1's and the product is 1; if even a single character differs, the vector contains a 0 somewhere and the product is 0. Both are exactly the results we want.

In [0]:
def strings_equal(str_a, str_b):
  vect = (str_a * str_b).sum(1)  # multiply both one-hot encoded matrices elementwise and sum each row
  x = vect[0] # start with the first value of vect
  for i in range(vect.shape[0] - 1): 
    x *= vect[i + 1]  # multiply in the remaining values of vect
  return x  

In [129]:
strings_equal(string2one_hot_matrix("soura"), string2one_hot_matrix("sourav"))

tensor(0)
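The same comparison can be traced by hand without tensors. The sketch below uses a toy alphabet of width 4 and plain lists; `one_hot_rows` and `rows_equal` are hypothetical helpers written for this illustration.

```python
def one_hot_rows(indices, width):
    """Build one one-hot row per character index."""
    return [[1 if col == idx else 0 for col in range(width)] for idx in indices]


def rows_equal(rows_a, rows_b):
    """Return 1 if every pair of rows matches, else 0, using only * and +."""
    result = 1
    for row_a, row_b in zip(rows_a, rows_b):
        # The row-wise dot product is 1 when both rows mark the same column.
        result *= sum(a * b for a, b in zip(row_a, row_b))
    return result


same = rows_equal(one_hot_rows([0, 1, 2], 4), one_hot_rows([0, 1, 2], 4))
diff = rows_equal(one_hot_rows([0, 1, 2], 4), one_hot_rows([0, 3, 2], 4))

print(same)  # 1
print(diff)  # 0
```

Because only multiplication and addition are involved, the comparison can be carried out on secret-shared tensors without revealing either string.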

Let's now add some entries to the database as key-value pairs.

In [0]:
keys = list()
values = list()

keys.append(string2one_hot_matrix("key1"))
keys.append(string2one_hot_matrix("key2"))

In [131]:
keys

[tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

In [0]:
values.append(string2values("value1"))
values.append(string2values("value2"))

In [133]:
values

[<bound method TorchHook.get_hooked_method.<locals>.overloaded_native_method of tensor([22,  1, 12, 21,  5, 28, 50, 50])>,
 <bound method TorchHook.get_hooked_method.<locals>.overloaded_native_method of tensor([22,  1, 12, 21,  5, 29, 50, 50])>]

Now we take a query from the user and check whether any of our keys match the query key, so that we can return the corresponding value.

In [0]:
query_str = "key2"
query_matrix = string2one_hot_matrix(query_str)

key_matches = list()
for key in keys:
  key_match = strings_equal(key, query_matrix)
  key_matches.append(key_match)

In [135]:
print(key_matches)
print(values)

[tensor(0), tensor(1)]
[<bound method TorchHook.get_hooked_method.<locals>.overloaded_native_method of tensor([22,  1, 12, 21,  5, 28, 50, 50])>, <bound method TorchHook.get_hooked_method.<locals>.overloaded_native_method of tensor([22,  1, 12, 21,  5, 29, 50, 50])>]


In [0]:
def values2string(input_values):
    s = ""
    for value in input_values:
        s += index2char[int(value)]
    return s

In [137]:
p = string2values("sourav")
print(p)

<bound method TorchHook.get_hooked_method.<locals>.overloaded_native_method of tensor([19, 15, 21, 18,  1, 22, 50, 50])>


Let's now put it all together: the helper functions are redefined below (with string2values now calling .long()), followed by a class that wraps them.

In [0]:
def string2values(str_input, max_len=8):

    str_input = str_input[:max_len].lower()

    # pad strings shorter than max len
    if(len(str_input) < max_len):
        str_input = str_input + "." * (max_len - len(str_input))

    values = list()
    for char in str_input:
        values.append(char2index[char])

    return th.tensor(values).long()

def values2string(input_values):
    s = ""
    for value in input_values:
        s += index2char[int(value)]
    return s

def strings_equal(str_a, str_b):

    vect = (str_a * str_b).sum(1)

    x = vect[0]

    for i in range(vect.shape[0] - 1):
        x = x * vect[i + 1]    

    return x

def one_hot(index, length):
    vect = th.zeros(length).long()
    vect[index] = 1
    return vect

def string2one_hot_matrix(str_input, max_len=8):

    str_input = str_input[:max_len].lower()

    # pad strings shorter than max len
    if(len(str_input) < max_len):
        str_input = str_input + "." * (max_len - len(str_input))

    char_vectors = list()
    for char in str_input:
        char_v = one_hot(char2index[char], len(char2index)).unsqueeze(0)
        char_vectors.append(char_v)
        
    return th.cat(char_vectors, dim=0)

In [0]:
class EncryptedDB():
    
    def __init__(self, *owners, max_key_len=8, max_val_len=8):
        self.max_key_len = max_key_len
        self.max_val_len = max_val_len
        
        self.keys = list()
        self.values = list()
        self.owners = owners
        
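    def add_entry(self, key, value):
        # one-hot encode the key (trimmed/padded to max_key_len) and secret-share it
        key = string2one_hot_matrix(key, max_len=self.max_key_len)
        key = key.share(*self.owners)
        self.keys.append(key)
        
        # encode the value as character indices and secret-share it
        value = string2values(value, max_len=self.max_val_len)
        value = value.share(*self.owners)
        self.values.append(value)
        
    def query(self, query_str):
        # encode the query exactly like the stored keys
        query_matrix = string2one_hot_matrix(query_str, max_len=self.max_key_len)
        
        # share the query too, so the comparison runs on encrypted data
        query_matrix = query_matrix.share(*self.owners)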

        key_matches = list()
        for key in self.keys:

            key_match = strings_equal(key, query_matrix)
            key_matches.append(key_match)

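        # masked sum: each value is multiplied by its 0/1 match flag, so only a matching entry contributes
        result = self.values[0] * key_matches[0]

        for i in range(len(self.values) - 1):
            result += self.values[i+1] * key_matches[i+1]
            
        # decrypt only the final result
        result = result.get()

        # decode indices to characters and strip the "." padding
        return values2string(result).replace(".","")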

In [0]:
db = EncryptedDB(bob, alice, secure_worker, max_val_len=256) # the workers jointly hold the db's shares, but none of them can see what's inside


In [145]:
db.add_entry("Bob","(123) 456 7890")
db.add_entry("Bill", "(234) 567 8901")
db.add_entry("Sam","(345) 678 9012")
db.add_entry("Key","really big json value")

db.query("Bob")

'(123) 456 7890'
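The query method never branches on which key matched; it selects the value with a masked sum. Here is a plain, unencrypted sketch of that selection step using the index vectors for "value1" and "value2" shown earlier. The `select` helper is hypothetical and written only for this illustration.

```python
# Index encodings of "value1" and "value2", padded with "." (index 50).
values = [
    [22, 1, 12, 21, 5, 28, 50, 50],
    [22, 1, 12, 21, 5, 29, 50, 50],
]


def select(values, matches):
    """Sum every value scaled by its 0/1 match flag."""
    result = [0] * len(values[0])
    for value, match in zip(values, matches):
        result = [r + v * match for r, v in zip(result, value)]
    return result


hit = select(values, [0, 1])   # the query matched the second key
miss = select(values, [0, 0])  # the query matched no key

print(hit)   # [22, 1, 12, 21, 5, 29, 50, 50]
print(miss)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

One consequence of this design: when no key matches, the result is all zeros, and since index 0 maps to the space character, such a query would decode to a run of spaces rather than raise an error.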