# Private Set Intersection

- Identify the intersection of multiple datasets without revealing the contents of the datasets.
- The datasets are not required to be sorted.
- The datasets are not required to be of the same size.
- They are required to have a common identifier.
- We first implement a primitive variant of this.

In [1]:
import numpy as np

Create two datasets with a common identifier.

Each identifier should be unique within a dataset.

In [13]:
data_party1=np.unique(np.random.randint(0,100000,10000))
data_party2=np.unique(np.random.randint(0,100000,10000))

Using NumPy's built-in `np.intersect1d` function, we can find the intersection of the two datasets directly.

In [53]:
%%time
result_quick=np.intersect1d(data_party1,data_party2)
result_quick.shape

CPU times: total: 15.6 ms
Wall time: 2.1 ms


(938,)
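As a small illustration of the call above, the following sketch runs `np.intersect1d` on two toy arrays. The arrays `ids_a` and `ids_b` are hypothetical and not part of the notebook's data; `assume_unique` and `return_indices` are optional parameters of `np.intersect1d` that fit the "unique identifier" assumption made earlier.

```python
import numpy as np

# Hypothetical toy identifiers, unique within each party.
ids_a = np.array([7, 2, 9, 4])
ids_b = np.array([4, 11, 7, 3, 8])

# assume_unique=True tells NumPy it can skip de-duplicating the inputs.
# return_indices=True also returns where each common value sits in each input.
common, idx_a, idx_b = np.intersect1d(
    ids_a, ids_b, assume_unique=True, return_indices=True
)

print(common)         # common identifiers, in sorted order
print(idx_a, idx_b)   # positions of those identifiers in ids_a and ids_b
```

The index arrays are useful when each identifier carries a payload (a row of attributes, for example) that has to be looked up after the match.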

A naive implementation checks, for each element of one dataset, whether it occurs in the other, and collects the matches.

In [21]:
%%time
result=[]
for i in range(data_party1.shape[0]):
    if data_party1[i] in data_party2:
        result.append(data_party1[i])

CPU times: total: 250 ms
Wall time: 255 ms
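The `in` test on a NumPy array scans the whole array for every lookup, which is why the loop above is slow. A minimal sketch of the same idea with a hash-based `set` for the lookups is shown below. The seeded generator and the names `party1`, `party2`, and `lookup` are assumptions made for this example only; it is a swapped-in technique, not the notebook's method.

```python
import numpy as np

# Regenerate comparable data with a fixed seed (arbitrary choice for this sketch).
rng = np.random.default_rng(0)
party1 = np.unique(rng.integers(0, 100000, 10000))
party2 = np.unique(rng.integers(0, 100000, 10000))

# Build a hash set once; each membership test is then O(1) on average.
lookup = set(party2.tolist())
result_set = [x for x in party1.tolist() if x in lookup]

print(len(result_set))
```

Because `np.unique` returns sorted output, `result_set` comes out in ascending order, which makes it directly comparable to the result of `np.intersect1d`.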


An ultra-naive implementation compares every element of one dataset with every element of the other and collects the matches.

In [22]:
%%time
result=[]
for i in range(data_party1.shape[0]):
    for s in range(data_party2.shape[0]):
        if data_party1[i]==data_party2[s]:
            result.append(data_party1[i])

CPU times: total: 49.1 s
Wall time: 53.1 s
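Both loops above ignore the fact that `np.unique` already returns sorted arrays. As a hedged sketch of what sortedness buys, the following two-pointer walk finds the intersection of two sorted sequences in a single pass. The function name `intersect_sorted` and the toy inputs are hypothetical.

```python
def intersect_sorted(a, b):
    """Intersect two ascending sequences with unique elements in one pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Match: record it and advance both pointers.
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a[i] is too small to match anything left in b.
            i += 1
        else:
            # b[j] is too small to match anything left in a.
            j += 1
    return out


print(intersect_sorted([1, 3, 5, 7, 9], [2, 3, 4, 7, 10]))
```

Each step advances at least one pointer, so the walk needs at most `len(a) + len(b)` comparisons instead of `len(a) * len(b)`.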


In [39]:
# Python program for Bitonic Sort. Note that this program
# works only when size of input is a power of 2.

# The parameter dire indicates the sorting direction, ASCENDING (1)
# or DESCENDING (0); if a[i] and a[j] are out of order for that
# direction, they are interchanged.


def compAndSwap(a, i, j, dire):
    # Ascending (dire == 1): swap when a[i] > a[j].
    # Descending (dire == 0): swap when a[i] < a[j].
    if (dire == 1 and a[i] > a[j]) or (dire == 0 and a[i] < a[j]):
        a[i], a[j] = a[j], a[i]

# It recursively sorts a bitonic sequence in ascending order,
# if dire = 1, and in descending order otherwise (means dire = 0).
# The sequence to be sorted starts at index position low,
# the parameter cnt is the number of elements to be sorted.


def bitonicMerge(a, low, cnt, dire):
    if cnt > 1:
        k = cnt//2
        for i in range(low, low+k):
            compAndSwap(a, i, i+k, dire)
        bitonicMerge(a, low, k, dire)
        bitonicMerge(a, low+k, k, dire)

# This function first produces a bitonic sequence by recursively
# sorting its two halves in opposite sorting orders, and then
# calls bitonicMerge to make them in the same order


def bitonicSort(a, low, cnt, dire):
    if cnt > 1:
        k = cnt//2
        bitonicSort(a, low, k, 1)
        bitonicSort(a, low+k, k, 0)
        bitonicMerge(a, low, cnt, dire)

# Caller of bitonicSort for sorting the entire array of length N
# in ASCENDING order


def sort(a, N, up):
    bitonicSort(a, 0, N, up)


# Driver code to test above
a = [3, 7, 4, 8, 6, 2, 1, 5]
n = len(a)
up = 1

sort(a, n, up)
print("\n\nSorted array is")
for i in range(n):
    print("%d" % a[i], end=" ")



Sorted array is
1 2 3 4 5 6 7 8 
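The power-of-two restriction noted in the comments above can be worked around by padding. The following is a minimal sketch, not the notebook's implementation: it uses an iterative formulation of bitonic sort and pads the input with `math.inf`, assuming all real values are finite. The name `bitonic_sort_padded` is hypothetical.

```python
import math


def bitonic_sort_padded(values):
    """Sort ascending with an iterative bitonic network, padding to a power of two."""
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    # Padding with +inf (assumption: real values are finite) keeps it at the tail.
    a = list(values) + [math.inf] * (size - n)

    k = 2
    while k <= size:              # k: size of the bitonic blocks being merged
        j = k // 2
        while j > 0:              # j: distance between compared elements
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (ascending and a[i] > a[partner]) or \
                       (not ascending and a[i] < a[partner]):
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a[:n]                  # drop the padding


print(bitonic_sort_padded([3, 7, 4, 8, 6, 2, 1]))
```

The pairs `(i, partner)` that get compared depend only on `size`, never on the data, which is the property that makes this family of sorts a fixed comparison network.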

In [50]:
arr1 = np.sort(data_party1)
arr2 = np.sort(data_party2)
bitonic_seq = np.append(arr1, arr2[::-1])
bitonic_seq = np.append(bitonic_seq,np.zeros(2**15-len(bitonic_seq)))
bitonicMerge(bitonic_seq, 0, 2**15, 1)
bitonic_seq

array([    0.,     0.,     0., ..., 99995., 99996., 99997.])
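To make the step above easier to follow, here is a toy-sized sketch of the same idea: an ascending run followed by a descending run forms a bitonic sequence, one bitonic merge sorts it, and identifiers present in both inputs end up next to each other. The helper `bitonic_merge_asc` and the lists `left` and `right` are hypothetical and ascending-only, written so the example runs on its own.

```python
def bitonic_merge_asc(a, low, cnt):
    """Sort the bitonic slice a[low:low+cnt] ascending, in place (cnt is a power of two)."""
    if cnt > 1:
        k = cnt // 2
        for i in range(low, low + k):
            if a[i] > a[i + k]:
                a[i], a[i + k] = a[i + k], a[i]
        bitonic_merge_asc(a, low, k)
        bitonic_merge_asc(a, low + k, k)


left = [1, 4, 6, 9]    # sorted identifiers of one party
right = [2, 4, 7, 9]   # sorted identifiers of the other party

# Ascending run followed by a descending run: a bitonic sequence of length 8.
seq = left + right[::-1]
bitonic_merge_asc(seq, 0, len(seq))
print(seq)

# Identifiers are unique per party, so an adjacent equal pair means "in both".
duplicates = [x for prev, x in zip(seq, seq[1:]) if prev == x]
print(duplicates)
```

Since the two inputs together already have a power-of-two length here, no padding is needed; the notebook's data requires padding up to `2**15`.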

In [61]:
%%time
def find_intersection(arr1, arr2):
    # Sort both inputs, then append the second one in reverse so that the
    # combined sequence (ascending, then descending) is bitonic.
    arr1 = np.sort(arr1)
    arr2 = np.sort(arr2)
    bitonic_seq = np.append(arr1, arr2[::-1])

    # bitonicMerge needs a power-of-two length. Pad the tail with -1, which
    # assumes all identifiers are non-negative, so a real identifier 0 survives.
    size = 1 << (len(bitonic_seq) - 1).bit_length()
    bitonic_seq = np.append(bitonic_seq, np.full(size - len(bitonic_seq), -1))

    # A single bitonic merge sorts the bitonic sequence in ascending order
    bitonicMerge(bitonic_seq, 0, len(bitonic_seq), 1)
    # Remove the padding values
    bitonic_seq = bitonic_seq[bitonic_seq >= 0]

    # Identifiers are unique per dataset, so equal neighbours in the
    # sorted sequence are exactly the elements present in both datasets
    intersection = []
    prev = None
    for num in bitonic_seq:
        if num == prev:
            intersection.append(num)
        prev = num

    return intersection

len(find_intersection(data_party1, data_party2))


CPU times: total: 172 ms
Wall time: 160 ms


938