# sangupta/ps

Switch branches/tags
Nothing to show
Fetching contributors…
Cannot retrieve contributors at this time
74 lines (58 sloc) 2.28 KB

# Problem

You are given multiple integer sets and you need to find duplicates in them, ie. you need to find the intersection of all the sets.

# Solution

Suppose we have 3 arrays of integer sets and we need to merge them. The fastest solution would be possible in time of `O(n1 + n2 + n3)` where `n1`, `n2`, `n3` are the lengths of the three sets respectively. The solution lies in using two bit-vectors (also called as bit-set or a bit-array) to represent intersection between any two sets and then using the resultant to intersect with the next one, and so on.

• Construct a running bit-set and populate it with the first array
• Using a second bit-set intersect the first bit-set with second array
• Now change-over the second to the first bit-set
• And repeat the process above with the third array, fourth array and so on
```public void findDuplicates(int[] array1, int[] array2, int[] array3) {
BitSet first = new BitSet();
BitSet result = new BitSet();

// populate the initial one
for(int num : array1) {
first.setBit(num, true);
}

// intersect with second
for(int num : array2) {
if(first.isBitSet(num)) {
result.setBit(num);
}
}

// change-over
first = result;
result = new BitSet();

// intersect with third
for(int num : array3) {
if(first.isBitSet(num)) {
result.setBit(num);
}
}

// output
int index = -1;
do {
index = result.getNextSetBit(index);
if(index < 0) {
break;
}

System.out.println(index);
} while(true);
}```

The solution above can be extended to as many arrays as are provided in the problem definition. The time to sort will still remain `O(N)` where `N` is the sum of total number of elements across all provided arrays.

## Optimizations available

• One can make use of sparsed-bit-arrays to reduce memory consumption. Refer to brettwooldridge/SparseBitSet for one such implementation.

• If the arrays are really, really huge - an implementation that uses file-based persistence of a bit-array can be used. Refer to one such implementation available in the jerry-core project.