Sets and Boolean Things
=======================

Let's start with two sets:

In [1]:
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

In [2]:
type(A)

set

Remember that sets are unordered collections of unique values. They behave much like sets in mathematics, and they support the familiar mathematical set operations.

Union
-----

For example, you can create the union of `A` and `B`, which is the set containing all of the elements in `A` and all of the elements of `B` with any duplicates removed, by calling the `union` method of `A` with
the argument `B`.

In [3]:
C = A.union(B)
print(C)

{1, 2, 3, 4, 5, 6}


The same effect can be achieved using the `|` operator:

In [4]:
print(A|B)

{1, 2, 3, 4, 5, 6}


Note one important way in which the method differs from the operator: for the method, `B` can be any iterable (a list, tuple, string, and so on), but the operator requires both operands to be sets.

In [5]:
# Union with a list
print(A.union([3,4,5,6,3]))
print(A | [3,4,5,6])

{1, 2, 3, 4, 5, 6}


TypeError: unsupported operand type(s) for |: 'set' and 'list'
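If you want the operator form with a non-set, one option is to convert the other operand to a set first. The sketch below (the names `nums` and `result` are just for illustration) shows the method accepting a list directly, the operator raising `TypeError`, and the conversion workaround:

```python
A = {1, 2, 3, 4}
nums = [3, 4, 5, 6, 3]   # a list, with a duplicate

# The method accepts any iterable
print(A.union(nums))

# The operator insists that both operands be sets
try:
    A | nums
except TypeError as err:
    print('TypeError:', err)

# Converting the list to a set first makes | work
result = A | set(nums)
print(result)
```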

Another way in which `|` differs from the union method is that `|` has a second meaning in Python.

In [6]:
True|False, False|True, False|False

(True, True, False)

In [7]:
# Using NumPy, the de facto standard (third-party) library for numerical arrays
import numpy as np
np.array([0,1]) | np.array([1,0])

array([1, 1])

So `|` can also mean 'or'.

Those who've been exposed to set theory will have seen some of the algebraic similarities between set operations and Boolean operations like 'or'.  Logically, you can think of `a.union(b)` as the set of things in `a` _or_ `b`.

Computationally, though, it is better to think of `|` as a special kind of 'or', called bitwise 'or', because of how it works with numbers: it compares its operands one binary digit at a time.

In [None]:
X,Y = 5,3
# We print 5 and 3 as base 2 numbers to illustrate how | works
print(f'{X:03b} {Y:03b} {7:03b}')
print(X|Y)

101 011 111
7
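To make "one binary digit at a time" concrete, here is a sketch that computes the same result by hand. The helper `bitwise_or_by_hand` is hypothetical (it is not a Python built-in) and only handles non-negative integers:

```python
def bitwise_or_by_hand(x, y):
    """Hypothetical helper: OR two non-negative ints one bit at a time."""
    result, place = 0, 1
    while x or y:
        # Look at the lowest bit of each number; the result bit is 1 if either is 1
        if (x % 2) == 1 or (y % 2) == 1:
            result += place
        x, y = x // 2, y // 2   # drop the lowest bit of both numbers
        place *= 2              # move on to the next binary place
    return result

X, Y = 5, 3
print(bitwise_or_by_hand(X, Y), X | Y)
```

Both values printed are 7, matching the `101 | 011 = 111` layout above.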


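Contrast Python's logical `or`, which does not combine bits at all: it returns its first operand if that operand is truthy (here, nonzero), and otherwise returns its second operand: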

In [8]:
print(5 or 3)
print(0 or 3)
print(0 or 0)

5
3
0
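A side-by-side comparison of the two operators on the same pairs of numbers:

```python
pairs = [(5, 3), (0, 3), (0, 0)]
for x, y in pairs:
    # `or` picks one of its operands; `|` builds a new number bit by bit
    print(f'{x} or {y} = {x or y}    {x} | {y} = {x | y}')
```

The two agree whenever one operand is 0, which is why the difference is easy to miss; they disagree for `5` and `3`, where `or` gives 5 and `|` gives 7.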


Intersection
------------

The intersection of two sets is the set of elements that are in both sets, so in the above example it would contain the elements `3` and `4`.

You can compute the intersection using the `intersection()` method:

In [26]:
C = A.intersection(B)
print(A, B, C)

{1, 2, 3, 4} {3, 4, 5, 6} {3, 4}


or using the bitwise _and_ operator `&`, since you want the things in `a` _and_ in `b`:

In [11]:
C = A & B
print(C)

{3, 4}
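As with `|`, the `&` operator has a bitwise meaning for integers: each bit of the result is 1 only if that bit is 1 in both operands. A quick check with the same numbers used for `|`, plus a reminder that the method form accepts any iterable:

```python
X, Y = 5, 3
# Show both operands and the result as 3-digit base 2 numbers
print(f'{X:03b} & {Y:03b} = {X & Y:03b}')
print(X & Y)

# The intersection method, like union, accepts any iterable
A = {1, 2, 3, 4}
print(A.intersection([3, 4, 5, 6]))
```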


Set Difference
--------------

Set difference gives you the elements which are only in `a`, and not in `b`.  So in this example it would be 1 and 2.

You can compute the set difference using the `difference()` method:

In [13]:
C = A.difference(B)
print(C)

{1, 2}


or using the standard minus operator `-`:

In [14]:
C = A - B
print(C)

{1, 2}


Note that unlike union and intersection, set difference is not commutative: `a - b` and `b - a` are generally different.  In this example, `b - a` gives 5 and 6:

In [15]:
B - A

{5, 6}
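A short sketch checking that set difference depends on order, and contrasting `-` on integers, where it is plain arithmetic subtraction. For integers, the bitwise counterpart of set difference is `x & ~y` (this pairing is our own illustration, not something the table of operators provides directly):

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Order matters for set difference
print(A - B, B - A)
print(A - B == B - A)

# Like union and intersection, the method accepts any iterable
print(A.difference([3, 4, 5, 6]))

# For ints, - is arithmetic subtraction, not a bit operation...
print(5 - 3)
# ...the bitwise analogue of set difference keeps the 1 bits of x that are not set in y
print(f'{5:03b} & ~{3:03b} = {5 & ~3:03b}')
```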

Symmetric Difference
--------------------

The last basic operation is symmetric difference, the set of elements that are in `a` or in `b`, but _not_ in both.  So in this example it would be 1, 2, 5, and 6, but not 3 and 4.

You can compute the symmetric difference using the `symmetric_difference()` method:

In [16]:
C = A.symmetric_difference(B)
print(C)

{1, 2, 5, 6}


or using the "exclusive or" (or "xor") binary operator `^`, because you are getting the elements which are in `a` _xor_ `b`:

In [17]:
C = A ^ B
print(C)

{1, 2, 5, 6}
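Symmetric difference can be built from the other operations, which makes a useful sanity check; and, as with `|` and `&`, `^` on integers works bit by bit:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
sym = A ^ B

# Elements in exactly one of the sets...
assert sym == (A - B) | (B - A)
# ...equivalently, everything in either set minus what is in both
assert sym == (A | B) - (A & B)

X, Y = 5, 3
# Each result bit is 1 when exactly one operand has a 1 there
print(f'{X:03b} ^ {Y:03b} = {X ^ Y:03b}')
print(X ^ Y)
```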


#### Summary

```
Method                    Operator   Name
.union()                  |          bitwise "or"
.intersection()           &          bitwise "and"
.difference()             -          minus (set difference)
.symmetric_difference()   ^          bitwise "xor"
```

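Each operator overlaps in meaning with the corresponding method,
but the operators are more restrictive (both operands must be sets),
and when applied to non-sets such as integers or numpy arrays they
mean something different: `|`, `&`, and `^` operate bitwise, while
`-` is ordinary subtraction.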
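One way to see the difference in restrictions is to try each method and its operator with a list argument. This sketch uses the standard `operator` module's functions (`operator.or_`, `operator.and_`, `operator.sub`, `operator.xor`) to stand in for the four operators; the names `other` and `outcomes` are just for illustration:

```python
import operator

A = {1, 2, 3, 4}
other = [3, 4, 5, 6]   # a list, not a set

# Each set method paired with the operator-module function for its operator
pairs = [
    (A.union, operator.or_),                   # |
    (A.intersection, operator.and_),           # &
    (A.difference, operator.sub),              # -
    (A.symmetric_difference, operator.xor),    # ^
]

outcomes = {}
for method, op in pairs:
    method_result = method(other)      # methods accept any iterable
    try:
        op_result = op(A, other)       # operators want a set on both sides
    except TypeError:
        op_result = 'TypeError'
    outcomes[method.__name__] = (method_result, op_result)
    print(f'{method.__name__:22} {method_result!s:20} {op_result}')
```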


Set Containment
---------------

There are also a couple of operations that allow you to check for set containment.  If we have two sets like this:

In [18]:
a = {1, 2, 3}
b = {1, 2}

You can ask if `b` is a subset of `a`:

In [19]:
b.issubset(a)

True

You can also use `<=` to test for containment, since if `b` is a subset of `a` it is smaller than `a` in a very real sense:

In [20]:
b <= a

True

You can also ask if `a` is a superset of `b` using the `issuperset()` method:

In [21]:
a.issuperset(b)

True

or by using `>=`:

In [22]:
a >= b

True

Although there aren't methods for these, you can use `<` and `>` to test for proper subsets and supersets.  Every set is a subset of itself:

In [23]:
a <= a

True

but not a proper subset:

In [24]:
a < a

False

which makes sense: a proper subset of `a` must leave out at least one element of `a`.

Finally, the `isdisjoint()` method tells you whether two sets have no elements in common:

In [25]:
a = {1, 2}
b = {3, 4}
a.isdisjoint(b)

True
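Unlike `<=` on numbers, subset comparison is only a partial order: two sets can each have elements the other lacks, in which case neither is a subset of the other. A short sketch (the set names are arbitrary):

```python
a = {1, 2, 3}
b = {1, 2}
c = {2, 3, 4}

print(b < a, b <= a)     # proper subset and subset
print(a <= c, c <= a)    # neither contains the other
print(a.isdisjoint(c))   # they share elements (2 and 3)

# Like the other methods, issubset() and isdisjoint() accept any iterable
print(b.issubset([1, 2, 3]), a.isdisjoint([7, 8]))
```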

Updating Sets
----------------

None of the methods discussed above modify the content of the sets that call them: `A.union(B)` returns a set
that is the union of `A` and `B` but leaves `A` unchanged.  Sometimes you want to change the contents of a set, to add or remove elements. 

### Adding & subtracting elements

For a list, you would use the `append()` and `extend()` methods to add elements to the end of the list; because sets are unordered, the corresponding set methods have different names.

The `add()` method adds a single element to a set, like `append()`:

In [None]:
t = {1, 2, 3}
t.add(5)
t

{1, 2, 3, 5}

Remember that you can't have duplicates in a set, so if you add the same element twice, you'll only see it once in the set:

In [None]:
t.add(5)
t

{1, 2, 3, 5}

The `update()` method adds all the elements of another container to the set. It is worth emphasizing that the argument does not have to be a set: it can be any iterable, such as a list, tuple, or string, and its elements are added one by one.

In [None]:
t = {1, 2, 3, 5}
t.update([5, 6, 7])
print(t)
t.update((8,9,8))
print(t)
t.update('ab')
print(t)
t.add('ab')
print(t)

{1, 2, 3, 5, 6, 7}
{1, 2, 3, 5, 6, 7, 8, 9}
{1, 2, 3, 5, 6, 7, 8, 9, 'a', 'b'}
{1, 2, 3, 5, 6, 7, 8, 9, 'ab', 'a', 'b'}
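`update()` also accepts several iterables in a single call, and the contrast between `add()` and `update()` with a string argument is easy to confirm with `len()`:

```python
t = {1, 2, 3}
t.update([4, 5], (6,), {7})   # several iterables in one call
print(t)

s = set()
s.add('ab')      # adds the string itself: one element
print(len(s))
s.update('ab')   # adds each character, 'a' and 'b', separately
print(len(s))
```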


This resembles `update()` for dictionaries, which adds the key-value pairs of another dictionary, overwriting the values of keys that are already present:

In [None]:
dd0 = {'a':1, 'b':2}
print(dd0)
dd1 = {'a':2, 'c':7}
dd0.update(dd1)
print(dd0)

{'a': 1, 'b': 2}
{'a': 2, 'b': 2, 'c': 7}


Finally, there are update methods corresponding to `intersection()` and `difference()`, namely `intersection_update()` and `difference_update()`, which replace
the contents of the set with the result of the corresponding operation.

Thus `update()` on sets could have been called `union_update()`, to emphasize its similarity to the other two update
operations.  The shorter name was probably chosen to emphasize its similarity
to the dictionary operation.

In [None]:
t = {1, 2, 3, 4, 5, 6, 7}
print(t)
t.intersection_update([5, 6, 7])
print(t)
print('=' * 25)
t = {1, 2, 3}
s = {2, 4, 5}
print(t)
t.difference_update(s)
print(t)

{1, 2, 3, 4, 5, 6, 7}
{5, 6, 7}
{1, 2, 3}
{1, 3}


Note that update operations don't return a new set (they return `None`); they just change the set on which the method is called.

In [None]:
t = {1, 2, 3}
s = {2,4,5}
print(t)
v = t.difference_update(s)
print(v,t)

{1, 2, 3}
None {1, 3}
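Each update method also has an augmented-assignment operator form (`|=`, `&=`, `-=`, `^=`), and there is a `symmetric_difference_update()` method as well. As with the plain operators, the operator forms expect a set on the right-hand side, while the methods accept any iterable. A sketch:

```python
t = {1, 2, 3, 4, 5, 6, 7}
t &= {5, 6, 7, 8}    # like t.intersection_update({5, 6, 7, 8})
print(t)
t -= {7}             # like t.difference_update({7})
print(t)
t |= {1, 2}          # like t.update({1, 2})
print(t)
t ^= {2, 5}          # like t.symmetric_difference_update({2, 5})
print(t)

u = {1, 2}
u.symmetric_difference_update([2, 3])   # the method takes any iterable
print(u)
```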


### Removing Elements

There are at least 3 different ways of removing elements from a set.

The first is the `remove()` method, which, like the list `remove()` method, removes an element by its value, so:

In [None]:
t = {1, 2, 3}
t.remove(1)
t

{2, 3}

removes `1` from the set.  However, if you try to remove an element that doesn't exist, `remove()` complains by raising a `KeyError` exception:

In [None]:
t.remove(10)

KeyError: 10

If you want to remove an element from the set, but don't care which one, then you can use the `pop()` method:

In [None]:
t.pop()

2

In [None]:
t

{3}

Notice that `pop()` returns the value of the element that was removed from the set.  If you try to `pop()` an element from an empty set, then it raises a `KeyError`:

In [None]:
s = set()
s.pop()

KeyError: 'pop from an empty set'

Also notice that in these examples the elements of the set have been displayed in sorted order, and `pop()` removed the smallest one. That is an accident of how CPython stores small integers (each one hashes to itself), and it isn't guaranteed in general.

If you want to remove an element without worrying about getting an exception, `discard()` works like `remove()`, so if you say:

In [None]:
t.discard(3)
t

set()

it removes 3 from the set, but if you try to discard an element which isn't in the set, then it just does nothing:

In [None]:
t.discard(20)
t

set()

in particular, it doesn't raise an exception.
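The three removal methods differ only in what they do when the element is missing or the set is empty. A sketch that exercises each case, catching the `KeyError` where one is raised:

```python
t = {1, 2, 3}

t.discard(10)            # missing element: silently ignored
try:
    t.remove(10)         # missing element: KeyError
except KeyError as err:
    print('remove raised KeyError:', err)

popped = []
while t:                 # pop until empty; which element comes out is arbitrary
    popped.append(t.pop())
print(sorted(popped))

try:
    t.pop()              # empty set: KeyError
except KeyError as err:
    print('pop raised KeyError:', err)
```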

## Numpy arrays

Above, using the example of `|`, we introduced the idea of a **bitwise operator**, which presupposes
arguments that can be interpreted as sequences of bits (0s and 1s).  Because numbers can be interpreted
as sequences of binary digits, they can serve as arguments to bitwise operators like `|`,
but they are not inherently sequences (it makes no sense to ask what the first element of
5 is).

Like many other programming languages, Python can work with **bit vectors** (or bit arrays), though the natural implementation is in `numpy`, a third-party module that must be installed and imported. `numpy` arrays are a datatype
of their own: updatable sequences whose elements generally share a single fixed datatype, usually numerical.

In this section, we will restrict discussion to arrays of the Boolean values 0 and 1 (numpy arrays
of `True` and `False` behave basically the same).

In [None]:
import numpy as np

a = np.array([1,1,0]) 
print(a)
b =  np.array([True,True,False]) 
print(b)
a == b

[1 1 0]
[ True  True False]


array([ True,  True,  True])

Note that you can interpret the array `[1 1 0]` as the binary (base 2) representation of the number 6, but that is just one of infinitely many interpretations of these three bits.
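To make the point concrete, here is the same array read as a base 2 number in two different bit orders, and as three yes/no flags. Which reading is "right" depends entirely on a convention you choose; nothing in the array says which:

```python
import numpy as np

a = np.array([1, 1, 0])

# Most significant bit first: the digits '110' in base 2
msb_first = int(''.join(str(bit) for bit in a), 2)
# Least significant bit first: reverse the digits before converting
lsb_first = int(''.join(str(bit) for bit in a[::-1]), 2)
# Or simply as three independent flags
flags = a.astype(bool)

print(msb_first, lsb_first, flags)
```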

So it's not accurate to think of bit arrays as binary numbers.  Python has no datatype for a binary number, just `int`s, which are printed in their usual base 10 form by default, or in binary or other
bases upon request:

In [None]:
X = 15
print(f'decimal: {X}   binary: {X:03b}   octal: {X:03o}   hexadecimal: {X:03x}')

decimal: 15   binary: 1111   octal: 017   hexadecimal: 00f


Similarly a string of digits can be converted to an `int` by being interpreted according to
a given base.

In [None]:
int('101'), int('101',2), int('101',8), int('101',16)

(101, 5, 65, 257)
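Formatting and parsing are inverses, so you can round-trip a number through its binary string. The built-in `bin()` adds a `0b` prefix, which `int()` understands when you pass base 0 (meaning "infer the base from the prefix"):

```python
X = 15
as_text = format(X, 'b')    # '1111', same as f'{X:b}'
print(as_text, int(as_text, 2))

with_prefix = bin(X)        # '0b1111'
print(with_prefix, int(with_prefix, 0))   # base 0: read the base from the prefix
```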

The `&` and `|` operators work elementwise on numpy arrays, as you might expect:

In [None]:
a, b = np.array([1,0,1]),np.array([0,1,1])
print(a|b)
print(a&b)

[1 1 1]
[0 0 1]
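The `^` operator works elementwise too, and NumPy also provides `np.logical_or`, `np.logical_and`, and `np.logical_xor`, which always return Boolean arrays. A brief comparison on the same `a` and `b`:

```python
import numpy as np

a, b = np.array([1, 0, 1]), np.array([0, 1, 1])
print(a ^ b)                    # elementwise xor, result keeps the int dtype
print(np.logical_or(a, b))      # Boolean result
print(np.logical_xor(a, b))     # Boolean result
```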


Because Python already provides bitwise operators like `|` for integers, it makes sense to think a bit before
incurring the overhead of a `numpy` array.  What job can an array do that an integer like 3 or 5 can't?  Arrays are justified when specific kinds of binary information need to be encoded and you need to efficiently answer questions like "What is the third bit?" and "How many 1s are there?"

To answer these questions for `a`, do:

In [None]:
a = np.array([1, 1, 0])  # re-create the earlier array, since a was reassigned above
print(a)
print(a[2])
print(a.sum())

[1 1 0]
0
2
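For comparison, a plain `int` can answer the same two questions without NumPy, provided you fix a convention for which end counts as "first". The sketch below assumes, to match the array `[1 1 0]`, that the bits are written most significant first, and it has to track the width (3) separately, because an `int` does not remember how many bits it was meant to have:

```python
x = 0b110          # the array [1 1 0], read most significant bit first
width = 3          # an int has no fixed width, so we track it ourselves

# "What is the third bit?": index 2 from the left is position width-1-2 from the right
i = 2
third = (x >> (width - 1 - i)) & 1
print(third)

# "How many 1s are there?"
print(bin(x).count('1'))
```

The bookkeeping about width and bit order is exactly what the array handles for you.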
