## Python Sets

### What are they?
Sets are one of Python's built-in types. They have these characteristics:
1. **Unordered**
2. **Unique** - duplicate elements not allowed
3. Sets themselves are **mutable**, but set elements must be **immutable** (more precisely, hashable) - we can add or remove elements, but we can't change an element in place
4. **Unindexed** - can't access with `[i]` as with lists
5. **Iterable** - we can loop over set items


We can construct a set by putting comma-separated items inside curly braces (like dictionaries) or by using the `set()` constructor:

In [32]:
set1 = {"apple", "banana", "cherry", "apple"}
# or
set2 = set(["apple", "banana", "cherry", "apple"])
print(set1) 
print(set2)
set1 == set2

{'cherry', 'apple', 'banana'}
{'cherry', 'apple', 'banana'}


True

Notice in our results that:
- duplicates were removed (sets are **unique**)
- Items were re-ordered (sets are **unordered**)
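One construction detail is easy to trip over, so here is a small sketch of it. The variable names are only illustrative: empty curly braces create a dictionary, not a set, and `set()` accepts any iterable, including a string.

```python
# Curly braces with no items create a dict, so use set() for an empty set.
empty_braces = {}
empty_set = set()
print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# set() accepts any iterable; a string is split into its unique characters.
letters = set("banana")
print(sorted(letters))     # ['a', 'b', 'n']
```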

What about different data types in sets?

In [7]:
set3 = {"apple",1,"banana",4,"cherry"}
print(set3)
set4 = {"apple",1,"banana",4,"cherry",(1,2,3,4)}
print(set4)
set5 = {"apple",1,"banana",4,"cherry",[1,2,3,4]}
print(set5)

{1, 'apple', 4, 'cherry', 'banana'}
{(1, 2, 3, 4), 1, 'apple', 4, 'cherry', 'banana'}


TypeError: unhashable type: 'list'

We can see above that:
- we can have different *allowed* data types mixed together in sets
    + *immutable types* - string, numbers, tuples
- we can't put *mutable* items in a set (set elements must be **immutable**, or more precisely hashable) - doing this raises the *unhashable type* error for set5, where we try to put a list in a set
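If we do need a collection inside a set, one option is to convert it to a hashable type first. The following is a minimal sketch with made-up variable names, using the built-in `frozenset` and `tuple` types:

```python
# A frozenset is an immutable, hashable set, so it can be a set element.
inner = frozenset([1, 2, 3, 4])
nested = {"apple", 1, inner}
print(inner in nested)  # True

# Converting a list to a tuple is another way to make it hashable.
values = [1, 2, 3, 4]
nested.add(tuple(values))
print(len(nested))  # 4

# A tuple that contains a list is still unhashable.
try:
    {(1, [2, 3])}
    failed = False
except TypeError as err:
    failed = True
    print("TypeError:", err)
```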

### Accessing `set` items

Let's try to access an item in our set by index:

In [3]:
set1[1]

TypeError: 'set' object is not subscriptable

Remember, sets are **unindexed** so we get an error trying to index.

But they are **iterable**:

In [9]:
for x in set1:
  print(x)

cherry
apple
banana
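When we do need positional access, a common workaround is to convert the set to a sorted list, or to take one arbitrary element with `next(iter(...))`. A short sketch, using an illustrative `fruits` set:

```python
fruits = {"apple", "banana", "cherry"}

# Sort into a list when a stable, indexable order is needed.
ordered = sorted(fruits)
print(ordered[0])  # apple

# Grab one arbitrary element without removing it from the set.
any_item = next(iter(fruits))
print(any_item in fruits)  # True
print(len(fruits))         # 3
```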


### Modifying Sets
We can modify sets by adding a single *immutable* item using `add()` or several immutable items using `update()`:

In [33]:
newset=set1
print(newset)
newset.add('pear')
print(newset)
newset.update([6],[7],[8],['melon'])
print(newset)

{'cherry', 'apple', 'banana'}
{'cherry', 'apple', 'pear', 'banana'}
{'melon', 'apple', 6, 7, 8, 'cherry', 'banana', 'pear'}
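Note that `newset = set1` in the cell above does not copy the set; both names refer to the same object, so changes through `newset` also show up in `set1`. The sketch below (with illustrative names) contrasts that with `copy()`, and also shows that `update()` iterates over whatever it is given:

```python
original = {"apple", "banana", "cherry"}

alias = original               # second name for the same set object
independent = original.copy()  # new set with the same items

alias.add("pear")
independent.add("kiwi")

print("pear" in original)  # True
print("kiwi" in original)  # False

# update() iterates over its argument, so a bare string is split into characters.
chars = set()
chars.update("melon")
print(sorted(chars))  # ['e', 'l', 'm', 'n', 'o']
```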


We can get rid of items using:
- `discard()` – removes a particular item or does nothing if that item is absent in the set
- `remove()` – removes a particular item or raises KeyError if that item is absent in the set
- `pop()` – removes and returns an arbitrary item or raises KeyError if the set is empty
- `clear()` – clears the set (removes all the items)

In [28]:
print(newset)
newset.remove('cherry')
print(newset)
newset.discard('cherry') # already removed but does nothing if item is absent, unlike remove
print(newset)
new = newset.pop()
print(new)

{'melon', 'apple', 6, 7, 8, 'cherry', 'banana', 'pear'}
{'melon', 'apple', 6, 7, 8, 'banana', 'pear'}
{'melon', 'apple', 6, 7, 8, 'banana', 'pear'}
melon
set()
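To see the four removal methods side by side, here is a small self-contained sketch on a fresh, illustrative `basket` set:

```python
basket = {"apple", "banana", "pear"}

basket.discard("cherry")     # absent item: silently ignored
try:
    basket.remove("cherry")  # absent item: raises KeyError
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)

taken = basket.pop()         # removes and returns an arbitrary item
print(len(basket))           # 2

basket.clear()               # removes everything that is left
print(basket)                # set()
```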


### Operating on sets
We can use a number of Python built-in functions and operators on sets:

In [38]:
print(len(newset))

8


In [40]:
'pear' in newset

True

In [41]:
'cherry' not in newset

False

In [45]:
numericset = {2,4,6}
print(min(numericset))
print(max(numericset))
print(sum(numericset))
print(sorted(numericset))

2
6
12
[2, 4, 6]


In [44]:
charset = {'me','you','them'}
print(min(charset))
print(max(charset))
print(sorted(charset))

me
you
['me', 'them', 'you']


#### Union
The real power of sets lies in mathematical set operations such as unions, intersections and differences. We can perform these with either set operators or set methods. Here we show the union with both the operator and the method:

In [46]:
a={'apple', 'pear', 'cherry'}
b={'peach', 'melon', 'pear'}
print(a | b)
print(b | a)
print(a.union(b))
print(b.union(a))

{'melon', 'apple', 'peach', 'cherry', 'pear'}
{'cherry', 'apple', 'peach', 'melon', 'pear'}
{'melon', 'apple', 'peach', 'cherry', 'pear'}
{'cherry', 'apple', 'peach', 'melon', 'pear'}
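The operator and method forms are not fully interchangeable. As a brief sketch: the methods accept any iterable (and several at once), while the operators require both operands to be sets.

```python
a = {"apple", "pear", "cherry"}

# Methods accept any iterable, and more than one at a time.
combined = a.union(["peach", "melon"], ("pear", "plum"))
print(sorted(combined))
# ['apple', 'cherry', 'melon', 'peach', 'pear', 'plum']

# Operators require both operands to be sets.
try:
    a | ["peach", "melon"]
    operator_failed = False
except TypeError as err:
    operator_failed = True
    print("TypeError:", err)
```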


#### Intersection
Returns a new set containing only the items present in both sets. Here we use numbers:

In [48]:
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7}
print(a & b)
print(b & a)
print(a.intersection(b))
print(b.intersection(a))

{4, 5}
{4, 5}
{4, 5}
{4, 5}


#### Difference
Returns a new set containing all the items from the first (left) set that are absent in the second (right) set - the order of the operands makes a difference!

In [49]:
print(a - b)
print(b - a)
print(a.difference(b))
print(b.difference(a))

{1, 2, 3}
{6, 7}
{1, 2, 3}
{6, 7}


#### Symmetric Difference
Returns the items present in either the first or the second set but not both - think of this as the difference between the set union and the set intersection.

In [50]:
print(a ^ b)
print(b ^ a)
print(a.symmetric_difference(b))
print(b.symmetric_difference(a))

{1, 2, 3, 6, 7}
{1, 2, 3, 6, 7}
{1, 2, 3, 6, 7}
{1, 2, 3, 6, 7}
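The "union minus intersection" description can be checked directly. This short sketch reuses the same `a` and `b` values as the cells above:

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7}

# Symmetric difference equals the union minus the intersection...
via_union = (a | b) - (a & b)
print(via_union == (a ^ b))      # True

# ...and also the union of the two one-sided differences.
via_differences = (a - b) | (b - a)
print(via_differences == (a ^ b))  # True
```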


### Application
There are a number of powerful and often underused applications of sets - I often use them to get the intersection of columns (Series) in different `pandas` DataFrames. I also use them to get list intersections.

In [51]:
df1 = {'A': [1, 2, 3, 4], 
         'B': ['cherry', 'apple', 'melon', 'peach'],
         'C': ['Bill', 'Susan', 'Tom', 'Laurie']}  
df2 = {'A': [1, 2, 3, 4, 5, 6 ], 
         'B': ['pear', 'cherry', 'peach', 'pear'], 
         'C':['Tom', 'Laurie', 'Jim', 'Ken']} 

In [52]:
# Union of A in df1 and df2:
print(set(df1['A']).union(set(df2['A'])))
#intersect_B
#symmetric_difference_C


{1, 2, 3, 4, 5, 6}


In [53]:
# Intersect of B in df1 and df2:
print(set(df1['B']).intersection(set(df2['B'])))

{'cherry', 'peach'}


In [54]:
# Symmetric difference of C in df1 and df2:
print(set(df1['C']) ^ (set(df2['C'])))

{'Bill', 'Susan', 'Jim', 'Ken'}
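For the list intersections mentioned above, one thing to keep in mind is that a plain set intersection loses the original list order. A minimal sketch of an order-preserving variant, using illustrative lists with the same values as column `B`:

```python
list1 = ["cherry", "apple", "melon", "peach"]
list2 = ["pear", "cherry", "peach", "pear"]

# Plain set intersection: fast, but the result has no defined order.
common = set(list1) & set(list2)

# Keep the order of list1 by filtering it against a set built from list2.
lookup = set(list2)
common_ordered = [item for item in list1 if item in lookup]
print(common_ordered)  # ['cherry', 'peach']
```

Building `lookup` once keeps each membership test fast, which matters when the lists are long.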


## Python Collections
The `collections` module was originally created with only one data structure, `deque`, which is a double-ended queue for fast, efficient `pop` and `append` operations on either end of a sequence.  Other useful types have been added since, such as `OrderedDict`, `defaultdict`, and several others.

We'll just glance at a couple of useful examples of these types in `collections` - I use `defaultdict`, `OrderedDict`, and `deque` for a number of steps in the Python code behind the [StreamCat](https://github.com/USEPA/StreamCat) codebase to work efficiently with [NHDPlus](https://www.epa.gov/waterdata/nhdplus-national-hydrography-dataset-plus) flow (from-to) tables.  These tables simply contain information on what flows to what, and you can use `collections` structures to quickly and efficiently mine a from-to table to generate watersheds for every stream reach.
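To give a feel for the from-to idea, here is a toy sketch. It is not the StreamCat implementation: the `flow_table` pairs, the `upstream` mapping and the `reaches_above` helper are all made up for illustration. It uses `defaultdict` and `deque`, both covered in the sections below, to collect every reach upstream of a given reach.

```python
from collections import defaultdict, deque

# Hypothetical (from_reach, to_reach) pairs; not real NHDPlus data.
flow_table = [(1, 3), (2, 3), (3, 5), (4, 5), (5, 6)]

# Map each reach to the reaches that flow directly into it.
upstream = defaultdict(list)
for from_reach, to_reach in flow_table:
    upstream[to_reach].append(from_reach)

def reaches_above(reach):
    """Return every reach that drains to `reach`, found breadth-first."""
    found = set()
    queue = deque(upstream.get(reach, []))
    while queue:
        current = queue.popleft()
        if current not in found:
            found.add(current)
            queue.extend(upstream.get(current, []))
    return found

print(sorted(reaches_above(5)))  # [1, 2, 3, 4]
print(sorted(reaches_above(3)))  # [1, 2]
```

Using `upstream.get(...)` rather than `upstream[...]` inside the helper avoids creating empty entries for headwater reaches.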


### OrderedDict
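Sometimes we need to maintain the order of dictionary keys based on the order in which they were created. Before Python 3.7, regular dictionaries did not guarantee this, so `OrderedDict` was added to `collections` to handle it. Regular dictionaries now preserve insertion order too, but `OrderedDict` still offers extras such as order-sensitive equality and `move_to_end()`: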

In [56]:
from collections import OrderedDict

results = OrderedDict()

results["first set"] = "0-5"
results["second set"] = "6-10"
results["third set"] = "11-15"
results["fourth set"] = "16+"

# Avoid naming a loop variable `range`, which would shadow the built-in.
for group, value_range in results.items():
    print(group, "->", value_range)


first set -> 0-5
second set -> 6-10
third set -> 11-15
fourth set -> 16+
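As a brief sketch of what `OrderedDict` offers beyond a regular dictionary, the example below (with illustrative keys) uses `move_to_end()` and shows that equality between two `OrderedDict` objects takes order into account:

```python
from collections import OrderedDict

scores = OrderedDict([("first", 1), ("second", 2), ("third", 3)])

# Move a key to the end (or to the front with last=False).
scores.move_to_end("first")
print(list(scores))  # ['second', 'third', 'first']

# Equality between two OrderedDicts is order-sensitive; plain dicts ignore order.
x_then_y = OrderedDict([("x", 1), ("y", 2)])
y_then_x = OrderedDict([("y", 2), ("x", 1)])
print(x_then_y == y_then_x)              # False
print(dict(x_then_y) == dict(y_then_x))  # True
```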


### `defaultdict` for handling missing keys
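Very handy for setting rules for what happens when you encounter missing keys in dictionaries: a `defaultdict` calls a factory function (here `set`) to create a default value the first time a missing key is accessed.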

In [64]:
from collections import defaultdict
d = defaultdict(set,results)
d

defaultdict(set,
            {'first set': '0-5',
             'second set': '6-10',
             'third set': '11-15',
             'fourth set': '16+'})

In [66]:
if 'last_set' in d:
    # Do something with the key...
    print(d['last_set'])
else:
    d['last_set'] = d['fourth set']
d

defaultdict(set,
            {'first set': '0-5',
             'second set': '6-10',
             'third set': '11-15',
             'fourth set': '16+',
             'last_set': '16+'})
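The cell above never actually triggers the default factory, so here is a small sketch of the more typical pattern: grouping items without first checking whether a key exists. The `pairs` data is made up for illustration.

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "leek"), ("fruit", "pear"), ("fruit", "apple")]

groups = defaultdict(set)
for category, item in pairs:
    groups[category].add(item)  # a missing key starts out as an empty set

print(sorted(groups["fruit"]))  # ['apple', 'pear']

# Reading a missing key with [] creates it; .get() and `in` do not.
print(groups["nuts"])           # set()
print("nuts" in groups)         # True
print(groups.get("seeds"))      # None
print("seeds" in groups)        # False
```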

### `deque`
A deque is a double-ended queue in which elements can be both inserted and deleted from either the left or the right end of the queue. Operations on a deque include:
- `append(item)`: Add an item to the right end.
- `appendleft(item)`: Add an item to the left end.
- `insert(index, value)`: Add an element with the specified value at the given index.
- `extend(iterable)`: Add multiple values to the right end. It takes a list (or any other iterable) of values as an argument.
- `extendleft(iterable)`: Similar to `extend()`, but the values are added to the left end one at a time, so they end up in reverse order.
- `pop()`: Remove and return an element from the right end.
- `popleft()`: Remove and return an element from the left end.
- `remove(value)`: Remove the first occurrence of the given value.
- `count(value)`: Return the total number of occurrences of the given value.
- `index(e, start, end)`: Return the index of the first occurrence of the element, searching from index `start` up to (but not including) index `end`.
- `rotate(n)`: Rotate the deque `n` steps. A positive value rotates it to the right, while a negative value rotates it to the left.
- `reverse()`: Reverse the order of the deque.

The examples below come from [here](https://www.educative.io/answers/how-to-use-a-deque-in-python), and a good place to read more is [here](https://realpython.com/python-collections-module/#building-efficient-queues-and-stacks-deque).

In [67]:
# Import collections module:
import collections

# Initialize deque:
dq = collections.deque([4, 5, 6])

print(dq)

# Append to the right:
dq.append(7)
print("Append 7 to the right: ", list(dq))

# Append to the left:
dq.appendleft(3)
print("Append 3 to the left: ", list(dq))

# Append multiple values to right:
dq.extend([8, 9, 10])
print("Append 8, 9 and 10 to the right: ", list(dq))

# Append multiple values to left:
dq.extendleft([1, 2])
print("Append 2 and 1 to the left: ", list(dq))

# Insert -1 at index 5
dq.insert(5, -1)
print("Insert -1 at index 5: ", list(dq))

# Pop element from the right end:
dq.pop()
print("Remove element from the right: ", list(dq))

# Pop element from the left end:
dq.popleft()
print("Remove element from the left: ", list(dq))

# Remove -1:
dq.remove(-1)
print("Remove -1: ", list(dq))

# Count the number of times 5 occurs:
i = dq.count(5)
print("Count the number of times 5 occurs: ", i)

# Return index of '7' if found between index 4 and 6:
i = dq.index(7, 4, 6)
print("Search index of number 7 between index 4 and 6: ", i)

# Rotate the deque three times to the right:
dq.rotate(3)
print("Rotate the deque 3 times to the right: ", list(dq))

# Reverse the whole deque:
dq.reverse()
print("Reverse the deque: ", list(dq))

deque([4, 5, 6])
Append 7 to the right:  [4, 5, 6, 7]
Append 3 to the left:  [3, 4, 5, 6, 7]
Append 8, 9 and 10 to the right:  [3, 4, 5, 6, 7, 8, 9, 10]
Append 2 and 1 to the left:  [2, 1, 3, 4, 5, 6, 7, 8, 9, 10]
Insert -1 at index 5:  [2, 1, 3, 4, 5, -1, 6, 7, 8, 9, 10]
Remove element from the right:  [2, 1, 3, 4, 5, -1, 6, 7, 8, 9]
Remove element from the left:  [1, 3, 4, 5, -1, 6, 7, 8, 9]
Remove -1:  [1, 3, 4, 5, 6, 7, 8, 9]
Count the number of times 5 occurs:  1
Search index of number 7 between index 4 and 6:  5
Rotate the deque 3 times to the right:  [7, 8, 9, 1, 3, 4, 5, 6]
Reverse the deque:  [6, 5, 4, 3, 1, 9, 8, 7]
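One more `deque` feature that the walkthrough above does not cover is the optional `maxlen` argument, which turns a deque into a fixed-size window. A minimal sketch with made-up readings:

```python
from collections import deque

# Keep only the three most recent readings.
recent = deque(maxlen=3)
for reading in [10, 20, 30, 40, 50]:
    recent.append(reading)  # once full, the oldest item drops off the left
print(list(recent))  # [30, 40, 50]

# Appending on the left of a full deque drops an item from the right instead.
recent.appendleft(0)
print(list(recent))  # [0, 30, 40]
```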
