## SET OPERATIONS

Python provides several useful **operations** that can be performed between sets:
* **union** - Returns all unique values found in either set
```python
set1.union(set2)
```
* **intersection** - Returns values present in both sets 
```python
set1.intersection(set2)
```
* **difference** - Returns values present in set 1, but not set 2
```python
set1.difference(set2)
```
* **symmetric difference** - Returns values present in exactly one of the two sets (the union minus the intersection)
```python
set1.symmetric_difference(set2)
```
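
Each of the four methods above also has an operator form. The sketch below uses two small made-up sets (`set1` and `set2` are illustrative values, not data from this notebook) to show the pairs side by side.

```python
# Hypothetical example sets, chosen only to illustrate the operators
set1 = {1, 2, 3}
set2 = {3, 4}

union_result = set1 | set2          # same as set1.union(set2)
intersection_result = set1 & set2   # same as set1.intersection(set2)
difference_result = set1 - set2     # same as set1.difference(set2)
symmetric_result = set1 ^ set2      # same as set1.symmetric_difference(set2)

# Each operator gives the same result as its method counterpart
assert union_result == set1.union(set2)
assert intersection_result == set1.intersection(set2)
assert difference_result == set1.difference(set2)
assert symmetric_result == set1.symmetric_difference(set2)
```

One difference worth knowing: the operators require both operands to be sets, while the methods accept any iterable (a list, for example) as their argument.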

**PRO TIP:** Chain set operations to capture the relationship between three or more sets, for example:
```python
set1.union(set2).union(set3)
```
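
As an alternative to chaining, `union` and `intersection` also accept several arguments in one call. A minimal sketch, using three made-up sets of letters (illustrative values only):

```python
# Hypothetical example sets
set1 = {'a', 'b'}
set2 = {'b', 'c'}
set3 = {'c', 'd'}

chained = set1.union(set2).union(set3)   # chained calls, as in the tip above
single_call = set1.union(set2, set3)     # one call with two arguments
with_operators = set1 | set2 | set3      # operator form

# All three spellings build the same set
assert chained == single_call == with_operators

# intersection accepts several arguments too;
# here no letter appears in all three sets, so the result is empty
common = set1.intersection(set2, set3)
```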

In [1]:
friday_items = {'snowboard', 'snowboard', 'skis', 'snowboard', 'sled'}

satuarday_items = {'goggles', 'helmet', 'snowboard', 'skis', 'goggles'}

sunday_items = {'coffee'}

In [2]:
# UNION 

friday_items.union(satuarday_items)

{'goggles', 'helmet', 'skis', 'sled', 'snowboard'}

In [3]:
friday_items.union(satuarday_items).union(sunday_items)

{'coffee', 'goggles', 'helmet', 'skis', 'sled', 'snowboard'}

In [4]:
# INTERSECTION

friday_items.intersection(satuarday_items)

{'skis', 'snowboard'}

In [5]:
# since no value is present in all three sets, an empty set is returned

friday_items.intersection(satuarday_items).intersection(sunday_items)

set()

In [6]:
# DIFFERENCE
# 'sled' is the only value in friday_items that is NOT in satuarday_items

friday_items.difference(satuarday_items)

{'sled'}

In [7]:
# If you reverse the order, the output changes -- 
# 'goggles' and 'helmet' are in satuarday_items but NOT in friday_items
# Note that the `-` operator can be used instead of the difference method

satuarday_items - friday_items

{'goggles', 'helmet'}

In [8]:
# SYMMETRIC DIFFERENCE
# 'sled' is only in set 1, and 'goggles' and 'helmet' are only in set 2

friday_items.symmetric_difference(satuarday_items)

{'goggles', 'helmet', 'sled'}

### SET USE CASES

1. Sets are more efficient than lists for performing **membership tests**

In [9]:
time_list = list(range(1000000))
time_set = set(range(1000000))

In [14]:
%%time
100000 in time_list

CPU times: user 712 µs, sys: 0 ns, total: 712 µs
Wall time: 716 µs


True

In [15]:
%%time
100000 in time_set

CPU times: user 2 µs, sys: 6 µs, total: 8 µs
Wall time: 7.87 µs


True

Sets are implemented as **hash tables**, so a membership test takes roughly constant time on average, regardless of size; the trade-off is that
sets do not preserve order (lists are dynamic arrays that preserve order, but a membership test has to scan the elements one by one)
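
A single `%%time` run can be noisy at the microsecond scale. One way to get a steadier comparison outside of a notebook is the standard library's `timeit` module, sketched below. The collection size, the target value, and the repeat count are arbitrary choices for illustration, and the timings will vary by machine.

```python
import timeit

# Smaller than the notebook's one million items, to keep the run short
values_list = list(range(100_000))
values_set = set(values_list)

# The last element is the worst case for a list scan
target = 99_999

# Time 100 membership tests against each collection
list_seconds = timeit.timeit(lambda: target in values_list, number=100)
set_seconds = timeit.timeit(lambda: target in values_set, number=100)

print(f"list: {list_seconds:.6f} s for 100 lookups")
print(f"set:  {set_seconds:.6f} s for 100 lookups")
```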

2. Sets can **gather unique values** efficiently without an explicit loop

In [16]:
shipment_today = ['ski', 'snowboard', 'ski', 'ski', 'helmet', 'hat', 'goggles']
shipment_yesterday = ['hat', 'goggles', 'snowboard', 'hat', 'bindings']

In [17]:
%%time
unique_items = []

for item in shipment_today:
    if item not in unique_items:
        unique_items.append(item)

unique_items

CPU times: user 12 µs, sys: 0 ns, total: 12 µs
Wall time: 14.1 µs


['ski', 'snowboard', 'helmet', 'hat', 'goggles']

In [18]:
%%time
list(set(shipment_today))

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 3.81 µs


['ski', 'hat', 'snowboard', 'goggles', 'helmet']
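
Note that the set-based result comes back in an arbitrary order, while the loop kept the order of first appearance. If order matters, two common options are sketched below: `dict.fromkeys`, which relies on dictionaries preserving insertion order (Python 3.7+), and `sorted`, which gives alphabetical order instead.

```python
shipment_today = ['ski', 'snowboard', 'ski', 'ski', 'helmet', 'hat', 'goggles']

# Keep the first occurrence of each item, in the original order
unique_in_order = list(dict.fromkeys(shipment_today))
# ['ski', 'snowboard', 'helmet', 'hat', 'goggles']

# Or deduplicate with a set and sort for a reproducible result
unique_sorted = sorted(set(shipment_today))
# ['goggles', 'hat', 'helmet', 'ski', 'snowboard']
```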

3. Set operations can find the **data shared, or not shared, between collections** without an explicit loop

In [20]:
shipment_today = ['ski', 'snowboard', 'ski', 'ski', 'helmet', 'hat', 'goggles']
shipment_yesterday = ['hat', 'goggles', 'snowboard', 'hat', 'bindings']

unique_today = []
for item_t in shipment_today:
    if item_t not in shipment_yesterday:
        if item_t not in unique_today:
            unique_today.append(item_t)

unique_today

['ski', 'helmet']

In [21]:
set(shipment_today).difference(set(shipment_yesterday))

{'helmet', 'ski'}
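
Because the set methods accept any iterable, only the left-hand list needs converting. The sketch below reuses the two shipment lists to compare them in both directions and to find what they share.

```python
shipment_today = ['ski', 'snowboard', 'ski', 'ski', 'helmet', 'hat', 'goggles']
shipment_yesterday = ['hat', 'goggles', 'snowboard', 'hat', 'bindings']

today_set = set(shipment_today)

# The method argument can stay a list; no second set() call is needed
only_today = today_set.difference(shipment_yesterday)       # {'ski', 'helmet'}
in_both = today_set.intersection(shipment_yesterday)        # {'snowboard', 'hat', 'goggles'}

# Reverse direction: items shipped yesterday but not today
only_yesterday = set(shipment_yesterday).difference(shipment_today)  # {'bindings'}
```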

### Practice

In [22]:
transaction1 = ['snowboard', 'helmet', 'boots', 'hat', 'sweater', 'sweater']
transaction2 = ['helmet', 'boots', 'skis', 'keychain', 'coffee', 'hat']
transaction3 = ['snowboard', 'helmet', 'boots', 'ski poles']

In [23]:
transaction1_set = set(transaction1)
transaction2_set = set(transaction2)
transaction3_set = set(transaction3)

In [24]:
# transaction1 and transaction3 were non-sale days, so group them together

non_sale_set = transaction1_set.union(transaction3_set)

non_sale_set

{'boots', 'hat', 'helmet', 'ski poles', 'snowboard', 'sweater'}

In [25]:
# what the non-sale days had in common with the sale day:
# items bought on the sale day and on at least one non-sale day

non_sale_set.intersection(transaction2_set)

{'boots', 'hat', 'helmet'}

In [26]:
transaction2_set.intersection(non_sale_set)

{'boots', 'hat', 'helmet'}
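
The intersection above compares the sale day against the combined non-sale set, so an item such as `'hat'` qualifies even though it is missing from `transaction3`. To find only the items that appear in all three transactions, one option is to intersect the three sets directly, as in this sketch (`in_all_three` is a name introduced here for illustration):

```python
transaction1 = ['snowboard', 'helmet', 'boots', 'hat', 'sweater', 'sweater']
transaction2 = ['helmet', 'boots', 'skis', 'keychain', 'coffee', 'hat']
transaction3 = ['snowboard', 'helmet', 'boots', 'ski poles']

# intersection accepts several iterables in one call
in_all_three = set(transaction1).intersection(transaction2, transaction3)
# {'boots', 'helmet'}

# 'hat' is in transaction1 and transaction2 only, so it drops out here
assert 'hat' not in in_all_three
```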

In [28]:
# what wasn't shared?
# because I'm mainly interested in the sale day, take the difference between transaction2 and non_sale_set
# output: 3 unique items that were purchased in transaction2 but not on the non-sale days

transaction2_set.difference(non_sale_set)

{'coffee', 'keychain', 'skis'}

In [29]:
# I remember we had a special anniversary keychain on sale and a discount on coffee,
# and I guess while that person was in the shop they decided to buy another pair of skis, so this was a pretty effective promotion
# we should really look into doing more coffee and keychain promotions in the future

transaction2_set - non_sale_set

{'coffee', 'keychain', 'skis'}

In [31]:
non_sale_set.symmetric_difference(transaction2_set)

{'coffee', 'keychain', 'ski poles', 'skis', 'snowboard', 'sweater'}
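
As a quick check on what the symmetric difference means, it should equal the union of the two sets minus their intersection. The sketch below rebuilds the two sets from the values shown earlier and confirms the identity.

```python
non_sale_set = {'boots', 'hat', 'helmet', 'ski poles', 'snowboard', 'sweater'}
transaction2_set = {'helmet', 'boots', 'skis', 'keychain', 'coffee', 'hat'}

symmetric = non_sale_set.symmetric_difference(transaction2_set)

# Everything in either set, minus what the two sets share
union_minus_intersection = (non_sale_set | transaction2_set) - (non_sale_set & transaction2_set)

assert symmetric == union_minus_intersection
```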