## SETS

A **set** is a collection of unique values
* Sets are **unordered**, which means their values cannot be accessed via index or key
* Sets are mutable (values can be added/removed), but set _values_ must be **unique** and **immutable** (more precisely, hashable)
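
The two properties in the last bullet can be sketched as follows. This is a minimal illustration, not part of the original notebook; the name `gear` is hypothetical.

```python
# Illustrative set; `gear` is a hypothetical name, not from the notebook
gear = {'skis', 'sled'}

gear.add('snowboard')      # sets are mutable: values can be added
gear.add('skis')           # adding an existing value changes nothing
gear.remove('sled')        # values can also be removed

print(sorted(gear))        # sorted() gives a predictable display order

# Set values must be hashable (immutable), so a list is rejected
try:
    gear.add(['helmet', 'boots'])
    list_rejected = False
except TypeError:
    list_rejected = True

# A tuple of strings is immutable, so it is accepted as a single value
gear.add(('helmet', 'boots'))
```

If several values need to be stored together as one set member, a tuple is the usual substitute for a list.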

In [1]:
# The absence of colons differentiates sets from dictionaries
# Note that duplicate values are automatically removed when the set is created


my_set = {'snowboard', 'snowboard', 'skis', 'snowboard', 'sled'}

my_set

{'skis', 'sled', 'snowboard'}

In [2]:
# Sets can also be created via conversion using set()
# Note that duplicate values are automatically removed during the conversion

my_set = set(['snowboard', 'snowboard', 'skis', 'snowboard', 'sled'])

my_set

{'skis', 'sled', 'snowboard'}
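
One case where `set()` is the only option is the empty set, because empty braces create a dictionary. The short sketch below shows the difference; the variable names are illustrative only.

```python
empty_braces = {}      # this is an empty dictionary, not a set
empty_set = set()      # this is an empty set

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# set() accepts any iterable; a string yields its unique characters
letters = set('sled')
```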

### Working with SETS

You can conduct **membership tests** on sets

In [3]:
my_set

{'skis', 'sled', 'snowboard'}

In [4]:
'snowboard' in my_set

True

You can **loop through** them (the iteration order is arbitrary)

In [5]:
for value in my_set:
    print(value)

skis
snowboard
sled


But you **can't index** them (_they are unordered_)

In [6]:
my_set[0]

TypeError: 'set' object is not subscriptable
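
If positional access is needed, one possible workaround is to convert the set to a list first. The sketch below assumes alphabetical order is acceptable and uses `sorted()`, which returns a new list; `ordered_items` and `first_item` are illustrative names.

```python
my_set = {'skis', 'sled', 'snowboard'}

# sorted() returns a new list in alphabetical order; the set is unchanged
ordered_items = sorted(my_set)

first_item = ordered_items[0]   # lists support indexing
print(first_item)               # skis
```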

### Practice

Sets are very helpful for returning unique values in sequences of data.

In [7]:
# three transactions made by the same customer

transaction1 = ['snowboard', 'helmet', 'boots', 'hat', 'sweater', 'sweater']
transaction2 = ['helmet', 'boots', 'skis', 'keychain', 'coffee', 'hat']
transaction3 = ['snowboard', 'helmet', 'boots', 'ski poles']

In [8]:
# how many sweaters were purchased in transaction1?

transaction1.count('sweater')

2

In [9]:
# the unique items purchased in transaction1

set(transaction1)

{'boots', 'hat', 'helmet', 'snowboard', 'sweater'}

In [10]:
# with a larger dataset, wrap the set in len() to count the unique items purchased in transaction1

len(set(transaction1))

5

In [11]:
'boots' in set(transaction1)

True

In [12]:
# to aggregate data, concatenate the lists and convert the result to a set
# this returns all the unique items across the three transactions

set(transaction1 + transaction2 + transaction3)

{'boots',
 'coffee',
 'hat',
 'helmet',
 'keychain',
 'ski poles',
 'skis',
 'snowboard',
 'sweater'}

In [13]:
# use the union() method

set(transaction1).union(transaction2).union(transaction3)

{'boots',
 'coffee',
 'hat',
 'helmet',
 'keychain',
 'ski poles',
 'skis',
 'snowboard',
 'sweater'}
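
The chained `union()` calls above can be shortened. As a sketch of two equivalent spellings: `union()` accepts several iterables in one call, and the `|` operator combines sets directly (both operands must already be sets). The names `all_items` and `all_items_operator` are illustrative.

```python
transaction1 = ['snowboard', 'helmet', 'boots', 'hat', 'sweater', 'sweater']
transaction2 = ['helmet', 'boots', 'skis', 'keychain', 'coffee', 'hat']
transaction3 = ['snowboard', 'helmet', 'boots', 'ski poles']

# union() accepts several iterables at once, so one call is enough
all_items = set(transaction1).union(transaction2, transaction3)

# The | operator gives the same result, but each operand must be a set
all_items_operator = set(transaction1) | set(transaction2) | set(transaction3)

print(all_items == all_items_operator)  # True
print(len(all_items))                   # 9
```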

In [14]:
# use the len() function to count the unique items

len(set(transaction1 + transaction2 + transaction3))

9

In [15]:
len(set(transaction1).union(transaction2).union(transaction3))

9

This customer is making a lot of repeat purchases: the three transactions contain 16 items but only 9 unique ones.
This would be a good case study for further analysis by Alfie and our Marketing team.
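
As a possible starting point for that analysis, the repeat purchases can be identified with set intersection and a counter. This is a sketch rather than the notebook's own method; `in_all_three`, `appearances`, and `repeat_items` are hypothetical names.

```python
from collections import Counter

transaction1 = ['snowboard', 'helmet', 'boots', 'hat', 'sweater', 'sweater']
transaction2 = ['helmet', 'boots', 'skis', 'keychain', 'coffee', 'hat']
transaction3 = ['snowboard', 'helmet', 'boots', 'ski poles']

# Items that appear in every transaction (set intersection)
in_all_three = set(transaction1) & set(transaction2) & set(transaction3)
print(sorted(in_all_three))  # ['boots', 'helmet']

# Count how many transactions each item appears in.
# Converting each list to a set first means a duplicate within
# one transaction (such as 'sweater') is only counted once.
transactions = [transaction1, transaction2, transaction3]
appearances = Counter(item for t in transactions for item in set(t))

# Items bought in more than one transaction
repeat_items = sorted(item for item, n in appearances.items() if n > 1)
print(repeat_items)  # ['boots', 'hat', 'helmet', 'snowboard']
```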