# Set

* An unordered collection of unique elements. Think of it as a dict that has only keys and no values.
* Basic uses:
    - Removing duplicate values
    - Membership testing
    - Set operations such as union, intersection, difference, and symmetric difference (all elements in exactly one of the two sets)
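
The three uses listed above can be sketched in a few lines. The data here (`visits`, `other_session`) is hypothetical and is not part of the notebook; it only illustrates the idea.

```python
# Hypothetical sample data, not taken from the notebook.
visits = ['home', 'cart', 'home', 'checkout', 'cart']

# Removing duplicates: a set keeps one copy of each value.
# Sets are unordered, so sort when a stable order is needed.
unique_pages = sorted(set(visits))

# Membership test.
has_checkout = 'checkout' in set(visits)

# Set operation: pages that appear in exactly one of the two sessions.
other_session = {'home', 'search'}
only_in_one = set(visits) ^ other_session

print(unique_pages)
print(has_checkout)
print(sorted(only_in_one))
```

Sorting is used only to make the printed order predictable; the set itself has no defined order.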

In [1]:
xyz = {'apple', 'banana', 'chiku', 'mango', 'apple', 'orange'}

In [2]:
xyz

{'apple', 'banana', 'chiku', 'mango', 'orange'}

In [3]:
'mango' in xyz

True

In [4]:
'papaya' in xyz

False

In [5]:
set([1,2,4,6,1,4,5])

{1, 2, 4, 5, 6}

In [6]:
a = set('abracadabra')

In [7]:
a

{'a', 'b', 'c', 'd', 'r'}

In [8]:
b = set('alacazam')

In [9]:
b

{'a', 'c', 'l', 'm', 'z'}

#### letters in a but not in b

In [10]:
a - b

{'b', 'd', 'r'}

In [11]:
a.difference(b)

{'b', 'd', 'r'}

In [12]:
a.difference_update(b)

In [13]:
a

{'b', 'd', 'r'}

In [14]:
a -= b

In [15]:
a

{'b', 'd', 'r'}

#### letters in a or b or both 

In [16]:
a = set('abracadabra')
b = set('alacazam')

In [17]:
a | b

{'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}

In [18]:
a.union(b) # same as above

{'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}

In [19]:
a.update(b)

In [20]:
a

{'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}

In [21]:
a |= b

In [22]:
a

{'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}

#### letters in both a and b

In [23]:
a = set('abracadabra')
b = set('alacazam')

In [24]:
a & b

{'a', 'c'}

In [25]:
a.intersection(b) # same as above

{'a', 'c'}

In [26]:
a.intersection_update(b)

In [27]:
a

{'a', 'c'}

In [28]:
a &= b

In [29]:
a

{'a', 'c'}

#### letters in a or b but not in both

In [30]:
a = set('abracadabra')
b = set('alacazam')

In [31]:
a ^ b

{'b', 'd', 'l', 'm', 'r', 'z'}

In [32]:
a.symmetric_difference(b)

{'b', 'd', 'l', 'm', 'r', 'z'}

In [33]:
a.symmetric_difference_update(b)

In [34]:
a

{'b', 'd', 'l', 'm', 'r', 'z'}

In [35]:
a ^= b

In [36]:
a

{'a', 'b', 'c', 'd', 'r'}

----------------

In [37]:
a = set('abracadabra')
a

{'a', 'b', 'c', 'd', 'r'}

In [38]:
a.add('x') # add element 'x' to a

In [39]:
a

{'a', 'b', 'c', 'd', 'r', 'x'}

In [40]:
a.remove('x') # remove element 'x' from the set; raises KeyError if 'x' is not present

### `discard`
* Safely removes an element from a set by value. No error is raised if the value is not found.
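
A minimal sketch contrasting `discard` with `remove`. The variable names `letters` and `missing` are illustrative only.

```python
letters = set('abracadabra')   # {'a', 'b', 'c', 'd', 'r'}

letters.discard('x')           # 'x' is absent: nothing happens, no error
letters.discard('a')           # 'a' is present: it is removed

missing = None
try:
    letters.remove('x')        # remove() raises KeyError for a missing element
except KeyError as exc:
    missing = exc.args[0]      # the element that could not be found

print(sorted(letters))
print(missing)
```

Prefer `discard` when the absence of the element is not an error for your program, and `remove` when a missing element should be reported.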

In [41]:
a

{'a', 'b', 'c', 'd', 'r'}

In [42]:
a.pop() # Remove and return an arbitrary element from set a; raises KeyError if the set is empty

'a'

In [43]:
a

{'b', 'c', 'd', 'r'}

In [44]:
a.clear() # reset set to empty

In [45]:
a

set()

---------------

In [46]:
a = set('ala')
b = set('alacazam')

In [47]:
a.issubset(b) # True if the elements of a are all contained in b

True

In [48]:
a.issuperset(b) # True if the elements of b are all contained in a

False

In [49]:
a.isdisjoint(b)  # True if a and b have no element in common

False
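
As with union and intersection, these tests also have operator forms. The sketch below reuses the same `a` and `b`; the result names (`subset`, `superset`, `proper_subset`, `disjoint`) are illustrative.

```python
a = set('ala')        # {'a', 'l'}
b = set('alacazam')   # {'a', 'c', 'l', 'm', 'z'}

subset = a <= b          # same as a.issubset(b)
superset = a >= b        # same as a.issuperset(b)
proper_subset = a < b    # subset and a != b
disjoint = not (a & b)   # same result as a.isdisjoint(b)

print(subset, superset, proper_subset, disjoint)
```

One difference worth knowing: the methods accept any iterable as their argument, while the operators require both operands to be sets.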

In [50]:
c = a.copy() # Copy a into c

In [51]:
c

{'a', 'l'}

* Like dictionary keys, set elements must be hashable (in practice, immutable). To store list-like data in a set, convert it to a tuple first.

In [52]:
my_data = [1,2,3,4,5]

In [53]:
my_set = {tuple(my_data)}

In [54]:
my_set

{(1, 2, 3, 4, 5)}
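
A short sketch of what happens without the conversion, and of `frozenset` as the immutable counterpart when the elements themselves need to be sets. The names `error_message` and `nested` are illustrative; the exact wording of the error message can vary between Python versions.

```python
my_data = [1, 2, 3, 4, 5]

error_message = None
try:
    bad_set = {my_data}            # a list is unhashable, so this fails
except TypeError as exc:
    error_message = str(exc)

# frozenset is hashable, so it can be an element of another set.
# frozenset({1, 2}) and frozenset({2, 1}) are equal, so only one is kept.
nested = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}

print(error_message)
print(len(nested))
```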

-----------

### Set Comprehension

In [55]:
a = {x for x in 'abracadabra' if x not in 'abc'}

In [56]:
a

{'d', 'r'}

In [57]:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

In [58]:
unique_length = {len(x) for x in strings}

In [59]:
unique_length

{1, 2, 3, 4, 6}

In [60]:
set(map(len,strings))

{1, 2, 3, 4, 6}

In [2]:
import pandas as pd 
df = pd.read_csv('worldcities.csv')

In [3]:
df.head()

| Unnamed: 0 | City | Country |
|---|---|---|
| 0 | Rovaniemi | Finland |
| 1 | Steinkjer | Norway |
| 2 | Monterey | United States of America |
| 3 | Kuta | Indonesia |
| 4 | Lovec | Bulgaria |

In [5]:
df.duplicated().sum()

72
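
The same kind of count can be sketched with a plain set, assuming each record is held as a tuple. The `rows` list below is hypothetical sample data, not the contents of `worldcities.csv`. With pandas' default `keep='first'`, `duplicated()` flags every repeat after the first occurrence, so the total equals the number of rows minus the number of distinct rows.

```python
# Hypothetical rows standing in for (City, Country) records.
rows = [
    ('Rovaniemi', 'Finland'),
    ('Kuta', 'Indonesia'),
    ('Rovaniemi', 'Finland'),
    ('Lovec', 'Bulgaria'),
    ('Kuta', 'Indonesia'),
    ('Rovaniemi', 'Finland'),
]

# Every row beyond the first copy of each distinct record is a duplicate.
duplicate_count = len(rows) - len(set(rows))

print(duplicate_count)
```

Tuples are used because set elements must be hashable; a list of lists would need converting first.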