# Sets and Dictionaries

## Set

A set is an unordered collection of distinct items. The keywords here are 'unordered', 'collection' and 'distinct'. A set contains zero or more values and cannot hold duplicate copies of any item.
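
As a small illustration of 'distinct' and 'unordered', the sketch below uses made-up variable names (`items`, `unique_items`, `ordered`) that are not part of the original notebook:

```python
# Duplicates collapse: a set keeps one copy of each value.
items = [3, 1, 3, 2, 1]
unique_items = set(items)

# Order is not part of a set's identity, so these two literals compare equal.
same_set = {1, 2, 3} == {3, 2, 1}

# Sets cannot be indexed; sort them into a list when a stable order is needed.
ordered = sorted(unique_items)

print(unique_items == {1, 2, 3}, same_set, ordered)
```

Sorting into a list is only one way to get a repeatable order; the set itself makes no promise about iteration order.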

### Set creation

**set** takes at most one argument. The argument should be a single *iterable*, i.e. a container that can be looped over (list, string, tuple, another set).

In [204]:
s1 = set([1,2,3,4,5])
s2 = set({1,2,3,4,5})
s3 = ({1,2,3,4,5})
s4 = {1,2,3,4,5}

#string
s5 = {'a','e', 'i', 'o', 'u'}
s6 = set(['a','e', 'i', 'o', 'u'])
s7 = set("aeiou")
s8 = set(["aeiou"]) #one element: the whole string 'aeiou'

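'''
Both of these raise TypeError:
error_sets = {["aeiou"]}                  # a list is unhashable, so it cannot be a set element
error_sets = set('a','e', 'i', 'o', 'u')  # set() accepts at most one iterable argument

Notice the difference between using set() and {} !!!
'''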

#empty set
s9 = {}      #careful: {} creates an empty dict, not a set
s10 = set()  #the only way to create an empty set


sets = [s1, s2, s3, s4, s5, s6, s7, s8, s9, s10]
print(sets)

[{1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}, {'a', 'o', 'u', 'e', 'i'}, {'a', 'o', 'u', 'e', 'i'}, {'a', 'o', 'u', 'e', 'i'}, {'aeiou'}, {}, set()]
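
The last two entries of the output differ because `{}` is a dict. A minimal sketch of that gotcha and of the two failing constructions mentioned in the cell; the variable names are illustrative assumptions:

```python
empty_braces = {}      # an empty dict, not a set
empty_set = set()      # the way to spell an empty set

print(type(empty_braces).__name__, type(empty_set).__name__)

# set() accepts a single iterable; several positional arguments fail.
try:
    set('a', 'e')
except TypeError as exc:
    too_many_args = str(exc)

# A set literal needs hashable elements; a list is not hashable.
try:
    {["aeiou"]}
except TypeError as exc:
    unhashable = str(exc)

print(too_many_args)
print(unhashable)
```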


In [205]:
pangram = "The quick brown fox jumps over the lazy dog"

print("{}".format(set(pangram)))

alphabets = {x.lower() for x in pangram}
print("{}".format(alphabets))

{'a', 'T', 'm', 'g', 'n', 'r', 's', 'l', 'd', 'p', 't', 'q', 'i', 'v', 'b', 'y', 'k', 'u', 'e', 'j', 'c', 'w', 'z', ' ', 'o', 'x', 'h', 'f'}
{'a', 'm', 'g', 'n', 'r', 's', 'l', 'd', 'p', 't', 'q', 'i', 'b', 'v', 'y', 'k', 'u', 'e', 'j', 'c', 'w', 'z', ' ', 'o', 'x', 'h', 'f'}


### Modify elements

In [154]:
real = set(range(10))
evens = {x for x in real if x % 2 == 0}
odds = {x for x in real if x & 1 == 1}


odds.clear() #empties the set in place
odds = {x for x in real if x & 1 == 1} #rebuild it


odds.add(9) #adds 1 element (no effect here, 9 is already present)
odds.update({8,9}) #adds every element of an iterable
odds.remove(9) #removes 9; raises KeyError if the element is absent

print(odds)

{1, 3, 5, 7, 8}
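
The difference between `remove()` and `discard()` only shows up when the element is missing. A short sketch, using a fresh `odds` set so that it does not depend on the cell above:

```python
odds = {1, 3, 5, 7, 9}

odds.discard(11)          # absent element: silently ignored
try:
    odds.remove(11)       # absent element: raises KeyError
    removed_ok = True
except KeyError:
    removed_ok = False

odds.add(9)               # already present: the set is unchanged
odds.update([11, 13])     # update() accepts any iterable, not only a set

print(removed_ok, sorted(odds))
```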


In [182]:
import re

def set_methods(set1, set2):
    # Keep only the public methods, i.e. names without a double underscore.
    avail_methods = {method for method in dir(set1) if not re.search(r'__', method)}
    for method in sorted(avail_methods):
        # Only these methods are called; each takes the other set as its argument.
        if method in ['difference', 'discard', 'issuperset']:
            print("{} ==> {}".format(method, getattr(set1, method)(set2)))

# some methods could not be applied to both the sets
set_methods(real, odds)

difference ==> {0, 9, 2, 4, 6}
discard ==> None
issuperset ==> True


{0, 2, 4, 6, 9}

### Subsets

In [190]:
real.issuperset(odds), odds.issubset(real), real.isdisjoint(real)


(True, True, False)

### Set operations

In [192]:

real.difference(odds), real.symmetric_difference(odds)
# symmetric_difference:
# "exclusive or", which returns the values that are in one set
# or the other, but not in both.
real.intersection(odds), real.union(odds)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
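
Each of the four methods has an operator form. The sketch below rebuilds `real` and `odds` with the values used above and checks that both spellings agree; the `same_*` names are illustrative:

```python
real = set(range(10))
odds = {1, 3, 5, 7, 8}

same_difference = (real - odds) == real.difference(odds)
same_symmetric = (real ^ odds) == real.symmetric_difference(odds)
same_intersection = (real & odds) == real.intersection(odds)
same_union = (real | odds) == real.union(odds)

# The methods accept any iterable; the operators need sets on both sides.
from_list = real.intersection([1, 3, 100])

print(same_difference, same_symmetric, same_intersection, same_union, from_list)
```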

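A set cannot be *negated*, i.e. there is no **not** operation on sets.

Mathematicians often negate a set by describing it, e.g. {all integers except 1, 2}. A Python set must hold its elements explicitly, so a complement can only be taken relative to a finite universe.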
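
One possible workaround, sketched under the assumption that a finite `universe` is acceptable (the names `universe`, `excluded` and `not_in_excluded` are hypothetical):

```python
universe = set(range(10))     # hypothetical finite universe
excluded = {1, 2}

# Complement relative to the universe.
complement = universe - excluded

# An unbounded "everything except" set can be modelled as a membership test.
def not_in_excluded(value):
    return value not in excluded

print(sorted(complement), not_in_excluded(5), not_in_excluded(1))
```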

## Gotchas

 <html> 
 <font face="verdana" size = 2 color="#990099">
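 When <b> sets </b> are created, Python allocates a block of memory to store references to the set elements. The <i>hash function</i> determines where each element's reference is stored.
 
 <p> (Tuples) are immutable and therefore hashable (as long as their items are hashable), so they can be stored as set elements. A set can also be created from a tuple and then mutated with the .add() method. A tuple stored in a set can only be looked up as a whole, not by partial entries. </p> 
 
 <p> [Lists] are mutable and therefore more memory-expensive; they are also unhashable, so they cannot be set elements.</p>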
 
 
 </font> </html>
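
A short sketch of these hashing gotchas; the variable names are illustrative and not from the original notebook:

```python
points = set()
points.add((1, 2))            # a tuple of hashable items is hashable

try:
    points.add([1, 2])        # a list is mutable, hence unhashable
    list_added = True
except TypeError:
    list_added = False

# Tuples are matched whole: (1, 2) is a member, the partial entry 1 is not.
whole_match = (1, 2) in points
partial_match = 1 in points

# frozenset is the immutable, hashable counterpart of set.
nested = {frozenset({1, 2}), frozenset({3})}

print(list_added, whole_match, partial_match, len(nested))
```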

## Dictionaries

A **dict** is a hash table indexed by *key*.

Dict supports three forms of initialization. Its constructor can be called with a sequence of key-value pairs, another dictionary (or mapping), or keyword arguments mapping string names to values.

In [None]:
eg_dict = dict(one=[1], two=[2,2], three=[3,3,3])

eg_dict1 = {'one':[1],
            'two':[2,2],
            'three':[3,3,3]
           }

eg_dict2 = {}
eg_dict2['one'] = [1]
eg_dict2['two'] = [2,2]
eg_dict2['three'] = [3,3,3]

In [None]:
d = {"a":[1,2,3],
     "b":[4,5,6],
     "c":[4,6,8]
    }

#### Common operations

In [None]:
len(d), d['a'], d.keys(), d.values(), d.items()

#### Verifying the key

In [None]:
#Does the dictionary "d" have a key 'a'?
'a' in d, 'd' in d, 'd' not in d

In [None]:
eg_dict1.update(d)
eg_dict1

#### Accessing the keys / values

In [None]:
d['a'], d['a'][1:]

In [None]:
"""
.get(key, default)
Return the value stored under 'key' if 'key' is present.
Return 'default' (None if omitted) if the key is not found in the existing dict.
    Note: this does not update the existing dict
"""

d.get('a', 'int, float, text, list'), d.get('dt', ['Key not found','int|float|text|list'])
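
To make the "does not update" note concrete, the sketch below contrasts `.get()` with `.setdefault()`, which does insert the default. It uses a small stand-alone `d`, not the one defined above:

```python
d = {"a": [1, 2, 3]}

missing = d.get("dt", "fallback")      # returns the fallback ...
still_absent = "dt" not in d           # ... and leaves d untouched

stored = d.setdefault("dt", [])        # setdefault() inserts the default
now_present = "dt" in d

print(missing, still_absent, stored, now_present)
```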

#### Modifying the dict

In [5]:
# d.update({'dt': ['foo, bar'], 'del_this':['delete','these','values']})

# del(d['del_this']) #deletes key and values
# d['del_this'] = ['delete','these','values'] # adds key and values

# d

char = 'blah blah blah \n'
char.strip(" ")
char.split()

['blah', 'blah', 'blah']
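
The commented-out lines in this cell can be exercised on a throwaway dictionary. A minimal sketch, assuming a small stand-in `d`; `pop()` is shown as an extra alongside `del`:

```python
d = {"a": [1, 2, 3], "b": [4, 5, 6]}

d.update({"dt": ["foo", "bar"], "del_this": ["delete", "these", "values"]})
del d["del_this"]                    # removes the key and its value
d["del_this"] = ["again"]            # assignment adds the key back
popped = d.pop("del_this")           # pop() removes and returns the value
safe = d.pop("never_there", None)    # a default avoids KeyError

print(sorted(d), popped, safe)
```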

In [None]:
keys = d.keys()
keys_sorted = sorted(keys)

for key in keys_sorted:
    print(key)

print("*****")

for key,value in sorted(d.items()):
    print("{}--> {}".format(key, value))

### Default dict

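A dict will raise a KeyError if the requested key is absent. A defaultdict instead calls a factory function to create, store and return a default value for the missing key.


Note: defaultdict is very different from the $\texttt{.get()}$ method. $\texttt{.get()}$ only looks the key up in the existing dictionary and returns a default value if the key is not present; it never inserts anything.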
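
A minimal sketch of that difference, with illustrative names (`plain`, `counts`, `groups`) that are not part of the notebook:

```python
from collections import defaultdict

plain = {"a": 1}
counts = defaultdict(int)

plain_value = plain.get("x", 0)    # no key is created
counts_value = counts["x"]         # the missing key is created with int() == 0

print(plain_value, "x" in plain, counts_value, "x" in counts)

# Typical use: grouping without checking for the key first.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

print(dict(groups))
```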

In [None]:
a_dict = {"a":[1,2,3],
          "b":[4,5,6],
          "c":[4,6,8],
          "b":[5,6,7]
         }
#duplicate keys are overwritten: the last value given for "b" wins

In [None]:
from collections import defaultdict

In [None]:
a_def_dict = defaultdict(lambda: 'text', key="some_value")

for k,v in a_dict.items():
    a_def_dict[k] = v

a_def_dict, a_def_dict['b'], a_def_dict['x']

In [None]:
a_def_dict = defaultdict(list) #the factory can also be int or float
a_def_dict = defaultdict(lambda: defaultdict(list)) #nested: a missing key gets an inner defaultdict of lists

for k,v in a_dict.items():
    a_def_dict[k] = v

a_def_dict, a_def_dict['b'], a_def_dict['x']

## Iterating a *Dictionary*

In [None]:
for k,v in a_def_dict.items():
    print("{}->{}".format(k,v))

#one-liner
[print("\n{}->{}".format(k,v)) for k,v in a_def_dict.items()]

## Inverting a dictionary

In [None]:
m = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

m_inv = dict(zip(m.values(), m.keys()))
m_inv

In [None]:
m = {x: x ** 2 for x in range(5)}
m = {x: 'A' + str(x) for x in range(10)}
print(m)
{v: k for k, v in m.items()}
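
Both inversions above assume the values are unique. When two keys share a value, the plain inversion silently keeps only one of them. The sketch below shows the loss and one possible way around it by grouping keys in a defaultdict; the sample `m` is made up for the illustration:

```python
from collections import defaultdict

m = {'a': 1, 'b': 2, 'c': 1}

# Plain inversion: the later key wins for the duplicated value 1.
lossy = {v: k for k, v in m.items()}

# Grouped inversion keeps every key that shared a value.
grouped = defaultdict(list)
for k, v in m.items():
    grouped[v].append(k)

print(lossy, dict(grouped))
```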

## Trees

In [None]:
import json
tree = lambda: defaultdict(tree)
root = tree()
root['menu']['id'] = 'file'
root['menu']['value'] = 'File'
root['menu']['menuitems']['new']['value'] = 'New'
root['menu']['menuitems']['new']['onclick'] = 'new();'
root['menu']['menuitems']['open']['value'] = 'Open'
root['menu']['menuitems']['open']['onclick'] = 'open();'
root['menu']['menuitems']['close']['value'] = 'Close'
root['menu']['menuitems']['close']['onclick'] = 'close();'

print(json.dumps(root, sort_keys=True, indent=4, separators=(',', ': ')))
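
One side effect of this recursive defaultdict is that merely reading a missing path creates it. The sketch below demonstrates that and converts the tree to plain dicts once it is built; `to_dict` is a hypothetical helper, not part of the original notebook:

```python
from collections import defaultdict

tree = lambda: defaultdict(tree)
root = tree()
root['menu']['id'] = 'file'

# Reading a missing path creates every level along the way.
_ = root['menu']['typo']['value']
created = 'typo' in root['menu']

def to_dict(node):
    """Recursively convert nested defaultdicts into plain dicts."""
    if isinstance(node, defaultdict):
        return {key: to_dict(value) for key, value in node.items()}
    return node

plain = to_dict(root)
print(created, plain)
```

After the conversion, a mistyped key raises KeyError again instead of growing the tree.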

### collections.Counter

**Counter** keeps track of how many times equivalent values are added.

Counter supports three forms of initialization. Its constructor can be called with:

* a sequence of items,
* a dictionary containing keys and counts,
* keyword arguments mapping string names to counts.

In [None]:
from collections import Counter

In [None]:
print(Counter(['a', 'b', 'c', 'a', 'b', 'b']))
print(Counter({'a':2, 'b':3, 'c':1}))
print(Counter(a=2, b=3, c=1))
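
Beyond construction, a few Counter operations are commonly useful. A brief sketch with illustrative variable names:

```python
from collections import Counter

counts = Counter(['a', 'b', 'c', 'a', 'b', 'b'])

top_two = counts.most_common(2)          # [('b', 3), ('a', 2)]
missing = counts['z']                    # absent keys count as 0, no KeyError

counts.update(['a', 'a'])                # update() adds to the existing counts
combined = counts + Counter(c=4)         # Counters support addition

print(top_two, missing, counts['a'], combined['c'])
```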

In [178]:
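import random

def Rand5():
  # Assumed helper, not defined in the original cell: uniform integer in 1..5.
  return random.randint(1, 5)

def Rand7():
  # Rejection sampling: x is uniform in 0..24; accept only 0..20 (21 values, 3 per result).
  while True:
    x = (Rand5() - 1) * 5 + (Rand5() - 1)
    if x < 21: return x // 3 + 1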

7

In [181]:
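# Code-golf variant; assumes a rand5() helper returning 1..5. Only approximately
# uniform: the 5**7 equally likely outcomes cannot split evenly into 7 buckets.
rand7=lambda:eval("+rand5()"*7)%7+1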
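
A Counter gives a quick sanity check of the rejection-sampling approach. This is a sketch, not a statistical test: the `rand5` helper built on `random.randint` is an assumption, and the sample size and seed are arbitrary.

```python
import random
from collections import Counter

def rand5():
    # Assumed helper: uniform integer in 1..5.
    return random.randint(1, 5)

def rand7():
    # Two draws give x uniform in 0..24; reject 21..24 so that the
    # remaining 21 values map evenly onto 1..7 (three values each).
    while True:
        x = (rand5() - 1) * 5 + (rand5() - 1)
        if x < 21:
            return x // 3 + 1

random.seed(0)
tally = Counter(rand7() for _ in range(7000))
print(sorted(tally.items()))
```

Each of the seven results should appear roughly 1000 times; large deviations would point to a bug in the mapping.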