<h2>Dictionary and Set</h2>

In this module, we discuss Dictionary, Set, and serializing objects.

<h3>Dictionary</h3>

A dictionary is another collection type in Python (recall that the other two we have discussed are list and tuple). Dictionaries store elements called <b>key-value</b> pairs. A key-value pair is similar to a word and its meaning that you can look up in a real dictionary. With Python dictionaries, you can use a key to locate a specific value.

Dictionaries are created using curly brackets <b>{}</b>. The syntax is:

<b>
    dictionary_name = { <br>
        &emsp;key_1 : value_1, <br>
        &emsp;key_2 : value_2, <br>
        &emsp;#... <br>
    }
</b>

- the ":" between each key and value pair is mandatory
- the "," after each key-value pair is mandatory (except the last pair, similar to items in a list)
- keys must be immutable (more precisely, hashable), but values can be pretty much anything
- you can have as many key-value pairs as you want
- new lines are not necessary, but are helpful if you have many items in your dictionary

To obtain the value of a key, we use the syntax

<b>dictionary_name[key]</b>

This is similar to indexing a list, except that the index is a key rather than a position.

Some examples:

- Create a new dictionary

In [2]:
state_capitals={
    'New York': 'Albany',
    'New Jersey': 'Trenton',
    'Georgia' : 'Atlanta',
    'Texas' : 'Austin'
}

In [11]:
print(state_capitals)

{'New York': 'Albany', 'New Jersey': 'Trenton', 'Georgia': 'Atlanta', 'Texas': 'Austin'}


- Then access values by "looking up" a key

In [4]:
state_capitals['New York']

'Albany'

In [5]:
state_capitals['Georgia']

'Atlanta'

We can also assign the value of a key to a variable, for example:

In [6]:
ny_cap = state_capitals['New York']
print(ny_cap)

Albany


In [8]:
ga_cap = state_capitals['Georgia']
print(ga_cap)

Atlanta


You will get a KeyError when trying to obtain the value of a key that is not in the dictionary:

In [10]:
state_capitals['Florida']

KeyError: 'Florida'
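
If a missing key should not stop the program, one option is to look the key up with get(), which returns None (or a fallback value you choose) instead of raising an error; another is to catch the KeyError. The following is a minimal sketch that reuses the state_capitals example. The 'unknown' placeholder is only an illustrative choice, not something the dictionary requires.

```python
state_capitals = {
    'New York': 'Albany',
    'New Jersey': 'Trenton',
    'Georgia': 'Atlanta',
    'Texas': 'Austin'
}

# get() returns None instead of raising KeyError when the key is missing
missing = state_capitals.get('Florida')

# a second argument supplies a fallback value (an arbitrary placeholder here)
fallback = state_capitals.get('Florida', 'unknown')

# alternatively, catch the KeyError explicitly
try:
    capital = state_capitals['Florida']
except KeyError:
    capital = 'unknown'

print(missing)
print(fallback)
print(capital)
```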

We can check if a <b>key</b> is in the dictionary with the operators <b>in</b> and <b>not in</b>. The expression <b>key in dictionary_name</b> evaluates to True if the key is in the dictionary, and False otherwise.

In [21]:
state_capitals={
    'New York': 'Albany',
    'New Jersey': 'Trenton',
    'Georgia' : 'Atlanta',
    'Texas' : 'Austin',
    'Washington' : 'Olympia'
}

In [22]:
'New York' in state_capitals

True

In [23]:
'New Jersey' in state_capitals

True

In [24]:
'Florida' in state_capitals

False

In [26]:
'Florida' not in state_capitals

True

The <b>in</b> operator <b>only</b> checks for keys, not values:

In [25]:
'Albany' in state_capitals

False

In [27]:
'Atlanta' in state_capitals

False
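
To check whether something appears as a value rather than a key, one possibility is to apply <b>in</b> to the result of the values() method. This is only a sketch; note that searching the values has to scan them one by one, so it is slower than a key lookup on large dictionaries.

```python
state_capitals = {
    'New York': 'Albany',
    'New Jersey': 'Trenton',
    'Georgia': 'Atlanta',
    'Texas': 'Austin'
}

# `in` on the dictionary itself looks only at the keys
found_as_key = 'Albany' in state_capitals

# values() gives a view of the values, which also supports `in`
found_as_value = 'Albany' in state_capitals.values()

print(found_as_key)
print(found_as_value)
```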

We add new key-value pairs to a dictionary by

<b>dictionary_name[new_key] = new_value</b>

For example

In [28]:
state_capitals['Florida'] = 'Tallahassee'
state_capitals['Alabama'] = 'Montgomery'

In [29]:
print(state_capitals)

{'New York': 'Albany', 'New Jersey': 'Trenton', 'Georgia': 'Atlanta', 'Texas': 'Austin', 'Washington': 'Olympia', 'Florida': 'Tallahassee', 'Alabama': 'Montgomery'}


In [30]:
state_capitals['Florida'] 

'Tallahassee'

In [31]:
'Florida' in state_capitals

True

If the key is already in the dictionary, its value <b>will be changed</b>

In [32]:
state_capitals['New York'] = "I don't know!"

In [33]:
state_capitals['New York']

"I don't know!"

We can use the <b>del</b> statement to remove a key-value pair from the dictionary:

<b>del dictionary_name[key]</b>

In [34]:
del state_capitals['Alabama']

In [35]:
'Alabama' in state_capitals

False

You will get a KeyError when trying to remove a key that is not in the dictionary:

In [36]:
del state_capitals['Alaska']

KeyError: 'Alaska'
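
Two ways to remove a key that may or may not be present are sketched below: check with <b>in</b> before using del, or call pop() with a second argument, which is returned instead of raising KeyError. The small dictionary here is just a shortened stand-in for state_capitals.

```python
state_capitals = {'New York': 'Albany', 'Texas': 'Austin'}

# option 1: check before deleting
if 'Alaska' in state_capitals:
    del state_capitals['Alaska']

# option 2: pop() with a default returns the default when the key is missing
removed = state_capitals.pop('Alaska', None)

print(removed)
print(len(state_capitals))
```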

We can use the len() function to get the number of key-value pairs in the dictionary

In [37]:
len(state_capitals)

6

Note that a key must be an <b>immutable</b> (more precisely, hashable) object, while a value can be pretty much anything (numbers, strings, objects, functions...), for example:

In [38]:
some_dictionary = {
    [1,2]:[1,2,3]   #recall, a list is mutable, so we get error
} 

TypeError: unhashable type: 'list'

In [40]:
some_dictionary = {
    'string' : 'this is a string',       #key is a string, value is a string
    'number' : 123456,                   #key is a string, value is a number 
    100 : 'the number 100',              #key is a number, value is a string
    'function' : print,                  #key is a string, value is a function
    'some list' : [1,2,3,5,6],           #key is a string, value is a list
    (1,2,3) : [1,2,6,3,7]                #key is a tuple (immutable), value is a list
} 

In [13]:
some_dictionary['string']

'this is a string'

In [14]:
some_dictionary[100]

'the number 100'

In [15]:
some_dictionary['function']

<function print>

In [20]:
some_dictionary['some list']

[1, 2, 3, 5, 6]

We can create an empty dictionary with

<b>dictionary_name = {}</b>

or

<b>dictionary_name = dict()</b>

We can then add key-value pairs as the program runs.

For example, we have a list of employee names and a list of their salaries. We can use a for loop to add the names (as keys) and salaries (as values) to the dictionary:

In [41]:
names = ['Alice', 'Bob', 'Carol', 'Daniel']
salaries = [100000, 89000, 90000, 102000]

#empty dictionary
salary_dict = {}

#loop with index to access both lists
for i in range(len(names)):
    salary_dict[names[i]] = salaries[i] #names[i] returns the current name, and salaries[i] the current salary

In [42]:
salary_dict

{'Alice': 100000, 'Bob': 89000, 'Carol': 90000, 'Daniel': 102000}

In [43]:
salary_dict['Bob']

89000
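
The index-based loop above can also be written more compactly. As a sketch, the built-in zip() function pairs up the items of the two lists, and dict() turns those pairs into a dictionary. This assumes both lists have the same length; zip() silently stops at the shorter one.

```python
names = ['Alice', 'Bob', 'Carol', 'Daniel']
salaries = [100000, 89000, 90000, 102000]

# zip() pairs names[0] with salaries[0], names[1] with salaries[1], and so on
salary_dict = dict(zip(names, salaries))

print(salary_dict)
```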

We can also iterate through a dictionary with a for loop. In that case, the loop variable takes each of the <b>keys</b> in the dictionary in turn. For example, to give everyone a 4% raise in salary:

In [46]:
for name in salary_dict:                     #the keys in this dictionary represent names
    print(name)                              #so the counter variable "name" will be the current name in the dictionary
    salary_dict[name] *= 1.04                

Alice
Bob
Carol
Daniel


In [45]:
salary_dict

{'Alice': 104000.0, 'Bob': 92560.0, 'Carol': 93600.0, 'Daniel': 106080.0}

In [48]:
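#we can use a loop to print all pairs nicely
#note: the raise loop above was run twice in this notebook (cells 45 and 46),
#so the salaries below include two 4% raises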
for name in salary_dict:
    print('The salary of %s is $%.2f' % (name, salary_dict[name]))

The salary of Alice is $108160.00
The salary of Bob is $96262.40
The salary of Carol is $97344.00
The salary of Daniel is $110323.20
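
When both the key and the value are needed inside the loop, an alternative is to iterate over items(), which yields each key-value pair as a tuple that can be unpacked into two variables. A minimal sketch with a shortened salary dictionary:

```python
salary_dict = {'Alice': 100000, 'Bob': 89000}

lines = []
# items() yields (key, value) tuples; unpack them into name and salary
for name, salary in salary_dict.items():
    lines.append('The salary of %s is $%.2f' % (name, salary))

print('\n'.join(lines))
```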


<h4>Dictionary Methods</h4>

- clear() - removes all key-value pairs from the dictionary
- get() - returns the value associated with a specified key, or None (or a default value you provide) if the key is not in the dictionary
- items() - returns a view of all the key-value pairs in the dictionary, each pair as a tuple
- keys() - returns a view of all the keys in the dictionary
- pop() - returns the value associated with a specified key and removes that key-value pair from the dictionary
- popitem() - removes the most recently added key-value pair from the dictionary and returns it as a tuple (Python 3.7 and later; older versions removed an arbitrary pair)
- values() - returns a view of all the values in the dictionary

In [49]:
salary_dict.get('Alice')

108160.0

In [50]:
salary_dict.items()

dict_items([('Alice', 108160.0), ('Bob', 96262.40000000001), ('Carol', 97344.0), ('Daniel', 110323.2)])

In [51]:
salary_dict.keys()

dict_keys(['Alice', 'Bob', 'Carol', 'Daniel'])

In [52]:
salary_dict.values()

dict_values([108160.0, 96262.40000000001, 97344.0, 110323.2])

In [53]:
#after pop(), we get the salary of 'Alice'
salary_dict.pop('Alice')

108160.0

In [54]:
#but 'Alice' is removed from the dictionary
salary_dict

{'Bob': 96262.40000000001, 'Carol': 97344.0, 'Daniel': 110323.2}

In [55]:
#popitem() returns the most recently added key-value pair (Python 3.7+)
salary_dict.popitem()

('Daniel', 110323.2)

In [56]:
#and remove it from the dictionary
salary_dict

{'Bob': 96262.40000000001, 'Carol': 97344.0}

In [57]:
#finally, clear() removes everything

In [58]:
salary_dict.clear()
salary_dict

{}
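
Two details of these methods are easy to miss, so here is a small sketch of them. First, popitem() removes the pair that was added last (this ordering is guaranteed from Python 3.7 on). Second, the objects returned by keys(), values(), and items() are views: they reflect later changes to the dictionary rather than being a frozen copy.

```python
salary_dict = {'Alice': 100000, 'Bob': 89000, 'Carol': 90000}

# keys() returns a view that stays in sync with the dictionary
keys_view = salary_dict.keys()

# popitem() removes and returns the most recently added pair
last_pair = salary_dict.popitem()

print(last_pair)
print(list(keys_view))   # the view no longer contains the removed key
```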

<h3>Set</h3>

A set is an unordered collection type that stores <b>unique items</b>.

We use the set() built-in function to create an empty set

<b>set_name = set()</b>

Then, we can add items with the add() method.

For example:

In [59]:
a_set = set()

a_set.add(10)
a_set.add(100)
a_set.add(1000)
a_set.add(500)
a_set.add(10)

Notice how we add 10 to the set twice, but it appears only once. Again, items in a set are unique, so duplicates are ignored. Also note that a set does not keep the order in which items were added.

In [60]:
print(a_set) 

{1000, 10, 100, 500}


Similar to other collection types, we can add different data types to a set

In [62]:
a_set.add('a string')
print(a_set)

{100, 1000, 10, 500, 'a string'}


We can quickly create a set with items by providing a list to the set() function

In [63]:
another_set = set(['Alice','Bob','Emma','Jack'])
print(another_set)

{'Emma', 'Jack', 'Alice', 'Bob'}
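
A set with items can also be written directly with curly brackets, as sketched below. One caveat: empty curly brackets create an empty dictionary, not an empty set, which is why set() is needed for an empty set.

```python
# a set literal; the duplicate 'Alice' is kept only once
another_set = {'Alice', 'Bob', 'Emma', 'Jack', 'Alice'}

empty_set = set()   # an empty set
empty_dict = {}     # this is an empty dictionary, not a set

print(len(another_set))
print(type(empty_set).__name__)
print(type(empty_dict).__name__)
```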


A quick way to get the unique items from a list is to convert it to a set, then convert it back to a list. Note that the original order of the items is not preserved. For example:

In [65]:
some_list = ['Alice','Bob','Alice','Carol','Bob','Emma','Bob','Daniel']

temp_set = set(some_list)

some_list_unique = list(temp_set)

print(some_list_unique)

['Bob', 'Carol', 'Daniel', 'Emma', 'Alice']
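
Converting through a set does not keep the original order of the list. If the order of first appearance matters, one alternative (a sketch, relying on dictionaries keeping insertion order in Python 3.7 and later) is dict.fromkeys(), which builds a dictionary whose keys are the list items:

```python
some_list = ['Alice', 'Bob', 'Alice', 'Carol', 'Bob', 'Emma', 'Bob', 'Daniel']

# dictionary keys are unique and keep insertion order,
# so this keeps the first occurrence of each name
some_list_unique = list(dict.fromkeys(some_list))

print(some_list_unique)
```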


We can remove items from a set with the remove() method. Note that this method modifies the set in place, and raises a KeyError if the item is not in the set.

In [66]:
name_set = set(['Bob', 'Carol', 'Daniel', 'Emma', 'Alice'])

name_set.remove('Alice')
name_set.remove('Bob')

print(name_set)

{'Carol', 'Daniel', 'Emma'}
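
Sets also have a discard() method, which behaves like remove() except that it does nothing when the item is absent. The sketch below contrasts the two; the `raised` flag is only a helper variable for this illustration.

```python
name_set = {'Carol', 'Daniel', 'Emma'}

# discard() silently ignores an item that is not in the set
name_set.discard('Alice')

# remove() raises KeyError for an item that is not in the set
try:
    name_set.remove('Alice')
    raised = False
except KeyError:
    raised = True

print(raised)
print(len(name_set))
```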


Similar to other collection types, we can iterate over a set with a for loop:

In [67]:
for item in name_set:
    print(item)

Carol
Daniel
Emma


We can also check whether items are in the set with <b>in</b> and <b>not in</b>:

In [68]:
print('Alice' in name_set)
print('Carol' in name_set)

False
True


<h4>Other Methods with Sets</h4>

- union(): get the union of two sets <br>
     set1.union(set2)

- intersection(): get the intersection of two sets <br>
     set1.intersection(set2)

- difference(): get the difference of two sets (elements that appear in set1 but do not appear in set2) <br>
     set1.difference(set2)

- symmetric_difference(): get the symmetric difference of two sets (elements that appear in exactly one of the two sets) <br>
     set1.symmetric_difference(set2)

In [70]:
set1 = set(['Alice','Bob','Carol','Daniel'])
set2 = set(['Alice','Jake','Emma','Bob'])

In [71]:
set1.union(set2)

{'Alice', 'Bob', 'Carol', 'Daniel', 'Emma', 'Jake'}

In [72]:
set1.intersection(set2)

{'Alice', 'Bob'}

In [73]:
set1.difference(set2)

{'Carol', 'Daniel'}

In [74]:
set1.symmetric_difference(set2)

{'Carol', 'Daniel', 'Emma', 'Jake'}
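
Each of these four methods has an operator form that gives the same result when both operands are sets. A short sketch using the same two sets:

```python
set1 = {'Alice', 'Bob', 'Carol', 'Daniel'}
set2 = {'Alice', 'Jake', 'Emma', 'Bob'}

union_result = set1 | set2          # same as set1.union(set2)
intersection_result = set1 & set2   # same as set1.intersection(set2)
difference_result = set1 - set2     # same as set1.difference(set2)
symmetric_result = set1 ^ set2      # same as set1.symmetric_difference(set2)

print(sorted(union_result))
print(sorted(intersection_result))
print(sorted(difference_result))
print(sorted(symmetric_result))
```

sorted() is used here only to print the items in a predictable order, since sets themselves are unordered.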

<h3>Serializing Objects</h3>

Remember how we wrote and read files to save and load objects? Another way is to use the pickle library. It is a bit easier, since you don't have to manually convert attributes to strings when saving and convert them back to objects when loading.

For example, let's have a simple class

In [75]:
class Student:
    
    def __init__(self,sid,name):
        self.__sid = sid
        self.__name = name
        
    #you can add setters/getters if you want, but they are not necessary in this case
    #since we are just demonstrating the use of pickle
    
    def __str__(self):
        return self.__sid + ', ' + self.__name

Then create some objects and put them into a list

In [76]:
alice = Student('0001','Alice')
bob = Student('0002','Bob')
carol = Student('0003','Carol')
daniel = Student('0004','Daniel')

student_list = [alice, bob, carol, daniel]

Now, import pickle and save the list of objects. We use the dump() function to do that; you need to provide the object to save and an open file object. The steps:

1. Open a file using 'wb' mode (write in binary) - the extension can be anything, but should be easy to understand
2. Call pickle.dump(object, file)
3. Close the file

In [77]:
import pickle

#open file as 'wb'
out_file = open('student_list.obj', 'wb')

#write
pickle.dump(student_list, out_file)

#close file
out_file.close()

After running the previous cell, you should see a student_list.obj file in the same directory as your program. To load it, we can use pickle.load(). Note that you still need to open the file (this time in 'rb' mode, read in binary) and close it afterwards.

In [80]:
from_file = open('student_list.obj', 'rb')

loaded_list = pickle.load(from_file)

from_file.close()

In [82]:
#now, loaded_list contains all four students we created previously
loaded_list

[<__main__.Student at 0x24a02533d08>,
 <__main__.Student at 0x24a026bd348>,
 <__main__.Student at 0x24a02499648>,
 <__main__.Student at 0x24a02533988>]

In [83]:
for student in loaded_list:
    print(student)

0001, Alice
0002, Bob
0003, Carol
0004, Daniel
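
The open/close steps can also be handled by a <b>with</b> statement, which closes the file automatically even if an error occurs. The following is a sketch of a full save-and-load round trip under a few assumptions: it stores plain dictionaries instead of Student objects to stay self-contained, and it writes into a temporary directory (via the standard tempfile module) so that no file is left behind. In your own program you would use a regular file name as shown earlier.

```python
import os
import pickle
import tempfile

# plain dictionaries stand in for the Student objects in this sketch
students = [
    {'sid': '0001', 'name': 'Alice'},
    {'sid': '0002', 'name': 'Bob'},
]

# the temporary directory and its contents are deleted when the block ends
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'student_list.obj')

    # 'wb' mode: write in binary; the file is closed when the block ends
    with open(path, 'wb') as out_file:
        pickle.dump(students, out_file)

    # 'rb' mode: read in binary
    with open(path, 'rb') as from_file:
        loaded = pickle.load(from_file)

print(loaded == students)   # same content
print(loaded is students)   # but a new, separate object
```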
