
# Session 4
<p>In this session we will learn how to work with files and dictionaries in Python. These modules were developed at the University of Toronto <i>(Jennifer Campbell and Michelle Craig)</i> and the University of Delft <i>(Mark Bakker)</i>. They have been adapted to the needs of students at the University of Granada.</p>


---
<h2 id="Dictionaries">Section 1: Dictionaries<a class="anchor-link" href="#Dictionaries">¶</a></h2><h2 id="Motivation">Motivation<a class="anchor-link" href="#Motivation">¶</a></h2><p>Suppose we need to represent years and the total North American fossil fuel 
CO2 emissions for those years.</p>
<p>Question: How should we do this?</p>
<ul>
<li><p>One option is to use <em>parallel lists</em>, in which the <code>years</code> list at position <code>i</code> corresponds to the <code>emissions</code> list at position <code>i</code>:</p>
<pre><code>years = [1799, 1800, 1801, 1802, 1902, 2002]
emissions = [1, 70, 74, 79, 82, 1733297]  # thousands of metric tons of carbon</code></pre>
<p>Question: How would operations on the data work?  For example:</p>
<pre><code> (a) to add an entry, such as year `1950` and emissions `734914`?

 We need to modify both lists.  
 We could append or keep both lists sorted (then must find the right spot
 and insert there).
 Either way, both lists must be kept in sync.

 (b) to edit the emissions value for a particular year?

 We need to find the year in the years list and modify the
 corresponding item in the emissions list.</code></pre>
</li>
</ul>
<pre><code>In general, storing the values in this format is not terribly convenient.

Notice that the lists don't explicitly represent the associations like (1799, 1).

</code></pre>
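<p>The two operations described above can be sketched as follows. This is only an illustration of the parallel-lists approach; the choice of year <code>1802</code> and the new value <code>80</code> are made up for the example.</p>

```python
# Parallel lists: position i in one list matches position i in the other.
years = [1799, 1800, 1801, 1802, 1902, 2002]
emissions = [1, 70, 74, 79, 82, 1733297]

# (a) Add an entry: both lists must be modified together.
years.append(1950)
emissions.append(734914)

# (b) Edit the value for a year: search one list, then index the other.
i = years.index(1802)    # raises ValueError if the year is absent
emissions[i] = 80        # hypothetical new value, for illustration only

print(years[i], emissions[i])
```

<p>Forgetting either <code>append</code> call would silently leave the two lists out of sync, which is the main weakness of this representation.</p>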
<ul>
<li><p>A second option is to use <em>a list of lists</em>.  For example,</p>
<p>years_emissions = [[1799, 1], [1800, 70], [1801, 74], [1802, 79], [1902, 82], [2002, 1733297]]</p>
<p>Better, but still hard to look up a year, because we must search the list to find it.</p>
</li>
</ul>
<p>There is a better way: a new type of object called a <em>dictionary</em>, which is represented by Python's type <code>dict</code>.</p>



<h2 id="Dictionary-basics">Dictionary basics<a class="anchor-link" href="#Dictionary-basics">¶</a></h2><p>A dictionary keeps track of associations for you. Let's consider the emissions example:</p>


In [None]:
# Braces indicate that you are defining a dictionary.
emissions_by_year = {1799: 1, 1800: 70, 1801: 74, 1802: 79, 1902: 82, 2002: 1733297}        

# Look up the emissions for the given year
print(emissions_by_year[1801])

# Add another year to the dictionary
emissions_by_year[1950] = 734914
print(emissions_by_year[1950])


<p>Dictionary entries have two parts: a <em>key</em> and a <em>value</em>.  In our example, the key is the year and the value is the CO2 emissions.</p>
<p>Why is it called a key?
Like a physical (or metaphorical) key, it provides a means of gaining access 
to something.</p>
<p>Keys don't have to be numbers, but they do have to be immutable objects (for example numbers, strings, or tuples of these).</p>


In [None]:
d = {1: 5, 3: 45, 4: 10}
d["abc"] = "Hello!"
d[ [1, 2, 3] ] = 77        # error; the list [1, 2, 3] cannot be a key because it is mutable.
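
<p>If a sequence is needed as a key, a tuple can be used in place of a list. The sketch below also catches the error raised for the list, so that the cell runs to the end; the variable name <code>message</code> is only illustrative.</p>

```python
d = {}
d[(1, 2, 3)] = 77          # a tuple is immutable, so it can be a key

try:
    d[[1, 2, 3]] = 77      # a list is mutable, so this raises TypeError
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

print(d)
```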


<p>And the associated values can be anything:  any type, and mutable or not.</p>


In [None]:
d = {}
d[5] = ("Diane", "978-6024", "BA", 4236)
d["weird"] = ["my", "you", "walrus"]
d["nested"] = {"diane": 4236, "paul": 4234}  # The values can even be dictionaries.
print(d)


<p>Dictionaries themselves are mutable.</p>


In [None]:
print(id(d))
d["me"] = "you"  # Does NOT create a new dict.  It changes this one.
print(id(d))


<h2 id="Dictionary-operations">Dictionary operations<a class="anchor-link" href="#Dictionary-operations">¶</a></h2>


In [None]:
print(emissions_by_year)
        
# extend (add a new key and its value)
emissions_by_year[2009] = 1000000   # Wishful thinking
        
# update (change the value associated with a key)
emissions_by_year[2009] = 10        # Old value is tossed out
print(emissions_by_year)            # Reports most recent values
        
# check for membership
1950 in emissions_by_year           # A dict operator (not a function
                                    # or method).  This one is binary.

In [None]:
# remove a key-value pair
del emissions_by_year[1950]         # del is a statement, not an operator or method.
1950 in emissions_by_year           # This is now False

In [None]:
# determine length (number of key-value pairs)
len(emissions_by_year)
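
<p>Looking up a key that is not in the dictionary raises a <code>KeyError</code>, so a membership check is often placed before the lookup. A minimal sketch, using a shortened copy of the emissions data; the variable <code>missing</code> is only there to record what happened:</p>

```python
emissions_by_year = {1799: 1, 1800: 70}
year = 1950

# Check for membership before looking the key up.
if year in emissions_by_year:
    print(emissions_by_year[year])
else:
    print('No data for', year)

# Without the check, the lookup raises KeyError.
missing = False
try:
    emissions_by_year[year]
except KeyError:
    missing = True
print(missing)
```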

In [None]:
# Iterating over the dictionary
for key in emissions_by_year:
    print(key)


<p>In what order did the keys come out?</p>
<p>Since Python 3.7, dictionaries preserve insertion order:<br/>
 looping visits the keys in the order in which they were added. In earlier versions the order was arbitrary.
 Either way, the keys are not kept sorted, and items are looked up by key, not by position.</p>
<p>Silly analogy: A dict is like a filing assistant who is very efficient
 but keeps everything in a secret room.  You have no idea how he organizes
 things, and you don't care -- as long as he can pull the file you need
 when you give him the key.</p>
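<p>When a predictable order matters, one option is to sort the keys explicitly with the built-in <code>sorted</code>. The sketch below uses a small made-up subset of the emissions data and assumes Python 3.7 or later for the insertion-order part; <code>sorted</code> gives the same result on any version.</p>

```python
# Keys are deliberately added out of numerical order.
emissions_by_year = {1902: 82, 1799: 1, 2002: 1733297}
emissions_by_year[1800] = 70

insertion_order = list(emissions_by_year)   # order in which keys were added (Python 3.7+)
sorted_order = sorted(emissions_by_year)    # a new list of keys in ascending order

print(insertion_order)
print(sorted_order)
```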



<h2 id="Dictionary-methods">Dictionary methods<a class="anchor-link" href="#Dictionary-methods">¶</a></h2>


In [None]:
emissions_by_year.keys()

In [None]:
emissions_by_year.values()


<p>Method <code>items</code> produces the (key, value) pairs.</p>


In [None]:
emissions_by_year.items()


<p>These methods return view objects rather than lists.  To index or slice the data, we typically convert it to type list.  For example:</p>


In [None]:
years = list(emissions_by_year.keys())
print(years)
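
<p>The same conversion works for <code>values()</code> and <code>items()</code>. The following sketch uses a shortened copy of the emissions dictionary; it also shows that a view can be passed directly to a function such as <code>sum</code> without converting it first.</p>

```python
emissions_by_year = {1799: 1, 1800: 70, 1801: 74}

values = list(emissions_by_year.values())
pairs = list(emissions_by_year.items())

print(values)
print(pairs[0])     # lists support indexing; each item is a (key, value) tuple

# No conversion is needed just to loop over or total the values.
total = sum(emissions_by_year.values())
print(total)
```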


<h3 id="Practice-Exercise:-working-with-dictionaries">Practice Exercise: working with dictionaries<a class="anchor-link" href="#Practice-Exercise:-working-with-dictionaries">¶</a></h3>



<ol>
<li>Create a variable <code>doctor_to_patients</code> that refers to an empty dictionary.</li>
<li>Add an entry for <code>'Dr. Ngo'</code> with <code>1200</code> patients.</li>
<li>Add another entry for <code>'Dr. Singh'</code> with <code>1400</code> patients.</li>
<li>Add a third entry for <code>'Dr. Gray'</code> with <code>1350</code> patients.</li>
<li>Print the number of patients associated with <code>'Dr. Singh'</code>.</li>
<li>Change the number of patients associated with <code>'Dr. Singh'</code> to <code>1401</code>.</li>
<li>Write an expression to get the number of key-value pairs in the dictionary.</li>
<li>Write an expression to get the doctors.</li>
<li>Write an expression to get the patient quantities.</li>
<li>Write an expression to check whether <code>'Dr. Koch'</code> is a key in the dictionary.</li>
<li>Remove the key-value pair with <code>'Dr. Ngo'</code> as the key.   </li>
</ol>



<h2 id="Iterating-through-a-dictionary">Iterating through a dictionary<a class="anchor-link" href="#Iterating-through-a-dictionary">¶</a></h2>


In [None]:
phone = {'555-7632': 'Paul', '555-9832': 'Andrew', '555-6677': 'Dan', 
         '555-9823': 'Michael', '555-6342' : 'Cathy', '555-7343' : 'Diane'}


<p>(a) Going through the keys</p>


In [None]:
# The proper way:
for key in phone:
    print(key)

# This is equivalent, but not considered good style:
#for key in phone.keys():
#    print(key)


<p>(b) Going through the key-value pairs:</p>


In [None]:
# This gives you a series of tuples.
for item in phone.items():
    print(item)

In [None]:
# You can pull the pieces of the tuple out as you go:
for (number, name) in phone.items():
    print("Name:", name, "; Phone Number:", number)


<h3 id="Practice-Exercise:-looping-over-dictionaries">Practice Exercise: looping over dictionaries<a class="anchor-link" href="#Practice-Exercise:-looping-over-dictionaries">¶</a></h3>



<p>The following dictionary has brand name drugs as keys and generic names as values:</p>
<pre><code>brand_to_generic = {'lipitor': 'atorvastatin',
                    'zithromax': 'azithromycin',
                    'amoxcil': 'amoxicillin',
                    'singulair': 'montelukast',
                    'nexium': 'esomeprazole',
                    'plavix': 'clopidogrel',
                    'abilify': 'ARIPiprazole'}

</code></pre>
<p>Using the dictionary above and for loops, complete the following tasks:</p>
<ol>
<li>Get a list of brand name drugs that start with the letter <code>'a'</code>.</li>
<li>Count the number of generic drugs that end with the letter <code>'n'</code>.</li>
<li>Get a list of brand name drugs in alphabetical order.
(Hint: this can be solved with or without a for loop.  Once you
have solved it one way, try to solve it using a different approach.)</li>
</ol>



<h2 id="Inverting-a-dictionary">Inverting a dictionary<a class="anchor-link" href="#Inverting-a-dictionary">¶</a></h2><p>Here's a dictionary mapping phone numbers to names.<br/>
 Some people have more than one phone number, of course.</p>


In [None]:
phone_to_person = {'555-7632': 'Paul', '555-9832': 'Andrew', '555-6677': 'Dan', 
                   '555-9823': 'Michael', '555-6342' : 'Cathy', 
                   '555-2222': 'Michael', '555-7343' : 'Diane'}


<p>Suppose we want to create a list of all of Michael's phone numbers:</p>


In [None]:
# Method 1
michael = []
for key in phone_to_person:
    if phone_to_person[key] == 'Michael':
        michael.append(key)
print(michael)


<p>But what if I want to be able to do this for all people?<br/>
 Question: is there some object you could create to make this easy?<br/>
 Answer: A dictionary!</p>
<ul>
<li>The original dictionary takes us from numbers to names.</li>
<li>The new dictionary will take us in the reverse direction, from names to numbers.</li>
</ul>


In [None]:
new_phone = {}
for (number, name) in phone_to_person.items():
    if name in new_phone:
        new_phone[name].append(number)
    else:
        new_phone[name] = [number]
new_phone


<p>We call this an <em>inverted</em> dictionary.</p>
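<p>The same inversion can be wrapped in a reusable function. This is one possible sketch, not the only way to write it: the function name <code>invert</code> and the variable <code>person_to_phones</code> are made up for this example, and the dictionary method <code>setdefault</code> replaces the explicit <code>if</code>/<code>else</code>.</p>

```python
def invert(mapping):
    """Return a dict that maps each value of `mapping` to a list of its keys."""
    inverted = {}
    for key, value in mapping.items():
        # setdefault returns the list already stored under `value`,
        # or stores a new empty list there and returns it.
        inverted.setdefault(value, []).append(key)
    return inverted


phone_to_person = {'555-7632': 'Paul', '555-9832': 'Andrew', '555-6677': 'Dan',
                   '555-9823': 'Michael', '555-6342': 'Cathy',
                   '555-2222': 'Michael', '555-7343': 'Diane'}

person_to_phones = invert(phone_to_person)
print(person_to_phones['Michael'])
```

<p>Every value in the inverted dictionary is a list, even for people with a single number, so the calling code can treat all entries the same way.</p>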



<h2 id="Reading-from-and-Writing-to-Files">Section 2: Reading from and Writing to Files<a class="anchor-link" href="#Reading-from-and-Writing-to-Files">¶</a></h2><h2 id="Introduction">Introduction<a class="anchor-link" href="#Introduction">¶</a></h2><p>So far, the data in our programs has either been hardcoded into the program itself or it came from the user who typed it in using the keyboard. This is pretty limiting and we will want programs that can read data from files.</p>
<p>In this lesson, we'll work with text files. Text files are files that use one of a number of
standard encoding schemes where the file can be interpreted as printable characters. Later, you might learn about
binary files, where the file contents are not viewable as characters, but we'll start with text files for now.</p>
<h2 id="Opening-a-File">Opening a File<a class="anchor-link" href="#Opening-a-File">¶</a></h2><p>To open a file, we need to specify the name of the file using a string.</p>
<p>We can use a variable to hold the filename and could:</p>
<ul>
<li>Set it to a string literal, if the program is always going to use the same filename.</li>
<li>Set it to a filename entered by the user using <code>input()</code>.</li>
</ul>
<p>Next, we call the function <code>open</code> with the name of the file:</p>


In [None]:
filename = 'story.txt'
file = open(filename, 'r')
file


<p>This opens the file named <code>story.txt</code> from the current directory. It is open for <em>reading</em> (that's the <code>r</code> mode) and the type of object is <code>io.TextIOWrapper</code>, but just think of it as an open file. The important conceptual idea here is that this object not only knows the contents of the file, but it knows our program's <em>current position</em> in the file. So once our program starts reading, it knows how <strong>much</strong> we've read and is able to keep giving us the next piece.</p>
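<p>The idea of a current position can be seen in a small self-contained sketch. It creates its own two-line file in a temporary directory (an assumption made so that the example runs anywhere) and uses the <code>with</code> statement, which closes the file automatically at the end of the block.</p>

```python
import os
import tempfile

# A throwaway directory keeps the example self-contained.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'story.txt')

    # Create a two-line file to read from.
    with open(path, 'w') as f:
        f.write('Mary had a little lamb\nHis fleece was white as snow\n')

    # The file object remembers how far we have read.
    with open(path, 'r') as f:
        first = f.readline()   # reads up to and including the first newline
        rest = f.read()        # continues from the current position

print(repr(first))
print(repr(rest))
```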



<h2 id="Reading-from-a-File">Reading from a File<a class="anchor-link" href="#Reading-from-a-File">¶</a></h2><p>There are several ways to read from a file.  In the following examples, the contents of <code>story.txt</code> are:</p>
<pre><code>Mary had a little lamb

His fleece was white as snow
And everywhere that Mary went
The lamb was sure to go

</code></pre>
<p>1) Read a single line</p>


In [None]:
myfile = open('story.txt', 'r')
s = myfile.readline()   # Read a line into s.
print(s)
s                       # Notice the \n that you only see when you look
                        # at the contents of the variable.


<p>The <code>\n</code> (backslash n) character is a single character representing a new line.</p>


In [None]:
s = myfile.readline()   # The next call continues where we left off.
print(s)
s = myfile.readline()   # And so on...
print(s)
myfile.close()


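<p>Notice the blank lines in the output around <code>His fleece was white as snow</code>.  The second line of the file contains only a newline character, so printing it produces blank lines, and <code>print</code> also adds a newline of its own after each line it prints.</p>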
<p>We can use this approach to read an entire file, bit by bit, under our control.</p>



<p>2) Read a certain number of characters</p>


In [None]:
filename = 'story.txt'
myfile = open(filename)
s = myfile.read(10)   # Read 10 characters into s.
print(s)
s = myfile.read(10)   # Read the next 10 characters into s.
print(s)
myfile.close()


<p>We can also use this approach to read an entire file, bit by bit, under our control.</p>



<p>3) Read one line at a time from beginning to end.</p>
<p>If we know we want to read line by line through to the end of the file, a <code>for</code> loop makes this easy. This is probably the most common way to read a file. Use this approach unless you have a reason not to.</p>


In [None]:
f = open('story.txt')
for line in f:
    print(line)     # Or do whatever you wish to line

f.close()          # Good habit: close a file when you are done with it.


<p>Question: Why is the output from the for loop double-spaced?
Answer: <code>print</code> appends a <code>\n</code> to the string and there is also a <code>\n</code> at the end of each line.</p>
<p>Question: How can you single space the output?
Answer: Strip the newline character from the end of each line before you print.</p>


In [None]:
f = open('story.txt')
for line in f:
    line = line.strip('\n')   # Remove the newline character
    print(line)
f.close()
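
<p>Two common variations on this idea are sketched below: <code>rstrip('\n')</code> removes the newline from the right-hand end only, and <code>print(line, end='')</code> leaves the line untouched but stops <code>print</code> from adding a second newline. The file is created in a temporary directory so that the example is self-contained; its contents are made up.</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'lines.txt')
    with open(path, 'w') as f:
        f.write('line one\nline two\n')

    stripped = []
    with open(path) as f:
        for line in f:
            stripped.append(line.rstrip('\n'))  # newline removed from the end only
            print(line, end='')                 # print adds nothing after the line

print(stripped)
```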


<p>4) Read the entire file contents into a single string.</p>


In [None]:
filename = "story.txt"
myfile = open(filename)
s = myfile.read()  # Read the whole file and return it as a string.
print(s)
myfile.close()

In [None]:
s


<p>5) Use <code>readlines()</code> to read the file into a <code>list</code> of lines.</p>


In [None]:
myfile = open('story.txt')
contents = myfile.readlines()
myfile.close()
print(type(contents))   # Only the last expression in a cell is displayed automatically.
contents


<p>Beginners often use one of these last two approaches because they seem easy.</p>
<ul>
<li>Question: What is the downside of reading it all in at once?</li>
<li>Answer: It can potentially take a lot of space!</li>
</ul>
<p>Don't use this technique unless you really need access to the whole file at once.</p>
<p>Usually, we can read a piece, deal with it, and toss it out.</p>
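<p>As an illustration of reading a piece, dealing with it, and tossing it out, the sketch below counts the lines of a file and remembers only the longest one, so that at most one line plus the current best is held at any time. It writes its own copy of the story to a temporary directory; the variable names are made up for the example.</p>

```python
import os
import tempfile

story = ('Mary had a little lamb\n'
         '\n'
         'His fleece was white as snow\n'
         'And everywhere that Mary went\n'
         'The lamb was sure to go\n')

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'story.txt')
    with open(path, 'w') as f:
        f.write(story)

    line_count = 0
    longest = ''
    with open(path) as f:
        for line in f:                  # one line in memory at a time
            line_count += 1
            line = line.rstrip('\n')
            if len(line) > len(longest):
                longest = line          # keep only the best line seen so far

print(line_count, longest)
```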



<h2 id="Dealing-with-the-end-of-a-file">Dealing with the end of a file<a class="anchor-link" href="#Dealing-with-the-end-of-a-file">¶</a></h2><p>With the <code>for</code> loop approach, the loop automatically stops when the end of the file is reached, and it never iterates at all if the file is empty.</p>
<p>But what happens if you are at the end of the file when you call <code>read</code> or <code>readline</code>?<br/>
You get the empty string.  You then know you can stop trying to read more.</p>
<h3 id="Example">Example<a class="anchor-link" href="#Example">¶</a></h3>


In [None]:
# Detecting the end of the file while reading line by line
myfile = open('story.txt')
next_line = myfile.readline()
while next_line != "":       # The empty string signals the end of the file.
    print(next_line)
    next_line = myfile.readline()
myfile.close()
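
<p>The same end-of-file test works with <code>read</code>. The sketch below reads a small file 8 characters at a time until <code>read</code> returns the empty string; the chunk size and the file contents are arbitrary choices for this example, and the file is created in a temporary directory so that the code is self-contained.</p>

```python
import os
import tempfile

content = 'Mary had a little lamb\n'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'story.txt')
    with open(path, 'w') as f:
        f.write(content)

    chunks = []
    with open(path) as f:
        chunk = f.read(8)
        while chunk != '':        # the empty string means end of file
            chunks.append(chunk)
            chunk = f.read(8)

print(chunks)
```

<p>The last chunk may be shorter than the requested size, because <code>read</code> returns whatever is left before the end of the file.</p>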


<h3 id="Practice-Exercise:-reading-a-file">Practice Exercise: reading a file<a class="anchor-link" href="#Practice-Exercise:-reading-a-file">¶</a></h3><p>The file <code>january06.txt</code> contains data from the UTM weather station for January 2006. Download it from the C4M website
to your local machine and put it in the same directory where Wing stores your programs. Figuring out where
to store the files or how to specify the paths to your file is half the battle!</p>
<ol>
<li><p>Open it up to see what it looks like.</p>
</li>
<li><p>Write a Python program to open the file and read only the first line.</p>
</li>
<li><p>Read the second line (this is still a header).</p>
</li>
<li><p>Read the third line into a variable <code>line</code>.</p>
</li>
<li><p>What is the type of the value that <code>line</code> refers to?</p>
</li>
<li><p>Call the method <code>split()</code> on variable <code>line</code> and save the return value. What is the type that is returned by this method call?</p>
</li>
<li><p>Look up the method <code>split()</code> in the Python 3 documentation.</p>
</li>
</ol>



<h3 id="Practice-Exercise:-getting-information-from-a-file">Practice Exercise: getting information from a file<a class="anchor-link" href="#Practice-Exercise:-getting-information-from-a-file">¶</a></h3><p>Write a program that:</p>
<ol>
<li>opens the file january06.txt</li>
<li>reads in the header and ignores it</li>
<li>uses a loop to read in all the rest of the lines one by one</li>
<li>prints out only the day and the temperature from each line</li>
</ol>



<h3 id="Practice-Exercise:-find-coldest-day-and-time">Practice Exercise: find coldest day and time<a class="anchor-link" href="#Practice-Exercise:-find-coldest-day-and-time">¶</a></h3><p>Now, write a program to find the day and time of the coldest reading in the file and then print that information.</p>
<p>Hint: Be careful. You must convert the values to integers before you compare them. The string '11' &lt; '2'  but 11 &gt; 2.</p>
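<p>The difference between comparing strings and comparing numbers can be checked with a few made-up values (these are not taken from the weather file):</p>

```python
text_values = ['11', '2', '7']

# Strings are compared character by character, so '11' sorts before '2'.
smallest_text = min(text_values)

# Convert each string to an integer before comparing.
numbers = []
for value in text_values:
    numbers.append(int(value))
smallest_number = min(numbers)

print(smallest_text, smallest_number)
```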



<h2 id="Writing-to-a-file">Writing to a file<a class="anchor-link" href="#Writing-to-a-file">¶</a></h2>



<p>In addition to opening a file for reading using <code>'r'</code>, we'll explore two other modes: <code>'w'</code> and <code>'a'</code>.  Both of those modes are used to write to a file.</p>
<p>Let's start with opening a file using mode <code>'w'</code>.  First, if the file does not exist, it is created:</p>


In [None]:
new_file = open('example.txt', 'w')


<p>Next, we use the write method to write the contents and then we close the file:</p>


In [None]:
new_file.write('This is the first line.\n')
new_file.write('And the second\nand third.')
new_file.close()


<p>We can then read and print the file contents:</p>


In [None]:
new_file = open('example.txt', 'r')
print(new_file.read())
new_file.close()


<p>Now, let's open the file using mode <code>'a'</code>, which stands for append:</p>


In [None]:
new_file = open('example.txt', 'a')
new_file.write('\nAdding another line!')  # Notice the \n character.
new_file.close()

# Next, read and print the file contents again.
new_file = open('example.txt', 'r')
print(new_file.read())
new_file.close()


<p><strong>Warning:</strong> if the file exists already, when it is opened using mode <code>'w'</code>, its contents will be deleted.  This is different from mode <code>'a'</code>, which keeps the existing content and writes any new lines to the end of the file.</p>
<p>Let's open <code>'example.txt'</code> using mode <code>'w'</code> to see how the file changes:</p>


In [None]:
new_file = open('example.txt', 'w')       # The file is opened and its contents are cleared.
new_file.write('Adding some new content') # This will be the one and only line in the file.
new_file.close()

# Next, read and print the file contents again.  
new_file = open('example.txt', 'r')
print(new_file.read())
new_file.close()
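
<p>The difference between the two modes can also be checked in one self-contained sketch. It works on a file in a temporary directory, so no existing <code>example.txt</code> is touched, and it uses the <code>with</code> statement, which closes each file automatically. The variable names <code>after_append</code> and <code>after_write</code> are made up for the example.</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.txt')

    with open(path, 'w') as f:      # creates the file
        f.write('first\n')

    with open(path, 'a') as f:      # keeps existing content, writes at the end
        f.write('second\n')
    with open(path) as f:
        after_append = f.read()

    with open(path, 'w') as f:      # clears the existing content first
        f.write('replaced\n')
    with open(path) as f:
        after_write = f.read()

print(repr(after_append))
print(repr(after_write))
```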


<h3 id="Practice-Exercise:-writing-to-a-file">Practice Exercise: writing to a file<a class="anchor-link" href="#Practice-Exercise:-writing-to-a-file">¶</a></h3>



<ol>
<li>Write your name and address to a file named <code>contact.txt</code>.  Once you have executed your program, open <code>contact.txt</code> to verify that its contents are what you expect.</li>
<li>Now, write a program to add your phone number to that file, using <code>open</code>'s append mode.  Again, open the file and check its contents.</li>
</ol>
