Contents
---
- [File Input](#input)
- [Delimiters](#delimiters)
- [CSV files](#csv)
- [Files in other locations](#locations)
- [File Output](#output)


File Input
---
<a class="anchor" id="input"></a>
So far, we have only taken in short words and sentences from the user. In practice, we will want to examine much larger sets of text, including Word documents, spreadsheets, and web pages. To do this, we need to learn how to tell Python to read a file.

If your Python program is in the same folder as the text file you would like to read, all we need is the open function. Suppose we want to read the lyrics in a file saved as "kanye.txt".

If we try the following code, we don't see the lyrics. open gives us a file object, and printing that object only describes it (the encoding shown may differ on your computer):

In [7]:
f = open('kanye.txt')
print(f)

<_io.TextIOWrapper name='kanye.txt' mode='r' encoding='UTF-8'>

Instead, we need to call the .read() method, which returns the whole contents of the file as one string. Remember to always put your file name in quotation marks!

In [8]:
f = open('kanye.txt').read()
print(f)

Oh, when it all, it all falls down
I'm telling you, oh, it all falls down


Oh, when it all, it all falls down
I'm telling you, oh, it all falls down


Man, I promise, she's so self conscious
She has no idea what she's doing in college
That major that she majored in don't make no money
But she won't drop out, her parents will look at her funny


Now, tell me that ain't insecure
The concept of school seems so secure
Sophomore, three years, ain't picked a career
She like, screw it, I'll just stay down here and do hair


'Cause that's enough money to buy her a few pairs of new airs
'Cause her baby daddy don't really care
She's so precious with the peer pressure
Couldn't afford a car so she named her daughter Alexus


She had hair so long that it looked like weave
Then she cut it all off now she look like Eve
And she be dealing with some issues that you can't believe
Single black female, addicted to retail and well


Oh, when it all, it all falls down
I'm telling you oh, it all falls down


If we wanted to read the first 10 lines of the song, we might try typing:

In [12]:
f = open('kanye.txt').read()
print(f[0:10])

Oh, when i


Uh-oh, this gives us the first 10 characters, not the first 10 lines. .read() is useful when you want to process the entire contents of a file at once, but because it loads the whole file into memory, it isn't a good fit for very large files. Instead, we can iterate over the file one line at a time:

In [13]:
for line in open('kanye.txt'):
    print(line)

Oh, when it all, it all falls down

I'm telling you, oh, it all falls down





Oh, when it all, it all falls down

I'm telling you, oh, it all falls down





Man, I promise, she's so self conscious

She has no idea what she's doing in college

That major that she majored in don't make no money

But she won't drop out, her parents will look at her funny





Now, tell me that ain't insecure

The concept of school seems so secure

Sophomore, three years, ain't picked a career

She like, screw it, I'll just stay down here and do hair





'Cause that's enough money to buy her a few pairs of new airs

'Cause her baby daddy don't really care

She's so precious with the peer pressure

Couldn't afford a car so she named her daughter Alexus





She had hair so long that it looked like weave

Then she cut it all off now she look like Eve

And she be dealing with some issues that you can't believe

Single black female, addicted to retail and well





Oh, when it all, it all falls down

I'm t

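There are extra blank lines between the lyrics. Each line we read still ends with a newline character ("\n"), and print adds another newline of its own. The .strip() method removes whitespace, including newlines, from both ends of a string: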

In [2]:
for line in open('kanye.txt'):
    print(line.strip())

Oh, when it all, it all falls down
I'm telling you, oh, it all falls down


Oh, when it all, it all falls down
I'm telling you, oh, it all falls down


Man, I promise, she's so self conscious
She has no idea what she's doing in college
That major that she majored in don't make no money
But she won't drop out, her parents will look at her funny


Now, tell me that ain't insecure
The concept of school seems so secure
Sophomore, three years, ain't picked a career
She like, screw it, I'll just stay down here and do hair


'Cause that's enough money to buy her a few pairs of new airs
'Cause her baby daddy don't really care
She's so precious with the peer pressure
Couldn't afford a car so she named her daughter Alexus


She had hair so long that it looked like weave
Then she cut it all off now she look like Eve
And she be dealing with some issues that you can't believe
Single black female, addicted to retail and well


Oh, when it all, it all falls down
I'm telling you oh, it all falls down
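To see exactly what .strip() removes, here is a small sketch on a made-up string instead of the file. .strip() trims whitespace from both ends; if you want to keep any indentation at the start of a line, .rstrip('\n') removes only the trailing newline:

```python
# A line as it might come out of a file: leading spaces and a trailing newline
line = "   it all falls down\n"

stripped = line.strip()          # whitespace removed from both ends
no_newline = line.rstrip('\n')   # only the trailing newline removed

print(repr(stripped))     # 'it all falls down'
print(repr(no_newline))   # '   it all falls down'
```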


Since opening "kanye.txt" succeeded, the operating system gave us a file handle. The file handle is not the actual data contained in the file; instead, it is a “handle” that we can use to read the data. You are given a handle only if the requested file exists and you have permission to read it.

If the file does not exist, open will fail with a traceback and you will not get a handle to access the contents of the file:

In [43]:
f = open('missingfile.text')

FileNotFoundError: [Errno 2] No such file or directory: 'missingfile.text'
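One way to handle a missing file without stopping the program is to catch the FileNotFoundError with try/except. This is a minimal sketch; the helper name read_if_exists is hypothetical, not something built into Python:

```python
def read_if_exists(filename):
    """Return the text of filename, or None if the file does not exist."""
    try:
        with open(filename) as f:
            return f.read()
    except FileNotFoundError:
        # open() failed, so there is no file handle to read from
        print('Could not find', filename)
        return None

contents = read_if_exists('missingfile.text')
print(contents)   # None, printed after the "Could not find" message
```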

Often, the first line of a data file contains a header, i.e., labels for the columns. For example, consider this football data:

In [7]:
for line in open('football.txt'):
    print(line.strip())

Team,Games,Wins,Losses,Draws,Goals,Goals Allowed,Points
Arsenal,38,26,9,3,79,36,87
Liverpool,38,24,8,6,67,30,80
Manchester United,38,24,5,9,87,45,77
Newcastle,38,21,8,9,74,52,71
Leeds,38,18,12,8,53,37,66
Chelsea,38,17,13,8,66,38,64
West_Ham,38,15,8,15,48,57,53
Aston_Villa,38,12,14,12,46,47,50
Tottenham,38,14,8,16,49,53,50
Blackburn,38,12,10,16,55,51,46
Southampton,38,12,9,17,46,54,45
Middlesbrough,38,12,9,17,35,47,45
Fulham,38,10,14,14,36,44,44
Charlton,38,10,14,14,38,49,44
Everton,38,11,10,17,45,57,43
Bolton,38,9,13,16,44,62,40
Sunderland,38,10,10,18,29,51,40
Ipswich,38,9,9,20,41,64,36
Derby,38,8,6,24,33,63,30
Leicester,38,5,13,20,30,64,28


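If we want to skip the header row, we can call next() on the file handle, which reads one line and throws it away. Here we also open the file with a with statement, which closes the file automatically when the indented block ends: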

In [6]:
with open('football.txt') as f:
    next(f)
    for line in f:
        print(line.strip())

Arsenal,38,26,9,3,79,36,87
Liverpool,38,24,8,6,67,30,80
Manchester United,38,24,5,9,87,45,77
Newcastle,38,21,8,9,74,52,71
Leeds,38,18,12,8,53,37,66
Chelsea,38,17,13,8,66,38,64
West_Ham,38,15,8,15,48,57,53
Aston_Villa,38,12,14,12,46,47,50
Tottenham,38,14,8,16,49,53,50
Blackburn,38,12,10,16,55,51,46
Southampton,38,12,9,17,46,54,45
Middlesbrough,38,12,9,17,35,47,45
Fulham,38,10,14,14,36,44,44
Charlton,38,10,14,14,38,49,44
Everton,38,11,10,17,45,57,43
Bolton,38,9,13,16,44,62,40
Sunderland,38,10,10,18,29,51,40
Ipswich,38,9,9,20,41,64,36
Derby,38,8,6,24,33,63,30
Leicester,38,5,13,20,30,64,28


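If you want to keep the header for later, read it with .readline() before the loop. Note that the saved header still ends with a newline character, which is why blank lines follow it in the output: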

In [10]:
with open('football.txt') as f:
    header = f.readline()
    for line in f:
        print(line.strip())
    print('header:', header)

Arsenal,38,26,9,3,79,36,87
Liverpool,38,24,8,6,67,30,80
Manchester United,38,24,5,9,87,45,77
Newcastle,38,21,8,9,74,52,71
Leeds,38,18,12,8,53,37,66
Chelsea,38,17,13,8,66,38,64
West_Ham,38,15,8,15,48,57,53
Aston_Villa,38,12,14,12,46,47,50
Tottenham,38,14,8,16,49,53,50
Blackburn,38,12,10,16,55,51,46
Southampton,38,12,9,17,46,54,45
Middlesbrough,38,12,9,17,35,47,45
Fulham,38,10,14,14,36,44,44
Charlton,38,10,14,14,38,49,44
Everton,38,11,10,17,45,57,43
Bolton,38,9,13,16,44,62,40
Sunderland,38,10,10,18,29,51,40
Ipswich,38,9,9,20,41,64,36
Derby,38,8,6,24,33,63,30
Leicester,38,5,13,20,30,64,28
header: Team,Games,Wins,Losses,Draws,Goals,Goals Allowed,Points
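The header saved by .readline() is a single string that still ends in "\n". Here is a sketch of turning it into a list of column names. To keep it self-contained, io.StringIO (an in-memory stand-in for a file) holds the first three lines of football.txt in place of open('football.txt'):

```python
import io

# io.StringIO stands in for open('football.txt'); it holds only the first lines
sample = io.StringIO(
    "Team,Games,Wins,Losses,Draws,Goals,Goals Allowed,Points\n"
    "Arsenal,38,26,9,3,79,36,87\n"
    "Liverpool,38,24,8,6,67,30,80\n"
)

with sample as f:
    header = f.readline()
    columns = header.strip().split(',')   # drop the newline, then split on commas
    rows = []
    for line in f:
        rows.append(line.strip().split(','))

print(columns)   # ['Team', 'Games', 'Wins', ..., 'Goals Allowed', 'Points']
print(rows[0])   # ['Arsenal', '38', '26', '9', '3', '79', '36', '87']
```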



Suppose we want to count how many lines "falls" appears on. In this case, we break each line into a list of words with .split() and check whether "falls" is in that list:

In [32]:
count = 0
for line in open('kanye.txt'):   
    words = line.strip().split()
    if 'falls' in words:
        count = count +1
        
print(count)

16
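Strictly speaking, the loop above counts lines that contain "falls", so a line with "falls" twice adds only 1. To count every occurrence, a list's .count() method can add up the matches within each line. A small sketch on hard-coded lines; the second line is invented for illustration:

```python
lines = [
    "Oh, when it all, it all falls down\n",
    "it all falls down, falls down\n",   # made-up line with "falls" twice
]

lines_with_falls = 0
occurrences = 0
for line in lines:
    words = line.strip().split()
    if 'falls' in words:
        lines_with_falls += 1            # at most 1 per line
    occurrences += words.count('falls')  # every match in the line

print(lines_with_falls, occurrences)   # 2 3
```

Words with punctuation attached, such as "falls,", would still not match, because .split() only splits on whitespace.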


Suppose you wanted to print which lines of the file "falls" appears on. For that, we can use the enumerate function. enumerate loops over the items of a list (or any other iterable, such as a file) and hands you each item together with its index, starting from 0. Let's try an easier example first. Say I had a list of colors and wanted to print each color with its index on a separate line. I would type:

In [3]:
colors = ['red', 'blue', 'yellow', 'blue', 'green']
for index, color in enumerate(colors):
    print(index, color)

0 red
1 blue
2 yellow
3 blue
4 green


Now, we can use enumerate to print the lines that "falls" is on:

In [5]:
count = 0
for index, line in enumerate(open('kanye.txt')):   
    words = line.strip().split()
    if 'falls' in words:
        count = count +1
        print('falls is on line', index)
print('count:', count)


falls is on line 0
falls is on line 1
falls is on line 4
falls is on line 5
falls is on line 32
falls is on line 33
falls is on line 60
falls is on line 61
falls is on line 88
falls is on line 89
falls is on line 90
falls is on line 91
falls is on line 92
falls is on line 93
falls is on line 94
falls is on line 95
count: 16
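Because enumerate starts counting at 0, "line 0" above is the first line of the file. If you want line numbers that match a text editor, which starts at 1, enumerate accepts a start argument. A sketch using a short list of lines in place of the file:

```python
lines = [
    "Oh, when it all, it all falls down\n",
    "I'm telling you, oh, it all falls down\n",
    "\n",
    "Man, I promise, she's so self conscious\n",
]

found = []
for number, line in enumerate(lines, start=1):   # count from 1 instead of 0
    if 'falls' in line.split():
        found.append(number)
        print('falls is on line', number)
```

With the real file, the loop header would be for index, line in enumerate(open('kanye.txt'), start=1):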


Okay, let's put everything together. Suppose we want to break the kanye file up into words, build a dictionary of the words and their frequencies, and then print the words in descending order of frequency. We store each pair as a (count, word) tuple so that sorting the list sorts by count. Let's do it:

In [20]:
#create the dictionary of words and frequencies:
word_dict = {}
for line in open('kanye.txt'):
    for word in line.split():
        if word in word_dict:
            word_dict[word] = word_dict[word] + 1
        else:
            word_dict[word] = 1

#create a list to sort the words
word_list = []
for key,val in word_dict.items():
    word_list.append((val,key))

word_list.sort(reverse = True)

for count, word in word_list:
    print(count, word)

29 it
20 all
17 down
16 falls
16 a
16 I
10 I'm
9 to
9 the
8 when
8 telling
8 buy
8 and
8 all,
8 Oh,
7 you,
6 you
6 with
6 that
6 so
6 she
6 of
6 like
5 we
5 her
5 can't
5 'Cause
4 they
4 self
4 ohh,
4 oh,
4 in
4 ain't
4 We
4 And
3 up
3 this
3 that's
3 on
3 me
3 look
3 got
3 get
3 even
3 don't
3 do
3 be
3 The
3 She
2 won't
2 why
2 wanna
2 us
2 thou
2 some
2 she's
2 seems
2 really
2 promise,
2 people
2 our
2 off
2 no
2 money
2 just
2 it,
2 how
2 hate
2 hair
2 had
2 f
2 conscious
2 but
2 at
2 act
2 Then
2 That's
2 But
2 'em
1 years,
1 workin'
1 will
1 white
1 what's
1 what
1 went
1 well
1 weave
1 wealth
1 way
1 watches
1 us,
1 ugliest
1 trying
1 treat
1 three
1 things
1 than
1 terrific
1 tell
1 team
1 store
1 still
1 stay
1 spent
1 spending
1 specific
1 slave
1 shorty's
1 shirt
1 ship)
1 shine
1 shift
1 see
1 secure
1 screw
1 school
1 say
1 road
1 rings
1 riches
1 retail
1 pushing
1 pronounce
1 problem
1 prettiest
1 pressure
1 precious
1 police,
1 picked
1 peer
1 past
1 pass
1 parents
1 p
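The standard library's collections.Counter can do the counting for you, and its most_common() method returns (word, count) pairs from most to least frequent. This is an alternative to the dictionary-and-sort approach above, shown on a short made-up string; with the real file you would pass it open('kanye.txt').read().split():

```python
from collections import Counter

text = "it all falls down it all falls down it all"
word_counts = Counter(text.split())   # maps each word to how often it appears

for word, count in word_counts.most_common(3):   # the 3 most frequent words
    print(count, word)
```

Counter breaks ties by the order in which words were first seen, so tied words may come out in a different order than with word_list.sort(reverse=True).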

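What happens if we had created the list of tuples in the order (key, val) instead of (val, key)? Python sorts tuples by their first element, so the words would be sorted instead of the counts. Because we pass reverse=True, the words come out in reverse alphabetical order. (Python compares strings by character code, so in this order all lowercase words come before capitalized ones.)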

In [21]:
#create the dictionary of words and frequencies:
word_dict = {}
for line in open('kanye.txt'):
    for word in line.split():
        if word in word_dict:
            word_dict[word] = word_dict[word] + 1
        else:
            word_dict[word] = 1

#create a list of (word, count) tuples this time
word_list = []
for key,val in word_dict.items():
    word_list.append((key,val))

word_list.sort(reverse = True)

for word, count in word_list:
    print(word, count)
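The sorting in these examples relies on how Python compares tuples: the first elements decide, and the second elements are only checked when the first ones are equal. If you'd rather keep (word, count) order and still sort by count, sorted() takes a key function (here a short lambda). A sketch on a small hypothetical dictionary:

```python
word_dict = {'it': 3, 'all': 3, 'down': 2, 'falls': 2}

# Tuples compare by first element; ties fall back to the second element
pairs = sorted([(2, 'down'), (3, 'it'), (2, 'falls'), (3, 'all')])
print(pairs)      # [(2, 'down'), (2, 'falls'), (3, 'all'), (3, 'it')]

# Sort (word, count) pairs by their count, highest first
by_count = sorted(word_dict.items(), key=lambda pair: pair[1], reverse=True)
print(by_count)   # [('it', 3), ('all', 3), ('down', 2), ('falls', 2)]
```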

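What happens if we had forgotten .split()? Looping over a string gives one character at a time, so we would have counted characters (including spaces, newline characters, and punctuation) instead of words. The blank-looking entries below are the space character and the newline character: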

In [22]:
#create the dictionary of words and frequencies:
word_dict = {}
for line in open('kanye.txt'):
    for word in line:
        if word in word_dict:
            word_dict[word] = word_dict[word] + 1
        else:
            word_dict[word] = 1

#create a list to sort the words
word_list = []
for key,val in word_dict.items():
    word_list.append((val,key))

word_list.sort(reverse = True)

for count, word in word_list:
    print(count, word)

524  
238 e
189 a
179 t
177 l
168 o
142 s
131 n
131 h
129 i
100 

81 r
75 d
71 u
63 w
55 c
53 '
52 ,
49 f
44 y
40 m
39 g
36 p
30 I
27 b
21 k
13 v
10 T
8 O
7 W
6 S
6 C
6 A
5 j
5 B
4 0
3 z
3 J
2 P
2 M
2 E
2 4
2 "
1 x
1 V
1 R
1 N
1 F
1 D
1 ?
1 6
1 5
1 2
1 1
1 )
1 (


### Exercise - Dance
Write a program that reads the file dance.txt. It should print the words in the song and how often each one is used, ordered from least to most frequent.

In [60]:
#insert dance code

### Exercise - Dance again
Print out the words and their counts in alphabetically descending order.

In [61]:
#insert dance again code

### Exercise - Dance again again
Write a program that prints out the lines that contain the word "clean".

In [6]:
#insert dance again again code

### Exercise Challenge: - kanye
Suppose we wanted to create a program that doesn't check whether "kanye" is a word in the line but checks whether the letters in that line could form the word "Kanye" (ignoring case). For example, the line "But she won't drop out, her parents will look at her funny" contains the letters k, a, n, y, and e. Write a program that prints the locations (line numbers) of the lines that can form the word "Kanye."

In [None]:
#insert kanye

### Exercise Challenge: - falls
Suppose we wanted to create a program that doesn't check whether "falls" is a word in the line but checks whether the letters in that line could form the word "falls" (ignoring case). For example, the line "Single black female, addicted to retail and well" contains the letters f, a, l, l, and s. Write a program that prints the locations (line numbers) of the lines that can form the word "falls."

In [None]:
#insert falls

Delimiters
---
<a class="anchor" id="delimiters"></a>

Up until now, we have mostly been breaking sentences into words using .split() with empty parentheses, which splits at every run of whitespace. However, we can pass .split() any delimiter we want. Consider, for example, the students.txt file:

In [34]:
for line in open('students.txt'):
    print(line.strip())

Jane Doe, 2000, 101 Main St
John Doe, 2001, 123 Oak St
Ann Ko, 1999, 57 Tree St
Paul Smith, 2000, 60 Spring St
Sarah McDonald, 2001, 101 MLK Blvd


Here we want to split each line at the commas, so we pass ',' to .split():

In [41]:
for line in open('students.txt'):
    words = line.strip().split(',')
    print(words)

['Jane Doe', ' 2000', ' 101 Main St']
['John Doe', ' 2001', ' 123 Oak St']
['Ann Ko', ' 1999', ' 57 Tree St']
['Paul Smith', ' 2000', ' 60 Spring St']
['Sarah McDonald', ' 2001', ' 101 MLK Blvd']


We could then keep separate name, birth year, and address lists by typing:


In [42]:
name_list = []
birth_year_list = []
address_list = []
for line in open('students.txt'):
    words = line.strip().split(',')
    name_list.append(words[0])
    birth_year_list.append(words[1])
    address_list.append(words[2])
print(name_list)
print(birth_year_list)
print(address_list)

['Jane Doe', 'John Doe', 'Ann Ko', 'Paul Smith', 'Sarah McDonald']
[' 2000', ' 2001', ' 1999', ' 2000', ' 2001']
[' 101 Main St', ' 123 Oak St', ' 57 Tree St', ' 60 Spring St', ' 101 MLK Blvd']
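The fields above still carry the space that followed each comma (' 2000'), and the birth years are strings, not numbers. A sketch that strips each field and converts the year with int(), using two lines of students.txt as a list in place of the file:

```python
lines = [
    "Jane Doe, 2000, 101 Main St\n",
    "John Doe, 2001, 123 Oak St\n",
]

name_list = []
birth_year_list = []
address_list = []
for line in lines:
    fields = line.split(',')
    name_list.append(fields[0].strip())
    birth_year_list.append(int(fields[1]))   # int() ignores surrounding spaces
    address_list.append(fields[2].strip())   # also removes the trailing newline

print(name_list)         # ['Jane Doe', 'John Doe']
print(birth_year_list)   # [2000, 2001]
print(address_list)      # ['101 Main St', '123 Oak St']
```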


When searching through a file, we may only care about lines that begin with certain strings. For example, consider this file:

In [45]:
for line in open('mailbox.txt'):
    print(line.strip())

From: janedoe@gmail.com Sat Jan 5 2008
To: jackdoe@aol.com
Subject: Saturday Party

From: annesmith@gmail.com Sun Jan 6 2008
To: bobpaul@amazon.com
Subject: I’m mad at you

From: jackmac@mac.com Mon Jan 7 2008
To: catdancy@gmail.com
Subject: Not safe for work

From: paullauren@gmail.com Tues Jan 9 2008
To: mikejoy@aol.com
Subject: For sale


Suppose we only wanted to save the email addresses of the people who sent the emails (the From: lines). First, we can find those lines with the string method .startswith():

In [47]:
for line in open('mailbox.txt'):
    if line.startswith('From:'):
        print(line)

From: janedoe@gmail.com Sat Jan 5 2008

From: annesmith@gmail.com Sun Jan 6 2008

From: jackmac@mac.com Mon Jan 7 2008

From: paullauren@gmail.com Tues Jan 9 2008



Then, we could save those email addresses by splitting each From: line into words and keeping the second word (index 1):

In [50]:
names=[]
for line in open('mailbox.txt'):
    if line.startswith('From:'):
        words = line.split()
        names.append(words[1])
print(names)

['janedoe@gmail.com', 'annesmith@gmail.com', 'jackmac@mac.com', 'paullauren@gmail.com']


Or more succinctly:

In [54]:
names=[]
for line in open('mailbox.txt'):
    if line.startswith('From:'):
        names.append(line.split()[1])
print(names)

['janedoe@gmail.com', 'annesmith@gmail.com', 'jackmac@mac.com', 'paullauren@gmail.com']


Suppose you only wanted to save the username, the part of the email address to the left of the @ sign. You could split on a delimiter again, this time '@':

In [59]:
for line in open('mailbox.txt'):
    if line.startswith('From:'):
        words = line.split()[1].split('@')
        print(words[0])

janedoe
annesmith
jackmac
paullauren


Or more succinctly:

In [57]:
for line in open('mailbox.txt'):
    if line.startswith('From:'):
        print(line.split()[1].split('@')[0])

janedoe
annesmith
jackmac
paullauren


### Exercise - sports
Open the file sports.txt. Break up each line by the delimiter "-". Then print a list of what each student plays in the spring. For example, each line should say something like "Brenda plays track in the spring."

In [62]:
#insert sports code

### Exercise - sports again
In the sports file, break up each line by the delimiter "-". Then, use another delimiter to get the sports unattached from the seasons. Print a count of how many students are enrolled in each sport. For example, print something like "Two students play track."

In [63]:
#insert sports again code

### Exercise - Football
Break up the football.txt data using comma delimiters. Store the team names in a list.

In [11]:
#insert football

### Exercise - Football 2
Store the football.txt data as a list of lists, one list per team. Then, print the team that has the smallest absolute difference between its goals and goals allowed. (Hint: the answer should be Aston Villa.)

In [1]:
#insert football 2


CSV Files
---
<a class="anchor" id="csv"></a>
The CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. For example, you can always export an Excel or Google sheet in CSV format. We can read CSV files with Python's built-in csv module:

In [1]:
import csv

with open('faculty.csv') as csvfile:
    data = csv.reader(csvfile, delimiter=',')
    for row in data:
        print(row)
            

['name', ' degree', ' title', ' email']
['Scarlett L. Bellamy', ' Sc.D.', 'Associate Professor of Biostatistics', 'bellamys@mail.med.upenn.edu']
['Warren B. Bilker', 'Ph.D.', 'Professor of Biostatistics', 'warren@upenn.edu']
['Matthew W Bryan', ' PhD', 'Assistant Professor of Biostatistics', 'bryanma@upenn.edu']
['Jinbo Chen', ' Ph.D.', 'Associate Professor of Biostatistics', 'jinboche@upenn.edu']
['Susan S Ellenberg', ' Ph.D.', 'Professor of Biostatistics', 'sellenbe@upenn.edu']
['Jonas H. Ellenberg', ' Ph.D.', 'Professor of Biostatistics', 'jellenbe@mail.med.upenn.edu']
['Rui Feng', ' Ph.D', 'Assistant Professor of Biostatistics', 'ruifeng@upenn.edu']
['Benjamin C. French', ' PhD', 'Associate Professor of Biostatistics', 'bcfrench@mail.med.upenn.edu']
['Phyllis A. Gimotty', ' Ph.D', 'Professor of Biostatistics', 'pgimotty@upenn.edu']
['Wensheng Guo', ' Ph.D', 'Professor of Biostatistics', 'wguo@mail.med.upenn.edu']
['Yenchih Hsu', ' Ph.D.', 'Assistant Professor of Biostatistics', 'hs

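Notice that the first row is a header. To skip it, call next() on the reader before the loop. The second argument, None, is a default value, so next() returns None instead of raising an error if the file is empty: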

In [2]:
import csv

with open('faculty.csv') as csvfile:
    data = csv.reader(csvfile, delimiter=',')
    next(data, None)
    for row in data:
        print(row)
            

['Scarlett L. Bellamy', ' Sc.D.', 'Associate Professor of Biostatistics', 'bellamys@mail.med.upenn.edu']
['Warren B. Bilker', 'Ph.D.', 'Professor of Biostatistics', 'warren@upenn.edu']
['Matthew W Bryan', ' PhD', 'Assistant Professor of Biostatistics', 'bryanma@upenn.edu']
['Jinbo Chen', ' Ph.D.', 'Associate Professor of Biostatistics', 'jinboche@upenn.edu']
['Susan S Ellenberg', ' Ph.D.', 'Professor of Biostatistics', 'sellenbe@upenn.edu']
['Jonas H. Ellenberg', ' Ph.D.', 'Professor of Biostatistics', 'jellenbe@mail.med.upenn.edu']
['Rui Feng', ' Ph.D', 'Assistant Professor of Biostatistics', 'ruifeng@upenn.edu']
['Benjamin C. French', ' PhD', 'Associate Professor of Biostatistics', 'bcfrench@mail.med.upenn.edu']
['Phyllis A. Gimotty', ' Ph.D', 'Professor of Biostatistics', 'pgimotty@upenn.edu']
['Wensheng Guo', ' Ph.D', 'Professor of Biostatistics', 'wguo@mail.med.upenn.edu']
['Yenchih Hsu', ' Ph.D.', 'Assistant Professor of Biostatistics', 'hsu9@mail.med.upenn.edu']
['Rebecca A Hubb
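Another option is csv.DictReader, which reads the header row itself and returns each following row as a dictionary keyed by the column names, so no next() call is needed. Since the header in faculty.csv has spaces after its commas, this sketch also passes skipinitialspace=True. io.StringIO holds two sample rows, modeled on the output above, in place of the real file:

```python
import csv
import io

# io.StringIO stands in for open('faculty.csv', newline='')
sample = io.StringIO(
    "name, degree, title, email\n"
    "Scarlett L. Bellamy, Sc.D.,Associate Professor of Biostatistics,bellamys@mail.med.upenn.edu\n"
    "Warren B. Bilker,Ph.D.,Professor of Biostatistics,warren@upenn.edu\n"
)

# skipinitialspace=True drops spaces right after each comma (' degree' -> 'degree')
reader = csv.DictReader(sample, skipinitialspace=True)
rows = list(reader)   # each row is a dict keyed by the header names

for row in rows:
    print(row['name'], '|', row['degree'], '|', row['email'])
```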

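To write to a CSV file, we use csv.writer. Open the file with newline='', as the csv module documentation recommends, so the writer controls the line endings itself:

In [5]:
import csv

emails = ['janedoe@gmail.com', 'jackdoe@amazon.com', 'sallysmith@aol.com']
with open('emails.csv', 'w', newline='') as csvfile:
    myfile = csv.writer(csvfile)
    myfile.writerow(['list_of_emails'])   # header row
    for email in emails:
        myfile.writerow([email])          # one email per row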



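Notice that we put each string we wanted to write inside brackets, making it a one-item list. writerow expects a list of fields; if you pass a bare string, it treats the string as a sequence of characters and writes each letter as a separate field, separated by commas.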
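A quick way to see the difference is to write to an in-memory io.StringIO buffer instead of a file. (By default, csv.writer ends each row with "\r\n", which is one reason the csv documentation recommends opening files with newline=''.)

```python
import csv
import io

buffer = io.StringIO()   # collects the CSV text in memory instead of a file
writer = csv.writer(buffer)

writer.writerow('jane')     # a bare string: each character becomes a field
writer.writerow(['jane'])   # a one-item list: the whole string is one field

print(repr(buffer.getvalue()))   # 'j,a,n,e\r\njane\r\n'
```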

### Exercise - Degrees

Write a program that reads in faculty.csv and creates a dictionary of each degree (standardized to not include periods) and its count. Your program should print: {'0': 1, 'PhD': 31, 'MPH': 2, 'ScD': 6, 'MS': 2, 'BSEd': 1, 'MA': 1, 'MD': 1, 'JD': 1}

In [None]:
# insert degrees

### Exercise - Title
Write a program that reads in faculty.csv and creates a dictionary of each title and its count (be careful to first account for a typo in the csv file). Your program should print: {'Professor of Biostatistics': 13, 'Assistant Professor of Biostatistics': 12, 'Associate Professor of Biostatistics': 12}

In [None]:
#insert title

### Exercise - email
Write a program that reads in faculty.csv and creates a set of the unique domain names (the part after the "@" symbol in the email address). Your program should print: {'cceb.med.upenn.edu', 'email.chop.edu', 'upenn.edu', 'mail.med.upenn.edu'}

In [None]:
#insert email

### Exercise - last name
Write a program that reads in faculty.csv and creates a dictionary such that the key is the last name and the value is a list of the degree, title, and email. Be careful to account for duplicate last names. For example:

'Bellamy': [[' Sc.D.', 'Associate Professor of Biostatistics', 'bellamys@mail.med.upenn.edu']]

and

'Li': [[' Ph.D.', 'Assistant Professor of Biostatistics', 'liy3@email.chop.edu'], [' Ph.D.', 'Associate Professor of Biostatistics', 'mingyao@mail.med.upenn.edu'], [' Ph.D', 'Professor of Biostatistics', 'hongzhe@upenn.edu']]

In [None]:
#insert last name

### Exercise - tuple
Write a program that reads in faculty.csv and creates a dictionary such that the key is a tuple of the parts of the name and the value is a list of the degree, title, and email. You can assume each name tuple is unique. For example:

('Benjamin', 'C.', 'French'): [' PhD',
  'Associate Professor of Biostatistics',
  'bcfrench@mail.med.upenn.edu']
  
  
 ('Dawei', 'Xie'): [' PhD',
  'Assistant Professor of Biostatistics',
  'dxie@upenn.edu']

In [None]:
#insert tuple

### Exercise - write names
Write a program that reads in faculty.csv and writes the names of the professors to a file called names.csv. Include a header so that the first line says "Professor Names".

In [None]:
#insert write names

Files in Other Locations
---
<a class="anchor" id="locations"></a>

We have been reading files that are located in the same directory as this program. To read a file somewhere else, we need to give its full path.

On a Mac, if your username was janedoe and you wanted to print a "hello.txt" file located in your Documents folder, you would type:

In [None]:
for line in open('/Users/janedoe/Documents/hello.txt'):
    print(line)

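On Windows, you would type the following. Python accepts forward slashes in Windows paths; if you use backslashes instead, write the path as a raw string, such as r'C:\Users\janedoe\Documents\hello.txt', so that sequences like \U aren't treated as escape codes: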

In [None]:
for line in open('C:/Users/janedoe/Documents/hello.txt'):
    print(line)

Unless you actually have a "hello.txt" file in that folder, both of these will fail with a FileNotFoundError.

We'll learn better ways of building file paths when we cover the os package in later units.
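As a preview, Python's standard library also includes pathlib, which builds paths that work on any operating system. A sketch, assuming the same hello.txt in your Documents folder; it checks that the file exists before opening it:

```python
from pathlib import Path

# Path.home() is the current user's home folder (for example /Users/janedoe
# on a Mac or C:\Users\janedoe on Windows); / joins the path pieces
hello_path = Path.home() / 'Documents' / 'hello.txt'

if hello_path.exists():
    for line in open(hello_path):   # open() accepts Path objects
        print(line.strip())
else:
    print('No file at', hello_path)
```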

File Output
---
<a class="anchor" id="output"></a>

To write a file, you have to open it with mode 'w' as a second parameter:

In [75]:
fout = open('output.txt', 'w')
print(fout)

<_io.TextIOWrapper name='output.txt' mode='w' encoding='UTF-8'>


If the file already exists, opening it in write mode clears out the old data and starts fresh, so be careful! If the file doesn’t exist, a new one is created.

The write method of the file handle object puts data into the file. The file object keeps track of where it is, so if you call write again, it adds the new data to the end.

When you are done writing, you have to close the file to make sure that the last bit of data is physically written to the disk so it will not be lost if the power goes off.

In [76]:
fout = open('output.txt', 'w')
line1 = "Oh, when it all, it all falls down,\n"
fout.write(line1)

line2 = "I'm telling you, oh, it all falls down,\n"
fout.write(line2)
fout.close()

Note that in the code above we had to add newline characters ourselves. The print function automatically appends a newline, but the write method does not. If you want each sentence on its own line, you need to add "\n" to the end of it.

To make sure that we actually wrote to that file, we can open it and read it:


In [77]:
for line in open('output.txt'):
    print(line)

Oh, when it all, it all falls down,

I'm telling you, oh, it all falls down,
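Because it's easy to forget close(), you can also open the file for writing in a with statement, which closes the file automatically when the indented block ends, even if an error occurs inside it. A sketch that writes the same two lines:

```python
with open('output.txt', 'w') as fout:
    fout.write("Oh, when it all, it all falls down,\n")
    fout.write("I'm telling you, oh, it all falls down,\n")

print(fout.closed)   # True: the file was closed when the block ended

with open('output.txt') as fin:
    text = fin.read()
print(text)
```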



If we want to clear that file and write the numbers 1 through 10 instead, we can type:

In [82]:
fout = open('output.txt', 'w')
for i in range(1,11):
    fout.write('Number:'+str(i)+'\n')
fout.close()

Once again, let's check our work:

In [83]:
for line in open('output.txt'):
    print(line)

Number:1

Number:2

Number:3

Number:4

Number:5

Number:6

Number:7

Number:8

Number:9

Number:10



Note: we joined the pieces with plus signs instead of commas. write() takes exactly one string argument, unlike print(), which accepts several comma-separated arguments:

In [84]:
fout = open('output.txt', 'w')
for i in range(1,11):
    fout.write('Number:',str(i),'\n')
fout.close()

TypeError: write() takes exactly one argument (3 given)
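Two other ways to write these lines, as a sketch: an f-string builds each line as one string, and print() accepts a file argument that sends its output to an open file instead of the screen (sep='' removes the space print normally puts between arguments):

```python
# Option 1: build one string with an f-string, then write it
with open('output.txt', 'w') as fout:
    for i in range(1, 11):
        fout.write(f'Number:{i}\n')

# Option 2: let print() join the pieces and add the newline; file= picks the file
with open('output.txt', 'w') as fout:
    for i in range(1, 11):
        print('Number:', i, sep='', file=fout)

with open('output.txt') as fin:
    lines = fin.read().splitlines()
print(lines[0], lines[-1])   # Number:1 Number:10
```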

In [8]:
import os

filename = input('What file do you want to add words to? ')
say = input('What do you want to say? ')
data = []

if os.path.exists(filename):
    with open(filename, 'a') as f:
        f.write(say+'\n')    
else:
    f = open(filename, 'w')
    f.write(say+'\n')

What file do you want to add words to? journal.txt
What do you want to say? Hi! This is my second one.


### Exercise - multiples
Write a program that stores the first 100 multiples of 7, each on a different line, in a file called 7.

In [85]:
#insert multiples

### Exercise - kanye
Write a program that calculates Kanye's most used words in kanye.txt and prints them to a file, in decending order of frequency.


In [86]:
#insert kanye

Suppose we wanted to ask the user for a file. If that file already exists, we want to add the user's words to the file. If the file doesn't exist, we want to create a new one. We would need to import the os package if using a Mac. We'll get more into importing packages later. For now, let's just use it:
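As it happens, mode 'a' also creates the file if it doesn't exist yet, so the os.path.exists check in the In [8] cell is there mainly to show how to test for a file. This may be handy for the calendar app exercise. A small sketch; append_demo.txt is a hypothetical file name chosen so the sketch doesn't touch a real journal:

```python
import os

filename = 'append_demo.txt'   # hypothetical file used only for this demo

# Remove any leftover copy so the result below is predictable
if os.path.exists(filename):
    os.remove(filename)

# Mode 'a' creates the missing file, and each write goes to the end
with open(filename, 'a') as f:
    f.write('first entry\n')
with open(filename, 'a') as f:
    f.write('second entry\n')

with open(filename) as f:
    entries = f.read().splitlines()
print(entries)   # ['first entry', 'second entry']
```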

### Exercise - calendar app
Write a program that takes in a date in the form "MM-DD-YY" and a reminder for that day. For example, a user might input "09-27-17" and "Get Lauren a Birthday Present." The program should add this information to the file calendar.txt each time the user calls the program. One caveat: if the user enters a date that is already in the file, then that reminder should be inserted in the right spot, rather than at the end of the file. For example, two reminders for the date "09-27-17" should be next to each other. You don't need to worry about putting all of the dates in chronological order just yet. We'll do that in another program later when we get to the datetime module.

In [None]:
#insert calendar app