# Lecture 9: Files & List Comprehensions

### Jeannie Albrecht and Shikha Singh

In the last few lectures, we focused on **sequences (strings, lists, ranges)** and how to iterate over them using loops.  

Today we will look at how to read from files, and store the contents as a string or list. 

We will also look at some code patterns (mapping and filtering) involving lists and use **list comprehensions** for them.

## Reading from a File

We can read from and write to a file using Python commands. Today we will only focus on reading from a file. Next week, we will look at how to write to a file.  

### Opening a file
A file is an object that is created by the built-in function `open()`.

In [1]:
book = open('textfiles/prideandprejudice.txt', 'r') # 'r' means open the file for reading

In [2]:
book  

<_io.TextIOWrapper name='textfiles/prideandprejudice.txt' mode='r' encoding='UTF-8'>

In [3]:
type(book) 

_io.TextIOWrapper

**Mode.** The mode `'r'` for reading the file is the default (and optional). So, we can just write:

In [4]:
book = open('textfiles/prideandprejudice.txt') 

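**Aside.** Text files are stored using an encoding. If we do not specify one, Python uses a platform-dependent default; on this system it is UTF-8, as the `encoding='UTF-8'` in the output above shows. UTF-8 is a superset of ASCII. We will discuss ASCII encodings when we talk about sorting in the coming weeks. 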
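
A hedged sketch of passing the `encoding` argument to `open()` explicitly, so that the result does not depend on any default. The file `accents.txt` is a throwaway file created only for this illustration (the write step is setup, not part of today's material).

```python
import os
import tempfile

# Setup only: create a small throwaway file containing a non-ASCII character.
path = os.path.join(tempfile.mkdtemp(), 'accents.txt')
with open(path, 'w', encoding='utf-8') as out:
    out.write('caf\u00e9\n')   # '\u00e9' is the letter e with an acute accent

# Read it back, stating the encoding explicitly.
with open(path, encoding='utf-8') as f:
    text = f.read().strip()

print(len(text))   # 4 characters, even though the file holds 5 bytes of text
```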

### With... as block and iterating over files

**With block to open and close files.**   Technically, when you open a file, you must also close it. To avoid writing code that explicitly closes the file, we will use the `with...as` block, which keeps the file open inside the block and closes it automatically when the block ends.

Within a `with...as` block, we can iterate over the lines of a file the same way we would iterate over any sequence.
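
As a minimal sketch of the difference, the example below reads a file once with an explicit `close()` and once with a `with...as` block. The file `sample.txt` and the names `sample`, `firstLine`, and `lines` are made up for this illustration; the write step is only setup.

```python
import os
import tempfile

# Setup only: create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as out:
    out.write('first line\nsecond line\n')

# Explicit version: we must remember to call close() ourselves.
f = open(path)
firstLine = f.readline()
f.close()

# with...as version: the file is closed automatically when the block ends.
lines = []
with open(path) as sample:
    for line in sample:
        lines.append(line.strip())

print(firstLine.strip())   # first line
print(lines)               # ['first line', 'second line']
print(sample.closed)       # True
```

The attribute `closed` reports whether a file object has been closed, which lets us confirm that the `with...as` block did the cleanup for us.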

In [None]:
with open('textfiles/prideandprejudice.txt') as book:
    for line in book:
        print(line.strip())

In [None]:
with open('textfiles/prideandprejudice.txt') as book: 
    for line in book:
        print(line.strip().split()) 

In [5]:
wordList = []
with open('textfiles/prideandprejudice.txt') as book:  
    for line in book:
        wordList.extend(line.strip().split())
#wordList

In [6]:
len(wordList)

122089

In [7]:
# number of times a word is in the book?
wordList.count('love')

91

In [8]:
wordList.count('dear')

158

In [9]:
wordList.count('the')

4331
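
Note that `count()` matches exactly: `'Love'`, `'love,'`, and `'love'` are three different strings. The sketch below shows one possible way to normalize words before counting, using two made-up lines in place of the book; the names `sampleLines`, `rawWords`, and `cleanWords` are assumptions for this illustration.

```python
import string

# A tiny stand-in for the book; the real text is not used here.
sampleLines = ['Love is patient, love is kind.\n', 'I LOVE Python!\n']

rawWords = []
cleanWords = []
for line in sampleLines:
    for word in line.strip().split():
        rawWords.append(word)
        # lowercase the word and trim punctuation from both ends
        cleanWords.append(word.lower().strip(string.punctuation))

print(rawWords.count('love'))     # 1
print(cleanWords.count('love'))   # 3
```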

## Reading in Student Names

The names of the students in this class are stored in text files in the directory `textfiles`; the 9 am section is in `classNames02.txt`. 

In [10]:
# let's try the same example again with .strip()
filename = 'textfiles/classNames02.txt' # 9 am section
with open(filename) as roster:  #  roster: name of file object
    for line in roster:
        print(line.strip())
# file is implicitly closed here

Appiah,Tazmin
Bliven,Jocelyn M.
Brockman,Annika M.
Byun,Justin
Cao,Yaoyue
Casenave Barranguet,Lola
Catlin,Tucker R.
Chong,Andrew H.
Coady,Tendai T.
Cueto,Trishia
Dabinett,Olivia A.
Dohr,Peter R.
Garrity-Rokous,Eamon
George,Tyler J.
Guzman Vazquez,Jose
Horvat,Aleksander
Jaroker,Brenda
Jeon,Ethan
Kang,Jeeyon
Kucharski,Connor J.
Laalai,Elyes
Laws,Matt D.
Lee,Jack J.
Li,Jimmy C.
Ohl,Madeline A.
Patrick,Jonathan C.
Pham,Aiden
Pham,True L.
Phan,Sam M.
Hidalgo, Emily
Quasney,Ari J.
Ren,Jianing
Sasaki,Sammy W.
Song,Will B.
Venci,Charlie P.
Widman,Sylvana L.


### Collecting names in a list

Suppose we want to create a list of all names, where names appear in `firstName (M.I.) lastName` format.  How do we achieve that?

In [11]:
students = [] # initialize
with open(filename) as roster:  #  roster: name of file object
    for line in roster:
        fullName = line.strip().split(',')
        firstName = fullName[1]
        lastName = fullName[0]
        # print(firstName, lastName)
        students.append(firstName + ' ' + lastName)

In [12]:
students

['Tazmin Appiah',
 'Jocelyn M. Bliven',
 'Annika M. Brockman',
 'Justin Byun',
 'Yaoyue Cao',
 'Lola Casenave Barranguet',
 'Tucker R. Catlin',
 'Andrew H. Chong',
 'Tendai T. Coady',
 'Trishia Cueto',
 'Olivia A. Dabinett',
 'Peter R. Dohr',
 'Eamon Garrity-Rokous',
 'Tyler J. George',
 'Jose Guzman Vazquez',
 'Aleksander Horvat',
 'Brenda Jaroker',
 'Ethan Jeon',
 'Jeeyon Kang',
 'Connor J. Kucharski',
 'Elyes Laalai',
 'Matt D. Laws',
 'Jack J. Lee',
 'Jimmy C. Li',
 'Madeline A. Ohl',
 'Jonathan C. Patrick',
 'Aiden Pham',
 'True L. Pham',
 'Sam M. Phan',
 ' Emily Hidalgo',
 'Ari J. Quasney',
 'Jianing Ren',
 'Sammy W. Sasaki',
 'Will B. Song',
 'Charlie P. Venci',
 'Sylvana L. Widman']
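
One entry above, `' Emily Hidalgo'`, starts with a space because its line in the roster has a space after the comma. A minimal sketch of one way to guard against this is to call `.strip()` on each piece after splitting. The roster lines below are made up for illustration, as are the names `sampleRoster` and `cleanStudents`.

```python
# A few made-up roster lines in the same lastName,firstName format.
sampleRoster = ['Doe,Jane Q.\n', 'Smith, Alex\n', 'Van Dyke,Sam\n']

cleanStudents = []
for line in sampleRoster:
    fullName = line.strip().split(',')
    lastName = fullName[0].strip()
    firstName = fullName[1].strip()   # removes the stray leading space
    cleanStudents.append(firstName + ' ' + lastName)

print(cleanStudents)   # ['Jane Q. Doe', 'Alex Smith', 'Sam Van Dyke']
```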

### List Comprehensions to Map and Filter


When processing lists, there are common patterns that come up:

**Mapping.**  Iterate over a list and return a new list which results from performing an operation on each element of a given list 
  - E.g., take a list of integers `numList` and return a new list which contains the square of each number in `numList`

**Filtering.** Iterate over a list and return a new list that results from keeping those elements of the list that satisfy some condition
   - E.g., take a list of integers `numList` and return a new list which contains only the even numbers in `numList`

Python allows us to implement these patterns succinctly using list comprehensions.

In [13]:
numList = range(11)

# without list comprehension:
evens = []
for num in numList:
    if num % 2 == 0:
        evens.append(num)
print(evens)

[0, 2, 4, 6, 8, 10]


In [14]:
# with list comprehension
evens = [num for num in numList if num % 2 == 0]
print(evens)

[0, 2, 4, 6, 8, 10]


In [15]:
# mapping and filtering together
evenSquared = [num*num for num in numList if num % 2 == 0]
print(evenSquared)

[0, 4, 16, 36, 64, 100]
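
For completeness, here is a sketch of the mapping pattern on its own (no condition), written first as a loop and then as a list comprehension. The names `squares` and `squaresLC` are chosen just for this example.

```python
numList = range(11)

# mapping without list comprehension
squares = []
for num in numList:
    squares.append(num * num)

# mapping with list comprehension: [expression for item in sequence]
squaresLC = [num * num for num in numList]

print(squaresLC)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

In general, the expression before `for` does the mapping, and the optional `if` at the end does the filtering.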


### Using Functions We Built

We wrote a few helper functions over the last few lectures, which are now in a module named `sequenceTools`:

* isVowel
* countVowels
* wordStartEndCount
* palindromes

We can import these functions from our module into our current interactive Python session using the `import` statement.

In [16]:
from sequenceTools import *
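
The contents of `sequenceTools` are not shown in these notes. As a reminder of what such a helper might look like, here is a hypothetical stand-in named `isVowelSketch`; it is an assumption for illustration and may differ from the module's actual `isVowel`.

```python
def isVowelSketch(char):
    """Hypothetical stand-in for sequenceTools.isVowel:
    returns True if char is a single vowel (upper or lower case)."""
    return len(char) == 1 and char.lower() in 'aeiou'

sampleNames = ['Annika', 'Justin', 'Olivia']
print([name for name in sampleNames if isVowelSketch(name[0])])
# ['Annika', 'Olivia']
```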

**Fun Facts.** Students with names that start with a vowel? 

Let us use list comprehensions.

In [17]:
vowelNames = [name for name in students if isVowel(name[0])]
vowelNames

['Annika M. Brockman',
 'Andrew H. Chong',
 'Olivia A. Dabinett',
 'Eamon Garrity-Rokous',
 'Aleksander Horvat',
 'Ethan Jeon',
 'Elyes Laalai',
 'Aiden Pham',
 'Ari J. Quasney']

**Fun Facts.** Students with long or short names? 

In [18]:
firstNames = [name.split()[0] for name in students]
firstNames

['Tazmin',
 'Jocelyn',
 'Annika',
 'Justin',
 'Yaoyue',
 'Lola',
 'Tucker',
 'Andrew',
 'Tendai',
 'Trishia',
 'Olivia',
 'Peter',
 'Eamon',
 'Tyler',
 'Jose',
 'Aleksander',
 'Brenda',
 'Ethan',
 'Jeeyon',
 'Connor',
 'Elyes',
 'Matt',
 'Jack',
 'Jimmy',
 'Madeline',
 'Jonathan',
 'Aiden',
 'True',
 'Sam',
 'Emily',
 'Ari',
 'Jianing',
 'Sammy',
 'Will',
 'Charlie',
 'Sylvana']

In [19]:
[name for name in firstNames if len(name) > 7]

['Aleksander', 'Madeline', 'Jonathan']

In [20]:
[name for name in firstNames if len(name) < 4]

['Sam', 'Ari']

### CSV Files

A CSV (Comma Separated Values) file is a type of plain text file that stores tabular data.  Each row of a table is a line in the text file, with the columns in that row separated by commas.  This format is the most common import and export format for spreadsheets and databases.  


For example, a simple table with the columns Name and Age, such as the following, would be represented in CSV as:

Table:

| Name       | Age |
|:-----------|:----|
| Harry      | 14  |
| Hermione   | 14  |
| Dumbledore | 60  |

CSV:

Name,Age  
Harry,14  
Hermione,14  
Dumbledore,60  

We can handle CSV files much like text files and use string and list methods to process the tabular data.

In [21]:
filename = 'csv/roster02.csv' # 9 am section
allStudents = []
with open(filename) as roster:
    for student in roster:
        studentInfo = student.strip().split(',')
        allStudents.append(studentInfo)
allStudents

[['Appiah', 'Tazmin', '25AAA'],
 ['Bliven', 'Jocelyn M.', '25AAA'],
 ['Brockman', 'Annika M.', '24AAA'],
 ['Byun', 'Justin', '24AAA'],
 ['Cao', 'Yaoyue', '23AAA'],
 ['Casenave Barranguet', 'Lola', '25AAA'],
 ['Catlin', 'Tucker R.', '24AAA'],
 ['Chong', 'Andrew H.', '25AAA'],
 ['Coady', 'Tendai T.', '25AAA'],
 ['Cueto', 'Trishia', '24AAA'],
 ['Dabinett', 'Olivia A.', '25AAA'],
 ['Dohr', 'Peter R.', '25AAA'],
 ['Garrity-Rokous', 'Eamon', '25AAA'],
 ['George', 'Tyler J.', '25AAA'],
 ['Guzman Vazquez', 'Jose', '24AAA'],
 ['Hidalgo', 'Emily C.', '25AAA'],
 ['Horvat', 'Aleksander', '24AAA'],
 ['Jaroker', 'Brenda', '25AAA'],
 ['Jeon', 'Ethan', '24AAA'],
 ['Kang', 'Jeeyon', '25AAA'],
 ['Kucharski', 'Connor J.', '23AAA'],
 ['Laalai', 'Elyes', '25AAA'],
 ['Laws', 'Matt D.', '25AAA'],
 ['Lee', 'Jack J.', '25AAA'],
 ['Li', 'Jimmy C.', '24AAA'],
 ['Ohl', 'Madeline A.', '23AAA'],
 ['Patrick', 'Jonathan C.', '24AAA'],
 ['Pham', 'Aiden', '24AAA'],
 ['Pham', 'True L.', '23AAL'],
 ['Phan', 'Sam M.', '24

In [22]:
with open(filename) as roster:
    allStudents = [line.strip().split(',') for line in roster]

In [23]:
allStudents # list of lists

[['Appiah', 'Tazmin', '25AAA'],
 ['Bliven', 'Jocelyn M.', '25AAA'],
 ['Brockman', 'Annika M.', '24AAA'],
 ['Byun', 'Justin', '24AAA'],
 ['Cao', 'Yaoyue', '23AAA'],
 ['Casenave Barranguet', 'Lola', '25AAA'],
 ['Catlin', 'Tucker R.', '24AAA'],
 ['Chong', 'Andrew H.', '25AAA'],
 ['Coady', 'Tendai T.', '25AAA'],
 ['Cueto', 'Trishia', '24AAA'],
 ['Dabinett', 'Olivia A.', '25AAA'],
 ['Dohr', 'Peter R.', '25AAA'],
 ['Garrity-Rokous', 'Eamon', '25AAA'],
 ['George', 'Tyler J.', '25AAA'],
 ['Guzman Vazquez', 'Jose', '24AAA'],
 ['Hidalgo', 'Emily C.', '25AAA'],
 ['Horvat', 'Aleksander', '24AAA'],
 ['Jaroker', 'Brenda', '25AAA'],
 ['Jeon', 'Ethan', '24AAA'],
 ['Kang', 'Jeeyon', '25AAA'],
 ['Kucharski', 'Connor J.', '23AAA'],
 ['Laalai', 'Elyes', '25AAA'],
 ['Laws', 'Matt D.', '25AAA'],
 ['Lee', 'Jack J.', '25AAA'],
 ['Li', 'Jimmy C.', '24AAA'],
 ['Ohl', 'Madeline A.', '23AAA'],
 ['Patrick', 'Jonathan C.', '24AAA'],
 ['Pham', 'Aiden', '24AAA'],
 ['Pham', 'True L.', '23AAL'],
 ['Phan', 'Sam M.', '24

In [24]:
len(allStudents) # number of students in class

36
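
Splitting on `','` works for this roster because no field contains a comma. As a hedged alternative, the sketch below uses `csv.reader` from Python's standard library, which also handles quoted fields that contain commas. The data is made up, and `io.StringIO` stands in for an open file so the example is self-contained.

```python
import csv
import io

# Made-up CSV text; the second data row has a quoted field containing a comma.
sampleText = 'Name,Age\nHarry,14\n"Potter, Lily",35\n'

# Splitting by hand breaks the quoted field into two pieces.
naiveRows = [line.split(',') for line in sampleText.strip().split('\n')]
print(naiveRows[2])   # ['"Potter', ' Lily"', '35']

# csv.reader keeps the quoted field together.
rows = [row for row in csv.reader(io.StringIO(sampleText))]
print(rows[2])        # ['Potter, Lily', '35']
```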

## Indexing Lists of Lists

We can treat a list of lists just like a list of any other sequence (e.g., strings).  

To index an element of an inner list (student info), we'd need two indices:
 - first index identifies which list we want, and 
 - second index identifies which element within that list.


In [25]:
allStudents[0]

['Appiah', 'Tazmin', '25AAA']

In [26]:
allStudents[0][1][0] # a third index picks out the first letter of Taz's first name

'T'

**List Comprehension to create a list of Last Names.** How can we create a list of all students' last names?

In [27]:
lastNames = [s[0] for s in allStudents]
lastNames

['Appiah',
 'Bliven',
 'Brockman',
 'Byun',
 'Cao',
 'Casenave Barranguet',
 'Catlin',
 'Chong',
 'Coady',
 'Cueto',
 'Dabinett',
 'Dohr',
 'Garrity-Rokous',
 'George',
 'Guzman Vazquez',
 'Hidalgo',
 'Horvat',
 'Jaroker',
 'Jeon',
 'Kang',
 'Kucharski',
 'Laalai',
 'Laws',
 'Lee',
 'Li',
 'Ohl',
 'Patrick',
 'Pham',
 'Pham',
 'Phan',
 'Quasney',
 'Ren',
 'Sasaki',
 'Song',
 'Venci',
 'Widman']

**Generating random indices.** Let's play a game where we generate a random number between 0 and 35 (the valid indices of our 36-element list) and index our list with that number to see whose name comes up.

In [28]:
import random # import module to help generate random numbers

In [29]:
randomIndex = random.randint(0, 35)  
# generates a random integer between 0 and 35 (both endpoints included)

In [30]:
allStudents[randomIndex] # great way to cold call!

['Dabinett', 'Olivia A.', '25AAA']

In [31]:
allStudents[random.randint(0,35)][1]   
# Accessing just the first name

'Aiden'
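
Hardcoding 35 only works while the class has exactly 36 students. A minimal sketch of two alternatives from the same `random` module that adapt to the list's length: `random.randrange(n)` returns an integer from 0 up to but not including `n`, and `random.choice` picks an element directly. The three-student `sampleRoster` is a shortened stand-in for `allStudents`.

```python
import random

# Shortened stand-in for allStudents.
sampleRoster = [['Appiah', 'Tazmin', '25AAA'],
                ['Byun', 'Justin', '24AAA'],
                ['Cao', 'Yaoyue', '23AAA']]

# A random valid index, whatever the length of the list.
idx = random.randrange(len(sampleRoster))
print(sampleRoster[idx])

# Or pick a random element without using an index at all.
picked = random.choice(sampleRoster)
print(picked[1])   # just the first name
```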

### Exercise:  Number of Students by Year

Let's get to know our class better!  We will write a function `yearList` which takes in two arguments `rosterList` (list of lists) and `year` (int) and returns the list of students in the class with that graduating year.
 



In [32]:
def yearList(rosterList, year):
    """Takes the student info as a list of lists and a year (22-25)
    and returns a list of students graduating that year"""
    
    pass

In [33]:
def yearList(rosterList, year):
    """Takes the student info as a list of lists and a year (22-25)
    and returns a list of students graduating that year"""
    
    # filter on the parameter rosterList (not the global allStudents)
    return [s[1] for s in rosterList if s[-1][:2] == str(year)]

In [34]:
yearList(allStudents, 22) # seniors?

[]

In [35]:
yearList(allStudents, 23) # juniors?

['Yaoyue', 'Connor J.', 'Madeline A.', 'True L.', 'Jianing']

In [36]:
yearList(allStudents, 24) # sophomores?

['Annika M.',
 'Justin',
 'Tucker R.',
 'Trishia',
 'Jose',
 'Aleksander',
 'Ethan',
 'Jimmy C.',
 'Jonathan C.',
 'Aiden',
 'Sam M.',
 'Sammy W.',
 'Will B.']

In [37]:
len(yearList(allStudents, 25)) # first years?

18
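
Rather than calling the function once per year by hand, we could loop over the years. The sketch below is self-contained, so it defines its own copy of the function under the hypothetical name `yearListSketch` and uses a four-student stand-in roster; the counts are for that sample only.

```python
def yearListSketch(rosterList, year):
    """Returns the first names of students in rosterList whose
    last field starts with the given two-digit year."""
    return [s[1] for s in rosterList if s[-1][:2] == str(year)]

# Shortened stand-in for allStudents.
sampleRoster = [['Appiah', 'Tazmin', '25AAA'],
                ['Bliven', 'Jocelyn M.', '25AAA'],
                ['Byun', 'Justin', '24AAA'],
                ['Cao', 'Yaoyue', '23AAA']]

# Number of students in each graduating year, 22 through 25.
counts = [len(yearListSketch(sampleRoster, year)) for year in range(22, 26)]
print(counts)   # [0, 1, 1, 2]
```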

**Student Fun Facts.** Which student in the class has the most vowels in their name?!  We'll do this next time.