# Lecture 9: Files & List Comprehensions

### Jeannie Albrecht and Shikha Singh

In the last few lectures, we focused on sequences (strings, lists, ranges) and how to iterate over them using loops.  Today we will look at how to read from files, and store the contents as a string or list. 

We will also look at some code patterns involving lists, strings and counters that are useful when analyzing data. 

## Reading from a File

We can read from and write to files using Python's built-in functions. Today we will focus only on reading from a file. Next week, we will look at how to write to a file.

### Opening a file
A file is an object that is created by the built-in function `open()`.

In [None]:
book = open('textfiles/prideandprejudice.txt', 'r') # 'r' means open the file for reading

In [None]:
book  

In [None]:
type(book) 

**Mode.** The mode `'r'` for reading the file is the default (and optional). So, we can just write:

In [None]:
book = open('textfiles/prideandprejudice.txt') 

**Aside.** By default, Python decodes text files using a platform-dependent encoding (often UTF-8), which agrees with ASCII for plain English text. We will discuss ASCII encodings when we talk about sorting in the coming weeks.

### With...as block and iterating over files

**With block to open and close files.** When you open a file, you should also close it once you are done, so that the resources it holds are released and do not cause problems later. To avoid writing code that explicitly closes the file, we will use the `with...as` block, which keeps the file open inside the block and closes it automatically when the block exits.
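
As a rough sketch of what the `with...as` block saves us from writing, the two versions below read the same line from the same file. The file name `demo.txt` and its contents are made up for illustration, and the example writes that file to a temporary directory first (using the `'w'` mode, which we cover next week) only so that it can run on its own.

```python
import os
import tempfile

# Create a small throwaway file (hypothetical name and contents)
# so that this example does not depend on any course files.
demoPath = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demoPath, 'w') as demo:
    demo.write('first line\nsecond line\n')

# Explicit version: we must remember to close the file ourselves.
book = open(demoPath)
firstLine = book.readline()  # reads one line, including its newline
book.close()

# with...as version: the file is closed automatically when the block ends.
with open(demoPath) as book:
    sameLine = book.readline()

print(firstLine == sameLine)  # both versions read the same first line
print(book.closed)            # the file object reports that it is closed
```

The explicit version also leaves the file open if an error occurs before `close()` is reached, which is one more reason to prefer the `with...as` block.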

Within a `with...as` block, we can iterate over the lines of a file the same way we would iterate over any sequence.

In [None]:
with open('textfiles/prideandprejudice.txt') as book:
    for line in book:
        print(line)

In [None]:
with open('textfiles/prideandprejudice.txt') as book: 
    for line in book:
        print(line.split()) 

In [None]:
wordList = []
with open('textfiles/prideandprejudice.txt') as book:  
    for line in book:
        wordList.extend(line.strip().split())
len(wordList)

In [None]:
# number of times a word is in the book?
wordList.count('love')

In [None]:
wordList.count('dear')

In [None]:
wordList.count('darcy')
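
Note that `count()` only counts exact matches, so capitalization and attached punctuation matter. The sketch below uses a few made-up lines (not the actual text of the book) and one possible way to normalize words before counting; the names `sampleLines`, `sampleWords` and `cleanWords` are only for illustration.

```python
# A few made-up lines standing in for the book's text.
sampleLines = ['Love is patient.\n', 'I love my dear Darcy.\n', 'Darcy, my love!\n']

sampleWords = []
for line in sampleLines:
    sampleWords.extend(line.strip().split())

# Exact matches only: 'Love', 'love!' and 'Darcy,' are not counted.
print(sampleWords.count('love'))   # 1
print(sampleWords.count('darcy'))  # 0

# Normalize each word: lowercase it and strip surrounding punctuation.
cleanWords = []
for word in sampleWords:
    cleanWords.append(word.lower().strip('.,!?;:"'))

print(cleanWords.count('love'))   # 3
print(cleanWords.count('darcy'))  # 2
```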

## Reading in Student Names

The names of the students in this class are stored in `classNames01.txt` (for the 10 am section) in the directory `textfiles`.

In [None]:
# let's try the same example again with .strip()
filename = 'textfiles/classNames01.txt' # 10 am section
with open(filename) as roster:  #  roster: name of file object
    for line in roster:
        print(line.strip())
# file is implicitly closed here

### Collecting names in a list

Suppose we want to create a list of all names, where names appear in `firstName (M.I.) lastName` format.  How do we achieve that?

In [None]:
students = [] # initialize
with open(filename) as roster:  #  roster: name of file object
    for line in roster:
        fullName = line.strip().split(',')
        firstName = fullName[1]
        lastName = fullName[0]
        # print(firstName, lastName)
        students.append(firstName + ' ' + lastName)

In [None]:
students
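
To see the loop above at work without the roster file, here is a small sketch on made-up lines. It assumes each line has the layout `lastName, firstName` with a space after the comma, which is a guess about the file, and it adds an extra `.strip()` on each piece to remove that space.

```python
# Made-up roster lines in the assumed 'lastName, firstName' layout.
sampleRoster = ['Granger, Hermione\n', 'Potter, Harry J.\n']

sampleStudents = []
for line in sampleRoster:
    fullName = line.strip().split(',')
    firstName = fullName[1].strip()  # drop the space that followed the comma
    lastName = fullName[0].strip()
    sampleStudents.append(firstName + ' ' + lastName)

print(sampleStudents)  # ['Hermione Granger', 'Harry J. Potter']
```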

### List Comprehensions to Map and Filter


When processing lists, there are common patterns that come up:

**Mapping.**  Iterate over a list and return a new list which results from performing an operation on each element of a given list 
  - E.g., take a list of integers `numList` and return a new list which contains the square of each number in `numList`

**Filtering.** Iterate over a list and return a new list that results from keeping those elements of the list that satisfy some condition
   - E.g., take a list of integers `numList` and return a new list which contains only the even numbers in `numList`

Python allows us to implement these patterns succinctly using list comprehensions.

In [None]:
numList = range(11)

# without list comprehension:
evens = []
for num in numList:
    if num % 2 == 0:
        evens.append(num)
print(evens)

In [None]:
# with list comprehension
evens = [num for num in numList if num % 2 == 0]
print(evens)

In [None]:
# mapping and filtering together
evenSquared = [num*num for num in numList if num % 2 == 0]
print(evenSquared)
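
For completeness, here is a sketch of the mapping pattern on its own (no filtering), using the squares example from the description above. The names `squares` and `squaresLC` are just for illustration.

```python
numList = range(11)

# without list comprehension:
squares = []
for num in numList:
    squares.append(num * num)

# with list comprehension: no condition, every element is transformed
squaresLC = [num * num for num in numList]

print(squaresLC)             # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
print(squares == squaresLC)  # True
```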

### Using Functions We Built

We wrote a few helper functions in the last few lectures, which are now in a module called `sequenceTools`.

* isVowel()
* vowelSeq()
* countVowels()
* wordStartEnd()
* palindromes()

We can import these functions from our module into our current interactive Python session using the `import` statement.

In [None]:
from sequenceTools import *

**Fun Facts.** What student names start with a vowel? 

Let us use list comprehensions!

In [None]:
vowelNames = [name for name in students if isVowel(name[0])]
vowelNames

**Fun Facts.** Which students have long or short names? 

In [None]:
firstNames = [name.split()[0] for name in students]
[name for name in firstNames if len(name) > 8]

In [None]:
[name for name in firstNames if len(name) < 4]

### CSV Files

A CSV (Comma Separated Values) file is a type of plain text file that stores tabular data. Each row of a table is a line in the text file, with the columns of that row separated by commas. This format is one of the most common import and export formats for spreadsheets and databases.


For example, a simple table with the columns Name and Age, such as the following, would be represented in a CSV file as shown:

Table:

| Name       | Age |
|:-----------|:----|
| Harry      | 14  |
| Hermione   | 14  |
| Dumbledore | 60  |

CSV:

Name,Age  
Harry,14  
Hermione,14  
Dumbledore,60  

We can handle CSV files much like text files, using string and list methods to process the tabular data.

In [None]:
filename = 'csv/roster01.csv' # 10 am section
with open(filename) as roster:
    allStudents = []
    for student in roster:
        allStudents.append(student.strip().split(','))

In [None]:
allStudents

In [None]:
with open(filename) as roster:
    allStudents = [student.strip().split(',') for student in roster] # list comprehension

In [None]:
allStudents # list of lists

In [None]:
len(allStudents) # number of students in class
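
The same pattern can be tried on the small Name/Age table from above without needing a file on disk. This is only a sketch: it uses `io.StringIO`, which lets a string stand in for an open file, and the names `csvText`, `rows`, `header` and `people` are made up for illustration.

```python
import io

# The small example table, written out as CSV text.
csvText = 'Name,Age\nHarry,14\nHermione,14\nDumbledore,60\n'

# io.StringIO behaves like an open text file, so we can loop over its lines.
with io.StringIO(csvText) as table:
    rows = [line.strip().split(',') for line in table]

header = rows[0]   # first row holds the column names
people = rows[1:]  # remaining rows hold the data

print(header)  # ['Name', 'Age']
print(people)  # [['Harry', '14'], ['Hermione', '14'], ['Dumbledore', '60']]
```

Notice that every value comes back as a string, including the ages, and that a header row (if the file has one) is counted by `len()` along with the data rows.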

## Indexing Lists of Lists

We can treat a list of lists just like we would a list of any other sequence (e.g. strings).  

To index an element of an inner list (like student info), we'd need two indices:
 - first (leftmost) index identifies which list we want, and 
 - second (rightmost) index identifies which element within that list we want.
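
Here is a small sketch of the two-index idea on a made-up list of lists (the name `sampleInfo` and its contents are only for illustration):

```python
sampleInfo = [['Potter', 'Harry'], ['Granger', 'Hermione']]

print(sampleInfo[1])        # first index picks the inner list: ['Granger', 'Hermione']
print(sampleInfo[1][0])     # second index picks an element of it: 'Granger'
print(sampleInfo[1][0][0])  # a third index reaches into the string: 'G'
```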


In [None]:
allStudents[0] # list of first student's info

In [None]:
allStudents[0][1] # how do I index Nicole's first name?

**List Comprehension to create a list of Last Names.** How can we create a list of all students' last names?

In [None]:
lastNames = [s[0] for s in allStudents]
lastNames

**Generating random indices.** Remember Homework 1 where you were asked to design an algorithm for generating random numbers?  Let's play a game where we generate random numbers between 0 and 35 and index our list with that number to see whose name comes up.

In [None]:
import random # import module to help generate random numbers

In [None]:
randomIndex = random.randint(0, 35)  
# generates a random integer between 0 and 35 (both endpoints included)

In [None]:
allStudents[randomIndex] # great way to cold call!

In [None]:
allStudents[random.randint(0,35)][1]   
# Accessing just the name
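
The upper bound 35 above is tied to the size of this particular roster. As a sketch of one way to avoid hard-coding it, we can compute the bound from the length of the list, or let `random.choice` pick an element directly. The list `sampleStudents` below is made up for illustration.

```python
import random

# Made-up stand-in for the roster (a list of lists).
sampleStudents = [['Potter', 'Harry'], ['Granger', 'Hermione'], ['Weasley', 'Ron']]

# randint includes both endpoints, so the largest valid index is len(...) - 1.
randomIndex = random.randint(0, len(sampleStudents) - 1)
print(sampleStudents[randomIndex])

# random.choice returns a randomly chosen element of the sequence.
pickedStudent = random.choice(sampleStudents)
print(pickedStudent[1])  # just the first name
```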

### Exercise:  Number of Students by Year

Let's get to know our class better!  We will write a function `yearList` which takes in two arguments `rosterList` (list of lists) and `year` (int) and returns the list of students in the class with that graduating year.
 



In [None]:
def yearList(rosterList, year):
    """Takes the student info as a list of lists and a year (22-25)
    and returns a list of students graduating that year"""
    # use the rosterList parameter (not the global allStudents)
    return [s[1] for s in rosterList if s[-1][:2] == str(year)]

In [None]:
yearList(allStudents, 22) # seniors?

In [None]:
yearList(allStudents, 23) # juniors?

In [None]:
yearList(allStudents, 24) # sophomores?

In [None]:
len(yearList(allStudents, 25)) # first years?
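
To try the idea without the class roster, here is a sketch on invented rows. It assumes that the last field of each row is a string beginning with the two-digit class year (the `'24XYZ'` style values are made up, since the actual roster format is not shown here) and that the first name is at index 1.

```python
def yearList(rosterList, year):
    """Takes the student info as a list of lists and a two-digit year
    and returns a list of first names of students graduating that year"""
    return [s[1] for s in rosterList if s[-1][:2] == str(year)]

# Invented rows: [lastName, firstName, field starting with the class year]
sampleRoster = [['Potter', 'Harry', '24XYZ'],
                ['Granger', 'Hermione', '23XYZ'],
                ['Weasley', 'Ron', '24XYZ']]

print(yearList(sampleRoster, 24))       # ['Harry', 'Ron']
print(len(yearList(sampleRoster, 25)))  # 0
```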

**Student Fun Facts.** Who has the most vowels in their name? To be continued!

## Lab 4 info

In [None]:
filename = 'csv/coffee.csv' # types of coffee
with open(filename) as coffeeTypes:
    allCoffee = []
    for coffee in coffeeTypes:
        allCoffee.append(coffee.strip().split(','))
allCoffee

In [None]:
allCoffee[0] # access first "inner" list

In [None]:
allCoffee[1] # access second inner list 

In [None]:
allCoffee[0][1] # access second element in first inner list

In [None]:
# access second character of second element of first inner list 
allCoffee[0][1][1] 

In [None]:
# create list of only last elements of inner lists
lastCoffee = [coffee[-1] for coffee in allCoffee] 

In [None]:
lastCoffee