# Lecture 10: Files & List Comprehensions

### Jeannie Albrecht and Shikha Singh

In the last lecture, we introduced file reading using the `with ... as` block.  Today, we will focus on reading CSV files, storing the data as a list of lists, and analyzing it to compute interesting properties.

Over the last few lectures we wrote a few helper functions, which are now in the module `sequenceTools`:

* isVowel
* countVowels
* wordStartEnd
* palindromes

We will use them as we work through the examples.

In [None]:
from sequenceTools import *

### CSV Files

A CSV (Comma Separated Values) file is a type of plain text file that stores tabular data.  Each row of a table is a line in the text file, with the columns of that row separated by commas.  This format is one of the most common import and export formats for spreadsheets and databases.


For example, a simple table with the columns Name and Age, such as the following, would be represented in a CSV file as shown below it:

Table:

| Name       | Age |
|:-----------|:----|
| Harry      | 14  |
| Hermione   | 14  |
| Dumbledore | 60  |

CSV:

Name,Age  
Harry,14  
Hermione,14  
Dumbledore,60  

We can handle CSV files much like text files and use string and list methods to process the tabular data.

In [None]:
filename = 'csv/roster02.csv' # 9 am section
allStudents = []
with open(filename) as roster:
    for student in roster:
        studentInfo = student.strip().split(',')
        allStudents.append(studentInfo)
allStudents

In [None]:
filename = 'csv/roster01.csv' # 10 am section
allStudents = []
with open(filename) as roster:
    for student in roster:
        studentInfo = student.strip().split(',')
        allStudents.append(studentInfo)
allStudents

In [None]:
# list comprehension version
with open(filename) as roster:
    allStudents = [line.strip().split(',') for line in roster]

In [None]:
allStudents # list of lists

In [None]:
size = len(allStudents) # number of students in class
size
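To try the same parsing steps without the roster files, here is a minimal sketch that uses `io.StringIO` so a string can stand in for an open file. The data and the name `fakeFile` are made up for illustration.

```python
import io

# io.StringIO lets a string stand in for an open file (made-up data)
fakeFile = io.StringIO('Name,Age\nHarry,14\nDumbledore,60\n')

rows = []
for line in fakeFile:
    # each line still ends with '\n'; strip() removes it before we split on commas
    rows.append(line.strip().split(','))

print(rows)
ageOfHarry = int(rows[1][1])  # every field is a string until we convert it
print(ageOfHarry)
```

Note that the header line is parsed like any other row, and that numbers in the file come back as strings.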

## Indexing Lists of Lists

We can treat a list of lists just like we would a list of any other sequence (e.g. strings).  

To index an element of an inner list (student info), we'd need two indices:
 - first index identifies which list we want, and 
 - second index identifies which element within that list.
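As a small illustration of double indexing, the sketch below uses a made-up roster. The names and the field layout `[last name, first name, class year]` are assumptions for this example, not the actual class file.

```python
# Hypothetical roster: each inner list is [last name, first name, class year]
sampleRoster = [
    ['Potter', 'Harry J.', '25'],
    ['Granger', 'Hermione', '24'],
    ['Weasley', 'Ron B.', '23'],
]

secondStudent = sampleRoster[1]      # first index picks an inner list
firstName = sampleRoster[1][1]       # second index picks an element of that list
firstLetter = sampleRoster[1][1][0]  # strings are sequences too, so we can index again

print(secondStudent)
print(firstName)
print(firstLetter)
```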

**Generating random indices.** We will generate a random integer between `0` and `size - 1` (the valid indices of the list) and index our list with that number to see whose name comes up.

In [None]:
import random # import module to help generate random numbers
randNum = random.randint(0, size - 1)
# generates a random integer between 0 and size - 1 (randint includes both endpoints)

allStudents[randNum]

In [None]:
allStudents[randNum][1] # second index 1 picks the first-name field of the student at the random index

In [None]:
allStudents[random.randint(0, size - 1)][1]
# Accessing just a random first name, returns a string
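A detail worth keeping in mind: `random.randint(a, b)` can return either endpoint, including `b`, so the largest safe value for a list index is the length minus one. The sketch below, on a made-up list, also shows two alternatives from the same module, `random.randrange` and `random.choice`, which avoid the subtraction.

```python
import random

names = ['Harry', 'Hermione', 'Ron']  # hypothetical list for illustration

i = random.randint(0, len(names) - 1)  # both endpoints are included
j = random.randrange(len(names))       # like range: the upper bound is excluded
pick = random.choice(names)            # picks an element directly, no index needed

print(names[i], names[j], pick)
```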

**List Comprehension to create a list of Last Names.** How can we create a list of all students' last names?

In [None]:
lastNames = [s[0] for s in allStudents]
#lastNames

In [None]:
# List comprehension to generate a list of first names (without middle initial)
firstNames = [s[1].split()[0] for s in allStudents]
firstNames

### Exercise:  Number of Students by Year

Let's get to know our class better!  We will write a function `yearList` which takes in two arguments `rosterList` (list of lists) and `year` (int) and returns the list of students in the class with that graduating year.
 



In [None]:
def yearList(rosterList, year):
    """Takes the student info as a list of lists and a year (22-25)
    and returns a list of students graduating that year"""
    
    pass

In [None]:
def yearList(rosterList, year):
    """Takes the student info as a list of lists and a year (22-25)
    and returns a list of students graduating that year"""
    
    # use the parameter rosterList (not the global allStudents) so the function works on any roster
    return [s[1] for s in rosterList if s[-1][:2] == str(year)]

In [None]:
yearList(allStudents, 22) # seniors?

In [None]:
yearList(allStudents, 23) # juniors?

In [None]:
yearList(allStudents, 24) # sophomores?

In [None]:
len(yearList(allStudents, 25)) # first years?
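Here is a self-contained sketch of the same filtering idea on a made-up roster. It assumes, as the solution above does, that the last field of each row begins with the two-digit class year. The roster contents and the name `yearListDemo` are hypothetical.

```python
# Hypothetical roster; we assume the last field starts with the two-digit class year
demoRoster = [
    ['Potter', 'Harry J.', '25AAA'],
    ['Granger', 'Hermione', '24BBB'],
    ['Weasley', 'Ron B.', '25CCC'],
]

def yearListDemo(rosterList, year):
    """Return the name field of every student in rosterList whose
    last field starts with the given two-digit year."""
    return [s[1] for s in rosterList if s[-1][:2] == str(year)]

print(yearListDemo(demoRoster, 25))
print(yearListDemo(demoRoster, 22))  # no matches gives an empty list
```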

## Exercise:  Most Vowels 

**Student Fun Facts.** Which student in the class has the most vowels in their name?!  

In [None]:
def mostVowels(wordList):
    '''Takes a list of strings wordList and returns a list
    of strings in wordList that contain the most number of vowels'''
    
    pass

In [None]:
def mostVowels(wordList):
    '''Takes a list of strings wordList and returns a list
    of strings from wordList that contain the most # vowels'''
    
    maxSoFar = 0 # initialize counter
    result = []
    for word in wordList:
        count = countVowels(word)
        if count > maxSoFar:
            # update: found a better word
            maxSoFar = count
            result = [word] 
        # why do we need this?
        elif count == maxSoFar:  
            result.append(word)
    return result

In [None]:
mostVowels(firstNames)  # which student has most vowels in their name?
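To see why the `elif count == maxSoFar` branch matters, the sketch below runs the same loop with and without it on a made-up list containing a tie. `countVowelsDemo` is a stand-in for `countVowels` from `sequenceTools` (its behavior is assumed here), and the `keepTies` flag is added only for this comparison.

```python
def countVowelsDemo(word):
    """Stand-in for countVowels from sequenceTools (assumed behavior:
    count the letters a, e, i, o, u, ignoring case)."""
    return len([ch for ch in word if ch.lower() in 'aeiou'])

def mostVowelsDemo(wordList, keepTies):
    """Same loop as mostVowels; keepTies switches the elif branch on or off."""
    maxSoFar = 0
    result = []
    for word in wordList:
        count = countVowelsDemo(word)
        if count > maxSoFar:
            maxSoFar = count
            result = [word]
        elif keepTies and count == maxSoFar:
            result.append(word)
    return result

names = ['Ron', 'Hermione', 'Dumbledore', 'Harry']
print(mostVowelsDemo(names, True))   # both four-vowel names are kept
print(mostVowelsDemo(names, False))  # only the first four-vowel name survives
```

Without the tie branch, a later word with the same count as the current maximum is silently dropped.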

### Most Vowels in Words in Book

We can use our helper function to find out which word(s) have the most vowels in other word lists, too. 

In [None]:
bookWords = []
with open('textfiles/prideandprejudice.txt') as book:  
    for line in book:
        bookWords.extend(line.strip().split())

mostVowels(bookWords)

## Writing to Files

We can write all the results that we are computing into a file (a persistent structure).  To open a file for writing, we use `open` with the mode 'w'. 

The following code will create a new file named `studentFacts.txt` in the current working directory and write the results of our function calls to it.

In [None]:
fYears = len(yearList(allStudents, 25))
sophYears = len(yearList(allStudents, 24))
jYears = len(yearList(allStudents, 23))
sYears = len(yearList(allStudents, 22))
mostVowelNames = ', '.join(mostVowels(firstNames))
with open('studentFacts.txt', 'w') as sFile:
    sFile.write('Fun facts about CS134 students:\n')  # write does not add newlines for us
    sFile.write('Students with most vowels in their name: {}.\n'.format(mostVowelNames))
    sFile.write('No. of first years in CS134: {}.\n'.format(fYears))
    sFile.write('No. of sophomores in CS134: {}.\n'.format(sophYears))
    sFile.write('No. of juniors in CS134: {}.\n'.format(jYears))
    sFile.write('No. of seniors in CS134: {}.\n'.format(sYears))

We can use `ls -l` to see that a new file `studentFacts.txt` has been created:

In [None]:
ls -l

Use the OS command `cat` to view the contents of the file:

In [None]:
cat studentFacts.txt

## Appending to Files

If a file already has something in it, opening it in `w` mode again will erase all its past contents.  If we need to add something to the end of an existing file, we open it in append mode, `a`.

For example, let us append a sentence to `studentFacts.txt`.

In [None]:
with open('studentFacts.txt', 'a') as sFile:
    sFile.write('Goodbye.\n')

In [None]:
cat studentFacts.txt 
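The difference between the two modes can be checked with a short, self-contained sketch. It works in a temporary directory so no real files are touched; the file name `demo.txt` is made up.

```python
import os
import tempfile

# Work in a temporary directory so we do not touch any real files
with tempfile.TemporaryDirectory() as tmpDir:
    path = os.path.join(tmpDir, 'demo.txt')  # hypothetical file name

    with open(path, 'w') as f:
        f.write('first\n')
    with open(path, 'w') as f:   # 'w' again: the previous contents are erased
        f.write('second\n')
    with open(path, 'a') as f:   # 'a': new text goes after the existing contents
        f.write('third\n')

    with open(path) as f:
        contents = f.read()

print(contents)
```

Only `second` and `third` remain: the second `'w'` open discarded `first`, while the `'a'` open kept what was already there.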

## List Methods:  Do not change the List

We have seen several list methods already.  

Here we summarize the list methods that do not modify the list, and others that do modify the list they are called on.

Useful methods that **do not modify the list** they are called on:
   * `.count()`
   * `.index()`
   
The descriptions of these are in the lecture slides.  Examples below.
   

In [None]:
myList = list("Hello World!")

myList.index('l') # gives first index 

In [None]:
myList.index('z')  # gives error if item not present

In [None]:
newList = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c']
newList.count('a')

In [None]:
newList.count('z')
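Since `.index()` raises an error for a missing item while `.count()` simply returns 0, one common pattern is to test membership with `in` before calling `.index()`. A minimal sketch follows; using `-1` for "not found" is our own convention here, not something built into lists.

```python
letters = list('Hello World!')

# Check membership first to avoid the ValueError raised by .index()
if 'z' in letters:
    position = letters.index('z')
else:
    position = -1  # our own "not found" marker (an assumption, not built in)

firstL = letters.index('l')  # index of the first 'l'
countL = letters.count('l')  # number of times 'l' appears
countZ = letters.count('z')  # count returns 0 for a missing item

print(position, firstL, countL, countZ)
print(len(letters))          # neither method changed the list
```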

## List Methods:  Modify the List

Unlike integers, strings, and floats, which are immutable, lists are mutable objects and can be changed in place. 

This has several implications which we will discuss in this and coming lectures.

Useful methods that **do modify the list** they are called on:
   * `.append()`
   * `.extend()`
   * `.insert()`
   * `.remove()`
   * `.pop()`   
   
Other ways to modify a list in place:
  * direct assignment to a list element
  * sorting a list in place using `.sort()`

Let us work through these with examples. 

In [None]:
myList = [1, 2, 3, 4]  # fresh assignment: creates a new list with the name myList

In [None]:
myList[1] = 7   # changing the value by direct assignment

In [None]:
myList.append(5)  # appending an item at the end
myList

In [None]:
myList.extend([6, 8])  # extend method lets you append multiple items

myList 

In [None]:
myList
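A common point of confusion is what happens when a list is passed to `.append()` instead of `.extend()`. The short sketch below, using two made-up lists, shows the difference.

```python
a = [1, 2, 3]
b = [1, 2, 3]

a.append([4, 5])  # adds ONE item, which happens to be a list
b.extend([4, 5])  # adds each item of the argument separately

print(a)
print(b)
print(len(a), len(b))
```

After these calls `a` has four items (the last one is itself a list), while `b` has five.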

In [None]:
myList.pop(3)  # removes the item at index 3 and returns it

In [None]:
myList

In [None]:
myList.pop() # removes the last item and returns it

In [None]:
myList

In [None]:
myList.insert(0, 11)  # insert 11 at index 0, shift everything over

In [None]:
myList

In [None]:
myList.remove(5)   # remove(item) removes the first occurrence of item from the list

In [None]:
myList

In [None]:
myList.remove(13) # raises a ValueError because 13 is not in the list

In [None]:
myList.sort()
myList

## Sort vs Sorted


* The `.sort()` method is only for lists and sorts by mutating the list in place.
* Python provides a built-in function `sorted` that can be used to sort any sequence (strings, lists, tuples).  It always returns a new sorted list, and does NOT modify the original sequence.

In [None]:
list1 = [6, 3, 4];  list2 = [6, 3, 4]

In [None]:
list1.sort() # sort by mutating list1

In [None]:
sorted(list2) # returns a new sorted list

In [None]:
print(list1, list2)
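One consequence of this difference is that `.sort()` returns `None`, so assigning its result to a variable is a frequent mistake. A minimal sketch with made-up lists:

```python
original = [6, 3, 4]
newList = sorted(original)  # original stays as it was; newList is a new sorted list

inPlace = [6, 3, 4]
result = inPlace.sort()     # inPlace is sorted in place; the method returns None

fromTuple = sorted((2, 1))  # sorted accepts a tuple but still returns a list

print(original, newList)
print(inPlace, result)
print(fromTuple)
```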

## Aside: Sorting Strings

We can also sort strings (alphabetically) using the `sorted` function.  Notice that the function still returns a `list`, not a `string`.


In [None]:
sorted('shikha')

In [None]:
sorted('jeannie')

In [None]:
sorted('Hello World')

In [None]:
sorted('aaAAbbBBccCCddDD')
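If a string is wanted rather than a list of characters, the sorted list can be glued back together with the string method `join`, which we used earlier when writing to a file. A minimal sketch:

```python
name = 'shikha'
letters = sorted(name)         # a list of single-character strings
sortedName = ''.join(letters)  # join the characters back into one string

print(letters)
print(sortedName)
```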

### Aside: Sorted by ASCII values

Notice that capital letters come before lowercase letters in the default sorting.  Similarly, many special characters (such as the space and `!`) come before either.  This ordering is determined by the ASCII values of the symbols.

The built-in functions `ord` and `chr` let us access the ASCII value of characters and vice versa.

In [None]:
ord('A')

In [None]:
ord('Z')

In [None]:
ord('a')

In [None]:
ord('z')

In [None]:
chr(111)

In [None]:
chr(33)
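The sketch below collects the values from the cells above and shows how they explain the default ordering. As an aside that goes slightly beyond this lecture, `sorted` also accepts an optional `key` argument; passing `str.lower` compares lowercase versions of the items, which gives a case-insensitive order.

```python
print(ord('A'), ord('Z'))  # 65 90
print(ord('a'), ord('z'))  # 97 122
print(chr(111), chr(33))   # o !

mixed = list('bAaB')
byCode = sorted(mixed)                       # uppercase first: smaller ASCII values
ignoringCase = sorted(mixed, key=str.lower)  # compare lowercase versions instead

print(byCode)
print(ignoringCase)
```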