Files and Printing
------------------

**See also Examples 15, 16, and 17 from Learn Python the Hard Way**

You'll often read data from a file, or write the output of your Python scripts to a file. Python makes this easy: open the file in the appropriate mode with the `open` function, then read or write as needed. In its basic form, `open` takes two arguments: the name of the file and the mode. The mode is a short string that specifies whether you are going to read from the file, write to it, or append to the end of an existing file. The function returns a file object that you then use for the actual reading and writing: `a_file = open(filename, mode)`. The modes are:

+ `'r'`: open a file for reading
+ `'w'`: open a file for writing. Caution: this will overwrite any previously existing file
+ `'a'`: append. Write to the end of a file. 
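
As a small illustration of how the three modes differ, here is a sketch that writes to a scratch file twice with `'w'`, once with `'a'`, and then reads it back with `'r'`. The file name `modes_demo.txt` and the temporary directory are assumptions made for this example only, so that no real file gets overwritten.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (not part of the lesson's files)
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

f = open(path, "w")   # 'w' creates the file, or empties it if it already exists
f.write("first\n")
f.close()

f = open(path, "w")   # opening with 'w' again discards "first"
f.write("second\n")
f.close()

f = open(path, "a")   # 'a' keeps the existing content and writes at the end
f.write("third\n")
f.close()

f = open(path, "r")   # 'r' opens the file for reading
modes_demo_content = f.read()
f.close()

print(modes_demo_content)
```

The printed content contains only `second` and `third`: the second `'w'` open erased the first write, while `'a'` preserved what was already there.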

When reading, you typically iterate over the lines of a file with a `for` loop (`for line in a_file:`). Some other common methods for working with files are:

+ `file.read()`: read the entire contents of a file into a string
+ `file.write(some_string)`: writes the string to the file. Note that this does not automatically add a newline. Also note that writes are sometimes buffered: Python may wait until several writes are pending and then perform them all at once
+ `file.close()`: close the open file. This will free up some computer resources occupied by keeping a file open.
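
The `for` loop over a file mentioned above can be sketched as follows. The file name `lines_demo.txt` and its two lines are made up for this illustration. Each `line` the loop produces still ends with its newline character, so the sketch strips it before storing the line.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration
path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")

f = open(path, "w")
f.write("alpha\n")   # write() does not add a newline, so we include it ourselves
f.write("beta\n")
f.close()            # closing also makes sure buffered writes reach the disk

f = open(path, "r")
stripped_lines = []
for line in f:                                # a file object yields one line at a time
    stripped_lines.append(line.rstrip("\n"))  # drop the trailing newline
f.close()

print(stripped_lines)
```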

Here is an example using files:

### Writing a file to disk

In [1]:
# Create the file temp.txt, and get it ready for writing
f = open("temp.txt", "w")
f.write("This is my first file! The end!\n")
f.write("Oh wait, I wanted to say something else.")
f.close()

In [None]:
# Let's check that we did everything as expected
# Find the file using your file explorer

### Reading a file from disk

Once we read the file, its whole content is in one big string. Let's process that string a little bit:

In [2]:
# We now open the file for reading
f2 = open("temp.txt", "r")
# And we read the full content of the file in memory, as a big string
f2_content = f2.read()
f2.close()

In [3]:
# Another, BETTER way to do the same thing: the "with" statement
# closes the file automatically when the indented block ends
with open("temp.txt", "r") as f2:
    # And we read the full content of the file in memory, as a big string
    f2_content = f2.read()

In [4]:
# We read the file in the cell above; its content is in f2_content

# Split the content of the file using the newline character \n
lines = f2_content.split("\n")

# Iterate through the lines variable (it is a list of strings)
# and then print the length of each line
for line in lines:
    print("Length of line >>", line, "<< is ", len(line))

Length of line >> This is my first file! The end! << is  31
Length of line >> Oh wait, I wanted to say something else. << is  40
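
One detail worth knowing about `split("\n")`: the file above does not end with a newline, so we get exactly two lines. When the text does end with a newline, `split("\n")` returns an extra empty string at the end, while the string method `splitlines()` does not. A minimal sketch, using a made-up string instead of a file:

```python
# Made-up text that ends with a newline, as many text files do
text = "first line\nsecond line\n"

with_split = text.split("\n")
with_splitlines = text.splitlines()

print(with_split)       # ['first line', 'second line', '']
print(with_splitlines)  # ['first line', 'second line']
```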


### Exercise 1
The command below will create a file called `phonetest.txt`.

In [5]:
s = '''679-397-5255
2126660921
212-998-0902
888-888-2222
800-555-1211
800 555 1212
800.555.1213
(800) 555-1214
1-800-555-1215
1(800)555-1216
800-555-1212-1234
800-555-1212x1234
800-555-1212 ext. 1234
work 1-(800) 555.1212 #1234'''

# Create the file phonetest.txt, and get it ready for writing
with open("phonetest.txt", "w") as f:
    f.write(s)

Write code that:
* Reads the file `phonetest.txt`
* Defines a function that takes a string as input and removes any non-digit characters
* Prints out the "clean" string for each line, without any non-digit characters

In [9]:
# your code here
def clean(phone):
    result = ""
    digits = {"0","1","2","3","4","5","6","7","8","9"}
    for c in phone:
        if c in digits:
            result = result + c
    return result    


with open('phonetest.txt', "r") as f:
    phones = f.read()

list_of_phones = phones.split("\n")
for phone in list_of_phones:
    print(clean(phone))

6793975255
2126660921
2129980902
8888882222
8005551211
8005551212
8005551213
8005551214
18005551215
18005551216
80055512121234
80055512121234
80055512121234
180055512121234


### Exercise 2

A Comma-Separated Values (CSV) file stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format.

Write code that reads the file `baseball.csv`. Then split each line to get the individual fields. Remember to split using the comma (`,`) as the separator. Go over the lines and print the fifth column (`team`).

In [11]:
# your code here
with open('baseball.csv', "r") as f:
    content = f.read()
    
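# Each element of lines is one record (one line of the file)
lines = content.split('\n')
for line in lines:
    # The fifth column has index 4, because indexing starts at 0
    fields = line.split(',')
    print(fields[4])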

team
CHN
BOS
NYA
MIL
NYA
SFN
ARI
LAN
ATL
NYN
TOR
TBA
HOU
ARI
ATL
MIN
HOU
LAN
SDN
CIN
OAK
BOS
SFN
NYA
NYN
CHN
BAL
BOS
CHA
TOR
BOS
LAN
SFN
MIL
SLN
CIN
TOR
SLN
TEX
ATL
DET
NYN
LAN
LAN
BOS
KCA
DET
DET
BOS
OAK
DET
NYN
LAA
NYA
NYA
PHI
PHI
NYN
SDN
COL
CLE
TEX
LAN
SFN
LAN
DET
ARI
SDN
LAN
CLE
CIN
CIN
NYN
MIL
PHI
LAN
CLE
BAL
NYN
CHN
COL
OAK
SLN
NYN
NYN
CIN
NYN
CIN
NYA
BOS
TOR
ARI
MIN
SFN
HOU
FLO
SFN
HOU
NYN
NYN
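
Splitting on `,` works here because no field contains a comma of its own. When a field is quoted and has a comma inside it, a plain `split(',')` cuts it in two. Python's standard `csv` module handles that case. The sketch below uses a small made-up sample held in memory (its column names and values are illustrative and are not taken from `baseball.csv`).

```python
import csv
import io

# Made-up sample; the names contain commas, so they are quoted
sample = 'name,city,team\n"Doe, Jane",Boston,BOS\n"Roe, Rick",Chicago,CHN\n'

# Naive approach: the quoted name is split into two pieces
naive = [line.split(",") for line in sample.splitlines()]
print(naive[1])   # ['"Doe', ' Jane"', 'Boston', 'BOS']

# csv.reader understands the quotes and keeps each name as one field
parsed = list(csv.reader(io.StringIO(sample)))
csv_teams = [row[2] for row in parsed[1:]]   # skip the header row
print(csv_teams)  # ['BOS', 'CHN']
```

To read from a real file instead, `csv.reader` can be given a file object opened with `open(filename, "r", newline="")`.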


### Exercise 3

* Write a function that reads a file and returns its content as a list of strings (one string per line)
* Write a function that reads the n-th column of a CSV file and returns its contents. Reuse the function that you wrote above

In [None]:
# your code here
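
If you get stuck, here is one possible shape for the two functions. It is a sketch, not the reference solution. The names `read_lines` and `read_column`, the zero-based column index, and the small demo file are all assumptions made for this illustration.

```python
import os
import tempfile


def read_lines(filename):
    """Return the content of a file as a list of strings, one per line."""
    with open(filename, "r") as f:
        return f.read().splitlines()


def read_column(filename, n, separator=","):
    """Return the n-th column (counting from 0) of a delimited text file."""
    column = []
    for line in read_lines(filename):   # reuse the function above
        if line:                        # skip empty lines
            column.append(line.split(separator)[n])
    return column


# Try it on a small made-up file
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w") as f:
    f.write("name,team\nAnn,BOS\nBob,NYA\n")

print(read_column(demo_path, 1))  # ['team', 'BOS', 'NYA']
```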

#### Solution for exercise 1 (with a lot of comments)

In [None]:
# This function takes as input a phone (string variable)
# and returns a string that contains only its digits
def clean(phone):
    # We initialize the result variable to be empty. 
    # We will append to this variable the digit characters 
    result = ""
    # This is a set of digits (as **strings**) that will
    # allow us to filter the characters
    digits = {"0","1","2","3","4","5","6","7","8","9"}
    # We iterate over all the characters in the string "phone"
    # which is a parameter of the function clean
    for c in phone:
        # We check if the character c is a digit
        if c in digits:
            # if it is, we append it to the result
            result = result + c
    # once we are done we return a string variable with the result
    return result 

# We open the file
with open("phonetest.txt", "r") as f:
    # We read the content using the f.read() command
    content = f.read()

# We split the file into lines
lines = content.split("\n")
# We iterate over the lines, and we clean each one of them
for line in lines:
    print(line, "==>", clean(line))