# Text File Input and Output

Term 1 2019 - Instructor: Teerapong Leelanupab

Teaching Assistant: Suttida Satjasunsern

***

### 1. Writing Text Files

In Python, an open file is represented by a file object, which we can store in a variable like any other value. We can open a file for reading, writing or appending using the built-in *open()* function. The first parameter is the location of the file and the second parameter is the mode, i.e. the action we want to take:
- "w" = write
- "r" = read
- "a" = append.

So to open a new file for writing, we call the *open()* function, supply a path for the file and specify the "w" action to write.  Note: if the file already exists, it will be completely overwritten!

In [1]:
fout = open("data.txt","w")
fout

<_io.TextIOWrapper name='data.txt' mode='w' encoding='cp1252'>

Since we just specified "data.txt" rather than a complete path, the file will be written to the same directory as our IPython Notebook.
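More precisely, a relative file name is resolved against the current working directory, which for a notebook is normally the folder containing the notebook. As a small sketch, the standard *os* module can show where a relative name such as "data.txt" would end up (the variable names here are only illustrative):

```python
import os

# the same relative file name used above
filename = "data.txt"

# a relative name is resolved against the current working directory
full_path = os.path.abspath(filename)

print(os.getcwd())   # the current working directory
print(full_path)     # the complete path that "data.txt" refers to
```

The printed paths depend on where the notebook is run, so no particular output is shown here.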

After opening a file to write, we actually write data using the *write()* function with string formatting. Each call will append more text to the file. Note: new line characters are not automatically added.

In [3]:
for i in range(1,6):
    fout.write( "Current value of i is %d\n" % i )  # note, we add a newline with \n at the end of each line


When we are finished, we need to close the file.

In [4]:
fout.close()
print(fout)

<_io.TextIOWrapper name='data.txt' mode='w' encoding='cp1252'>
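Because forgetting to call *close()* is an easy mistake to make, Python also offers the *with* statement, which closes the file automatically when the indented block ends. The sketch below is an alternative to the explicit *close()* call used above, not a replacement for anything in this notebook; the file name "with_demo.txt" is a hypothetical example.

```python
# "with_demo.txt" is a hypothetical file name; it is created in the working directory
with open("with_demo.txt", "w") as fout:
    for i in range(1, 4):
        fout.write("Current value of i is %d\n" % i)

# the indented block has ended, so the file has already been closed for us
is_closed = fout.closed
print(is_closed)
```

The *closed* attribute of the file object reports whether the file has been closed.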


Once a file is closed, we cannot write any more data to it. Trying to do so will give an error message.

In [5]:
fout.write("More data!")

ValueError: I/O operation on closed file.
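If we do want to add more data after a file has been closed, one option is to reopen it with the "a" (append) action listed at the start of this section. The following is a minimal sketch using a hypothetical file name, "append_demo.txt", so that "data.txt" is left untouched:

```python
# create a small file and close it ("append_demo.txt" is a hypothetical name)
fout = open("append_demo.txt", "w")
fout.write("first line\n")
fout.close()

# reopen the same file in append mode: existing contents are kept
fout = open("append_demo.txt", "a")
fout.write("More data!\n")
fout.close()

# read the file back to check what it now contains
fin = open("append_demo.txt", "r")
contents = fin.read()
fin.close()
print(contents)
```

Opening the file with "w" the second time would instead have discarded the first line.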

***
### 2. Reading Text Files

To open an existing file for reading, we use the *open()* function again. Note: if the file does not exist, we will get an error message.

In [31]:
fin = open("data.txt","r")  # action "r" means open file to read
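The error raised for a missing file is a *FileNotFoundError*. As a sketch of how a program might respond to it rather than stop, the example below tries to open a file that is assumed not to exist (the name "no_such_file_12345.txt" is made up for this purpose):

```python
# a file name that is assumed not to exist (hypothetical)
missing_name = "no_such_file_12345.txt"

try:
    fin = open(missing_name, "r")
except FileNotFoundError as err:
    # err.filename holds the name that could not be opened
    message = "Could not open file: %s" % err.filename
    print(message)
else:
    # only reached if the file was opened successfully
    fin.close()
```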

After opening a file to read, you can use several functions to access the data. The function *read()* gets the full contents of the file, *readline()* gets a full line of text, and *readlines()* loads all of the text from the file into a list with one value per line.

In [32]:
lines = fin.readlines()
for l in lines:
    print( l.strip() )  # note that we usually need to remove the newline characters from the end of strings

Current value of i is 1
Current value of i is 2
Current value of i is 3
Current value of i is 4
Current value of i is 5
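The cell above only uses *readlines()*. To compare it with *read()* and *readline()*, the sketch below writes a small three-line file and then reads it in each of the three ways. The file name "read_demo.txt" and the variable names are hypothetical, chosen only for this illustration.

```python
# create a small three-line file to read from ("read_demo.txt" is a hypothetical name)
fout = open("read_demo.txt", "w")
for i in range(1, 4):
    fout.write("Current value of i is %d\n" % i)
fout.close()

# read(): the entire contents as a single string
fin = open("read_demo.txt", "r")
whole_text = fin.read()
fin.close()

# readline(): one line per call, with the newline character still attached
fin = open("read_demo.txt", "r")
first_line = fin.readline()
second_line = fin.readline()   # continues from where the previous call stopped
fin.close()

# readlines(): a list with one string per line
fin = open("read_demo.txt", "r")
all_lines = fin.readlines()
fin.close()

print(repr(first_line))
print(len(all_lines))
```

The file is reopened before each approach because a file object remembers how far it has already read.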


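Note that *strip()* is a string function, so it must be called on each line rather than on the list itself:

In [44]: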
lines.strip()

AttributeError: 'list' object has no attribute 'strip'
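To strip every line in one step, one option is a list comprehension that calls *strip()* on each string. This sketch uses a short hand-written list in place of the result of *readlines()*:

```python
# sample data in the form that readlines() returns
lines = ["Current value of i is 1\n", "Current value of i is 2\n"]

# call strip() on each string, collecting the results in a new list
stripped = [l.strip() for l in lines]
print(stripped)
```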

Again, we close the file when we are finished. After that, no more read functions can be called on the file.

In [40]:
fin.close()

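*strip()* is only defined for strings. Calling it on each element of a list of tuples fails in the same way:

In [41]:
# a list of (number, name) tuples
step = [(17211426, "Stephanie Gale"), (16212133, "Jill Doyle"),
        (13388136, "Pat Gilbert"), (17211824, "Daryl Bishop"),
        (16216364, "Carlos Alvarado"), (17211833, "Alison Rogers"),
        (17212834, "Neil Smith"), (13312141, "Sandra Wright")]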
step

[(17211426, 'Stephanie Gale'),
 (16212133, 'Jill Doyle'),
 (13388136, 'Pat Gilbert'),
 (17211824, 'Daryl Bishop'),
 (16216364, 'Carlos Alvarado'),
 (17211833, 'Alison Rogers'),
 (17212834, 'Neil Smith'),
 (13312141, 'Sandra Wright')]

In [43]:
for i in step:
    i = i.strip()

AttributeError: 'tuple' object has no attribute 'strip'
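When the values are tuples, *strip()* has to be applied to the string inside each tuple instead. A minimal sketch, using two of the entries above with extra whitespace added purely for illustration:

```python
# two sample records; the surrounding whitespace is added for illustration only
records = [(17211426, "  Stephanie Gale "), (16212133, "Jill Doyle\n")]

cleaned = []
for number, name in records:
    # unpack each tuple, strip the string part, and build a new tuple
    cleaned.append((number, name.strip()))

print(cleaned)
```

A new tuple is built each time because tuples cannot be modified in place.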

***
### 3. Comma-Separated Files

Frequently, simple datasets are stored as *comma-separated value* (CSV) files. In a CSV file, tabular data is stored as plain text. Each line of the file is a record, and each record consists of one or more fields, separated by commas.

We can manually create a CSV file using the *open()* and *write()* functions.

In [4]:
fout = open("simple.csv","w")
# create the records
for row in range(5):
    # start the record with an identifier
    fout.write("record_%d" % (row+1) )
    # create the fields for each record
    for col in range(4):
        value = (row+1)*(col+1)     # just create some dummy values
        fout.write(",%d" % value )  # notice the comma separator
    # move on to a new line in the file
    fout.write("\n")
# finished, so close the file
fout.close()    



We could just read back the entire file:

In [42]:
fin = open("simple.csv","r")
print( fin.read() )
fin.close()

record_1,1,2,3,4
record_2,2,4,6,8
record_3,3,6,9,12
record_4,4,8,12,16
record_5,5,10,15,20



But more often, we will want to parse the data into numeric values, line by line:

In [9]:
fin = open("simple.csv","r")
# process the file line by line
for line in fin.readlines():
    # remove the newline character from the end
    line = line.strip()
    # split the line based on the comma separator
    parts = line.split(",")
    # extract the identifier as the first value in the list
    record_id = parts[0]
    # convert the rest to integers from strings
    values = []
    for s in parts[1:]:
        values.append( int(s) )
    # display the record
    print( record_id, values )
# finished, so close the file
fin.close()

record_1 [1, 2, 3, 4]
record_2 [2, 4, 6, 8]
record_3 [3, 6, 9, 12]
record_4 [4, 8, 12, 16]
record_5 [5, 10, 15, 20]


Later in the module we will look at more convenient ways for working with CSV data.
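As a preview of one such option, Python's standard library includes a *csv* module that handles the separators for us. The sketch below is not necessarily the approach used later in the module; it writes and re-reads a smaller version of the table above using a hypothetical file name, "csv_demo.csv".

```python
import csv

# write three records ("csv_demo.csv" is a hypothetical file name)
fout = open("csv_demo.csv", "w", newline="")
writer = csv.writer(fout)
for row in range(3):
    # one list per record: the identifier followed by four dummy values
    fields = ["record_%d" % (row+1)] + [(row+1)*(col+1) for col in range(4)]
    writer.writerow(fields)   # commas and the line ending are added for us
fout.close()

# read the records back; each row arrives as a list of strings
fin = open("csv_demo.csv", "r", newline="")
records = []
for parts in csv.reader(fin):
    values = [int(s) for s in parts[1:]]   # convert the fields to integers
    records.append((parts[0], values))
fin.close()

print(records)
```

The *newline=""* argument is the setting recommended by the *csv* documentation when opening files for this module.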