<h2> Working with Files </h2>

<h3>Opening a File</h3>

For a complete treatment of file access, please refer to the textbook and its slides. This note only covers working with plain text files in Python.

To begin working with a file, we need to use the <b>open()</b> function to create a file object first. Note that this is true even if you want to write to the file - you can open an empty file in such cases. The syntax is

<b>
file_variable = open(filename, mode)
</b>

- file_variable is the name of the variable that will reference your file object.
- filename is a string specifying the name of the file
- mode is a string specifying how you will use this file in your program:
    - "r" -- open a file for reading only (input) - the file CANNOT be changed or written to
    - "w" -- open a file for writing (output) - if the file already exists, erase its contents. If it does not exist, create it (this is used to create a new file as stated above)
    - "a" -- open a file for appending - existing contents are kept, and anything written is added to the end of the file

For example (you need to have file1.txt in the same folder as this code file):

In [8]:
#open file1.txt to read
file_1 = open('file1.txt','r')

#note that opening a file does not produce any output. If you want to see the result, you need to print the file object
print(file_1)

<_io.TextIOWrapper name='file1.txt' mode='r' encoding='cp1252'>


If the file is in a different folder than the program, you need to include the full path. If you are using Windows, note that each backslash in the path must be written as a double backslash, because a single backslash starts an escape sequence inside a Python string. Below is an example of reading a file from a location on <b>my</b> computer

In [10]:
file_1 = open('C:\\Users\\linhl\\Desktop\\IT5413\\Files\\file1.txt', 'r')
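
To make the double-backslash rule more concrete, here is a small sketch. The path is hypothetical and no file is actually opened; the raw string prefix r'...' and os.path.join are alternatives that this note does not otherwise use, shown only for comparison.

```python
import os

# Hypothetical Windows path, used only to compare spellings; nothing is opened.
escaped = 'C:\\Users\\someone\\Desktop\\file1.txt'
raw = r'C:\Users\someone\Desktop\file1.txt'

# Both literals produce exactly the same string.
same = (escaped == raw)
print(same)  # True

# A single backslash followed by certain letters is an escape sequence:
# '\t' is ONE tab character, while '\\t' and r'\t' are a backslash plus a 't'.
tab_len = len('\t')
escaped_len = len('\\t')
raw_len = len(r'\t')
print(tab_len, escaped_len, raw_len)  # 1 2 2

# os.path.join builds a path using the separator of the current system.
joined = os.path.join('Files', 'file1.txt')
print(joined)
```

A raw string is often easier to read than doubled backslashes, but both spellings give open() the same path.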

Similarly, if you want to write, the mode is 'w'. The file does not need to exist beforehand; it is created automatically if it is not there already. A file name without any path results in the file being created in the same folder as your program

In [23]:
#you can check the program folder after running this line
#you should see file2.txt appear in the folder
file_2 = open('file2.txt','w')

In [12]:
file_3 = open('file1.txt','a')
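
The difference between the three modes is easier to see side by side. The following is only a sketch: it works in a temporary folder with a made-up file name (demo.txt) so that none of your real files are touched, and it uses write(), read(), and close(), which are explained in the sections that follow.

```python
import io
import os
import tempfile

# Work in a temporary folder so no real files are touched.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo.txt')  # hypothetical file name

# 'w' creates the file (or erases it if it already exists).
f = open(path, 'w')
f.write('first')
f.close()

# 'a' keeps the existing content and writes at the end.
f = open(path, 'a')
f.write(' second')
f.close()

# 'w' again: the old content is erased as soon as the file is opened.
f = open(path, 'w')
f.write('third')
f.close()

# 'r' allows reading only; calling write() raises an error.
f = open(path, 'r')
content = f.read()
try:
    f.write('not allowed')
    write_failed = False
except io.UnsupportedOperation:
    write_failed = True
f.close()

print(content)       # third
print(write_failed)  # True
```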

<h3> Writing to a File</h3>

To write to a file, we use the <b>write()</b> method from the file object. The syntax is

<b>file_variable.write(string)</b>

Note that numbers <b>must</b> be converted to strings with the <b>str()</b> function before they can be written to a file. Also, you <b>must</b> close the file with the <b>close()</b> method to make sure the changes are actually saved to the file.

For example:

In [18]:
file_2 = open('file2.txt','w')
file_2.write('Hello World')
file_2.close()
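
To illustrate the str() requirement, here is a minimal sketch. The file name numbers.txt and the variables price and quantity are made up for this example, and the file is created in a temporary folder.

```python
import os
import tempfile

# Temporary folder and hypothetical file name, so no real files are touched.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'numbers.txt')

price = 19.99
quantity = 3

out_file = open(path, 'w')

# Passing a number directly to write() raises a TypeError.
try:
    out_file.write(quantity)
    rejected = False
except TypeError:
    rejected = True

# Converting with str() first works.
out_file.write(str(quantity))
out_file.write(' items at ')
out_file.write(str(price))
out_file.close()

# Read the file back to check what was written.
in_file = open(path, 'r')
text = in_file.read()
in_file.close()

print(rejected)  # True
print(text)      # 3 items at 19.99
```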

Note that multiple calls to write() will not separate the strings into different lines. In fact, they will not be separated at all.

In [26]:
file_2 = open('file2.txt','w')
file_2.write('Hello World')
file_2.write('Hello World')
file_2.write('Hello World')
file_2.close()
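
One way to confirm this is to read the file back and look at the raw string with repr(). The sketch below uses a temporary folder and a made-up file name; it also shows that ending a string with the newline character '\n' is what starts a new line.

```python
import os
import tempfile

# Temporary folder and hypothetical file name, so no real files are touched.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'hello.txt')

out_file = open(path, 'w')
out_file.write('Hello World')
out_file.write('Hello World')
# Only an explicit '\n' starts a new line.
out_file.write('Hello World\n')
out_file.write('Next line')
out_file.close()

in_file = open(path, 'r')
text = in_file.read()
in_file.close()

# repr() makes the newline character visible as \n.
print(repr(text))  # 'Hello WorldHello WorldHello World\nNext line'
```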

If you want to write multiple lines, you can use the triple quotes """..."""

In [None]:
file_2 = open('file2.txt','w')
file_2.write("""Hello World!
I am learning programming
And I do not like it""")
file_2.close()

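Or we can build the string piece by piece and end each line with the newline character '\n'. Because the file is opened in text mode (the default), Python translates '\n' into the line ending of your operating system when writing. For that reason os.linesep should not be written to a text-mode file: on Windows it would produce doubled line endings.

In [25]:
#linesep is only used with rstrip() in the reading examples later in this note
from os import linesep

#to_file is a string that stores everything I want to write to a file
to_file = 'Hello world\n'
to_file += 'I am learning programming\n'
to_file += 'I am at the file module\n'
to_file += 'After we worked with classes and objects last week'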

file_2 = open('file2.txt','w')
file_2.write(to_file)
file_2.close()

We can also add more logic to the writing. For example, if we have a list, we can use a loop to write every item to a file

In [29]:
name_list = ['Alice','Bob','Carol','Daniel','Emma','Fiona','George','Helena']

#we need to open the file before the loop
file_2 = open('file2.txt','w')

#now add a loop to write each item to the file
for name in name_list:
    file_2.write(name + '\n')
    
file_2.close()

<h3>Appending to Files</h3>

As you can see, every time we open file2.txt in 'w' mode, the old content is overwritten. If you want to <b>add</b> to the file instead of overwriting it, you need to use the append mode 'a' when opening the file. For example

In [30]:
more_names = ['Ian','Jacob','Kaitlyn','Lee','Mark','Nora']

#we need to open the file before the loop, now in 'a' mode
file_2 = open('file2.txt','a')

for name in more_names:
    file_2.write(name + '\n')
    
file_2.close()
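
To check that appending really keeps the old content, we can write, append, and then read everything back. This is a sketch with shortened, made-up lists and a temporary file; read() is covered in the next section, and splitlines() is a string method that is not otherwise used in this note.

```python
import os
import tempfile

# Temporary folder and hypothetical file name, so no real files are touched.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'names.txt')

first_names = ['Alice', 'Bob']
more_names = ['Ian', 'Jacob']

# 'w' mode: start with a fresh file.
out_file = open(path, 'w')
for name in first_names:
    out_file.write(name + '\n')
out_file.close()

# 'a' mode: the two names above are kept, the new ones go after them.
out_file = open(path, 'a')
for name in more_names:
    out_file.write(name + '\n')
out_file.close()

in_file = open(path, 'r')
text = in_file.read()
in_file.close()

# splitlines() cuts the text at each line ending.
all_names = text.splitlines()
print(all_names)  # ['Alice', 'Bob', 'Ian', 'Jacob']
```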

<h3>Reading from Files</h3>

Reading from files is just as important as writing to them. There are two methods from the file object that you can use:

- read() -- read the whole file and return it as one string
- readline() -- read the current line (including its newline character) and move on to the next line

For example

In [32]:
file_1 = open('file1.txt','r')

from_file_1 = file_1.read()

#we still need to close the file after using
file_1.close()

#what is in the file?
print(from_file_1)

hello
this
is
a
file
with
some
strings


In general, we prefer reading a file line-by-line, since data is usually stored in a format where different lines represent different instances. In such cases, we use readline()

In [33]:
file_1 = open('file1.txt','r')

#read first line
line_1 = file_1.readline()
print(line_1)

#read second line
line_2 = file_1.readline()
print(line_2)

#read third line
line_3 = file_1.readline()
print(line_3)

file_1.close()

hello

this

is



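As you can see, there is a blank line after each printed line. Each line returned by readline() still ends with the newline character stored in the file, and print() adds another one. We can remove the trailing newline with the rstrip() method from string, for example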

In [39]:
file_1 = open('file1.txt','r')

#read first line
line_1 = file_1.readline()
print(line_1.rstrip(linesep))

#read second line
line_2 = file_1.readline()
print(line_2.rstrip(linesep))

#read third line
line_3 = file_1.readline()
print(line_3.rstrip(linesep))

file_1.close()

hello
this
is
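
To see exactly what readline() returns, we can print the lines with repr(), which shows the newline character as \n. The sketch below creates its own small temporary file (the name and content are made up), so it does not depend on file1.txt.

```python
import os
import tempfile

# Temporary folder and hypothetical file name, so no real files are touched.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'two_lines.txt')

out_file = open(path, 'w')
out_file.write('hello\nthis\n')
out_file.close()

in_file = open(path, 'r')
first = in_file.readline()
second = in_file.readline()
# There is no third line, so readline() returns an empty string.
third = in_file.readline()
in_file.close()

stripped = first.rstrip('\n')

print(repr(first))     # 'hello\n'
print(repr(stripped))  # 'hello'
print(repr(second))    # 'this\n'
print(repr(third))     # ''
```

The last result is worth remembering: at the end of the file, readline() returns '' rather than raising an error.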


Data files are usually longer than three lines, so reading them manually line by line as above is not practical. Instead, we can write a loop. readline() returns an empty string '' once the end of the file is reached (a blank line inside the file is returned as '\n', not ''), so we can use a while loop that stops at ''

In [38]:
file_1 = open('file1.txt','r')

#read the first line before the loop
#so we can test it in the while condition
line = file_1.readline()

while (line != ''):
    print(line.rstrip(linesep))
    #note that readline() automatically moves the "pointer" to the next line
    #so we don't have to manually do that
    line = file_1.readline()
    
file_1.close()

hello
this
is
a
file
with
some
strings



Reading from a file and then printing the lines to the console is not very useful. Fortunately, now that we know about lists, we can store our data in a list, where each item represents a line from the file

In [40]:
#empty list to store the lines in the file
lines = []

file_1 = open('file1.txt','r')

#read the first line before the loop
line = file_1.readline()

#stop as soon as readline() returns '' (end of file)
while (line != ''):
    lines.append(line.rstrip(linesep))
    line = file_1.readline()
    
file_1.close()

In [42]:
#the lines list now has all the items from the file
lines

['hello', 'this', 'is', 'a', 'file', 'with', 'some', 'strings']

Alternatively, we can use a for loop, as the file object is also an iterable (list-like) object. Using a for loop is a bit simpler:

In [43]:
lines = []

file_1 = open('file1.txt','r')

#in this case, line is the current line in the file
#and we don't need to use readline() anymore
for line in file_1:
    lines.append(line.rstrip(linesep))
    
file_1.close()

In [44]:
lines

['hello', 'this', 'is', 'a', 'file', 'with', 'some', 'strings']
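
Putting writing and reading together, here is a small round-trip sketch. The scores list and the file name scores.txt are made up for this example. Numbers are converted with str() before writing, and since everything read from a text file is a string, they are converted back with int() after reading.

```python
import os
import tempfile

# Temporary folder and hypothetical file name, so no real files are touched.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'scores.txt')

scores = [90, 75, 88]

# Write one number per line; each number is converted to a string first.
out_file = open(path, 'w')
for score in scores:
    out_file.write(str(score) + '\n')
out_file.close()

# Read the lines back with a for loop and convert each one to an int.
loaded = []
in_file = open(path, 'r')
for line in in_file:
    loaded.append(int(line.rstrip('\n')))
in_file.close()

total = sum(loaded)
print(loaded)  # [90, 75, 88]
print(total)   # 253
```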