Reading from and writing to files is often done with the <b>pandas</b> module in Python. Even so, it is important to understand the basics of file reading and writing, which are straightforward in Python.

In [19]:
path='example/lesson3textfile.txt'
f = open(path)

By default, the file is opened in read-only text mode ('r'). We can treat f like an iterator whose items are the lines of the file.

In [20]:
for line in f:
    print('('+line+')')

(These violent delights have violent ends
)
(And in their triumph die, like fire and powder
)
(Which, as they kiss, consume.)


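Lines come out of text files with <b>end-of-line (EOL)</b> markers intact, and we often need to strip them. Notice in the example above that the closing parenthesis lands on the following line, because each line still ends with a newline character. (The last line of the file has no trailing newline, so its parenthesis stays on the same line.) We can remove the EOL markers as follows; note that rstrip() with no argument strips all trailing whitespace, not only the newline: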

In [23]:
lines = [x.rstrip() for x in open(path)]
lines

['These violent delights have violent ends',
 'And in their triumph die, like fire and powder',
 'Which, as they kiss, consume.']
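The difference between stripping all trailing whitespace and stripping only the newline can be sketched as follows. This is a minimal illustration, assuming an in-memory io.StringIO object as a stand-in for a real file; the sample text is made up for the example.

```python
import io

# Made-up sample text; the first line has trailing spaces before its newline.
text = "first line  \nsecond line\n"

# rstrip() with no argument removes every trailing whitespace character.
stripped_all = [line.rstrip() for line in io.StringIO(text)]

# rstrip("\n") removes only the trailing newline and keeps the spaces.
stripped_eol = [line.rstrip("\n") for line in io.StringIO(text)]

print(stripped_all)
print(stripped_eol)
```

If trailing spaces or tabs are meaningful in your data, passing "\n" explicitly is the safer choice.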

It is very important that you close a file when you are finished working with it. This releases the resources back to the operating system.

In [24]:
f.close()

The <b>with</b> statement removes the need to remember f.close(): it automatically closes the file when the with block exits, even if an exception is raised inside it.

In [26]:
with open(path) as f:
    lines = [x.rstrip() for x in f]
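To see that the file really is closed once the block ends, one can inspect the file object's closed attribute. The sketch below assumes a scratch file in a temporary directory (the name demo.txt is hypothetical) so that it does not touch the lesson files.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(demo_path, "w") as f:
    f.write("one\ntwo\n")
    inside = f.closed    # the file is still open inside the block

outside = f.closed       # the with statement has closed it on exit

print(inside, outside)
```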

If we had typed f = open(path, 'w'), a new empty file would have been created at that path, overwriting any existing file. One therefore needs to be very careful when using the write mode 'w'.
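The following sketch contrasts 'w' with two related modes: 'a', which appends to the end of an existing file, and 'x', which refuses to open a file that already exists. It assumes a scratch file in a temporary directory (modes.txt is a hypothetical name).

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch.
demo_path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(demo_path, "w") as f:
    f.write("original\n")

# 'a' keeps the existing contents and adds to the end.
with open(demo_path, "a") as f:
    f.write("appended\n")
with open(demo_path) as f:
    after_append = f.read()

# 'w' truncates the file before writing, so the old contents are lost.
with open(demo_path, "w") as f:
    f.write("replaced\n")
with open(demo_path) as f:
    after_write = f.read()

# 'x' raises FileExistsError instead of overwriting.
try:
    open(demo_path, "x")
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

print(repr(after_append), repr(after_write), exclusive_failed)
```

Using 'x' is one way to guard against clobbering a file by accident.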

The most common methods for readable files are <b>read(i)</b>, which returns up to i characters (or i bytes in binary mode) and advances the file position accordingly; <b>tell()</b>, which returns the current position in the file; and <b>seek(i)</b>, which moves the file position to byte i. For a plain ASCII file like this one, character counts and byte counts coincide.

In [28]:
f=open(path)
f.read(5)

'These'

In [29]:
f.seek(11)
f.tell()

11

In [30]:
f.read(20)

'nt delights have vio'

In [31]:
f.close()
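The same sequence of calls can be sketched in binary mode, where positions are unambiguous byte offsets. The example assumes a scratch file (seek.txt is a hypothetical name) holding the first line of the passage.

```python
import os
import tempfile

# Hypothetical scratch file containing the first line of the passage.
demo_path = os.path.join(tempfile.mkdtemp(), "seek.txt")
with open(demo_path, "wb") as f:
    f.write(b"These violent delights have violent ends\n")

with open(demo_path, "rb") as f:
    first = f.read(5)            # read 5 bytes from the start
    pos_after_read = f.tell()    # position has advanced by 5
    f.seek(11)                   # jump to byte offset 11
    pos_after_seek = f.tell()
    chunk = f.read(20)           # read 20 bytes from offset 11
    pos_end = f.tell()           # 11 + 20

print(first, pos_after_read, pos_after_seek, chunk, pos_end)
```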

# Bytes and Unicode with Files

Files can also be opened in binary mode, which returns the raw bytes of the file rather than decoded text. Characters outside ASCII (such as ç, √, Ω, ...) then appear as their UTF-8 byte sequences, assuming the file is stored as UTF-8. To do this, open the file as f = open(path, 'rb'). We contrast binary mode with the default text mode below:

In [8]:
f=open('example/lesson3textfile2.txt')
for line in f:
    print(line)

The square root symbol: √ 

The symbol Omega: Ω


In [9]:
f=open('example/lesson3textfile2.txt', 'rb')
for line in f:
    print(line)

b'The square root symbol: \xe2\x88\x9a \n'
b'The symbol Omega: \xce\xa9'


The b prefix on each line of output indicates a bytes object, which is what a file opened in binary mode yields. Here we can see the \n marker (the end-of-line marker) at the end of the first line. The non-ASCII symbols appear as escaped UTF-8 byte sequences: three bytes (\xe2\x88\x9a) for √ and two (\xce\xa9) for Ω.
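As a small sketch of the text/binary contrast, the example below writes a short string to a scratch file and reads it back both ways. The file name omega.txt is hypothetical, and the encoding is passed explicitly because the default text encoding can vary by platform.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch.
demo_path = os.path.join(tempfile.mkdtemp(), "omega.txt")

with open(demo_path, "w", encoding="utf-8") as f:
    f.write("Omega: Ω")

# Text mode decodes the bytes and returns a str.
with open(demo_path, encoding="utf-8") as f:
    as_text = f.read()

# Binary mode returns the undecoded bytes.
with open(demo_path, "rb") as f:
    as_bytes = f.read()

print(as_text, len(as_text))
print(as_bytes, len(as_bytes))
```

The string has one fewer element than the bytes object because Ω is a single character that takes two bytes in UTF-8.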

We can convert these bytes objects to ordinary text strings with the decode method.

In [20]:
f=open('example/lesson3textfile2.txt', 'rb')
line = f.read(40)
line

b'The square root symbol: \xe2\x88\x9a \nThe symbol '

In [21]:
line.decode('utf8')

'The square root symbol: √ \nThe symbol '

Our line is now an ordinary Python string (str), decoded from UTF-8. Note that the end-of-line character "\n" still remains.
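One caveat when reading a fixed number of bytes: the cut can fall in the middle of a multi-byte character, in which case decoding fails. The sketch below illustrates this on an in-memory bytes object rather than the lesson file; the slice positions are chosen for the example.

```python
# The prefix is 24 ASCII bytes, followed by the 3-byte encoding of the symbol.
data = "The square root symbol: √".encode("utf-8")

# Cutting at byte 25 keeps only the first byte of the 3-byte sequence.
try:
    data[:25].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

# errors="replace" substitutes U+FFFD for the undecodable bytes
# instead of raising.
lenient = data[:25].decode("utf-8", errors="replace")

# Cutting on a character boundary decodes cleanly.
whole = data[:27].decode("utf-8")

print(split_ok, repr(lenient), repr(whole))
```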

# Writing to Files

We can also write text to new files. To do this, we use the file's <b>write</b> or <b>writelines</b> method. Let's copy the text from our Shakespeare passage to a new file.

In [12]:
# Open the source and destination together so both are closed automatically
with open('example/lesson3textfile.txt') as src, open('temp.txt', 'w') as f:
    f.writelines(x for x in src if len(x)>1)
    
with open('temp.txt') as f:
    lines = f.readlines()
    
lines

['These violent delights have violent ends\n',
 'And in their triumph die, like fire and powder\n',
 'Which, as they kiss, consume.']

First, we open 'temp.txt' in write mode (overwriting any existing file of that name) and assign the file object to the variable f. Then we use the file's writelines method to write the lines of 'lesson3textfile.txt' to 'temp.txt'; the condition len(x) > 1 skips blank lines, which consist of a single newline character.
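Note that writelines does not add line separators of its own; the copy above keeps its line breaks only because each line read from the source file already ends in "\n". A minimal sketch, assuming a scratch file (rows.txt is a hypothetical name) and made-up row values:

```python
import os
import tempfile

# Hypothetical scratch file and sample rows, used only for this sketch.
demo_path = os.path.join(tempfile.mkdtemp(), "rows.txt")
rows = ["alpha", "beta", "gamma"]

# Without explicit newlines the items run together on one line.
with open(demo_path, "w") as f:
    f.writelines(rows)
with open(demo_path) as f:
    joined = f.read()

# Appending "\n" to each item gives one item per line.
with open(demo_path, "w") as f:
    f.writelines(row + "\n" for row in rows)
with open(demo_path) as f:
    lines = f.readlines()

print(repr(joined))
print(lines)
```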
