# Accessing files in Python

Accessing files in Python is relatively easy. Here is how to do it. 

## Getting a sample file

As an example of a file to read, we will use a relatively small, unannotated corpus from Project Gutenberg (http://www.gutenberg.org/wiki/Main_Page).
Project Gutenberg is an online collection of texts whose copyright has expired. It contains texts in many languages.

We will work with the First Project Gutenberg Collection of Edgar Allan Poe at http://www.gutenberg.org/etext/1062.
Please download the file, making sure to choose a Plain Text version. The file will be called pg1062.txt. **Put it in the same directory as this notebook.** This avoids problems with figuring out the directory structure on your computer.

## Reading a file

You should now have the file pg1062.txt in the same directory as this notebook. Accessing this file in Python is easy. Here is some Python code that will print the first line of the file to your screen:

In [1]:
f = open("pg1062.txt")
textline = f.readline()
print(textline)
f.close()

The Project Gutenberg EBook of First Gutenberg Collection of Edgar Allan



Any time you read a file, the Python code you write will be almost the same:

  * "open" takes as its argument a file name, which may include directory information.
  * "open" returns a file object, which you use to access the file contents. The file object works like a "bookmark" into the file that starts out at the beginning of the file. For example, it has a method called ``readline()`` which reads the next line of the text. If you call it repeatedly, it keeps reading the next line, and the next after that: the "bookmark" object that is ``f`` keeps advancing through the file.
  * After reading the file, you close the file object. This is not strictly necessary if you are only reading the file, but it is good practice. If you are writing, it is necessary.
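
The open, read, close pattern can also be written with Python's ``with`` statement, which closes the file automatically when the indented block ends, even if an error occurs inside it. The following is a minimal sketch, not part of the original notebook code. So that it runs on its own, it first creates a small stand-in file; the name ``with_demo.txt`` and its contents are made up for illustration.

```python
# Create a small stand-in file so that this sketch runs on its own.
# The name "with_demo.txt" and its contents are made up for illustration.
# (Writing with print(..., file=...) is covered in the section on writing files.)
with open("with_demo.txt", "w") as out:
    print("first line", file=out)
    print("second line", file=out)

# "with" closes the file automatically when the indented block ends.
with open("with_demo.txt") as f:
    first = f.readline()   # reads the first line, including its newline
    second = f.readline()  # the "bookmark" has advanced to the second line

print(first)
print(second)
print(f.closed)  # the file is already closed, without a call to f.close()
```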
  
Let's read a few more lines:

In [2]:
f = open("pg1062.txt")
textline = f.readline()
print(textline)
line2 = f.readline()
print(line2)
line3 = f.readline()
print(line3)
f.close()


The Project Gutenberg EBook of First Gutenberg Collection of Edgar Allan

Poe, by Edgar Allan Poe





You can also iterate through the contents of the whole file. Warning: This will scroll through the whole e-book at once.

In [7]:
# reading the whole file in a loop:
f = open("pg1062.txt")
for line in f:
    print(line)
f.close()

The Project Gutenberg EBook of First Gutenberg Collection of Edgar Allan

Poe, by Edgar Allan Poe



This eBook is for the use of anyone anywhere at no cost and with

almost no restrictions whatsoever.  You may copy it, give it away or

re-use it under the terms of the Project Gutenberg License included

with this eBook or online at www.gutenberg.org





Title: First Gutenberg Collection of Edgar Allan Poe



Author: Edgar Allan Poe



Posting Date: June 6, 2010 [EBook #1062]

Release Date: October, 1997



Language: English





*** START OF THIS PROJECT GUTENBERG EBOOK GUTENBERG COLLECTION--E. A. POE ***









Produced by Levent Kurnaz and Jose Menendez















This is our second experimental effort at cataloguing multiple items in

a single file.  In the first instance we use the same index number for

each item, and just used multiple entries for that file in the index.

In this, the second instance, we have used separate index numbers for

the collection and for all the
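
In the output above, every line of text is followed by an empty line. This is because each line read from the file still ends in a newline character, and "print" adds a second one. The following is a minimal sketch of two common ways to avoid this. The file name ``newline_demo.txt`` and its contents are made up so that the sketch runs without the e-book.

```python
# Stand-in file; its name and contents are made up for illustration.
f = open("newline_demo.txt", "w")
print("Poe, by Edgar Allan Poe", file=f)
print("Language: English", file=f)
f.close()

f = open("newline_demo.txt")
stripped_lines = []
for line in f:
    # Option 1: tell print not to add a newline of its own.
    print(line, end="")
    # Option 2: remove the newline from the line itself.
    stripped_lines.append(line.rstrip("\n"))
f.close()

print(stripped_lines)
```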

If you are certain that the file is small enough to fit comfortably in memory, you can also read the whole file into a single string variable, like this:

In [4]:
f = open("pg1062.txt")
wholetext = f.read()
# Close the file once its contents are in the variable.
f.close()
len(wholetext)

54747
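
The string returned by ``read()`` still contains the newline characters of the file, so it can be split back into lines afterwards. A minimal sketch, assuming a made-up stand-in file (``read_demo.txt``) so that it runs without the e-book:

```python
# Stand-in file; its name and contents are made up for illustration.
f = open("read_demo.txt", "w")
print("one", file=f)
print("two", file=f)
f.close()

f = open("read_demo.txt")
wholetext = f.read()   # the entire file as one string
f.close()

print(len(wholetext))           # number of characters, newlines included
lines = wholetext.splitlines()  # split at line breaks, dropping the newlines
print(lines)
```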

## Writing files

Here is how you write to a file in Python. Again, we make a file object with "open", only this time we give it two arguments.
The second one is "w" for "write". (There is also "a" for "append". While "w" overwrites whatever was in the file before, "a" attaches new material to the end.)

So we have to decide at the time we open a file whether we want to read it or write to it. You then write into the file using the "print" function, but with the additional parameter ``file=f``.
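
The difference between "w" and "a" can be sketched as follows. This is an illustration only, and the file name ``modes_demo.txt`` is made up.

```python
# "w" creates the file, or empties it if it already exists.
f = open("modes_demo.txt", "w")
print("first version", file=f)
f.close()

# Opening with "w" again discards the previous contents.
f = open("modes_demo.txt", "w")
print("second version", file=f)
f.close()

# "a" keeps what is there and adds new material at the end.
f = open("modes_demo.txt", "a")
print("an appended line", file=f)
f.close()

# Read the file back to see what it contains now.
f = open("modes_demo.txt")
contents = f.read()
f.close()
print(contents)
```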

In [5]:
# The following command makes a file named 'myoutfile.txt'
# in the directory where the Jupyter notebook is.
# I chose the extension '.txt'. I could have chosen another one,
# but .txt is good because the file is plain text.
# encoding="utf-8" makes sure the Cyrillic text below can be written
# whatever the default encoding of your system is.
f = open("myoutfile.txt", "w", encoding="utf-8")

# We use the "print" function to write to a file, but with the additional
# parameter file=f.
# Note that f is the variable in which we put the file object.
# If I had named the file object "bob", it would have been "print(..., file=bob)"

# The Russian text says: "Hello. My name is Elora. What is your name?"
print("Привет", file=f)
print("Меня зовут Элора.", file=f)
print("Как вас зовут?", file=f)

# And close the file.
f.close()

In [6]:
# Read the file back with the same encoding it was written in.
f = open("myoutfile.txt", "r", encoding="utf-8")
contents = f.read()
f.close()
contents


'Привет\nМеня зовут Элора.\nКак вас зовут?\n'

Here the "close()" is essential! Python and your operating system write data to files in larger chunks. That is, the data may not actually be written until you have issued many "print" commands. Only when you close the file is all remaining data guaranteed to be written. This is a source of nasty errors: if you write a file, don't close it, and subsequently try to read its contents, they may not be there yet.
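
One way to avoid forgetting the "close()" is to open the file in a ``with`` statement, which closes it automatically at the end of the indented block. The following is a minimal sketch of that approach, not part of the original notebook code. The file name ``buffer_demo.txt`` is made up for illustration.

```python
# The file is closed automatically when the indented block ends,
# so all buffered data has been written by then.
with open("buffer_demo.txt", "w") as f:
    print("line one", file=f)
    print("line two", file=f)

# Reading the file back now is safe: nothing is left in the buffer.
with open("buffer_demo.txt") as f:
    contents = f.read()

print(contents)
```

If a file has to stay open for writing while its contents are read elsewhere, calling ``f.flush()`` hands the buffered data over without closing the file.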

