# Reading and Writing Files in Python

**Week01, Section 03**

ISM6564 Fall 2023

&copy; 2023 Dr. Tim Smith

-----

In this tutorial, we will learn the various ways in which we can read from and write to a file.

**Section Objectives:**

* Open a file for reading as text
* Open a file for writing as binary
* Open a file for reading and writing as text
* Open a file for reading and writing as binary
* Seeking in a file
* Reading a file line by line
* Reading a file character by character
* Reading a file all at once


## Reading Files (Sourcing & Loading Corpus)

> _**NOTE: Install the VS Code extension "go to character position."**
> You can use the shortcut ctrl-g (Windows) or cmd-g (Mac) to jump to a line and column number.
> Sometimes when you attempt to read a file, you get an error message about the encoding. This 
> error message indicates the position of the character that is causing the problem. Ctrl-G
> will not move to a specific character position. Instead, you can install the VS Code 
> extension called "go to character position." Once installed, you can use the shortcut 
> ctrl-k ctrl-g (one after the other) to move to a specific character position._

In [5]:
# open ./data/MLK.txt (the "I Have a Dream" speech) in read mode

fp = open('./data/MLK.txt', 'r', encoding='windows-1252')
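
The `encoding` argument has to match the way the file was saved. The following is a minimal sketch, assuming a throwaway file in a temporary directory (the name `sample.txt` and its contents are made up for illustration). It shows that bytes written as windows-1252 can fail to decode as UTF-8, and that the resulting `UnicodeDecodeError` carries the position of the offending byte.

```python
import os
import tempfile

# Hypothetical sample text: the curly apostrophe is stored by windows-1252 as the single byte 0x92
sample = "It\u2019s a dream."

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")        # made-up file name
    with open(path, "w", encoding="windows-1252") as f:
        f.write(sample)

    # Decoding the same bytes as UTF-8 fails on byte 0x92
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        error_position = None
    except UnicodeDecodeError as exc:
        error_position = exc.start                # offset of the byte that could not be decoded
        print(f"cannot decode byte at position {exc.start}: {exc.reason}")

    # Opening the file with the encoding it was written in works
    with open(path, "r", encoding="windows-1252") as f:
        text_ok = f.read()

print(text_ok == sample)
```

The position reported in the error is the kind of number the "go to character position" extension lets you jump to.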

In [6]:
# Check that the file was opened in a readable mode: output will be either True or False

print(fp.readable()) 

True


In [7]:
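# get the size of the file object

import sys                  # import the sys module (this is short for system; it is a module that is part of the Python standard library)
print(sys.getsizeof(fp))    # note: this prints the size in bytes of the file object in memory, not the size of the file on disk (os.path.getsize('./data/MLK.txt') would report that)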

208
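
The number above describes the Python file object, not the speech itself. As a minimal sketch of the difference, assuming a made-up 500-byte file in a temporary directory, `os.path.getsize` reports the bytes stored on disk while `sys.getsizeof` reports the memory used by the handle:

```python
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "size_demo.txt")    # made-up file name
    with open(path, "w", encoding="ascii") as f:
        f.write("x" * 500)                       # 500 ASCII characters = 500 bytes

    with open(path, "r", encoding="ascii") as f:
        object_size = sys.getsizeof(f)           # memory used by the file object (varies by Python version)

    file_size = os.path.getsize(path)            # bytes stored on disk

print(object_size)
print(file_size)
```

The first number depends on the Python build; the second is always the length of the file in bytes.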


In [8]:
# Read all the data at once, and load it into memory

text = fp.read() # note that this will generate an error. 

> **Question: what's wrong? how do we fix it?**
>
> Hint: This would seem to be an encoding/decoding issue. 


In [9]:
# Read first 100 characters

print(text[:100])   # since we have read the entire file into the text variable, we can use the slice operator to print the first 100 characters

print(fp.read(100)) # read(100) reads up to 100 characters starting at the current cursor position

I am happy to join with you today in what will go down in history as the greatest demonstration for 



> **Question: Why don't we see any output from the print(fp.read(100)) statement above?**

In [10]:
# As we read from a file, we use a file handle. This file handle is a data structure that maintains a cursor.
# The cursor marks where in the file we are at any given time.

# tell() is a method provided by the file handle object that returns the current location of the cursor
fp.tell()

9117

> **Question: Why is the current position 9117? What's the significance of this number?**

In [11]:
# We can position the cursor wherever we'd like:
# seek() moves the cursor to any location in the file
fp.seek(150)
print(fp.tell())
print(fp.read(10))

# if you try to move the cursor to a location that doesn't exist, you won't get an error,
# but there will be nothing to read there
fp.seek(1000000)
print(fp.tell())
print(fp.read(10))

150
e years ag
1000000
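
The cursor behavior can be reproduced on a small file where every position is easy to check by hand. This is a minimal sketch, assuming a made-up ASCII file so that character offsets and byte offsets coincide. Note that for text files the Python documentation only guarantees `seek()` to offsets returned by `tell()` or to 0, so arbitrary offsets should be treated with care when the encoding uses more than one byte per character.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cursor_demo.txt")   # made-up file name
    with open(path, "w", encoding="ascii") as f:
        f.write("Hello World!")                   # 12 ASCII characters = 12 bytes

    with open(path, "r", encoding="ascii") as f:
        everything = f.read()       # reads the whole file; the cursor is now at the end
        end_position = f.tell()     # position of the cursor after reading everything
        leftover = f.read(5)        # nothing is left to read, so this is an empty string

        f.seek(6)                   # move the cursor back to offset 6
        word = f.read(5)            # reads the 5 characters starting at offset 6

        f.seek(1000)                # far past the end of the file: no error is raised
        past_end = f.read(5)        # again an empty string

print(repr(everything), end_position, repr(leftover), repr(word), repr(past_end))
```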



Other ways to read data from a file

In [12]:
# we can read one line at a time

fp.seek(0)              # go to the beginning of the file
print(fp.readline())    # read one line
print(fp.tell())        # get the current position

I am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation.

139


In [13]:
# we can read all remaining lines at once

fp.seek(0)              # reset the file pointer to the beginning of the file
print(fp.readlines())   # readlines returns a list of strings

['I am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation.\n', '\n', 'Five score years ago, a great American, in whose symbolic shadow we stand today, signed the Emancipation Proclamation. This momentous decree came as a great beacon light of hope to millions of Negro slaves who had been seared in the flames of withering injustice. It came as a joyous daybreak to end the long night of their captivity.\n', '\n', 'But one hundred years later, the Negro still is not free. One hundred years later, the life of the Negro is still sadly crippled by the manacles of segregation and the chains of discrimination. One hundred years later, the Negro lives on a lonely island of poverty in the midst of a vast ocean of material prosperity. One hundred years later, the Negro is still languishing in the corners of American society and finds himself an exile in his own land. So we have come here today to dramatize a shameful

In [14]:
# we can read one line at a time in a loop

fp.seek(0)                                      # go back to the beginning of the file
for count, line in enumerate(fp.readlines()):   # enumerate() pairs each line with a counter that starts at 0
    print(line)                                 # print the line returned by fp.readlines()
    if count > 5:                               # stop once count reaches 6, i.e. after 7 lines (just so we don't print the whole thing to the screen)
        break


I am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation.



Five score years ago, a great American, in whose symbolic shadow we stand today, signed the Emancipation Proclamation. This momentous decree came as a great beacon light of hope to millions of Negro slaves who had been seared in the flames of withering injustice. It came as a joyous daybreak to end the long night of their captivity.



But one hundred years later, the Negro still is not free. One hundred years later, the life of the Negro is still sadly crippled by the manacles of segregation and the chains of discrimination. One hundred years later, the Negro lives on a lonely island of poverty in the midst of a vast ocean of material prosperity. One hundred years later, the Negro is still languishing in the corners of American society and finds himself an exile in his own land. So we have come here today to dramatize a shameful condition.



In 
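
A common alternative to `readlines()` is to iterate over the file handle directly, which reads one line at a time instead of loading the whole file into a list. The sketch below is illustrative only: the file name and contents are made up, `itertools.islice` limits the loop to the first three lines, and `rstrip('\n')` removes the newline character that every line read from a file carries at its end.

```python
import os
import tempfile
from itertools import islice

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines_demo.txt")    # made-up file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("first paragraph\n\nsecond paragraph\n\nthird paragraph\n")

    with open(path, "r", encoding="utf-8") as f:
        # islice(f, 3) yields only the first three lines of the file
        first_three = [line.rstrip("\n") for line in islice(f, 3)]

print(first_three)   # ['first paragraph', '', 'second paragraph']
```

Passing `end=''` to `print()` is another way to avoid printing a second newline after each line.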

> **Question: Why do we seem to have extra blank lines in our text?**

In [15]:
# Close your file pointer

fp.close()  # close file... be sure to remember this!

> ### The 'with' statement in Python
>
> The `with` statement works with a context manager, here the file object returned by `open()`, and ensures that the file is closed when the block inside the `with` statement is exited, even if an error occurs inside it. This is helpful because you cannot accidentally leave a file open. The file object should be used only inside the indented block; once you exit the block, the file is closed automatically.
>
> ```python
> with open('filename.txt', 'r') as f:
>     # use the file object f
>     # inside this indented block
> # outside this indented block
> # the file is closed automatically
> ```
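
To see the automatic close in action, here is a minimal sketch, assuming a made-up file in a temporary directory. It checks the `closed` attribute inside and outside the block, shows what happens when a closed file is used, and ends with the `try`/`finally` form that `with` replaces.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "with_demo.txt")     # made-up file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("inside the block\n")
        open_inside = not f.closed                # the file is open inside the block

    closed_after = f.closed                       # the file is closed once the block exits

    # Using the file object after the block raises a ValueError
    try:
        f.write("too late")
        error_message = None
    except ValueError as exc:
        error_message = str(exc)

    # Roughly what the with statement does for us behind the scenes
    g = open(path, "r", encoding="utf-8")
    try:
        content = g.read()
    finally:
        g.close()                                 # runs even if read() raises an error

print(open_inside, closed_after, error_message)
```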

## Writing Files

* opening a file for writing with 'w' mode will create the file if it does not exist and overwrite (truncate) it if it already exists
* opening a file for writing with 'a' mode will append to the end of the file if it already exists and create it otherwise
* opening a file for writing with 'x' mode will create a new file and raise a `FileExistsError` if the file already exists
* opening a file with the 'r+' mode will allow you to both read and write to an existing file
* when writing to a text file you cannot pass integers or floats to 'write'; you must convert them to strings first
* 'write' does not add a line ending for you; you must add the newline character '\n' to the end of each line yourself
* when writing to a file you must close the file when you are done writing to it, otherwise buffered data may not reach the disk
  *    use 'with' statement as default approach - then you do not need to remember to close the file
* when a file is opened for writing only ('w', 'a' or 'x' mode) you cannot use the 'read', 'readline' or 'readlines' methods; you must use the 'write' method
* when a file is opened for reading only (using the 'r' mode) you cannot use the 'write' method; you must use the 'read', 'readline' or 'readlines' methods
* the cells below (`In [16]` and `In [17]`) demonstrate the behavior of using seek when writing to a file
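
Before turning to the `seek()` cells, the mode rules listed above can be sketched in one short script. This is an illustration under stated assumptions, not part of the original notebook: the file name `modes_demo.txt` and the values written are made up, and everything happens in a temporary directory.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes_demo.txt")    # made-up file name

    with open(path, "w", encoding="utf-8") as f:  # 'w' creates the file
        f.write("first\n")
        try:
            f.write(42)                           # numbers must be converted with str() first
            number_rejected = False
        except TypeError:
            number_rejected = True

    with open(path, "w", encoding="utf-8") as f:  # 'w' again: the old contents are discarded
        f.write("count: " + str(3) + "\n")        # write() adds no newline, so we add '\n'

    with open(path, "a", encoding="utf-8") as f:  # 'a' appends to the end of the file
        f.write(f"ratio: {0.5}\n")

    try:
        with open(path, "x", encoding="utf-8") as f:   # 'x' refuses to touch an existing file
            f.write("never written\n")
        x_failed = False
    except FileExistsError:
        x_failed = True

    with open(path, "r", encoding="utf-8") as f:  # 'r' allows reading only
        lines = f.readlines()
        try:
            f.write("nope")
            read_only_write_failed = False
        except io.UnsupportedOperation:
            read_only_write_failed = True

print(lines)   # ['count: 3\n', 'ratio: 0.5\n']
```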

In [16]:
# you can use seek() to move the cursor to a specific location when writing a file

with open('./data/write_test1.txt', 'w') as f:
    f.write('Hello World!')
    f.seek(0)
    f.write('J')            # overwrites the 'H', so the file now contains 'Jello World!'

In [17]:
with open('./data/write_test2.txt', 'w') as f:
    f.write('Hello World!')
    f.seek(100)             # this will move the cursor well past the end of the file
    f.write('Hello World!') 
    
# though this won't generate an error, the gap between the two strings is typically filled with null bytes, so many text viewers will not be able to view the file, or will only show the first hello world.
# For instance, try using vscode to view the write_test2.txt file. 
#   If you get an error, it should allow you to choose to open it with a hex editor - do that, and you will see the binary contents and that the second hello world is in fact in the file. 
# Now, try double clicking on the file in your file explorer (windows) or finder (macos)
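
Instead of a hex editor, the result can also be inspected from Python by reopening the file in binary mode. The sketch below combines both `seek()` experiments in one made-up file inside a temporary directory. It assumes a typical filesystem, where the skipped region is filled with null bytes (value 0); that detail is platform behavior rather than something Python itself guarantees.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gap_demo.txt")      # made-up file name

    with open(path, "w", encoding="ascii") as f:
        f.write("Hello World!")
        f.seek(0)                                 # back to the start of the file
        f.write("J")                              # overwrites the 'H'
        f.seek(100)                               # well past the end of the 12-byte text
        f.write("Hello World!")

    with open(path, "rb") as f:                   # 'rb' returns raw bytes, with no decoding
        data = f.read()

print(len(data))            # total size in bytes
print(data[:12])            # the overwritten greeting
print(set(data[12:100]))    # the distinct byte values found in the gap
print(data[100:])           # the second greeting
```

The gap is what confuses text viewers: null bytes are not printable characters, so many tools treat the file as binary.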