<img src="../../images/banners/python-oop.png" width="600"/>

# <img src="../../images/logos/python.png" width="23"/> Reading and Writing Files 


## Table of Contents


* [What Is a File?](#what_is_a_file?)
    * [File Paths](#file_paths)
    * [Line Endings](#line_endings)
    * [Character Encodings](#character_encodings)
* [Opening and Closing a File in Python](#opening_and_closing_a_file_in_python)
    * [Text File Types](#text_file_types)
    * [Buffered Binary File Types](#buffered_binary_file_types)
    * [Raw File Types](#raw_file_types)
* [Reading and Writing Opened Files](#reading_and_writing_opened_files)
    * [Iterating Over Each Line in the File](#iterating_over_each_line_in_the_file)
    * [Appending to a File](#appending_to_a_file)
    * [Working With Two Files at the Same Time](#working_with_two_files_at_the_same_time)
    * [Working With Bytes](#working_with_bytes)
* [Summary](#summary)

---

One of the most common tasks that you can do with Python is reading and writing files. Whether it’s writing to a simple text file, reading a complicated server log, or even analyzing raw byte data, all of these situations require reading or writing a file.

<a class="anchor" id="what_is_a_file?"></a>
## What Is a File?
Before we get into how to work with files in Python, it’s important to understand what exactly a file is and how modern operating systems handle it.

At its core, a file is a contiguous set of bytes [used to store data](https://en.wikipedia.org/wiki/Computer_file). This data is organized in a specific format and can be anything as simple as a text file or as complicated as a program executable. In the end, these bytes are stored and processed by the computer as binary 1s and 0s.

Files on most modern file systems are composed of three main parts:

- **Header**: metadata about the contents of the file (file name, size, type, and so on)
- **Data**: contents of the file as written by the creator or editor
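- **End of file (EOF)**: marker that indicates the end of the file (on most modern systems this is derived from the file’s recorded size rather than stored as a special character)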

<img src="./images/fileformat.webp" alt="file format" style="width: 200px;" align="left"/>

What this data represents depends on the format specification used, which is typically represented by an extension. For example, a file that has an extension of `.gif` most likely conforms to the [Graphics Interchange Format](https://en.wikipedia.org/wiki/GIF) specification. There are hundreds, if not thousands, of file extensions out there. For this section, you’ll only deal with `.txt` file extensions.

<a class="anchor" id="file_paths"></a>
### File Paths
When you access a file on an operating system, a file path is required. The file path is a string that represents the location of a file. It’s broken up into three major parts:

1. **Folder Path:** the file folder location on the file system where subsequent folders are separated by a forward slash `/` (Unix) or backslash `\` (Windows)
2. **File Name:** the actual name of the file
3. **Extension:** the end of the file path, preceded by a period (`.`), used to indicate the file type

Here’s a quick example. Let’s say you have a file located within a file structure like this:

```
/
│
├── path/
│   │
│   ├── to/
│   │   └── cats.gif
│   │
│   └── dog_breeds.txt
│
└── animals.csv
```

Let’s say you wanted to access the `cats.gif` file, and your current location was in the same folder as `path`. In order to access the file, you need to go through the `path` folder and then the `to` folder, finally arriving at the `cats.gif` file. The Folder Path is `path/to/`. The File Name is `cats`. The File Extension is `.gif`. So the full path is `path/to/cats.gif`.

Now let’s say that your current location or current working directory (cwd) is the `to` folder of our example folder structure. Instead of referring to `cats.gif` by the full path of `path/to/cats.gif`, the file can simply be referenced by its file name and extension, `cats.gif`.

```
/
│
├── path/
│   │
│   ├── to/  ← Your current working directory (cwd) is here
│   │   └── cats.gif  ← Accessing this file
│   │
│   └── dog_breeds.txt
│
└── animals.csv
```

But what about `dog_breeds.txt`? How would you access that without using the full path? You can use the special double-dot characters (`..`) to move one directory up. This means that `../dog_breeds.txt` will reference the `dog_breeds.txt` file from the `to` directory:

```
/
│
├── path/  ← Referencing this parent folder
│   │
│   ├── to/  ← Current working directory (cwd)
│   │   └── cats.gif
│   │
│   └── dog_breeds.txt  ← Accessing this file
│
└── animals.csv
```

The double-dot (`..`) can be chained together to traverse multiple directories above the current directory. For example, to access `animals.csv` from the `to` folder, you would use `../../animals.csv`.
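The three parts of a path, and the effect of `..`, can be sketched with the standard library. This is a minimal illustration that assumes Unix-style separators; the variable names are only for this example, and `posixpath.normpath()` collapses `..` purely as text without touching the file system:

```python
import posixpath
from pathlib import PurePosixPath

# The example path from this section, using Unix-style separators
cats = PurePosixPath('path/to/cats.gif')

folder_path = str(cats.parent)  # the folder path
file_name = cats.stem           # the file name without its extension
extension = cats.suffix         # the extension, including the period

# '..' steps up one directory; normpath() collapses it textually
dog_breeds = posixpath.normpath('/path/to/../dog_breeds.txt')
animals = posixpath.normpath('/path/to/../../animals.csv')

print(folder_path, file_name, extension)
print(dog_breeds)
print(animals)
```

On Windows, `pathlib.PureWindowsPath` splits paths the same way using backslashes.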

<a class="anchor" id="line_endings"></a>
### Line Endings
One problem often encountered when working with file data is the representation of a new line or line ending. The line ending has its roots in the Morse code era, [when a specific prosign was used to communicate the end of a transmission or the end of a line](https://en.wikipedia.org/wiki/Prosigns_for_Morse_code#Official_International_Morse_code_procedure_signs).

Later, this [was standardized for teleprinters](https://en.wikipedia.org/wiki/Newline#History) by both the International Organization for Standardization (ISO) and the American Standards Association (ASA). The ASA standard states that line endings should use the sequence of the Carriage Return (CR or `\r`) and the Line Feed (LF or `\n`) characters (CR+LF or `\r\n`). The ISO standard, however, allowed for either the CR+LF characters or just the LF character.

[Windows uses the CR+LF characters](https://unix.stackexchange.com/a/411830) to indicate a new line, while Unix and the newer Mac versions use just the LF character. This can cause some complications when you’re processing files on an operating system that is different from the one the file was created on. Here’s a quick example. Let’s say that we examine the file `dog_breeds.txt` that was created on a Windows system:

```
Pug\r\n
Jack Russell Terrier\r\n
English Springer Spaniel\r\n
German Shepherd\r\n
Staffordshire Bull Terrier\r\n
Cavalier King Charles Spaniel\r\n
Golden Retriever\r\n
West Highland White Terrier\r\n
Boxer\r\n
Border Terrier\r\n
```

The same file will be interpreted differently on a Unix device:

```
Pug\r
\n
Jack Russell Terrier\r
\n
English Springer Spaniel\r
\n
German Shepherd\r
\n
Staffordshire Bull Terrier\r
\n
Cavalier King Charles Spaniel\r
\n
Golden Retriever\r
\n
West Highland White Terrier\r
\n
Boxer\r
\n
Border Terrier\r
\n
```
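Python’s text mode smooths over this difference by default. The sketch below writes two CR+LF lines as raw bytes to a temporary file (the file name and contents are only for illustration) and reads them back twice: once with the default newline handling, and once with `newline=''`, which turns the translation off:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'dog_breeds.txt')

    # Write two lines with Windows-style CR+LF endings as raw bytes
    with open(path, 'wb') as writer:
        writer.write(b'Pug\r\nBoxer\r\n')

    # Default text mode: \r\n is translated to \n while reading
    with open(path, 'r') as reader:
        translated = reader.readlines()

    # newline='' disables the translation, so \r survives
    with open(path, 'r', newline='') as reader:
        untranslated = reader.readlines()

print(translated)
print(untranslated)
```

If you need to preserve the original line endings exactly, for example when copying a file, `newline=''` or binary mode is the safer choice.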

<a class="anchor" id="character_encodings"></a>
### Character Encodings
Another common problem that you may face is the encoding of the byte data. An encoding is a translation from byte data to human-readable characters. This is typically done by assigning a numerical value to represent a character. The two most common encodings are the [ASCII](https://www.ascii-code.com/) and [Unicode](https://unicode.org/) formats. [ASCII can only store 128 characters](https://en.wikipedia.org/wiki/ASCII), while [Unicode can contain up to 1,114,112 characters](https://en.wikipedia.org/wiki/Unicode).

ASCII is actually a subset of [Unicode](https://realpython.com/python-encodings-guide/): the first 128 Unicode code points are the ASCII characters, and the UTF-8 encoding stores them as the same single bytes that ASCII uses. It’s important to note that parsing a file with the incorrect character encoding can lead to failures or misrepresentation of the characters. For example, if a file was created using the UTF-8 encoding and you try to parse it using the ASCII encoding, an error will be raised as soon as a character outside of those 128 values is encountered.
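The failure described above can be reproduced without a file at all. This is a small sketch using an arbitrary sample string; the same error appears when a UTF-8 file is opened with `open(path, encoding='ascii')`:

```python
text = 'café'

# UTF-8 stores 'é' as two bytes, both outside the 0-127 ASCII range
data = text.encode('utf-8')

try:
    data.decode('ascii')
    decode_error = None
except UnicodeDecodeError as error:
    decode_error = error

# Decoding with the encoding the data was written in round-trips cleanly
round_trip = data.decode('utf-8')

print(data)
print(type(decode_error).__name__)
print(round_trip)
```

Passing an explicit `encoding` argument to `open()` avoids depending on the platform’s default encoding.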

<a class="anchor" id="opening_and_closing_a_file_in_python"></a>
## Opening and Closing a File in Python

When you want to work with a file, the first thing to do is to open it. This is done by invoking the `open()` built-in function. `open()` has a single required argument, the path to the file, and it returns a single value, the file object:

In [2]:
%%writefile myfile.txt
this is line 1
this is line 2
this is line 3
this is line 4
this is line 5
this is line 6

Writing myfile.txt


In [3]:
file = open('myfile.txt')

After you open a file, the next thing to learn is how to close it.

> **Warning:** You should always make sure that an open file is properly closed.

It’s important to remember that it’s your responsibility to close the file. In most cases, upon termination of an application or script, a file will be closed eventually. However, there is no guarantee when exactly that will happen. This can lead to unwanted behavior including resource leaks. It’s also a best practice within Python (Pythonic) to make sure that your code behaves in a way that is well defined and reduces any unwanted behavior.

When you’re manipulating a file, there are two ways to ensure that it is closed properly, even when encountering an error. The first way to close a file is to use the `try-finally` block:

In [6]:
reader = open('myfile.txt')
try:
    # Further file processing goes here
    pass
finally:
    reader.close()

> The `try-finally` block is covered in the Error Handling section later.

The second way to close a file is to use the `with` statement:

In [9]:
with open('myfile.txt') as reader:
    # Further file processing goes here
    pass

The `with` statement automatically takes care of closing the file once it leaves the `with` block, even in cases of error. I highly recommend that you use the `with` statement as much as possible, as it allows for cleaner code and makes handling any unexpected errors easier for you.
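To see that the file really is closed when an error occurs, here is a small sketch. It uses a temporary file and a deliberately raised `ValueError` as a stand-in for a processing failure; both are assumptions made for the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')
    with open(path, 'w') as writer:
        writer.write('this is line 1\n')

    try:
        with open(path) as reader:
            reader.readline()
            # Simulate something going wrong mid-processing
            raise ValueError('simulated processing failure')
    except ValueError:
        pass

    # The with block closed the file before the exception propagated
    closed_after_error = reader.closed

print(closed_after_error)
```

The `.closed` attribute is available on every file object, so you can use the same check with the `try-finally` approach.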

Most likely, you’ll also want to use the second positional argument, `mode`. This argument is a string of one or more characters that describes how you want to open the file. The default and most common is `'r'`, which represents opening the file in read-only mode as a text file:

In [10]:
with open('myfile.txt', 'r') as reader:
    # Further file processing goes here
    pass

Other options for modes are [fully documented online](https://docs.python.org/3/library/functions.html#open), but the most commonly used ones are the following:

|Character | Meaning|
|:--|:--|
|`'r'` | Open for reading (default)|
|`'w'` | Open for writing, truncating (overwriting) the file first|
|`'rb'` or `'wb'` | Open in binary mode (read/write using byte data)|

Let’s talk a little about file objects. A file object is:

> "an object exposing a file-oriented API (with methods such as read() or write()) to an underlying resource." ([Source](https://docs.python.org/3/glossary.html#term-file-object))

There are three different categories of file objects:

- Text files
- Buffered binary files
- Raw binary files

Each of these file types is defined in the `io` module.

<a class="anchor" id="text_file_types"></a>
### Text File Types

A text file is the most common file that you’ll encounter. Here are some examples of how these files are opened:

```python
open('abc.txt')

open('abc.txt', 'r')

open('abc.txt', 'w')
```

With these types of files, `open()` will return a `TextIOWrapper` file object:

In [1]:
file = open('myfile.txt')

In [2]:
type(file)

_io.TextIOWrapper

This is the default file object returned by `open()`.

<a class="anchor" id="buffered_binary_file_types"></a>
### Buffered Binary File Types

A buffered binary file type is used for reading and writing binary files. Here are some examples of how these files are opened:

In [3]:
file = open('myfile.txt', 'rb')

In [5]:
type(file)

_io.BufferedReader

In [7]:
file = open('myfile.txt', 'wb')  # Caution: 'w' modes truncate myfile.txt

In [8]:
type(file)

_io.BufferedWriter

With these types of files, `open()` will return either a `BufferedReader` or `BufferedWriter` file object, as shown above.

<a class="anchor" id="raw_file_types"></a>
### Raw File Types

A raw file type is:

> “generally used as a low-level building-block for binary and text streams.” ([Source](https://docs.python.org/3.7/library/io.html#raw-i-o))

It is therefore not typically used.

Here’s an example of how these files are opened:

In [9]:
open('myfile.txt', 'rb', buffering=0)

<_io.FileIO name='myfile.txt' mode='rb' closefd=True>

With these types of files, `open()` will return a `FileIO` file object:

In [11]:
file = open('myfile.txt', 'rb', buffering=0)
type(file)

_io.FileIO
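The three categories are layered on top of each other, which the following sketch makes visible. It relies on the `.buffer` and `.raw` attributes, which CPython provides but which the `io` documentation notes are not guaranteed on every implementation; the temporary file is only for illustration:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')
    with open(path, 'w') as writer:
        writer.write('this is line 1\n')

    with open(path) as text_file:
        buffered = text_file.buffer  # the buffered binary layer
        raw = buffered.raw           # the raw layer underneath it

        layer_names = [
            type(text_file).__name__,
            type(buffered).__name__,
            type(raw).__name__,
        ]
        # Each layer derives from the matching abstract base class in io
        is_text = isinstance(text_file, io.TextIOBase)
        is_buffered = isinstance(buffered, io.BufferedIOBase)
        is_raw = isinstance(raw, io.RawIOBase)

print(layer_names)
```

In everyday code you rarely touch the lower layers directly; the mode string passed to `open()` picks the right one for you.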

<a class="anchor" id="reading_and_writing_opened_files"></a>
## Reading and Writing Opened Files

Once you’ve opened a file, you’ll want to read from or write to it. First off, let’s cover reading a file. There are multiple methods that can be called on a file object to help you out:

|Method |What It Does|
|:--|:--|
|`.read(size=-1)` | This reads at most `size` characters (or bytes, in binary mode) from the file. If no argument is passed or `None` or `-1` is passed, then the entire remainder of the file is read. |
|`.readline(size=-1)` | This reads at most `size` characters from the current line. Repeated calls continue along the line and then move on to the next line. If no argument is passed or `None` or `-1` is passed, then the entire line (or rest of the line) is read. |
|`.readlines()` | This reads the remaining lines from the file object and returns them as a list. |

Using the same `myfile.txt` file you used above, let’s go through some examples of how to use these methods. Here’s an example of how to open and read the entire file using `.read()`:

In [14]:
%%writefile myfile.txt
this is line 1
this is line 2
this is line 3
this is line 4
this is line 5
this is line 6

Overwriting myfile.txt


In [15]:
with open('myfile.txt', 'r') as reader:
    # Read & print the entire file
    print(reader.read())

this is line 1
this is line 2
this is line 3
this is line 4
this is line 5
this is line 6



Here’s an example of how to read 5 characters of a line at a time using the Python `.readline()` method:

In [17]:
with open('myfile.txt', 'r') as reader:
    # Read & print up to 5 characters of the line, 5 times
    print(reader.readline(5))

    # Notice that the line is longer than 5 chars, so each call continues
    # along the line, reading 5 chars at a time until the end of the
    # line, and then "wraps" around to the next line
    print(reader.readline(5))
    print(reader.readline(5))
    print(reader.readline(5))
    print(reader.readline(5))

this 
is li
ne 1

this 
is li


Here’s an example of how to read the entire file as a list using the Python `.readlines()` method:

In [18]:
with open('myfile.txt') as f:
    lines = f.readlines()  # Returns a list object
lines

['this is line 1\n',
 'this is line 2\n',
 'this is line 3\n',
 'this is line 4\n',
 'this is line 5\n',
 'this is line 6\n']

The above example can also be done by using `list()` to create a list out of the file object:

In [19]:
with open('myfile.txt') as f:
    lines = list(f)
lines

['this is line 1\n',
 'this is line 2\n',
 'this is line 3\n',
 'this is line 4\n',
 'this is line 5\n',
 'this is line 6\n']

<a class="anchor" id="iterating_over_each_line_in_the_file"></a>
### Iterating Over Each Line in the File

A common thing to do while reading a file is to iterate over each line. Here’s an example of how to use the Python `.readline()` method to perform that iteration:

In [20]:
with open('myfile.txt', 'r') as reader:
    # Read and print the entire file line by line
    line = reader.readline()
    while line != '':  # readline() returns an empty string at EOF
        print(line, end='')
        line = reader.readline()

this is line 1
this is line 2
this is line 3
this is line 4
this is line 5
this is line 6


Another way you could iterate over each line in the file is to use the Python `.readlines()` method of the file object. Remember, `.readlines()` returns a list where each element in the list represents a line in the file:

In [21]:
with open('myfile.txt', 'r') as reader:
    for line in reader.readlines():
        print(line, end='')

this is line 1
this is line 2
this is line 3
this is line 4
this is line 5
this is line 6


However, the above examples can be further simplified by iterating over the file object itself:

In [23]:
with open('myfile.txt', 'r') as reader:
    # Read and print the entire file line by line
    for line in reader:
        print(line, end='')

this is line 1
this is line 2
this is line 3
this is line 4
this is line 5
this is line 6


This final approach is more Pythonic and can be quicker and more memory efficient. Therefore, it is suggested you use this instead.

> Note: Some of the above examples contain `print('some text', end='')`. The `end=''` is to prevent Python from adding an additional newline to the text that is being printed and only print what is being read from the file.
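An alternative to `end=''` is to strip the trailing newline from each line yourself, which is handy when you want to process the text rather than just print it. The sketch below assumes a small temporary file and combines iteration over the file object with `enumerate()` to number the lines:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')
    with open(path, 'w') as writer:
        writer.write('this is line 1\nthis is line 2\n')

    numbered = []
    with open(path) as reader:
        # The file object yields one line at a time, newline included
        for number, line in enumerate(reader, start=1):
            numbered.append(f"{number}: {line.rstrip()}")

for entry in numbered:
    print(entry)
```

Note that `.rstrip()` with no arguments also removes trailing spaces and tabs; use `.rstrip('\n')` if those matter.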

Now let’s dive into writing files. As with reading files, file objects have multiple methods that are useful for writing to a file:

|Method |What It Does|
|:--|:--|
|`.write(string)` | This writes the string to the file. |
|`.writelines(seq)` | This writes the sequence to the file. No line endings are appended to each sequence item. It’s up to you to add the appropriate line ending(s). |

Here’s a quick example of using `.write()` and `.writelines()`:

In [4]:
with open('myfile.txt', 'r') as reader:
    # Note: readlines doesn't trim the line endings
    lines = reader.readlines()

with open('myfile_reversed.txt', 'w') as writer:
    # Alternatively you could use
    # writer.writelines(reversed(lines))

    # Write the lines to the file in reversed order
    for line in reversed(lines):
        writer.write(line)

In [5]:
with open('myfile_reversed.txt', 'r') as reader:
    # Note: readlines doesn't trim the line endings
    lines = reader.readlines()

In [6]:
lines

['this is line 6\n',
 'this is line 5\n',
 'this is line 4\n',
 'this is line 3\n',
 'this is line 2\n',
 'this is line 1\n']
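The example above works because the lines returned by `.readlines()` still carry their `\n`. When your items have no line endings, `.writelines()` will not add any. Here is a short sketch with a hypothetical list of strings and a temporary file:

```python
import os
import tempfile

breeds = ['Pug', 'Boxer', 'Beagle']  # sample data for illustration

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'breeds.txt')

    # No line endings are added, so the items run together
    with open(path, 'w') as writer:
        writer.writelines(breeds)
    with open(path) as reader:
        joined = reader.read()

    # Add the line ending to each item yourself
    with open(path, 'w') as writer:
        writer.writelines(f"{breed}\n" for breed in breeds)
    with open(path) as reader:
        separated = reader.read()

print(repr(joined))
print(repr(separated))
```

`.writelines()` accepts any iterable of strings, so a generator expression works as well as a list.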

<a class="anchor" id="appending_to_a_file"></a>
### Appending to a File

Sometimes, you may want to append to a file or start writing at the end of an already populated file. This is easily done by using the `'a'` character for the mode argument:

In [7]:
with open('myfile.txt', 'a') as a_writer:
    a_writer.write('this is line 7')

When you examine `myfile.txt` again, you’ll see that the beginning of the file is unchanged and `this is line 7` is now added to the end of the file:

In [9]:
with open('myfile.txt', 'r') as reader:
    print(reader.read())

this is line 1
this is line 2
this is line 3
this is line 4
this is line 5
this is line 6
this is line 7


<a class="anchor" id="working_with_two_files_at_the_same_time"></a>
### Working With Two Files at the Same Time

There are times when you may want to read a file and write to another file at the same time. If you use the example that was shown when you were learning how to write to a file, it can actually be combined into the following:

In [10]:
source_path = 'myfile.txt'
reversed_path = 'myfile_reversed.txt'
with open(source_path, 'r') as reader, open(reversed_path, 'w') as writer:
    lines = reader.readlines()
    writer.writelines(reversed(lines))
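Reversing requires loading every line first, but many transformations can be streamed one line at a time, so the whole file never has to sit in memory. The following sketch uses a different, simpler transformation (upper-casing each line) and temporary files chosen for the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    source_path = os.path.join(tmp_dir, 'source.txt')
    target_path = os.path.join(tmp_dir, 'target.txt')

    with open(source_path, 'w') as writer:
        writer.write('this is line 1\nthis is line 2\n')

    # Both files are open at once; each line is written as soon as it is read
    with open(source_path) as reader, open(target_path, 'w') as writer:
        for line in reader:
            writer.write(line.upper())

    with open(target_path) as reader:
        result = reader.read()

print(result, end='')
```

Both files are closed when the `with` block ends, even if the loop raises an error partway through.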

<a class="anchor" id="working_with_bytes"></a>
### Working With Bytes

Sometimes, you may need to work with files using byte strings. This is done by adding the `'b'` character to the mode argument. All of the same methods for the file object apply. However, each of the methods expects and returns a `bytes` object instead:

In [28]:
with open('myfile.txt', 'rb') as reader:
    print(reader.readline())

b'this is line 1\n'


Opening a text file using the `b` flag isn’t that interesting. Let’s say we have this picture of the Python logo (`python.png`):

<img src="./images/python.png" alt="python" width=100 align="center" />

You can actually open that file in Python and examine the contents! Since the `.png` file format is well defined, the header of the file is 8 bytes broken up like this:

|Value |Interpretation|
|:--|:--|
|`0x89` | A “magic” number to indicate that this is the start of a PNG |
|`0x50 0x4E 0x47` | PNG in ASCII |
|`0x0D 0x0A` | A DOS-style line ending `\r\n` |
|`0x1A` | A DOS-style EOF character |
|`0x0A` | A Unix-style line ending `\n` |

Sure enough, when you open the file and read these bytes individually, you can see that this is indeed a `.png` file header:

In [1]:
with open('./images/python.png', 'rb') as byte_reader:
    print(byte_reader.read(1))
    print(byte_reader.read(3))
    print(byte_reader.read(2))
    print(byte_reader.read(1))
    print(byte_reader.read(1))


b'\x89'
b'PNG'
b'\r\n'
b'\x1a'
b'\n'
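The same idea can be turned into a quick check. The helper below, `is_png()`, is a hypothetical function written for this example, not part of any library. It compares the first 8 bytes of a file against the signature from the table, and it is exercised here on two small temporary files rather than a real image:

```python
import os
import tempfile

# The 8 header bytes listed in the table above
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def is_png(path):
    """Return True if the file at path starts with the PNG signature."""
    with open(path, 'rb') as byte_reader:
        return byte_reader.read(8) == PNG_SIGNATURE


with tempfile.TemporaryDirectory() as tmp_dir:
    # A stand-in file that only carries the signature, not a real image
    png_path = os.path.join(tmp_dir, 'fake.png')
    with open(png_path, 'wb') as writer:
        writer.write(PNG_SIGNATURE + b'not real image data')

    text_path = os.path.join(tmp_dir, 'plain.txt')
    with open(text_path, 'w') as writer:
        writer.write('this is line 1\n')

    png_result = is_png(png_path)
    text_result = is_png(text_path)

print(png_result, text_result)
```

A matching signature only tells you how the file starts; it does not prove that the rest of the file is a valid image.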


<a class="anchor" id="summary"></a>
## Summary

|Character|Meaning|
|:--|:--|
|`r`|open for reading (default)|
|`w`|open for writing, truncating the file first|
|`x`|create a new file and open it for writing|
|`a`|open for writing, appending to the end of the file if it exists|
|`b`|binary mode|
|`t`|text mode (default)|
|`+`|open a disk file for updating (reading and writing)|
|`U`|universal newline mode (deprecated; removed in Python 3.11)|
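A few of these modes are easiest to understand side by side. The sketch below runs `'x'`, `'a'`, and `'w'` against the same temporary file; the file name and contents are only for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes.txt')

    # 'x' creates the file and fails if it already exists
    with open(path, 'x') as writer:
        writer.write('line 1\n')
    try:
        with open(path, 'x'):
            pass
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # 'a' keeps the existing contents and writes at the end
    with open(path, 'a') as writer:
        writer.write('line 2\n')
    with open(path) as reader:
        after_append = reader.read()

    # 'w' discards the existing contents before writing
    with open(path, 'w') as writer:
        writer.write('fresh start\n')
    with open(path) as reader:
        after_write = reader.read()

print(exclusive_failed)
print(repr(after_append))
print(repr(after_write))
```

Mode characters can be combined, for example `'ab'` to append bytes or `'r+'` to read and write the same file.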