# Files
---

**Table of Contents**<a id='toc0_'></a>    
- [General Format ](#toc1_)    
- [Opening a File: `open()` ](#toc2_)    
- [Reading a File: `file.seek(pos) | file.read() | file.readlines()` ](#toc3_)    
- [Using the `with` Keyword ](#toc4_)    
- [Writing To File: `open(file, mode='w')` ](#toc5_)    
- [Iterating Through A File ](#toc6_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=2
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

---

- File objects let a Python program interact with external files on the computer
- These can be any sort of file: audio files, text files, emails, Excel documents, etc.
- Some formats need extra libraries or modules before their contents can be interpreted
- The built-in `open()` function is enough to open and work with basic file types such as plain text
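As a minimal sketch of why `open()` can reach "any sort of file": in binary mode (`'rb'`/`'wb'`) it reads and writes raw bytes without decoding them, and interpreting those bytes is the job of format-specific libraries. The file name `sample.bin` is hypothetical, and a temporary directory keeps the example self-contained; the four bytes written are the ASCII letters `RIFF`, which is how WAV audio files begin.

```python
import tempfile
from pathlib import Path

# Hypothetical file; the temporary directory is deleted when the block ends.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / 'sample.bin'

    # 'wb' writes raw bytes; no text encoding is applied.
    with open(sample, 'wb') as f:
        f.write(bytes([0x52, 0x49, 0x46, 0x46]))  # the ASCII letters "RIFF"

    # 'rb' reads the same bytes back, unchanged and undecoded.
    with open(sample, 'rb') as f:
        raw = f.read()

print(raw)        # b'RIFF'
print(type(raw))  # <class 'bytes'>
```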

## <a id='toc1_'></a>General Format  [&#8593;](#toc0_)

```python
for line in open('filename'):
    print(line, end='')  # do something with each line
```

In [1]:
from pathlib import Path
from os import path
from typing import Final, TextIO

In [2]:
ROOT: Final[Path] = Path().absolute()

## <a id='toc2_'></a>Opening a File: `open()`  [&#8593;](#toc0_)

In [3]:
# Open testfile1.txt, which we created earlier
MY_FILE: Final[TextIO] = open(path.join(ROOT, '..', 'demofiles', 'testfile1.txt'), encoding='utf8')

## <a id='toc3_'></a>Reading a File: `file.seek(pos) | file.read() | file.readlines()`  [&#8593;](#toc0_)

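- `file.seek(pos)`
  - Moves the cursor to the given position and returns that position (hence the `0` printed below)
  - `0` means the beginning of the file; in text mode, only `0` or a value previously returned by `file.tell()` is a reliable position
- `file.read()`
  - Reads from the current cursor position to the end of the file and returns the text as a string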
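A minimal sketch of how the cursor moves, assuming a small hypothetical file `demo.txt` created in a temporary directory. It also uses `read(size)`, which reads at most `size` characters, and `tell()`, which returns a position marker that `seek()` can jump back to.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    demo = Path(tmp_dir) / 'demo.txt'  # hypothetical file for illustration
    demo.write_text('Hello, files!\n', encoding='utf8')

    with open(demo, encoding='utf8') as f:
        first = f.read(5)      # read only 5 characters: 'Hello'
        bookmark = f.tell()    # remember where the cursor is now
        rest = f.read()        # everything from the cursor onward: ', files!\n'
        f.seek(bookmark)       # jump back to the remembered position
        again = f.read()       # ', files!\n' once more
        f.seek(0)              # back to the very beginning
        whole = f.read()       # 'Hello, files!\n'
```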

In [4]:
# We can now read the file: Seek to the beginning
MY_FILE.seek(0)

0

In [5]:
# Then read the whole file and store it in a variable
CONTENTS: Final[str] = MY_FILE.read()
print(f'Reading the content of testfile1.txt: {CONTENTS}')

Reading the content of testfile1.txt: 
This is a new added using w+ line
This is a second line
This is a third line
This is a fourth line
This is a fifth line


In [6]:
# But what happens if we try to read it again?
print(MY_FILE.read())




- Nothing shows up: `read()` returns an empty string `''`
  - Think of a reading *cursor*: after the first `read()`, it sits at the end of the file
  - So there is nothing left to read
  - We need to move the *cursor* back with `seek()` in order to read again
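The empty string returned at the end of the file is also how a loop knows when to stop. A minimal sketch, assuming a hypothetical helper `read_in_chunks()` and a throwaway file, that reads a file a few characters at a time until `read()` returns `''`:

```python
import tempfile
from pathlib import Path


def read_in_chunks(file_path: Path, chunk_size: int = 4) -> list[str]:
    """Hypothetical helper: read a text file piece by piece until the end."""
    chunks: list[str] = []
    with open(file_path, encoding='utf8') as f:
        while True:
            chunk = f.read(chunk_size)  # at most chunk_size characters
            if chunk == '':             # empty string: the cursor is at the end
                break
            chunks.append(chunk)
    return chunks


with tempfile.TemporaryDirectory() as tmp_dir:
    demo = Path(tmp_dir) / 'demo.txt'  # hypothetical file for illustration
    demo.write_text('abcdefghij', encoding='utf8')
    pieces = read_in_chunks(demo)

print(pieces)  # ['abcd', 'efgh', 'ij']
```

Reading in fixed-size chunks like this keeps memory use bounded, which matters for the large-file caution that follows.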

In [7]:
# Reset: Seek to the start of file (index 0)
MY_FILE.seek(0)

0

In [8]:
# Now read again
print(f'Reading testfile1.txt one more time: {MY_FILE.read()}')

Reading testfile1.txt one more time: 
This is a new added using w+ line
This is a second line
This is a third line
This is a fourth line
This is a fifth line


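- `file.readlines()` reads the remaining lines into a list, one string per line (each keeping its trailing `\n`)
- Like `read()`, it starts at the current cursor position, so we seek back to `0` first
- **Use this with caution for large files, since the whole file will be held in memory**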
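A minimal sketch comparing `readlines()` with two related options, assuming a hypothetical three-line file: `read().splitlines()` also builds a full list but drops the line endings, while `readline()` returns just one line per call and so avoids holding the whole file at once.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    demo = Path(tmp_dir) / 'lines.txt'  # hypothetical file for illustration
    demo.write_text('first\nsecond\nthird', encoding='utf8')

    with open(demo, encoding='utf8') as f:
        with_newlines = f.readlines()             # ['first\n', 'second\n', 'third']
        f.seek(0)
        without_newlines = f.read().splitlines()  # ['first', 'second', 'third']
        f.seek(0)
        first_line = f.readline()                 # only 'first\n'
```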

In [9]:
MY_FILE.seek(0)
# Readlines returns a list of the lines in the file
print(f'Using readlines() to get each line: {MY_FILE.readlines()}')

Using readlines() to get each line: ['\n', 'This is a new added using w+ line\n', 'This is a second line\n', 'This is a third line\n', 'This is a fourth line\n', 'This is a fifth line']


To print the lines without the list brackets and `\n` escapes, loop through the list (this still loads every line into memory first):

In [10]:
MY_FILE.seek(0)

for line in MY_FILE.readlines():
    print(line, end="")


This is a new added using w+ line
This is a second line
This is a third line
This is a fourth line
This is a fifth line

In [11]:
# Close the file when done
MY_FILE.close()

## <a id='toc4_'></a>Using the `with` Keyword  [&#8593;](#toc0_)

- We can also use the `with` keyword to open a file
- The file is closed automatically when the `with` block ends, even if an error occurs inside it, so there is no need to call `close()` ourselves
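A minimal sketch checking that claim with the file object's `closed` attribute, assuming a throwaway file. The `try`/`finally` part is roughly the manual equivalent of what `with` does for us.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    demo = Path(tmp_dir) / 'with_demo.txt'  # hypothetical file for illustration
    demo.write_text('data', encoding='utf8')

    with open(demo, encoding='utf8') as f:
        text = f.read()
    closed_after_block = f.closed  # True: leaving the block closed the file

    # Roughly the manual equivalent of the with statement
    g = open(demo, encoding='utf8')
    try:
        text_again = g.read()
    finally:
        g.close()  # runs even if read() raises

    # The file is still closed when an exception escapes the block
    try:
        with open(demo, encoding='utf8') as h:
            raise ValueError('simulated failure')
    except ValueError:
        pass
    closed_after_error = h.closed  # True
```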

In [12]:
contents: str

with open(path.join(ROOT, '..', 'demofiles', 'testfile1.txt'), encoding='utf8') as my_new_file:
    contents = my_new_file.read()

print('Using the `with` keyword:')
print(contents)

Using the `with` keyword:

This is a new added using w+ line
This is a second line
This is a third line
This is a fourth line
This is a fifth line


## <a id='toc5_'></a>Writing To File: `open(file, mode='w')`  [&#8593;](#toc0_)

- By default, `open()` opens a file for reading only
- To write, we pass a `mode` as the second argument:
  - `'w'` for write: creates the file, or erases (truncates) it if it already exists
  - `'a'` for append: creates the file if needed, and every write goes to the end
  - `'r'` for read only (default): the file must already exist
  - `'r+'` for read/write: the file must already exist and is not truncated
  - `'w+'` for read/write: creates the file, or truncates an existing one
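A minimal sketch of how these modes differ in practice, assuming a hypothetical file `modes.txt` inside a temporary directory. `Path.read_text()` is only used to inspect the file after each step.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    log = Path(tmp_dir) / 'modes.txt'  # hypothetical file for illustration

    with open(log, 'w', encoding='utf8') as f:   # creates the file
        f.write('one\n')
    with open(log, 'a', encoding='utf8') as f:   # appends to the end
        f.write('two\n')
    after_append = log.read_text(encoding='utf8')      # 'one\ntwo\n'

    with open(log, 'r+', encoding='utf8') as f:  # read/write, no truncation
        existing = f.read()                      # cursor ends up at the end
        f.write('three\n')                       # so this write lands at the end
    after_r_plus = log.read_text(encoding='utf8')      # 'one\ntwo\nthree\n'

    with open(log, 'w', encoding='utf8') as f:   # 'w' wipes what was there
        f.write('fresh\n')
    after_overwrite = log.read_text(encoding='utf8')   # 'fresh\n'

    # 'r+' (like 'r') refuses to create a missing file
    r_plus_needs_file = False
    try:
        with open(Path(tmp_dir) / 'missing.txt', 'r+', encoding='utf8'):
            pass
    except FileNotFoundError:
        r_plus_needs_file = True
```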

In [13]:
my_file: TextIO

with open(path.join(ROOT, '..', 'demofiles', 'testfile2.txt'), mode='w+', encoding='utf8') as my_file:
    
    # Now write to the file (Overwrite)
    my_file.write('\nThis is a newly added line using w+')
    
    # Read the file
    my_file.seek(0)
    print(my_file.read())
    
    # After read(), the cursor is at the end, so these writes append
    my_file.write('\nThis is a second line')
    my_file.write('\nThis is a third line')
    my_file.write('\nThis is a fourth line')
    my_file.write('\nThis is a fifth line')
    
    # Read the file again
    my_file.seek(0)
    print(my_file.read())


This is a newly added line using w+

This is a newly added line using w+
This is a second line
This is a third line
This is a fourth line
This is a fifth line


## <a id='toc6_'></a>Iterating Through A File  [&#8593;](#toc0_)

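- Looping through each line of the file is a good technique for line-based formats such as CSV
- The file object itself is iterable and yields one line at a time

```python
for line in open('filename'):
    print(line, end='')  # do something with each line
```
- Inside the loop, we can use ordinary control flow to do something with every line of the file
- **Note: if the file is empty, the loop body simply never runs (no error is raised); if the file does not exist, `open()` raises `FileNotFoundError`**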
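For CSV specifically, the standard-library `csv` module can sit on top of this line-by-line iteration. A minimal sketch, assuming a hypothetical `scores.csv` with a header row; `newline=''` is how the `csv` documentation recommends opening files for it.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    scores = Path(tmp_dir) / 'scores.csv'  # hypothetical CSV file
    scores.write_text('name,score\nada,90\nlinus,85\n', encoding='utf8')

    rows: list[dict[str, str]] = []
    with open(scores, newline='', encoding='utf8') as f:
        for row in csv.DictReader(f):  # one dict per data row, keyed by the header
            rows.append(row)

total = sum(int(row['score']) for row in rows)
print(total)  # 175
```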

In [14]:
for num, line in enumerate(open(path.join(ROOT, '..', 'demofiles', 'testfile2.txt'), encoding='utf8'), start=1):
    print(f'Line {num}: {line}')

Line 1: 

Line 2: This is a newly added line using w+

Line 3: This is a second line

Line 4: This is a third line

Line 5: This is a fourth line

Line 6: This is a fifth line


- It is important to note a few things here:
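  1. We could have named the `num` and `line` variables anything
  1. Because we never called `.read()`, the whole file was not loaded into memory at once; the file object hands out one line per iteration
  1. Notice the indentation of the `print()` line: indentation defines the loop body and is required in Python
  1. Each `line` keeps its trailing `\n` and `print()` adds another, which is why blank lines appear in the output; `Line 1` is empty because the file starts with a newline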
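A minimal sketch that builds on these points, assuming a hypothetical file and a hypothetical helper `count_non_blank_lines()`: it opens the file with `with` so it gets closed, numbers lines from 1, strips the trailing newline before formatting, and counts lines while holding only one in memory at a time.

```python
import tempfile
from pathlib import Path


def count_non_blank_lines(file_path: Path) -> int:
    """Hypothetical helper: count non-blank lines, one line in memory at a time."""
    count = 0
    with open(file_path, encoding='utf8') as f:
        for line in f:        # the file object yields one line per iteration
            if line.strip():  # skip lines that are empty or whitespace only
                count += 1
    return count


with tempfile.TemporaryDirectory() as tmp_dir:
    demo = Path(tmp_dir) / 'numbered.txt'  # hypothetical file for illustration
    demo.write_text('\nalpha\nbeta\n', encoding='utf8')

    numbered: list[str] = []
    with open(demo, encoding='utf8') as f:
        for num, line in enumerate(f, start=1):
            text = line.rstrip('\n')  # drop the line ending; print() adds its own
            numbered.append(f'Line {num}: {text}')

    non_blank = count_non_blank_lines(demo)

for entry in numbered:
    print(entry)
```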