# 9.5 Working with Files

Now that we've learned how to open, close, read from, and write to files, let's examine some common scenarios we'll encounter and how to achieve them in Python.  The scenarios we'll cover are:

- File statistics such as file size, number of words in a file, and number of lines.
- Searching within a file.
- Appending data to a file.
- Working with two files at the same time.


## File Statistics

There are lots of statistics that we could calculate for a file, but the most common that I've run into, especially for text-based files, are:

- File Size
- Number of Words
- Number of Lines

### Getting the File Size

Getting the size of a file is something that you can do without opening the file.  There are two ways to do it, using either the `os` or the `pathlib` module.  Both report the size in bytes.

In [1]:
import os
os.path.getsize('resources/darth_plagueis_tragedy.txt')

751

In [2]:
import pathlib
path = pathlib.Path('resources/darth_plagueis_tragedy.txt')
print(path.stat().st_size)

751

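Because both calls report bytes, the number can differ from the character count when the text contains non-ASCII characters such as the curly apostrophe in "It’s". The following is a minimal sketch of that difference; it writes a throwaway file with `tempfile` rather than using the `resources/` folder, and the names `text` and `sample_path` are made up for illustration.

```python
import os
import pathlib
import tempfile

text = 'It’s'  # 4 characters; the curly apostrophe needs 3 bytes in UTF-8

# Work in a throwaway directory so the sketch does not depend on resources/
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = pathlib.Path(tmp_dir) / 'sample.txt'
    sample_path.write_bytes(text.encode('utf-8'))

    # Both approaches ask the operating system for the size in bytes
    size_from_os = os.path.getsize(sample_path)
    size_from_pathlib = sample_path.stat().st_size

char_count = len(text)
print(char_count, size_from_os, size_from_pathlib)
```

Under these assumptions the character count should come out smaller than the two byte counts, which should agree with each other.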

<div class="alert alert-info">
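    <b>Note:</b> <code>pathlib</code> is only available in Python 3.4+, and passing a <code>Path</code> object directly to <code>open()</code>, as the cells below do, requires Python 3.6+.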
</div>

### Getting the Number of Words in a File

In [3]:
path = pathlib.Path('resources/darth_plagueis_tragedy.txt')
with open(path) as reader:
    num_words = 0
    for line in reader:
        
        # remember that .split() will take a line and turn it into
        # a list with each delimited string as an entry.  With no
        # argument, the delimiter is any run of whitespace
        num_words += len(line.split())
        
print(num_words)

138

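The word count above relies on how `.split()` behaves with no argument. As a small sketch of why that matters, the made-up line below mixes a tab and repeated spaces, and compares the default behavior with an explicit `' '` separator.

```python
# A made-up line containing a tab and a run of three spaces
line = 'Did you ever hear\tthe   tragedy of Darth Plagueis The Wise?\n'

# With no argument, any run of whitespace counts as a single separator
words = line.split()

# With an explicit ' ' separator, the tab is not a separator and
# consecutive spaces produce empty strings in the result
pieces = line.split(' ')

print(len(words), len(pieces))
```

For counting words, the no-argument form is usually the one you want, since it ignores the trailing newline and never produces empty entries.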

### Getting the Number of Lines in a File

In [4]:
path = pathlib.Path('resources/darth_plagueis_tragedy.txt')
with open(path) as reader:
    
    # Remember that .readlines() returns a list of all the lines
    num_lines = len(reader.readlines())
    print(f'There are {num_lines} lines in {path}')

There are 5 lines in resources/darth_plagueis_tragedy.txt

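`.readlines()` builds a list of every line, which is fine for a small file. For a large file, one possible alternative is to count while iterating so only one line is held in memory at a time. This is a sketch, not the only way to do it; `count_lines` is a hypothetical helper name and the sample file is created with `tempfile`.

```python
import pathlib
import tempfile

def count_lines(path):
    """Count lines by iterating over the file instead of loading it all."""
    with open(path, encoding='utf-8') as reader:
        return sum(1 for _ in reader)

# Throwaway file with three lines, so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = pathlib.Path(tmp_dir) / 'sample.txt'
    sample_path.write_text('first\nsecond\nthird\n', encoding='utf-8')
    num_lines = count_lines(sample_path)

print(num_lines)
```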

## Searching Within a File

In [5]:
path = pathlib.Path('resources/darth_plagueis_tragedy.txt')

with open(path) as reader:
    for line in reader:
        if line.find('Jedi') >= 0:
            print(line)

I thought not. It’s not a story the Jedi would tell you. 



<div class="alert alert-warning">
    <b>Warning:</b> <code>find()</code> is case sensitive, so searching for <code>'jedi'</code> will not match <code>'Jedi'</code>. That is why the next cell prints nothing.
</div>

In [6]:
path = pathlib.Path('resources/darth_plagueis_tragedy.txt')

with open(path) as reader:
    for line in reader:
        if line.find('jedi') >= 0:
            print(line)

### Searching for a Case-Insensitive String

In [7]:
path = pathlib.Path('resources/darth_plagueis_tragedy.txt')

with open(path) as reader:
    for line in reader:
        
        # we make all characters of the line lower case and then
        # search it for the lower-case search string
        location = line.lower().find('jedi')
        if location >= 0:
            print(line)
            print('-' * location + 'ᐃ')

I thought not. It’s not a story the Jedi would tell you. 

------------------------------------ᐃ

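When you only need to know whether a line matches, and not where, the `in` operator is a common alternative to `find()`. The sketch below also lower-cases the search term, so the caller can pass it in any case. `find_lines` is a hypothetical helper and `sample_lines` is made-up data standing in for an open file.

```python
def find_lines(lines, term):
    """Return (line_number, line) pairs whose text contains term, ignoring case."""
    needle = term.casefold()  # casefold() is a more thorough lower()
    matches = []
    for number, line in enumerate(lines, start=1):
        if needle in line.casefold():
            # strip the newline so printing does not leave a blank line
            matches.append((number, line.rstrip('\n')))
    return matches

# Made-up sample data; any iterable of lines works, including a file object
sample_lines = [
    'I thought not.\n',
    'It is not a story the Jedi would tell you.\n',
    'It is a Sith legend.\n',
]
results = find_lines(sample_lines, 'JEDI')
print(results)
```

Because the helper accepts any iterable of strings, the same call should work with `reader` inside a `with open(path) as reader:` block.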

## Appending Data to a File

Appending data to a file is done by passing `'a'` as the mode argument to the `open()` function.  Let's start with a `test_data.txt` file that contains some numbers:

In [8]:
import random

path = pathlib.Path('resources/test_data.txt')
with open(path, 'w') as fh:
    for _ in range(10):
        fh.write(f'{random.randint(0, 100)}|')

In [9]:
with open(path) as reader:
    print(reader.read())

86|24|74|55|48|59|5|49|11|70|


Next, let's append some more numbers to the file:

In [10]:
with open(path, 'a') as fh:
    for _ in range(5):
        fh.write(f'{random.randint(0, 100)}|')
        
with open(path) as reader:
    print(reader.read())

86|24|74|55|48|59|5|49|11|70|56|53|26|21|11|

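To make the difference between the `'a'` and `'w'` modes concrete, here is a short sketch using a throwaway file. The file name `numbers.txt` and the variable names are assumptions made for illustration.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = pathlib.Path(tmp_dir) / 'numbers.txt'

    # 'a' creates the file if it does not exist yet
    with open(log_path, 'a') as fh:
        fh.write('1|2|')

    # a second 'a' writes at the end; the existing content is kept
    with open(log_path, 'a') as fh:
        fh.write('3|')
    after_append = log_path.read_text()

    # 'w' empties the file before writing
    with open(log_path, 'w') as fh:
        fh.write('4|')
    after_write = log_path.read_text()

print(after_append)
print(after_write)
```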

## Working with Two Files at the Same Time

A single `with` statement can hold several `open()` calls separated by commas, which opens both files together and closes both of them when the block ends.

In [11]:
with open('resources/darth_plagueis_tragedy.txt', 'r') as reader, open('resources/darth_stats.txt', 'w') as writer:
    for i, line in enumerate(reader):
        words = line.lower().strip().split()
        num_entries = words.count('the')
        writer.write(f'"the" appears {num_entries} times in line {i + 1}\n')
        
with open('resources/darth_stats.txt') as fh:
    print(fh.read())

"the" appears 2 times in line 1
"the" appears 0 times in line 2
"the" appears 1 times in line 3
"the" appears 0 times in line 4
"the" appears 8 times in line 5

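Putting two `open()` calls on one line can get long. One way to keep it readable is a backslash line continuation, sketched below with throwaway files; the file names are made up for illustration. Python 3.10 and later also allow wrapping the calls in parentheses instead.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    source_path = pathlib.Path(tmp_dir) / 'source.txt'
    copy_path = pathlib.Path(tmp_dir) / 'numbered.txt'
    source_path.write_text('alpha\nbeta\n', encoding='utf-8')

    # The backslash continues the with statement onto the next line
    with open(source_path, encoding='utf-8') as reader, \
         open(copy_path, 'w', encoding='utf-8') as writer:
        for number, line in enumerate(reader, start=1):
            # each line keeps its own trailing newline
            writer.write(f'{number}: {line}')

    numbered = copy_path.read_text(encoding='utf-8')

print(numbered)
```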
