# Two Ways to Read a File, Line by Line

The first method of reading a file line by line should be familiar to those who come from other languages:

In [1]:
with open('data20.txt') as fileobj:
    while True:
        line = fileobj.readline()
        if line == '':
            # EOF, bail out
            break
        print(repr(line))

'# Comment line\n'
'line one\n'
'line two\n'
'\n'
'# Another comment\n'
'line four\n'
'last line\n'


Notes:

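* Each line keeps its line-ending marker. In Python 3 text mode, the default newline handling translates CRLF to `'\n'` on read, so lines normally end with `'\n'` whatever the OS; only the last line of a file may lack it
* An empty line is represented as `'\n'`
* There is no `eof` method or function; `readline` returns an empty string (`''`) at the end of the file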
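
The same end-of-file convention can be expressed with the two-argument form of the built-in `iter`, which keeps calling a function until it returns a sentinel value. The following is a minimal sketch, not part of the original example: the helper name `read_lines_until_eof` and the throwaway file `sample.txt` are assumptions made up for illustration.

```python
import os
import tempfile

def read_lines_until_eof(path):
    """Collect lines by calling readline() until it returns ''."""
    lines = []
    with open(path) as fileobj:
        # iter(callable, sentinel) calls fileobj.readline repeatedly
        # and stops as soon as it returns the sentinel '' (EOF).
        for line in iter(fileobj.readline, ''):
            lines.append(line)
    return lines

# Hypothetical sample file, created only for this demonstration
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.txt')
    with open(sample_path, 'w') as sample:
        sample.write('line one\n\nlast line')  # last line has no newline
    lines = read_lines_until_eof(sample_path)

print(lines)  # ['line one\n', '\n', 'last line']
```

Note that the empty line comes back as `'\n'`, not `''`, so it does not end the loop.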

The drawback of this method is that there is no direct test for the end-of-file condition, so we must rely on the return value of the `readline` method. If we know in advance that the file is not too large, we can use a slightly different variation:

In [2]:
with open('data20.txt') as fileobj:
    for line in fileobj.readlines():
        print(repr(line))

'# Comment line\n'
'line one\n'
'line two\n'
'\n'
'# Another comment\n'
'line four\n'
'last line\n'


The drawback of this variation is that `readlines` loads every line into a list at once, so a large file can consume a lot of memory.

The second method is more Pythonic, though it may look somewhat strange at first:

In [3]:
with open('data20.txt') as fileobj:
    for line in fileobj:
        print(repr(line))

'# Comment line\n'
'line one\n'
'line two\n'
'\n'
'# Another comment\n'
'line four\n'
'last line\n'


This method is short and sweet, and it is the recommended way to read a file line by line. It also does not carry a performance penalty, because it only reads the next line on demand.
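
One way to see the on-demand behavior is to stop early. The sketch below is an illustration under assumptions: the helper `head` and the file `big.txt` are hypothetical names. It uses `itertools.islice` to take only the first few lines, so the loop never asks the file object for the remaining ones.

```python
import itertools
import os
import tempfile

def head(path, count):
    """Return the first `count` lines of the file at `path`."""
    with open(path) as fileobj:
        # The file object is a lazy iterator; islice stops asking it
        # for lines once `count` lines have been produced.
        return list(itertools.islice(fileobj, count))

# Hypothetical sample file, created only for this demonstration
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'big.txt')
    with open(sample_path, 'w') as sample:
        for number in range(100000):
            sample.write('line {}\n'.format(number))
    first_three = head(sample_path, 3)

print(first_three)  # ['line 0\n', 'line 1\n', 'line 2\n']
```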

# Tip: Open Multiple Files

A common scenario is to open one file for reading and another for writing. I often see code like this:

In [4]:
with open('data20.txt') as input_file:
    with open('out20.txt', 'w') as output_file:
        for line in input_file:
            # Do some processing on the line here
            output_file.write(line)

Instead of nesting the context managers (the `with` statements), we can open both files in a single `with` statement:

In [5]:
with open('data20.txt') as input_file, open('out20.txt', 'w') as output_file:
    for line in input_file:
        # Do some processing on the line here
        output_file.write(line)
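
To make the pattern concrete, here is a small self-contained sketch with an actual processing step. The function name `copy_without_trailing_whitespace` and the file names `in.txt` and `out.txt` are assumptions chosen for illustration; any per-line transformation could take the place of `rstrip`.

```python
import os
import tempfile

def copy_without_trailing_whitespace(src_path, dst_path):
    """Copy src to dst, stripping trailing whitespace from each line."""
    # Both files are opened in one `with` statement and both are
    # closed when the block exits, even if an exception is raised.
    with open(src_path) as input_file, open(dst_path, 'w') as output_file:
        for line in input_file:
            output_file.write(line.rstrip() + '\n')

# Hypothetical input and output files, created only for this demonstration
with tempfile.TemporaryDirectory() as tmp_dir:
    src_path = os.path.join(tmp_dir, 'in.txt')
    dst_path = os.path.join(tmp_dir, 'out.txt')
    with open(src_path, 'w') as sample:
        sample.write('alpha  \nbeta\t\n')
    copy_without_trailing_whitespace(src_path, dst_path)
    with open(dst_path) as result_file:
        result = result_file.read()

print(repr(result))  # 'alpha\nbeta\n'
```

On Python 3.10 and later, the two `open` calls can also be wrapped in parentheses and split across several lines, which helps when the line gets long.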

# Recipe: Filter out Unwanted Lines

There are times when we want to filter out unwanted lines (such as comments or empty lines). The obvious approach is:

In [6]:
with open('data20.txt') as input_file:
    for line in input_file:
        # Filter out empty lines
        if line == '\n':
            continue
        print(repr(line))

'# Comment line\n'
'line one\n'
'line two\n'
'# Another comment\n'
'line four\n'
'last line\n'


Another approach is to apply one or more filters:

In [7]:
import itertools
with open('data20.txt') as input_file:
    good_lines = itertools.filterfalse(lambda line: line == '\n', input_file)
    for line in good_lines:
        print(repr(line))

'# Comment line\n'
'line one\n'
'line two\n'
'# Another comment\n'
'line four\n'
'last line\n'


The above works in Python 3. In Python 2, replace `filterfalse` with `ifilterfalse`.

Using the second approach, we can create a number of reusable, stackable filters:

In [8]:
import itertools

def is_empty(line):
    """ Predicate which returns True if a line is empty """
    return line == '\n'

def is_comment(line):
    """ Predicate which returns True if a line is a comment line """
    return line.strip().startswith('#')

with open('data20.txt') as input_file:
    # Stacking the filters
    good_lines = itertools.filterfalse(is_empty, input_file)
    good_lines = itertools.filterfalse(is_comment, good_lines)
    
    for line in good_lines:
        print(repr(line))

'line one\n'
'line two\n'
'line four\n'
'last line\n'


Notes:

* In the above example, `is_empty` and `is_comment` are reusable filters that we can stack
* One advantage of this approach is that it moves all the filtering out of the loop, which lets us concentrate on the processing steps
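
When the number of filters grows, the stacking itself can be wrapped in a small helper. The sketch below is one possible way to do it, not something from the original recipe: the name `reject` is hypothetical, and a plain list of strings stands in for the open file so the example runs on its own.

```python
import itertools

def is_empty(line):
    """Predicate which returns True if a line is empty"""
    return line == '\n'

def is_comment(line):
    """Predicate which returns True if a line is a comment line"""
    return line.strip().startswith('#')

def reject(lines, *predicates):
    """Stack one filterfalse per predicate on top of `lines`."""
    for predicate in predicates:
        lines = itertools.filterfalse(predicate, lines)
    return lines

# A list of lines standing in for an open file object
sample_lines = [
    '# Comment line\n', 'line one\n', 'line two\n', '\n',
    '# Another comment\n', 'line four\n', 'last line\n',
]

good_lines = list(reject(sample_lines, is_empty, is_comment))
print(good_lines)
# ['line one\n', 'line two\n', 'line four\n', 'last line\n']
```

Because `filterfalse` is lazy, nothing is read from the underlying iterable until the result is consumed, so the helper works the same way on a real file object inside a `with` block.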

# Recipe: Filter in Wanted Lines

In this recipe, we will process only comment lines by reusing the `is_comment` filter created earlier.

In [9]:
with open('data20.txt') as input_file:
    good_lines = filter(is_comment, input_file)
    
    for line in good_lines:
        print(repr(line))

'# Comment line\n'
'# Another comment\n'


Notes:

* `is_comment` is the same reusable predicate defined in the previous recipe; here it selects lines instead of rejecting them
* Use `itertools.ifilter` (Python 2) or `filter` (Python 3) to filter *in* the lines that we want
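
For comparison, the same selection can be written as a comprehension with an `if` clause. This is only a sketch of an equivalent spelling, using a list of strings in place of the file; which form reads better is a matter of taste.

```python
def is_comment(line):
    """Predicate which returns True if a line is a comment line"""
    return line.strip().startswith('#')

# A list of lines standing in for an open file object
sample_lines = [
    '# Comment line\n', 'line one\n', 'line two\n', '\n',
    '# Another comment\n', 'line four\n', 'last line\n',
]

# Filter in with the built-in filter (lazy in Python 3)
with_filter = list(filter(is_comment, sample_lines))

# The same selection written as a list comprehension
with_comprehension = [line for line in sample_lines if is_comment(line)]

print(with_filter)  # ['# Comment line\n', '# Another comment\n']
```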

# Recipe: Read File with Line Number

Instead of creating a line counter and manually incrementing it, we can make use of `enumerate`. Note that by default this function returns a zero-based count, while line numbers are one-based.

In [10]:
with open('data20.txt') as input_file:
    for line_number, line in enumerate(input_file, start=1):
        print('{:>4}: {}'.format(line_number, line), end='')

   1: # Comment line
   2: line one
   3: line two
   4: 
   5: # Another comment
   6: line four
   7: last line


Notes:

* By supplying the named argument `start` to the `enumerate` function, we can control the start number
* Since each line already has a line feed embedded, we tell the `print` function not to send a line feed at the end. This is done with the `end=''` named argument
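
Line numbers combine well with filtering, as long as the numbering happens before any lines are dropped. The sketch below illustrates this under assumptions: the generator name `numbered_comments` is hypothetical, and an `io.StringIO` object stands in for the open file.

```python
import io

def numbered_comments(fileobj):
    """Yield (line_number, line) for comment lines only."""
    # Enumerate first, filter second: the numbers then refer to
    # positions in the original file, not in the filtered output.
    for line_number, line in enumerate(fileobj, start=1):
        if line.strip().startswith('#'):
            yield line_number, line

# An in-memory text stream standing in for an open file object
sample = io.StringIO(
    '# Comment line\nline one\nline two\n\n'
    '# Another comment\nline four\nlast line\n'
)

found = list(numbered_comments(sample))
for line_number, line in found:
    print('{:>4}: {}'.format(line_number, line), end='')
```

With this sample, the two comment lines are reported as lines 1 and 5.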

# Recipe: Retrieve Lines by Line Numbers

There are times when we need to retrieve lines from a file by line number. Instead of reading the file line by line ourselves, there is a more convenient way:

In [11]:
import linecache

print(linecache.getline('data20.txt', 6), end='')
print(linecache.getline('data20.txt', 2), end='')

line four
line one


Notes:

* The `linecache` module provides a convenient way to retrieve lines from a file, given the line number. It reads the whole file once and caches it, so repeated lookups do not re-read the file
* The first line is line 1
* As with reading lines from a file, each returned line contains its EOL
* We can access the lines in any order
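
Two behaviors are worth knowing before relying on `linecache`: a line number that is out of range yields an empty string instead of raising an exception, and the cache can go stale if the file changes on disk. The sketch below demonstrates both with a throwaway file; the file name `sample.txt` and the variable names are assumptions for illustration.

```python
import linecache
import os
import tempfile

# Hypothetical sample file, created only for this demonstration
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.txt')
    with open(sample_path, 'w') as sample:
        sample.write('first\nsecond\nthird\n')

    second = linecache.getline(sample_path, 2)
    # Out-of-range line numbers return '' rather than raising
    missing = linecache.getline(sample_path, 99)

    # Rewrite the file, then ask linecache to re-validate its cache
    with open(sample_path, 'w') as sample:
        sample.write('changed\n')
    linecache.checkcache(sample_path)
    first_after_change = linecache.getline(sample_path, 1)

# Drop everything linecache has stored
linecache.clearcache()

print(repr(second))              # 'second\n'
print(repr(missing))             # ''
print(repr(first_after_change))  # 'changed\n'
```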