# Introduction

We often need to deal with a CSV source that contains both "meat" (the usable, good lines) and "fat" (the lines we want to discard). Here are a couple of techniques we can use to filter out the unwanted lines before they reach the CSV reader.

## Good-Line Predicate

In this approach, we write a simple function that takes a data line as input and returns `True` if it is a good line ("meat") or `False` otherwise. We then pass this predicate to the built-in `filter()` to drop the bad lines.

```python
import csv
from io import StringIO

# Here is a buffer which contains "fat" such as
# blank lines or comments
buffer = StringIO("""
# The Carpenters
501,karen,bash
502,john,tcsh

# Peter, Paul and Mary
601,peter,bash
602,paul,tclsh
603,mary,zsh
""")

def is_good_line(line):
    """
    Definition of a good line: not blank and
    does not start with a comment marker ('#')
    """
    line = line.strip()
    return line != '' and not line.startswith('#')

#
# Main: filter out the "fat" from the buffer
# and feed the good lines to a CSV reader
#
# filter() is lazy and passes each good line through unchanged
meat = filter(is_good_line, buffer)
for record in csv.reader(meat):
    print(record)
```

```
['501', 'karen', 'bash']
['502', 'john', 'tcsh']
['601', 'peter', 'bash']
['602', 'paul', 'tclsh']
['603', 'mary', 'zsh']
```
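The predicate above hard-codes `#` as the comment marker. If different sources use different markers, one option is a small factory that builds the predicate. The sketch below assumes a hypothetical helper named `make_is_good_line` with a `comment_prefixes` parameter; neither comes from the original, and the `;`-style comment in the sample data is made up for illustration.

```python
import csv
from io import StringIO

def make_is_good_line(comment_prefixes=('#',)):
    """
    Hypothetical helper: build a good-line predicate that rejects
    blank lines and lines starting with any of comment_prefixes.
    """
    prefixes = tuple(comment_prefixes)

    def is_good_line(line):
        line = line.strip()
        # str.startswith accepts a tuple and matches any of its prefixes
        return line != '' and not line.startswith(prefixes)

    return is_good_line

# Example source that uses both '#' and ';' as comment markers
buffer = StringIO("""
# The Carpenters
501,karen,bash
; disabled account
502,john,tcsh
""")

is_good = make_is_good_line(comment_prefixes=('#', ';'))
records = list(csv.reader(filter(is_good, buffer)))
print(records)  # [['501', 'karen', 'bash'], ['502', 'john', 'tcsh']]
```

The main loop stays the same; only the way the predicate is built changes.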


## Filter Generator

In this approach, we write a generator that takes a sequence of input lines and yields only the good ones. The advantage of this approach is that the generator can also modify the lines it passes along, for example by stripping leading (and trailing) whitespace, something a plain predicate used with `filter()` cannot do.

```python
import csv
from io import StringIO

# Here is a buffer which contains "fat" such as
# blank lines, comments, or leading spaces
buffer = StringIO("""
# The Carpenters
  501,karen,bash
  502,john,tcsh

# Peter, Paul and Mary
  601,peter,bash
  602,paul,tclsh
  603,mary,zsh
""")

def clean_lines(lines):
    """
    Filter out blank and comment lines; also strip leading
    and trailing whitespace (including the newline)
    """
    for line in lines:
        line = line.strip()
        if line and not line.startswith('#'):
            yield line

#
# Main: filter out the "fat" from the buffer
# and feed the good lines to a CSV reader
#
# Unlike filter(), the generator hands csv.reader the stripped lines
meat = clean_lines(buffer)
for record in csv.reader(meat):
    print(record)
```

```
['501', 'karen', 'bash']
['502', 'john', 'tcsh']
['601', 'peter', 'bash']
['602', 'paul', 'tclsh']
['603', 'mary', 'zsh']
```
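Because a generator like `clean_lines` consumes an iterable of lines and yields lines, several small single-purpose generators can be chained into a pipeline. The sketch below splits the work into hypothetical `strip_lines`, `drop_blank`, and `drop_comments` steps and feeds the result to `csv.DictReader`. The field names `uid`, `name`, and `shell` are assumptions, since the sample data has no header row.

```python
import csv
from io import StringIO

def strip_lines(lines):
    """Hypothetical step: strip leading/trailing whitespace from each line."""
    for line in lines:
        yield line.strip()

def drop_blank(lines):
    """Hypothetical step: skip empty lines."""
    for line in lines:
        if line:
            yield line

def drop_comments(lines, prefix='#'):
    """Hypothetical step: skip lines that start with the comment prefix."""
    for line in lines:
        if not line.startswith(prefix):
            yield line

buffer = StringIO("""
# The Carpenters
  501,karen,bash
  502,john,tcsh
""")

# Each stage lazily pulls lines from the previous one; stripping runs
# first so that indented comment lines are still recognized
meat = drop_comments(drop_blank(strip_lines(buffer)))

# Assumed column names; the sample data has no header row
reader = csv.DictReader(meat, fieldnames=['uid', 'name', 'shell'])
rows = [dict(row) for row in reader]
print(rows)
# [{'uid': '501', 'name': 'karen', 'shell': 'bash'},
#  {'uid': '502', 'name': 'john', 'shell': 'tcsh'}]
```

Compared with a single `clean_lines` generator, this trades a few extra functions for steps that can be reordered, reused, or tested on their own.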
