This simple example counts the words in a file. The input file is muchAdo.txt, which contains the play Much Ado About Nothing, and the output file prefix is "simple_counts".

In [1]:
import logging
logging.getLogger().setLevel(logging.ERROR)
logging.basicConfig()

import re
import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.io import WriteToText
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import StandardOptions

input_file = "muchAdo.txt"
output_file = "simple_counts"

Next, we define a function that processes each line the way we want: in this case, extracting the individual words from the line.

In [2]:
def process_line(line):
  """Returns a list of the words in this line.

  Args:
    line: the line being processed

  Returns:
    A list of the words found in the line, where a word is a run of
    letters and apostrophes.
  """
  text_line = line.strip()
  words = re.findall(r'[A-Za-z\']+', text_line)
  return words
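
To see what the regular expression in process_line keeps and what it drops, here is a quick check on one line. The sample text is made up for illustration, and the function body is repeated so the snippet runs on its own.

```python
import re

def process_line(line):
  # Same logic as the cell above, repeated so this snippet is self-contained.
  return re.findall(r'[A-Za-z\']+', line.strip())

# Hypothetical sample line, invented for this check.
sample = "  'Shall we go, at 9? That's all!  "
words = process_line(sample)
print(words)
```

Punctuation and digits are dropped, apostrophes are kept, and case is preserved. That is why keys such as 'Shall and MUCH show up as their own entries in the final counts.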

Next, we'll instantiate the pipeline with our desired runner, then define the pipeline's steps and its output.

In [3]:
options = PipelineOptions()
options.view_as(StandardOptions).runner = 'DirectRunner'

p = beam.Pipeline(options=options)

lines = p | "read" >> ReadFromText(input_file)

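counts = (lines
          # process_line returns a list; FlatMap emits each word as its own element.
          | "split" >> beam.FlatMap(process_line).with_output_types(str)
          | "pair_with_1" >> beam.Map(lambda x: (x, 1))
          | "group" >> beam.GroupByKey()
          # MapTuple unpacks each (word, ones) pair into separate arguments.
          | "count" >> beam.MapTuple(lambda x, ones: (x, sum(ones)))
        )

output = counts | "format" >> beam.MapTuple(lambda word, c: "%s: %s" % (word, c))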

output | "write" >> WriteToText(output_file)

<PCollection[write/Write/WriteImpl/FinalizeWrite.None] at 0x7f909ad2a8d0>
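
The "pair_with_1", "group", "count", and "format" steps can be mimicked in plain Python on a small in-memory list, which may help when reasoning about what each stage emits. This is only a sketch of the data flow, not Beam code, and the sample words are made up.

```python
from collections import defaultdict

# Hypothetical words, standing in for the output of the "split" step.
words = ["hath", "foul", "hath", "four", "foul", "hath"]

# "pair_with_1": turn every word into a (word, 1) pair.
pairs = [(w, 1) for w in words]

# "group": collect all the 1s that share the same word.
grouped = defaultdict(list)
for word, one in pairs:
  grouped[word].append(one)

# "count": sum the 1s for each word.
counts = {word: sum(ones) for word, ones in grouped.items()}

# "format": render each pair the way the pipeline writes it.
# Sorted here only to make the printed result stable.
formatted = sorted("%s: %s" % (word, c) for word, c in counts.items())
print(formatted)
```

Beam applies the same logic across the whole input and makes no promise about the order of the results, so the pipeline's output file is not sorted.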

The pipeline is now fully defined, but nothing has executed yet. The final step is to run it.

In [None]:
result = p.run()
result.wait_until_finish()

'DONE'

View the contents of the output file:

In [None]:
! more "simple_counts-00000-of-00001"

sunburnt: 1
pardon: 4
needful: 1
foul: 8
four: 2
hath: 67
protest: 4
sleep: 2
friend's: 1
hanging: 1
appetite: 1
evermore: 1
saved: 1
yonder: 1
conjure: 1
muzzle: 1
vile: 2
crept: 1
'Shall: 1
Watch: 5
endings: 1
neighbours: 2
MUCH: 18
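
Because the output is unsorted, a little post-processing helps when looking for the most frequent words. The sketch below parses lines in the "word: count" format and sorts them by count. The helper name parse_count_line is hypothetical, and the sample lines are copied from the output above.

```python
def parse_count_line(line):
  # Split on the last ": " so the word itself is left intact.
  word, _, count = line.rstrip("\n").rpartition(": ")
  return word, int(count)

# A few lines copied from the output above.
sample_lines = ["hath: 67", "foul: 8", "friend's: 1", "MUCH: 18"]

parsed = [parse_count_line(l) for l in sample_lines]

# Sort by count, largest first.
top = sorted(parsed, key=lambda wc: wc[1], reverse=True)
print(top[:2])
```

To run this on the real results, the same helper could be applied to each line read from simple_counts-00000-of-00001.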
[7m--More--(0%)[m