Title: COMPSCI 32  
Author: Mike Smith  
Copyright: Copyright 2021 by Michael D. Smith. All rights reserved.

# A1S1: Scripts #

## P2: Grab the Dialogue ##

Did you try running other text files through the script we developed together as the first of our problems to be solved? I hope you tried messing with it after class. Curiosity and experimentation will get you far in this class.

Of course, our script doesn't do a whole lot and I imagine you can think of many features you'd like to see added to it (e.g., the ability to read the book more slowly!), but even this short script taught us a lot about programming in Python. And believe it or not, you learned a lot about writing code in any programming language through this simple problem-to-be-solved.

The code in the code block below, which you can also find in `seuss1.py` in the `a1s1` directory of our course repo,[^fn1] is the same as that which we developed together, except that I put our text files in a subdirectory, and more importantly, I added a `close` statement to the end of the script. Why did I add this statement?

In [None]:
### a1s1/seuss1.py
my_open_book = open('txts/CatInTheHat.txt')

while True:
    the_line = my_open_book.readline()
    print(the_line, end='')
    if the_line == '':
        # We've read the entire book!
        print("\nThe End.")
        break

my_open_book.close()

**Step 1: Adding robustness.** Adding this `close` statement doesn't add a new feature to our script. Its purpose is to keep our execution environment clean, and because it doesn't add any noticeable functionality to our script, it is often hard to remember to include a `close` command for every included `open` command. When I first started programming, I often forgot this pairing, and now I use a story from my childhood to help me remember.

When I was a small boy, my parents would constantly remind me to close the screen door to our home when I came into the house. Why did they do that? Yes! They wanted me to keep the bugs out of the house. We won't go, at this time, into the details of the bugs/errors that can occur if you don't close a file when you're done with it, but suffice it to say that if you ever open a file, you should close it.

It's perfectly fine to do this at the end of your script, as I've done here. If it's not clear, the interpreter stops executing your script when it reaches the script's end. Putting this `close` command right before the end of the script ensures that closing the file is one of the last things the interpreter does for us before stopping execution.

**Step 2: Using** `with-as`**.** What if I wanted to close the file earlier in the script? I could do that, but if I'm making as many changes to the script as we made last time, I might mistakenly move the `close` to some point before I'm actually done reading the file. For example, let's move the `close` statement into the loop and execute the script again.

In [None]:
### NOT a1s1/seuss1.py
my_open_book = open('txts/CatInTheHat.txt')

while True:
    the_line = my_open_book.readline()
    print(the_line, end='')
    if the_line == '':
        # We've read the entire book!
        print("\nThe End.")
        break

    my_open_book.close()

Yuk. We got a `ValueError: I/O operation on closed file.`

Notice that there's very little difference between the first and second pieces of code. Basically, four little spaces. This is a nightmare waiting to happen.

To avoid such nightmares, Python provides us with a piece of syntactic sugar that lets us automatically include the paired `close` when we're inserting the functionally necessary `open`. It accomplishes this by turning the `open` into a `with-as` [compound statement](https://docs.python.org/3/reference/compound_stmts.html).

In [None]:
### a1s1/seuss2.py
with open('txts/CatInTheHat.txt') as my_open_book:
    while True:
        the_line = my_open_book.readline()
        print(the_line, end='')
        if the_line == '':
            # We've read the entire book!
            print("\nThe End.")
            break

The `with-as` compound statement does the `open` and assigns our virtual finger to the name `my_open_book` just as before, but it also recognizes when control leaves its nested code block and automatically calls `close` on `my_open_book` for us. I wish I had something like this when I was a kid.
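One way to convince yourself that `with-as` really does close the file is to look at the file object's `closed` attribute. The following is a minimal sketch, not part of our script. So that it runs without our `txts` directory, it assumes a throwaway file created with Python's `tempfile` module; the file name `tiny_book.txt` and its one-line contents are made up for this illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a tiny stand-in "book" (hypothetical name and contents).
    book_path = os.path.join(tmp_dir, 'tiny_book.txt')
    with open(book_path, 'w') as new_book:
        new_book.write('One short line.\n')

    with open(book_path) as my_open_book:
        first_line = my_open_book.readline()
        open_inside = not my_open_book.closed   # the file is open in here

    # Control has left the nested block, so the file is already closed.
    closed_after = my_open_book.closed

print(first_line, end='')
print(open_inside, closed_after)
```

Notice that the script never calls `close` itself, yet `closed_after` reports that the file was closed once control left the nested block.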

This is one example of how the designers of newer programming languages have included language features that help you to avoid common errors and problems (i.e., errors and problems that repeatedly haunted programmers in older languages). Of course, you're protected only if you use these features! I encourage you to use this form for your own scripts.

**Step 3: A new problem specification.** While our first problem-to-be-solved has been tremendous fun, the Unix command `cat` will also print out the contents of a simple text file. Let's try to do something more interesting, like turning this story into a theatrical script. You know, a textual format that a couple of actors (playing the narrator, the Cat, the Fish, Things One and Two, and the Mother) could perform. In case you've forgotten, the narrator's sister, Sally, never speaks in this story.

What then is our problem specification? It could be something as simple as, "Find each piece of dialogue in the story and print it after the name of the character in the story who says it."

**Step 4: Splitting the problem into smaller pieces.** One of the keys to successfully writing code is to break a large problem into smaller pieces and then write code to solve each of those pieces. Later, you can pull the pieces together in order to create a script that meets the entire specification.

What are the smaller pieces in this problem? Well, we have to find the dialogue in this story. We also have to find which character says each of these lines of dialogue. Let's find the dialogue first, since that should be easier given that each piece of dialogue is enclosed in double quotes in this story.

**Step 5: Reuse.** I like to start a project like this by asking myself, "Can I keep anything from a script I wrote recently?" Let's look at the last script (`a1s1/seuss2.py`) we executed. What does this script do that I could reuse in my new script, and what in it are statements that I'll no longer need? This little exercise is always a good idea. You'll start much quicker if you jumpstart your new script with code from an old script that already works. Reuse is good, for our environment and in your life as a programmer.

In our new script, we will still need to open the file containing the story, and we'll need to loop over the lines in that story. Let's keep the first three (non-comment) statements from `seuss2.py`. We'll also need to know when we've reached the end of the story. Plus, we'll need to eventually close the open file, which `with-as` will do automatically for us! Basically, we just need to delete the line that prints `the_line` and then start adding pseudocode at the end of the `while` loop to solve our new problem. After doing all this, we're left with the following:

In [None]:
### a1s1/seuss3.py
with open('txts/CatInTheHat.txt') as my_open_book:
    while True:
        the_line = my_open_book.readline()
        
        if the_line == '':
            # We've read the entire book!
            print("\nThe End.")
            break

        # new pseudocode goes here

**Step 6: Finite State Machines.** One way to solve the first small piece of our larger problem is to think of your script operating in one of several different states as it accomplishes the task you have set for it. We start our script in some initial (or start) state, which we will call `S0`. We read our input looking for some event that transitions us from this start state to some other state that we will call `S1`.

Figure 2-1 is the beginning of a diagram of a finite state machine (FSM), where the states are represented as labeled circles (also called *nodes*) and *transitions* as directed arrows from one state to another. Each transition is annotated with the event that causes the transition. The start state is identified by the special transition labeled `Start`.

See `Generic-FSM.png` in Module "Act I, Scene I" on Canvas.

Figure 2-1: A simple, generic FSM diagram.

Before we continue, let's make this concrete for the problem in front of us. We want to find each line of dialogue in our input file, and we know that the dialogue in this story is surrounded by double quotes. When our script begins, it starts in a state `S0`, which for our processing problem indicates that we have yet to see a double quote. While this is technically true, I find it more helpful to think of this start state as one in which our script is searching for a double quote character. When the script finds a double quote character, this event tells us that we have found the start of a line of dialogue, and the event causes a transition into a new state labeled `S1`. 

What is this new state `S1`? Well, the characters we are reading now are characters that we want to remember and later print as dialogue. We could think of `S1` as our script collecting the characters associated with the current line of dialogue. This string of characters ends when the current line of dialogue ends, which is when we see another double quote.

In state `S1`, we again are searching for a double quote as the event that exits this state. When the script finds a double quote, this event will cause another state transition as the script moves again to a different kind of processing of the input story. What is the processing we want the script to do in this state following `S1`? Well, when we see the double quote at the end of a line of dialogue, we simply want to switch back to searching for the next piece of dialogue. That action is described by our start state `S0`, and so we can transition from `S1` to `S0` when our script encounters the next double quote (i.e., the one that ends the current line of dialogue).

Our script will keep ping ponging back and forth between these two states, happily collecting lines of dialogue, until what happens? Is there any other event to which our FSM should react? 

Yes! Eventually, our input story ends (i.e., our script reads EOF, or end of file). In addition to a distinguished start state, FSMs have a set of distinguished end, or goal, states. End states in this depiction of a FSM are nodes with a double circle. Figure 2-2 illustrates our complete FSM for finding dialogue in an input text file.

See `Dialogue-FSM.png` in Module "Act I, Scene I" on Canvas.

Figure 2-2: A complete FSM diagram for finding dialogue.

More formally, we have designed what is called a *deterministic finite state machine*. By deterministic, we mean that there is at most a single transition out of a state for each possible event. State `S1` has two transitions out, but they are labeled with different events. There are also nondeterministic FSMs, and these allow a single event to cause multiple transitions out of a state. You'll encounter them if you continue to study computer science.

**Looking deeper (optional).** Both deterministic and nondeterministic FSMs are mathematical models of computation, and they are defined by the quintuple of an input alphabet of events, a set of all states, a start state, a transition function, and a (possibly empty) set of final states. The start state is a member of, and the final states are a subset of, the set of all states.

The transition function maps states and input events to states. For example, the transition function `f` for the FSM diagram above would return `S1` when called with `f(S0,'"')`. Notice that the transition function does not have to be defined for all combinations of input symbols and states. If no transition is defined for some pair of current state and input event, the implementation of your FSM should probably raise an error.

For example, our FSM above does not have a transition from `S1` on encountering EOF. This is because we don't expect our story to end while we are in the middle of a line of dialogue, i.e., while we are searching for the dialogue's ending double-quote character.
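To make the transition function concrete, here is a minimal sketch of one way `f` could be written in Python. It is an illustration only and not part of our script: it uses a dictionary and a function definition, two features we have not covered yet, and the names `transitions`, `QUOTE`, and `EOF` are my own. It lists only the three transitions described in the text (the one from `S0` to `S2` on EOF is inferred from that description), and it makes no assumption about how ordinary, non-quote characters are drawn in Figure 2-2.

```python
# States and events; these names are illustrative assumptions.
S0, S1, S2 = 'S0', 'S1', 'S2'
QUOTE = '"'
EOF = 'EOF'

# Each (current state, event) pair maps to the next state.
transitions = {
    (S0, QUOTE): S1,   # found an opening double quote
    (S1, QUOTE): S0,   # found the matching closing double quote
    (S0, EOF): S2,     # the story ended while searching for dialogue
}

def f(state, event):
    """Return the next state, or raise an error if no transition is defined."""
    if (state, event) not in transitions:
        raise ValueError('no transition from ' + state + ' on ' + repr(event))
    return transitions[(state, event)]

print(f(S0, '"'))   # prints S1
print(f(S1, '"'))   # prints S0
print(f(S0, EOF))   # prints S2
```

Calling `f(S1, EOF)` with this table raises a `ValueError`, which mirrors the missing transition discussed above.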

We have just dipped our toes into the deep topic of FSMs. They are a great representation for modeling lots of mechanical, biological, and linguistic systems.

Enough about the representation, let's see how we can use the diagram we've drawn to help us think about the code we want to write. As I mentioned earlier, pictures can sometimes help us to write the code we need.

**Step 7: Encoding the state information.** The first thing to note about our picture is that it contains only three states, and the third state (`S2`) represents the termination of our script. This means that our script has only to keep track of which of the first two states (`S0` and `S1`) it is in at any point in time. We can do this with a single Boolean variable. Let's call this variable `looking_for_open_quote`. When it is `True`, it indicates that we are in state `S0` and when it is `False` we are in `S1`. With this decision and our diagram, we can start writing some pseudocode.

In [None]:
with open('txts/CatInTheHat.txt') as my_open_book:
    # Set our FSM to the start state
    looking_for_open_quote = True

    while True:
        the_line = my_open_book.readline()

        if the_line == '':
            # We've read the entire book!
            print("\nThe End.")
            break

        # new pseudocode goes here
        # if in s0, i.e., looking_for_open_quote
        #     do some work
        # else in s1, i.e., looking for close quote
        #     do other work

This pseudocode pulls the processing that every state does (i.e., reading a line of characters from the input file) out of each state's work. We could have duplicated that code in each state, but that was unnecessary when we already had it in the outermost loop. We will talk more, in the next scene, about the benefits of collecting commonly executed code together in one place.

**Quick quiz:** Leaving the check for EOF outside any code associated with a particular state changes our drawn FSM. Think about how this coding decision changes the FSM and redraw it.

**Step 8: Start coding the states.** Let's expand our pseudocode for state `S0`. What does it need to do? Well, it can ignore any line that does not contain a double-quote character. On the other hand, if the line contains a double-quote character, we know that the line, at least, is the beginning piece of a line of dialogue. I say "at least" because the dialogue that starts on this line may end on this line or extend to one or more following lines.

I want to pause here and highlight the fact that we are using the word "line" in two different contexts that we must keep straight. There is a line from the file, which is completely contained in the object called `the_line`, and there is a line of dialogue from the story, which can start anywhere in a line from the file and continue to an arbitrary point in this `the_line` or in any later-read `the_line`.

For example, the first line of dialogue in *The Cat in the Hat* starts in the middle of the eighth file line and continues to the end of the ninth file line.

In [None]:
### NOT a script and therefore NOT executable
8 And I said, "How I wish
9 We had something to do!"

Ok, let's put these ideas into some pseudocode. In particular, let's deal with the transition first and focus on what we'll do for it while in state `s0`.

In [None]:
        # if in s0, i.e., looking_for_open_quote
        #     do some work
        #     if found opening double quote
        #         move to s1
        #     else
        #         stay in s0

All I've done here is check for the event that causes the transition from `S0` to `S1`.  The event that causes this transition is the presence of a double-quote character in `the_line` we just read from the file. So, how do we check for a double-quote character in the string object named `the_line`?

**Step 9: Strings as a sequence of characters.** To this point, we've been relying on your intuition for what constitutes a string in Python. We created them by defining string literals. We read file lines as a string and compared them against string literals (i.e., to test for EOF). And we printed them out to the terminal. Probably none of this work forced you to think about how we were representing strings in the computer; we simply operated on them as big blobs. However, if we want to ask whether a string contains a particular character, we are going to have to know if Python allows us to ask about the components of a string object. 

Short answer: it does. A string in Python is a *sequence* of characters. I emphasized sequence in this definition because sequence is a very useful abstraction for lots of different objects that we would like to manipulate in our Python scripts. The abstraction you should have in your mind for a sequence is *an ordered collection of items*. 

It doesn't matter if the items are of the same type or kind, although in the case of a string, each item in the ordered collection is of the same kind (i.e., a character). We will soon play with the `list` datatype in Python, which allows you to create an ordered sequence of lots of different kinds of things. For example, the objects on my office bookshelf when viewed from left to right could be represented as a Python `list` containing a stuffed animal, and then a picture of my kids, and then this course's textbook, followed by an old CD, and then some other books.

**Step 10: Membership test.** What makes a sequence a very useful abstraction is that there are commonly used operations that we can perform on any Python sequence. For the task at hand, Python sequences permit us to easily test whether an item is a member of a given sequence. A membership test is exactly what we need to answer the question of whether the string named `the_line` contains a double-quote character! 
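Here is a small sketch of the membership test in action, separate from our script. The string is line 8 of our story; the `bookshelf` list is a made-up stand-in for the shelf described earlier, included only to show that the same test works on another kind of sequence.

```python
the_line = 'And I said, "How I wish\n'

print('"' in the_line)      # True: the line contains a double quote
print('z' in the_line)      # False: there is no 'z' in this line

# The same test works on other sequences, such as a list (hypothetical items).
bookshelf = ['stuffed animal', 'picture of my kids', 'textbook', 'old CD']
print('textbook' in bookshelf)   # True
print('lamp' in bookshelf)       # False
```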

With this knowledge, let's turn the two `if` statements in our pseudocode snippet for state `S0` into Python code:

In [None]:
        if looking_for_open_quote:  # in state S0
            # do some work
            if '"' in the_line:
                # move to s1
            else:
                # stay in s0

If I wanted to know that there was no double quote in `the_line`, I could have written the expression: `'"' not in the_line`. `in` and `not in` are binary infix operators that compute a `bool`. Of course, transitioning to state `S1` is simply setting our state encoding appropriately, which means for us that we set `looking_for_open_quote` to `False`. Staying in state `S0` means that `looking_for_open_quote` stays `True`, which means we could delete the entire `else` clause since there's no work to be done.

In [None]:
        if looking_for_open_quote:  # in state S0
            # do some work
            if '"' in the_line:
                looking_for_open_quote = False

**Step 11: Indexing and slicing.** I left the comment "do some work" in the pseudocode because transitioning on seeing a double quote is not all the work we need to do here. We want to capture each line of dialogue from the story and print out just this dialogue. This means that we need to remember the characters we have read from this opening double quote until its matching ending one. Python makes this easy with a couple more common operations on sequences like strings.

Specifically, all sequences permit us to identify an item in a sequence by its index and to slice a subsequence from a larger sequence by using two indices. Let's look at this using line 8 of the file `CatInTheHat.txt`, which reads '`And I said, "How I wish`'.

In [None]:
### NOT a script and therefore NOT executable
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+-   -+---+---+----+
  | A | n | d |   | I |   | s | a | i | d | , |   | " | H | ... | s | h | \n |
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+-   -+---+---+----+
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14    21  22  23   24
-24 -23 -22 -21 -20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10    -3  -2  -1

Python, like many programming languages and much of computer science, starts numbering at 0. If I want to know the first item in a Python sequence, I can use its square bracket notation and specify index 0. For instance, `print(the_line[0])` prints '`A`'. I can also ask for the length of any sequence object with the Python built-in function `len`. `len(the_line)` returns `24`. Notice that we count the letters, spaces, punctuation, and the special newline character (`\n`) at the end of this string. They are all items in the string currently named `the_line`.

In [None]:
### EXECUTABLE, but NOT part of our script
the_line = 'And I said, "How I wish\n'
print(the_line[0])
len(the_line)

While counting from the front of a string is often what you need, sometimes you might find it easier to count backwards from the end of the string. This is possible in Python by using negative indices in the square bracket notation. The last character in our `the_line` example is accessed by writing `the_line[-1]`, and the initial capital `'A'` is `the_line[-24]` or `the_line[-len(the_line)]`. You will get an `IndexError` if the index you use for a given string `s` is less than `-len(s)` or greater than or equal to `len(s)`.

In [None]:
### EXECUTABLE, but NOT part of our script
the_line = 'And I said, "How I wish\n'
print(the_line[-24])             # prints the first character
print(the_line[-len(the_line)])  # also prints the first character
the_line[len(the_line)]          # index out of bounds!

*Slicing* in Python allows us to take any subsequence of items within our sequence object. For example, `the_line[6:10]` would return the string `'said'`.

In [None]:
### EXECUTABLE, but NOT part of our script
the_line = 'And I said, "How I wish\n'
print(the_line[6:10])

Notice that our second index in this expression is one greater than the index of the last item in our slice. This allows us to write `the_line[0:len(the_line)]` to create a copy of the string object named `the_line`. Since a programmer will often find herself writing `0` as the first index and `len(s)` as the second index in a slice, where `s` is a name for some string object, Python will assume you mean `0` as the first index if you elide it and `len(s)` as the second index if you elide it. To copy a string `s`, you can simply type `s[:]`. 
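A quick sketch of these slicing defaults, using the same `the_line` as before. The names `whole`, `also_whole`, `front`, and `back` are just labels for this illustration.

```python
the_line = 'And I said, "How I wish\n'

whole = the_line[0:len(the_line)]   # both indices written out
also_whole = the_line[:]            # both indices elided
front = the_line[:3]                # first index elided, so it starts at 0
back = the_line[13:]                # second index elided, so it runs to the end

print(whole == the_line, also_whole == the_line)
print(front)
print(back, end='')
```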

I tell you all this because we want the slice `the_line[12:]` as the start of our dialogue line.

In [None]:
### EXECUTABLE, but NOT part of our script
the_line = 'And I said, "How I wish\n'
print(the_line[12:])

But how do we know that the double quote sits at index 12 in this current string object? All that the `in` operator told us was that a double quote character existed in `the_line` (i.e., the result of its computation was `True`).

**Step 12: For loops.** Well, again, programmers ask such a question often enough that the developers of Python made it easy to answer this question. But I'm going to divert us for a moment to talk about how, for example, a beginning programmer would solve this problem in a programming language like C. This will introduce to you the other major looping construct in Python (and many other languages): the `for` loop. More importantly, this explains what takes place behind the scenes in the Python abstraction that we will eventually use.

The following code block fills in some of the work we need done in state `S0`. In particular, how do we find the location of the first double quote character in `the_line` and capture the beginning of the dialogue[^fn2] line? (You can see the entire script by viewing `a1s1/seuss4.py`.)

In [None]:
        if looking_for_open_quote:  # in state S0
            # do some work; some of it follows
            for i in range(len(the_line)):
                if the_line[i] == '"':
                    dialog = the_line[i:]
                    break

            if '"' in the_line:
                looking_for_open_quote = False

The `for` statement, in general, iterates over the items in a sequence. The `in` keyword here plays a similar membership role as we saw earlier, but as part of the `for` statement, we are asking `i` to name each item in the specified sequence in order, and for each of those values, execute an iteration of the indented block of statements. 

To see this `for` loop in action, we need to know what the sequence is in this particular case. We already know what value is produced by `len(the_line)`, and we pass the integer value computed by `len` to another Python built-in called `range`. This built-in is quite interesting, but for now we simply need to think of it as computing a sequence object containing the integers from `0` to one less than the integer value passed to `range`.
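If you'd like to see the integers that `range` produces, the following sketch prints them. Wrapping `range` in the built-in `list` is used here only to display the values; the `for` loop in our script does not need it. The name `indices` is made up for this illustration.

```python
the_line = 'And I said, "How I wish\n'

print(list(range(5)))                  # [0, 1, 2, 3, 4]

indices = list(range(len(the_line)))
print(indices[0], indices[-1])         # first and last index into the_line
print(len(indices))                    # one index per character
```

The last index is `23`, which is one less than `len(the_line)`, so every value that `i` takes on is a valid index into `the_line`.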

Let's now look at the body of the `for` loop we've written. It asks whether any of the characters, starting at index 0, in `the_line` is a double quote. When the if-statement in the `for` loop finds a double-quote character at some index `i`, the body of the if-statement names the slice of `the_line` from `i` to the end of `the_line` with the variable `dialog`. We end the `if` body with a `break` because we don't need the `for` loop to continue checking the rest of `the_line` once we've found a double quote. (Actually, there are other cases we need to handle in a moment, but not for our example from line 8 of `CatInTheHat.txt`.)

Our original `if` statement that checked for membership is now performing redundant work for us. The following rewrite of our script integrates this logic with the `for` loop we just wrote. Make sure you understand why the logic in this and the previous script are equivalent.

In [None]:
        if looking_for_open_quote:  # in state S0
            # do some work; some of it follows
            for i in range(len(the_line)):
                if the_line[i] == '"':
                    dialog = the_line[i:]
                    looking_for_open_quote = False
                    break

**Step 13: String find.** As promised, Python makes it easy for us to find the index of the first occurrence of a character in a string using the `find` method. In particular, much of the work we did above in the `for` loop becomes: `i = the_line.find('"')`. 

I say "much" because you might ask what the method returns if the character we seek isn't in the current string, and the answer to this question is the integer value `-1`. In other words, given a string `s` and a nonempty substring `c`, `s.find(c)` returns an integer between `0` and `len(s)-1` if `c` occurs within `s` and `-1` if not.
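A few calls on our example line show this behavior. This is a sketch for experimenting, not part of our script.

```python
the_line = 'And I said, "How I wish\n'

print(the_line.find('"'))      # 12: the index of the opening double quote
print(the_line.find('A'))      # 0: found at the very start
print(the_line.find('\n'))     # 23: the last valid index, len(the_line) - 1
print(the_line.find('said'))   # 6: the index where the substring begins
print(the_line.find('z'))      # -1: not found
```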

While `find` makes our life much easier, it doesn't replace the need for an if-statement in our new solution. Here's what our script looks like when we replace the for-loop solution for state `S0` with this "find+if" solution. I've also updated our comments to explain what we have and haven't yet accomplished.

In [None]:
        if looking_for_open_quote:  # in state S0
            i = the_line.find('"')
            if i != -1:
                # Found an open quote
                dialog = the_line[i:]
                looking_for_open_quote = False
                # FIXME! Need to handle short dialogue.

**Step 14: Design patterns for error handling.** When you've been programming long enough, as with any practiced skill, you'll start to see repeated patterns. This is one for error handling. A command, function, or method carves out one or more values in the range of its return value and distinguishes those values as error conditions. In our current example, `find` needs only the integers in the range `0` to `len(s)-1` to fully accomplish its stated functionality. The person who wrote the implementation of `find` was free to pick any value outside this range of valid return values as an error condition. She might have picked several such values to indicate a number of different error conditions.

One of the things you'll learn as your programming skill grows is that you should always check and handle the error conditions that might occur in the commands, functions, and methods you use. The only reason not to do this is because you have carefully reasoned out why the error condition cannot occur (e.g., we might know that `the_line` must contain a double-quote character when the script executes `the_line.find('"')` because of some processing we did immediately prior to this point in our script that determined that specific fact).

You should also realize that just because a design pattern is common doesn't mean it is without headaches. This pattern results in us adding the statement `if i != -1:` to our script, which we are supposed to read as "if `find` returned a valid index into `the_line` then we found an opening double-quote character at index `i`." Yeah, it doesn't look much like that to me either, and that's why I stuck in the comment at the start of the body of our inner if-statement.

While not an issue with the structure of the logic in our current script, many programmers also dislike this design pattern because it can break up the flow of your script with lots of point checks for uncommon error conditions. When you are constantly distracted by infrequently true error checks, it can make it very hard for you and others reading your script to understand how the script performs its main function. We can solve this headache by using a different design pattern for error handling, which keeps exceptional events out of the main flow of your algorithm. We will talk about this other design pattern soon.
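As a small preview, and only as one guess at what such a pattern can look like, Python strings also have an `index` method that behaves like `find` except that it raises a `ValueError` instead of returning `-1`. The sketch below is not part of our script. The second line in `lines` is made up, and the `try` and `except` keywords it uses have not been introduced yet, so treat it as something to revisit later.

```python
lines = ['And I said, "How I wish\n', 'A made-up line with no quote.\n']
starts = []

for the_line in lines:
    try:
        # Main flow: no check of a special return value is needed here.
        i = the_line.index('"')
        starts.append(the_line[i:])
    except ValueError:
        # Exceptional event: this line contains no double quote.
        starts.append(None)

print(starts)
```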

**Step 15: Never go too long without testing.** Returning to the script we're trying to build, we have a choice of how to proceed. On the one hand, we can continue to write code to handle all the cases that might occur in state `S0` (e.g., the case we've flagged in our "FIXME!" comment). Our minds are focused on state `S0`, and it may be enticing to continue working on it until we've handled every case we can imagine. On the other hand, we could switch and start writing some code for state `S1`. In particular, we could write `S1` code that complements the code we just finished in `S0`. The benefit of this approach is that it would allow us to see if we can get our script to run on some subset of the possible inputs.

My advice: When you have the opportunity to test what you've written, take it. It's a good practice to build your script in pieces that you can regularly test. I like the feeling of accomplishment when I get another small piece of my larger problem to work. Plus, it is often efficient from an overall time perspective if you don't spend a lot of time coding a design that was doomed to failure based on one of your earliest design decisions. Regular testing will keep you from wasting a lot of time on doomed designs. Finally, this class will encourage you to follow this approach by the way in which we construct our problem sets. 

So what do we need to write for state `S1` that integrates with what we just finished writing for `S0`? When our script is in state `S1`, it is looking for the close quote corresponding to the open quote we saw in state `S0`. This sounds a lot like the work we just finished.

In [None]:
        if looking_for_open_quote:  # in state S0
            i = the_line.find('"')
            if i != -1:
                # Found an open quote
                dialog = the_line[i:]
                looking_for_open_quote = False
                # FIXME! Need to handle short dialogue.

        else:
            i = the_line.find('"')
            if i != -1:
                # Found a close quote
                dialog += the_line[:i+1]
                print("\nACTOR:", dialog)
                looking_for_open_quote = True
            else:
                dialog += the_line
            # FIXME! Is this all the work in state S1?

It is! In both states, we use the `find` method on strings to determine whether `the_line` contains a double-quote character, and if it does, we either start capturing (i.e., assigning to `dialog` in state `S0`) or finish capturing (i.e., the strange `+=` assignment of `dialog` in state `S1`). We also include the appropriate FSM transition in each case (i.e., the `looking_for_open_quote` assignments). The `print` statement in state `S1` prints the line of dialogue as the specification required us to do (albeit without the specifics of what actor says the line).

**Step 16: Concatenation and overloading.** The first of the interesting new things in the code for state `S1` is the `+=` operator that I just mentioned. This is equivalent to the statement: `dialog = dialog + the_line[:i+1]`. Let's think about what this expanded statement might mean. The operand to the left of the `+` operator (i.e., `dialog`) is the part of the current line of dialogue that appeared in previous file lines; the operand to the right of the `+` operator (i.e., `the_line[:i+1]`) is the final part of the current line of dialogue up to and including the ending double quote.

**Looking deeper (for understanding).** Do you fully understand why the code we just described includes the ending double quote? If not, you may wish to read the documentation about the string method `find` and review how the indices in slicing work. If you left off the `+1` in `the_line[:i+1]`, this would be an example of what's called an off-by-one error. They are quite common in programming, and you should train yourself to be alert to situations where they might occur.
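
If you'd like to see the off-by-one error for yourself, here is a minimal sketch. The string assigned to `the_line` is a made-up stand-in for a file line that ends a line of dialogue, and the names `with_quote` and `without_quote` are mine, not part of our script.

```python
### EXECUTABLE, but NOT part of our script
# Hypothetical file line that ends a line of dialogue
the_line = 'I\'ll try."\n'

# find returns the index of the closing double quote
i = the_line.find('"')

# A slice stops just BEFORE its end index, so we need i+1
# to include the character at index i.
with_quote = the_line[:i+1]
without_quote = the_line[:i]     # the off-by-one version

print(repr(with_quote))
print(repr(without_quote))
```

Compare the two printed strings and notice which one is missing its closing double quote.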

The code in state `S1` prints the entire line of dialogue once we've found the closing double-quote character. This must mean that `dialog` contains the entire dialogue after the `+` operation in the statement above the `print` statement. Hopefully, you're starting to realize that the `+` operator, when surrounded by string values, performs a concatenation operation. You can read the Python documentation for strings to see that our reasoning is correct.

In Python, you can place a `+` operator in your scripts between two integer values, between two string values, and even between some other types of values. Unsurprisingly, the `+` operator surrounded by two integers will simply add those two values and return the sum. The `+` operator when given two string values will concatenate those strings, which is a fundamentally different operation. This is called *overloading*, and we will return to this concept when we dive deeper into abstraction in Act II. Oh, and in both cases, `+=` is simply a shorthand for these operations when the name given to the returned value is the same as the name used by the lefthand operand to the `+` operator.
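
Here is a small sketch you can run to watch the overloaded `+` operator and its `+=` shorthand at work. The variable names `total`, `phrase`, and `count` are just illustrative choices.

```python
### EXECUTABLE, but NOT part of our script
# Between two integers, + adds
total = 2 + 3

# Between two strings, + concatenates
phrase = 'Cat' + 'Hat'

# += is shorthand for "compute the + and reuse the lefthand name"
count = 1
count += 1                  # same as: count = count + 1

dialog = '"At least '
dialog += 'I\'ll try."'     # same as: dialog = dialog + 'I\'ll try."'

print(total, phrase, count, dialog)
```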

The other interesting new thing in the code for state `S1` is the addition of an `else` clause, which wasn't needed in state `S0`. The body of the `else` executes when `the_line` does not contain a double-quote character. Its execution concatenates the current string named `the_line` to whatever dialogue had been collected from the immediately preceding file lines. This makes sense because our script needs to capture everything between the opening and closing double quotes.

Finally, the last line of the preceding code block exists to remind us that we haven't carefully considered all the cases that may occur when reading lines from a story. We might have, but we have neither tested on a lot of different examples nor carefully thought through the cases we might encounter. But that's ok for now, since we simply want to see if we can get the script to work for the cases we think we have covered!

**Step 17: Testing.** Typically, you want to start testing with small inputs that exercise your script in a limited number of ways. This is a great method for incrementally learning what works and then clearly seeing what doesn't. Being methodical may not be your style, but it makes testing and debugging less of a nightmare. 

Here's the full script we've developed to this point:

In [None]:
### a1s1/seuss5.py
with open('txts/Hat.txt') as my_open_book:
    # Set our FSM to the start state
    looking_for_open_quote = True

    while True:
        the_line = my_open_book.readline()

        if the_line == '':
            # We've read the entire book!
            print("\nThe End.")
            break

        # new pseudocode goes here
        if looking_for_open_quote:  # in state S0
            i = the_line.find('"')
            if i != -1:
                # Found an open quote
                dialog = the_line[i:]
                looking_for_open_quote = False
                # FIXME! Need to handle short dialogue.

        else:
            i = the_line.find('"')
            if i != -1:
                # Found a close quote
                dialog += the_line[:i+1]
                print("\nACTOR:", dialog)
                looking_for_open_quote = True
            else:
                dialog += the_line
            # FIXME! Is this all the work in state S1?

The first test input I'll run is called `Hat.txt`, a short poem from Shel Silverstein that contains only a single line of dialogue spanning two file lines. Notice that the script explicitly opens that file.

Tada! When you run `seuss5.py`, it works as expected. Unfortunately, that's probably not what will happen on your own first test runs, but one can always hope!

When testing a script that loops, it's a good idea to first test inputs that require only one iteration of your loop's processing. That's what we just did with our `Hat.txt` input file (i.e., it contains only a single line of dialogue). If the script handles that case correctly, you can then build from there. How about two iterations of this processing in one run of the script? In our case this means we need a test input file with two lines of dialogue.
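
If you want to try small inputs without creating a new text file for each one, one option is the standard library's `io.StringIO`, which wraps a string in an object that supports `readline` just like an open file. The sketch below is an experiment of my own, not one of the course files: the text in `test_text` is made up (it is not the contents of `Hat.txt`), and the list `found` is an extra I added so we can inspect what was captured.

```python
### EXECUTABLE, but NOT part of our script
import io

# Hypothetical test input: one line of dialogue spanning two file lines
test_text = (
    'The old man tipped his cap\n'
    'and said, "What a fine\n'
    'day for a walk."\n'
    'Then he left.\n'
)
my_open_book = io.StringIO(test_text)   # behaves like an open file

found = []                       # every line of dialogue we print
looking_for_open_quote = True    # start in state S0

while True:
    the_line = my_open_book.readline()
    if the_line == '':
        break

    if looking_for_open_quote:   # in state S0
        i = the_line.find('"')
        if i != -1:
            dialog = the_line[i:]
            looking_for_open_quote = False
    else:                        # in state S1
        i = the_line.find('"')
        if i != -1:
            dialog += the_line[:i+1]
            print("\nACTOR:", dialog)
            found.append(dialog)
            looking_for_open_quote = True
        else:
            dialog += the_line
```

Editing `test_text` to hold two lines of dialogue is then a quick way to try the next test case.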

Change `seuss5.py` to open `Importnt.txt`, and run it as a second test. Success again!

So far so good, but I said we had skipped the case in our script when a story included a line of dialogue fully contained on one line of the input file. The file `Snowman.txt` builds upon our successful tests, but adds this different case. Let's see if we really need extra code to handle this case.

Hmmm, I guess we do. This test run failed to do what we expected; it printed only the first three lines of dialogue.

**Step 18: Beware of hidden assumptions.** Our script is short enough that we can probably analyze our problem with `Snowman.txt` by thinking through the script's operation. This will not always be the case, and we will soon learn how to use a debugger to help us understand the state of our script at any point in its execution.

What happens when our script reads line 24 of the file `Snowman.txt` and our variable `the_line` holds the string value: `'And said, "At least I\'ll try."\n'`? The script is in state `S0` (i.e., `looking_for_open_quote` is `True`) and the variable `i` will be set to `10`. The script recognizes the open quote and sets the variable `dialog` to `'"At least I\'ll try."\n'`. It also sets `looking_for_open_quote` to `False`, indicating a transition to state `S1`, and proceeds to the next iteration of the `while` loop not realizing that it missed the closing double quote. Since the input file contains no more double quotes after file line 24, the script ends while searching for a close quote and leaves an unprinted line of dialogue in the variable `dialog`.
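
You can reproduce this trace by hand. The sketch below runs just the state `S0` statements on that one string; the name `quotes_in_line` and the use of the string method `count` are my additions for illustration.

```python
### EXECUTABLE, but NOT part of our script
the_line = 'And said, "At least I\'ll try."\n'

# The work our script does in state S0
i = the_line.find('"')
dialog = the_line[i:]

# How many double quotes did this one file line actually contain?
quotes_in_line = the_line.count('"')

print(i)
print(repr(dialog))
print(quotes_in_line)
```

The closing double quote is already sitting inside `dialog`, but nothing in state `S0` ever looks for it.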

The issue here is a subtle one. Our FSM diagram from earlier says that we should transition from state `S0` to state `S1` on seeing a double-quote character, but this FSM also assumes that events come *one at a time* during our processing. In our script, `the_line` contains the next *two* events when a line of dialogue fits within a single file line, and we cannot blithely assume that we're done with the processing of `the_line` once we've found *a* double-quote character. That's not what our FSM diagram says to do. It says to start processing the input in state `S1` on the character immediately following the opening double quote. It doesn't say to skip over the characters between the opening double quote and the end of the string `the_line` and to start processing with the next line of the file.

Now that we know what we did wrong, we can fix this problem in at least two different ways. We could faithfully adhere to our FSM diagram and write some code that enters state `S1` using only whatever remains in `the_line` after an open quote is found in state `S0`, or we could check for short lines of dialogue in state `S0` and skip the transition to `S1` in these cases. 
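
To make the first option concrete, here is one rough sketch of what faithfully following the FSM might look like. It is not the approach we take in our script. The story text is made up, and the names `story`, `rest`, and `found` are hypothetical. The idea is to keep processing whatever remains of the current line (in `rest`) before reading the next one.

```python
### EXECUTABLE, but NOT part of our script
import io

# Hypothetical story text standing in for a file
story = io.StringIO(
    'He looked up\n'
    'And said, "I will try."\n'
    'Then he said, "It is\n'
    'cold out here."\n'
)

looking_for_open_quote = True
dialog = ''
found = []

while True:
    the_line = story.readline()
    if the_line == '':
        break

    rest = the_line                  # the unprocessed part of this line
    while rest != '':
        i = rest.find('"')
        if looking_for_open_quote:   # in state S0
            if i == -1:
                rest = ''            # no open quote; done with this line
            else:
                dialog = '"'
                rest = rest[i+1:]    # continue right after the open quote
                looking_for_open_quote = False
        else:                        # in state S1
            if i == -1:
                dialog += rest       # dialogue continues on the next line
                rest = ''
            else:
                dialog += rest[:i+1]
                found.append(dialog)
                rest = rest[i+1:]    # continue right after the close quote
                looking_for_open_quote = True

for d in found:
    print("\nACTOR:", d)
```

Notice the cost of this choice: we need a second loop nested inside our `while` loop.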

Sometimes one choice is better than the other, but there's probably not much difference in this case. Let's maintain our current loop structure that processes file lines and decide to no longer adhere closely to our (initially helpful) FSM diagram. 

Yes, this is a choice! The FSM diagram was meant to help us think about the problem. It was not meant to constrain our implementation.

**Step 19: Function composition.** Once we have thought through the problem and outlined a sensible solution, implementing it is fairly straightforward.

In [None]:
### a1s1/seuss6.py
with open('txts/Snowman.txt') as my_open_book:
    # Set our FSM to the start state
    looking_for_open_quote = True

    while True:
        the_line = my_open_book.readline()

        if the_line == '':
            # We've read the entire book!
            print("\nThe End.")
            break

        # new pseudocode goes here
        if looking_for_open_quote:  # in state S0
            i = the_line.find('"')
            if i != -1:
                # Found an open quote
                dialog = the_line[i:]
                if '"' not in dialog[1:]:
                    looking_for_open_quote = False
                else:
                    # Grab entire dialog from this line ...
                    short_dialog = dialog[1:].split('"')[0]
                    print("\nACTOR: " + '"' + short_dialog + '"')
                    # ... and stay in state S0

        else:
            i = the_line.find('"')
            if i != -1:
                # Found a close quote
                dialog += the_line[:i+1]
                print("\nACTOR:", dialog)
                looking_for_open_quote = True
            else:
                dialog += the_line
            # FIXME! Is this all the work in state S1?

In `seuss6.py`, we protect the transition to state `S1` (within the work in state `S0`) with a conditional branch that makes sure that the rest of the file line (i.e., `dialog[1:]`) does not contain a double-quote character. If it does, we slice out the short line of dialogue, print it, and stay in state `S0` looking for the next open quote (i.e., the body of the new `else` within the work in state `S0`). 

The syntax on the righthand side of the `=` operator where we set `short_dialog` is a bit intimidating, so let's take a moment to make sure we understand exactly what it does. The key is to remember that Python evaluates this chain of operations from left to right. With that in mind, we can read `dialog[1:].split('"')[0]` as saying: Take the object named `dialog` and treat it as a sequence. We want every item in this sequence except the first (i.e., the 0th item), as specified by the slice `[1:]`. Given this new sequence, split it into subsequences at every item that is a double-quote character, as specified by `.split('"')`. When a file line holds a single short line of dialogue, the only double quote left is the closing one, so the split produces two subsequences. The double-quote character where the split is performed is not included in either of these subsequences. The value returned by the `split` is itself a sequence (a Python list); it is simply a sequence composed of the subsequences computed by the `split`. And finally, we index into this new sequence to grab the first of its subsequences (i.e., at index `0`), which is everything in between the opening double quote and the ending double quote! And this explains why the subsequent `print` concatenates opening and closing double-quote characters back onto the line of dialogue as we print it.
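
If you're curious what `split` hands back before we index into it, this small sketch shows the whole result. The names `rest_of_line`, `pieces`, and `more_pieces` are mine, and the string `'one"two"three'` is just a made-up example with more than one double quote.

```python
### EXECUTABLE, but NOT part of our script
# What dialog[1:] holds for file line 24 of Snowman.txt
rest_of_line = 'At least I\'ll try."\n'

# split returns a list of the pieces around each double quote;
# the double quote itself appears in none of the pieces
pieces = rest_of_line.split('"')
print(pieces)
print(pieces[0])

# With more double quotes, we get more pieces
more_pieces = 'one"two"three'.split('"')
print(more_pieces)
```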

The righthand side of the `short_dialog` assignment statement is an example of *function composition*, and we will see it used often in the code we will read. In fact, we saw an earlier example in our `for` loop code (i.e., `range(len(the_line))`). That earlier example looks a lot like function composition in mathematics, where you evaluate the stuff in the innermost pair of parentheses first and then work your way out to the outermost parentheses. Overall, try not to let either of these different forms of function composition overwhelm or intimidate you. When you come across them, just take them a step at a time. You'll eventually come to appreciate that

In [None]:
### EXECUTABLE, but NOT part of our script
dialog = '"At least I\'ll try."' 
short_dialog = dialog[1:].split('"')[0]
short_dialog

is easier to read *and comprehend* than

In [None]:
### EXECUTABLE, but NOT part of our script
dialog = '"At least I\'ll try."' 
string_of_dialog_and_other_stuff = dialog[1:]
list_of_dialog_and_other_stuff = string_of_dialog_and_other_stuff.split('"')
short_dialog = list_of_dialog_and_other_stuff[0]
short_dialog

Function composition removes the need to give names to the intermediate results, and without names, our attention is not drawn to this work, which is necessary for the computation but not really important to the big picture. For human readers, your script should be like a good story. It describes the main action without getting bogged down in unnecessary details of how exactly the characters interact and get from place to place. Unlike a story, your script does contain the details, which a computer needs to know, since it does exactly what it is told to do. Humans, unlike computers, are distracted by these details.

Recall our earlier example of this helpful use of abstraction. We replaced a for-loop and its associated code with an invocation of the `find` method on a string. Both computed a similar result, but we preferred the `find` string method over the for-loop implementation because it was both more concise and more expressive of what this step in our script was doing. In the next act, we will see how we can create our own compact representations so that each line of our script more closely resembles each step in the pseudocode we are trying to implement.

**Step 20: There is no character.** Before we go, I want to correct something you may have come to believe. While I have used the term "character" to describe single items in the sequence known as a Python string, you should know that there is no distinguished thing that is a character in Python. What we think of as a character is, in Python, just a Python string of length 1. In other words, when I earlier wrote that `print(the_line[0])` returns `'A'`, I was careful to write the result as a Python string. Go ahead and see what `type(the_line[0])` tells you about the type of the object returned from the string index method. We will talk more about types when we reach Act II.

In [None]:
### EXECUTABLE, but NOT part of our script
the_line = 'And said, "At least I\'ll try."' 
type(the_line[0])
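
Here is one more small experiment along the same lines, with the name `first` being my own choice. Because a "character" is just a string of length 1, you can index into it again and get the very same string back.

```python
### EXECUTABLE, but NOT part of our script
the_line = 'And said, "At least I\'ll try."'

first = the_line[0]
print(type(first))          # the type of a single "character"
print(len(first))           # its length as a string
print(first[0] == first)    # indexing it again changes nothing
```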

[^fn1]: Please see the course Canvas site for information on how to access the GitHub repository containing copies of the code we develop in class, as well as the files needed for our homework and programming assignments.

[^fn2]: [Here is a fun article](https://www.wired.com/story/why-you-hate-media-technically-speaking/) on the difference between dialogue and dialog.