# more on conditions and loops

In this class, we're going to return to the `in` keyword in a bit more detail, since you'll need it for HW4, and then spend our time discussing a new way of looping through texts.

#### `in`, revisited

It's often the case that we want to check the presence of something in a text or list of strings. We might, for example, want to know which of our novels contains a particular word or one of several words, as an index to which of very many novels to subject to further analysis. Or, we might have run some kind of analysis on many novels, and want to filter the results for only the examples that match certain criteria. 

In general in Python, we test for the presence or absence of something with the `in` keyword. It's usually possible to do this work "by hand" by looping over the search domain and asking `item == search_term`, but in accordance with the principle of not doing ourselves what Python already does, we prefer not to do that unless we need to do something with the items once we've found them.

Consider the basic case:

In [1]:
s = "I am a rather elderly man."
ls = s.split()

"I" in s

True

`x in y` is an **expression**, which means it evaluates to something, which loosely means it can be run so that it reduces to a particular value; in the case of `in`, that value is either `True` or `False`. Note that `True` and `False` are *boolean* values, not strings (`"True"`), which means that Python can do logic things with them: for example, you can write `if True:` and your if-clause will always run (not too helpful). Because Python is case sensitive and tends to be specific about its keywords, you need to use capital letters: `false` won't work (Python will raise a `NameError`).

You can capture the results of `x in y` and store the T/F in a variable if you want, which might be helpful if you were putting together metadata, for example ("Does this text contain X?"). 
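
As a small sketch of that metadata idea (the dictionary layout and the names `texts`, `search_term`, and `contains_term` are made up for illustration, not part of the homework), you might record one boolean per text:

```python
# Hypothetical mini-corpus: labels mapped to a sentence from each text
texts = {
    "sentence_1": "I am a rather elderly man.",
    "sentence_2": "I mean the law-copyists or scriveners.",
}

search_term = "scrivener"

# Store the True/False result of `in` for every text
contains_term = {}
for label in texts:
    contains_term[label] = search_term in texts[label]

print(contains_term)  # {'sentence_1': False, 'sentence_2': True}
```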

More commonly, you'll see this pattern:

In [2]:
s1 = "I am a rather elderly man."
s2 = "The nature of my avocations for the last thirty years has brought me into more than ordinary contact with what would seem an interesting and somewhat singular set of men, of whom as yet nothing that I know of has ever been written:—I mean the law-copyists or scriveners."
s3 = "I have known very many of them, professionally and privately, and if I pleased, could relate divers histories, at which good-natured gentlemen might smile, and sentimental souls might weep."

ls = [s1, s2, s3]

print(s1 in ls)

for s in ls:
    if "scrivener" in s:
        print(f'Found "scrivener" in {s}')

True
Found "scrivener" in The nature of my avocations for the last thirty years has brought me into more than ordinary contact with what would seem an interesting and somewhat singular set of men, of whom as yet nothing that I know of has ever been written:—I mean the law-copyists or scriveners.


Note that this kind of search is pretty crude. On a string, `in` just looks for substrings, so it won't respect word boundaries or capital letters or anything like that. On a list, `in` checks whether some whole item is equal to the search term; to search inside the items, you need a loop like the one above.

In [3]:
print('the' in s1)
ls = s1.split()
print(ls)
print('am' in ls)

True
['I', 'am', 'a', 'rather', 'elderly', 'man.']
True
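
To make the difference between searching a string and searching a list a bit more concrete, here is a short sketch using the same sentence (the variable name `words` is just for illustration):

```python
s1 = "I am a rather elderly man."
words = s1.split()

# On a string, `in` matches substrings anywhere, even inside a longer word
print("the" in s1)      # True: "rather" contains "the"
print("The" in s1)      # False: the match is case sensitive

# On a list, `in` compares the search term against whole items
print("the" in words)   # False: no item is exactly "the"
print("man" in words)   # False: the item is "man." with its period
print("man." in words)  # True
```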


You don't need to fall back on regular expressions for super simple cases, though. Instead, you can use a bit of boolean logic:

In [4]:
s4 = "She is a rather elderly woman"
s5 = "I think she is a rather elderly woman"

def find_she(s):
    if 'she' in s or 'She' in s:
        return True

print(find_she(s4))
print(find_she(s5))

True
True


In [5]:
s6 = "He is a rather elderly man"
s7 = "I think he is a rather elderly man"

def find_he(s):
    # each spelling needs its own `in` test; ('he' or 'He') in s would only check 'he'
    if 'he' in s or 'He' in s:
        return True

print(find_he(s4))
print(find_he(s5))

True
True
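
One thing to watch for when combining `in` with `or`: it's tempting to write `('he' or 'He') in s`, but Python evaluates the parenthesized part first, and `'he' or 'He'` is just `'he'`. The sketch below shows the difference, along with a word-level check that avoids matching the "he" inside "She". The helper name `has_word` and the string `short` are made up for this example, and stripping punctuation with `str.strip` is only one simple way to clean up the words:

```python
# `or` returns its first truthy operand, so this is just 'he'
print('he' or 'He')                      # he

short = "He is old"
print(('he' or 'He') in short)           # False: only 'he' is tested
print('he' in short or 'He' in short)    # True: both spellings are tested

def has_word(s, words):
    """Return True if any whole word of s is one of `words` (hypothetical helper)."""
    for token in s.split():
        # drop punctuation stuck to the edges of the word
        if token.strip('.,;:!?"\'') in words:
            return True
    return False

s4 = "She is a rather elderly woman"
print('he' in s4)                        # True: substring match inside "She"
print(has_word(s4, ['he', 'He']))        # False: no whole word matches
print(has_word(short, ['he', 'He']))     # True
```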


Probably the most useful structure of this type uses negation, which lets us determine when things are *not* in something. For example, we can easily return all the strings that don't contain the pronoun "I":

In [6]:
s8 = "Some time prior to the period at which this little history begins, my avocations had been largely increased. The good old office, now extinct in the State of New York, of a Master in Chancery, had been conferred upon me. It was not a very arduous office, but very pleasantly remunerative. I seldom lose my temper; much more seldom indulge in dangerous indignation at wrongs and outrages; but I must be permitted to be rash here and declare, that I consider the sudden and violent abrogation of the office of Master in Chancery, by the new Constitution, as a—premature act; inasmuch as I had counted upon a life-lease of the profits, whereas I only received those of a few short years. But this is by the way."
ls_bartleby = s8.split('.')
print(ls_bartleby)
# fix this list: 2 problems

for sentence in ls_bartleby:
    if 'I' not in sentence:
        # print(sentence)
        pass

['Some time prior to the period at which this little history begins, my avocations had been largely increased', ' The good old office, now extinct in the State of New York, of a Master in Chancery, had been conferred upon me', ' It was not a very arduous office, but very pleasantly remunerative', ' I seldom lose my temper; much more seldom indulge in dangerous indignation at wrongs and outrages; but I must be permitted to be rash here and declare, that I consider the sudden and violent abrogation of the office of Master in Chancery, by the new Constitution, as a—premature act; inasmuch as I had counted upon a life-lease of the profits, whereas I only received those of a few short years', ' But this is by the way', '']


#### `while` loops

Last class, we talked about how to search for substrings using `str.find()`, and how we can use the `start` and `end` parameters to find subsequent instances of the same substring. Consider the following stanza of Langston Hughes' "The Weary Blues":

`Droning a drowsy syncopated tune,
Rocking back and forth to a mellow croon,
I heard a Negro play.
Down on Lenox Avenue the other night
By the pale dull pallor of an old gas light
He did a lazy sway. . . .
He did a lazy sway. . . .
To the tune o’ those Weary Blues.
With his ebony hands on each ivory key
He made that poor piano moan with melody.
O Blues!
Swaying to and fro on his rickety stool
He played that sad raggy tune like a musical fool.
Sweet Blues!
Coming from a black man’s soul.
O Blues!
In a deep song voice with a melancholy tone
I heard that Negro sing, that old piano moan—
“Ain’t got nobody in all this world,
Ain’t got nobody but ma self.
I’s gwine to quit ma frownin’
And put ma troubles on the shelf.”`

Now, let's try to break it up into lines. We could use `str.split()`, of course, but let's assume for a second that that's not on the table. Instead, we can look through it for newlines and slice the string like we did last time.

The problem is that the poem isn't a sonnet; we don't know how many lines there are, and we probably want to try to write something generic, so we don't want to count them ourselves and write `for i in range(22)`. 

We've looked at two different kinds of `for`-loops so far in this class: the basic `for`-loop, which loops over all of the elements of something, and the `range`-based `for`-loop, which runs a specific number of times by looping over all of the numbers in a range. The `while` loop covers the other cases you might encounter; it's actually the most general case of loop and can be used to write the others if you really want.
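
As a rough illustration of that last point, here is a basic `for`-loop next to one way of writing the same thing with `while` (the word list and the names `from_for` and `from_while` are just for this example):

```python
words = ["I", "am", "a", "rather", "elderly", "man."]

# The basic for-loop visits each element in turn
from_for = []
for word in words:
    from_for.append(word.upper())

# The same work with `while`: we manage the index ourselves
from_while = []
i = 0
while i < len(words):
    from_while.append(words[i].upper())
    i += 1  # move on to the next index

print(from_for == from_while)  # True
```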

The `while` loop is an *indefinite* loop, unlike the other two: there's no way to predict in advance how many times it will run. Instead, we set a condition that must be true, and the loop runs **while** the condition is true. If the condition becomes false, the loop stops. In our case, we can start the loop with an index of 0, then loop character by character over the string, recording all the newlines and slicing out each line of the poem. That's going to be a bit complicated, though, so let's start with a simple example: we can count all the pieces of punctuation.

In [7]:
weary_blues = """Droning a drowsy syncopated tune,
Rocking back and forth to a mellow croon,
I heard a Negro play.
Down on Lenox Avenue the other night
By the pale dull pallor of an old gas light
He did a lazy sway. . . .
He did a lazy sway. . . .
To the tune o’ those Weary Blues.
With his ebony hands on each ivory key
He made that poor piano moan with melody.
O Blues!
Swaying to and fro on his rickety stool
He played that sad raggy tune like a musical fool.
Sweet Blues!
Coming from a black man’s soul.
O Blues!
In a deep song voice with a melancholy tone
I heard that Negro sing, that old piano moan—
“Ain’t got nobody in all this world,
Ain’t got nobody but ma self.
I’s gwine to quit ma frownin’
And put ma troubles on the shelf.”"""

In [8]:
def count_punct(s):
    # we define these variables *outside* the loop, because we don't want to reset them every time we go through the loop
    index = 0
    count = 0
    while (index < len(s)): # we'll stop at the end of the poem
        if s[index] in '.,;:!?\'\"':
            count += 1 # add to our count
    print(count)
#count_punct(weary_blues)

Okay, so what will happen if we run our new function? More importantly, if we try it, what goes wrong and how do we fix it?
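
If you want to check your answer, here is one possible fix, offered as a sketch rather than the only solution. The names `count_punct_fixed` and `count_punct_for` are made up here, and unlike the cell above these versions return the count instead of printing it. The second function is a `for`-loop version of the same count for comparison:

```python
def count_punct_fixed(s):
    index = 0
    count = 0
    while index < len(s):
        if s[index] in '.,;:!?\'"':
            count += 1
        index += 1  # move to the next character so the condition can become false
    return count

def count_punct_for(s):
    count = 0
    for char in s:  # the for-loop advances through the string for us
        if char in '.,;:!?\'"':
            count += 1
    return count

sample = "O Blues!\nHe did a lazy sway. . . ."
print(count_punct_fixed(sample), count_punct_for(sample))  # 5 5
```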

Now, if you think about the kinds of loops we've seen so far, you might realize we could rewrite our new while-loop with a for-loop, since we're actually just looping over every character in the poem. Let's return to our lines example, and try to get a list containing the indices of all the newlines in the poem:

In [9]:
def find_newlines(s):
    index = 0
    newlines = []
    while (index < len(s)):
        newline = s.find('\n', index)
        if newline > 0:
            # print(newline) 
            newlines.append(newline)
            index = newline + 1
    return newlines
# find_newlines(weary_blues)

Does that work? What's happening with our function? If you uncomment the print statement, can you get any insight?

We can solve this problem in one of two ways: either with a more specific loop condition, or by specifically telling the loop when to stop using the `break` keyword.
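
Here is a sketch of both approaches, under the assumption that we only care about `'\n'` characters. The function names are made up to keep the two versions apart. Both rely on the fact that `str.find()` returns `-1` when it finds nothing; testing for `-1` directly also means a newline at index 0 would not be skipped:

```python
def find_newlines_break(s):
    index = 0
    newlines = []
    while index < len(s):
        newline = s.find('\n', index)
        if newline == -1:   # nothing found from `index` onward
            break           # leave the loop explicitly
        newlines.append(newline)
        index = newline + 1
    return newlines

def find_newlines_condition(s):
    newlines = []
    newline = s.find('\n')
    while newline != -1:    # keep looping only while the last search succeeded
        newlines.append(newline)
        newline = s.find('\n', newline + 1)
    return newlines

sample = "O Blues!\nSweet Blues!\nO Blues!"
print(find_newlines_break(sample))      # [8, 21]
print(find_newlines_condition(sample))  # [8, 21]
```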

Finally, can you think about what we'd need to slice out the lines? We'll do it in class on Monday, but you technically have all the tools you need right now. 