# Ranges, more for-loops and finding things

Let's pick up from the pre-lecture video, where we saw that lists are mutable, which lets us do things like this:

In [32]:
string = "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife."
ls = string.split()
print(ls)
print('--------------------------')

ls[-1] = "husband."
print(ls)
covid_ls = string.split()
covid_ls = covid_ls[:-2]
covid_ls.extend("several cats.".split())
print(covid_ls)


['It', 'is', 'a', 'truth', 'universally', 'acknowledged,', 'that', 'a', 'single', 'man', 'in', 'possession', 'of', 'a', 'good', 'fortune,', 'must', 'be', 'in', 'want', 'of', 'a', 'wife.']
--------------------------
['It', 'is', 'a', 'truth', 'universally', 'acknowledged,', 'that', 'a', 'single', 'man', 'in', 'possession', 'of', 'a', 'good', 'fortune,', 'must', 'be', 'in', 'want', 'of', 'a', 'husband.']
['It', 'is', 'a', 'truth', 'universally', 'acknowledged,', 'that', 'a', 'single', 'man', 'in', 'possession', 'of', 'a', 'good', 'fortune,', 'must', 'be', 'in', 'want', 'of', 'several', 'cats.']


### A bit more on list mutability

This brings us to an important side-note, which is that because lists can get pretty big, python doesn't like to copy them. Consider what happens with strings:

In [2]:
string = "I shall go back again to the bleak shore\nAnd build a little shanty on the sand,"
string2 = string
print("string1:\n" + string)
print("string2:\n" + string2)
print('----------------')

string2 = string2[:len(string2)//2]
print("string1:\n" + string)
print("string2:\n" + string2)
print('----------------')

ls = string.split()
ls2 = ls
print(ls)
print(ls2)
print('----------------')

string1:
I shall go back again to the bleak shore
And build a little shanty on the sand,
string2:
I shall go back again to the bleak shore
And build a little shanty on the sand,
----------------
string1:
I shall go back again to the bleak shore
And build a little shanty on the sand,
string2:
I shall go back again to the bleak shor
----------------
['I', 'shall', 'go', 'back', 'again', 'to', 'the', 'bleak', 'shore', 'And', 'build', 'a', 'little', 'shanty', 'on', 'the', 'sand,']
['I', 'shall', 'go', 'back', 'again', 'to', 'the', 'bleak', 'shore', 'And', 'build', 'a', 'little', 'shanty', 'on', 'the', 'sand,']
----------------


Now, you might be able to guess where this is going: what do you think will happen if we take our two new lists and change one of them? Uncomment the lines below to check.

In [3]:
ls2[0] = 'You' 
# print(ls2)
# print(ls)

There are a few reasons for this, but basically what you need to know is this: python doesn't copy things like lists (a category that includes dictionaries, which we'll get to next week) when you assign them to a new variable; instead it creates another reference to the same underlying structure. Some operations, like slicing, do hand you back a new list, but rather than memorizing which ones, what you should know is this: **assume that references to lists will be to the same underlying list**. If that's not going to work for you, you can use a method made for exactly this situation to make a new list:

In [4]:
string3 =  'In such a way that the extremest band\nOf brittle seaweed will escape my door'
ls3 = string3.split()
ls4 = ls3.copy()
print(ls3)
print(ls4)
print('----------------')

ls3[13] = 'your'
print(ls3)
print(ls4)
print('----------------')

['In', 'such', 'a', 'way', 'that', 'the', 'extremest', 'band', 'Of', 'brittle', 'seaweed', 'will', 'escape', 'my', 'door']
['In', 'such', 'a', 'way', 'that', 'the', 'extremest', 'band', 'Of', 'brittle', 'seaweed', 'will', 'escape', 'my', 'door']
----------------
['In', 'such', 'a', 'way', 'that', 'the', 'extremest', 'band', 'Of', 'brittle', 'seaweed', 'will', 'escape', 'your', 'door']
['In', 'such', 'a', 'way', 'that', 'the', 'extremest', 'band', 'Of', 'brittle', 'seaweed', 'will', 'escape', 'my', 'door']
----------------
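
If you ever want to check whether two names point at the same list, python's `is` operator will tell you. Here's a minimal sketch of the rule above; the list and variable names (`original`, `alias`, `shallow`, `sliced`) are made up for illustration and aren't part of the lecture code.

```python
# Hypothetical names, for illustration only.
original = ["bleak", "shore"]
alias = original            # a second name for the SAME list
shallow = original.copy()   # a new list holding the same items
sliced = original[:]        # a full slice also builds a new list

alias[0] = "sunny"          # change the list through one of its names

print(original)             # ['sunny', 'shore'] (changed, since alias is the same list)
print(shallow)              # ['bleak', 'shore'] (the copy is unaffected)
print(alias is original)    # True
print(shallow is original)  # False
```

Note that `==` would not help here: two separate lists with the same contents are equal, but they are not the same object.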


In [5]:
string = 'But by a yard or two; and nevermore'
ls = string.split()
print(ls)

indices = [0, 1, 2]
for i in range(len(ls)):
    ls[i] = 'word'
print(ls)

['But', 'by', 'a', 'yard', 'or', 'two;', 'and', 'nevermore']
['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word']


## Ranges

You got to experiment with this a bit on the homework, but you should know one or two other things about ranges in python. Basically, a range is a python way to specify, well, a range of numbers. A range is a special kind of thing, made with the `range()` function, which we can loop over using a for-loop. A range isn't a list, however, which mostly matters if you try to print it, though you can wrap it in the aptly-named `list()` function to turn it into a list if you ever need to. Try adding a line below to do that and to print out the resulting list.

In [6]:
zero_to_nine = range(10)
print(zero_to_nine)

range(0, 10)


Mostly, this matters if you're trying to sort out the particular details of more complicated ranges. Like string slicing, ranges can produce less commonly useful sequences as well as consecutive numbers starting from zero. If you supply two numbers, the first becomes the start, and the second the stop. Add a third, and you get an interval (python calls it the step), too. And by having a negative interval, you can go backwards.

In [7]:
print(list(range(10, 20)))
print(list(range(0, 100, 5))) # if you want an interval, you need a start point
# print(list(range(100, 5))) # what will this do? 

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
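
Going backwards is easiest to see with a small example. This is just a sketch with a made-up variable name (`countdown`); it also shows that even though a range isn't a list, you can still ask for its length, index into it, and test membership.

```python
# Hypothetical example: count down from 10 by twos, stopping before 0.
countdown = range(10, 0, -2)

print(list(countdown))   # [10, 8, 6, 4, 2]
print(len(countdown))    # 5
print(countdown[1])      # 8
print(6 in countdown)    # True
```

As with slices, the stop value itself is never included.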


It's good to know this stuff is possible, but what really matters is `for i in range(n)` where `i` is an index into a list. Usually, that's `range(len(ls))`, like in the homework, but if the length of the list is based on something else, you might use that instead, to be clearer.

In [8]:
row = ["title", "author", "date", "nwords", "filename"]
num_cols = len(row)
"""
<code omitted here>
"""
for i in range(num_cols):
    print(row[i])

title
author
date
nwords
filename
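
One reason to loop over indices rather than over the items themselves is that the same `i` can be used on more than one list. The sketch below assumes a second, made-up list called `values` with placeholder contents; neither it nor `labelled` comes from the lecture code.

```python
row = ["title", "author", "date"]
# Placeholder data, invented for this example.
values = ["Some Title", "Some Author", "1900"]

num_cols = len(row)
labelled = []
for i in range(num_cols):
    # the same index picks out matching items from both lists
    labelled.append(row[i] + ": " + values[i])

print(labelled)
```

This only works if both lists are at least `num_cols` long; otherwise `values[i]` raises an `IndexError`.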


## Finding things

Remember last week how I made a thing about off-by-one errors? That's about to come back. So far, we've covered various things that you can hopefully see might make up the scaffolding for some kind of textual analysis: ways of looping over text, details about lists and strings, etc. We've got more of that to cover, but we're going to detour to python's string searching functions first, so that we can motivate some of our next few concepts with some more practical and literary examples.

First, let's take a text. We'll use a sonnet, since poems are shorter than novels and it's helpful to be able to type it out in full. This one's by Edna St. Vincent Millay, and like everything she wrote, it's beautiful, austere and depressing. You can replace it with something more upbeat and the code should still work fine.

In [9]:
sonnet = """I shall go back again to the bleak shore
And build a little shanty on the sand,
In such a way that the extremest band
Of brittle seaweed will escape my door
But by a yard or two; and nevermore
Shall I return to take you by the hand;
I shall be gone to what I understand,
And happier than I ever was before.
The love that stood a moment in your eyes,
The words that lay a moment on your tongue,
Are one with all that in a moment dies,
A little under-said and over-sung.
But I shall find the sullen rocks and skies
Unchanged from what they were when I was young."""
print(sonnet)

I shall go back again to the bleak shore
And build a little shanty on the sand,
In such a way that the extremest band
Of brittle seaweed will escape my door
But by a yard or two; and nevermore
Shall I return to take you by the hand;
I shall be gone to what I understand,
And happier than I ever was before.
The love that stood a moment in your eyes,
The words that lay a moment on your tongue,
Are one with all that in a moment dies,
A little under-said and over-sung.
But I shall find the sullen rocks and skies
Unchanged from what they were when I was young.


My opinionated take is that people over-use this feature (and that for folks dealing with textual data, it's important to spend some time grappling with newlines), but if you wrap a string in triple quotes, python expects it to span multiple lines and treats its newlines as a part of the string you're creating, not a part of your code (the same goes for quotes in triple-quoted strings).
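
To see that the newlines really are part of the string, here's a small sketch comparing a triple-quoted string with the same text written on one line using `\n`. The variable names (`couplet`, `one_line`) are made up for this example.

```python
# Two ways of writing the same two-line string.
couplet = """A little under-said
and "over-sung"."""
one_line = 'A little under-said\nand "over-sung".'

print("\n" in couplet)           # True: the line break is a character in the string
print(len(couplet.split("\n")))  # 2
print(couplet == one_line)       # True
```

Notice that the double quotes around `"over-sung"` didn't need any special treatment inside the triple quotes.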

You may notice that this sonnet, like some of its ilk, is a couple (3, in this case) of sentences long. Now, pretending this sonnet were longer, we might be interested in finding instances of a particular word. Say, "seaweed". Now, python has a few different ways to go about this. 

First of all, if you want to know if something is in something, with python, you just ask: 

In [10]:
print("seaweed" in sonnet)
print("minnow" in sonnet)

True
False


Eventually we'll get to feeding python regular expressions, but for now, just note that:

In [11]:
"unchanged" in sonnet

False
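
The reason is that `in` is case-sensitive: the poem has "Unchanged", not "unchanged". One common workaround, sketched here on just the last line of the sonnet (stored in a made-up variable called `line`), is to lowercase the text before searching it.

```python
line = "Unchanged from what they were when I was young."

print("unchanged" in line)          # False: capital U in the text
print("Unchanged" in line)          # True
print("unchanged" in line.lower())  # True: compare against a lowercased copy
```

`lower()` returns a new string, so `line` itself is left alone.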

But if we were working with *The Rime of the Ancient Mariner*, or *Corsons Inlet* or another longer poem, we might not just want to know `"minnow" in corsons_text`, but specifically where it shows up. Before uncommenting and running the following line, think about what sort of information you expect Python might return from a function like this.

In [12]:
# sonnet.find('seaweed')

Were you surprised? There's actually a lot we can do with this. You might start by printing out the word we just found (yes, it's redundant—we already know the word is "seaweed", but it's a good exercise). Don't just print out `"seaweed"`, though: slice it out of the whole string.

In [26]:
start = 0 # change these so the slice below prints "seaweed"
end = 0
print(sonnet[start:end])




We can make `find()` a bit more versatile if we know more about its features. Like `range()`, it lets us provide a few extra pieces of information. Specifically, we can provide a start, and if we provide a start, we can also give a stop (if we provide neither, start is index 0, and stop is the end of the string).
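
Here's a rough sketch of what those extra arguments do, using a made-up one-line string (`line`) stitched together from two phrases in the poem rather than the sonnet itself. It also shows what `find()` gives back when there's nothing to find.

```python
line = "a moment in your eyes, a moment on your tongue"

print(line.find("moment"))          # 2: the first match
print(line.find("moment", 10))      # 25: the first match at or after index 10
print(line.find("moment", 10, 20))  # -1: no match between index 10 and index 20
print(line.find("minnow"))          # -1: not in the string at all
```

The index you get back is always relative to the whole string, even when you supply a start.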

The stop isn't that useful, but the start is. Usually, the way that we use `find()` like this is to search relative to some other location we've already found with `find()`. We'll talk about how to automate this next class, but for now, think about this:

In [24]:
per_1 = sonnet.find('.')
per_2 = sonnet.find('.') # what else do you need to put in here to find the second period in the poem?
print(per_1, per_2)

305 305


Now that we've got the second period, we can use the locations of these two items to slice out whole sentences. Give it a try below, and pay attention to the boundaries.

In [25]:
start = 0 # change these so the slice below prints the poem's second sentence. You can ignore the newlines and they should print out fine (ie, work as though the string were one line)
end = 0
print(sonnet[start:end])




Incidentally, what's the first character in the string you just printed out?

Now, we just searched for periods in sequence, but it's worth thinking about our `"seaweed"` case too: suppose we wanted the first sentence that contains `"seaweed"`. We can search for the period *after*, but how can we find the period *before*? (Note that we're stuck looking for periods here; we'll sneak in some regular expressions in a few weeks so that we can deal with `?` and `!` as well.)

You might think that `find()` would have some weird syntax with negative numbers to go backwards, but actually, it doesn't. Instead, we use the closely related `rfind()` method. Remember how `find()` gives you the lowest index in the string that matches the substring we're searching for? `rfind()` gives you the highest, which in practice works like searching from the right-hand end of the string.
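
As a sketch of how the two methods can work together, here's a made-up three-sentence string (`text`, not the sonnet) where we pull out the sentence containing a particular word. It relies on `rfind()` taking the same optional start and stop arguments as `find()`.

```python
# A made-up sample string, so the sonnet exercises above stay unsolved.
text = "One fish. Two fish. Red fish."

word_at = text.find("Red")              # where the word starts
before = text.rfind(".", 0, word_at)    # last period before the word
after = text.find(".", word_at)         # first period after the word

sentence = text[before + 1:after + 1]   # skip the earlier period, keep the later one
print(before, after)                    # 18 28
print(sentence.strip())                 # Red fish.
```

If the word were in the very first sentence, `rfind()` would return -1, and `before + 1` would conveniently land on index 0.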

Okay, one last thing: next class, we'll cover how to do this efficiently using `find()` and a new structure called a `while` loop, but can you think of how we might use a range-based for loop to get all the indices for periods in the poem? We won't need `find()` for this; we can just loop over the characters one by one.

In [29]:
# your for-loop to print the indices of all the '.' in sonnet

for i in range(0):
    pass