Understanding Lists and Manipulating Lines
(Adapted with some modifications from Allison Parrish's excellent tutorial: https://github.com/aparrish/rwet/blob/master/understanding-lists-manipulating-lines.ipynb)

Strings and Lists

In [None]:
message = "importantly"

In [None]:
message[1]

In [None]:
message[-2]

In [None]:
type(message)

In [None]:
message[-5:-2]

In [None]:
max(message)

In [None]:
min(message)

In [None]:
list(message)

In [None]:
message_letters = list(message)

In [None]:
message_letters

In [None]:
type(message_letters)

In [None]:
message_letters[-2]

In [None]:
message_letters[-5:-2]

In [None]:
max(message_letters)

In [None]:
min(message_letters)

In [None]:
sorted(message_letters)

In [None]:
sorted(message)

Splitting Strings into Lists

In [None]:
"this is a test".split()

In [None]:
"this is a test, yes it is".split(',')

In [None]:
x = "this is a test".split()

In [None]:
x

In [None]:
type(x)

Turning Lists into Strings

In [None]:
glue = " "

In [None]:
glue.join(x)

In [None]:
type(x)

In [None]:
new_x = glue.join(x)

In [None]:
type(new_x)

In [None]:
" and ".join(x)

In [None]:
exam = " "

In [None]:
exam.join(x)

In [None]:
element_list = ["hydrogen", "helium", "lithium", "beryllium", "boron"]
glue = " and "
glue.join(element_list)

In [None]:
glue.join(x)

When we're working with .split() and .join(), our workflow usually looks something like this:

Split a string to get a list of units (usually words).
Use some of the list operations discussed above to modify or slice the list.
Join that list back together into a string.
Do something with that string (e.g., print it out).
With this in mind, here's a program that splits a string into words, randomizes the order of the words, then prints out the results:

In [None]:
import random
text = "it was a dark and stormy night"
words = text.split()
random.shuffle(words)
text = ' '.join(words)
print(text)

In [None]:
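# text is now a string, not a list, so this raises a TypeError:
# random.shuffle rearranges items in place, and strings can't be changed in place.
random.shuffle(text)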

Lists and randomness

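Python's random library provides several helpful functions for performing chance operations on lists. The first is shuffle, which takes a list and randomly shuffles its contents in place. The cells after it use choice, which returns one randomly selected item from a list, and sample, which returns a new list containing a given number of randomly selected items: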

In [None]:
import random
ingredients = ["flour", "milk", "eggs", "sugar"]
random.shuffle(ingredients)
ingredients

In [None]:
import random
ingredients = ["flour", "milk", "eggs", "sugar"]
random.choice(ingredients)

In [None]:
import random
ingredients = ["flour", "milk", "eggs", "sugar"]
random.sample(ingredients, 3)
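The three cells above behave a little differently from one another, and the difference is easy to miss when you only look at the output. Here is a minimal sketch that puts them side by side; the variable names one, few, and result are just illustrative and don't appear elsewhere in this notebook.

```python
import random

ingredients = ["flour", "milk", "eggs", "sugar"]

# choice returns a single item; the list itself is left alone
one = random.choice(ingredients)

# sample returns a NEW list of 3 distinct items; the original is left alone
few = random.sample(ingredients, 3)

# shuffle rearranges the original list in place and returns None,
# so don't write ingredients = random.shuffle(ingredients)
result = random.shuffle(ingredients)

print(one)
print(few)
print(result)
print(ingredients)
```

Because the results are random, your output will differ from run to run, but the last print always shows the same four ingredients, possibly in a new order.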

List comprehensions: Applying transformations to lists

A very common task in both data analysis and computer programming is applying some operation to every item in a list (e.g., scaling the numbers in a list by a fixed factor), or creating a copy of a list with only those items that match a particular criterion (e.g., eliminating values that fall below a certain threshold). Python has a succinct syntax, called a list comprehension, which allows you to easily write expressions that transform and filter lists.

A list comprehension has a few parts:

a source list, or the list whose values will be transformed or filtered;
a predicate expression, to be evaluated for every item in the list;
(optionally) a membership expression that determines whether or not an item in the source list will be included in the result of evaluating the list comprehension, based on whether the expression evaluates to True or False; and
a temporary variable name by which each value from the source list will be known in the predicate expression and membership expression.
These parts are arranged like so:

[ predicate expression for temporary variable name in source list if membership expression ]

The words for, in, and if are a part of the syntax of the expression. They don't mean anything in particular (and in fact, they do completely different things in other parts of the Python language). You just have to spell them right and put them in the right place in order for the list comprehension to work.
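To make the parts concrete, here is a small sketch using the two tasks mentioned earlier: scaling numbers and dropping values below a threshold. The list of numbers, the factor of 2, and the threshold of 7 are all made up for illustration.

```python
numbers = [3, 12, 7, 20, 1]

# predicate expression: n * 2; temporary variable: n; source list: numbers
scaled = [n * 2 for n in numbers]

# membership expression: n >= 7 (only items where this is True are kept)
kept = [n for n in numbers if n >= 7]

# both at once: filter first, then transform what remains
both = [n * 2 for n in numbers if n >= 7]

print(scaled)  # [6, 24, 14, 40, 2]
print(kept)    # [12, 7, 20]
print(both)    # [24, 14, 40]
```

Note that none of these change the original list; each comprehension builds a new one.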

Text files and lists of lines

The open() function allows you to read text from a file. When an opened file is used as the source in a list comprehension, the predicate expression is evaluated for each line of text in the file. For example:

In [None]:
[line for line in open("sea_rose.txt")]
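One thing to notice in the output of the cell above is that each line still ends with a newline character. The sketch below is one common variation, not the approach this notebook relies on: it opens the file inside a with block (which closes the file automatically) and strips the trailing newline from each line. So that it runs anywhere, it first writes a small stand-in file called example.txt to a temporary folder; that file and its contents are hypothetical, and with the real poem you would pass "sea_rose.txt" to open() instead.

```python
import os
import tempfile

# Hypothetical stand-in file so the sketch runs without sea_rose.txt
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as handle:
    handle.write("first line\nsecond line\nthird line\n")

# The file is closed automatically when the with block ends
with open(path) as handle:
    lines = [line.rstrip("\n") for line in handle]

print(lines)  # ['first line', 'second line', 'third line']
```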

In [None]:
text = [line for line in open("sea_rose.txt")]

In [None]:
text

In [None]:
open("sea_rose.txt").read()

In [None]:
poem = open("sea_rose.txt").read()

In [None]:
poem

In [None]:
print(poem)

In [None]:
type(text)

In [None]:
type(poem)

In [None]:
poem[3:9]

In [None]:
text[3:9]

In [None]:
random.sample(poem, 3)

In [None]:
random.sample(text, 3)

In [None]:
text

In [None]:
random.sample(text, 3)

In [None]:
sorted(poem)

In [None]:
sorted(text)

In [None]:
[line[:5] for line in text]

In [None]:
[line for line in text]

In [None]:
["Overwrite rose poem" for line in text]

String methods in the predicate expression

Recall from the tutorial on strings that strings in Python have a number of methods that return a copy of the string with some transformation applied, like .lower() (converts the string to lower case) or .replace() (replaces matching substrings with some other string). We can use these in the predicate expression to apply that transformation to every item in the list. To convert every string to upper case, for example, call the .upper() method on the temporary variable line. This makes H.D. look really mad:

In [None]:

[line.upper() for line in text]

In [None]:
[line.replace("rose", "tulip") for line in text]

In [None]:
print(poem.replace("rose", "tulip"))

In [None]:
k = [line.lower() for line in text]

In [None]:
k

In [None]:
[line.replace("rose", "tulip") for line in k]

In [None]:
print(poem.lower().replace("rose", "tulip"))

In [None]:

["☛ " + line + " ☚" for line in text]

In [None]:
[line.upper().strip(",;.!:—") + " STOP" for line in text]

In [None]:
[line.replace("leaf", "dog").replace("rose", "thorn").replace("sand", "bottle").replace("flower", "bug") for line in text]

In [None]:
print(text)

In [None]:
print("\n".join(text))

Filtering Lines

In [None]:
[line for line in text if len(line) < 18]

In [None]:
[line.strip() for line in text if len(line) < 18]

In [None]:
[line for line in text if "rose" in line]

In [None]:
poem_words = poem.split()

In [None]:
poem_words

In [None]:
len(poem_words)

In [None]:
len(text)

In [None]:
import random
random.sample(poem_words, 20)

In [None]:
random.sample(text, 5)

In [None]:
[item for item in poem_words if len(item) > 7]

In [None]:
[item for item in poem_words if item.startswith("a")]

Iterating over lists with "for"

In [None]:
for item in poem_words:
    print(item)

In [None]:
for item in poem_words:
    yelling = item.upper()
    print(yelling)

In [None]:
for item in poem_words:
    if len(item) == 2:
        print(item.upper())

EXERCISES FROM LAST WEEK RELATED TO STRINGS:

EXERCISE: Create a variable called poem and assign the text of "Sea Rose" to that variable. Use the len() function to find out how many characters are in it. Then, use the count() method to find out how many times the string rose occurs within it.

EXERCISE: Write an expression, or a series of expressions, that prints out "Sea Rose" from the first occurrence of the string sand up until the end of the poem. (Hint: Use the .find() method, discussed in class, in addition to string slicing.) My code, which I'll share later, is three lines long and uses two variables: "poem", as stipulated in this prompt, and another variable to identify and hold the location of the string "sand". Another hint: your first line should be "poem = open("sea_rose.txt").read()" minus the quotation marks.
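
As a refresher on how .find() and slicing fit together, here is a sketch on an unrelated sentence rather than the poem, so it isn't the solution to the exercise. The variable names sentence and start are illustrative only.

```python
sentence = "the quick brown fox jumps over the lazy dog"

# .find() returns the index where the substring first begins
# (or -1 if the substring isn't there at all)
start = sentence.find("fox")

# slicing from that index to the end keeps everything from "fox" onward
print(start)             # 16
print(sentence[start:])  # fox jumps over the lazy dog
```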

EXERCISE: Write an expression that evaluates to a string containing the first fifty characters of "Sea Rose" followed by the last fifty characters of "Sea Rose."


EXERCISES FROM THIS WEEK RELATED TO LISTS AND LINES:

EXERCISE: Try filtering lines in the sea-rose poem so that only lines containing the word "rose" print out. Hint: you'll need to incorporate an "if" statement in your first line to indicate that only lines that meet a certain condition should print. Review the "filtering lines" section in Parrish's tutorial and the examples with "if" statements in my "Lists and Lines" Jupyter notebook.

ADVANCED EXERCISE (for this one, you may just want to look at my solution; this exercise is a variation on the one above): Filter lines in the sea-rose poem so that lines containing the word "rose" print in upper case and all other lines print in lower case. Hint: in addition to the "if" statement, this time modified using the .upper() method, you'll also need to add an "else" statement to print the remaining lines in lower case using the .lower() method. You'll also want to incorporate a "for" statement, which signals a loop, telling Python that you want it to iterate through each line of the poem in turn to check whether that line contains the word "rose." The conditional "if" statement says, in effect, "if the line contains the word 'rose,' print that line in all upper case." The "else" statement translates roughly as "for all other lines, print in lower case."
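
The shape of a "for" loop with an "if" and an "else" inside it can be sketched on a made-up list of lines. This is only an illustration of the pattern, not my solution; the lines, the search word "cat", and the results list (kept so the outcome can be inspected afterward) are all hypothetical.

```python
lines = ["the cat sat", "On the Mat", "a cat again"]

results = []
for line in lines:
    if "cat" in line:
        # this line contains the word, so shout it
        results.append(line.upper())
    else:
        # every other line gets lower-cased
        results.append(line.lower())

for item in results:
    print(item)
```

Running this prints THE CAT SAT, then on the mat, then A CAT AGAIN, each on its own line.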

EXERCISE: Apply each of the different functions available in the "random" library to the sea rose poem. As a refresher, the three we worked with are random.sample, random.choice, and random.shuffle. Try applying each of them to the poem at the level of lines: write a simple program that randomly shuffles the lines of the sea-rose poem (random.shuffle), then write a new program that randomly selects and prints one line of the poem (random.choice), then write a third program that randomly samples some specific number of lines, 3 or 4 or however many you want (random.sample). Hint: to work with the poem at the level of lines, you'll want to split the poem at the end of each line to create a list of lines. And don't forget to import the "random" library. Here's what your first three lines of code for each of these programs will look like:

import random
poem = open("sea_rose.txt").read()
lines = poem.split("\n")

From there, you'll want to review how to apply each of the random functions by visiting my Jupyter notebook or Allison Parrish's tutorial. One more hint: for both random.shuffle and random.sample, you'll want to use the join operation toward the end to turn your list back into a string. So while the third line of each program turns your string into a list of lines, for two of the functions, you'll want to turn those lists back into strings before you print them. Your last line in each program will use the "print" command.
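
To show why the join step matters, here is a sketch on a short made-up string instead of the poem. The text and the sample size of 2 are placeholders; random.sample is used here, and the same join step would apply after random.shuffle. random.choice returns a single line that is already a string, so it needs no join.

```python
import random

# Hypothetical stand-in for the poem so the sketch runs on its own
poem = "line one\nline two\nline three\nline four"
lines = poem.split("\n")

picked = random.sample(lines, 2)   # a list of 2 lines
joined = "\n".join(picked)         # back to a single string

print(picked)   # prints with brackets and quotes, since it's a list
print(joined)   # prints as two plain lines of text
```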


EXERCISE: Try the above exercise working with another poem instead of sea_rose.txt. For this new poem, you'll need to follow the same procedure you previously learned to get sea_rose.txt into the proper directory. Then open the poem and follow the prompt above. A related challenge: use the string replace method--chaining several replace expressions together--to substitute at least three words in the poem with three different words of your choosing. You can find an example on sea_rose.txt in the notebook.