# Storytelling with Code Fall 2021  
## Third Notebook
### PYTHON BASICS: EXPRESSIONS AND STRINGS
(Based on Allison Parrish's excellent tutorial, with extensive modifications)
https://github.com/aparrish/rwet/blob/master/expressions-and-strings.ipynb


This notebook is an introduction to working with text in Python. We'll start out with numbers as a way to position text (or "strings" as they're more commonly called in programming) within the larger context of other data types.  

You'll need to download this notebook, as well as the images associated with it, and then upload them to your own Jupyter Notebook environment. If you've forgotten how to do that, review the first notebook of the semester.

## ARITHMETIC EXPRESSIONS  
You can use your notebook as a calculator to evaluate different arithmetic expressions:

In [None]:
5 + 3

In [None]:
4 + 5 * 6

In [None]:
(4 + 5) * 6

In [None]:
10+20+30

In [None]:
19

For a list of the different arithmetic operators available to you, see [this](https://www.w3schools.com/python/gloss_python_arithmetic_operators.asp) handy resource guide.
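
As a quick illustration, here is a small sketch of four other arithmetic operators built into Python. The names on the left of each "=" sign are labels made up for this example (variables are covered properly later in this notebook):

```python
# A few arithmetic operators beyond + and * (names are illustrative only)
quotient = 7 / 2     # true division always gives a float: 3.5
whole = 7 // 2       # floor division drops the remainder: 3
remainder = 7 % 2    # modulo gives the remainder: 1
power = 2 ** 5       # exponentiation, 2 to the 5th power: 32

print(quotient, whole, remainder, power)
```

Try changing the numbers and re-running the cell to see how each operator behaves.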

## EXPRESSIONS OF INEQUALITY  
There are also special operators in Python that allow you to compare two values to see whether they're equal, or whether one is greater or less than the other. Several of these symbols will likely be familiar to you from grade school:

In [None]:
3 * 5 == 9 + 6

In [None]:
20 == 7 * 3

In [None]:
17 < 18

In [None]:
17 > 18

In [None]:
22 >= 22

In [None]:
22 <= 22

In [None]:
22 =< 22

Some comparison operators can also be applied to strings (i.e., text). Note: the "!=" operator can be translated as "not equal to."

In [None]:
'python' == 'python'

In [None]:
'python'!='cat'

In [None]:
'python'!='python'

For a list of comparison operators in Python, see [this](https://www.w3schools.com/python/gloss_python_comparison_operators.asp) guide.
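
One detail worth knowing, sketched below with made-up examples: string comparisons are case-sensitive, and the "<" and ">" operators compare strings character by character using each character's numeric code. For all-lowercase words this works out to alphabetical order, but capital letters sort before lowercase ones.

```python
# Illustrative string comparisons (the words are arbitrary examples)
same_case = 'python' == 'python'     # True: identical strings
mixed_case = 'Python' == 'python'    # False: "P" and "p" are different characters
alphabetical = 'apple' < 'banana'    # True: "a" comes before "b"
capital_first = 'Zebra' < 'apple'    # True: capital letters have lower codes than lowercase

print(same_case, mixed_case, alphabetical, capital_first)
```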

# Variables  
A variable is a label assigned to a value or piece of data. That data may be numerical or textual, and it may be large or small (e.g., a collection of poems or a single letter or number). To assign a particular value to a variable, we use the "=" operator. The advantage of variables is that they can serve as surrogates for the data they reference. If you assign the text of a novel, for example, to the variable "x," you can type "x" whenever you want to manipulate, process, or display the contents of the novel. Here are some examples of variables in action. Be sure to run each code cell. Executing a cell that merely assigns a value to a variable won't produce any visible output, but run it nonetheless and then move on to the next cell.

In [None]:
ourclass = "Storytelling with Code"

In [None]:
ourclass

In [None]:
y = "2021"

In [None]:
y

In [None]:
k = (4 + 5) * 6

In [None]:
k

In [None]:
house_of_dust = 'A HOUSE OF STEEL\nBY AN ABANDONED LAKE\nUSING CANDLES\nINHABITED BY LOVERS'

In [None]:
house_of_dust

In [None]:
print(house_of_dust)

In [None]:
material = ['SAND', 'DUST', 'LEAVES', 'PAPER', 'TIN', 'ROOTS', 'BRICK', 'STONE', 'DISCARDED CLOTHING', 'GLASS', 'STEEL', 'PLASTIC', 'MUD', 'BROKEN DISHES', 'WOOD', 'STRAW', 'WEEDS']

In [None]:
material

If you haven't assigned any information to a variable name, you'll get an error message:

In [None]:
Ramona

In [None]:
Ramona = "A Beverly Cleary character"

In [None]:
Ramona

If you assign a value to a variable and subsequently reuse that variable name and assign it a new value, Python will honor the most recent assignment:

In [None]:
print(Ramona)

In [None]:
Ramona = 3 * 3

In [None]:
Ramona

Variable names may be short (a single letter can function as a variable) or longer and more descriptive. For an overview of the rules and constraints placed on variable names, see [this guide](https://www.w3schools.com/python/python_variables_names.asp). 
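
To make a few of those rules concrete, here is a minimal sketch. The variable names and the strings assigned to them are invented for this example:

```python
# All of these are valid variable names (chosen just for this example)
poem_title = "Sea Rose"      # words joined with underscores
poemTitle = "Sea Garden"     # a different variable: names are case-sensitive
line2 = "marred and with stint of petals,"   # digits are fine, just not as the first character

print(poem_title)
print(poemTitle)
print(line2)

# These would cause errors if you removed the leading "#":
# 2line = "oops"        (starts with a digit)
# poem title = "oops"   (contains a space)
```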

# DATA TYPES  
Variables can store data of different types, such as text (called "strings" in programming parlance); numbers (e.g., integers or whole numbers and floats or numbers with decimal points); and lists (which can themselves store data of different types, such as strings or integers). For this class, we'll be working primarily with strings and lists that contain strings. Other data types that can play an important role in digital humanities and creative coding include dictionaries and tuples. To learn more about these other data types, consult the relevant modules at the [Python for Everybody](https://www.py4e.com/lessons) website, an online introductory Python textbook created by Dr. Charles Severance.

Data types are distinguishable from one another at the level of syntax or appearance: strings, for example, are enclosed in quotation marks, while lists are enclosed in square brackets. These surface distinctions signal deeper behavioral and semantic differences: you can do things with strings that you can't do with numbers, for instance, and vice versa. Below I've applied the "type" function to ask the Python interpreter to indicate what class of data the expression evaluates to. As you run each cell, take note of the various syntactic practices that distinguish one data type from another. 

In [None]:
type(100 + 1)

In [None]:
type(3.14)

In [None]:
"Suppose there is a pigeon, suppose there is."

In [None]:
type("Suppose there is a pigeon, suppose there is.")

In [None]:
type(["suppose", "there", "is", "a", "pigeon"])

In [None]:
type(["s", "u", "p", "p", "o", "s", "e"])

In [None]:
type([1, 3, 5, 7])

In [None]:
type([1, "a", 3])
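
To see how the same symbol can behave differently depending on data type, here is a small sketch. The variable names are made up for illustration; "int" is Python's built-in function for converting a string of digits into an integer.

```python
# The same "+" symbol does different things to numbers and to strings
number_sum = 3 + 4          # adds two integers: 7
string_sum = "3" + "4"      # joins two strings: '34'
converted = int("3") + 4    # int() turns the string "3" into the number 3, so: 7

print(number_sum, string_sum, converted)
print(type(number_sum), type(string_sum))
```

The quotation marks make all the difference: "3" is text that happens to look like a number, while 3 is a number you can do arithmetic with.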

# Strings  
A string is a sequence of characters. You can think of "string" as roughly synonymous with "text." Strings can be manipulated in a variety of ways to analyze them or achieve new combinatorial outputs. You can slice, combine, replace, print, change, scramble, search, and randomize them. In Python, strings are enclosed in either single or double quotation marks, like so:

In [None]:
"This is a string"

In [None]:
'This is a string too'

In [None]:
'Suppose there is a pigeon, suppose there is.'

As we've already seen, you can assign a string to a variable:

In [None]:
roastbeef = "Suppose there is a pigeon, suppose there is."

In [None]:
roastbeef

In [None]:
cat_message = "我爱猫！😻, 4"

In [None]:
type(cat_message)

What happens if you write a string that includes internal dialog with quotation marks? How does Python do with those, given that it regards quotation marks as a signal meaning "Aha! This is a string!"?

In [None]:
"And then he said, "I hated the steak," and vanished."

There are a couple of workarounds to this problem. The easiest one is to use triple quotation marks around the entire string, which tells the interpreter to treat everything enclosed, including any internal quotation marks, as part of the string:

In [None]:
"""And then he said, "I hated the steak," and vanished."""

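For comparison, here is a sketch of two other common workarounds alongside the triple-quote approach: wrapping the string in single quotation marks, and "escaping" each internal quotation mark with a backslash. The variable names are invented for this example.

```python
# Three ways to write a string that contains double quotation marks
triple = """And then he said, "I hated the steak," and vanished."""
single = 'And then he said, "I hated the steak," and vanished.'
escaped = "And then he said, \"I hated the steak,\" and vanished."

print(triple)
print(triple == single == escaped)   # True: all three produce the same string
```
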
## Asking Questions About Strings  
Now that you know something about strings, let's look at some of the ways you can examine and manipulate them. To kick things off, let's look at the "len" function, which tells you how many characters are in a string:

In [None]:
len("Suppose there is a pigeon, suppose there is.")

Be sure to observe all the formatting requirements for the function, such as inserting the string between quotation marks and parentheses. One other thing to note about the results, above, is that blank spaces are counted as characters.

You can also obtain the length value for two or more strings and add them together using the "+" operator:

In [None]:
len("Camembert") + len("Cheddar")

The "in" operator allows you to find out if a substring is contained inside a larger string:

In [None]:
"foo" in "buffoon"

In [None]:
"foo" in "reginald"

The "startswith" method tells you if a string starts with some particular combination of characters:

In [None]:
"foodie".startswith("foo")

The "endswith" method in parallel fashion lets you determine if a string ends with a particular sequence of characters:

In [None]:
"foodie".endswith("foo")

The "islower" method evaluates a string to determine if it's in all lower case:

In [None]:
"foodie".islower()

And of course there's an upper case equivalent:

In [None]:
"foodie".isupper()

In [None]:
"YELLING ON THE INTERNET".islower()

In [None]:
"YELLING ON THE INTERNET".isupper()

The "find" method searches for the first occurrence of a substring and returns its location within the larger string. If the substring is not found, a value of "-1" is returned:

In [None]:
"Now is the winter of our discontent".find("win")

In [None]:
"Now is the winter of our discontent".find("lose")

You can use the "count" method to count the number of times a specified substring occurs:

In [None]:
"I got rhythm, I got music, I got my man, who could ask for anything more".count("I got")

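One thing to keep in mind, sketched below: "in", "find", and "count" are all case-sensitive, so "win" and "Win" are treated as different substrings. The variable names here are just for illustration.

```python
line = "Now is the winter of our discontent"

position = line.find("win")    # 11: "winter" begins at index position 11
missing = line.find("Win")     # -1: there is no capital-W "Win" in the string
has_now = "now" in line        # False: the string contains "Now", not "now"
count_o = line.count("o")      # 4: Now, of, our, discontent

print(position, missing, has_now, count_o)
```
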
Some comparison operators, as we've already seen, can be used on strings as well as numbers:

In [None]:
"pants" == "pants"

In [None]:
"pants" == "trousers"

In [None]:
x = "pants"
y = "trousers"
x == y

## Simple String Transformations
Python provides a number of methods for simple string transformations. The following methods illustrate how to convert a string to all lowercase or uppercase; capitalize the first letter of each word; and remove extra spaces from the beginning and end of a string:

In [None]:
"ARGUMENTATION! DISAGREEMENT! STRIFE!".lower()

In [None]:
"She is. not. happy about this.".upper()

In [None]:
"the yellow wallpaper".title()

In [None]:
" got some random whitespace in some places here     ".strip()

A particularly valuable transformation for computational poetry is the "replace" method. Use the parentheses following "replace" to first specify the substring you want to remove, followed by the substring you'd like to replace it with. Be sure to adhere to all the necessary syntactical requirements when using this (and every other) method (e.g., original string and substrings in quotation marks, comma separating old substring from new in parentheses).

In [None]:
"I got rhythm, I got music, I got my man, who could ask for anything more".replace("I got", "I used to have")

Remember that variables are your friend:

In [None]:
x = "I got rhythm, I got music, I got my man, who could ask for anything more"
x.replace("I got", "I used to have")

In [None]:
y = "I got"
t = "I used to have"
x.replace(y, t)
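
It's worth noting that "replace" doesn't alter the original string; it hands back a new one. If you want to keep the result, assign it to a variable. A minimal sketch, using variable names invented for this example:

```python
lyric = "I got rhythm, I got music"
changed = lyric.replace("I got", "I used to have")

print(lyric)     # the original string is unchanged
print(changed)   # the new string holds the replacements
```

The same is true of the other transformation methods above, such as "lower", "upper", "title", and "strip".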

## Reading in the Contents of a File as a String  
Until now, we've been typing short strings into the interpreter. But as you progress in your programming, you'll want to be able to work with text files containing poems, short stories, novellas, novels, plays, etc. This next section shows you how to read a file into Jupyter Notebook in order to search, transform, display, and otherwise manipulate it. Key to working with any file in Python is making sure it's in the right file format. For literary works, we'll want to change the original file format (e.g., .html or .docx) into what's called "UTF-8" format with a .txt file extension. UTF-8 encodes a text in a way that makes it easy for Python to process it and helps ensure that you don't get lots of weird textual artifacts you don't want. You can learn more about text encoding and UTF-8 [here](https://blog.hubspot.com/website/what-is-utf-8).  

This is the point at which the text editor becomes a vital tool for the budding digital poet. To start, point your browser to [the Poetry Foundation](https://www.poetryfoundation.org/poems/48188/sea-rose), where you'll find a copy of H.D.'s poem "Sea Rose." ([H.D.](https://www.poetryfoundation.org/poets/h-d) (Hilda Doolittle) was a 20th-century avant-garde bisexual poet.) In your browser, select and copy the text of the poem (you should exclude the title and author from your selection) and paste it into your text editor. Next, figure out how to change the file format in your particular text editor. In BBEdit, for example, which is the text editor I use most often, I can select "Save As" from the "File" menu, choosing "UTF-8" from a long list of options. You may need to google and otherwise research how to do this in your text editor. Equally important is what you name your file: you should save it as "sea_rose.txt" minus the quotation marks (your text editor may automatically add the ".txt" file extension). Once you've done that, you'll need to upload it to Jupyter Notebook. If you don't recall how to do that, you'll want to review our first notebook. You're now ready to open and read the file:

In [None]:
open("sea_rose.txt").read()

You can improve the formatting by using the print function:

In [None]:
print(open("sea_rose.txt").read())

Now try assigning the whole poem to a variable:

In [None]:
text = open("sea_rose.txt").read()
print(text)
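
If you ever see strange characters when reading a file, one thing to try is telling Python explicitly that the file is UTF-8. The sketch below is one possible way to do that, not the approach used elsewhere in this notebook. So that the cell runs on its own, it first creates a small stand-in file; the file name "example_poem.txt" and its contents are invented for this example. The "with" keyword is a standard Python idiom that closes the file automatically when the indented block finishes.

```python
import os
import tempfile

# Create a small stand-in file so this cell runs on its own
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example_poem.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

# Read the file back, stating the encoding explicitly
with open(path, encoding="utf-8") as f:
    text_from_file = f.read()

print(text_from_file)
```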

Let's try applying some of the transformation methods we learned earlier:

In [None]:
text.lower().count("rose")

In [None]:
print(text.replace("rose", "tulip"))

Hmm . . . it looks like the first instance of "rose" (first word in the first line) was not replaced with "tulip". Can you see why? Here's how to fix it:

In [None]:
print(text.lower().replace("rose", "tulip"))

Now let's capitalize the first word:

In [None]:
x = text.lower()
y = x.replace("rose", "tulip")
print(y.capitalize())

Let's try two replacements: "rose" with "tulip" and "leaf" with "thorn"

In [None]:
print(text.lower().replace("rose", "tulip").replace("leaf", "thorn"))

# String Indexing and Slicing 
Individual characters or substrings can be accessed using string indexing methods in Python. Each character is mapped to a specific index position in the sequence. Somewhat counterintuitively, the first positional slot is numbered "0" rather than "1," like so:

![Slide1.jpeg](attachment:Slide1.jpeg)

To retrieve an individual character, you specify the relevant positional slot for that character in square brackets following the string:

In [None]:
"bungalow"[2]

You can achieve the same result by assigning the string to a variable:

In [None]:
message = "bungalow"
message[2]

In [None]:
message[3]

What happens if you specify a positional slot that is out of range for that sequence?

In [None]:
message[17]

Let's check the data type for an expression that evaluates to an individual character in a string:

In [None]:
type(message[3])

We can find out the number of characters in a string using the "len" function:

In [None]:
len(message)

In [None]:
len(message[3])

Indexes can be expressions too:

In [None]:
message[2 * 3] #bungalow

## Negative Indexes  
In addition to working with a string from beginning to end (or left to right), you can also retrieve characters in reverse using negative indexes.

![Slide2.jpeg](attachment:Slide2.jpeg)

In [None]:
message[-2]

In [None]:
message[-987]

There is a lack of symmetry or parallelism in terms of indexing left to right versus right to left. While the first positional slot in regular indexing moving from left to right is numbered "0", the first positional slot in negative indexing moving right to left is not "0", but "-1".
![Slide3.jpeg](attachment:Slide3.jpeg)

Note that per the "Terrapins" example above, the following two expressions would evaluate to the same thing:  
"terrapins"[0] and "terrapins"[-9]

In [None]:
"terrapins"[0]

In [None]:
"terrapins"[-9]

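The relationship between the two numbering schemes can be sketched as follows: subtracting the length of the string from a regular index gives you the matching negative index. The example below reuses the word "bungalow" from earlier; the other variable names are made up for illustration.

```python
message = "bungalow"   # 8 characters, so regular indexes run from 0 to 7

positive = message[2]                     # 'n'
negative = message[2 - len(message)]      # same as message[-6], also 'n'

last = message[-1]                        # 'w'
same_last = message[len(message) - 1]     # same as message[7], also 'w'

print(positive, negative, last, same_last)
```
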
Exercise: creating as many new code cells as you need, retrieve each letter in "terrapins" using both regular and negative indexing. Start with "t", retrieve it using both methods, then move sequentially through each additional letter in the string. 

## String Slicing
You can extract part of a string by specifying the start and stop index positions, separated by a colon.

In [None]:
"terrapins"[0:5]

Somewhat confusingly, the character occupying the stop index position is not included; the string extraction stops one character short of the last specified index position. The interactive interpreter counts up to but not including the character in that slot. Let's take another look at our "terrapins" index key to illustrate:  
![Slide1.jpeg](attachment:Slide1.jpeg)
The character occupying the slot labeled "5" is "p", yet "terrapins"[0:5] does not return the "p" as part of the string slice (i.e., "terrap"). Instead the system stops one character short to give us "terra". Let's look at another example.

In [None]:
"House of Dust"[0:5]

Keep in mind that when slicing strings, the values in the brackets can include mathematical expressions:

In [None]:
t = "terrapins"
t[1+4:3*3]

The values in brackets can also include variables:

In [None]:
x = 5
t[x:9]

In [None]:
s = 3*3

In [None]:
t[x:s]

You can also use negative indexing to slice strings.

In [None]:
"terrapins"[-4:-1]

Note that transposing those two index positions won't work; if you're working with negative indexing, the index position furthest from the right must be specified first.

In [None]:
"terrapins"[-1:-4]

If index slicing always goes up to but not including the stop index position, how do we retrieve the final character of a string? One way is to specify the index position immediately following the one occupied by the last character. Since the "s" of "terrapins" occupies slot "8", we can specify a stop index of "9":

In [None]:
"terrapins"[5:9]

In [None]:
"House of Dust"[9:13]

Another workaround is to use the "len" function we tried out earlier to find out how many characters were in a string: 

In [None]:
"house of dust"[-4:len('house of dust')]

Python also lets you omit an index position if you want to either start your slice with the first index slot or end at the last index slot:

In [None]:
"house of dust"[:5]

In [None]:
"house of dust"[9:]

In [None]:
"house of dust"[:]

In [None]:
"house of dust"[-4:]

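Here is a short sketch that gathers a few slicing behaviors in one place. Note in particular that, unlike indexing a single character, slicing doesn't produce an error when a position is out of range; Python simply returns whatever characters fall inside the string. The variable names are invented for this example.

```python
title = "house of dust"    # 13 characters, indexes 0 through 12

middle = title[2:7]        # 'use o': the slice holds 7 - 2 = 5 characters
too_far = title[9:100]     # 'dust': an out-of-range stop position is tolerated
backwards = title[-1:-4]   # '': start is to the right of stop, so nothing is returned

print(middle)
print(len(middle))
print(too_far)
print(len(backwards))
```
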
## Putting Strings Together  
You can join or concatenate two or more strings by using the "+" operator. For mathematical expressions, this operator computes the sum of two numbers; for string expressions, it yokes two or more strings together:

In [None]:
17 + 92

In [None]:
"count" + "down"

In [None]:
"combinatorial " + "poetry"

In [None]:
part1 = "Nickel, what is nickel, "
part2 = "it is originally rid of a cover."
part1 + part2
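
Two things to watch for when concatenating, sketched below with made-up variable names: the "+" operator doesn't add spaces for you, and it can't join a string directly to a number. To include a number, convert it to a string first with the built-in "str" function.

```python
word = "terrapins"

# str() turns the number returned by len() into text so it can be joined
sentence = "The word " + word + " has " + str(len(word)) + " characters."
print(sentence)

# Leaving out the spaces inside the quotation marks runs the words together
squished = "count" + "down" + "clock"
print(squished)
```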

## Strings with Multiple Lines  
Note that the first cell below results in an error message as the interpreter tries to deal with line breaks. The following cells show two possible solutions. The second solution uses the triple quotation marks we saw earlier:

In [None]:
poem = "Rose, harsh rose, 
marred and with stint of petals, 
meagre flower, thin, 
spare of leaf,"

In [None]:
poem = "Rose, harsh rose,\nmarred and with stint of petals,\nmeagre flower, thin,\nspare of leaf,"
print(poem)

In [None]:
poem = """Rose, harsh rose, 
marred and with stint of petals, 
meagre flower, thin, 
spare of leaf,"""
print(poem)
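
As a small check, the sketch below suggests that the two approaches produce the same string: a line break typed inside triple quotation marks is stored as the "\n" character, which counts as a single character. The variable names are invented for this example, and the lines have no trailing spaces.

```python
escaped = "Rose, harsh rose,\nmarred and with stint of petals,"
triple = """Rose, harsh rose,
marred and with stint of petals,"""

same = escaped == triple       # True: both contain the same characters
newline_length = len("\n")     # 1: "\n" is one character, not two
line_breaks = triple.count("\n")   # 1: there is one line break in this string

print(same, newline_length, line_breaks)
```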

EXERCISE 1: Identify a poem you want to work with. Make sure you save it in "UTF-8" format and upload it to your Jupyter notebook so that it's available to you in the right directory on your computer. Create a variable and assign the text of your poem to that variable. Use the len() function to find out how many characters are in your poem. Then, use the count() method to find out how many times one or more specific strings occur within it. 

EXERCISE 2: Transform your poem by 1.) using the "swapcase" string method we encountered during class; and 2.) replacing at least three distinct words with three new words using the replace method. You might first try achieving each of these transformations separately (one version of the poem that swaps the case, another version that replaces words), but as a more advanced step, try creating output that combines both of them in a single transformation. Hint: variables are your friend!

EXERCISE 3: Try concatenating the Sea Rose poem with your chosen poem to achieve a new poem that combines them together. Hint: assign a different variable to each poem using the open file method we saw earlier and then concatenate the two variables. Do you remember which operator you can use for concatenation?

EXERCISE 4: Write an expression, or a series of expressions, that prints out "Sea Rose" from the first occurrence of the string "sand" up until the end of the poem. (Hint: Use the .find() method, discussed in class, in addition to string slicing methods. My code, which I'll share later, is three lines long and uses two variables: "poem" and another variable to identify and hold the location of the string "sand". Another hint: your first line should be "poem = open("sea_rose.txt").read()" minus the outer quotation marks. Third hint: remember that you can use variables in lieu of explicit numbers to slice strings! In other words, you can have a variable that holds a number, and that variable can subsequently be used in your string slicing brackets.)

EXERCISE 5: Write an expression that evaluates to a string containing the first fifty characters of "Sea Rose" followed by the last fifty characters of "Sea Rose." (Hint: you'll be using string slicing methods and concatenation in this exercise. Your first line of code can again be "poem = open("sea_rose.txt").read()" minus the outer quotation marks. Then create two new variables, one to hold the first 50 characters, the second to hold the last 50, then concatenate those two variables and print the result.)