# Strings

## We can do things with strings

We've already seen some string operations in Data 8.

In [2]:
first_name = "Franz"
last_name = "Kafka"
full_name = first_name + last_name
print(full_name)

FranzKafka


Remember that computers don't understand context: `+` joins the two strings exactly as given, so we have to add the space ourselves.

In [3]:
full_name = first_name + " " + last_name
print(full_name)

Franz Kafka
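As an aside, the same string can be built without chaining `+`. This is a minimal sketch assuming Python 3.6 or later, where f-strings are available; the variable name `formatted` is only illustrative:

```python
first_name = "Franz"
last_name = "Kafka"

# An f-string substitutes each variable into the template,
# so the space is written exactly where it should appear.
formatted = f"{first_name} {last_name}"

print(formatted)
```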


## Strings are made up of sub-strings

You can think of strings as a [sequence](https://github.com/dlab-berkeley/python-intensive/blob/master/Glossary.md#sequence) of smaller strings or characters. We can access a piece of that sequence using square brackets `[]`.

In [4]:
full_name[1]

'r'

<div class="alert alert-danger">
Don't forget: Python (like many other languages) starts counting from 0.
</div>

In [5]:
full_name[0]

'F'

In [6]:
full_name[4]

'z'
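To see every index at once, one option is to loop over the string with `enumerate`. This is only a sketch; `space_index` is an illustrative name:

```python
full_name = "Franz Kafka"

# enumerate pairs each character with its index, starting from 0.
for index, character in enumerate(full_name):
    print(index, repr(character))

# The space between the names is a character too, with its own index.
space_index = full_name.index(" ")
print(space_index)
```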

## You can slice strings using  `[ : ]`

If you want a range (or "slice") of a sequence, you get everything from the first index up to, but not including, the second index. In other words, Python slicing is *exclusive* of the end index:

In [7]:
full_name[0:4]

'Fran'

In [8]:
full_name[0:5]

'Franz'

You can see some of the logic for this when we consider implicit indices: leaving out the start means "from the beginning", and leaving out the end means "to the end".

In [9]:
full_name[:5]

'Franz'

In [10]:
full_name[5:]

' Kafka'
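One consequence of the exclusive end index is that `s[:i]` and `s[i:]` never overlap, so together they rebuild the original string. A small sketch to check this (the name `piece` is illustrative):

```python
full_name = "Franz Kafka"

# Because the end index is excluded, the two slices below do not overlap,
# and gluing them back together rebuilds the original string.
for i in range(len(full_name) + 1):
    assert full_name[:i] + full_name[i:] == full_name

# The length of a slice is the end index minus the start index.
piece = full_name[2:5]
print(piece, len(piece))
```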

If we want to find out how long a string is, we can use the `len` function:

In [11]:
len(full_name)

11
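Because counting starts at 0, the last valid index is one less than the length. A minimal sketch of that relationship, with `length` and `last_character` as illustrative names:

```python
full_name = "Franz Kafka"

length = len(full_name)

# Indexing starts at 0, so the last valid index is length - 1.
last_character = full_name[length - 1]
print(last_character)

# Asking for index `length` itself goes past the end and raises IndexError.
try:
    full_name[length]
except IndexError as error:
    print("IndexError:", error)
```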

## Strings have methods

* There are other operations defined on string data. These are called **string [methods](https://github.com/dlab-berkeley/python-intensive/blob/master/Glossary.md#method)**.
* Jupyter Notebook lets you use tab-completion after a dot (`.`) to see what methods an [object](https://github.com/dlab-berkeley/python-intensive/blob/master/Glossary.md#object) (i.e., a defined variable) has to offer. Try it now!

In [12]:
str.encode

<method 'encode' of 'str' objects>

Let's look at the `upper` method. What does it do? Let's take a look at the documentation. Jupyter Notebooks let us do this with a question mark ('?') before *or* after an object (again, a defined variable).

In [13]:
str.upper?

So we can use it to convert a string to upper case.

In [14]:
full_name.upper()

'FRANZ KAFKA'

You have to use the parentheses at the end because `upper` is a method of the string class, and the parentheses are what actually call it.
<p></p>
<div class="alert alert-danger">
Don't forget: simply calling the method does not change the original variable. You must *reassign* the variable:
</div>

In [15]:
print(full_name)

Franz Kafka


In [16]:
full_name = full_name.upper()
print(full_name)

FRANZ KAFKA
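The reason reassignment is needed is that string methods return a new string rather than editing the existing one. The sketch below illustrates this with the hypothetical names `name` and `shouted`:

```python
name = "Franz Kafka"

# upper() returns a new string; `name` still refers to the original.
shouted = name.upper()
print(name)
print(shouted)

# Strings are immutable: assigning to a single position raises TypeError.
try:
    name[0] = "f"
except TypeError as error:
    print("TypeError:", error)
```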


For what it's worth, you don't need a variable to use the `upper()` method. You can call it on the string itself.

In [17]:
"Franz Kafka".upper()

'FRANZ KAFKA'

What do you think should happen when you call `upper` on an int? What about a string representation of an int?

In [18]:
1.upper()

SyntaxError: invalid syntax (<ipython-input-18-02adcf0a0b2a>, line 1)

In [19]:
"1".upper()

'1'
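The `SyntaxError` for `1.upper()` most likely comes from Python reading `1.` as the start of a floating-point number, so the method call is never attempted. The sketch below uses a variable instead (the names `number` and `as_text` are illustrative) to show what happens once the syntax is valid:

```python
number = 1

# Using a variable avoids the SyntaxError, but ints have no upper method,
# so Python raises AttributeError instead.
try:
    number.upper()
except AttributeError as error:
    print("AttributeError:", error)

# Converting the int to a string first makes the method available.
as_text = str(number).upper()
print(as_text)
```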

## Challenge 1: Write your name

1. Make two string variables, one with your first name and one with your last name.
2. Concatenate both strings to form your full name and [assign](https://github.com/dlab-berkeley/python-intensive/blob/master/Glossary.md#assign) it to a variable.
3. Assign a new variable that has your full name in all upper case.
4. Slice that string to get your first name again.

In [20]:
first_name = 'Katharine'
last_name = 'Chua'

full_name = first_name + ' ' + last_name
cap_full_name = full_name.upper()
cap_full_name[:9]

'KATHARINE'

## Challenge 2: Try seeing what the following string methods do:

* `split`
* `join`
* `replace`
* `strip`
* `find`

In [21]:
my_string = "It was a Sunday morning at the height of spring."

In [22]:
# split
print(my_string.split(' '))
print(my_string.split('a'))

['It', 'was', 'a', 'Sunday', 'morning', 'at', 'the', 'height', 'of', 'spring.']
['It w', 's ', ' Sund', 'y morning ', 't the height of spring.']


In [23]:
# join (Return a string which is the concatenation of the strings in the
# iterable. The separator between elements is the string join is called on.)

words = my_string.split()

print(''.join(words))
print('~'.join(words))

ItwasaSundaymorningattheheightofspring.
It~was~a~Sunday~morning~at~the~height~of~spring.


In [54]:
# replace
my_string.replace('Sunday', 'Monday').replace('morning', 'afternoon')

'It was a Monday afternoon at the height of spring.'

In [64]:
my_string2 = "It was a Sunday morning at the height of spring.             "

In [65]:
# strip (Return a copy of the string S with leading and trailing whitespace removed.)
my_string2.strip()

'It was a Sunday morning at the height of spring.'
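The argument passed to `strip` matters. The following sketch contrasts three calls on the same padded sentence; the variable names are illustrative:

```python
padded = "It was a Sunday morning at the height of spring.             "

# With no argument, strip removes whitespace from both ends.
no_whitespace = padded.strip()

# With an argument, strip removes only those characters from the ends.
# The string ends in spaces, not a period, so nothing is removed here.
unchanged = padded.strip('.')

# Listing both characters removes the trailing spaces and the period.
no_period = padded.strip(' .')

print(repr(no_whitespace))
print(repr(unchanged))
print(repr(no_period))
```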

In [68]:
# find 
print(my_string.find('a'))
print(my_string.find('e'))

4
29
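Two further behaviours of `find` are worth knowing: it returns `-1` when nothing matches, and it accepts an optional start position. A short sketch, with illustrative variable names:

```python
my_string = "It was a Sunday morning at the height of spring."

# find returns the index of the first match...
first_a = my_string.find('a')

# ...and -1 when the substring does not occur at all.
missing = my_string.find('z')

# A second argument tells find where to start searching.
second_a = my_string.find('a', first_a + 1)

print(first_a, missing, second_a)
```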


## Challenge 3: Working with strings

Below is Edgar Allan Poe's "A Dream Within a Dream" as a string:

In [70]:
poem = '''Take this kiss upon the brow!
And, in parting from you now,
Thus much let me avow —
You are not wrong, who deem
That my days have been a dream;
Yet if hope has flown away
In a night, or in a day,
In a vision, or in none,
Is it therefore the less gone?  
All that we see or seem
Is but a dream within a dream.

I stand amid the roar
Of a surf-tormented shore,
And I hold within my hand
Grains of the golden sand —
How few! yet how they creep
Through my fingers to the deep,
While I weep — while I weep!
O God! Can I not grasp 
Them with a tighter clasp?
O God! can I not save
One from the pitiless wave?
Is all that we see or seem
But a dream within a dream?'''

What is the difference between `poem.strip("?")` and `poem.replace("?", "")` ?

In [74]:
poem.strip("?")

print("poem.strip('?') removes question marks only from the start and end of the string, so just the final '?' is dropped.")

poem.strip('?') removes question marks only from the start and end of the string, so just the final '?' is dropped.


In [75]:
poem.replace("?", "")

print("poem.replace('?', '') removes every question mark by replacing it with an empty string.")

poem.replace('?', '') removes every question mark by replacing it with an empty string.
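The difference is easier to see on a short string. This sketch uses a made-up `sample` rather than the poem:

```python
sample = "?Who? What?"

# strip only removes the given character from the two ends of the string.
stripped = sample.strip("?")

# replace swaps every occurrence for the empty string, wherever it appears.
replaced = sample.replace("?", "")

print(stripped)
print(replaced)
```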


At what index does the word "*and*" first appear? Where does it last appear?

In [85]:
print("The word 'and' first appears at index", poem.lower().find('and'),".")
print("The word 'and' last appears at index", poem.lower().rfind('and'),".")

The word 'and' first appears at index 30 .
The word 'and' last appears at index 407 .


How can you answer the above accounting for upper- and lowercase?

In [88]:
print('We can account for upper and lower case by using "poem.lower()".')

We can account for upper and lower case by using "poem.lower()".
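Note that `find` and `rfind` match `and` anywhere, including inside longer words such as "hand" or "sand". If only the whole word should count, one possible approach is a regular expression with word boundaries. This is a sketch on a two-line excerpt, not the exercise's intended solution; `sample` and the other names are illustrative:

```python
import re

sample = "And I hold within my hand\nGrains of the golden sand"

# rfind matches "and" anywhere, including inside "hand" and "sand".
last_substring = sample.lower().rfind("and")

# \b marks a word boundary, so this pattern only matches the whole word.
# re.IGNORECASE plays the role that lower() plays above.
whole_words = [
    match.start()
    for match in re.finditer(r"\band\b", sample, flags=re.IGNORECASE)
]

print(last_substring)
print(whole_words)
```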


## Challenge 4: Counting Text

Below is Robert Frost's "The Road Not Taken" as a string:

In [89]:
poem = '''Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.'''

Using the `len` function and the string methods, answer the following questions:

How many characters are in the poem? (Note that `len` counts every character, including spaces, punctuation, and line breaks, not just letters.)

In [90]:
print("There are", len(poem), "characters in the poem.")

There are 729 characters in the poem.


How many words?

In [91]:
print("There are", len(poem.split()), "words in the poem.")

There are 144 words in the poem.
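For multi-line text, splitting on a single space and splitting on any whitespace give different counts, because a line break is not a space. A small sketch on a made-up two-line `sample`:

```python
sample = "Two roads diverged\nin a yellow wood"

# Splitting on ' ' leaves "diverged\nin" glued together as one item.
by_space = sample.split(' ')

# With no argument, split breaks on any run of whitespace, including newlines.
by_whitespace = sample.split()

print(len(by_space), by_space)
print(len(by_whitespace), by_whitespace)
```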


How many lines? (HINT: A line break is represented as  `\n`  )

In [92]:
print("There are", len(poem.split('\n')), "lines in the poem.")

There are 23 lines in the poem.
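Splitting on `\n` also counts the blank lines between stanzas. If only lines containing text should count, one option is to filter out the empty items. A sketch on a made-up `sample`, with illustrative names:

```python
sample = "first line\nsecond line\n\nthird line"

# Splitting on '\n' also yields an empty string for each blank line.
all_lines = sample.split('\n')

# Keeping only non-empty items counts the lines that contain text.
text_lines = [line for line in all_lines if line.strip()]

print(len(all_lines), len(text_lines))
```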


How many stanzas?

In [53]:
print("There are", len(poem.split('\n\n')), "stanzas in the poem.")

There are 4 stanzas in the poem.


Remove commas and check the number of unique words again. Why is it different?

In [44]:
without_commas = poem.replace(',', '')
len(without_commas)

719
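Note that `strip(',')` only removes commas at the very start or end of the string, while `replace(',', '')` removes them everywhere. To count unique words, one possible approach is to put the words in a `set`. The sketch below uses a made-up `sample` sentence, and all names are illustrative:

```python
sample = "Two roads diverged in a wood, and I took the one in a wood"

# strip(',') only looks at the two ends of the whole string,
# so the comma in the middle survives.
stripped = sample.strip(',')

# replace removes every comma.
without_commas = sample.replace(',', '')

# A set keeps one copy of each item, so its length is the unique-word count.
unique_before = len(set(sample.split()))
unique_after = len(set(without_commas.split()))

print(stripped == sample)
print(unique_before, unique_after)
```

With the comma in place, "wood," and "wood" count as two different words, which is why the number of unique words drops once commas are removed.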