# Strings

We've already encountered Python strings in a number of contexts.  Today, we'll explore some more advanced stuff you can do with strings.

## Slicing strings
You already know how to get a single character by its index.  But you can also specify a *range* of characters:

In [None]:
s = "The quick brown fox jumped over the lazy dog."

print(s[4:9])

What happens if:
* the range contains negative indices?
* you leave off one of the numbers of the range (e.g., `s[7:]`)?
* the start of the range (left number) is greater than the end of the range (right number)?
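
One way to explore these questions is to print several slices side by side. The sketch below is only a starting point; the particular index values and the `slices` dictionary are arbitrary choices for illustration, not part of the original exercise.

```python
s = "The quick brown fox jumped over the lazy dog."

# Each label is the slice expression, each value is what it evaluates to.
# The index values here are arbitrary examples; try your own as well.
slices = {
    "s[4:9]": s[4:9],      # ordinary range
    "s[-4:-1]": s[-4:-1],  # negative indices
    "s[7:]": s[7:],        # end of the range left off
    "s[:7]": s[:7],        # start of the range left off
    "s[8:5]": s[8:5],      # start greater than end
}

for label, value in slices.items():
    # repr() makes empty strings and leading/trailing spaces visible
    print(label, "->", repr(value))
```

Run it and compare each result with what you predicted.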

*Challenge*:
* We saw that NumPy slices are just "views" of the original array.  Is this true of strings, or not?
* What is the result of the code below?  Is this behavior the same or different from lists?  Why do you think this is?

In [None]:
str1 = "One, two, three!"
str2 = str(str1)

print(str1 is str2)
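
To compare with lists, a minimal sketch of the same experiment might look like the following. The names `list1` and `list2` are made up for this illustration.

```python
list1 = ["One", "two", "three!"]
list2 = list(list1)

# "is" asks whether both names refer to the same object;
# "==" asks whether the contents are equal.
print(list1 is list2)
print(list1 == list2)
```

Think about which of the two types can be modified in place, and why that might matter here.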

Write code to print out all of the email addresses with a `.edu` domain:

In [None]:
emails = ["nixie.knox@alphabetbooks.org", "rosyrobinross@mit.edu", "dude193897@gmail.com", "yolanda.yorgenson@tufts.edu", "spammingyouraddressbook.edu@yahoo.com", "willy.waterloo@harvard.edu"]

print(emails)

*Challenge*: Write a function to verify whether an email address is valid.  Assume that a valid email address must:
* contain no spaces
* contain exactly one `@` symbol
* the domain name must contain a `.`
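
If you get stuck, here is one possible sketch. The function name `is_valid_email` is a hypothetical choice, and the function checks only the three rules listed above, so it is far from a complete email validator.

```python
def is_valid_email(address):
    """Check an address against the three simplified rules above."""
    # Rule 1: no spaces anywhere in the address
    if " " in address:
        return False
    # Rule 2: exactly one @ symbol
    if address.count("@") != 1:
        return False
    # Rule 3: the part after the @ (the domain name) must contain a dot
    domain = address.split("@")[1]
    return "." in domain


print(is_valid_email("rosyrobinross@mit.edu"))
print(is_valid_email("not an email"))
```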

## String functions

What do the following Python string functions do?  Try them out!

You can type `help(str.split)` for information about the `split` function, or you can type `help(str)` for a list of all string functions.

* `split`
* `find`
* `strip`
* `replace`
* `startswith` and `endswith`
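
As a starting point for your experiments, the sketch below calls each of these functions once on a made-up string. The variable names and the sample text are assumptions for illustration only.

```python
text = "  spam, eggs, spam  "

trimmed = text.strip()                    # called with no arguments
parts = trimmed.split(", ")               # called with a separator string
position = trimmed.find("eggs")           # look for a substring
swapped = trimmed.replace("spam", "ham")  # old substring, new substring

print(repr(trimmed))
print(parts)
print(position)
print(swapped)
print(trimmed.startswith("spam"), trimmed.endswith("eggs"))
```

Try changing the arguments, or searching for a substring that isn't there.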

*Challenge*: Look at some of the more exotic string functions: `join`, `expandtabs`, `casefold`.


In [None]:
# Experiment time!

## Practice!

**1)** In the code below, what does the `letterstat` function do?  Read the code carefully and experiment with it if you're not sure.

In [None]:
def letterstat(string):
    count = 0
    for i in range(1, len(string)):
        if string[i] == string[i-1]:
            count = count + 1
    return count

print(letterstat("Hello, world!"))

*Challenge*:
* Make the `letterstat` function throw an exception.  Does it break if the string is shorter or longer than a certain length?  What else could go wrong?
* `letterstat` works on a data type other than strings.  What other type works, and what does `letterstat` compute in this case?

**2)** Write a function that returns the number of words in a string.  Assume a "word" is any sequence of non-whitespace characters.

In [None]:
# Your code here...

**3)** Write a function that takes a string containing a pair of parentheses, and returns just the content inside the parentheses.  For example,

    get_parenthesis("This code is (mostly) Python.")

should return the string `"mostly"`.


In [None]:
# Your code here

*Challenge*: Make your function handle the following corner cases:
* String doesn't contain parentheses (return None)
* String only contains an opening or closing parenthesis (return the part between the parenthesis and the beginning/end of the string)
* String contains multiple sets of parentheses (return a list of strings, one item for each part)
* String contains *nested* parentheses

## Formatting strings with `str.format()`
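
This section has no worked example, so here is a minimal sketch of `str.format()`. It reuses the `years` value from the historical note at the end of this notebook. The variable names `message`, `named`, and `rounded` are made up for illustration.

```python
years = 87

# Each {} is replaced, in order, by the arguments to format()
message = "{} score and {} years ago...".format(years // 20, years % 20)
print(message)

# Fields can also be named, which helps when there are many values
named = "{scores} score and {rest} years ago...".format(scores=years // 20, rest=years % 20)
print(named)

# A format spec after the colon controls how a value is displayed
rounded = "{:.2f}".format(3.14159)
print(rounded)
```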

## Bonus: Formatting strings with f-strings
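
This section has no worked example either, so the following is a minimal sketch of an f-string. It assumes Python 3.6 or later and uses made-up variable names.

```python
years = 87

# An f-string evaluates the expression inside each {} in place
message = f"{years // 20} score and {years % 20} years ago..."
print(message)

# The same format specs used by str.format() work after a colon
pi_text = f"{3.14159:.2f}"
print(pi_text)
```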

## Historical note

Python has a couple of other ways to format strings.  These are generally clunkier than `str.format()` and f-strings, but you may see them in older code, or in code written by programmers who are more familiar with other languages.

The first is to simply convert values to strings and concatenate them together with the `+` operator:

In [None]:
years = 87
print(str(years // 20) + " score and " + str(years % 20) + " years ago...")

This style is reminiscent of Java.  It works, but it's hard to read and pretty much impossible to type right the first time.  You inevitably end up missing some `+` signs or some spaces in the string between values or some other character.

The second way is to use the "string formatting" operator, `%`:

In [None]:
years = 87
print("%d score and %d years ago..." % (years // 20, years % 20))

This style is Python's "old way" of formatting strings, and is heavily inspired by C.  Each `%d` is replaced, in order, by one of the items in the parentheses (technically a *tuple*, but that's a topic for another day).  You can also use other character options besides `%d`, such as `%f`, `%g`, `%x`, and `%c`.  Try some of these -- what do they do?  (`%g` is most interesting with floating-point values, `%x` is interesting for integers larger than 9, and `%c` is most interesting for numbers between 60 and 120.)
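
A small sketch for trying these options is shown below. The sample values and variable names are arbitrary choices for illustration.

```python
number = 255
big = 1234567.0

hex_text = "%d written with %%x is %x" % (number, number)
float_text = "%f versus %g" % (big, big)
char_text = "%c%c%c" % (72, 105, 33)

print(hex_text)
print(float_text)
print(char_text)
```

Note that a literal percent sign inside the format string has to be written as `%%`.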