# 5. Strings

Here is the table of contents for this notebook:

- 5.1 A string is a sequence
- 5.2 Getting the length of a string using `len`
- 5.3 Traversal through a string with a loop
- 5.4 String slices
- 5.5 Strings are immutable
- 5.6 Looping and counting
- 5.7 The `in` operator
- 5.8 String comparison
- 5.9 String methods
- 5.10 Parsing strings
- 5.11 f-strings
- 5.12 Exercises

## 5.1 A string is a sequence

A string is a _sequence_ of characters. You can access the characters one at a time with the _index operator_:

In [None]:
fruit = 'banana'
letter = fruit[1]

The second statement extracts the character at index position 1 from the `fruit` variable and assigns it to the `letter` variable.

The expression in brackets is called an _index_. The index indicates which character in the sequence you want (hence the name).

But you might not get what you expect:

In [None]:
print(letter)

For most people, the first letter of “banana” is “b”, not “a”. But in Python, the index is an offset from the beginning of the string, and the offset of the first letter is zero. This is called zero-based indexing.

In [None]:
letter = fruit[0]
print(letter)

So “b” is the 0th letter (“zero-th”) of “banana”, “a” is the 1th letter (“one-th”), and “n” is the 2th (“two-th”) letter.

|b|a|n|a|n|a|
|-|-|-|-|-|-|
|0|1|2|3|4|5|

You can use any expression, including variables and operators, as an index, but the value of the index has to be an integer. Otherwise you get:

In [None]:
fruit[1.5]
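As a small sketch of the point above, any expression that evaluates to an integer can be used as an index. The variable name `i` is only an illustration and is not used elsewhere in this notebook.

```python
fruit = 'banana'
i = 2

# A variable used as an index
print(fruit[i])        # n

# Arithmetic expressions used as indices
print(fruit[i + 1])    # a
print(fruit[2 * i])    # n
```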

**Exercise 5.1**

Given the string below, print the letter 'w' using the index operator.

In [None]:
some_string = 'hey, how are you?'
# YOUR CODE HERE

## 5.2 Getting the length of a string using `len`

`len` is a built-in function that returns the number of characters in a string:

In [None]:
fruit = 'banana'
len(fruit)

To get the last letter of a string, you might be tempted to try something like this:

In [None]:
length = len(fruit)
last = fruit[length]

The reason for the IndexError is that there is no letter in “banana” with the index 6. Since we started counting at zero, the six letters are numbered 0 to 5. Remember:

|b|a|n|a|n|a|
|-|-|-|-|-|-|
|0|1|2|3|4|5|

To get the last character, you have to subtract 1 from `length`:

In [None]:
length = len(fruit)
last = fruit[length-1]
print(last)

Alternatively, you can use negative indices, which count backward from the end of the string. The expression `fruit[-1]` yields the last letter, `fruit[-2]` yields the second to last, and so on.

|b|a|n|a|n|a|
|-|-|-|-|-|-|
|0|1|2|3|4|5|
|-6|-5|-4|-3|-2|-1|

In [None]:
fruit[-1]

In [None]:
fruit[-6]

In [None]:
fruit[0] == fruit[-6]
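The relationship between the two rows of indices in the table can be checked with a short sketch: for a valid offset `k` (a name chosen here for illustration), `fruit[-k]` refers to the same character as `fruit[len(fruit) - k]`.

```python
fruit = 'banana'
length = len(fruit)
k = 2

# Counting k characters back from the end ...
print(fruit[-k])                        # n
# ... is the same as indexing at length - k from the start
print(fruit[length - k])                # n
print(fruit[-k] == fruit[length - k])   # True
```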

**Exercise 5.2**

Given the string below, print letter 'h' using the index operator with a negative index.

In [None]:
animal = 'elephant'
# YOUR CODE HERE

## 5.3 Traversal through a string with a loop

A lot of computations involve processing a string one character at a time. Often they start at the beginning, select each character in turn, do something to it, and continue until the end. This pattern of processing is called a _traversal_. One way to write a traversal is with a `while` loop:

In [None]:
index = 0
while index < len(fruit):
    letter = fruit[index]
    print(letter)
    index = index + 1

This loop traverses the string and displays each letter on a line by itself. The loop condition is `index < len(fruit)`, so when `index` is equal to the length of the string, the condition is false, and the body of the loop is not executed. The last character accessed is the one with the index `len(fruit)-1`, which is the last character in the string.

You can do traversal with a `for` loop as well:

In [None]:
for letter in fruit:
    print(letter)

Each time through the loop, the next character in the string is assigned to the variable `letter`. The loop continues until no characters are left.

Note that in the `while` loop, we iterate over the indices (0, 1, ...) and use these indices to get the letters. In the `for` loop we iterate directly over the letters, which makes it simpler.

What if we need the indices in the `for` loop? We can reuse the pattern from the `while` loop, initializing a variable and incrementing it by 1 each time through the loop:

In [None]:
index = 0
for letter in fruit:
    print(index, letter)
    index = index + 1

This pattern is so common that Python provides a built-in way to create a counter, which eliminates the need for initializing and incrementing the variable:

In [None]:
for index, letter in enumerate(fruit):
    print(index, letter)
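To see what `enumerate` produces, one option is to collect its output in a list. This is only a sketch for inspection; the name `pairs` is made up for this example, and in a loop you would normally use `enumerate` directly as shown above.

```python
fruit = 'banana'

# Each element is an (index, letter) pair
pairs = list(enumerate(fruit))

print(pairs[0])                   # (0, 'b')
print(pairs[-1])                  # (5, 'a')
print(len(pairs) == len(fruit))   # True
```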

**Exercise 5.3**

Write a `while` loop that starts at the last character in the string and works its way backward to the first character, printing each letter on a separate line.



In [None]:
# YOUR CODE HERE

## 5.4 String slices

A segment of a string is called a _slice_. Selecting a slice is similar to selecting a character:

In [None]:
s = 'Monty Python'
print(s[0:5])

In [None]:
print(s[6:12])

The operator `[n:m]` returns the part of the string from the character with index `n` to the character with index `m-1`.
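Because the slice stops just before index `m`, it contains `m - n` characters. The sketch below checks this for the second slice above; the names `n`, `m` and `piece` are chosen for illustration.

```python
s = 'Monty Python'
n = 6
m = 12

piece = s[n:m]
print(piece)                 # Python

# The character at index m is not included, so the length is m - n
print(len(piece))            # 6
print(len(piece) == m - n)   # True
```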

If you omit the first index (before the colon), the slice starts at the beginning of the string. If you omit the second index, the slice goes to the end of the string:

In [None]:
fruit = 'banana'
print(fruit[:3])

In [None]:
print(fruit[3:])

If the first index is greater than or equal to the second, the result is an empty string, represented by two quotation marks:

In [None]:
fruit[3:3]

An empty string contains no characters and has length 0, but other than that, it is the same as any other string.

**Exercise 5.4**

Given the `fruit` variable that contains the string `'banana'`, what does `fruit[:]` mean? Check your answer by running the code.

In [None]:
# YOUR CODE HERE

## 5.5 Strings are immutable

It is tempting to use the `[]` operator on the left side of an assignment, with the intention of changing a character in a string. For example:

In [None]:
greeting = 'Hello, world!'
greeting[0] = 'J'

The error message says that the `'str' object does not support item assignment`. The “object” in this case is the string and the “item” is the character you tried to assign. For now, an _object_ is the same thing as a value, but we will refine that definition later. An _item_ is one of the values in a sequence.

The reason for the error is that strings are _immutable_, which means you can’t change an existing string. The best you can do is create a new string that is a variation on the original:

In [None]:
greeting = 'Hello, world!'
new_greeting = 'J' + greeting[1:]
print(new_greeting)

This example _concatenates_ a new first letter onto a slice of `greeting` using the `+` operator. It has no effect on the original string. You can concatenate any number of strings this way:

In [None]:
your_name = 'Frank'
print('Hello, ' + your_name + '!')

If you would like to repeat the same string many times you can even use the `*` operator:

In [None]:
a_string = 'Hey'
a_string * 5
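As a quick check of the immutability claim, the sketch below builds new strings with `+` and `*` and then prints the original, which is unchanged. The name `repeated` is only used for this example.

```python
greeting = 'Hello, world!'

# Both operators create new strings
new_greeting = 'J' + greeting[1:]
repeated = greeting * 2

print(greeting)       # Hello, world!
print(new_greeting)   # Jello, world!
print(repeated)       # Hello, world!Hello, world!
```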

**Exercise 5.5**

Given two strings of the same length, use a `for` loop to intertwine them starting with the first string. If the strings are:

```python
string1 = 'abc'
string2 = 'xyz'
```

the expected result is `axbycz`.

In [None]:
string1 = 'abc'
string2 = 'xyz'
intertwined_string = ''
# YOUR CODE HERE
print(intertwined_string)

## 5.6 Looping and counting

The following program counts the number of times the letter “a” appears in a string:

In [None]:
word = 'banana'
count = 0
for letter in word:
    if letter == 'a':
        count = count + 1
print(count)

This program demonstrates another pattern of computation called a _counter_. The variable `count` is initialized to 0 and then incremented each time an “a” is found. When the loop exits, `count` contains the result: the total number of a’s.

**Exercise 5.6**

For a given string, count the number of 'yo' substrings.

In [None]:
some_string = 'yo yo yo yo ma'
substring = 'yo'
yo_count = 0
# YOUR CODE HERE
print(yo_count)

## 5.7 The `in` operator

The word `in` is a boolean operator that takes two strings and returns `True` if the first appears as a substring in the second:

In [None]:
'a' in 'banana'

In [None]:
'banana' in 'a'

In [None]:
'na' in 'banana'
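Since `in` returns a boolean, it fits naturally into the condition of an `if` statement. The sketch below also shows that the test is case-sensitive; the variable names are illustrative.

```python
fruit = 'banana'

found = 'nan' in fruit
if found:
    print('Found it')        # Found it

# Matching is case-sensitive: there is no uppercase B in 'banana'
has_capital_b = 'B' in fruit
print(has_capital_b)         # False
```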

## 5.8 String comparison

The comparison operators work on strings. To see if two strings are equal:

In [None]:
word = 'banana'
if word == 'banana':
    print('All right, bananas.')

Other comparison operations are useful for putting words in alphabetical order:

In [None]:
word = 'avocado'
if word < 'banana':
    print('Your word, ' + word + ', comes before banana.')
elif word > 'banana':
    print('Your word, ' + word + ', comes after banana.')
else:
    print('All right, bananas.')

In [None]:
word = 'cherry'
if word < 'banana':
    print('Your word, ' + word + ', comes before banana.')
elif word > 'banana':
    print('Your word, ' + word + ', comes after banana.')
else:
    print('All right, bananas.')

Python does not handle uppercase and lowercase letters the same way that people do. All the uppercase letters come before all the lowercase letters, so:

In [None]:
word = 'Zucchini'
if word < 'banana':
    print('Your word, ' + word + ', comes before banana.')
elif word > 'banana':
    print('Your word, ' + word + ', comes after banana.')
else:
    print('All right, bananas.')
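The same effect can be seen with the comparison operators alone. In this sketch, the only difference between the last two comparisons is the case of the first letter; `uppercase_first` is a name made up for the example.

```python
# Every uppercase letter sorts before every lowercase letter
uppercase_first = 'Z' < 'a'
print(uppercase_first)           # True

print('Zucchini' < 'banana')     # True
print('zucchini' < 'banana')     # False
```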

A common way to address this problem is to convert strings to a standard format, such as all lowercase, before performing the comparison.

## 5.9 String methods

Strings are an example of Python _objects_. An object contains both data (the actual string itself) and _methods_, which are effectively functions that are built into the object and are available to any _instance_ of the object.

In [None]:
word = 'banana'
word

In [None]:
word.capitalize()

In [None]:
word.upper()

Calling a _method_ is similar to calling a function (it takes arguments and returns a value) but the syntax is different. We call a method by appending the method name to the variable name using the period as a delimiter.

For example, the method `upper` takes a string and returns a new string with all uppercase letters. Instead of the function syntax `upper(word)`, it uses the method syntax `word.upper()`.

This form of dot notation specifies the name of the method, `upper`, and the name of the string to apply the method to, `word`. The empty parentheses indicate that this method takes no argument.

A method call is called an invocation; in this case, we would say that we are invoking `upper` on the `word`.

In [None]:
# You can use help to get some simple documentation on a method
help(str.upper)

All string methods are described in the Python documentation:

https://docs.python.org/3/library/stdtypes.html#string-methods

**Exercise 5.7**

For a given string, count the number of 'yo' substrings. Include the substrings with capital 'Y' and 'O'. Use the `lower` method. We haven't covered it but you can guess how it works since you have seen the `upper` method.

In [None]:
some_string = 'yo YO Yo yO ma'
substring = 'yo'
yo_count = 0
# YOUR CODE HERE
print(yo_count)

There is a string method named `find` that searches for the position of one string within another:

In [None]:
index = word.find('a')
print(index)

In this example, we invoke `find` on `word` and pass the letter we are looking for as an argument.

The `find` method can find substrings as well as characters:

In [None]:
word.find('na')

Note that it found only the first occurrence of `'na'`. It can take as a second argument the index where it should start:

In [None]:
word.find('na', 3)

One common task is to remove white space (spaces, tabs, or newlines) from the beginning and end of a string using the `strip` method:

In [None]:
line = '  Here we go  '
line.strip()

Some methods such as `startswith` return boolean values.

In [None]:
line = 'Have a nice day'
line.startswith('h')

In [None]:
line.lower()

In [None]:
line.lower().startswith('h')

In the last example, the method `lower` is called and then we use `startswith` to see if the resulting lowercase string starts with the letter “h”. As long as we are careful with the order, we can make multiple method calls in a single expression.
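To illustrate why the order of chained calls matters, here is a sketch that combines `strip`, `lower` and `startswith` on a line with surrounding spaces. The string and the variable names are made up for this example.

```python
line = '  Have a nice day  '

# strip() runs first, then lower(), then startswith()
cleaned_starts_with_h = line.strip().lower().startswith('h')
print(cleaned_starts_with_h)    # True

# Without strip(), the string still starts with a space
raw_starts_with_h = line.lower().startswith('h')
print(raw_starts_with_h)        # False
```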

You can also remove punctuation with string methods. One way to do that is the `replace` method.

In [None]:
sentence = 'Irene, Bram, Dean and Zhanna are running away from the dog.'
sentence = sentence.replace(',', '') # replace comma with empty string
sentence = sentence.replace('.', '') # replace dot with empty string
sentence
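When there are several punctuation marks to remove, one possible approach is to loop over them and call `replace` once per mark. This is a sketch, not the only way to do it; the names `cleaned` and `mark` are illustrative.

```python
sentence = 'Irene, Bram, Dean and Zhanna are running away from the dog.'

cleaned = sentence
for mark in ',.':
    # replace returns a new string, so we reassign the result each time
    cleaned = cleaned.replace(mark, '')

print(cleaned)   # Irene Bram Dean and Zhanna are running away from the dog
```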

### `find` vs `index`

The methods `find` and `index` are seemingly similar:

In [None]:
'hola'.find('l')

In [None]:
'hola'.index('l')

both of them find the first occurrence:

In [None]:
'abab'.find('b')

In [None]:
'abab'.index('b')

but there are some important differences to be aware of. Look what happens if the character is not in the string:

In [None]:
'hola'.find('b')

In [None]:
'hola'.index('b')

Additionally, `index` is also available on other sequence types such as lists, whereas lists have no `find` method:

In [None]:
some_list = [100, 200, 300]
some_list.index(200)

In [None]:
some_list = [100, 200, 300]
some_list.find(200)

In the next section you will see `find` used with a second argument that sets the position where the search starts; `index` accepts the same argument.
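In practice, the difference shows up in how you handle a missing substring. A minimal sketch, assuming you want to print a message in both cases: with `find` you compare the result to `-1`, while with `index` you catch the `ValueError`.

```python
text = 'hola'

# find signals "not found" with the value -1
position = text.find('b')
if position == -1:
    print('find: not found')

# index signals "not found" by raising ValueError
try:
    position = text.index('b')
except ValueError:
    print('index: not found')
```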

**Exercise 5.8**

You have seen the following string methods:

`capitalize()`, `upper()`, `lower()`, `find()`, `index()`, `strip()`, `startswith()`, `replace()`

It is very important to keep these in mind and use them when working with strings; they will save you a lot of time and code. As you can guess, there are more string methods, so it is worth searching for a method that you don't know but expect to exist. For example, we don't know yet whether there is a method to check if a string ends with a specified substring. But we know that `startswith()` exists, so a counterpart very likely exists too.

Search if there is a method called `endswith()` and use it on a string and a substring of your choice.


In [None]:
# YOUR CODE HERE

## 5.10 Parsing strings

Often, we want to look into a string and find a substring. For example, suppose we were presented with a series of lines formatted as follows:

`From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008`

and we wanted to pull out only the second half of the address, which is called the domain (i.e., `uct.ac.za`), from each line. We can do this by using the `find` method and string slicing.

First, we will find the position of the at-sign in the string. Then we will find the position of the first space after the at-sign. Finally, we will use string slicing to extract the portion of the string we are looking for.


In [None]:
line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
at_position = line.find('@')
at_position

In [None]:
space_after_at = line.find(' ', at_position)
space_after_at

In [None]:
line[at_position:space_after_at]

We don't want the `@` sign itself, so we start the slice one position later:

In [None]:
domain = line[at_position+1:space_after_at]
domain

You might be thinking "why don't we just hard code the indices as `line[22:31]`?"

In [None]:
line[22:31]

This works for this particular e-mail address, but hard-coded indices break as soon as the address has a different length, whereas the first approach works for an e-mail address of any length.
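To make that point concrete, the steps can be wrapped in a small function and applied to lines with addresses of different lengths. This is a sketch: the function name `extract_domain` and the second line are made up for illustration, and the code assumes every line contains an `@` followed later by a space.

```python
def extract_domain(line):
    """Return the text between '@' and the next space (assumes both exist)."""
    at_position = line.find('@')
    space_after_at = line.find(' ', at_position)
    return line[at_position + 1:space_after_at]


lines = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008',
    # Made-up line with a shorter address, for illustration only
    'From someone@example.org Sat Jan  5 09:20:00 2008',
]

for line in lines:
    print(extract_domain(line))
```

The same function handles both lines, whereas `line[22:31]` would only be correct for the first one.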

## 5.11 f-strings

Often we would like to print text combining strings and variables. We can use the `+` operator as you have seen previously:

In [None]:
mean_error = 0.284352343
print('The mean error is ' + str(mean_error) + ' electron volts')

f-strings provide a better way to use variables within a string:

In [None]:
print(f'The mean error is {mean_error} electron volts')

Moreover, they let us easily format the values of the variables:

In [None]:
print(f'The mean error is {mean_error:.2f} electron volts')

In [None]:
number = 1/3
print(f'The result is {number:.2%}')

The general form is `f'{(value):(width).(precision)(number_format)}'`.
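The examples above set only the precision and the number format. The sketch below adds a width as well; the square brackets are there only to make the padding visible, and the names `padded` and `percent` are illustrative.

```python
mean_error = 0.284352343

# Width 10, precision 3, fixed-point format: numbers are right-aligned
padded = f'[{mean_error:10.3f}]'
print(padded)     # [     0.284]

# Precision 1, percent format: the value is multiplied by 100
percent = f'[{mean_error:.1%}]'
print(percent)    # [28.4%]
```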

For more information on f-strings, see:

https://rowannicholls.github.io/python/intro/f_strings.html

## 5.12 Exercises

**Exercise 5.9**

Take the code from section 5.6

```python
word = 'banana'
count = 0
for letter in word:
    if letter == 'a':
        count = count + 1
print(count)
```

Encapsulate it in a function called `letter_counter`, and generalize it so that it accepts any string and letter as arguments and is case-insensitive.

Example input/output pairs:

- letter_counter('banana', 'a') -> 3
- letter_counter('eyjafjallajökull', 'l') -> 4
- letter_counter('eyjafjallajökull', 'e') -> 1
- letter_counter('Eyjafjallajökull', 'e') -> 1

Note that the function should return the result, not print it.

In [None]:
# YOUR CODE HERE

**Exercise 5.10**

There is a string method called `count` that can replace the `letter_counter` function you wrote in the previous exercise. Read the documentation of this method at:

https://docs.python.org/library/stdtypes.html#string-methods

Use the string method `count` to count the number of times the letter "a" occurs in “banana”.

In [None]:
# YOUR CODE HERE