# Strings

## $ \S 1 $ Strings

### $ 1.1 $ Strings and characters in Python

A __string__ is a sequence of characters enclosed in either single `'` or double `"` quotes. The type corresponding to strings is called `str`.

__Example:__

In [None]:
a = "It is time for all good men"
b = 'to come to the aid of their country.'
print(a, type(a))   # Printing the string a and its type.
print(b, type(b))   # Doing the same for the string b.

It is time for all good men <class 'str'>
to come to the aid of their country. <class 'str'>


Note that the pair of quotes delimiting a string does not appear
on the screen when it is printed.

📝 If a string contains newline characters, it can be delimited by `"""` (triple double quotes).
This can be used to break up a longer string over several lines.

In [None]:
poem =  """
Whose woods these are I think I know.   
His house is in the village though;   
He will not see me stopping here   
To watch his woods fill up with snow.
"""
print(poem, type(poem))


Whose woods these are I think I know.   
His house is in the village though;   
He will not see me stopping here   
To watch his woods fill up with snow.
 <class 'str'>


📝 A long string can also be split across several lines by writing it as consecutive string literals inside parentheses; Python joins adjacent literals into a single string:

In [None]:
long_string = ("The one who has conquered himself "
               "is a far greater hero than he who has "
               "defeated a thousand times a thousand men.")
print(long_string)

The one who has conquered himself is a far greater hero than he who has defeated a thousand times a thousand men.


Note the extra spaces at the end of the first two lines; without them, subsequent
lines would be concatenated without any separation.
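
To see why the trailing spaces matter, here is a minimal sketch (the variable names are only illustrative) comparing two parenthesized strings that differ solely in the space before the closing quote of the first piece:

```python
# Adjacent string literals are joined exactly as written; no space is inserted.
with_space = ("conquered himself "
              "is a far greater hero")
without_space = ("conquered himself"
                 "is a far greater hero")

print(with_space)      # conquered himself is a far greater hero
print(without_space)   # conquered himselfis a far greater hero
```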

Unlike several other languages, Python does not have a special type for
characters: a character is represented simply as a string of length 1.

In [None]:
letter = 'a'
print(letter, type(letter))    # Note that letter is of type str.

a <class 'str'>


To get the $ i $-th character of a string called, say, $ s $, use `s[i]`; the result is also a string (albeit one having only $ 1 $ character).

<div class="alert alert-warning">In Python, indices are <i>always</i> counted starting from <b> $ 0 $  (zero)</b>, not $ 1 $. To avoid confusion, we adapt our terminology accordingly. For example, we refer to 'm' as the <i>0-th</i> character of the string 'magic', 'a' as its first character, and so on... In particular, if a string has length $ n $ (i.e., if it consists of $ n $ characters), then its last index is $ n - 1 $. </div>

__Example:__

In [None]:
g = "Gandalf"
s = "Sauron"

print(g[0], type(g[0]))
print(g[3], type(g[3]))

# Since s contains 6 letters, the last one is indexed by 5:
print(s[5])

G <class 'str'>
d <class 'str'>
n


By prefixing an index with a minus sign $ - $, we count backwards from the end
of the string instead. For example, `s[-1]` is the _last_ character of $ s $,
`s[-2]` its _next-to-last_ character, and so on.

__Example:__ Here is a representation of the letters in the string "rocket" and
their respective indices:

$$
\begin{array}{r|r|r|r|r|r|r|}
\text{string} &  \text{r} & \text{o} & \text{c} & \text{k} & \text{e} & \text{t} \\ \hline
\text{indices} & 0 & 1 & 2 & 3 & 4 & 5 \\ 
 & -6 & -5 & -4 & -3 & -2 & -1 \\
\end{array}
$$
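
The two rows of indices in the table can be checked directly. The following short sketch (using a variable named `word`, chosen here only for illustration) prints a few characters of "rocket" using both kinds of index:

```python
word = "rocket"

# Each pair below refers to the same position in the string.
print(word[-1], word[5])     # t t
print(word[-6], word[0])     # r r
print(word[-3] == word[3])   # True (both are 'k')
```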

__Exercise:__ In the preceding example, what happens if you try to access the $ 6 $-th element
of the string? What about the element at position $ -7 $?

📝 If we want to create a string that has a single quote as one of its
characters, we can enclose the string in double quotes, and vice-versa.

In [None]:
explosion = "'BOOM!'"
last = explosion[-1]
print(last)

next_to_last = explosion[-2]
print(next_to_last)

'
!


Finally, any Unicode character can be used in a string:

In [2]:
fruits = "🍉🫐🍓🥝🍎🍍"
print(fruits)

greeting = "こんにちは"
print(greeting)

🍉🫐🍓🥝🍎🍍
こんにちは


### $ 1.2 $ Special characters

In a string, the backslash `\` plays the role of a special character called the
**escape character**. It can be used, for instance, to represent whitespace and
control characters (tab `\t`, backspace `\b`, newline `\n`) or to turn another
special character into an ordinary character, such as a single quote `\'`, a
double quote `\"` or the backslash itself `\\`. These cases are tabulated below.
Note, however, that there are other escape combinations beyond these, which we
will not consider.

| Code   |  Result  |
| :----- | :------- |
| `\'`   | single quote (')  |
| `\"`   | double quotes (") |
| `\\`   | backslash (\\)    |
| `\t`   | tab               |
| `\b`   | backspace         |
| `\n`   | new line          |

⚠️ Depending on your environment, the backspace character `\b` may only shift
the cursor backwards by one position, but not delete anything. If this happens,
then in order to delete the previous character and move the cursor left by one
position one should use `\b \b` (two backspaces with a space in between). This
will move the cursor left, overwrite the previous character with a space and
move the cursor back once more.
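
However an escape sequence is rendered on the screen, it is stored in the string as a single character. The sketch below illustrates this with two built-in functions: `len`, which returns the number of characters, and `repr`, which shows a string the way it would be typed in source code. The strings themselves are made up for this illustration.

```python
s = "a\tb\\c\n"
print(len(s))     # 6: 'a', tab, 'b', backslash, 'c', newline
print(repr(s))    # 'a\tb\\c\n'

# The backspace is an ordinary member of the string, whatever the terminal shows:
t = "ab\b"
print(len(t))     # 3
```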

__Exercise:__ Which string is displayed on the screen after each of the
following strings is printed (using `print`)?

(a) `'it\'s\ta'`

(b) `"powerfuls\b!"`

(c) `"trolley\b\b  "`  

(d) `"this\nis\ta\ntest"`

(e) `'\'3.14\''`

__Raw strings__ in Python are created by adding the `r` prefix before a string
literal. They treat backslashes `\` as literal characters rather than escape
characters.

__Example:__

In [2]:
# Normal string: \n creates a newline:
print("First line\nSecond line")  

# Raw string: \n is printed as is:
print(r"First line\nSecond line")

First line
Second line
First line\nSecond line


Raw strings are particularly useful when working with:

* File paths in Windows, to avoid having to type double backslashes.
* Regular expressions (regexes), in order to avoid excessive escaping.
* Any text where you want backslashes to be treated literally.
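
As a rough illustration of the first two use cases, the sketch below writes the same (hypothetical) Windows path with and without the `r` prefix, and then uses a raw string as a pattern for the standard `re` module. The path and the sample sentence are invented for this example.

```python
import re

# Without the r prefix, every backslash has to be doubled:
normal_path = "C:\\new_folder\\table.txt"
raw_path = r"C:\new_folder\table.txt"
print(normal_path == raw_path)   # True
print(raw_path)                  # C:\new_folder\table.txt

# In a regex, \d+ matches one or more digits:
match = re.search(r"\d+", "Room 237 is closed")
print(match.group())             # 237
```

Note that without the raw prefix and without doubling, the `\n` and `\t` in this path would have been interpreted as a newline and a tab.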

## $ \S 2 $ Operations on strings

Strings can be __concatenated__ using the binary operator __+__.

__Example:__

In [None]:
string_1 = "ancient"
string_2 = "magic"
string_3 = "spells"

print(string_1 + string_2)
print(string_1 + " " + string_2 + " " + string_3)

ancientmagic
ancient magic spells


__Exercise:__ Suppose that the statements `a = "hello"` and `b = 'world'` have
just been run through the interpreter. Determine the output of each of the
following statements:

(a) `a + a`

(b) `b + " " + a`

(c) `a * 3`

(d) `2 * b`

(e) `(-1) * a`

(f) `1 + "1"`

(g) `a * b`

(h) `a - a`

(i) `0 == '0'`

(j) `True == "True"`

In [None]:
a = "hello"
b = 'world'

📝 Extending the interpretation of `+` as concatenation of strings, if one
uses `*` to "multiply" a string by a positive integer $ n $, then the result is
a new string which consists of $ n $ copies of the original string, concatenated
one after another. More concisely, for strings, `*` denotes __repetition__. The
arithmetic operators `-`, `**`, `/` and `//` cannot be applied to strings. The
operator `%` is defined for strings, but there it performs formatting rather than
computing a remainder; we will not use it here.

In [None]:
word = "foo"
print(word * 3)
print(3 * word)

foofoofoo
foofoofoo


The function `len` applied to a string returns its **length**, i.e., the number of characters it contains, which is always a non-negative integer.

__Example:__

In [1]:
s = "Look to Windward"
print(s)
print(len(s), type(len(s)))

Look to Windward
16 <class 'int'>


In [None]:
# The empty string is the only string that has length 0:
print(len(""))

# Note that whitespace characters also count as valid characters:
print(len("   "))    # <--- This string consists of three spaces.

0
3
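
Repetition and `len` combine naturally. As a small sketch (the name `underline` is just an illustrative choice), here is one way to underline a title with exactly as many dashes as it has characters:

```python
title = "Look to Windward"
underline = "-" * len(title)   # 16 dashes, one per character of the title

print(title)
print(underline)
```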


🚫 A string is an __immutable__ object, meaning that its individual characters
_cannot be modified_. Trying to do so will make the interpreter raise a
`TypeError`.

In [None]:
boat = "Boat"
# Let's try to modify the 0-th character. This raises
# TypeError: 'str' object does not support item assignment
boat[0] = 'C'

The __colon operator `:`__, as in `[i:j]`, is used to __slice__ a string from
its $ i $-th character (inclusive) to its $ j $-th character (exclusive).
This operation does not modify the string (after all, strings are immutable);
rather, it creates a _new_ string which consists of the characters having index
ranging from $ i $ up to and including $ j - 1 $.

__Example:__

In [None]:
string = 'magic'

# Slice from the 0th character to the 2nd (not including the 2nd):
init = string[0:2]
print(init, type(init))

# Slice from the 2nd character to the 5th (not including the 5th):
final = string[2:5]   
print(final, type(final))

print(init + final)

ma <class 'str'>
gic <class 'str'>
magic


Omitting the first index in a slice has the same effect as slicing from the
beginning.  Similarly, if we omit the second index, then the string will be
sliced until the end.

In [None]:
word = 'automaton'
print(word[:4])
print(word[4:])

auto
maton
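
Since strings are immutable, slicing together with concatenation offers one way to obtain a modified version of a string: build a _new_ string from the parts you wish to keep. A minimal sketch, with variable names chosen only for illustration:

```python
boat = "Boat"

# Replace the 0-th character by building a new string:
coat = "C" + boat[1:]

print(coat)   # Coat
print(boat)   # Boat (the original string is unchanged)
```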


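A _full slice_ `[:]` produces a string consisting of all the characters of the
original. Because strings are immutable, such a copy is rarely needed: a plain
assignment such as `string_2 = string_1` is just as safe, since reassigning
`string_1` afterwards never changes the value that `string_2` refers to.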

__Example:__

In [None]:
string_1 = 'potion'

# Omit both indices in the slice to create a copy of the original string:
string_2 = string_1[:]    

string_1 = 'magic'
print(string_1, string_2)

magic potion


__Exercise:__ Let $ s $ be a variable whose value is a string. True or False? Explain.

(a) `s[:] == s[0:]`

(b) `s[:] == s[0:len(s)]`

(c) `s[:] == s[0:-1]`


The slice construct also admits a third argument, which specifies the
__step size__ in the slicing operation. The syntax of a slice which makes use of
all three arguments is thus:
`[<start index (inclusive)> : <stop index (exclusive)> : <step size>]`. If omitted,
the step size is set to $ 1 $ by default. Step sizes can also be negative, which
causes the string to be sliced in the right-to-left direction.

__Example:__

In [1]:
s = "magic"
print(s[::1])   # Full slice, explicit step size of 1.
print(s[::2])   # Slice consisting of even-indexed characters.
print(s[1::2])  # Slice consisting of odd-indexed characters.
print(s[::-1])  # Slice which amounts to the reverse of the string.

magic
mgc
ai
cigam


📝 As above, to create a copy of string $ s $ with its characters reversed, we use `s[::-1]`.
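
One common use of the reversal idiom is testing whether a word is a palindrome, i.e., whether it reads the same in both directions. The following is only a sketch with two sample words:

```python
word = "racecar"
print(word == word[::-1])     # True: 'racecar' reads the same backwards

other = "magic"
print(other == other[::-1])   # False: the reverse is 'cigam'
```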

__Exercise:__ Suppose that the variable $ p $ holds the string "thought". What
is the output of the following statements?


(a) `p[::2]`

(b) `p[1::4]`

(c) `p[2:4:2]`

(d) `p[2:5:2]`

(e) `p[2:6:2]`

(f) `p[2:7:2]`

(g) `p[::-1]`

(h) `p[4:0:-1]`

(i) `p[4::-2]`

(j) `p[1::-1]`

Here is a summary of the operations on strings that we have considered thus far:

| Operation       | Meaning                    |
| :--------------| :---------------------------|
| `+`             | concatenation              |
| `*`             | repetition                 |
| `<string>[]`    | indexing (accessing)       |
| `<string>[:]`   | slicing                    |
| `len(<string>)` | length                     |


## $ \S 3 $ Comparing strings

All of the comparison operators introduced in the previous notebook work for
strings as well. Strings are ordered according to the __lexicographic__ (or
__dictionary__) __order__.  Therefore:

* The operators `==` and `!=` tell whether two given strings have the
  same value or not, i.e., if they consist of exactly the same characters in the
  same order (this includes distinguishing uppercase from lowercase letters).
* When applied to two strings $ a $ and $ b $, `a < b` returns `True` if and
  only if $ a $ comes before $ b $ in the dictionary order. Similarly for
  `<=`, `>` and `>=`.
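
The sketch below illustrates these rules on a few sample words. One point deserves care: Python compares characters by their Unicode code points (which the built-in `ord` returns), so every uppercase letter of the English alphabet comes before every lowercase one. This differs from the convention of a printed dictionary.

```python
print("apple" < "banana")    # True: 'a' comes before 'b'
print("car" < "cart")        # True: a proper prefix comes first
print("Zebra" < "apple")     # True: uppercase letters precede lowercase ones
print(ord("Z"), ord("a"))    # 90 97
print("magic" == "Magic")    # False: case matters
```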

__Exercise:__ Let $ a,\,b,\,q,\,r $ be as defined in the code cell below.
Determine the value of:

(a) `a != b`

(b) `a < b`

(c) `b < q`

(d) `q >= r`

(e) `b < a < q < r`

In [None]:
a = "potion"
b = "portion"
q = "quarterstaff"  
r = "robe"

## $ \S 4 $ Some useful string methods

A __method__ is a function that is associated with every object of a given class.  A
method can access and directly modify an object's state without the
corresponding data having to be passed explicitly as an argument, as in the case
of functions. However, methods can also return values to the caller (not simply
modify the associated object). Methods are called using the notation
`<object>.<method>`.

As an example, suppose that we need to design a Python class to represent bank
accounts. Some natural candidates for methods (with self-descriptive names) 
that could be associated with instances of this class are:

* `deposit`
* `withdraw`
* `transfer`
* `check_balance`
* `print_statement`

Whenever we instantiate a member of this class (i.e., create a new account), say
`acc1`, it automatically has access to all methods defined by that class. The
instruction `acc1.deposit(10)` would modify some internal attribute of this
specific account reflecting its balance so that $ 10 $ gets added to it. Of
course, this has no effect on the state of any other instance of the class, say
`acc2`. In contrast, the call `acc1.transfer(5, acc2)` would modify the states
of both accounts (by adding $ 5 $ to the balance of the second while
subtracting $ 5 $ from the balance of the first) and might also return some value
such as a string indicating success or failure of the operation or the date and time.
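
To make the discussion concrete, here is a minimal sketch of what such a class might look like. Everything in it is hypothetical: the class name `Account`, the attribute `balance` and the return values `"success"` and `"failure"` are assumptions made for illustration. The `class` syntax itself is not needed for the rest of this notebook.

```python
class Account:
    """A hypothetical bank account, used only to illustrate methods."""

    def __init__(self, balance=0):
        self.balance = balance      # Internal state of this particular account.

    def deposit(self, amount):
        self.balance += amount      # Modifies only the account it is called on.

    def withdraw(self, amount):
        self.balance -= amount

    def check_balance(self):
        return self.balance         # Returns a value without modifying anything.

    def transfer(self, amount, other):
        # Modifies both accounts and reports the outcome as a string.
        if amount > self.balance:
            return "failure"
        self.withdraw(amount)
        other.deposit(amount)
        return "success"


acc1 = Account()
acc2 = Account()

acc1.deposit(10)
print(acc1.check_balance(), acc2.check_balance())           # 10 0

status = acc1.transfer(5, acc2)
print(status, acc1.check_balance(), acc2.check_balance())   # success 5 5
```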

Since text is one of the most common forms of data, being able to manipulate and
process it effectively is essential for many applications.  Python provides
several built-in methods for the string type. Because strings are immutable, none
of these methods modifies the string it is called on; each returns a new value
instead. We mention some of the most important ones very briefly so that the
reader may get a rough idea of what is possible (don't try to memorize their
names!).

To get a full list of the methods and attributes associated to a given object
$ x $ (such as a string), you can run `dir(x)` (note that the output may be
truncated if it is too long).  For more details and explanations on string
methods specifically, please refer to the official
[documentation](https://docs.python.org/3/library/stdtypes.html#string-methods).
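
The list returned by `dir` also contains many special names that begin with an underscore. As a sketch, one possible way to hide them is to filter the list with a comprehension (a construct not yet covered in these notes); the variable name `public_names` is an arbitrary choice.

```python
x = "magic"

# Keep only the names that do not start with an underscore:
public_names = [name for name in dir(x) if not name.startswith("_")]

print(len(public_names))    # Number of "public" string methods.
print(public_names[:5])     # The first few of them.
```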

__Examples of string methods:__

In [4]:
# Create a string:
s = "Shall I compare thee to a summer’s day? Thou art more lovely and more temperate."
print(s)

Shall I compare thee to a summer’s day? Thou art more lovely and more temperate.


In [6]:
# replace -- Replaces occurrences of a substring with another substring:
s_quite = s.replace("more", "quite")
print(s_quite)

Shall I compare thee to a summer’s day? Thou art quite lovely and quite temperate.


In [7]:
# find -- Returns the index of the first occurrence of a substring:
index = s.find("I")
print(index)

6
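
Two further details about `find` are worth knowing: it accepts an optional second argument giving the index at which the search should start, and it returns $ -1 $ when the substring does not occur. A short sketch on a fragment of the same sentence:

```python
t = "Thou art more lovely and more temperate."

print(t.find("more"))       # 9: index of the first occurrence
print(t.find("more", 10))   # 25: search again, starting from index 10
print(t.find("less"))       # -1: the substring does not occur
```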


In [8]:
# count -- Counts the occurrences of a substring:
c = s.count("more")
print(c)

2


In [9]:
# strip, lstrip and rstrip -- Remove leading and/or trailing characters (whitespace by default):
s_with_spaces = "\t\n   " + s + "   "

# lstrip removes the specified characters from the beginning of the string.
# By default, it removes all whitespace characters (including '\n' and '\t'):
s_lstripped = s_with_spaces.lstrip()     
print(s_lstripped)

# rstrip removes any of the specified characters (here 'e', 't' and '.') from the end of the string:
s_rstripped = s.rstrip('et.')
print(s_rstripped)

Shall I compare thee to a summer’s day? Thou art more lovely and more temperate.   
Shall I compare thee to a summer’s day? Thou art more lovely and more tempera
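
The cell above does not show `strip` itself, which acts on both ends at once. The sketch below does so on some made-up strings. It also illustrates that the argument of these methods is treated as a _set of characters_ to remove, not as a prefix or suffix to match.

```python
padded = "  \t magic potion \n"
print(repr(padded.strip()))        # 'magic potion'

# Any of the characters 'x' and 'y' are removed from both ends:
print("xxhelloxyx".strip("xy"))    # hello

# 'a' and 'n' are removed from the end until another character is met:
print("banana".rstrip("an"))       # b
```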


In [10]:
# split -- Splits the string into a list using a delimiter (default: any run of whitespace):
words = s.split()
print(words)

['Shall', 'I', 'compare', 'thee', 'to', 'a', 'summer’s', 'day?', 'Thou', 'art', 'more', 'lovely', 'and', 'more', 'temperate.']


In [11]:
# join -- Joins elements of an iterable (e.g., list) using the string:
separator = " "    # This is the string to which join will be applied.
list_of_words = ["Parallel", "lines", "have", "so", "much", "in", "common."]

sentence = separator.join(list_of_words)
print(sentence)

Parallel lines have so much in common.
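
The methods `split` and `join` are, in a sense, inverses of each other. The following sketch uses a made-up comma-separated record to show both with an explicit delimiter, and also contrasts the default behavior of `split` with splitting on a single space:

```python
record = "potion,wand,robe"

items = record.split(",")
print(items)                        # ['potion', 'wand', 'robe']
print(" | ".join(items))            # potion | wand | robe
print(",".join(items) == record)    # True: joining undoes the split

# With no argument, runs of whitespace are treated as one separator:
print("a  b \t c".split())          # ['a', 'b', 'c']

# With an explicit " ", each single space is a separator:
print("a  b".split(" "))            # ['a', '', 'b']
```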


__Exercise:__ Given the string `gentle` below, study the effects of the following string methods on it:

(a) `capitalize`

(b) `title`

(c) `upper`

(d) `lower`

(e) `center(<width>)`

In [34]:
gentle = "dO NoT gO geNTLe iNtO thAt gOOd nIGHt."