# Strings

## Table of Contents

- [1. Introduction](#1.-Introduction)
- [2. Strings](#2.-Strings)
- [3. Indexing](#3.-Indexing)
- [4. Slicing](#4.-Slicing)
- [5. Strings Immutability](#5.-Strings-Immutability)
- [6. String Operations](#6.-String-Operations)
- [7. String Methods](#7.-String-Methods)
- [8. Formatting](#8.-Formatting)
- [9. Summary](#9.-Summary)

## 1. Introduction

Humans communicate using natural languages such as English, Farsi, Mandarin, Turkish, Dutch, or Spanish.
When it comes to communicating with machines, computer scientists have designed and developed so-called programming languages.
However, these artificial languages must also be able to represent regular natural communication since, in the end, humans use computers to solve human-related problems.

In this chapter, we introduce a very useful and frequent data type, namely the string type.
Strings have been created to represent natural language in its written form—that is, they are nothing more than regular text (as we often see it in books, newspapers, and chats with friends, among others).
They are called strings as they can be seen as a sequence or "string" of characters.
We have already seen some examples that include variables and values of this type, and you probably already have an intuition about how to use it.
Besides getting familiar with the notion of this data type, we will also explore some of the operations we can perform with it, along with some additional hands-on examples.
Strings are also interesting because they allow us to apply the notion of *iteration*.

## 2. Strings

We have seen so far a number of basic data types: integers (`int`), floats (`float`), and Booleans (`bool`).
However, we will need additional data types to describe and manipulate other types of data (like regular text written in natural languages).
For instance, if we want to calculate the average of a list of integers, the basic data types that we have covered so far are not sufficient.

We will start with the **string** type, which is a (non-basic) data type `str` in Python. 
A string represents a **sequence** of characters.
You can consider a *sequence* as a group of *ordered* elements.
In Python, strings are written surrounded by single or double quotes;
there is no functional difference between both representations.
You can choose the style you prefer.
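One practical reason to pick one style over the other is text that itself contains a quote character. The sketch below (the variable names are only illustrative) checks that both styles produce equal values, and shows two ways of embedding an apostrophe: switching to double quotes, or escaping the quote with a backslash.

```python
single: str = 'gazelle'
double: str = "gazelle"

# Both literals create exactly the same str value
print(single == double)     # True

# A double-quoted string can contain a single quote without any escaping
sentence: str = "It's a gazelle"
# In a single-quoted string, the quote must be escaped with a backslash
escaped: str = 'It\'s a gazelle'
print(sentence == escaped)  # True
```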

In [None]:
'This is a single-quoted string'

In [None]:
"This is a double-quoted string"

Python also supports representing empty strings.
You can do that as follows.

In [None]:
# Single-quoted version of an empty string
''

In [None]:
# Double-quoted version of an empty string
""

The empty string is still considered to have a value, even if that value is just an empty sequence of characters.
In other words, it is not `None`, and its type is not `NoneType`.
You can verify this by checking the type of a variable that contains an empty string.

In [None]:
empty_str: str = ''
type(empty_str)
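To make the difference between an empty string and `None` explicit, the following sketch compares the two directly (the variable name `nothing` is only for illustration).

```python
empty_str: str = ''
nothing = None

print(type(empty_str))    # <class 'str'>
print(type(nothing))      # <class 'NoneType'>
# An empty string is a real str value, so it is not the same object as None
print(empty_str is None)  # False
print(empty_str == '')    # True
```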

A common mistake we make when we are getting familiar with programming is forgetting to use the single or double quotes to define strings.

In [None]:
print(Print my string but forgot the quotes)

The previous text was meant to be a string, but the developer forgot the quotes, so Python tried to read it as code and raised a `SyntaxError`.
This might happen to you as well.
When you get an error like the previous one, ensure all your strings have been defined using single or double quotes.
Let us correct our mistake.

In [None]:
print('Print my string but forgot the quotes')
print('Not this time :)')

## 3. Indexing

One of the operations that we can perform on strings is selecting one of the characters, via *indexing*.
Indexing looks as follows:

```python
string_variable_or_value[index_expression]
```
The first part will refer to a string variable or value (`string_variable_or_value`).
The expression in square brackets is called an **index** (`index_expression`). 
This expression must yield an integer value.
The index indicates which character in the sequence you want to extract (hence the name).
Let us see an example.

In [None]:
bike: str = 'gazelle'
letter: str = bike[1]
letter

The first assignment statement assigns the value "gazelle" to the variable `bike`.
The second statement selects the character at position `1` from the value stored in the `bike` variable, and assigns it to the variable `letter`. 
The index `1` does not yield the first letter of "gazelle" (i.e. "g"), but the second one.
This is because Python uses 0-based indexing.
The first letter of a string is obtained by index `0`.

The following table presents the index of each letter in the string `'gazelle'`.

| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| g | a | z | e | l | l | e |

Thus, `g` is the $0^{th}$ letter of `'gazelle'`, `a` is the $1^{st}$ letter, `z` is the $2^{nd}$ letter, and so on.

In [None]:
bike: str = 'gazelle'
first_letter: str = bike[0]
first_letter
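To reproduce the table above, a short loop can print every valid index of `'gazelle'` next to its character. This sketch uses a `for` loop over `range(len(bike))`; the `len` function, introduced later in this chapter, returns the number of characters, so the valid indices run from `0` to one less than the length.

```python
bike: str = 'gazelle'

# range(len(bike)) yields 0, 1, ..., 6: one index per character
for index in range(len(bike)):
    print(index, bike[index])
```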

Beware, the type hint of `letter` is `str`: Python has no separate character type, so a single character is simply a string of length 1.
Some other programming languages do have a dedicated type for individual characters (usually called `char`), which is stored as an integer.
However, that topic is out of the scope of this book.
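Even though a character is just a `str`, every character corresponds to an integer, its Unicode code point. The built-in functions `ord` and `chr` convert in both directions, as this small sketch shows; these numbers also explain the string ordering discussed later in this chapter.

```python
bike: str = 'gazelle'
letter: str = bike[0]

print(type(letter))  # <class 'str'>
print(len(letter))   # 1
print(ord(letter))   # 103: the Unicode code point of 'g'
print(chr(103))      # g: back from the code point to the character
```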

You can also use as an index any expression that contains variables and operators, as long as it yields an integer. Let us see some examples.

In [None]:
# Using a variable as index
i: int = 0
letter: str = bike[i]
print(letter)

In [None]:
# Using arithmetic expressions (including variables) as index
i: int = 0
j: int = 1

letter = bike[i + j]
print(letter)

letter = bike[i + j * 2]
print(letter)

Bear in mind that the value of the index **must** be an integer; otherwise, you get a `TypeError`.

In [None]:
letter: str = bike[1.5]
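A common way to end up with a non-integer index is division: the `/` operator always produces a `float`. The sketch below picks the middle letter of `'gazelle'` either with floor division `//`, which yields an `int`, or by converting the float with `int()`, which truncates it.

```python
bike: str = 'gazelle'

half: float = len(bike) / 2         # 3.5: a float, not a valid index
middle_index: int = len(bike) // 2  # 3: floor division yields an int

print(bike[middle_index])  # e
print(bike[int(half)])     # int(3.5) truncates to 3, so also e
```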

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Assign the value 'Maurits Cornelis Escher' to the variable <i>artist</i>. Then, print the first letter of the first, middle and last name of the artist using indexing.
</div>

In [None]:
# Remove this line and add your code here

### String Length

Given a string, or a sequence of any type, we can compute its **length**, which is the number of characters (or items) in the string (or sequence).
In Python, `len` is a built-in function to obtain the length of a sequence, and thus of a string.

In [None]:
bike: str = 'gazelle'
len(bike)

In [None]:
len('gazelle')
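Every character counts towards the length, including spaces and punctuation, and the empty string has length `0`:

```python
print(len('gazelle'))       # 7
print(len('Data Science'))  # 12: the space is a character too
print(len(''))              # 0: the empty string has no characters
```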

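Given that the first letter is accessed via index `0`, the last letter is accessed via index `len(bike) - 1`. Using the length itself as an index raises an `IndexError`, because there is no character at that position.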

In [None]:
length: int = len(bike)
bike[length]

In [None]:
bike[length - 1]

<div class="alert alert-info">
    <b>Negative indices</b><br>
    Python allows a concise way to access sequence (or string) elements "from the back" by using negative numbers as index. Try to find out the values of <code>bike[-1]</code>, <code>bike[-2]</code>, and so on.
</div>
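One way to convince yourself of how negative indices work is to check, for every position, that `bike[-k]` refers to the same character as `bike[len(bike) - k]`. The sketch below does this with `assert` statements, which raise an error if the condition is ever false.

```python
bike: str = 'gazelle'
length: int = len(bike)

# -1 is the last character, -2 the second last, ..., -length the first
for k in range(1, length + 1):
    assert bike[-k] == bike[length - k]

print(bike[-1] == bike[length - 1])  # True
print(bike[-length] == bike[0])      # True
```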


<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Make sure you have declared the variable <i>artist</i> and you have assigned it the value 'Maurits Cornelis Escher'. Then, print the penultimate (second last) and antepenultimate (third last) letters of the artist name.
</div>

In [None]:
# Remove this line and add your code here

## 4. Slicing

We are now able to select individual characters of a string and to iterate over all characters of a string, but sometimes we want just a part (segment) of a string.
A segment of a string is called a **slice**.
A slice is obtained by giving a range of indices.

In the next cell, we show how we can obtain the segments `Data` and `Science` from the given string.

In [None]:
ds_str: str = 'Data Science'

# Compute and print the length of the string
ln: int = len(ds_str)
print(ln)

# Slice the string
data: str = ds_str[0:4]
science: str = ds_str[5:ln]

# Print the "Data" part
print(data)
print(len(data))

# Print the "Science" part
print(science)
print(len(science))

The operator `[n:m]` returns the part of the string from the character at index `n` up to the character at index `m`, including the first but excluding the last.

Notice that:
- If you omit the first index (before the colon), the slice starts at the beginning of the string (`[:m]`).
- If you omit the second index, the slice goes to the end of the string (`[n:]`).
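Because the start index is included and the end index is excluded, slices fit together without gaps or overlaps. The following sketch checks two consequences of this convention on `'Data Science'`; the split point `k` and the indices `n` and `m` are arbitrary choices.

```python
ds_str: str = 'Data Science'

# Splitting at any index k and joining the two parts gives back the original
k: int = 4
print(ds_str[:k] + ds_str[k:] == ds_str)  # True

# For 0 <= n <= m <= len(ds_str), the slice [n:m] contains m - n characters
n: int = 5
m: int = 12
print(len(ds_str[n:m]) == m - n)          # True
```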

Be careful with the *start* and *end* indices of a slice:
mixing up whether an index is included or excluded is a frequent source of errors.

In [None]:
ds_str: str = 'Data Science'

data: str = ds_str[:4]
science: str = ds_str[5:]

print(data)
print(science)

If the first index is greater than or equal to the second, the result is an empty string, represented by two quotation marks:

In [None]:
ds: str = 'Data Science'

data: str = ds[4:4]
data
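Slices are also forgiving about indices that fall outside the string: unlike indexing, slicing does not raise an `IndexError`, and out-of-range ends are simply clipped to the string boundaries. A small sketch:

```python
ds: str = 'Data Science'

print(repr(ds[6:2]))    # '': the start index is greater than the end index
print(ds[5:100])        # Science: the end index is clipped to len(ds)
print(repr(ds[50:60]))  # '': the whole range lies beyond the end
# By contrast, ds[100] would raise an IndexError
```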

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Assign the string 'skateboard' to the variable <i>word</i>. Now, print the first and second halves on separate lines by just slicing the string and using the <i>len</i> function. Hint: the number used for slicing should be an integer. Use int(half) to convert a float to an integer.
</div>

In [None]:
# Remove this line and add your code here

## 5. Strings Immutability

There is one additional and important string property we shall consider now.
That property is *immutability*.
**Immutability** means that a string cannot be changed: it is not possible to replace an existing string, a single character, or a subsequence of characters with another value.
It also means that it is not possible to use the `[]` operator on the left-hand side of an assignment to a string.

In [None]:
greeting: str = 'Hello Data Scientist'
greeting[0] = 'h'

If you want to change a string you *must* create a new string.

In the cell below, we create a new string `new_greeting` by concatenating the letter `h` with the slice consisting of all characters of the original string except the first character.
The original string is not *changed*.

In [None]:
print(greeting)
new_greeting: str = 'h' + greeting[1:]
print(new_greeting)
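Immutability applies to string values, not to variables: a variable holding a string can always be reassigned so that it refers to a different string. The sketch below makes this distinction explicit.

```python
greeting: str = 'Hello Data Scientist'
new_greeting: str = 'h' + greeting[1:]

# Building a new string leaves the original one untouched
print(greeting)      # Hello Data Scientist
print(new_greeting)  # hello Data Scientist

# The variable itself can be reassigned so that it refers to the new string
greeting = new_greeting
print(greeting)      # hello Data Scientist
```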

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Assign the string 'break' to a variable. Replace the first letter by 'g' and the last one by 't'.
</div>

In [None]:
# Remove this line and add your code here

## 6. String Operations
In this section, we present some of the most commonly used string operations.

### The `in` Operator

The word `in` is a Boolean operator that takes two strings and returns `True` if the first appears as a substring in the second, and `False` otherwise.

In [None]:
'zel' in 'gazelle'

In [None]:
'par' in 'gazelle'
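A few more cases are worth knowing: the test is case-sensitive, the empty string counts as a substring of any string, and the operator `not in` returns the opposite result.

```python
print('zel' in 'gazelle')      # True
print('Zel' in 'gazelle')      # False: the check is case-sensitive
print('' in 'gazelle')         # True: the empty string occurs in every string
print('par' not in 'gazelle')  # True: `not in` negates the test
```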

For example, the following function prints all the letters from `word1` that also appear in `word2`.

In [None]:
def in_both(word1: str, word2: str) -> None:
    """
    Prints the letters that appear in both words.
    :param word1: first word
    :param word2: second word
    """
    for letter in word1:
        if letter in word2:
            print(letter) 

in_both('trek', 'gazelle')
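`in_both` prints a letter once for every occurrence in `word1`. If you would rather get the shared letters back as a value, without repetitions, a variant could build and return a new string. The function name `common_letters` below is hypothetical, chosen only for this illustration.

```python
def common_letters(word1: str, word2: str) -> str:
    """
    Returns the letters of word1 that also appear in word2, without repetitions.
    :param word1: first word
    :param word2: second word
    :return: the shared letters, in the order they first appear in word1
    """
    result: str = ''
    for letter in word1:
        # Keep the letter only if it is shared and not collected yet
        if letter in word2 and letter not in result:
            result = result + letter
    return result

print(common_letters('trek', 'gazelle'))  # e
print(common_letters('gazelle', 'bell'))  # el
```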

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Count the number of vowels in the word 'supercalifragilisticexpialidocious' using the <i>in</i> operator.
</div>

In [None]:
# Remove this line and add your code here

### String Comparison

An important operation on strings is checking whether strings are equal or not.
If you have to search for a certain word in a text or dictionary, you will need such an operation.

Python offers a number of relational operators that work on strings, for instance to check whether two strings are equal.

In [None]:
word: str = input('> ')
if word == 'apple':
    print('Hmmm, an apple!')

Other relational operations are useful for putting words in **lexicographical order** (or *dictionary order*).
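In Python, this order compares strings character by character, using the numeric code of each character (its Unicode code point): digits precede uppercase letters, and uppercase letters precede lowercase letters.
For instance, the word "Pineapple" comes before "apple" in this order, because "P" precedes "a".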

In [None]:
word: str = input('> ')
if word < 'apple':
    print('Your word, ' + word + ', comes before apple!')
elif word > 'apple':
    print('Your word, ' + word + ', comes after apple!')
else:
    print('Hmmm, an apple!')
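Since `input` makes the cell above interactive, here is a non-interactive sketch that compares fixed words. The comments relate each result to the code points returned by `ord`.

```python
print('Pineapple' < 'apple')  # True: 'P' (code point 80) precedes 'a' (97)
print(ord('P'), ord('a'))     # 80 97
print('9 lives' < 'Apple')    # True: '9' (57) precedes 'A' (65)
print('apple' < 'apples')     # True: a prefix comes before the longer word
print('apple' == 'Apple')     # False: case matters
```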

<div class="alert alert-info">
    <b>Comparing uppercase and lowercase characters</b><br>
    If you do not want to make any distinction between uppercase and lowercase characters, you can convert strings to a standard format, such as all lowercase, before doing string comparison.
</div>
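For example, the string method `lower` (methods are covered in the next section) returns an all-lowercase copy of a string, which makes the comparison ignore case:

```python
word: str = 'Pineapple'

print(word < 'apple')              # True: uppercase 'P' sorts before 'a'
print(word.lower() < 'apple')      # False: 'pineapple' comes after 'apple'
print('APPLE'.lower() == 'apple')  # True
```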

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    In the following cell you will find three words. Can you print them in alphabetical order? Use the comparison operator for that purpose.
</div>

In [None]:
word1: str = 'purple'
word2: str = 'green'
word3: str = 'red'

# Remove this line and add your code here

## 7. String Methods

A string is an example of a Python **object**.
For now, an object is equivalent to a value.
However, it has more information than a normal value. 
An object contains *data* and a set of *methods*.
**Methods** are functions that are built into the object. 
We will get to know more about objects, methods, and other terms in coming chapters.
For now, we just need to have the intuition.

The Python function `dir` lists the attributes of an object, including all the methods available for it.
Let us see the methods that an object of type *string* has.

In [None]:
text: str = 'Data Science'
dir(text)
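The list returned by `dir` also contains names surrounded by double underscores (such as `__len__`), which Python uses internally. A small sketch, using a list comprehension, that keeps only the remaining names:

```python
text: str = 'Data Science'

# Skip the internal names, which start with an underscore
public_methods = [name for name in dir(text) if not name.startswith('_')]
print(public_methods)
```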

As you can see, Python provides a whole collection of useful methods on strings.

Calling a method is similar to calling a function; the only difference is that you first write the name of the variable (or the value), then a dot, and then the name of the method.
Something like `variable.method()`.
For instance, instead of the function syntax `upper(word)`, we use the method syntax `word.upper()`.

In [None]:
word: str = 'gazelle'
new_word: str = word.upper()
new_word

This form of dot notation specifies the name of the method (`upper`), and the name of the string to apply the method to (`word`). 
The empty parentheses indicate that this method takes no arguments.

A method call is called an **invocation**. 
In this case, we would say that we are invoking the method `upper` on `word`.
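A few other string methods behave in the same way: they return a new string and leave `word` unchanged, as immutability requires. Unlike `upper`, the `replace` method takes arguments: the substring to look for and its replacement.

```python
word: str = 'gazelle'

print(word.upper())            # GAZELLE
print(word.capitalize())       # Gazelle
print(word.replace('z', 'Z'))  # gaZelle
print(word)                    # gazelle: the original string is unchanged
```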

As it turns out, there is a string method named `find` that searches for a substring (which may be a single character) in another string and returns the index in the base string where the substring starts, or `-1` if the substring is not found.

In [None]:
word: str = 'gazelle'
index: int = word.find('z')
index

The `find` method can also be invoked directly on a string value (a literal), without storing it in a variable first.

In [None]:
index: int = 'sparta'.find('par')
index
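`find` reports the index of the *first* occurrence, and returns `-1` when the substring does not occur at all:

```python
word: str = 'gazelle'

print(word.find('l'))     # 4: the first of the two 'l' characters
print(word.find('elle'))  # 3
print(word.find('x'))     # -1: 'x' does not occur in 'gazelle'
```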

Besides the substring to search for, the `find` method can take 1 or 2 **optional arguments**.
- The *first optional argument* is the *index* where the search in the string object should **start** (inclusive).
- The *second optional argument* is the *index* where the search in the string object should **stop** (exclusive).

In [None]:
name: str = 'bob'
name.find('b', 1, 2)

This search fails, so `find` returns `-1`, because `b` does not appear in the index range from `1` to `2`, not including `2`.
Searching up to but not including the second index makes `find` consistent with the slice operator.

In [None]:
name[1:2].find('b')
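There is one subtle difference between the two approaches: with the optional arguments, `find` reports positions in the original string, whereas searching a slice reports positions relative to the slice.

```python
name: str = 'bob'

print(name.find('b', 1))     # 2: search from index 1 to the end of 'bob'
print(name[1:].find('b'))    # 1: position inside the slice 'ob'
print(name.find('b', 1, 2))  # -1: only index 1 ('o') is searched
```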

## 8. Formatting

Strings are useful when they can be enriched with information coming from other expressions (including variables) implemented in the program.
This is what we call string **formatting**.
In Python, we have different options to format a string.
Here, we present three alternatives, namely, the *format operator* `%`, the *format method*, and f-strings.
You are free to use whichever you prefer; we personally prefer the last one, f-strings, because they are easy to read and understand, among other reasons.

### Format Operator

With the **format operator** `%` we can build a string by replacing parts of it with data stored in variables.
Remember that when `%` is used with integers it is known as the *modulus operator*. 
When playing around with strings we call it the *format operator*.
To use it in this context, you should write:

In [None]:
placeholder: str = 'this'
'Your string here with a placeholder like %s' % placeholder

The first operand (before the format operator) should always be a string containing *format specifiers* or placeholders. 
The second operand is one or more values (or variables).
If you have more than one value, they should be stored in a tuple (we will talk about this data type in another chapter of the book).
Additionally, ensure that the number of format specifiers is equal to the number of variables, and the order is the same.

A format specifier is a marker starting with "%" that is immediately followed by a character (e.g. "s", "d").
Examples of supported format specifiers are "%d" to format an integer, "%g" to format floats, and "%s" to format strings.
The table below shows the most commonly used format specifiers.

| Format specifier | Description | Output |
|:--------------:|:------------|:-------:|
| `%e`  | Scientific format.     | `5.000000e+00` |
| `%s`  | Text or string value.  | `this is text` |
| `%d`  | Integer value.         | `456` |
| `%f`  | Floating-point number. | `3.141516` |
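The outputs in the table can be reproduced as follows. The last two lines also use `%g`, mentioned above, and a precision modifier such as `%.2f`, which limits the number of decimal digits (see the printf-style formatting documentation linked later in this section for the full syntax).

```python
print('%e' % 5)               # 5.000000e+00
print('%s' % 'this is text')  # this is text
print('%d' % 456)             # 456
print('%f' % 3.141516)        # 3.141516
print('%g' % 1.99999999)      # 2: at most 6 significant digits, trailing zeros dropped
print('%.2f' % 3.141516)      # 3.14
```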

Let us see some other examples.

In [None]:
'Format %f' % 45.6

We use "%f" as the format specifier for a floating-point value.

In [None]:
days: int = 365
'A year has %d days' % days

"%d" is used as a placeholder for integer values.

In [None]:
who: str = 'Tom'
budget: float = 1.99999999
days: int = 365

'%s says that he is allowed to spend %g euros \
every single day of the %d days of the year.' % (who, budget, days)

You can include as many values as needed in your formatted string.
Notice that we have used the character "\" at the end of the first line of the string to indicate that the string continues on the next line.
Without that character, you will get an error because the string literal is left unterminated.
Be aware, too, that you get an error if you do not provide all the values needed to format the string.

In [None]:
day: str = 'Monday'
hour: int = 5
place: str = 'the park'

'See you on %s at %d in %s' % (day, hour)

You also get an error when you use the wrong format specifier for a value.

In [None]:
day: str = 'Monday'
hour: int = 5
place: str = 'the park'

'See you on %d at %d in %s' % (day, hour, place)

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Create three variables: the first one will contain your name, the second one your age, and the third one your passion. Print a string that says 'My name is <i>name</i>. I am <i>age</i> years old. And my passion is <i>passion</i>.' Use the format operator to create this string.
</div>

In [None]:
# Remove this line and add your code here

<div class="alert alert-info">
    <b>More information about the format operator</b><br>
    For more information on the format operator, 
    see <a href="https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting"><b>printf-style String Formatting</b></a>. 
</div>

<div class="alert alert-info">
    <b><code>format</code> method</b><br>
    A more powerful alternative is the string format method, which you can read about at <a href="https://docs.python.org/3/library/stdtypes.html#str.format"><b><code>str.format()</code></b></a>. 
</div>

### The `format` Method

The `format` method is available to all string objects.
The string that uses the method should represent placeholders in the form of curly braces (i.e. "{}").
You can include names within the curly braces to be clearer about the placeholder you want to replace.
The method receives as parameters a sequence of values or variables.
As it is the case for the format operator, ensure the number of placeholders is the same as the number of values, and that the order is the same (if you are not using identifiers within your placeholders).
Thus, the `format` method can be used as follows.

```python
'String with placeholder 1 ({}), placeholder 2 ({}), ... placeholder n ({})'.format(placeholder_1, placeholder_2, ..., placeholder_n)
```

Or as follows, if you prefer to use identifiers for your placeholders.

```python
'String with placeholder 1 ({placeh1}), placeholder 2 ({placeh2}), ... placeholder n ({placehn})'.format(placeh1=placeholder_1, placeh2=placeholder_2, ..., placehn=placeholder_n)
```
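Placeholders can also contain positional numbers, which refer to the arguments of `format` by position and can be reused. The sketch below compares the three placeholder styles on the same sentence.

```python
who: str = 'Tom'
days: int = 365

# Empty braces are filled in order
print('{} works {} days a year'.format(who, days))
# Named placeholders can be passed in any order
print('{person} works {n} days a year'.format(n=days, person=who))
# Numbered placeholders refer to argument positions and can be repeated
print('{0} works {1} days a year, says {0}'.format(who, days))
```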

Let us now see the examples of the previous section translated into the `format` method vocabulary.

In [None]:
'Format {}'.format(45.6)

You can use the ":f" sequence within the curly braces to indicate that you want to display the value as a floating-point number (with 6 decimal digits by default). 

In [None]:
'Format {:f}'.format(45.6)
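Other format specifications follow the same pattern after the colon, for instance `.2f` for two decimal digits or `g` for the general format, and a named placeholder can combine a name with a specification:

```python
print('Format {:f}'.format(45.6))             # Format 45.600000
print('Format {:.2f}'.format(45.6))           # Format 45.60
print('Format {:g}'.format(1.99999999))       # Format 2
print('Budget {b:.1f}'.format(b=1.99999999))  # Budget 2.0
```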

In [None]:
days: int = 365
'A year has {} days'.format(days)

In [None]:
who: str = 'Tom'
budget: float = 1.99999999
days: int = 365

'{} says that he is allowed to spend {} euros \
every single day of the {} days of the year.'.format(who, budget, days)

Same example using placeholder identifiers.

In [None]:
who: str = 'Tom'
budget: float = 1.99999999
days: int = 365

'{person} says that he is allowed to spend {budget} euros \
every single day of the {days} days of the year.' \
.format(person=who, budget=budget, days=days)

### F-strings

Another elegant way of formatting and creating strings is via **f-strings**, or *literal string interpolation*.
F-strings are called this way because of the preceding `f` letter.
This language construct provides a convenient way of embedding expressions within a string.

They are built as follows:
```python
var1 = 7
var2 = 'value2'

# var1 + 3 evaluates to 10 and var2 * 2 repeats the string: 'value2value2'
f'{var1 + 3} some text here {var2 * 2}'
```

Notice that the letter `f` is placed before the quotes are opened.
Then, within the string, expressions are placed within curly braces `{}`.
The expressions are evaluated during runtime, and the string is built.
The process of evaluating a string literal containing one or more placeholders that yield a value is known as **string interpolation**.
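The expressions inside the braces are not limited to variable names. The sketch below embeds a method call, a function call, indexing, and arithmetic, and uses a `:.1f` specification in the same style as the `format` method.

```python
bike: str = 'gazelle'
days: int = 365

# Any expression can go inside the braces: method calls, indexing, arithmetic
print(f'{bike.upper()} has {len(bike)} letters and starts with {bike[0]}')
# A format specification can follow the expression after a colon
print(f'{days} days is about {days / 7:.1f} weeks')
```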

Let us rewrite the examples we introduced before with f-strings.

In [None]:
f'Format {45.6}'

In [None]:
days: int = 365
f'A year has {days} days'

In [None]:
who: str = 'Tom'
budget: float = 1.99999999
days: int = 365

f'{who} says that he is allowed to spend {budget:g} euros \
every single day of the {days} days of the year.'
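The string above is continued with a backslash at the end of its first line. An alternative that avoids the backslash is to put several string literals next to each other inside parentheses: Python joins adjacent literals, including f-strings, into a single string.

```python
who: str = 'Tom'
budget: float = 1.99999999
days: int = 365

# The two adjacent f-string literals are joined into one string
message: str = (f'{who} says that he is allowed to spend {budget:g} euros '
                f'every single day of the {days} days of the year.')
print(message)
```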

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Create three variables: the first one will contain your name, the second one your age, and the third one your passion. Print a string that says 'My name is <i>name</i>. I am <i>age</i> years old. And my passion is <i>passion</i>.' Use f-strings to create this string.
</div>

In [None]:
# Remove this line and add your code here

## 9. Summary

In this chapter, we have learned that string is a built-in *data type* used to represent natural language text.
Particularly, text represented in such a way is called a string because it is a sequence (or "string") of characters.
The type is written `str` in Python.
We can either use *single or double quotes* to surround the text we want to represent as a string.

Given that a string is, in the end, a sequence, we can access specific elements of the sequence by *indexing* the original string (we specify a number or *index* within the "[]" characters).
We can also extract substrings from the original string by slicing it (we specify the starting and ending indices of the *slice* or substring we want to extract).
We have also discovered that strings are *immutable*;
that is, they cannot be modified, rather you need to create a new string from scratch with the required modifications.

There are some Boolean operations you can perform with strings that are very handy when writing programs.
Specifically, the `in` and comparison operators are of great help.
Additionally, string values are really *objects*, and objects expose a set of "special functions" called methods.
You can call any string method on a string, for example, the `upper` or `find` *methods*.
Finally, strings are very useful when they can be formatted based on external values coming from other expressions or variables in your code. 
For that, we have different alternatives like the *format operator*, the *format method*, and *f-strings*.
Among these, we recommend using f-strings.

Expect to use this data type quite often from now on. 
In the end, programs are written to help humans solve problems. 
Humans still need to communicate with each other via natural language, even if it is in textual form via a Jupyter notebook, website, mobile application, or any other sort of program.

---

This Jupyter Notebook is based on Chapter 6 of the book Python for Everybody and Chapter 8 of the book Think Python.

---

# (End of Notebook)

&copy; 2023 - **TU/e** - Eindhoven University of Technology