# Python: Strings

### String

`String`. A series of characters.
For example, `"aaabbbccc"`, `"48Dfua@~!0H"`, `"m"`, `"How are you?"`.

The length of a string is the number of characters it contains. A special case is the empty string: a string with no characters and therefore of length zero. In the examples below it is written as `Ø`.

### Character

`Character`. The basic unit of a string.

For example, the characters of the string `"How are you?"` are, in order: `H`, `o`, `w`, ` `, `a`, `r`, `e`, ` `, `y`, `o`, `u`, `?`. The first character is `H`, the fourth character is a space, and the last character is a question mark `?`.

As another example, in the string `"你好嗎？"` the second character is `好` and the fourth character is the full-width question mark `？`.

The ASCII table defines 128 basic characters: upper- and lower-case Latin letters, the digits 0 to 9, punctuation, mathematical operators, miscellaneous symbols, and non-printing control characters. Characters outside this set, such as `好`, are covered by Unicode.

### Substring

`Substring`. A string within a string.

For example, the substrings of `algo` are `Ø`, `a`, `l`, `g`, `o`, `al`, `lg`, `go`, `alg`, `lgo`, `algo`.
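
The list above can be reproduced with slicing. The following is a minimal sketch; the helper name `substrings` is made up for this illustration and is not a built-in.

```python
def substrings(text):
    """Return all distinct substrings of text, including the empty string."""
    found = {""}
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            found.add(text[start:end])  # contiguous slice text[start:end]
    # Sort by length, then by first position in the original string.
    return sorted(found, key=lambda s: (len(s), text.find(s)))

algo_substrings = substrings("algo")
print(algo_substrings)
# ['', 'a', 'l', 'g', 'o', 'al', 'lg', 'go', 'alg', 'lgo', 'algo']
```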

### Prefix

`Prefix`. A substring that begins at the start of a string.

For example, the prefixes of `algo` are `Ø`, `a`, `al`, `alg`, `algo`.

### Suffix

`Suffix`. A substring that ends at the end of a string.

For example, the suffixes of `algo` are `Ø`, `o`, `go`, `lgo`, `algo`.
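
Prefixes and suffixes are both special cases of slices: a prefix always starts at index 0, and a suffix always runs to the end. A short sketch that rebuilds both lists for `algo` (the variable names are chosen for this example only):

```python
word = "algo"

# A prefix is word[:i]; i runs from 0 (empty string) to len(word).
prefixes = [word[:i] for i in range(len(word) + 1)]

# A suffix is word[i:]; i runs from len(word) (empty string) down to 0.
suffixes = [word[i:] for i in range(len(word), -1, -1)]

print(prefixes)  # ['', 'a', 'al', 'alg', 'algo']
print(suffixes)  # ['', 'o', 'go', 'lgo', 'algo']
```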

### Sequence

`Sequence`. A series of characters.

For example, `aaabbbccc`, `48Dfua@~!0H`, `m`, `How are you?`

Inside a computer, a character is stored as a number, so a string is stored as a series of numbers.

The only difference between a string and a series of numbers is length: a string always has **a finite number of characters**, whereas a series in the mathematical sense may have **an infinite number of terms**. Apart from this difference, strings and series are identical.
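
To see the numbers behind a string, one option is to encode it to bytes. This sketch assumes the UTF-8 encoding, in which every ASCII character is stored as exactly one number; the sample text `"Hi!"` is arbitrary.

```python
text = "Hi!"

numbers = list(text.encode("utf-8"))    # one byte value per ASCII character
print(numbers)                          # [72, 105, 33]

restored = bytes(numbers).decode("utf-8")  # and back to a string
print(restored)                            # Hi!
```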


### Subsequence

`Subsequence`. A string formed by picking any of the characters of a string while keeping their left-to-right order. Unlike a substring, the picked characters do not have to be adjacent.

For example, the subsequences of `algo` are `Ø`, `a`, `l`, `g`, `o`, `al`, `ag`, `ao`, `lg`, `lo`, `go`, `alg`, `alo`, `ago`, `lgo`, `algo`.
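
One way to enumerate subsequences is to choose every possible set of positions and read the characters at those positions in order. The sketch below uses `itertools.combinations` from the standard library; the helper name `subsequences` is made up for this example.

```python
from itertools import combinations

def subsequences(text):
    """Return every subsequence of text, shortest first.

    If text contains repeated letters, the same subsequence can appear
    more than once, because it can be picked from different positions.
    """
    result = []
    for size in range(len(text) + 1):
        for positions in combinations(range(len(text)), size):
            result.append("".join(text[i] for i in positions))
    return result

subs = subsequences("algo")
print(len(subs))  # 16
print(subs)
```

A string of length n has 2^n ways to pick positions, which is why `algo` yields 16 entries.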



In [1]:
a = '5'
b = "7"
print(a + b)

57


Numbers inside quotation marks (single or double) are strings, not numbers.<br>
Using `+` as an operator between two strings joins them together. This is called **concatenation**. We can build a longer text by concatenating several strings.
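
A small sketch of the difference between concatenation and numeric addition, reusing the values `'5'` and `'7'` from the cell above:

```python
a = '5'
b = '7'

joined = a + b             # concatenation of two strings
total = int(a) + int(b)    # numeric addition after converting to int
print(joined)  # 57
print(total)   # 12

# Mixing a string and a number with + is an error in Python.
try:
    a + 7
except TypeError as error:
    print('TypeError:', error)
```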

In [2]:
seq1 = 'Rose'
seq2 = ' ' + "is" + ''' a rose'''
print(seq1)
print(seq2)
print(seq1 + seq2)

Rose
 is a rose
Rose is a rose


We can't subtract or divide strings, but we can multiply a string by an integer, which repeats it:

In [3]:
sequence = seq1 + 2 * seq2
print(sequence)

Rose is a rose is a rose
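
A short sketch of what repetition does and does not allow. The variable names `repeated` and `nothing` are chosen for this illustration.

```python
seq2 = ' is a rose'

repeated = 3 * seq2     # the order of the operands does not matter
nothing = seq2 * 0      # repeating zero times gives the empty string
print(repeated)         #  is a rose is a rose is a rose
print(nothing == '')    # True

# Multiplying by a float or subtracting strings raises a TypeError.
for operation in (lambda: seq2 * 1.5, lambda: seq2 - 'rose'):
    try:
        operation()
    except TypeError as error:
        print('TypeError:', error)
```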


<div class="alert alert-info alert-success">
    Internally, Python stores a string as an array-like sequence of single characters (similar to a list, except that a string cannot be changed in place).<br>
Thus we can access its characters by index or by slice, just as with a list:
    </div>

In [4]:
print(sequence[3])

e


In [5]:
print(sequence[10:15])

rose 
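
Indexing and slicing on strings follow the same rules as on lists, including negative indices and a step. A brief sketch using the same `sequence` value as above:

```python
sequence = 'Rose is a rose is a rose'

last = sequence[-1]             # negative indices count from the end
first_word = sequence[:4]       # a missing start means "from the beginning"
last_word = sequence[-4:]       # a missing end means "up to the end"
reversed_text = sequence[::-1]  # a step of -1 walks backwards

print(last)           # e
print(first_word)     # Rose
print(last_word)      # rose
print(reversed_text)  # esor a si esor a si esoR

# Unlike a list, a string cannot be modified through an index.
try:
    sequence[0] = 'r'
except TypeError as error:
    print('TypeError:', error)
```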


<div class="alert alert-box alert-info">
    Task: Write a for-loop, iterate over the sequence called <code>sequence</code> and print each character in a new line.
</div>

In [6]:
for c in sequence:
    print(c)

R
o
s
e
 
i
s
 
a
 
r
o
s
e
 
i
s
 
a
 
r
o
s
e


Same as with lists, we can get the length of a string with `len()`:

In [7]:
print(len(sequence))

24


### Intermezzo: New casting functions

The casting functions

```python
int()
float()
str()
```
are already known.<br>
<br>
Furthermore, we can convert between a character and its numeric code in both directions. The code is the character's Unicode code point, which equals the [ASCII](https://www.asciitable.com/) value for the first 128 characters:
```python
ord()  # character -> code point

# and

chr()  # code point -> character
```

In [8]:
print(ord('a'))
print(ord('ü'))
print(ord('我'))

97
252
25105


In [9]:
chr(97)

'a'

In [10]:
# print the 26 lowercase letters, starting at code point 97 ('a')
for i in range(97, 97+26):
    print(chr(i), end=' ')

a b c d e f g h i j k l m n o p q r s t u v w x y z 
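
`ord()` and `chr()` are inverses of each other, and both work beyond the ASCII range. A small sketch that checks the round trip for the three characters used above:

```python
codes = {}
for ch in 'aü我':
    code = ord(ch)
    codes[ch] = code
    # chr() turns the code point back into the same character.
    print(ch, code, hex(code), chr(code) == ch)

# In ASCII, each uppercase letter sits 32 positions before its lowercase form.
upper_a = chr(ord('a') - 32)
print(upper_a)  # A
```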

### String methods

We can get a list of built-in methods with
```python
help(str)
```

In [11]:
print(sequence.replace('is', 'was'))
print(sequence)

Rose was a rose was a rose
Rose is a rose is a rose
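
The second `print` shows that `replace()` returns a new string and leaves the original untouched. Two further details are sketched here: the optional third argument limits the number of replacements, and matching is done on raw text rather than on whole words.

```python
sequence = 'Rose is a rose is a rose'

first_only = sequence.replace('is', 'was', 1)  # replace at most one match
print(first_only)   # Rose was a rose is a rose
print(sequence)     # Rose is a rose is a rose (unchanged)

# 'is' is also found inside 'this'.
inside_word = 'this is it'.replace('is', 'was')
print(inside_word)  # thwas was it
```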


In [12]:
# Overwrite sequence with the result:
sequence = sequence.upper()
print(sequence)

ROSE IS A ROSE IS A ROSE


In [13]:
sequence = sequence.lower()
print(sequence)

rose is a rose is a rose


In [14]:
sequence = sequence.capitalize()
print(sequence)

Rose is a rose is a rose


#### Return the index (start) of a substring

In [15]:
filepath = 'data/wiki_selection.txt'
print(filepath.find('/'))
print(filepath.rfind('/')) # rfind returns the last match; same result here, as the path has only one '/'

4
4
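
A sketch of how `find()` and `rfind()` differ once a separator occurs more than once, and of what happens when nothing is found. The path `nested` is a made-up example.

```python
filepath = 'data/wiki_selection.txt'

missing = filepath.find('#')   # find() returns -1 if there is no match
print(missing)                 # -1

# Use the index of the last '/' to split directory and file name.
cut = filepath.rfind('/')
directory, filename = filepath[:cut], filepath[cut + 1:]
print(directory)  # data
print(filename)   # wiki_selection.txt

nested = 'data/raw/wiki_selection.txt'  # hypothetical path with two slashes
first, last = nested.find('/'), nested.rfind('/')
print(first, last)  # 4 8
```

The related method `index()` behaves like `find()` but raises a `ValueError` instead of returning -1.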


### Split a string

If we `split` a string into substrings, we'll receive a **list** of these substrings.

In [16]:
sequence = 'Rose is a rose is a rose'
print(type(sequence))

<class 'str'>


In [17]:
sequence = sequence.split()
print(type(sequence))
print(sequence)

<class 'list'>
['Rose', 'is', 'a', 'rose', 'is', 'a', 'rose']


The method `split()` without an argument splits at every run of whitespace (spaces, tabs, newlines). These **separators** / **delimiters** are not included in the result.<br>
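
Passing an explicit separator changes the behaviour: every single occurrence then counts as a split point, so empty strings can appear in the result. A sketch with a deliberately untidy string (the sample values are made up):

```python
messy = 'Rose  is\ta rose'   # two spaces, then a tab

by_whitespace = messy.split()     # runs of any whitespace are one separator
by_space = messy.split(' ')       # each single space is a separator
print(by_whitespace)  # ['Rose', 'is', 'a', 'rose']
print(by_space)       # ['Rose', '', 'is\ta', 'rose']

by_comma = 'a,b,,c'.split(',')
print(by_comma)       # ['a', 'b', '', 'c']
```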

Split a string of multiple lines into a list of single lines:

In [18]:
sequence = 'Rose is a rose is a rose'
lines = (sequence+'\n')*3
print(lines)
lines = lines.splitlines() # like lines.split('\n'), but without an empty string for the trailing newline
print(lines)

Rose is a rose is a rose
Rose is a rose is a rose
Rose is a rose is a rose

['Rose is a rose is a rose', 'Rose is a rose is a rose', 'Rose is a rose is a rose']
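
The difference between `splitlines()` and `split('\n')` shows up when the text ends with a newline or uses Windows line endings. A minimal sketch with made-up sample text:

```python
text = 'first\nsecond\n'

with_splitlines = text.splitlines()
with_split = text.split('\n')
print(with_splitlines)  # ['first', 'second']
print(with_split)       # ['first', 'second', '']

# splitlines() also understands '\r\n' line endings.
windows_lines = 'one\r\ntwo'.splitlines()
print(windows_lines)    # ['one', 'two']
```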


### Join a list to a string

We can use the string method `join()` to join the elements of a list back into one string. As it is a method of the class `str`, we have to call it on a string. This string (in the following example `' '`, a single space) is inserted between all elements of the list.

In [19]:
print(sequence)
sequence = sequence.split()
print(type(sequence))
print(sequence)

sequence = ' '.join(sequence)
print(sequence)
print(type(sequence))

Rose is a rose is a rose
<class 'list'>
['Rose', 'is', 'a', 'rose', 'is', 'a', 'rose']
Rose is a rose is a rose
<class 'str'>
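
`join()` only accepts strings, so other values have to be converted first. A short sketch with a made-up list of numbers:

```python
numbers = [1, 2, 3]

# Joining integers directly fails.
try:
    ', '.join(numbers)
except TypeError as error:
    print('TypeError:', error)

joined = ', '.join(str(n) for n in numbers)  # convert each element first
print(joined)   # 1, 2, 3

# A string is itself a sequence of characters, so it can be joined too.
dashed = '-'.join('rose')
print(dashed)   # r-o-s-e
```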


<div class="alert alert-box alert-info">
    Task: Split the string <code>Rose is a rose is a rose</code> into a list, using a separator of your choice.<br>
    Then join this list back into a string with a suitable separator, so that the result is <code>Rose is a rose is a rose</code> again.
</div>
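
One possible solution, given only as a sketch: the separator `' is '` is an arbitrary choice, and any separator works as long as the same string is used for splitting and joining.

```python
text = 'Rose is a rose is a rose'
separator = ' is '   # arbitrary choice for this example

parts = text.split(separator)
print(parts)             # ['Rose', 'a rose', 'a rose']

restored = separator.join(parts)
print(restored == text)  # True
```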

### String conditions

In [20]:
a = 'elephant'
a.endswith('ant')

True

In [21]:
a = 'elephant'
a.startswith('eleph')

True

In [22]:
a = 'elephant'
a.isalnum()

True

In [23]:
a = 'elephant'
print(a.isalpha())

True


In [24]:
a = '123'
print(a.isdigit())

True


In [25]:
a = '0.123'
print(a.isdecimal())

False


In [26]:
a = '10e-4'
print(a.isnumeric())

False
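
The three numeric checks differ only for unusual Unicode characters, and none of them accepts a sign, a decimal point, or an exponent. The sketch below compares them side by side and adds a hypothetical helper `is_float`, which is not a built-in, for testing whether a string can be parsed as a number.

```python
samples = ['123', '²', '½', '0.123', '10e-4', '-5']
checks = {}
for text in samples:
    checks[text] = (text.isdecimal(), text.isdigit(), text.isnumeric())
    print(repr(text), *checks[text])

def is_float(text):
    """Return True if float() can parse text (this includes 'inf' and 'nan')."""
    try:
        float(text)
        return True
    except ValueError:
        return False

print(is_float('0.123'), is_float('10e-4'), is_float('rose'))  # True True False
```

`isdecimal()` is the strictest of the three, `isdigit()` additionally accepts characters such as superscript digits, and `isnumeric()` also accepts fractions such as `½`.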


### Substrings

In [27]:
a = 'elephant'
'ant' in a

True

In [28]:
sequence = 'Rose is a rose is a rose'
sequence.count('ose')

3
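
Both `in` and `count()` are case-sensitive, and `count()` only counts non-overlapping matches. A brief sketch:

```python
sequence = 'Rose is a rose is a rose'

has_lower = 'rose' in sequence
has_upper = 'ROSE' in sequence
print(has_lower, has_upper)   # True False

exact = sequence.count('rose')              # 'Rose' is not counted
ignoring_case = sequence.lower().count('rose')
print(exact, ignoring_case)   # 2 3

overlapping = 'aaaa'.count('aa')  # matches may not overlap
print(overlapping)            # 2
```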

In [29]:
filename = 'ro.jpgses.jpg'
print(filename.removesuffix('.jpg')) # New in Python 3.9; older versions raise the AttributeError shown below.

AttributeError: 'str' object has no attribute 'removesuffix'

In [None]:
filename = 'ro.jpgses.jpg'
filename.replace('.jpg', '') # replaces both occurrences and returns 'roses'
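
On Python versions older than 3.9, the effect of `removesuffix()` can be approximated with `endswith()` and slicing. The helper name `strip_suffix` is made up for this sketch.

```python
def strip_suffix(text, suffix):
    """Remove suffix from the end of text at most once."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

stripped = strip_suffix('ro.jpgses.jpg', '.jpg')
untouched = strip_suffix('roses.png', '.jpg')
print(stripped)   # ro.jpgses
print(untouched)  # roses.png
```

The check for an empty `suffix` matters because `text[:-0]` would return an empty string.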

### String formatting

%-operator style

In [None]:
name = '🐍'
print('Hello, %s' % name)

<br>
.format style

In [None]:
name = '🐍'
print('Hello, {}'.format(name))

print('A rose is a {} is a {}'.format('rose', 'hose'))

print('A Rose is a {val1} is a {val2}'.format(val1='rose', val2='🌷'))
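
Two further variations of these styles, shown as a sketch with made-up values: `.format()` placeholders can be numbered so that an argument is reused, and the `%` operator takes a tuple when there is more than one value.

```python
template = '{0} is a {1} is a {0}'
numbered = template.format('rose', 'hose')   # {0} is used twice
print(numbered)   # rose is a hose is a rose

# %d formats an integer, %.2f a float with two decimal places.
percent = '%d roses cost %.2f' % (3, 4.5)
print(percent)    # 3 roses cost 4.50
```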

<br>
literal string interpolation (Python 3.6+)

In [None]:
print(f'Hello, {name}!')

# it's possible to embed Python expressions
a = ' is a rose'
print(f'A Rose{a * 2}.')
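
An f-string placeholder can also carry a format specification after a colon, which controls rounding, padding, and alignment. A short sketch with made-up values:

```python
price = 3.14159
count = 7

rounded = f'{price:.2f}'       # two decimal places
padded = f'{count:03d}'        # zero-padded to width 3
aligned = f'{"rose":>8}|'      # right-aligned in a field of width 8
share = f'{count / 3:.1%}'     # expression result formatted as a percentage

print(rounded)  # 3.14
print(padded)   # 007
print(aligned)  #     rose|
print(share)    # 233.3%
```

The same format specifications work in `.format()` placeholders, for example `'{:.2f}'.format(price)`.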