# String operations

by Koenraad De Smedt at UiB

---
Strings are immutable sequences of characters. There are many operations on strings which are useful for text processing. In this notebook you will learn the following basic kinds of operations.

1.   Taking ‘slices’ of strings
2.   Case conversion
3.   Performing tests on strings
4.   Joining, stripping, replacing and translating strings

A later notebook will introduce regular expressions for searching and replacing patterns in strings.

---


## Slicing by index

We can make new strings by taking *slices* out of strings, from a starting position to an ending position.

The following diagram shows all *indexes* where the string `'Digital'` can be sliced. A string always starts from index 0.

<img src="https://git.app.uib.no/desmedt/teaching/-/raw/main/Digital-slicing.png" alt = "slicing" width = 420px>

So, we can make a new string `'git'` by taking a slice from index 2 to index 5. This is written as `[2:5]`.

<img src="https://git.app.uib.no/desmedt/teaching/-/raw/main/Digital-slicing-25.png" alt = "slicing from 2 to 5" width = 420px>

In [None]:
word = 'Digital'
word[2:5]

If you omit the start index, the default is 0. If you omit the end index, the default is the end of the string.

In [None]:
print(word[:5])
print(word[5:])

To get a slice of one character, write the index where that character *starts*, without a colon.

In [None]:
word[0]

In [None]:
word[6]

After the last position, there is nothing. It is an error to ask for a single character at an index beyond the last character. (A slice written with a colon is more forgiving: it simply stops at the end of the string.)

In [None]:
word[7]
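The difference between a single index and a slice past the end can be sketched as follows. This is only an illustration; it redefines `word` so that it runs on its own, and it catches the error so that the cell finishes.

```python
word = 'Digital'

# Asking for the single character at index 7 raises an IndexError.
try:
    word[7]
    outcome = 'no error'
except IndexError as err:
    outcome = 'IndexError: ' + str(err)
print(outcome)

# A slice that runs past the end is simply cut off at the end of the string.
beyond = word[5:100]
print(beyond)            # al

# A slice that starts at or after the end is the empty string.
empty = word[7:]
print(repr(empty))       # ''
```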

Indexes may be negative, in which case they count from the end of the string.

In [None]:
word[-2:]

In [None]:
word[:-2]
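One way to read a negative index is as a distance from the end: index `-2` is the same position as `len(word) - 2`. The following small sketch (again redefining `word` so that it runs on its own) checks this.

```python
word = 'Digital'

# The last character can be reached with -1 or with len(word) - 1.
last_negative = word[-1]
last_positive = word[len(word) - 1]
print(last_negative, last_positive)       # l l

# A negative start index counts from the end.
print(word[-2:] == word[len(word) - 2:])  # True

# Start and end index may both be negative.
middle = word[-5:-2]
print(middle)                             # git
```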

## Case conversion methods

Datatypes may have *methods* associated with them. These are like functions, but they are written after an object, attached with a period, much like a suffix. Note the obligatory parentheses.

Strings have a method `.upper()` which returns a new version of the string with all capitals. The original string remains unchanged.

In [None]:
word.upper()

Likewise, `.lower()` returns a version of the string with small letters.

In [None]:
word.lower()

The original string is still there, unchanged.

In [None]:
word
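Because strings are immutable, a converted version must be bound to a name if we want to keep it. The sketch below uses a made-up variable name, `shouting`, and shows that an attempt to change a character in place fails.

```python
word = 'Digital'

# Keep the converted version under a new (hypothetical) name.
shouting = word.upper()
print(shouting, word)    # DIGITAL Digital

# Strings are immutable: assigning to a position is not allowed.
try:
    word[0] = 'd'
    changed = True
except TypeError as err:
    changed = False
    print('TypeError:', err)
```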

The `.title()` method returns a version of the string where the initial letter of every word is a capital and the remainder is lowercase.

In [None]:
'programming made EASY'.title()

In contrast, the `.capitalize()` method returns a version of the string where the initial letter of the string is a capital and the rest is lowercase.

In [None]:
'it was a dark and stormy night...'.capitalize()

Obviously, case conversion only has an effect for writing systems which have a case distinction. See the notebook on *Writing Systems* for details.
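As a small illustration with example strings that are not part of the original notebook: Chinese characters have no case, so conversion leaves them unchanged, while the German letter *ß* shows that converting to uppercase and back does not always restore the original.

```python
# Chinese characters have no uppercase or lowercase versions.
chinese = '数字'
print(chinese.upper() == chinese)   # True
print(chinese.lower() == chinese)   # True

# German ß becomes SS in uppercase, so the string gets longer.
street = 'straße'
upper_street = street.upper()
print(upper_street)                 # STRASSE

# Converting back gives ss, not ß.
print(upper_street.lower())         # strasse
```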

## Tests


Strings (and string slices) can be compared for equality with `==`. The comparison is case-sensitive, which means that capitals and small letters count as different.

In [None]:
word[:5] == 'Digit'

In [None]:
word[:5] == 'digit'

The `in` operator checks if a string is included in another string.

In [None]:
'sáme' in 'Davvisámegiella'

Diacritics make a difference.

In [None]:
'same' in 'Davvisámegiella'


Check if all *cased characters* in the string are uppercase and there is at least one cased character (that is, a character for which uppercase and lowercase versions exist).

In [None]:
print('MONTY PYTHON'.isupper())
print('UiB'.isupper())

Similarly for lowercase.

In [None]:
print('low, lower and lowest'.islower())
print('iPhone'.islower())

Check if all characters in a string are alphanumeric. Emojis, spaces, punctuation etc. are not alphanumeric.

In [None]:
print('Python 3'.isalnum())
print('😟🙂'.isalnum())
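A few more cases may make the behaviour of `.isalnum()` clearer. The test strings below are our own examples. Note that letters with diacritics count as alphanumeric, and that the empty string does not pass the test.

```python
# Letters and digits without a space in between are all alphanumeric.
print('Python3'.isalnum())   # True

# Letters with diacritics are letters too.
print('sáme'.isalnum())      # True

# A single punctuation mark is enough to fail the test.
print('UiB!'.isalnum())      # False

# The empty string has no characters at all, so the result is False.
print(''.isalnum())          # False
```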

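## Join
The `.join()` method is called on a separator string. It makes a new string by inserting that separator between all elements of the sequence given as argument. A string is itself a sequence of characters, so in the examples below the separator ends up between the letters.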

In [None]:
' & '.join('abcde')

Join with *newline*.

In [None]:
'\n'.join('abcde')

Printing actually renders the newlines as separate lines.

In [None]:
print('\n'.join('abcde'))
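The argument of `.join()` does not have to be a string. As a sketch, assuming a list of strings named `parts` (a made-up example), the separator is inserted between the list elements. Elements that are not strings must first be converted with `str`.

```python
# A hypothetical list of strings.
parts = ['Monty', 'Python', 'Flying', 'Circus']

spaced = ' '.join(parts)
print(spaced)      # Monty Python Flying Circus

# The separator may also be the empty string.
glued = ''.join(parts)
print(glued)       # MontyPythonFlyingCircus

# Numbers must be converted to strings before joining.
numbers = [1, 2, 3]
listed = ', '.join(str(n) for n in numbers)
print(listed)      # 1, 2, 3
```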

## Strip, replace and translate
The `.strip()` method makes a new string in which whitespace (spaces, tabs, newlines) is stripped from the beginning and end of the string. By the way, if a string spans several lines, it must be enclosed in *triple* quotes (single or double).

In [None]:
poemtitle = '''
     The Raven. A poem by Edgar Allan Poe.

'''

poemtitle.strip()

The `.replace()` method takes two arguments. It makes a new string in which each occurrence of a substring (first argument) is replaced by another string (second argument).

In [None]:
'Den hvite hvalen'.replace('hv', 'kv')

Operations can be stacked. The following first strips and then replaces.

In [None]:
print(poemtitle.strip().replace('Allan', 'A.'))
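Stacking works because every method returns a new string, on which the next method is then called. The following sketch does the same thing step by step; the intermediate names `stripped` and `shortened` are made up for the illustration.

```python
poemtitle = '''
     The Raven. A poem by Edgar Allan Poe.

'''

# Step 1: strip whitespace from both ends.
stripped = poemtitle.strip()

# Step 2: replace in the stripped string.
shortened = stripped.replace('Allan', 'A.')
print(shortened)   # The Raven. A poem by Edgar A. Poe.

# The stacked version gives the same result.
print(shortened == poemtitle.strip().replace('Allan', 'A.'))   # True

# The original string still has its surrounding whitespace.
print(poemtitle == stripped)   # False
```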

Sometimes it is useful to specify multiple replacements of single characters at the same time. For instance, in the [transcription from a DNA template to an RNA molecule](https://qph.fs.quoracdn.net/main-qimg-e00cf258b6278e83e599288017229563-c), the nucleotides are transcribed as follows: *A* → *U*, *C* → *G*, *T* → *A* and *G* → *C*. The following makes a translation table from two arguments: the characters to be translated and their translations in the same order.

In [None]:
dnatemplate = 'ATGTATAACGTGGCGTAAGCGTACGCTATAGCCTGA'
transtable = str.maketrans('ACTG','UGAC')
dnatemplate.translate(transtable)
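To see why it matters that the replacements happen at the same time, compare `.translate()` with a series of stacked `.replace()` calls on a short test string of our own. With stacked replacements, a character that has already been replaced can be replaced again by a later step.

```python
transtable = str.maketrans('ACTG', 'UGAC')
short = 'ATGC'

# All four replacements at once.
translated = short.translate(transtable)
print(translated)   # UACG

# One replacement after the other: the G that came from C
# is turned back into C by the last step.
stepwise = (short.replace('A', 'U')
                 .replace('C', 'G')
                 .replace('T', 'A')
                 .replace('G', 'C'))
print(stepwise)     # UACC
```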

## Type coercion

The `int` function coerces its argument to an integer, so that it can be used in mathematical operations.

In [None]:
int('1001 Nights'[:4]) * 2

The `str` function does the opposite: it coerces its argument to a string if possible.

In [None]:
str(1001 * 2) + ' Nights'
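Coercion is needed because Python does not mix strings and numbers on its own, and it only succeeds if the string really looks like a number. The sketch below catches the errors so that it runs to the end; the names `total` and `glued` are made up for the illustration.

```python
# A string with other characters than digits cannot be coerced to an integer.
try:
    int('1001 Nights')
    int_ok = True
except ValueError as err:
    int_ok = False
    print('ValueError:', err)

# A string and a number cannot be added without coercion.
try:
    '1001' + 1
    add_ok = True
except TypeError as err:
    add_ok = False
    print('TypeError:', err)

# With coercion, we choose between addition and concatenation.
total = int('1001') + 1
glued = '1001' + str(1)
print(total, glued)   # 1002 10011
```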

### Exercises:

1.   Check if `word[0]` gives the same result as `word[0:1]`. Explain why this is so.
2.   When may casefolding be useful and when not?
3.   How can you disregard case differences when using `in`?
4.   Test the methods `.isalpha()` and `.isdigit()` on strings to check if they are all alphabetical or all digits, respectively.
5.   Using the variable `nada` (below), make a string in which each word is on a separate line. Print the result.
6.   (optional) Translate a string using a [Caesar cipher](https://en.wikipedia.org/wiki/Caesar_cipher). How can you decode the resulting string?

In [None]:
nada = 'Nada se hace de la nada'