# String operations

by Koenraad De Smedt at UiB

---
There are many ways to access, manipulate and test strings. String operations are the basis of many kinds of text processing. This notebook introduces some operations on strings. A later notebook will introduce regular expressions for searching and replacing patterns in strings.

For some information on how strings in non-Latin scripts are handled in Python, see the *Note on Scripts*.

---


## Slicing by index

We can make new strings by taking *slices* out of strings, from a starting position to an ending position. 

The following diagram shows all *indexes* where the string `'Digital'` can be sliced. **A string always starts from index 0**.

<img src="https://git.app.uib.no/desmedt/teaching/-/raw/main/Digital-slicing.png" alt = "slicing" width = 420px>

So, in order to take the slice `'git'`, we need to start at index 2 and stop at index 5. This is written as `[2:5]`.

<img src="https://git.app.uib.no/desmedt/teaching/-/raw/main/Digital-slicing-25.png" alt = "slicing from 2 to 5" width = 420px>

In [None]:
name = 'Digital Humanities'
name[2:5]

If you omit the start index, the default is 0. If you omit the end index, the default is the end of the string.

In [None]:
print(name[:5])
print(name[5:])
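
A consequence of these defaults is that cutting a string at any index gives two slices which together make up the original again. The following is a small sketch using the same value of `name`; the variable names `head` and `tail` are only chosen for illustration.

```python
name = 'Digital Humanities'

# Cut the string in two at index 5.
head = name[:5]   # from the start up to (not including) index 5
tail = name[5:]   # from index 5 to the end

print(head)
print(tail)

# The two slices concatenate back into the original string.
print(head + tail == name)
```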

To get a slice with one character, write the index where that character starts, without a colon.

In [None]:
name[0]

In [None]:
name[6]

After the last position, there is nothing. Asking for the single character at an index beyond the last character raises an `IndexError`.

In [None]:
'Digital'[7]
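
As a side note, this error only concerns single indexes. The sketch below (with illustrative variable names `word`, `message` and `beyond`) catches the error and then shows that a slice with an end index beyond the string simply stops at the end.

```python
word = 'Digital'

# A single index beyond the last character raises an IndexError.
try:
    word[7]
except IndexError as error:
    message = str(error)
    print('IndexError:', message)

# A slice is more forgiving: an end index past the string is cut off at the end.
beyond = word[4:100]
print(beyond)
```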

Indexes may be negative, in which case they count from the end of the string.

In [None]:
name[-5:]

In [None]:
name[:-5]
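
Negative indexes also work for single characters and can be used at both ends of a slice. A minimal sketch, assuming the same value of `name` (the other variable names are only illustrative):

```python
name = 'Digital Humanities'

# -1 is the last character, -2 the one before it, and so on.
last = name[-1]
same = name[len(name) - 1]   # the same character, counted from the start
print(last, same)

# Both ends of a slice may be negative.
middle = name[-5:-2]
print(middle)
```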

Strings can be compared for equality with `==`. The comparison is case-sensitive, which means that capitals and small letters count as different characters.

In [None]:
name[:5] == 'Digit'

In [None]:
name[:5] == 'digit'

## Case conversion methods

Datatypes may have *methods* associated with them. These are like functions, but they are attached to an object with a period, like a suffix.

Strings have a method `.upper()` which converts a string to all capitals. This returns a *new* string while the original string remains unchanged.

In [None]:
name.upper()

Likewise, `.lower()` converts a string to small letters. These operations result in new strings; the original string does not change.

In [None]:
name.lower()
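
To see that the original string really stays the same, the result can be stored in a new variable and both can be printed. This is only a small sketch; the variable name `shouted` is made up for the example.

```python
name = 'Digital Humanities'

# .upper() returns a new string; it does not modify name.
shouted = name.upper()

print(shouted)
print(name)   # still in its original form
```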

The `.title()` method returns a version of the string where the initial letter of every word is a capital and the remainder is lowercase.

In [None]:
book = 'programming made EASY'
print(book.title())

In contrast, the `.capitalize()` method capitalizes only the first letter of the string while the rest is lowercase.

In [None]:
'it was a dark and stormy night.'.capitalize()
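
The difference between the two methods is perhaps easiest to see when both are applied to the same string. A short sketch, reusing the value of `book` from above (the result variables are only illustrative):

```python
book = 'programming made EASY'

titled = book.title()            # every word gets an initial capital
capitalized = book.capitalize()  # only the first letter of the string does

print(titled)        # Programming Made Easy
print(capitalized)   # Programming made easy
```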

Obviously, case conversion only has an effect for writing systems which have a case distinction. See also the notebook on writing systems (scripts).

## Tests

Check if a string is all uppercase or all lowercase.

In [None]:
print('NTNU'.isupper())
print('Python'.islower())
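
It may help to know that these tests only look at characters that have a case distinction. The sketch below, with made-up example strings, shows that digits and spaces are ignored, but that a string without any letters does not count as uppercase.

```python
# Digits and spaces have no case, so they do not affect the test.
with_digits = 'NTNU 2024'.isupper()
print(with_digits)

# A string without any cased characters is neither uppercase nor lowercase.
only_digits_upper = '2024'.isupper()
only_digits_lower = '2024'.islower()
print(only_digits_upper, only_digits_lower)
```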

Check if all characters in a string are alphanumeric.

In [None]:
print('Ferdinand'.isalnum())
print('Ferdinand de Saussure'.isalnum())
print('Python3'.isalnum())
print('😀'.isalnum())

Check if a string is included in another string.

In [None]:
'smør' in 'julesmørbrød med roastbiff'

In [None]:
'smör' in 'julesmørbrød med roastbiff'

## Join
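The `.join()` method makes a new string by inserting a given separator string between all elements of a sequence of strings. A string is itself a sequence of characters, so the separator ends up between its characters.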

In [None]:
' & '.join('abcde')

Join with *newline*.

In [None]:
'\n'.join('abcde')

Printing actually renders the newlines as separate lines.

In [None]:
print('\n'.join('abcde'))
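
The sequence does not have to be a single string. As a sketch, here is the same method applied to a list of strings; the list `cities` is invented for this example.

```python
# A hypothetical list of strings to be joined.
cities = ['Bergen', 'Oslo', 'Trondheim']

# The separator is placed between the elements, not before or after them.
listed = ', '.join(cities)
print(listed)   # Bergen, Oslo, Trondheim
```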

## Strip and replace
The `.strip()` method removes whitespace (including newlines) from the beginning and end of a string. By the way, if a string literal spans several lines, it must be enclosed in *triple* quotes.

In [None]:
'''
     Mål og meining.
'''.strip()
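
Whitespace inside the string is left alone. There are also variants that strip only one side, `.lstrip()` and `.rstrip()`. The following sketch uses a made-up variable `padded` and prints the results with `repr()` so that the remaining spaces are visible.

```python
padded = '   Mål og meining.   '

both = padded.strip()    # strips both ends
left = padded.lstrip()   # strips only the beginning
right = padded.rstrip()  # strips only the end

# repr() shows the quotes, which makes leftover spaces visible.
print(repr(both))
print(repr(left))
print(repr(right))
```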

Use the `.replace()` method to replace each occurrence of a substring by another string.

In [None]:
'Den hvite hvalen'.replace('hv', 'kv')
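
Like the case conversion methods, `.replace()` returns a new string. It also accepts an optional third argument that limits how many occurrences are replaced. A brief sketch with illustrative variable names:

```python
title = 'Den hvite hvalen'

# Without a count, every occurrence is replaced.
all_replaced = title.replace('hv', 'kv')
print(all_replaced)    # Den kvite kvalen

# With a count of 1, only the first occurrence is replaced.
first_only = title.replace('hv', 'kv', 1)
print(first_only)      # Den kvite hvalen

# The original string is unchanged.
print(title)
```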

## Type coercing
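The `str` function coerces its argument to a string if possible. In the cell below, compare `x + y`, which adds two numbers, with `str(x) + str(y)`, which concatenates their digits as text.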


In [None]:
x = 9 * 2
y = 9 ** 2
print(x + y)
str(x) + str(y)

Conversely, the `int` function coerces its argument to an integer, so that it can be used in mathematical operations.

In [None]:
int('39') + 1
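
This only works if the string actually looks like an integer. Otherwise, `int` raises a `ValueError`. The sketch below uses a made-up string `text` and catches the error instead of letting the cell fail.

```python
text = 'thirty-nine'   # a string that does not consist of digits

try:
    number = int(text)
except ValueError as error:
    # int() cannot interpret this string as an integer.
    number = None
    print('ValueError:', error)

print(number)
```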

### Exercises:

1.   Using the variable `name`, as defined above, write code to obtain the string `human`.
2.   When may casefolding be useful and when not?
3.   How can you disregard case differences when using `in`?
4.   Test the methods `.isalpha()` and `.isdigit()` on strings to check if they are all alphabetical or all digits, respectively.
5.   Using the following variable, make a string in which each word is on a separate line. Print the result.

> `s = 'A mushroom is not a plant'`