<img src="img/python-logo-notext.svg"
     style="display:block;margin:auto;width:10%"/>
<br>
<div style="text-align:center; font-size:200%;"><b>More Strings</b></div>
<br/>
<div style="text-align:center;">Dr. Matthias Hölzl</div>

# Comparison of strings

In [None]:
"a" == "a"

In [None]:
"A" == "a"

In [None]:
"A" < "B"

In [None]:
"A" < "a"

In [None]:
"a" < "A"

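Strings are compared lexicographically, character by character, using the Unicode code point of each character. All uppercase ASCII letters therefore sort before all lowercase ones, which differs from the order in a dictionary.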
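
As a small illustration of why `"A" < "a"` holds, the following sketch looks at code points with the built-in `ord()`. The helper `code_points` is a hypothetical name introduced only for this example.

```python
# Code points decide the order: uppercase ASCII letters come before lowercase ones.
code_point_upper = ord("A")
code_point_lower = ord("a")


def code_points(text):
    """Return the list of Unicode code points of a string (illustrative helper)."""
    return [ord(char) for char in text]


# Comparing the code point lists gives the same result as comparing the strings.
pairs = [("A", "a"), ("a", "A"), ("ab", "abc"), ("ab", "ac")]
same_as_builtin = all(
    (left < right) == (code_points(left) < code_points(right))
    for left, right in pairs
)
print(code_point_upper, code_point_lower, same_as_builtin)
```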

In [None]:
"ab" < "abc"

In [None]:
"ab" < "ac"

In [None]:
"ab" != "ac"

## Aside: sorting lists/iterables

Iterables can be sorted with the `sorted()` function, which returns a new list.
With the named argument `key`, a function can be specified that is applied to
each element; the elements are then ordered by the values this function returns:

In [None]:
numbers = [3, 8, -7, 1, 0, 2, -3, 3]

In [None]:
sorted(numbers)

In [None]:
sorted(numbers, key=abs)

In [None]:
strings = ["a", "ABC", "xy", "Asdfgh", "foo", "bar", "quux"]

In [None]:
sorted(strings)

In [None]:
def lower(my_string):
    return my_string.lower()

In [None]:
sorted(strings, key=lower)

In [None]:
sorted(strings, key=len)
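
Two variations on the `key` argument, as a sketch: the method `str.lower` can be passed directly instead of a wrapper function, and a key function may return a tuple to sort by several criteria. The function name `length_then_alphabetical` is made up for this example.

```python
strings = ["a", "ABC", "xy", "Asdfgh", "foo", "bar", "quux"]

# The method str.lower can serve directly as the key function.
case_insensitive = sorted(strings, key=str.lower)


# A tuple key sorts by length first and breaks ties alphabetically, ignoring case.
def length_then_alphabetical(text):
    return (len(text), text.lower())


by_length = sorted(strings, key=length_then_alphabetical)
print(case_insensitive)
print(by_length)
```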

## Comparison of Unicode strings

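The built-in comparison operators compare strings by Unicode code point, which gives a useful order only for simple (ASCII) strings. For strings containing other Unicode characters, sorting and comparing in a language-appropriate order is more difficult.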

The standard-library module for locale-aware comparison in Python is `locale`; it relies on the locale settings of the
operating system:

In [None]:
import locale

locale.getlocale()

In [None]:
my_strings = ["o", "oa", "oe", "ö", "oz", "sa", "s", "ß", "ss", "sz"]

In [None]:
sorted(my_strings)

In [None]:
sorted(my_strings, key=locale.strxfrm)

In [None]:
locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")

In [None]:
sorted(my_strings, key=locale.strxfrm)

In [None]:
locale.setlocale(locale.LC_COLLATE, "C")

In [None]:
sorted(my_strings, key=locale.strxfrm)

The locale settings are global for the whole process and are therefore mainly
suitable for interacting with the user.
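
Because the setting is global, it can help to change it only temporarily and restore it afterwards. The following is a minimal sketch, not an established API: `temporary_collation` is a hypothetical helper built from `locale.setlocale()` and `contextlib.contextmanager`. It uses the `"C"` locale because that one is always available.

```python
import locale
from contextlib import contextmanager


@contextmanager
def temporary_collation(locale_name):
    """Switch LC_COLLATE inside a with-block, then restore the old setting."""
    # Without a second argument, setlocale() only queries the current setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        yield
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


setting_before = locale.setlocale(locale.LC_COLLATE)
with temporary_collation("C"):
    sorted_in_c = sorted(["sz", "o", "sa"], key=locale.strxfrm)
setting_after = locale.setlocale(locale.LC_COLLATE)
print(sorted_in_c, setting_before == setting_after)
```

Note that this still changes the setting for the whole process while the block runs, so it is not safe to use from several threads at once.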

If you have to deal with strings in several languages, consider
using a library such as `PyUCA` (written in Python and therefore
easier to install) or `PyICU` (a more complete implementation of the
Unicode specification, based on a C++ library).

## Converting a string to uppercase/lowercase

In [None]:
text = "This is a text"
print(text.lower())
print(text)

In [None]:
"This is a text".upper()

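The `lower()` method does not always perform the desired conversion, for example
when strings should be compared regardless of case.
The `casefold()` method converts more aggressively and is often the better choice for such comparisons: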

In [None]:
s1 = "daß er sehe"
s1.upper()

In [None]:
s1.lower()

In [None]:
s1.casefold()
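
A typical use of `casefold()` is a comparison that ignores case. The sketch below assumes a hypothetical helper named `equals_ignoring_case`; the example strings are chosen to show where `lower()` falls short.

```python
def equals_ignoring_case(left, right):
    """Compare two strings without regard to case (illustrative helper)."""
    return left.casefold() == right.casefold()


# lower() keeps the "ß", so the two spellings do not match.
matches_with_lower = "Straße".lower() == "STRASSE".lower()

# casefold() turns "ß" into "ss", so they do match.
matches_with_casefold = equals_ignoring_case("Straße", "STRASSE")
print(matches_with_lower, matches_with_casefold)
```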

## Mini workshop

- Notebook `workshop_070_more_strings_and_sorting`
- Section "Shout"

# String literals (again)

- String literals are enclosed in single or double quotes
    - `"Hello, world!"`
    - `'Hello world!'`
    - It doesn't matter which form you choose, unless the string
      itself contains single or double quotes
    - `"He says 'Huh?'"`
    - `'She replies: "Exactly."'`

- String literals can contain Unicode characters:
    - `"おはようございます"`
    - `"😠🙃🙄"`

In [None]:
print("He says 'Huh?'")
print('She replies: "Exactly."')
print("おはようございます")
print("😠🙃🙄")

- Special characters can be specified with *escape notation*:
    - `\n`, `\t`, `\\`, `\"`, `\'`, ...
    - `\u`, `\U` for Unicode code points (4 or 8 hex digits, i.e. 16 or 32 bit)
    - `\N{...}` for Unicode characters specified by name

In [None]:
print("a\tbc\td\n123\t4\t5")

In [None]:
print('"Let\'s go crazy", she said')

In [None]:
print("C:\\Users\\John")

In [None]:
print("\u0394 \u03b1 \t\U000003b2 \U000003b3")
print("\U0001F62E \U0001f61a \U0001f630")

In [None]:
print("\N{GREEK CAPITAL LETTER DELTA} \N{GREEK SMALL LETTER ALPHA}")
print("\N{smiling face with open mouth and smiling eyes} \N{winking face}")
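
The different escape notations are only different spellings of the same characters. A short sketch to check this; all variable names are chosen for this example only.

```python
# Four ways to write the same one-character string.
delta_literal = "Δ"
delta_short = "\u0394"  # 4 hex digits
delta_long = "\U00000394"  # 8 hex digits
delta_named = "\N{GREEK CAPITAL LETTER DELTA}"
all_equal = delta_literal == delta_short == delta_long == delta_named

# Each escape sequence stands for exactly one character in the string.
lengths = (len("\n"), len("\t"), len("\\"), len("\u0394"))
print(all_equal, lengths)
```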

- String literals can also be enclosed in triple quotes
- This type of literal can span multiple lines

In [None]:
"""This is
a string literal
that spans
multiple lines."""

In [None]:
print(
    """With a backslash at the end of the line \
the line break can be suppressed."""
)
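
To make the effect of the line breaks visible, this sketch compares triple-quoted literals with their single-line equivalents. The variable names are chosen for this example.

```python
# A line break inside a triple-quoted literal becomes a "\n" in the string.
two_lines = """first line
second line"""

# A backslash at the end of the line removes that line break from the string.
one_line = """first part \
second part"""

print(two_lines == "first line\nsecond line")
print(one_line == "first part second part")
```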

## Concatenation of strings

Strings can be concatenated with `+`:

In [None]:
"A" + " " + "string"

The `join()` method can be used to join multiple strings with a delimiter:

In [None]:
" ".join(["these", "are", "several", "strings"])

In [None]:
", ".join(["horse", "cat", "dog"])

In [None]:
"".join(["ab", "cde", "f"])

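One detail worth knowing is that `join()` accepts only strings, so other values have to be converted first, for example with `str()`. The sketch below also shows `split()` as the counterpart of `join()`; the sample data is made up.

```python
numbers = [3, 8, -7]

# join() raises a TypeError for non-string items, so convert each one first.
joined_numbers = ", ".join(str(number) for number in numbers)

# split() with the same delimiter reverses join().
animals = ["horse", "cat", "dog"]
round_trip = ", ".join(animals).split(", ")
print(joined_numbers)
print(round_trip == animals)
```
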
## Mini workshop

- Notebook `workshop_070_more_strings_and_sorting`
- Section "Welcome 1"

## String comparison (again)

To compare strings containing Unicode characters, it is useful to first
convert them to a Unicode normal form.

In [None]:
s1 = "café"
s2 = "cafe\u0301"

In [None]:
print(s1, s2)
s1 == s2

In [None]:
import unicodedata

unicodedata.normalize("NFC", s1) == s1

In [None]:
unicodedata.normalize("NFC", s2) == s1

In [None]:
unicodedata.normalize("NFD", s1) == s2
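
The normalization step can be wrapped in a small comparison function. This is a minimal sketch; `equal_after_normalization` is a hypothetical helper, and NFC as the default form is an assumption made for the example.

```python
import unicodedata


def equal_after_normalization(left, right, form="NFC"):
    """Compare two strings after converting both to the same normal form."""
    normalized_left = unicodedata.normalize(form, left)
    normalized_right = unicodedata.normalize(form, right)
    return normalized_left == normalized_right


composed = "caf\u00e9"  # "é" as a single code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent
print(len(composed), len(decomposed))
print(composed == decomposed)
print(equal_after_normalization(composed, decomposed))
```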

# String interpolation: f-strings

Python can insert the values of variables and expressions into strings:

In [None]:
name = "Hans"
zahl = 12
f"Hello, {name}, the number is {zahl + 1}"

In [None]:
spieler_name = "Hans"
anzahl_spiele = 10
anzahl_gewinne = 2

ausgabe = f"Hello {spieler_name}!\nYou have played {anzahl_spiele} times and won {anzahl_gewinne} times."
print(ausgabe)

In [None]:
ausgabe = f"""\
Hello {spieler_name}!
You have played {anzahl_spiele} times \
and won {anzahl_gewinne} times.\
"""
print(ausgabe)

In [None]:
ausgabe = (
    f"Hello {spieler_name}!\n"
    f"You have played {anzahl_spiele} times "
    f"and won {anzahl_gewinne} times."
)
print(ausgabe)

## Mini workshop

- Notebook `workshop_070_more_strings_and_sorting`
- Section "Welcome 2"

## Mini workshop

- Notebook `workshop_070_more_strings_and_sorting`
- Section "Pirates 4"

## Finding substrings in strings

The `in` operator also works with strings as operands. To find the index
of a substring in a string you can use the `index()` method, which raises
a `ValueError` if the substring does not occur.

In [None]:
"a" in "abc"

In [None]:
"x" not in "abc"

In [None]:
"bc" in "abc"

In [None]:
"cb" in "abc"

In [None]:
"Halloween".index("Hallo")

In [None]:
"Halloween".index("we")

In [None]:
# "Team".index("I")  # raises ValueError: substring not found
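
If a missing substring is an expected case, the `ValueError` can be caught, or the `find()` method can be used, which returns `-1` instead of raising. A minimal sketch; `find_or_none` is a hypothetical helper name.

```python
def find_or_none(text, substring):
    """Return the index of substring in text, or None if it does not occur."""
    try:
        return text.index(substring)
    except ValueError:
        return None


found = find_or_none("Halloween", "we")
missing = find_or_none("Team", "I")

# find() signals a missing substring with -1 instead of an exception.
missing_with_find = "Team".find("I")
print(found, missing, missing_with_find)
```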

## Workshop

- Notebook `lecture_920x_Workshop_Caesar_Encryption`