Strings
=====================

Introduction
------------

A string can be considered as a **container** of characters.

Each *character* has **two indices** (one positive and one negative).

This object is essential because it is how a program represents and manipulates text. It also comes with dedicated methods for working with that text.
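As a minimal sketch of the two indices carried by each character, the example below reads the same characters with a positive and a negative index. The sample word and the variable names are arbitrary choices for illustration.

```python
word = "Python"

# Positive indices count from the left, starting at 0.
first = word[0]
last = word[len(word) - 1]

# Negative indices count from the right, starting at -1.
last_negative = word[-1]
first_negative = word[-len(word)]

# Print every character with both of its indices.
for i, char in enumerate(word):
    print(i, i - len(word), char)
```

For a string of length `n`, the character at positive index `i` is also reachable at negative index `i - n`.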

Syntax Considerations
--

A string is delimited by either a **"** or a **'**.

In [None]:
print('Ceci est une nouvelle chaîne de caractère')

A string literal can span multiple source lines if each line ends with **\\** (the backslash joins the lines without creating a line break in the string).

In [None]:
print("Chaine de caractère\nNouvelle ligne. \
Ceci n'est pas une nouvelle ligne")

Adjacent string literals are concatenated automatically, so a string can also be written across multiple lines like this:

In [None]:
print("Chaine de caractère\nNouvelle ligne. "
      "Ceci n'est pas une nouvelle ligne")

In [None]:
c = "a" "b"
print(c)

Alternatively, a string delimited by **"""** or **'''** can span multiple lines; in that case the line breaks are part of the string.

In [None]:
print("""Ceci est
une chaîne de caractères
sur plusieurs lignes""")
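The difference between the multi-line forms can be checked with `repr`. The sketch below assumes two arbitrary sample strings: a triple-quoted string keeps its line breaks, while adjacent literals are simply glued together.

```python
# A triple-quoted string keeps the line break typed in the source.
triple = """first line
second line"""

# Adjacent literals inside parentheses are concatenated with no line break.
joined = ("first line "
          "second line")

print(repr(triple))
print(repr(joined))

line_count_triple = len(triple.splitlines())
line_count_joined = len(joined.splitlines())
```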

Manipulation Methods
--------------------


In [None]:
dir("")

The `dir("")` call above lists every attribute and method of a string. Some of them are especially useful for manipulating text:

* the **count** and **index** methods are similar to those of containers;
* the **+** operator allows concatenation and the **\*** operator allows repetition;
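The **count** and **index** methods mentioned above can be sketched as follows. The sample word is an arbitrary choice, and the comparison with `find` is an addition for illustration.

```python
text = "abracadabra"

occurrences = text.count("a")           # number of non-overlapping occurrences
first_position = text.index("c")        # index of the first occurrence
substring_position = text.index("bra")  # substrings work as well

# index raises ValueError when the substring is missing;
# find returns -1 instead.
try:
    text.index("z")
except ValueError:
    missing = True
else:
    missing = False

not_found = text.find("z")
```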

In [None]:
"xXx" + "O"

In [None]:
"xXx" * 8

* the **startswith** method allows you to check if the string *starts* with a substring;
* the **endswith** method allows you to check if the string *ends* with a substring;

In [None]:
"Ceci est une phrase".startswith("Ceci est")

In [None]:
"Ceci est une phrase".endswith("un mot")

* the **split** method allows you to split a string using a separator;
* the **join** method does the opposite and is called on the glue string.

In [None]:
"Voici des mots".split(" ")

In [None]:
"_".join(["a", "b", "c"])

* the **lower** method converts the string to lowercase;
* the **upper** method converts the string to uppercase;
* the **swapcase** method swaps lowercase and uppercase letters;
* the **title** method capitalizes the first letter of each word and makes the rest lowercase;
* the **capitalize** method capitalizes the first letter of the string and makes the rest lowercase;

In [None]:
'tEsT 42 tESt'.lower()

In [None]:
'tEsT 42 tESt'.upper()

In [None]:
'tEsT 42 tESt'.swapcase()

In [None]:
'tEsT 42 tESt'.title()

In [None]:
'tEsT 42 tESt'.capitalize()

Formatting Methods
---------------------

Several methods pad, align, or trim a string; the cells below show a selection of them:

In [None]:
'test'.center(30)

In [None]:
'test'.rjust(30)

In [None]:
"42".zfill(4)

In [None]:
'  -*- test -*- test -*-  '.strip()

In [None]:
import string
string.whitespace

In [None]:
print("truc\rautre chose")

In [None]:
print("deux choses\rtrois")

In [None]:
help("".rfind)

In [None]:
'  -*- test -*- test -*-  '.strip(' -*')
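One point worth spelling out about the cell above: the argument of `strip` is treated as a set of characters, not as a prefix or suffix. The sketch below reuses the same sample string; the variable names are only illustrative.

```python
decorated = "  -*- test -*- test -*-  "

# Characters are removed from each end until one is found
# that does not belong to the given set.
both_sides = decorated.strip(" -*")
left_only = decorated.lstrip(" -*")
right_only = decorated.rstrip(" -*")

# The order of the characters in the argument does not matter.
same_result = decorated.strip("*- ")
```

The decoration in the middle of the string is left untouched, because stripping only affects the two ends.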

In [None]:
'  -*- test -*- test -*-  '.replace('*', '').replace('-', '').strip().replace("  ", " ")

In [None]:
import string
print(dir(string))

In [None]:
'  \t\x0b test -*- test\n'.strip(string.whitespace)

In [None]:
print(string.printable)

String Formatting
--

### Purpose

In [None]:
def afficher_resultat(resultat):
    print("Le résultat de l'opération est %s." % resultat)

In [None]:
afficher_resultat(42)

String formatting makes it possible to build strings dynamically from values known only at run time.

### Using the modulo operator

Here are the details of the syntax of the modulo operator:

In [None]:
'%s' % 'test'

In [None]:
'%s' % 42

In [None]:
"j'ai le %s et le %s !!" % (4, 2)

In [None]:
"%(decimal)s%(unite)s = %(decimal)s * 10 + %(unite)s" % {"decimal": 4, "unite": 2}

In [None]:
"%d" % 42.94

In [None]:
"%f" % 42

In [None]:
"Le résultat est : %6.2f" % 42

In [None]:
"%06.2f" % 42

In [None]:
"%-6.2f" % 42

In [None]:
"%+6.2f" % 42

In [None]:
"%6.2f" % 4264263.123456

Note that the modulo operator works like **sprintf** in C, with some additional features such as named placeholders. There is also the `format` method, whose brace-based syntax is similar to that of `std::format` in C++20.
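To make the comparison concrete, here is a small sketch that produces the same output with the modulo operator and with `format`. The values and variable names are arbitrary choices for illustration.

```python
value = 42.94

# printf-style formatting: zero padding, width 8, 3 decimals.
with_modulo = "%08.3f" % value

# The same specification written for the format method.
with_format = "{:08.3f}".format(value)

# Named placeholders take a mapping with %, keyword arguments with format.
named_modulo = "%(x)d + %(y)d" % {"x": 4, "y": 2}
named_format = "{x} + {y}".format(x=4, y=2)
```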

### `format` Method

In [None]:
"ceci est le {} du {}".format(4, 2)

In [None]:
"{1} {0}".format(4, 2)

In [None]:
"{1} {0} ({1})".format(4, 2)

In [None]:
"{a} {b}".format(a=4, b=2)

In [None]:
"{0:.2f}".format(42.345)

In [None]:
l = [1, 2, 3]
"{l[0]}".format(l=l)

In [None]:
"{l.append}".format(l=l)

In [None]:
d = {"a": 1, "b": 2}
"{d[a]}".format(d=d)

In [None]:
d = {"a": 1, "b": 2}
"{0[a]}".format(d)

In [None]:
name = 'Fred'
age = 42
f'He said his name is {name} and he is {age} years old.'
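The last cell uses an f-string (available from Python 3.6), which evaluates the expressions placed between braces. As a hedged sketch with made-up values, the example below shows that an f-string accepts the same format specifications as the `format` method.

```python
name = "Fred"
age = 42
price = 42.345

# Any expression can appear between the braces.
sentence = f"{name} will be {age + 1} next year."

# The part after the colon follows the same rules as with format.
rounded = f"{price:.2f}"
same_as_format = "{0:.2f}".format(price)

# Width and alignment work too: right-align in a field of 6 characters.
padded = f"{age:>6}"
```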

---

Character Manipulation
--

In [None]:
s = "String OF Characters"

In [None]:
s.lower()

In [None]:
print(s)

In [None]:
s = s.lower()

```mermaid
flowchart LR

S -.->|Old Pointer| M1[Memory area of 'String OF Characters']
S -->|New Pointer| M2[Memory area of 'string of characters']
```

A string is an **immutable** object. This means that its memory area cannot be modified.

Modifying a string actually creates a new one in a different memory area.

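Therefore, to keep the result of a transformation, you must reassign it to the variable, as in `s = s.lower()`. Do not carry this habit over to **mutable** objects: their in-place methods, such as `list.sort`, return `None`, so reassigning the result would lose the object.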
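The behaviour described above can be observed directly. This is a minimal sketch: the use of `id` to compare objects and the list example are additions for illustration, not part of the original cells.

```python
s = "String OF Characters"
original_id = id(s)

# Item assignment is not supported on strings.
try:
    s[0] = "s"
except TypeError:
    item_assignment_failed = True
else:
    item_assignment_failed = False

# lower() returns a new object; the name must be rebound to keep it.
s = s.lower()
is_new_object = id(s) != original_id

# A list is mutable: sort() works in place and returns None,
# so writing numbers = numbers.sort() would lose the list.
numbers = [3, 1, 2]
result = numbers.sort()
```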

Use Cases and Performance Considerations
--

In [None]:
def count_words1(sentence):
    return len(sentence.split())

In [None]:
def count_words2(sentence):
    return sentence.count(" ") + 1

In [None]:
count_words1("Ceci est une phrase avec sept mots")

In [None]:
count_words2("Ceci est une phrase avec sept mots")
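The two functions agree on the sentence above, but they do not make the same assumptions about their input. The sketch below repeats both definitions so that it runs on its own, and adds a hypothetical input with irregular whitespace to show where they diverge.

```python
def count_words1(sentence):
    # split() without arguments treats any run of whitespace as one separator.
    return len(sentence.split())


def count_words2(sentence):
    # Counts space characters, so it assumes exactly one space
    # between words and none at either end.
    return sentence.count(" ") + 1


clean = "Ceci est une phrase avec sept mots"
# Hypothetical input: leading and trailing spaces, a double space, a tab.
messy = "  Ceci est  une phrase\tavec sept mots "

print(count_words1(clean), count_words2(clean))
print(count_words1(messy), count_words2(messy))
```

A speed comparison between the two is therefore only meaningful for inputs on which both return the same answer.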

In [None]:
# from timeit import timeit
%timeit count_words1("Ceci est une phrase avec sept mots")

In [None]:
%timeit count_words2("Ceci est une phrase avec sept mots")

In [None]:
def trunc_sentence1(sentence):
    return " ".join(sentence.split()[1:-1])

In [None]:
trunc_sentence1("Ceci est une phrase avec sept mots")

In [None]:
def trunc_sentence2(sentence):
    nb_espaces = sentence.count(" ")
    m = M = sentence.index(" ")
    for i in range(nb_espaces - 1):
        M = sentence.index(" ", M + 1)
    return sentence[m+1:M]

In [None]:
trunc_sentence2("Ceci est une phrase avec sept mots")
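As a possible third variant, the loop in `trunc_sentence2` can be replaced by a search from the right with `rindex`. The function name `trunc_sentence3` is hypothetical and not part of the original notebook.

```python
def trunc_sentence3(sentence):
    # Hypothetical variant: locate the first and the last space directly.
    first_space = sentence.index(" ")   # end of the first word
    last_space = sentence.rindex(" ")   # start of the last word
    return sentence[first_space + 1:last_space]


shortened = trunc_sentence3("Ceci est une phrase avec sept mots")
print(shortened)
```

Like `trunc_sentence2`, this sketch raises `ValueError` when the sentence contains no space, whereas `trunc_sentence1` returns an empty string in that case.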

In [None]:
%timeit trunc_sentence1("Ceci est une phrase avec sept mots")

In [None]:
%timeit trunc_sentence2("Ceci est une phrase avec sept mots")
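The `%timeit` magic only exists in IPython and Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard `timeit` module, shown here for the two word-counting functions. The number of repetitions is an arbitrary choice, and the timings depend on the machine.

```python
from timeit import timeit


def count_words1(sentence):
    return len(sentence.split())


def count_words2(sentence):
    return sentence.count(" ") + 1


sentence = "Ceci est une phrase avec sept mots"

# timeit accepts a callable; number is how many times it is executed,
# and the return value is the total elapsed time in seconds.
time1 = timeit(lambda: count_words1(sentence), number=100000)
time2 = timeit(lambda: count_words2(sentence), number=100000)

print(f"count_words1: {time1:.4f} s for 100000 calls")
print(f"count_words2: {time2:.4f} s for 100000 calls")
```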

---