Strings
=====================

Introduction
------------

A string can be considered as a **container** of characters.

Each *character* has **two indices** (one positive and one negative).
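
As a minimal sketch of the two indices, the example below uses a sample word (`word` is just an illustrative name) and shows that every character can be reached either from the left (starting at 0) or from the right (starting at -1).

```python
word = "Python"

# Positive indices count from the left, starting at 0.
first = word[0]
# Negative indices count from the right, starting at -1.
last = word[-1]

# For a character at positive index i, the negative index is i - len(word).
pairs = [(i, i - len(word), word[i]) for i in range(len(word))]

print(first, last)   # P n
for positive, negative, char in pairs:
    # Both indices designate the same character.
    print(positive, negative, char, word[positive] == word[negative])
```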

This object is essential because it allows text to be manipulated, and it provides dedicated methods for doing so.

Syntax Considerations
--

A string is delimited by either double quotes (**"**) or single quotes (**'**).

In [1]:
print('Ceci est une nouvelle chaîne de caractère')

Ceci est une nouvelle chaîne de caractère


A string literal can continue on the next line of source code if the line ends with **\\** (the backslash does not insert a line break into the string).

In [2]:
print("Chaine de caractère\nNouvelle ligne. \
Ceci n'est pas une nouvelle ligne")

Chaine de caractère
Nouvelle ligne. Ceci n'est pas une nouvelle ligne


Adjacent string literals are concatenated automatically, so a string can also be written across multiple lines like this:

In [3]:
print("Chaine de caractère\nNouvelle ligne. "
      "Ceci n'est pas une nouvelle ligne")

Chaine de caractère
Nouvelle ligne. Ceci n'est pas une nouvelle ligne


In [4]:
c = "a" "b"
print(c)

ab


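A string can also span multiple lines if it is delimited by triple quotes (**"""** or **'''**). In that case, the line breaks are part of the string.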

In [5]:
print("""Ceci est
une chaîne de caractères
sur plusieurs lignes""")

Ceci est
une chaîne de caractères
sur plusieurs lignes


Manipulation Methods
--------------------


In [6]:
dir("")

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'stri

The listing above shows the attributes and methods available on a string. The most commonly used ones are described below:

* the **count** and **index** methods are similar to those of containers;
* the **+** operator allows concatenation and the **\*** operator allows repetition;

In [7]:
"xXx" + "O"

'xXxO'

In [8]:
"xXx" * 8

'xXxxXxxXxxXxxXxxXxxXxxXx'
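
The `count` and `index` methods mentioned above are not demonstrated in the cells, so here is a small sketch on a sample sentence. The variable names are illustrative only.

```python
sentence = "Ceci est une phrase"

# count returns the number of non-overlapping occurrences of a substring.
nb_e = sentence.count("e")

# index returns the position of the first occurrence of a substring...
position = sentence.index("est")

# ...and raises ValueError when the substring is absent.
try:
    sentence.index("mot")
    found = True
except ValueError:
    found = False

print(nb_e, position, found)   # 4 5 False
```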

* the **startswith** method allows you to check if the string *starts* with a substring;
* the **endswith** method allows you to check if the string *ends* with a substring;

In [9]:
"Ceci est une phrase".startswith("Ceci est")

True

In [10]:
"Ceci est une phrase".endswith("un mot")

False

* the **split** method allows you to split a string using a separator;
* the **join** method does the opposite and is called on the glue string.

In [11]:
"Voici des mots".split(" ")

['Voici', 'des', 'mots']

In [12]:
"_".join(["a", "b", "c"])

'a_b_c'
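
One detail worth illustrating: `split(" ")` and `split()` do not behave the same way when the text contains consecutive spaces. The sketch below uses a sample text with irregular spacing to show the difference, then rebuilds a clean string with `join`.

```python
text = "Voici  des   mots "

# With an explicit separator, every single space splits the string,
# so consecutive spaces produce empty strings in the result.
explicit = text.split(" ")

# Without an argument, any run of whitespace is one separator,
# and leading or trailing whitespace is ignored.
default = text.split()

# join is called on the separator and glues the pieces back together.
rebuilt = " ".join(default)

print(explicit)   # ['Voici', '', 'des', '', '', 'mots', '']
print(default)    # ['Voici', 'des', 'mots']
print(rebuilt)    # Voici des mots
```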

* the **lower** method converts the string to lowercase;
* the **upper** method converts the string to uppercase;
* the **swapcase** method swaps lowercase and uppercase letters;
* the **title** method capitalizes the first letter of each word and makes the rest lowercase;
* the **capitalize** method capitalizes the first letter of the string and makes the rest lowercase;

In [1]:
'tEsT 42 tESt'.lower()

'test 42 test'

In [2]:
'tEsT 42 tESt'.upper()

'TEST 42 TEST'

In [3]:
'tEsT 42 tESt'.swapcase()

'TeSt 42 TesT'

In [4]:
'tEsT 42 tESt'.title()

'Test 42 Test'

In [5]:
'tEsT 42 tESt'.capitalize()

'Test 42 test'

Formatting Methods
---------------------

Here is a selection of the methods used to align, pad, and clean up strings:

In [6]:
'test'.center(30)

'             test             '

In [21]:
'test'.rjust(30)

'                          test'

In [22]:
"42".zfill(4)

'0042'

In [23]:
'  -*- test -*- test -*-  '.strip()

'-*- test -*- test -*-'

In [24]:
import string
string.whitespace

' \t\n\r\x0b\x0c'

In [25]:
print("truc\rautre chose")

autre chose


In [26]:
print("deux choses\rtrois")

troischoses


In [27]:
help("".rfind)

Help on built-in function rfind:

rfind(...) method of builtins.str instance
    S.rfind(sub[, start[, end]]) -> int
    
    Return the highest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.
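
To make the help text above concrete, here is a brief sketch comparing `find` and `rfind` on a sample string, including the optional `start` and `end` arguments. The variable names are illustrative.

```python
sentence = "test -*- test -*-"

# find returns the lowest index, rfind the highest one.
first = sentence.find("test")
last = sentence.rfind("test")

# Both return -1 (instead of raising an exception) when nothing is found.
missing = sentence.rfind("word")

# start and end restrict the search as a slice would: here the search
# is limited to sentence[0:last], which excludes the last occurrence.
previous = sentence.rfind("test", 0, last)

print(first, last, missing, previous)   # 0 9 -1 0
```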



In [28]:
'  -*- test -*- test -*-  '.strip(' -*')

'test -*- test'
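
The argument of `strip` is easy to misread: it is treated as a set of characters to remove from both ends, not as a prefix or suffix. The sketch below illustrates this, along with `lstrip` and `rstrip`. It also assumes Python 3.9 or later for `removeprefix`, which appears in the `dir("")` listing above.

```python
raw = "  -*- test -*- test -*-  "

# The order of the characters in the argument does not matter.
stripped = raw.strip(" -*")
same = raw.strip("*- ")

# lstrip and rstrip apply the same rule to one end only.
left_only = raw.lstrip(" -*")
right_only = raw.rstrip(" -*")

# strip removes any of the given characters from both ends,
# whereas removeprefix removes one exact leading substring.
as_set = "abcba".strip("ab")
as_prefix = "abcba".removeprefix("ab")

print(repr(stripped), stripped == same)
print(repr(left_only))
print(repr(right_only))
print(as_set, as_prefix)   # c cba
```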

In [29]:
'  -*- test -*- test -*-  '.replace('*', '').replace('-', '').strip().replace("  ", " ")

'test test'

In [30]:
import string
print(dir(string))

['Formatter', 'Template', '_ChainMap', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_re', '_sentinel_dict', '_string', 'ascii_letters', 'ascii_lowercase', 'ascii_uppercase', 'capwords', 'digits', 'hexdigits', 'octdigits', 'printable', 'punctuation', 'whitespace']


In [31]:
string.whitespace

' \t\n\r\x0b\x0c'

In [32]:
'  \t\x0b test -*- test\n'.strip(string.whitespace)

'test -*- test'

In [33]:
print(string.printable)

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ 	



String Formatting
--

### Purpose

In [None]:
def afficher_resultat(resultat):
    print("Le résultat de l'opération est %s." % resultat)

In [None]:
afficher_resultat(42)

Formatting makes it possible to build strings dynamically from values that are only known at run time.

### Using the modulo operator

The following examples detail the syntax of the modulo operator:

In [7]:
'%s' % 'test'

'test'

In [8]:
'%s' % 42

'42'

In [9]:
"j'ai le %s et le %s !!" % (4, 2)

"j'ai le 4 et le 2 !!"

In [10]:
"%(decimal)s%(unite)s = %(decimal)s * 10 + %(unite)s" % {"decimal": 4, "unite": 2}

'42 = 4 * 10 + 2'

In [11]:
"%d" % 42.94

'42'

In [12]:
"%f" % 42

'42.000000'

In [13]:
"Le résultat est : %6.2f" % 42

'Le résultat est :  42.00'

In [14]:
"%06.2f" % 42

'042.00'

In [15]:
"%-6.2f" % 42

'42.00 '

In [16]:
"%+6.2f" % 42

'+42.00'

In [17]:
"%6.2f" % 4264263.123456

'4264263.12'

Note that the modulo operator works like **sprintf** in C, with some additional features. There is also the `format` method, which is similar to the one used to format strings in C++.

### `format` Method

In [18]:
"ceci est le {} du {}".format(4, 2)

'ceci est le 4 du 2'

In [19]:
"{1} {0}".format(4, 2)

'2 4'

In [20]:
"{1} {0} ({1})".format(4, 2)

'2 4 (2)'

In [21]:
"{a} {b}".format(a=4, b=2)

'4 2'

In [22]:
"{0:.2f}".format(42.345)

'42.34'

In [23]:
l = [1, 2, 3]
"{l[0]}".format(l=l)

'1'

In [24]:
"{l.append}".format(l=l)

'<built-in method append of list object at 0x7fc1b4157940>'

In [25]:
d = {"a": 1, "b": 2}
"{d[a]}".format(d=d)

'1'

In [26]:
d = {"a": 1, "b": 2}
"{0[a]}".format(d)

'1'

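Since Python 3.6, formatted string literals (f-strings) offer the same substitutions directly inside a literal prefixed with `f`:

In [27]: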
name = 'Fred'
age = 42
f'He said his name is {name} and he is {age} years old.'

'He said his name is Fred and he is 42 years old.'
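
The three formatting styles shown in this section accept very similar format specifications. As a rough sketch, the example below applies the same width and precision to a sample value with each style. The value and variable names are chosen for illustration.

```python
value = 3.14159

# Width 8, two decimals, in each of the three styles.
with_modulo = "%8.2f" % value
with_format = "{:8.2f}".format(value)
with_fstring = f"{value:8.2f}"

# Zero padding uses the same flag in all three.
padded_modulo = "%06.2f" % value
padded_format = "{:06.2f}".format(value)
padded_fstring = f"{value:06.2f}"

print(repr(with_modulo), repr(with_format), repr(with_fstring))
print(padded_modulo, padded_format, padded_fstring)   # 003.14 003.14 003.14
```

Some flags differ between styles: left alignment is written `%-8.2f` with the modulo operator but `{:<8.2f}` with `format` and f-strings.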

---

Character Manipulation
--

In [40]:
s = "String OF Characters"

In [41]:
s.lower()

'string of characters'

In [42]:
print(s)

String OF Characters


In [43]:
s = s.lower()

```mermaid
flowchart LR

S -.->|Old Pointer| M1[Memory area of 'String OF Characters']
S -->|New Pointer| M2[Memory area of 'string of characters']
```

A string is an **immutable** object. This means that its memory area cannot be modified.

Modifying a string actually creates a new one in a different memory area.

Therefore, to keep the result of a modification, you must reassign it to the variable. Avoid this pattern with **mutable** objects: their methods usually modify the object in place and return `None`, so reassigning would lose the object.
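
The following sketch illustrates immutability under simple assumptions: item assignment on a string fails, `lower()` returns a new object while the original stays unchanged, and a list method such as `sort()` works in place and returns `None`. The names `original`, `numbers`, and `result` are illustrative.

```python
s = "String OF Characters"
original = s

# Item assignment is not supported on strings.
assignment_failed = False
try:
    s[0] = "s"
except TypeError:
    assignment_failed = True

# lower() builds a new string; the original object is left untouched.
s = s.lower()
print(original)        # String OF Characters
print(s)               # string of characters
print(s is original)   # False

# A list is mutable: sort() modifies it in place and returns None,
# so writing "numbers = numbers.sort()" would discard the list.
numbers = [3, 1, 2]
result = numbers.sort()
print(numbers, result)   # [1, 2, 3] None
```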

Use Cases and Performance Considerations
--

In [None]:
def count_words1(sentence):
    return len(sentence.split())

In [None]:
def count_words2(sentence):
    return sentence.count(" ") + 1

In [None]:
count_words1("Ceci est une phrase avec sept mots")

In [None]:
count_words2("Ceci est une phrase avec sept mots")

In [None]:
# from timeit import timeit
%timeit count_words1("Ceci est une phrase avec sept mots")

In [None]:
%timeit count_words2("Ceci est une phrase avec sept mots")
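
Speed is not the only difference between the two functions. As a hedged illustration, the sketch below repeats their definitions and runs them on a few sample inputs (chosen here for demonstration) where they disagree: `count_words2` counts spaces, so it only matches `count_words1` when words are separated by exactly one space and the sentence has no leading or trailing space.

```python
def count_words1(sentence):
    # Splits on runs of whitespace, then counts the pieces.
    return len(sentence.split())


def count_words2(sentence):
    # Counts spaces and assumes one more word than spaces.
    return sentence.count(" ") + 1


samples = [
    "Ceci est une phrase avec sept mots",  # regular spacing
    "Deux  espaces",                       # two consecutive spaces
    " espace au début",                    # leading space
    "",                                    # empty string
]

results = [(count_words1(s), count_words2(s)) for s in samples]
for sample, (n1, n2) in zip(samples, results):
    print(repr(sample), n1, n2)
```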

In [None]:
def trunc_sentence1(sentence):
    return " ".join(sentence.split()[1:-1])

In [None]:
trunc_sentence1("Ceci est une phrase avec sept mots")

In [None]:
def trunc_sentence2(sentence):
    nb_espaces = sentence.count(" ")
    m = M = sentence.index(" ")
    for i in range(nb_espaces - 1):
        M = sentence.index(" ", M + 1)
    return sentence[m+1:M]

In [None]:
trunc_sentence2("Ceci est une phrase avec sept mots")

In [None]:
%timeit trunc_sentence1("Ceci est une phrase avec sept mots")

In [None]:
%timeit trunc_sentence2("Ceci est une phrase avec sept mots")
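
The `%timeit` magic only works in IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard `timeit` module. The example below also introduces `trunc_sentence3`, a hypothetical variant that is not part of the original notebook: it uses `find` and `rfind` to locate the first and last spaces directly, assuming the words are separated by single spaces.

```python
from timeit import timeit


def trunc_sentence1(sentence):
    return " ".join(sentence.split()[1:-1])


def trunc_sentence3(sentence):
    # Hypothetical variant: slice between the first and the last space.
    return sentence[sentence.find(" ") + 1:sentence.rfind(" ")]


sentence = "Ceci est une phrase avec sept mots"

# Check that both functions agree on this input before timing them.
assert trunc_sentence1(sentence) == trunc_sentence3(sentence)

timings = {}
for function in (trunc_sentence1, trunc_sentence3):
    # Total time in seconds for 100000 calls; values vary by machine.
    timings[function.__name__] = timeit(lambda: function(sentence), number=100_000)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s for 100000 calls")
```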

---