# Strings

In the section on Basic Data Types in Python, you learned how to define **strings**: objects that contain sequences of character data. Processing character data is integral to programming. It is a rare application that doesn’t need to manipulate strings at least to some extent.

## String Manipulation

The sections below highlight the operators, methods, and functions that are available for working with strings.

### String Operators

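You have already seen the operators `+` and `*` applied to numeric operands in the section on Operators and Expressions in Python. These two operators can be applied to strings as well. The `+` operator concatenates strings: it returns a new string consisting of the operands joined together.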

In [1]:
s = 'foo'
t = 'bar'
u = 'baz'

In [2]:
s + t

'foobar'

In [3]:
s + t + u

'foobarbaz'

In [101]:
print('Go team' + '!!!')

Go team!!!
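As a small sketch of one limitation: `+` joins strings only with other strings. Mixing a string and a number raises a `TypeError`, so the number has to be converted first with `str()`, which is described later in this section. The names `count` and `message` are illustrative only.

```python
count = 3

# '+' does not implicitly convert the integer to a string.
try:
    message = 'Go team x' + count
except TypeError:
    # Convert explicitly, then concatenate.
    message = 'Go team x' + str(count)

print(message)  # Go team x3
```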


### The * Operator

The `*` operator creates multiple copies of a string. If `s` is a string and `n` is an integer, either of the following expressions returns a string consisting of `n` concatenated copies of `s`:

```python
s * n
n * s
```

Here are examples of both forms:

In [106]:
s = 'foo.'

In [107]:
s * 4

'foo.foo.foo.foo.'

In [109]:
4 * s

'foo.foo.foo.foo.'

The multiplier operand `n` must be an integer. You’d think it would be required to be a positive integer, but amusingly, it can be zero or negative, in which case the result is an empty string:

In [9]:
'foo' * -8

''

If you were to create a string variable and initialize it to the empty string by assigning it the value `'foo' * -8`, anyone would rightly think you were a bit daft. But it would work.
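A more practical use of `*` is building a repeated pattern of a known width. The sketch below assumes a hypothetical `title` string and also shows that a float multiplier is rejected, even when it holds a whole number.

```python
title = 'Report'

# Build an underline that is exactly as long as the title.
underline = '-' * len(title)
print(title)
print(underline)

# The multiplier must be an integer; a float raises TypeError.
try:
    'foo' * 2.0
    float_allowed = True
except TypeError:
    float_allowed = False

print(float_allowed)  # False
```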

### The in Operator

Python also provides a membership operator that can be used with strings. The `in` operator returns `True` if the first operand is contained within the second, and `False` otherwise:

In [10]:
s = 'foo'

In [11]:
s in 'That\'s food for thought.'

True

In [12]:
s in 'That\'s good for now.'

False

There is also a `not in` operator, which does the opposite:

In [119]:
'z' not in 'abc'

True

In [118]:
'z' not in 'xyz'

False
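Two details of `in` are easy to overlook, sketched here with an illustrative `text` variable: the comparison is case-sensitive, and the empty string counts as a substring of every string. Lowercasing with `.lower()` (covered later under Case Conversion) is one simple way to make the test ignore case.

```python
text = "That's food for thought."

# Membership tests compare characters exactly, so case matters.
exact_match = 'Food' in text             # False: no capital 'F' in text
folded_match = 'food' in text.lower()    # True once the text is lowercased

# The empty string is contained in every string.
empty_match = '' in text                 # True

print(exact_match, folded_match, empty_match)
```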

### Built-in String Functions

As you saw in the tutorial on Basic Data Types in Python, Python provides many functions that are built into the interpreter and always available. Here are a few that work with strings:


|Function | Description|
|:--|:--|
|`chr()` | Converts an integer to a character |
|`ord()` | Converts a character to an integer |
|`len()` | Returns the length of a string |
|`str()` | Returns a string representation of an object |

These are explored more fully below.

In [126]:
ord('?')

63

#### `ord(c)`

> Returns an integer value for the given character.

At the most basic level, computers store all information as numbers. To represent character data, a translation scheme is used which maps each character to its representative number.

The simplest scheme in common use is called [ASCII](https://en.wikipedia.org/wiki/ASCII). It covers the common Latin characters you are probably most accustomed to working with. For these characters, `ord(c)` returns the ASCII value for character `c`:

In [15]:
ord('a')

97

In [16]:
ord('#')

35

ASCII is fine as far as it goes. But there are many different languages in use in the world and countless symbols and glyphs that appear in digital media. The full set of characters that potentially may need to be represented in computer code far surpasses the ordinary Latin letters, numbers, and symbols you usually see.

[Unicode](http://www.unicode.org/standard/WhatIsUnicode.html) is an ambitious standard that attempts to provide a numeric code for every possible character, in every possible language, on every possible platform. Python 3 supports Unicode extensively, including allowing Unicode characters within strings.

As long as you stay in the domain of the common characters, there is little practical difference between ASCII and Unicode. But the `ord()` function will return numeric values for [Unicode characters](https://realpython.com/courses/python-unicode/) as well:

In [127]:
ord('€')

8364

In [128]:
ord('∑')

8721

#### `chr(n)`

> Returns a character value for the given integer.

`chr()` does the reverse of `ord()`. Given a numeric value `n`, `chr(n)` returns a string representing the character that corresponds to `n`:

In [133]:
chr(97)

'a'

In [134]:
chr(35)

'#'

`chr()` handles Unicode characters as well:

In [135]:
chr(8364)

'€'

In [136]:
chr(8721)

'∑'
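Because `ord()` and `chr()` are inverses, they can be combined to do arithmetic on characters. The following is a minimal sketch, not a general-purpose routine: `shift_letter` is a hypothetical helper that assumes its input is a single lowercase ASCII letter.

```python
def shift_letter(c, offset):
    """Shift a lowercase ASCII letter by offset positions, wrapping around."""
    base = ord('a')
    return chr(base + (ord(c) - base + offset) % 26)

print(shift_letter('a', 1))   # b
print(shift_letter('z', 1))   # a

# Converting to a code point and back returns the original character.
print(chr(ord('€')) == '€')   # True
```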

#### `len(s)`

> Returns the length of a string.

`len(s)` returns the number of characters in `s`:

In [142]:
s = 'I am a string.'
len(s)

14
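Note that `len()` counts characters (Unicode code points), not bytes. The sketch below uses `str.encode()`, which is not covered in this section, purely to illustrate the difference for a string containing a non-ASCII character.

```python
s = '€uro'

num_chars = len(s)                   # 4 characters
num_bytes = len(s.encode('utf-8'))   # 6 bytes: '€' occupies 3 bytes in UTF-8

print(num_chars, num_bytes)
```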

#### `str(obj)`

> Returns a string representation of an object.

Virtually any object in Python can be rendered as a string. `str(obj)` returns the string representation of object `obj`:

In [24]:
str(49.2)

'49.2'

In [25]:
str(3+4j)

'(3+4j)'

In [26]:
str(3 + 29)

'32'

In [146]:
str('foo')

'foo'

## String Indexing

Often in programming languages, individual items in an ordered set of data can be accessed directly using a numeric index or key value. This process is referred to as indexing.

In Python, strings are ordered sequences of character data, and thus can be indexed in this way. Individual characters in a string can be accessed by specifying the string name followed by a number in square brackets (`[]`).

String indexing in Python is zero-based: the first character in the string has index 0, the next has index 1, and so on. The index of the last character will be the length of the string minus one.

For example, a schematic diagram of the indices of the string `'foobar'` would look like this:

<img src="./images/string-indexing.png" alt="string indexing" width=300 align="center" />

The individual characters can be accessed by index as follows:

In [28]:
s = 'foobar'

In [29]:
s[0]

'f'

In [30]:
s[1]

'o'

In [31]:
s[3]

'b'

In [32]:
len(s)

6

In [33]:
s[len(s)-1]

'r'

Attempting to index beyond the end of the string results in an error:

In [35]:
s[6]

IndexError: string index out of range

String indices can also be specified with negative numbers, in which case indexing occurs from the end of the string backward: `-1` refers to the last character, `-2` the second-to-last character, and so on. Here is the same diagram showing both the positive and negative indices into the string `'foobar'`:

<img src="./images/string-indexing-negative.png" alt="negative string indexing" width=300 align="center" />

Here are some examples of negative indexing:

In [36]:
s = 'foobar'

In [37]:
s[-1]

'r'

In [38]:
s[-2]

'a'

In [39]:
len(s)

6

In [40]:
s[-len(s)]

'f'

Attempting to index with negative numbers beyond the start of the string results in an error:

In [42]:
s[-7]

IndexError: string index out of range

For any non-empty string `s`, `s[len(s)-1]` and `s[-1]` both return the last character. There isn’t any index that makes sense for an empty string.
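The relationship between positive and negative indices can be checked directly: index `-k` refers to the same character as index `len(s) - k`. The sketch also includes `last_char`, a hypothetical helper showing one way to guard against the empty-string case.

```python
s = 'foobar'

# Each negative index -k names the same character as len(s) - k.
pairs_match = all(s[-k] == s[len(s) - k] for k in range(1, len(s) + 1))
print(pairs_match)  # True

def last_char(text):
    """Return the last character of text, or '' if text is empty."""
    return text[-1] if text else ''

print(last_char('foobar'))  # r
```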

## String Slicing

Python also allows a form of indexing syntax that extracts substrings from a string, known as string slicing. If `s` is a string, an expression of the form `s[m:n]` returns the portion of `s` starting with position `m`, and up to but not including position `n`:

In [43]:
s = 'foobar'

In [44]:
s[2:5]

'oba'

> **Remember:** String indices are zero-based. The first character in a string has index `0`. This applies to both standard indexing and slicing.

Again, the second index specifies the first character that is not included in the result: the character `'r'` (`s[5]`) in the example above. That may seem slightly unintuitive, but it has a convenient consequence: the expression `s[m:n]` returns a substring that is `n - m` characters long, in this case `5 - 2 = 3`.

If you omit the first index, the slice starts at the beginning of the string. Thus, `s[:m]` and `s[0:m]` are equivalent:

In [45]:
s = 'foobar'

In [46]:
s[:4]

'foob'

In [47]:
s[0:4]

'foob'

Similarly, if you omit the second index as in `s[n:]`, the slice extends from the first index through the end of the string. This is a nice, concise alternative to the more cumbersome `s[n:len(s)]`:

In [48]:
s = 'foobar'

In [49]:
s[2:]

'obar'

In [50]:
s[2:len(s)]

'obar'

For any string `s` and any integer `n` (`0 ≤ n ≤ len(s)`), `s[:n] + s[n:]` will be equal to `s`:

In [51]:
s = 'foobar'

In [52]:
s[:4] + s[4:]

'foobar'

In [53]:
s[:4] + s[4:] == s

True
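The identity can be verified at every split point with a short loop. The sketch below also shows a related behavior: unlike plain indexing, a slice whose bounds fall outside the string does not raise an error. The bounds are clamped to the string instead.

```python
s = 'foobar'

# The split-and-rejoin identity holds at every position n.
identity_holds = all(s[:n] + s[n:] == s for n in range(len(s) + 1))
print(identity_holds)  # True

# s[10] would raise IndexError, but out-of-range slice bounds are clamped.
clamped = s[2:100]
past_end = s[10:]
print(repr(clamped))   # 'obar'
print(repr(past_end))  # ''
```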

Omitting both indices returns the original string in its entirety. In CPython, the standard Python implementation, the result is not a copy but a reference to the original string object:

In [54]:
s = 'foobar'
t = s[:]

In [55]:
id(s)

140167800080304

In [56]:
id(t)

140167800080304

In [57]:
s is t

True

If the first index in a slice is greater than or equal to the second index, Python returns an empty string. This is yet another obfuscated way to generate an empty string, in case you were looking for one:

In [60]:
s[2:2]

''

In [61]:
s[4:2]

''

Negative indices can be used with slicing as well. `-1` refers to the last character, `-2` the second-to-last, and so on, just as with simple indexing. The diagram below shows how to slice the substring `'oob'` from the string `'foobar'` using both positive and negative indices:

<img src="./images/string-slicing.png" alt="string slicing" width=300 align="center" />

Here is the corresponding Python code:

In [62]:
s = 'foobar'

In [63]:
s[-5:-2]

'oob'

In [64]:
s[1:4]

'oob'

In [65]:
s[-5:-2] == s[1:4]

True
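Negative indices are most useful when combined with an omitted bound, since they let you slice relative to the end of a string without calling `len()`. A brief sketch, with illustrative variable names:

```python
s = 'foobar'

last_three = s[-3:]     # the final three characters
all_but_last = s[:-1]   # everything except the final character

print(last_three)       # bar
print(all_but_last)     # fooba
```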

### Specifying a Stride in a String Slice

There is one more variant of the slicing syntax to discuss. Adding an additional `:` and a third index designates a stride (also called a step), which indicates how many characters to jump after retrieving each character in the slice.

For example, for the string `'foobar'`, the slice `0:6:2` starts with the first character and ends with the last character (the whole string), and every second character is skipped. This is shown in the following diagram:

<img src="./images/string-slicing-stride.png" alt="string slicing with a stride" width=300 align="center" />

Similarly, `1:6:2` specifies a slice starting with the second character (index 1) and ending with the last character, and again the stride value 2 causes every other character to be skipped:

<img src="./images/string-slicing-stride-2.png" alt="string slicing with a stride" width=300 align="center" />

The illustrative REPL code is shown here:

In [66]:
s = 'foobar'

In [67]:
s[0:6:2]

'foa'

In [68]:
s[1:6:2]

'obr'

As with any slicing, the first and second indices can be omitted, and default to the first and last characters respectively:

In [69]:
s = '12345' * 5
s

'1234512345123451234512345'

In [70]:
s[::5]

'11111'

In [71]:
s[4::5]

'55555'

You can specify a negative stride value as well, in which case Python steps backward through the string. In that case, the starting/first index should be greater than the ending/second index:

In [73]:
s = 'foobar'

In [74]:
s[5:0:-2]

'rbo'

In the above example, `5:0:-2` means “start at the last character and step backward by 2, up to but not including the first character.”

When you are stepping backward, if the first and second indices are omitted, the defaults are reversed in an intuitive way: the first index defaults to the end of the string, and the second index defaults to the beginning. Here is an example:

In [79]:
s = '12345' * 5
s

'1234512345123451234512345'

In [80]:
s[::-5]

'55555'

This is a common paradigm for reversing a string:

In [81]:
s = 'If Comrade Napoleon says it, it must be right.'
s[::-1]

'.thgir eb tsum ti ,ti syas noelopaN edarmoC fI'
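One place the reversal idiom shows up is in palindrome checks. The function below is a minimal sketch under a simplifying assumption: it compares the string exactly as given, so differences in case, spacing, or punctuation count as mismatches. `is_palindrome` is a hypothetical helper name.

```python
def is_palindrome(text):
    """Return True if text reads the same forward and backward."""
    return text == text[::-1]

print(is_palindrome('racecar'))  # True
print(is_palindrome('foobar'))   # False
```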

### Interpolating Variables Into a String

In Python version 3.6, a new string formatting mechanism was introduced. This feature is formally named the Formatted String Literal, but is more usually referred to by its nickname **f-string**.

One simple feature of f-strings you can start using right away is variable interpolation. You can specify a variable name directly within an f-string literal, and Python will replace the name with the corresponding value.

For example, suppose you want to display the result of an arithmetic calculation. You can do this with a straightforward `print()` call, separating numeric values and string literals by commas:

In [82]:
n = 20
m = 25
prod = n * m

In [83]:
print('The product of', n, 'and', m, 'is', prod)

The product of 20 and 25 is 500


But this is cumbersome. To accomplish the same thing using an f-string:

- Specify either a lowercase `f` or uppercase `F` directly before the opening quote of the string literal. This tells Python it is an f-string instead of a standard string.
- Specify any variables to be interpolated in curly braces (`{}`).

Recast using an f-string, the above example looks much cleaner:

In [84]:
n = 20
m = 25
prod = n * m

In [85]:
print(f'The product of {n} and {m} is {prod}')

The product of 20 and 25 is 500


Any of Python’s three quoting mechanisms can be used to define an f-string:

In [86]:
var = 'Bark'

In [87]:
print(f'A dog says {var}!')

A dog says Bark!


In [88]:
print(f"A dog says {var}!")

A dog says Bark!


In [90]:
print(f'''A dog says {var}!''')

A dog says Bark!
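The curly braces in an f-string are not limited to bare variable names. They accept expressions as well, so the intermediate `prod` variable from the earlier example is optional. A short sketch:

```python
n = 20
m = 25

# The expression n * m is evaluated when the f-string is created.
message = f'The product of {n} and {m} is {n * m}'
print(message)  # The product of 20 and 25 is 500
```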


### Modifying Strings

In a nutshell, you can’t. Strings are one of the data types Python considers immutable, meaning not able to be changed. In fact, all the data types you have seen so far are immutable. (Python does provide data types that are mutable, as you will soon see.)

A statement like this will cause an error:

In [91]:
s = 'foobar'

In [92]:
s[3] = 'x'

TypeError: 'str' object does not support item assignment

In truth, there really isn’t much need to modify strings. You can usually accomplish what you want by generating a copy of the original string that has the desired change in place. There are many ways to do this in Python. Here is one possibility:

In [93]:
s = s[:3] + 'x' + s[4:]

In [94]:
s

'fooxar'

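There is also a built-in string method to accomplish this. Note that `.replace()` substitutes every occurrence of the target substring, not just the one at a particular index. It gives the same result here only because `'b'` appears once: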

In [95]:
s = 'foobar'
s = s.replace('b', 'x')

In [96]:
s

'fooxar'
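The difference between the two approaches becomes visible when the character occurs more than once. The sketch below contrasts `.replace()` with and without its optional count argument against slicing by position. In every case the original string is left untouched.

```python
s = 'foobar'

all_replaced = s.replace('o', 'x')     # every 'o' is replaced
first_only = s.replace('o', 'x', 1)    # at most one replacement
by_index = s[:1] + 'x' + s[2:]         # replace exactly the character at index 1

print(all_replaced)  # fxxbar
print(first_only)    # fxobar
print(by_index)      # fxobar
print(s)             # foobar (unchanged)
```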

Read on for more information about built-in string methods!

### Built-in String Methods

You learned in the section on Variables in Python that Python is a highly object-oriented language. Every item of data in a Python program is an object.

You are also familiar with functions: callable procedures that you can invoke to perform specific tasks.

Methods are similar to functions. A method is a specialized type of callable procedure that is tightly associated with an object. Like a function, a method is called to perform a distinct task, but it is invoked on a specific object and has knowledge of its target object during execution.

The syntax for invoking a method on an object is as follows:

```python
obj.foo(<args>)
```

This invokes method `.foo()` on object `obj`. `<args>` specifies the arguments passed to the method (if any).
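As a concrete sketch of this syntax, here is the string method `.upper()` invoked on a string object. The second form, which calls the method through the `str` type and passes the object explicitly, is shown only to illustrate that the method "knows" its target. The first form is the one you would normally write.

```python
s = 'foobar'

via_method = s.upper()       # usual form: method invoked on the object
via_type = str.upper(s)      # same method, with the target passed explicitly

print(via_method)            # FOOBAR
print(via_method == via_type)  # True
```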

You will explore much more about defining and calling methods later in the discussion of object-oriented programming. For now, the goal is to present some of the more commonly used built-in methods Python supports for operating on string objects.

In the following method definitions, arguments specified in square brackets (`[]`) are optional.

#### Case Conversion

Methods in this group perform case conversion on the target string.

`s.capitalize()`
> Capitalizes the target string.

`s.capitalize()` returns a copy of `s` with the first character converted to uppercase and all other characters converted to lowercase:

In [1]:
s = 'foO BaR BAZ quX'
s.capitalize()

'Foo bar baz qux'

Non-alphabetic characters are unchanged:

In [2]:
s = 'foo123#BAR#.'
s.capitalize()

'Foo123#bar#.'

`s.lower()`
> Converts alphabetic characters to lowercase.

`s.lower()` returns a copy of `s` with all alphabetic characters converted to lowercase:

In [3]:
'FOO Bar 123 baz qUX'.lower()

'foo bar 123 baz qux'

`s.swapcase()`
> Swaps case of alphabetic characters.

`s.swapcase()` returns a copy of `s` with uppercase alphabetic characters converted to lowercase and vice versa:

In [4]:
'FOO Bar 123 baz qUX'.swapcase()

'foo bAR 123 BAZ Qux'

`s.title()`
> Converts the target string to “title case.”

`s.title()` returns a copy of `s` in which the first letter of each word is converted to uppercase and remaining letters are lowercase:

In [5]:
'the sun also rises'.title()

'The Sun Also Rises'

This method uses a fairly simple algorithm. It does not attempt to distinguish between important and unimportant words, and it does not handle apostrophes, possessives, or acronyms gracefully:

In [6]:
"what's happened to ted's IBM stock?".title()

"What'S Happened To Ted'S Ibm Stock?"

`s.upper()`
> Converts alphabetic characters to uppercase.

`s.upper()` returns a copy of `s` with all alphabetic characters converted to uppercase:

In [7]:
'FOO Bar 123 baz qUX'.upper()

'FOO BAR 123 BAZ QUX'
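A common use of the case conversion methods is comparing strings without regard to case. The sketch below normalizes both sides with `.lower()` and confirms that, because strings are immutable, the originals are not modified. The variable names are illustrative.

```python
a = 'Foo Bar'
b = 'FOO bar'

exact_equal = a == b                        # False: case differs
equal_ignoring_case = a.lower() == b.lower()  # True after normalizing

print(exact_equal, equal_ignoring_case)
print(a)  # Foo Bar (the original string is unchanged)
```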

#### Find and Replace

These methods provide various means of searching the target string for a specified substring.

Each method in this group supports optional `<start>` and `<end>` arguments. These are interpreted as for string slicing: the action of the method is restricted to the portion of the target string starting at character position `<start>` and proceeding up to but not including character position `<end>`. If `<start>` is specified but `<end>` is not, the method applies to the portion of the target string from `<start>` through the end of the string.

`s.count(<sub>[, <start>[, <end>]])`
> Counts occurrences of a substring in the target string.

`s.count(<sub>)` returns the number of non-overlapping occurrences of substring `<sub>` in `s`:

In [9]:
'foo goo moo'.count('oo')

3

The count is restricted to the number of occurrences within the substring indicated by `<start>` and `<end>`, if they are specified:

In [10]:
'foo goo moo'.count('oo', 0, 8)

2
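Two points about `.count()` can be checked with a short sketch: "non-overlapping" means each character is used in at most one match, and passing `<start>` and `<end>` gives the same answer as counting within the corresponding slice.

```python
# 'aa' could be matched at positions 0, 1, and 2, but only the
# non-overlapping matches at 0 and 2 are counted.
non_overlapping = 'aaaa'.count('aa')
print(non_overlapping)  # 2

s = 'foo goo moo'
with_bounds = s.count('oo', 0, 8)
with_slice = s[0:8].count('oo')
print(with_bounds == with_slice)  # True
```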