# Strings and Character Data in Python

## String Manipulation

### String Operators

#### The `+` Operator

In [183]:
s = 'foo'
t = 'bar'
u = 'baz'

s + t

s + t + u

'foobarbaz'

In [None]:
s[:] == s

In [3]:
print('Go team' + '!!!')

Go team!!!


#### The `*` Operator

In [5]:
# multiplying a string by an integer repeats the string (the multiplier must be an integer)
s = 'foo.'
s * 4

'foo.foo.foo.foo.'

In [10]:
# this way also works
4 * s

'foo.foo.foo.foo.'

In [8]:
# the integer can be zero or negative
# the result is an empty string
'foo' * -8

''
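
As a small sketch of how `+` and `*` might be combined, the cell below builds an underlined heading. The variable names (`title`, `rule`, `banner`) are assumptions made up for this illustration. It also shows that a non-integer multiplier raises a `TypeError` rather than repeating the string.

```python
title = 'Report'
rule = '-' * len(title)          # one dash per character in the title
banner = title + '\n' + rule     # join the two pieces with a newline
print(banner)

# Multiplying by a non-integer raises TypeError instead of repeating.
try:
    'foo' * 2.5
except TypeError as exc:
    print('TypeError:', exc)
```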

#### The `in` Operator
`in` is a membership operator: it tests whether one string occurs within another.

In [11]:
# return True when 'foo' is contained within the second string
s = 'foo'

s in 'That\'s food for thought.'

True

In [13]:
# return False otherwise
s in 'That\'s good for now.'

False

There is also a `not in` operator, which does the opposite.


In [16]:
'z' not in 'abc'

True

In [15]:
'z' not in 'xyz'

False
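
One way the membership operators might be used in practice is sketched below. The helper `contains_any` and the sample message are hypothetical names chosen for illustration only.

```python
def contains_any(text, candidates):
    """Return True if at least one candidate substring occurs in text."""
    return any(candidate in text for candidate in candidates)

message = "That's food for thought."
print(contains_any(message, ['foo', 'xyz']))    # True
print(contains_any(message, ['good', 'xyz']))   # False
print('good' not in message)                    # True
```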

## Built-in String Functions

- `chr()`: Converts an integer to a character
- `ord()`: Converts a character to an integer
- `len()`: Returns the length of a string
- `str()`: Returns a string representation of an object

#### `ord(c)`
Returns an integer value for the given character.



For ASCII characters, `ord()` returns the character's ASCII code.

In [18]:
ord('a')

97

In [17]:
ord('#')

35

`ord()` also works for Unicode characters beyond ASCII, returning the character's code point.

In [19]:
ord('€')

8364

In [20]:
ord('∑')

8721

#### `chr(n)`
Returns a character value for the given integer.

In [22]:
chr(97)

'a'

In [21]:
chr(35)

'#'

In [25]:
chr(8364)

'€'

In [24]:
chr(8721)

'∑'
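
Since `chr()` and `ord()` are inverses of each other, they can be combined to do arithmetic on characters. The sketch below assumes lowercase ASCII input only, and `shift_letter` is a hypothetical helper written for this example.

```python
def shift_letter(ch, offset):
    """Shift a lowercase ASCII letter by offset positions, wrapping after 'z'."""
    base = ord('a')
    return chr(base + (ord(ch) - base + offset) % 26)

# chr() undoes ord() for every character used earlier in this section
for ch in 'a#€∑':
    assert chr(ord(ch)) == ch

print(shift_letter('a', 1))   # b
print(shift_letter('z', 1))   # a
```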

#### `len(s)`
Returns the length of a string.

In [27]:
s = 'I am a string.'
len(s)

14

#### `str(obj)`

Returns a string representation of an object.

In [28]:
str(49.2)

'49.2'

In [29]:
str(3+4j)

'(3+4j)'

In [30]:
str(3 + 29)

'32'

In [31]:
str('foo')

'foo'

## String Indexing

In [32]:
s = 'foobar'

In [33]:
s[0]

'f'

In [34]:
s[1]

'o'

In [35]:
s[3]

'b'

In [36]:
len(s)

6

In [37]:
s[len(s)-1]

'r'

In [38]:
# index beyond the boundary results in an error
s[6]

IndexError: string index out of range

Indices can also be negative, in which case indexing counts backward from the end of the string, starting at -1 (not 0) for the last character.

In [39]:
s = 'foobar'

In [40]:
s[-1]

'r'

In [41]:
s[-2]

'a'

In [42]:
len(s)

6

In [43]:
s[-len(s)]

'f'

In [44]:
# negative number beyond the start of the string
s[-7]

IndexError: string index out of range
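
The relationship between the two indexing schemes can be checked directly: for a nonempty string, `s[-k]` refers to the same character as `s[len(s) - k]`. The sketch below verifies this and adds a hypothetical helper, `char_at`, that returns a default value instead of raising `IndexError`.

```python
s = 'foobar'

# Every negative index has an equivalent non-negative index.
for k in range(1, len(s) + 1):
    assert s[-k] == s[len(s) - k]

def char_at(text, index, default=None):
    """Return text[index], or default if the index is out of range."""
    try:
        return text[index]
    except IndexError:
        return default

print(char_at(s, -1))        # r
print(char_at(s, 6))         # None
print(char_at(s, -7, '?'))   # ?
```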

## String Slicing

In [45]:
s = 'foobar'
s[2:5]

'oba'

In [46]:
# omit the first index, slice starts at the beginning
s = 'foobar'

s[:4]

'foob'

In [47]:
s[0:4]

'foob'

In [48]:
# omit the second index, slice extends to the end
s[2:]

'obar'

In [49]:
s[2:len(s)]

'obar'

In [50]:
# this will be equal to s itself
s[:4] + s[4:]

'foobar'

In [51]:
# verify
s[:4] + s[4:] == s

True

In [52]:
# omitting both indices returns the original string:
# a reference to the same object, not a copy (as the CPython output below shows)
s = 'foobar'
t = s[:]

In [53]:
# id of s
id(s)

1520843556592

In [54]:
# id of t
# They point to the same object
id(t)

1520843556592

In [55]:
# verify if they are referring to the same object
s is t

True

If the first index is greater than or equal to the second index, Python returns an empty string.

In [56]:
s[2:2]

''

In [57]:
s[4:2]

''

In [58]:
# can also use negative indices
s = 'foobar'
s[-5:-2]

'oob'

In [59]:
s[1:4]

'oob'

In [60]:
s[-5:-2] == s[1:4]

True
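
Unlike indexing, slicing does not raise an error when an index lies beyond the end of the string; the slice simply stops at the end. The sketch below relies on this to split a string into fixed-size pieces. The function name `chunks` is an assumption for this example.

```python
def chunks(text, size):
    """Split text into consecutive pieces of at most size characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

s = 'foobar'
print(s[2:100])        # obar  (the end index is clamped to len(s))
print(chunks(s, 4))    # ['foob', 'ar']
print(chunks(s, 2))    # ['fo', 'ob', 'ar']
```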

## Specifying a Stride (Step) in a String Slice
Adding an additional `:` and a third index designates a stride (also called a step).

In [62]:
s = 'foobar'

In [63]:
s[0:6:2]

'foa'

In [64]:
s[1:6:2]

'obr'

In [65]:
# the first and second indices can be omitted
s = '12345' * 5
s

'1234512345123451234512345'

In [66]:
s[::5]

'11111'

In [67]:
s[4::5]

'55555'

You can also specify a negative stride value, in which case Python steps backward. In that case, the first index should be greater than the second index.

In [69]:
s = 'foobar'

s[5:0] # this returns an empty string

''

In [70]:
# this returns a strided string
s[5:0:-2]

'rbo'

In [71]:
s = '12345' * 5
s

'1234512345123451234512345'

When the first and second indices are omitted and the stride is negative, the first index defaults to the end of the string, and the second index defaults to the beginning.


In [72]:
s[::-5]

'55555'

In [73]:
# a common paradigm for reversing a string
s = 'If Comrade Napoleon says it, it must be right.'
s[::-1]

'.thgir eb tsum ti ,ti syas noelopaN edarmoC fI'
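
A typical use of the `[::-1]` idiom is a palindrome check. The following is a minimal sketch; the name `is_palindrome` and the choice to ignore case and non-alphanumeric characters are assumptions of this example.

```python
def is_palindrome(text):
    """Return True if text reads the same in both directions,
    ignoring case and any non-alphanumeric characters."""
    cleaned = ''.join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]

print(is_palindrome('A man, a plan, a canal: Panama'))   # True
print(is_palindrome('foobar'))                           # False
```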

## Interpolating Variables Into a String 

Python 3.6 introduced a new string formatting mechanism. It is formally called a formatted string literal, but is usually referred to as an f-string.
- Specify either a lowercase _f_ or uppercase _F_ directly before the opening quote of the string literal. This tells Python it is an _f_-string instead of a standard string.
- Specify any variables to be interpolated in curly braces `{}`.

In [74]:
n = 20
m = 25
prod = n * m
print('The product of', n, 'and', m, 'is', prod)

The product of 20 and 25 is 500


In [75]:
# compare above method with this f-string method
# this is much cleaner
n = 20
m = 25
prod = n * m
print(f'The product of {n} and {m} is {prod}')

The product of 20 and 25 is 500


Any of Python's three quoting mechanisms can be used to define an *f*-string.


In [80]:
var = 'Bark'

In [77]:
print(f'A dog says {var}!')

A dog says Bark!


In [78]:
print(f"A dog says {var}!")

A dog says Bark!


In [79]:
print(f'''A dog says {var}!''')

A dog says Bark!
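
The curly braces are not limited to plain variable names; they can hold expressions such as arithmetic or method calls. The sketch below illustrates this with made-up variable names (`line`, `name`, `shout`).

```python
n = 20
m = 25

# The product is computed inside the braces; no separate variable is needed.
line = f'The product of {n} and {m} is {n * m}'
print(line)    # The product of 20 and 25 is 500

# A method call inside the braces works as well.
name = 'bark'
shout = f'A dog says {name.upper()}!'
print(shout)   # A dog says BARK!
```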


## Modifying Strings

In [81]:
# strings are immutable
s = 'foobar'
s[3] = 'x'

TypeError: 'str' object does not support item assignment

In [82]:
# there is rarely a need to modify a string in place;
# you can build a modified copy with slicing and concatenation instead
s = s[:3] + 'x' + s[4:]
s

'fooxar'

In [83]:
# or use built-in method
s = 'foobar'
s = s.replace('b', 'x')
s

'fooxar'
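
The two approaches differ in one respect: slicing changes a single position, while `.replace()` changes every occurrence of the substring. The sketch below wraps the slicing approach in a hypothetical helper, `replace_at`, to make the difference visible.

```python
def replace_at(text, index, new_char):
    """Return a copy of text with the character at index replaced by new_char."""
    if not 0 <= index < len(text):
        raise IndexError('string index out of range')
    return text[:index] + new_char + text[index + 1:]

print(replace_at('foobab', 3, 'x'))    # fooxab  (only position 3 changes)
print('foobab'.replace('b', 'x'))      # fooxax  (every 'b' changes)
```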

## Built-in String Methods

### Case Conversion

#### `s.capitalize()`
Capitalizes the target string: returns a copy with the first character converted to uppercase and all other characters converted to lowercase.

In [84]:
s = 'foO BaR BAZ quX'
s.capitalize()

'Foo bar baz qux'

In [85]:
# non-alphabetic characters are unchanged
s = 'foo123#BAR#.'
s.capitalize()

'Foo123#bar#.'

#### `s.lower()`
Converts alphabetic characters to lowercase.

In [86]:
'FOO Bar 123 baz qUX'.lower()

'foo bar 123 baz qux'

#### `s.swapcase()`
Swaps case of alphabetic characters.

In [87]:
'FOO Bar 123 baz qUX'.swapcase()

'foo bAR 123 BAZ Qux'

#### `s.title()`

Converts the target string to “title case”: the first letter of each word becomes uppercase and the remaining letters become lowercase.

In [89]:
'the sun also rises'.title()

'The Sun Also Rises'

#### `s.upper()`

Converts alphabetic characters to uppercase.

In [90]:
'FOO Bar 123 baz qUX'.upper()

'FOO BAR 123 BAZ QUX'
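
A common use of the case-conversion methods is comparing strings without regard to case. The helper `same_ignoring_case` below is a hypothetical name for this sketch. The last line shows a known quirk of `.title()`: it treats an apostrophe as a word boundary.

```python
def same_ignoring_case(a, b):
    """Compare two strings after converting both to lowercase."""
    return a.lower() == b.lower()

print(same_ignoring_case('FooBar', 'fOOBAR'))   # True
print(same_ignoring_case('foo', 'bar'))         # False

# .title() capitalizes the letter after an apostrophe as well.
print("they're here".title())                   # They'Re Here
```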

### Find and Replace
Each method in this group supports optional \<start\> and \<end\> arguments. These are interpreted as in string slicing: the method acts on the portion of the target string from \<start\> up to but not including \<end\>.

#### `s.count(<sub>[, <start>[, <end>]])`

Counts occurrences of a substring in the target string.

In [91]:
'foo goo moo'.count('oo')

3

In [92]:
# restrict the search by setting indices
'foo goo moo'.count('oo', 0, 8)

2

#### `s.endswith(<suffix>[, <start>[, <end>]])`

Determines whether the target string ends with a given substring.

In [93]:
'foobar'.endswith('bar')

True

In [94]:
'foobar'.endswith('baz')

False

In [95]:
'foobar'.endswith('oob', 0, 4)

True

In [96]:
'foobar'.endswith('oob', 2, 4)

False

#### `s.find(<sub>[, <start>[, <end>]])`

Searches the target string for a given substring. It returns the lowest index in s where substring \<sub\> is found, or -1 if the substring is not found.

In [97]:
'foo bar foo baz foo qux'.find('foo')

0

In [98]:
'foo bar foo baz foo qux'.find('grault')

-1

In [101]:
'foo bar foo baz foo qux'.find('foo', 4)

8

In [102]:
'foo bar foo baz foo qux'.find('foo', 4, 7)

-1
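
Because `.find()` accepts a \<start\> argument, it can be called repeatedly to collect every position at which a substring occurs. The following is a minimal sketch; the function name `find_all` is an assumption, and overlapping matches are included.

```python
def find_all(text, sub):
    """Return a list of every index at which sub occurs in text."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # resume the search one character past the previous match
        start = text.find(sub, start + 1)
    return positions

print(find_all('foo bar foo baz foo qux', 'foo'))      # [0, 8, 16]
print(find_all('foo bar foo baz foo qux', 'grault'))   # []
```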

#### `s.index(<sub>[, <start>[, <end>]])`

Searches the target string for a given substring.
This method is identical to `.find()`, except that it raises an exception if \<sub\> is not found rather than returning -1.

In [103]:
'foo bar foo baz foo qux'.index('grault')

ValueError: substring not found
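
The exception raised by `.index()` can be handled with `try`/`except`. The sketch below uses a hypothetical helper, `locate`, that returns `None` when the substring is missing. That value may be easier to tell apart from a valid index than the -1 returned by `.find()`.

```python
def locate(text, sub):
    """Return the lowest index of sub in text, or None if it is absent."""
    try:
        return text.index(sub)
    except ValueError:
        return None

text = 'foo bar foo baz foo qux'
print(locate(text, 'baz'))      # 12
print(locate(text, 'grault'))   # None
```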

#### `s.rfind(<sub>[, <start>[, <end>]])`
Searches the target string for a given substring starting at the end. It returns the highest index in s where substring \<sub\> is found.

In [104]:
'foo bar foo baz foo qux'.rfind('foo')

16

In [105]:
'foo bar foo baz foo qux'.rfind('grault')

-1

In [106]:
'foo bar foo baz foo qux'.rfind('foo', 0, 14)

8

In [107]:
'foo bar foo baz foo qux'.rfind('foo', 10, 14)

-1

#### `s.rindex(<sub>[, <start>[, <end>]])`

Searches the target string for a given substring starting at the end.

This method is identical to `.rfind()`, except that it raises an exception if \<sub\> is not found rather than returning -1.

In [108]:
'foo bar foo baz foo qux'.rindex('grault')

ValueError: substring not found

#### `s.startswith(<prefix>[, <start>[, <end>]])`

Determines whether the target string starts with a given substring.

In [109]:
'foobar'.startswith('foo')

True

In [110]:
'foobar'.startswith('bar')

False

In [111]:
'foobar'.startswith('bar', 3)

True

In [112]:
'foobar'.startswith('bar', 3, 2)

False

### Character Classification
Methods in this group classify a string based on the characters it contains.



#### `s.isalnum()`

Determines whether the target string consists of alphanumeric characters.

In [114]:
'abc123'.isalnum()

True

In [115]:
'abc$123'.isalnum()

False

In [116]:
''.isalnum()

False

#### `s.isalpha()`

Determines whether the target string consists of alphabetic characters.

In [117]:
'ABCabc'.isalpha()

True

In [118]:
'abc123'.isalpha()

False

#### `s.isdigit()`

Determines whether the target string consists of digit characters.

In [119]:
'123'.isdigit()

True

In [120]:
'123abc'.isdigit()

False

#### `s.isidentifier()`

Determines whether the target string is a valid Python identifier.

In [121]:
'foo32'.isidentifier()

True

In [122]:
'32foo'.isidentifier()

False

In [123]:
'foo$32'.isidentifier()

False

In [124]:
# it returns True even though a keyword would not actually be a valid identifier
'and'.isidentifier()

True

In [125]:
# You can test whether a string matches a Python keyword using the iskeyword() function
from keyword import iskeyword
iskeyword('and')

True

You can use `isidentifier()` and `iskeyword()` to verify if a string could serve as a valid Python identifier.
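
Combining the two checks might look like the sketch below. The function name `is_valid_name` is an assumption for this example.

```python
from keyword import iskeyword

def is_valid_name(name):
    """Return True if name is a valid identifier and not a reserved keyword."""
    return name.isidentifier() and not iskeyword(name)

print(is_valid_name('foo32'))   # True
print(is_valid_name('32foo'))   # False
print(is_valid_name('and'))     # False
```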

#### `s.islower()`

Determines whether the target string’s alphabetic characters are lowercase.

In [126]:
'abc'.islower()

True

In [127]:
'abc1$d'.islower()

True

In [128]:
'Abc1$D'.islower()

False

#### `s.isprintable()`

Determines whether the target string consists entirely of printable characters.

In [130]:
'a\tb'.isprintable()

False

In [131]:
'a b'.isprintable()

True

In [132]:
# among the .isxxx() methods shown here, this is the only one that returns True for an empty string
''.isprintable()

True

In [133]:
'a\nb'.isprintable()

False

#### `s.isspace()`

Determines whether the target string consists of whitespace characters. It returns True if s is nonempty and all characters are whitespace characters.

The most commonly encountered whitespace characters are space ' ', tab '\t', and newline '\n'.

In [134]:
' \t \n '.isspace()

True

In [135]:
'   a   '.isspace()

False

Other characters also qualify as whitespace. '\f' and '\r' are the escape sequences for the ASCII Form Feed and Carriage Return characters; '\u2005' is the escape sequence for the Unicode Four-Per-Em Space.

In [136]:

'\f\u2005\r'.isspace()

True

#### `s.istitle()`

Determines whether the target string is title cased.

In [137]:
'This Is A Title'.istitle()

True

In [138]:
'This is a title'.istitle()

False

In [139]:
# non-alphabetic characters are ignored
'Give Me The #$#@ Ball!'.istitle()

True

#### `s.isupper()`

Determines whether the target string’s alphabetic characters are uppercase.

In [140]:
'ABC'.isupper()

True

In [141]:
'ABC1$D'.isupper()

True

In [142]:
'Abc1$D'.isupper()

False

### String Formatting
Methods in this group modify or enhance the format of a string.

#### `s.center(<width>[, <fill>])`

Centers a string in a field.

In [143]:
# by default, padding consists of the ASCII space character
'foo'.center(10)

'   foo    '

In [144]:
# padding with '-' characters
'bar'.center(10, '-')

'---bar----'

#### `s.expandtabs(tabsize=8)`

Expands tabs in a string. It replaces each '\t' with enough spaces to reach the next tab stop. By default, tab stops are placed every 8 columns.

In [145]:
'a\tb\tc'.expandtabs()

'a       b       c'

In [146]:
'aaa\tbbb\tc'.expandtabs()

'aaa     bbb     c'

In [147]:
# place a tab stop every 4 columns instead
'a\tb\tc'.expandtabs(4)

'a   b   c'

In [148]:
'aaa\tbbb\tc'.expandtabs(tabsize=4)

'aaa bbb c'

#### `s.ljust(<width>[, <fill>])`

Left-justifies a string in a field.

In [149]:
'foo'.ljust(10)

'foo       '

In [150]:
'foo'.ljust(10, '-')

'foo-------'

#### `s.lstrip([<chars>])`

Trims leading characters from a string. With no argument, it removes whitespace characters from the left end.

In [151]:
'   foo bar baz   '.lstrip()

'foo bar baz   '

In [152]:
'\t\nfoo\t\nbar\t\nbaz'.lstrip()

'foo\t\nbar\t\nbaz'

In [174]:
# If the optional <chars> argument is given, it is a string specifying
# the set of characters to remove from the left end of s
'http://www.realpython.com'.lstrip('/:pth')

'www.realpython.com'

#### `s.replace(<old>, <new>[, <count>])`

Replaces occurrences of a substring within a string.

In [155]:
'foo bar foo baz foo qux'.replace('foo', 'grault')

'grault bar grault baz grault qux'

In [156]:
'foo bar foo baz foo qux'.replace('foo', 'grault', 2)

'grault bar grault baz foo qux'

#### `s.rjust(<width>[, <fill>])`

Right-justifies a string in a field.

In [157]:
'foo'.rjust(10)

'       foo'

In [158]:
'foo'.rjust(10, '-')

'-------foo'
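
The justification methods are handy for lining up columns of text. The sketch below formats a few made-up rows; the data, the column widths, and the variable names are all assumptions for illustration.

```python
rows = [('apple', 3), ('kiwi', 12), ('banana', 7)]

lines = []
for name, qty in rows:
    # names are left-justified and padded with dots; quantities are right-justified
    lines.append(name.ljust(10, '.') + str(qty).rjust(4))

header = 'Stock'.center(14, '=')   # 14 = 10 + 4, the total row width
print(header)
for line in lines:
    print(line)
```

Each entry in `lines` is exactly 14 characters wide, so the quantities line up on the right when printed.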

#### `s.rstrip([<chars>])`

Trims trailing characters from a string.

In [160]:
'   foo bar baz   '.rstrip()

'   foo bar baz'

In [161]:
'foo\t\nbar\t\nbaz\t\n'.rstrip()

'foo\t\nbar\t\nbaz'

In [162]:
'foo.$$$;'.rstrip(';$.')

'foo'

#### `s.strip([<chars>])`

Strips characters from the left and right ends of a string.

In [163]:
s = '   foo bar baz\t\t\t'

In [167]:
s = s.lstrip()
s = s.rstrip()
s

'foo bar baz'

In [168]:
'www.realpython.com'.strip('w.moc')

'realpython'

In [169]:
'   foo bar baz\t\t\t'.lstrip().rstrip()

'foo bar baz'

In [170]:
'   foo bar baz\t\t\t'.strip()

'foo bar baz'

In [171]:
'www.realpython.com'.lstrip('w.moc').rstrip('w.moc')

'realpython'

In [172]:
'www.realpython.com'.strip('w.moc')

'realpython'
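
Note that the \<chars\> argument is treated as a set of individual characters, not as a prefix or suffix to match as a whole. The sketch below shows how this can remove more than intended. It contrasts it with a hypothetical helper, `strip_prefix`, that removes an exact prefix using `.startswith()` and slicing.

```python
def strip_prefix(text, prefix):
    """Remove prefix from the start of text only if it matches exactly."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

# Every character of 'www.wow.com' is in the set 'w.moc', so nothing is left.
print(repr('www.wow.com'.strip('w.moc')))     # ''

# Removing the exact prefix keeps the rest of the string intact.
print(strip_prefix('www.wow.com', 'www.'))    # wow.com
```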

#### `s.zfill(<width>)`

Pads a string on the left with zeros.

In [177]:
'42'.zfill(5)

'00042'

In [175]:
'+42'.zfill(8)

'+0000042'

In [176]:
'-42'.zfill(8)

'-0000042'

In [178]:
# returns s unchanged if it is already at least <width> characters long (here, 3)
'-42'.zfill(3)

'-42'

In [179]:
# also pads strings that do not represent numbers
'foo'.zfill(6)

'000foo'