### Introducing the Python String Module
The Python string module has predefined string constants that we can use for testing. The constant string.printable contains 100 printable ASCII characters: letters in both cases, digits, punctuation, and whitespace characters:

In [2]:
import re
import string
printable = string.printable

In [3]:
printable

'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

In [4]:
# The first 62 characters are digits and letters.
printable[0:62]

'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
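The string module also exposes the smaller constants that printable is built from. The sketch below (assuming Python 3) checks that the 100 characters break down into digits, letters, punctuation, and whitespace, which is why slicing the first 62 characters gives exactly the digits and letters.

```python
import string

# printable is the concatenation of four smaller constants, in this order
parts = [string.digits, string.ascii_letters, string.punctuation, string.whitespace]
print([len(p) for p in parts])             # [10, 52, 32, 6]
print(sum(len(p) for p in parts))          # 100
print(''.join(parts) == string.printable)  # True

# The first 62 characters are the 10 digits followed by the 52 letters
print(string.printable[:62] == string.digits + string.ascii_letters)  # True
```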

In [5]:
# Find all the digits in printable
re.findall(r'\d', printable)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [6]:
# Find all digits, letters and underscore
all_items = re.findall(r'\w', printable)

In [7]:
print(all_items)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '_']


In [8]:
# Find the different whitespace characters
re.findall(r'\s', printable)

[' ', '\t', '\n', '\r', '\x0b', '\x0c']
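Each of these classes has an uppercase complement: '\D', '\W', and '\S' match any character that '\d', '\w', and '\s' do not. A quick count over printable shows that each pair splits the 100 characters between them.

```python
import re
import string

printable = string.printable

# Count matches for each class and its uppercase complement
counts = {p: len(re.findall(p, printable))
          for p in (r'\d', r'\D', r'\w', r'\W', r'\s', r'\S')}

print(counts[r'\d'], counts[r'\D'])  # 10 90
print(counts[r'\w'], counts[r'\W'])  # 63 37  ('\w' also matches '_')
print(counts[r'\s'], counts[r'\S'])  # 6 94
```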

Regular expressions are not confined to ASCII. In Python 3, a \d in a str pattern matches any Unicode decimal digit, not just the ASCII characters '0' through '9'. The same applies to letters and \w.
Let's create a custom string with some special symbols plus two Latin letters: e with circumflex 'ê' and e with breve 'ĕ'.

In [9]:
x = 'abc' + '-/*' + '\u00ea' + '\u0115'
re.findall(r'\w', x) # as expected, it finds all the letters

['a', 'b', 'c', 'ê', 'ĕ']
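To see the same Unicode behaviour for digits, and how to switch it off, the sketch below uses the Arabic-Indic digit three ('\u0663') as an example of a non-ASCII digit; that choice is only for illustration. Passing the re.ASCII flag restricts \d and \w to ASCII characters.

```python
import re

x = 'abc' + '-/*' + '\u00ea' + '\u0115'  # same string as above
y = '7' + '\u0663'                          # ASCII '7' and ARABIC-INDIC DIGIT THREE

print(re.findall(r'\w', x))                 # ['a', 'b', 'c', 'ê', 'ĕ']
print(re.findall(r'\w', x, re.ASCII))       # ['a', 'b', 'c']
print(len(re.findall(r'\d', y)))            # 2: both characters are digits
print(re.findall(r'\d', y, re.ASCII))       # ['7']
```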

## Discover Patterns Using Specifiers

In [10]:
# Let's consider a source string to see how this works
source = '''I wish to wish the wish you wish to wish, but if you wish the wish the witch wishes, I won’t wish the wish you wish to wish.'''

In [11]:
re.findall('wish', source) # find every occurrence of 'wish' in source

['wish',
 'wish',
 'wish',
 'wish',
 'wish',
 'wish',
 'wish',
 'wish',
 'wish',
 'wish',
 'wish',
 'wish']

In [12]:
re.findall('wish|the', source) # find wish or the anywhere

['wish',
 'wish',
 'the',
 'wish',
 'wish',
 'wish',
 'wish',
 'the',
 'wish',
 'the',
 'wish',
 'wish',
 'the',
 'wish',
 'wish',
 'wish']
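Long findall lists like the two above are easier to read once tallied. This sketch uses collections.Counter from the standard library (an addition, not part of the original walkthrough) to confirm that source yields 12 matches for 'wish', one of them inside 'wishes', and 4 for 'the'.

```python
import re
from collections import Counter

source = '''I wish to wish the wish you wish to wish, but if you wish the wish the witch wishes, I won’t wish the wish you wish to wish.'''

# Tally each alternative matched by the pattern
counts = Counter(re.findall('wish|the', source))
print(counts)  # Counter({'wish': 12, 'the': 4})
```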

In [13]:
re.findall('^wish', source) # Find wish at the beginning of source

[]

In [14]:
re.findall('^I', source) # That didn't work, but this does, because source begins with 'I'.

['I']

In [15]:
re.findall('^I wish', source) # This as well

['I wish']

In [16]:
re.findall('wish$', source) # Find wish at the end

[]

In [17]:
re.findall('wish.$', source) # That didn't work because source ends with '.', but this does.

['wish.']

'^' and '$' are called anchors: '^' anchors the search to the beginning of the search string, and '$' anchors it to the end.
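By default the anchors refer to the whole string. With the re.MULTILINE flag, '^' and '$' also match at the start and end of every line. Note too that the '.' in 'wish.$' matches any single character; escaping it as '\.' requires a literal period. The three-line string lines below is a made-up example for illustration.

```python
import re

# Hypothetical three-line string, used only to illustrate re.MULTILINE
lines = 'wish one\nno wish\nwish two'

# Default: '^' and '$' refer to the start and end of the whole string
print(re.findall(r'^wish', lines))                # ['wish']
print(re.findall(r'wish$', lines))                # []

# re.MULTILINE: '^' and '$' also match at every line boundary
print(re.findall(r'^wish', lines, re.MULTILINE))  # ['wish', 'wish']
print(re.findall(r'wish$', lines, re.MULTILINE))  # ['wish']

# Unescaped '.' matches any character; '\.' matches only a literal period
print(re.findall('wish.$', 'I wish!'))            # ['wish!']
print(re.findall(r'wish\.$', 'I wish!'))          # []
```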

In [18]:
twister = '''I wish I may, I wish I might ... Have a dish of fish tonight.'''

In [19]:
re.findall('[wf]ish', twister) # finding w or f followed by ish

['wish', 'wish', 'fish']

In [20]:
re.findall('[wsh]+', twister) # Find runs of one or more 'w', 's', or 'h' characters

['w', 'sh', 'w', 'sh', 'h', 'sh', 'sh', 'h']

In [21]:
re.findall('I (?=wish)', source) # Find every 'I ' that is followed by 'wish' (lookahead)

['I ']

In [22]:
re.findall('I (?=wish)', twister) # The same lookahead, applied to twister

['I ', 'I ']

In [23]:
re.findall('(?<=I) wish', source) # Find ' wish' preceded by 'I' (lookbehind)

[' wish']

In [24]:
re.findall('(?<=I) wish', twister) # The same lookbehind, applied to twister

[' wish', ' wish']
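The (?=...) and (?<=...) groups are lookahead and lookbehind assertions: they check what follows or precedes the current position without consuming it, which is why the matches above leave out the asserted text. The negative forms (?!...) and (?<!...) require that the text is not there. A sketch on twister:

```python
import re

twister = '''I wish I may, I wish I might ... Have a dish of fish tonight.'''

# Negative lookahead: 'I ' followed by a word that does not start with 'wish'
print(re.findall(r'I (?!wish)\w+', twister))  # ['I may', 'I might']

# Lookarounds are zero-width: only the space between 'I' and 'wish' is replaced
print(re.sub(r'(?<=I) (?=wish)', '_', twister))
# I_wish I may, I_wish I might ... Have a dish of fish tonight.
```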

Sometimes, regular expression pattern rules conflict with Python's string-literal rules. For example, the following expression should logically match any word that begins with fish:

In [28]:
re.findall('\bfish', twister)

[]

But it doesn't. That's because '\b' has a special meaning in a Python string literal: it is the escape sequence for backspace ('\x08'), while in a regular expression \b means a word boundary (here, the beginning of a word). These confusions can be avoided by putting an 'r' before the pattern, which makes it a raw string and disables Python's escape sequences.

In [29]:
re.findall(r'\bfish', twister)

['fish']
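The difference is visible without re at all: the plain literal '\b' is a single backspace character, while the raw literal r'\b' keeps both the backslash and the 'b'. Doubling the backslash ('\\b') is an equivalent, if noisier, alternative to the r prefix. Putting \b at both ends of a pattern restricts matches to whole words.

```python
import re

twister = '''I wish I may, I wish I might ... Have a dish of fish tonight.'''

print(len('\b'), '\b' == '\x08')  # 1 True: Python turned \b into a backspace
print(len(r'\b'))                 # 2: backslash and 'b' reach the regex engine
print(r'\b' == '\\b')             # True: raw string vs. escaped backslash

print(re.findall('\\bfish', twister))      # ['fish']
print(re.findall(r'\b\w*ish\b', twister))  # ['wish', 'wish', 'dish', 'fish']
```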