# How to Use Regular Expressions in Python

### Here’s a complete list of the metacharacters used for Regular Expressions:
<br>   

.  ^   $   *   +   ?   {   }   [   ]   \   |   (   )

1. [ and ] are used for specifying a character class, which is a set of characters that you wish to match. Characters can be listed individually, or a range of characters can be indicated by giving two characters and separating them by a '-'. For example, [abc] will match any of the characters a, b, or c; this is the same as [a-c], which uses a range to express the same set.
<br>

2. If you wanted to match only lowercase letters, your RE would be [a-z].
<br>

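3. Most metacharacters are not active inside classes (a class is whatever sits between [ and ]). For example, [akm*] will match any of the characters 'a', 'k', 'm', or '*'. '*' is usually a metacharacter, but inside a character class it is stripped of its special nature. The exceptions are \ (backslash), which still escapes the next character; ']', which closes the class; '-', which forms a range when placed between two characters; and '^', which complements the class when it comes first.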
<br>

4. You can match the characters not listed within the class by complementing the set. This is indicated by including a '^' as the first character of the class. For example, [^5] will match any character except '5'. 
<br>

5. If the caret appears elsewhere in a character class, it does not have special meaning. For example: [5^] will match either a '5' or a '^'.

In [6]:
import re

txt = "My name is Khan and I am not a terrorist. My best number is 78."

x = re.findall("[bcd]", txt)
print(f'Regular Expression of [bcd] will show {x}')

x = re.findall("[a-zA-Z]", txt)
print(f'\nRegular Expression of [a-zA-Z] will show {x}')

x = re.findall("[^a-zA-Z]", txt)
print(f'\nRegular Expression of [^a-zA-Z] will show {x}')

Regular Expression of [bcd] will show ['d', 'b', 'b']

Regular Expression of [a-zA-Z] will show ['M', 'y', 'n', 'a', 'm', 'e', 'i', 's', 'K', 'h', 'a', 'n', 'a', 'n', 'd', 'I', 'a', 'm', 'n', 'o', 't', 'a', 't', 'e', 'r', 'r', 'o', 'r', 'i', 's', 't', 'M', 'y', 'b', 'e', 's', 't', 'n', 'u', 'm', 'b', 'e', 'r', 'i', 's']

Regular Expression of [^a-zA-Z] will show [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '.', ' ', ' ', ' ', ' ', ' ', '7', '8', '.']
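
The cell above covers plain and complemented classes. Points 3 to 5 (a literal '*' inside a class, and the position of the caret) can be checked with a small sketch like the one below. The sample strings are made up for illustration.

```python
import re

# '*' is an ordinary character inside a class, so it matches literally.
star_literal = re.findall("[akm*]", "make * a mark")
print(star_literal)   # ['m', 'a', 'k', '*', 'a', 'm', 'a', 'k']

# A leading '^' complements the class: every character except '5'.
not_five = re.findall("[^5]", "1565")
print(not_five)       # ['1', '6']

# A '^' that is not the first character is matched literally.
caret_literal = re.findall("[5^]", "5^2 = 25")
print(caret_literal)  # ['5', '^', '5']
```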


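6. As in Python string literals, the backslash in a regular expression can be followed by various characters to signal special sequences. Because Python also interprets backslashes in ordinary string literals, patterns are usually written as raw strings, such as r"\d".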
<br>

7. The backslash is also used to escape all the metacharacters so you can still match them in patterns; for example, if you need to match a [ or a \, you can precede it with a backslash to remove its special meaning: \[ or \\.
<br>


8. Some of the special sequences beginning with '\' represent predefined sets of characters that are often useful, such as the set of digits, the set of letters, or the set of anything that isn’t whitespace

 - \d - Matches any decimal digit; this is equivalent to the class [0-9].   
 <br>   
 
 - \D - Matches any non-digit character; this is equivalent to the class [^0-9].  
 <br>   
 - \s - Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].  
 <br>    
 - \S - Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].  
 <br>   
 - \w - Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].  
 <br>   
 - \W - Matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_]
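
As a rough illustration of points 6 to 8, the sketch below escapes two metacharacters and tries a few of the predefined sequences. The sample strings and variable names are hypothetical and chosen only for this example.

```python
import re

# Hypothetical sample; the raw string keeps the backslash as a real character.
sample = r"Path C:\temp holds [3] files"

# Escaped metacharacters match themselves.
brackets = re.findall(r"\[", sample)
backslashes = re.findall(r"\\", sample)
print(brackets)        # ['[']
print(backslashes)     # ['\\']  (a single backslash, shown escaped)

# Predefined sequences.
digits = re.findall(r"\d", "Room 42, floor 7")
non_space_runs = re.findall(r"\S+", "a b\tc\nd")
word_chars = re.findall(r"\w", "x_1 + y-2")
print(digits)          # ['4', '2', '7']
print(non_space_runs)  # ['a', 'b', 'c', 'd']
print(word_chars)      # ['x', '_', '1', 'y', '2']
```

One caveat: for ordinary (str) patterns in Python 3, \d, \s and \w also match non-ASCII digits, whitespace and letters. The bracketed equivalents listed above hold exactly only when the re.ASCII flag is passed or the pattern is a bytes pattern.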

In [11]:
import re

txt = "My name is Khan and I am not a terrorist. My best number is 78."

x = re.findall("[a-zA-Z ]", txt)
print(f'\nRegular Expression of [a-zA-Z ] will show {x}')

x = re.findall("[^a-zA-Z ]", txt)
print(f'\nRegular Expression of [^a-zA-Z ] will show {x}')

x = re.findall("[a-zA-Z0-9]", txt)
print(f'\nRegular Expression of [a-zA-Z0-9] will show {x}')


Regular Expression of [a-zA-Z ] will show ['M', 'y', ' ', 'n', 'a', 'm', 'e', ' ', 'i', 's', ' ', 'K', 'h', 'a', 'n', ' ', 'a', 'n', 'd', ' ', 'I', ' ', 'a', 'm', ' ', 'n', 'o', 't', ' ', 'a', ' ', 't', 'e', 'r', 'r', 'o', 'r', 'i', 's', 't', ' ', 'M', 'y', ' ', 'b', 'e', 's', 't', ' ', 'n', 'u', 'm', 'b', 'e', 'r', ' ', 'i', 's', ' ']

Regular Expression of [^a-zA-Z ] will show ['.', '7', '8', '.']

Regular Expression of [a-zA-Z0-9] will show ['M', 'y', 'n', 'a', 'm', 'e', 'i', 's', 'K', 'h', 'a', 'n', 'a', 'n', 'd', 'I', 'a', 'm', 'n', 'o', 't', 'a', 't', 'e', 'r', 'r', 'o', 'r', 'i', 's', 't', 'M', 'y', 'b', 'e', 's', 't', 'n', 'u', 'm', 'b', 'e', 'r', 'i', 's', '7', '8']


These sequences can be included inside a character class. For example, [\s,.] is a character class that will match any whitespace character, or ',' or '.'.
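
A short sketch of the [\s,.] class on a made-up string, first listing the separators it finds and then, as one possible use, splitting on runs of them:

```python
import re

text = "one, two.three four"  # hypothetical sample

# Each whitespace character, comma, or period is matched on its own.
separators = re.findall(r"[\s,.]", text)
print(separators)  # [',', ' ', '.', ' ']

# Adding '+' treats a run of separators as a single delimiter.
words = re.split(r"[\s,.]+", text)
print(words)       # ['one', 'two', 'three', 'four']
```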

9. The final metacharacter in this section is '.'. It matches anything except a newline character, and there’s an alternate mode (re.DOTALL) where it will match even a newline. '.' is often used where you want to match “any character”.
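
The difference between the default behaviour of '.' and the re.DOTALL mode can be seen in a minimal sketch such as this one (the two-line string is an arbitrary example):

```python
import re

text = "ab\ncd"

# By default '.' skips the newline.
default_matches = re.findall(".", text)
print(default_matches)  # ['a', 'b', 'c', 'd']

# With re.DOTALL the newline is matched as well.
dotall_matches = re.findall(".", text, flags=re.DOTALL)
print(dotall_matches)   # ['a', 'b', '\n', 'c', 'd']
```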

### Another capability is that you can specify that portions of the RE must be repeated a certain number of times.

1. '*' doesn’t match the literal character '*'; instead, it specifies that the previous character can be matched zero or more times, instead of exactly once. For example, ca*t will match 'ct' (0 'a' characters), 'cat' (1 'a'), 'caaat' (3 'a' characters), and so forth.
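
The ca*t example can be tried directly. This is a small sketch; the candidate strings, including the non-matching 'cbt', are chosen only for illustration.

```python
import re

pattern = re.compile(r"ca*t")

# fullmatch requires the whole string to fit the pattern.
results = {s: bool(pattern.fullmatch(s)) for s in ["ct", "cat", "caaat", "cbt"]}
print(results)  # {'ct': True, 'cat': True, 'caaat': True, 'cbt': False}

# findall picks every occurrence out of a longer string.
found = pattern.findall("ct cat caaat dog")
print(found)    # ['ct', 'cat', 'caaat']
```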

In [20]:
import re

txt = '''My name is Khan and I am not a terrorist.
       My best number is 78.'''

x = re.findall("[a-zA-Z ]", txt)
print(f'\nRegular Expression of [a-zA-Z ] will show {x}')

x = re.findall("[^a-zA-Z]", txt)
print(f'\nRegular Expression of [^a-zA-Z] will show {x}')

x = re.findall("[a-zA-Z0-9]", txt)
print(f'\nRegular Expression of [a-zA-Z0-9] will show {x}')


Regular Expression of [a-zA-Z ] will show ['M', 'y', ' ', 'n', 'a', 'm', 'e', ' ', 'i', 's', ' ', 'K', 'h', 'a', 'n', ' ', 'a', 'n', 'd', ' ', 'I', ' ', 'a', 'm', ' ', 'n', 'o', 't', ' ', 'a', ' ', 't', 'e', 'r', 'r', 'o', 'r', 'i', 's', 't', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'M', 'y', ' ', 'b', 'e', 's', 't', ' ', 'n', 'u', 'm', 'b', 'e', 'r', ' ', 'i', 's', ' ']

Regular Expression of [^a-zA-Z] will show [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '.', '\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '7', '8', '.']

Regular Expression of [a-zA-Z0-9] will show ['M', 'y', 'n', 'a', 'm', 'e', 'i', 's', 'K', 'h', 'a', 'n', 'a', 'n', 'd', 'I', 'a', 'm', 'n', 'o', 't', 'a', 't', 'e', 'r', 'r', 'o', 'r', 'i', 's', 't', 'M', 'y', 'b', 'e', 's', 't', 'n', 'u', 'm', 'b', 'e', 'r', 'i', 's', '7', '8']


In [61]:
## re.compile(r"[-+]?(\d*\.\d+|\d+)")
import re

txt1 = '3+2bhk'
txt2 = '2.5 bhk'
txt3 = '5BHK'

x1 = re.findall(r"(\d+)", txt1)
print(f'\nRegular Expression of (\\d+) will show {x1}')

x2 = re.findall(r"[-+]?(\d\.\d+)", txt2)
print(f'\nRegular Expression of [-+]?(\\d\\.\\d+) will show {x2}')

x3 = re.findall(r"(\d)", txt3)
print(f'\nRegular Expression of (\\d) will show {x3}')


Regular Expression of (\d+) will show ['3', '2']

Regular Expression of [-+]?(\d\.\d+) will show ['2.5']

Regular Expression of (\d) will show ['5']
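
The commented-out pattern at the top of this cell, [-+]?(\d*\.\d+|\d+), handles all three strings with a single expression. The sketch below is one way to use it, assuming the goal is to extract every number, sign included. Because findall returns only the captured group when a pattern contains one, the group is written here as non-capturing, (?: ... ), so the whole match comes back. The names number and extracted are illustrative.

```python
import re

# Optional sign, then either a decimal like 2.5 or .5, or a plain integer.
number = re.compile(r"[-+]?(?:\d*\.\d+|\d+)")

samples = ["3+2bhk", "2.5 bhk", "5BHK"]
extracted = {s: number.findall(s) for s in samples}
print(extracted)
# {'3+2bhk': ['3', '+2'], '2.5 bhk': ['2.5'], '5BHK': ['5']}
```

Note that '+2' keeps its sign here. With the original capturing group the same call would return '2', because the sign sits outside the group.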


In [66]:
text = 'Pune, Maharashtra, India'
print(text.split(',')[0].strip())
print(text.split(',')[1].strip())
print(text.split(',')[2].strip())

Pune
Maharashtra
India
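
The same split can be written with a regular expression. This is a sketch of an alternative to str.split plus strip, not a replacement for it: re.split with \s*,\s* consumes the whitespace around each comma, so no separate strip call is needed for the individual parts.

```python
import re

text = 'Pune, Maharashtra, India'

# Split on a comma together with any whitespace on either side of it.
city, state, country = re.split(r"\s*,\s*", text.strip())
print(city)     # Pune
print(state)    # Maharashtra
print(country)  # India
```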
