Good tutorial URLs:
    https://www.machinelearningplus.com/python/python-regex-tutorial-examples/
    https://www.rexegg.com/regex-disambiguation.html#lookarounds

In [1]:
import re
regex = re.compile(r'\s+')  # raw string, so the backslash reaches the regex engine unchanged

- The code above imports the 're' module and compiles a regular expression pattern that matches one or more whitespace characters.
- If you intend to use a particular pattern multiple times, you are better off compiling it once and reusing the compiled object.
- '\s' matches any single whitespace character. Adding '+' at the end makes the pattern match one or more of them in a row, so it matches tab '\t' and newline '\n' characters as well as plain spaces.

- Adding a '+' symbol to a digit pattern such as '[0-9]' requires at least 1 digit to be present for a match to be found.
- Similar to '+', there is a '*' symbol which matches 0 or more digits. In practice it makes the digits optional for a match.
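The difference between '+' and '*' can be checked with a small sketch. The patterns 'COM[0-9]+' and 'COM[0-9]*' and the sample strings are made up for illustration. fullmatch() is used so that the whole string has to fit the pattern.

```python
import re

# Hypothetical patterns, chosen only to contrast '+' and '*'
plus = re.compile(r'COM[0-9]+')  # 'COM' followed by at least one digit
star = re.compile(r'COM[0-9]*')  # 'COM' followed by zero or more digits

# No digits after 'COM': '+' fails, '*' still matches
plus_no_digits = plus.fullmatch('COM')
star_no_digits = star.fullmatch('COM')

# Digits after 'COM': both patterns match
plus_digits = plus.fullmatch('COM101')
star_digits = star.fullmatch('COM101')

print(plus_no_digits)          # None
print(star_no_digits.group())  # COM
print(plus_digits.group())     # COM101
print(star_digits.group())     # COM101
```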

In [3]:
text = """101 COM    Computers
205 MAT   Mathematics
189 ENG   English"""

I have three course items in the format “[Course Number] [Course Code] [Course Name]”. The spacing between the words is not equal.

I want to split these three course items into individual units of numbers and words. How do I do that?

In [6]:
regex.split(text)

['101',
 'COM',
 'Computers',
 '205',
 'MAT',
 'Mathematics',
 '189',
 'ENG',
 'English']
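If you would rather keep the three fields of each course together, one possible variation is to split line by line. This is only a sketch: the name `rows` and the use of `maxsplit=2` (so that a course name containing spaces would stay in one piece) are my own assumptions, not part of the original example.

```python
import re

text = """101 COM    Computers
205 MAT   Mathematics
189 ENG   English"""

regex = re.compile(r'\s+')

# Split each line separately; maxsplit=2 stops after the number and the code,
# so whatever remains is kept whole as the course name.
rows = [regex.split(line, maxsplit=2) for line in text.splitlines()]

for row in rows:
    print(row)
```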

### 4. Finding pattern matches using findall, search and match
#### Let’s suppose you want to extract all the course numbers, that is, the numbers 101, 205 and 189 alone from the above text. How to do that?

In [7]:
regex_num = re.compile('[0-9]+')
regex_num.findall(text)

['101', '205', '189']

#### The findall method extracts every run of 1 or more digits from the text and returns them in a list.

In [10]:
text2 = """COM    Computers
205 MAT   Mathematics 189"""
regex_num.search(text2)

<_sre.SRE_Match object; span=(17, 20), match='205'>

In [11]:
regex_num.match(text2)

match returned None (so no output is shown), whereas search returned a match object.
#### regex.search() scans the whole text and returns a match object that contains the starting and ending positions of the first occurrence of the pattern.
#### Likewise, regex.match() also returns a match object. The difference is that it requires the pattern to be present at the very beginning of the text; otherwise it returns None.

#### To get the matched text from .match or .search, use the group() method.

In [12]:
result = regex_num.search(text2)
result.group()

'205'
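The start and end positions mentioned above can be read off the match object as well. A minimal sketch that reuses the same pattern and text, and checks for None before calling any method, since both search() and match() return None when nothing is found:

```python
import re

regex_num = re.compile(r'[0-9]+')
text2 = """COM    Computers
205 MAT   Mathematics 189"""

result = regex_num.search(text2)
if result is not None:
    print(result.group())                # 205
    print(result.start(), result.end())  # 17 20
    print(result.span())                 # (17, 20)

# match() only looks at the beginning of the text, which starts with 'COM'
no_match = regex_num.match(text2)
print(no_match)  # None
```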

### 5. How to substitute one text with another using regex?
#### To replace text, use regex.sub().


In [13]:
text = """101   COM \t  Computers
205   MAT \t  Mathematics
189   ENG  \t  English"""  
print(text)

101   COM 	  Computers
205   MAT 	  Mathematics
189   ENG  	  English


From the above text, I want to even out all the extra spaces and put all the words in one single line.

In [15]:
regex = re.compile(r'\s+')  # matches any run of whitespace, including tabs (\t) and newlines (\n)
regex.sub(' ', text)  # replace each run of whitespace with a single space

'101 COM Computers 205 MAT Mathematics 189 ENG English'

Suppose you only want to get rid of the extra spaces but want to keep each course entry on its own line.

This can be done using a negative lookahead (?!\n) placed in front of \s+. It checks whether the next character is a newline and, if so, does not let a match start there, so the newlines are left out of the replacement.
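A minimal sketch of that idea, using the same course text as before. The second pattern, '[^\S\n]+', is not from the original notes. It is an alternative I am adding for comparison, and it reads as "whitespace that is not a newline".

```python
import re

text = """101   COM \t  Computers
205   MAT \t  Mathematics
189   ENG  \t  English"""

# Negative lookahead: a whitespace run may not START at a newline
regex = re.compile(r'(?!\n)\s+')
cleaned = regex.sub(' ', text)
print(cleaned)

# Alternative (my assumption, not from the notes): exclude '\n' from the
# character class itself, so a newline can never be part of a match
alt = re.sub(r'[^\S\n]+', ' ', text)
print(alt == cleaned)  # True for this text
```

One caveat: the lookahead only inspects the first character of each whitespace run. If a line ended with trailing spaces, the run would start at a space and \s+ would then swallow the newline too. The character-class version does not have that problem.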