In [2]:
import re

a|b -- Matches either a or b.

+ -- 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's
* -- 0 or more occurrences of the pattern to its left
? -- match 0 or 1 occurrences of the pattern to its left
{n} -- repeat exactly n times
{n,} -- repeat at least n times
{n,m} -- repeat between n and m times, inclusive (no space after the comma: Python treats {n, m} as literal text)
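
A minimal sketch of the quantifiers listed above. The sample strings and variable names are made up for illustration.

```python
import re

# + : one or more of the preceding item
plus_hits = re.findall(r'i+', 'hi piiig')            # ['i', 'iii']
# * : zero or more, so 'a' alone still fits 'ab*'
star_ok = re.fullmatch(r'ab*', 'a') is not None      # True
# ? : zero or one, so both spellings match
optional = re.findall(r'colou?r', 'color colour')    # ['color', 'colour']
# {n}, {n,}, {n,m} : counted repetition
exact = re.fullmatch(r'\d{4}', '2015') is not None   # True
at_least = re.findall(r'a{2,}', 'a aa aaaa')         # ['aa', 'aaaa']
ranged = re.findall(r'a{2,3}', 'aaaa')               # ['aaa']

print(plus_hits, star_ok, optional, exact, at_least, ranged)
```

Quantifiers are greedy by default, which is why `a{2,3}` takes three of the four a's and leaves a single a that no longer matches.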

[]: Square brackets can be used to indicate a set of chars, so [abc] matches 'a' or 'b' or 'c'.

a, X, 9, < -- ordinary characters just match themselves exactly.  
. (a period) -- matches any single character except newline '\n'

\d -- decimal digit [0-9] (some older regex utilities do not support \d)
\D -- Match a nondigit: [^0-9]

\w -- (lowercase w) matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word. 
\W -- (upper case W) matches any non-word character.
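
A small illustration of the character classes above, run against a made-up sample string (the variable names are illustrative only):

```python
import re

sample = 'Room_42 costs $9.50!'

vowels = re.findall(r'[aeiou]', sample)   # one character from the set at a time
digits = re.findall(r'\d', sample)        # single digits
non_digits = re.findall(r'\D+', sample)   # runs of non-digit characters
words = re.findall(r'\w+', sample)        # runs of letters, digits, underscore
non_word = re.findall(r'\W', sample)      # everything \w rejects
dot = re.findall(r'.', 'a\nb')            # . skips the newline

print(vowels)      # ['o', 'o', 'o']
print(digits)      # ['4', '2', '9', '5', '0']
print(non_digits)  # ['Room_', ' costs $', '.', '!']
print(words)       # ['Room_42', 'costs', '9', '50']
print(non_word)    # [' ', ' ', '$', '.', '!']
print(dot)         # ['a', 'b']
```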

\t, \n, \r -- tab, newline, return

The meta-characters which do not match themselves because they have special meanings are: . ^ $ * + ? {} [ ] \ | ( )

^ (Caret) -- match the start of the string
$  -- match the end of the string
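
A quick sketch of the two anchors, using a made-up sentence. `re.search` scans the whole string, so any restriction on position comes from `^` and `$` alone.

```python
import re

line = 'cookie jar with a cookie'

at_start = re.search(r'^cookie', line)   # only the first 'cookie' qualifies
at_end = re.search(r'cookie$', line)     # only the last 'cookie' qualifies
not_start = re.search(r'^jar', line)     # 'jar' is not at the start

print(at_start.start())   # 0
print(at_end.start())     # 18
print(not_start)          # None
```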

\s -- (lowercase s) matches a single whitespace character -- space, newline, return, tab, form feed [ \n\r\t\f]
\S -- (upper case S) matches any non-whitespace character.

\b -- boundary between word and non-word
\B -- nonword boundary
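-> \b is zero-width: it matches a position (between a \w character and a \W character, or at the start or end of the string) and consumes no characters.
-> \s+ consumes whitespace and requires it to be present, so a pattern like \s+word\s+ misses a word at the very start or end of the string, or next to punctuation; \bword\b still finds it.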
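
To make the difference concrete, here is a minimal comparison on a made-up string where the word sits at the start, at the end, next to a comma, and inside a longer word:

```python
import re

text = 'cat concat cat, cat'

# \b matches positions only, so the matched text is just 'cat'.
with_boundary = re.findall(r'\bcat\b', text)
# \s+ needs real whitespace on both sides; none of the three words has it.
with_spaces = re.findall(r'\s+cat\s+', text)
# \B requires that there is NO boundary: only the 'cat' inside 'concat'.
inside_word = [m.start() for m in re.finditer(r'\Bcat', text)]

print(with_boundary)  # ['cat', 'cat', 'cat']
print(with_spaces)    # []
print(inside_word)    # [7]
```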

\ -- inhibit the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure whether a punctuation character has special meaning, you can put a backslash in front of it to make sure it is treated as a literal character. Likewise, \\t matches a literal backslash followed by t rather than a tab.
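
A short sketch of escaping in practice. The sample strings are made up, and `re.escape` is a standard-library helper not mentioned above that backslash-escapes special characters for you.

```python
import re

price = 'Total: 3.50 (approx) or 3x50'

unescaped = re.findall(r'3.50', price)    # . matches any character
escaped = re.findall(r'3\.50', price)     # \. matches only a period
backslashes = re.findall(r'\\', r'C:\temp\new')   # \\ matches one backslash
literal = re.findall(re.escape('(approx)'), price)  # parentheses taken literally

print(unescaped)         # ['3.50', '3x50']
print(escaped)           # ['3.50']
print(len(backslashes))  # 2
print(literal)           # ['(approx)']
```

Writing patterns as raw strings (the `r'...'` prefix) keeps Python itself from interpreting the backslashes before the regex engine sees them.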

> **match** vs **search** vs **findall** vs **finditer**

* **match** checks for a match only at the beginning of the string
* **search** checks for a match anywhere in the string
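* **findall** returns all non-overlapping matches as a list (if the pattern contains capturing groups, it returns the groups instead of the whole match)
* **finditer** returns an iterator of match objects, one per non-overlapping match, which gives access to positions via start() and end()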

In [5]:
input_string = 'The cookie is very good'
match = re.match(r'cookie', input_string)
# If-statement after match() tests if it succeeded
if match:
    print ('Found', match.group()) ## 'cookie'
else:
    print ('Did not find')

Did not find


In [16]:
input_string = 'The cookie is very good. cookie'
match = re.search(r'cookie', input_string)
# If-statement after search() tests if it succeeded
if match:
    print ('Found', match.group()) ## 'cookie'
else:
    print ('Did not find')

Found cookie


In [17]:
input_string = 'The cookie is very good. cookie'
match = re.findall(r'(coo)kie', input_string)
# findall() returns a list; an empty list is falsy, so this tests for any match
if match:
    print ('Found', match) ## list of what group 1 captured, not the full 'cookie'
else:
    print ('Did not find')

Found ['coo', 'coo']


In [23]:
identified_pattern = []
input_string = 'The cookie is very good. cookie'
pattern_list = [r'cookie']
for p in pattern_list:
    # One (start, end, text) tuple per match; m.end() equals m.start() + len(m.group())
    matches = re.finditer(p, input_string)
    temp_pattern = [(m.start(), m.end(), m.group()) for m in matches]
    identified_pattern += temp_pattern
    
identified_pattern    

[(4, 10, 'cookie'), (25, 31, 'cookie')]

> **?:**

A group is, by default, capturing -- meaning you can fetch groups (sub-matches inside parens) with the group(int) method. 

A non-capturing group is just that -- don't capture the submatch. The group(int) method doesn't return submatches from non-capturing groups. Generally, if you don't need the value of a submatch, the group should be non-capturing, as there is no reason for the regex engine to collect the data for you, if you don't intend to use it. 

For example,
* (?:abc){3} matches abcabcabc. No groups.
* (abc){3} matches abcabcabc. First group matches abc.

In short, the whole match is always available as group 0. With (abc), the engine additionally saves the text matched inside the parentheses as group 1 (when the group repeats, only the last repetition is kept). That's why we get two outputs: abcabcabc and abc.
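
The two bullet examples can be checked directly. This is a minimal sketch; the last pattern, `(ab\d){3}`, is an added illustration of which repetition a repeated group keeps.

```python
import re

s = 'abcabcabc'
non_capturing = re.fullmatch(r'(?:abc){3}', s)
capturing = re.fullmatch(r'(abc){3}', s)

print(non_capturing.group(0), non_capturing.groups())  # abcabcabc ()
print(capturing.group(0), capturing.groups())          # abcabcabc ('abc',)

# A repeated capturing group keeps only its final repetition.
last_rep = re.fullmatch(r'(ab\d){3}', 'ab1ab2ab3').group(1)
print(last_rep)                                        # ab3
```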

Reference
* [refer](https://coderanch.com/t/466558/java/regular-expression)
* [notation-in-regular-expression](https://stackoverflow.com/questions/36524507/notation-in-regular-expression)
* [refcapture](https://www.regular-expressions.info/refcapture.html)

In [19]:
# Named text/dates rather than str/all, which would shadow Python built-ins
text = """COBOL is a compiled English-like computer programming language designed for business use. 122. On 30 OCT 2015 is a big date unlike 1 NOV 2010 """

# dates = re.findall(r"[\d]{1,2} [ADFJMNOS]\w* [\d]{4}", text)
# (?:...) groups the month alternatives without capturing them
dates = re.findall(r"([\d]{1,2}\s(?:JAN|NOV|OCT|DEC)\s[\d]{4})", text)
for s in dates:
    print(s)

30 OCT 2015
1 NOV 2010


In [4]:
text = """COBOL is a compiled English-like computer programming language designed for business use. 122. On 30 OCT 2015 is a big date unlike 1 NOV 2010 """

# The month group now captures, so findall returns a (full date, month) tuple per match
dates = re.findall(r"([\d]{1,2}\s(JAN|NOV|OCT|DEC)\s[\d]{4})", text)
for s in dates:
    print(s)

('30 OCT 2015', 'OCT')
('1 NOV 2010', 'NOV')


> **?=**

* https://www.regular-expressions.info/lookaround.html
* https://stackoverflow.com/questions/1570896/what-does-mean-in-a-regular-expression/1570916#1570916
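
As a rough sketch of what the links above describe: `(?=...)` is a positive lookahead. It checks that the text ahead matches, but that text is not part of the result. The sample strings below are made up for illustration.

```python
import re

# Keep only the number that is followed by ' USD'; ' USD' itself is not returned.
amounts = re.findall(r'\d+(?= USD)', 'pay 30 USD or 25 EUR')
print(amounts)  # ['30']

# The lookahead consumes nothing: the match ends right after 'cookie'.
m = re.search(r'cookie(?= jar)', 'cookie jar')
print(m.group(), m.end())  # cookie 6

# Without the required text ahead, there is no match at all.
no_match = re.search(r'cookie(?= jar)', 'cookie tin')
print(no_match)  # None
```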