### Quick recap


| Symbol  | Explanation                                                                | Example        |
|---------|----------------------------------------------------------------------------|----------------|
| `[]`    | A set of characters                                                        | "[a-m]"        |
| `\`     | Signals a special sequence (can also be used to escape special characters) | "\d"           |
| `.`     | Any character (except newline character)                                   | "he..o"        |
| `^`     | Starts with                                                                | "^hello"       |
| `$`     | Ends with                                                                  | "world$"       |
| `*`     | Zero or more occurrences                                                   | "aix*"         |
| `+`     | One or more occurrences                                                    | "aix+"         |
| `{}`    | Exactly the specified number of occurrences                                | "al{2}"        |
| `\|`    | Either or                                                                  | "falls\|stays" |
| `()`    | Capture and group                                                          |                |
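
As a quick illustration of the table, the sketch below runs each example pattern through Python's `re` module. The patterns come from the "Example" column; the sample strings and variable names are invented for this illustration, and the `()` row (which has no example in the table) uses a made-up pattern.

```python
import re

# Patterns are taken from the "Example" column; the sample strings are invented.
in_set = bool(re.search(r"[a-m]", "xyz"))                   # no character between a and m
any_char = bool(re.search(r"he..o", "hello world"))         # "." stands for any one character
starts = bool(re.search(r"^hello", "hello world"))          # anchored at the start
ends = bool(re.search(r"world$", "hello world"))            # anchored at the end
star = re.findall(r"aix*", "ai aix aixx")                   # "x" may be absent
plus = re.findall(r"aix+", "ai aix aixx")                   # at least one "x"
exact = re.findall(r"al{2}", "all alone")                   # "a" followed by exactly two "l"
either = re.findall(r"falls|stays", "it falls or stays")    # either alternative
groups = re.search(r"(\d+)-(\d+)", "pages 10-12").groups()  # each () captures one part

print(in_set, any_char, starts, ends)
print(star, plus, exact, either, groups)
```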

**Deeper recap:**

| Code | Meaning                                |
|------|----------------------------------------|
| \d   | a digit                                |
| \D   | a non-digit                            |
| \s   | whitespace (tab, space, newline, etc.) |
| \S   | non-whitespace                         |
| \w   | word character (alphanumeric or underscore) |
| \W   | non-word character                     |
| [abc] | any of a, b, or c                     |
| [^abc] | not a, b, or c                       |
| [a-g] | characters between a and g            |
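
A minimal sketch of these character classes in action. The sample string and variable names are made up for the illustration; note how `\w` treats the underscore as a word character.

```python
import re

sample = "Room 42_b, floor 3!"  # invented sample string

digits = re.findall(r"\d", sample)       # every single digit
word_runs = re.findall(r"\w+", sample)   # runs of letters, digits and underscores
non_word = re.findall(r"\W", sample)     # everything \w does not match
spaces = re.findall(r"\s", sample)       # whitespace characters
a_to_g = re.findall(r"[a-g]", sample)    # lowercase letters from a to g only

print(digits)
print(word_runs)
print(non_word)
print(len(spaces), a_to_g)
```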

**Anchors**

`\b`

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters. Formally, `\b` is the boundary between a `\w` and a `\W` character (or vice versa), or between `\w` and the beginning/end of the string. This means that `r'\bfoo\b'` matches `'foo'`, `'foo.'`, `'(foo)'`, `'bar foo baz'` but not `'foobar'` or `'foo3'`.

`\B`

Matches the empty string, but only when **it is not at the beginning or end of a word**. This means that `r'py\B'` matches `'python'`, `'py3'`, `'py2'`, but not `'py'`, `'py.'`, or `'py!'`. `\B` is the opposite of `\b`.
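
The two anchors can be checked directly against the strings listed above. This is only a small sketch; the list and variable names are illustrative.

```python
import re

foo_candidates = ["foo", "foo.", "(foo)", "bar foo baz", "foobar", "foo3"]
py_candidates = ["python", "py3", "py2", "py", "py.", "py!"]

# \bfoo\b: "foo" must be delimited by word boundaries on both sides
whole_word = [s for s in foo_candidates if re.search(r"\bfoo\b", s)]

# py\B: "py" must be followed by another word character
inside_word = [s for s in py_candidates if re.search(r"py\B", s)]

print(whole_word)
print(inside_word)
```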


**Escaped characters**

| Class | Explanation |
|:-----:|:-----------:|
| \\. \\* \\\\ | escaped special characters |
| \t \n \r | tab, linefeed, carriage return |
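
The sketch below shows why escaping matters: an unescaped `.` matches any character, while `\.` matches only a literal dot. It also uses `re.escape`, a standard-library helper that escapes a whole string at once. The sample strings and variable names are invented.

```python
import re

sample = "3.5 or 315 or 3x5"  # invented sample string

unescaped = re.findall(r"3.5", sample)   # "." matches any character
escaped = re.findall(r"3\.5", sample)    # "\." matches only a literal dot

# re.escape builds a pattern that matches its argument literally.
literal = "1+1=2?"
literal_found = re.search(re.escape(literal), "is 1+1=2? yes") is not None

# \t, \n and \r can be used inside patterns like any other character.
fields = re.split(r"[\t\r\n]+", "a\tb\r\nc")

print(unescaped, escaped, literal_found, fields)
```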

**Repetitions**

1.) A pattern followed by the metacharacter `*` is repeated zero or more times.

2.) Replace the `*` with `+` and the pattern must appear at least once.

3.) Using `?` means the pattern appears zero or one time.

4.) For a specific number of occurrences, use `{m}` after the pattern, where m is the number of times the pattern should repeat.

5.) Use `{m,n}` where m is the minimum number of repetitions and n is the maximum. Leaving out n (`{m,}`) means the pattern appears at least m times, with no maximum.


**Summary Table**


| Symbol   |      Meaning      |
|:----------:|:-------------:|
| * |  zero or more times |
| + |    at least once   |
| ? | zero or one time |
| {m} | exactly m times |
| {m,n} | between m and n times (inclusive) |
| {m,} | at least m times, no maximum|
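
To see the quantifiers side by side, the sketch below applies each one to the letter `b` and checks which of a few sample strings match in full. The helper `matching` and the sample list are hypothetical names introduced for this illustration.

```python
import re

samples = ["ac", "abc", "abbc", "abbbc"]  # zero to three "b"s between "a" and "c"

def matching(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

zero_or_more = matching(r"ab*c")     # any number of "b", including none
one_or_more = matching(r"ab+c")      # at least one "b"
zero_or_one = matching(r"ab?c")      # at most one "b"
exactly_two = matching(r"ab{2}c")    # exactly two "b"
one_to_two = matching(r"ab{1,2}c")   # one or two "b"
two_or_more = matching(r"ab{2,}c")   # two or more "b"

for name, result in [("*", zero_or_more), ("+", one_or_more), ("?", zero_or_one),
                     ("{2}", exactly_two), ("{1,2}", one_to_two), ("{2,}", two_or_more)]:
    print(name, result)
```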

**Lookaround**


| Lookaround | Name                | What it Does                                                                         |
|------------|---------------------|--------------------------------------------------------------------------------------|
| (?=foo)    | Lookahead           | Asserts that what immediately follows the current position in the string is foo      |
| (?<=foo)   | Lookbehind          | Asserts that what immediately precedes the current position in the string is foo     |
| (?!foo)    | Negative Lookahead  | Asserts that what immediately follows the current position in the string is not foo  |
| (?<!foo)   | Negative Lookbehind | Asserts that what immediately precedes the current position in the string is not foo |

More at https://www.regular-expressions.info/.
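
The four lookarounds from the table can be sketched on a small example. The sample string and variable names are invented; the `\b` anchors in the negative cases are an addition that keeps the engine from backtracking into the middle of a number.

```python
import re

prices = "cost: $40, discount: 15%, tax: $3"  # invented sample string

# Lookahead: digits immediately followed by "%"
percentages = re.findall(r"\d+(?=%)", prices)

# Lookbehind: digits immediately preceded by "$"
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)

# Negative lookahead: whole numbers NOT followed by "%"
not_percent = re.findall(r"\b\d+\b(?!%)", prices)

# Negative lookbehind: whole numbers NOT preceded by "$"
not_dollar = re.findall(r"(?<!\$)\b\d+", prices)

print(percentages, dollar_amounts, not_percent, not_dollar)
```

Lookarounds only assert what is around the match; the `%` and `$` characters are never part of the returned text.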


**Exercise 1.0**

Check if the given sentence contains "ab" in it using regex.

In [2]:
import re
text = "This exercise is the abc of regex"
# your code here

pattern = r'ab'
re.findall(pattern, text)

['ab']

**Exercise 1.1**

Check which of the given sentences contain "ab".

In [3]:
texts = ["This string doesn't contain what you are looking for", 
         "This string contains abc", 
         "Everyab wordab endab withab whatab yourab lookingab forab."]

pattern = r'ab'
for i, text in enumerate(texts):
    if re.findall(pattern, text):
        print(f"Sentence {i} contains the pattern.")
    else:
        print(f"Sentence {i} doesn't contain the pattern.")    

Sentence 0 doesn't contain the pattern.
Sentence 1 contains the pattern.
Sentence 2 contains the pattern.


**Exercise 1.2**

Check which of the given sentences contain digits.

In [4]:
texts = [ "1'm wr1t1ng us1ng numb3r5 4nd of l3tt3r5.",
        "This string doesn't contain any number", 
         "This string contains 4 numbers: 1, 2 and 3"]

for i, text in enumerate(texts):
    if re.findall(r'\d+', text):
        print(f"Sentence {i} contains the pattern.")
    else:
        print(f"Sentence {i} doesn't contain the pattern.")    

Sentence 0 contains the pattern.
Sentence 1 doesn't contain the pattern.
Sentence 2 contains the pattern.


**Exercise 1.3**

Count how many digits are in the given text.

In [6]:
text = "1'm wr1t1ng us1ng numb3r5 4nd of l3tt3r5."
# your code here

len(re.findall(r'\d', text))

10

**Exercise 1.4**

Count how many number sequences (runs of two or more consecutive digits) are in the given text.

In [97]:
text = "In this sentence there are 3 sequences: 123, 456."

# your code here

len(re.findall(r'\d{2,}', text))

2

**Exercise 1.5**

Count how many letters between "A" and "G" there are in the given text.  Consider only capital letters.

In [18]:
text = "ThIs An ExAmPlE."

# your code here

len(re.findall(r'[A-G]', text))

4

**Exercise 1.6**

Check whether the given text contains at least one capital letter between "A" and "G".

In [36]:
import re

def text_match(text):
    # your code here
    if re.search(r'[A-G]', text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("ab"))
print(text_match("abc"))
print(text_match("abbc"))
print(text_match("aabbc"))

Not matched!
Not matched!
Not matched!
Not matched!


**Exercise 1.7**

Write a Python program to find sequences of lowercase letters (of any length) joined with an underscore.


In [107]:
def text_match(text):
    # your code here
    # one or more lowercase letters, a single underscore, then more lowercase letters
    if re.search(r'^[a-z]+_[a-z]+$', text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("aab_cbbbc"))
print(text_match("aab_Abbbc"))
print(text_match("Aaab_abbbc"))

Found a match!
Not matched!
Not matched!


**Exercise 1.8**

Write a Python program that matches a word at the end of a string, with optional punctuation.

In [41]:
def text_match(text):
    # your code here
    # a word, then optional punctuation (no word characters or whitespace), then end of string
    if re.search(r'\w+[^\w\s]*$', text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("The quick brown fox jumps over the lazy dog."))
print(text_match("The quick brown fox jumps over the lazy dog...!!"))
print(text_match("The quick brown fox jumps over the lazy dog "))

Found a match!
Found a match!
Not matched!


**Exercise 1.9**

Write a Python program that matches a numerical sequence made of a specified prefix (`+39` in this example), optional spaces, and exactly ten digits.



In [45]:
def match_num(string):
    # your code here
    # literal "+39", any number of spaces, then exactly ten digits up to the end of the string
    return bool(re.search(r'^\+39\s*\d{10}$', string))
    
print(match_num('+39 3333333333'))
print(match_num('+45 2345861123'))

True
False


**Exercise 2.0**

Write a Python program to search for a literal string in a string and also find the location within the original string where the pattern occurs.



In [59]:
pattern = 'fox'
text = 'The quick brown fox jumps over the lazy dog.'
# your code here
match = re.search(pattern, text)
s = match.start()
e = match.end()
print(f'Found {pattern} in "{text}" from {s} to {e}')

Found fox in "The quick brown fox jumps over the lazy dog." from 16 to 19


**Exercise 2.1**


Write a Python program to abbreviate 'Road' as 'Rd.' in a given string.

In [68]:
street = '21 Ramkrishna Road'
# your code here

def road_rd(text):
    new = re.sub('Road', 'Rd.', text)
    print(new)
    
road_rd(street)

21 Ramkrishna Rd.


**Exercise 2.2**

Write a Python program to replace all occurrences of space, comma, or dot with an underscore.







In [80]:
text = 'M7 W1 D2 NLP Module'
# your code here

def punct_to_underscore(text):
    new = re.sub(r'\s+|,+|\.+', '_', text)
    print(new)
    
punct_to_underscore(text)

M7_W1_D2_NLP_Module


**Exercise 2.3**

Write a Python program to replace all occurrences of space, comma, or dot with an underscore in the name of a file, leaving the extension untouched.

Hint: use `|`. On the left side, match the spaces and commas; on the right side, match a dot with a negative lookahead that ensures the dot is not followed only by word characters up to the end of the string (that is, it is not the dot before the extension).

For the negative lookahead: `\.(?!\w*$)`
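
To see what the hinted pattern does on its own, the sketch below locates the dots it matches in a hypothetical file name (the name and variable names are made up for this illustration). Only the last dot, the one followed by nothing but word characters up to the end, is skipped.

```python
import re

dot_not_extension = r"\.(?!\w*$)"   # a dot that is not the extension separator
name = "report.v2.final.txt"        # hypothetical file name

# Positions of the dots that the pattern matches (the dot before "txt" is excluded)
positions = [m.start() for m in re.finditer(dot_not_extension, name)]
renamed = re.sub(dot_not_extension, "_", name)

print(positions)
print(renamed)
```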

In [81]:
text = 'M7 W1 D2 NLP Module.ipynb'
# your code here

def punct_to_underscore(text):
    new = re.sub(r'\s+|,+|\.(?!\w*$)', '_', text)
    print(new)
    
punct_to_underscore(text)

M7_W1_D2_NLP_Module.ipynb


**Exercise 2.4**

Write a Python program to remove all the words between brackets, brackets included.

This Stack Overflow question on the difference between greedy and lazy quantifiers may be helpful: https://stackoverflow.com/questions/3075130/what-is-the-difference-between-and-regular-expressions

In [121]:
text = "This is an (hard) exercise that is very easy (to fail)."

# your code here

def brackets(text):
    new = re.sub(r'\(.+?\)', '', text)
    print(new)
    
brackets(text)

This is an  exercise that is very easy .
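
The `?` after `.+` is what makes this work. A short sketch comparing the greedy and lazy versions on the same sentence (the variable names are illustrative):

```python
import re

text = "This is an (hard) exercise that is very easy (to fail)."

# Greedy: .+ runs from the first "(" to the LAST ")" in the string
greedy = re.sub(r"\(.+\)", "", text)

# Lazy: .+? stops at the FIRST ")" it can, so each pair is removed separately
lazy = re.sub(r"\(.+?\)", "", text)

print(repr(greedy))
print(repr(lazy))
```

The greedy version also swallows the words between the two pairs of brackets, which is rarely what is wanted here.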


**Exercise 2.5**

Write a Python program to remove everything between angle brackets (`<>`), brackets included.

In [88]:
text = "Remove <this> tags </this> please!"

# your code here

def strange_brackets(text):
    new = re.sub(r'<.+?>', '', text)
    print(new)
    
strange_brackets(text)

Remove  tags  please!


**Exercise 2.6**

Write a Python program to extract values between quotation marks of a string.



In [114]:
text = '"Python", "PHP", "Java"'
# your code here

def quote(text):
    new = re.findall(r'"(.*?)"', text)
    print(new)
    
quote(text)

['Python', 'PHP', 'Java']


**Exercise 2.7**

Write a Python program to insert spaces between words starting with capital letters.



In [122]:
def capital_words_spaces(str1):
  return re.sub(r"(\w)([A-Z])", r"\1 \2", str1)

print(capital_words_spaces("Python"))
print(capital_words_spaces("PythonExercises"))
print(capital_words_spaces("PythonExercisesPracticeSolution"))

Python
Python Exercises
Python Exercises Practice Solution


**Exercise 2.8**

Extract the table of contents from the text.



In [116]:
text = """

Introduction
-------

This is the intro.

Chapter 1
-------

Hello. This is the first chapter and it contains numbers.

Chapter 2
-------

The middle of the book.

Chapter 3
-------

Finally the last chapter...

Conclusions
-------

The end of this torture.
"""

# your code here

def clean_torture(text):
    # a heading is a non-empty line immediately followed by a line of dashes
    new = re.findall(r'^(.+)\n-+$', text, flags=re.MULTILINE)
    print(new)

clean_torture(text)

['Introduction', 'Chapter 1', 'Chapter 2', 'Chapter 3', 'Conclusions']
