### Notes:

|Pattern | Matches|
|:--|:--|
|`\d`|any digit |
|`\D`|any character that is not a digit |
|`\w`|any letter, digit, or underscore |
|`\W`|any character that is not a letter, digit, or underscore |
|`\)`|a literal `)` |
|`\(`|a literal `(` |
|`\s`|any space, tab, or newline character |
|`\S`|any character that is not a space, tab, or newline |

- **Steps for using regex** (re module)

    - Import the regex module with `import re`.
    - Create a Regex object with the `re.compile()` function. (Remember to use a raw string.)
    - Pass the string you want to search into the Regex object's `search()` method. This returns a Match object, or `None` if the pattern is not found.
    - Call the Match object's `group()` method to return a string of the actual matched text.
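
The four steps above can be sketched as follows. The phone-number pattern and the sample sentence are made up for illustration only.

```python
import re

# Hypothetical pattern and sample text, chosen only for illustration.
phone_regex = re.compile(r'\d{3}-\d{3}-\d{4}')  # raw string: backslashes stay literal
mo = phone_regex.search('My number is 415-555-4242.')

# search() returns None when nothing matches, so check before calling group().
matched = mo.group() if mo is not None else None
print(matched)
```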

- **Grouping with parentheses**
    - `(\d\d\d)-(\d\d\d-\d\d\d\d)` -> group 1: the first 3 digits, group 2: the remaining digits after the first hyphen
    - `mo.group()` or `mo.group(0)` returns the entire matched text as a single string
    - `mo.group(n)` returns the text matched by the nth group
    - `mo.groups()` returns a tuple with the text of every group (not the entire match)
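
A small sketch of the difference between `group()`, `group(n)`, and `groups()`, using the pattern above on a made-up phone number:

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('Call 415-555-4242 today')  # sample text is illustrative

whole = mo.group()    # entire match, same as mo.group(0)
area = mo.group(1)    # text of the first parenthesized group
parts = mo.groups()   # tuple with one entry per group

# groups() unpacks naturally into one variable per group.
area_code, main_number = parts
print(whole, area, parts)
```
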
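- **Pipe operator - |** - matches one of several alternatives (`or`)
    - When more than one alternative occurs in the searched text, the first occurrence is returned as the Match object.
    - With `heroRegex = re.compile(r'Batman|Tina Fey')`, `heroRegex.search('Tina Fey and Batman').group()` returns `Tina Fey`.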
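
For illustration, assuming `heroRegex` was compiled from the pattern `Batman|Tina Fey` (the notes do not show its definition), the alternative that appears first in the searched text wins:

```python
import re

# Assumed definition of heroRegex; the notes only show how it is used.
heroRegex = re.compile(r'Batman|Tina Fey')

first = heroRegex.search('Batman and Tina Fey').group()
second = heroRegex.search('Tina Fey and Batman').group()
print(first, '/', second)
```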

- **Question mark - ?** - marks the preceding group as optional, or makes a quantifier non-greedy
    - `This is a mandatory part( while this is optional)?` is an alternative to `This is a mandatory part while this is optional|This is a mandatory part`
    - Adding a `?` after a `{start,stop}` quantifier (or after `*` or `+`) matches the shortest string possible
- **Asterisk - \* -** matches zero or more occurrences of the preceding group
    - used when a group may be absent or may repeat any number of times
    - `Bat(wo)*man` matches `Batman`, `Batwoman`, and `Batwowowoman`
    - Because zero occurrences are allowed, a pattern made only of `(any)*` never returns `None`: if no `any` is found, it matches the empty string
- **Plus operator - +** - matches one or more occurrences
    - Regex `A?` matches zero or one occurrences of A.
    - Regex `A*` matches zero or more occurrences of A.
    - Regex `A+` matches one or more occurrences of A.
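
The three quantifiers can be compared side by side. This is a minimal sketch; the `Bat(wo)...man` patterns and the test strings are illustrative choices, not taken from the notes.

```python
import re

bat_optional = re.compile(r'Bat(wo)?man')  # zero or one 'wo'
bat_star = re.compile(r'Bat(wo)*man')      # zero or more 'wo'
bat_plus = re.compile(r'Bat(wo)+man')      # one or more 'wo'

r1 = bat_optional.search('Batwoman').group()
r2 = bat_optional.search('Batwowoman')       # two 'wo' is too many for ?
r3 = bat_star.search('Batwowowoman').group()
r4 = bat_star.search('Batman').group()       # zero 'wo' is fine for *
r5 = bat_plus.search('Batman')               # + needs at least one 'wo'

# A pattern that is only a starred group can match the empty string,
# so search() still returns a Match object rather than None.
r6 = re.search(r'(any)*', 'nothing here').group()
print(r1, r2, r3, r4, r5, repr(r6))
```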
    
- **Curly braces - {}** - match a specific number of repetitions
    - `(3times){3}` - equivalent regex: `3times3times3times`
    - `(3times){3,5}` - matches the same strings as `3times3times3times|3times3times3times3times|3times3times3times3times3times` -> 3, 4, or 5 occurrences
    - By default this returns the longest matching string even if a shorter match exists. This approach is called *greedy*. To get the shortest possible match, add **?** after the closing brace
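
A short sketch of greedy versus non-greedy repetition; the `(Ha)` group and the sample string are illustrative:

```python
import re

greedy = re.compile(r'(Ha){3,5}')    # takes as many repetitions as it can
lazy = re.compile(r'(Ha){3,5}?')     # stops at the minimum that still matches

text = 'HaHaHaHaHa'
greedy_match = greedy.search(text).group()
lazy_match = lazy.search(text).group()

# {3} demands exactly three repetitions, so two are not enough.
too_short = re.search(r'(Ha){3}', 'HaHa')
print(greedy_match, lazy_match, too_short)
```
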
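- **search() vs. findall()**
    - `search()` returns a Match object for the first occurrence only, while `findall()` returns every non-overlapping occurrence as a list
        - list of strings - if the regex has no groups (or exactly one group, in which case each string is that group's text)
        - list of tuples of strings - if the regex has two or more groups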
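
The return shapes can be checked directly. The sample text and patterns below are assumptions made for this sketch:

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'  # illustrative sample

no_groups = re.compile(r'\d{3}-\d{3}-\d{4}')
two_groups = re.compile(r'(\d{3})-(\d{3}-\d{4})')

first_only = no_groups.search(text).group()  # search(): first occurrence
all_strings = no_groups.findall(text)        # no groups: list of strings
all_tuples = two_groups.findall(text)        # two groups: list of tuples
print(first_only, all_strings, all_tuples)
```
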
- **Character classes**
    - A shorthand for what would otherwise be a long chain of alternatives
    - The class is defined in square brackets. `[aeiouAEIOU]` matches any vowel, uppercase or lowercase
    - Most regex symbols (such as `.`, `*`, `?`, `(` and `)`) need not be escaped inside a class
    - Every character inside the brackets counts, so adding a space as in `[aeiou AEIOU]` also matches the space character
    - Classes can also contain ranges. `[a-zA-Z0-9]` matches any letter of the English alphabet and any digit
    - A caret `^` immediately after the opening bracket makes a negative class, which matches any character not listed. A `^` anywhere else inside the class matches a literal `^`
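
A few quick checks of these rules, using made-up sample strings:

```python
import re

# Positive class: every vowel in the sample, in order of appearance.
vowels = re.findall(r'[aeiouAEIOU]', 'RoboCop eats baby food.')

# Negative class: the caret right after '[' inverts the class.
non_vowels = re.findall(r'[^aeiouAEIOU]', 'a b!')

# Ranges: letters and digits only, so the space and underscore are skipped.
alnum = re.findall(r'[a-zA-Z0-9]', 'A1 b_2')

# A caret that is not first in the class is just a literal '^'.
literal_caret = re.findall(r'[a^]', 'x^a')
print(vowels, non_vowels, alnum, literal_caret)
```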
    
- **Dollar sign and caret - $ and ^**
    - `$` at the end of a regex means the match must end at the end of the string
    - `^` at the start of a regex means the match must begin at the start of the string
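
A minimal sketch of both anchors, with illustrative patterns and strings:

```python
import re

begins_with_hello = re.compile(r'^Hello')
ends_with_digit = re.compile(r'\d$')
whole_string_digits = re.compile(r'^\d+$')  # both anchors: the entire string

starts = begins_with_hello.search('Hello world') is not None
not_at_start = begins_with_hello.search('He said Hello')
last_digit = ends_with_digit.search('Your number is 42').group()
mixed = whole_string_digits.search('12345xyz')
all_digits = whole_string_digits.search('1234567890').group()
print(starts, not_at_start, last_digit, mixed, all_digits)
```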

### 1. What is the function that creates Regex objects?
> `re.compile()`

### 2. Why are raw strings often used when creating Regex objects?
> raw strings are used so that backslashes need not be escaped

### 3. What does the search() method return?
> A `Match` object, or `None` if the pattern is not found

### 4. How do you get the actual strings that match the pattern from a Match object?
> Call the Match object's `group()` method

### 5. In the regex created from r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group 0 cover? Group 1? Group 2?
> Group 0 - the entire match, group 1 - the first set of parentheses, group 2 - the second set of parentheses

In [10]:
Regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = Regex.search('999-999-9999')
mo.group(1), mo.group(0), mo.group(2)

('999', '999-999-9999', '999-9999')

### 6. Parentheses and periods have specific meanings in regular expression syntax. How would you specify that you want a regex to match actual parentheses and period characters?
> By escaping them with a backslash: `\.`, `\(`, `\)`

### 7. The findall() method returns a list of strings or a list of tuples of strings. What makes it return one or the other?

- list of strings - if the regex has no groups (or exactly one group)
- list of tuples of strings - if the regex has two or more groups


### 8. What does the | character signify in regular expressions?
> `or`

### 9. What two things does the ? character signify in regular expressions?
> match 0 or 1 of the group, or non-greedy matching

### 10. What is the difference between the + and * characters in regular expressions?
- A* matches zero or more occurrences of A.
- A+ matches one or more occurrences of A.

### 11. What is the difference between {3} and {3,5} in regular expressions?
- `{3}` - exactly 3 occurrences
- `{3,5}` - between 3 and 5 occurrences, inclusive

### 12. What do the \d, \w, and \s shorthand character classes signify in regular expressions?
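- `\d` - a single digit
- `\w` - a single word character (letter, digit, or underscore)
- `\s` - a single whitespace character (space, tab, or newline)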

### 13. What do the \D, \W, and \S shorthand character classes signify in regular expressions?
> A single character that is not a digit (`\D`), not a word character (`\W`), or not a whitespace character (`\S`)

### 14. What is the difference between .* and .*??
 - `.*` - greedy match (as much text as possible)
 - `.*?` - non-greedy match (as little text as possible)
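
The difference shows up when the closing delimiter occurs more than once. A small sketch with an illustrative string:

```python
import re

text = '<To serve man> for dinner.>'

greedy = re.search(r'<.*>', text).group()   # runs to the last '>'
lazy = re.search(r'<.*?>', text).group()    # stops at the first '>'
print(greedy)
print(lazy)
```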

### 15. What is the character class syntax to match all numbers and lowercase letters?
> `[a-z0-9]`

### 16. How do you make a regular expression case-insensitive?
> `re.I` or `re.IGNORECASE` is passed as second argument to `re.compile()`
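
For example, with an illustrative pattern and inputs:

```python
import re

robocop = re.compile(r'robocop', re.IGNORECASE)  # re.I is the short form

# Each spelling matches, and group() returns the text as it was written.
found = [robocop.search(s).group() for s in ('RoboCop', 'ROBOCOP', 'robocop')]
print(found)
```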

### 17. What does the . character normally match? What does it match if re.DOTALL is passed as the second argument to re.compile()?
- `.` matches any character except the newline `\n`
- With `re.DOTALL`, `.` matches `\n` too
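
A short comparison on an illustrative two-line string:

```python
import re

text = 'Serve the public trust.\nProtect the innocent.'

without_flag = re.compile(r'.*').search(text).group()          # stops at '\n'
with_dotall = re.compile(r'.*', re.DOTALL).search(text).group()  # spans lines
print(repr(without_flag))
print(repr(with_dotall))
```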

### 18. If numRegex = re.compile(r'\d+'), what will numRegex.sub('X', '12 drummers, 11 pipers, five rings, 3 hens') return?

In [11]:
numRegex = re.compile(r'\d+')
numRegex.sub('X', '12 drummers, 11 pipers, five rings, 3 hens') 

'X drummers, X pipers, five rings, X hens'

### 19. What does passing re.VERBOSE as the second argument to re.compile() allow you to do?
> `re.VERBOSE` lets you spread a regular expression over several lines, split it into logical sections, and add comments. Whitespace and `#` comments inside the pattern are ignored, which makes the regex easier to read.
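
A minimal sketch of a verbose pattern; the phone-number layout and sample text are assumptions made for this example:

```python
import re

phone_regex = re.compile(r'''
    (\d{3})      # area code
    [-\s]        # separator: hyphen or whitespace
    (\d{3})      # first three digits
    -            # hyphen
    (\d{4})      # last four digits
    ''', re.VERBOSE)

parts = phone_regex.search('Call 415 555-4242').groups()
print(parts)
```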

### 20. How would you write a regex that matches a number with commas for every three digits? It must match the following:

    '42'
    '1,234'
    '6,368,745'

but not the following:

    '12,34,567' (which has only two digits between the commas)
    '1234' (which lacks commas)
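
One possible answer, sketched here rather than taken from the notes: one to three digits, followed by any number of comma-plus-three-digit groups. The helper name `is_comma_number` is hypothetical, and `fullmatch()` is used so the whole string has to fit the pattern.

```python
import re

comma_number = re.compile(r'\d{1,3}(,\d{3})*')

def is_comma_number(text):
    """Return True if the whole string is a correctly comma-grouped number."""
    return comma_number.fullmatch(text) is not None

for sample in ('42', '1,234', '6,368,745', '12,34,567', '1234'):
    print(sample, is_comma_number(sample))
```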

### 21. How would you write a regex that matches the full name of someone whose last name is Watanabe? You can assume that the first name that comes before it will always be one word that begins with a capital letter. The regex must match the following:

    'Haruto Watanabe'
    'Alice Watanabe'
    'RoboCop Watanabe'

but not the following:

    'haruto Watanabe' (where the first name is not capitalized)
    'Mr. Watanabe' (where the preceding word has a nonletter character)
    'Watanabe' (which has no first name)
    'Haruto watanabe' (where Watanabe is not capitalized)
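
One possible answer, as a sketch: a capital letter, then any further letters, whitespace, and the literal `Watanabe`. Allowing uppercase letters inside the first name is a choice made here so that `RoboCop` passes. The helper name `is_watanabe` is hypothetical.

```python
import re

watanabe = re.compile(r'[A-Z][A-Za-z]*\sWatanabe')

def is_watanabe(name):
    """Return True if the whole string is '<Capitalized word> Watanabe'."""
    return watanabe.fullmatch(name) is not None

for sample in ('Haruto Watanabe', 'RoboCop Watanabe', 'haruto Watanabe',
               'Mr. Watanabe', 'Watanabe', 'Haruto watanabe'):
    print(sample, is_watanabe(sample))
```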

### 22. How would you write a regex that matches a sentence where the first word is either Alice, Bob, or Carol; the second word is either eats, pets, or throws; the third word is apples, cats, or baseballs; and the sentence ends with a period? This regex should be case-insensitive. It must match the following:

    'Alice eats apples.'
    'Bob pets cats.'
    'Carol throws baseballs.'
    'Alice throws Apples.'
    'BOB EATS CATS.'

but not the following:

    'RoboCop eats apples.'
    'ALICE THROWS FOOTBALLS.'
    'Carol eats 7 cats.'
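
One possible answer, as a sketch: three alternation groups separated by whitespace, an escaped period, and `re.IGNORECASE`. The helper name `is_sentence` is hypothetical.

```python
import re

sentence = re.compile(
    r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.',
    re.IGNORECASE,
)

def is_sentence(text):
    """Return True if the whole string is a valid three-word sentence."""
    return sentence.fullmatch(text) is not None

for sample in ('Alice eats apples.', 'BOB EATS CATS.',
               'RoboCop eats apples.', 'Carol eats 7 cats.'):
    print(sample, is_sentence(sample))
```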


### DATE DETECTION

In [3]:
import re

In [15]:
dateRegex = re.compile(r'(\d{2})/(\d{2})/(\d{4})')
date = input("ENTER DATE: ")
mo = dateRegex.fullmatch(date)  # the whole input must be DD/MM/YYYY
if mo is None:
    print("Invalid Date: Date is not in DD/MM/YYYY format")
else:
    day, month, year = (int(part) for part in mo.groups())

    # Leap year: divisible by 4, except century years not divisible by 400
    is_leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    feb = 29 if is_leap else 28
    month_days = [31, feb, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    if month < 1 or month > 12:
        print("Invalid Date: Month Exceeded")
    elif day < 1 or day > month_days[month - 1]:
        print("Invalid Date: Date exceeded")
    else:
        print("Valid date")

ENTER DATE: 22/34/3333
Invalid Date: Month Exceeded


### STRONG PASSWORD 
[incomplete]

In [39]:
passRegex = re.compile(r'[A-Za-z0-9]{8,}')
mo = passRegex.search("00000000")
mo.group()

'00000000'
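
The notes do not say what counts as a strong password. As a sketch only, assuming the criteria are at least 8 characters with at least one lowercase letter, one uppercase letter, and one digit, each rule can get its own regex. The function name `is_strong` and all four pattern names are hypothetical.

```python
import re

# Assumed criteria (not stated in the notes): length >= 8,
# plus at least one lowercase letter, one uppercase letter, and one digit.
length_regex = re.compile(r'.{8,}')
lower_regex = re.compile(r'[a-z]')
upper_regex = re.compile(r'[A-Z]')
digit_regex = re.compile(r'\d')

def is_strong(password):
    """Return True only if every assumed rule finds a match."""
    checks = (length_regex, lower_regex, upper_regex, digit_regex)
    return all(regex.search(password) is not None for regex in checks)

for sample in ('00000000', 'Passw0rd1', 'Sh0rt'):
    print(sample, is_strong(sample))
```

Under these assumed rules, `'00000000'` is rejected even though it passes the single-pattern check above.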

### STRIP USING REGEX

In [2]:
# strip() equivalent using regex

import re

string = input("Enter the string:")
print("Your input:",string,"___")

lstrip = r'^\s+' # leading whitespace

rstrip = r'\s+$' # trailing whitespace

lstripRegex = re.compile(lstrip)
rstripRegex = re.compile(rstrip)

string = lstripRegex.sub('', string)
string = rstripRegex.sub('', string)
print("Output: (between underscores)___",string,"___")


Enter the string:            spaces            
Your input:             spaces             ___
Output: (between underscores)___ spaces ___
