## In this repository we will explore the basic functionality of the Python re library.

## But first, a quick reminder of regex
### These are the most common regular expression operators, grouped by category. They cover most everyday patterns, although many more specialized operators exist for advanced matching.

### Anchors

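^: Matches the start of the string (and the start of each line with re.MULTILINE)
$: Matches the end of the string or just before a trailing newline (and the end of each line with re.MULTILINE)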
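
The difference between string anchoring and line anchoring is easiest to see on a two-line string. The snippet below is a small illustrative sketch; the `text` value is made up for the example.

```python
import re

text = "first line\nsecond line"

# Without flags, ^ anchors only at the very start of the string.
starts_default = re.findall(r'^\w+', text)

# With re.MULTILINE, ^ and $ also anchor at the start and end of every line.
starts_multiline = re.findall(r'^\w+', text, flags=re.MULTILINE)
ends_multiline = re.findall(r'\w+$', text, flags=re.MULTILINE)

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']
print(ends_multiline)    # ['line', 'line']
```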

### Character classes

\w: Matches any word character (alphanumeric and underscore)
\d: Matches any digit
\s: Matches any whitespace
\b: Matches a word boundary
\W: Matches any non-word character
\D: Matches any non-digit character
\S: Matches any non-whitespace character
\B: Matches a position that is not a word boundary

[]: Matches any character inside the brackets
[a-z]: Matches any lowercase letter
[A-Z]: Matches any uppercase letter
[0-9]: Matches any digit
[a-zA-Z0-9]: Matches any ASCII letter or digit

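The following classes come from other regex flavors (for example Java) and are not supported by Python's re module:

\p{Punct}: Matches any punctuation character
\p{Lower}: Matches any lowercase letter
\p{Upper}: Matches any uppercase letter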
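
As a quick illustration of the shorthand and bracket classes that Python's re module does support, here is a minimal sketch. The `text` value is invented for the example.

```python
import re

text = "Order 66: cat, concat, cats_2!"

digits = re.findall(r'\d+', text)          # runs of digits
words = re.findall(r'\w+', text)           # runs of word characters (letters, digits, underscore)
whole_cat = re.findall(r'\bcat\b', text)   # "cat" only as a whole word
inner_cat = re.findall(r'\Bcat', text)     # "cat" that does not start a word (inside "concat")
uppercase = re.findall(r'[A-Z]', text)     # bracket class with a range

print(digits)     # ['66', '2']
print(words)      # ['Order', '66', 'cat', 'concat', 'cats_2']
print(whole_cat)  # ['cat']
print(inner_cat)  # ['cat']
print(uppercase)  # ['O']
```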

### Quantifiers

*: Matches zero or more of the preceding element
+: Matches one or more of the preceding element
?: Matches zero or one of the preceding element
{n}: Matches exactly n of the preceding element
{n,}: Matches n or more of the preceding element
{n,m}: Matches at least n and at most m of the preceding element
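
To see how each quantifier changes what is accepted, the sketch below checks a few made-up sample strings against one pattern per quantifier. The helper `matching` is a hypothetical name introduced only for this example.

```python
import re

samples = ["ct", "cat", "caat", "caaat", "caaaat"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(matching(r'ca*t'))      # zero or more 'a': all five samples
print(matching(r'ca+t'))      # one or more 'a':  ['cat', 'caat', 'caaat', 'caaaat']
print(matching(r'ca?t'))      # zero or one 'a':  ['ct', 'cat']
print(matching(r'ca{2}t'))    # exactly two:      ['caat']
print(matching(r'ca{2,}t'))   # two or more:      ['caat', 'caaat', 'caaaat']
print(matching(r'ca{2,3}t'))  # two to three:     ['caat', 'caaat']
```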

### Special characters

.: Matches any character except newline
\: Escapes the following character
|: Alternation operator, matches either the pattern before it or the pattern after it
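
A short sketch of these three operators on an invented `text` value. It also shows the re.DOTALL flag, which lets the dot match a newline as well.

```python
import re

text = "a.b axb a\nb cat dog"

any_char = re.findall(r'a.b', text)                      # '.' matches any character except newline
literal_dot = re.findall(r'a\.b', text)                  # '\.' matches only a literal dot
with_dotall = re.findall(r'a.b', text, flags=re.DOTALL)  # with DOTALL, '.' also matches '\n'
either = re.findall(r'cat|dog', text)                    # alternation

print(any_char)     # ['a.b', 'axb']
print(literal_dot)  # ['a.b']
print(with_dotall)  # ['a.b', 'axb', 'a\nb']
print(either)       # ['cat', 'dog']
```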

### Groups

(): Creates a capture group
(?:): Creates a non-capturing group

Lookaround:
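(?=...): Positive lookahead, succeeds if ... matches next, without consuming it
(?!...): Negative lookahead, succeeds if ... does not match next
(?<=...): Positive lookbehind, succeeds if the text just before the current position matches ...
(?<!...): Negative lookbehind, succeeds if the text just before the current position does not match ...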
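
Groups and lookarounds are easier to follow with concrete input. The following is a minimal sketch on a made-up `text` value; the patterns are illustrative rather than robust.

```python
import re

text = "cost $15, paid 20 USD, tag #123456"

# Capture groups: each pair of parentheses is available through groups().
m = re.search(r'(\w+) \$(\d+)', text)
captured = m.groups()

# Non-capturing group: (?:...) groups the alternatives without adding a capture,
# so findall returns only the single captured number.
amounts = re.findall(r'(?:cost|paid) \$?(\d+)', text)

# Positive lookahead: digits followed by " USD" (the " USD" is not part of the match).
before_usd = re.findall(r'\d+(?= USD)', text)

# Negative lookahead: whole numbers NOT followed by " USD".
not_usd = re.findall(r'\b\d+\b(?! USD)', text)

# Positive lookbehind: digits preceded by a dollar sign.
after_dollar = re.findall(r'(?<=\$)\d+', text)

# Negative lookbehind: digit runs not preceded by '$', '#', or another digit.
plain = re.findall(r'(?<![$#\d])\d+', text)

print(captured)      # ('cost', '15')
print(amounts)       # ['15', '20']
print(before_usd)    # ['20']
print(not_usd)       # ['15', '123456']
print(after_dollar)  # ['15']
print(plain)         # ['20']
```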

### The main re methods

1. re.compile(pattern, flags=0): Compiles a pattern into a regular expression object whose methods (search, match, finditer, findall, sub, split) can be reused on many strings.

2. search(string): Scans the string for the first location where the pattern matches and returns a match object, or None if there is no match.

3. match(string): Matches the pattern only at the beginning of the string and returns a match object, or None if there is no match.

4. findall(string): Returns a list of all non-overlapping matches of the pattern in the string. If the pattern contains capture groups, the list holds the groups instead of the full matches.

5. finditer(string): Returns an iterator yielding match objects for all non-overlapping matches of the pattern in the string.

6. sub(repl, string): Replaces all non-overlapping occurrences of the pattern in the string with repl, which can be a string or a function.

7. split(string): Splits the string at each occurrence of the pattern and returns the pieces as a list.

Methods 2-7 can be called either directly from the re module ( re.search(regex_pattern, string) ) or
as methods of a compiled pattern object ( re.compile(regex_pattern).search(string) ).
This tutorial follows the latter.
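
The methods listed above can be tried side by side on one compiled pattern. This is a small sketch with an invented `text` value, not part of the exercises.

```python
import re

text = "2 cats, 4 dogs and 10 birds"
pattern = re.compile(r'\d+')

first = pattern.search(text)                         # first match anywhere in the string
at_start = pattern.match(text)                       # match only at the beginning
no_match = pattern.match("cats 2")                   # None: the string does not start with a digit
numbers = pattern.findall(text)                      # list of matched strings
spans = [m.span() for m in pattern.finditer(text)]   # match objects expose positions
masked = pattern.sub('N', text)                      # replace every match
parts = pattern.split(text)                          # split around every match

print(first.group(), at_start.group(), no_match)  # 2 2 None
print(numbers)  # ['2', '4', '10']
print(spans)    # [(0, 1), (8, 9), (19, 21)]
print(masked)   # N cats, N dogs and N birds
print(parts)    # ['', ' cats, ', ' dogs and ', ' birds']

# The module-level function gives the same result as the compiled pattern.
same = re.search(r'\d+', text).group() == first.group()
```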

In [2]:
import re

In [3]:
string = '''
The Cat in the hat, sat there with a fat rat. He ate a mat, and a bat came out of nowhere to attack.
He had 2 cats, one was black and one was white. He also had 4 dogs, 2 of them were big and 2 of them were small.
He wanted to buy a new Dog, but he couldn't decide between a Labrador or a Golden Retriever.
In the end, he decided to adopt a stray Cat from the street. He painted his room in #ff0000 color.
His email address is moew@gmail.com and his phone numbers 555-555-5555, 555.555.5555, 555555-5555, 555555.5555, 1234560000.
He was born on 12/05/2018, and he graduated on 20/01/2023. He also has a website https://www.example.com.
He likes to shop online and his favorite store is http://store.com and also likes to shop at www.store2.com.
His favorite color is blue and he likes to buy items that cost $9.99, $19.99, $29.99 and also $39.99.
He also likes to buy items that have a discount of $5.99 and $9.99. He has a dog named Buddy with a collar number #123456.
His best friend's email is bestfriend@example.com and his phone number is 555-555-5556.
They usually meet every 15th of the month at 7:00 pm.
My current IP address is 192.168.1.1 but I'm planning to switch to a new one, it's 10.0.0.1. 
I just made a purchase using my credit card number 4111 1111 1111 1111 with expiration date of 12/24 and security code 123.
I can't wait to share my new book #datascience on twitter and Instagram.
it's ISBN number is 123-4-56789-0123-5 and it will be available on Amazon and Barnes & Noble.
My friend @johndoe lives at 123 Main St, Anytown USA 12345 and he told me that his neighbor lives at 124 Main St, Anytown USA 12345.
I'm planning to visit my family in Los Angeles, CA on 12/24/2022 and stay there for 2 days. I'll be staying at my cousin's house located at 1234 Elm St, Los Angeles CA 90210.
I'm also planning to visit my friend in New York, NY on 24-12-2022 and stay there for 2 days, I'll be staying at my friend's apartment located at 123 Broadway, New York NY 10001.
I'm also planning to visit my sister in London, UK on 2022-12-24 and stay there for 2 days, I'll be staying at my friend's flat located at 123 Oxford St, London UK W1D 2LJ.
I need to remember to bring my passport with expiration date of 12/24/2022, my driver's license with number 12345678 and my insurance card with number 123456789.

'''

## Starter Exercises

1. Check if the string starts with whitespace
2. Check if the text contains phone numbers
3. Check if the text contains the word "cat", ignoring case
4. Extract the first phone number in the format 555-555-5555 from the text.
5. Extract the first email address from the text.
6. Extract all the prices in the format $999.99 from the text.
7. Extract and create a list of all the URLs from the text.
8. Replace all the phone numbers in the format ddd-ddd-dddd with "ddd-XXX-XXXX"
9. Split the text into a list of sentences.

In [21]:
# \s+ matches one or more whitespace characters; match() only looks at the start of the string
pattern = re.compile(r'\s+')
pattern.match(string)

In [12]:
pattern = re.compile(r'\d{3}-\d{3}-\d{4}')
if pattern.search(string):
    print("Phone number exists")

Phone number exists
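
Some of the remaining starter exercises (3, 4, 5, 6, 7 and 9) could be approached along the following lines. This is only a sketch: `sample` is a hypothetical shortened stand-in for the full text so the block runs on its own, and the email and URL patterns are deliberately simple rather than complete.

```python
import re

# Hypothetical shortened text, standing in for the full string above.
sample = (
    "The Cat sat. His email is moew@gmail.com and his phone is 555-555-5555. "
    "He buys items that cost $9.99 and $19.99. "
    "See https://www.example.com or http://store.com."
)

# 3. Case-insensitive check for the word "cat".
has_cat = re.compile(r'\bcat\b', re.IGNORECASE).search(sample) is not None

# 4. First phone number in the format 555-555-5555.
phone_match = re.compile(r'\d{3}-\d{3}-\d{4}').search(sample)
first_phone = phone_match.group() if phone_match else None

# 5. First email address (simple pattern: local part, '@', dotted domain).
email_match = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+').search(sample)
first_email = email_match.group() if email_match else None

# 6. All prices such as $9.99.
prices = re.compile(r'\$\d+\.\d{2}').findall(sample)

# 7. All URLs with an explicit scheme; the final \w keeps a trailing '.' out of the match.
urls = re.compile(r'https?://[\w./-]+\w').findall(sample)

# 9. Split into sentences at whitespace that follows '.', '!' or '?'.
sentences = re.compile(r'(?<=[.!?])\s+').split(sample)

print(has_cat, first_phone, first_email)
print(prices)
print(urls)
print(len(sentences))
```

Note that the URL pattern in this sketch only finds addresses that start with http:// or https://, so a bare www.store2.com would need an extended pattern.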


### More Advanced Exercises

1. Extract all the dates in the format DD/MM/YYYY from the text, keeping only valid dates
2. Extract all the email addresses from the text, keeping only valid addresses
3. Extract all the phone numbers from the text in different formats (xxx-xxx-xxxx, xxx.xxx.xxxx, xxxxxxxxxx, xxx xxx xxxx)
4. Extract all the IP addresses from the text, keeping only valid IP addresses
5. Extract all the URLs from the text and check if they are valid URLs
6. Extract all the credit card numbers from the text, keeping only valid credit card numbers
7. Extract all the hashtags from the text (e.g. #example)
8. Extract all the mentions from the text (e.g. @example)
9. Extract all the dates and times in multiple formats (MM/DD/YYYY, DD-MM-YYYY, YYYY-MM-DD)
10. Extract all the ISBN numbers from the text and check if they are valid ISBN numbers
11. Extract all the street addresses from the text
12. Extract all the zip codes from the text
13. Extract all the words that contain both vowels and consonants from the text
14. Extract all the words that contain at least 3 vowels from the text
15. Extract all the words that are palindromes from the text
16. Replace all the street addresses in the text with the string "ADDRESS REDACTED" while keeping the zip codes
17. Replace all the credit card numbers in the text with the string "CARD NUMBER REDACTED" while keeping the expiration dates
18. Split the text into a list of sentences, where each sentence starts with a capital letter and ends with a full stop, exclamation mark or question mark.
19. Split the text into a list of paragraphs, where each paragraph starts with a newline and two spaces.
20. Split the text into a list of words, where each word starts with a capital letter.
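
Several of these exercises ask for validity checks that a regular expression alone handles poorly. One possible approach, sketched here for exercise 1, is to let the regex find candidates and let the standard library validate them. The `sample` text and the helper name `valid_dates` are assumptions made for this example.

```python
import re
from datetime import datetime

# Hypothetical sample: two valid dates, one impossible date, one MM/DD/YYYY date.
sample = "Born on 12/05/2018, graduated on 20/01/2023, typo 31/02/2020, US style 12/24/2022."

# The regex only finds strings shaped like DD/MM/YYYY.
candidate = re.compile(r'\b\d{2}/\d{2}/\d{4}\b')

def valid_dates(text):
    """Return the DD/MM/YYYY substrings that are real calendar dates."""
    found = []
    for m in candidate.finditer(text):
        try:
            # strptime raises ValueError for impossible dates such as 31/02 or month 24.
            datetime.strptime(m.group(), '%d/%m/%Y')
        except ValueError:
            continue
        found.append(m.group())
    return found

print(valid_dates(sample))  # ['12/05/2018', '20/01/2023']
```

The same two-step idea (match the shape with a pattern, then validate with ordinary code) can be carried over to IP addresses, credit card numbers and ISBNs.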