# 1. Regular Expressions
----------------
- Regular expressions are `strings with a special syntax`
- They allow us to match patterns in other strings
- Some use cases of regular expressions are
    - Find all `web links` in a document
    - Parse `email` addresses, remove/replace unwanted characters
- Common regex patterns are 
![image.png](https://github.com/rritec/datahexa/blob/dev/images/ds%20re.png?raw=true)


- In the exercises below, the `'r'` prefix in front of a pattern tells Python that the expression is a `raw string`.
- In a raw string, backslash escape sequences are not processed.
- `For example:` `'\n'` is a single `newline` character, but `r'\n'` is two characters: a `backslash` and an `'n'`.
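
The difference between the two literals can be checked directly. This is a minimal sketch; the variable names `normal` and `raw` are chosen only for illustration.

```python
# Compare a normal string literal with a raw string literal.
normal = '\n'   # escape sequence is processed: one newline character
raw = r'\n'     # raw string: a backslash followed by the letter n

print(len(normal))    # 1
print(len(raw))       # 2
print(raw == '\\n')   # True: the raw string equals an escaped backslash plus 'n'
```

Raw strings matter for regex patterns such as `r"\d+"` or `r"\s+"`, because the backslash has to reach the `re` module unchanged.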


## `Use Split for Sentence Tokenization and Word Tokenization`

### Exercise 1: Sentence Tokenization
-------------------


In [4]:
import re

In [5]:
type(re)

module

In [6]:
my_string = """Let's write RegEx!  Won't that be fun?  I sure think so.  Can you find 4 sentences? Or perhaps, all 19 words?"""

`Big Question:` how do we identify a sentence?
- In our case, let us break sentences at `.`, `?` or `!`

In [7]:
# Write a pattern to match sentence endings: sentence_endings
sentence_endings = r"[.?!]"

In [8]:
re.split?

In [9]:
# Split my_string on sentence endings and print the result
re.split(sentence_endings, my_string)

["Let's write RegEx",
 "  Won't that be fun",
 '  I sure think so',
 '  Can you find 4 sentences',
 ' Or perhaps, all 19 words',
 '']
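
The result above keeps the leading spaces of each piece, and it ends with an empty string because `my_string` itself ends with a `?`. One possible way to tidy this up, shown here as a sketch rather than the only approach, is to strip each piece and drop the empty ones. The names `parts` and `sentences` are illustrative.

```python
import re

my_string = """Let's write RegEx!  Won't that be fun?  I sure think so.  Can you find 4 sentences? Or perhaps, all 19 words?"""

# Split on sentence-ending punctuation.
parts = re.split(r"[.?!]", my_string)

# Strip surrounding whitespace and discard pieces that are empty after stripping.
sentences = [p.strip() for p in parts if p.strip()]
print(sentences)
```

Note that the punctuation marks themselves are consumed by the split and do not appear in the output.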

In [10]:
# Split my_string on sentence endings at most twice (maxsplit=2); the rest of the string stays in one piece
re.split(sentence_endings, my_string, maxsplit=2)

["Let's write RegEx",
 "  Won't that be fun",
 '  I sure think so.  Can you find 4 sentences? Or perhaps, all 19 words?']

### Exercise 2: Word Tokenization on Whitespace
-----------------

In [10]:
# Split my_string on runs of whitespace and print the result
spaces = r"\s+"
print(re.split(spaces, my_string))


["Let's", 'write', 'RegEx!', "Won't", 'that', 'be', 'fun?', 'I', 'sure', 'think', 'so.', 'Can', 'you', 'find', '4', 'sentences?', 'Or', 'perhaps,', 'all', '19', 'words?']
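
Splitting on whitespace leaves punctuation attached to the tokens (`'RegEx!'`, `'fun?'`). As an alternative sketch, `re.findall` can collect the words directly. The pattern `[\w']+` is an assumption made for this example: it treats letters, digits, underscores and apostrophes as part of a word, so contractions such as `Won't` stay in one piece.

```python
import re

my_string = """Let's write RegEx!  Won't that be fun?  I sure think so.  Can you find 4 sentences? Or perhaps, all 19 words?"""

# Hypothetical word pattern: one or more word characters or apostrophes.
word_pattern = r"[\w']+"
words = re.findall(word_pattern, my_string)

print(words)
print(len(words))
```

Which characters count as part of a word depends on the text being processed, so the character class may need adjusting for other inputs.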


## `Use Findall to Extract Required Data from Text`
-----------------

### Exercise 3: Words Starting with a Capital Letter
---------------------------------

In [8]:
re.findall?

In [9]:
# Find all capitalized words in my_string and print the result
capitalized_words = r"[A-Z]\w+"
print(re.findall(capitalized_words, my_string))

['Let', 'RegEx', 'Won', 'Can', 'Or']
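
The word `I` is missing from the result above because `\w+` requires at least one more word character after the capital letter. A possible variation, sketched below, uses `\w*` (zero or more) and a `\b` word boundary so that the capital letter must start a word. The name `capitalized_any_length` is illustrative only.

```python
import re

my_string = """Let's write RegEx!  Won't that be fun?  I sure think so.  Can you find 4 sentences? Or perhaps, all 19 words?"""

# \b anchors the match at the start of a word; \w* also allows one-letter words.
capitalized_any_length = r"\b[A-Z]\w*"
print(re.findall(capitalized_any_length, my_string))
```

The apostrophe is not a word character, which is why `Let's` and `Won't` still appear as `Let` and `Won`.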


### Exercise 4: Print the adverbs in a sentence
------------------

In [21]:
re.findall?

In [11]:
text = "He was carefully disguised but captured quickly by police."
re.findall(r"\w+ly", text)

['carefully', 'quickly']
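
The pattern `\w+ly` matches any run of word characters ending in `ly`, even when the `ly` falls in the middle of a longer word. The sketch below, using a made-up sentence, shows how adding a trailing `\b` word boundary restricts matches to words that actually end in `ly`. This is still only a rough heuristic for adverbs.

```python
import re

# Hypothetical example sentence, not from the exercise above.
sample = "He was flying slowly."

# Without a boundary, the "fly" inside "flying" is matched.
print(re.findall(r"\w+ly", sample))

# With a trailing word boundary, only words that end in "ly" are matched.
print(re.findall(r"\w+ly\b", sample))
```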

### Exercise 5: Print the digits in a sentence
--------------

In [12]:
# Find all digits in my_string and print the result
digits = r"\d+"
print(re.findall(digits, my_string))

['4', '19']
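
`re.findall` always returns strings, so the matches above are `'4'` and `'19'`, not numbers. If numeric values are needed, one simple option is to convert each match with `int`, as in this sketch (the name `numbers` is illustrative).

```python
import re

my_string = """Let's write RegEx!  Won't that be fun?  I sure think so.  Can you find 4 sentences? Or perhaps, all 19 words?"""

# Each match of \d+ is a string of one or more digits; convert it to an integer.
numbers = [int(d) for d in re.findall(r"\d+", my_string)]
print(numbers)
print(sum(numbers))
```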


## `Match Vs Search Functions`
-------------------

### Exercise 6: Explore re.match vs re.search 
----------------

In [15]:
re.match?

In [16]:
# re.match: try to apply the pattern at the start of the string, returning a match object,
# or None if no match was found

In [17]:
re.search?

In [18]:
# re.search: scan through the string looking for a match to the pattern, returning
# a match object, or None if no match was found.

In [13]:
import re
re.match("b", "abcdef")  # No match: "b" is not at the start of the string, so None is returned

In [20]:
re.match("abcd","abcd")

<_sre.SRE_Match object; span=(0, 4), match='abcd'>

In [19]:
re.search("bcd", "abcdef")  # Match: "bcd" is found inside the string, starting at index 1

<_sre.SRE_Match object; span=(1, 4), match='bcd'>
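
Because both functions return `None` when nothing matches, code that uses the result usually checks for `None` first. The sketch below applies the same pattern with both functions and reads the matched text and position from the match object. The variable names `m` and `s` are illustrative.

```python
import re

# re.match only tries the pattern at position 0; re.search scans the whole string.
m = re.match("bcd", "abcdef")
s = re.search("bcd", "abcdef")

for name, result in (("match", m), ("search", s)):
    if result is None:
        print(name, "-> no match")
    else:
        # group() is the matched text; span() is the (start, end) position.
        print(name, "->", result.group(), result.span())
```

The exact text shown when a match object is displayed differs between Python versions, but `group()` and `span()` behave the same way.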

[For more details, refer to the Python `re` module documentation](https://docs.python.org/3/library/re.html)

