# RegEx:
- RegEx (Regular Expression) is a sequence of characters that forms a search pattern
- RegEx can be used to check if a string contains the specified search pattern

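## How to install?
- The `re` module is part of Python's standard library, so nothing needs to be installed; `import re` is enough
- `!pip install regex` (run in a Jupyter notebook) installs the separate third-party `regex` package, which is not required for anything in this notebook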

In [1]:
import re

## RegEx Functions

The `re` module offers a set of functions that allow us to search a string for a match:

- findall - Returns a list containing all matches
- search - Returns a Match object if there is a match anywhere in the string
- split	- Returns a list where the string has been split at each match
- sub	- Replaces one or many matches with a string

### Metacharacters
|Character         | Description          | Example  |
| ------------- |:-------------:| ------:|
|`[]`| A set of characters | `"[a-m]"` |
|`\`| Signals a special sequence (can also be used to escape special characters) | `r"\d"` |
|`.`| Any character (except the newline character) | `"he..o"` |
|`^`| Starts with | `"^hello"` |
|`$`| Ends with | `"world$"` |
|`*`| Zero or more occurrences | `"aix*"` |
|`+`| One or more occurrences | `"aix+"` |
|`{}`| Exactly the specified number of occurrences | `"al{2}"` |
|`\|`| Either or | `"falls\|stays"` |
|`()`| Capture and group | `r"(\w+)=(\d+)"` |
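
To make the table concrete, here is a small sketch that runs most of the example patterns through `re.findall`. The sample strings are made up for illustration and are not part of the original notebook.

```python
import re

# [] : any single character in the range a-m
in_set = re.findall(r"[a-m]", "hello")
print(in_set)      # ['h', 'e', 'l', 'l']

# .  : any character except a newline
any_char = re.findall(r"he..o", "hello helio heo")
print(any_char)    # ['hello', 'helio']

# ^ and $ : anchor the pattern to the start / end of the string
starts = bool(re.search(r"^hello", "hello world"))
ends = bool(re.search(r"world$", "hello world"))
print(starts, ends)  # True True

# * : zero or more, + : one or more, {2} : exactly two
star = re.findall(r"aix*", "ai aix aixx")
plus = re.findall(r"aix+", "ai aix aixx")
exact = re.findall(r"al{2}", "al all alll")
print(star)        # ['ai', 'aix', 'aixx']
print(plus)        # ['aix', 'aixx']
print(exact)       # ['all', 'all']

# | : either or
either = re.findall(r"falls|stays", "The rain falls, the sun stays")
print(either)      # ['falls', 'stays']

# () : capture groups; findall returns one tuple of groups per match
groups = re.findall(r"(\w+)=(\d+)", "width=10 height=20")
print(groups)      # [('width', '10'), ('height', '20')]
```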

### Special Sequences
|Character       | Description          | Example  |
| ------------- |:-------------:| ------:|
|`\A`| Matches only at the beginning of the string | `r"\AThe"` |
|`\b`| Matches at the beginning or at the end of a word | `r"\bain"`, `r"ain\b"` |
|`\B`| Matches anywhere that is NOT the beginning or the end of a word | `r"\Bain"`, `r"ain\B"` |
|`\d`| Matches any digit (0-9) | `r"\d"` |
|`\D`| Matches any character that is NOT a digit | `r"\D"` |
|`\s`| Matches any white-space character | `r"\s"` |
|`\S`| Matches any character that is NOT white space | `r"\S"` |
|`\w`| Matches any word character (letters a-z and A-Z, digits 0-9, and the underscore `_`) | `r"\w"` |
|`\W`| Matches any character that is NOT a word character | `r"\W"` |
|`\Z`| Matches only at the end of the string | `r"India\Z"` |
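
The difference between `\b` and `\B` is easiest to see side by side. The following is a minimal sketch using made-up sample strings; raw strings (`r"..."`) are used so that Python does not interpret the backslashes itself.

```python
import re

words = "ain rain mainly"

# \b : "ain" at the start / end of a word
word_start = re.findall(r"\bain", words)
word_end = re.findall(r"ain\b", words)
print(word_start)   # ['ain']
print(word_end)     # ['ain', 'ain']

# \B : "ain" NOT at the start / end of a word
not_start = re.findall(r"\Bain", words)
not_end = re.findall(r"ain\B", words)
print(not_start)    # ['ain', 'ain']
print(not_end)      # ['ain']

# \d : digits, \w : word characters, \W : everything else
digits = re.findall(r"\d+", "Room 42, floor 7")
word_chars = re.findall(r"\w+", "snake_case 9 lives!")
non_word = re.findall(r"\W", "snake_case 9 lives!")
print(digits)       # ['42', '7']
print(word_chars)   # ['snake_case', '9', 'lives']
print(non_word)     # [' ', ' ', '!']

# \A and \Z : beginning and end of the whole string
at_start = bool(re.search(r"\AThe", "The food was good"))
at_end = bool(re.search(r"India\Z", "Made in India"))
print(at_start, at_end)  # True True
```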

### Sets
|Set        | Description          |
| ------------- |:-------------:|
|`[arn]`| Matches any one of the specified characters (a, r, or n) |
|`[a-n]`| Matches any lower case character, alphabetically between a and n |
|`[^arn]`| Matches any character EXCEPT a, r, and n |
|`[0123]`| Matches any one of the specified digits (0, 1, 2, or 3) |
|`[0-9]`| Matches any digit between 0 and 9 |
|`[0-5][0-9]`| Matches any two-digit number from 00 to 59 |
|`[a-zA-Z]`| Matches any character alphabetically between a and z, lower case OR upper case |
|`[+]`| In sets, `+`, `*`, `.`, `\|`, `()`, `$`, `{}` have no special meaning, so `[+]` matches any + character in the string |
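
A few of these sets in action, as a short sketch with illustrative sample strings:

```python
import re

# [arn] : any of a, r, n    [^arn] : anything except a, r, n
in_arn = re.findall(r"[arn]", "rain")
not_arn = re.findall(r"[^arn]", "rain")
print(in_arn)       # ['r', 'a', 'n']
print(not_arn)      # ['i']

# [0-5][0-9] : two-digit numbers from 00 to 59
minutes = re.findall(r"[0-5][0-9]", "time 08:59:73")
print(minutes)      # ['08', '59']

# [a-zA-Z] : letters of either case
letters = re.findall(r"[a-zA-Z]", "A1b2")
print(letters)      # ['A', 'b']

# [+] : inside a set, + is a literal plus sign
plus_signs = re.findall(r"[+]", "1+1=2")
print(plus_signs)   # ['+']
```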

### Supported Regular Expression Flags
|Short Name      | Long Name        | Effect |
| ------------- |:-------------:| ------:|
re.I	|re.IGNORECASE	|Makes matching of alphabetic characters case-insensitive
re.M	|re.MULTILINE	|Causes start-of-string and end-of-string anchors to match embedded newlines
re.S	|re.DOTALL	|Causes the dot metacharacter to match a newline
re.X	|re.VERBOSE	|Allows inclusion of whitespace and comments within a regular expression
----	|re.DEBUG	|Causes the regex parser to display debugging information to the console
re.A	|re.ASCII	|Specifies ASCII encoding for character classification
re.U	|re.UNICODE	|Specifies Unicode encoding for character classification
re.L    |re.LOCALE	|Specifies encoding for character classification based on the current locale
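
The four most commonly used flags can be tried out as follows. This is a minimal sketch with made-up sample strings; flags are passed through the `flags` keyword argument.

```python
import re

# re.I : case-insensitive matching
ignore_case = re.findall(r"the", "The cat and the hat", flags=re.I)
print(ignore_case)   # ['The', 'the']

# re.M : ^ and $ also match at the start / end of each line
lines = "one two\nthree four"
first_only = re.findall(r"^\w+", lines)
each_line = re.findall(r"^\w+", lines, flags=re.M)
print(first_only)    # ['one']
print(each_line)     # ['one', 'three']

# re.S : the dot also matches a newline
without_dotall = re.search(r"a.b", "a\nb")
with_dotall = re.search(r"a.b", "a\nb", flags=re.S)
print(without_dotall is None, with_dotall is not None)  # True True

# re.X : whitespace and comments inside the pattern are ignored
date_pattern = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", flags=re.X)
year_month = date_pattern.search("released on 2024-05").groups()
print(year_month)    # ('2024', '05')
```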

### search() function
Search the string to see if it starts with "The" and ends with "good":

In [2]:
text = "The food was good"
x = re.search("^The.*good$", text)

if x:
    print("Found a match")
else:
    print("No match")

Found a match


### findall() function
The findall() function returns a list containing all matches

In [3]:
text = "The food was good"
x = re.findall("oo", text)
print(x)

['oo', 'oo']


In [4]:
# Search for the first white-space character in the string:
text = "Indian food was good"
x = re.search(r"\s", text)

print("The first white-space character is located in position:", x.start())

The first white-space character is located in position: 6


In [5]:
# Program to extract numbers from a string

import re

string = 'hello 12 hi 89. Howdy 34'
pattern = r'\d+'

result = re.findall(pattern, string) 
print(result)

# Output: ['12', '89', '34']

['12', '89', '34']


In [6]:
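# Valid email ID
# Given an email ID, determine whether it is valid or not
import re

def checkmail(email):
    # Simplified check: the local part may contain word characters, '.', '+' or '-';
    # the domain is letters only; the top-level domain is 2 or 3 lowercase letters
    pattern = r"^[\w.+-]+@[A-Za-z]+\.[a-z]{2,3}$"
    return 'valid' if re.search(pattern, email) else 'invalid'

email = input()
print(checkmail(email))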

a#12@gmail.com
invalid


### split() function
Split at each white-space character

In [7]:
text = "Indian food was good"
x = re.split(r"\s", text)
print(x)

['Indian', 'food', 'was', 'good']


In [8]:
# You can limit the number of splits by specifying the maxsplit parameter
text = "Indian food was good"
x = re.split(r"\s", text, maxsplit=1)
print(x)

['Indian', 'food was good']


### sub() Function
The sub() function replaces the matches with the text of your choice

In [9]:
# Replace every white-space character with the character $
text = "Indian food was good"
x = re.sub(r"\s", "$", text)
print(x)

Indian$food$was$good


In [10]:
# You can control the number of replacements by specifying the count parameter
text = "Indian food was good"
x = re.sub(r"\s", "$", text, count=1)
print(x)

Indian$food was good


In [11]:
# Program to remove all whitespaces
import re

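# string with spaces and an embedded newline
# (the trailing backslash only continues the literal on the next source line)
string = 'abc 12\
de 23 \n f45 6'

# matches one or more whitespace characters
pattern = r'\s+'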

# empty string
replace = ''

new_string = re.sub(pattern, replace, string) 
print(new_string)

# Output: abc12de23f456

abc12de23f456
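
Beyond a fixed replacement string, `re.sub` also accepts backreferences to captured groups and a function as the replacement. The sketch below illustrates both, plus `re.subn`, which additionally reports how many replacements were made; the sample strings are made up for illustration.

```python
import re

# \1 and \2 refer back to the first and second captured group
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)    # world hello

# The replacement can be a function that receives each Match object
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 10 pears")
print(doubled)    # 6 apples and 20 pears

# subn returns the new string together with the number of replacements
new_text, n_replaced = re.subn(r"\s", "$", "Indian food was good")
print(new_text, n_replaced)   # Indian$food$was$good 3
```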


In [12]:
'''
# Password Validation
A strong password should:
1. be at least 8 characters long
2. have at least one lowercase letter
3. have at least one uppercase letter
4. have at least one digit (0-9)
5. have at least one special character (one of: @ % $ *)
'''

import re

pwd = 'DataScience123'

# Each of these patterns must be found somewhere in the password
required_patterns = [r"[a-z]", r"[A-Z]", r"[0-9]", r"[@%$*]"]

is_valid = (
    len(pwd) >= 8
    and all(re.search(p, pwd) for p in required_patterns)
    and not re.search(r"\s", pwd)  # white space is not allowed
)

print("Valid" if is_valid else "Invalid")

Invalid
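
As an alternative to checking each rule separately, the same rules can be folded into a single pattern with lookahead assertions `(?=...)`. This is only a sketch of that technique, not the approach used in the cell above; the function name `is_strong` is hypothetical.

```python
import re

# Each (?=...) lookahead asserts that the required character appears
# somewhere ahead without consuming anything; \S{8,} then requires
# at least 8 characters, none of them white space.
STRONG_PASSWORD = re.compile(
    r"(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[@%$*])\S{8,}"
)

def is_strong(pwd):
    # fullmatch requires the whole password to match the pattern
    return STRONG_PASSWORD.fullmatch(pwd) is not None

print(is_strong("DataScience123"))    # False (no special character)
print(is_strong("DataScience@123"))   # True
print(is_strong("Ab1@"))              # False (too short)
print(is_strong("Data Science@123"))  # False (contains white space)
```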


### Match Object
A Match Object is an object containing information about the search and the result

In [13]:
text = "Indian food was good"
x = re.search("In", text)
print(x)

<re.Match object; span=(0, 2), match='In'>
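
The Match object exposes the matched text and its position through a few methods and attributes. Here is a short sketch on the same sentence; the pattern with a capture group is added for illustration. Note that `re.search` returns `None` when nothing matches, so the result should be checked before calling these methods.

```python
import re

text = "Indian food was good"
x = re.search(r"(\w+) food", text)

print(x.group())    # Indian food  (the whole match)
print(x.group(1))   # Indian       (the first captured group)
print(x.span())     # (0, 11)      (start and end positions)
print(x.start(), x.end())  # 0 11
print(x.string)     # Indian food was good  (the string that was searched)

# No match: search returns None instead of a Match object
missing = re.search(r"pizza", text)
print(missing)      # None
```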


For more details, refer to the links below:
- [RegEx_1](https://www.w3schools.com/python/python_regex.asp)
- [RegEx_2](https://realpython.com/regex-python/)

In [14]:
# Extras
# Some common interview questions
# Flatten a nested list into a single list

input_list = [[1,2,3],[4,5],[6,7,8,9]]
flat = []
for sublist in input_list:
    for item in sublist:
        flat.append(item)
print(flat)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


In [15]:
# Given a string, find the n most frequent characters in it.
# Print those characters in alphabetically sorted order.
from collections import Counter
string= 'ddddaacccb'
n=3

a = Counter(string).most_common(n)
b = [i[0] for i in a]
b.sort()
print(b)

['a', 'c', 'd']
