# Regular expressions

Info here: https://www.w3schools.com/python/python_regex.asp

Useful website for testing RegEx: https://regex101.com/


Regular expressions are available in Python through the built-in **re** module

Short for regular expression, a regex is a string of text that allows you to create patterns that help match, locate, and manage text.



In [18]:
# let's assume that we have some string

string = 'just some string and i am doing nothing. One more \'just\''
string2 = 'just'

In [19]:
# if we use just "in", it returns either False or True:

'just' in string

True

But with RegEx a search returns an object instead:

In [20]:
import re

In [21]:
re_srch = re.search('just', string)
re_srch

# 'just' here is called the pattern

<re.Match object; span=(0, 4), match='just'>

As we can see, it returns a Match object and shows us the span (where exactly the match is located)

Let's see what we can do with this object

In [22]:
re_srch.span()

(0, 4)

In [23]:
re_srch.start()

0

In [24]:
re_srch.end()

4
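The match object also exposes the matched text itself through `.group()`, and the span can be used to slice the original string. A minimal sketch with the same sample string (the names `text` and `m` are only illustrative and not part of the cells above):

```python
import re

text = "just some string and i am doing nothing. One more 'just'"
m = re.search('just', text)

# group() returns the text that matched the pattern
matched = m.group()

# slicing the original string with the span gives the same text
start, end = m.span()
sliced = text[start:end]

print(matched, sliced)
```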

We can also compile the pattern into a pattern object with **re.compile** and reuse it


In [25]:
pattern = re.compile('just')

And now we can simply use it to search in the string:

In [26]:
pattern.search(string)

<re.Match object; span=(0, 4), match='just'>

We can find all occurrences of the pattern with the **.findall** method:

In [27]:
pattern.findall(string)

['just', 'just']
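`.findall` returns only the matched strings. When the positions matter too, `.finditer` can be used instead, since it yields a match object per occurrence. A small sketch, assuming the same sample string (the names `text` and `spans` are illustrative):

```python
import re

text = "just some string and i am doing nothing. One more 'just'"
pattern = re.compile('just')

# finditer yields one match object per occurrence, from left to right
spans = [m.span() for m in pattern.finditer(text)]
print(spans)
```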

The **.fullmatch** method checks whether the whole string matches the pattern

In [30]:
a = pattern.fullmatch(string)
print(a)

None


In [29]:
b = pattern.fullmatch(string2)
b

<re.Match object; span=(0, 4), match='just'>
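To make the difference between the three lookup methods concrete, here is a short comparison on a made-up string (`'not just this'` is an assumption for illustration only):

```python
import re

pattern = re.compile('just')
text = 'not just this'

found_anywhere = pattern.search(text)     # scans the whole string
found_at_start = pattern.match(text)      # only tries the start of the string
found_whole = pattern.fullmatch(text)     # the whole string must match

print(found_anywhere is not None, found_at_start is not None, found_whole is not None)
```

Only `search` finds something here, because 'just' is neither at the start of the string nor the whole string.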

### Where does it become handy?

RegEx are used a lot, for example in email validation. We can check whether the input is a valid email address or just some gibberish

Let's do an exercise and code our own email validator:


### Email validation check exercise

Just google something like *email validation regex python* and find a ready-made RE

I found this one: https://emailregex.com/

In [33]:
# so we have some user input and we need to check whether it is a valid email address or not:

while True:
    email = input("enter email address")
    # r before the string means raw string: backslashes are kept as they are and Python does not treat them as escape sequences
    pattern = re.compile(r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)")
    if pattern.match(email):
        print(f'Ok. Email {email} is valid')
        break
    else:
        print("email invalid try again")


enter email addressdsflle
email invalid try again
enter email addressafdd@ffg
email invalid try again
enter email addressAdefd-df0@ggj.cc
Ok. Email Adefd-df0@ggj.cc is valid
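The same check can also be wrapped in a function so that it can be tried without typing input. This is only a sketch: `is_valid_email` and `EMAIL_PATTERN` are hypothetical names, and the pattern is the one from the cell above, compiled once instead of on every loop pass.

```python
import re

# compiled once, outside any loop; same pattern as in the cell above
EMAIL_PATTERN = re.compile(r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)")

def is_valid_email(email):
    """Return True if the email matches EMAIL_PATTERN (hypothetical helper)."""
    return EMAIL_PATTERN.match(email) is not None

# the three inputs tried in the session above
for candidate in ['dsflle', 'afdd@ffg', 'Adefd-df0@ggj.cc']:
    print(candidate, is_valid_email(candidate))
```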


**r** before the string means raw string: backslashes inside the quotes are kept literally, and Python does not interpret them as escape sequences
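A quick way to see what the raw prefix changes is to compare string lengths. A minimal sketch (the variable names are illustrative):

```python
import re

normal = "\n"    # one character: a newline
raw = r"\n"      # two characters: a backslash and the letter n
print(len(normal), len(raw))

# with a raw string, \. reaches the regex engine unchanged and means a literal dot
dot_span = re.search(r"\.", "ggj.cc").span()
print(dot_span)
```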

So let's see exactly how this RE works:

r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"

() brackets mean a capturing group. There may be several groups, not just one

^ caret sign means that what we are looking for *starts with*

[a-zA-Z0-9_.+-] - square brackets [] define a character set: one character that is any letter, lower or uppercase, any digit, or one of the signs _.+-

\+ is a quantifier: the preceding element must occur one or more times

@ means literally the at sign

\\. means literally the **.** sign, as a simple dot (.) in RE means any single character (except a newline)

$ means end of the string (or end of a line in multiline mode)
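To see the pieces described above at work, the pattern can be split into two capturing groups, one for the part before the @ sign and one for the domain. This is a variation written for illustration, not the pattern from emailregex.com itself; `email_pattern`, `local_part` and `domain` are assumed names.

```python
import re

# same character sets as above, but local part and domain are captured separately
email_pattern = re.compile(r"^([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)$")

m = email_pattern.match("Adefd-df0@ggj.cc")
local_part, domain = m.groups()
print(local_part, domain)

# there is no dot after the @ sign, so the \. part cannot match
no_dot = email_pattern.match("afdd@ffg")
print(no_dot)
```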

### Password validation check

Assume that a password must be at least 8 characters long and consist only of letters, numbers, and the signs $%&

Use https://regex101.com/ to test the RE for any such task

In [83]:
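# RegEx will be like this: r'([a-zA-Z0-9$%&]+$)'
# note: + only requires one or more characters, so this checks the allowed characters, not the 8-character minimum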
pattern = re.compile(r'([a-zA-Z0-9$%&]+$)')
password1 = 'sdfeWd3'
password2 = 'bfhr@dKK'
password3 = 'bfhI&dl%'
password4 = 'asdg dedvI%'
password5 = 'sdvlld asdfRvkd5'

In [84]:
pattern.match(password1)

<re.Match object; span=(0, 7), match='sdfeWd3'>

In [85]:
print(pattern.match(password2))

None


In [86]:
pattern.match(password3)

<re.Match object; span=(0, 8), match='bfhI&dl%'>

In [87]:
print(pattern.match(password4))

None
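The pattern above accepts the 7-character `password1` because `+` only asks for one or more characters. One possible way to enforce the stated minimum of 8 characters is a `{8,}` quantifier together with `fullmatch`. This is a sketch, and `strict_pattern`, `candidates` and `results` are assumed names:

```python
import re

# {8,} requires at least 8 characters from the allowed set
strict_pattern = re.compile(r'[a-zA-Z0-9$%&]{8,}')

candidates = ['sdfeWd3', 'bfhr@dKK', 'bfhI&dl%', 'asdg dedvI%']

# fullmatch makes sure the whole password consists of allowed characters
results = {pw: strict_pattern.fullmatch(pw) is not None for pw in candidates}
print(results)
```

With this variant only 'bfhI&dl%' passes: 'sdfeWd3' is too short, and the other two contain a character outside the allowed set.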


### Password validation where the password must end with a number

In [88]:
# RegEx will be like this: r'([a-zA-Z0-9$%&]{7,}[0-9]+$)'
pattern = re.compile(r'([a-zA-Z0-9$%&]{7,}[0-9]+$)')
# Compared to the example above we just add:
# {7,} - the first part is 7 or more allowed characters
# [0-9]+ - followed by one or more digits, so 8 or more characters in total
# $ - end of the string, then ) closes the group

In [89]:
print(pattern.match(password1))

None


In [90]:
print(pattern.match(password2))

None


In [91]:
print(pattern.match(password3))

None


In [92]:
print(pattern.match(password4))

None


In [94]:
print(pattern.search(password5))

<re.Match object; span=(7, 16), match='asdfRvkd5'>


If we want the whole string to match, not just a part of it, we need to add ^ to the RE

In [95]:
pattern = re.compile(r'(^[a-zA-Z0-9$%&]{7,}[0-9]+$)')

In [96]:
print(pattern.search(password5))

None


Or use the match or fullmatch methods, which anchor the pattern at the start of the string (fullmatch also at the end)
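As a sketch of the `fullmatch` alternative: the pattern below has no ^ or $ at all, and `fullmatch` supplies both anchors. The names `unanchored`, `partial`, `whole` and `single_word` are assumptions for illustration.

```python
import re

# no ^ or $ in the pattern itself
unanchored = re.compile(r'[a-zA-Z0-9$%&]{7,}[0-9]+')
password5 = 'sdvlld asdfRvkd5'

partial = unanchored.search(password5)           # finds the trailing 'asdfRvkd5'
whole = unanchored.fullmatch(password5)          # fails: the space is not an allowed character
single_word = unanchored.fullmatch('asdfRvkd5')  # the whole string fits the pattern

print(partial, whole, single_word)
```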