## Theory part

In [1]:
import re

In [2]:
pattern = re.compile('Apple')
text = 'I have an Apple'
re.findall(pattern, text)

['Apple']

In [3]:
list(re.finditer(pattern, text))

[<re.Match object; span=(10, 15), match='Apple'>]
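
Each `Match` object returned by `re.finditer` records where the match sits in the string as well as the matched text. A minimal sketch of reading those fields, reusing the pattern and text from the cells above (the name `spans` is introduced here only for illustration):

```python
import re

pattern = re.compile('Apple')
text = 'I have an Apple'

# Each Match exposes the matched text and its start/end offsets
for m in re.finditer(pattern, text):
    print(m.group(), m.start(), m.end())   # prints: Apple 10 15

# span() returns the (start, end) pair in one call
spans = [m.span() for m in pattern.finditer(text)]
print(spans)
```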

Matching "apple" with a lowercase first letter

In [4]:
text = 'I have an apple'

pattern = re.compile('[Aa]pple')

re.findall(pattern, text)

['apple']
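
A character class such as `[Aa]` covers a single letter. If case should be ignored across the whole pattern, the `re.IGNORECASE` flag is one alternative. A small sketch, using a sample sentence made up for this illustration:

```python
import re

# re.IGNORECASE makes every letter in the pattern match both cases
pattern = re.compile('apple', re.IGNORECASE)
text = 'I have an Apple and an apple'   # sample text, not taken from the cells above

matches = pattern.findall(text)
print(matches)   # ['Apple', 'apple']
```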

Examples with backslash metacharacters such as \d

In [5]:
text = 'I have 5 apples and 16 oranges'
pattern = re.compile(r'\d+')  # raw string, so \d reaches the regex engine unchanged
re.findall(pattern, text)

['5', '16']

In [6]:
sum(map(int, re.findall(pattern, text)))

21

Another example with a backslash: matching a literal \section

In [7]:
text = '\section{Introduction}'

pattern = re.compile('\\\\section')

re.findall(pattern, text)

print(re.findall(pattern, text)[0])

len('\section')

\section


8

In [8]:
len('\\section')

8

In [9]:
pattern = re.compile(r'\\section')

print(re.findall(pattern, text)[0])

\section
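
The two spellings of the pattern above denote the same regular expression: a raw string literal simply avoids doubling every backslash. A short sketch comparing them, and showing `re.escape` as one way to build the escaped form from a literal string (the variable names are illustrative):

```python
import re

text = r'\section{Introduction}'

# The same regex written without and with a raw string literal
non_raw = '\\\\section'
raw = r'\\section'
print(non_raw == raw)    # True: both are the 9-character string \\section

# re.escape produces the escaped form of a literal string
escaped = re.escape(r'\section')
print(escaped == raw)    # True
print(re.findall(escaped, text))
```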


Example: extracting e-mail addresses from a large text

In [10]:
pattern = re.compile(r'[a-z0-9-]+@[a-z]+\.[a-z]+')

with open('coursera_contact.txt', 'r') as f:
    text = f.read()

re.findall(pattern, text)

['press@coursera.org']

<br>
<br>

***
## Practice part

Implement a pattern that matches any word containing a lowercase letter 'a'.

**Example.** A pattern should match all words containing "a" in the string "this is an apple!"

A pattern should have no matches in strings like:

- "crow"

- "Adult" (because there is an uppercase "A")

In [11]:
p = re.compile(r'\b[a-z]*a[a-z]*\b')
text =  'this is an apple!'
print(re.findall(p, text))
text = 'hello man an can anon 123 a!'
print(re.findall(p, text))

['an', 'apple']
['man', 'an', 'can', 'anon', 'a']
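
The task statement also lists strings that should produce no matches. A quick check of those negative cases, as a sketch using the same pattern; the word "Banana" is an extra sample added here to show how the `[a-z]` classes treat capitalized words:

```python
import re

p = re.compile(r'\b[a-z]*a[a-z]*\b')

# Strings from the task statement that should produce no matches
for s in ['crow', 'Adult']:
    print(s, re.findall(p, s))

# A capitalized word is skipped as a whole, even though it contains a lowercase 'a',
# because [a-z] cannot consume the leading uppercase letter
print(re.findall(p, 'Banana'))
```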


***

Implement a pattern that matches integers.

**Example.** A pattern should match all integers in a string, positive or negative: "There are 15 apples and -2 oranges!"

In [12]:
# An optional leading '-' followed by digits; no capturing groups, so findall returns plain strings
p = re.compile(r'-?[0-9]+')
text = "There are 15 apples and -2 oranges!"
print(re.findall(p, text))

['15', '-2']
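
Text copied from formatted documents sometimes carries the typographic minus sign (U+2212) instead of the ASCII hyphen, and a `-` in a pattern does not match it. A minimal sketch of normalizing the input first; the pattern `-?[0-9]+` and the sample string are assumptions made for this illustration:

```python
import re

int_pattern = re.compile(r'-?[0-9]+')

# Sample input using the typographic minus sign U+2212 (an assumption about the input)
text = 'There are 15 apples and \u22122 oranges!'

# Without normalization the sign is lost
print(int_pattern.findall(text))          # ['15', '2']

# Replace U+2212 with the ASCII hyphen before matching
normalized = text.replace('\u2212', '-')
numbers = [int(s) for s in int_pattern.findall(normalized)]
print(numbers)                            # [15, -2]
```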


***

Implement a pattern that matches positive real numbers.

**Example.** A pattern should match all positive numbers in a string, including decimals: "There are 1.5 apples and 2 oranges!"

In [13]:
p = re.compile(r'(?:\d+(?:\.\d*)?|\.\d+)')  # digits with an optional fraction, or a bare fraction like .5
text = "There are 1.5 apples and 2 oranges!"
print(re.findall(p, text))

['1.5', '2']
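
The matched strings can be passed straight to `float`. A small sketch that also exercises the second alternative of the pattern (a number written without a leading digit); the sample text is made up for this illustration:

```python
import re

real_pattern = re.compile(r'\d+(?:\.\d*)?|\.\d+')

text = 'Values: 1.5, .25 and 10'   # sample text, not from the task statement
values = [float(s) for s in real_pattern.findall(text)]
print(values)   # [1.5, 0.25, 10.0]
```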


***

Implement a pattern that matches a time in 24-hour format: HH:MM.

**Example.**

A pattern should match valid times only: in "At 17:00, or at 14:00, or at 25:61" it should match 17:00 and 14:00, but not 25:61.

There could be an optional leading 0 for early hours: "At 4:30 or 04:30".

In [14]:
# Hours 0-23 with an optional leading zero, minutes 00-59; \b keeps a match from starting or ending mid-number
p = re.compile(r'\b(?:[01]?[0-9]|2[0-3]):[0-5][0-9]\b')
t1 = "At 17:00, or at 14:00, or at 25:61"
t2 = "At 4:30 or 04:30"
print(re.findall(p, t1))
print(re.findall(p, t2))

['17:00', '14:00']
['4:30', '04:30']
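
When a single string has to be validated rather than searched, `fullmatch` requires the whole string to fit the pattern. A sketch of such a check, restricting hours to 0-23; the helper name `is_valid_time` and the pattern are assumptions introduced for this illustration:

```python
import re

# Hours 0-23 (leading zero optional), minutes 00-59
time_pattern = re.compile(r'(?:[01]?[0-9]|2[0-3]):[0-5][0-9]')

def is_valid_time(s):
    """Return True if the whole string is a time in H:MM or HH:MM 24-hour form."""
    return time_pattern.fullmatch(s) is not None

for s in ['17:00', '4:30', '04:30', '23:59', '24:00', '25:61']:
    print(s, is_valid_time(s))
```

The first four strings are accepted; "24:00" and "25:61" are rejected.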


***

Implement a pattern that matches a date in the format YYYY-MM-DD.

**Example.** 
A pattern should match all appropriate dates: "It was a long time ago. In 1888-01-01 or in 2001-01-01 or in 1999-13-40."

In [15]:
p = re.compile('([1-9][0-9]{3})-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])')
text = '1881-02-21'
print(re.findall(p, text))

[('1881', '02', '21')]
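
The pattern checks the range of each field, but not whether the day exists in the given month (for example 2001-02-31). One way to add that check is to hand the captured groups to `datetime.date`. A sketch, run on the sentence from the task statement; the helper name `real_dates` is hypothetical:

```python
import re
from datetime import date

date_pattern = re.compile(r'([1-9][0-9]{3})-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])')

def real_dates(text):
    """Return date objects for matches that are also valid calendar dates."""
    found = []
    for year, month, day in date_pattern.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            # The regex matched, but the calendar has no such day
            pass
    return found

text = "It was a long time ago. In 1888-01-01 or in 2001-01-01 or in 1999-13-40."
print(real_dates(text))
print(real_dates("2001-02-31"))   # matches the regex, rejected by date()
```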


***

Implement a pattern that matches a valid username. Here a username is a string of 3 to 16 alphanumeric characters, which may also include the symbols '-' and '_'.

**Example.** A pattern should match the username megaduck2010 in the string "Hi! I'm megaduck2010!".

In [16]:
p = re.compile('[a-zA-Z0-9_-]{3,16}')
text = "Hi! I'm megaduck2010!"
print(re.findall(p, text))

['megaduck2010']
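
Searching with `findall` does not enforce the upper length limit on a whole word: a longer run of allowed characters is simply split into several matches. To validate one candidate string, `fullmatch` is a possible alternative. A sketch, with the helper name `is_valid_username` introduced only for this illustration:

```python
import re

username_pattern = re.compile(r'[a-zA-Z0-9_-]{3,16}')

def is_valid_username(s):
    """Return True if the whole string is 3 to 16 allowed characters."""
    return username_pattern.fullmatch(s) is not None

print(is_valid_username('megaduck2010'))   # True
print(is_valid_username('ab'))             # False, too short
print(is_valid_username('a' * 17))         # False, too long

# findall splits a 20-character run into a 16-character and a 4-character match
print(username_pattern.findall('a' * 20))
```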


***

Implement a pattern that matches an e-mail address. In this problem an e-mail address is expected to have several properties:

- It contains an "@" symbol.
- The local part (before the "@" symbol) is alphanumeric, but can also contain the symbols "_", "-" and ".".
- It has a domain part (after the "@" symbol). The domain part may consist of several literal dot-separated parts.

**Example.** A pattern should match all e-mail addresses in a string "Hi! I am writing to you from example@example.com to hello@bye.com.org".

In [17]:
# '.' is allowed in the local part; the non-capturing group (?:...) lets findall return whole addresses
p = re.compile(r'[a-zA-Z0-9_.-]+@[a-z]+(?:\.[a-z]+)+')
text = "Hi! I am writing to you from example@example.com to hello@bye.com.org"
print(re.findall(p, text))

['example@example.com', 'hello@bye.com.org']
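
When a pattern contains capturing groups, `re.findall` returns tuples of the groups instead of the full match, and a repeated group keeps only its last repetition. A sketch comparing a capturing and a non-capturing form of a similar e-mail pattern (both patterns are written for this illustration):

```python
import re

text = "Hi! I am writing to you from example@example.com to hello@bye.com.org"

# Capturing groups: findall returns one tuple per match
capturing = re.compile(r'([a-zA-Z0-9_.-]+@[a-z]+)(\.[a-z]+)+')
print(capturing.findall(text))
# [('example@example', '.com'), ('hello@bye', '.org')]

# Non-capturing group (?:...): findall returns the whole matched address
non_capturing = re.compile(r'[a-zA-Z0-9_.-]+@[a-z]+(?:\.[a-z]+)+')
print(non_capturing.findall(text))
# ['example@example.com', 'hello@bye.com.org']
```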


***

Implement a pattern that matches an IP address (in IPv4 form). An IP address has several properties:
 - There are 4 dot-separated numerical parts.
 - Each numerical part can be any number from 0 to 255.

**Example.** A pattern should match all valid IP addresses: "These are 0.0.0.1, 169.255.0.0 and 256.256.0.0 "

In [18]:
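# \b instead of $ so that addresses are found anywhere in the text; (?:...) groups keep findall output flat
p = re.compile(r'\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b')
text = "These are 0.0.0.1, 169.255.0.0 and 256.256.0.0 "
print(re.findall(p, text))

['0.0.0.1', '169.255.0.0']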
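
A `$` anchor ties a match to the end of the string, so for finding addresses inside running text word boundaries (`\b`) are one alternative. The sketch below builds the pattern from a reusable octet fragment and cross-checks the results with the standard `ipaddress` module; the names `octet` and `ip_pattern` are introduced only for this illustration:

```python
import re
import ipaddress

# One numerical part in the range 0-255
octet = r'(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'

# Four octets separated by dots, delimited by word boundaries
ip_pattern = re.compile(r'\b' + octet + r'(?:\.' + octet + r'){3}\b')

text = "These are 0.0.0.1, 169.255.0.0 and 256.256.0.0 "
found = ip_pattern.findall(text)
print(found)   # ['0.0.0.1', '169.255.0.0']

# Cross-check: IPv4Address raises ValueError for an invalid address
checked = [str(ipaddress.IPv4Address(s)) for s in found]
print(checked == found)
```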


***

Implement a pattern that matches HTTP links. A link has several properties:

- It starts with "http://" or "https://"

- It has a domain part. Domain part can contain alphanumeric characters and a hyphen '-'. There are at least two domain parts in a proper link (www.google.com, hello123.com.org, 2000.com).

- It may have a relative-path part after the domain part. A relative path is forward-slash "/"-separated alphanumeric (also may include dots '.', hyphens '-' and underscore '_') sequence of chars. (/page.html, /some/path, /another/path/)

**Example.** A pattern should match all valid HTTP-links: "Welcome to https://wel.com/e, stranger!".

In [19]:
# Scheme, then two or more dot-separated domain parts, then an optional path of '/'-separated segments
p = re.compile(r'https?://[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+(?:/[a-zA-Z0-9._-]*)*')
text = "Welcome to https://wel.com/e, stranger    http://hello123.com.org/e!  http://2000.com.adv.org/e/23/dd"
print(re.findall(p, text))

['https://wel.com/e', 'http://hello123.com.org/e', 'http://2000.com.adv.org/e/23/dd']
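
Once a link has been found, the standard `urllib.parse.urlsplit` function can take it apart, which avoids encoding the scheme/domain/path split in capturing groups. A sketch on the sentence from the task statement; the pattern here is one possible reading of the listed properties, not the only valid one:

```python
import re
from urllib.parse import urlsplit

# http or https, at least two domain parts, optional '/'-separated path
link_pattern = re.compile(
    r'https?://[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+(?:/[a-zA-Z0-9._-]*)*'
)

text = "Welcome to https://wel.com/e, stranger!"
links = link_pattern.findall(text)
print(links)   # ['https://wel.com/e']

# Split each link into its components
parts = [urlsplit(u) for u in links]
for part in parts:
    print(part.scheme, part.netloc, part.path)   # https wel.com /e
```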
