# Regex

The example text used in the regex tasks below was taken from https://loremipsum.io/.

In [1]:
example_text = "Lorem ipsum dolorolo at sit amet, consectetur ipsumm adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco (laboris) nisi ut (aliquip) ex ea commodo consequat. [192.168.1.1:8080] Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. [192.168.1.254:80]"

### Import the necessary libraries.

In [2]:
import re

### Find how many times the word `ipsum` appears in the text.

In [3]:
word_count = len(re.findall(r'\bipsum\b', example_text))
print(word_count)


1


### Count how many times the string `ipsum` appears in the text (as a substring, not only as a whole word).

In [4]:
substring_count = len(re.findall(r'ipsum', example_text))

print(substring_count)

2
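
The difference between the two counts comes from the `\b` word-boundary anchor. The following is a minimal sketch on a shortened, hypothetical sample string (`sample` is not part of the original task) that also lists which words contain the substring:

```python
import re

# Hypothetical shortened sample, used only for illustration
sample = "Lorem ipsum dolor ipsumm adipiscing"

# \b anchors the match at word boundaries, so "ipsumm" is skipped
whole_words = re.findall(r'\bipsum\b', sample)

# Without \b the pattern matches anywhere, including inside "ipsumm"
substrings = re.findall(r'ipsum', sample)

# Every word that contains the substring, returned in full
containing = re.findall(r'\b\w*ipsum\w*\b', sample)

print(len(whole_words), len(substrings), containing)
```

Here the whole-word count is 1, the substring count is 2, and `containing` holds both `ipsum` and `ipsumm`.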


### Extract everything that follows the word `Excepteur` (note: the word `Excepteur` itself should not be included in the output).

In [5]:
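# Capture everything that follows "Excepteur"; re.DOTALL lets `.` match newlines as well
after_excepteur = re.search(r'Excepteur(.*)', example_text, re.DOTALL)
if after_excepteur:  # re.search returns None when the word is absent
    print(after_excepteur.group(1).strip())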

sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. [192.168.1.254:80]
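
An alternative to the capture group is a lookbehind assertion, which checks that `Excepteur` precedes the match without making it part of the match. This is only a sketch on a hypothetical shortened string (`sample`), not the approach used in the cell above:

```python
import re

# Hypothetical shortened sample, used only for illustration
sample = "Duis aute. Excepteur sint occaecat. [192.168.1.254:80]"

# (?<=Excepteur) asserts that "Excepteur" comes right before the match
# without consuming it, so group(0) already excludes the word
match = re.search(r'(?<=Excepteur).*', sample, re.DOTALL)

# Guard against None in case the word does not occur at all
rest = match.group(0).strip() if match else ""
print(rest)
```

Python's `re` module requires lookbehind patterns to have a fixed width, which a literal word such as `Excepteur` satisfies.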


### Find all the words inside parentheses `()`.

In [6]:
inside_parentheses = re.findall(r'\((.*?)\)', example_text)
print(inside_parentheses)

['laboris', 'aliquip']
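
The `?` after `.*` makes the quantifier lazy, which matters as soon as the text contains more than one pair of parentheses. A small sketch on a hypothetical fragment (`sample`) comparing the greedy form, the lazy form, and a negated character class:

```python
import re

# Hypothetical fragment containing two parenthesised words
sample = "ullamco (laboris) nisi ut (aliquip) ex"

# Greedy: .* runs to the LAST ")" and swallows everything in between
greedy = re.findall(r'\((.*)\)', sample)

# Lazy: .*? stops at the FIRST ")" after each "("
lazy = re.findall(r'\((.*?)\)', sample)

# Negated class: "anything except )" gives the same result here
negated = re.findall(r'\(([^)]*)\)', sample)

print(greedy)
print(lazy)
print(negated)
```

The greedy pattern returns a single long match spanning both pairs, while the lazy and negated-class patterns each return the two words separately.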


### Find all the words that end with `at` (including the word `at` itself).

In [7]:
ends_with_at = re.findall(r'\b\w*at\b', example_text)
print(ends_with_at)

['at', 'consequat', 'fugiat', 'occaecat', 'cupidatat']
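
Whether the bare word `at` is included depends on the quantifier in front of it: `\w*` allows zero leading characters, while `\w+` requires at least one. A brief sketch on a hypothetical sample (`sample`):

```python
import re

# Hypothetical sample; "late" contains "at" but does not end with it
sample = "at sit amet consequat late"

# \w* permits zero characters before "at", so the word "at" itself matches
with_at = re.findall(r'\b\w*at\b', sample)

# \w+ demands at least one character before "at", so "at" alone is dropped
without_at = re.findall(r'\b\w+at\b', sample)

print(with_at)
print(without_at)
```

In both cases the trailing `\b` keeps words such as `late` out of the result, because `at` is not at the end of the word.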


### Find words that contain the string `olo` but do not start or end with that string.

In [8]:
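# \w+ on both sides requires at least one character before and after one occurrence of "olo"
contains_olo = re.findall(r'\b\w+olo\w+\b', example_text)
# The regex alone still matches "dolorolo" (via its first "olo"), so filter on the word edges
filtered_words = [word for word in contains_olo if not (word.startswith('olo') or word.endswith('olo'))]
print(filtered_words)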

['dolore', 'dolor', 'dolore']
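
The same filtering can also be folded into the pattern with lookaround assertions. The sketch below is one possible variant, shown on a hypothetical sample (`sample`), and is not the approach used in the cell above:

```python
import re

# Hypothetical sample: "dolorolo" ends with "olo", "olor" starts with it
sample = "dolorolo dolore olor dolor"

# (?!olo)  right after the leading \b: the word must not start with "olo"
# (?<!olo) right after the trailing \b: the word must not end with "olo"
pattern = r'\b(?!olo)\w+olo\w+\b(?<!olo)'

inner_olo = re.findall(pattern, sample)
print(inner_olo)
```

Only `dolore` and `dolor` remain; `dolorolo` is rejected by the lookbehind and `olor` by the lookahead.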


### Next, find all IP addresses of the form `ip-address:port-number` (e.g. `192.168.1.1:8080`) without a static search and without `r'\[.*?\]'` or similar syntax that simply returns everything inside the square brackets.

In [9]:
ip_port_pattern = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d+'
ip_addresses = re.findall(ip_port_pattern, example_text)
print(ip_addresses)

['192.168.1.1:8080', '192.168.1.254:80']
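
Note that `\d{1,3}` also accepts octets above 255 and `\d+` accepts any port number. If stricter matching is wanted, one option is to let the regex find candidates and then validate them with the standard-library `ipaddress` module. This is a sketch under that assumption; `sample` and the 0 to 65535 port range check are additions for illustration:

```python
import ipaddress
import re

# Hypothetical sample with one valid entry, one bad octet, one bad port
sample = "[192.168.1.1:8080] [999.1.1.1:80] [10.0.0.5:70000]"

# Capture the address and the port separately
pattern = r'\b(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\b'

valid = []
for host, port in re.findall(pattern, sample):
    try:
        # Raises AddressValueError for octets outside 0-255
        ipaddress.IPv4Address(host)
    except ipaddress.AddressValueError:
        continue
    # Port numbers are limited to 16 bits
    if 0 <= int(port) <= 65535:
        valid.append(f"{host}:{port}")

print(valid)
```

Only `192.168.1.1:8080` passes both checks in this sample.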
