# Capturing Group

```
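import re

some_string = "Contact us at support@example.com or sales@example.com"  # sample input
regex_email = r"\s([a-zA-Z0-9]+)@\S+"  # capture the username part of each email address

email_matched = re.findall(regex_email, some_string)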
```
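Adding a group changes what `re.findall` returns. A minimal comparison on a hypothetical sample string: without a group you get the whole match (including the leading whitespace consumed by `\s`), and with one group you get only the captured username.

```python
import re

text = "Contact us at support@example.com or sales@example.com"  # hypothetical sample

# No group: findall returns the full match, leading space included
whole = re.findall(r"\s[a-zA-Z0-9]+@\S+", text)
# One group: findall returns only what the group captured
usernames = re.findall(r"\s([a-zA-Z0-9]+)@\S+", text)

print(whole)      # [' support@example.com', ' sales@example.com']
print(usernames)  # ['support', 'sales']
```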

# Multiple Groups

```
import re

some_string = "Here you have your boarding pass LA4214 AER-CDB 06NOV"
regex = r"([A-Z]{2})(\d{4})\s([A-Z]{3})-([A-Z]{3})\s(\d{2}[A-Z]{3})"

flight_matches = re.findall(regex, some_string)

print("Airline: {} Flight number: {}".format(flight_matches[0][0], flight_matches[0][1]))
print("Departure: {} Destination: {}".format(flight_matches[0][2], flight_matches[0][3]))
print("Date: {}".format(flight_matches[0][4]))
```
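With several groups, `re.findall` returns one tuple per match, with one item per group. A small sketch, using a hypothetical string that holds two boarding passes, unpacks each tuple directly in a loop instead of indexing `[0][0]`, `[0][1]`, and so on:

```python
import re

text = "Passes: LA4214 AER-CDB 06NOV and IB2818 MAD-BCN 12DEC"  # hypothetical sample
regex = r"([A-Z]{2})(\d{4})\s([A-Z]{3})-([A-Z]{3})\s(\d{2}[A-Z]{3})"

flights = re.findall(regex, text)  # list of 5-tuples, one per boarding pass

for airline, number, departure, destination, date in flights:
    print("Airline: {} Flight number: {}".format(airline, number))
    print("Departure: {} Destination: {} Date: {}".format(departure, destination, date))
```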

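# Repeating inside a group `(\d+)` vs. repeating a capturing group `(\d)+`

- `(\d+)`: the quantifier is inside the group, so the single group captures the whole run of digits.
- `(\d)+`: the quantifier repeats the group itself; it is still one group, but it only keeps the text from its last repetition (the last digit).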
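A quick way to see the difference, on a hypothetical sample string: both patterns match the same digits, but the captured text differs, and `Pattern.groups` confirms that `(\d)+` still defines only one group.

```python
import re

text = "Order 2024, ref 77"  # hypothetical sample

whole_run = re.findall(r"(\d+)", text)     # the group holds the whole run of digits
last_digit = re.findall(r"(\d)+", text)    # the group keeps only its last repetition
group_count = re.compile(r"(\d)+").groups  # number of capturing groups in the pattern

print(whole_run)    # ['2024', '77']
print(last_digit)   # ['4', '7']
print(group_count)  # 1
```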

# OR Groups

```
import re

some_string = "I love the movie Avengers. I enjoy going to the concert Muse."  # sample input
regex = r"(love|like|enjoy).+?the\s(movie|concert)\s(.+?)\."  # (a|b|c) matches any one of the alternatives

re.findall(regex, some_string)
```
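The parentheses matter because `|` has the lowest precedence of any regex operator: without a group it splits the entire pattern in two. A minimal comparison on a hypothetical sample string:

```python
import re

text = "We like it. We love it."  # hypothetical sample

# No group: means "We love" OR "like it"
without_group = re.findall(r"We\slove|like\sit", text)
# Group: the alternation is limited to the words inside the parentheses
with_group = re.findall(r"We\s(love|like)\sit", text)

print(without_group)  # ['like it', 'We love']
print(with_group)     # ['like', 'love']
```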

# Non-capturing groups

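- `(?:...)` groups characters (e.g. for alternation) without capturing them, so they do not appear in the results.
- Combine them with lazy quantifiers such as `.+?`, which consume as few characters as possible, so `(.+?)\.` captures only up to the first period.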

```
import re

some_string = "I dislike the movie Cats. I hate the concert Noise."  # sample input
regex_non = r"(hate|dislike|disapprove).+?(?:movie|concert)\s(.+?)\."

negative_matches = re.findall(regex_non, some_string)
```
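A non-capturing group also leaves the group numbering untouched. In this sketch (hypothetical sample sentence), swapping `(movie|concert)` for `(?:movie|concert)` removes one entry from `Match.groups()`, so the title becomes group 2 instead of group 3:

```python
import re

text = "I dislike the movie Cats."  # hypothetical sample

capturing = re.search(r"(dislike).+?(movie|concert)\s(.+?)\.", text)
non_capturing = re.search(r"(dislike).+?(?:movie|concert)\s(.+?)\.", text)

print(capturing.groups())      # ('dislike', 'movie', 'Cats')
print(non_capturing.groups())  # ('dislike', 'Cats')
```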

# Parsing PDF files

```
import re

some_string = "Signed on 05/24/2016"
regex_dates = r"Signed\son\s(\d{2})/(\d{2})/(\d{4})"  # month/day/year
group_list = re.search(regex_dates, some_string)  # a Match object, or None if nothing matches

# Assign to each key the corresponding match
some_dict = {
	"day": group_list.group(2),  # group(n) returns the n-th captured group, counting from 1
	"month": group_list.group(1),
	"year": group_list.group(3)
}
print("Our first contract is dated back to {data[year]}. Particularly, the day {data[day]} of the month {data[month]}.".format(data=some_dict))
```
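Instead of calling `group(n)` once per field, `Match.groups()` returns all captured groups as a tuple, which can be unpacked in one line. This sketch also guards against `re.search` returning `None` when the text has no date:

```python
import re

some_string = "Signed on 05/24/2016"
match = re.search(r"Signed\son\s(\d{2})/(\d{2})/(\d{4})", some_string)

if match:  # re.search returns None when nothing matches
    month, day, year = match.groups()  # every captured group, in order, as a tuple
    print(day, month, year)  # 24 05 2016
```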

# Close the tag, please!

```
import re
some_string = "<body>Welcome</body>"  # sample input
match_tag = re.match(r"<(\w+)>.*?</\1>", some_string)  # \1 must repeat the tag name captured by group 1
if match_tag:
    print("Your tag {} is closed".format(match_tag.group(1)))
else:
    notmatch_tag = re.match(r"<(\w+)>", some_string)
    print("Close your {} tag!".format(notmatch_tag.group(1)))
```
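The same check can be wrapped in a function and run over several strings. `check_tag` is a hypothetical helper name; note that `<b>bold</i>` is reported as unclosed because `</\1>` demands `</b>`, not just any closing tag.

```python
import re


def check_tag(text):
    """Hypothetical helper: report whether the first tag in text is closed."""
    closed = re.match(r"<(\w+)>.*?</\1>", text)  # closing tag must repeat group 1
    if closed:
        return "Your tag {} is closed".format(closed.group(1))
    opened = re.match(r"<(\w+)>", text)
    if opened:
        return "Close your {} tag!".format(opened.group(1))
    return "No tag found"


for sample in ["<title>The Data Science Company</title>", "<body>Welcome", "<b>bold</i>"]:
    print(check_tag(sample))
```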

# Back-referencing

- catch repeated words

```
import re

some_string = "I wish you a happy happy birthday!"
re.findall(r"(\w+)\s\1", some_string)  # \1 matches the same text group 1 captured, so this catches duplicated words
```

- replace repeated words with single word

```
some_string = "I wish you a happy happy birthday!"
re.sub(r"(\w+)\s\1", r"\1", some_string)
```
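One caveat: `(\w+)\s\1` has no word boundaries, so `\1` can match just the start of the next word ("the thesis" contains "the the"). A sketch with a hypothetical sentence, adding `\b` so only whole repeated words are collapsed:

```python
import re

text = "Check the thesis before the the deadline"  # hypothetical sample

loose = re.sub(r"(\w+)\s\1", r"\1", text)        # also hits "the the" inside "the thesis"
strict = re.sub(r"\b(\w+)\s\1\b", r"\1", text)   # only whole repeated words

print(loose)   # Check thesis before the deadline
print(strict)  # Check the thesis before the deadline
```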

# Named Referencing

- Syntax: `r"(?P<some_name>regex)regex(?P=some_name)"`, where `(?P<some_name>...)` names a group and `(?P=some_name)` matches the same text that group captured

```
import re

some_string = "Your new code number is 23434. Please, enter 23434 to open the door."
re.findall(r"(?P<code>\d{5}).*?(?P=code)", some_string)  # (?P=code) must match the same digits the group named code captured
```

- removing duplicates with named referencing

```
import re

some_string = "This app is not working! It's repeating the last word word."
re.sub(r"(?P<word>\w+)\s(?P=word)", r"\g<word>", some_string)  # \g<word> inserts the text captured by the group named word
```
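A named group can also be read back from the match object by name. In this sketch, `group("code")` and `groupdict()` give the captured value, and the group still has a number, so `group(1)` returns the same text:

```python
import re

text = "Your new code number is 23434. Please, enter 23434 to open the door."
match = re.search(r"(?P<code>\d{5}).*?(?P=code)", text)

print(match.group("code"))  # 23434
print(match.group(1))       # 23434 (named groups are numbered too)
print(match.groupdict())    # {'code': '23434'}
```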

# Reeepeated characters

```
import re

some_string = "I'm sooooo excited"  # sample input
regex_elongated = r"\w*(\w)\1\w*"  # (\w)\1 = a word character immediately followed by the same character

match_elongated = re.search(regex_elongated, some_string)
elongated_word = match_elongated.group(0)
```
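Because `(\w)\1` only needs one doubled character, it also flags ordinary words such as "winner" or "happy". A possible tightening, sketched on a hypothetical sentence, requires the captured character to repeat at least twice more with `\1{2,}`. `finditer` is used so the whole word (`group(0)`) is collected rather than the single captured character that `findall` would return.

```python
import re

text = "The winner is sooooo happy"  # hypothetical sample

loose = [m.group(0) for m in re.finditer(r"\w*(\w)\1\w*", text)]      # any doubled letter
strict = [m.group(0) for m in re.finditer(r"\w*(\w)\1{2,}\w*", text)]  # same letter 3+ times in a row

print(loose)   # ['winner', 'sooooo', 'happy']
print(strict)  # ['sooooo']
```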

# Looking Around

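- Positive lookahead `(?=...)`: the match must be followed by `...`, so it sits before the pattern; `...` itself is not consumed
- Positive lookbehind `(?<=...)`: the match must be preceded by `...`, so it sits after the pattern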

```
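import re

some_string = "I code in python because Python makes analysis easy"  # sample input
look_ahead = re.findall(r"\w+(?=\spython)", some_string)  # word right before " python"

look_behind = re.findall(r"(?<=[pP]ython\s)\w+", some_string)  # word right after "python " or "Python "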
```
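Lookarounds are zero-width: they check the surrounding text without including it in the match. Comparing a plain pattern with its lookahead version on a hypothetical sentence shows that " python" is consumed in the first case and left out in the second:

```python
import re

text = "I code in python because Python makes analysis easy"  # hypothetical sample

consumed = re.findall(r"\w+\spython", text)         # " python" becomes part of the match
lookahead = re.findall(r"\w+(?=\spython)", text)    # " python" is only checked
lookbehind = re.findall(r"(?<=[pP]ython\s)\w+", text)

print(consumed)    # ['in python']
print(lookahead)   # ['in']
print(lookbehind)  # ['because', 'makes']
```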

- Negative lookahead `(?!...)`: the match must not be followed by `...`
- Negative lookbehind `(?<!...)`: the match must not be preceded by `...`

```
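import re
some_string = "Call 4396-589652-03, 555-4396-589652 or 555-4396-589652-03"  # sample input
neg_look_behind = re.findall(r"(?<!\d{3}-)\d{4}-\d{6}-\d{2}", some_string)  # not preceded by a 3-digit area code
neg_look_ahead = re.findall(r"\d{3}-\d{4}-\d{6}(?!-\d{2})", some_string)  # not followed by a 2-digit extension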
```
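Python's built-in `re` module only accepts lookbehinds of a fixed width. `(?<!\d{3}-)` works because it always spans exactly four characters; a variable-width version such as `(?<!\d+-)` is rejected when the pattern is compiled. A small sketch (`compiles` is a hypothetical helper) checks this:

```python
import re


def compiles(pattern):
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(r"(?<!\d{3}-)\d{4}"))  # True: the lookbehind is always 4 characters wide
print(compiles(r"(?<!\d+-)\d{4}"))    # False: \d+ has no fixed width
```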