# Regex Groups
*And Python's Groupdict*

[Groupdict](https://docs.python.org/3/library/re.html#re.Match.groupdict) is one of those features I'd seen before but never found a use for until recently.

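Python's `re` module supports named capture groups, written `(?P<name>...)`, and `Match.groupdict()` returns what they captured as a dict keyed by group name: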

```python
>>> import re
>>> re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds").groupdict()
{'first_name': 'Malcolm', 'last_name': 'Reynolds'}
```
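`groupdict()` also accepts a `default` argument, which is used for any named group that did not take part in the match (it is `None` unless you pass something else). A minimal sketch, using a hypothetical pattern with an optional middle-name group:

```python
import re

# Hypothetical pattern: the middle-name group is optional.
name_pattern = re.compile(r"(?P<first>\w+) (?:(?P<middle>\w+) )?(?P<last>\w+)")

full = name_pattern.match("John Quincy Adams").groupdict()
short = name_pattern.match("Malcolm Reynolds").groupdict()
# default="" replaces None for groups that did not participate in the match
short_with_default = name_pattern.match("Malcolm Reynolds").groupdict(default="")

print(full)                # {'first': 'John', 'middle': 'Quincy', 'last': 'Adams'}
print(short)               # {'first': 'Malcolm', 'middle': None, 'last': 'Reynolds'}
print(short_with_default)  # {'first': 'Malcolm', 'middle': '', 'last': 'Reynolds'}
```

Every named group always appears as a key, so downstream code (or a DataFrame) gets a consistent set of fields even when some are missing.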

## Use Case: Pandas

An example nginx log entry:

```
000.00.000.00 - [US] [04/Jan/2021:19:33:35 +0000] "GET /api/count HTTP/1.1" 200 19 "/some/path" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36" "000.00.000.00"
```

```python
entry = ['000.00.000.00 - [US] [04/Jan/2021:19:33:35 +0000] "GET /api/count HTTP/1.1" 200 19 "/some/path" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36" "000.00.000.00"']
```

```python
import re

import pandas as pd

# Each list item becomes one row of column 0
df = pd.DataFrame(entry)
df
```

|   | 0 |
|---|---|
| 0 | 000.00.000.00 - [US] [04/Jan/2021:19:33:35 +00... |


```python
# One named group per field of the log line
nginx_parse = re.compile(r'(?P<realip>[\d.]+)\s-\s\[(?P<country_code>\w+)]\s\[(?P<day>\d+)/(?P<month>[A-Z][a-z]+)/(?P<year>\d{4}):(?P<time>[\d:]+)\s(?P<tzoffset>[+-]\d{4})]\s"(?P<method>[A-Z]+)\s(?P<path>[/\w\d_]+)\s.+?(?P<status>\d{3})')
```
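One-line patterns like this get hard to read quickly. A sketch of the same fields written with `re.VERBOSE`, where whitespace outside character classes is ignored and `#` starts a comment, so each group can be annotated. It assumes the month starts with an uppercase letter (`[A-Z]`) and the timezone offset is a sign plus four digits; the sample line is a shortened copy of the entry above.

```python
import re

# In re.VERBOSE mode literal spaces are ignored, so spaces in the log line
# are matched with \s instead.
nginx_parse_verbose = re.compile(r'''
    (?P<realip>[\d.]+)            # client IP address
    \s-\s
    \[(?P<country_code>\w+)\]     # country code, e.g. [US]
    \s\[
    (?P<day>\d+)/
    (?P<month>[A-Z][a-z]+)/       # month abbreviation, e.g. Jan
    (?P<year>\d{4}):
    (?P<time>[\d:]+)\s            # HH:MM:SS
    (?P<tzoffset>[+-]\d{4})\]     # timezone offset, e.g. +0000
    \s"
    (?P<method>[A-Z]+)\s          # HTTP method
    (?P<path>[/\w]+)\s            # request path (word chars and slashes)
    .+?                           # protocol and closing quote, matched lazily
    (?P<status>\d{3})             # HTTP status code
''', re.VERBOSE)

line = ('000.00.000.00 - [US] [04/Jan/2021:19:33:35 +0000] "GET /api/count HTTP/1.1" '
        '200 19 "/some/path" "Mozilla/5.0" "000.00.000.00"')

fields = nginx_parse_verbose.match(line).groupdict()
print(fields)
```

The group names are unchanged, so `groupdict()` returns the same keys as the one-line version.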

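```python
# Extract columns based on group name: match each row, turn each Match into a
# dict, then build a new DataFrame from the list of dicts
pd.DataFrame(df[0].apply(nginx_parse.match).map(lambda m: m.groupdict()).tolist())
```

|   | realip | country_code | day | month | year | time | tzoffset | method | path | status |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 000.00.000.00 | US | 04 | Jan | 2021 | 19:33:35 | +0000 | GET | /api/count | 200 |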
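The `.map(lambda m: m.groupdict())` step assumes every line matches. On a line that doesn't, `match` returns `None` and `None.groupdict()` raises `AttributeError`. A minimal sketch of one way to skip such lines; `parse_lines` is a hypothetical helper, and the pattern is a copy of `nginx_parse` with `[A-Z]` for the month's first letter and a signed four-digit offset:

```python
import re

nginx_parse = re.compile(
    r'(?P<realip>[\d.]+)\s-\s\[(?P<country_code>\w+)]\s\[(?P<day>\d+)/'
    r'(?P<month>[A-Z][a-z]+)/(?P<year>\d{4}):(?P<time>[\d:]+)\s'
    r'(?P<tzoffset>[+-]\d{4})]\s"(?P<method>[A-Z]+)\s(?P<path>[/\w\d_]+)\s'
    r'.+?(?P<status>\d{3})'
)


def parse_lines(lines, pattern):
    """Hypothetical helper: return (records, skipped).

    records holds groupdict() results for lines the pattern matches;
    skipped counts the lines it does not match.
    """
    records, skipped = [], 0
    for line in lines:
        match = pattern.match(line)
        if match is None:  # .groupdict() on None would raise AttributeError
            skipped += 1
            continue
        records.append(match.groupdict())
    return records, skipped


lines = [
    '000.00.000.00 - [US] [04/Jan/2021:19:33:35 +0000] "GET /api/count HTTP/1.1" 200 19 "/some/path" "Mozilla/5.0" "000.00.000.00"',
    "not an nginx log line",
]
records, skipped = parse_lines(lines, nginx_parse)
print(len(records), skipped)  # 1 1
```

`pd.DataFrame(records)` then builds the same table as above, minus the unparseable lines. pandas' `Series.str.extract` is another option: given a pattern with named groups (e.g. `df[0].str.extract(nginx_parse.pattern)`), it returns one column per group, named after the group, with `NaN` in rows that don't match.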
