# Python Regex

Objectives:
  - parse strings using `str` functions
  - match patterns using `re` module

Resources:
  - https://docs.python.org/3/library/re.html
  - ECiP, Chapter 8

## `str` magic

Useful tools:

  - `in`
  - `str.split`
  - `str.find`
  - `str.count`
  - `str.replace`

In [None]:
s = '1 2 3 4 5'
s.split()

In [None]:
s = '1,2,3,4,5'
s.split(',')

In [None]:
s = 'hayneedlestack'
s.find('needle')

In [None]:
s[3:(3+len('needle'))]

In [None]:
s.count('e')

In [None]:
s.replace('e', ' ')
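The `in` operator is listed above but not demonstrated. The short sketch below contrasts it with `str.find` on the same string; the substring `'pin'` and the variable names are made up for illustration.

```python
s = 'hayneedlestack'

# `in` only answers "is the substring present?"
has_needle = 'needle' in s

# `str.find` reports where the substring starts, or -1 if it is absent
start = s.find('needle')
missing = s.find('pin')   # 'pin' is a made-up substring that does not occur

print(has_needle, start, missing)
```

Because `str.find` signals "not found" with `-1`, which is also a valid index, test the result before using it to slice.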

**Exercise**: Process the file `pwr.log` and store `kinf` as a function of `burnup`. The file is fairly simple, but its layout is just irregular enough that `np.loadtxt` alone will not do the job.

## regex

A REGular EXpression (regex) is a *pattern* that defines the set of strings that match it.

In [None]:
import re

In [None]:
p = '123' # the pattern

In [None]:
s = '123 abc' # the string that matches (or not)

In [None]:
re.match(p, s)

In [None]:
p = 'abc'

In [None]:
re.match(p, s)

In [None]:
re.search(p, s)
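The cells above give different results for the same pattern because `re.match` only tries the pattern at the start of the string, while `re.search` scans the whole string for the first match. A minimal sketch of the difference, using illustrative variable names:

```python
import re

text = '123 abc'

# re.match anchors at position 0 of the string
m_start = re.match('123', text)   # a match object
m_none = re.match('abc', text)    # None: 'abc' is not at the start

# re.search looks for the first match anywhere in the string
m_found = re.search('abc', text)

print(m_start.group(), m_none, m_found.span())
```

Both functions return `None` on failure, so check the result before calling `.group()` or `.span()`.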

### Basic Special Characters

  - `.`  any character
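  - `^` beginning of the string (or of each line with `re.MULTILINE`)
  - `$` end of the string (or of each line with `re.MULTILINE`)
  - `*` 0 or more of the preceding element
  - `+` 1 or more of the preceding element
  - `?` 0 or 1 of the preceding element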
  - `[...]` character set, e.g., `[abc]` matches `a`, `b`, or `c` individually

In [None]:
s = "abc123abc456abc"
p = "abc"
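As a rough illustration of the special characters listed above, the sketch below applies a few small patterns to the same string with `re.findall`. The patterns and variable names are chosen only for demonstration.

```python
import re

text = "abc123abc456abc"

everywhere = re.findall("abc", text)       # plain pattern, every occurrence
at_start = re.findall("^abc", text)        # only at the beginning
at_end = re.findall("abc$", text)          # only at the end
any_after_c = re.findall("c.", text)       # 'c' followed by any one character
zero_or_more = re.findall("c[0-9]*", text) # 'c' followed by 0 or more digits
one_or_more = re.findall("c[0-9]+", text)  # 'c' followed by 1 or more digits
optional_1 = re.findall("c1?", text)       # 'c' followed by an optional '1'
from_set = re.findall("[123]+", text)      # runs made only of 1, 2, or 3

print(zero_or_more, one_or_more, optional_1)
```

The final `c` has nothing after it, so it matches `c[0-9]*` and `c1?` but not `c.` or `c[0-9]+`.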

### The Special `\` Sequences

  - `\d` any decimal digit
  - `\D` any character that is *not* `\d`
  - `\s` any whitespace character (`[ \t\n\r\f\v]`)
  - `\S` any character that is *not* `\s`
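A small sketch of the four sequences above, applied to a made-up string that mixes letters, digits, a space, and a tab:

```python
import re

text = "abc 123\tdef 456"   # illustrative string, not from any data file

digits = re.findall(r"\d+", text)       # runs of digits
non_digits = re.findall(r"\D+", text)   # runs of anything but digits
spaces = re.findall(r"\s", text)        # individual whitespace characters
words = re.findall(r"\S+", text)        # runs of non-whitespace

print(digits, non_digits, spaces, words)
```

Writing the patterns as raw strings (`r"..."`) keeps Python from interpreting the backslashes before `re` sees them.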

### Special Operations

  - `?` following `*` or `+` or `?` makes it *non-greedy*
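  - `{m}` requires exactly `m` repeats
  - `{m,n}` requires `m`, `m+1`, ..., or `n` repeats (no space after the comma, or the braces are treated as literal text)
  - `\` escapes a special character so that it matches literally, e.g., `\.` matches a period (except in the special sequences above)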
  - `|` "or" between arbitrary patterns
  - `(...)` group
  - `(?:...)` non-capturing group
  - `(?P<name>...)` named group
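The non-group operations above can be sketched as follows. The test strings and variable names are illustrative only.

```python
import re

text = "abc123abc456abc"

# Greedy `.*` grabs as much as possible; `.*?` stops at the first chance
greedy = re.search(r"abc.*abc", text).group()
lazy = re.search(r"abc.*?abc", text).group()

# Counted repeats
exactly_three = re.findall(r"\d{3}", text)
one_or_two = re.findall(r"\d{1,2}", text)

# Alternation between two patterns
either = re.findall(r"123|456", text)

# Escaping: `\.` matches a literal period, not "any character"
periods = re.findall(r"\.", "1.5 and 2.5")

print(greedy, lazy, one_or_two)
```

Note that `\d{1,2}` is itself greedy: it takes two digits whenever two are available, which is why `123` splits into `12` and `3`.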

In [None]:
result1 = re.search(r"(abc)\d", s)          # capturing group
result2 = re.search(r"(?:abc)\d", s)        # non-capturing group
result3 = re.search(r"(?P<foo>abc)\d", s)   # named group
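All three searches match the same text; they differ only in what the match object remembers. A minimal sketch, assuming the same test string as earlier and using illustrative variable names:

```python
import re

text = "abc123abc456abc"

r1 = re.search(r"(abc)\d", text)          # capturing group
r2 = re.search(r"(?:abc)\d", text)        # non-capturing group
r3 = re.search(r"(?P<foo>abc)\d", text)   # named group

whole = r1.group(0)         # the entire match
captured = r1.group(1)      # the first captured group
nothing = r2.groups()       # no groups were captured
by_name = r3.group('foo')   # look up the group by its name
as_dict = r3.groupdict()    # all named groups as a dictionary

print(whole, captured, nothing, by_name, as_dict)
```

A named group is still numbered, so `r3.group(1)` returns the same text as `r3.group('foo')`.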

In [None]:
s = """
    0.0000E+00   0.00000E+00 0.0000
    3.0000E-01   7.00000E-04 0.3778
    2.0000E+02   9.99300E-01 0.0003
"""
print(s)

In [None]:
pattern = r'\d\.\d\d\d\dE[+-]\d\d'  # `\.` is a literal period; a bare `.` would match any character
re.search(pattern, s)

In [None]:
re.findall(pattern, s)
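`re.findall` returns strings, so one more step is needed to get numbers. The sketch below repeats the table from above, converts the matches to floats, and then loosens the pattern with `\d+` so that it also accepts the five-decimal values in the second column. The variable names are illustrative.

```python
import re

table = """
    0.0000E+00   0.00000E+00 0.0000
    3.0000E-01   7.00000E-04 0.3778
    2.0000E+02   9.99300E-01 0.0003
"""

# Exactly four decimals: matches only the first column
strict = r'\d\.\d{4}E[+-]\d\d'
first_column = [float(x) for x in re.findall(strict, table)]

# One or more decimals: matches the first and second columns
loose = r'\d\.\d+E[+-]\d\d'
sci_values = [float(x) for x in re.findall(loose, table)]

print(first_column)
print(len(sci_values))
```

The third column has no exponent, so neither pattern picks it up.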