# Exceptions

Exceptions are errors that occur at runtime. Python prints the type of exception and other information, as shown below:

In [1]:
x = 5
y = 0
z = x / y

ZeroDivisionError: division by zero

We can write defensive code to catch exceptions with try/except blocks as shown below.

In [2]:
try:
    z = x / y
except ZeroDivisionError:
    print("Error: Attempting to divide by zero")
except Exception:  # a bare except would also trap KeyboardInterrupt and SystemExit
    print("Unexpected error")

Error: Attempting to divide by zero


The try/except blocks can also include else and finally clauses. See [the docs](https://docs.python.org/3/tutorial/errors.html) for a lengthy explanation.
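As a minimal sketch of how the else and finally clauses behave, the hypothetical helper `safe_divide` below (not part of the original notebook) records which clauses run. The else clause runs only when no exception was raised, and the finally clause runs in both cases.

```python
def safe_divide(x, y):
    """Hypothetical helper that records which clauses execute."""
    steps = []
    try:
        result = x / y
    except ZeroDivisionError:
        steps.append("except")   # runs only when the division fails
        result = None
    else:
        steps.append("else")     # runs only when no exception was raised
    finally:
        steps.append("finally")  # runs in both cases
    return result, steps

print(safe_divide(6, 3))  # (2.0, ['else', 'finally'])
print(safe_divide(5, 0))  # (None, ['except', 'finally'])
```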

Professional-quality code would of course include such exception handling. When you are quickly developing an idea that you know will go through many cycles of rewrites, however, minimizing these try/except blocks makes the code easier to read and update.

# Regular expressions

Regular expressions are strings that encode a pattern. You can match this pattern against other strings to find, extract, or modify text.

Python has a regular expression module, **re**, that you need to import before using regular expressions. Regex is another lengthy topic that you can research in [the docs](https://docs.python.org/3/howto/regex.html). Regular expressions are commonly used in text processing tasks.

The first example demonstrates the re.sub function, which has the form:
re.sub(pattern, repl, string, count=0, flags=0)

In [3]:
import re

text = '555-444-1234'
numbers = re.sub(r'\D', '', text)  # get rid of non-digit chars
print(numbers)

5554441234
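The optional count argument of re.sub limits how many replacements are made; the default of 0 means replace every match. A small sketch reusing the phone number above (the variable name `first_only` is just for illustration):

```python
import re

text = '555-444-1234'
# count=1 replaces only the first non-digit character
first_only = re.sub(r'\D', '', text, count=1)
print(first_only)  # 555444-1234
```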


### re.match() and re.search()

The **re.match()** function matches a pattern against a string. The pattern is the first argument and the string is the second; the function returns a match object, or None if there is no match.

The following trivial example shows the difference between re.match, which looks only at the beginning of the string, and re.search, which looks anywhere in the string.

There are some helpful methods for the match object:
* group() returns the string matched by the RE (the match object itself is None when nothing matched)
* start() and end() return the starting and ending position of the match
* span() returns a tuple (start, end)

In [4]:
text = 'My dog is cuter than your dog.'
m = re.match(r'dog', text)
if m:
    print(m.group())
else:
    print("didn't find a dog")
    
m = re.search(r'dog', text)
if m:
    print(m.group())
else:
    print("didn't find a dog")

didn't find a dog
dog
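The position methods listed above can be sketched on the same sentence. Since re.search stops at the first match, the positions refer to the first "dog".

```python
import re

text = 'My dog is cuter than your dog.'
m = re.search(r'dog', text)
if m:
    print(m.group())           # dog
    print(m.start(), m.end())  # 3 6
    print(m.span())            # (3, 6)
    # slicing the original string with the span recovers the match
    print(text[m.start():m.end()])  # dog
```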


Regular expressions are very powerful but take quite a bit of practice to get used to. Here are a couple of resources that should prove helpful if you need to write a regex:

[Regex cheat sheet](https://pycon2016.regex.training/cheat-sheet)

[Regex checker](http://www.pythex.org)

## Regular Expressions: a deeper dive

Chapter 2 of the J&M book discusses regular expressions. Regex work similarly in most languages, so we'll look at the examples from the J&M book in a Python environment.

### woodchuck

What if you want to search for:
* woodchuck
* woodchucks
* Woodchuck
* Woodchucks

We could write a messy Python if statement:

if 'woodchuck' in mystring or 'woodchucks' in mystring or .....

However, a better way to match all these variations, logically a disjunction, is to use regex. 

### using [ ]

Characters inside [ ] form a logical disjunction. They can be listed one by one or given as ranges:

* **[wW]oodchuck** will match *Woodchuck* or *woodchuck*
* **[0123456789]** will match a digit
* **[A-Z]** will match an upper case letter
* **[a-z]** will match a lower case letter
* **[0-9]** will match a single digit

In [5]:
text1 = 'How many chucks could Mr. Woodchuck chuck if woodchucks could chuck wood?'
m = re.search('[Ww]oodchuck', text1)
if m:
    print(m.group())
else:
    print("no woodchucks here")
print('m has type: ', type(m))


Woodchuck
m has type:  <class '_sre.SRE_Match'>


### search vs. findall

OK, we found a woodchuck but what if we wanted all the woodchucks? 

Notice above that **re.search** returned a match object or None if there is no match. A match object has the method **.group()** to return subgroups of the match. We'll see other examples of that later. 

The **re.findall** function finds all matches. Notice below that the type of *m* is a list.



In [6]:
m = re.findall('([Ww]oodchuck)', text1)
if m:
    print(m)
else:
    print("no woodchucks here")
print('m has type: ', type(m))

['Woodchuck', 'woodchuck']
m has type:  <class 'list'>
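To illustrate subgroups, here is a small sketch with two pairs of parentheses. The pattern `([Ww])ood(chuck)` is chosen only for illustration. With re.search, group(0) is the whole match and group(1), group(2) are the subgroups; with re.findall, a pattern containing several groups yields a list of tuples.

```python
import re

text1 = 'How many chucks could Mr. Woodchuck chuck if woodchucks could chuck wood?'

m = re.search('([Ww])ood(chuck)', text1)
if m:
    print(m.group(0))  # Woodchuck  (the whole match)
    print(m.group(1))  # W          (first subgroup)
    print(m.group(2))  # chuck      (second subgroup)
    print(m.groups())  # ('W', 'chuck')

# with more than one group, findall returns one tuple per match
pairs = re.findall('([Ww])ood(chuck)', text1)
print(pairs)  # [('W', 'chuck'), ('w', 'chuck')]
```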


Some more examples:

In [7]:
m = re.findall('[A-Z]', 'The University of Texas at Dallas')
m

['T', 'U', 'T', 'D']

In [8]:
m = re.findall('[0-9]', 'Class is from 5:30 to 6:45 in Room SOM 11.210.')
m

['5', '3', '0', '6', '4', '5', '1', '1', '2', '1', '0']

In [9]:
# \d means any digit, so we get the same results as above
m = re.findall(r'\d', 'Class is from 5:30 to 6:45 in Room SOM 11.210.')  # raw string keeps \d intact
m

['5', '3', '0', '6', '4', '5', '1', '1', '2', '1', '0']

In [10]:
# you can have more than one item within the brackets
m = re.findall('[A-Z0-9]', 'Class is from 5:30 to 6:45 in Room SOM 11.210.')
m

['C',
 '5',
 '3',
 '0',
 '6',
 '4',
 '5',
 'R',
 'S',
 'O',
 'M',
 '1',
 '1',
 '2',
 '1',
 '0']

### negation in disjunction

We can express "not" by placing a caret at the beginning of the [].

In [11]:
m = re.findall('[^A-Z]', "I love UTD.")
m

[' ', 'l', 'o', 'v', 'e', ' ', '.']

In [12]:
m = re.findall('[^a-z^]', "caret = ^.") # not a-z and not ^
m

[' ', '=', ' ', '.']
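A short sketch of two points the cells above rely on: the caret negates only when it is the first character inside the brackets, and a negated class combines naturally with re.sub to strip unwanted characters. The sample strings are made up for illustration.

```python
import re

# caret is not first here, so it is just a literal character in the set
literal = re.findall('[a^]', 'a^b')
print(literal)  # ['a', '^']

# caret first: match anything that is NOT a letter or digit, and remove it
cleaned = re.sub('[^A-Za-z0-9]', '', 'Room SOM 11.210')
print(cleaned)  # RoomSOM11210
```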

### some special characters

See the full list on the pythex.org site, but here are a few common ones:

* \ escapes the next character
* . matches any character
* ^ matches beginning of string
* $ matches end of string
* () group

In [13]:
m = re.findall('[\\n]', "line \n break")
m

['\n']

In [14]:
# capture blah at beginning of string followed by a single char
m = re.findall('(^blah.)', "blah1 blah2 blah3")  
m

['blah1']

In [15]:
m = re.findall('(blah.$)', "blah1 blah2 blah3")
m

['blah3']
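Because . matches any character, a literal period has to be escaped with a backslash. The sketch below contrasts the two on a made-up string; re.escape is a standard-library helper that adds the backslashes for you.

```python
import re

text = 'Room 11.210 and 11x210'

# unescaped . matches any character, including the x
loose = re.findall(r'11.210', text)
print(loose)   # ['11.210', '11x210']

# escaped \. matches only a literal period
strict = re.findall(r'11\.210', text)
print(strict)  # ['11.210']

# re.escape builds the escaped pattern from a plain string
escaped = re.findall(re.escape('11.210'), text)
print(escaped)  # ['11.210']
```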

### quantifiers

* \* 0 or more (append ? for non-greedy)
* + 1 or more (append ? for non-greedy)
* ? 0 or 1 (append ? for non-greedy)

In [16]:
m = re.findall('colou?r', "In the US we spell it color but the British spell it colour.")  
m

['color', 'colour']
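The same ? quantifier, combined with brackets, covers the four woodchuck variants from earlier in one pattern. This is a sketch reusing the earlier sentence:

```python
import re

text1 = 'How many chucks could Mr. Woodchuck chuck if woodchucks could chuck wood?'

# [Ww] allows either case; s? makes the plural ending optional
woodchucks = re.findall('[Ww]oodchucks?', text1)
print(woodchucks)  # ['Woodchuck', 'woodchucks']
```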

In [17]:
m = re.findall('(.*!*)', 'Wow! Super!! Tremendous!!!')
m

['Wow! Super!! Tremendous!!!', '']

In [18]:
m = re.findall('(.+?!+)', 'Wow! Super!! Tremendous!!!')
m

['Wow!', ' Super!!', ' Tremendous!!!']
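The difference between greedy and non-greedy matching is easiest to see side by side. In this sketch (the tag-like sample string is an assumption made for illustration), the greedy .+ runs to the last > in the string, while the non-greedy .+? stops at the first > it can.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# greedy: .+ consumes as much as possible
greedy = re.findall(r'<.+>', text)
print(greedy)      # ['<b>bold</b> and <i>italic</i>']

# non-greedy: .+? consumes as little as possible
non_greedy = re.findall(r'<.+?>', text)
print(non_greedy)  # ['<b>', '</b>', '<i>', '</i>']
```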