# Regular expressions
## Finding stuff in strings

When handling data in the form of character strings we're usually interested in finding things - either to count the occurrences or to locate and extract fragments. Raw data strings are generally messy, especially if they're not system generated: variations in spelling, punctuation, whitespace use and abbreviations can make raw text difficult to search. Character strings are also where most cleaning time is spent; when handling numeric data you at least know the values you are looking at are numbers, but a character string may contain anything, and any of it may need cleaning.

Regular expressions are intended to make finding stuff more manageable by letting us search with patterns that can be tailored to specific requirements. In the following we'll see how they can be used to find and extract parts of strings for further processing. Regular expressions are widely used, so some basic familiarity with the notation and its application is a useful skill to acquire.

## Using string methods in programming libraries
Programming language string-handling methods and libraries often contain a rich collection of techniques for finding and manipulating text in strings.

Python string methods include the `replace(<find>, <replace>)` method which replaces any substrings which match the `<find>` string with the `<replace>` string.    

So the following Python replaces `'this'` with `'that'` in the string `'this string value contains this'`.


In [None]:
'this string value contains this'.replace('this', 'that') 

Suppose we start with the sentence 
`'The quick brown fox jumps over the lazy dog and the lazy cat.'`
and use the `replace()` method to replace the string `'the'` with the string `'a'`.

In [None]:
'The quick brown fox jumps over the lazy dog and the lazy cat.'.replace('the', 'a')

Notice the limitation of the simple text matching – `'the'` doesn’t match with the `'The'` at the start of the sentence: `'t'` and `'T'` have different character representations at the binary level, so `'t'` and `'T'` do not match one another.

Okay, so let’s switch the text back and try again

In [None]:
'The quick brown fox jumps over a lazy dog and a lazy cat'.replace('a', 'the')

Oh dear, that doesn’t look right – our simple find and replace operation has replaced all occurrences of the `'a'` character not just the word `'a'` (which has spaces or punctuation either side of it).

So the way to switch the `'a'`s back to `'the'`s is to find and replace a string that includes the spaces around the `'a'`

In [None]:
'The quick brown fox jumps over a lazy dog and a lazy cat'.replace(' a ',' the ')
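The space-padding trick has blind spots of its own: an `'a'` at the very start of a string has no space in front of it, so it is skipped. A quick check, using a made-up sentence for illustration:

```python
# A made-up sentence whose first 'a' has no space in front of it.
sentence = 'a dog chased a cat'

# Only the ' a ' with a space on both sides is replaced; the leading 'a' is missed.
result = sentence.replace(' a ', ' the ')
print(result)  # a dog chased the cat
```

The same problem appears for an `'a'` followed by punctuation rather than a space.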

To simplify this kind of searching and matching we need a more expressive way of specifying the search string - a way of specifying patterns rather than explicit strings to match.  

In computer science the general term for such a text pattern is a *regular expression*. Here we will examine some basic features of regular expressions for text strings (remember we’ve looked at CSV and JSON files, which are fundamentally text).

If you’ve looked at the range of search features in search engines, editors and the like, you will be aware that you can add extra markers to the search phrase, or set different search options (such as ignoring capitalisation), that affect the behaviour of the matching.

At the heart of this process is a pattern-matching engine, and, depending on the complexity of the software you are using, the patterns that can be specified can be quite complex and very specific. Both OpenRefine and Python (and Java, and several other programming language libraries) use a regular expression language derived from Perl, a scripting language with strong string-handling functionality. We’ll give a flavour of the pattern-matching language here, but depending on how much text processing you’re planning to do, you may want to find additional time to explore regular expressions further.

## Regular expression primer

The Python `re` library (`re` for regular expression) has a number of methods allowing us to apply regular expressions.  

In [None]:
import re

We will use the `search()` method to show how different expression patterns are matched in text strings:

    re.search(<pattern>, <text string>)

This searches for the first occurrence of `<pattern>` in the `<text string>` and, if it finds one, returns a *match object* describing the match (or `None` if there is no match). The matched parts are accessed using the match object's `group()` method, and the full matched text is group `0`. (We'll come back to this later; our first few examples only need the full match, so we will just show `group(0)`.)

So we can find, and show we've found, a text string as we did with the string replace method earlier.

In [None]:
# A simple text string match to find abc in the longer string.
matchObject = re.search('abc', 'aaaaaaabccccccc')
matchObject.group(0)

There are other methods available in the `re` library that will report where in the text string the match occurred and some allowing replacement or removal of the matched patterns.   

Note: if the pattern fails to match, `search` returns `None`, and calling `group(0)` on `None` raises an `AttributeError`. A simple Boolean test will tell you whether the match succeeded or failed:

    matchObject = re.search(<pattern>, <string>)
    if matchObject:
        ...
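To see what `re.search()` actually returns, the short sketch below inspects a successful match and a failed one. The match object's `span()`, `start()` and `end()` methods report where in the text the match occurred:

```python
import re

# A successful search returns a match object.
match = re.search('abc', 'hello abc world')
print(match.group(0))              # abc
print(match.span())                # (6, 9) - start and end positions of the match
print(match.start(), match.end())  # 6 9

# A failed search returns None, so test the result before calling group().
no_match = re.search('xyz', 'hello abc world')
print(no_match is None)            # True
```

The end position is one past the last matched character, so `'hello abc world'[6:9]` is `'abc'`.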

## Matching literal strings

The most basic regular expression is a literal string of one or more characters that you want to find in the text.  The previous example was a literal string match.

So, `'abc'` will match `'abc'` inside a string, e.g. `'hello abc world'`.

In [None]:
matchObject = re.search('abc', 'hello abc world')
if matchObject:
    print(matchObject.group(0))
else:
    print('No match.')

## Matching single character wildcards

A wildcard is a pattern element that can stand for more than one possible character.

The first of the wildcards is the single full stop `'.'`, which matches any single character inside a string (other than a newline, by default).

So,  `'a.c'` will match `'abc'`, `'adc'`, `'a9c'`, but not `'abbc'` or `'a9sc'`

In [None]:
# So a.c will match abc 
matchObject = re.search('a.c', 'aaaaaaaabcccccc')
if matchObject:
    print(matchObject.group(0))
else:
    print('No match.')

# a.c will match adc
matchObject = re.search('a.c', 'aaaaaaadccccccc')
if matchObject:
    print(matchObject.group(0))
else:
    print('No match.')

# a.c will match a c (a space is a character too)
matchObject = re.search('a.c', 'aaaaaaa ccccccc')
if matchObject:
    print(matchObject.group(0))
else:
    print('No match.')

# To match the actual '.' character, precede it with the '\' (escape) character.
# The r prefix makes a raw string, so Python passes the backslash on unchanged.
# a\.c will match a.c
matchObject = re.search(r'a\.c', 'aaa c adc abc aaa.ccccccc')
if matchObject:
    print(matchObject.group(0))
else:
    print('No match.')

# a.c will not match abbc - the . matches a single character in the pattern
matchObject = re.search('a.c', 'aaaaaaabbccccccc')
if matchObject:
    print(matchObject.group(0))
else:
    print('No match.')
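By default the `.` wildcard matches any character except a newline. Passing the `re.DOTALL` flag changes that; a small sketch with a made-up two-line string:

```python
import re

text = 'a\nc'  # 'a', a newline character, then 'c'

# By default '.' does not match a newline, so there is no match.
default_result = re.search('a.c', text)
print(default_result)  # None

# With the re.DOTALL flag, '.' matches a newline as well.
dotall_result = re.search('a.c', text, re.DOTALL)
print(repr(dotall_result.group(0)))  # 'a\nc'
```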

#### Aside
It's going to get tedious to write out this full fragment for each example. I'll create a function to apply the pattern and print either the matched text, or 'No match.'

In [None]:
def apply_pattern(pattern, search_in):
    matchObject = re.search(pattern, search_in)
    if matchObject:
        print(matchObject.group(0))
    else:
        print('No match.')


In [None]:
# Now test it:

# To match the actual '.' character, precede it with the '\' (escape) character.
# a\.c will match a.c (the r prefix makes a raw string, keeping the backslash intact)
apply_pattern(r'a\.c', 'aaa c adc abc aaa.ccccccc')

# a.c will not match abbc. The . matches a single character in the pattern.
apply_pattern('a.c', 'aaaaaaabbccccccc')


## Matching one character from a set of characters

To match one character from a set of possible matching characters, surround the set of characters with square brackets []. 

So, `'a[bcd]c'` will match `'abc'`, `'acc'`, `'adc'`, but not `'abbc'` (two characters between the `a` and the `c`) or `'aec'`.

In [None]:
# So, a[bcd]c will match abc
apply_pattern('a[bcd]c', 'aaaaaaaabcccccc')

# a[bcd]c will match acc
apply_pattern('a[bcd]c', 'aaaaaaaaccccccc')

# a[bcd]c will match adc
apply_pattern('a[bcd]c', 'aaaaaaaadcccccc')

# a[bcd]c will not match abbc etc., as the [bcd] set matches a single character
apply_pattern('a[bcd]c', 'aaaaaaabbccccccc')


So what do you think the pattern `'I am 2[1234] years old'` will match?

This pattern will match a string beginning `'I am 2'` and ending `' years old'` where the age is one of 21, 22, 23 or 24.


You can specify a range  inside the set, for example, `'[1-4]'` or `'[a-e]'`. 

And you can mix ranges and characters such as `'[1-468]'` which is equivalent to `'[123468]'`.

You can also put `^` immediately after the opening `[` to match any character NOT in the set.

So `'2[^1234]'` will *not* match `21`, `22`, `23` or `24`, but will match a `2` followed by any other character (for example `25` or `2a`).

In [None]:
apply_pattern('2[1-4]', '25262a272324')

apply_pattern('2[^1234]', '212223242526')

apply_pattern('2[^1234]', '212a23')
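`apply_pattern()` only shows the first match. The `re.findall()` function returns every non-overlapping match as a list, which makes it easier to see exactly what a character set accepts. A sketch reusing the digit string from the cell above:

```python
import re

digits = '212223242526'

# Every '2' followed by 1, 2, 3 or 4.
in_set = re.findall('2[1-4]', digits)
print(in_set)      # ['21', '22', '23', '24']

# Every '2' followed by a character NOT in the range 1-4.
not_in_set = re.findall('2[^1-4]', digits)
print(not_in_set)  # ['25', '26']

# A mixed range and list: [1-468] is the same set as [123468].
mixed = re.findall('[1-468]', '123456789')
print(mixed)       # ['1', '2', '3', '4', '6', '8']
```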


In [None]:
# Now try some variations of the above patterns in this cell to
# make sure you are comfortable with these simple patterns.
apply_pattern('put your pattern here', 'hello abc world')


## Matching one pattern from a set of patterns

If we want to match one of a set of strings (not single characters) we can separate each string with a `|` character (vertical bar). Surround the alternatives with brackets so that the choice applies only to that part of the pattern; without them, `|` splits the whole pattern into alternatives.

In [None]:
apply_pattern('(this|that|the other)',
              'I want to find the other, not this or that.' )                           
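The brackets matter because, without them, `|` splits the whole pattern into alternatives. A quick comparison (the pattern and text are illustrative):

```python
import re

text = 'I want that one'

# Grouped: 'want ' followed by either 'this' or 'that'.
grouped = re.search('want (this|that)', text)
print(grouped.group(0))    # want that

# Ungrouped: either 'want this' OR 'that' on its own.
ungrouped = re.search('want this|that', text)
print(ungrouped.group(0))  # that
```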

## Matching repetitions of pattern parts

There are also ways to indicate repetition of patterns: you follow the pattern with a special character, and if the pattern you want to repeat has more than one part you surround the extended pattern with brackets.


To match one or more repetitions of a pattern we use `+` after the pattern, so `'a+bc'` will match `'abc'`, `'aabc'`, `'aaabc'`, etc., but not `'aabbc'`.

To match zero or more repetitions of a pattern we use `*` after the pattern, so `'a*bc'` will match `'anythingbc'`, `'abc'`, `'aabc'`, `'aaabc'`, etc., and this time will match `'aabbc'`.

(Can you see why? The `'a*'` allows for zero repetitions of `'a'`, which means the `'bc'` can match without any preceding `'a'`s, and `'bc'` matches the last two characters of `'aabbc'`.)

To match zero or exactly one repetition we can use `?` after the pattern, so `'a?bc'` will match `'aaabc'` (matching just the final `'abc'`), `'kabc'`, and `'kbc'`.

And you can specify the exact, minimum or range of repetitions that you want to accept using:
- `{n}` for exactly n repetitions
- `{n,}` for at least n repetitions
- `{n,m}` for at least n but no more than m repetitions.

So:
- `'a{3}'` will match `'aaa'`
- `'a{3,}'` will match `'aaa'`, `'aaaa'`, `'aaaaa'`, `'aaaaaa'`, etc.
- `'a{3,5}'` will match `'aaa'`, `'aaaa'`, `'aaaaa'`.

In [None]:
# So + is one or more 
apply_pattern('a+bc', 'abc')

apply_pattern('a+bc', 'aabc')

apply_pattern('a+bc', 'aaaaabc')


In [None]:
# * is zero or more 
apply_pattern('a*bc', 'anythingbc')

apply_pattern('a*bc', 'aaaaabc')

apply_pattern('a*bc', 'aabbc')


In [None]:
# ? is zero or exactly one 
apply_pattern('a?bc', 'anythingbc')

apply_pattern('a?bc', 'aaaaabc')

apply_pattern('a?bc', 'aabbc')

In [None]:
# and the behaviour of () is to group the pattern to allow multi-part pattern repetition, 
# e.g. ()+  
apply_pattern('(ab)+', 'aaabababbbbbb')

In [None]:
# a{3} exactly three repetitions
apply_pattern('a{3}', 'aaaaaa')

# a{3,} three or more repetitions
apply_pattern('a{3,}', 'aaaaaaaaaaaaaaaaaaaaabc')

# a{3,5} between three and five repetitions
apply_pattern('a{3,5}', 'aabaaaaaabcaaaabc')
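Because `re.search()` looks for the pattern anywhere in the text, `'a{3}'` happily finds three `a`s inside a longer run. To check whether a *whole* string has an acceptable number of repetitions, `re.fullmatch()` (available since Python 3.4) requires the pattern to match the entire string. A sketch with some made-up candidate strings:

```python
import re

# search() finds 'aaa' inside the longer run of six a's.
search_result = re.search('a{3}', 'aaaaaa').group(0)
print(search_result)  # aaa

# fullmatch() only succeeds if the whole string fits the pattern.
candidates = ['aa', 'aaa', 'aaaa', 'aaaaa', 'aaaaaa']
results = {c: re.fullmatch('a{3,5}', c) is not None for c in candidates}
print(results)
# {'aa': False, 'aaa': True, 'aaaa': True, 'aaaaa': True, 'aaaaaa': False}
```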


The `+`, `*`, `?` and `{}` quantifiers are **greedy**: they match as many repetitions as they can while still letting the rest of the pattern match. If you follow one with a `?` (`+?`, `*?`, `??`, `{n,m}?`) you get the non-greedy or **minimal** match, which takes as few repetitions as possible.

Note the difference in the following matches, caused by the additional `?` in the pattern.

What will `'a+bc'` match in the string `'xxxaaaaabcxx'`; what about `'a+?bc'` in the same string?

In [None]:
# a{3,} three or more repetitions is the greedy version
apply_pattern('a{3,}', 'aaaaaaaaaaaaaaaaaaaaabc')


# a{3,}? is the non-greedy or minimal match version
apply_pattern('a{3,}?', 'aaaaaaaaaaaaaaaaaaaaabc')
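One way to check your answer to the `'a+bc'` / `'a+?bc'` question. A non-greedy quantifier does not change *where* `re.search()` starts: the leftmost match still wins, and the pattern must still reach `bc`, so both versions return the same text here. The difference shows up when a shorter match lets the rest of the pattern finish earlier, as with the tag-like string in the second pair (an illustrative example):

```python
import re

text = 'xxxaaaaabcxx'
greedy = re.search('a+bc', text).group(0)
lazy = re.search('a+?bc', text).group(0)
print(greedy, lazy)  # aaaaabc aaaaabc

tags = '<b>bold</b>'
# Greedy: .+ runs as far as it can, up to the last '>'.
greedy_tag = re.search('<.+>', tags).group(0)
# Non-greedy: .+? stops at the first '>'.
lazy_tag = re.search('<.+?>', tags).group(0)
print(greedy_tag)  # <b>bold</b>
print(lazy_tag)    # <b>
```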


## The escape character to allow matching of special characters

If you want to match any of the special characters, you can precede it with a backslash `\` to ‘escape’ it and treat it as a normal character. So `'Rock\+Roll'` will match `'Rock+Roll'`, and `'a+\?'` will match `'a?'`, `'aa?'`, `'aaa?'`, etc.
          

In [None]:
# The escape character '\'
apply_pattern(r'ROCK\+ROLL', 'I love ROCK+ROLL')
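In Python code it is safest to write patterns that contain backslashes as raw strings (`r'...'`), so the backslashes reach the regular expression engine unchanged; recent Python versions warn about unrecognised escapes such as `'\+'` in ordinary strings. If the literal text you want to match is held in a variable, `re.escape()` adds the backslashes for you. A sketch with made-up sample text:

```python
import re

# A raw string keeps the backslash for the regular expression.
rock = re.search(r'ROCK\+ROLL', 'I love ROCK+ROLL').group(0)
print(rock)      # ROCK+ROLL

# a+\? : one or more a's followed by a literal question mark.
question = re.search(r'a+\?', 'Is it aaa? Maybe.').group(0)
print(question)  # aaa?

# re.escape() escapes any special characters in a literal string.
literal = 'Rock+Roll'
escaped = re.search(re.escape(literal), 'I love Rock+Roll too').group(0)
print(escaped)   # Rock+Roll
```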


## Other special characters
Finally, there are special character sequences that tie the pattern to a position in the text string, and shorthands for special characters or common groups of characters:

- `^aa` will match `aa` if it appears at the beginning of the search text (or at the beginning of any line, if the `re.MULTILINE` flag is used)
- `bb$` will match `bb` if it appears at the end of the search text (or at the end of any line, with `re.MULTILINE`)
- `\bccc` will match `ccc` if it is at the start of a word (`\b` matches any word boundary, so guess what `ccc\b` does!)
- `\d` will match any digit, so for ASCII text `\d` is equivalent to `[0123456789]`, or `[0-9]`
- `\D` is any non-digit
- `\s` is any whitespace character (space, tab, newline, etc.)
- `\S` is any non-whitespace character
- `\t` is a tab character.
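A few of these sequences in action, using made-up sample text. Note that `^` and `$` only match at the start and end of each line when the `re.MULTILINE` flag is given:

```python
import re

text = 'first line ends bb\naa starts the second line'

# Without re.MULTILINE, ^ and $ only match at the start and end of the string.
start_default = re.search('^aa', text)
end_default = re.search('bb$', text)
print(start_default, end_default)  # None None

# With re.MULTILINE, ^ also matches after each newline and $ before each newline.
start_multi = re.search('^aa', text, re.MULTILINE).group(0)
end_multi = re.search('bb$', text, re.MULTILINE).group(0)
print(start_multi, end_multi)      # aa bb

# \b marks a word boundary: 'cat' as a whole word, not inside 'concatenate'.
words = re.findall(r'\bcat\b', 'concatenate the cat')
print(words)   # ['cat']

# \d+ finds runs of digits; \S+ finds runs of non-whitespace characters.
numbers = re.findall(r'\d+', 'Room 101, floor 3')
tokens = re.findall(r'\S+', 'tab\tand space')
print(numbers)  # ['101', '3']
print(tokens)   # ['tab', 'and', 'space']
```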


### Exercise
Think about strings that the patterns below will match, then use the code cell below to test your understanding.

i)  `'(ab){3}'`

ii)  `'c[oa]t'`

iii)  `'^Price:[£$][0-9]+\.[0-9]{2}$'`

In [None]:
# Test the pattern against your own text strings to confirm your understanding
apply_pattern('<pattern>', '<test string>')


In [None]:
# Sample solution - before you run the cell, check you know what will be matched
# and what will return No match.

# i) '(ab){3}' will match exactly the string 'ababab' - three occurrences of ab
apply_pattern('(ab){3}', 'dsehabababdkjdia')
apply_pattern('(ab){3}', 'xxxababxxx')
apply_pattern('(ab){3}', 'xxxabababababxxx')

# ii) 'c[oa]t' will match either 'cat' or 'cot' (but not 'coat')
apply_pattern('c[oa]t', 'A coat or a cat')
apply_pattern('c[oa]t', 'A coat or a cot')

# iii) '^Price:[£$][0-9]+\.[0-9]{2}$' will match:
#      the string 'Price:' at the start of the string,
#      followed by a pound or dollar sign,
#      then a series of one or more digits,
#      a decimal point, and two digits at the end of the string.
price = r'^Price:[£$][0-9]+\.[0-9]{2}$'
apply_pattern(price, 'Price:£4.23')
apply_pattern(price, 'Price:£.93')
apply_pattern(price, 'Price:£0.93')
apply_pattern(price, 'Price:£0.9')
apply_pattern(price, 'Price:$99999.99')
apply_pattern(price, 'Full Price:£4.23')
apply_pattern(price, 'Price:£4.23 - cheap!')

### Exercise

Write a single regular expression to match each of the following:

i) either `Rd`,  `Rd.`,  `Road` or `Road.`  if they appear anywhere in a string.

ii) character sequences for the years `1951` to `1963` inclusive (assume they are surrounded by spaces).

Use the cells below to test your patterns (remember there will be more than one pattern that will match these).


In [None]:
# Test the pattern against your own text strings to confirm your understanding:
apply_pattern('<pattern>', 'We live at the end of the Rd. at #2')

apply_pattern('<pattern>', 'We live at the end of the Rd, at #2')

apply_pattern('<pattern>', 'We live at the end of the Road, at #2')

apply_pattern('<pattern>', 'We live at the end of the Road.')


In [None]:
# Also test the pattern against your own text strings to confirm your understanding:
apply_pattern('<pattern>', ' 1940, 1951, 1953 1999 ')
apply_pattern('<pattern>', ' 1940, 1949, 1963 1999 ')
apply_pattern('<pattern>', ' 1940, 1949, 1965 1999 ')
apply_pattern('<pattern>', ' 1940, 1949, 1950 1999 ')

In [None]:
# Sample solutions:

# i)  'R(oa)?d\.?'
# The (oa)? matches zero or exactly one occurrence of oa, and \.? is zero or one occurrences of '.'

# ii)  ' 19(5[1-9]|6[0-3]) ' here we have alternate patterns for the 50s and 60s.
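The sample solutions can be checked against the test strings used in the cells above; a quick sketch:

```python
import re

road = r'R(oa)?d\.?'
years = r' 19(5[1-9]|6[0-3]) '

road_texts = ['We live at the end of the Rd. at #2',
              'We live at the end of the Rd, at #2',
              'We live at the end of the Road, at #2',
              'We live at the end of the Road.']
# Every road text contains a match, so calling group(0) is safe here.
road_matches = [re.search(road, t).group(0) for t in road_texts]
print(road_matches)  # ['Rd.', 'Rd', 'Road', 'Road.']

year_texts = [' 1940, 1951, 1953 1999 ',
              ' 1940, 1949, 1963 1999 ',
              ' 1940, 1949, 1965 1999 ',
              ' 1940, 1949, 1950 1999 ']
year_matches = []
for t in year_texts:
    m = re.search(years, t)
    # The match includes the surrounding spaces; strip them for display.
    year_matches.append(m.group(0).strip() if m else 'No match.')
print(year_matches)  # ['1953', '1963', 'No match.', 'No match.']
```

The first year text gives `1953` rather than `1951`, because `1951` is followed by a comma rather than the space the pattern requires.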

## Extracting the parts of the pattern matched

For data cleaning purposes we want to be able to extract the pattern we have matched from the text string.  

This gives us the ability to match parts of a string, extract the matched substrings, and make the substrings available via a variable.   

In Python and OpenRefine (and Perl) any part of a pattern surrounded by brackets `( )` forms a numbered *group*, and the text each group matched can be referred to by its number. The numbering takes a bit of getting used to, as groups are numbered by counting opening brackets starting from the left. In Python, `group(0)` is the whole matched string, `group(1)` is the first group matched, etc.; in a Python replacement string the groups are written `\1`, `\2`, etc., and `\g<0>` stands for the whole match.

So far we have only been showing the result of `re.search` via `.group(0)`, which is the whole match; but the match object also records each component part that was matched.

Look at the following complex pattern:   `([a-zA-Z]+)([0-9]+)`
        
This has two component parts, each surrounded by `()`s: a string of at least one letter (upper or lower case) followed by a string of at least one digit.   

The `groups()` method returns a tuple of all the component matches in the text string, and we can pick out individual matches using the `group(n)` method.

In [None]:
# Apply the search
matchObject = re.search('([a-zA-Z]+)([0-9]+)', '24aslkf23qowu89987')

if matchObject:
    print('First show all the component part matches')
    print(matchObject.groups())
    print('Now pick out the full match, the letter match and the digit pattern match')
    print('.group(0)    ' + matchObject.group(0))
    print('.group(1)    ' + matchObject.group(1))
    print('.group(2)    ' + matchObject.group(2))
else:
    print('No match.')

#### Aside
As before, let's create a function to make these examples less tedious

In [None]:
def test_extraction(pattern, target_string):
    matchObject = re.search(pattern, target_string)
    if matchObject:
        print('First show all the component part matches using .groups()')
        print(matchObject.groups())
        print('Now pick out the individual pattern matches using .group(n)')
        # group(0) is the whole match; groups 1..n are the bracketed parts.
        for i in range(len(matchObject.groups()) + 1):
            print('.group(%d) has value %s' % (i, matchObject.group(i)))
    else:
        print('No match.')

In [None]:
test_extraction('([a-zA-Z]+)([0-9]+)', '24aslkf23qowu89987')

Nested groups follow the same numbering scheme (count the opening brackets from the left), and wrapping any part of a pattern in `()` makes it a separately numbered group whose match we can extract.

In [None]:
test_extraction('((ab){2})(c)([de])', 'aaababce')
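Group numbers are also how a replacement refers back to the matched parts. A sketch using `re.sub()` from the `re` library, where `\1`, `\2`, ... insert the numbered groups and `\g<0>` inserts the whole match (the sample strings are reused from above or made up):

```python
import re

# Groups are numbered by counting opening brackets from the left;
# a repeated group like (ab){2} keeps only its last repetition.
nested = re.search(r'((ab){2})(c)([de])', 'aaababce')
print(nested.groups())  # ('abab', 'ab', 'c', 'e')

# Swap each letter run and the digit run after it using \2 and \1.
swapped = re.sub(r'([a-zA-Z]+)([0-9]+)', r'\2\1', '24aslkf23qowu89987')
print(swapped)          # 2423aslkf89987qowu

# \g<0> is the whole match; here each number is wrapped in square brackets.
bracketed = re.sub(r'[0-9]+', r'[\g<0>]', 'Room 101, floor 3')
print(bracketed)        # Room [101], floor [3]
```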

### Exercise

Look at the following pattern: 

`'^Price:([£$])([0-9]+)\.([0-9]{2})$'`

Can you see how many components this pattern has, and what each component would represent?

In [None]:
# Discussion and demonstration
# The pattern has three components: the first picks out the 
# currency symbol, the second the pound or dollar amount 
# and the final component picks out the pence or cents amounts.

# Demonstration of pattern.
test_extraction(r'^Price:([£$])([0-9]+)\.([0-9]{2})$', 'Price:£229.22')

### Exercise

Using the above pattern, and the component pattern parts: write Python code (you'll need `re.search()`, not the `test_extraction()` function) that takes a string that matches the above pattern then uses the component matches to produce a string of the form: 

- `Price:$xxxx.yy` => `'xxxx dollars and yy cents.'`
- `Price:£xxxx.yy` => `'xxxx pounds and yy pence.'`

In [None]:
# Your code here

In [None]:
# Sample solution
matchObject = re.search(r'^Price:([£$])([0-9]+)\.([0-9]{2})$', 'Price:£112.99')
if matchObject:
    # group(1) is the currency symbol, group(2) the whole units, group(3) the fraction.
    if matchObject.group(1) == '£':
        result = matchObject.group(2) + ' pounds and ' + matchObject.group(3) + ' pence.'
    else:
        result = matchObject.group(2) + ' dollars and ' + matchObject.group(3) + ' cents.'
    print(result)
else:
    print('No match.')

### Exercise

Write a Python function that will take a string representing a name in the form 'Firstname Surname' and return a tuple with two strings of the form (Surname, FirstName), and ('err', 'err') if the source string is not matched. (To simplify it, no hyphenated surnames - unless you want a challenge!)

Assume both names consist of an upper-case letter followed by zero or more lower-case letters.  For example, `'John Whittington'` becomes the tuple `('Whittington', 'John')` while `'J D'` becomes the tuple `('D', 'J')`.


In [None]:
# Your code here

In [None]:
# Sample solution
def NameShufflef(source_string):
    names_found = re.search('^([A-Z][a-z]*) ([A-Z][a-z]*)$', source_string)
    if names_found:
        return (names_found.group(2), names_found.group(1)) 
    else:
        return ('err', 'err')

In [None]:
# Solution test.
NameShufflef('John Whittington')

In [None]:
# Solution test.
NameShufflef('J D')

In [None]:
# Solution test.
NameShufflef('JD Smythe')

In [None]:
# Solution test.
NameShufflef('991991 Smith')

In [None]:
# Solution test.
NameShufflef('Arther Terence Smith')

## Regular expressions and the *pandas* `replace()` method

The *pandas* library has a DataFrame `.replace()` method that can be used to replace values, including strings, in a DataFrame.

In [None]:
import pandas as pd

In [None]:
# Some sample strings.
samples = pd.DataFrame({'test_string' : ['aba', 
                                         'abcababcabca', 
                                         'adfddfda', 
                                         'The Cat sat on the Mat',
                                         'The Dog sat on the Cat',
                                         'The Elephant sat on the Dog']})

In [None]:
# The string to be replaced must entirely match the string in the DataFrame.

# So, the following has no effect on the DataFrame values.
print(samples.replace({'test_string' : 'ab'}, 'XXXXXXXXX'))

print('====================')

# But the following does: aba is the full string in the first element of the dataframe.
print(samples.replace({'test_string' : 'aba'}, 'XXXXXXXXX'))

As well as exact text matching (seen above), `replace()` can accept regular expressions as the pattern to find, and the replacement string can refer to the matched groups as `\1`, `\2`, etc. The call looks like:
`replace(<pattern to find>, <replacement string>, regex=True)`
            
Notice in the result that the regular expression can match parts of the target string - it doesn't have to match the full string.

In [None]:
samples.replace({'test_string' : 'cab'}, 'TAXI', regex=True)

In [None]:
# Switch Who sat on What based on regular expression patterns.
samples.replace({'test_string' : "(The )([a-zA-Z]*)( sat on the )([a-zA-Z]*)$"},
                 r'\1\4\3\2',
                 regex=True)
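The same swap can be tried on individual strings with `re.sub()` from the `re` library, which is a handy way to test a pattern before applying it to a whole DataFrame. The single raw string `r'\1\4\3\2'` is equivalent to concatenating the four separate pieces:

```python
import re

pattern = r'(The )([a-zA-Z]*)( sat on the )([a-zA-Z]*)$'
sentences = ['The Cat sat on the Mat', 'The Dog sat on the Cat', 'aba']
# Strings that do not match the pattern are returned unchanged.
swapped = [re.sub(pattern, r'\1\4\3\2', s) for s in sentences]
print(swapped)  # ['The Mat sat on the Cat', 'The Cat sat on the Dog', 'aba']
```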


And if you want to make the replacement in the original DataFrame, the `replace()` method has an `inplace=True` option.

In [None]:
samples.replace({'test_string' : "(The )([a-zA-Z]*)( sat on the )([a-zA-Z]*)$"},
                r'\1\4\3\2',
                regex=True,
                inplace=True)
samples

## Summary: regular expressions

Pattern matching is a common task when finding things in strings. The regular expression patterns are used in a lot of programming language libraries, and applications such as OpenRefine, where string manipulation is required.

The ability to find strings, and then to manipulate them by changing or removing substrings or extracting matches for use in other code, is central to cleaning text data.

The `re` library documentation can be found at https://docs.python.org/3/library/re.html.


# SQL string matching
SQL doesn't have regular expressions, as such, although there may be additional libraries and packages available within a DBMS to support extended pattern matching.

SQL does have the `LIKE` boolean condition which uses a limited form of pattern, with wildcards, to compare against string values.  (You can read it as: `Is this string value LIKE this pattern?`)

Standard SQL has two `LIKE` wildcards:

- `%` matches any string of zero or more characters
- `_` matches any single character

Some implementations (for example Microsoft SQL Server) add two more:

- `[xyz]` matches any single character from the set (x or y or z), or a range such as `[a-f]`
- `[^xyz]` matches any single character _not_ within the specified set or range.

If you need to match a wildcard character literally, most SQL implementations allow you to follow the pattern string with the keyword `ESCAPE` and a quoted character that will, in that pattern, be used to escape the next character. So, `test_string LIKE '%-%' ESCAPE '-'` will be true for any character string ending in the `%` character.

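So, using _pandasql_ as our test SQL and the `samples` DataFrame created earlier, we can see the behaviour of SQL `LIKE` in the following. (Unfortunately, SQLite3 - the SQL engine underpinning _pandasql_ - does not support the `[]` forms. Note also that SQLite's `LIKE` is case-insensitive for ASCII letters by default, so `'%t'` would also match a string ending in `'T'`.)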


In [None]:
# Start by importing the sqldf function from pandasql.
from pandasql import sqldf

# Then create a simple wrapper function to allow us to supply 
# the query 'q' without the surrounding syntax.
pysqldf = lambda q: sqldf(q, globals())

In [None]:
# Find three-character strings with a 'b' in the middle.
query = ''' SELECT * FROM samples WHERE test_string LIKE '_b_'; '''
result = pysqldf(query)
result

In [None]:
# Find any string ending with a 't'.
query = ''' SELECT * FROM samples WHERE test_string LIKE '%t'; '''
result = pysqldf(query)
result

In [None]:
# Find any string with 'Dog' in it.
query = ''' SELECT * FROM samples WHERE test_string LIKE '%Dog%'; '''
result = pysqldf(query)
result

The LIKE condition can, of course, be used anywhere the other Boolean conditions can be used and in more complex Boolean expressions.

In [None]:
# Find any string with 'Cat' appearing before 'Dog'.
query = ''' SELECT * FROM samples WHERE test_string LIKE '%Cat%Dog%'; '''
result = pysqldf(query)
result
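To experiment with the `ESCAPE` clause, and with SQLite's case handling, without pandasql, Python's built-in `sqlite3` module can run the same kind of query against an in-memory table. The table rows and the helper `like_matches()` below are made up for illustration:

```python
import sqlite3

# An in-memory database holding a small, made-up table of strings.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE samples (test_string TEXT)')
conn.executemany('INSERT INTO samples VALUES (?)',
                 [('100%',), ('%off',), ('The Cat sat on the Mat',)])

def like_matches(pattern, escape=None):
    """Return the test_string values matching a LIKE pattern, optionally with ESCAPE."""
    sql = 'SELECT test_string FROM samples WHERE test_string LIKE ?'
    params = [pattern]
    if escape is not None:
        sql += ' ESCAPE ?'
        params.append(escape)
    sql += ' ORDER BY rowid'
    return [row[0] for row in conn.execute(sql, params)]

# '-' escapes the second '%', so this finds strings ending in a literal '%'.
print(like_matches('%-%', escape='-'))  # ['100%']

# SQLite's LIKE ignores case for ASCII letters by default.
print(like_matches('%cat%'))            # ['The Cat sat on the Mat']
```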

## Summary: SQL LIKE
Standard SQL doesn't have a large pool of pattern matching capabilities, which can make it tedious to use for character string processing.  It is always worth checking the  DBMS documentation to see if the basic patterns have been extended, or additional functions are available, to make the pattern capabilities richer.

## What next?

If you are working through this Notebook as part of an inline exercise, return to the module materials now.

If you are working through this set of Notebooks as a whole, move on to: `04.7 Reshaping data with pandas`.