# Regex and `.finditer()` demo

The `finditer()` method lets you loop over every match in a string and do something with each one. Write a block of code that searches the contents of the `mbox-short.txt` file for IP addresses, then writes each one to a file called `ip-addresses.txt`. Add the output file to your repo with the notebook.

This notebook contains a possible solution for Q5 in [Lab 4]().

It may help to list the steps you need to accomplish, for example:

* Import Python's regular expression module (`re`)
* Figure out a regex pattern that will match an IP address
* Set up that pattern for use in Python
* Locate the data you want to search (in this case, the `mbox-short.txt` file)
* Open the file and search for matches in the text
* Identify the matches 
* Save those matches to a new file

In [1]:
import re

In [2]:
ip_regex_finder = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
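
The pattern above matches any four dot-separated runs of one to three digits, so it also accepts strings such as `999.1.1.1` that are not valid IPv4 addresses. As a minimal sketch, assuming stricter checking is wanted (the lab question does not ask for it), each candidate could be passed through the standard-library `ipaddress` module. The helper name `is_valid_ipv4` and the sample text are hypothetical, made up for illustration.

```python
import ipaddress
import re

ip_regex_finder = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def is_valid_ipv4(candidate):
    """Return True if candidate parses as an IPv4 address (hypothetical helper)."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False

# Made-up sample text: one valid address and one with an out-of-range octet
sample = 'Received: from 141.211.14.90 by 999.1.1.1'

candidates = ip_regex_finder.findall(sample)           # everything the regex matches
valid = [c for c in candidates if is_valid_ipv4(c)]    # only the parseable addresses

print(candidates)
print(valid)
```

The regex stays simple and readable, and the range check is left to a module designed for it.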

In [3]:
!rm -f ip-addresses.txt

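The following code uses `.finditer()` to loop over the matches, prints each match object, and writes
each matched string to a file opened in a `with open(...)` block.

In [6]:
# Read the whole input file into a string; the `with` block closes it afterwards
with open('data/emails/mbox-short.txt') as infile:
    text = infile.read()

# Open the output file once and write one matched address per line
with open('ip-addresses.txt', 'w') as f:
    for ip_addr in ip_regex_finder.finditer(text):
        print(ip_addr)
        f.write(ip_addr.group() + '\n')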

<re.Match object; span=(146, 159), match='141.211.14.90'>
<re.Match object; span=(462, 475), match='141.211.14.79'>
<re.Match object; span=(632, 646), match='194.35.219.184'>
<re.Match object; span=(775, 784), match='127.0.0.1'>
<re.Match object; span=(1038, 1052), match='194.35.219.182'>
<re.Match object; span=(1288, 1302), match='134.68.220.122'>
<re.Match object; span=(1487, 1496), match='127.0.0.1'>
<re.Match object; span=(1528, 1539), match='8.12.11.200'>
<re.Match object; span=(1716, 1727), match='8.12.11.200'>
<re.Match object; span=(3343, 3356), match='141.211.14.97'>
<re.Match object; span=(3665, 3679), match='141.211.93.149'>
<re.Match object; span=(3836, 3850), match='194.35.219.184'>
<re.Match object; span=(3981, 3990), match='127.0.0.1'>
<re.Match object; span=(4244, 4258), match='194.35.219.182'>
<re.Match object; span=(4494, 4508), match='134.68.220.122'>
<re.Match object; span=(4693, 4702), match='127.0.0.1'>
<re.Match object; span=(4734, 4745), match='8.12.11.200'>
...
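
Each item printed in the output above is a `re.Match` object, not a string. The following small sketch uses a made-up sample string rather than the lab file, and shows the methods that pull the matched text and its position out of each match:

```python
import re

ip_regex_finder = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

# Made-up sample line, not taken from mbox-short.txt
sample = 'from 127.0.0.1 to 8.12.11.200'

results = []
for match in ip_regex_finder.finditer(sample):
    # .group() is the matched text; .span() is its (start, end) position
    results.append((match.group(), match.span()))
    print(match.group(), match.start(), match.end())
```

Only `.group()` is needed to write the addresses to a file; `.span()`, `.start()` and `.end()` are useful when the location of a match matters.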

## Comparison to `.findall()`

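Note that `.findall()` returns a list of all the matches at once, which is convenient if you want
to keep the results for later use. `.finditer()` instead yields one match object at a time, which
gives access to details such as each match's position. There are also some differences to note when
using groups.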
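
As a rough sketch of those differences, the snippet here runs both methods on a made-up sample string (not the lab file), with and without capture groups. The variable names are chosen for illustration only.

```python
import re

sample = 'from 127.0.0.1 to 8.12.11.200'  # made-up sample text

plain = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
grouped = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})')

# No groups: findall() returns the full matched strings
no_groups = plain.findall(sample)

# With groups: findall() returns tuples of the groups, without the full match
with_groups = grouped.findall(sample)

# finditer() yields Match objects, so the full match and each group are both available
from_iter = [(m.group(0), m.group(1)) for m in grouped.finditer(sample)]

print(no_groups)
print(with_groups)
print(from_iter)
```

If both the whole address and its parts are needed, `.finditer()` with groups avoids having to rejoin the tuple that `.findall()` returns.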

Below, the pattern is rewritten with capture groups, so `.findall()` returns one tuple per match,
and any element of a tuple can be referenced individually by its index.

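In [14]:
# Each pair of parentheses is a capture group, one per part of the address
ip_regex_finder_groups = re.compile(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})')

# With groups in the pattern, findall() returns a list of tuples of strings
with open('data/emails/mbox-short.txt') as infile:
    ip_addr_list = ip_regex_finder_groups.findall(infile.read())

for address in ip_addr_list:
    print('Position 1:', address[0], 'Position 2:', address[1])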

Position 1: 141 Position 2: 211
Position 1: 141 Position 2: 211
Position 1: 194 Position 2: 35
Position 1: 127 Position 2: 0
Position 1: 194 Position 2: 35
Position 1: 134 Position 2: 68
Position 1: 127 Position 2: 0
Position 1: 8 Position 2: 12
Position 1: 8 Position 2: 12
Position 1: 141 Position 2: 211
Position 1: 141 Position 2: 211
Position 1: 194 Position 2: 35
Position 1: 127 Position 2: 0
Position 1: 194 Position 2: 35
Position 1: 134 Position 2: 68
Position 1: 127 Position 2: 0
Position 1: 8 Position 2: 12
Position 1: 8 Position 2: 12
Position 1: 141 Position 2: 211
Position 1: 141 Position 2: 211
Position 1: 194 Position 2: 35
Position 1: 127 Position 2: 0
Position 1: 194 Position 2: 35
Position 1: 134 Position 2: 68
Position 1: 127 Position 2: 0
Position 1: 8 Position 2: 12
Position 1: 8 Position 2: 12
Position 1: 141 Position 2: 211
Position 1: 141 Position 2: 211
Position 1: 194 Position 2: 35
Position 1: 127 Position 2: 0
Position 1: 194 Position 2: 35
Position 1: 134 Pos