# Regular Expressions

<details>    
<summary>
    <font size="4" color="darkgreen"><b>1. re.findall()</b></font>
</summary>
<p>
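<ul>Finds all non-overlapping matches of a pattern in the string and returns them as a list of strings, one string per match. If the pattern contains capturing groups, the list holds the group contents instead of the whole matches.</ul>
</p>
</details>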

In [None]:
import re
s = 'Please contact us at: support@textmining.com, info@textmining.com'
addresses = re.findall(r'[\w\.-]+@[\w\.-]+', s)
print(addresses)
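As a small illustration of how capturing groups change what `re.findall()` returns, the sketch below reuses the same sample string with and without parentheses in the pattern. The variable names `whole` and `pairs` are made up for this example.

```python
import re

s = 'Please contact us at: support@textmining.com, info@textmining.com'

# No groups: each list item is the whole match.
whole = re.findall(r'[\w.-]+@[\w.-]+', s)

# Two capturing groups: each list item is a (user, host) tuple.
pairs = re.findall(r'([\w.-]+)@([\w.-]+)', s)

print(whole)
print(pairs)
```

Inside a character class, `.` needs no backslash, so `[\w.-]` is equivalent to `[\w\.-]`.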

<details>    
<summary>
    <font size="4" color="darkgreen"><b>2. re.compile()</b></font>
</summary>
<p>
<ul>re.compile() turns a regular expression pattern into a pattern object. The object offers the same operations as the module-level functions (findall(), search(), sub(), and so on), so the pattern can be written once and reused.</ul>
</p>
</details>

In [None]:
s = 'Please contact us at: support@textmining.com, info@textmining.com'
email = re.compile(r'[\w\.-]+@[\w\.-]+')
addresses = email.findall(s)
print(addresses)
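Reuse is easiest to see when one compiled pattern is applied to several strings. The following is a minimal sketch; the `lines` list is hypothetical sample data, not part of the course material.

```python
import re

# Compile once, then reuse the pattern object on several strings.
email = re.compile(r'[\w.-]+@[\w.-]+')

# Hypothetical sample lines.
lines = [
    'Sales: sales@textmining.com',
    'No address on this line',
    'Help: support@textmining.com, info@textmining.com',
]

# One findall() call per line; a line without a match gives an empty list.
found = [email.findall(line) for line in lines]
print(found)
```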

<details>    
<summary>
    <font size="4" color="darkgreen"><b>3. re.search()</b></font>
</summary>
<p>
<ul>The re.search() function takes two arguments: a pattern and a string. It scans the string for the first location where the pattern matches.</ul>
<ul>If the search is successful, re.search() returns a match object; if not, it returns None.</ul>
</p>
</details>

In [None]:
txt = 'Text mining is fun'
x = re.search(r'^Text.*fun$', txt)
if x:
    print('Yes! We have a match!')
else:
    print('No match')
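The match object carries more than a yes/no answer. The sketch below, which searches for the made-up patterns `mining` and `boring`, shows the matched text, its position, and the `None` result for a failed search.

```python
import re

txt = 'Text mining is fun'
m = re.search(r'mining', txt)

if m is not None:
    # group() is the matched text; span() is its (start, end) position.
    print(m.group(), m.span())

# No match: search() returns None, so test the result before using it.
missing = re.search(r'boring', txt)
print(missing)
```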

<details>    
<summary>
    <font size="4" color="darkgreen"><b>4. re.search() vs re.match()</b></font>
</summary>
<p>
<ul>re.match() checks for a match only at the beginning of the string, whereas re.search() checks for a match anywhere in the string. In the two cells below, the same pattern fails with match() and succeeds with search().</ul>
</p>
</details>

In [None]:
s = 'Text mining is fun'
match_obj = re.match(r'fun', s)
if match_obj:
    print('Matched!')
else:
    print('No match')

In [None]:
s = 'Text mining is fun'
search_obj = re.search(r'fun', s)
if search_obj:
    print('Matched!')
else:
    print('No match')
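To round out the comparison, here is a short sketch that also brings in `re.fullmatch()`, which requires the pattern to cover the entire string. The result variable names are chosen only for this example.

```python
import re

s = 'Text mining is fun'

# match() only tries the start of the string.
match_start = re.match(r'Text', s) is not None
match_end = re.match(r'fun', s) is not None

# search() with a leading ^ is also tied to the start of the string.
anchored_search = re.search(r'^fun', s) is not None

# fullmatch() succeeds only if the pattern spans the whole string.
full = re.fullmatch(r'Text.*fun', s) is not None

print(match_start, match_end, anchored_search, full)
```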

<details>    
<summary>
    <font size="4" color="darkgreen"><b>5. Grouping in Regular Expressions</b></font>
</summary>
<p>
<ul>re.search() returns a match object on success. Calling group(num) or groups() on the match object retrieves the matched text.</ul>
<ul>Groups let you pick out parts of the matched text. Each part of a regular expression enclosed in parentheses () is a group. Groups are numbered from 1 in the order of their opening parentheses, and group(0), or group() with no argument, is the whole match.</ul>
</p>
</details>

In [None]:
s = "Please contact us at: support@textmining.com"
match = re.search(r'([\w\.-]+)@([\w\.-]+)', s)
print(match.group())   # The whole matched text
print(match.group(1))  # The username (group 1)
print(match.group(2))  # The host (group 2)
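Besides numbered access, groups can be read all at once or by name. This sketch assumes the group names `user` and `host`, which are invented here for illustration.

```python
import re

s = 'Please contact us at: support@textmining.com'
m = re.search(r'(?P<user>[\w.-]+)@(?P<host>[\w.-]+)', s)

if m:
    print(m.groups())       # all groups as a tuple
    print(m.group('user'))  # access a group by name
    print(m.groupdict())    # mapping of group name to matched text
```

Named groups tend to keep longer patterns readable, because inserting a new group does not shift the names the way it shifts the numbers.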

<details>    
<summary>
    <font size="4" color="darkgreen"><b>6. re.sub()</b></font>
</summary>
<p>
<ul>re.sub(pattern, repl, string) returns a new string in which every match of pattern is replaced with repl. The original string is left unchanged.</ul>
</p>
</details>

In [None]:
phone = "2004-959-559 # This is Phone Number"
num = re.sub(r'#.*$', '', phone)  # Delete Python-style comments
print(num)
num = re.sub(r'\D', '', phone)  # Remove anything other than digits
print(num)
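The replacement string can also refer back to captured groups, and the optional `count` argument limits how many matches are replaced. A minimal sketch, using a hypothetical date string:

```python
import re

date = '2024-03-15'  # hypothetical sample value

# \1, \2, \3 refer to the captured groups, reordered here to DD/MM/YYYY.
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', date)

# count=1 replaces only the first match.
first_only = re.sub(r'-', '/', date, count=1)

print(swapped)
print(first_only)
```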

## Exercise

'china_bond.csv' is a CSV file whose first column holds an id and whose second column holds a title. Open the file to see its contents.

Write a Python program that reads this file, extracts the firm name from each title, and stores the extracted values in a CSV file named 'output.csv'. The output file should be identical to 'expected.csv'.

#### Options and Hints
- If you would like more realistic practice, don't open the hints below. Try to think the problem through and implement it yourself.
- If you want a little help, click on the green 'General Hints' section for some hints to get started.
- If you would prefer more guidance, click on the green 'Detailed Hints' section for step-by-step instructions.

<details>    
<summary>
    <font size="3" color="darkgreen"><b>General Hints</b></font>
</summary>
<p>
    
General Hints to get started
<ul>
    <li>Use a with statement for reading from and writing to a csv file</li>
    <li>Use re.search(pattern, string)</li>
</ul>
</p>
</details>
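For readers unfamiliar with the `with` statement and the `csv` module, here is a generic sketch of the read-then-write pattern. The file names, the rows, and the uppercase transformation are all hypothetical and unrelated to the exercise data.

```python
import csv
import os
import tempfile

# Hypothetical file names and rows, used only to show the pattern.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'sample_in.csv')
dst = os.path.join(tmp_dir, 'sample_out.csv')

with open(src, 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows([['id', 'title'], ['1', 'first'], ['2', 'second']])

# Each with block closes its files automatically, even if an error occurs.
with open(src, newline='', encoding='utf-8') as fin, \
     open(dst, 'w', newline='', encoding='utf-8') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    next(reader)  # skip the header row
    for row_id, title in reader:
        writer.writerow([title.upper()])

with open(dst, newline='', encoding='utf-8') as f:
    result = list(csv.reader(f))
print(result)
```

Passing `newline=''` to `open()` is what the `csv` module documentation recommends, so that line endings are handled by the reader and writer themselves.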

<details>    
<summary>
    <font size="3" color="darkgreen"><b>Detailed Hints</b></font>
</summary>
<p>     
Detailed hints if you're stuck
<ul>
    <li>Use a with statement for reading from and writing to a csv file</li>
    <li>Use next() to skip the header</li>
    <li>Use a for loop to read the file line by line</li>
    <li>Look for patterns to the left of the firm name; use '|' and () in the regular expression</li>
    <li>Use non-greedy quantifiers</li>
    <li>Use re.search(pattern, string)</li>
    <li>Apply group(num) or groups() to the match object</li>
    <li>Use writerow() to write the extracted firm name into the csv file</li>
</ul>
</p>
</details>
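The difference between greedy and non-greedy quantifiers, combined with `|` inside `()`, can be sketched on a made-up sentence. The string and the markers `from`, `by`, and `on` are assumptions for illustration only; the real titles in 'china_bond.csv' will need their own patterns.

```python
import re

# Hypothetical sentence, not taken from china_bond.csv.
title = 'Notice from Acme Corp on bonds and on ratings'

# (from|by) offers two alternative markers to the left of the name.
# Group 1 is the marker, group 2 is the text captured after it.
greedy = re.search(r'(from|by) (.+) on ', title)
lazy = re.search(r'(from|by) (.+?) on ', title)

print(greedy.group(2))  # greedy .+ runs to the last ' on '
print(lazy.group(2))    # non-greedy .+? stops at the first ' on '
```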

In [None]:
### START CODE HERE ###

### END CODE HERE ###