## Email / Phone extractor (Regex)
Manual:
1.) Copy the text from the site to the clipboard.<br>
2.) Find all phone numbers and email addresses in the text.<br>
3.) Paste them onto the clipboard.<br>
Code:<br>
1.) Use the pyperclip module to copy and paste strings.<br>
2.) Create two regexes, one for matching phone numbers and the other for
matching email addresses.<br>
<li>2.a) A phone number consists of an area code, a separator, the first 3 digits, a separator, the last 4 digits, and an optional extension.</li>
3.) Find all matches, not just the first match, of both regexes.<br>
4.) Neatly format the matched strings into a single string to paste.<br>
5.) Display some kind of message if no matches were found in the text.<br>

In [3]:
import re         # regular expressions
import pyperclip  # clipboard copy and paste

### Define the two regexes: a phone number extractor and an email extractor

In [4]:
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?               # optional area code: 415 or (415)
    (\s|-|\.)?                       # optional separator: whitespace, hyphen, or dot
    (\d{3})                          # first 3 digits
    (\s|-|\.)                        # separator
    (\d{4})                          # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # optional extension: ext, x, or ext. followed by 2-5 digits
    )''', re.VERBOSE)
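
To see how a verbose phone pattern like this one breaks a number into groups, here is a minimal sketch. The name `demo_phone_regex` and the sample sentence are made up for illustration; the group numbers count opening parentheses from left to right.

```python
import re

# Hypothetical demo pattern, laid out like the verbose regex above.
demo_phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?               # group 2: optional area code
    (\s|-|\.)?                       # group 3: optional separator
    (\d{3})                          # group 4: first 3 digits
    (\s|-|\.)                        # group 5: separator
    (\d{4})                          # group 6: last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?  # groups 7-9: optional extension
    )''', re.VERBOSE)

# Made-up sample text, not taken from the contact page.
match = demo_phone_regex.search('Call (415) 555-1234 ext. 42 today')

print(match.group(1))  # (415) 555-1234 ext. 42
print(match.group(2))  # (415)
print(match.group(9))  # 42
```

With `re.VERBOSE`, whitespace and `#` comments inside the pattern are ignored, which is why the pattern can be spread over several lines.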

In [8]:
phoneRegexSimple = re.compile(r'((\d{3}|\(\d{3}\))?(\s|-|\.)?(\d{3})(\s|-|\.)(\d{4})(\s*(ext|x|ext\.)\s*(\d{2,5}))?)') # single-line version of the phone pattern, without re.VERBOSE; the extension is optional

In [9]:
emailRegex = re.compile(r'''(
[a-zA-Z0-9._%+-]+        # username: one or more letters, digits, or . _ % + -
@                        # @ symbol
[a-zA-Z0-9_-]+           # domain name: letters, digits, underscore, or hyphen
\.[a-zA-Z]{2,6}          # dot followed by a 2-6 letter ending such as .com
)''', re.VERBOSE)
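
When writing character classes like the ones in this pattern, keep in mind that a comma inside square brackets is a literal character, not a separator between ranges. The short sketch below shows the difference; the names `with_commas` and `without_commas` and the sample string are made up for illustration.

```python
import re

with_commas = re.compile(r'[a-z,A-Z,0-9]+')   # the commas are part of the class
without_commas = re.compile(r'[a-zA-Z0-9]+')  # ranges written back to back

sample = 'alice,bob'

print(with_commas.findall(sample))     # ['alice,bob']
print(without_commas.findall(sample))  # ['alice', 'bob']
```

A class with stray commas would let a comma-separated list of addresses run together into a single match.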

### Find all emails in the clipboard text. First copy the contents of this page to the clipboard: https://nostarch.com/contactus

In [39]:
emailList = []

In [42]:
text = str(pyperclip.paste()) # read the current clipboard contents into a string

In [43]:
text

'Skip to main content\nHome\nSearch form\nSearch\nCatalog\nBlog\nMedia\nWrite for Us\nAbout Us\nContact Us\nTopics\nArt & Design\nGeneral Computing\nHacking & Computer Security\nHardware / DIY\nKids\nLEGO®\nLinux & BSD\nManga\nProgramming\nPython\nScience & Math\nScratch\nSystem Administration\nEarly Access\nFree ebook edition with every print book purchased from nostarch.com!\nShopping cart\n0 Items\tTotal: $0.00\nUser login\nLog in\nCreate account\nContact Us\n\nNo Starch Press, Inc.\n245 8th Street\nSan Francisco, CA 94103 USA\nPhone: 800.420.7240 or +1 415.863.9900 (9 a.m. to 5 p.m., M-F, PST)\nFax: +1 415.863.9950\n\nReach Us by Email\n\nGeneral inquiries: info@nostarch.com\nMedia requests: media@nostarch.com\nAcademic requests: academic@nostarch.com (Please see this page for academic review requests)\nHelp with your order: info@nostarch.com\nReach Us on Social Media\nTwitter\nFacebook\nInstagram\nPinterest\n\nNavigation\nMy account\nWant sweet deals? \nSign up for our newsletter.

In [44]:
emailRegex.search(text) # sanity check: search() returns the first match, or None if there is none

<re.Match object; span=(627, 644), match='info@nostarch.com'>

In [47]:
for email in emailRegex.findall(text): # findall returns every match, not just the first
    emailList.append(email) # collect each address in the list

In [48]:
emailList

['info@nostarch.com',
 'media@nostarch.com',
 'academic@nostarch.com',
 'info@nostarch.com']

In [50]:
print('\n'.join(emailList)) # join the addresses into one string, one per line

info@nostarch.com
media@nostarch.com
academic@nostarch.com
info@nostarch.com


### Appending phone numbers to the list:

In [18]:
phoneRegexSimple.search(text) # check to see we have a number

<re.Match object; span=(506, 518), match='800.420.7240'>

In [34]:
phoneRegexSimple.findall(text) # the pattern has several groups, so findall returns one tuple of group strings per match

[('800.420.7240', '800', '.', '420', '.', '7240', '', '', ''),
 ('415.863.9900', '415', '.', '863', '.', '9900', '', '', ''),
 ('415.863.9950', '415', '.', '863', '.', '9950', '', '', '')]
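
The shape of what `findall` returns depends on how many groups the pattern contains. A minimal sketch with two simplified, hypothetical patterns (`no_groups` and `with_groups`) and a shortened sample line:

```python
import re

sample = 'Phone: 800.420.7240 or 415.863.9900'

no_groups = re.compile(r'\d{3}\.\d{3}\.\d{4}')          # no groups: list of strings
with_groups = re.compile(r'(\d{3})\.(\d{3})\.(\d{4})')  # several groups: list of tuples

print(no_groups.findall(sample))    # ['800.420.7240', '415.863.9900']
print(with_groups.findall(sample))  # [('800', '420', '7240'), ('415', '863', '9900')]

# finditer yields match objects, so the full match and the groups are both available.
full_matches = [m.group(0) for m in with_groups.finditer(sample)]
print(full_matches)                 # ['800.420.7240', '415.863.9900']
```

Groups that do not take part in a match show up as empty strings in the tuples, which explains the trailing `''` entries in the output above.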

In [35]:
phoneList = [] # create empty list

In [36]:
for phones in phoneRegexSimple.findall(text):
    phoneNum = '-'.join([phones[1], phones[3], phones[5]]) #keeping numbers separated by hyphen
    phoneList.append(phoneNum)


In [37]:
phoneList

['800-420-7240', '415-863-9900', '415-863-9950']

In [38]:
print('\n'.join(phoneList)) # join the numbers neatly with line break

800-420-7240
415-863-9900
415-863-9950
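
Joining groups 1, 3, and 5 works for the numbers on this page, but a number without an area code would come out with a leading hyphen, and an area code written as (415) would keep its parentheses. The sketch below is one possible way to handle those cases. The helper `format_phone`, the pattern name `demo_phone_regex`, the sample text, and the ` x42` style for extensions are all assumptions made for illustration, not part of the original notebook.

```python
import re

# Hypothetical single-line phone pattern with the same nine-group layout as above.
demo_phone_regex = re.compile(
    r'((\d{3}|\(\d{3}\))?(\s|-|\.)?(\d{3})(\s|-|\.)(\d{4})'
    r'(\s*(ext|x|ext\.)\s*(\d{2,5}))?)'
)

def format_phone(groups):
    """Build a hyphen-separated number from one findall tuple (hypothetical helper)."""
    area, first3, last4, ext = groups[1], groups[3], groups[5], groups[8]
    # Drop parentheses around the area code and skip it entirely when it is missing.
    parts = [area.strip('()'), first3, last4]
    number = '-'.join(part for part in parts if part)
    if ext:
        number += ' x' + ext  # arbitrary choice of extension format
    return number

# Made-up sample text.
sample = 'Call (415) 555-1234 ext. 42 or 555-9876'
formatted = [format_phone(groups) for groups in demo_phone_regex.findall(sample)]
print(formatted)  # ['415-555-1234 x42', '555-9876']
```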


### Now we will add both emails and phone numbers into one list - allMatches

In [60]:
allMatches = []

In [61]:
for phones in phoneRegexSimple.findall(text):
    phoneMatch = '-'.join([phones[1], phones[3], phones[5]]) # area code, first 3 digits, last 4 digits
    allMatches.append(phoneMatch)
for email in emailRegex.findall(text):
    allMatches.append(email)

if allMatches: # a non-empty list is truthy
    pyperclip.copy('\n'.join(allMatches)) # put the results on the clipboard
    print('copied to clipboard')
    print('\n'.join(allMatches))
else:
    print('no match found')
    

copied to clipboard
800-420-7240
415-863-9900
415-863-9950
info@nostarch.com
media@nostarch.com
academic@nostarch.com
info@nostarch.com
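
The result above lists info@nostarch.com twice because the address appears twice on the page. As a possible refinement, the whole extraction can be wrapped in a function that works on any string and drops repeats while keeping the original order. This is only a sketch: `extract_contacts`, `phone_re`, and `email_re` are hypothetical names, the patterns are simplified (no extension handling), and the sample text is a shortened version of the page contents.

```python
import re

# Simplified, hypothetical patterns: phone without extension, email without an outer group.
phone_re = re.compile(r'(\d{3}|\(\d{3}\))?(\s|-|\.)?(\d{3})(\s|-|\.)(\d{4})')
email_re = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9_-]+\.[a-zA-Z]{2,6}')

def extract_contacts(text):
    """Return phone numbers, then emails, found in text, without duplicates."""
    phones = [
        '-'.join(part for part in (g[0].strip('()'), g[2], g[4]) if part)
        for g in phone_re.findall(text)
    ]
    emails = email_re.findall(text)
    # dict.fromkeys keeps the first occurrence of each item and preserves order.
    return list(dict.fromkeys(phones + emails))

sample = ('Phone: 800.420.7240 or +1 415.863.9900\n'
          'General inquiries: info@nostarch.com\n'
          'Help with your order: info@nostarch.com')

contacts = extract_contacts(sample)
print('\n'.join(contacts) if contacts else 'no match found')
```

Because the function takes a string, it can be used as `extract_contacts(pyperclip.paste())` with the clipboard, or tested directly on sample text.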
