# Regular Expression
- [source](https://github.com/codebasics/nlp-tutorials/blob/main/1_regex/regex_for_information_extraction.ipynb)
- [regex practice](https://regex101.com/)

In [2]:
import re

## (1) Regex in customer support


### Retrieve phone number
There are two phone number patterns:
- xxxxxxxxxx : "\d{10}"
- (xxx)-xxx-xxxx : "\(\d{3}\)-\d{3}-\d{4}"

In [11]:
chat1 = 'codebasics: you ask lot of questions 😠  1235678912 or 9998881234, abcA@xyz.com'
chat2 = 'codebasics: here it is: (123)-567-8912, abc_82@xyz.in'
chat3 = 'codebasics: yes, phone: 1235678912 email: abc@xyz.com'

In [9]:
phone_pattern = r"\d{10}|\(\d{3}\)-\d{3}-\d{4}"
print(re.findall(phone_pattern, chat1))
print(re.findall(phone_pattern, chat2))
print(re.findall(phone_pattern, chat3))

['1235678912', '9998881234']
['(123)-567-8912']
['1235678912']
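
Note that `\d{10}` also matches the first ten digits of any longer digit run, such as a 12-digit order number. Below is a minimal sketch of one way to guard against that, assuming lookarounds are acceptable for this use case. The name `strict_phone_pattern` and the sample string are hypothetical, introduced only for illustration.

```python
import re

# Original pattern from the cell above, written as a raw string.
loose_phone_pattern = r"\d{10}|\(\d{3}\)-\d{3}-\d{4}"

# Hypothetical stricter variant (an assumption, not part of the source notebook):
# (?<!\d) and (?!\d) require that no digit sits directly before or after
# the 10-digit run, so longer digit sequences are skipped entirely.
strict_phone_pattern = r"(?<!\d)\d{10}(?!\d)|\(\d{3}\)-\d{3}-\d{4}"

# Made-up sample containing a 12-digit number and two phone numbers.
sample = "order 412889912345, call 1235678912 or (123)-567-8912"

loose = re.findall(loose_phone_pattern, sample)
strict = re.findall(strict_phone_pattern, sample)

print(loose)   # includes a 10-digit slice of the 12-digit number
print(strict)  # only the two phone numbers
```

The lookarounds do not consume characters, so the matched text is still just the phone number itself.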


### Retrieve email
- xxx@xxx.com or xxx@xxx.in : "[a-z0-9A-Z_]+@[a-z0-9A-Z_]+\.(?:com|in)"

In [19]:
mail_pattern = r"[a-z0-9A-Z_]+@[a-z0-9A-Z_]+\.(?:com|in)"
print(re.findall(mail_pattern, chat1))
print(re.findall(mail_pattern, chat2))
print(re.findall(mail_pattern, chat3))

['abcA@xyz.com']
['abc_82@xyz.in']
['abc@xyz.com']
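
The character class used above has no dot, plus sign, or hyphen, and only `.com` and `.in` domains match. Below is a hedged sketch of a somewhat broader pattern. It is an assumption for illustration, not a complete email validator, and `broad_mail_pattern` and the third sample string are hypothetical.

```python
import re

# Hypothetical broader pattern (not from the source notebook):
# - local part: letters, digits, and . _ % + -
# - domain: letters, digits, dots, hyphens, ending in a TLD of 2+ letters
broad_mail_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

samples = [
    "codebasics: you ask lot of questions 1235678912 or 9998881234, abcA@xyz.com",
    "codebasics: here it is: (123)-567-8912, abc_82@xyz.in",
    "reach me at first.last+tag@mail.example.org",  # made-up address
]

results = [re.findall(broad_mail_pattern, s) for s in samples]
for r in results:
    print(r)
```

A broader pattern accepts more real-world addresses but may also accept some invalid ones, so which pattern fits depends on how clean the chat text is.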


### Retrieve order number
- order xxx order_number : "order[^\d]*(\d*)"

In [20]:
chat1='codebasics: Hello, I am having an issue with my order # 412889912'
chat2='codebasics: I have a problem with my order number 412889912'
chat3='codebasics: My order 412889912 is having an issue, I was charged 300$ when online it says 280$'

In [21]:
order_pattern = r"order[^\d]*(\d*)"
print(re.findall(order_pattern, chat1))
print(re.findall(order_pattern, chat2))
print(re.findall(order_pattern, chat3))

['412889912']
['412889912']
['412889912']
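
`re.findall` returns an empty list when nothing matches, and `(\d*)` can capture an empty string when no digits follow `order`. Below is a minimal sketch of a helper that returns `None` in those cases. The name `extract_order_number` is hypothetical, and requiring at least one digit and ignoring case are assumptions that go beyond the original pattern.

```python
import re

def extract_order_number(text):
    """Return the first run of digits after the word 'order', or None."""
    # \d+ (instead of \d*) requires at least one digit;
    # re.IGNORECASE also accepts "Order" or "ORDER" (an added assumption).
    match = re.search(r"order[^\d]*(\d+)", text, flags=re.IGNORECASE)
    return match.group(1) if match else None

print(extract_order_number("Hello, I am having an issue with my Order # 412889912"))
print(extract_order_number("I would like to cancel my order"))
```

The first call prints the order number as a string, and the second prints `None` because no digits follow the word.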


## (2) Regex for Information Extraction
- **Note**: try web scraping

In [1]:
info_1='''
Born	Elon Reeve Musk
June 28, 1971 (age 50)
Pretoria, Transvaal, South Africa
Citizenship
South Africa (1971–present)
Canada (1971–present)
United States (2002–present)
Education	University of Pennsylvania (BS, BA)
Title
Founder, CEO and Chief Engineer of SpaceX
CEO and product architect of Tesla, Inc.
Founder of The Boring Company and X.com (now part of PayPal)
Co-founder of Neuralink, OpenAI, and Zip2
Spouse(s)
Justine Wilson
(m. 2000; div. 2008)
Talulah Riley
(m. 2010; div. 2012)
(m. 2013; div. 2016)
'''

In [4]:
def celeb_info(text):
    # Raw strings keep backslashes intact for the regex engine.
    age_pattern = r'age (\d+)'                # digits after "age "
    born_pattern = r'Born(.*)\n'              # rest of the "Born" line
    birth_date_pattern = r'Born.*\n(.*) \('   # next line, up to " ("
    birth_place_pattern = r'age.*\n(.*)'      # line after the one containing "age"

    return {
        'age': int(re.findall(age_pattern, text)[0]),
        'name': re.findall(born_pattern, text)[0].strip(),
        'birth_date': re.findall(birth_date_pattern, text)[0],
        'birth_place': re.findall(birth_place_pattern, text)[0],
    }

celeb_info(info_1)

{'age': 50,
 'name': 'Elon Reeve Musk',
 'birth_date': 'June 28, 1971',
 'birth_place': 'Pretoria, Transvaal, South Africa'}

In [5]:
info_2 = '''
Born	Mukesh Dhirubhai Ambani
19 April 1957 (age 64)
Aden, Colony of Aden
(present-day Yemen)[1][2]
Nationality	Indian
Alma mater
St. Xavier's College, Mumbai
Institute of Chemical Technology (B.E.)
Stanford University (drop-out)
Occupation	Chairman and MD, Reliance Industries
Spouse(s)	Nita Ambani (m. 1985)[3]
Children	3
Parent(s)
Dhirubhai Ambani (father)
Kokilaben Ambani (mother)
Relatives	Anil Ambani (brother)
Tina Ambani (sister-in-law)
'''

In [6]:
celeb_info(info_2)

{'age': 64,
 'name': 'Mukesh Dhirubhai Ambani',
 'birth_date': '19 April 1957',
 'birth_place': 'Aden, Colony of Aden'}
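
Indexing `re.findall(...)[0]` raises `IndexError` when a field is missing from the text. Below is a minimal sketch of a more forgiving variant that returns `None` for absent fields, assuming that behaviour is preferable to an exception. The names `first_group` and `celeb_info_safe` are hypothetical, and the sample strings are shortened for illustration.

```python
import re

def first_group(pattern, text):
    """Return the first capture group of the first match (stripped), or None."""
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None

def celeb_info_safe(text):
    # Same patterns as celeb_info, but missing fields become None.
    age = first_group(r"age (\d+)", text)
    return {
        "age": int(age) if age is not None else None,
        "name": first_group(r"Born(.*)\n", text),
        "birth_date": first_group(r"Born.*\n(.*) \(", text),
        "birth_place": first_group(r"age.*\n(.*)", text),
    }

# Shortened sample in the same layout as info_1.
sample = "Born\tElon Reeve Musk\nJune 28, 1971 (age 50)\nPretoria, Transvaal, South Africa\n"

full = celeb_info_safe(sample)
empty = celeb_info_safe("Occupation\tEngineer\n")  # no "Born" or "age" lines
print(full)
print(empty)
```

`re.search` stops at the first match, which mirrors taking element `[0]` of the `re.findall` result without building the full list.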