# Regular expressions

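An interactive tester such as https://regex101.com/ (select the Python flavor) is handy for trying out the patterns below.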


## re.match()
- **Description:** Checks for a match **only at the beginning** of the string.
- **Syntax:** re.match(pattern, string, flags=0)
- **Returns:** A match object if the pattern matches at the start of the string; otherwise, None.



In [37]:
import re

matched = re.match(r"\d", "112 abc \n123")
if matched:
    print(matched)

<re.Match object; span=(0, 1), match='1'>
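
As a quick contrast, the sketch below shows re.match() returning None when digits are present but not at index 0. The sample strings are made up for illustration.

```python
import re

# re.match() only tries the pattern at index 0 of the string.
at_start = re.match(r"\d", "112 abc")
not_at_start = re.match(r"\d", "abc 112")

print(at_start.group())  # the matched text: 1
print(not_at_start)      # None, because the string starts with "a"
```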


## re.search()
- **Description:** Searches for a match **anywhere** in the string.
- **Syntax:** re.search(pattern, string, flags=0)
- **Returns:** A match object if the pattern is found; otherwise, None.

In [38]:

matched = re.search(r"\d", "112 abc \n123 xyz")
if matched:
    print(matched)

<re.Match object; span=(0, 1), match='1'>
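
The example above happens to match at index 0, so it looks the same as re.match(). A minimal sketch with an illustrative string whose digits start later makes the difference visible, and also shows the None check that real code usually needs.

```python
import re

sample = "abc 112 \n123 xyz"  # illustrative string; digits start at index 4

found = re.search(r"\d+", sample)
print(found.group())  # 112
print(found.span())   # (4, 7)

# re.search() returns None when nothing matches, so guard before using the result.
missing = re.search(r"\d+", "no digits here")
print(missing is None)  # True
```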


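## re.findall()
- **Description:** Returns a **list of all matches** of the pattern in the string.
- **Syntax:** re.findall(pattern, string, flags=0)
- **Returns:** A list of matched strings. If the pattern contains one group, the list holds that group's text; with several groups, it holds tuples of group texts.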

In [43]:
matches = re.findall(r"(\d)", "112 abc \n123 xyz")
for match in matches:
    print(match)

1
1
2
1
2
3
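
How groups change the shape of the result can be sketched as follows. The patterns are illustrative variations on the one used above.

```python
import re

sample = "112 abc \n123 xyz"

whole = re.findall(r"\d+", sample)          # no group: whole matches
first = re.findall(r"(\d)\d+", sample)      # one group: only the group text
pairs = re.findall(r"(\d+) (\w+)", sample)  # two groups: tuples

print(whole)  # ['112', '123']
print(first)  # ['1', '1']
print(pairs)  # [('112', 'abc'), ('123', 'xyz')]
```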


## re.finditer()
- **Description:** Returns an **iterator yielding match objects for all matches** of the pattern in the string.
- **Syntax:** re.finditer(pattern, string, flags=0)
- **Returns:** An iterator of match objects.

In [41]:
matches = re.finditer(r"\d", "112 abc \n123 xyz")
for match in matches:
    print(match)

<re.Match object; span=(0, 1), match='1'>
<re.Match object; span=(1, 2), match='1'>
<re.Match object; span=(2, 3), match='2'>
<re.Match object; span=(9, 10), match='1'>
<re.Match object; span=(10, 11), match='2'>
<re.Match object; span=(11, 12), match='3'>
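
Because each item is a match object, the position and the text can be read together. A small sketch using the same sample string:

```python
import re

sample = "112 abc \n123 xyz"

# Collect (start index, matched text) for every run of digits.
positions = [(m.start(), m.group()) for m in re.finditer(r"\d+", sample)]
print(positions)  # [(0, '112'), (9, '123')]
```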


## re.split()
- **Description:** **Splits** the string at each occurrence of the pattern.
- **Syntax:** re.split(pattern, string, maxsplit=0, flags=0)
- **Returns:** A list of substrings.

In [48]:
matches = re.split(r"\n|\t", "112 abc \n123 \txyz")
for match in matches:
    print(match)

112 abc 
123 
xyz
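
Two options of re.split() that the cell above does not use are maxsplit and capturing groups. A minimal sketch on the same string:

```python
import re

sample = "112 abc \n123 \txyz"

# maxsplit=1 stops after the first split.
limited = re.split(r"\n|\t", sample, maxsplit=1)
print(limited)  # ['112 abc ', '123 \txyz']

# A capturing group keeps the delimiters in the result list.
with_delims = re.split(r"(\n|\t)", sample)
print(with_delims)  # ['112 abc ', '\n', '123 ', '\t', 'xyz']
```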


## re.sub()
- **Description:** **Replaces occurrences** of the pattern with a specified string.
- **Syntax:** re.sub(pattern, repl, string, count=0, flags=0)
- **Returns:** A string with the substitutions applied.

In [51]:
result = re.sub(r"\n|\t", "**", "112 abc \n123 \txyz")
print(result)
# re.subn() does the same but returns a tuple: (new_string, number_of_substitutions)
result = re.subn(r"\n|\t", "**", "112 abc \n123 \txyz")
print(result)

112 abc **123 **xyz
('112 abc **123 **xyz', 2)
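
The repl argument can do more than insert a fixed string. The sketch below shows a backreference and a replacement function; both replacements are illustrative choices.

```python
import re

sample = "112 abc \n123 \txyz"

# \1 in the replacement refers to the text captured by the first group.
bracketed = re.sub(r"(\d+)", r"<\1>", sample)
print(bracketed)

# repl may also be a function that receives each match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), sample)
print(doubled)
```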


## re.fullmatch()
- **Description:** Checks if the entire string matches the pattern.
- **Syntax:** re.fullmatch(pattern, string, flags=0)
- **Returns:** A match object if the pattern matches the whole string; otherwise, None.

In [63]:
matches = re.fullmatch(r"\d+.*", "112 \t")
print(matches)

<re.Match object; span=(0, 5), match='112 \t'>
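
To see how re.fullmatch() differs from re.match(), the following sketch applies the same pattern to a string with trailing text. The sample strings are made up.

```python
import re

sample = "112 abc"

prefix = re.match(r"\d+", sample)      # matches '112' at the start
whole = re.fullmatch(r"\d+", sample)   # None: ' abc' is left over
exact = re.fullmatch(r"\d+", "112")    # the whole string is digits

print(prefix.group())  # 112
print(whole)           # None
print(exact.group())   # 112
```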


## re.compile()
- **Description:** Compiles a regular expression pattern into a regex object for repeated use.
- **Syntax:** re.compile(pattern, flags=0)
- **Returns:** A compiled pattern object (re.Pattern) whose methods, such as .match(), .search(), and .findall(), take the string directly.

In [73]:
expression = re.compile(r"\d+")
matches = expression.findall("112 abc \n123 \txyz")
print(matches)

['112', '123']
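
Flags are passed once at compile time and then apply to every call on the compiled object. A minimal sketch, with made-up log lines:

```python
import re

# re.IGNORECASE is set once here and reused by every search below.
word = re.compile(r"wifi", re.IGNORECASE)

lines = ["Wifi: connection success", "no network", "WIFI down"]  # made-up lines
hits = [line for line in lines if word.search(line)]
print(hits)  # ['Wifi: connection success', 'WIFI down']
```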


In [90]:
content = '''


^
kamal kumar mukiri (you)
    kamal-bec2004-@gmail.co.in
* *
^kamal 
Hariprasad. Tavarekere
tavarekere.hariprasad@gmail.com

mymail id kadiyalapavani04@gmailacom
       kadiyalapavani04@gmail.com

Kapil Tarani
kapil.tarani@gmail.com

Lakshmi Mocherla
lakshmi.mocherla@gmail.com

monikabm200@gmail.com
monikabm200@gmail.com

MOUNISHA KOMMURI
mounishakommuri@gmail.com

pavani kadiyala
pavanikadiyalachowdary@gmail.com

rajesh rama
rajeshramaraok@gmail.com

Ramana Pokala
ramana.pokala@gmail.com

Raviteja K
raviteja.kollapur@gmail.com

siddavatam salmafirdose
ssalmafirdose@gmail.com

mustafa123@outlook.com

'''
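# Mask only the first two e-mail addresses (count=2) with "***".
# Note: [comn\.i]* is a character class (any run of c, o, m, n, i or "."), not a list of domain endings.
new_content = re.sub(r"([\.\-\w]+)@(\w+\.[comn\.i]*)", "***", content, count=2)
print(new_content)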




^
kamal kumar mukiri (you)
    ***
* *
^kamal 
Hariprasad. Tavarekere
***

mymail id kadiyalapavani04@gmailacom
       kadiyalapavani04@gmail.com

Kapil Tarani
kapil.tarani@gmail.com

Lakshmi Mocherla
lakshmi.mocherla@gmail.com

monikabm200@gmail.com
monikabm200@gmail.com

MOUNISHA KOMMURI
mounishakommuri@gmail.com

pavani kadiyala
pavanikadiyalachowdary@gmail.com

rajesh rama
rajeshramaraok@gmail.com

Ramana Pokala
ramana.pokala@gmail.com

Raviteja K
raviteja.kollapur@gmail.com

siddavatam salmafirdose
ssalmafirdose@gmail.com

mustafa123@outlook.com




In [82]:
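import re

# Group 1 captures the local part, group 2 the domain.
expression = r"([\.\-\w]+)@(\w+\.[comn\.i]*)"
print(expression)

# With two groups, findall() returns one (local, domain) tuple per match.
# `content` is the multi-line string defined in the cell above.
mail_ids = re.findall(expression, content)
for mail_id in mail_ids:
    print(mail_id)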


([\.\-\w]+)@(\w+\.[comn\.i]*)
('kamal-bec2004-', 'gmail.co.in')
('tavarekere.hariprasad', 'gmail.com')
('kadiyalapavani04', 'gmail.com')
('kapil.tarani', 'gmail.com')
('lakshmi.mocherla', 'gmail.com')
('monikabm200', 'gmail.com')
('monikabm200', 'gmail.com')
('mounishakommuri', 'gmail.com')
('pavanikadiyalachowdary', 'gmail.com')
('rajeshramaraok', 'gmail.com')
('ramana.pokala', 'gmail.com')
('raviteja.kollapur', 'gmail.com')
('ssalmafirdose', 'gmail.com')
('mustafa123', 'outlook.com')
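
The domain part of this pattern only accepts the letters c, o, m, n, i and dots after the first dot, so other endings get truncated. The sketch below compares it with a hypothetical alternative pattern (not from the original notebook, and not a full address validator) on a made-up string.

```python
import re

original = r"([\.\-\w]+)@(\w+\.[comn\.i]*)"
# Hypothetical alternative: a name followed by one or more ".letters" labels.
alternative = r"([\w.\-]+)@([\w\-]+(?:\.[A-Za-z]{2,})+)"

sample = "user@example.org kamal-bec2004-@gmail.co.in"  # made-up test string

with_original = re.findall(original, sample)
with_alternative = re.findall(alternative, sample)

print(with_original)     # [('user', 'example.o'), ('kamal-bec2004-', 'gmail.co.in')]
print(with_alternative)  # [('user', 'example.org'), ('kamal-bec2004-', 'gmail.co.in')]
```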


In [86]:
text = "Kamal, Phone: +91-0000011111 I am working in Harman international, "
# text.replace("+91-0000011111", "+91-1234567899") would need the exact number;
# the pattern below matches any +NN-NNNNNNNNNN number surrounded by whitespace.
new_text = re.sub(r"\s\+\d{2}\-\d{10}\s", " +91-1234567899 ", text)
print(new_text)

Kamal, Phone: +91-1234567899 I am working in Harman international, 
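
A variation on the same substitution, sketched under the assumption that only part of the number should be hidden: groups keep the country code and the last four digits, and the middle six digits are masked.

```python
import re

text = "Kamal, Phone: +91-0000011111 I am working in Harman international, "

# Group 1 is the country code, group 2 the last four digits.
masked = re.sub(r"(\+\d{2})-\d{6}(\d{4})", r"\1-******\2", text)
print(masked)
```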


In [111]:
text = '''Wifi
Wifi: connection success
  fdsafds Wifi1234
Wifi: connection failed
fafdsafs
'''
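# Without re.MULTILINE, ^ anchors only at the very start of the string, so only the first "Wifi" matches.
# re.VERBOSE only allows whitespace and comments inside the pattern; it does not change what ^ means.
matches = re.findall(r"^Wifi", text, re.VERBOSE)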
print(matches)

['Wifi']
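
If the goal is to find "Wifi" at the start of every line, re.MULTILINE is the flag that changes the behavior of ^. A minimal sketch on the same text:

```python
import re

text = "Wifi\nWifi: connection success\n  fdsafds Wifi1234\nWifi: connection failed\nfafdsafs\n"

string_start = re.findall(r"^Wifi", text)                # ^ matches at index 0 only
line_starts = re.findall(r"^Wifi", text, re.MULTILINE)   # ^ matches after each newline too

print(string_start)  # ['Wifi']
print(line_starts)   # ['Wifi', 'Wifi', 'Wifi']
```

The third line starts with two spaces, so it does not count as a line beginning with "Wifi".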


In [22]:
import re
text = '''Wifi 12 line
Wifi: connection line success
  fdsafds line +91-1234567890
Wifi: connection failed
fafdsafs
'''

result = re.search(r"\s(\+\d{2})\-(\d{10})\s", text)
print(result.groups())

('+91', '1234567890')
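
Groups can also be named, which makes the result easier to read than positional indexes. In this sketch the names country and number are illustrative choices, and the text is shortened to the line that contains the phone number.

```python
import re

text = "  fdsafds line +91-1234567890\n"

result = re.search(r"\s(?P<country>\+\d{2})-(?P<number>\d{10})\s", text)
if result:
    print(result.group("country"))  # +91
    print(result.groupdict())       # {'country': '+91', 'number': '1234567890'}
```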


In [25]:
# re.compile() returns a reusable pattern object, not a match.
pattern = re.compile(r"\s(\+\d{2})\-(\d{10})\s")
print(type(pattern))

<class 're.Pattern'>


In [26]:
re.split(r":|line", text)

['Wifi 12 ',
 '\nWifi',
 ' connection ',
 ' success\n  fdsafds ',
 ' +91-1234567890\nWifi',
 ' connection failed\nfafdsafs\n']

In [33]:
import re

bill = '''
rice 1kg 100.5 per kg: 2kgs
oil 1kg 200 per kg: 2kgs
onion 1kg 20 per kg: 2.5kgs
wheat 1kg 120 per kg: 2kgs
'''
# Groups: item name, pack size (kg), price per kg, quantity bought (kg).
# The quantity group uses [\d.]+ so that decimal quantities such as 2.5 also match.
reg = re.compile(r"(\w+)\s+(\d)kg\s([\d.]+)\s\w+\skg:\s([\d.]+)kgs")
items = reg.findall(bill)
total = 0.0
print(items)
for item in items:
    print(item)
    # price per kg * quantity; both may be decimals, so convert both with float()
    total += float(item[-2]) * float(item[-1])
print(total)

[('rice', '1', '100.5', '2'), ('oil', '1', '200', '2'), ('onion', '1', '20', '2.5'), ('wheat', '1', '120', '2')]
('rice', '1', '100.5', '2')
('oil', '1', '200', '2')
('onion', '1', '20', '2.5')
('wheat', '1', '120', '2')
891.0
