# Advanced Regular Expression Assignments
# Assignment 1: Extracting Phone Numbers
Raw Text: Extract all valid Pakistani phone numbers from a given text.

Example:

Text: Please contact me at 0313-2964861 or 0335-0226203 for further details.

In [1]:
import re
text_1 = """
Please contact me at 0313-2964861 or 0335-0226203 for further details.
"""
# A 3-4 digit code starting with 0, a hyphen, then a 7-8 digit subscriber number
pattern_1 = r"\b(0\d{2,3}-\d{7,8})\b"
phone_number = re.findall(pattern_1, text_1)
phone_number

['0313-2964861', '0335-0226203']
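
Real text often writes the same number without the hyphen or with the `+92` country code. The sketch below is one way to handle that, assuming mobile numbers are `03XX` followed by seven digits and that `+92` replaces the leading `0`. The helper name `normalize_mobile` is hypothetical and landlines are not covered.

```python
import re

# Assumption: a mobile number is 03XX plus seven digits, optionally written
# with the +92 country code in place of the leading 0.
MOBILE_RE = re.compile(r"(?<!\d)(?:\+92[-\s]?|0)(3\d{2})[-\s]?(\d{7})(?!\d)")

def normalize_mobile(text):
    """Return every mobile number found in text as 03XX-XXXXXXX."""
    return [f"0{code}-{rest}" for code, rest in MOBILE_RE.findall(text)]

sample = "Call 0313-2964861, +92 335 0226203 or 03001234567."
print(normalize_mobile(sample))
```

The lookarounds `(?<!\d)` and `(?!\d)` stop the pattern from matching inside a longer run of digits.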

# Assignment 2: Validating Email Addresses
Raw Text: Validate email addresses according to Pakistani domain extensions (.pk).

Example:

Text: Contact us at info@example.com or support@domain.pk for assistance.

In [2]:
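text_2 = """
Contact us at info@example.com or support@domain.pk for assistance.
"""
# Local part, "@", then one or more dot-separated labels ending in .pk
# (so addresses such as user@site.com.pk are matched as well)
pattern_2 = r"\b[\w.-]+@(?:[\w-]+\.)+pk\b"

pakistani_email = re.findall(pattern_2, text_2)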
pakistani_email

['support@domain.pk']
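
The cell above extracts addresses from running text. To validate a single string, as the task wording suggests, `re.fullmatch` is a closer fit because it requires the whole string to match. This is a minimal sketch: the helper name `is_pk_email` is hypothetical, and the pattern is a deliberate simplification rather than a full email grammar.

```python
import re

# Assumption: the local part uses word characters, dots, hyphens or plus signs,
# and the host is a series of dot-separated labels ending in "pk".
PK_EMAIL_RE = re.compile(r"[\w.+-]+@(?:[\w-]+\.)+pk", re.IGNORECASE)

def is_pk_email(address):
    """Return True if the whole string looks like a .pk email address."""
    return PK_EMAIL_RE.fullmatch(address) is not None

for candidate in ["support@domain.pk", "admin@uni.edu.pk", "info@example.com"]:
    print(candidate, is_pk_email(candidate))
```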

# Assignment 3: Extracting CNIC Numbers
Raw Text: Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

Example:

Text: My CNIC is 42201-1316060-2 and another one is 42201-1316062-3.

In [6]:
text_3 = """
My CNIC is 42201-1316060-2 and another one is 42201-1316062-3.
"""
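# 5 digits, 7 digits and one final digit, separated by hyphens;
# the closing \b rejects strings that continue with more digits
pattern_3 = r"\b(\d{5}-\d{7}-\d)\b"
pakistani_cnic = re.findall(pattern_3, text_3)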
pakistani_cnic

['42201-1316060-2', '42201-1316062-3']
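
If the three parts of each CNIC are needed separately, named groups with `re.finditer` can split them in the same pass. This is only a sketch: the group names `prefix`, `serial` and `last` are illustrative labels chosen here, not official field names.

```python
import re

# Group names are hypothetical labels for the three hyphen-separated parts.
CNIC_RE = re.compile(r"\b(?P<prefix>\d{5})-(?P<serial>\d{7})-(?P<last>\d)\b")

text = "My CNIC is 42201-1316060-2 and another one is 42201-1316062-3."
parts = [match.groupdict() for match in CNIC_RE.finditer(text)]
print(parts)
```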

# Assignment 4: Identifying Urdu Words
Raw Text: Identify and extract Urdu words from a mixed English-Urdu text.

Example:

Text: یہ sentence میں کچھ English words بھی ہیں۔

In [4]:
text_4 = """
 یہ sentence میں کچھ English words بھی ہیں۔
"""

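# Runs of characters that are not whitespace, hyphens or Latin letters.
# The closing \b trims trailing punctuation such as the Urdu full stop.
pattern_4 = r"\b([^\s\-a-zA-Z]+)\b"

urdu_text = re.findall(pattern_4, text_4)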
urdu_text

['یہ', 'میں', 'کچھ', 'بھی', 'ہیں']
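
The pattern above works by excluding Latin letters, so it would also pick up digits or words in other scripts. A stricter alternative is to match Arabic-script letters directly by code point. The ranges below are an assumption about which code points count as Urdu letters; they are chosen to leave out Arabic-script punctuation such as the full stop (U+06D4).

```python
import re

# Assumption: Urdu letters fall in U+0621-U+064A, U+066E-U+06D3 and U+06D5.
# Punctuation such as U+060C, U+061F and U+06D4 lies outside these ranges.
URDU_WORD_RE = re.compile(r"[\u0621-\u064A\u066E-\u06D3\u06D5]+")

text = "یہ sentence میں کچھ English words بھی ہیں۔"
urdu_words = URDU_WORD_RE.findall(text)
print(urdu_words)
```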

# Assignment 5: Finding Dates
Raw Text: Find and extract dates in the format DD-MM-YYYY from a given text.

Example:

Text: The event will take place on 5-09-2023 and 7-09-2023.

In [5]:
text_5 = """
The event will take place on 5-09-2023 and 7-09-2023.
"""
# Day and month of one or two digits, then a four-digit year
pattern_5 = r"\b(\d{1,2}-\d{1,2}-\d{4})\b"

dates = re.findall(pattern_5, text_5)
dates

['5-09-2023', '7-09-2023']
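
A regular expression only checks the shape of a date, so a string like `31-02-2023` would also match. One way to filter out impossible dates is to pass each candidate through `datetime.strptime`. The helper name `valid_dates` and the sample sentence are made up for illustration.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\b\d{1,2}-\d{1,2}-\d{4}\b")

def valid_dates(text):
    """Return the date-like strings in text that are real calendar dates."""
    found = []
    for candidate in DATE_RE.findall(text):
        try:
            datetime.strptime(candidate, "%d-%m-%Y")
        except ValueError:
            continue  # for example day 31 in February, or month 13
        found.append(candidate)
    return found

sample = "Dates: 5-09-2023, 31-02-2023 and 7-13-2023."
print(valid_dates(sample))
```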

# Assignment 6: Extracting URLs
Raw Text: Extract all URLs from a text that belong to Pakistani domains.

Example:

Text: Visit http://www.example.pk or https://website.com.pk for more information.

In [7]:
text_6 = """
Visit http://www.example.pk or https://website.com.pk for more information.
"""
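# "https?://" matches the scheme literally; the original "[https://]+" was a
# character class that accepts any mix of those characters.
# The host is one or more dot-separated labels ending in "pk".
pattern_6 = r"\bhttps?://(?:[\w-]+\.)+pk\b"

url_pk = re.findall(pattern_6, text_6)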
url_pk

['http://www.example.pk', 'https://website.com.pk']
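
When URLs carry paths or query strings, it can be more robust to find URL-like tokens with a loose pattern and then inspect the host with `urllib.parse.urlparse`. The following is a rough sketch under that approach; the sample sentence and the trailing-punctuation handling are assumptions made for the example.

```python
import re
from urllib.parse import urlparse

text = (
    "Visit http://www.example.pk or https://website.com.pk/about, "
    "but not https://pk.example.com."
)

pk_urls = []
# Loose candidate pattern: scheme followed by anything up to whitespace or a comma
for url in re.findall(r"https?://[^\s,]+", text):
    url = url.rstrip(".,;")  # drop sentence punctuation stuck to the URL
    host = urlparse(url).hostname or ""
    if host.endswith(".pk"):
        pk_urls.append(url)

print(pk_urls)
```

Checking the parsed host avoids false positives where `pk` appears elsewhere in the URL, such as in a subdomain or path.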

# Assignment 7: Analyzing Currency
Raw Text: Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

Example:

Text: The product costs PKR 1200, while the deluxe version is priced at Rs. 3500.

In [8]:
text_7 = """
The product costs PKR 1200, while the deluxe version is priced at Rs. 3500.
"""
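# "PKR" or "Rs"/"Rs." as literal prefixes (the original "[PKRs]+" was a
# character class), followed by an amount of any length with optional
# thousands separators
pattern_7 = r"\b(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*)"

rupees = re.findall(pattern_7, text_7)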
rupees

['1200', '3500']
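
The task also asks to analyze the amounts. A minimal sketch of that step, assuming the amounts are whole rupees, converts the captured strings to integers and computes a few summary figures.

```python
import re

text = "The product costs PKR 1200, while the deluxe version is priced at Rs. 3500."

AMOUNT_RE = re.compile(r"\b(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*)")

# Strip thousands separators before converting to int
amounts = [int(a.replace(",", "")) for a in AMOUNT_RE.findall(text)]

total = sum(amounts)
highest = max(amounts)
average = total / len(amounts)
print(amounts, total, highest, average)
```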

# Assignment 8: Removing Punctuation
Raw Text: Remove all punctuation marks from a text while preserving Urdu characters.

Example:

Text: کیا! آپ, یہاں؟

In [9]:
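text_8 = """
کیا! آپ, یہاں؟
"""
# \w+ matches runs of word characters (Urdu letters included), so collecting
# them leaves the punctuation behind; "[^\W]" was just a roundabout "\w"
pattern_8 = r"\w+"

urdu_without_punctuation = re.findall(pattern_8, text_8)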
urdu_without_punctuation

['کیا', 'آپ', 'یہاں']
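
The cell above returns a list of words. To get the original text back with the punctuation removed, `re.sub` can delete every character that is neither a word character nor whitespace. One caveat, stated as an assumption worth testing on real data: combining diacritics may not count as word characters in `re`, so they could be removed as well.

```python
import re

text = "کیا! آپ, یہاں؟"

# [^\w\s] matches any character that is neither a word character nor whitespace
cleaned = re.sub(r"[^\w\s]", "", text)
print(cleaned)
```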

# Assignment 9: Extracting City Names
Raw Text: Extract names of Pakistani cities from a given text.

Example:

Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.

In [10]:
text_9 = """
Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.
"""
# Matches only the fixed sentence shape "A, B, C, and D" and captures each name
pattern_9 = r"\b(\w+),\s(\w+),\s(\w+),\sand\s(\w+)\b"

city_pakistan = re.findall(pattern_9, text_9)
city_pakistan

[('Lahore', 'Karachi', 'Islamabad', 'Peshawar')]
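
The pattern above depends on the exact sentence structure, so it would miss cities mentioned in any other arrangement. A common alternative is to match against a known list of names. The sketch below assumes a small, hypothetical list called `CITIES`; a real one would be much longer.

```python
import re

# Hypothetical, deliberately short list of city names
CITIES = ["Lahore", "Karachi", "Islamabad", "Peshawar", "Quetta", "Multan"]

# Build one alternation; re.escape guards against special characters in names
CITY_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, CITIES)) + r")\b")

text = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."
cities_found = CITY_RE.findall(text)
print(cities_found)
```

This returns a flat list of names rather than a single tuple, and it works regardless of how the sentence is phrased.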

# Assignment 10: Analyzing Vehicle Numbers
Raw Text: Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

Example:

Text: I saw a car with the number plate LEA-421 near the shop.

In [11]:
text_10 = """
I saw a car with the number plate LEA-421 near the shop.
"""
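# Three uppercase letters, a hyphen, then three digits. The original "\D{3}"
# accepted any non-digit characters, including spaces and punctuation.
pattern_10 = r"\b([A-Z]{3}-\d{3})\b"

registration_number = re.findall(pattern_10, text_10)
registration_number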

['LEA-421']