### Assignment 1: Extracting Phone Numbers
Raw Text: Extract all valid Pakistani phone numbers from a given text.

Example:

Text: Please contact me at 0301-1234567 or 042-35678901 for further details.

In [55]:
import re

txt = "Please contact me at 0301-1234567 or 042-35678901 for further details."

phone_number_pattern = r'\b\d{3,4}-\d{7,8}\b'  # 3-4 digit prefix, then 7 or 8 digits

output = re.findall(phone_number_pattern, txt)
print(output)

['0301-1234567', '042-35678901']


### Assignment 2: Validating Email Addresses
Raw Text: Validate email addresses according to Pakistani domain extensions (.pk).

Example:

Text: Contact us at info@example.com or support@domain.pk for assistance.

In [56]:
import re

txt = "Contact us at info@example.com or support@domain.pk for assistance."

# Keep only addresses whose domain ends in .pk
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.pk\b'

output = re.findall(email_pattern, txt)
print(output)

['support@domain.pk']
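
Because the task is phrased as validation, a single address can also be checked as a whole with `re.fullmatch`. The following is a minimal sketch of that idea; the helper name `is_pk_email` is hypothetical, and the pattern is a simplification rather than a complete email validator.

```python
import re

# Simplified shape: local part, "@", one or more domain labels, final ".pk" label.
PK_EMAIL = re.compile(
    r'[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.pk',
    re.IGNORECASE,
)

def is_pk_email(address):
    """Return True if the whole string looks like an address under .pk."""
    return PK_EMAIL.fullmatch(address) is not None

print(is_pk_email("support@domain.pk"))
print(is_pk_email("info@example.com"))
```

Unlike `re.findall`, `re.fullmatch` succeeds only when the entire string fits the pattern, which suits a yes/no check on one address.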


### Assignment 3: Extracting CNIC Numbers
Raw Text: Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

Example:

Text: My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.

In [57]:
import re

txt = "My CNIC is 12345-6789012-3 and another one is 34567-8901234-5."

cnic_pattern = r'\b\d{5}-\d{7}-\d\b'

output = re.findall(cnic_pattern, txt)
print(output)

['12345-6789012-3', '34567-8901234-5']
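
If the three hyphen-separated parts are needed individually, named groups are one option. This is a sketch only; the group names `first`, `middle`, and `last` are labels chosen for illustration, not official field names.

```python
import re

txt = "My CNIC is 12345-6789012-3 and another one is 34567-8901234-5."

# Same 5-7-1 digit shape, with each part captured under a name.
cnic_parts_pattern = re.compile(r'\b(?P<first>\d{5})-(?P<middle>\d{7})-(?P<last>\d)\b')

parts = [m.groupdict() for m in cnic_parts_pattern.finditer(txt)]
print(parts)
```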


### Assignment 4: Identifying Urdu Words
Raw Text: Identify and extract Urdu words from a mixed English-Urdu text.

Example:

Text: یہ sentence میں کچھ English words بھی ہیں۔

In [58]:
import re

txt = " یہ sentence میں کچھ English words بھی ہیں۔"

urdu_pattern = r'[\u0600-\u06FF]+'

output = re.findall(urdu_pattern, txt)
print(output)

['یہ', 'میں', 'کچھ', 'بھی', 'ہیں۔']
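
The range `\u0600-\u06FF` covers the whole Arabic Unicode block, including punctuation such as the Urdu full stop `۔` (U+06D4), which is why the last match above still ends with `۔`. One possible way to keep only the words is to narrow the character class. The sketch below assumes text without diacritics, since the narrower ranges also leave out most combining marks and digits.

```python
import re

txt = "یہ sentence میں کچھ English words بھی ہیں۔"

# Ranges chosen to cover the letters in the sample text while leaving out
# punctuation such as U+060C (comma), U+061F (question mark), U+06D4 (full stop).
urdu_letters_pattern = r'[\u0621-\u064A\u066E-\u06D3]+'

words = re.findall(urdu_letters_pattern, txt)
print(words)
```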


### Assignment 5: Finding Dates
Raw Text: Find and extract dates in the format DD-MM-YYYY from a given text.

Example:

Text: The event will take place on 15-08-2023 and 23-09-2023.

In [59]:
import re

txt = "The event will take place on 15-08-2023 and 23-09-2023."

date_pattern = r'\b\d{2}-\d{2}-\d{4}\b'

output = re.findall(date_pattern, txt)
print(output)

['15-08-2023', '23-09-2023']
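
A pattern like `\d{2}-\d{2}-\d{4}` checks only the shape of a date, so a string such as `31-02-2023` would also match. A minimal sketch of a follow-up check with `datetime.strptime` is shown here; the sample sentence and the name `extract_valid_dates` are made up for illustration.

```python
import re
from datetime import datetime

def extract_valid_dates(text):
    """Return DD-MM-YYYY strings from text that are real calendar dates."""
    valid = []
    for candidate in re.findall(r'\b\d{2}-\d{2}-\d{4}\b', text):
        try:
            datetime.strptime(candidate, '%d-%m-%Y')
        except ValueError:
            continue  # right shape, but not a real date
        valid.append(candidate)
    return valid

print(extract_valid_dates("Held on 15-08-2023, not on 31-02-2023 or 10-13-2023."))
```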


### Assignment 6: Extracting URLs
Raw Text: Extract all URLs from a text that belong to Pakistani domains.

Example:

Text: Visit http://www.example.pk or https://website.com.pk for more information.


In [60]:
import re

txt = "Visit http://www.example.pk or https://website.com.pk for more information."

url_pattern = r'https?://\S+|www\.\S+'

output = re.findall(url_pattern, txt)
print(output)

['http://www.example.pk', 'https://website.com.pk']
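
A pattern of the form `https?://\S+` accepts any URL, not only Pakistani ones, and `\S+` also picks up punctuation that directly follows a URL. One way to restrict the result to `.pk` hosts is to parse each candidate with `urllib.parse.urlsplit` and test its hostname. This is a sketch under those assumptions; `extract_pk_urls` and the sample sentence are illustrative.

```python
import re
from urllib.parse import urlsplit

def extract_pk_urls(text):
    """Return http(s) URLs in text whose hostname ends with .pk."""
    results = []
    for raw in re.findall(r'https?://\S+', text):
        url = raw.rstrip('.,;:!?)')  # drop punctuation that trails the URL
        host = urlsplit(url).hostname or ''
        if host.endswith('.pk'):
            results.append(url)
    return results

sample = "See http://www.example.pk, https://website.com.pk/about and https://example.com."
print(extract_pk_urls(sample))
```

Checking the parsed hostname rather than the raw string avoids false positives where `.pk` appears only in a path or query string.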


### Assignment 7: Analyzing Currency
Raw Text: Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

Example:

Text: The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.

In [61]:
import re

txt = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."

currency_pattern = r'(PKR|Rs\.)\s+(\d+)'

output = re.findall(currency_pattern, txt)
print(output)

[('PKR', '1500'), ('Rs.', '2500')]
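
The task also asks for analysis, which the extraction alone does not provide. A minimal sketch of one follow-up step is to convert the matched amounts to integers and summarize them. Support for thousands separators such as `1,500` is an assumption that goes beyond the sample text.

```python
import re

txt = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."

# Capture only the number; "1,500"-style separators are an added assumption.
amount_pattern = r'(?:PKR|Rs\.)\s*(\d{1,3}(?:,\d{3})+|\d+)'

amounts = [int(m.replace(',', '')) for m in re.findall(amount_pattern, txt)]
print(amounts)
print(sum(amounts), max(amounts))
```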


### Assignment 8: Removing Punctuation
Raw Text: Remove all punctuation marks from a text while preserving Urdu characters.

Example:

Text: کیا! آپ, یہاں؟

In [62]:
import re

txt = "کیا! آپ, یہاں؟"

punctuation_pattern = r'[^\w\s]'

output = re.sub(punctuation_pattern, '', txt)
print(output)

کیا آپ یہاں
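
One caveat: in Python's `re`, `\w` matches alphanumeric characters and the underscore, so `[^\w\s]` also removes combining diacritics (zabar, zer, pesh), which are not alphanumeric. If those marks should survive, a possible alternative is to filter on the Unicode category instead. The helper name `strip_punctuation` below is hypothetical.

```python
import unicodedata

def strip_punctuation(text):
    """Drop characters whose Unicode general category is punctuation (P*)."""
    return ''.join(ch for ch in text if not unicodedata.category(ch).startswith('P'))

print(strip_punctuation("کیا! آپ, یہاں؟"))
```

This removes ASCII and Urdu punctuation alike while leaving letters, digits, whitespace, and combining marks untouched.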


### Assignment 9: Extracting City Names
Raw Text: Extract names of Pakistani cities from a given text.

Example:

Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.

In [63]:
import re

# Input text containing city names
txt = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."

# Match a fixed list of known city names; extend the alternation as needed
city_pattern = r'\b(?:Lahore|Karachi|Islamabad|Peshawar)\b'

output = re.findall(city_pattern, txt)
print(output)

['Lahore', 'Karachi', 'Islamabad', 'Peshawar']


### Assignment 10: Analyzing Vehicle Numbers
Raw Text: Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

Example:

Text: I saw a car with the number plate LEA-567 near the market.

In [64]:
import re

txt = "I saw a car with the number plate LEA-567 near the market. Another car with black color was ABC-765"

registration_number_pattern = r'\b[A-Z]{3}-\d{3}\b'  # word boundaries avoid matching inside longer tokens

output = re.findall(registration_number_pattern, txt)
print(output)

['LEA-567', 'ABC-765']
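
The task statement gives only `ABC-123` as an example format. If other lengths need to be accepted, the quantifiers can be widened. The sketch below assumes, purely for illustration, that plates may carry 2-3 capital letters and 3-4 digits; this is not a statement of the official format.

```python
import re

# Assumed variants, for illustration only: 2-3 capital letters, a hyphen, 3-4 digits.
plate_pattern = re.compile(r'\b[A-Z]{2,3}-\d{3,4}\b')

sample = "Plates seen: LEA-567, LE-1234 and ABCD-12345."
print(plate_pattern.findall(sample))
```

The word boundaries keep the pattern from matching a fragment of a longer token such as `ABCD-12345`.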
