
---

## Advanced Regular Expression Assignments

### Assignment 1: Extracting Phone Numbers

**Raw Text:**
Extract all valid Pakistani phone numbers from a given text.

**Example:**
```
Text: Please contact me at 0301-1234567 or 042-35678901 for further details.
```



In [None]:
import re

text = "Please contact me at 0301-1234567 or 042-35678901 for further details."

# Mobile numbers (03XX-XXXXXXX) and Lahore landlines (042-XXXXXXXX), matched in full
pattern = r'\b(?:03\d{2}-\d{7}|042-\d{8})\b'
phone_numbers = re.findall(pattern, text)
for number in phone_numbers:
    print(number)


0301-1234567
042-35678901


In [None]:
import re

text = "Please contact me at 0301-1234567 or 042-35678901 for further details."

# Only the exact codes 0301 and 042, each followed by its subscriber number
pattern = r'\b(?:0301-\d{7}|042-\d{8})\b'
phone_numbers = re.findall(pattern, text)
for number in phone_numbers:
    print(number)

0301-1234567
042-35678901


### Assignment 2: Validating Email Addresses

**Raw Text:**
Validate email addresses according to Pakistani domain extensions (.pk).

**Example:**
```
Text: Contact us at info@example.com or support@domain.pk for assistance.
```



In [None]:
import re

text = "Contact us at info@example.com or support@domain.pk for assistance."
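# Local part, "@", then one or more dot-separated labels ending in ".pk"
# (covers domain.pk as well as domain.com.pk, domain.org.pk, ...)
pattern = r'\b[A-Za-z0-9._-]+@(?:[A-Za-z0-9-]+\.)+pk\b'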
email_addresses = re.findall(pattern, text)
for email in email_addresses:
    print(email)

support@domain.pk
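The task asks to *validate* addresses, whereas `re.findall` searches for addresses inside a sentence. To check one address at a time, `re.fullmatch` is a natural fit because it succeeds only when the entire string matches. The helper `is_valid_pk_email` and the sample addresses below are hypothetical. The pattern is a deliberately loose approximation of e-mail syntax, not a full RFC-compliant validator.

```python
import re

# Anchored check: fullmatch succeeds only if the WHOLE string is an address
# whose domain ends in ".pk" (domain.pk, domain.com.pk, domain.edu.pk, ...).
PK_EMAIL = re.compile(r'[A-Za-z0-9._-]+@(?:[A-Za-z0-9-]+\.)+pk')

def is_valid_pk_email(address: str) -> bool:
    """Return True if `address` looks like an e-mail on a .pk domain."""
    return PK_EMAIL.fullmatch(address) is not None

candidates = ["support@domain.pk", "info@example.com",
              "admin@university.edu.pk", "bad@@domain.pk"]
for address in candidates:
    print(address, is_valid_pk_email(address))
```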


### Assignment 3: Extracting CNIC Numbers

**Raw Text:**
Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

**Example:**
```
Text: My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.
```


In [None]:
import re

text = "My CNIC is 12345-6789012-3 and another one is 34567-8901234-5."

# 5 digits, dash, 7 digits, dash, 1 digit
pattern = r'\b\d{5}-\d{7}-\d\b'
cnic_numbers = re.findall(pattern, text)
for cnic in cnic_numbers:
    print(cnic)

12345-6789012-3
34567-8901234-5
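When the individual blocks of each CNIC are needed rather than the whole string, named groups combined with `re.finditer` make them accessible by name. The group names `block1`, `block2` and `block3` are illustrative labels chosen here; the sketch does not assign any meaning to the blocks. The last two lines show the same compiled pattern used with `fullmatch` to check a single candidate string.

```python
import re

# Named groups expose the three dash-separated blocks of a CNIC (5, 7 and 1 digits).
# The names block1..block3 are illustrative labels only.
CNIC = re.compile(r'\b(?P<block1>\d{5})-(?P<block2>\d{7})-(?P<block3>\d)\b')

text = "My CNIC is 12345-6789012-3 and another one is 34567-8901234-5."

parsed = [m.groupdict() for m in CNIC.finditer(text)]
for parts in parsed:
    print(parts)

# A single candidate string can be checked with fullmatch instead of searched:
print(bool(CNIC.fullmatch("12345-6789012-3")))   # True
print(bool(CNIC.fullmatch("12345-678901-3")))    # False (middle block too short)
```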



### Assignment 4: Identifying Urdu Words

**Raw Text:**
Identify and extract Urdu words from a mixed English-Urdu text.

**Example:**
```
Text: یہ sentence میں کچھ English words بھی ہیں۔
```



In [None]:
import re

text = "یہ sentence میں کچھ English words بھی ہیں۔"

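# Arabic-script letters and combining marks (U+0621-U+065F, U+066E-U+06D3);
# the ranges skip Arabic-block punctuation such as the Urdu full stop "۔" (U+06D4)
pattern = r'[\u0621-\u065F\u066E-\u06D3]+'
urdu_words = re.findall(pattern, text)
for word in urdu_words:
    print(word)

یہ
میں
کچھ
بھی
ہیں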
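When both scripts matter, a single pass can label every token. The sketch below assumes that runs of Latin letters `[A-Za-z]` are English and that runs of the Arabic-script letter ranges U+0621-U+065F and U+066E-U+06D3 are Urdu; punctuation such as "۔" (U+06D4) falls in neither range. `Match.lastgroup` reports the name of the group that matched, so each token comes back tagged.

```python
import re

text = "یہ sentence میں کچھ English words بھی ہیں۔"

# One alternation with two named groups: each match is either a run of
# Urdu-range letters or a run of Latin letters; punctuation matches neither.
TOKEN = re.compile(r'(?P<urdu>[\u0621-\u065F\u066E-\u06D3]+)|(?P<english>[A-Za-z]+)')

tagged = [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]
for script, word in tagged:
    print(script, word)
```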


### Assignment 5: Finding Dates

**Raw Text:**
Find and extract dates in the format DD-MM-YYYY from a given text.

**Example:**
```
Text: The event will take place on 15-08-2023 and 23-09-2023.
```



In [None]:
import re

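text = "The event will take place on 15-08-2023 and 23-09-2023."
# Two digits, dash, two digits, dash, four digits (checks shape only, so 99-99-2023 would also match)
pattern = r'\b\d{2}-\d{2}-\d{4}\b'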
dates = re.findall(pattern, text)
for date in dates:
    print(date)

15-08-2023
23-09-2023
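The regular expression checks only the *shape* DD-MM-YYYY. A common way to reject impossible dates is to let the regex find candidates and then parse each one with `datetime.strptime`, which raises `ValueError` for dates such as 31 February. The third date in the sample text is a hypothetical addition to show the rejection.

```python
import re
from datetime import datetime

text = "The event will take place on 15-08-2023, 23-09-2023 and 31-02-2023."

# Step 1: the regex finds DD-MM-YYYY-shaped candidates.
candidates = re.findall(r'\b\d{2}-\d{2}-\d{4}\b', text)

# Step 2: strptime rejects strings that are not real calendar dates.
valid_dates = []
for candidate in candidates:
    try:
        valid_dates.append(datetime.strptime(candidate, "%d-%m-%Y").date())
    except ValueError:
        pass  # shaped like a date, but not a real calendar date

for d in valid_dates:
    print(d.isoformat())
```

Splitting the work this way keeps the pattern simple; encoding month lengths and leap years in a regex is possible but hard to read and maintain.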


### Assignment 6: Extracting URLs

**Raw Text:**
Extract all URLs from a text that belong to Pakistani domains.

**Example:**
```
Text: Visit http://www.example.pk or https://website.com.pk for more information.
```



In [None]:
import re

text = "Visit http://www.example.pk or https://website.com.pk for more information."
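# Non-capturing groups only, so findall returns the whole URL instead of just the domain
pattern = r'https?://(?:www\.)?[A-Za-z0-9.-]+\.pk\b(?:/[A-Za-z0-9./_-]*)?'
urls = re.findall(pattern, text)
for url in urls:
    print(url)

http://www.example.pk
https://website.com.pk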
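A single regex has to be careful not to confuse a `.pk` host with a path that merely contains "pk". One alternative, sketched here, is to grab anything URL-like with a loose pattern and then let `urllib.parse.urlparse` extract the host name, keeping only hosts that end in `.pk`. The third URL in the sample text is a hypothetical counter-example, and stripping a trailing period is a simple heuristic for sentence-final punctuation.

```python
import re
from urllib.parse import urlparse

text = ("Visit http://www.example.pk or https://website.com.pk for more "
        "information, but not https://example.com/pk.")

# Step 1: a loose regex grabs anything that looks like an http(s) URL.
candidates = re.findall(r'https?://[^\s,]+', text)

# Step 2: urlparse extracts the host name, which must end in ".pk".
pk_urls = []
for url in candidates:
    url = url.rstrip('.')  # drop sentence-final punctuation
    host = urlparse(url).hostname or ""
    if host.endswith(".pk"):
        pk_urls.append(url)

for url in pk_urls:
    print(url)
```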


### Assignment 7: Analyzing Currency

**Raw Text:**
Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

**Example:**
```
Text: The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.
```



In [None]:
import re

text = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."
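# Currency label (PKR or Rs.), whitespace, then the amount with optional
# thousands separators and an optional two-digit decimal part
pattern = r'(PKR|Rs\.)\s+(\d+(?:,\d{3})*(?:\.\d{2})?)'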
matches = re.findall(pattern, text)
for match in matches:
    currency_symbol, amount = match
    print(f"{currency_symbol} {amount}")


PKR 1500
Rs. 2500
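The task also asks to *analyze* the amounts, which the cell above does not do. A minimal sketch, reusing the pattern from that cell, strips thousands separators, converts each amount to `decimal.Decimal` (chosen here because it avoids binary floating-point rounding in money arithmetic) and reports simple statistics. Which statistics to compute is an assumption; the task does not specify them.

```python
import re
from decimal import Decimal

text = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."

# Same pattern as the cell above: label, whitespace, amount with optional
# thousands separators and an optional two-digit decimal part.
pattern = r'(PKR|Rs\.)\s+(\d+(?:,\d{3})*(?:\.\d{2})?)'

# Remove thousands separators and convert to Decimal for exact arithmetic.
amounts = [Decimal(amount.replace(",", "")) for _, amount in re.findall(pattern, text)]

total = sum(amounts)
average = total / len(amounts)
print(f"Amounts: {[str(a) for a in amounts]}")
print(f"Total:   PKR {total:,.2f}")
print(f"Average: PKR {average:,.2f}")
print(f"Range:   PKR {min(amounts):,.2f} to PKR {max(amounts):,.2f}")
```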


### Assignment 8: Removing Punctuation

**Raw Text:**
Remove all punctuation marks from a text while preserving Urdu characters.

**Example:**
```
Text: کیا! آپ, یہاں؟
```



In [None]:
import re

text = "کیا! آپ, یہاں؟"
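# Delete every character that is NOT a word character (\w), whitespace (\s),
# or in the Arabic-script range U+0622 (آ) to U+06CC (ی)
cleaned_text = re.sub(r'[^\w\s\u0622-\u06CC]', '', text)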
print(cleaned_text)


کیا آپ یہاں
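A regex-free alternative relies on Unicode general categories instead of a hand-written character range: every category beginning with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po) is punctuation, which covers "!", "," and the Arabic question mark "؟". This is a sketch of a different technique, not the method used above; note that it keeps symbols such as "$" (category Sc), which are not punctuation in Unicode's sense.

```python
import unicodedata

text = "کیا! آپ, یہاں؟"

# Keep every character whose Unicode category does not start with "P";
# letters, combining marks and spaces all pass through unchanged.
cleaned_text = "".join(
    ch for ch in text if not unicodedata.category(ch).startswith("P")
)
print(cleaned_text)
```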


### Assignment 9: Extracting City Names

**Raw Text:**
Extract names of Pakistani cities from a given text.

**Example:**
```
Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.
```


In [69]:
import re

# Sample text
text = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."

# Define a regular expression pattern to match city names
pattern = r'\b(?:Lahore|Karachi|Islamabad|Peshawar)\b'

# Find all matching city names in the text
city_names = re.findall(pattern, text)

# Print the extracted city names
for city in city_names:
    print(city)


Lahore
Karachi
Islamabad
Peshawar
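Hard-coding the alternation works for four cities but becomes awkward as the list grows. The sketch below builds the pattern from a Python list; the extra city names, the sample sentence and the case-insensitive matching are assumptions made for illustration. `re.escape` protects against names that contain regex metacharacters, and sorting longest-first ensures that if one listed name were a prefix of another, the longer one is tried first.

```python
import re

# Hypothetical city list; extend it as needed.
cities = ["Lahore", "Karachi", "Islamabad", "Peshawar", "Quetta", "Dera Ghazi Khan"]

# Longest names first, each escaped, joined into one alternation.
alternation = "|".join(re.escape(c) for c in sorted(cities, key=len, reverse=True))
pattern = re.compile(rf'\b(?:{alternation})\b', re.IGNORECASE)

text = "Lahore, Karachi, Islamabad, and peshawar are major cities; so is Dera Ghazi Khan."
found = pattern.findall(text)
print(found)
```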



### Assignment 10: Analyzing Vehicle Numbers

**Raw Text:**
Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

**Example:**
```
Text: I saw a car with the number plate LEA-567 near the market.
```



In [None]:
import re

text = "I saw a car with the number plate LEA-567 near the market."
# Three capital letters, dash, three digits (the ABC-123 format from the task)
pattern = r'\b[A-Z]{3}-\d{3}\b'
registration_numbers = re.findall(pattern, text)
for reg_number in registration_numbers:
    print(reg_number)


LEA-567
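The pattern above accepts only the `ABC-123` shape given in the task. The sketch below loosens it under an explicit assumption, not an official specification: two to four capital letters, a hyphen or a single space, then one to four digits. Matches are normalised to the hyphenated form. The second plate in the sample text is made up.

```python
import re

# Assumed, broader plate shape (not an official specification):
# 2-4 capital letters, a hyphen or a single space, then 1-4 digits.
PLATE = re.compile(r'\b([A-Z]{2,4})[- ](\d{1,4})\b')

text = ("I saw a car with the number plate LEA-567 near the market, "
        "and another one, LEB 1234, parked outside.")

# Normalise every match to LETTERS-DIGITS.
plates = [f"{letters}-{digits}" for letters, digits in PLATE.findall(text)]
for plate in plates:
    print(plate)
```

The looser shape trades precision for recall: it would also match strings such as `PKR 1500`, so it is best applied to text where currency amounts are not expected, or combined with a list of known prefixes.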
