---

## Advanced Regular Expression Assignments

### Assignment 1: Extracting Phone Numbers

**Raw Text:**
Extract all valid Pakistani phone numbers from a given text.

**Example:**
```
Text: Please contact me at 0301-1234567 or 042-35678901 for further details.
```



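```python
import re

text = "Please contact me at 0301-1234567 or 042-35678901 for further details."
# A leading 0, a 2-3 digit operator/area code, a hyphen, then a 7-8 digit number
pattern = r'\b0\d{2,3}-\d{7,8}\b'
phone_numbers = re.findall(pattern, text)
phone_numbers
```

**Output:**
```
['0301-1234567', '042-35678901']
```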
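
The pattern above accepts mobile and landline numbers alike. Here is a minimal sketch that also labels each match. It assumes that mobile numbers have the `03XX-XXXXXXX` shape and that any other `0`-prefixed area code followed by a 7-8 digit subscriber number is a landline. `PHONE_RE` and `classify_numbers` are hypothetical names. `Match.lastgroup` reports which named alternative matched.

```python
import re

# Assumption: mobile = 03XX-XXXXXXX; landline = 0 + area code not starting with 3,
# followed by a 7-8 digit subscriber number.
PHONE_RE = re.compile(
    r'\b(?:(?P<mobile>03\d{2}-\d{7})|(?P<landline>0[124-9]\d{1,2}-\d{7,8}))\b'
)

def classify_numbers(text):
    """Return (kind, number) pairs, where kind is 'mobile' or 'landline'."""
    # lastgroup is the name of the named group that took part in the match
    return [(m.lastgroup, m.group()) for m in PHONE_RE.finditer(text)]

print(classify_numbers("Please contact me at 0301-1234567 or 042-35678901 for further details."))
```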

### Assignment 2: Validating Email Addresses

**Raw Text:**
Validate email addresses against the Pakistani domain extension (.pk), keeping only addresses that end in `.pk`.

**Example:**
```
Text: Contact us at info@example.com or support@domain.pk for assistance.
```



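```python
text = "Contact us at info@example.com or support@domain.pk for assistance."
# Local part, "@", one or more domain labels, and a final ".pk" label.
# A practical check, not a full RFC 5322 validator.
pattern = r'[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.pk\b'
pk_emails = re.findall(pattern, text)
pk_emails
```

**Output:**
```
['support@domain.pk']
```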
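
`re.findall` extracts addresses from running text. To validate a single address, `fullmatch` is safer than `search` because it rejects strings that merely contain a `.pk` address somewhere inside them. The following sketch assumes a simplified address syntax and is not a full RFC 5322 validator. `is_pk_email` is a hypothetical helper.

```python
import re

# Assumption: simple local part, then a domain whose last label is "pk"
# (e.g. domain.pk, site.edu.pk).
PK_EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.pk')

def is_pk_email(address):
    """Return True only if the whole string is a .pk email address."""
    return PK_EMAIL_RE.fullmatch(address) is not None

for candidate in ["support@domain.pk", "info@example.com", "admin@site.edu.pk"]:
    print(candidate, is_pk_email(candidate))
```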

### Assignment 3: Extracting CNIC Numbers

**Raw Text:**
Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

**Example:**
```
Text: My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.
```


```python
text = "My CNIC is 12345-6789012-3 and another one is 34567-8901234-5."
# 5 digits, 7 digits and 1 check digit, separated by hyphens
pattern = r'\b\d{5}-\d{7}-\d\b'
cnic_numbers = re.findall(pattern, text)
cnic_numbers
```

**Output:**
```
['12345-6789012-3', '34567-8901234-5']
```
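
CNICs are sometimes written as 13 plain digits. This sketch normalizes both forms and assumes that the dashed and undashed forms should both be accepted. The backreference `\2` forces the two separators to be either both dashes or both absent. `normalize_cnic` is a hypothetical name.

```python
import re

# Either 12345-6789012-3 or 1234567890123; \2 repeats whatever group 2 matched,
# so the two separators are both "-" or both empty.
CNIC_RE = re.compile(r'(\d{5})(-?)(\d{7})\2(\d)')

def normalize_cnic(value):
    """Return the CNIC as 12345-6789012-3, or None if value is not a CNIC."""
    match = CNIC_RE.fullmatch(value.strip())
    if match is None:
        return None
    return '-'.join(match.group(1, 3, 4))

for raw in ["12345-6789012-3", "1234567890123", "12345-67890123"]:
    print(raw, "->", normalize_cnic(raw))
```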


### Assignment 4: Identifying Urdu Words

**Raw Text:**
Identify and extract Urdu words from a mixed English-Urdu text.

**Example:**
```
Text: یہ sentence میں کچھ English words بھی ہیں۔
```



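```python
text = "یہ sentence میں کچھ English words بھی ہیں۔"
# Urdu is written in the Arabic Unicode block (U+0600-U+06FF). That block also holds
# punctuation, so skip ، (U+060C), ؛ (U+061B), ؟ (U+061F) and the full stop ۔ (U+06D4).
pattern = r'(?:(?![\u060c\u061b\u061f\u06d4])[\u0600-\u06ff])+'
urdu_words = re.findall(pattern, text)
urdu_words
```

**Output:**
```
['یہ', 'میں', 'کچھ', 'بھی', 'ہیں']
```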
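
Sometimes you need tokens from both scripts, in reading order. This sketch tags every token as Urdu or English using two named alternatives and `Match.lastgroup`. Treating English as plain ASCII letters is a simplifying assumption. `TOKEN_RE` and `tag_tokens` are hypothetical names.

```python
import re

# Urdu letters: the Arabic Unicode block (U+0600-U+06FF) minus the
# Arabic-script punctuation marks ، (U+060C), ؛ (U+061B), ؟ (U+061F), ۔ (U+06D4)
TOKEN_RE = re.compile(
    r'(?P<urdu>(?:(?![\u060c\u061b\u061f\u06d4])[\u0600-\u06ff])+)'
    r'|(?P<english>[A-Za-z]+)'
)

def tag_tokens(text):
    """Return (script, word) pairs in reading order; script is 'urdu' or 'english'."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

for script, word in tag_tokens("یہ sentence میں کچھ English words بھی ہیں۔"):
    print(script, word)
```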

### Assignment 5: Finding Dates

**Raw Text:**
Find and extract dates in the format DD-MM-YYYY from a given text.

**Example:**
```
Text: The event will take place on 15-08-2023 and 23-09-2023.
```



```python
text = "The event will take place on 15-08-2023 and 23-09-2023."
# Two-digit day, two-digit month, four-digit year
pattern = r'\b\d{2}-\d{2}-\d{4}\b'
dates = re.findall(pattern, text)
dates
```

**Output:**
```
['15-08-2023', '23-09-2023']
```
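
The pattern only checks the `DD-MM-YYYY` shape, so it also accepts impossible dates such as `31-02-2023`. This sketch passes each candidate through `datetime.strptime` and keeps only real calendar dates. `extract_valid_dates` is a hypothetical helper.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r'\b\d{2}-\d{2}-\d{4}\b')

def extract_valid_dates(text):
    """Return date objects for DD-MM-YYYY strings that are real calendar dates."""
    dates = []
    for candidate in DATE_RE.findall(text):
        try:
            dates.append(datetime.strptime(candidate, '%d-%m-%Y').date())
        except ValueError:
            # e.g. 31-02-2023 matches the pattern but is not a real date
            pass
    return dates

print(extract_valid_dates("The event will take place on 15-08-2023 and 31-02-2023."))
```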

### Assignment 6: Extracting URLs

**Raw Text:**
Extract all URLs from a text that belong to Pakistani domains.

**Example:**
```
Text: Visit http://www.example.pk or https://website.com.pk for more information.
```



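```python
text = "Visit http://www.example.pk or https://website.com.pk for more information."
# Scheme, then a host of word characters, dots and hyphens ending in ".pk",
# optionally followed by a path. Unlike the greedy ".*", [\w.-]+ cannot run
# across spaces, so each URL is matched separately.
pattern = r'https?://[\w.-]+\.pk\b(?:/\S*)?'
pk_urls = re.findall(pattern, text)
pk_urls
```

**Output:**
```
['http://www.example.pk', 'https://website.com.pk']
```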
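
An alternative sketch lets the standard library parse each URL. It grabs anything that starts with `http://` or `https://`, trims trailing sentence punctuation, and checks the host returned by `urllib.parse.urlparse`. Unlike the regex alone, this keeps ports and paths such as `http://shop.pk:8080/cart` intact. `pakistani_urls` is a hypothetical name, and the set of stripped punctuation characters is an assumption.

```python
import re
from urllib.parse import urlparse

def pakistani_urls(text):
    """Return http(s) URLs whose host name ends in .pk."""
    result = []
    # First grab anything that looks like a URL, then let urlparse isolate the host
    for url in re.findall(r'https?://\S+', text):
        url = url.rstrip('.,;:!?)')  # drop sentence punctuation stuck to the URL
        host = urlparse(url).hostname or ''
        if host.endswith('.pk'):
            result.append(url)
    return result

print(pakistani_urls("Try http://shop.pk:8080/cart, not https://example.com/about.pk."))
```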

### Assignment 7: Analyzing Currency

**Raw Text:**
Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

**Example:**
```
Text: The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.
```



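```python
text = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."
# Capture only numbers that follow a "PKR", "Rs" or "Rs." marker,
# allowing thousands separators (1,500) and decimals
pattern = r'\b(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*(?:\.\d+)?)'
amounts = re.findall(pattern, text)
amounts
```

**Output:**
```
['1500', '2500']
```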
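
The assignment also asks you to analyze the amounts. This minimal sketch converts the captured strings to numbers and computes a few simple statistics. It assumes amounts may contain thousands separators (`1,500`) and decimals. The variable names are illustrative.

```python
import re

text = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."
# Amount after "PKR", "Rs" or "Rs.", allowing thousands separators and decimals
AMOUNT_RE = re.compile(r'\b(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*(?:\.\d+)?)')

# Remove separators before converting, e.g. "1,250.50" -> 1250.5
amounts = [float(a.replace(',', '')) for a in AMOUNT_RE.findall(text)]
if amounts:
    print("Amounts:", amounts)
    print("Total:", sum(amounts))
    print("Average:", sum(amounts) / len(amounts))
    print("Highest:", max(amounts))
```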

### Assignment 8: Removing Punctuation

**Raw Text:**
Remove all punctuation marks from a text while preserving Urdu characters.

**Example:**
```
Text: کیا! آپ, یہاں؟
```



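```python
text = "کیا! آپ, یہاں؟"
# Remove every character that is neither a word character nor whitespace.
# In Python 3, \w is Unicode-aware for str patterns, so Urdu letters are kept
# while "!", "," and the Urdu question mark "؟" are removed.
pattern = r'[^\w\s]'
result = re.sub(pattern, '', text)

print("The string after punctuation filter: " + result)
```

**Output:**
```
The string after punctuation filter: کیا آپ یہاں
```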
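
Python's `re` does not treat combining marks (Unicode category Mn) as `\w`. As a result, `[^\w\s]` would also delete Urdu diacritics such as zer, zabar and pesh if the text contained them. This sketch whitelists the Arabic combining-mark range U+064B to U+065F plus the superscript alef U+0670. That choice of ranges is an assumption about which marks matter, and `strip_punctuation` is a hypothetical helper.

```python
import re

# Keep word characters, whitespace, and Arabic-script combining marks
# (harakat such as zer U+0650, zabar U+064E, pesh U+064F, and superscript alef U+0670).
PUNCT_RE = re.compile(r'[^\w\s\u064b-\u065f\u0670]')

def strip_punctuation(text):
    """Remove punctuation while keeping Urdu letters and diacritics."""
    return PUNCT_RE.sub('', text)

word = "\u06a9\u0650\u06cc\u0627! \u0622\u067e"  # "کیا! آپ" with a zer on the first letter
print(strip_punctuation(word))
```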


### Assignment 9: Extracting City Names

**Raw Text:**
Extract names of Pakistani cities from a given text.

**Example:**
```
Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.
```


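```python
text = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."
# Match whole words from a list of known city names (extend the list as needed)
cities = ["Lahore", "Karachi", "Islamabad", "Peshawar", "Quetta", "Multan"]
pattern = r'\b(?:' + '|'.join(map(re.escape, cities)) + r')\b'
city_names = re.findall(pattern, text)
city_names
```

**Output:**
```
['Lahore', 'Karachi', 'Islamabad', 'Peshawar']
```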
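
City names may appear in any case or more than once. In that situation, a case-insensitive pattern combined with `collections.Counter` gives a mention count per city. This sketch assumes a hand-maintained list of known city names; `CITIES` and `count_cities` are hypothetical. Each match is mapped back to its canonical spelling.

```python
import re
from collections import Counter

# Hypothetical list of known cities; extend as needed
CITIES = ["Lahore", "Karachi", "Islamabad", "Peshawar", "Quetta", "Multan"]
CANONICAL = {city.lower(): city for city in CITIES}
CITY_RE = re.compile(r'\b(?:' + '|'.join(map(re.escape, CITIES)) + r')\b', re.IGNORECASE)

def count_cities(text):
    """Count mentions of each known city, ignoring case."""
    return Counter(CANONICAL[m.lower()] for m in CITY_RE.findall(text))

print(count_cities("I moved from LAHORE to karachi, then back to Lahore."))
```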



### Assignment 10: Analyzing Vehicle Numbers

**Raw Text:**
Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

**Example:**
```
Text: I saw a car with the number plate LEA-567 near the market.
```



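```python
text = "I saw a car with the number plate LEA-567 near the market."
# 3-4 uppercase letters, a hyphen, then 3-4 digits. [A-Z] instead of \w keeps
# lowercase words and digit runs such as dates from matching.
pattern = r'\b[A-Z]{3,4}-\d{3,4}\b'
plate_numbers = re.findall(pattern, text)
plate_numbers
```

**Output:**
```
['LEA-567']
```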
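
Number plates are not always written with a hyphen. This sketch also accepts a space or no separator (`LEB 1234`, `ICT890`) and normalizes every match to the `ABC-123` form. Those spellings are assumptions about the input, and `extract_plates` is a hypothetical name. The letters stay uppercase-only so that ordinary words followed by a number, such as `year 2023`, are not mistaken for plates.

```python
import re

# Assumption: plates may use "-", a space, or nothing between letters and digits
PLATE_RE = re.compile(r'\b([A-Z]{3,4})[- ]?(\d{3,4})\b')

def extract_plates(text):
    """Return plate numbers normalized to the ABC-123 form."""
    return [f"{letters}-{digits}" for letters, digits in PLATE_RE.findall(text)]

print(extract_plates("Plates LEA-567, LEB 1234 and ICT890 were seen in 2023."))
```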