---

## Advanced Regular Expression Assignments

### Assignment 1: Extracting Phone Numbers

**Raw Text:**
Extract all valid Pakistani phone numbers from a given text.

**Example:**
```
Text: Please contact me at 0301-1234567 or 042-35678901 for further details.
```



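```python
# First, import the regular expression library
import re


# Text: Please contact me at 0301-1234567 or 042-35678901 for further details.
theText = """
Please contact me at 0301-1234567 or 042-35678901 for further details.
"""

# Raw string (r'...') so Python does not try to interpret \d and \b as string escapes.
# A leading 0, a 2-3 digit code, a hyphen, then 7-8 digits; \b prevents partial matches.
pattern = r'\b0\d{2,3}-\d{7,8}\b'

theMatch = re.findall(pattern, theText)

print(theMatch)
```

```
['0301-1234567', '042-35678901']
```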
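The match list does not say what kind of number each one is. A minimal sketch that splits each number into named groups and labels it, assuming that a four-digit prefix beginning with `03` marks a mobile number and anything else the pattern accepts is a landline; `PHONE` and `classify_numbers` are hypothetical names introduced here:

```python
import re

# Named groups split each number into its dialling prefix and subscriber part
PHONE = re.compile(r'\b(?P<prefix>0\d{2,3})-(?P<number>\d{7,8})\b')


def classify_numbers(text):
    """Return (number, kind) pairs, where kind is 'mobile' or 'landline'."""
    results = []
    for m in PHONE.finditer(text):
        prefix = m.group('prefix')
        # Assumption: mobile prefixes are four digits starting with 03 (e.g. 0301)
        kind = 'mobile' if len(prefix) == 4 and prefix.startswith('03') else 'landline'
        results.append((m.group(0), kind))
    return results


print(classify_numbers("Please contact me at 0301-1234567 or 042-35678901 for further details."))
# [('0301-1234567', 'mobile'), ('042-35678901', 'landline')]
```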


### Assignment 2: Validating Email Addresses

**Raw Text:**
Validate email addresses according to Pakistani domain extensions (.pk).

**Example:**
```
Text: Contact us at info@example.com or support@domain.pk for assistance.
```



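```python
theText = """
Contact us at info@example.com or support@domain.pk for assistance.
"""

# Use [A-Za-z], not [A-z]: the range A-z also includes [ \ ] ^ _ and the backtick.
# The domain part allows sub-domains such as uni.edu.pk; \b stops a match inside ".pkg".
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.pk\b'

theMatch = re.findall(pattern, theText)


print(theMatch)
```

```
['support@domain.pk']
```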
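`findall` pulls addresses out of running text; to *validate* a single address, the pattern has to cover the whole string. A minimal sketch using `re.fullmatch`, assuming the same simplified address shape as above (it is nowhere near a full RFC 5322 check); `is_pk_email` is a hypothetical helper:

```python
import re

# Deliberately simple shape: local part, @, one or more domain labels, ending in .pk
PK_EMAIL = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.pk')


def is_pk_email(address):
    """True if the whole string looks like an address on a .pk domain."""
    # fullmatch anchors the pattern at both ends, so trailing text is rejected
    return PK_EMAIL.fullmatch(address) is not None


for address in ["support@domain.pk", "info@example.com", "admin@uni.edu.pk", "x@domain.pk.evil.com"]:
    print(address, is_pk_email(address))
```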


### Assignment 3: Extracting CNIC Numbers

**Raw Text:**
Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

**Example:**
```
Text: My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.
```


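```python
theText = """
My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.
"""

# 5 digits, 7 digits, 1 check digit; \b keeps the match from starting or ending inside a longer number
pattern = r'\b\d{5}-\d{7}-\d\b'

theMatch = re.findall(pattern, theText)

print(theMatch)
```

```
['12345-6789012-3', '34567-8901234-5']
```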
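CNICs are also often written without dashes. One way to accept both forms is to capture the three parts separately and require the same separator in both places with a backreference. A sketch, assuming any stand-alone 13-digit run is a CNIC; `CNIC` and `normalize_cnics` are illustrative names:

```python
import re

# (?P=sep) reuses whatever separator matched first ('-' or nothing),
# so mixed forms such as 12345-67890123 are rejected.
# (?<!\d) and (?!\d) stop a match from starting or ending inside a longer digit run.
CNIC = re.compile(
    r'(?<!\d)(?P<area>\d{5})(?P<sep>-?)(?P<serial>\d{7})(?P=sep)(?P<check>\d)(?!\d)'
)


def normalize_cnics(text):
    """Return every CNIC found in text, rewritten in the dashed 5-7-1 form."""
    return [f"{m['area']}-{m['serial']}-{m['check']}" for m in CNIC.finditer(text)]


print(normalize_cnics("My CNIC is 12345-6789012-3 and another one is 3456789012345."))
# ['12345-6789012-3', '34567-8901234-5']
```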



### Assignment 4: Identifying Urdu Words

**Raw Text:**
Identify and extract Urdu words from a mixed English-Urdu text.

**Example:**
```
Text: یہ sentence میں کچھ English words بھی ہیں۔
```



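```python
theText = """
یہ sentence میں کچھ English words بھی ہیں۔
"""

# U+0600 to U+06FF is the Arabic Unicode block, which also holds the extra letters Urdu uses
pattern = r'[\u0600-\u06ff]+'

theMatch = re.findall(pattern, theText)


print(theMatch)
```

```
['یہ', 'میں', 'کچھ', 'بھی', 'ہیں۔']
```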
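The last item still carries the Urdu full stop (U+06D4), because that character sits inside the same U+0600 to U+06FF block the pattern uses. A sketch that keeps the block but skips a few punctuation marks with a negative lookahead; the list of excluded characters is an assumption and may need extending for other texts:

```python
import re

# Arabic-block characters treated as punctuation here (assumed list, extend as needed):
# U+060C comma, U+061B semicolon, U+061F question mark, U+06D4 full stop
URDU_WORD = re.compile(r'(?:(?![\u060c\u061b\u061f\u06d4])[\u0600-\u06ff])+')

text = "یہ sentence میں کچھ English words بھی ہیں\u06d4"
words = URDU_WORD.findall(text)
print(words)
# ['یہ', 'میں', 'کچھ', 'بھی', 'ہیں']
```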


### Assignment 5: Finding Dates

**Raw Text:**
Find and extract dates in the format DD-MM-YYYY from a given text.

**Example:**
```
Text: The event will take place on 15-08-2023 and 23-09-2023.
```



```python
theText = """
The event will take place on 15-08-2023 and 23-09-2023.
"""

# DD-MM-YYYY: exactly two digits each for day and month, four for the year
pattern = r'\b\d{2}-\d{2}-\d{4}\b'

theMatch = re.findall(pattern, theText)


print(theMatch)
```

```
['15-08-2023', '23-09-2023']
```
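The pattern only checks the *shape* of a date, so `31-02-2023` or `15-13-2023` would also match. A minimal sketch that hands each candidate to `datetime.strptime` with the `%d-%m-%Y` format and keeps only real calendar dates; `extract_valid_dates` is a hypothetical helper:

```python
import re
from datetime import datetime

DATE = re.compile(r'\b\d{2}-\d{2}-\d{4}\b')


def extract_valid_dates(text):
    """Return the DD-MM-YYYY dates in text that exist on the calendar, as date objects."""
    valid = []
    for token in DATE.findall(text):
        try:
            valid.append(datetime.strptime(token, '%d-%m-%Y').date())
        except ValueError:
            # Right shape, impossible date (e.g. day 31 in February, month 13)
            pass
    return valid


print(extract_valid_dates("The event will take place on 15-08-2023 and 23-09-2023."))
print(extract_valid_dates("Typos: 31-02-2023 and 15-13-2023."))  # []
```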


### Assignment 6: Extracting URLs

**Raw Text:**
From a given text, extract all URLs that belong to Pakistani (.pk) domains.

**Example:**
```
Text: Visit http://www.example.pk or https://website.com.pk for more information.
```



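```python
theText = """
Visit http://www.example.pk or https://website.com.pk for more information.
"""

# [A-Za-z] instead of [A-z] (which also matches [ \ ] ^ _ and the backtick);
# \b makes sure ".pk" is the end of the host, not the start of something like ".pkg"
pattern = r'https?://[A-Za-z0-9.-]+\.pk\b'

theMatch = re.findall(pattern, theText)


print(theMatch)
```

```
['http://www.example.pk', 'https://website.com.pk']
```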
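The pattern above captures only the scheme and host, so any path after `.pk` is lost. A sketch that grabs whole URLs first and then checks the hostname with `urllib.parse.urlparse`; stripping trailing sentence punctuation is a heuristic, and `pk_urls` is a hypothetical helper:

```python
import re
from urllib.parse import urlparse

# Loosely grab anything that starts like an http(s) URL and runs until whitespace
URL = re.compile(r'https?://[^\s<>"]+')


def pk_urls(text):
    """Return URLs whose hostname ends in .pk, with their paths kept."""
    found = []
    for candidate in URL.findall(text):
        # Heuristic: punctuation directly after a URL usually ends the sentence
        candidate = candidate.rstrip('.,;:!?)')
        host = urlparse(candidate).hostname or ''
        if host.endswith('.pk'):
            found.append(candidate)
    return found


print(pk_urls("Visit http://www.example.pk or https://website.com.pk for more information."))
# ['http://www.example.pk', 'https://website.com.pk']
print(pk_urls("See https://example.com/file.pk and https://uni.edu.pk/admissions."))
# ['https://uni.edu.pk/admissions']
```

Deciding on `hostname` rather than on the raw text means a `.pk` that only appears in a path (the first URL in the second example) is not counted.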


### Assignment 7: Analyzing Currency

**Raw Text:**
Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

**Example:**
```
Text: The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.
```



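```python
theText = """
The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.
"""

# "PKR" or "Rs."/"Rs", an optional space, then the amount with optional
# thousands separators (1,500) and an optional decimal part (1500.50).
# The decimal point is escaped: an unescaped "." matches any character.
pattern = r'(?:PKR|Rs\.?)\s?\d+(?:,\d{3})*(?:\.\d+)?'

theMatch = re.findall(pattern, theText)


print(theMatch)
```

```
['PKR 1500', 'Rs. 2500']
```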
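To *analyze* the amounts, they need to be numbers rather than strings. A sketch that captures only the numeric part in a group (so `findall` returns just that group), removes thousands separators, and converts to `Decimal` to avoid floating-point rounding on money; `rupee_amounts` is a hypothetical helper:

```python
import re
from decimal import Decimal

# Same shape as above, but the amount itself is captured in a group
AMOUNT = re.compile(r'(?:PKR|Rs\.?)\s?(\d+(?:,\d{3})*(?:\.\d+)?)')


def rupee_amounts(text):
    """Return every rupee amount in text as a Decimal, with commas removed."""
    return [Decimal(raw.replace(',', '')) for raw in AMOUNT.findall(text)]


amounts = rupee_amounts("The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.")
print(amounts)                      # [Decimal('1500'), Decimal('2500')]
print(sum(amounts))                 # 4000
print(max(amounts) - min(amounts))  # 1000
```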


### Assignment 8: Removing Punctuation

**Raw Text:**
Remove all punctuation marks from a text while preserving Urdu characters.

**Example:**
```
Text: کیا! آپ, یہاں؟
```



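```python
theText = """
کیا! آپ, یہاں؟
"""

# In Python 3, \w is Unicode-aware for str patterns, so Urdu letters count as word
# characters; [^\w\s] therefore removes only what is neither a letter/digit nor whitespace.
theMatch = re.sub(r'[^\w\s]', '', theText)

print(theMatch)
```

```
کیا آپ یہاں
```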
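One caveat: `\w` does not include Unicode combining marks (category Mn), so Urdu diacritics such as zabar (U+064E) are stripped along with the punctuation. A sketch that instead drops only characters whose Unicode general category is punctuation (`P*`), using `unicodedata.category`; under this definition symbols such as `$` or `+` are kept:

```python
import unicodedata


def strip_punctuation(text):
    """Remove characters whose Unicode general category starts with 'P' (punctuation)."""
    return ''.join(ch for ch in text if not unicodedata.category(ch).startswith('P'))


print(strip_punctuation("کیا! آپ, یہاں؟"))
# کیا آپ یہاں
```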



### Assignment 9: Extracting City Names

**Raw Text:**
Extract names of Pakistani cities from a given text.

**Example:**
```
Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.
```


```python
theText = """
Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.
"""

# Remove the country name first so the capitalised-word pattern below does not report it as a city
theText = re.sub("Pakistan", "", theText)

pattern = '[A-Z][a-z]+'

theMatch = re.findall(pattern, theText)

print(theMatch)
```

```
['Lahore', 'Karachi', 'Islamabad', 'Peshawar']
```
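Deleting "Pakistan" and then taking every capitalised word only works for this sentence: any other capitalised word, such as one that starts a sentence, would be reported as a city. A more robust sketch matches against a list of known names; the `CITIES` list here is a short, hypothetical sample rather than a complete gazetteer:

```python
import re

# Hypothetical sample list; a real program would load a fuller list of city names
CITIES = ['Lahore', 'Karachi', 'Islamabad', 'Peshawar', 'Quetta', 'Multan', 'Dera Ghazi Khan']

# Longest names first so a multi-word name is tried before any shorter name;
# re.escape guards against special characters, \b enforces whole-word matches
CITY_PATTERN = re.compile(
    r'\b(?:' + '|'.join(re.escape(c) for c in sorted(CITIES, key=len, reverse=True)) + r')\b'
)

print(CITY_PATTERN.findall("Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."))
# ['Lahore', 'Karachi', 'Islamabad', 'Peshawar']
print(CITY_PATTERN.findall("The Monday train from Dera Ghazi Khan reached Multan."))
# ['Dera Ghazi Khan', 'Multan']
```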



### Assignment 10: Analyzing Vehicle Numbers

**Raw Text:**
Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

**Example:**
```
Text: I saw a car with the number plate LEA-567 near the market.
```



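```python
theText = """
I saw a car with the number plate LEA-567 near the market.
"""

# Three capital letters, a hyphen, three digits; \b rejects longer runs such as LEA-5678
pattern = r'\b[A-Z]{3}-\d{3}\b'

theMatch = re.findall(pattern, theText)


print(theMatch)
```

```
['LEA-567']
```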
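Number plates may also be written with a space instead of a hyphen, or carry four digits. A sketch that accepts both, assuming two or three capital letters followed by three or four digits (an assumption for illustration, not an official format), and normalises every match to the hyphenated form; `PLATE` and `normalize_plates` are illustrative names:

```python
import re

# Assumed shape: 2-3 capital letters, a hyphen or space, then 3-4 digits
PLATE = re.compile(r'\b(?P<series>[A-Z]{2,3})[- ](?P<number>\d{3,4})\b')


def normalize_plates(text):
    """Return every plate found in text, rewritten as SERIES-NUMBER."""
    return [f"{m['series']}-{m['number']}" for m in PLATE.finditer(text)]


print(normalize_plates("I saw a car with the number plate LEA-567 near the market."))
# ['LEA-567']
print(normalize_plates("Plates LEB 1234 and ABC-12345 were noted."))
# ['LEB-1234']
```

Because the pattern only needs capitals and digits, an ordinary all-caps word followed by a number (for example `TO 2024`) would also match; restricting the letter series to known prefixes would tighten it.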
