# Experiment 5
## Generate Regular Expressions for a given text.

## Regular Expression
A **regular expression** (regex or regexp) is a sequence of characters that defines a search pattern.<br>
It is a powerful tool used in Natural Language Processing (NLP) to search for specific patterns or structures in text data. <br>
Regular expressions are highly expressive and can match many patterns, including numbers, dates, email addresses, and phone numbers. <br>
They are useful for numerous practical day-to-day tasks that a data scientist encounters, such as data pre-processing, rule-based information mining systems, pattern matching, text feature engineering, web scraping, data extraction, etc.<br>
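
As a small illustration of the pattern types mentioned above, the sketch below pulls a date and an email address out of a sentence. The sample sentence and both patterns are assumptions made for this example; they are deliberately simplified and would not cover every real-world date or address format.

```python
import re

# Hypothetical sample sentence, invented for this illustration
sample = 'Contact jane.doe@example.com before 2024-03-15 or call 555-0142.'

# Simplified ISO-style date: four digits, two digits, two digits
date_pattern = r'\b\d{4}-\d{2}-\d{2}\b'

# Simplified email: word characters, dots, plus signs or hyphens,
# then @, then a domain name containing at least one dot
email_pattern = r'[\w.+-]+@[\w-]+\.[\w.-]+'

dates = re.findall(date_pattern, sample)
emails = re.findall(email_pattern, sample)

print(dates)   # ['2024-03-15']
print(emails)  # ['jane.doe@example.com']
```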

Here are some key concepts related to regular expressions in NLP:

- A regular expression is a sequence of characters that is used to find or replace patterns embedded in the text.
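- Regular expressions are used to recognize different strings of characters.
- Raw strings (prefixed with `r`, for example `r'\d+'`) are used for patterns so that Python does not interpret backslashes as string escape sequences before the pattern reaches the regex engine.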
- The `re` module in Python provides functions for working with regular expressions.
- The `re.findall()` function is used to search for all occurrences that match a given pattern.
- The `re.sub()` function replaces every match of the pattern with the given text.
- The `re.match()` function checks whether the pattern matches at the beginning of a string, optionally using flags such as `re.IGNORECASE`; it returns a match object, or `None` if there is no match.
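
The functions listed above can be tried out with a short sketch like the following. The sample sentence and variable names are assumptions chosen for illustration only.

```python
import re

# Hypothetical sample string, invented for this illustration
sentence = 'Order 66 shipped on day 12'

# A raw string keeps the backslash in \d intact for the regex engine
digits = r'\d+'

# re.findall returns every non-overlapping match as a list of strings
numbers = re.findall(digits, sentence)

# re.sub replaces each match with the given replacement text
masked = re.sub(digits, '#', sentence)

# re.match only tries to match at the start of the string
at_start = re.match(r'Order', sentence)
not_at_start = re.match(r'shipped', sentence)

print(numbers)               # ['66', '12']
print(masked)                # Order # shipped on day #
print(at_start is not None)  # True
print(not_at_start is None)  # True
```

Because `re.match()` anchors at the start of the string, `re.search()` is the usual choice when the pattern may appear anywhere in the text.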

## Explanation Of The Code

This code defines a function called `generate_regex` that takes a text input, escapes special characters, and generates a regular expression pattern based on the input text. Here's a breakdown of the code:

### Importing the Required Library
```python
import re
```
This line imports the `re` module, which stands for regular expressions, and will be used for working with regular expressions in the code.

### Function for Generating Regular Expression
```python
def generate_regex(text):
    regex = re.escape(text)
    regex = regex.replace(r'\ ', r'\s+')
    return regex
```
This function, `generate_regex`, takes a text input and performs the following steps:

1. `re.escape(text)`: This function escapes special characters in the input text, ensuring that they are treated as literal characters in the regular expression.

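2. `regex.replace(r'\ ', r'\s+')`: This line replaces each escaped space character (`\ `) with `\s+`, where `\s` represents any whitespace character and `+` means one or more occurrences. The resulting pattern matches any run of whitespace (spaces, tabs, newlines) wherever the input had a single space. Each space is replaced separately, so two consecutive spaces in the input become `\s+\s+`, which requires at least two whitespace characters.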

3. The final regular expression is returned.
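
To make the two steps above concrete, the sketch below prints the intermediate and final patterns for an input that contains regex metacharacters. The input string is a hypothetical example, not part of the original experiment.

```python
import re

# Hypothetical input containing regex metacharacters and spaces
text = 'Price (USD) is $5.00'

# Step 1: escape metacharacters so they match literally
escaped = re.escape(text)

# Step 2: turn each escaped space into "one or more whitespace characters"
pattern = escaped.replace(r'\ ', r'\s+')

print(escaped)  # Price\ \(USD\)\ is\ \$5\.00
print(pattern)  # Price\s+\(USD\)\s+is\s+\$5\.00
```

The parentheses, dollar sign, and dot stay escaped after the second step, so they are matched as literal characters rather than as grouping, end-of-string, or wildcard operators.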

### Main Section
```python
if __name__ == '__main__':
    text = 'This is a sample text'
    regex = generate_regex(text)
    print(f'Text is: {text}')
    print(f'Generated Regular Expression is: {regex}')
```
The main section of the code initializes a sample text, calls the `generate_regex` function to create a regular expression based on the text, and then prints both the original text and the generated regular expression.

### Explanation
The purpose of this code is to create a regular expression pattern that matches the input text even when that text contains special characters or when the spacing between words varies. Escaping makes special characters match literally, and replacing spaces with `\s+` tolerates variations in whitespace. The main section then demonstrates the function with a sample text.
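
One way to check that the generated pattern behaves as described is to test it against a few variants of the sample text. This is a minimal sketch: the candidate strings are assumptions made up for the example, and `re.fullmatch` is used here as one reasonable choice for requiring the whole string to match.

```python
import re

def generate_regex(text):
    # Same logic as the function described above
    regex = re.escape(text)
    return regex.replace(r'\ ', r'\s+')

pattern = generate_regex('This is a sample text')

# Hypothetical candidate strings with different spacing
candidates = [
    'This is a sample text',        # identical to the input
    'This   is a\tsample\ntext',    # extra spaces, a tab, and a newline
    'Thisis a sample text',         # missing whitespace between words
]

# True where the whole candidate matches the generated pattern
results = [bool(re.fullmatch(pattern, c)) for c in candidates]
print(results)  # [True, True, False]
```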

```python
# Importing the required library
import re

# Function for Generating Regular Expression
def generate_regex(text):
    regex = re.escape(text)

    regex = regex.replace(r'\ ', r'\s+')

    # Returning Regular Expression
    return regex

if __name__ == '__main__':

    # Initializing Text
    text = 'This is a sample text'

    # Generating Regular Expression by calling the function
    regex = generate_regex(text)

    # Printing Text
    print(f'Text is: {text}')

    # Printing Regular Expression
    print(f'Generated Regular Expression is: {regex}')
```

```
Text is: This is a sample text
Generated Regular Expression is: This\s+is\s+a\s+sample\s+text
```
