# Regular Expression

Imagine you have a big pile of documents, and you need to find all the phone numbers in them.  You could read every single word, but that would take forever! Instead, you can use a "search pattern" to find all the phone numbers quickly.

That "search pattern" is what a regular expression (regex) is.

Think of it like a super-powered search tool that understands patterns, not just exact words.

Here's a simple analogy:

* Normal search: If you search for "cat," you'll only find the word "cat."
* Regex search: You can create a pattern that says, "Find anything that looks like a phone number: three digits, then a hyphen, then three more digits, then another hyphen, and then four digits."
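The phone-number pattern described above can be written down almost word for word. The following is a minimal sketch; the sample text is made up for illustration and does not come from any real document.

```python
import re

# Three digits, a hyphen, three digits, a hyphen, four digits.
phone_pattern = r"\d{3}-\d{3}-\d{4}"

# Hypothetical sample text, invented for this illustration.
sample_text = "Call 555-123-4567 or 555-987-6543. Our cat is not a phone number."

# findall returns every non-overlapping match, in order of appearance.
phone_numbers = re.findall(phone_pattern, sample_text)
print(phone_numbers)
```

A normal search for "555-123-4567" would find only that one number; the pattern finds every piece of text with the same shape.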





# Text Extraction and Parsing

## A. Extract product prices

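The regex pattern r"\$\d+(?:\.\d{2})?" matches a literal dollar sign, one or more digits, and optionally a decimal point followed by exactly two digits. This lets the function extract both whole dollar amounts ($10) and amounts with cents ($10.99) from HTML content. Prices with thousands separators are matched only up to the comma: $1,200 would be extracted as $1.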

Business Use Cases:

* E-commerce Price Monitoring: Competitor price tracking, dynamic pricing adjustments, identifying pricing errors.
* Retail Analytics: Analyzing product pricing trends across various online stores, understanding consumer price sensitivity.
* Financial Data Aggregation: Gathering stock prices, commodity prices, or real estate values from web pages.
* Travel Industry: Extracting hotel or flight prices from travel websites.

In [26]:
import re

def extract_product_price(html_content):
    """
    Extracts product prices from HTML content, including whole dollar amounts and amounts with cents.
    """
    price_pattern = r"\$\d+(?:\.\d{2})?"  # Matches dollar amounts (e.g., $10, $10.99)
    prices = re.findall(price_pattern, html_content)
    return prices

In [27]:
# Detailed Example:
html_text = """
<p>Product A: $10</p>
<p>Product B: $19.99</p>
<p>Product C: $100</p>
<p>Product D: $5.50</p>
<p>Product E: $0.99</p>
<p>Product F: $12345</p>
<p>Product G: $123.45</p>
<p>Product H: $12.34</p>
<p>Product I: This is not a price: $abc</p>
<p>Product J: This is not a price: 10.99$</p>
"""

extracted_prices = extract_product_price(html_text)
print("Extracted Prices:", extracted_prices)

Extracted Prices: ['$10', '$19.99', '$100', '$5.50', '$0.99', '$12345', '$123.45', '$12.34']
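Extracted prices are still strings, and the pattern above stops at a thousands separator. One possible extension, shown here as a sketch rather than a definitive implementation, accepts comma groups and converts each match to a Decimal. The function name extract_price_values and the sample HTML are assumptions made for this example.

```python
import re
from decimal import Decimal

# Same idea as the original pattern, plus optional ",ddd" groups.
# Note: this is a loose check; it does not verify that commas are placed correctly.
PRICE_WITH_COMMAS = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")

def extract_price_values(html_content):
    """Return prices as Decimal values (hypothetical helper for illustration)."""
    values = []
    for raw in PRICE_WITH_COMMAS.findall(html_content):
        # Drop the dollar sign and the separators before converting.
        values.append(Decimal(raw.lstrip("$").replace(",", "")))
    return values

# Hypothetical sample HTML, invented for this illustration.
sample_html = "<p>Laptop: $1,299.99</p><p>Mouse: $25</p><p>Desk: $12345</p>"
price_values = extract_price_values(sample_html)
print(price_values)
```

Decimal is used instead of float so that amounts such as 1299.99 keep their exact value when they are summed or compared.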


## B. Extract Contact Details

Example Breakdown:

* The html_text string contains various email addresses and phone numbers, including valid and invalid examples.
* The extract_contact_details function uses the regex patterns to find and extract the valid email addresses and phone numbers.
* The contact_details dictionary will contain the extracted information.
* Invalid email addresses and phone numbers that do not match the patterns are ignored.
* The example also shows that the code will extract emails with subdomains, and the + symbol in the username.
* The international phone number +1-555-123-4567 is not extracted in full. The pattern only describes the 10-digit US format, so it matches the trailing 555-123-4567 and drops the +1 country code.

Business Use Cases:

* Lead Generation: Scraping contact information from company websites, online directories, or social media.
* Customer Relationship Management (CRM): Extracting customer contact details from email signatures or online forms.
* Market Research: Gathering contact information for potential survey participants or focus group attendees.
* Recruiting: Extracting contact details from resumes or LinkedIn profiles.

In [28]:
import re

def extract_contact_details(html_content):
    """
    Extracts contact details (emails, phone numbers) from HTML content.
    """
    email_pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
    phone_pattern = r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"  # Basic US format
    emails = re.findall(email_pattern, html_content)
    phones = re.findall(phone_pattern, html_content)
    return {"emails": emails, "phones": phones}

In [29]:
# Detailed Example:

html_text = """
<p>Contact us:</p>
<p>Email: info@example.com or support@company.net</p>
<p>Phone: 123-456-7890, (555) 123-4567, or 987.654.3210</p>
<p>Alternative email: user.name+tag@subdomain.example.co.uk</p>
<p>Invalid email: invalid.email</p>
<p>Invalid phone: 1234567</p>
<p>International phone: +1-555-123-4567</p>
"""

contact_details = extract_contact_details(html_text)
print("Extracted Contact Details:", contact_details)

Extracted Contact Details: {'emails': ['info@example.com', 'support@company.net', 'user.name+tag@subdomain.example.co.uk'], 'phones': ['123-456-7890', '(555) 123-4567', '987.654.3210', '555-123-4567']}
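In the output above, '555-123-4567' is the tail of the international number +1-555-123-4567. If partial matches like this are unwanted, one option is to add lookarounds so that a match cannot start right after a digit, "+", or "-", and cannot be followed by another digit. This is a sketch under that assumption; the name extract_us_phones is hypothetical.

```python
import re

STRICT_PHONE_PATTERN = re.compile(
    r"(?<![\d+-])"                            # not preceded by a digit, "+", or "-"
    r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"    # same basic US format as before
    r"(?!\d)"                                 # not followed by another digit
)

def extract_us_phones(text):
    """Return only phone numbers that stand on their own (hypothetical helper)."""
    return STRICT_PHONE_PATTERN.findall(text)

sample = (
    "Phone: 123-456-7890, (555) 123-4567, or 987.654.3210. "
    "International phone: +1-555-123-4567"
)
phones = extract_us_phones(sample)
print(phones)
```

The trade-off is that a US number written directly after a hyphen is now skipped as well, so the lookbehind set should be adjusted to the data at hand.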


## C. Extract article content

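The function extracts text by finding every paragraph tag and concatenating its text, which drops the heading and strips inline tags such as strong. Because it collects all paragraphs in the document, text outside the article (here, the paragraph in the "other-content" div) is included too, and line breaks and indentation inside a paragraph are kept as written.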

Business Use Cases:

* Content Aggregation: Gathering articles from news websites, blogs, or industry publications for content marketing or research.
* Sentiment Analysis: Extracting text content for analyzing customer reviews, social media posts, or news articles.
* Knowledge Management: Building a database of articles or documents for internal search or analysis.
* SEO Monitoring: Extracting article content to analyze keyword density and content quality for SEO purposes.

In [30]:
from bs4 import BeautifulSoup

def extract_article_content(html_content):
    """Extracts article content from HTML (basic example)."""
    soup = BeautifulSoup(html_content, "html.parser")
    article_text = ""
    for paragraph in soup.find_all("p"):  # Assumes article content is in <p> tags
        article_text += paragraph.get_text() + " "
    return article_text.strip()

In [31]:
# Detailed Example:
html_text = """
<html>
<head>
    <title>My Article</title>
</head>
<body>
    <div class="article">
        <h1>Article Title</h1>
        <p>This is the first paragraph of the article. It contains some text.</p>
        <p>Here is the second paragraph. It has more details and information.</p>
        <p>This paragraph contains a <strong>strong</strong> word.</p>
        <div>
            <p>This paragraph is nested inside a div.</p>
        </div>
        <p>And this is the final paragraph of the article.</p>
        <p>
            This paragraph has multiple lines.
            It spans across several lines.
        </p>
    </div>
    <div class="other-content">
        <p>This is some other content we don't need.</p>
    </div>
</body>
</html>
"""

extracted_article = extract_article_content(html_text)
print("Extracted Article Content:", extracted_article)

Extracted Article Content: This is the first paragraph of the article. It contains some text. Here is the second paragraph. It has more details and information. This paragraph contains a strong word. This paragraph is nested inside a div. And this is the final paragraph of the article. 
            This paragraph has multiple lines.
            It spans across several lines.
         This is some other content we don't need.
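The extracted text above keeps the line breaks and indentation of the multi-line paragraph. A small regex can collapse them; the following is a sketch, with normalize_whitespace as a hypothetical helper name and the input string copied from the tail of the output above.

```python
import re

def normalize_whitespace(text):
    """Collapse every run of spaces, tabs, and newlines into a single space."""
    return re.sub(r"\s+", " ", text).strip()

raw_text = (
    "And this is the final paragraph of the article. \n"
    "            This paragraph has multiple lines.\n"
    "            It spans across several lines.\n"
    "         This is some other content we don't need."
)
clean_text = normalize_whitespace(raw_text)
print(clean_text)
```

Applying this to the return value of extract_article_content would produce a single line of text that is easier to feed into later analysis steps.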


## D. Extract XML data

Business Use Cases:

* Data Integration: Extracting data from XML feeds or APIs for integration with internal systems.
* Financial Reporting: Parsing XML-based financial reports (e.g., XBRL) for analysis.
* E-commerce Data Exchange: Extracting product information from XML-based product catalogs.
* Configuration Management: Parsing XML configuration files.

In [32]:
import re

def extract_xml_data(xml_content, tag_name):
    """Extracts data from XML based on a tag name."""
    tag_pattern = rf"<{tag_name}>(.*?)</{tag_name}>"
    matches = re.findall(tag_pattern, xml_content, re.DOTALL)  # re.DOTALL lets "." match newlines, so multi-line values are captured
    return matches

In [33]:
# Detailed Example:
xml_data = """
<data>
    <product>
        <name>Laptop</name>
        <price>$1200</price>
        <description>
            A powerful laptop with high-performance components.
            It is suitable for gaming and professional use.
        </description>
    </product>
    <customer>
        <email>customer@example.com</email>
        <phone>555-123-4567</phone>
    </customer>
    <article>
        <content>
            This is a multi-line
            article.
            It contains several lines of text.
        </content>
    </article>
</data>
"""

product_names = extract_xml_data(xml_data, "name")
product_prices = extract_xml_data(xml_data, "price")
product_descriptions = extract_xml_data(xml_data, "description")
customer_emails = extract_xml_data(xml_data, "email")
article_contents = extract_xml_data(xml_data, "content")

print("Product Names:", product_names)
print("Product Prices:", product_prices)
print("Product Descriptions:", product_descriptions)
print("Customer Emails:", customer_emails)
print("Article Contents:", article_contents)

Product Names: ['Laptop']
Product Prices: ['$1200']
Product Descriptions: ['\n            A powerful laptop with high-performance components.\n            It is suitable for gaming and professional use.\n        ']
Customer Emails: ['customer@example.com']
Article Contents: ['\n            This is a multi-line\n            article.\n            It contains several lines of text.\n        ']
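The regex approach above works for simple tags, but the pattern looks for the exact opening tag, so an element with attributes would not match. As an alternative sketch, the standard library's xml.etree.ElementTree parses the document and handles attributes. The function name extract_xml_text and the sample XML (with attributes added) are assumptions for this example.

```python
import xml.etree.ElementTree as ET

def extract_xml_text(xml_content, tag_name):
    """Return the stripped text of every element named tag_name (hypothetical helper)."""
    root = ET.fromstring(xml_content.strip())
    # iter() walks the whole tree, so nesting depth does not matter.
    return [(element.text or "").strip() for element in root.iter(tag_name)]

# Hypothetical sample: same shape as the data above, with attributes added.
sample_xml = """
<data>
    <product id="p1">
        <name>Laptop</name>
        <price currency="USD">1200</price>
    </product>
</data>
"""

names = extract_xml_text(sample_xml, "name")
prices = extract_xml_text(sample_xml, "price")
print(names, prices)
```

A parser also raises an error on malformed XML, whereas a regex silently returns whatever happens to match.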


## E. Extract log info

Example Breakdown:

* The log_data list contains sample log lines with different formats and content.
* The loop iterates through each log_line in log_data.
* For each line, extract_log_info() is called.
* If the line matches the pattern, the extracted information is printed as a dictionary.
* If the line doesn't match the pattern (e.g., "Invalid log line..."), a message indicating the parsing failure is printed.
* The example shows how the regex correctly extracts the timestamp, level and message from the different log lines.
* The example also shows how the code deals with invalid log lines.
* The example shows how the regex captures the message even if it contains colons, dots, or other symbols.

Business Use Cases:

* System Monitoring: Extracting timestamps, error messages, and performance metrics from server logs.
* Security Auditing: Analyzing security logs for suspicious activity or unauthorized access.
* Application Debugging: Extracting error messages and stack traces for troubleshooting software issues.
* Business Intelligence: Analyzing log data for user behavior patterns or system performance trends.

In [1]:
import re

def extract_log_info(log_line):
    """
    Extracts timestamp, log level, and message from a log line.
    """
    log_pattern = r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<message>.*)$"
    match = re.match(log_pattern, log_line)
    if match:
        return match.groupdict()
    else:
        return None

In [2]:
# Detailed Example:
log_data = [
    "2023-10-27 10:00:00 [INFO] Application started successfully.",
    "2023-10-27 10:05:15 [WARNING] Network connection timed out.",
    "2023-10-27 10:10:30 [ERROR] Database query failed: SELECT * FROM users;",
    "2023-10-27 10:15:45 [DEBUG] User 'john.doe' logged in from IP 192.168.1.100.",
    "2023-10-27 10:20:00 [CRITICAL] System encountered an unexpected error and will shut down.",
    "2023-10-27 10:25:15 [INFO] Data processing completed in 5 seconds.",
    "2023-10-27 10:30:30 [DEBUG] Received request: GET /api/data?id=123",
    "Invalid log line - missing timestamp",
    "2023-10-27 10:35:00 [INFO] Log line with no message"
]

In [3]:
# Example Usage:
for log_line in log_data:
    log_info = extract_log_info(log_line)
    if log_info:
        print("Extracted Log Info:", log_info)
    else:
        print(f"Failed to parse log line: '{log_line}'")

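Extracted Log Info: {'timestamp': '2023-10-27 10:00:00', 'level': 'INFO', 'message': 'Application started successfully.'}
Extracted Log Info: {'timestamp': '2023-10-27 10:05:15', 'level': 'WARNING', 'message': 'Network connection timed out.'}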
Extracted Log Info: {'timestamp': '2023-10-27 10:10:30', 'level': 'ERROR', 'message': 'Database query failed: SELECT * FROM users;'}
Extracted Log Info: {'timestamp': '2023-10-27 10:15:45', 'level': 'DEBUG', 'message': "User 'john.doe' logged in from IP 192.168.1.100."}
Extracted Log Info: {'timestamp': '2023-10-27 10:20:00', 'level': 'CRITICAL', 'message': 'System encountered an unexpected error and will shut down.'}
Extracted Log Info: {'timestamp': '2023-10-27 10:25:15', 'level': 'INFO', 'message': 'Data processing completed in 5 seconds.'}
Extracted Log Info: {'timestamp': '2023-10-27 10:30:30', 'level': 'DEBUG', 'message': 'Received request: GET /api/data?id=123'}
Failed to parse log line: 'Invalid log line - missing timestamp'
Extracted Log Info: {'timestamp': '2023-10-27 10:35:00', 'level': 'INFO', 'message': 'Log line with no message'}
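Once each line is split into named groups, the fields can be aggregated. As a sketch of one possible next step, the following counts log lines per level using the same pattern; count_log_levels is a hypothetical helper and the sample lines are a subset of log_data.

```python
import re
from collections import Counter

LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] (?P<message>.*)$"
)

def count_log_levels(log_lines):
    """Count parsed lines per log level; unparseable lines are skipped."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[match.group("level")] += 1
    return counts

sample_logs = [
    "2023-10-27 10:00:00 [INFO] Application started successfully.",
    "2023-10-27 10:05:15 [WARNING] Network connection timed out.",
    "2023-10-27 10:25:15 [INFO] Data processing completed in 5 seconds.",
    "Invalid log line - missing timestamp",
]
level_counts = count_log_levels(sample_logs)
print(level_counts)
```

Compiling the pattern once with re.compile avoids looking it up again for every line, which can help on large log files.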


## F. Find Log error pattern

Example Breakdown:

Log Data:

The log_data list contains sample log lines with different log levels and messages.

Finding Error Logs:

* The find_error_patterns() function is called with log_data.
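* The regex r"ERROR|CRITICAL|FAILURE" is used to find lines containing any of these keywords. The match can occur anywhere in the line, not only in the bracketed level field.
* The re.IGNORECASE flag ensures that the search is case-insensitive, which is why "[Failure]" and the lowercase "error" in the [info] line both match.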
* The list comprehension filters the log_data list, keeping only the lines that match the error pattern.

Output:

The error_logs list will contain the following lines:

* "2023-10-27 10:10:30 [ERROR] Database query failed: SELECT * FROM users;"
* "2023-10-27 10:20:00 [CRITICAL] System encountered an unexpected error and will shut down."
* "2023-10-27 10:40:00 [Failure] Job failed due to memory error"
* "2023-10-27 10:45:00 [info] This line has an error, but in lowercase"

In essence, the function efficiently extracts all log lines that are likely related to errors, simplifying the process of error analysis from large log files.

Business Use Cases:

* Proactive System Maintenance: Quickly identifying and addressing critical errors in server logs.
* Incident Response: Filtering log data to isolate error messages during a system outage or security incident.
* Root Cause Analysis: Analyzing error logs to determine the underlying cause of system failures.

In [4]:
import re

def find_error_patterns(log_data):
    """
    Finds log entries matching error patterns.
    """
    error_pattern = r"ERROR|CRITICAL|FAILURE"
    errors = [line for line in log_data if re.search(error_pattern, line, re.IGNORECASE)]
    return errors

In [9]:
# Detailed Example:
log_data = [
    "2023-10-27 10:00:00 [INFO] Application started successfully.",
    "2023-10-27 10:05:15 [WARNING] Connection timeout.",
    "2023-10-27 10:10:30 [ERROR] Database query failed: SELECT * FROM users;",
    "2023-10-27 10:15:45 [INFO] User login successful.",
    "2023-10-27 10:20:00 [CRITICAL] System encountered an unexpected error and will shut down.",
    "2023-10-27 10:25:15 [INFO] Data processing completed in 5 seconds.",
    "2023-10-27 10:30:30 [DEBUG] Unusual activity detected: user login from unknown IP.",
    "2023-10-27 10:35:00 [INFO] Log line with no message",
    "2023-10-27 10:40:00 [Failure] Job failed due to memory error",
    "2023-10-27 10:45:00 [info] This line has an error, but in lowercase"
]

# Example Usage:
error_logs = find_error_patterns(log_data)
print("Error Logs:", error_logs)

Error Logs: ['2023-10-27 10:10:30 [ERROR] Database query failed: SELECT * FROM users;', '2023-10-27 10:20:00 [CRITICAL] System encountered an unexpected error and will shut down.', '2023-10-27 10:40:00 [Failure] Job failed due to memory error', '2023-10-27 10:45:00 [info] This line has an error, but in lowercase']
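In the output above, the [info] line is reported only because its message contains the word "error". If the intent is to filter by log level, one option is to match the keyword inside the square brackets. This is a sketch of that variant; find_error_levels is a hypothetical name.

```python
import re

# Match the keyword only when it is the bracketed level field.
LEVEL_ERROR_PATTERN = re.compile(r"\[(?:ERROR|CRITICAL|FAILURE)\]", re.IGNORECASE)

def find_error_levels(log_lines):
    """Keep lines whose level is ERROR, CRITICAL, or FAILURE (any case)."""
    return [line for line in log_lines if LEVEL_ERROR_PATTERN.search(line)]

sample_logs = [
    "2023-10-27 10:10:30 [ERROR] Database query failed: SELECT * FROM users;",
    "2023-10-27 10:40:00 [Failure] Job failed due to memory error",
    "2023-10-27 10:45:00 [info] This line has an error, but in lowercase",
]
error_level_logs = find_error_levels(sample_logs)
print(error_level_logs)
```

Which variant is preferable depends on the goal: the original pattern casts a wider net, while this one avoids false positives from message text.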


## G. Find Anomalies

Example Breakdown:

Log Data:

* The log_data list contains sample log lines with various log levels and messages.

Example Usage 1: Finding "timeout" Logs:

* anomaly_pattern1 = r"timeout": The regex searches for the text "timeout" anywhere in the line. The search is case-insensitive because find_anomalies passes re.IGNORECASE.
* The timeout_logs list will contain:

"2023-10-27 10:05:15 [WARNING] Connection timeout."

Example Usage 2: Finding "CPU" or "Disk" Logs:

* anomaly_pattern2 = r"CPU|Disk": The regex uses the "or" operator (|) to find lines containing either "CPU" or "Disk".

The resource_logs list will contain:

* "2023-10-27 10:35:00 [INFO] High CPU usage detected."
* "2023-10-27 10:50:00 [warning] Disk space nearing capacity."

Example Usage 3: Finding IP Address Logs:

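* anomaly_pattern3 = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}": The regex searches for the dotted IPv4 address format (e.g., 192.168.1.100). It checks only the shape, so out-of-range values such as 999.999.999.999 also match.
* The ip_logs list will be empty. The 10:30:30 line mentions an "unknown IP" but contains no actual address, and no other line contains one either.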

Example Usage 4: Finding "failed" Logs:

* anomaly_pattern4 = r"failed": The regex searches for the text "failed" anywhere in the line (case-insensitive).
* The failed_logs list will contain:

"2023-10-27 10:10:30 [ERROR] Database query failed: SELECT * FROM users;"

"2023-10-27 10:40:00 [Failure] Job failed due to memory error."

In essence, the function allows you to define custom regex patterns to find any anomalies or specific events within your log data, providing a powerful way to filter and analyze log files.

Business Use Cases:

* Fraud Detection: Identifying unusual patterns in transaction logs or user activity logs.
* Network Security: Detecting suspicious network traffic patterns in firewall logs.
* Predictive Maintenance: Identifying anomalies in sensor data or machine logs that may indicate impending equipment failures.
* Business Process Monitoring: Discovering deviations from normal business process flows in application logs.

In [10]:
import re

def find_anomalies(log_data, anomaly_pattern):
    """
    Finds log entries matching a given anomaly pattern.
    """
    anomalies = [line for line in log_data if re.search(anomaly_pattern, line, re.IGNORECASE)]
    return anomalies

In [11]:
# Detailed Example:

log_data = [
    "2023-10-27 10:00:00 [INFO] Application started successfully.",
    "2023-10-27 10:05:15 [WARNING] Connection timeout.",
    "2023-10-27 10:10:30 [ERROR] Database query failed: SELECT * FROM users;",
    "2023-10-27 10:15:45 [INFO] User login successful.",
    "2023-10-27 10:20:00 [CRITICAL] System encountered an unexpected error and will shut down.",
    "2023-10-27 10:25:15 [INFO] Data processing completed in 5 seconds.",
    "2023-10-27 10:30:30 [DEBUG] Unusual activity detected: user login from unknown IP.",
    "2023-10-27 10:35:00 [INFO] High CPU usage detected.",
    "2023-10-27 10:40:00 [Failure] Job failed due to memory error.",
    "2023-10-27 10:45:00 [info] This line has an error, but in lowercase.",
    "2023-10-27 10:50:00 [warning] Disk space nearing capacity."
]

In [12]:
# Example Usage 1: Finding lines with "timeout"

anomaly_pattern1 = r"timeout"
timeout_logs = find_anomalies(log_data, anomaly_pattern1)
print("Timeout Logs:", timeout_logs)

Timeout Logs: ['2023-10-27 10:05:15 [WARNING] Connection timeout.']

In [13]:
# Example Usage 2: Finding lines with "CPU" or "Disk"

anomaly_pattern2 = r"CPU|Disk"
resource_logs = find_anomalies(log_data, anomaly_pattern2)
print("\nResource Logs:", resource_logs)


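Resource Logs: ['2023-10-27 10:35:00 [INFO] High CPU usage detected.', '2023-10-27 10:50:00 [warning] Disk space nearing capacity.']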

In [14]:
# Example Usage 3: Finding lines with an IP address (potential security anomaly)

anomaly_pattern3 = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
ip_logs = find_anomalies(log_data, anomaly_pattern3)
print("\nIP Address Logs:", ip_logs)


IP Address Logs: []


In [15]:
# Example Usage 4: Finding lines with "failed"

anomaly_pattern4 = r"failed"
failed_logs = find_anomalies(log_data, anomaly_pattern4)
print("\nFailed Logs:", failed_logs)


Failed Logs: ['2023-10-27 10:10:30 [ERROR] Database query failed: SELECT * FROM users;', '2023-10-27 10:40:00 [Failure] Job failed due to memory error.']
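The IP address pattern returns an empty list for this log_data because no line contains an address. To show the pattern at work, the sketch below runs it on a line that does contain one (taken from the log extraction example earlier) and then validates each candidate with the standard library's ipaddress module, since the regex checks only the shape. The name extract_valid_ips and the second sample line are assumptions for this example.

```python
import ipaddress
import re

# Four dot-separated groups of 1 to 3 digits, bounded by word boundaries.
IP_SHAPE_PATTERN = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def extract_valid_ips(line):
    """Return IPv4-shaped substrings that are also valid addresses."""
    valid = []
    for candidate in IP_SHAPE_PATTERN.findall(line):
        try:
            ipaddress.IPv4Address(candidate)  # raises ValueError for octets above 255
        except ValueError:
            continue
        valid.append(candidate)
    return valid

with_ip = extract_valid_ips("User 'john.doe' logged in from IP 192.168.1.100.")
out_of_range = extract_valid_ips("Rejected packet from 999.10.10.10")  # hypothetical line
no_ip = extract_valid_ips("Unusual activity detected: user login from unknown IP.")
print(with_ip, out_of_range, no_ip)
```

Splitting the work this way keeps the regex simple and leaves the range check to a module written for that purpose.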


## H. Extract data from CSV

Example Breakdown:

CSV Data:

The csv_data string contains sample CSV data with columns for "Name," "Age," "City," and "Occupation." It also contains rows with missing values to demonstrate how the code handles them. The function does not skip the header row, so each result starts with the column title.

Extracting Names:

* names = extract_data_from_csv(csv_data, 0): Extracts the values from the first column (index 0), which are the names.
* names will be ['Name', 'Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank'].

Extracting Ages:

* ages = extract_data_from_csv(csv_data, 1): Extracts the values from the second column (index 1), which are the ages.
* ages will be ['Age', '30', '25', '35', '40', '', '22'].
* Note that 'Eva' has an empty string for age.

Extracting Cities:

* cities = extract_data_from_csv(csv_data, 2): Extracts the values from the third column (index 2), which are the cities.
* cities will be ['City', 'New York', 'London', 'Paris', 'Tokyo', 'Berlin', ''].
* Note that 'Frank' has an empty string for city.

Extracting Occupations:

* occupations = extract_data_from_csv(csv_data, 3): Extracts the values from the fourth column (index 3), which are the occupations.
* occupations will be ['Occupation', 'Engineer', 'Teacher', 'Artist', 'Manager', 'Developer', 'Student'].

Example with Invalid Column Index:

* invalid_column = extract_data_from_csv(csv_data, 4): The CSV data only has columns with indexes 0 to 3, so row[4] raises an IndexError for every row. The function catches it in its try/except block, prints a message naming the row, and appends an empty string as a default. The call therefore returns a list of empty strings instead of failing.
* Handling the error inside the function keeps one short or malformed row from aborting the whole extraction.

In essence, the function provides a simple way to extract data from a specific column in a CSV string, handling basic CSV parsing and skipping empty rows.

Business Use Cases:

* Data Migration: Extracting data from CSV files for importing into databases or other systems.
* Data Analysis: Parsing CSV data for creating reports, charts, or visualizations.
* Financial Reporting: Extracting financial data from CSV-based transaction logs.
* Customer Data Management: Importing customer data from CSV files into CRM systems.

In [19]:
import csv

def extract_data_from_csv(csv_content, column_index):
    """Extracts data from a specific column in a CSV string."""
    reader = csv.reader(csv_content.splitlines())
    data = []
    for row in reader:
        if row:  # Skip empty rows
            try:
                data.append(row[column_index])
            except IndexError:
                # Handle the error, e.g., print a message or append a default value
                print(f"Column index {column_index} out of bounds for row: {row}")
                data.append('')  # Append an empty string as a default
    return data

In [20]:
# Detailed Example:

csv_data = """
Name,Age,City,Occupation
Alice,30,New York,Engineer
Bob,25,London,Teacher
Charlie,35,Paris,Artist
David,40,Tokyo,Manager
Eva,,Berlin,Developer
Frank,22,,Student
"""

In [21]:
# Extract names (column 0)
names = extract_data_from_csv(csv_data, 0)
print("Names:", names)

# Extract ages (column 1)
ages = extract_data_from_csv(csv_data, 1)
print("Ages:", ages)

# Extract cities (column 2)
cities = extract_data_from_csv(csv_data, 2)
print("Cities:", cities)

# Extract occupations (column 3)
occupations = extract_data_from_csv(csv_data, 3)
print("Occupations:", occupations)

# Example with an out-of-bounds column index:
invalid_column = extract_data_from_csv(csv_data, 4) # The IndexError is caught inside the function; it prints a message per row and returns empty strings
print("Invalid column", invalid_column)

Names: ['Name', 'Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank']
Ages: ['Age', '30', '25', '35', '40', '', '22']
Cities: ['City', 'New York', 'London', 'Paris', 'Tokyo', 'Berlin', '']
Occupations: ['Occupation', 'Engineer', 'Teacher', 'Artist', 'Manager', 'Developer', 'Student']
Column index 4 out of bounds for row: ['Name', 'Age', 'City', 'Occupation']
Column index 4 out of bounds for row: ['Alice', '30', 'New York', 'Engineer']
Column index 4 out of bounds for row: ['Bob', '25', 'London', 'Teacher']
Column index 4 out of bounds for row: ['Charlie', '35', 'Paris', 'Artist']
Column index 4 out of bounds for row: ['David', '40', 'Tokyo', 'Manager']
Column index 4 out of bounds for row: ['Eva', '', 'Berlin', 'Developer']
Column index 4 out of bounds for row: ['Frank', '22', '', 'Student']
Invalid column ['', '', '', '', '', '', '']
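The output above includes the header row ('Name', 'Age', and so on) in every column. If the header should act as column names rather than data, csv.DictReader from the standard library is one option. This is a sketch under that assumption; extract_column_by_name is a hypothetical helper and the sample is a shortened copy of csv_data.

```python
import csv

def extract_column_by_name(csv_content, column_name):
    """Return the values of one column, addressed by its header name."""
    # strip() removes the leading blank line so the header is the first row.
    reader = csv.DictReader(csv_content.strip().splitlines())
    return [row[column_name] for row in reader]

sample_csv = """
Name,Age,City,Occupation
Alice,30,New York,Engineer
Eva,,Berlin,Developer
Frank,22,,Student
"""

names = extract_column_by_name(sample_csv, "Name")
ages = extract_column_by_name(sample_csv, "Age")
print(names)
print(ages)
```

Missing values still come back as empty strings, and asking for a column name that does not exist raises a KeyError, which the caller can handle as needed.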


## I. Parse Config file

Configuration Data:

* The config_file_data string contains sample configuration data with comments, different spacing, and various value formats.

Key and Value Patterns:

* key_pattern = r"(\w+)": Matches one or more word characters (alphanumeric and underscore).
* value_pattern = r"(.+)": Matches one or more of any character. This is very general.

Parsing Configuration:

* The parse_config_file() function is called with the configuration data and the key/value patterns.
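* Lines that are empty or start with # are skipped. For every other line, the regex pattern finds a key-value pair. Inline comments are not stripped (with the general value pattern, TIMEOUT is parsed as '60 #seconds'), and continuation lines are not joined (MULTI_LINE is parsed as 'This is a').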

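Example with more restrictive value pattern

* value_pattern2 = r"([\w./]+)" accepts only word characters, dots, and slashes. Because re.search does not anchor the match to the end of the line, the value is cut at the first character outside that set: TIMEOUT becomes '60', FILE_PATH becomes '/path/to/my', and MESSAGE is dropped because its value starts with a quote.

Example with only numbers as values

* value_pattern3 = r"(\d+)" matches a run of digits at the start of the value. Keys whose values start with a letter are skipped, but values that merely begin with digits are truncated rather than rejected: APP_VERSION becomes '1' and SERVER_IP becomes '192'.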

Business Use Cases:

* Software Deployment: Parsing configuration files for setting application parameters or environment variables.
* System Administration: Parsing configuration files for managing server settings or network configurations.
* Data Integration: Parsing configuration files for defining data source connections or transformation rules.
* Automation: Parsing configuration files to automate system tasks or workflows.

In [37]:
import re

def parse_config_file(config_content, key_pattern, value_pattern):
    """Parses key-value pairs from a configuration file string."""
    config_data = {}
    lines = config_content.splitlines()
    for line in lines:
        line = line.strip()  # Remove leading/trailing whitespace
        if line and not line.startswith("#"):  # Skip empty lines and comments
            match = re.search(rf"{key_pattern}\s*=\s*{value_pattern}", line)
            if match:
                key = match.group(1).strip()
                value = match.group(2).strip()
                config_data[key] = value
    return config_data

In [38]:
# Detailed Example:
config_file_data = """
# This is a sample configuration file

# Database settings
DATABASE_HOST = localhost
DATABASE_PORT = 5432
DATABASE_USER = myuser
DATABASE_PASSWORD = mypassword

# Application settings
APP_NAME = MyApp
APP_VERSION = 1.2.3
LOG_LEVEL = INFO

# Server settings
SERVER_IP = 192.168.1.100
SERVER_PORT = 8080

# More settings with different spacing
API_KEY    =    ABCDEFG12345
TIMEOUT = 60 #seconds

#Example with spaces in values
FILE_PATH = /path/to/my files

#Example with quotes around the value
MESSAGE = "Hello, World!"

#Example of multiline value.
MULTI_LINE = This is a
multiline value.

"""

In [39]:
# Example Usage:
key_pattern = r"(\w+)"  # Matches one or more word characters (alphanumeric and underscore)
value_pattern = r"(.+)" # Matches one or more of any character

config = parse_config_file(config_file_data, key_pattern, value_pattern)
print("Parsed Configuration:", config)

# Example Usage with more restrictive value pattern
key_pattern2 = r"(\w+)"
value_pattern2 = r"([\w./]+)" #matches words, dots and slashes

config2 = parse_config_file(config_file_data, key_pattern2, value_pattern2)
print("\nParsed Configuration 2:", config2)

#Example usage with only numbers as values.
key_pattern3 = r"(\w+)"
value_pattern3 = r"(\d+)" #matches only digits.
config3 = parse_config_file(config_file_data, key_pattern3, value_pattern3)
print("\nParsed Configuration 3:", config3)

Parsed Configuration: {'DATABASE_HOST': 'localhost', 'DATABASE_PORT': '5432', 'DATABASE_USER': 'myuser', 'DATABASE_PASSWORD': 'mypassword', 'APP_NAME': 'MyApp', 'APP_VERSION': '1.2.3', 'LOG_LEVEL': 'INFO', 'SERVER_IP': '192.168.1.100', 'SERVER_PORT': '8080', 'API_KEY': 'ABCDEFG12345', 'TIMEOUT': '60 #seconds', 'FILE_PATH': '/path/to/my files', 'MESSAGE': '"Hello, World!"', 'MULTI_LINE': 'This is a'}

Parsed Configuration 2: {'DATABASE_HOST': 'localhost', 'DATABASE_PORT': '5432', 'DATABASE_USER': 'myuser', 'DATABASE_PASSWORD': 'mypassword', 'APP_NAME': 'MyApp', 'APP_VERSION': '1.2.3', 'LOG_LEVEL': 'INFO', 'SERVER_IP': '192.168.1.100', 'SERVER_PORT': '8080', 'API_KEY': 'ABCDEFG12345', 'TIMEOUT': '60', 'FILE_PATH': '/path/to/my', 'MULTI_LINE': 'This'}

Parsed Configuration 3: {'DATABASE_PORT': '5432', 'APP_VERSION': '1', 'SERVER_IP': '192', 'SERVER_PORT': '8080', 'TIMEOUT': '60'}
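In the last output, APP_VERSION and SERVER_IP appear with truncated values because re.search accepts a match that covers only part of the line. If only wholly numeric values are wanted, one option is to strip inline comments first and then require the entire line to match with re.fullmatch. The following is a sketch; parse_numeric_settings is a hypothetical name, and splitting on "#" assumes that values never contain that character.

```python
import re

def parse_numeric_settings(config_content):
    """Return only the settings whose whole value is an integer."""
    settings = {}
    for line in config_content.splitlines():
        # Assumption: "#" always starts a comment, so cut the line there.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # fullmatch requires the pattern to cover the entire line.
        match = re.fullmatch(r"(\w+)\s*=\s*(\d+)", line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings

sample_config = """
# Server settings
SERVER_IP = 192.168.1.100
SERVER_PORT = 8080
APP_VERSION = 1.2.3
TIMEOUT = 60 #seconds
LOG_LEVEL = INFO
"""
numeric_settings = parse_numeric_settings(sample_config)
print(numeric_settings)
```

With this variant, SERVER_IP and APP_VERSION are rejected instead of being cut down to their first number.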
