# Assignment 1: Regular Expressions

This assignment contains solutions to string manipulation problems that are solved without regular expression libraries. Each problem includes a solution function, a sample input with its output, and an explanation of the approach.

---

## Problem 1: Extracting Domain Names from URLs


In [1]:
# Solution: Extracting domain names from URLs
def extract_domain(urls):
    domains = []
    for url in urls:
        # Remove prefixes like 'http://', 'https://', 'www.' if they exist
        if url.startswith('http://'):
            url = url[7:]
        elif url.startswith('https://'):
            url = url[8:]
        if url.startswith('www.'):
            url = url[4:]
        
        # Extract domain by taking characters until the first '/'
        domain = ""
        for char in url:
            if char == "/":
                break
            domain += char
        domains.append(domain)
    return domains

# Sample list of URLs
urls = [
    "https://www.example.com/path/to/page",
    "http://subdomain.example.co.uk",
    "www.test-site.org",
    "example.net"
]

# Run the function and print the output
print(extract_domain(urls))


['example.com', 'subdomain.example.co.uk', 'test-site.org', 'example.net']


### Explanation:
1. We strip common prefixes (`http://`, `https://`, `www.`) if they are present.
2. Then, we build the domain by collecting characters until a `/` is encountered or the URL ends.
3. The function adds each extracted domain to a list for output.
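
The same steps can be sketched with slicing and `str.find` instead of a character loop. This is a minimal sketch rather than part of the original solution: `extract_host` is a hypothetical name, and it assumes that a port (`:`), query string (`?`), or fragment (`#`) should also end the domain.

```python
def extract_host(url):
    """Return the host of a URL without scheme, 'www.' prefix, port, or path."""
    # Strip the scheme if one is present
    for prefix in ('http://', 'https://'):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
    # Strip a leading 'www.'
    if url.startswith('www.'):
        url = url[4:]
    # Assumption: any of these characters marks the end of the host
    end = len(url)
    for stop in '/?#:':
        pos = url.find(stop)
        if pos != -1:
            end = min(end, pos)
    return url[:end]

print(extract_host("https://www.example.com:8080/path?q=1"))  # example.com
print(extract_host("example.net#top"))                        # example.net
```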
---

## Problem 2: Extract and Format Dates


In [2]:
# Solution: Extract and format dates in 'YYYY-MM-DD' format
def format_dates(text):
    dates = []
    i = 0
    while i + 10 <= len(text):
        chunk = text[i:i+10]
        # Detect date formats like YYYY/MM/DD or YYYY-MM-DD
        if (chunk[:4].isdigit() and chunk[4] in '/-' and chunk[5:7].isdigit()
                and chunk[7] == chunk[4] and chunk[8:].isdigit()):
            year, month, day = chunk[:4], chunk[5:7], chunk[8:]
        # Detect date formats like DD/MM/YYYY, DD-MM-YYYY, or MM/DD/YYYY
        elif (chunk[:2].isdigit() and chunk[2] in '/-' and chunk[3:5].isdigit()
                and chunk[5] == chunk[2] and chunk[6:].isdigit()):
            day, month, year = chunk[:2], chunk[3:5], chunk[6:]
            # A "month" above 12 means the date was written as MM/DD/YYYY
            if int(month) > 12:
                day, month = month, day
        else:
            i += 1
            continue
        dates.append(f"{year}-{month}-{day}")
        i += 10
    return " ".join(dates)

# Example text with dates
text = "Today's date is 2024/10/23. Another date is 23-10-2024. Or even 10/23/2024."

# Run the function and print the output
print(format_dates(text))


2024-10-23 2024-10-23 2024-10-23


### Explanation:
1. We check for date patterns by verifying that digits and separators (`/` or `-`) sit at the expected positions, including a matching second separator.
2. We convert `YYYY/MM/DD`, `DD-MM-YYYY`, and `MM/DD/YYYY` to `YYYY-MM-DD`. A date is read as `MM/DD/YYYY` only when its second field is greater than 12; otherwise it is read day first, so an ambiguous date such as `03/04/2024` is treated as `DD/MM/YYYY`.
3. The function returns all extracted dates in the new format, joined by spaces.
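
Position-based matching only confirms that digits and separators are in the right places, so a string such as `2024-23-10` still has a valid shape. The sketch below shows one way a calendar check could be added on top; `is_valid_iso_date` is a hypothetical helper, and it relies on `datetime.date` raising `ValueError` for impossible dates.

```python
from datetime import date

def is_valid_iso_date(s):
    """Return True if s is 'YYYY-MM-DD' and names a real calendar date."""
    parts = s.split('-')
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    year, month, day = (int(p) for p in parts)
    try:
        # date() rejects out-of-range months and days, e.g. month 23
        date(year, month, day)
    except ValueError:
        return False
    return True

print(is_valid_iso_date("2024-10-23"))  # True
print(is_valid_iso_date("2024-23-10"))  # False
```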
---

## Problem 3: Extract Prices from Descriptions


In [3]:
# Solution: Extract prices from text
def extract_prices(text):
    prices = []
    i = 0
    while i < len(text):
        # Look for currency symbols and collect following numeric characters
        if text[i] in ['$','€','¥']:
            price = text[i]
            i += 1
            if i < len(text) and text[i] == ' ':
                i += 1
            while i < len(text) and (text[i].isdigit() or text[i] in ',.'):
                price += text[i]
                i += 1
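            # Drop sentence punctuation that was collected after the number
            price = price.rstrip(',.')
            # Skip a currency symbol that is not followed by a number
            if len(price) > 1:
                prices.append(price)
        else:
            i += 1
    return prices

# Example product descriptions
text = "The price of the item is $100. Another item costs €50. You can also buy it for ¥5000."

# Run the function and print the output
print(extract_prices(text))


['$100', '€50', '¥5000']


### Explanation:
1. We scan for currency symbols (`$`, `€`, `¥`) and collect the digits, commas, and periods that follow, skipping one optional space after the symbol.
2. Trailing commas and periods are stripped so that sentence punctuation is not part of the price, and each price is appended to a list.
3. The function handles the three listed currency symbols and numbers with thousands separators or decimals, such as `$1,299.99`.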
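
Once a price string has been extracted, it can be split into a symbol and a numeric amount. The sketch below is illustrative only: `parse_price` is a hypothetical helper, and it assumes `,` is a thousands separator and `.` is the decimal point, which does not hold for every locale.

```python
from decimal import Decimal

def parse_price(price):
    """Split an extracted price such as '$1,299.99' into (symbol, amount)."""
    symbol = price[0]
    # Assumption: ',' groups thousands and '.' marks decimals;
    # a trailing '.' is treated as sentence punctuation
    digits = price[1:].replace(',', '').rstrip('.')
    if not digits:
        return None  # A bare currency symbol carries no amount
    return symbol, Decimal(digits)

print(parse_price('$1,299.99'))
print(parse_price('€50.'))
```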
---

## Problem 4: Extract Hyperlinks from HTML `<a>` Tags


In [4]:
# Solution: Extract URLs from HTML `<a>` tags
def extract_links(html):
    links = []
    i = 0
    while i < len(html):
        # An anchor tag starts with '<a' followed by whitespace
        if html[i:i+2] == '<a' and html[i+2:i+3].isspace():
            tag_end = html.find('>', i)
            if tag_end == -1:
                break  # Unterminated tag: nothing more to extract
            # Search for href only inside this tag, wherever it appears
            tag = html[i:tag_end]
            href_pos = tag.find('href=')
            if href_pos != -1:
                value = tag[href_pos+5:]
                if value[:1] in ('"', "'"):
                    # Quoted value: read up to the matching closing quote
                    end_quote = value.find(value[0], 1)
                    if end_quote != -1:
                        links.append(value[1:end_quote])
                else:
                    # Unquoted value: read up to the first whitespace
                    parts = value.split()
                    if parts:
                        links.append(parts[0])
            # Always move past the tag so the loop cannot stall
            i = tag_end + 1
        else:
            i += 1
    return links

# Example HTML code
html = '''
<a href="https://www.example.com">Example</a>
<a href='http://test-site.org'>Test</a>
<a href=https://google.com>Google</a>
'''

# Run the function and print the output
print(extract_links(html))


['https://www.example.com', 'http://test-site.org', 'https://google.com']


### Explanation:
1. We identify `<a>` tags and locate the `href` attribute inside each one.
2. The function handles URLs wrapped in double quotes (`"`) or single quotes (`'`) as well as unquoted URLs.
3. It adds each URL to a list and returns the list.
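
A hand-written scanner is easy to get wrong on unusual markup, so it can help to cross-check its results against a real HTML parser. The sketch below uses `html.parser.HTMLParser` from the Python standard library; `LinkCollector` and `extract_links_with_parser` are hypothetical names, and this is an alternative for comparison rather than the assignment's approach.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)

def extract_links_with_parser(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

sample = '<a href="https://www.example.com">Example</a> <a class="x" href=\'http://test-site.org\'>Test</a>'
print(extract_links_with_parser(sample))
```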
---

## Problem 5: Correct Common Spelling Mistakes


In [5]:
# Solution: Detect and correct spelling mistakes 
def correct_spelling(text):
    corrections = {
        'teh': 'the',
        'recieve': 'receive',
        'occured': 'occurred'
    }
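    words = text.split()
    for i in range(len(words)):
        # Set trailing punctuation aside so that 'occured.' matches 'occured'
        core = words[i].rstrip('.,;:!?')
        trailing = words[i][len(core):]
        if core in corrections:
            words[i] = corrections[core] + trailing
    return " ".join(words)

# Example text with spelling mistakes
text = "I teh book. She recieve the letter. An error occured."

# Run the function and print the output
print(correct_spelling(text))


I the book. She receive the letter. An error occurred.


### Explanation:
1. The function splits the text into words and sets aside any trailing punctuation on each word.
2. It looks up each word in a predefined dictionary of common misspellings and replaces matches with the correct spelling.
3. The lookup is case-sensitive, so a capitalized misspelling such as `Teh` is left unchanged.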
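
A dictionary lookup on raw words misses misspellings that are capitalized or followed by punctuation. The sketch below shows one way a single word could be corrected while keeping both; `CORRECTIONS` and `correct_word` are hypothetical names, and the punctuation set is an assumption.

```python
CORRECTIONS = {'teh': 'the', 'recieve': 'receive', 'occured': 'occurred'}

def correct_word(word):
    """Correct one word, preserving trailing punctuation and initial capital."""
    # Assumption: only these characters count as trailing punctuation
    core = word.rstrip('.,;:!?')
    trailing = word[len(core):]
    fixed = CORRECTIONS.get(core.lower())
    if fixed is None:
        return word  # Not a known misspelling
    if core[:1].isupper():
        fixed = fixed.capitalize()
    return fixed + trailing

print(correct_word('Teh'))       # The
print(correct_word('occured.'))  # occurred.
```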
---

## Problem 6: Extract Street Addresses


In [6]:
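# Solution: Extract street addresses from text
def extract_addresses(text):
    addresses = []
    i = 0
    while i < len(text):
        # A street address starts with a house number at the start of a word
        if text[i].isdigit() and (i == 0 or text[i-1] == ' '):
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            # The number must be followed by a space and a street name;
            # this skips unit numbers such as '4B' and trailing numbers such as '12.'
            if i + 1 < len(text) and text[i] == ' ' and text[i+1].isalpha():
                while i < len(text) and text[i] != ',':
                    i += 1
                addresses.append(text[start:i].rstrip(' .'))
        else:
            i += 1
    return addresses

# Example text with addresses
text = "Send the package to 123 Main St, Apt 4B or to 456 Elm Avenue, Suite 12."

# Run the function and print the output
print(extract_addresses(text))


['123 Main St', '456 Elm Avenue']


### Explanation:
1. We detect an address by looking for a house number at the start of a word, followed by a space and a street name.
2. Each address is captured until a comma or the end of the text.
3. Numbers that are not followed by a street name, such as the unit numbers `4B` and `12`, are skipped.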
---

## Problem 7: Extract Hexadecimal Color Codes from CSS


In [7]:
# Solution: Extract hex color codes from CSS
def extract_hex_colors(css):
    hex_digits = '0123456789ABCDEFabcdef'
    hex_colors = []
    i = 0
    while i < len(css):
        if css[i] == '#':
            start = i
            i += 1
            # Consume the whole run of hex digits after '#'
            while i < len(css) and css[i] in hex_digits:
                i += 1
            # Keep only shorthand (#RGB) and full (#RRGGBB) codes
            if i - start in (4, 7):
                hex_colors.append(css[start:i])
        else:
            # Advance only here, so the character after a code is not skipped
            i += 1
    return hex_colors

# Example CSS stylesheet
css = "body { background-color: #FFAABB; } h1 { color: #123; }"

# Run the function and print the output
print(extract_hex_colors(css))


['#FFAABB', '#123']


### Explanation:
1. We detect color codes that start with `#` and keep those followed by exactly 3 or 6 valid hex characters.
2. The function supports both shorthand (`#123`) and full (`#FFAABB`) formats.
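
The shorthand and full formats describe the same kind of value: each shorthand digit stands for a doubled digit, so `#123` means `#112233`. A minimal sketch of that expansion, with conversion to an RGB tuple, is shown below; `hex_to_rgb` is a hypothetical helper and is not part of the extraction function.

```python
def hex_to_rgb(code):
    """Convert '#RGB' or '#RRGGBB' to a tuple of three integers (0-255)."""
    digits = code.lstrip('#')
    if len(digits) == 3:
        # Expand shorthand by doubling each digit: '123' -> '112233'
        digits = ''.join(ch * 2 for ch in digits)
    if len(digits) != 6:
        raise ValueError(f"unsupported hex color: {code!r}")
    # Parse each pair of hex digits as one channel
    return tuple(int(digits[k:k+2], 16) for k in (0, 2, 4))

print(hex_to_rgb('#123'))     # (17, 34, 51)
print(hex_to_rgb('#FFAABB'))  # (255, 170, 187)
```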

---
