In [1]:
import re

In [2]:
# 1. Sample Manuscript (multi-line string)
manuscript = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit. "Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua!"
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
"Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur?"
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Note: 20% of writers use @gmail & some use Outlook.
"""

In [3]:
# 2. word_stats function
def word_stats(text):
    total_chars = len(text)
    words = re.findall(r'\b\w+\b', text)
    total_words = len(words)
    sentences = re.split(r'[.!?]+', text)
    sentences = [s.strip() for s in sentences if s.strip()]  # remove empty strings
    total_sentences = len(sentences)
    avg_word_length = round(sum(len(w) for w in words) / total_words, 2) if total_words > 0 else 0
    return {
        "total_characters": total_chars,
        "total_words": total_words,
        "total_sentences": total_sentences,
        "average_word_length": avg_word_length
    }

In [4]:
# 3. format_title function
def format_title(title):
    exclusions = {"a", "an", "the", "and", "but", "or", "in", "on", "at"}
    words = title.lower().split()
    formatted_words = []
    for i, word in enumerate(words):
        if i == 0 or word not in exclusions:
            formatted_words.append(word.capitalize())
        else:
            formatted_words.append(word)
    return ' '.join(formatted_words)

In [5]:
# 4. find_quotes function
def find_quotes(text):
    # Non-greedy match: returns every double-quoted section in order of appearance.
    # Assumes quotes are not nested and that each quotation sits on a single line.
    matches = re.findall(r'"(.*?)"', text)
    return matches

In [6]:
# 5. replace_symbols function
def replace_symbols(text):
    replacements = {
        "@": "at",
        "&": "and",
        "%": "percent"
    }

    def replace_symbol(match):
        symbol = match.group(0)
        word = replacements[symbol]
        return word

    # Use regex for all 3 symbols
    return re.sub(r'[@&%]', replace_symbol, text)

In [7]:
print("📌 Word Stats:")
print(word_stats(manuscript))


📌 Word Stats:
{'total_characters': 507, 'total_words': 78, 'total_sentences': 6, 'average_word_length': 5.21}


In [8]:
print("\n📌 Formatted Title:")
sample_title = "the art of writing in the digital age"
print(format_title(sample_title))


📌 Formatted Title:
The Art Of Writing in the Digital Age
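
The output above keeps "Of" capitalized because "of" is not in the `exclusions` set. A minimal sketch of a variant that takes the minor-word set as a parameter and also capitalizes the last word, as many style guides do. The name `format_title_with`, the constant `DEFAULT_MINOR_WORDS`, and the extra words ("of", "to", "for") are assumptions for illustration, not part of the original notebook.

```python
# Hypothetical default set: the notebook's exclusions plus a few extra minor words.
DEFAULT_MINOR_WORDS = frozenset(
    {"a", "an", "the", "and", "but", "or", "in", "on", "at", "of", "to", "for"}
)

def format_title_with(title, minor_words=DEFAULT_MINOR_WORDS):
    """Title-case `title`, leaving minor words lowercase except first and last."""
    words = title.lower().split()
    last = len(words) - 1
    formatted = []
    for i, word in enumerate(words):
        # The first and last words are always capitalized.
        if i == 0 or i == last or word not in minor_words:
            formatted.append(word.capitalize())
        else:
            formatted.append(word)
    return " ".join(formatted)

print(format_title_with("the art of writing in the digital age"))
# The Art of Writing in the Digital Age
```

Passing the set as an argument lets a caller follow a different style guide without editing the function.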


In [9]:

print("\n📌 Quotes Found:")
print(find_quotes(manuscript))


📌 Quotes Found:
['Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua!', 'Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur?']
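
Both quotations in the sample sit on a single line, which is why `"(.*?)"` finds them. By default `.` does not match a newline, so a quotation that wraps across lines would be missed. A minimal sketch of one way to handle that case, using a negated character class instead of `.`; the name `find_quotes_multiline` and the sample string are hypothetical.

```python
import re

def find_quotes_multiline(text):
    """Return double-quoted sections, including ones that span several lines."""
    # [^"]* matches any character except a double quote, newlines included.
    raw = re.findall(r'"([^"]*)"', text)
    # Collapse internal line breaks and repeated whitespace to single spaces.
    return [" ".join(q.split()) for q in raw]

sample = 'She said, "first line\nsecond line" and then "short one".'
print(find_quotes_multiline(sample))
# ['first line second line', 'short one']
```

Like the original, this sketch assumes straight double quotes that are balanced and not nested.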


In [10]:

print("\n📌 Replaced Symbols:")
print(replace_symbols(manuscript))


📌 Replaced Symbols:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. "Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua!" 
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. 
"Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur?" 
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 
Note: 20percent of writers use atgmail and some use Outlook.
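
In the last line of the output the replacements are fused to their neighbours ("20percent", "atgmail"), because each symbol is swapped for a word with no surrounding space. A minimal sketch of a spacing-aware variant, assuming that is the desired reading; the names `REPLACEMENTS` and `replace_symbols_spaced` are hypothetical.

```python
import re

# Same mapping as in replace_symbols above.
REPLACEMENTS = {"@": "at", "&": "and", "%": "percent"}

def replace_symbols_spaced(text):
    """Replace @, & and % with words, keeping them separated from neighbours."""
    # Pad every replacement with a space on each side.
    padded = re.sub(r"[@&%]", lambda m: f" {REPLACEMENTS[m.group(0)]} ", text)
    # Collapse runs of spaces or tabs created by the padding (newlines are kept).
    return re.sub(r"[ \t]{2,}", " ", padded)

print(replace_symbols_spaced("Note: 20% of writers use @gmail & some use Outlook."))
# Note: 20 percent of writers use at gmail and some use Outlook.
```

The collapse step also shrinks any double spaces already present in the input, and a symbol at the very start or end of a line leaves one leading or trailing space, so a `strip()` per line may be needed depending on the text.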

