
## Text Processing – Problem Set


### Problem 1 — String Basics
Write a function that takes a sentence (string) and returns a tuple: `(first_char, last_char, length)`.
- Ignore leading/trailing whitespace when determining the first and last characters.
- If the trimmed string is empty, return `(None, None, 0)`.

In [39]:
def string_info(sentence: str):
    trimmed = sentence.strip()  # ignore leading/trailing spaces
    if not trimmed:             # empty after trimming
        return (None, None, 0)
    first_char = trimmed[0]
    last_char = trimmed[-1]
    length = len(trimmed)
    return (first_char, last_char, length)

# Example usage
print(string_info("   Python is fun   "))
# ('P', 'n', 13)

print(string_info("     "))
# (None, None, 0)

('P', 'n', 13)
(None, None, 0)
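
`str.strip()` with no arguments removes every kind of leading and trailing whitespace, not only spaces. The sketch below restates `string_info` so that it runs on its own and checks a few edge cases; the sample inputs are illustrative and not part of the original problem.

```python
def string_info(sentence: str):
    trimmed = sentence.strip()
    if not trimmed:
        return (None, None, 0)
    return (trimmed[0], trimmed[-1], len(trimmed))

# Tabs and newlines count as whitespace too
print(string_info("\t hello world \n"))   # ('h', 'd', 11)
# A single character is both the first and the last character
print(string_info(" x "))                 # ('x', 'x', 1)
# Interior whitespace is kept and counted in the length
print(string_info("a  b"))                # ('a', 'b', 4)
```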


### Problem 2 — String Methods
Given the string `s = "  Hello, Python World!  "`:
- Remove extra spaces at both ends
- Convert to lowercase
- Replace `"python"` with `"text"` (match should be case-insensitive)

Return the final string.

In [37]:
s = "  Hello, Python World!  "

# Step 1: remove extra spaces
cleaned = s.strip()

# Step 2: convert to lowercase
cleaned = cleaned.lower()

# Step 3: replace "python" with "text"
cleaned = cleaned.replace("python", "text")

print(cleaned)
# Output: "hello, text world!"

hello, text world!
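
Lowercasing first makes the plain `replace` case-insensitive, but it also lowercases the rest of the text. If the original casing had to be kept, one possible approach is `re.sub` with `re.IGNORECASE`. This is a sketch only; the helper name `replace_ignore_case` is hypothetical and goes beyond what the problem asks.

```python
import re

def replace_ignore_case(s: str, old: str, new: str) -> str:
    # re.escape guards against regex metacharacters in `old`
    pattern = re.compile(re.escape(old), flags=re.IGNORECASE)
    # A function replacement inserts `new` literally (no backslash templates)
    return pattern.sub(lambda match: new, s.strip())

print(replace_ignore_case("  Hello, Python World!  ", "python", "text"))
# Hello, text World!
```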


### Problem 3 — Splitting and Joining
You are given a CSV string like `"a,b,c,d,e"`.

Write a function that:
1. Splits the string by commas into a list of tokens.
2. Joins the tokens with `"-"` as a separator.
3. Returns the resulting string.

In [36]:
def csv_to_list_and_back(csv_str: str):
    items = csv_str.split(',')     # split by comma
    joined = '-'.join(items)       # join with '-'
    return joined

# Example usage
print(csv_to_list_and_back("a,b,c,d,e"))

a-b-c-d-e
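
`str.split(',')` is enough for simple input like `"a,b,c,d,e"`, but it would break a quoted field that itself contains a comma. As a hedged sketch, assuming the input may follow common CSV quoting rules, the standard `csv` module can do the splitting instead. The function name `csv_line_to_dashed` is illustrative.

```python
import csv
import io

def csv_line_to_dashed(csv_str: str) -> str:
    # csv.reader expects an iterable of lines, so wrap the string in StringIO
    reader = csv.reader(io.StringIO(csv_str))
    row = next(reader, [])   # empty input yields an empty row
    return '-'.join(row)

print(csv_line_to_dashed("a,b,c,d,e"))   # a-b-c-d-e
print(csv_line_to_dashed('a,"b,c",d'))   # a-b,c-d
```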


### Problem 4 — f-Strings and Formatting
Given `name` (str), `age` (int), and `height_feet` (float), return a formatted sentence using an f-string:

`"Alice is 23 years old and 5.6 feet tall."`

Write the function so that it accepts three parameters and **returns** the string (do not use `input()`).

In [35]:
def introduce(name: str, age: int, height_feet: float):
    # Using f-string formatting with specifiers
    return f"{name} is {age} years old and {height_feet:.1f} feet tall."


print(introduce("Alice", 23, 5.6))
print(introduce("Akash", 24, 5.9))

Alice is 23 years old and 5.6 feet tall.
Akash is 24 years old and 5.9 feet tall.
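
The `:.1f` specifier in the solution fixes the number of decimal places. The sketch below shows a few related format specifiers for comparison; the variable names and values are illustrative only.

```python
height_feet = 5.6
age = 23

one_decimal = f"{height_feet:.1f}"    # '5.6'
two_decimals = f"{height_feet:.2f}"   # '5.60'
padded = f"{height_feet:8.1f}"        # '     5.6' (width 8, right-aligned)
zero_filled = f"{age:04d}"            # '0023'
rounded = f"{5.96:.1f}"               # '6.0' (rounded, not truncated)

for value in (one_decimal, two_decimals, padded, zero_filled, rounded):
    print(repr(value))
```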


### Problem 5 — Word Count (Dictionary)
Implement `word_count(text)` that returns a dictionary mapping each lowercase word to its frequency.
- Words are separated by whitespace.
- Strip leading/trailing punctuation: `.,;:!?"'()[]{}-` from each token before counting.
- Ignore empty tokens after stripping.

In [34]:
def word_count(text: str):
    counts = {}
    punctuation = ".,;:!?\"'()[]{}-"
    words = text.split()
    for word in words:
        word = word.strip(punctuation).lower()
        if word:
            counts[word] = counts.get(word, 0) + 1
    return counts

# Example usage
print(word_count("Hello hello world!"))
print(word_count("This is a test. This is only a test."))

{'hello': 2, 'world': 1}
{'this': 2, 'is': 2, 'a': 2, 'test': 2, 'only': 1}
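
The same counting can be written with `collections.Counter`, which Problem 7 uses as well. This is an alternative sketch, not a replacement for the dictionary version; the name `word_count_counter` and the second sample sentence are made up for illustration.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()[]{}-"

def word_count_counter(text: str) -> dict:
    # Strip punctuation from both ends of each token, then lowercase it
    tokens = (tok.strip(PUNCTUATION).lower() for tok in text.split())
    # Tokens that were only punctuation become '' and are skipped
    return dict(Counter(tok for tok in tokens if tok))

print(word_count_counter("Hello hello world!"))
# {'hello': 2, 'world': 1}
print(word_count_counter("(well-known) -- 'quoted' don't"))
# {'well-known': 1, 'quoted': 1, "don't": 1}
```

Note that `strip` only touches the ends of a token, so interior punctuation such as the hyphen in `well-known` or the apostrophe in `don't` is kept.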


### Problem 6 — File Processing (I/O)
Write a function `file_stats(path)` that reads a UTF-8 `.txt` file and returns a tuple: `(num_lines, num_words, num_chars)`.
- `num_words` splits on whitespace
- Count newline characters in `num_chars` as well
- If the file is not found, raise `FileNotFoundError`

In [None]:
def file_stats(path: str):
    num_lines = num_words = num_chars = 0
    # open() raises FileNotFoundError by itself if the path does not exist,
    # so no try/except is needed to meet the requirement
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:                       # read one line at a time
            num_lines += 1
            num_words += len(line.split())   # whitespace split
            num_chars += len(line)           # includes the trailing newline
    return num_lines, num_words, num_chars

# Example usage (create a sample.txt first)
# print(file_stats("sample.txt"))
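
The usage above is commented out because it needs a file on disk. A self-contained way to try it, sketched here, is to write a small file into a temporary directory first. `file_stats` is restated so the snippet runs on its own, and the sample text is made up for illustration.

```python
import os
import tempfile

def file_stats(path: str):
    num_lines = num_words = num_chars = 0
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            num_lines += 1
            num_words += len(line.split())
            num_chars += len(line)
    return num_lines, num_words, num_chars

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    # newline='\n' keeps line endings identical on every platform
    with open(sample_path, 'w', encoding='utf-8', newline='\n') as f:
        f.write("hello world\nsecond line here\n")

    stats = file_stats(sample_path)
    print(stats)   # (2, 5, 29)

    # A missing file should raise FileNotFoundError
    try:
        file_stats(os.path.join(tmp_dir, "missing.txt"))
        raised = False
    except FileNotFoundError:
        raised = True
    print(raised)  # True
```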

### Problem 7 — Top-k Frequent Words
Given a paragraph of text, return the **top 5 most frequent words** as a list of `(word, count)` pairs, sorted by decreasing count then alphabetically.
- Use `collections.Counter`, see https://docs.python.org/3/library/collections.html#counter-objects.
- Treat words case-insensitively and strip the same punctuation set as in Problem 5.

In [30]:
from collections import Counter

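def top_k_words(text: str, k=5):
    punctuation = ".,;:!?\"'()[]{}-"   # same set as Problem 5
    words = (w.strip(punctuation).lower() for w in text.split())
    counts = Counter(w for w in words if w)   # skip tokens that were only punctuation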
    # Sort: first by -count (descending), then by word (alphabetical)
    sorted_items = sorted(counts.items(), key=lambda x: (-x[1], x[0]))
    return sorted_items[:k]

# Example usage
print(top_k_words("apple banana apple orange banana apple"))
print(top_k_words("one two two three three three"))


[('apple', 3), ('banana', 2), ('orange', 1)]
[('three', 3), ('two', 2), ('one', 1)]


In [25]:
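def top_k_words(text: str, k=5):
    # Same result as the Counter version, using a plain dictionary
    punctuation = ".,;:!?\"'()[]{}-"   # same set as Problem 5
    counts = {}
    for word in text.split():
        word = word.strip(punctuation).lower()
        if word:   # skip tokens that were only punctuation
            counts[word] = counts.get(word, 0) + 1
    # Sort by decreasing count, then alphabetically to break ties
    items = sorted(counts.items(), key=lambda x: (-x[1], x[0]))
    return items[:k]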

# Example usage
print(top_k_words("apple banana apple orange banana apple"))
print(top_k_words("one two two three three three"))

[('apple', 3), ('banana', 2), ('orange', 1)]
[('three', 3), ('two', 2), ('one', 1)]
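
The alphabetical tie-break only matters when several words share a count. The sketch below uses a made-up word list to compare the `(-count, word)` sort key with `Counter.most_common`, which the Python documentation describes as ordering equal counts by first occurrence rather than alphabetically.

```python
from collections import Counter

counts = Counter("pear fig pear kiwi fig kiwi apple".split())

# Explicit key: decreasing count first, then alphabetical order
by_key = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:3]
print(by_key)   # [('fig', 2), ('kiwi', 2), ('pear', 2)]

# most_common gives the same counts, but ties are not sorted alphabetically
by_most_common = counts.most_common(3)
print(by_most_common)
```

So `most_common(k)` alone would not satisfy the "then alphabetically" requirement whenever ties occur.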


### Problem 8 — Palindrome Check
Implement `is_palindrome(s)` that returns `True` if `s` is a palindrome, ignoring case and spaces.
- Example: `"A man a plan a canal Panama"` → `True`

In [None]:
def is_palindrome(s: str):
    s = s.lower()
    # Keep only letters and digits: this drops spaces and, as a side effect, punctuation
    s = ''.join(ch for ch in s if ch.isalnum())
    return s == s[::-1]

print(is_palindrome("A man a plan a canal Panama"))
print(is_palindrome("Python"))

True
False
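
An alternative that avoids building a reversed copy is to compare characters from both ends. This is a sketch under the stated rule only (ignore case and whitespace, keep punctuation significant); the name `is_palindrome_two_pointer` is hypothetical.

```python
def is_palindrome_two_pointer(s: str) -> bool:
    left, right = 0, len(s) - 1
    while left < right:
        if s[left].isspace():
            left += 1            # skip whitespace on the left
        elif s[right].isspace():
            right -= 1           # skip whitespace on the right
        elif s[left].lower() != s[right].lower():
            return False         # mismatch, ignoring case
        else:
            left += 1
            right -= 1
    return True

print(is_palindrome_two_pointer("A man a plan a canal Panama"))  # True
print(is_palindrome_two_pointer("Python"))                       # False
```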


### Problem 9 — Text Cleanup Challenge
Given a messy string like: `"Hello!!! This, is... an example?? of messy---text."`
Write `clean_text(text)` that:
1. Removes punctuation
2. Converts to lowercase
3. Collapses multiple spaces to a single space
   
Return the cleaned text.

In [2]:
def clean_text(s: str):
    # Step 1: keep only letters, digits, and spaces (drops punctuation)
    result = ''.join(ch for ch in s if ch.isalnum() or ch == ' ')
    # Step 2: convert to lowercase
    result = result.lower()
    # Step 3: collapse runs of spaces by splitting and rejoining
    return ' '.join(result.split())

print(clean_text("Hello!!! This, is... an example?? of      messy---text."))
print(clean_text("123###Python***Rocks!!!"))

hello this is an example of messytext
123pythonrocks
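
Deleting punctuation outright fuses `messy---text` into `messytext`. If punctuation should act as a word break instead, one option is to replace it with a space before collapsing. The regex-based sketch below supports both behaviours; the function name and the `keep_word_breaks` flag are assumptions for illustration. Unlike the loop version, it also treats tabs and newlines as spaces.

```python
import re

def clean_text_regex(text: str, keep_word_breaks: bool = False) -> str:
    replacement = ' ' if keep_word_breaks else ''
    # Remove (or replace) everything that is not a letter, digit, or whitespace;
    # the underscore is listed separately because \w would otherwise keep it
    no_punct = re.sub(r'[^\w\s]|_', replacement, text)
    # Collapse whitespace runs, trim the ends, and lowercase
    return re.sub(r'\s+', ' ', no_punct).strip().lower()

messy = "Hello!!! This, is... an example?? of      messy---text."
print(clean_text_regex(messy))
# hello this is an example of messytext
print(clean_text_regex(messy, keep_word_breaks=True))
# hello this is an example of messy text
```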



---
*End of Problem Set.*


d=list(range(4,41,4))

In [None]:
i = 3
# First ten multiples of i: i, 2*i, ..., 10*i
d = list(range(i, i * 10 + 1, i))
print(d)

[3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

