---
title: "Practice Activity 4.2 — Decode a Message"
author: "Shiqi Wu"
format:
  html:
    self-contained: true   
    toc: false
    code-fold: false
jupyter: python3
execute: false             
---

**Repository (Week 4 – Activity 4.2):**  [Week 4](https://github.com/shiqiwu212/GSB-S544-01/tree/db6617047c467143db543c1723fd0a22cbff2df4/Week%204/Practice%20Activities/Practice%20Activity%204.2)  

## Setup

Run the code below to load the scrambled message:


```python
import pandas as pd

message = pd.read_csv("https://www.dropbox.com/s/lgpn3vmksk3ssdo/scrambled_message.txt?dl=1")['Word']
```

```python
message
```

```
0                    Koila!
1                     In   
2                     kiew,
3                         a
4                 humble   
               ...         
122                     you
123                 mabugh.
124              ughhh?call
125        meugh.ughhhh!   
126                      K.
Name: Word, Length: 127, dtype: object
```

In this activity, a "word" refers to any sequence of characters with no whitespace, even if it is not a real English word. That is, even though many elements of the scrambled `message` Series are nonsense and some contain punctuation, you can treat each element as a "word".

Beware! The object named `message` is a **pandas Series** of strings. If you want to use functions that expect a single string rather than a Series, use the `.str` accessor or `.apply()` with a `lambda` function.
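
A minimal sketch of both options, using a hypothetical three-element Series `toy` in place of `message`. The `.str` accessor and `.apply()` give the same result for a string method such as `upper()`; `.apply()` is handy when you already have a function that takes one plain string, such as a compiled regex's `.sub()`.

```python
import re
import pandas as pd

# Hypothetical toy Series standing in for `message`
toy = pd.Series(["Koila!", "In   ", "kiew,"])

# Vectorized string method through the .str accessor
upper_str = toy.str.upper()

# Same result with .apply(): the lambda receives one plain string at a time
upper_apply = toy.apply(lambda w: w.upper())

# .apply() also works with any function that only accepts a single string
pattern = re.compile(r"k", flags=re.I)
swapped = toy.apply(lambda w: pattern.sub("v", w))

print(upper_str.equals(upper_apply))
print(swapped.tolist())
```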




## Warm-up exercises

1. How many characters are in the scrambled message?
2. How many of these characters are white space?
3. How many words are in the scrambled message?
4. Show all the punctuation marks in the scrambled message.
5. Print out, in all capitals, the longest word in the scrambled message.
6. Print out every piece of a word that starts with the letter "m" and ends with the letter "z" in the scrambled message.

## 1. How many characters are in the scrambled message?

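```python
# .str.len() counts every character in each element (whitespace included); .sum() adds them up
total_chars = message.str.len().sum()
total_chars
```

```
np.int64(2544)
```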

## 2. How many of these characters are white space?

```python
# \s matches any whitespace character; count it in each element, then total
whitespace_chars = message.str.count(r"\s").sum()
whitespace_chars
```

```
np.int64(1652)
```
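
Every character is either whitespace (`\s`) or not (`\S`), so the two totals above also give the number of visible characters. A small sketch on a hypothetical list of padded words:

```python
import re

# Hypothetical sample elements, padded with spaces like the real message
sample_words = ["Koila!", "In   ", "kiew,", "  a"]

total_chars = sum(len(w) for w in sample_words)
whitespace_chars = sum(len(re.findall(r"\s", w)) for w in sample_words)
visible_chars = sum(len(re.findall(r"\S", w)) for w in sample_words)

# \s and \S split the characters into two disjoint groups
assert total_chars == whitespace_chars + visible_chars
print(total_chars, whitespace_chars, visible_chars)
```

Applied to the totals above, the scrambled message has 2544 - 1652 = 892 non-whitespace characters; `message.str.count(r"\S").sum()` should return the same number.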

## 3. How many words are in the scrambled message?

```python
# Each element of the Series counts as one "word"
num_words = len(message)
num_words
```

```
127
```
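
Because each element counts as one word, `len(message)` is the answer. A quick consistency check, sketched here on a hypothetical sample, is that no element contains whitespace *inside* it once the padding is stripped, so splitting on whitespace yields the same count:

```python
import re

# Hypothetical sample elements (padding included, as in the printed Series)
sample_words = ["Koila!", "In   ", "kiew,", "  a", "ughhh?call"]

num_words = len(sample_words)                                # one element = one word
tokens_by_split = sum(len(w.split()) for w in sample_words)  # whitespace-separated tokens
has_internal_space = any(re.search(r"\s", w.strip()) for w in sample_words)

print(num_words, tokens_by_split, has_internal_space)
```

On the real Series the same check could be written as `message.str.split().str.len().sum()`; if it also returns 127, the element count and the token count agree.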

## 4. Show all the punctuation marks in the scrambled message.

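```python
# AI assistance for this step:
# - Extract all non-word, non-space chars and list unique punctuation marks (sorted for readability).

puncts = (
    message.str.findall(r"[^\w\s]")  # list of punctuation characters per element
           .explode()                # one row per character
           .dropna()                 # elements with no punctuation become NaN after explode
           .unique()
)
sorted(puncts)
```

```
['!', ',', '.', ';', '?']
```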
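
This question uses the regex class `[^\w\s]`, while the decoding steps below use Python's `string.punctuation`. The two definitions mostly agree, but not entirely. A small sketch on a made-up string:

```python
import re
import string

sample = "ughhh?call me_V; ok!"  # hypothetical text

# Class used in question 4: not a word character and not whitespace
regex_marks = re.findall(r"[^\w\s]", sample)

# string.punctuation is a fixed set of ASCII marks, and it includes "_"
set_marks = [ch for ch in sample if ch in string.punctuation]

print(regex_marks)  # ['?', ';', '!']
print(set_marks)    # ['?', '_', ';', '!']
```

`\w` treats the underscore as a word character, so `[^\w\s]` never reports `_`. Conversely, `[^\w\s]` also catches non-ASCII symbols such as curly quotes, which `string.punctuation` does not list.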

## 5. Print out, in all capitals, the longest word in the scrambled message.

```python
# AI assistance for this step:
# - Locate the index of the max length token and uppercase it.

# idxmax() returns an index *label*, so look it up with .loc rather than the positional .iloc
longest_idx = message.str.len().idxmax()
longest_word_upper = str(message.loc[longest_idx]).upper()
longest_word_upper
```

```
'KAUDEVILLIANUGH?AOGHAJDBN'
```
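
With the default 0 to n-1 index, labels and positions coincide, so `.loc` and `.iloc` return the same word. The difference shows up with any other index. A sketch using a hypothetical Series `s` indexed 10, 20, 30:

```python
import pandas as pd

# Hypothetical Series whose index does not start at 0
s = pd.Series(["a", "kaudevillian", "cc"], index=[10, 20, 30])

longest_label = s.str.len().idxmax()  # an index label (20), not a position
longest_word = s.loc[longest_label]   # label-based lookup works

try:
    s.iloc[longest_label]             # positional lookup fails: there is no position 20
    iloc_failed = False
except IndexError:
    iloc_failed = True

print(longest_label, longest_word.upper(), iloc_failed)
```

Note that `.str.len()` also counts padding spaces; `s.str.strip().str.len().idxmax()` would rank words by their visible length instead.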

## 6. Print out every piece of a word that starts with the letter "m" and ends with the letter "z" in the scrambled message.

```python
# AI assistance for this step:
# - Find all substrings matching word-boundary m...z using a case-insensitive regex.

import re

# \b...\b anchors the match to a whole word; \w*? is a lazy run of word characters
pieces_m_to_z = (
    message.apply(lambda s: re.findall(r"\bm\w*?z\b", str(s), flags=re.I))
           .explode()
           .dropna()
           .unique()
)
list(pieces_m_to_z)
```

```
['maaz']
```
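
The same search can be written with pandas' vectorized `.str.findall`, which accepts regex `flags`. The sketch below also contrasts two readings of "piece of a word": with the `\b` anchors the match must span a whole word, and without them it may sit inside a longer word. The toy Series and the helper `unique_matches` are hypothetical.

```python
import re
import pandas as pd

# Hypothetical toy Series (not the real message)
toy = pd.Series(["maaz,", "Mentoz", "zoom", "hamoz"])

def unique_matches(series: pd.Series, pattern: str) -> list:
    # .str.findall gives a list of matches per element; explode/dropna/unique flatten it
    return list(series.str.findall(pattern, flags=re.I).explode().dropna().unique())

whole_words = unique_matches(toy, r"\bm\w*?z\b")  # match must be an entire word
any_piece = unique_matches(toy, r"m\w*?z")        # match may be inside a longer word

print(whole_words)  # ['maaz', 'Mentoz']
print(any_piece)    # ['maaz', 'Mentoz', 'moz']
```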



## Decode a message

Complete the following steps to decode the message.  

1. Remove any spaces before or after each word.
2. Any time you see the word "ugh", with any number of h's, followed by a punctuation mark, delete this.
3. No word should be longer than 16 characters. Drop all extra characters beyond 13 off the end of each word.
4. Replace all instances of exactly 2 a's with exactly 2 e's.
5. Replace all z's with t's.
6. Every word that ends in b, change that to a y.  *Hint: look out for punctuation!*
7. Every word that starts with k, change that to a v.  *Hint: look out for capitalization!*
8. Use `.join()` to recombine all your words into a message.
9. Find the movie this quote is from.
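
As a compact summary of steps 1 to 8, here is a minimal sketch that applies them one word at a time with plain `re` string functions. It is an illustration, not the code used in the cells below: `decode_word`, `decode`, `PUNCT`, and the ten-element `sample` (modeled on the first and last elements of the printed `message`) are hypothetical, and step 3 mirrors the 16/13 rule stated in the list.

```python
import re
import string

# Character class of every ASCII punctuation mark
PUNCT = "[" + re.escape(string.punctuation) + "]"

def decode_word(w: str) -> str:
    w = w.strip()                                   # 1. trim surrounding whitespace
    w = re.sub(r"ugh+" + PUNCT, "", w, flags=re.I)  # 2. delete "ugh"/"ughhh" plus punctuation
    if len(w) > 16:                                 # 3. same 16/13 rule as the list above
        w = w[:13]
    w = re.sub(r"(?<!a)aa(?!a)", "ee", w)           # 4. exactly two a's -> two e's
    w = w.replace("z", "t").replace("Z", "T")       # 5. z -> t (both cases)
    w = re.sub(r"b(?=" + PUNCT + r"*$)", "y", w)    # 6. final b (before any punctuation) -> y
    w = re.sub(r"^(" + PUNCT + r"*)k", r"\1v", w)   # 7. initial k -> v, keeping leading punctuation
    w = re.sub(r"^(" + PUNCT + r"*)K", r"\1V", w)   #    and K -> V
    return w

def decode(words) -> str:
    decoded = (decode_word(w) for w in words)
    return " ".join(w for w in decoded if w)        # 8. join, skipping words that became empty

sample = ["Koila!", "In   ", "kiew,", "a", "humble   ",
          "you", "mabugh.", "ughhh?call", "meugh.ughhhh!   ", "K."]
print(decode(sample))  # Voila! In view, a humble you may call me V.
```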

## 1. Remove any spaces before or after each word.

```python
# .str.strip() removes leading and trailing whitespace from every element
words = message.astype(str).str.strip()
```

## 2. Any time you see the word "ugh", with any number of h's, followed by a punctuation mark, delete this.

```python
# AI assistance for this step:
# - Remove 'ugh' with any number of h's followed by a punctuation mark ANYWHERE inside the token.

import re
import string

# "ugh+" is "ug" followed by one or more h's; the bracket class holds every ASCII punctuation mark
pattern_ugh_anywhere = re.compile(r"ugh+[" + re.escape(string.punctuation) + r"]", flags=re.I)
words = words.apply(lambda w: pattern_ugh_anywhere.sub("", w))
```
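
To sanity-check the pattern, here is a small sketch on plain strings. The first three samples are elements visible in the printed `message` (padding removed); the rest are made-up edge cases.

```python
import re
import string

pattern_ugh_anywhere = re.compile(r"ugh+[" + re.escape(string.punctuation) + r"]", flags=re.I)

samples = ["mabugh.", "ughhh?call", "meugh.ughhhh!", "laugh", "laugh!", "UGHH,ok"]
cleaned = [pattern_ugh_anywhere.sub("", w) for w in samples]
print(cleaned)  # ['mab', 'call', 'me', 'laugh', 'la', 'ok']
```

The pattern does not require "ugh" to be a separate word, so a real word such as "laugh!" would be cut to "la". That is harmless for this message but worth knowing if the rule is reused elsewhere.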

## 3. No word should be longer than 16 characters. Drop all extra characters beyond 13 off the end of each word.

```python
# AI assistance for this step:
# - Truncate only if length > 16; then keep the first 13 characters.

MAX_FLAG = 16  # words longer than this are truncated
TRUNC_TO = 13  # number of characters kept when a word is truncated

def _truncate_rule(w: str) -> str:
    w = str(w)
    return w[:TRUNC_TO] if len(w) > MAX_FLAG else w

words = words.apply(_truncate_rule)
```
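
The boundary behaviour of this 16/13 rule can be checked directly. In the sketch below, with hypothetical inputs, a 16-character string is left alone, a 17-character one drops to 13, and `"kaudevillianaoghajdbn"` (the longest word from question 5 after step 2 removes `"ugh?"`) becomes `"kaudevilliana"`.

```python
MAX_FLAG = 16
TRUNC_TO = 13

def truncate_rule(w: str) -> str:
    # Only words longer than MAX_FLAG are cut, and they are cut to TRUNC_TO characters
    return w[:TRUNC_TO] if len(w) > MAX_FLAG else w

for w in ["a" * 16, "b" * 17, "kaudevillianaoghajdbn"]:
    print(len(w), "->", truncate_rule(w))
```

A vectorized pandas version of the same rule would be `words.where(words.str.len() <= 16, words.str[:13])`, which keeps short words and substitutes the 13-character slice elsewhere.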

## 4. Replace all instances of exactly 2 a's with exactly 2 e's.

```python
# The lookbehind/lookahead ensure the "aa" is not part of a longer run of a's
exact_two_a = re.compile(r"(?<!a)aa(?!a)")
words = words.apply(lambda w: exact_two_a.sub("ee", w))
```
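
The lookarounds are what make this "exactly two". A minimal check with made-up strings shows that runs of three or more a's are left alone, and that the pattern is case-sensitive, so an uppercase `"Aa"` is not converted.

```python
import re

exact_two_a = re.compile(r"(?<!a)aa(?!a)")

cases = ["maaz", "baaa", "aa", "kaaaa", "aabaa", "Aaz"]
results = [exact_two_a.sub("ee", w) for w in cases]
print(results)  # ['meez', 'baaa', 'ee', 'kaaaa', 'eebee', 'Aaz']
```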

## 5. Replace all z's with t's.

```python
# Literal (non-regex) replacement, once for each case
words = (
    words.str.replace("z", "t", regex=False)
         .str.replace("Z", "T", regex=False)
)
```
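
An equivalent approach, sketched here as a hypothetical alternative, is a translation table built with `str.maketrans`, which handles both cases in a single pass:

```python
# One translation table maps z -> t and Z -> T
z_to_t = str.maketrans({"z": "t", "Z": "T"})

converted = "Zeze maaz".translate(z_to_t)
print(converted)  # Tete maat
```

On a Series, `words.str.translate(z_to_t)` applies the same table element-wise.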

## 6. Every word that ends in b, change that to a y. Hint: look out for punctuation!

```python
# AI assistance for this step:
# - Replace a trailing 'b' even if punctuation follows it (e.g., "glorb," -> "glory,").

# Match a "b" followed only by zero or more punctuation marks up to the end of the word
end_b_to_y = re.compile(r"b(?=[" + re.escape(string.punctuation) + r"]*$)")
words = words.apply(lambda w: end_b_to_y.sub("y", w))
```
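
A few made-up edge cases show what the lookahead does: only the last `b` changes, punctuation after it is kept, a `b` in the middle of a word is ignored, and an uppercase `B` is not matched.

```python
import re
import string

end_b_to_y = re.compile(r"b(?=[" + re.escape(string.punctuation) + r"]*$)")

cases = ["glorb,", "mab", "bob", "bb?!", "abc", "b", "B!"]
results = [end_b_to_y.sub("y", w) for w in cases]
print(results)  # ['glory,', 'may', 'boy', 'by?!', 'abc', 'y', 'B!']
```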

## 7. Every word that starts with k, change that to a v. Hint: look out for capitalization!

```python
# AI assistance for this step:
# - Replace only the first letter k/K with v/V and KEEP the rest of the word (handle leading punctuation).

import re
import string

# capture: leading punctuation, the first letter k/K, and the rest of the token
start_k_keep_tail = re.compile(r"^([" + re.escape(string.punctuation) + r"]*)([kK])(.*)$")

def k_to_v_keep_tail(w: str) -> str:
    m = start_k_keep_tail.match(str(w))
    if not m:
        return str(w)  # word does not start with k/K: leave it unchanged
    lead, letter, tail = m.groups()
    # preserve capitalization: k -> v, K -> V
    return lead + ("v" if letter == "k" else "V") + tail

words = words.apply(k_to_v_keep_tail)
```
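
A shorter variant, sketched under the same rule, passes a replacement function to `re.sub` so that only the prefix is matched and the rest of the word is never captured. The names `lead_k` and `k_to_v` are hypothetical.

```python
import re
import string

# Leading punctuation (group 1) followed by a k or K (group 2)
lead_k = re.compile(r"^([" + re.escape(string.punctuation) + r"]*)([kK])")

def k_to_v(w: str) -> str:
    # Keep the punctuation, swap k -> v or K -> V, leave the tail untouched
    return lead_k.sub(lambda m: m.group(1) + ("v" if m.group(2) == "k" else "V"), w, count=1)

results = [k_to_v(w) for w in ["Koila!", "kiew,", "K.", '"kat', "ok"]]
print(results)  # ['Voila!', 'view,', 'V.', '"vat', 'ok']
```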

## 8. Use `.join()` to recombine all your words into a message.

```python
# Skip any empty words so the joined message has no doubled spaces
decoded_message = " ".join([w for w in words if str(w).strip() != ""]).strip()
```
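
Why filter? If a word ends up empty (for example, an element that was nothing but "ughhh!" before step 2), joining it anyway leaves two spaces in a row. A small sketch with a hypothetical list:

```python
step_words = ["Voila!", "", "In", "view,", ""]

unfiltered = " ".join(step_words)                      # empty strings still get a separator
filtered = " ".join(w for w in step_words if w.strip())

print(repr(unfiltered))  # 'Voila!  In view, '
print(repr(filtered))    # 'Voila! In view,'
```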

## 9. Find the movie this quote is from.

```python
# Movie: V for Vendetta
print(decoded_message)
```

```
Voila! In view, a humble vaudevilliana veteran, cast vicariously as both victim and villain by the vicissitudes of fate. This visage, no mere veneer of vanity, is a vestige of the vox populi now vacant, vanished. However, this valorous visitation of a bygone vexation stands vivified, and has vowed to vanquish these venal and virulent vermin, van guarding vice and vouchsafing the violently vicious and voracious violation of volition. The only verdict is vengeance; a vendetta, held as a votive not in vain, for the value and veracity of such shall one day vindicate the vigilant and the virtuous. Verily this vichyssoise of verbiage veers most verbose, so let me simply add that its my very good honour to meet you and you may call me V.
```
