## File Type

This notebook is an `.ipynb` Jupyter Notebook file, which uses an XML-based cell structure in VS Code. It is not a `.json` file, but Jupyter Notebooks are typically stored in JSON format. In VS Code, each cell is represented in an XML-like format for editing and versioning.

# Practice Activity: Extracting Key Information Using Regular Expressions

Objective:
To build a Python GUI application using Tkinter that allows users to open a .txt file, display its content, and extract specific information (email addresses, phone numbers, dates) using regular expressions. The extracted information will be displayed and can be copied to the clipboard.

## Prerequisites

- Python IDE (e.g., Jupyter Notebook, PyCharm, VS Code).
- Python installed.
- The pyperclip library needs to be installed. Open your terminal or command prompt and run:

```bash
pip install pyperclip
```

## Step 1: Import Required Libraries

Create a new Python file (e.g., app.py). Start by importing the necessary libraries:

- `tkinter (as tk)`: For creating the Graphical User Interface (GUI).
- `filedialog`: For the file open dialog.
- `scrolledtext`: For a scrollable text area to display file content and results.
- `messagebox`: For pop-up messages (errors, confirmations).
- `ttk`: For themed Tkinter widgets like the Combobox (dropdown menu).
- `re`: The Regular Expressions module for pattern matching.
- `pyperclip`: To enable copying text to the clipboard.

In [1]:
import tkinter as tk
from tkinter import filedialog, scrolledtext, messagebox, ttk
import re
import pyperclip # For copying extracted results

## Step 2: Initialize the Main GUI Window

Create the main application window and set its properties:

In [2]:
# Create main application window
root = tk.Tk()
root.title("Regex Information Extractor")
root.geometry("700x500") # 700 pixels width x 500 pixels height

''

## Step 3: Define Core Functions

### a. Function to Open and Read a File (open_file)
This function will handle opening a .txt file, reading its content, and displaying it in a text area.

In [3]:
def open_file():
    file_path = filedialog.askopenfilename(filetypes=[
        ("Text files", "*.txt"),
        ("JSON files", "*.json"),
        ("CSV files", "*.csv"),
        ("All files", "*.*")
    ])
    if file_path:
        try:
            with open(file_path, "r", encoding="utf-8") as file:
                text_content = file.read()
            text_area.delete("1.0", tk.END) # Clear previous content
            text_area.insert(tk.END, text_content)
            label_status.config(text=f"Loaded: {file_path}")
        except Exception as e:
            messagebox.showerror("Error", f"Failed to read file: {e}")
            label_status.config(text="Error loading file")

### b. Function to Extract Information Using Regex (extract_info)
This function will retrieve text from the display area, use a selected regex pattern to find matches, and display the results.

In [4]:
def extract_info():
    text_content = text_area.get("1.0", tk.END)
    selected_option = extract_option.get()

    regex_patterns = {
        "Email Addresses": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
        "Phone Numbers": r"\+?\d{1,3}[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{3}[-.\s]?\d{4}",
        "Dates": r"\b\d{1,2}[-/]\d{1,2}[-/]\d{2,4}\b", # Matches MM/DD/YYYY, DD-MM-YYYY etc.
        "flagged": r"\b(?:flagged|flag)\b",
        "URLs": r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
    }

    if selected_option == "Messages with flagged True (JSON)":
        import json
        try:
            data = json.loads(text_content)
            messages = []
            if isinstance(data, list):
                messages = [item["message"] for item in data if item.get("flagged") is True and "message" in item]
            elif isinstance(data, dict):
                for v in data.values():
                    if isinstance(v, list):
                        messages.extend([item["message"] for item in v if isinstance(item, dict) and item.get("flagged") is True and "message" in item])
        except Exception as e:
            result_area.delete("1.0", tk.END)
            result_area.insert(tk.END, f"Error parsing JSON: {e}")
            return
        result_area.delete("1.0", tk.END)
        if messages:
            result_area.insert(tk.END, "\n".join(messages))
        else:
            result_area.insert(tk.END, "No flagged messages found.")
        return

    if selected_option == "Decoded FLAGs from flagged JSON (base64)":
        import json, base64, re
        try:
            data = json.loads(text_content)
            flags = []
            if isinstance(data, list):
                entries = data
            elif isinstance(data, dict):
                entries = []
                for v in data.values():
                    if isinstance(v, list):
                        entries.extend(v)
            else:
                entries = []
            for entry in entries:
                if entry.get("flagged") and "message" in entry:
                    encoded_msg = entry["message"]
                    try:
                        decoded_bytes = base64.b64decode(encoded_msg)
                        decoded_msg = decoded_bytes.decode('utf-8')
                        match = re.search(r'FLAG\{.*?\}', decoded_msg)
                        if match:
                            flags.append(match.group())
                    except Exception as e:
                        continue
        except Exception as e:
            result_area.delete("1.0", tk.END)
            result_area.insert(tk.END, f"Error parsing/decoding JSON: {e}")
            return
        result_area.delete("1.0", tk.END)
        if flags:
            result_area.insert(tk.END, "\n".join(flags))
        else:
            result_area.insert(tk.END, "No decoded FLAGs found.")
        return

    pattern = regex_patterns.get(selected_option)
    if not pattern:
        messagebox.showerror("Error", "Invalid selection.")
        return

    matches = re.findall(pattern, text_content)

    result_area.delete("1.0", tk.END) # Clear previous results
    if matches:
        result_area.insert(tk.END, "\n".join(matches))
    else:
        result_area.insert(tk.END, "No matches found.")

### c. Function to Copy Extracted Results (copy_results)
This function will copy the content of the results area to the system clipboard.

In [5]:
def copy_results():
    extracted_text = result_area.get("1.0", tk.END).strip()
    if extracted_text and extracted_text != "No matches found.":
        pyperclip.copy(extracted_text)
        messagebox.showinfo("Copied", "Extracted information copied to clipboard!")
    else:
        messagebox.showwarning("Warning", "No extracted data to copy.")

## Step 4: Create and Add GUI Components

### a. File Selection Button

In [6]:
btn_open = tk.Button(root, text="Open File", command=open_file)
btn_open.pack(pady=5)

### b. Scrollable Text Area for File Content

In [7]:
text_area = scrolledtext.ScrolledText(root, wrap=tk.WORD, width=80, height=12)
text_area.pack(pady=5)

### c. Dropdown Menu for Extraction Options

In [8]:
if root.winfo_exists():  # Check if the root window is still active
	dropdown_label = tk.Label(root, text="Select Data Type to Extract:")
	dropdown_label.pack()

	extract_option = tk.StringVar()
	# Set default value
	extract_option.set("Email Addresses")
	options = [
        "Email Addresses",
        "Phone Numbers",
        "Dates",
        "flagged",
        "URLs",
        "Messages with flagged True (JSON)",
        "Decoded FLAGs from flagged JSON (base64)"
    ]
	dropdown_menu = ttk.Combobox(root, textvariable=extract_option, values=options, state="readonly")
	dropdown_menu.pack(pady=5)
else:
	print("The application window has been closed. Cannot create widgets.")

### d. Extract Button

In [9]:
btn_extract = tk.Button(root, text="Extract", command=extract_info)
btn_extract.pack(pady=5)

### e. Result Display Area

In [10]:
result_area = scrolledtext.ScrolledText(root, wrap=tk.WORD, width=80, height=8)
result_area.pack(pady=5)

### f. Copy to Clipboard Button

In [11]:
btn_copy = tk.Button(root, text="Copy Results", command=copy_results)
btn_copy.pack(pady=5)

### g. Status Label for File Name

In [12]:
label_status = tk.Label(root, text="No file loaded", fg="blue")
label_status.pack(pady=5)

## Step 5: Run the Application

Start the Tkinter event loop to make the GUI active and responsive.

In [13]:
# Run the application
root.mainloop()

Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python313\Lib\tkinter\__init__.py", line 2068, in __call__
    return self.func(*args)
           ~~~~~~~~~^^^^^^^
  File "C:\Users\ADMIN\AppData\Local\Temp\ipykernel_17760\141964668.py", line 76, in extract_info
    matches = re.findall(pattern, text_content)
              ^^
UnboundLocalError: cannot access local variable 're' where it is not associated with a value
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python313\Lib\tkinter\__init__.py", line 2068, in __call__
    return self.func(*args)
           ~~~~~~~~~^^^^^^^
  File "C:\Users\ADMIN\AppData\Local\Temp\ipykernel_17760\141964668.py", line 76, in extract_info
    matches = re.findall(pattern, text_content)
              ^^
UnboundLocalError: cannot access local variable 're' where it is not associated with a value
Exception in Tkinter c

## Step 6: Test the Application

1. Save your Python script (e.g., app.py).
2. Run the script from your terminal: `python app.py`.
3. The GUI window "Regex Information Extractor" should appear.
4. Click "Open File" and select a .txt file (you can use the sample_file.txt if one was provided with the activity, or create your own with some emails, phone numbers, and dates).
5. The file content should appear in the top text area.
6. Select an option from the "Select Data Type to Extract:" dropdown (e.g., "Email Addresses").
7. Click the "Extract" button.
8. The extracted information (or "No matches found.") should appear in the lower text area.
9. Click "Copy Results" to copy the extracted text to your clipboard. A confirmation message should appear.
10. Test with other extraction types and different files.