# <font color="#418FDE" size="6.5" uppercase>**Simple Text Parsing**</font>

>Last update: 20260103.
    
By the end of this lecture, you will be able to:
- Use split() to break strings into lists based on separators. 
- Process each part of a split string using loops and simple conditions. 
- Apply basic parsing to simple text formats like comma-separated values. 


## **1. Splitting Text Basics**

### **1.1. Using split**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python for Beginners/Module_06/Lecture_B/image_01_01.jpg?v=1767426070" width="250">



>* Split breaks one long string into pieces
>* Separators create a list of independent parts

>* Splitting text separates combined details into parts
>* Each part becomes easier to store and use

>* Splitting text helps handle many real situations
>* It turns messy input into organized, usable data
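
Once a string is split, each piece can be picked out by position or unpacked into named variables. The short sketch below assumes a date written in month-day-year order with dashes; the variable names are illustrative only.

```python
# Hypothetical date text in month-day-year order (an assumption for this sketch).
date_text = "04-15-2024"

# split() returns a list of strings, one per piece.
date_parts = date_text.split("-")

# Unpack the three pieces into named variables.
month_text, day_text, year_text = date_parts

# The pieces are still strings, so convert before doing arithmetic.
year_number = int(year_text)
next_year = year_number + 1

print("Pieces:", date_parts)
print("Month:", month_text, "Day:", day_text, "Year:", year_text)
print("Next year:", next_year)
```

Unpacking only works when the number of names matches the number of pieces, so it suits text whose shape is known in advance.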



In [None]:
#@title Python Code - Using split

# This script shows how split breaks text into smaller parts.
# It uses simple examples with names and dates for clarity.
# It prints original strings and resulting lists after splitting.

# No pip install is needed; this script uses only built-in features.

# Define a full name string with first middle and last parts.
full_name = "Alice Marie Johnson"
# Split the name wherever a space character appears.
name_parts = full_name.split(" ")
# Print original name and resulting list of parts.
print("Full name:", full_name)
print("Name parts list:", name_parts)

# Define a date string using dashes as separators.
date_text = "04-15-2024"
# Split the date wherever a dash character appears.
date_parts = date_text.split("-")
# Print original date and resulting list of parts.
print("Date text:", date_text)
print("Date parts list:", date_parts)

# Define a short address line with city state and zip code.
address_text = "Denver CO 80202"
# Split the address wherever a space character appears.
address_parts = address_text.split(" ")
# Print original address and resulting list of parts.
print("Address text:", address_text)
print("Address parts list:", address_parts)



### **1.2. Choosing Split Separators**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python for Beginners/Module_06/Lecture_B/image_01_02.jpg?v=1767426085" width="250">



>* Separators mark boundaries between pieces of text
>* Choose separators that match how information is organized

>* Choose separators that sit between, not inside, items
>* Study punctuation roles to get meaningful split pieces

>* Separators can mix roles and cause confusion
>* Study patterns and choose boundaries that preserve meaning
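
A separator does not have to be a single character. As a small sketch, assuming every gap in the text is exactly a comma followed by one space, splitting on that two-character string keeps the stray space out of the pieces.

```python
# Sample list where every gap is a comma followed by one space (an assumption).
shopping_list = "milk, eggs, bread, peanut butter, jelly"

# Splitting on the comma alone leaves a leading space on most items.
items_by_comma = shopping_list.split(",")

# Splitting on comma-plus-space removes that space from the pieces.
items_by_comma_space = shopping_list.split(", ")

print("Comma only:      ", items_by_comma)
print("Comma plus space:", items_by_comma_space)
```

If the spacing in the text is inconsistent, this approach stops matching some gaps, so trimming each piece after a plain comma split is the more forgiving choice.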



In [None]:
#@title Python Code - Choosing Split Separators

# Demonstrate choosing useful separators for string splitting.
# Compare splitting the same text using different separator characters.
# Show why some separators produce cleaner, more meaningful pieces.

# No pip install is needed; this script uses only built-in features.

# Define a shopping list string with commas between items.
shopping_list = "milk, eggs, bread, peanut butter, jelly"

# Split using comma separator which matches list structure.
items_by_comma = shopping_list.split(",")

# Print result; note the leading space left on every item after the first.
print("Split using comma separator:", items_by_comma)

# Define a similar string but with dashes inside item names.
city_trip = "New-York, Los-Angeles, San-Francisco, Kansas-City"

# Split using comma separator which appears only between city names.
cities_by_comma = city_trip.split(",")

# Print result showing each full city name kept together.
print("Split using comma separator:", cities_by_comma)

# Split using dash separator which incorrectly splits hyphenated city names.
cities_by_dash = city_trip.split("-")

# Print result showing messy fragments from poor separator choice.
print("Split using dash separator:", cities_by_dash)

# Define a log line using vertical bars between important fields.
log_line = "2025-01-01|12:00 PM|Front door opened|Sensor ID 42"

# Split using vertical bar separator which matches field boundaries.
fields_by_bar = log_line.split("|")

# Print result showing meaningful log fields after splitting.
print("Split using bar separator:", fields_by_bar)



### **1.3. Trimming Extra Spaces**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python for Beginners/Module_06/Lecture_B/image_01_03.jpg?v=1767426100" width="250">



>* Extra spaces after splitting cause inconsistent pieces
>* Trim spaces so comparisons and searches work

>* Trim whitespace around each split text piece
>* Trimming makes similar responses match and analyze correctly

>* Trimming prevents duplicates, bad matches, confusing output
>* Clean, trimmed pieces improve data quality and reliability
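
The effect on comparisons and duplicates can be seen directly. The sketch below uses a made-up list of names with uneven spacing and relies on `set` to count distinct pieces, which is an illustrative choice rather than something this lecture requires.

```python
# Hypothetical input with uneven spacing and a repeated name.
raw_names = "Alice, Bob,  Charlie , Bob "

# Split on commas; the spaces stay attached to the pieces.
pieces = raw_names.split(",")

# A membership test fails because " Bob" is not the same string as "Bob".
found_before = "Bob" in pieces

# Trim every piece, then test again.
trimmed = [piece.strip() for piece in pieces]
found_after = "Bob" in trimmed

# Count distinct values before and after trimming.
unique_before = len(set(pieces))
unique_after = len(set(trimmed))

print("Found 'Bob' before trimming:", found_before)
print("Found 'Bob' after trimming: ", found_after)
print("Distinct names before:", unique_before, "after:", unique_after)
```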



In [None]:
#@title Python Code - Trimming Extra Spaces

# This script shows how extra spaces affect split text pieces.
# It then trims spaces using strip for cleaner comparisons.
# Finally, it compares results before and after trimming pieces.

# No pip install is needed; this script uses only built-in features.

# Define a raw string containing names with inconsistent spaces.
raw_names = "Alice, Bob,  Charlie ,  Dana"

# Split the raw string using comma as separator character.
split_names = raw_names.split(",")

# Print pieces to show leading or trailing spaces clearly.
print("Before trimming pieces:")
print(split_names)

# Trim spaces from each piece using list comprehension and strip.
trimmed_names = [name.strip() for name in split_names]

# Print pieces again to show cleaned consistent names list.
print("\nAfter trimming pieces:")
print(trimmed_names)

# Compare lengths before and after trimming to show hidden spaces.
print("\nLengths before trimming:", [len(name) for name in split_names])

# Print lengths after trimming to highlight removed whitespace characters.
print("Lengths after trimming:", [len(name) for name in trimmed_names])



## **2. Processing Split Parts**

### **2.1. Looping Through Words**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python for Beginners/Module_06/Lecture_B/image_02_01.jpg?v=1767426117" width="250">



>* Use a loop to visit each split part
>* Split then loop turns text into manageable pieces

>* Use conditions to treat words differently while looping
>* Detect keywords or patterns to extract useful meaning

>* Use loops to build summaries and statistics
>* Decide which parts to keep, change, discard
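
As one way to build a summary while looping, the sketch below counts how often each word appears and tracks the longest word. The sentence and variable names are invented for illustration.

```python
# Hypothetical sentence with repeated words.
sentence = "the cat saw the other cat near the door"

# Split on whitespace to get the individual words.
words = sentence.split()

# Dictionary for word counts and a variable for the longest word seen so far.
word_counts = {}
longest_word = ""

for word in words:
    # Add one to this word's count, starting from zero if it is new.
    word_counts[word] = word_counts.get(word, 0) + 1

    # Keep the first word that is strictly longer than the current longest.
    if len(word) > len(longest_word):
        longest_word = word

print("Total words:", len(words))
print("Word counts:", word_counts)
print("Longest word:", longest_word)
```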



In [None]:
#@title Python Code - Looping Through Words

# Demonstrate looping through words from a split sentence.
# Show how each word gets processed individually in a loop.
# Print original sentence, split words, and processed words summary.

# No pip install is needed; this script uses only built-in features.

# Define a simple customer review sentence example.
review_sentence = "This product is absolutely excellent and totally worth every dollar."

# Split the sentence into separate words list.
words_list = review_sentence.split()

# Print the original sentence for reference.
print("Original review sentence:", review_sentence)

# Print the list of split words for clarity.
print("Split words list:", words_list)

# Initialize a counter for positive words found.
positive_count = 0

# Define a small list of positive indicator words.
positive_words = ["excellent", "worth", "absolutely", "totally"]

# Loop through each word and check for positivity.
for word in words_list:
    # Convert word to lowercase for consistent comparison.
    lower_word = word.lower()

    # Check if current word appears in positive words list.
    if lower_word in positive_words:
        # Increase counter when a positive word is found.
        positive_count += 1

# Print how many positive words were detected.
print("Positive words found count:", positive_count)

# Print a simple interpretation based on the count.
print("Overall review sentiment seems positive." if positive_count > 1 else "Overall review sentiment seems neutral.")



### **2.2. Cleaning Split Parts**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python for Beginners/Module_06/Lecture_B/image_02_02.jpg?v=1767426135" width="250">



>* Clean each split piece for consistent processing
>* Trim spaces, fix punctuation, standardize letter case

>* Clean tags before looping to avoid duplicates
>* Normalize spaces, case, punctuation for reliable checks

>* Remove or change unhelpful symbols in parts
>* Apply consistent checks to handle messy text
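
For the tag case in particular, a compact sketch might look like the following. It assumes tags are separated by commas and that only `#` and `!` need removing from the edges; both the sample text and those choices are hypothetical.

```python
# Hypothetical tag text with mixed case, stray spaces, and edge symbols.
raw_tags = "#Python, python , PYTHON!, #data,Data"

# List that keeps each cleaned tag only once, in first-seen order.
cleaned_tags = []

for tag in raw_tags.split(","):
    # strip() removes surrounding whitespace; strip("#!") then removes
    # any '#' or '!' characters from both ends; lower() normalizes case.
    cleaned = tag.strip().strip("#!").lower()

    # Skip empty results and tags that were already collected.
    if cleaned and cleaned not in cleaned_tags:
        cleaned_tags.append(cleaned)

print("Cleaned unique tags:", cleaned_tags)
```

Passing characters to `strip()` only affects the two ends of the string, so symbols in the middle of a tag are left alone.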



In [None]:
#@title Python Code - Cleaning Split Parts

# Demonstrate cleaning split text parts using simple string methods.
# Show trimming spaces, removing punctuation, and normalizing lowercase words.
# Help beginners understand reliable comparisons after cleaning split parts.

# No pip install is needed; this script uses only built-in features.

# Define a messy survey response string with inconsistent formatting.
raw_text = "Yes, no!!  YES?  maybe ,  yes !"

# Show original raw text before any cleaning operations.
print("Original text:", raw_text)

# Split on single spaces; repeated spaces produce empty strings, which are dropped later.
parts = raw_text.split(" ")

# Prepare a list of unwanted edge punctuation characters for cleaning.
unwanted_punctuation = [",", "!", "?", "."]

# Create an empty list that will store cleaned parts only.
cleaned_parts = []

# Loop through each rough part and clean it step by step.
for part in parts:

    # Remove leading and trailing spaces from the current part.
    trimmed = part.strip()

    # Remove unwanted punctuation characters from the left side if present.
    while len(trimmed) > 0 and trimmed[0] in unwanted_punctuation:
        trimmed = trimmed[1:]

    # Continue removing punctuation from the right side if present.
    while len(trimmed) > 0 and trimmed[-1] in unwanted_punctuation:
        trimmed = trimmed[:-1]

    # Convert the remaining text to lowercase for consistent comparisons.
    normalized = trimmed.lower()

    # Only keep non empty cleaned parts inside the final list.
    if normalized != "":
        cleaned_parts.append(normalized)

# Show the list of cleaned parts after all transformations.
print("Cleaned parts:", cleaned_parts)



### **2.3. Filtering Split Results**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python for Beginners/Module_06/Lecture_B/image_02_03.jpg?v=1767426151" width="250">



>* Filter split text by checking each piece
>* Keep parts matching simple conditions, discard others

>* Use loops to keep only relevant parts
>* Apply conditions to ignore noise and invalid entries

>* Combine multiple simple checks to filter parts
>* Layered filtering turns messy text into useful data
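
Layered checks can be written as a chain of small tests, each one rejecting a different kind of bad piece. The sketch below assumes scores are whole numbers from 0 to 100; the data and the limit are made up for illustration.

```python
# Hypothetical score text containing noise, blanks, and an out-of-range value.
raw_scores = "88, abc, 105, , 72, 9x, 64"

kept_scores = []
skipped_pieces = []

for piece in raw_scores.split(","):
    text = piece.strip()

    # Check 1: ignore empty pieces.
    if text == "":
        skipped_pieces.append(text)
        continue

    # Check 2: keep only pieces made entirely of digits.
    if not text.isdigit():
        skipped_pieces.append(text)
        continue

    # Check 3: keep only values inside the assumed 0-100 range.
    score = int(text)
    if score > 100:
        skipped_pieces.append(text)
        continue

    kept_scores.append(score)

print("Kept scores:", kept_scores)
print("Skipped pieces:", skipped_pieces)
```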



In [None]:
#@title Python Code - Filtering Split Results

# Demonstrate filtering split text parts using simple loop conditions.
# Show how to keep only useful words from a sentence.
# Print original words and filtered words for clear comparison.
# No pip install is needed; this script uses only built-in features.

# Define a sample sentence with several different words.
sentence = "The quick brown fox jumps over the lazy dog and the log."

# Split the sentence into separate word parts.
words = sentence.split(" ")

# Define a list containing unhelpful common short words.
stop_words = ["the", "and", "over"]

# Prepare an empty list for storing filtered useful words.
filtered_words = []

# Loop through each word and apply filtering conditions.
for word in words:

    # Convert word to lowercase for consistent comparisons.
    lower_word = word.lower()

    # Remove trailing period characters from the word.
    cleaned_word = lower_word.rstrip(".")

    # Check length and stop word conditions before keeping.
    if len(cleaned_word) > 3 and cleaned_word not in stop_words:

        # Append cleaned word into the filtered words list.
        filtered_words.append(cleaned_word)

# Print original split words list for reference understanding.
print("Original words:", words)

# Print filtered words list showing only useful selected words.
print("Filtered words:", filtered_words)



## **3. Parsing Simple Formats**

### **3.1. Working With CSV Text**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python for Beginners/Module_06/Lecture_B/image_03_01.jpg?v=1767426177" width="250">



>* CSV stores records as comma-separated fields
>* Parsing CSV turns raw lines into useful values

>* Split each CSV line into separate fields
>* Use fields to store, check, and analyze data

>* Assume commas separate simple fields in basics
>* Use simple CSV cases to build parsing confidence
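
The "simple fields" assumption matters because a comma inside a quoted value breaks a plain `split(",")`. The sketch below uses a hypothetical line with a quoted author name and, for comparison only, Python's standard `csv` module, which is not otherwise used in this lecture.

```python
import csv

# Hypothetical CSV line where one field contains a comma inside quotes.
line = 'Python Basics,"Smith, Anna",2020'

# A plain split treats every comma as a boundary, so the author is cut in two.
naive_fields = line.split(",")

# csv.reader understands the quotes and keeps the author as one field.
csv_fields = next(csv.reader([line]))

print("Plain split:", naive_fields)
print("csv module: ", csv_fields)
```

For text where fields never contain commas or quotes, the plain split is enough and is easier to follow while learning.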



In [None]:
#@title Python Code - Working With CSV Text

# Demonstrate parsing simple CSV style text lines.
# Show splitting lines into separate meaningful data fields.
# Print parsed records for clear beginner friendly understanding.
# No pip install is needed; this script uses only standard Python features.

# Define a small CSV style text block with three book records.
books_csv_text = "Title,Author,Year,Shelf\nPython Basics,Smith,2020,A3\nData Skills,Jones,2018,B1\nCoding 101,Brown,2022,A1"

# Split the text into separate lines representing individual book records.
lines = books_csv_text.split("\n")

# Take the first line as header names for each column field.
header_line = lines[0]

# Split the header line into individual column names for later use.
headers = header_line.split(",")

# Prepare an empty list that will store parsed book dictionaries.
parsed_books = []

# Loop over remaining lines, skipping the header line already processed.
for line in lines[1:]:

    # Split each data line into separate field values using commas.
    parts = line.split(",")

    # Create a dictionary mapping headers to corresponding field values.
    book_record = {
        headers[0]: parts[0],
        headers[1]: parts[1],
        headers[2]: parts[2],
        headers[3]: parts[3],
    }

    # Append the new book record dictionary into the parsed books list.
    parsed_books.append(book_record)

# Print a short title describing the following parsed book records.
print("Parsed book records from simple CSV style text:")

# Loop through parsed books and print formatted information for each record.
for book in parsed_books:

    # Build a readable description string using fields from each book dictionary.
    description = f"Title: {book['Title']}, Author: {book['Author']}, Year: {book['Year']}, Shelf: {book['Shelf']}"

    # Print the description so beginners see structured data from raw text.
    print(description)



### **3.2. Colon-Separated Pairs**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python for Beginners/Module_06/Lecture_B/image_03_02.jpg?v=1767426195" width="250">



>* Colon pairs link labels to their values
>* Colons clearly separate two parts for easy parsing

>* Treat each line as key and value
>* Clean, normalize, store pairs for structured use

>* Combine colons with commas for richer parsing
>* Turn informal labeled text into structured, checkable data
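
When several labeled values share one line, the two separators can be applied in turn: commas first to get the pairs, then a colon to divide each pair. The sketch below assumes no value contains a comma; the sample text is invented.

```python
# Hypothetical one-line profile with comma-separated "label: value" pairs.
profile_text = "name: Alice, age: 30, city: Denver"

profile = {}

# First split on commas to get each "label: value" pair.
for pair in profile_text.split(","):
    # Then split each pair on the first colon only.
    key_text, value_text = pair.split(":", 1)

    # Trim spaces so keys and values are clean.
    profile[key_text.strip()] = value_text.strip()

print("Parsed profile:", profile)
print("City:", profile["city"])
```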



In [None]:
#@title Python Code - Colon-Separated Pairs

# This script parses colon separated key value pairs from simple configuration style text.
# It shows how to split each line around the colon character into key and value.
# It then trims spaces and stores the parsed pairs inside a Python dictionary.

# pip install commands are not required because this script uses only built in features.

# Define a multi line string representing simple configuration style settings.
settings_text = """username: alice
max_items: 10
unit_system: imperial
favorite_length: 12 inches"""

# Print the original text block so learners can see the raw input.
print("Original settings text block:\n")
print(settings_text)

# Prepare an empty dictionary that will store parsed key value pairs.
parsed_settings = {}

# Loop through each individual line inside the multi line settings text.
for line in settings_text.split("\n"):
    # Skip completely empty lines that might appear inside the text block.
    if not line.strip():
        continue

    # Split the current line into key and value parts using the first colon.
    key_part, value_part = line.split(":", 1)

    # Strip surrounding spaces from both parts and lowercase the key for consistent lookups.
    key_clean = key_part.strip().lower()
    value_clean = value_part.strip()

    # Store the cleaned key and value inside the parsed settings dictionary.
    parsed_settings[key_clean] = value_clean

# Print a heading to separate the original text from the parsed dictionary output.
print("\nParsed settings dictionary:\n")

# Loop through dictionary items and print each parsed key with its corresponding value.
for key, value in parsed_settings.items():
    print(f"Key '{key}' has parsed value '{value}'")



### **3.3. Checking Parsed Data**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Python for Beginners/Module_06/Lecture_B/image_03_03.jpg?v=1767426210" width="250">



>* After splitting, always check pieces for sense
>* Validate count and meaning to avoid bad data

>* Validate each field's type and expected format
>* Use simple rules to catch common data errors

>* Check records for completeness and realistic values
>* Handle bad lines separately to protect later analysis
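
One way to handle bad lines separately is to collect them in their own list instead of mixing them with good records. The sketch below uses a simplified, hypothetical two-field format (name and amount) to keep the idea visible.

```python
# Hypothetical "name,amount" lines, two of which are malformed.
lines = ["Alice,19.99", "Bob,free", "Dana", "Eve,15.00"]

valid_records = []
rejected_lines = []

for raw_line in lines:
    parts = raw_line.split(",")

    # Reject lines that do not have exactly two fields.
    if len(parts) != 2:
        rejected_lines.append(raw_line)
        continue

    name = parts[0].strip()

    # Reject lines whose amount cannot be read as a number.
    try:
        amount = float(parts[1])
    except ValueError:
        rejected_lines.append(raw_line)
        continue

    valid_records.append((name, amount))

# Later steps use only the valid records, so bad lines cannot distort totals.
total_spent = 0.0
for name, amount in valid_records:
    total_spent += amount

print("Valid records:", valid_records)
print("Rejected lines:", rejected_lines)
print(f"Total spent: ${total_spent:.2f}")
```

Keeping the rejected lines around, rather than discarding them silently, makes it possible to review or repair them afterwards.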



In [None]:
#@title Python Code - Checking Parsed Data

# Demonstrate checking parsed CSV style data for basic correctness.
# Show how to validate field counts and simple value formats safely.
# Help beginners trust parsed text lines before using their data.

# pip install commands are unnecessary because this script uses only built in modules.

# Define some example CSV style lines representing simple purchase records.
lines = [
    "Alice,alice@example.com,19.99",
    "Bob,bob_at_example.com,12.50",
    "Charlie,charlie@example.com,not_a_number",
    "Dana,,5.00",
    " ,no_name@example.com,7.25",
    "Eve,eve@example.com,15.00",
]

# Define expected number of fields for each parsed record line.
expected_fields = 3

# Define a small helper function that checks email structure simply.
def looks_like_email(text):
    # Rough check only: the text must contain at least one at sign and one period.
    return ("@" in text) and ("." in text)

# Print a header line describing the upcoming validation results.
print("Checking parsed purchase records for basic problems and suspicious values...")

# Loop through each line and validate parsed pieces carefully.
for raw_line in lines:
    # Split the line by commas to obtain individual field pieces.
    parts = raw_line.split(",")

    # Check that the line has exactly the expected number of fields.
    if len(parts) != expected_fields:
        print("Problem line:", raw_line, "-> wrong number of fields.")
        continue

    # Unpack the parsed pieces into named variables for clarity.
    name, email, amount_text = parts

    # Strip whitespace and check that the name field is not empty.
    name = name.strip()
    if not name:
        print("Problem line:", raw_line, "-> missing customer name.")
        continue

    # Check that the email field looks roughly like a valid address.
    email = email.strip()
    if not looks_like_email(email):
        print("Problem line:", raw_line, "-> suspicious email format.")
        continue

    # Try converting the amount field into a floating point number.
    amount_text = amount_text.strip()
    try:
        amount = float(amount_text)
    except ValueError:
        print("Problem line:", raw_line, "-> purchase amount not numeric.")
        continue

    # Check that the numeric amount is positive and reasonably sized.
    if amount <= 0 or amount > 10000:
        print("Problem line:", raw_line, "-> implausible purchase amount.")
        continue

    # If all checks pass, print a confirmation message for the valid record.
    print("Valid record:", name, "spent", f"${amount:.2f}", "with email", email)





In this lecture, you learned to:
- Use split() to break strings into lists based on separators. 
- Process each part of a split string using loops and simple conditions. 
- Apply basic parsing to simple text formats like comma-separated values. 

In the next module (Module 7), we will go over 'Files And Data'.