# Python Mini Project Portfolio Notebook  
## Iteration Automation, Validation & Parsing, and Lightweight Data Structuring

This notebook is a **project-style** collection of small Python components I built while practicing practical patterns used in data work.

It contains four mini-components that fit together as a “toolbox”:

1. **Cinema Ops Automation (While Loops)**
   - Purchase counters (milestones)
   - Reservation countdown timer with warning messages

2. **Customer Analytics Utilities (For Loops)**
   - Score binning into negative/neutral/positive buckets
   - ID cross-checks against a verified reference list
   - Counting customers with $100+ baskets (nested loop)

3. **Retail Utilities (Strings)**
   - ZIP code normalization + validation
   - Store ID extraction from URLs
   - URL protocol + store ID validation with clean user feedback

4. **EPA Micro-Dataset Structuring (Lists & Tuples)**
   - Building row-like records using tuples
   - Converting records to editable lists
   - Filtering records using tuple unpacking and list comprehensions

Skills demonstrated:
- `while` loops, `for` loops, nested loops
- conditionals, defensive coding, functions, docstrings
- strings: slicing, splitting, `partition`, formatting
- data structures: lists, tuples, `zip()`, list comprehensions, unpacking

In [8]:
from time import sleep
import random

# 1) Cinema Ops Automation (While Loops)

In early analytics prototypes, it’s common to start with print-based automation:
- counting events (purchases, page actions)
- printing milestone messages
- simulating time windows (like seat reservation timers)

The goal here is to demonstrate `while` loops in a practical context.


## Candy purchase counter (0 → 5)

A basic counter loop that prints each purchase as it happens.


In [28]:
# Counter starts at 0 candies purchased
candy_purchased = 0

# Keep looping while the counter is at most 5 (prints 0 through 5)
while candy_purchased <= 5:
    print("Candy purchased: " + str(candy_purchased))
    candy_purchased += 1  # increment after printing


Candy purchased: 0
Candy purchased: 1
Candy purchased: 2
Candy purchased: 3
Candy purchased: 4
Candy purchased: 5


## Candy purchase milestone alerts (multiples of 10)

This version prints only milestone events: 0, 10, 20, ..., 100.
Useful for “checkpoint” type reporting where you don’t want every line logged.


In [29]:
candy_purchased = 0

while candy_purchased <= 100:
    # Print only when the number is divisible by 10
    if candy_purchased % 10 == 0:
        print("Candy purchased: " + str(candy_purchased))
    candy_purchased += 1


Candy purchased: 0
Candy purchased: 10
Candy purchased: 20
Candy purchased: 30
Candy purchased: 40
Candy purchased: 50
Candy purchased: 60
Candy purchased: 70
Candy purchased: 80
Candy purchased: 90
Candy purchased: 100
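
The same milestones can also be generated directly instead of testing every count. This is a minimal sketch, not the original cell: the name `milestones` is my own, and it assumes the checkpoints are known in advance rather than arriving one purchase at a time.

```python
# Hypothetical alternative: step through the milestones directly
# instead of checking every count with the modulo operator.
milestones = list(range(0, 101, 10))  # 0, 10, ..., 100 (the stop value is exclusive)

for candy_purchased in milestones:
    print("Candy purchased: " + str(candy_purchased))
```

The `while` version is still the better fit when the counter is driven by live events, because the loop body can react to each purchase.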


## Seat reservation countdown timer

Online seat selection pages often have a limited “reservation window.”
This countdown simulates that behavior and prints escalating warnings as time runs out.

Notes:
- Use `pause_seconds=0` for instant output in a notebook demo.
- Use `pause_seconds=1` if you want the countdown to run in real time.


In [None]:
def reservation_timer(start_mins=10, pause_seconds=0):
    """
    Prints a countdown for an online seat reservation window.

    Args:
        start_mins (int): starting minutes (default 10).
        pause_seconds (int | float): delay per minute tick (default 0 for instant demo).

    Behavior:
        - Prints remaining minutes
        - Special messages at 5 and 2 minutes remaining
        - Prints timeout message at 0
    """
    mins = start_mins

    while mins >= 0:
        if mins == 5:
            print(f"Place your reservation soon! {mins} minutes remaining.")
        elif mins == 2:
            print("Don't lose your seats! 2 minutes remaining.")
        elif mins == 0:
            print("User timed out.")
        else:
            print(mins)

        # Pause to simulate time passing
        sleep(pause_seconds)
        mins -= 1


# Demo
reservation_timer(start_mins=10, pause_seconds=0)


10
9
8
7
6
Place your reservation soon! 5 minutes remaining.
4
3
Don't lose your seats! 2 minutes remaining.
1
User timed out.
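
Because the timer prints and sleeps, its messages are awkward to check automatically. One possible variation, sketched here under my own naming (`reservation_messages` is hypothetical and not part of the original cell), yields the same messages so they can be collected into a list and compared:

```python
def reservation_messages(start_mins=10):
    """Yield the countdown messages without printing or sleeping (hypothetical helper)."""
    for mins in range(start_mins, -1, -1):  # counts down to 0 inclusive
        if mins == 5:
            yield f"Place your reservation soon! {mins} minutes remaining."
        elif mins == 2:
            yield "Don't lose your seats! 2 minutes remaining."
        elif mins == 0:
            yield "User timed out."
        else:
            yield str(mins)


messages = list(reservation_messages(start_mins=3))
print(messages)
```

A caller that wants the real-time behavior can still loop over the generator and call `sleep()` between messages.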


# 2) Customer Analytics Utilities (For Loops)

Next, I built a few small utilities that show how `for` loops help automate analysis tasks:
- categorizing scores into buckets
- validating IDs against a trusted reference list
- aggregating customer basket totals (nested loop)


## Score binning (`score_counter`)

Groups customer scores (1–10) into:
- **Negative:** 1–5
- **Neutral:** 6–8
- **Positive:** 9–10

This pattern is common when turning raw numeric signals into usable categories.


In [31]:
def score_counter(score_list):
    """
    Bins feedback scores into negative/neutral/positive counts.

    Args:
        score_list (list[int]): list of integer scores (1–10).

    Prints:
        Negative: <count>
        Neutral: <count>
        Positive: <count>
    """
    negative_count = 0
    neutral_count = 0
    positive_count = 0

    for score in score_list:
        # Negative bucket
        if score in range(1, 6):
            negative_count += 1

        # Neutral bucket
        elif score in range(6, 9):
            neutral_count += 1

        # Positive bucket (9–10); scores outside 1–10 are not counted
        elif score in range(9, 11):
            positive_count += 1

    print("Negative: " + str(negative_count))
    print("Neutral: " + str(neutral_count))
    print("Positive: " + str(positive_count))


# Quick sanity check
score_counter([1,2,3,4,5,6,7,8,9,10])


Negative: 5
Neutral: 3
Positive: 2
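
Printed counts are hard to reuse in later steps. As a sketch of one alternative (the function name `score_counts` and the extra `"invalid"` bucket are my own assumptions, not part of the original design), the counts can be returned as a dictionary, with out-of-range scores tallied separately:

```python
def score_counts(score_list):
    """Return bucket counts as a dict; scores outside 1-10 are tallied as 'invalid'."""
    counts = {"negative": 0, "neutral": 0, "positive": 0, "invalid": 0}
    for score in score_list:
        if 1 <= score <= 5:
            counts["negative"] += 1
        elif 6 <= score <= 8:
            counts["neutral"] += 1
        elif 9 <= score <= 10:
            counts["positive"] += 1
        else:
            counts["invalid"] += 1  # e.g. 0 or 11
    return counts


print(score_counts([1, 5, 6, 8, 9, 10, 0, 11]))
```

Returning a dictionary keeps the reporting step (printing, plotting, exporting) separate from the counting logic.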


## Randomized score tests (self-contained)

Instead of relying on external lab helpers, I generate a few repeatable test lists using `random`.
This keeps the notebook portable.


In [32]:
random.seed(42)

possible_scores = list(range(1, 11))

# Different weighting patterns simulate different “customer populations”
score_list1 = random.choices(possible_scores, weights=[8,8,8,8,8,3,3,4,20,30], k=10)
score_list2 = random.choices(possible_scores, weights=[1,2,3,4,5,10,15,15,7,9], k=450)
score_list3 = random.choices(possible_scores, weights=[1,2,3,4,4,5,5,10,15,25], k=10000)

print("Run 1:")
score_counter(score_list1)

print("\nRun 2:")
score_counter(score_list2)

print("\nRun 3:")
score_counter(score_list3)


Run 1:
Negative: 5
Neutral: 1
Positive: 4

Run 2:
Negative: 85
Neutral: 253
Positive: 112

Run 3:
Negative: 1935
Neutral: 2652
Positive: 5413


## Verified ID cross-check (`id_validator`)

Sometimes feedback or event data includes unverified IDs.
This function cross-checks a feedback list against a verified reference list and prints:

- how many IDs are unverified
- what percentage of feedback IDs are unverified


In [14]:
def id_validator(verified_ids, feedback_ids):
    """
    Cross-checks feedback IDs against verified IDs.

    Args:
        verified_ids (list[str]): verified customer IDs.
        feedback_ids (list[str]): IDs that posted feedback.

    Prints:
        '{unverified} of {total} IDs unverified.'
        '{percent}% unverified.'
    """
    unverified = 0

    for _id in feedback_ids:
        if _id not in verified_ids:
            unverified += 1

    total = len(feedback_ids)
    percent = (unverified / total) * 100 if total else 0

    print(f"{unverified} of {total} IDs unverified.")
    print(f"{round(percent, 2)}% unverified.")


# Examples
id_validator(verified_ids=["1", "2"], feedback_ids=["1", "2", "3"])
print()
id_validator(verified_ids=["1a", "2b", "3c"], feedback_ids=["1a", "4d"])


1 of 3 IDs unverified.
33.33% unverified.

1 of 2 IDs unverified.
50.0% unverified.
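
For larger ID lists, membership tests against a list get slow because each lookup scans the list. A minimal sketch of a set-based variant follows; `count_unverified` is a hypothetical name, and returning a tuple instead of printing is my own choice for easier reuse:

```python
def count_unverified(verified_ids, feedback_ids):
    """Return (unverified_count, percent_unverified) using a set for fast lookups."""
    verified = set(verified_ids)  # average O(1) membership test vs. O(n) for a list
    unverified = sum(1 for _id in feedback_ids if _id not in verified)
    total = len(feedback_ids)
    percent = round(unverified / total * 100, 2) if total else 0.0
    return unverified, percent


print(count_unverified(["1", "2"], ["1", "2", "3"]))  # (1, 33.33)
```

The empty-list guard mirrors the original function, so an empty feedback list returns `(0, 0.0)` rather than raising `ZeroDivisionError`.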


## Synthetic ID data generator (portable tests)

To keep this notebook self-contained, I built a small generator that creates:
- a verified ID pool
- a feedback list that includes a controllable percentage of unverified IDs


In [33]:
def generate_id_lists(n_verified=20, n_feedback=30, seed=1, unverified_ratio=0.3):
    """
    Generates verified + feedback ID lists for reproducible validation tests.

    Args:
        n_verified (int): number of verified IDs to create
        n_feedback (int): number of feedback IDs (unique)
        seed (int): random seed
        unverified_ratio (float): fraction of feedback IDs that should be unverified

    Returns:
        (verified_ids, feedback_ids)
    """
    random.seed(seed)

    # Create a pool of verified IDs
    verified_ids = [f"v{1000+i}" for i in range(n_verified)]

    # Decide how many feedback IDs should be unverified
    n_unverified = int(n_feedback * unverified_ratio)
    n_verified_in_feedback = n_feedback - n_unverified

    # Sample verified IDs for the feedback list
    verified_sample = random.sample(verified_ids, k=min(n_verified_in_feedback, len(verified_ids)))

    # Create synthetic unverified IDs
    unverified_sample = [f"u{5000+i}" for i in range(n_unverified)]

    feedback_ids = verified_sample + unverified_sample
    random.shuffle(feedback_ids)

    return verified_ids, feedback_ids


ver1, fb1 = generate_id_lists(n_verified=20, n_feedback=15, seed=7, unverified_ratio=0.25)
id_validator(ver1, fb1)


3 of 15 IDs unverified.
20.0% unverified.


## Counting customers with $100+ spend (`purchases_100`)

Sales data is often stored as a list of customer baskets (lists inside a list).
This function uses a nested loop to calculate totals per customer and counts how many reach $100+.


In [None]:
def purchases_100(sales):
    """
    Counts customers whose total purchase sum is >= 100.

    Args:
        sales (list[list[float]]): each inner list contains item prices for one customer.

    Returns:
        int: number of customers with totals >= 100
    """
    purchases_over_100 = 0

    for customer in sales:
        # Sum prices for this customer's basket
        total_purchase = 0
        for purchase in customer:
            total_purchase += purchase

        # Check if total meets threshold
        if total_purchase >= 100:
            purchases_over_100 += 1

    return purchases_over_100


sales_example = [[2.75], [50.0, 50.0], [150.46, 200.12, 111.30]]
print("Customers over $100:", purchases_100(sales_example))


Customers over $100: 2
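
The inner loop is a running total, which the built-in `sum()` already provides. As a sketch (the name `purchases_100_sum` is hypothetical), the same count can be written without an explicit nested loop:

```python
def purchases_100_sum(sales):
    """Count baskets whose total is >= 100, using the built-in sum()."""
    return sum(1 for basket in sales if sum(basket) >= 100)


sales_example = [[2.75], [50.0, 50.0], [150.46, 200.12, 111.30]]
print(purchases_100_sum(sales_example))  # 2
```

The explicit nested loop is still worth knowing: it is the form to extend when each item needs extra handling, such as skipping refunds or applying discounts.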


## Synthetic sales data generator (repeatable testing)

Generates random baskets so the nested-loop function can be tested quickly without external files.


In [18]:
def generate_sales_data(n_customers=10, seed=1):
    """
    Generates synthetic sales data where each customer buys 1–6 items priced $1–$80.
    """
    random.seed(seed)

    sales = []
    for _ in range(n_customers):
        n_items = random.randint(1, 6)
        basket = [round(random.uniform(1, 80), 2) for _ in range(n_items)]
        sales.append(basket)

    return sales


sales1 = generate_sales_data(n_customers=10, seed=1)
sales2 = generate_sales_data(n_customers=150, seed=18)

print("Test 1:", purchases_100(sales1))
print("Test 2:", purchases_100(sales2))


Test 1: 7
Test 2: 111


# 3) Retail Utilities (Strings)

IDs and ZIP codes often look numeric, but treating them as strings avoids common issues:
- losing leading zeros
- breaking slicing/format rules
- mismatching across datasets

This section includes a small set of “validation + parsing” helpers.


## Store IDs should be strings

Even if an ID is numeric-looking, converting to a string is safer for validation and slicing.


In [5]:
store_id = 1101

# Convert to string
store_id = str(store_id)

# Confirm the type
print("store_id:", store_id)
print("type(store_id):", type(store_id))

store_id: 1101
type(store_id): <class 'str'>


## ZIP code validation and formatting (`zip_checker`)

Rules implemented:
- If ZIP is 5 characters and does **not** start with `"00"`, return it
- If ZIP is 4 characters and does **not** start with `"0"`, pad a leading zero and return it
- Otherwise return `"Invalid ZIP Code."`

In [19]:
def zip_checker(zipcode):
    """
    Validates a ZIP code and restores a missing leading zero.

    Args:
        zipcode (str): ZIP code with 4 or 5 characters.

    Returns:
        str: the 5-character ZIP (a leading zero is added to 4-character input),
        or 'Invalid ZIP Code.' if the input breaks the rules.
    """
    # NOTE: This function expects zipcode to behave like a string (supports len() and indexing).
    if len(zipcode) == 5 and zipcode[:2] != "00":
        return zipcode
    elif len(zipcode) == 4 and zipcode[0] != "0":
        return "0" + zipcode
    else:
        return "Invalid ZIP Code."

### Test `zip_checker`

The following test cases cover each rule: a valid 5-character ZIP, a 4-character ZIP that needs padding, and two invalid inputs.

In [7]:
print(zip_checker('02806'))     # Should return 02806.
print(zip_checker('2806'))      # Should return 02806.
print(zip_checker('0280'))      # Should return 'Invalid ZIP Code.'
print(zip_checker('00280'))     # Should return 'Invalid ZIP Code.'

02806
02806
Invalid ZIP Code.
Invalid ZIP Code.
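
ZIP codes that passed through a spreadsheet or CSV often arrive as integers, which is exactly how the leading zero gets lost. The sketch below is a hypothetical wrapper, not part of the original exercise: the name `normalize_zip`, the digit check, and returning `None` for invalid input are all my own assumptions.

```python
def normalize_zip(zipcode):
    """Hypothetical wrapper: accept str or int, return a 5-digit ZIP string or None."""
    text = str(zipcode).strip()
    if not text.isdigit():
        return None                      # rejects letters, punctuation, empty input
    if len(text) == 5 and not text.startswith("00"):
        return text
    if len(text) == 4 and not text.startswith("0"):
        return "0" + text                # restore the leading zero lost upstream
    return None


print(normalize_zip(2806), normalize_zip("02806"), normalize_zip("28o6"))
```

Returning `None` instead of a message string makes invalid rows easy to filter out later, at the cost of a less friendly result for interactive use.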


## Store ID extraction from a URL (slicing)

When the store ID is always the last 7 characters of the URL, a negative slice is enough to extract it.
This shortcut only works if nothing follows the ID, such as a trailing slash or a query string.


In [8]:
url = "https://exampleURL1.com/r626c36"

# 1) Extract the final 7 characters
#    (named store_id rather than id, which would shadow the built-in id())
store_id = url[-7:]

# 2) Print the extracted store ID
print(store_id)

r626c36


## URL validation (`url_checker`)
Rules implemented:
- Only `https:` is considered a valid protocol
- A valid store ID must be exactly 7 characters
- Depending on what’s invalid, print the required message(s)
- If both are valid, return the store ID

Notes on approach:
- `partition(':')` is used to pull out the protocol cleanly
- `rstrip('/')` removes trailing slashes
- `split('/')[-1]` gets everything after the last `/`
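
The three string methods listed above can be traced one step at a time. This is a small illustration using a made-up URL with a trailing slash, chosen to show why `rstrip('/')` is needed before the split:

```python
url = "https://exampleURL1.com/r626c36/"

# partition(':') splits on the FIRST ':' and returns a 3-tuple
before, sep, after = url.partition(":")
print((before, sep, after))   # ('https', ':', '//exampleURL1.com/r626c36/')

# Without rstrip('/'), the last split element is an empty string
last_raw = url.split("/")[-1]
print(repr(last_raw))         # ''

# With rstrip('/'), the last element is the store ID
last_clean = url.rstrip("/").split("/")[-1]
print(last_clean)             # r626c36
```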

In [1]:
# Sample valid URL for reference:
url = 'https://exampleURL1.com/r626c36'

def url_checker(url):
    '''
    Checks whether a URL has a valid protocol and a valid 7-character store ID.

    Args:
        url (str): A URL string that ends with a store ID.

    Rules:
        - The only valid protocol is 'https:'.
        - A valid store ID must be exactly 7 characters long (taken from the end of the URL).

    Behavior / Returns:
        - If BOTH the protocol and store ID are invalid:
            Prints two lines:
                '{protocol} is an invalid protocol.'
                '{store_id} is an invalid store ID.'
        - If ONLY the protocol is invalid:
            Prints:
                '{protocol} is an invalid protocol.'
        - If ONLY the store ID is invalid:
            Prints:
                '{store_id} is an invalid store ID.'
        - If BOTH are valid:
            Returns the store ID (str).

    Notes:
        - {protocol} is the protocol portion of the URL (e.g., 'http:', 'https:', 'ftps:').
        - {store_id} is the store ID extracted from the end of the URL.
    '''
    # Finds the protocol by partitioning the URL on the first ':'
    before_sep, sep, after_sep = url.partition(':')
    protocol = before_sep + sep          # e.g., "https:" / "http:" / "ftps:"
    
    # Remove trailing slash(es) so the final split doesn't return an empty string
    url = url.rstrip('/')
    
    # Store ID is everything after the last '/'
    store_id = url.split('/')[-1]

    # Validate protocol and store ID length and print the required message(s)
    if protocol != 'https:' and len(store_id) != 7:
        # Two separate prints avoid the stray space that print(a, '\n' + b)
        # inserts at the end of the first line
        print(f'{protocol} is an invalid protocol.')
        print(f'{store_id} is an invalid store ID.')

    elif protocol != 'https:':
        print(f'{protocol} is an invalid protocol.')

    elif len(store_id) != 7:
        print(f'{store_id} is an invalid store ID.')

    else:
        return store_id  # already a str, so no conversion is needed


### Test `url_checker`

These tests cover all four outcomes: both checks failing, only the protocol failing, only the store ID failing, and a fully valid URL.  
Note: The final test uses `print(...)` to display the returned store ID.

In [2]:
# Test calls                                    # Expected output:
url_checker('http://exampleURL1.com/r626c3')    # 'http: is an invalid protocol.'
print()                                         # 'r626c3 is an invalid store ID.'

url_checker('ftps://exampleURL1.com/r626c36')   # 'ftps: is an invalid protocol.'
print()

url_checker('https://exampleURL1.com/r626c3')   # 'r626c3 is an invalid store ID.'
print()

print(url_checker('https://exampleURL1.com/r626c36'))  # 'r626c36'

http: is an invalid protocol. 
r626c3 is an invalid store ID.

ftps: is an invalid protocol.

r626c3 is an invalid store ID.

r626c36
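
Manual splitting assumes the store ID is the very last thing in the URL. If URLs might carry a query string, the standard library's `urllib.parse.urlsplit` separates the pieces first. This is a sketch only: `parse_store_url` is a hypothetical helper, and the `?ref=home` query is invented for the example.

```python
from urllib.parse import urlsplit


def parse_store_url(url):
    """Hypothetical helper: return (scheme, store_id) using the standard library parser."""
    parts = urlsplit(url)
    # parts.path is e.g. '/r626c36'; strip slashes, then keep the last path segment
    store_id = parts.path.strip("/").split("/")[-1]
    return parts.scheme, store_id


print(parse_store_url("https://exampleURL1.com/r626c36?ref=home"))  # ('https', 'r626c36')
```

Note that `scheme` comes back without the trailing colon (`'https'`, not `'https:'`), so any comparison against the protocol rule would need to account for that.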


# 4) EPA Micro-Dataset Structuring (Lists, Tuples, Unpacking)

This section models a tiny “real-world” dataset using basic Python data structures.

**Why this matters in data work**
- Lists are great for storing ordered columns of values.
- Tuples are useful for “row-like” records that shouldn’t be changed accidentally.
- Converting tuples → lists is a common move when you *do* need to edit records.
- Unpacking makes filtering and transformations easier to read.


## Dataset: State + County pairs

I’m using a small set of locations (state, county) that reported air quality index readings.
The order of the lists matters because each position represents a matched pair.


In [22]:
# Two aligned lists: index i in each list refers to the same location.
state_names = ["Arizona", "California", "California", "Kentucky", "Louisiana"]
county_names = ["Maricopa", "Alameda", "Sacramento", "Jefferson", "East Baton Rouge"]

# Quick sanity check: both lists should be the same length
print("Number of states :", len(state_names))
print("Number of counties:", len(county_names))

print("\nstate_names :", state_names)
print("county_names:", county_names)


Number of states : 5
Number of counties: 5

state_names : ['Arizona', 'California', 'California', 'Kentucky', 'Louisiana']
county_names: ['Maricopa', 'Alameda', 'Sacramento', 'Jefferson', 'East Baton Rouge']


## Building row-like records with tuples (loop approach)

Here I create a list of tuples where each tuple is a **record** in the form:

`(state, county)`

This is a lightweight way to represent rows before moving to a DataFrame later.


In [23]:
state_county_tuples = []  # will hold records like ('California', 'Alameda')

# Use indexing to pair the corresponding state and county at each position
for i in range(len(state_names)):
    record = (state_names[i], county_names[i])  # tuple = immutable record
    state_county_tuples.append(record)

print("state_county_tuples:")
print(state_county_tuples)


state_county_tuples:
[('Arizona', 'Maricopa'), ('California', 'Alameda'), ('California', 'Sacramento'), ('Kentucky', 'Jefferson'), ('Louisiana', 'East Baton Rouge')]


## Building the same records with `zip()` (cleaner / more Pythonic)

`zip()` pairs elements from multiple iterables by position.

So `zip(state_names, county_names)` produces:
- `(state_names[0], county_names[0])`
- `(state_names[1], county_names[1])`
- ...


In [24]:
# zip() returns an iterator, so wrap it with list(...) to store the results
state_county_zipped = list(zip(state_names, county_names))

print("state_county_zipped:")
print(state_county_zipped)

# Confirm it matches the loop-built version
print("\nMatches loop version?", state_county_zipped == state_county_tuples)


state_county_zipped:
[('Arizona', 'Maricopa'), ('California', 'Alameda'), ('California', 'Sacramento'), ('Kentucky', 'Jefferson'), ('Louisiana', 'East Baton Rouge')]

Matches loop version? True
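
One caveat worth knowing: `zip()` stops at the shortest input, so a missing value silently drops a record. The sketch below uses deliberately mismatched lists (`states` and `counties` are made-up names for this example) to show the truncation and a simple length guard:

```python
states = ["Arizona", "California", "Kentucky"]
counties = ["Maricopa", "Alameda"]          # one county missing on purpose

pairs = list(zip(states, counties))
print(pairs)   # 'Kentucky' is silently dropped

# Simple guard before zipping; on Python 3.10+, zip(..., strict=True)
# raises ValueError instead of truncating
lengths_match = len(states) == len(counties)
print("Lengths match?", lengths_match)
```

This is why the earlier length check on `state_names` and `county_names` is more than a formality.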


## Converting tuple-records into list-records (for editability)

Tuples are immutable (cannot be changed in-place).  
If I expect that a record might need updates later (data cleaning, corrections, etc.),
I can convert each tuple to a list:

`('California', 'Alameda') → ['California', 'Alameda']`


In [25]:
# Convert each tuple record into a list record
state_county_lists = [list(pair) for pair in state_county_tuples]

print("state_county_lists:")
print(state_county_lists)

# Example: demonstrate mutability (optional)
# state_county_lists[0][1] = "NEW COUNTY NAME"
# print(state_county_lists[0])


state_county_lists:
[['Arizona', 'Maricopa'], ['California', 'Alameda'], ['California', 'Sacramento'], ['Kentucky', 'Jefferson'], ['Louisiana', 'East Baton Rouge']]
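
To make the difference concrete, here is a small demonstration using a single standalone record (the variable names `record` and `editable` are mine). Assigning to a tuple element raises `TypeError`, while the list copy accepts the edit and leaves the original tuple untouched:

```python
record = ("California", "Alameda")

try:
    record[1] = "Sacramento"          # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

editable = list(record)               # copy into a list to allow edits
editable[1] = "Sacramento"
print(editable, record)               # the original tuple is unchanged
```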


## Filtering records using tuple unpacking (loop)

Unpacking allows clean, readable iteration like:

`for state, county in state_county_tuples:`

Here I filter for **California** and collect only the county names.
Expected result:
`['Alameda', 'Sacramento']`


In [26]:
ca_counties = []

# Unpack each (state, county) tuple into two variables
for state, county in state_county_tuples:
    if state == "California":
        ca_counties.append(county)

print("California counties (loop + unpacking):", ca_counties)


California counties (loop + unpacking): ['Alameda', 'Sacramento']


## Filtering with a list comprehension (compact)

Same filter as above, but in one line:

`[county for state, county in state_county_tuples if state == "California"]`


In [27]:
ca_counties = [county for state, county in state_county_tuples if state == "California"]

print("California counties (list comprehension):", ca_counties)


California counties (list comprehension): ['Alameda', 'Sacramento']
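
Filtering answers the question for one state at a time. If every state is needed, the same unpacking pattern can build a lookup in a single pass. This is a sketch using `collections.defaultdict`; the names `records` and `counties_by_state` are my own, and the data repeats the pairs from above so the block runs on its own.

```python
from collections import defaultdict

records = [("Arizona", "Maricopa"), ("California", "Alameda"),
           ("California", "Sacramento"), ("Kentucky", "Jefferson"),
           ("Louisiana", "East Baton Rouge")]

counties_by_state = defaultdict(list)   # missing keys start as an empty list
for state, county in records:
    counties_by_state[state].append(county)

print(counties_by_state["California"])  # ['Alameda', 'Sacramento']
```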



# Conclusion

This notebook captures a set of small Python components that mirror the kind of “building block” work that shows up in real analytics and data cleaning tasks.

## What I built and why it matters

### 1) Iteration for automation (while loops)
- Used counters and milestone checks to model repeated processes (like tracking purchases).
- Built a countdown timer that changes messaging as time runs out, which is a common pattern for monitoring, alerts, and user-facing prompts.

### 2) Lightweight analytics utilities (for loops + nested loops)
- Converted raw 1–10 feedback scores into meaningful categories (negative/neutral/positive), a practical step toward reporting and decision-making.
- Validated IDs by cross-checking against a trusted list, which reflects real data integrity checks.
- Aggregated nested purchase data to find customers over a spend threshold, demonstrating how nested loops support “per-customer” totals.

### 3) String validation and parsing
- Treated IDs and ZIP codes as strings to avoid formatting loss (like leading zeros).
- Implemented clear rule-based validators (`zip_checker`, `url_checker`) with predictable outputs and helpful messages.
- Used slicing, splitting, and `partition()` to extract structured values from messy text inputs.

### 4) Basic data structuring with lists and tuples
- Represented a small dataset using aligned lists and converted it into row-like tuple records.
- Converted tuples into lists when editability was needed.
- Filtered records using tuple unpacking and list comprehensions for readability and conciseness.

## Takeaway

Across all sections, the focus was on writing code that is:
- **readable** (clear structure + comments),
- **reliable** (defensive handling of inputs),
- **testable** (repeatable examples),
- and **transferable** (patterns that scale into pandas, ETL pipelines, and reporting workflows).

Overall, this project reinforced how Python’s loops, string tools, and basic data structures support common real-world data preparation tasks: ensuring inputs are valid, outputs are consistent, and data is organized for analysis.