# Mini Project: String Validation & Parsing in Python

This notebook demonstrates practical string handling tasks commonly used in data work:
- Converting data types safely
- Validating and formatting ZIP codes
- Extracting IDs from URLs using slicing and splitting
- Validating URL protocol + store ID rules with clear user feedback

**Key skills shown:** strings, slicing, splitting, conditionals, functions, docstrings, defensive coding, test cases

## Task 1 — Check and change data types

Store IDs (and ZIP codes) often look numeric, but should be treated as **strings** to avoid losing formatting
(e.g., leading zeros) and to enable slicing and string validation.

In [5]:
store_id = 1101

# Convert to string
store_id = str(store_id)

# Confirm the type
print("store_id:", store_id)
print("type(store_id):", type(store_id))

store_id: 1101
type(store_id): <class 'str'>
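
A small sketch of why the string type matters. The value `raw_zip` is a hypothetical example chosen for illustration, not part of the original exercise: converting it to `int` drops the leading zero, and converting back to `str` does not restore it.

```python
# Hypothetical value for illustration; not part of the original exercise.
raw_zip = "02806"

as_int = int(raw_zip)    # numeric conversion drops the leading zero
as_str = str(as_int)     # converting back does not restore it

print(as_int)                       # 2806
print(len(raw_zip), len(as_str))    # 5 4
print(raw_zip[:2])                  # 02 (slicing only works on the string form)

# If the intended width is known, str.zfill can pad the zeros back
restored = as_str.zfill(5)
print(restored)                     # 02806
```

Padding with `zfill` only works when the expected width is known in advance, which is one reason to keep such identifiers as strings from the start.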


## Task 2 — ZIP code validation and formatting (`zip_checker`)

Rules implemented:
- If ZIP is 5 characters and does **not** start with `"00"`, return it
- If ZIP is 4 characters and does **not** start with `"0"`, pad a leading zero and return it
- Otherwise return `"Invalid ZIP Code."`

In [6]:
def zip_checker(zipcode):
    """
    Validate a ZIP code and return it as a 5-character string.

    Args:
        zipcode (str or int): ZIP code with 4 or 5 characters/digits.

    Returns:
        str: The 5-character ZIP (a leading zero is added to 4-character input),
        or 'Invalid ZIP Code.' if the input is not valid.
    """
    # Convert to str so that len() and slicing also work when an int is passed.
    zipcode = str(zipcode)

    if len(zipcode) == 5 and zipcode[:2] != "00":
        return zipcode
    elif len(zipcode) == 4 and zipcode[0] != "0":
        return "0" + zipcode
    else:
        return "Invalid ZIP Code."

### Test `zip_checker`

The following test cases match the prompt examples.

In [7]:
print(zip_checker('02806'))     # Should return 02806.
print(zip_checker('2806'))      # Should return 02806.
print(zip_checker('0280'))      # Should return 'Invalid ZIP Code.'
print(zip_checker('00280'))     # Should return 'Invalid ZIP Code.'

02806
02806
Invalid ZIP Code.
Invalid ZIP Code.
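
The rules above only look at length and leading zeros, so a value such as `'abcde'` would pass. The following is a minimal sketch of a stricter variant, assuming that non-digit input should also be rejected. That extra rule and the name `zip_checker_strict` are assumptions for illustration and are not part of the original prompt.

```python
def zip_checker_strict(zipcode):
    """Hypothetical variant of zip_checker that also rejects non-digit input."""
    zipcode = str(zipcode).strip()   # accept ints and tolerate surrounding spaces

    # Extra assumption (not in the original rules): every character must be a digit.
    if not zipcode.isdigit():
        return "Invalid ZIP Code."

    if len(zipcode) == 5 and not zipcode.startswith("00"):
        return zipcode
    if len(zipcode) == 4 and not zipcode.startswith("0"):
        return "0" + zipcode
    return "Invalid ZIP Code."


# The original examples behave the same way
assert zip_checker_strict("02806") == "02806"
assert zip_checker_strict("2806") == "02806"
assert zip_checker_strict("0280") == "Invalid ZIP Code."
assert zip_checker_strict("00280") == "Invalid ZIP Code."

# Inputs that length-only checks would let through
assert zip_checker_strict("abcde") == "Invalid ZIP Code."
assert zip_checker_strict(2806) == "02806"
```

Using `assert` statements instead of `print` makes the cell fail loudly if a rule is broken later, rather than relying on reading the output by eye.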


## Task 3 — Extract the store ID from the end of a URL

Here I use **negative slicing** (`[-7:]`) to extract the final 7 characters from the URL.

In [8]:
url = "https://exampleURL1.com/r626c36"

# 1) Extract the final 7 characters
#    (named store_id rather than id, which would shadow Python's built-in id())
store_id = url[-7:]

# 2) Print the extracted store ID
print(store_id)

r626c36
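
Fixed-width slicing works here because the store ID is exactly 7 characters long. As a rough illustration of the limitation, the sketch below uses a second URL with a 6-character ID (`short_url` is a hypothetical variable name) and compares slicing with splitting on `/`.

```python
url = "https://exampleURL1.com/r626c36"
short_url = "https://exampleURL1.com/r626c3"   # 6-character ID

# Slicing always returns 7 characters, whatever they are
sliced_ok = url[-7:]
sliced_short = short_url[-7:]
print(sliced_ok)       # r626c36
print(sliced_short)    # /r626c3  (the slash is picked up as well)

# Splitting on '/' returns the last path segment regardless of its length
split_short = short_url.split("/")[-1]
print(split_short)     # r626c3
```

Splitting makes it possible to check the ID's length afterwards, which slicing alone cannot do because its result is always 7 characters.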


## Task 4 — Validate URL protocol + store ID (`url_checker`)

Rules implemented:
- Only `https:` is considered a valid protocol
- A valid store ID must be exactly 7 characters
- Depending on what’s invalid, print the required message(s)
- If both are valid, return the store ID

Notes on approach:
- `partition(':')` is used to pull out the protocol cleanly
- `rstrip('/')` removes trailing slashes
- `split('/')[-1]` gets everything after the last `/`
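
The three string methods listed above can be tried on their own before reading the function. This sketch assumes the sample URL with a trailing slash added, to show why `rstrip('/')` is applied before the split.

```python
url = "https://exampleURL1.com/r626c36/"   # sample URL with a trailing slash added

# partition splits on the first ':' and returns a 3-tuple
before_sep, sep, after_sep = url.partition(":")
print(before_sep, sep, after_sep)   # https : //exampleURL1.com/r626c36/

# Without rstrip, the last piece after splitting is an empty string
last_raw = url.split("/")[-1]
last_clean = url.rstrip("/").split("/")[-1]
print(repr(last_raw))               # ''
print(last_clean)                   # r626c36

# If there is no ':' at all, partition returns the whole string and two empty strings
no_colon = "exampleURL1.com".partition(":")
print(no_colon)                     # ('exampleURL1.com', '', '')
```

The last case means a URL without any `:` would yield the whole string as its "protocol", which then fails the `https:` comparison.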

In [1]:
# Sample valid URL for reference:
url = 'https://exampleURL1.com/r626c36'

def url_checker(url):
    '''
    Checks whether a URL has a valid protocol and a valid 7-character store ID.

    Args:
        url (str): A URL string that ends with a store ID.

    Rules:
        - The only valid protocol is 'https:'.
        - A valid store ID must be exactly 7 characters long (taken from the end of the URL).

    Behavior / Returns:
        - If BOTH the protocol and store ID are invalid:
            Prints two lines:
                '{protocol} is an invalid protocol.'
                '{store_id} is an invalid store ID.'
        - If ONLY the protocol is invalid:
            Prints:
                '{protocol} is an invalid protocol.'
        - If ONLY the store ID is invalid:
            Prints:
                '{store_id} is an invalid store ID.'
        - If BOTH are valid:
            Returns the store ID (str).
        - In every invalid case nothing is returned explicitly, so the result is None.

    Notes:
        - {protocol} is the protocol portion of the URL (e.g., 'http:', 'https:', 'ftps:').
        - {store_id} is the store ID extracted from the end of the URL.
    '''
    # Finds the protocol by partitioning the URL on the first ':'
    before_sep, sep, after_sep = url.partition(':')
    protocol = before_sep + sep          # e.g., "https:" / "http:" / "ftps:"
    
    # Remove trailing slash(es) so the final split doesn't return an empty string
    url = url.rstrip('/')
    
    # Store ID is everything after the last '/'
    store_id = url.split('/')[-1]

    # Validate protocol and store ID length and print the required message(s)
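    if protocol != 'https:' and len(store_id) != 7:
        # Two separate print calls avoid the stray space that print() inserts
        # between multiple arguments.
        print(f'{protocol} is an invalid protocol.')
        print(f'{store_id} is an invalid store ID.')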
    
    elif protocol != 'https:':
        print(f'{protocol} is an invalid protocol.')
    
    elif len(store_id) != 7:
        print(f'{store_id} is an invalid store ID.')

    else:
        return store_id  # already a str, so no conversion is needed


### Test `url_checker`

These tests match the prompt examples.  
Note: The final test uses `print(...)` to display the returned store ID.

In [2]:
# Run this cell to test the function            # Expected output:
url_checker('http://exampleURL1.com/r626c3')    # 'http: is an invalid protocol.'
print()                                         # 'r626c3 is an invalid store ID.'

url_checker('ftps://exampleURL1.com/r626c36')   # 'ftps: is an invalid protocol.'
print()

url_checker('https://exampleURL1.com/r626c3')   # 'r626c3 is an invalid store ID.'
print()

print(url_checker('https://exampleURL1.com/r626c36'))  # 'r626c36'

http: is an invalid protocol. 
r626c3 is an invalid store ID.

ftps: is an invalid protocol.

r626c3 is an invalid store ID.

r626c36
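
Because the invalid cases print a message and return `None`, checking them automatically means capturing the printed text. The following is one possible sketch using `contextlib.redirect_stdout` from the standard library. The `url_checker` shown here is a condensed restatement of the same rules so that the block runs on its own (it is not the cell above verbatim), and `run_and_capture` is a hypothetical helper name.

```python
import io
from contextlib import redirect_stdout


def url_checker(url):
    """Condensed restatement of the rules above so this sketch runs on its own."""
    protocol = "".join(url.partition(":")[:2])   # e.g. 'https:'
    store_id = url.rstrip("/").split("/")[-1]
    valid = True
    if protocol != "https:":
        print(f"{protocol} is an invalid protocol.")
        valid = False
    if len(store_id) != 7:
        print(f"{store_id} is an invalid store ID.")
        valid = False
    return store_id if valid else None


def run_and_capture(url):
    """Hypothetical helper: return (result, printed_text) for one call."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = url_checker(url)
    return result, buffer.getvalue()


# Both parts invalid: two messages are printed and nothing is returned
result, printed = run_and_capture("http://exampleURL1.com/r626c3")
assert result is None
assert printed == "http: is an invalid protocol.\nr626c3 is an invalid store ID.\n"

# Both parts valid: the ID is returned and nothing is printed
result, printed = run_and_capture("https://exampleURL1.com/r626c36")
assert result == "r626c36"
assert printed == ""
```

A caller can rely on the same behavior by testing the result with `is None` before using it as a store ID.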


## Conclusions

- Practiced key string operations: slicing, splitting, partitioning, and formatting output.
- Implemented basic validation logic using `if/elif/else`.
- Added a small robustness improvement by removing trailing slashes before extracting the store ID.