## Challenge Problem 1 (Moderate): Log File Analyzer

Problem:
You're given a log file where each line contains a timestamp, log level, and message in this format:

- 2024-10-15 14:23:01 ERROR Database connection failed
- 2024-10-15 14:23:45 INFO User logged in successfully
- 2024-10-15 14:24:12 WARNING Memory usage at 85%
- 2024-10-15 14:25:03 ERROR File not found: data.csv
- 2024-10-15 14:26:17 INFO Processing completed

Write a function analyze_logs(filepath) that:

- Reads the log file
- Returns a dictionary with log level statistics:
    - Count of each log level
    - List of all unique messages for each log level
    - Most recent timestamp for each log level

In [None]:
# Example Output:

{
    'ERROR': {
        'count': 2,
        'messages': ['Database connection failed', 'File not found: data.csv'],
        'last_seen': '2024-10-15 14:25:03'
    },
    'INFO': {
        'count': 2,
        'messages': ['User logged in successfully', 'Processing completed'],
        'last_seen': '2024-10-15 14:26:17'
    },
    'WARNING': {
        'count': 1,
        'messages': ['Memory usage at 85%'],
        'last_seen': '2024-10-15 14:24:12'
    }
}

### Requirements:

- Use a with statement for file handling
- Specify UTF-8 encoding
- Handle empty files gracefully (return an empty dict)
- Messages should be stored without duplicates (if the same message appears twice, store it only once)
- Timestamps should be stored as strings (no datetime parsing needed)
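
Keeping timestamps as strings is sufficient for last_seen because the YYYY-MM-DD HH:MM:SS layout is fixed-width and runs from most to least significant field, so plain string comparison agrees with chronological order. A minimal sketch using three timestamps from the sample data (the variable names are illustrative only):

```python
# Timestamps taken from the sample log lines above, deliberately out of order
timestamps = [
    "2024-10-15 14:25:03",
    "2024-10-15 14:23:01",
    "2024-10-15 14:26:17",
]

# For this fixed-width format, lexicographic order equals chronological order
latest = max(timestamps)
earliest = min(timestamps)

print(latest)    # 2024-10-15 14:26:17
print(earliest)  # 2024-10-15 14:23:01
```

This only holds while every timestamp uses the same zero-padded format; mixed formats would need real datetime parsing.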

Test File Creation:
Create a test file called test_logs.txt with the sample data above to test your solution.
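
One possible way to create the test file from Python is sketched below. It writes only the five sample lines shown in the problem statement (without the list dashes), and it also creates an empty file for the empty-input check. The helper name write_sample_logs and the constant SAMPLE_LOG_LINES are hypothetical, introduced just for this example.

```python
from pathlib import Path

# The five sample lines from the problem statement, without the list dashes
SAMPLE_LOG_LINES = [
    "2024-10-15 14:23:01 ERROR Database connection failed",
    "2024-10-15 14:23:45 INFO User logged in successfully",
    "2024-10-15 14:24:12 WARNING Memory usage at 85%",
    "2024-10-15 14:25:03 ERROR File not found: data.csv",
    "2024-10-15 14:26:17 INFO Processing completed",
]


def write_sample_logs(filepath):
    """Write the sample log lines to filepath, one entry per line."""
    with open(filepath, "w", encoding="utf-8") as f:
        f.write("\n".join(SAMPLE_LOG_LINES) + "\n")


write_sample_logs("test_logs.txt")

# An empty file is useful for the "Handle empty files" requirement
Path("empty_file.txt").write_text("", encoding="utf-8")
```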

### How to verify:

- Check that counts are correct for each level
- Verify messages list has no duplicates
- Confirm last_seen shows the most recent timestamp for each level
- Test with an empty file

### The input file used in the solution run has 10 entries total:

- 4 ERRORs (but only 3 unique messages)
- 4 INFOs (but only 3 unique messages)
- 2 WARNINGs (2 unique messages)

#### Solution:

In [27]:
def analyze_logs(filepath):
    """Return count, unique messages, and latest timestamp for each log level."""
    log_stats = {}

    with open(filepath, encoding="utf-8") as f:
        for raw_line in f:
            parts = raw_line.split()

            # Skip blank or incomplete lines instead of raising IndexError
            if len(parts) < 3:
                continue

            timestamp = parts[0] + ' ' + parts[1]
            key = parts[2]
            message = " ".join(parts[3:])

            # Create the entry for a log level the first time it is seen
            stats = log_stats.setdefault(
                key, {"count": 0, "messages": [], "last_seen": ''}
            )

            # Count every occurrence of the log level
            stats["count"] += 1

            # Store each message only once, in first-seen order
            if message not in stats["messages"]:
                stats["messages"].append(message)

            # Fixed-width timestamps compare correctly as strings,
            # so this keeps the most recent one even if lines are out of order
            if timestamp > stats["last_seen"]:
                stats["last_seen"] = timestamp

    return log_stats

In [29]:
analyze_logs("test_logs.txt")

{'ERROR': {'count': 4,
  'messages': ['Database connection failed',
   'File not found: data.csv',
   'Network timeout'],
  'last_seen': '2024-10-15 14:30:12'},
 'INFO': {'count': 4,
  'messages': ['User logged in successfully',
   'Processing completed',
   'Backup started'],
  'last_seen': '2024-10-15 14:31:01'},
 'WARNING': {'count': 2,
  'messages': ['Memory usage at 85%', 'Disk space low'],
  'last_seen': '2024-10-15 14:29:45'}}

In [31]:
analyze_logs("empty_file.txt")

{}
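
The checks listed under "How to verify" can also be written as assertions. The sketch below is self-contained, so it carries its own compact variant of the analyzer (analyze_logs_sketch, a hypothetical name, built on collections.defaultdict) and a hypothetical helper check() that writes lines to a temporary file. The input lines reuse sample messages but are deliberately duplicated and out of order; they are made up for the check and are not the contents of test_logs.txt.

```python
import os
import tempfile
from collections import defaultdict


def analyze_logs_sketch(filepath):
    """Compact variant of analyze_logs, included only to make the checks runnable."""
    stats = defaultdict(lambda: {"count": 0, "messages": [], "last_seen": ""})
    with open(filepath, encoding="utf-8") as f:
        for line in f:
            parts = line.split(maxsplit=3)
            if len(parts) < 4:
                continue  # skip blank or incomplete lines
            date, time, level, message = parts
            message = message.strip()
            entry = stats[level]
            entry["count"] += 1
            if message not in entry["messages"]:
                entry["messages"].append(message)
            # String comparison is enough for fixed-width timestamps
            entry["last_seen"] = max(entry["last_seen"], f"{date} {time}")
    return dict(stats)


def check(lines):
    """Write lines to a temporary file and return the analysis result."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("\n".join(lines))
        path = tmp.name
    try:
        return analyze_logs_sketch(path)
    finally:
        os.remove(path)


result = check([
    "2024-10-15 14:25:03 ERROR File not found: data.csv",
    "2024-10-15 14:23:01 ERROR Database connection failed",
    "2024-10-15 14:23:01 ERROR Database connection failed",
])

assert result["ERROR"]["count"] == 3                      # every line counted
assert result["ERROR"]["messages"] == [                   # no duplicates
    "File not found: data.csv",
    "Database connection failed",
]
assert result["ERROR"]["last_seen"] == "2024-10-15 14:25:03"  # most recent
assert check([]) == {}                                    # empty file
```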

## Challenge Problem 2 (Hard): Data Pipeline Processor

Problem:
You're building a data processing pipeline that reads multiple CSV-like files, transforms the data, and generates a summary report.

You have transaction data files where each line contains: user_id,product,amount,date

Example file content (transactions_2024_q1.txt):

- 101,laptop,1200.50,2024-01-15
- 102,mouse,25.99,2024-01-16
- 101,keyboard,89.99,2024-01-20
- 103,monitor,350.00,2024-02-01
- 102,laptop,1200.50,2024-02-15
- 101,mouse,25.99,2024-03-10

Write a program with these functions:

1. parse_transaction_file(filepath)

- Reads the file and returns a list of dictionaries
- Each dict should have keys: user_id, product, amount, date
- Convert user_id to int and amount to float
- Handle empty files (return empty list)


2. group_by_user(transactions)

- Takes the list of transaction dicts
- Returns a nested dictionary structure:

In [None]:
# Structure of the returned dictionary (placeholders, not literal values):
#
#  {
#      user_id: {
#          'total_spent': float,
#          'products': [list of unique products],
#          'transaction_count': int,
#          'transactions': [list of original transaction dicts for this user]
#      }
#  }

3. generate_report(user_data, output_filepath)
   - Takes the user data dictionary from group_by_user()
   - Writes a formatted report to a file
   - Report format:

In [None]:
    #  USER SPENDING REPORT
    #  ====================
    # 
    #  User ID: 101
    #  Total Spent: $1316.48
    #  Number of Transactions: 3
    #  Products Purchased: keyboard, laptop, mouse
    # 
    #  User ID: 102
    #  Total Spent: $1226.49
    #  Number of Transactions: 2
    #  Products Purchased: laptop, mouse
    # 
    #  [etc...]
    # 
    #  ====================
    #  SUMMARY STATISTICS
    #  ====================
    #  Total Users: 3
    #  Total Revenue: $2892.97
    #  Average Spent per User: $964.32

### Requirements:

- Use with statements and UTF-8 encoding for all file operations
- Products list should be sorted alphabetically in the report
- Users should be sorted by user_id in the report
- Round monetary values to 2 decimal places in the report
- Handle edge cases: empty files, single user, etc.
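
The sorting and rounding requirements can be sketched in a few lines. The dictionary below is a hypothetical, trimmed-down stand-in for the output of group_by_user(), using totals and products from the example report; sorted() handles both orderings and the :.2f format specifier handles the rounding.

```python
# Hypothetical per-user data, trimmed to the fields this sketch needs
user_data = {
    102: {"total_spent": 1226.49, "products": ["mouse", "laptop"]},
    101: {"total_spent": 1316.48, "products": ["laptop", "keyboard", "mouse"]},
}

report_lines = []
for user_id, values in sorted(user_data.items()):      # ascending user_id
    products = ", ".join(sorted(values["products"]))   # alphabetical products
    report_lines.append(f"User ID: {user_id}")
    report_lines.append(f"Total Spent: ${values['total_spent']:.2f}")  # two decimals
    report_lines.append(f"Products Purchased: {products}")

print("\n".join(report_lines))
```

Sorting at report time, rather than when the data is grouped, leaves the stored products list in purchase order.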

### Test Files:
Test input files are provided for you. Your program should be able to process one or more transaction files.
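
One way to support more than one input file is to parse each file in turn and concatenate the results before grouping. The sketch below is an assumption about how that could look, not part of the task statement: parse_transaction_files is a hypothetical helper, and the two temporary files hold lines from the example above.

```python
import os
import tempfile


def parse_transaction_files(filepaths):
    """Parse several transaction files and return one combined list of dicts.

    Hypothetical helper: the name and the choice to concatenate the files
    in the order given are assumptions made for this sketch.
    """
    transactions = []
    for filepath in filepaths:
        with open(filepath, encoding="utf-8") as f:
            for raw_line in f:
                line = raw_line.strip()
                if not line:
                    continue  # skip blank lines
                user_id, product, amount, date = line.split(",")
                transactions.append({
                    "user_id": int(user_id),
                    "product": product,
                    "amount": float(amount),
                    "date": date,
                })
    return transactions


# Demonstration with two temporary files holding lines from the example
with tempfile.TemporaryDirectory() as tmp_dir:
    first = os.path.join(tmp_dir, "part_1.txt")
    second = os.path.join(tmp_dir, "part_2.txt")
    with open(first, "w", encoding="utf-8") as f:
        f.write("101,laptop,1200.50,2024-01-15\n102,mouse,25.99,2024-01-16\n")
    with open(second, "w", encoding="utf-8") as f:
        f.write("101,keyboard,89.99,2024-01-20\n")

    combined = parse_transaction_files([first, second])

print(len(combined))  # 3
```

Because the combined list has the same shape as the output of a single-file parse, it can be passed to group_by_user() unchanged.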

### Expected flow:

In [None]:
# Parse the data
transactions = parse_transaction_file("transactions_2024_q1.txt")

# Group by user
user_data = group_by_user(transactions)

# Generate report
generate_report(user_data, "spending_report.txt")

### How to verify:

- Check that user totals are calculated correctly
- Verify unique products are captured (no duplicates)
- Confirm report formatting is readable
- Test with empty file
- Test with single transaction

#### Solution:

In [24]:
def parse_transaction_file(filepath):
    """Parse a transaction file into a list of dicts with typed values."""
    transactions = []
    keys = ["user_id", "product", "amount", "date"]

    with open(filepath, encoding="utf-8") as f:
        for raw_line in f:
            line = raw_line.strip()

            # Skip blank lines so int('') does not raise ValueError
            if not line:
                continue

            values = line.split(",")
            trans_dict = {}

            # Pair each column name with its value and convert numeric fields
            for key, value in zip(keys, values):
                if key == "user_id":
                    trans_dict[key] = int(value)
                elif key == "amount":
                    trans_dict[key] = float(value)
                else:
                    trans_dict[key] = value

            transactions.append(trans_dict)

    return transactions


In [29]:
transactions = parse_transaction_file("transactions_q1.txt")
transactions

[{'user_id': 101, 'product': 'laptop', 'amount': 1200.5, 'date': '2024-01-15'},
 {'user_id': 102, 'product': 'mouse', 'amount': 25.99, 'date': '2024-01-16'},
 {'user_id': 101,
  'product': 'keyboard',
  'amount': 89.99,
  'date': '2024-01-20'},
 {'user_id': 103, 'product': 'monitor', 'amount': 350.0, 'date': '2024-02-01'},
 {'user_id': 102, 'product': 'laptop', 'amount': 1200.5, 'date': '2024-02-15'},
 {'user_id': 101, 'product': 'mouse', 'amount': 25.99, 'date': '2024-03-10'},
 {'user_id': 104,
  'product': 'keyboard',
  'amount': 89.99,
  'date': '2024-03-15'},
 {'user_id': 103, 'product': 'mouse', 'amount': 25.99, 'date': '2024-03-20'},
 {'user_id': 104, 'product': 'monitor', 'amount': 350.0, 'date': '2024-03-25'}]

In [95]:
def group_by_user(transactions):
    """Group transactions by user_id and accumulate per-user statistics."""
    grouped_users = {}

    for transaction in transactions:
        product = transaction["product"]

        # Create the user's entry the first time the user_id is seen
        user = grouped_users.setdefault(
            transaction["user_id"],
            {"total_spent": 0.0, "products": [], "transaction_count": 0, "transactions": []},
        )

        user["total_spent"] += transaction["amount"]

        # Keep each product once, in first-purchase order
        if product not in user["products"]:
            user["products"].append(product)

        user["transaction_count"] += 1
        user["transactions"].append(transaction)

    return grouped_users

In [96]:
grouped_transactions = group_by_user(transactions)
grouped_transactions

{101: {'total_spent': 1316.48,
  'products': ['laptop', 'keyboard', 'mouse'],
  'transaction_count': 3,
  'transactions': [{'user_id': 101,
    'product': 'laptop',
    'amount': 1200.5,
    'date': '2024-01-15'},
   {'user_id': 101,
    'product': 'keyboard',
    'amount': 89.99,
    'date': '2024-01-20'},
   {'user_id': 101,
    'product': 'mouse',
    'amount': 25.99,
    'date': '2024-03-10'}]},
 102: {'total_spent': 1226.49,
  'products': ['mouse', 'laptop'],
  'transaction_count': 2,
  'transactions': [{'user_id': 102,
    'product': 'mouse',
    'amount': 25.99,
    'date': '2024-01-16'},
   {'user_id': 102,
    'product': 'laptop',
    'amount': 1200.5,
    'date': '2024-02-15'}]},
 103: {'total_spent': 375.99,
  'products': ['monitor', 'mouse'],
  'transaction_count': 2,
  'transactions': [{'user_id': 103,
    'product': 'monitor',
    'amount': 350.0,
    'date': '2024-02-01'},
   {'user_id': 103,
    'product': 'mouse',
    'amount': 25.99,
    'date': '2024-03-20'}]},
 104:

In [97]:
def generate_report(user_data, output_filepath):
    """Write a per-user spending report followed by summary statistics."""
    total_users = 0
    total_revenue = 0.0

    with open(output_filepath, mode="w", encoding="utf-8") as output:
        output.write("USER SPENDING REPORT\n====================\n\n")

        # Users in ascending user_id order, products in alphabetical order
        for user_id, user_values in sorted(user_data.items()):
            total_users += 1
            total_revenue += user_values['total_spent']
            products = ", ".join(sorted(user_values["products"]))

            # Single quotes inside the f-strings: reusing the outer quote
            # character is a SyntaxError before Python 3.12
            output.write(f"User ID: {user_id}\n")
            output.write(f"Total Spent: ${user_values['total_spent']:.2f}\n")
            output.write(f"Number of Transactions: {user_values['transaction_count']}\n")
            output.write(f"Products Purchased: {products}\n\n")

        output.write("====================\nSUMMARY STATISTICS\n====================\n")
        output.write(f"Total Users: {total_users}\n")
        output.write(f"Total Revenue: ${total_revenue:.2f}\n")

        # Avoid dividing by zero when the input had no transactions
        if total_users != 0:
            output.write(f"Average Spent per User: ${total_revenue/total_users:.2f}\n")
        else:
            output.write("Average Spent per User: $0.00\n")


In [98]:
generate_report(grouped_transactions, "spending_report.txt")

In [99]:
empty_transactions = parse_transaction_file("transactions_empty.txt")
print(empty_transactions)
empty_grouped_users = group_by_user(empty_transactions)
generate_report(empty_grouped_users, "empty_report.txt")

[]


In [100]:
single_transaction = parse_transaction_file("transactions_single.txt")
single_grouped_user = group_by_user(single_transaction)
generate_report(single_grouped_user, "single_transaction_report.txt")