# Assignment 2: Building a Smart Data Aggregator

This notebook implements Python functions for handling user data, transaction data, page-visit sets, and feedback dictionaries, as required by the assignment. Each part is worked through step by step.

## Part 1: User Data Processing with Lists

In this section, we process a list of user records, where each record is a tuple of (user_id, user_name, age, country). We will:

1. Keep only users older than 30 whose country is 'USA' or 'Canada'.
2. Extract their names into a new list.
3. Sort all users by age in descending order and return the 10 oldest.
4. Check for duplicate names in the list.


In [2]:
# Example data: A list of user tuples
users = [
    (1, 'Alice', 25, 'USA'),
    (2, 'Bob', 32, 'Canada'),
    (3, 'Charlie', 35, 'USA'),
    (4, 'David', 28, 'Mexico'),
    (5, 'Eve', 40, 'USA'),
    (6, 'Frank', 33, 'Canada'),
    (7, 'Grace', 29, 'Germany'),
    (8, 'Hannah', 31, 'Canada'),
    (9, 'Ivy', 45, 'USA'),
    (10, 'Jack', 22, 'Canada'),
    (11, 'Kim', 30, 'Australia'),
    (12, 'Leo', 27, 'USA'),
    (13, 'Mona', 36, 'USA'),
    (14, 'Nate', 34, 'Canada'),
    (15, 'Oscar', 29, 'UK'),
    (16, 'Paula', 38, 'USA'),
    (17, 'Quinn', 31, 'Canada'),
    (18, 'Rachel', 41, 'USA'),
    (19, 'Sam', 42, 'USA'),
    (20, 'Tina', 39, 'Canada')
]

# 1-2. Keep users older than 30 from 'USA' or 'Canada' and extract their names
def filter_users(users):
    # Unpack each (user_id, user_name, age, country) tuple for readability
    return [name for _, name, age, country in users
            if age > 30 and country in ('USA', 'Canada')]

# 3. Sort by age (descending) and return the 10 oldest users
def top_10_oldest(users):
    sorted_users = sorted(users, key=lambda x: x[2], reverse=True)
    return sorted_users[:10]

# 4. Check for duplicate names
def find_duplicate_names(users):
    names = [user[1] for user in users]
    duplicates = [name for name in set(names) if names.count(name) > 1]
    return duplicates

# Testing the functions
print("Filtered users:", filter_users(users))
print("Top 10 oldest users:", top_10_oldest(users))
print("Duplicate names:", find_duplicate_names(users))


Filtered users: ['Bob', 'Charlie', 'Eve', 'Frank', 'Hannah', 'Ivy', 'Mona', 'Nate', 'Paula', 'Quinn', 'Rachel', 'Sam', 'Tina']
Top 10 oldest users: [(9, 'Ivy', 45, 'USA'), (19, 'Sam', 42, 'USA'), (18, 'Rachel', 41, 'USA'), (5, 'Eve', 40, 'USA'), (20, 'Tina', 39, 'Canada'), (16, 'Paula', 38, 'USA'), (13, 'Mona', 36, 'USA'), (3, 'Charlie', 35, 'USA'), (14, 'Nate', 34, 'Canada'), (6, 'Frank', 33, 'Canada')]
Duplicate names: []
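
The sample data contains no repeated names, so the duplicate check returns an empty list. As a minimal sketch, the same check can also be done in a single counting pass with `collections.Counter`. The function name `find_duplicate_names_counter` and the small `sample_users` list are hypothetical and exist only to show a non-empty result.

```python
from collections import Counter

def find_duplicate_names_counter(users):
    """Return the names that appear more than once, in alphabetical order."""
    # Count every name in one pass over the (user_id, user_name, age, country) tuples
    name_counts = Counter(name for _, name, _, _ in users)
    return sorted(name for name, count in name_counts.items() if count > 1)

# Hypothetical data with one repeated name
sample_users = [
    (1, 'Alice', 25, 'USA'),
    (2, 'Bob', 32, 'Canada'),
    (3, 'Alice', 41, 'USA'),
]

print("Duplicate names:", find_duplicate_names_counter(sample_users))
# Duplicate names: ['Alice']
```

Sorting the result makes the output order predictable, which iterating over a plain `set` does not guarantee.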


## Part 2: Immutable Data Management with Tuples

We will handle transaction data stored in tuples. Each transaction contains transaction_id, user_id, amount, and timestamp. The tasks are:

1. Count the number of unique users.
2. Find the transaction with the highest amount.
3. Separate the transaction_ids and user_ids into two lists.


In [3]:
# Expanded example transaction data for Part 2
transactions = [
    (101, 1, 100.50, '2024-10-12'),
    (102, 2, 200.75, '2024-10-13'),
    (103, 1, 150.00, '2024-10-14'),
    (104, 3, 250.00, '2024-10-15'),
    (105, 4, 300.00, '2024-10-16'),
    (106, 2, 50.00, '2024-10-17'),
    (107, 5, 120.00, '2024-10-18'),
    (108, 6, 500.00, '2024-10-19'),
    (109, 3, 275.00, '2024-10-20'),
    (110, 7, 320.00, '2024-10-21'),
    (111, 8, 400.00, '2024-10-22'),
    (112, 9, 220.00, '2024-10-23'),
    (113, 4, 180.00, '2024-10-24'),
    (114, 5, 50.00, '2024-10-25')
]


# 1. Count unique users
def count_unique_users(transactions):
    unique_users = set(transaction[1] for transaction in transactions)
    return len(unique_users)

# 2. Find the transaction with the highest amount
def highest_transaction(transactions):
    return max(transactions, key=lambda x: x[2])

# 3. Separate transaction_ids and user_ids
def split_transactions(transactions):
    transaction_ids = [transaction[0] for transaction in transactions]
    user_ids = [transaction[1] for transaction in transactions]
    return transaction_ids, user_ids

# Testing the functions
print("Unique users:", count_unique_users(transactions))
print("Highest transaction:", highest_transaction(transactions))
print("Transaction IDs and User IDs:", split_transactions(transactions))


Unique users: 9
Highest transaction: (108, 6, 500.0, '2024-10-19')
Transaction IDs and User IDs: ([101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114], [1, 2, 1, 3, 4, 2, 5, 6, 3, 7, 8, 9, 4, 5])
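
Splitting the tuples into parallel lists can also be written with `zip(*...)`, which transposes rows into columns. The sketch below is an alternative, not the notebook's implementation. `split_transactions_zip` and `sample_transactions` are hypothetical names, and the function assumes every transaction has exactly four fields.

```python
def split_transactions_zip(transactions):
    """Return (transaction_ids, user_ids) as two lists."""
    if not transactions:
        # zip(*[]) yields nothing to unpack, so handle the empty case explicitly
        return [], []
    # Transpose rows into columns; amounts and timestamps are not needed here
    transaction_ids, user_ids, _amounts, _timestamps = zip(*transactions)
    return list(transaction_ids), list(user_ids)

# Hypothetical sample in the same (transaction_id, user_id, amount, timestamp) layout
sample_transactions = [
    (101, 1, 100.50, '2024-10-12'),
    (102, 2, 200.75, '2024-10-13'),
    (103, 1, 150.00, '2024-10-14'),
]

txn_ids, uids = split_transactions_zip(sample_transactions)
print(txn_ids)  # [101, 102, 103]
print(uids)     # [1, 2, 1]
```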


## Part 3: Unique Data Handling with Sets

This part works with sets of unique user IDs, one set per page visited. We will:

1. Find users who visited both Page A and Page B.
2. Find users who visited either Page A or Page C but not both.
3. Update the set for Page A with new user IDs.
4. Remove a given collection of user IDs from Page B.


In [4]:
# Expanded example sets for Part 3
page_a = {1, 2, 3, 4, 8, 9, 10, 11}
page_b = {3, 4, 5, 6, 9, 12, 13}
page_c = {1, 7, 8, 14, 15}


# 1. Users who visited both Page A and Page B
def visited_both(page_a, page_b):
    return page_a & page_b

# 2. Users who visited either Page A or Page C but not both
def visited_either_but_not_both(page_a, page_c):
    return page_a ^ page_c

# 3. Update Page A with new user IDs (modifies page_a in place)
def update_page_a(page_a, new_users):
    page_a.update(new_users)
    return page_a

# 4. Remove users from Page B (modifies page_b in place; missing IDs are ignored)
def remove_from_page_b(page_b, remove_users):
    page_b.difference_update(remove_users)
    return page_b

# Testing the functions
new_users = {8, 9}  # both IDs are already in page_a, so the update leaves it unchanged
users_to_remove = {3}

print("Users who visited both Page A and Page B:", visited_both(page_a, page_b))
print("Users who visited either Page A or Page C but not both:", visited_either_but_not_both(page_a, page_c))
print("Updated Page A:", update_page_a(page_a, new_users))
print("Updated Page B:", remove_from_page_b(page_b, users_to_remove))


Users who visited both Page A and Page B: {9, 3, 4}
Users who visited either Page A or Page C but not both: {2, 3, 4, 7, 9, 10, 11, 14, 15}
Updated Page A: {1, 2, 3, 4, 8, 9, 10, 11}
Updated Page B: {4, 5, 6, 9, 12, 13}
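
`update_page_a` and `remove_from_page_b` modify the sets passed to them, and the `new_users` chosen for the test were already in Page A, so that call changed nothing. A minimal sketch of non-mutating variants using the `|` and `-` operators follows. The function names and the sample IDs are hypothetical.

```python
def with_added_users(page, new_users):
    """Return a new set containing the page's users plus new_users."""
    return page | set(new_users)

def without_users(page, remove_users):
    """Return a new set with remove_users taken out."""
    return page - set(remove_users)

# Hypothetical sample sets
page_a_sample = {1, 2, 3}
page_b_sample = {3, 4, 5}

updated_a = with_added_users(page_a_sample, [3, 16])  # accepts any iterable
updated_b = without_users(page_b_sample, [3, 99])     # IDs not present are ignored

print(sorted(updated_a))      # [1, 2, 3, 16]
print(sorted(updated_b))      # [4, 5]
print(sorted(page_a_sample))  # [1, 2, 3] (the original set is unchanged)
```

Returning a new set keeps the caller's data intact, at the cost of one extra copy per call.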


## Part 4: Data Aggregation with Dictionaries

In this part, we will handle feedback data stored in dictionaries. The tasks are:

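1. Filter users with a rating of 4 or higher.
2. Sort the feedback by rating in descending order and return the top 5 entries.
3. Combine feedback from multiple dictionaries, keeping the higher rating and joining the comments when a user appears in both.
4. Use a dictionary comprehension to filter users with a rating greater than 3 (for integer ratings, this gives the same result as task 1).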


In [5]:
# Expanded example dictionary of feedback for Part 4
feedback = {
    1: {'rating': 5, 'comments': 'Great!'},
    2: {'rating': 3, 'comments': 'Okay.'},
    3: {'rating': 4, 'comments': 'Good.'},
    4: {'rating': 2, 'comments': 'Bad.'},
    5: {'rating': 4, 'comments': 'Nice.'},
    6: {'rating': 5, 'comments': 'Excellent!'},
    7: {'rating': 3, 'comments': 'Could be better.'},
    8: {'rating': 4, 'comments': 'Pretty good!'},
    9: {'rating': 1, 'comments': 'Terrible.'},
    10: {'rating': 2, 'comments': 'Not happy.'},
    11: {'rating': 5, 'comments': 'Perfect!'},
    12: {'rating': 4, 'comments': 'Very good.'}
}


# 1. Filter users with a rating of 4 or higher
def filter_high_ratings(feedback):
    return {user_id: info['rating'] for user_id, info in feedback.items() if info['rating'] >= 4}

# 2. Sort feedback by rating (descending) and return the top 5 entries
def top_5_feedback(feedback):
    sorted_feedback = sorted(feedback.items(), key=lambda x: x[1]['rating'], reverse=True)
    return sorted_feedback[:5]

# 3. Combine feedback from multiple dictionaries
def combine_feedback(dict1, dict2):
    # Copy each inner dict so that merging does not modify dict1 or dict2
    combined = {user_id: dict(info) for user_id, info in dict1.items()}
    for user_id, info in dict2.items():
        if user_id in combined:
            # Keep the higher rating and join the comments
            combined[user_id]['rating'] = max(combined[user_id]['rating'], info['rating'])
            combined[user_id]['comments'] += " | " + info['comments']
        else:
            combined[user_id] = dict(info)
    return combined

# 4. Dictionary comprehension for rating > 3
def filter_rating_above_3(feedback):
    return {user_id: info['rating'] for user_id, info in feedback.items() if info['rating'] > 3}

# Testing the functions
print("Users with rating 4 or higher:", filter_high_ratings(feedback))
print("Top 5 feedback:", top_5_feedback(feedback))

feedback2 = {
    1: {'rating': 4, 'comments': 'Good job!'},
    6: {'rating': 3, 'comments': 'Could be better.'}
}
print("Combined feedback:", combine_feedback(feedback, feedback2))
print("Users with rating above 3:", filter_rating_above_3(feedback))


Users with rating 4 or higher: {1: 5, 3: 4, 5: 4, 6: 5, 8: 4, 11: 5, 12: 4}
Top 5 feedback: [(1, {'rating': 5, 'comments': 'Great!'}), (6, {'rating': 5, 'comments': 'Excellent!'}), (11, {'rating': 5, 'comments': 'Perfect!'}), (3, {'rating': 4, 'comments': 'Good.'}), (5, {'rating': 4, 'comments': 'Nice.'})]
Combined feedback: {1: {'rating': 5, 'comments': 'Great! | Good job!'}, 2: {'rating': 3, 'comments': 'Okay.'}, 3: {'rating': 4, 'comments': 'Good.'}, 4: {'rating': 2, 'comments': 'Bad.'}, 5: {'rating': 4, 'comments': 'Nice.'}, 6: {'rating': 5, 'comments': 'Excellent! | Could be better.'}, 7: {'rating': 3, 'comments': 'Could be better.'}, 8: {'rating': 4, 'comments': 'Pretty good!'}, 9: {'rating': 1, 'comments': 'Terrible.'}, 10: {'rating': 2, 'comments': 'Not happy.'}, 11: {'rating': 5, 'comments': 'Perfect!'}, 12: {'rating': 4, 'comments': 'Very good.'}}
Users with rating above 3: {1: 5, 3: 4, 5: 4, 6: 5, 8: 4, 11: 5, 12: 4}
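
When merging nested dictionaries, note that `dict.copy()` is shallow: the outer dictionary is new, but the inner `{'rating': ..., 'comments': ...}` dictionaries are still shared with the original. The sketch below illustrates the difference with hypothetical sample data. Copying each inner dictionary with `dict(info)` is one way to keep the original untouched.

```python
original = {1: {'rating': 5, 'comments': 'Great!'}}

# Shallow copy: the inner dict is shared with `original`
shallow = original.copy()
shallow[1]['comments'] += " | Good job!"
print(original[1]['comments'])  # Great! | Good job!

# Copy each inner dict as well, so edits stay local to the copy
original = {1: {'rating': 5, 'comments': 'Great!'}}
independent = {user_id: dict(info) for user_id, info in original.items()}
independent[1]['comments'] += " | Good job!"
print(original[1]['comments'])  # Great!
```

For structures nested more than one level deep, `copy.deepcopy` from the standard library is the more general option.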


## Conclusion

In this notebook, we implemented Python functions that process lists, tuples, sets, and dictionaries: filtering and sorting user records, summarizing transactions, comparing page-visit sets, and filtering and merging feedback. Together, these functions form the core of a Smart Data Aggregator tool.
