# Assignment 2: Building a Smart Data Aggregator

This notebook implements the Python functions required by the assignment for processing user records, transaction data, sets of page visitors, and feedback dictionaries. Each part is worked through step by step.

## Part 1: User Data Processing with Lists

In this section, we process a list of user records stored as tuples of `(user_id, user_name, age, country)`. We will:

1. Select users older than 30 who live in the USA or Canada.
2. Extract their names into a new list.
3. Sort the users by age in descending order and return the 10 oldest.
4. Check for duplicate names in the list.


```python
users = [
    (1, 'Alice', 25, 'USA'),
    (2, 'Bob', 32, 'Canada'),
    (3, 'Charlie', 35, 'USA'),
    (4, 'David', 28, 'Mexico'),
    (5, 'Eve', 40, 'USA'),
    (6, 'Frank', 33, 'Canada'),
    (7, 'Grace', 29, 'Germany'),
    (8, 'Hannah', 31, 'Canada'),
    (9, 'Ivy', 45, 'USA'),
    (10, 'Jack', 22, 'Canada'),
    (11, 'Kim', 30, 'Australia'),
    (12, 'Leo', 27, 'USA'),
    (13, 'Mona', 36, 'USA'),
    (14, 'Nate', 34, 'Canada'),
    (15, 'Oscar', 29, 'UK'),
    (16, 'Paula', 38, 'USA'),
    (17, 'Quinn', 31, 'Canada'),
    (18, 'Rachel', 41, 'USA'),
    (19, 'Sam', 42, 'USA'),
    (20, 'Tina', 39, 'Canada')
]

def filter_users(users):
    """Return the names of users older than 30 who live in the USA or Canada."""
    filtered_users = []
    for user in users:
        user_id, user_name, age, country = user
        if age > 30 and country in ['USA', 'Canada']:
            filtered_users.append(user_name)
    return filtered_users

def top_10_oldest(users):
    """Return the 10 oldest users as full tuples, oldest first."""
    # Sort by age (index 2) in descending order, then keep at most 10 entries.
    sorted_users = sorted(users, key=lambda user: user[2], reverse=True)
    return sorted_users[:10]

def find_duplicate_names(users):
    """Return every name that occurs more than once, each listed only once."""
    seen = set()  # names encountered so far (set membership checks are fast)
    duplicates = []
    for user in users:
        user_name = user[1]
        if user_name not in seen:
            seen.add(user_name)
        elif user_name not in duplicates:
            duplicates.append(user_name)
    return duplicates

filtered_users = filter_users(users)
print("Filtered users:", filtered_users)

top_10_oldest_users = top_10_oldest(users)
print("Top 10 oldest users:", top_10_oldest_users)

duplicate_names = find_duplicate_names(users)
print("Duplicate names:", duplicate_names)
```

```text
Filtered users: ['Bob', 'Charlie', 'Eve', 'Frank', 'Hannah', 'Ivy', 'Mona', 'Nate', 'Paula', 'Quinn', 'Rachel', 'Sam', 'Tina']
Top 10 oldest users: [(9, 'Ivy', 45, 'USA'), (19, 'Sam', 42, 'USA'), (18, 'Rachel', 41, 'USA'), (5, 'Eve', 40, 'USA'), (20, 'Tina', 39, 'Canada'), (16, 'Paula', 38, 'USA'), (13, 'Mona', 36, 'USA'), (3, 'Charlie', 35, 'USA'), (14, 'Nate', 34, 'Canada'), (6, 'Frank', 33, 'Canada')]
Duplicate names: []
```
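The same steps can also be written more compactly with Python built-ins. The sketch below is one possible alternative, not the assignment's required approach. It uses a list comprehension for the filter, `heapq.nlargest` for the top-N selection, and `collections.Counter` for duplicate detection. It runs on a short excerpt of `users` plus one hypothetical extra row (id 21), added only so the duplicate check has something to report. `top_n_oldest` and `oldest_filtered_names` are hypothetical helper names. The second helper ranks only the filtered users, which is one possible reading of step 3.

```python
from collections import Counter
from heapq import nlargest
from operator import itemgetter

# A few rows from the notebook's `users` list, plus one hypothetical row
# (id 21) that repeats a name so the duplicate check finds something.
users = [
    (2, 'Bob', 32, 'Canada'),
    (4, 'David', 28, 'Mexico'),
    (5, 'Eve', 40, 'USA'),
    (9, 'Ivy', 45, 'USA'),
    (11, 'Kim', 30, 'Australia'),
    (19, 'Sam', 42, 'USA'),
    (21, 'Eve', 50, 'UK'),  # hypothetical row, not in the original data
]

AGE = itemgetter(2)  # key function: pick the age field of a user tuple

def filter_users(users):
    # Same rule as the loop version: older than 30 and in the USA or Canada.
    return [name for _id, name, age, country in users
            if age > 30 and country in {'USA', 'Canada'}]

def top_n_oldest(users, n=10):
    # nlargest returns the n items with the largest key, largest first.
    return nlargest(n, users, key=AGE)

def oldest_filtered_names(users, n=10):
    # Hypothetical variant: filter first, then rank only the matching users.
    eligible = [u for u in users if u[2] > 30 and u[3] in {'USA', 'Canada'}]
    return [name for _id, name, _age, _country in nlargest(n, eligible, key=AGE)]

def find_duplicate_names(users):
    # Counter tallies each name; keep the names seen more than once.
    counts = Counter(name for _id, name, _age, _country in users)
    return [name for name, count in counts.items() if count > 1]

print(filter_users(users))
print(top_n_oldest(users, n=3))
print(oldest_filtered_names(users, n=2))
print(find_duplicate_names(users))
```

The Python documentation describes `nlargest(n, iterable, key=...)` as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`. It performs best when `n` is small relative to the input; for large `n`, plain `sorted` is usually faster.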


## Part 2: Immutable Data Management with Tuples

This part handles transaction data stored as tuples of `(transaction_id, user_id, amount, timestamp)`. The tasks are:

1. Count the number of unique users.
2. Find the transaction with the highest amount.
3. Separate the transaction_ids and user_ids into two lists.
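The functions in this part access fields by position, for example `transaction[2]` for the amount. One possible refinement, assumed here rather than required by the assignment, is `typing.NamedTuple`. Records stay immutable tuples but gain named fields. `Transaction` is a hypothetical class name, and its field order matches the notebook's tuples.

```python
from typing import NamedTuple

class Transaction(NamedTuple):
    # Hypothetical record type; field order matches the notebook's tuples.
    transaction_id: int
    user_id: int
    amount: float
    timestamp: str

# Existing plain tuples convert directly because the field order matches.
raw = [(101, 1, 100.50, '2024-10-12'), (102, 2, 200.75, '2024-10-13')]
transactions = [Transaction(*row) for row in raw]

first = transactions[0]
print(first.amount)               # named access instead of first[2]
print(first[2] == first.amount)   # still a tuple, so indexing works too

# Tuples are immutable: "editing" a record means building a new one.
corrected = first._replace(amount=110.0)
try:
    first.amount = 110.0
except AttributeError:
    print("transactions are read-only")
```

Because a `NamedTuple` is still a tuple, the index-based functions in this part would keep working on a list of `Transaction` records.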


```python
transactions = [
    (101, 1, 100.50, '2024-10-12'),
    (102, 2, 200.75, '2024-10-13'),
    (103, 1, 150.00, '2024-10-14'),
    (104, 3, 250.00, '2024-10-15'),
    (105, 4, 300.00, '2024-10-16'),
    (106, 2, 50.00, '2024-10-17'),
    (107, 5, 120.00, '2024-10-18'),
    (108, 6, 500.00, '2024-10-19'),
    (109, 3, 275.00, '2024-10-20'),
    (110, 7, 320.00, '2024-10-21'),
    (111, 8, 400.00, '2024-10-22'),
    (112, 9, 220.00, '2024-10-23'),
    (113, 4, 180.00, '2024-10-24'),
    (114, 5, 50.00, '2024-10-25')
]

def count_unique_users(transactions):
    """Count the distinct user IDs (index 1) across all transactions."""
    user_ids = set()  # a set stores each user ID only once
    for transaction in transactions:
        user_ids.add(transaction[1])
    return len(user_ids)

def highest_transaction(transactions):
    """Return the transaction with the largest amount (index 2).

    Assumes `transactions` is non-empty.
    """
    highest = transactions[0]
    for transaction in transactions:
        if transaction[2] > highest[2]:
            highest = transaction
    return highest

def split_transactions(transactions):
    """Split the transactions into a list of transaction IDs and a list of user IDs."""
    transaction_ids = []
    user_ids = []
    for transaction in transactions:
        transaction_ids.append(transaction[0])
        user_ids.append(transaction[1])
    return transaction_ids, user_ids

unique_user_count = count_unique_users(transactions)
print("Unique users:", unique_user_count)

highest_txn = highest_transaction(transactions)
print("Highest transaction:", highest_txn)

transaction_ids, user_ids = split_transactions(transactions)
print("Transaction IDs:", transaction_ids)
print("User IDs:", user_ids)
```

```text
Unique users: 9
Highest transaction: (108, 6, 500.0, '2024-10-19')
Transaction IDs: [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114]
User IDs: [1, 2, 1, 3, 4, 2, 5, 6, 3, 7, 8, 9, 4, 5]
```
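The three transaction tasks map closely onto built-ins, shown in the sketch below. It is an alternative, not the assignment's required solution:

- a set comprehension counts distinct users;
- `max()` with a key function finds the largest amount;
- `zip(*rows)` transposes the rows into columns.

It uses the first four transactions from the notebook. Passing `default=None` to `max()` is a design choice assumed here so that an empty list returns `None` instead of raising an error.

```python
from operator import itemgetter

# First four rows of the notebook's `transactions` data.
transactions = [
    (101, 1, 100.50, '2024-10-12'),
    (102, 2, 200.75, '2024-10-13'),
    (103, 1, 150.00, '2024-10-14'),
    (104, 3, 250.00, '2024-10-15'),
]

AMOUNT = itemgetter(2)  # key function: pick the amount field

def count_unique_users(transactions):
    # Set comprehension: each user_id is kept once.
    return len({user_id for _txn_id, user_id, _amount, _ts in transactions})

def highest_transaction(transactions):
    # max() compares by amount; default=None handles an empty list
    # (the loop version raises IndexError on transactions[0]).
    return max(transactions, key=AMOUNT, default=None)

def split_transactions(transactions):
    # zip(*rows) turns rows into column tuples; keep the first two columns.
    if not transactions:
        return [], []
    transaction_ids, user_ids, _amounts, _timestamps = zip(*transactions)
    return list(transaction_ids), list(user_ids)

print(count_unique_users(transactions))
print(highest_transaction(transactions))
print(split_transactions(transactions))
```

The explicit empty-list check in `split_transactions` is needed because `zip(*[])` yields nothing to unpack into four names.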


## Part 3: Unique Data Handling with Sets

This part manages sets of IDs for the users who visited different pages. We will:

1. Find users who visited both Page A and Page B.
2. Find users who visited either Page A or Page C but not both.
3. Update the set for Page A with new user IDs.
4. Remove a list of user IDs from Page B.


```python
page_a = {1, 2, 3, 4, 8, 9, 10, 11}
page_b = {3, 4, 5, 6, 9, 12, 13}
page_c = {1, 7, 8, 14, 15}

def visited_both(page_a, page_b):
    """Return the users present in both sets (their intersection)."""
    common_users = set()
    for user in page_a:
        if user in page_b:
            common_users.add(user)
    return common_users

def visited_either_but_not_both(page_a, page_c):
    """Return the users present in exactly one of the two sets (symmetric difference)."""
    either_but_not_both = set()
    for user in page_a:
        if user not in page_c:
            either_but_not_both.add(user)
    for user in page_c:
        if user not in page_a:
            either_but_not_both.add(user)
    return either_but_not_both

def update_page_a(page_a, new_users):
    """Add new_users to page_a. Note: this modifies the set passed in."""
    for user in new_users:
        page_a.add(user)
    return page_a

def remove_from_page_b(page_b, remove_users):
    """Remove remove_users from page_b. Note: this modifies the set passed in."""
    for user in remove_users:
        page_b.discard(user)  # discard() does nothing if the ID is absent
    return page_b

common_users = visited_both(page_a, page_b)
print("Users who visited both Page A and Page B:", common_users)

either_or_users = visited_either_but_not_both(page_a, page_c)
print("Users who visited either Page A or Page C but not both:", either_or_users)

updated_page_a = update_page_a(page_a, {16, 17})
print("Updated Page A:", updated_page_a)

updated_page_b = remove_from_page_b(page_b, {3, 9})
print("Updated Page B:", updated_page_b)
```

```text
Users who visited both Page A and Page B: {9, 3, 4}
Users who visited either Page A or Page C but not both: {2, 3, 4, 7, 9, 10, 11, 14, 15}
Updated Page A: {1, 2, 3, 4, 8, 9, 10, 11, 16, 17}
Updated Page B: {4, 5, 6, 12, 13}
```
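Python's set operators express these four steps directly, as the sketch below shows on fresh copies of the three page sets. The functions `update_page_a` and `remove_from_page_b` modify the sets they receive, so after running the cell above, `page_a` and `page_b` no longer hold their starting values. The operators give a choice between the two behaviors:

- `|` and `-` return new sets and leave the originals untouched.
- `|=` and `-=` update a set in place, like the two functions do.

```python
# Fresh copies of the notebook's page sets (before any updates).
page_a = {1, 2, 3, 4, 8, 9, 10, 11}
page_b = {3, 4, 5, 6, 9, 12, 13}
page_c = {1, 7, 8, 14, 15}

both_a_and_b = page_a & page_b   # intersection: in both sets
a_xor_c = page_a ^ page_c        # symmetric difference: in exactly one set

# Non-mutating versions of steps 3 and 4: build new sets.
new_page_a = page_a | {16, 17}   # union
new_page_b = page_b - {3, 9}     # difference (absent IDs are simply ignored)

# In-place versions, matching update_page_a / remove_from_page_b.
page_a |= {16, 17}
page_b -= {3, 9}

print(both_a_and_b, a_xor_c, new_page_a, new_page_b, sep="\n")
```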


## Part 4: Data Aggregation with Dictionaries

In this part, we handle feedback data stored in dictionaries that map each user ID to a `rating` and a `comments` entry. The tasks are:

1. Filter users with a rating of 4 or higher.
2. Sort the feedback entries by rating and return the top 5 users.
3. Combine feedback from multiple dictionaries.
4. Use dictionary comprehension to filter users with a rating greater than 3.


```python
import copy

feedback = {
    1: {'rating': 5, 'comments': 'Great!'},
    2: {'rating': 3, 'comments': 'Okay.'},
    3: {'rating': 4, 'comments': 'Good.'},
    4: {'rating': 2, 'comments': 'Bad.'},
    5: {'rating': 4, 'comments': 'Nice.'},
    6: {'rating': 5, 'comments': 'Excellent!'},
    7: {'rating': 3, 'comments': 'Could be better.'},
    8: {'rating': 4, 'comments': 'Pretty good!'},
    9: {'rating': 1, 'comments': 'Terrible.'},
    10: {'rating': 2, 'comments': 'Not happy.'},
    11: {'rating': 5, 'comments': 'Perfect!'},
    12: {'rating': 4, 'comments': 'Very good.'}
}

def filter_high_ratings(feedback):
    """Map each user ID with a rating of 4 or higher to that rating."""
    high_rating_users = {}
    for user_id, info in feedback.items():
        if info['rating'] >= 4:
            high_rating_users[user_id] = info['rating']
    return high_rating_users

def top_5_feedback(feedback):
    """Return the 5 (user_id, info) pairs with the highest ratings."""
    # sorted() is stable, so users with equal ratings keep their original order.
    sorted_feedback = sorted(feedback.items(), key=lambda item: item[1]['rating'], reverse=True)
    return sorted_feedback[:5]

def combine_feedback(dict1, dict2):
    """Merge two feedback dictionaries without modifying either input.

    For users present in both, keep the higher rating and join the comments.
    """
    # deepcopy (not dict1.copy()) so the nested per-user dicts are copied too;
    # with a shallow copy, the updates below would also change dict1.
    combined_feedback = copy.deepcopy(dict1)
    for user_id, info in dict2.items():
        if user_id in combined_feedback:
            combined_feedback[user_id]['rating'] = max(combined_feedback[user_id]['rating'], info['rating'])
            combined_feedback[user_id]['comments'] += " | " + info['comments']
        else:
            combined_feedback[user_id] = dict(info)  # copy so dict2 is not shared
    return combined_feedback

def filter_rating_above_3(feedback):
    """Map each user ID with a rating above 3 to that rating, using a dictionary comprehension."""
    return {user_id: info['rating'] for user_id, info in feedback.items() if info['rating'] > 3}

high_ratings = filter_high_ratings(feedback)
print("Users with rating 4 or higher:", high_ratings)

top_5_users = top_5_feedback(feedback)
print("Top 5 feedback:", top_5_users)

feedback2 = {
    1: {'rating': 4, 'comments': 'Good job!'},
    6: {'rating': 3, 'comments': 'Could be better.'}
}
combined = combine_feedback(feedback, feedback2)
print("Combined feedback:", combined)

above_3 = filter_rating_above_3(feedback)
print("Users with rating above 3:", above_3)
```

```text
Users with rating 4 or higher: {1: 5, 3: 4, 5: 4, 6: 5, 8: 4, 11: 5, 12: 4}
Top 5 feedback: [(1, {'rating': 5, 'comments': 'Great!'}), (6, {'rating': 5, 'comments': 'Excellent!'}), (11, {'rating': 5, 'comments': 'Perfect!'}), (3, {'rating': 4, 'comments': 'Good.'}), (5, {'rating': 4, 'comments': 'Nice.'})]
Combined feedback: {1: {'rating': 5, 'comments': 'Great! | Good job!'}, 2: {'rating': 3, 'comments': 'Okay.'}, 3: {'rating': 4, 'comments': 'Good.'}, 4: {'rating': 2, 'comments': 'Bad.'}, 5: {'rating': 4, 'comments': 'Nice.'}, 6: {'rating': 5, 'comments': 'Excellent! | Could be better.'}, 7: {'rating': 3, 'comments': 'Could be better.'}, 8: {'rating': 4, 'comments': 'Pretty good!'}, 9: {'rating': 1, 'comments': 'Terrible.'}, 10: {'rating': 2, 'comments': 'Not happy.'}, 11: {'rating': 5, 'comments': 'Perfect!'}, 12: {'rating': 4, 'comments': 'Very good.'}}
Users with rating above 3: {1: 5, 3: 4, 5: 4, 6: 5, 8: 4, 11: 5, 12: 4}
```
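Merging nested dictionaries has a copying pitfall: `dict.copy()` is shallow. The outer dictionary is new, but each user's inner `{'rating': ..., 'comments': ...}` dictionary is shared with the original. Appending to `combined[user_id]['comments']` on a shallow copy therefore also changes the caller's `feedback`. The sketch below contrasts a shallow-copy merge with a `copy.deepcopy` merge on a two-user excerpt of the data. `combine_shallow`, `combine_deep`, and `make_inputs` are hypothetical names used only for this comparison.

```python
import copy

def combine_shallow(dict1, dict2):
    # Merge built on dict1.copy(): inner dicts are shared with dict1.
    combined = dict1.copy()
    for user_id, info in dict2.items():
        if user_id in combined:
            combined[user_id]['rating'] = max(combined[user_id]['rating'], info['rating'])
            combined[user_id]['comments'] += " | " + info['comments']
        else:
            combined[user_id] = info
    return combined

def combine_deep(dict1, dict2):
    # Same merge rule, but deepcopy gives the result its own inner dicts.
    combined = copy.deepcopy(dict1)
    for user_id, info in dict2.items():
        if user_id in combined:
            combined[user_id]['rating'] = max(combined[user_id]['rating'], info['rating'])
            combined[user_id]['comments'] += " | " + info['comments']
        else:
            combined[user_id] = dict(info)
    return combined

def make_inputs():
    # Two-user excerpt of the notebook's feedback, rebuilt fresh on each call.
    feedback = {
        1: {'rating': 5, 'comments': 'Great!'},
        2: {'rating': 3, 'comments': 'Okay.'},
    }
    feedback2 = {1: {'rating': 4, 'comments': 'Good job!'}}
    return feedback, feedback2

feedback, feedback2 = make_inputs()
combine_shallow(feedback, feedback2)
print(feedback[1]['comments'])  # prints: Great! | Good job!  (input was modified)

feedback, feedback2 = make_inputs()
combine_deep(feedback, feedback2)
print(feedback[1]['comments'])  # prints: Great!  (input left unchanged)
```

In the full notebook the ratings happen to survive either way, because `max()` keeps the original 5s. The comments of users 1 and 6 are where a shallow merge would leak back into `feedback`.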


## Conclusion

In this notebook, I implemented Python functions that filter, sort, count, and combine user data, transaction data, sets, and dictionaries. Because each task is broken into small, explicit steps, the notebook can serve as a guide for anyone starting out with Python.

You can find the complete code and project files in my GitHub repository: [Data Aggregator Tool in Python](https://github.com/shoaib1522/Data-Aggregator-Tool-In-Python).

Feel free to explore the code, run the functions, and enhance them further for different use cases!
