# Building a Smart Data Aggregator

### Scenario:
*You have been hired by a startup working on a Smart Data Aggregator tool. The goal of the tool is to manage and analyze large sets of user data efficiently. The startup focuses on different types of collections (List, Tuple, Set, and Dictionary) to handle various tasks such as real-time analytics, tracking data, and reporting. Your task is to develop different modules for the aggregator using Python.*

## Part 1: User Data Processing with Lists
*You are provided with user information in the form of a list of tuples. Each tuple represents a user with the format: (user_id, user_name, age, country). The list can contain more than 100 records, and you are required to:*

**Write a function to:**
* Select the users older than 30 who are from specific countries (‘USA’, ‘Canada’).
* Extract their names into a new list.

In [14]:
def filter_names(lst):
    filtered = []
    for i in lst:
        if i[2]>30 and i[3] in ('USA','Canada'):
            filtered.append(i[1])
    return filtered

#Sample Run
users = [
  (1, 'Alice', 25, 'USA'),
  (2, 'Bob', 35, 'Canada'),
  (3, 'Charlie', 28, 'UK'),
  (4, 'David', 40, 'USA'),
  (5, 'Eve', 32, 'Canada'),
  (6, 'Frank', 20, 'France'),
]

filtered_names = filter_names(users)
print(filtered_names)

['Bob', 'David', 'Eve']
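
The same filter can also be sketched as a list comprehension with tuple unpacking, which names each field instead of relying on index positions. The function name `filter_names_unpacked` and the `countries` and `min_age` parameters are illustrative additions, not part of the original task.

```python
def filter_names_unpacked(records, countries=("USA", "Canada"), min_age=30):
    """Return the names of users older than min_age who live in one of countries."""
    allowed = set(countries)  # set membership tests are O(1) on average
    return [
        name
        for _user_id, name, age, country in records
        if age > min_age and country in allowed
    ]


users = [
    (1, "Alice", 25, "USA"),
    (2, "Bob", 35, "Canada"),
    (3, "Charlie", 28, "UK"),
    (4, "David", 40, "USA"),
    (5, "Eve", 32, "Canada"),
    (6, "Frank", 20, "France"),
]
print(filter_names_unpacked(users))  # ['Bob', 'David', 'Eve']
```

Unpacking assumes every record has exactly four fields; a record of any other length raises a `ValueError` rather than being silently misread.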


**Implement a function that:**
* Sorts the original list of tuples by age and returns the top 10 oldest users.
* Checks if there are any users with duplicate names in the list. If duplicates are found, output those names.

In [36]:
def top_10(lst):
    # Sort by age (index 2), oldest first
    sorts = sorted(lst, key=lambda user: user[2], reverse=True)
    tops = sorts[:10]
    # Count names across the whole list, not only the top 10
    dicts = dict()
    for i in sorts:
        dicts[i[1]] = dicts.get(i[1], 0) + 1
    # Print every name that occurs more than once
    for key, value in dicts.items():
        if value > 1:
            print(key)
    return tops

#Sample Run
users = [
    (1, 'Alice', 25),
    (2, 'Bob', 35),
    (3, 'Charlie', 28),
    (4, 'David', 40),
    (5, 'Eve', 32),
    (6, 'Frank', 20),
    (7, 'Alice', 27),
    (8, 'Bob', 25),
    (9, 'Chris', 65),
    (10, 'Black', 34),
    (11, 'Harry', 51)
]
print(top_10(users))

Bob
Alice
[(9, 'Chris', 65), (11, 'Harry', 51), (4, 'David', 40), (2, 'Bob', 35), (10, 'Black', 34), (5, 'Eve', 32), (3, 'Charlie', 28), (7, 'Alice', 27), (1, 'Alice', 25), (8, 'Bob', 25)]
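
The two requirements are independent, so one possible variation is to split them into separate functions using the standard library. This is a sketch, not the assignment's required form: `oldest_users` and `duplicate_names` are hypothetical names, `heapq.nlargest` picks the oldest records without sorting the whole list, and `collections.Counter` does the name counting.

```python
from collections import Counter
import heapq


def oldest_users(records, n=10):
    """Return the n oldest users, oldest first (age is at index 2)."""
    return heapq.nlargest(n, records, key=lambda rec: rec[2])


def duplicate_names(records):
    """Return names that occur more than once, in order of first appearance."""
    counts = Counter(rec[1] for rec in records)
    return [name for name, count in counts.items() if count > 1]


users = [
    (1, "Alice", 25),
    (2, "Bob", 35),
    (3, "Chris", 65),
    (4, "Alice", 27),
]
print(oldest_users(users, n=2))   # [(3, 'Chris', 65), (2, 'Bob', 35)]
print(duplicate_names(users))     # ['Alice']
```

Returning the duplicates instead of printing them lets the caller decide how to report them.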


## Part 2: Immutable Data Management with Tuples
*You need to handle transaction data from the aggregator's analytics module. This data is stored in the form of tuples since it needs to remain immutable. Each transaction is represented as a tuple: (transaction_id, user_id, amount, timestamp).*

**Write a function that:**
* Takes a list of transactions (tuples) and finds the total number of unique users involved in transactions.
* Ensures the integrity of the tuples by avoiding any changes to the original data.

In [16]:
def unique_users(lst):
    unq = set()
    for user in lst:
        unq.add(user[1])
    return len(unq)

#Sample Run
sample_data = [
    (384494, 944, 85.0356541519755, "2024-10-08 19:46:47.874666"),
    (232811, 48, 74.17326988286018, "2024-10-13 22:32:48.874796"),
    (592939, 932, 96.71273372699045, "2024-10-14 10:38:44.874811"),
    (992793, 48, 46.478930026829595, "2024-10-08 18:41:43.874831"),
    (523355, 131, 87.19392716304333, "2024-10-14 03:28:45.874845"),
    (118269, 240, 43.5524580205134, "2024-10-11 20:22:07.874860"),
    (646962, 372, 64.85933082188453, "2024-10-10 10:44:29.874875"),
    (687738, 33, 85.23815373846004, "2024-10-11 07:37:49.874889"),
    (910311, 189, 97.67079804570515, "2024-10-10 02:43:19.874903"),
    (581775, 932, 87.12950665048676, "2024-10-11 07:17:44.874946")
]
print(unique_users(sample_data))

8
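
The integrity requirement is largely met by the tuple type itself, because tuples reject item assignment. The short sketch below illustrates that, along with a set-comprehension form of the unique-user count. The name `count_unique_users` and the shortened amounts are illustrative only.

```python
def count_unique_users(transactions):
    """Count distinct user IDs (index 1) without modifying the input."""
    return len({record[1] for record in transactions})


transaction = (384494, 944, 85.03, "2024-10-08 19:46:47.874666")

try:
    transaction[2] = 0.0  # tuples do not support item assignment
except TypeError as exc:
    print(f"Rejected: {exc}")

# To "change" a field, build a new tuple; the original stays untouched
corrected = transaction[:2] + (90.0,) + transaction[3:]

print(count_unique_users([transaction, corrected]))  # 1
```

Note that only the tuples are immutable. The list that holds them can still be reordered or extended, so a function that must not alter the data should avoid calls such as `list.sort()` or `list.append()` on its argument.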


**Implement a function that:**
* Identifies and returns the transaction with the highest amount without altering the list of tuples.
* Receives a list of tuples and returns two separate lists: one containing all the transaction_ids and the other containing all user_ids. What challenges might arise if the tuple size is inconsistent?


In [17]:
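def highest_amount(lst):
    # max() only reads the list; default=None covers an empty list.
    # Unlike starting from mx = 0, this also works when no amount is positive.
    return max(lst, key=lambda transaction: transaction[2], default=None)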

#Sample Run
sample_data = [
    (384494, 944, 85.0356541519755, "2024-10-08 19:46:47.874666"),
    (232811, 48, 74.17326988286018, "2024-10-13 22:32:48.874796"),
    (592939, 932, 96.71273372699045, "2024-10-14 10:38:44.874811"),
    (992793, 48, 46.478930026829595, "2024-10-08 18:41:43.874831"),
    (523355, 131, 87.19392716304333, "2024-10-14 03:28:45.874845"),
    (118269, 240, 43.5524580205134, "2024-10-11 20:22:07.874860"),
    (646962, 372, 64.85933082188453, "2024-10-10 10:44:29.874875"),
    (687738, 33, 85.23815373846004, "2024-10-11 07:37:49.874889"),
    (910311, 189, 97.67079804570515, "2024-10-10 02:43:19.874903"),
    (581775, 932, 87.12950665048676, "2024-10-11 07:17:44.874946")
]

print(highest_amount(sample_data))

(910311, 189, 97.67079804570515, '2024-10-10 02:43:19.874903')


In [18]:
def separate(lst):
    trans_id = []
    user_id = []
    for user in lst:
        trans_id.append(user[0])
        user_id.append(user[1])
    return trans_id, user_id

#Sample Run
sample_data = [
    (384494, 944, 85.0356541519755, "2024-10-08 19:46:47.874666"),
    (232811, 48, 74.17326988286018, "2024-10-13 22:32:48.874796"),
    (592939, 932, 96.71273372699045, "2024-10-14 10:38:44.874811"),
    (992793, 48, 46.478930026829595, "2024-10-08 18:41:43.874831"),
    (523355, 131, 87.19392716304333, "2024-10-14 03:28:45.874845"),
    (118269, 240, 43.5524580205134, "2024-10-11 20:22:07.874860"),
    (646962, 372, 64.85933082188453, "2024-10-10 10:44:29.874875"),
    (687738, 33, 85.23815373846004, "2024-10-11 07:37:49.874889"),
    (910311, 189, 97.67079804570515, "2024-10-10 02:43:19.874903"),
    (581775, 932, 87.12950665048676, "2024-10-11 07:17:44.874946")
]

print(separate(sample_data))

([384494, 232811, 592939, 992793, 523355, 118269, 646962, 687738, 910311, 581775], [944, 48, 932, 48, 131, 240, 372, 33, 189, 932])
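
On the question of inconsistent tuple sizes: with index access, a tuple that has fewer than two fields raises an `IndexError` at `user[1]`, while a tuple with extra or reordered fields passes silently and may put the wrong value in a list. One way to guard against both, sketched under the assumption that a valid transaction always has exactly four fields, is to check the length first. The name `separate_checked` and the `expected_len` parameter are hypothetical.

```python
def separate_checked(transactions, expected_len=4):
    """Split transactions into ID lists, setting malformed tuples aside."""
    trans_ids, user_ids, rejected = [], [], []
    for record in transactions:
        if len(record) != expected_len:
            rejected.append(record)  # keep malformed rows for inspection
            continue
        trans_ids.append(record[0])
        user_ids.append(record[1])
    return trans_ids, user_ids, rejected


mixed = [
    (384494, 944, 85.03, "2024-10-08 19:46:47.874666"),
    (232811,),  # too short: record[1] would raise IndexError
    (592939, 932, 96.71, "2024-10-14 10:38:44.874811", "extra"),  # too long
]
trans_ids, user_ids, rejected = separate_checked(mixed)
print(trans_ids, user_ids, len(rejected))  # [384494] [944] 2
```

Whether malformed records should be skipped, logged, or treated as a hard error depends on the data source; this sketch only collects them.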


## Part 3: Unique Data Handling with Sets
*The Smart Data Aggregator also manages sets of unique user IDs who visited certain pages. You have three sets, each representing user IDs of visitors to pages A, B, and C.*

**Write a function that:**
* Finds the users who visited both Page A and Page B.
* Finds users who visited either Page A or Page C, but not both.

In [19]:
def visit_a_and_b(A,B):
    return A & B

#Sample Run
A = {1,2,3,4,5,6,7,8,9,10}
B = {1,3,5,6,7,8,11,12,45}

print(visit_a_and_b(A,B))

{1, 3, 5, 6, 7, 8}


In [20]:
def visit_a_or_c(A,C):
    return A ^ C

#Sample Run
A = {1,2,3,4,5,6,7,8,9,10}
C = {21,22,1,6,7,25,15,16}

print(visit_a_or_c(A,C))

{2, 3, 4, 5, 8, 9, 10, 15, 16, 21, 22, 25}


**Implement a function that:**
* Updates the set for Page A with new user IDs.
* Removes a list of user IDs from the set for Page B.

In [21]:
def update(A,lst):
    for i in lst:
        A.add(i)
    return A

#Sample Run
A = {1,2,3,4,5,6,7,8,9,10}
l = [1,11,21,31,41,51]

print(update(A,l))

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 41, 51, 21, 31}


In [38]:
def remove(B,lst):
    for user in lst:
        if user in B:
            B.remove(user)
    return B

#Sample Run
B = {1,3,5,6,7,8,11,12,45}
l = [1,11,21,31,41,51]

print(remove(B,l))

{3, 5, 6, 7, 8, 12, 45}
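
Both `update` and `remove` above modify the set that is passed in and return that same object. If the caller needs the original set preserved, a non-mutating variant can be sketched with the `|` and `-` operators. The names `with_added` and `with_removed` are hypothetical.

```python
def with_added(page_users, new_ids):
    """Return a new set containing page_users plus new_ids."""
    return page_users | set(new_ids)


def with_removed(page_users, ids_to_remove):
    """Return a new set without ids_to_remove; IDs not present are ignored."""
    return page_users - set(ids_to_remove)


B = {1, 3, 5, 6, 7, 8, 11, 12, 45}
trimmed = with_removed(B, [1, 11, 21, 31, 41, 51])

print(trimmed == {3, 5, 6, 7, 8, 12, 45})  # True
print(len(B))                              # 9, the original set is unchanged
```

For in-place changes, the built-in methods `set.update()` and `set.difference_update()` accept any iterable and do the same work as the explicit loops.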


## Part 4: Data Aggregation with Dictionaries
*The aggregator collects user feedback stored in a dictionary. The dictionary uses the user_id as keys, and the values are nested dictionaries with feedback details: {'rating': int, 'comments': str}.*

**Write a function that:**
* Selects users who rated 4 or higher and stores their user_id and rating in a new dictionary.
* Sorts the dictionary of user feedback by rating in descending order and returns the top 5 users.

In [23]:
def feedback(dicts):
    fourplus = dict()
    for key,value in dicts.items():
        if value['rating']>=4:
            fourplus[key] = value['rating']
    return fourplus

#Sample Run
user_feedback = {
    1: {'rating': 5, 'comments': 'Excellent product!'},
    2: {'rating': 3, 'comments': 'Could be better.'},
    3: {'rating': 4, 'comments': 'Good overall, but some minor issues.'},
    4: {'rating': 2, 'comments': 'Not satisfied with the purchase.'},
    5: {'rating': 5, 'comments': 'Highly recommend this product!'},
    6: {'rating': 1, 'comments': 'Terrible product, avoid it.'},
    7: {'rating': 4, 'comments': 'Great value for the price.'},
    8: {'rating': 3, 'comments': 'Could use some improvements.'},
    9: {'rating': 5, 'comments': 'Best product I\'ve ever bought!'},
    10: {'rating': 2, 'comments': 'Not worth the money.'}
}

print(feedback(user_feedback))

{1: 5, 3: 4, 5: 5, 7: 4, 9: 5}


In [24]:
def top5(dicts):
    tops = dict()
    # Sort user IDs by rating, highest first
    sorts = sorted(dicts, key=lambda x: dicts[x]['rating'], reverse=True)
    # Slicing avoids an IndexError when there are fewer than 5 users
    for i in sorts[:5]:
        tops[i] = dicts[i]['rating']
    return tops

#Sample Run
user_feedback = {
    1: {'rating': 5, 'comments': 'Excellent product!'},
    2: {'rating': 3, 'comments': 'Could be better.'},
    3: {'rating': 4, 'comments': 'Good overall, but some minor issues.'},
    4: {'rating': 2, 'comments': 'Not satisfied with the purchase.'},
    5: {'rating': 5, 'comments': 'Highly recommend this product!'},
    6: {'rating': 1, 'comments': 'Terrible product, avoid it.'},
    7: {'rating': 4, 'comments': 'Great value for the price.'},
    8: {'rating': 3, 'comments': 'Could use some improvements.'},
    9: {'rating': 5, 'comments': 'Best product I\'ve ever bought!'},
    10: {'rating': 2, 'comments': 'Not worth the money.'}
}

print(top5(user_feedback))

{1: 5, 5: 5, 9: 5, 3: 4, 7: 4}
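
Users 1, 5, and 9 all have a rating of 5, and they appear in dictionary order because Python's sort is stable. If the order among tied users should not depend on insertion order, one option is an explicit tie-break in the sort key. The sketch below ranks by rating descending, then by user_id ascending. The name `top_n_by_rating` and the parameter `n` are hypothetical.

```python
def top_n_by_rating(feedback, n=5):
    """Return {user_id: rating} for the n best-rated users, ties broken by user_id."""
    ranked = sorted(feedback, key=lambda uid: (-feedback[uid]["rating"], uid))
    return {uid: feedback[uid]["rating"] for uid in ranked[:n]}


feedback = {
    9: {"rating": 5, "comments": "Best product I've ever bought!"},
    1: {"rating": 5, "comments": "Excellent product!"},
    3: {"rating": 4, "comments": "Good overall, but some minor issues."},
}
print(top_n_by_rating(feedback, n=2))  # {1: 5, 9: 5}
```

Negating the rating works here because ratings are numeric; for non-numeric sort keys, two passes of a stable sort would be needed instead.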


**Implement a function that:**
* Combines feedback from multiple dictionaries. If a user is present in more than one dictionary, update their rating to the highest one and append their comments.
* Uses a dictionary comprehension to create a dictionary of user_id and rating for all users whose rating is greater than 3.

In [25]:
def combine_feedback(dicts1, dicts2):
    combined_fb = {}
    keys1 = list(dicts1.keys())
    keys2 = list(dicts2.keys())
    all_keys = set(keys1+keys2)
    for usr in all_keys:
        if usr in keys1:
            rat1 = dicts1[usr]['rating']
            com1 = dicts1[usr]['comments']
        else:
            rat1 = 0
            com1 = ''
        if usr in keys2:
            rat2 = dicts2[usr]['rating']
            com2 = dicts2[usr]['comments']
        else:
            rat2 = 0
            com2 = ''
        combined_rat = max(rat1, rat2)
        combined_coms = com1 + com2
        combined_fb[usr] = {'rating': combined_rat, 'comments': combined_coms}
    return combined_fb

#Sample Run
user_feedback_1 = {
    1: {'rating': 5, 'comments': 'Excellent product!'},
    2: {'rating': 3, 'comments': 'Could be better.'},
    3: {'rating': 4, 'comments': 'Good overall, but some minor issues.'},
    4: {'rating': 2, 'comments': 'Not satisfied with the purchase.'},
    5: {'rating': 5, 'comments': 'Highly recommend this product!'},
    6: {'rating': 1, 'comments': 'Terrible product, avoid it.'},
    7: {'rating': 4, 'comments': 'Great value for the price.'},
    8: {'rating': 3, 'comments': 'Could use some improvements.'},
    9: {'rating': 5, 'comments': 'Best product I\'ve ever bought!'},
    10: {'rating': 2, 'comments': 'Not worth the money.'}}
user_feedback_2 = {
    1: {'rating': 4, 'comments': 'Great product, but a bit expensive.'},
    11: {'rating': 3, 'comments': 'Could be better, but overall satisfied.'},
    13: {'rating': 5, 'comments': 'Excellent product!'},
    3: {'rating': 2, 'comments': 'Not satisfied with the purchase.'},
    4: {'rating': 4, 'comments': 'Good overall, but some minor issues.'},
}

print(combine_feedback(user_feedback_1,user_feedback_2))

{1: {'rating': 5, 'comments': 'Excellent product!Great product, but a bit expensive.'}, 2: {'rating': 3, 'comments': 'Could be better.'}, 3: {'rating': 4, 'comments': 'Good overall, but some minor issues.Not satisfied with the purchase.'}, 4: {'rating': 4, 'comments': 'Not satisfied with the purchase.Good overall, but some minor issues.'}, 5: {'rating': 5, 'comments': 'Highly recommend this product!'}, 6: {'rating': 1, 'comments': 'Terrible product, avoid it.'}, 7: {'rating': 4, 'comments': 'Great value for the price.'}, 8: {'rating': 3, 'comments': 'Could use some improvements.'}, 9: {'rating': 5, 'comments': "Best product I've ever bought!"}, 10: {'rating': 2, 'comments': 'Not worth the money.'}, 11: {'rating': 3, 'comments': 'Could be better, but overall satisfied.'}, 13: {'rating': 5, 'comments': 'Excellent product!'}}
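
In the output above, merged comments run together with no space between them (for example `'Excellent product!Great product, ...'`). A possible variation, sketched below, joins comments with a separator and accepts any number of feedback dictionaries. The name `combine_many` and the `" | "` separator are assumptions, not part of the task.

```python
def combine_many(*feedback_dicts, separator=" | "):
    """Merge feedback dictionaries, keeping the highest rating per user."""
    combined = {}
    for feedback in feedback_dicts:
        for user_id, details in feedback.items():
            if user_id not in combined:
                # Copy so the input dictionaries are never modified
                combined[user_id] = dict(details)
                continue
            entry = combined[user_id]
            entry["rating"] = max(entry["rating"], details["rating"])
            entry["comments"] = entry["comments"] + separator + details["comments"]
    return combined


first = {1: {"rating": 5, "comments": "Excellent product!"}}
second = {
    1: {"rating": 4, "comments": "Great product, but a bit expensive."},
    11: {"rating": 3, "comments": "Could be better, but overall satisfied."},
}
merged = combine_many(first, second)
print(merged[1])
# {'rating': 5, 'comments': 'Excellent product! | Great product, but a bit expensive.'}
```

Users keep the order in which they are first seen, and comments are appended in the order the dictionaries are passed.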


In [26]:
def high_rating(dicts):
    return {user_id: user_feedback['rating'] for user_id, user_feedback in dicts.items() if user_feedback['rating']>3}

#Sample Run
user_feedback = {
    1: {'rating': 5, 'comments': 'Excellent product!'},
    2: {'rating': 3, 'comments': 'Could be better.'},
    3: {'rating': 4, 'comments': 'Good overall, but some minor issues.'},
    4: {'rating': 2, 'comments': 'Not satisfied with the purchase.'},
    5: {'rating': 5, 'comments': 'Highly recommend this product!'},
    6: {'rating': 1, 'comments': 'Terrible product, avoid it.'},
    7: {'rating': 4, 'comments': 'Great value for the price.'},
    8: {'rating': 3, 'comments': 'Could use some improvements.'},
    9: {'rating': 5, 'comments': 'Best product I\'ve ever bought!'},
    10: {'rating': 2, 'comments': 'Not worth the money.'}}

print(high_rating(user_feedback))

{1: 5, 3: 4, 5: 5, 7: 4, 9: 5}



### Credits

**Name:** Muhammad Rehan

**Roll no.** BSDSF22A001

FCIT - University of the Punjab
