**Question 1 — Frequency Map (Easy)**

Problem Statement
You are given a list of event names representing user actions in an application.
Your task is to build a frequency map that counts how many times each event occurs.

This simulates counting event types in application logs before aggregation.

Input Format

events = [
    "login",
    "view",
    "login",
    "purchase",
    "view",
    "login"
]


Output Format
Return a dictionary where:

key = event name

value = number of occurrences

{
    "login": 3,
    "view": 2,
    "purchase": 1
}


Constraints

1 <= len(events) <= 10^5

Event names are non-empty strings

Case-sensitive ("Login" ≠ "login")

Do not use collections.Counter for this question

In [0]:
events = [
    "login",
    "view",
    "login",
    "purchase",
    "view",
    "login"
]

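# Map each event name to the number of times it has been seen so far.
freq = {}

for event in events:
    # dict.get returns 0 the first time an event appears.
    freq[event] = freq.get(event, 0) + 1
freq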
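
The counting loop can also be wrapped in a function so it is easy to reuse and test. The sketch below is one possible shape; the function name `build_frequency_map` and the sample list are illustrative and not part of the original question. It also shows the case-sensitivity constraint in action: "Login" and "login" get separate keys.

```python
def build_frequency_map(events):
    """Count occurrences of each event name in a single pass."""
    freq = {}
    for event in events:
        # dict.get supplies 0 for an event that has not been seen yet.
        freq[event] = freq.get(event, 0) + 1
    return freq


# Hypothetical sample that mixes "login" and "Login".
sample = ["login", "view", "Login", "login"]
counts = build_frequency_map(sample)
print(counts)  # {'login': 2, 'view': 1, 'Login': 1}
```

Because dictionaries preserve insertion order, the keys appear in the order each event was first seen.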

**Question 2 — Group By List (Easy)**

Problem Statement
You are given a list of user activity records.
Each record contains a user_id and an activity_name.

Your task is to group activities by user, producing a dictionary where each user maps to the list of activities they performed in input order.

This simulates grouping raw clickstream logs before aggregation.

Input Format

activities = [
    (1, "login"),
    (2, "view"),
    (1, "purchase"),
    (1, "logout"),
    (2, "purchase")
]


Output Format
Return a dictionary:

{
    1: ["login", "purchase", "logout"],
    2: ["view", "purchase"]
}


Constraints

1 <= len(activities) <= 10^5

user_id is an integer

Maintain original order of activities per user

Do not sort

Use basic dictionary logic (no pandas)

In [0]:
activities = [
    (1, "login"),
    (2, "view"),
    (1, "purchase"),
    (1, "logout"),
    (2, "purchase")
]

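from collections import defaultdict

# defaultdict(list) creates an empty list the first time a user_id is seen.
agg = defaultdict(list)

for user_id, activity in activities:
    agg[user_id].append(activity)

# Convert to a plain dict so the result matches the expected output format.
dict(agg)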
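
If "basic dictionary logic" is read strictly as "no helper classes", the same grouping can be done with a plain `dict` and `setdefault`. This is a minimal sketch of that variant; the function name `group_by_user` is hypothetical.

```python
def group_by_user(activities):
    """Group activity names by user_id, keeping input order per user."""
    grouped = {}
    for user_id, activity in activities:
        # setdefault inserts an empty list on first sight of user_id,
        # then returns the list stored under that key.
        grouped.setdefault(user_id, []).append(activity)
    return grouped


activities = [
    (1, "login"),
    (2, "view"),
    (1, "purchase"),
    (1, "logout"),
    (2, "purchase"),
]
grouped = group_by_user(activities)
print(grouped)  # {1: ['login', 'purchase', 'logout'], 2: ['view', 'purchase']}
```

Appending in input order is what keeps each user's list in the original order, so no sorting is needed.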

**Question 3 — Aggregation Map (Easy)**

Problem Statement
You are given a list of transaction records.
Each record contains a store_id and an amount.

Your task is to compute the total sales per store using an aggregation map.

This mirrors a basic ETL aggregation step before reporting.

Input Format

transactions = [
    (1, 100.0),
    (2, 50.0),
    (1, 25.0),
    (2, 75.0),
    (3, 40.0)
]


Output Format
Return a dictionary where:

key = store_id

value = total sales amount

{
    1: 125.0,
    2: 125.0,
    3: 40.0
}


Constraints

1 <= len(transactions) <= 10^5

store_id is an integer

amount is a positive float

Do not use pandas

One-pass solution expected

In [0]:
transactions = [
    (1, 100.0),
    (2, 50.0),
    (1, 25.0),
    (2, 75.0),
    (3, 40.0)
]

# Running total of sales per store_id, built in a single pass.
agg = {}

for store_id, amount in transactions:
    agg[store_id] = agg.get(store_id, 0.0) + amount
agg
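
Because the amounts are floats, summed totals can carry small rounding errors. The sketch below uses hypothetical amounts (0.1 and 0.2, not from the question) to show the effect and one way to compare totals safely with `math.isclose`. The function name `total_sales_per_store` is an assumption made for illustration.

```python
import math


def total_sales_per_store(transactions):
    """Sum amounts per store_id in one pass."""
    totals = {}
    for store_id, amount in transactions:
        totals[store_id] = totals.get(store_id, 0.0) + amount
    return totals


# Hypothetical amounts chosen to expose binary floating-point rounding.
totals = total_sales_per_store([(1, 0.1), (1, 0.2)])
print(totals[1] == 0.3)              # False
print(math.isclose(totals[1], 0.3))  # True
```

For the sample data in the question the sums are exact, so this only matters when checking results for arbitrary amounts.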

**Question 4 — Inner Join (Easy)**

Problem Statement
You are given two datasets:

users: basic user information

orders: purchase records

Your task is to perform an inner join on user_id, returning one row per matching order. Users with no orders are excluded.

This simulates a hash join between dimension and fact tables.

Input Format

users = [
    (1, "Alice"),
    (2, "Bob"),
    (3, "Charlie")
]

orders = [
    (1, "2025-01-01"),
    (1, "2025-01-05"),
    (3, "2025-01-03")
]


Output Format
Return a list of tuples:

[
    (1, "Alice", "2025-01-01"),
    (1, "Alice", "2025-01-05"),
    (3, "Charlie", "2025-01-03")
]


Constraints

Preserve order of orders

Use a hash map (dictionary) for the join

Time complexity should be O(n + m), where n = len(users) and m = len(orders)

No nested loops over full datasets

In [0]:
users = [
    (1, "Alice"),
    (2, "Bob"),
    (3, "Charlie")
]

orders = [
    (1, "2025-01-01"),
    (1, "2025-01-05"),
    (3, "2025-01-03")
]

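# Build side of the hash join: user_id -> name.
user_map = {user_id: name for user_id, name in users}

res = []
for user_id, order_date in orders:
    # Inner join: skip orders whose user_id has no matching user.
    if user_id in user_map:
        res.append((user_id, user_map[user_id], order_date))
res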
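
An inner join drops rows that have no match on the other side. The sample data never exercises this, since every order belongs to a known user. The sketch below adds a hypothetical order for user_id 99 to show that such a row should not appear in the result. The function name `inner_join` is illustrative.

```python
def inner_join(users, orders):
    """Hash join: build a dict from users, then probe it once per order."""
    user_names = {user_id: name for user_id, name in users}
    joined = []
    for user_id, order_date in orders:
        # Emit a row only when the order's user_id exists in users.
        if user_id in user_names:
            joined.append((user_id, user_names[user_id], order_date))
    return joined


users = [(1, "Alice"), (2, "Bob"), (3, "Charlie")]
# The order for user_id 99 is a hypothetical orphan row.
orders = [(1, "2025-01-01"), (99, "2025-01-02"), (3, "2025-01-03")]
result = inner_join(users, orders)
print(result)  # [(1, 'Alice', '2025-01-01'), (3, 'Charlie', '2025-01-03')]
```

Building the dictionary costs O(n) and probing it costs O(m), which gives the required O(n + m) overall.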


**Question 5 — Left Join (Easy)**

Problem Statement
You are given two datasets:

users: master user table

logins: login activity table

Your task is to perform a left join on user_id.

For every user, attach their most recent login date.
If a user has never logged in, the login date should be None.

This mirrors a common dimension enrichment step in ETL pipelines.

Input Format

users = [
    (1, "Alice"),
    (2, "Bob"),
    (3, "Charlie"),
    (4, "Diana")
]

logins = [
    (1, "2025-01-01"),
    (1, "2025-01-05"),
    (3, "2025-01-03")
]


Output Format
Return a list of tuples:

[
    (1, "Alice", "2025-01-05"),
    (2, "Bob", None),
    (3, "Charlie", "2025-01-03"),
    (4, "Diana", None)
]


Constraints

Preserve order of users

Use a dictionary to pre-aggregate login dates

Dates are ISO strings (YYYY-MM-DD)

No sorting required

Expected time complexity: O(n + m), where n = len(users) and m = len(logins)

In [0]:
users = [
    (1, "Alice"),
    (2, "Bob"),
    (3, "Charlie"),
    (4, "Diana")
]

logins = [
    (1, "2025-01-01"),
    (1, "2025-01-05"),
    (3, "2025-01-03")
]

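# Pre-aggregate: user_id -> most recent login date.
# ISO dates (YYYY-MM-DD) sort chronologically as plain strings, so max() works.
lookup = {}
for user_id, login_date in logins:
    lookup[user_id] = max(login_date, lookup.get(user_id, login_date))

res = []

for user_id, name in users:
    # dict.get returns None for users with no logins (left join).
    res.append((user_id, name, lookup.get(user_id)))
res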
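
Taking `max` over date strings relies on the constraint that dates are zero-padded ISO strings, for which alphabetical order matches chronological order. The sketch below checks this against `datetime.date` and shows a hypothetical unpadded date where string comparison gives the wrong answer.

```python
from datetime import date

dates = ["2025-01-05", "2024-12-31", "2025-01-03"]

# Zero-padded ISO strings: string max agrees with the real date max.
latest_as_string = max(dates)
latest_as_date = max(date.fromisoformat(d) for d in dates)
print(latest_as_string)            # 2025-01-05
print(latest_as_date.isoformat())  # 2025-01-05

# Hypothetical unpadded input: "2025-1-5" compares greater as a string
# even though January 5 comes before January 10.
wrong_latest = max(["2025-1-5", "2025-01-10"])
print(wrong_latest)  # 2025-1-5
```

If the input format cannot be guaranteed, parsing to `date` objects before comparing is the safer choice.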

**Question 6 — Semi Join (Easy)**

Problem Statement
You are given two datasets:

users: list of all users

orders: order records, each containing a user_id and an order_id

Your task is to perform a semi join:

Return only users who have at least one order

Do not include order details

This is commonly used for existence filtering in SQL (WHERE EXISTS).

Input Format

users = [
    (1, "Alice"),
    (2, "Bob"),
    (3, "Charlie"),
    (4, "Diana")
]

orders = [
    (1, "o101"),
    (1, "o102"),
    (3, "o201")
]


Output Format

[
    (1, "Alice"),
    (3, "Charlie")
]


Constraints

Preserve order of users

Use a hash-based lookup

Do not duplicate users

Time complexity: O(n + m), where n = len(users) and m = len(orders)

In [0]:
users = [
    (1, "Alice"),
    (2, "Bob"),
    (3, "Charlie"),
    (4, "Diana")
]

orders = [
    (1, "o101"),
    (1, "o102"),
    (3, "o201")
]

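# Set of user_ids that appear in orders, for O(1) membership checks.
index = {user_id for user_id, _ in orders}
semi = [(user_id, name) for user_id, name in users if user_id in index]
semi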
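
The "Do not duplicate users" constraint is what separates a semi join from an inner join projected onto the user columns. As a rough illustration using the same sample data, the inner join repeats Alice once per order, while the semi join returns her once. The variable names here are illustrative.

```python
users = [(1, "Alice"), (2, "Bob"), (3, "Charlie"), (4, "Diana")]
orders = [(1, "o101"), (1, "o102"), (3, "o201")]

# Inner join projected to (user_id, name): one row per matching order.
user_names = dict(users)
inner_projection = [
    (user_id, user_names[user_id])
    for user_id, _ in orders
    if user_id in user_names
]

# Semi join: one row per user that has any order at all.
ordered_ids = {user_id for user_id, _ in orders}
semi_join = [(user_id, name) for user_id, name in users if user_id in ordered_ids]

print(inner_projection)  # [(1, 'Alice'), (1, 'Alice'), (3, 'Charlie')]
print(semi_join)         # [(1, 'Alice'), (3, 'Charlie')]
```

Iterating over users rather than orders is what guarantees both the user order and the absence of duplicates.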

**Question 7 — Anti Join (Easy)**

Problem Statement
You are given two datasets:

users: list of all users

orders: order records, each containing a user_id and an order_id

Your task is to perform an anti join:

Return only users who have never placed an order

This is the logical inverse of a semi join and maps directly to
WHERE NOT EXISTS in SQL.

Input Format

users = [
    (1, "Alice"),
    (2, "Bob"),
    (3, "Charlie"),
    (4, "Diana")
]

orders = [
    (1, "o101"),
    (1, "o102"),
    (3, "o201")
]


Output Format

[
    (2, "Bob"),
    (4, "Diana")
]


Constraints

Preserve order of users

Use hash lookup

No nested loops

Time complexity: O(n + m), where n = len(users) and m = len(orders)

In [0]:
users = [
    (1, "Alice"),
    (2, "Bob"),
    (3, "Charlie"),
    (4, "Diana")
]

orders = [
    (1, "o101"),
    (1, "o102"),
    (3, "o201")
]

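# Use a set, not a list: list membership is O(m) per check, which would make the join O(n * m).
index = {user_id for user_id, _ in orders}
anti = [(user_id, name) for user_id, name in users if user_id not in index]
anti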
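
Since the anti join is the logical inverse of the semi join, the two results together should partition the users list. The sketch below computes both in one pass over users and checks that property. The function name `split_by_orders` is hypothetical.

```python
def split_by_orders(users, orders):
    """Return (semi, anti): users with at least one order, users with none."""
    # A set gives O(1) average-case membership checks.
    ordered_ids = {user_id for user_id, _ in orders}
    semi, anti = [], []
    for user in users:
        if user[0] in ordered_ids:
            semi.append(user)
        else:
            anti.append(user)
    return semi, anti


users = [(1, "Alice"), (2, "Bob"), (3, "Charlie"), (4, "Diana")]
orders = [(1, "o101"), (1, "o102"), (3, "o201")]
semi, anti = split_by_orders(users, orders)
print(semi)  # [(1, 'Alice'), (3, 'Charlie')]
print(anti)  # [(2, 'Bob'), (4, 'Diana')]
```

Every user lands in exactly one of the two lists, and each list keeps the original user order.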