**1. Problem Statement**

You are given a list of user events. Each event contains a user_id and an event_type.
Group all event types by user_id.

This simulates a common ETL step where you collect raw events per entity before downstream aggregation.

**Input Format**

events = [
    ("u1", "login"),
    ("u2", "view"),
    ("u1", "purchase"),
    ("u2", "logout"),
    ("u1", "logout")
]


**Output Format**

{
    "u1": ["login", "purchase", "logout"],
    "u2": ["view", "logout"]
}


**Constraints**

Use Python

Preserve input order within each group

Do not use pandas

Time complexity should be O(n)

This is an in-memory grouping (not streaming yet)

In [0]:
events = [ ("u1", "login"), ("u2", "view"), ("u1", "purchase"), ("u2", "logout"), ("u1", "logout") ]

from collections import defaultdict

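# Map each user_id to the list of its event types, in input order.
user_events = defaultdict(list)
for user_id, event_type in events:
    user_events[user_id].append(event_type)

dict(user_events)  # display as a plain dict, matching the Output Format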
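
One detail worth keeping in mind once the grouping is handed to downstream code: a `defaultdict` inserts a key whenever a missing key is read. The sketch below illustrates that side effect and one way to freeze the result into a plain dict. The names `grouped`, `keys_after_lookup`, and `frozen` are illustrative and not part of the exercise.

```python
from collections import defaultdict

events = [("u1", "login"), ("u2", "view"), ("u1", "purchase")]

grouped = defaultdict(list)
for user_id, event_type in events:
    grouped[user_id].append(event_type)

# Reading a missing key on a defaultdict inserts it as a side effect.
_ = grouped["u3"]
keys_after_lookup = list(grouped)  # "u3" is now a key with an empty list

# Freeze into a plain dict, dropping any empty groups created by lookups.
frozen = {user_id: types for user_id, types in grouped.items() if types}
```

After freezing, `frozen["u3"]` raises `KeyError` instead of silently creating an entry, which is usually the safer behaviour for a finished result.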

**2. Problem Statement**

You are given a list of (department, employee_name) tuples.
Group all employees by department.

This mirrors a common HR / dimension-table preparation step in data pipelines.

**Input Format**

records = [
    ("engineering", "alice"),
    ("sales", "bob"),
    ("engineering", "carol"),
    ("hr", "dave"),
    ("sales", "eve")
]

**Output Format**

{
    "engineering": ["alice", "carol"],
    "sales": ["bob", "eve"],
    "hr": ["dave"]
}

**Constraints**

Preserve input order

Use core Python only

No Counter, no pandas

Expected time complexity: O(n)

In [0]:
records = [ ("engineering", "alice"), ("sales", "bob"), ("engineering", "carol"), ("hr", "dave"), ("sales", "eve") ]

from collections import defaultdict

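# Map each department to the list of its employees, in input order.
dept_emp = defaultdict(list)

for dept, emp in records:
    dept_emp[dept].append(emp)

dict(dept_emp)  # display as a plain dict, matching the Output Format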
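
If "core Python only" is read strictly as "no imports at all" (an assumption; the constraint may well permit the standard library), the same grouping can be sketched with a plain dict and `dict.setdefault`. The helper name `group_pairs` is hypothetical.

```python
def group_pairs(pairs):
    """Group (key, value) pairs into {key: [values]} using only built-ins."""
    grouped = {}
    for key, value in pairs:
        # setdefault returns the existing list, or stores and returns a new one.
        grouped.setdefault(key, []).append(value)
    return grouped


records = [
    ("engineering", "alice"),
    ("sales", "bob"),
    ("engineering", "carol"),
    ("hr", "dave"),
    ("sales", "eve"),
]

by_dept = group_pairs(records)
dept_order = list(by_dept)  # keys appear in first-seen order
```

Because dicts preserve insertion order (guaranteed since Python 3.7), departments come out in the order they first appear in the input, and employees keep their input order within each department. The loop is still a single O(n) pass.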

**3. Problem Statement**

You have a log of user activities on a platform. Each record contains a user_id, an activity_type, and a timestamp. You need to group activities by user_id and return a dictionary where each user_id maps to a list of their activities sorted by timestamp.

Additionally: Only include users who have performed more than one activity.

**Input Format**

A list of dictionaries:

logs = [
    {"user_id": 101, "activity_type": "login", "timestamp": "2026-01-10T08:00:00"},
    {"user_id": 102, "activity_type": "click", "timestamp": "2026-01-10T08:05:00"},
    {"user_id": 101, "activity_type": "logout", "timestamp": "2026-01-10T08:30:00"},
    {"user_id": 103, "activity_type": "login", "timestamp": "2026-01-10T09:00:00"},
    {"user_id": 102, "activity_type": "logout", "timestamp": "2026-01-10T09:05:00"},
]

**Output Format**

A dictionary mapping user_id → list of activities (sorted by timestamp):

{
    101: ["login", "logout"],
    102: ["click", "logout"]
}


Note: User 103 is excluded because they only have one activity.

**Constraints**

Timestamps are ISO 8601 strings in one consistent format (same precision, same UTC offset or none), so they can be compared lexicographically.

Input list can contain up to 10^6 records.

Output must preserve chronological order of activities for each user.

Aim for O(n log k) complexity, where k is the largest number of activities for any single user: grouping is a single O(n) pass, and the per-user sorts dominate.
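
As a rough justification for this bound (a sketch; $k_u$ and $k_{\max}$ are notation introduced here), let $k_u$ be the number of activities of user $u$, so that $\sum_u k_u = n$. Sorting each user's list with a comparison sort costs $O(k_u \log k_u)$, and summing over users gives:

```latex
\sum_{u} k_u \log k_u \;\le\; \log k_{\max} \sum_{u} k_u \;=\; n \log k_{\max},
\qquad k_{\max} = \max_{u} k_u .
```

Together with the $O(n)$ grouping pass, the total is $O(n + n \log k_{\max})$, which is never worse than sorting the whole log at once in $O(n \log n)$.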

In [0]:
logs = [
    {"user_id": 101, "activity_type": "login", "timestamp": "2026-01-10T08:00:00"},
    {"user_id": 102, "activity_type": "click", "timestamp": "2026-01-10T08:05:00"},
    {"user_id": 101, "activity_type": "logout", "timestamp": "2026-01-10T08:30:00"},
    {"user_id": 103, "activity_type": "login", "timestamp": "2026-01-10T09:00:00"},
    {"user_id": 102, "activity_type": "logout", "timestamp": "2026-01-10T09:05:00"},
]

from collections import defaultdict

user_activities = defaultdict(list)

for record in logs:
    user_id = record["user_id"]
    timestamp = record["timestamp"]
    activity_type = record["activity_type"]
    user_activities[user_id].append((timestamp, activity_type))

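# Sort each user's (timestamp, activity_type) pairs by timestamp only, so that
# records with equal timestamps keep their input order (list.sort is stable).
for user_id, activities in user_activities.items():
    activities.sort(key=lambda pair: pair[0])
    user_activities[user_id] = [activity for _, activity in activities]

# Keep only users with more than one activity.
result = {user_id: acts for user_id, acts in user_activities.items() if len(acts) > 1}
result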
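
Comparing timestamps as strings relies on every record sharing one format and UTC offset. If that cannot be guaranteed, one option is to parse each timestamp and sort on the parsed value. The following is a minimal sketch under that assumption; the function name `group_by_parsed_time` and the `mixed_logs` sample are hypothetical and not part of the exercise data.

```python
from collections import defaultdict
from datetime import datetime


def group_by_parsed_time(logs):
    """Group activities per user, ordered by parsed timestamp."""
    grouped = defaultdict(list)
    for record in logs:
        # fromisoformat (Python 3.7+) accepts offsets such as "+02:00".
        # All timestamps must be either offset-aware or naive; comparing
        # a mix of the two raises TypeError during the sort.
        parsed = datetime.fromisoformat(record["timestamp"])
        grouped[record["user_id"]].append((parsed, record["activity_type"]))

    result = {}
    for user_id, pairs in grouped.items():
        if len(pairs) > 1:  # skip single-activity users before sorting
            pairs.sort(key=lambda pair: pair[0])
            result[user_id] = [activity for _, activity in pairs]
    return result


# Hypothetical sample: 10:00+02:00 is 08:00 UTC, so it precedes 09:00+00:00
# even though it sorts later as a plain string.
mixed_logs = [
    {"user_id": 201, "activity_type": "logout", "timestamp": "2026-01-10T09:00:00+00:00"},
    {"user_id": 201, "activity_type": "login", "timestamp": "2026-01-10T10:00:00+02:00"},
    {"user_id": 202, "activity_type": "login", "timestamp": "2026-01-10T09:30:00+00:00"},
]

by_time = group_by_parsed_time(mixed_logs)
```

Parsing adds a constant factor per record but leaves the overall complexity unchanged. Filtering out single-activity users before sorting also avoids sorting lists that will be discarded.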