🟢 EASY - Nested Dictionary + Simple Pivot
Problem Statement

You are processing daily event metrics from an application.

Each record contains:

(date, country, metric, value)


Your task is to build a nested dictionary where:

Top level = date

Second level = country

Values = a pivoted dictionary of metric → aggregated value

If the same (date, country, metric) appears multiple times, sum the values.

Input Format
events = [
    ("2025-01-01", "US", "clicks", 3),
    ("2025-01-01", "US", "views", 10),
    ("2025-01-01", "IN", "clicks", 5),
    ("2025-01-01", "US", "clicks", 2),
    ("2025-01-02", "US", "views", 7),
]

Output Format
{
  "2025-01-01": {
      "US": {
          "clicks": 5,
          "views": 10
      },
      "IN": {
          "clicks": 5
      }
  },
  "2025-01-02": {
      "US": {
          "views": 7
      }
  }
}

Constraints

Assume input fits in memory

Dates and countries are strings

Metrics are dynamic (not fixed keys)

Use Python dictionaries (no pandas)

In [0]:
events = [
    ("2025-01-01", "US", "clicks", 3),
    ("2025-01-01", "US", "views", 10),
    ("2025-01-01", "IN", "clicks", 5),
    ("2025-01-01", "US", "clicks", 2),
    ("2025-01-02", "US", "views", 7),
]

In [0]:
from collections import defaultdict

# Three levels of defaultdict: date -> country -> metric -> running sum
result = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

for date, country, metric, value in events:
    result[date][country][metric] += value

# Convert the nested defaultdicts into plain dicts for clean output
final_result = {
    date: {country: dict(metrics) for country, metrics in countries.items()}
    for date, countries in result.items()
}
final_result

🟡 MEDIUM - Multi-Level Nested Dictionary + Pivot with Defaults
Problem Statement

You are processing user activity metrics generated by a product.

Each record contains:

(date, user_id, device, metric, value)


Your task is to build a 3-level nested dictionary with a pivot such that:

date → user_id → device → { metric: aggregated_value }


If the same (date, user_id, device, metric) appears multiple times, sum the values.

⚠️ Additional requirement (this is what makes it Medium):

Every (date, user_id, device) must contain all metrics seen globally

Missing metrics must be filled with 0

Input Format
events = [
    ("2025-01-01", "u1", "mobile", "clicks", 3),
    ("2025-01-01", "u1", "mobile", "views", 10),
    ("2025-01-01", "u1", "web", "clicks", 5),
    ("2025-01-01", "u2", "mobile", "views", 7),
    ("2025-01-02", "u1", "mobile", "clicks", 2),
]

Output Format
{
  "2025-01-01": {
      "u1": {
          "mobile": {
              "clicks": 3,
              "views": 10
          },
          "web": {
              "clicks": 5,
              "views": 0
          }
      },
      "u2": {
          "mobile": {
              "clicks": 0,
              "views": 7
          }
      }
  },
  "2025-01-02": {
      "u1": {
          "mobile": {
              "clicks": 2,
              "views": 0
          }
      }
  }
}

Constraints

Metrics are not known upfront

You must infer the full metric set from input

Use Python only (dict, defaultdict, etc.)

No pandas / no Spark

In [0]:
events = [
    ("2025-01-01", "u1", "mobile", "clicks", 3),
    ("2025-01-01", "u1", "mobile", "views", 10),
    ("2025-01-01", "u1", "web", "clicks", 5),
    ("2025-01-01", "u2", "mobile", "views", 7),
    ("2025-01-02", "u1", "mobile", "clicks", 2),
]

In [0]:
from collections import defaultdict

result = defaultdict(
    lambda: defaultdict(
        lambda: defaultdict(
            lambda: defaultdict(int)
        )
    )
)


for date, user_id, device, metric, value in events:
    result[date][user_id][device][metric] += value

# First pass: aggregation only. Missing metrics are not yet filled with 0.
final_result = {
    date: {
        user_id: {
            device: dict(metrics)
            for device, metrics in devices.items()
        }
        for user_id, devices in users.items()
    }
    for date, users in result.items()
}

final_result

In [0]:
result = defaultdict(
    lambda: defaultdict(
        lambda: defaultdict(
            lambda: defaultdict(int)
        )
    )
)

all_metrics = set()

for date, user_id, device, metric, value in events:
    result[date][user_id][device][metric] += value
    all_metrics.add(metric)

final_result = {}

for date, users in result.items():
    final_result[date] = {}

    for user_id, devices in users.items():
        final_result[date][user_id] = {}

        for device, metrics in devices.items():
            final_result[date][user_id][device] = {
                metric: metrics.get(metric, 0)
                for metric in sorted(all_metrics)  # sorted for a stable key order
            }
final_result

🔴 HARD - Pivot + Nested Dictionary + Ordering + Incremental Constraints
Problem Statement

You are ingesting hourly metrics from multiple services.

Each record contains:

(timestamp, service, region, metric, value)


Your task is to build a nested, pivoted aggregation with the following structure:

date → service → region → ordered_metrics_dict


Where:

date is extracted from timestamp (YYYY-MM-DD)

ordered_metrics_dict:

Contains all metrics seen for that service across all regions on that date

Metrics must be sorted alphabetically

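Missing metrics must be filled with 0

Duplicate (date, service, region, metric) records are summed, as the expected output shows (120 + 30 = 150 latency for auth in us-east)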

Input Format
events = [
    ("2025-01-01T10:15:00", "auth", "us-east", "latency", 120),
    ("2025-01-01T10:16:00", "auth", "us-east", "errors", 2),
    ("2025-01-01T10:17:00", "auth", "eu-west", "latency", 150),
    ("2025-01-01T11:00:00", "billing", "us-east", "requests", 300),
    ("2025-01-01T11:05:00", "auth", "us-east", "latency", 30),
    ("2025-01-02T09:00:00", "auth", "us-east", "errors", 1),
]

Output Format
{
  "2025-01-01": {
      "auth": {
          "us-east": {
              "errors": 2,
              "latency": 150
          },
          "eu-west": {
              "errors": 0,
              "latency": 150
          }
      },
      "billing": {
          "us-east": {
              "requests": 300
          }
      }
  },
  "2025-01-02": {
      "auth": {
          "us-east": {
              "errors": 1
          }
      }
  }
}

Constraints (Very Important)

Input may not be sorted

Metrics are dynamic

You must preserve alphabetical order of metric keys

Assume batch input for now (streaming follow-up will come)

Python only (no pandas, no Spark)
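
The Hard problem has no solution cell yet, so here is one possible batch sketch rather than a reference answer. It assumes that duplicate records are summed (as the expected output suggests), that every timestamp starts with a YYYY-MM-DD prefix, and that Python 3.7+ is used so plain dicts keep insertion order. The names pivot_by_service, totals and metrics_seen are illustrative and do not come from the problem statement.

```python
from collections import defaultdict

events = [
    ("2025-01-01T10:15:00", "auth", "us-east", "latency", 120),
    ("2025-01-01T10:16:00", "auth", "us-east", "errors", 2),
    ("2025-01-01T10:17:00", "auth", "eu-west", "latency", 150),
    ("2025-01-01T11:00:00", "billing", "us-east", "requests", 300),
    ("2025-01-01T11:05:00", "auth", "us-east", "latency", 30),
    ("2025-01-02T09:00:00", "auth", "us-east", "errors", 1),
]


def pivot_by_service(events):
    # date -> service -> region -> metric -> summed value
    totals = defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    )
    # (date, service) -> every metric seen for that service in any region
    metrics_seen = defaultdict(set)

    # Single pass, so the input does not need to be sorted
    for timestamp, service, region, metric, value in events:
        date = timestamp[:10]  # assumes an ISO-style "YYYY-MM-DD..." prefix
        totals[date][service][region][metric] += value
        metrics_seen[(date, service)].add(metric)

    result = {}
    for date, services in totals.items():
        result[date] = {}
        for service, regions in services.items():
            # Alphabetical metric order, shared by all regions of this service
            ordered = sorted(metrics_seen[(date, service)])
            result[date][service] = {
                region: {m: metrics.get(m, 0) for m in ordered}
                for region, metrics in regions.items()
            }
    return result


final_result = pivot_by_service(events)
final_result
```

Metric keys are inserted in sorted order, so each inner dict iterates alphabetically. If stricter timestamp validation is wanted, datetime.fromisoformat(timestamp).date().isoformat() could replace the string slice.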