#### Task 1: Build a raw log dataset
Write code that generates a list of dictionaries representing support tickets. Each dictionary should include the fields described in the setup. Include at least 200 entries so that summaries are meaningful. Introduce realistic variation, such as a few categories that appear more frequently and occasional missing or malformed resolution_minutes values to simulate dirty data.

You are expected to write the generator logic yourself. Keep it readable and explain the logic in short markdown notes where necessary. After generating the list, print the first five entries and the total count to validate the structure.

In [17]:
import random
from datetime import datetime, timedelta

In [18]:
categories = ["billing", "technical", "account", "shipping"]
statuses = ["open", "closed", "pending"]
priorities = ["low", "medium", "high"]
agents = ["Aysel", "Murad", "Kenan", "Nigar", "Rashad"]

In [19]:
def generate_ticket(ticket_id):
    # Spread creation dates over the last 60 days
    created_at = datetime.now() - timedelta(days=random.randint(0, 60))

    # Weighted choice so that some categories appear more often than others
    category = random.choices(categories, weights=[30, 40, 20, 10], k=1)[0]
    status = random.choice(statuses)
    priority = random.choice(priorities)
    agent = random.choice(agents)
    resolution_minutes = random.randint(5, 500)

    # Simulate dirty data with a single draw: about 5% missing, about 5% malformed
    roll = random.random()
    if roll < 0.05:
        resolution_minutes = None
    elif roll < 0.10:
        resolution_minutes = "unknown"

    return {
        "ticket_id": ticket_id,
        "created_at": created_at.strftime("%Y-%m-%d %H:%M:%S"),
        "category": category,
        "status": status,
        "priority": priority,
        "assigned_agent": agent,
        "resolution_minutes": resolution_minutes,
    }

In [20]:
ticket = [generate_ticket(i) for i in range(1, 201)]

In [21]:
print("First 5 entries:")
for t in ticket[:5]:
    print(t)

print("\nTotal count:", len(ticket))

First 5 entries:
{'ticket_id': 1, 'created_at': '2026-02-17 00:03:01', 'category': 'billing', 'status': 'open', 'priority': 'low', 'assigned_agent': 'Nigar', 'resolution_minutes': 360}
{'ticket_id': 2, 'created_at': '2026-01-09 00:03:01', 'category': 'billing', 'status': 'closed', 'priority': 'high', 'assigned_agent': 'Nigar', 'resolution_minutes': 491}
{'ticket_id': 3, 'created_at': '2026-01-17 00:03:01', 'category': 'technical', 'status': 'pending', 'priority': 'high', 'assigned_agent': 'Kenan', 'resolution_minutes': 315}
{'ticket_id': 4, 'created_at': '2026-01-23 00:03:01', 'category': 'technical', 'status': 'open', 'priority': 'low', 'assigned_agent': 'Nigar', 'resolution_minutes': 465}
{'ticket_id': 5, 'created_at': '2026-01-16 00:03:01', 'category': 'technical', 'status': 'closed', 'priority': 'medium', 'assigned_agent': 'Kenan', 'resolution_minutes': 254}

Total count: 200
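
Because the generator draws from `random`, every run produces different tickets. The following is a minimal sketch of one way to make a draw repeatable and to check that the category weighting behaves as intended. The helper `sample_categories` and the seed value 42 are assumptions made for illustration; they are not part of the notebook above.

```python
import random
from collections import Counter

categories = ["billing", "technical", "account", "shipping"]
weights = [30, 40, 20, 10]  # same weights as in generate_ticket


def sample_categories(n, seed=42):
    """Hypothetical helper: draw n weighted categories from a seeded RNG."""
    rng = random.Random(seed)  # local generator, global random state is untouched
    return rng.choices(categories, weights=weights, k=n)


first = sample_categories(200)
second = sample_categories(200)
distribution = Counter(first)

print("Same seed, same sequence:", first == second)
print("Category counts:", distribution.most_common())
```

With a fixed seed the two draws are identical, so any printed sample can be reproduced later. The counts should roughly follow the 30/40/20/10 weighting, although a sample of 200 will not match it exactly.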



#### Task 2: Design validation helpers
Create small functions that validate the dataset. For example, write one function that checks whether all required keys are present in each record, and another function that identifies records with missing or invalid resolution_minutes. These functions should return clear results such as a list of bad records or counts of issues.

Keep function signatures simple and explicit. For instance, a validation function should take the list of records as input and return a list of indices or a filtered list. Avoid printing inside these functions; return values instead so you can reuse them in other contexts.

In [25]:
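def validate_required_keys(records, required_keys):
    """Return the indices of records that lack at least one required key."""
    return [
        i for i, record in enumerate(records)
        if not all(key in record for key in required_keys)
    ]


def validate_resolution_minutes(records):
    """Return the indices of records whose resolution_minutes is not a positive integer."""
    bad_indices = []
    for i, record in enumerate(records):
        value = record.get("resolution_minutes")
        # Catches None, strings such as "unknown", and non-positive numbers
        if not isinstance(value, int) or value <= 0:
            bad_indices.append(i)
    return bad_indices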

In [29]:
required_keys = ["ticket_id", "created_at", "category", "status", "priority", "assigned_agent", "resolution_minutes"]
missing_key_indices = validate_required_keys(ticket, required_keys)
bad_resolution_indices = validate_resolution_minutes(ticket)

In [30]:
print("Records missing keys:", len(missing_key_indices))
print("Records with invalid resolution_minutes:", len(bad_resolution_indices))

Records missing keys: 0
Records with invalid resolution_minutes: 17
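
The task also asks for counts of issues. A single total does not say why a value was rejected, so the sketch below splits invalid `resolution_minutes` values by cause. The helper name `count_resolution_issues` and the three bucket names are assumptions for illustration, and the sample records are hand-written rather than taken from the generated dataset.

```python
def count_resolution_issues(records):
    """Hypothetical helper: tally why resolution_minutes is invalid."""
    counts = {"missing": 0, "wrong_type": 0, "non_positive": 0}
    for record in records:
        value = record.get("resolution_minutes")
        if value is None:
            counts["missing"] += 1          # key absent or value is None
        elif isinstance(value, bool) or not isinstance(value, int):
            counts["wrong_type"] += 1       # e.g. the string "unknown"
        elif value <= 0:
            counts["non_positive"] += 1     # zero or negative minutes
    return counts


sample = [
    {"ticket_id": 1, "resolution_minutes": 120},
    {"ticket_id": 2, "resolution_minutes": None},
    {"ticket_id": 3, "resolution_minutes": "unknown"},
    {"ticket_id": 4, "resolution_minutes": -5},
    {"ticket_id": 5},
]
issues = count_resolution_issues(sample)
print(issues)  # {'missing': 2, 'wrong_type': 1, 'non_positive': 1}
```

Like the validators above, the function returns its result instead of printing, so the breakdown can be reused in a report or a test.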


#### Task 3: Clean and normalize records
Write a function that takes the raw records and returns a cleaned version. At minimum, it should handle missing resolution_minutes values in a defined way and normalize category strings (such as trimming whitespace and standardizing case). If you introduced malformed values, decide whether to drop those records or repair them, and document the decision in a short markdown cell.

Use list comprehensions or loops to build the cleaned list. Avoid mutating the original list in place. At the end, show the number of records before and after cleaning, and display a few cleaned records.

In [59]:
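def clean_records(records):
    """Return cleaned copies of the valid records; the input list is left untouched."""
    cleaned = []
    for record in records:
        value = record.get("resolution_minutes")
        # Decision: drop records with missing or malformed values rather than repair them
        if not isinstance(value, int) or value <= 0:
            continue

        clean_record = record.copy()
        # Normalize text fields: trim whitespace and lowercase
        for field in ["category", "status", "priority", "assigned_agent"]:
            clean_record[field] = clean_record[field].strip().lower()

        cleaned.append(clean_record)
    return cleaned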

In [63]:
before_count = len(ticket)
cleaned_tickets = clean_records(ticket)
after_count = len(cleaned_tickets)

print(f"Records before cleaning: {before_count}")
print(f"Records after cleaning: {after_count}")
print("\nFirst 5 cleaned records:")
for t in cleaned_tickets[:5]:
    print(t)

Records before cleaning: 200
Records after cleaning: 183

First 5 cleaned records:
{'ticket_id': 1, 'created_at': '2026-02-17 00:03:01', 'category': 'billing', 'status': 'open', 'priority': 'low', 'assigned_agent': 'nigar', 'resolution_minutes': 360}
{'ticket_id': 2, 'created_at': '2026-01-09 00:03:01', 'category': 'billing', 'status': 'closed', 'priority': 'high', 'assigned_agent': 'nigar', 'resolution_minutes': 491}
{'ticket_id': 3, 'created_at': '2026-01-17 00:03:01', 'category': 'technical', 'status': 'pending', 'priority': 'high', 'assigned_agent': 'kenan', 'resolution_minutes': 315}
{'ticket_id': 4, 'created_at': '2026-01-23 00:03:01', 'category': 'technical', 'status': 'open', 'priority': 'low', 'assigned_agent': 'nigar', 'resolution_minutes': 465}
{'ticket_id': 5, 'created_at': '2026-01-16 00:03:01', 'category': 'technical', 'status': 'closed', 'priority': 'medium', 'assigned_agent': 'kenan', 'resolution_minutes': 254}
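
The cleaning step above drops invalid records, which is why 17 of the 200 tickets disappear. For comparison, here is a hedged sketch of the other option the task mentions: repairing the values instead. It fills invalid `resolution_minutes` with the median of the valid ones and flags the change. The function `repair_records` and the `imputed` field are hypothetical names and are not used elsewhere in this notebook.

```python
from statistics import median


def _is_valid(value):
    # Same rule as the validators: a positive integer (bool excluded)
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def repair_records(records):
    """Hypothetical alternative to dropping: impute invalid values with the median."""
    valid = [r["resolution_minutes"] for r in records
             if _is_valid(r.get("resolution_minutes"))]
    if not valid:
        return []  # nothing to impute from
    fallback = median(valid)

    repaired = []
    for record in records:
        fixed = dict(record)  # shallow copy, so the input is not mutated
        fixed["imputed"] = not _is_valid(fixed.get("resolution_minutes"))
        if fixed["imputed"]:
            fixed["resolution_minutes"] = fallback
        fixed["category"] = fixed["category"].strip().lower()
        repaired.append(fixed)
    return repaired


raw = [
    {"category": " Billing ", "resolution_minutes": 100},
    {"category": "technical", "resolution_minutes": None},
    {"category": "SHIPPING", "resolution_minutes": 300},
    {"category": "account", "resolution_minutes": "unknown"},
]
repaired = repair_records(raw)
print(len(raw), len(repaired))
print(repaired[1])
```

Imputation keeps every ticket in the per-agent and per-category counts, but it pulls the averages toward the median. Dropping keeps the averages honest at the cost of a smaller sample. Either choice is defensible as long as it is documented.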

#### Task 4: Build summary functions
Create functions that compute useful summaries from the cleaned data. At a minimum, include:

- Average resolution time per category
- Count of tickets per customer
- Escalation rate overall and by category

Use dictionaries to store summary results, with clear keys and values. For example, the average resolution time per category should be a dictionary mapping category name to average minutes. Your functions should return these dictionaries rather than printing them directly.

After computing each summary, write a small validation check. For example, confirm that the sum of category counts matches the total number of cleaned records. These checks are essential for catching logic errors early.

In [82]:
def avg_resolution_per_category(records):
    sums = {}
    counts = {}
    for r in records:
        cat = r["category"]
        mins = r["resolution_minutes"]
        sums[cat] = sums.get(cat, 0) + mins
        counts[cat] = counts.get(cat, 0) + 1
    avg = {cat: sums[cat] / counts[cat] for cat in sums}
    assert sum(counts.values()) == len(records)
    return avg

In [83]:
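def count_tickets_per_agent(records):
    # The generated records have no customer field, so tickets are counted per assigned agent
    counts = {}
    for r in records:
        agent = r["assigned_agent"]
        counts[agent] = counts.get(agent, 0) + 1
    # Validation: every record must be counted exactly once
    assert sum(counts.values()) == len(records)
    return counts
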

In [86]:
def escalation_rate(records):
    # The records have no explicit escalation flag; "high" priority is treated as escalated
    total = len(records)
    escalated_total = sum(1 for r in records if r["priority"] == "high")

    by_category = {}
    for r in records:
        cat = r["category"]
        if cat not in by_category:
            by_category[cat] = {"total": 0, "escalated": 0}
        by_category[cat]["total"] += 1
        if r["priority"] == "high":
            by_category[cat]["escalated"] += 1

    rate_overall = escalated_total / total if total else 0
    rate_by_category = {cat: data["escalated"] / data["total"] for cat, data in by_category.items()}
    # Validation: per-category totals must add up to the number of records
    assert sum(d["total"] for d in by_category.values()) == total
    return {"overall": rate_overall, "by_category": rate_by_category}

In [87]:
avg_res = avg_resolution_per_category(cleaned_tickets)
count_agent = count_tickets_per_agent(cleaned_tickets)
escalation = escalation_rate(cleaned_tickets)

print("Average resolution per category:", avg_res)
print("Ticket count per agent:", count_agent)
print("Escalation rates:", escalation)

Average resolution per category: {'billing': 240.92857142857142, 'technical': 290.15555555555557, 'shipping': 228.8235294117647, 'account': 242.61764705882354}
Ticket count per agent: {'nigar': 39, 'kenan': 35, 'murad': 33, 'aysel': 39, 'rashad': 37}
Escalation rates: {'overall': 0.3005464480874317, 'by_category': {'billing': 0.2857142857142857, 'technical': 0.3111111111111111, 'shipping': 0.17647058823529413, 'account': 0.35294117647058826}}
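
The summary functions assert their own totals internally. The task also asks for a validation check after each summary, so the sketch below keeps such checks outside the summaries and returns a list of problems instead of raising. The names `count_by_field` and `check_summary` are hypothetical, and the three sample records are hand-written.

```python
from collections import Counter


def count_by_field(records, field):
    """Hypothetical generic counter: number of records per value of `field`."""
    return dict(Counter(r[field] for r in records))


def check_summary(records, counts, rates):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sum(counts.values()) != len(records):
        problems.append("counts do not sum to the number of records")
    for name, rate in rates.items():
        if not 0.0 <= rate <= 1.0:
            problems.append(f"rate for {name!r} is outside [0, 1]")
    return problems


sample = [
    {"category": "billing", "priority": "high"},
    {"category": "billing", "priority": "low"},
    {"category": "technical", "priority": "high"},
]
counts = count_by_field(sample, "category")
high = count_by_field([r for r in sample if r["priority"] == "high"], "category")
rates = {cat: high.get(cat, 0) / n for cat, n in counts.items()}

problems = check_summary(sample, counts, rates)
print(counts, rates, problems)
```

Returning the problems as data, rather than failing on the first `assert`, lets a report list every inconsistency at once. Note also that `assert` statements are skipped when Python runs with the `-O` flag.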

#### Task 5: Package a final report
Write a function that combines the outputs of your summaries into a single report structure. This might be a dictionary that contains other dictionaries. The goal is to provide a single object that could be serialized or used by another part of a pipeline.

In a final notebook cell, print a compact report and add a short text explanation of one insight you observed. Keep the report readable and avoid overly verbose output.

In [88]:
def generate_final_report(records):
    report={}
    report["avg_resolution_per_category"] = avg_resolution_per_category(records)
    report["tickets_per_agent"] = count_tickets_per_agent(records)
    report["escalation_rates"] = escalation_rate(records)
    report["total_records"] = len(records)
    return report
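
Since the report is a plain nested dictionary of strings, integers and floats, it can be passed to the standard `json` module directly. The sketch below rounds the floats first to keep the serialized form compact. The helper `round_floats` is an assumption for illustration, and `example_report` is a small hand-written dictionary shaped like the report, not the full result.

```python
import json


def round_floats(obj, ndigits=2):
    """Hypothetical helper: recursively round floats inside nested dicts and lists."""
    if isinstance(obj, float):
        return round(obj, ndigits)
    if isinstance(obj, dict):
        return {key: round_floats(value, ndigits) for key, value in obj.items()}
    if isinstance(obj, list):
        return [round_floats(value, ndigits) for value in obj]
    return obj


# Illustrative stand-in for the dictionary returned by generate_final_report
example_report = {
    "avg_resolution_per_category": {"billing": 240.92857142857142},
    "escalation_rates": {
        "overall": 0.3005464480874317,
        "by_category": {"billing": 0.2857142857142857},
    },
    "tickets_per_agent": {"nigar": 39},
    "total_records": 183,
}

compact = round_floats(example_report)
serialized = json.dumps(compact, indent=2, sort_keys=True)
restored = json.loads(serialized)

print(serialized)
print("Round trip preserved the data:", restored == compact)
```

Rounding happens on a copy, so the original report keeps full precision for any further calculation.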

In [90]:
import pprint

final_report = generate_final_report(cleaned_tickets)
pprint.pprint(final_report, width=120)
print("\nInsight: Technical tickets take the longest to resolve on average (about 290 minutes), "
      "while account tickets have the highest escalation rate (about 35%).")


{'avg_resolution_per_category': {'account': 242.61764705882354,
                                 'billing': 240.92857142857142,
                                 'shipping': 228.8235294117647,
                                 'technical': 290.15555555555557},
 'escalation_rates': {'by_category': {'account': 0.35294117647058826,
                                      'billing': 0.2857142857142857,
                                      'shipping': 0.17647058823529413,
                                      'technical': 0.3111111111111111},
                      'overall': 0.3005464480874317},
 'tickets_per_agent': {'aysel': 39, 'kenan': 35, 'murad': 33, 'nigar': 39, 'rashad': 37},
 'total_records': 183}

Insight: Technical tickets take the longest to resolve on average (about 290 minutes), while account tickets have the highest escalation rate (about 35%).