## Imports

In [1]:
import pandas as pd
import numpy as np
import json
import requests
from datetime import datetime
import copy

Hi Mike,

As part of the credit underwriting logic you are pulling a B2B Experian credit report and are extracting a couple of KPIs from it.

- [Level 1] Extract the employee_size from the sample data in a function which also works if the value is missing
- [Level 2] One of the KPIs you want to use in your logic is the number of total inquiries in the last 6 months. Write a function which can calculate this robustly in production based on the example data provided below. Describe the edge cases which could happen.

We recommend using a Jupyter notebook for this exercise. Feel free to use any tool of your choice that helps you understand the JSON data better (e.g. VS Code). You will share your entire screen during the live coding challenge and are allowed to use Google for it.

Best,
Serena

### Level 1

In [2]:
#Load the JSON
with open("sample_data_thc.json") as f:
    sample_response = json.load(f)

In [3]:
#Make a copy of JSON that doesn't include the employee_size key
sample_response2 = copy.deepcopy(sample_response)
sample_response2["data"]["data"]["business_facts"].pop("employee_size", None) #pop to remove employee_size

8

In [4]:
#Try Manually
sample_response["data"]["data"]["business_facts"]["employee_size"]

8

In [11]:
#Input: JSON Response
#Output: Employee Size 
def extract_employee_size(json_response):

    #If any level of the path is missing (or is not a dict), return "", else return employee_size
    try: 
        employee_size = json_response["data"]["data"]["business_facts"]["employee_size"]
        return employee_size

    except (TypeError, KeyError, AttributeError):
        return ""
    

In [None]:
#Test Case - Employee_Size Exists

In [13]:
e_size = extract_employee_size(sample_response)
e_size

8

In [15]:
#Test Case - Employee_Size Doesn't Exist
e_size2 = extract_employee_size(sample_response2)
e_size2

''
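An alternative to the try/except above is to walk the nested keys explicitly, which makes it possible to tell a missing value apart from an empty string. The following is a minimal sketch, assuming the same `data -> data -> business_facts` nesting as the sample. `safe_get` and `extract_employee_size_or_none` are hypothetical names, and returning `None` instead of `""` is a design assumption, not a requirement of the exercise.

```python
def safe_get(obj, *keys, default=None):
    """Walk nested dicts by key; return default if any level is missing or not a dict."""
    current = obj
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def extract_employee_size_or_none(json_response):
    # Hypothetical variant: None signals "missing" instead of an empty string
    return safe_get(json_response, "data", "data", "business_facts", "employee_size")


# Small hand-made inputs that mirror the nesting of the sample response
demo_present = {"data": {"data": {"business_facts": {"employee_size": 8}}}}
demo_missing = {"data": {"data": {"business_facts": {}}}}
print(extract_employee_size_or_none(demo_present))
print(extract_employee_size_or_none(demo_missing))
```

Whether `None` or `""` is the better sentinel depends on what the downstream underwriting logic expects.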

### Level 2

In [17]:
sample_response["data"]["data"]["inquiries"]

[{'inquiry_count': [{'year': 2023, 'count': 0, 'month': 7},
   {'year': 2023, 'count': 1, 'month': 6},
   {'year': 2023, 'count': 0, 'month': 5},
   {'year': 2023, 'count': 0, 'month': 4},
   {'year': 2023, 'count': 0, 'month': 3},
   {'year': 2023, 'count': 0, 'month': 2},
   {'year': 2023, 'count': 0, 'month': 1},
   {'year': 2022, 'count': 1, 'month': 12},
   {'year': 2022, 'count': 0, 'month': 11}],
  'inquiry_business_category': 'DISPOSAL'},
 {'inquiry_count': [{'year': 2023, 'count': 0, 'month': 7},
   {'year': 2023, 'count': 0, 'month': 6},
   {'year': 2023, 'count': 0, 'month': 5},
   {'year': 2023, 'count': 0, 'month': 4},
   {'year': 2023, 'count': 1, 'month': 3},
   {'year': 2023, 'count': 0, 'month': 2},
   {'year': 2023, 'count': 1, 'month': 1},
   {'year': 2022, 'count': 0, 'month': 12},
   {'year': 2022, 'count': 0, 'month': 11}],
  'inquiry_business_category': 'INSURANCE'},
 {'inquiry_count': [{'year': 2023, 'count': 0, 'month': 7},
   {'year': 2023, 'count': 1, 'month'

In [19]:
temp = sample_response["data"]["data"]["inquiries"][0]["inquiry_count"]
temp

[{'year': 2023, 'count': 0, 'month': 7},
 {'year': 2023, 'count': 1, 'month': 6},
 {'year': 2023, 'count': 0, 'month': 5},
 {'year': 2023, 'count': 0, 'month': 4},
 {'year': 2023, 'count': 0, 'month': 3},
 {'year': 2023, 'count': 0, 'month': 2},
 {'year': 2023, 'count': 0, 'month': 1},
 {'year': 2022, 'count': 1, 'month': 12},
 {'year': 2022, 'count': 0, 'month': 11}]

In [21]:
#Define function to determine total_inquiries over the last 6 months.
#today_tot_months is the current date encoded as year*12 + month (e.g. August 2023 -> 2023*12 + 8 = 24284).
def total_inquiries(json_response, today_tot_months): #passed in for demo purposes; in production derive it from today's date

    sumcount = 0

    for inquiry_cat in json_response["data"]["data"]["inquiries"]: #Need to go through all the inquiry categories

        for obj in inquiry_cat["inquiry_count"]:

            #Row counts if its month index is at most 6 months before today.
            #Note: there is no upper bound, so a row dated in the current month or later is also counted.
            if obj["year"]*12 + obj["month"] >= today_tot_months - 6:
                sumcount = sumcount + obj["count"]

    return sumcount

In [23]:
#Example Test
today_tot_months = 24284 #2023*12 + 8, i.e. August 2023; hard-coded for demo purposes, can be derived from today's date instead

total_inquiries(sample_response, today_tot_months)

4
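The hard-coded month index can be derived from a date instead. A small sketch, using the same year*12 + month encoding as `total_inquiries`; `to_month_index` is a hypothetical helper name.

```python
from datetime import date


def to_month_index(d):
    """Encode a date as year*12 + month, matching the comparison in total_inquiries."""
    return d.year * 12 + d.month


# Any day in August 2023 reproduces the demo value used above
print(to_month_index(date(2023, 8, 15)))  # 24284

# In production, derive the index from the current date instead
today_tot_months = to_month_index(date.today())
```

Passing the index in as a parameter, rather than calling `date.today()` inside the function, keeps the calculation reproducible in tests.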

#### Level 2 - Describing Edge Cases

Several edge cases could occur in production. Some examples:

- Missing sections: inquiries absent or not a list; inquiry_count missing; data nested differently than expected.
- Empty data: no rows fall inside the 6-month window.
- Invalid month/year (e.g. month = 13, year = 2089).
- Duplicate rows (e.g. two rows with the same month and year).
- Non-numeric counts (e.g. "count" holds an unexpected value such as "True").
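
The edge cases above could be handled along the following lines. This is a sketch under stated assumptions, not a definitive implementation: `total_inquiries_robust` is a hypothetical name, malformed rows are skipped instead of raising, the first of any duplicate rows within a category is kept, a missing inquiries section returns `None`, and an upper bound is added so that future-dated rows are excluded.

```python
def total_inquiries_robust(json_response, today_tot_months, window=6):
    """Sum inquiry counts from (today - window) months up to the current month.

    Returns None if the inquiries section is missing or not a list.
    """
    try:
        inquiries = json_response["data"]["data"]["inquiries"]
    except (TypeError, KeyError):
        return None
    if not isinstance(inquiries, list):
        return None

    total = 0
    for category in inquiries:
        rows = category.get("inquiry_count") if isinstance(category, dict) else None
        if not isinstance(rows, list):
            continue  # inquiry_count missing or not a list
        seen = set()  # (year, month) pairs already counted in this category
        for row in rows:
            if not isinstance(row, dict):
                continue
            year, month, count = row.get("year"), row.get("month"), row.get("count")
            # bool is a subclass of int, so reject it explicitly
            if any(isinstance(v, bool) or not isinstance(v, int) for v in (year, month, count)):
                continue
            if not 1 <= month <= 12 or count < 0:
                continue  # invalid month or negative count
            if (year, month) in seen:
                continue  # assumption: keep only the first of duplicate rows
            seen.add((year, month))
            idx = year * 12 + month
            # Same lower bound as total_inquiries; the upper bound (an assumption)
            # excludes future-dated rows such as year 2089
            if today_tot_months - window <= idx <= today_tot_months:
                total += count
    return total


# Hand-made input covering several of the listed edge cases
demo = {"data": {"data": {"inquiries": [
    {"inquiry_count": [
        {"year": 2023, "month": 7, "count": 1},
        {"year": 2023, "month": 7, "count": 1},       # duplicate row
        {"year": 2023, "month": 13, "count": 5},      # invalid month
        {"year": 2089, "month": 1, "count": 2},       # future-dated row
        {"year": 2023, "month": 6, "count": "True"},  # non-numeric count
        {"year": 2022, "month": 12, "count": 3},      # outside the window
    ]},
    {"inquiry_business_category": "INSURANCE"},       # inquiry_count missing
]}}}
print(total_inquiries_robust(demo, 24284))  # 1
```

Whether to skip bad rows silently, log them, or reject the whole report is a policy decision for the underwriting logic; skipping is used here only to keep the sketch short.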
  