# Cleaning Coverage and EOB

| Date | User | Change Type | Remarks |  
| ---- | ---- | ----------- | ------- |
| 01/09/2025   | Martin | Created   | Created to perform alternative preprocessing and data understanding | 
| 03/09/2025   | Martin | New   | Completed cleaning for coverage branch |

# Content

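* [Introduction](#introduction)
* [Preprocess JSON](#preprocess-json)
  * [Coverage](#coverage)
  * [Explanation of Benefits](#explanation-of-benefits)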

# Introduction

In [1]:
%load_ext watermark

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re
from datetime import datetime

# Preprocess JSON

## Coverage

__Columns__

- `beneficiary` - References the Patient ID
- `class` - Medical coverage type containing group and plan info
- `extension` - Additional details from CMS Blue Button (dropped)
- `id` - Record ID (overwritten with the numeric patient ID parsed from `beneficiary`)
- `meta` - Date when the record was last updated
- `payor` - Issuer of the policy
- `relationship` - Beneficiary relationship to the subscriber ([refer here](https://hl7.org/fhir/R4/valueset-subscriber-relationship.html))
- `resourceType` - Identifier for data type (Coverage)
- `status` - Current status of the coverage (active | cancelled | draft | entered-in-error)
- `subscriberId` - ID assigned to the subscriber
- `type` - A code specifying the particular kind of Act that the Act-instance represents within its class ([refer here](https://terminology.hl7.org/6.5.0/ValueSet-v3-ActCode.html))
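
To make the nested layout behind these columns concrete, here is a minimal sketch that flattens one Coverage record by hand. The record is synthetic: the field names follow the column list above, but every value, and the helper name `flatten_coverage`, is an assumption made for illustration rather than something taken from the dataset.

```python
from datetime import datetime

# Synthetic record: keys mirror the column list, values are made up.
sample = {
    "beneficiary": {"reference": "Patient/-10000010254618"},
    "class": [
        {"value": "Medicare"},  # assumed to be the group entry
        {"value": "Part A"},    # assumed to be the plan entry
    ],
    "meta": {"lastUpdated": "2023-05-11T21:17:37.364+00:00"},
    "payor": [{"identifier": {"value": "Example Payor"}}],
    "relationship": {"coding": [{"code": "self"}]},
    "type": {"coding": [{"code": "SUBSIDIZ"}]},
}


def flatten_coverage(record):
    """Hypothetical helper: pull scalar fields out of one nested record."""
    return {
        # "Patient/-123" -> 123
        "id": int(record["beneficiary"]["reference"].replace("Patient/-", "")),
        "coverageGroup": record["class"][0]["value"],
        "coveragePlan": record["class"][1]["value"],
        "lastUpdated": datetime.strptime(
            record["meta"]["lastUpdated"], "%Y-%m-%dT%H:%M:%S.%f%z"
        ),
        "payor": record["payor"][0]["identifier"]["value"],
        "relationship": record["relationship"]["coding"][0]["code"],
        "actCode": record["type"]["coding"][0]["code"],
    }


flat = flatten_coverage(sample)
```

Each key of `flat` corresponds to one column of the cleaned frame; the cells below apply the same extraction column by column.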

In [3]:
path = "../data/raw"
coverage = pd.read_json(f"{path}/Coverage.ndjson", lines=True)

In [4]:
# Processing functions
def process_beneficiary(item):
  return int(item['reference'].replace('Patient/-', ''))

def process_class(item):
  return {
    'coverageGroup': item[0]['value'],
    'coveragePlan': item[1]['value']
  }

In [None]:
# ========== Processing Coverage ==========
# beneficiary
coverage['id'] = coverage['beneficiary'].apply(lambda x: process_beneficiary(x))

# class
cov = []
for i in coverage['class']:
  cov.append(process_class(i))
pclass = pd.DataFrame.from_records(cov)
coverage = pd.concat([coverage, pclass], axis=1)
coverage = coverage.drop('class', axis=1)

# extension
coverage = coverage.drop('extension', axis=1)

# # id
# coverage = coverage.drop('id', axis=1)

# meta
coverage['lastUpdated'] = coverage['meta'].apply(lambda x: datetime.strptime(x['lastUpdated'], '%Y-%m-%dT%H:%M:%S.%f%z'))
coverage = coverage.drop('meta', axis=1)

# Payor
coverage['payor'] = coverage['payor'].apply(lambda x: x[0]['identifier']['value'])

# Relationship
coverage['relationship'] = coverage['relationship'].apply(lambda x: x['coding'][0]['code'])

# Type
coverage['actCode'] = coverage['type'].apply(lambda x: x['coding'][0]['code'])
coverage = coverage.drop('type', axis=1)

In [6]:
coverage.to_pickle("../data/clean/coverage.pkl")

## Explanation of Benefits

- `benefitBalance` - Benefits included in the insurance coverage and the amount covered for each
- `billablePeriod` - Start and end date of the billable period
  - `extension` - Type of claim, read from the first extension entry ([refer here](https://bluebutton.cms.gov/resources/variables/claim_query_cd/))
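
As a rough sketch of how a `billablePeriod` entry can be flattened, the snippet below handles records with and without the claim-type extension. It assumes the code sits under `valueCoding.code` of the first `extension` entry, as the processing function in this notebook does. The sample dicts and the helper name `flatten_billable_period` are made up for illustration.

```python
import math


def flatten_billable_period(period):
    """Hypothetical helper: flatten one billablePeriod dict into scalars."""
    extensions = period.get("extension") or []
    # Fall back to NaN when the record carries no claim-type extension.
    claim_type = extensions[0]["valueCoding"]["code"] if extensions else math.nan
    return {
        "billablePeriodStart": period.get("start"),
        "billablePeriodEnd": period.get("end"),
        "ClaimType": claim_type,
    }


# Synthetic inputs: one with the extension, one without.
with_ext = {
    "end": "2011-08-07",
    "extension": [{"valueCoding": {"code": "3"}}],
    "start": "2011-08-01",
}
without_ext = {"start": "2012-09-16", "end": "2012-09-16"}

rows = [flatten_billable_period(p) for p in (with_ext, without_ext)]
```

Looking fields up by name rather than by position keeps the result independent of the key order inside each dict.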

In [7]:
eob = pd.read_json(f"{path}/ExplanationOfBenefit.ndjson", lines=True)

In [174]:
temp = eob.copy()
eob.head()

Unnamed: 0,benefitBalance,billablePeriod,careTeam,contained,created,diagnosis,extension,facility,id,identifier,...,provider,resourceType,status,subType,supportingInfo,total,type,use,disposition,procedure
0,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '2011-08-07', 'extension': [{'url': 'h...",[{'provider': {'identifier': {'type': {'coding...,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,inpatient--10000002646806,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'inpatient', 'system': 'h...",[{'category': {'coding': [{'code': 'admissionp...,"[{'amount': {'currency': 'USD', 'value': 129.1...","{'coding': [{'code': '60', 'display': 'Inpatie...",claim,,
1,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '2020-12-05', 'extension': [{'url': 'h...",[{'provider': {'identifier': {'type': {'coding...,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,inpatient--10000002646833,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'inpatient', 'system': 'h...",[{'category': {'coding': [{'code': 'admissionp...,"[{'amount': {'currency': 'USD', 'value': 134.4...","{'coding': [{'code': '60', 'display': 'Inpatie...",claim,,
2,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '1973-09-23', 'extension': [{'url': 'h...",[{'provider': {'identifier': {'type': {'coding...,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,outpatient--10000002646839,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'outpatient', 'system': '...",[{'category': {'coding': [{'code': 'clmrecvdda...,"[{'amount': {'currency': 'USD', 'value': 218.0...","{'coding': [{'code': '40', 'display': 'Hospita...",claim,,
3,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '1978-10-08', 'extension': [{'url': 'h...",[{'provider': {'identifier': {'type': {'coding...,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,outpatient--10000002646843,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'outpatient', 'system': '...",[{'category': {'coding': [{'code': 'clmrecvdda...,"[{'amount': {'currency': 'USD', 'value': 218.0...","{'coding': [{'code': '40', 'display': 'Hospita...",claim,,
4,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '1992-04-26', 'extension': [{'url': 'h...",[{'provider': {'identifier': {'type': {'coding...,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,outpatient--10000002646848,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'outpatient', 'system': '...",[{'category': {'coding': [{'code': 'clmrecvdda...,"[{'amount': {'currency': 'USD', 'value': 218.0...","{'coding': [{'code': '40', 'display': 'Hospita...",claim,,


In [185]:
# Processing Functions
def process_billablePeriod(item):
  if "extension" in item.keys():
    return {
      'billablePeriodStart': item['start'],
      'billablePeriodEnd': item['end'],
      'ClaimType': item['extension'][0]['valueCoding']['code']
    }
  return {
    'billablePeriodStart': item['start'],
    'billablePeriodEnd': item['end'],
    'ClaimType': np.nan
  }

def process_contained(item):
  # Missing values arrive as NaN (a float); pd.isnull() on a list returns an array
  if not isinstance(item, list):
    return {
      "active": np.nan,
      "PRNCode": np.nan,
      "NPICode": np.nan
    }
  
  if len(item[0]['identifier']) == 2:
    return {
      "active": item[0]['active'],
      "PRNCode": item[0]['identifier'][0]['value'],
      "NPICode": item[0]['identifier'][1]['value']
    }
  else:
    return {
      "active": item[0]['active'],
      "PRNCode": item[0]['identifier'][0]['value'],
      "NPICode": np.nan
    }

def process_insurance(item):
  mapper = {
    'a': 'Part A',
    'b': 'Part B',
    'd': 'Part D'
  }

  s = item[0]['coverage']['reference'].split('-')
  part = mapper[s[1]]
  cov_id = s[-1]
  return part, cov_id

def process_payment(item):
  payment = {
    "currency": np.nan,
    "paymentAmount": np.nan,
    "paymentDate": np.nan
  }
  if "amount" in item.keys():
    payment["currency"] = item['amount']['currency']
    payment["paymentAmount"] = item['amount']['value']
  else:
    payment["paymentDate"] = item['date']
  
  return payment
  
def process_provider(item):
  if "identifier" in item.keys():
    return item['identifier']['value']
  return np.nan

def process_total(item):
  total = {
    "totalChargeType": np.nan,
    "totalChargeCurrency": item[0]['amount']['currency'],
    "totalChargeAmount": item[0]['amount']['value']
  } 

  if len(item) == 1:
    if len(item[0]['category']['coding']) == 1:
      total['totalChargeType'] = 'Drug Cost'
    else:
      total['totalChargeType'] = 'Claim Total Charge Amount'
  else:
    total['totalChargeType'] = 'Drug Cost'

  return total

In [None]:
# ========== Processing Explanation of Benefits ==========
# benefitBalance

# billablePeriod
bp = temp['billablePeriod'].apply(lambda x: process_billablePeriod(x))
bp = pd.DataFrame.from_records(bp)
temp = pd.concat([temp, bp], axis=1)
temp.drop('billablePeriod', axis=1, inplace=True)

# careTeam

# contained
contained = temp['contained'].apply(lambda x: process_contained(x))
contained = pd.DataFrame.from_records(contained)
temp = pd.concat([temp, contained], axis=1)
temp.drop('contained', axis=1, inplace=True)

# diagnosis
# extension
# facility

# id
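# Use the .str accessor so the index applies to each row's split result,
# not to the Series itself
temp['subType'] = temp['id'].str.split('-').str[0]
temp['id'] = temp['id'].str.split('-').str[-1]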

# identifier

# insurance
temp[['coveragePart', 'coverageId']] = temp.apply(lambda x: process_insurance(x['insurance']), result_type='expand', axis='columns') 

# insurer
temp['insurer'] = temp['insurer'].apply(lambda x: x['identifier']['value'])

# item

# meta
temp['lastUpdated'] = temp['meta'].apply(lambda x: datetime.strptime(x['lastUpdated'], '%Y-%m-%dT%H:%M:%S.%f%z'))

# patient
temp['patient'] = temp['patient'].apply(lambda x: x['reference'].split('/')[-1])

# payment
payment = temp['payment'].apply(lambda x: process_payment(x))
payment = pd.DataFrame.from_records(payment)
temp = pd.concat([temp, payment], axis=1)
temp.drop('payment', axis=1, inplace=True)

# provider
temp['providerId'] = temp['provider'].apply(process_provider)
temp.drop('provider', axis=1, inplace=True)

# supportingInfo

# total
total = temp['total'].apply(lambda x: process_total(x))
total = pd.DataFrame.from_records(total)
temp = pd.concat([temp, total], axis=1)
temp.drop('total', axis=1, inplace=True)

In [186]:
total = temp['total'].apply(lambda x: process_total(x))
total = pd.DataFrame.from_records(total)

In [187]:
total['totalChargeType'].unique()

array(['Claim Total Charge Amount', 'Drug Cost'], dtype=object)

In [190]:
t1 = []
for i in temp['type']:
  t1.append(len(i['coding']))

In [191]:
pd.Series(t1).unique()

array([3, 2])

In [None]:
# Processing Functions
def process_contained(item):
  patient = item[0]
  provider = item[1]

  # birthDate is the only field treated as optional; default it to NaN
  patient_info = {
    'birthDate': patient.get('birthDate', np.nan),
    'gender': patient['gender'],
    'medicareNumber': patient['identifier'][0]['value'],
    'familyName': patient['name'][0]['family'],
    'givenName': patient['name'][0]['given'][0]
  }

  try:
    provider_info = {
      'providerIdType': provider['identifier'][1]['type']['coding'][0]['code'],
      'providerCode': provider['identifier'][1]['value']
    }
  except (KeyError, IndexError):
    try:
      provider_info = {
        'providerIdType': provider['identifier'][0]['type']['coding'][0]['code'],
        'providerCode': provider['identifier'][0]['value']
      }
    except (KeyError, IndexError, TypeError):
      provider_info = {
        'providerIdType': np.nan,
        'providerCode': np.nan
      }
  
  return patient_info, provider_info

In [None]:
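# Split billablePeriod into start and end.
# Select the keys by name: key order in the raw dicts is not guaranteed
# (the EOB sample above lists 'end' first), so positional assignment can swap them.
bp = pd.json_normalize(temp['billablePeriod'].tolist())
temp['billablePeriodStart'] = bp['start'].to_numpy()
temp['billablePeriodEnd'] = bp['end'].to_numpy()
temp = temp.drop('billablePeriod', axis=1)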

# Process contained column
patient = []
provider = []
for item in temp['contained']:
  pa, pr = process_contained(item)
  patient.append(pa)
  provider.append(pr)

patient = pd.DataFrame.from_records(patient)
provider = pd.DataFrame.from_records(provider)

In [None]:
temp.columns

Index(['contained', 'created', 'diagnosis', 'extension', 'facility', 'id',
       'identifier', 'insurance', 'item', 'meta', 'patient', 'priority',
       'provider', 'resourceType', 'status', 'supportingInfo', 'total', 'type',
       'use', 'procedure', 'billablePeriodStart', 'billablePeriodEnd'],
      dtype='object')

In [None]:
temp.head()

Unnamed: 0,contained,created,diagnosis,extension,facility,id,identifier,insurance,item,meta,...,provider,resourceType,status,supportingInfo,total,type,use,procedure,billablePeriodStart,billablePeriodEnd
0,"[{'birthDate': '1944-05-25', 'extension': [{'u...",2025-08-31T21:23:49+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,f-LTEwMDAwMDAzNTUxNzU5,[{'system': 'https://bluebutton.cms.gov/resour...,[{'coverage': {'identifier': {'system': 'https...,[{'extension': [{'url': 'https://bluebutton.cm...,{'lastUpdated': '2023-05-11T21:17:37.364+00:00'},...,{'reference': '#provider-org'},Claim,active,[{'category': {'coding': [{'code': 'typeofbill...,"{'currency': 'USD', 'value': 119.62}","{'coding': [{'code': 'institutional', 'display...",claim,,2012-09-16,2012-09-16
1,"[{'birthDate': '1944-05-25', 'extension': [{'u...",2025-08-31T21:23:49+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,f-LTEwMDAwMDAzNTUxNzY0,[{'system': 'https://bluebutton.cms.gov/resour...,[{'coverage': {'identifier': {'system': 'https...,[{'extension': [{'url': 'https://bluebutton.cm...,{'lastUpdated': '2023-05-11T21:17:36.876+00:00'},...,{'reference': '#provider-org'},Claim,active,[{'category': {'coding': [{'code': 'typeofbill...,"{'currency': 'USD', 'value': 119.62}","{'coding': [{'code': 'institutional', 'display...",claim,,2013-06-11,2013-06-11
2,"[{'birthDate': '1944-05-25', 'extension': [{'u...",2025-08-31T21:23:49+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,f-LTEwMDAwMDAzNTUxNzY4,[{'system': 'https://bluebutton.cms.gov/resour...,[{'coverage': {'identifier': {'system': 'https...,[{'extension': [{'url': 'https://bluebutton.cm...,{'lastUpdated': '2023-05-11T21:17:37.098+00:00'},...,{'reference': '#provider-org'},Claim,active,[{'category': {'coding': [{'code': 'typeofbill...,"{'currency': 'USD', 'value': 119.62}","{'coding': [{'code': 'institutional', 'display...",claim,,2014-04-02,2014-04-01
3,"[{'birthDate': '1944-05-25', 'extension': [{'u...",2025-08-31T21:23:49+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,f-LTEwMDAwMDAzNTUxNzc2,[{'system': 'https://bluebutton.cms.gov/resour...,[{'coverage': {'identifier': {'system': 'https...,[{'extension': [{'url': 'https://bluebutton.cm...,{'lastUpdated': '2023-05-11T21:17:37.145+00:00'},...,{'reference': '#provider-org'},Claim,active,[{'category': {'coding': [{'code': 'typeofbill...,"{'currency': 'USD', 'value': 119.62}","{'coding': [{'code': 'institutional', 'display...",claim,,2014-11-18,2014-11-17
4,"[{'birthDate': '1944-05-25', 'extension': [{'u...",2025-08-31T21:23:49+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,f-LTEwMDAwMDAzNTUxNzc4,[{'system': 'https://bluebutton.cms.gov/resour...,[{'coverage': {'identifier': {'system': 'https...,[{'extension': [{'url': 'https://bluebutton.cm...,{'lastUpdated': '2023-05-11T21:17:37.099+00:00'},...,{'reference': '#provider-org'},Claim,active,[{'category': {'coding': [{'code': 'typeofbill...,"{'currency': 'USD', 'value': 119.62}","{'coding': [{'code': 'institutional', 'display...",claim,,2016-04-04,2016-04-04


In [None]:
%watermark

Last updated: 2025-09-01T21:36:35.726067+08:00

Python implementation: CPython
Python version       : 3.11.9
IPython version      : 9.5.0

Compiler    : MSC v.1938 64 bit (AMD64)
OS          : Windows
Release     : 10
Machine     : AMD64
Processor   : Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
CPU cores   : 20
Architecture: 64bit

