# Cleaning Coverage and EOB

| Date | User | Change Type | Remarks |  
| ---- | ---- | ----------- | ------- |
| 01/09/2025   | Martin | Created   | Created to perform alternative preprocessing and data understanding | 
| 03/09/2025   | Martin | New   | |

# Content

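* [Introduction](#introduction)
* [Preprocess JSON](#preprocess-json)
  * [Coverage](#coverage)
  * [Explanation of Benefits](#explanation-of-benefits)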

# Introduction

In [1]:
%load_ext watermark

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re
from datetime import datetime

# Preprocess JSON

## Coverage

__Columns__

- `beneficiary` - References the Patient ID
- `class` - Medical coverage type containing group and plan info
- `extension` - Additional details from CMS Blue Button (dropped)
- `id` - Logical ID of the Coverage resource (dropped)
- `meta` - Date when the record was last updated
- `payor` - Issuer of the policy
- `relationship` - Beneficiary relationship to the subscriber ([refer here](https://hl7.org/fhir/R4/valueset-subscriber-relationship.html))
- `resourceType` - Identifier for data type (Coverage)
- `status` - Current status of the coverage (active | cancelled | draft | entered-in-error)
- `subscriberId` - ID assigned to the subscriber
- `type` - Kind of coverage, coded with an HL7 v3 ActCode and stored as `actCode` ([refer here](https://terminology.hl7.org/6.5.0/ValueSet-v3-ActCode.html))

In [69]:
path = "../data/raw"
coverage = pd.read_json(f"{path}/Coverage.ndjson", lines=True)
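
`lines=True` tells pandas to treat the file as newline-delimited JSON, with one resource per line. As a rough illustration of that layout, the standard-library sketch below parses two made-up records. The field values are hypothetical and are not taken from `Coverage.ndjson`.

```python
import json

# Two hypothetical NDJSON lines; real Coverage records carry many more fields.
ndjson_text = (
    '{"resourceType": "Coverage", "status": "active", "subscriberId": "1S00A00AA00"}\n'
    '{"resourceType": "Coverage", "status": "cancelled", "subscriberId": "2S00B00BB00"}\n'
)

# Each non-empty line is a complete JSON document on its own.
records = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]

print(len(records))
print(records[0]["status"])
```

Nested fields stay as dictionaries and lists after loading, which is why the cells below unpack them column by column.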

In [None]:
# Processing functions
def process_beneficiary(item):
  return int(item['reference'].replace('Patient/-', ''))

def process_class(item):
  return {
    'coverageGroup': item[0]['value'],
    'coveragePlan': item[1]['value']
  }
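
To make the expected input shapes concrete, here is a small sketch that runs both helpers on a hand-written record. The record is hypothetical (the reference number and the `value` strings are invented for illustration); it only mirrors the structure the functions index into.

```python
def process_beneficiary(item):
    # 'Patient/-123' -> 123: strip the resource prefix and the leading minus sign
    return int(item['reference'].replace('Patient/-', ''))

def process_class(item):
    # Assumes the first entry is the group and the second the plan
    return {
        'coverageGroup': item[0]['value'],
        'coveragePlan': item[1]['value'],
    }

# Hypothetical record, shaped like one row of the raw Coverage data
sample = {
    'beneficiary': {'reference': 'Patient/-10000000000001'},
    'class': [
        {'type': {'coding': [{'code': 'group'}]}, 'value': 'Medicare'},
        {'type': {'coding': [{'code': 'plan'}]}, 'value': 'Part A'},
    ],
}

beneficiary_id = process_beneficiary(sample['beneficiary'])
coverage_class = process_class(sample['class'])
print(beneficiary_id)
print(coverage_class)
```

Because `process_class` relies on position, a record whose `class` list is ordered differently would swap the two values. Matching on the `type` code would be a more defensive choice.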

In [None]:
# ========== Processing Coverage ==========
# beneficiary
coverage['beneficiary'] = coverage['beneficiary'].apply(process_beneficiary)

# class
cov = []
for i in coverage['class']:
  cov.append(process_class(i))
pclass = pd.DataFrame.from_records(cov)
coverage = pd.concat([coverage, pclass], axis=1)
coverage = coverage.drop('class', axis=1)

# extension
coverage = coverage.drop('extension', axis=1)

# id
coverage = coverage.drop('id', axis=1)

# meta
coverage['lastUpdated'] = coverage['meta'].apply(lambda x: datetime.strptime(x['lastUpdated'], '%Y-%m-%dT%H:%M:%S.%f%z'))
coverage = coverage.drop('meta', axis=1)

# Payor
coverage['payor'] = coverage['payor'].apply(lambda x: x[0]['identifier']['value'])

# Relationship
coverage['relationship'] = coverage['relationship'].apply(lambda x: x['coding'][0]['code'])

# Type
coverage['actCode'] = coverage['type'].apply(lambda x: x['coding'][0]['code'])
coverage = coverage.drop('type', axis=1)
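
The `meta` step relies on a fixed `strptime` format. The sketch below parses one `lastUpdated` value of the kind that appears in this notebook's outputs and compares the result with `datetime.fromisoformat`, which is offered here as a possible alternative and is not something the notebook uses.

```python
from datetime import datetime

raw = '2023-05-11T21:17:37.364+00:00'

# Same format string as in the cell above: fractional seconds plus a UTC offset
parsed = datetime.strptime(raw, '%Y-%m-%dT%H:%M:%S.%f%z')

# fromisoformat accepts this layout as well and needs no format string
alt = datetime.fromisoformat(raw)

print(parsed)
print(parsed == alt)
```

A timestamp without fractional seconds, such as the `created` values (`2025-08-31T22:12:12+00:00`), would not match this `strptime` format because `.%f` expects the fractional part to be present.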

In [148]:
coverage.to_pickle("../data/clean/coverage.pkl")

## Explanation of Benefits

- `benefitBalance` - Benefit categories included in the insurance coverage and the amount covered for each
- `billablePeriod` - Start and end date of the billable period
  - May carry an `extension` holding the claim query code, extracted below as `ClaimType` ([refer here](https://bluebutton.cms.gov/resources/variables/claim_query_cd/))

In [127]:
eob = pd.read_json(f"{path}/ExplanationOfBenefit.ndjson", lines=True)

In [133]:
temp = eob.copy()
eob.head()

Unnamed: 0,benefitBalance,billablePeriod,careTeam,contained,created,diagnosis,extension,facility,id,identifier,...,provider,resourceType,status,subType,supportingInfo,total,type,use,disposition,procedure
0,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '2011-08-07', 'extension': [{'url': 'h...",[{'provider': {'identifier': {'type': {'coding...,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,inpatient--10000002646806,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'inpatient', 'system': 'h...",[{'category': {'coding': [{'code': 'admissionp...,"[{'amount': {'currency': 'USD', 'value': 129.1...","{'coding': [{'code': '60', 'display': 'Inpatie...",claim,,
1,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '2020-12-05', 'extension': [{'url': 'h...",[{'provider': {'identifier': {'type': {'coding...,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,inpatient--10000002646833,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'inpatient', 'system': 'h...",[{'category': {'coding': [{'code': 'admissionp...,"[{'amount': {'currency': 'USD', 'value': 134.4...","{'coding': [{'code': '60', 'display': 'Inpatie...",claim,,
2,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '1973-09-23', 'extension': [{'url': 'h...",[{'provider': {'identifier': {'type': {'coding...,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,outpatient--10000002646839,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'outpatient', 'system': '...",[{'category': {'coding': [{'code': 'clmrecvdda...,"[{'amount': {'currency': 'USD', 'value': 218.0...","{'coding': [{'code': '40', 'display': 'Hospita...",claim,,
3,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '1978-10-08', 'extension': [{'url': 'h...",[{'provider': {'identifier': {'type': {'coding...,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,outpatient--10000002646843,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'outpatient', 'system': '...",[{'category': {'coding': [{'code': 'clmrecvdda...,"[{'amount': {'currency': 'USD', 'value': 218.0...","{'coding': [{'code': '40', 'display': 'Hospita...",claim,,
4,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '1992-04-26', 'extension': [{'url': 'h...",[{'provider': {'identifier': {'type': {'coding...,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,outpatient--10000002646848,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'outpatient', 'system': '...",[{'category': {'coding': [{'code': 'clmrecvdda...,"[{'amount': {'currency': 'USD', 'value': 218.0...","{'coding': [{'code': '40', 'display': 'Hospita...",claim,,


In [134]:
# Processing Functions
def process_billablePeriod(item):
  # The claim query code sits in an optional extension; fall back to NaN when it is absent
  if "extension" in item:
    claim_type = item['extension'][0]['valueCoding']['code']
  else:
    claim_type = np.nan
  return {
    'billablePeriodStart': item['start'],
    'billablePeriodEnd': item['end'],
    'ClaimType': claim_type
  }
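
A quick sketch of how the function treats the two shapes of `billablePeriod` found in the data. The first input matches a value that appears in this notebook's outputs; the second is hypothetical, and its `start` date, `url`, and `code` are placeholders. `math.nan` stands in for `np.nan` so the snippet runs without NumPy.

```python
import math

def process_billablePeriod(item):
    # Same logic as the notebook function, with math.nan standing in for np.nan
    claim_type = math.nan
    if 'extension' in item:
        claim_type = item['extension'][0]['valueCoding']['code']
    return {
        'billablePeriodStart': item['start'],
        'billablePeriodEnd': item['end'],
        'ClaimType': claim_type,
    }

plain = {'end': '1992-04-26', 'start': '1992-04-26'}
with_ext = {
    'end': '2011-08-07',
    'start': '2011-08-01',  # hypothetical start date
    'extension': [{
        'url': 'https://example.org/claim-query-code',  # placeholder URL
        'valueCoding': {'code': '3'},                    # placeholder code
    }],
}

row_plain = process_billablePeriod(plain)
row_ext = process_billablePeriod(with_ext)
print(row_plain)
print(row_ext)
```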

In [None]:
# ========== Processing Explanation of Benefits ==========
# benefitBalance

# billablePeriod
bp = temp['billablePeriod'].apply(process_billablePeriod)
bp = pd.DataFrame.from_records(bp.tolist())


0       {'billablePeriodStart': '2011-08-07', 'billabl...
1       {'billablePeriodStart': '2020-12-05', 'billabl...
2       {'billablePeriodStart': '1973-09-23', 'billabl...
3       {'billablePeriodStart': '1978-10-08', 'billabl...
4       {'billablePeriodStart': '1992-04-26', 'billabl...
                              ...                        
4086    {'billablePeriodStart': '2020-03-04', 'billabl...
4087    {'billablePeriodStart': '2020-03-04', 'billabl...
4088    {'billablePeriodStart': '2020-04-20', 'billabl...
4089    {'billablePeriodStart': '2020-11-12', 'billabl...
4090    {'billablePeriodStart': '2021-04-26', 'billabl...
Name: billablePeriod, Length: 4091, dtype: object

In [146]:
a = temp['careTeam'][0]
temp[temp['careTeam'].isna()].head()

Unnamed: 0,benefitBalance,billablePeriod,careTeam,contained,created,diagnosis,extension,facility,id,identifier,...,provider,resourceType,status,subType,supportingInfo,total,type,use,disposition,procedure
53,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '1992-04-26', 'start': '1992-04-26'}",,,2025-08-31T22:12:12+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,,dme--10000002647321,[{'system': 'https://bluebutton.cms.gov/resour...,...,"{'display': 'UNKNOWN', 'identifier': {'system'...",ExplanationOfBenefit,active,,[{'category': {'coding': [{'code': 'clmrecvdda...,"[{'amount': {'currency': 'USD', 'value': 0}, '...","{'coding': [{'code': '82', 'display': 'DMERC; ...",claim,1.0,
162,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '2013-10-20', 'extension': [{'url': 'h...",,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,snf--10000002863503,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'inpatient', 'system': 'h...",[{'category': {'coding': [{'code': 'clmrecvdda...,"[{'amount': {'currency': 'USD', 'value': 6757....","{'coding': [{'code': '20', 'display': 'Non swi...",claim,,"[{'date': '2013-10-20T00:00:00+00:00', 'proced..."
223,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '1996-11-24', 'start': '1996-11-24'}",,,2025-08-31T22:12:12+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,,dme--10000001933307,[{'system': 'https://bluebutton.cms.gov/resour...,...,"{'display': 'UNKNOWN', 'identifier': {'system'...",ExplanationOfBenefit,active,,[{'category': {'coding': [{'code': 'clmrecvdda...,"[{'amount': {'currency': 'USD', 'value': 0}, '...","{'coding': [{'code': '82', 'display': 'DMERC; ...",claim,1.0,
224,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '2019-03-10', 'extension': [{'url': 'h...",,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,snf--10000001933344,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'inpatient', 'system': 'h...",[{'category': {'coding': [{'code': 'clmrecvdda...,"[{'amount': {'currency': 'USD', 'value': 75780...","{'coding': [{'code': '20', 'display': 'Non swi...",claim,,"[{'date': '2019-03-10T00:00:00+00:00', 'proced..."
277,"[{'category': {'coding': [{'code': '1', 'displ...","{'end': '2015-08-25', 'extension': [{'url': 'h...",,"[{'active': True, 'id': 'provider-org', 'ident...",2025-08-31T22:12:12+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,snf--10000001981875,[{'system': 'https://bluebutton.cms.gov/resour...,...,{'reference': '#provider-org'},ExplanationOfBenefit,active,"{'coding': [{'code': 'inpatient', 'system': 'h...",[{'category': {'coding': [{'code': 'clmrecvdda...,"[{'amount': {'currency': 'USD', 'value': 18034...","{'coding': [{'code': '20', 'display': 'Non swi...",claim,,"[{'date': '2015-08-25T00:00:00+00:00', 'proced..."


In [147]:
a

[{'provider': {'identifier': {'type': {'coding': [{'code': 'npi',
       'display': 'National Provider Identifier',
       'system': 'http://hl7.org/fhir/us/carin-bb/CodeSystem/C4BBIdentifierType'}]},
    'value': '9999999698'}},
  'role': {'coding': [{'code': 'attending',
     'display': 'Attending',
     'system': 'http://hl7.org/fhir/us/carin-bb/CodeSystem/C4BBClaimCareTeamRole'}]},
  'sequence': 1},
 {'provider': {'identifier': {'type': {'coding': [{'code': 'UPIN',
       'display': "Medicare/CMS (formerly HCFA)'s Universal Physician Identification numbers",
       'system': 'http://terminology.hl7.org/CodeSystem/v2-0203'}]}}},
  'role': {'coding': [{'code': 'attending',
     'display': 'Attending',
     'system': 'http://hl7.org/fhir/us/carin-bb/CodeSystem/C4BBClaimCareTeamRole'}]},
  'sequence': 2},
 {'provider': {'identifier': {'type': {'coding': [{'code': 'npi',
       'display': 'National Provider Identifier',
       'system': 'http://hl7.org/fhir/us/carin-bb/CodeSystem/C4BBId

In [143]:
for i in temp['careTeam']:
  try:
    print(len(i))
  except TypeError:
    # A missing careTeam is NaN (a float), which has no len()
    print(i)
    break

4
4
[... 27 more lines of 4 omitted ...]
1
1
[... 22 more lines of 1 omitted ...]
nan
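
Printing one length per row gets long quickly. As a sketch of a more compact check, the lengths can be tallied instead. The `care_team` list below is a hypothetical stand-in for `temp['careTeam']`, and `float('nan')` plays the role of a missing entry.

```python
from collections import Counter

# Hypothetical stand-in for temp['careTeam']: lists of members, NaN when missing
care_team = [
    [{'sequence': 1}, {'sequence': 2}, {'sequence': 3}, {'sequence': 4}],
    [{'sequence': 1}],
    float('nan'),
    [{'sequence': 1}],
]

# Count list lengths; anything that is not a list is tallied as 'missing'
length_counts = Counter(
    len(entry) if isinstance(entry, list) else 'missing'
    for entry in care_team
)

print(length_counts)
```

With the real column, the same generator can iterate over `temp['careTeam']` directly, since iterating a pandas Series yields its values.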


In [130]:
for i in eob['billablePeriod']:
  if 'extension' in i.keys():
    print(len(i['extension']))

1
1
[... 497 more lines of 1 omitted ...]
1
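
Since every printed length is 1, indexing `item['extension'][0]` in `process_billablePeriod` looks safe for this dataset. The sketch below expresses that observation as a single check rather than a long printout. `billable_periods` is a hypothetical stand-in for `eob['billablePeriod']`, and the `code` value is a placeholder.

```python
# Hypothetical stand-in for eob['billablePeriod']
billable_periods = [
    {'start': '2011-08-07', 'end': '2011-08-07',
     'extension': [{'valueCoding': {'code': '3'}}]},
    {'start': '1992-04-26', 'end': '1992-04-26'},
]

# Collect the distinct extension lengths among rows that have an extension
extension_lengths = {
    len(period['extension'])
    for period in billable_periods
    if 'extension' in period
}

# Fails loudly if a future extract contains zero or several extensions
assert extension_lengths <= {1}, f"unexpected extension lengths: {extension_lengths}"
print(extension_lengths)
```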


In [60]:
# Processing Functions
def process_contained(item):
  # contained[0] is treated as the patient, contained[1] as the provider organization
  patient = item[0]
  provider = item[1]

  # birthDate is optional; the other patient fields still raise KeyError if absent
  patient_info = {
    'birthDate': patient.get('birthDate', np.nan),
    'gender': patient['gender'],
    'medicareNumber': patient['identifier'][0]['value'],
    'familyName': patient['name'][0]['family'],
    'givenName': patient['name'][0]['given'][0]
  }

  # Prefer the second provider identifier, fall back to the first, else NaN
  try:
    provider_info = {
      'providerIdType': provider['identifier'][1]['type']['coding'][0]['code'],
      'providerCode': provider['identifier'][1]['value']
    }
  except (KeyError, IndexError):
    try:
      provider_info = {
        'providerIdType': provider['identifier'][0]['type']['coding'][0]['code'],
        'providerCode': provider['identifier'][0]['value']
      }
    except (KeyError, IndexError, TypeError):
      provider_info = {
        'providerIdType': np.nan,
        'providerCode': np.nan
      }
  
  return patient_info, provider_info
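
The nested `try`/`except` blocks guard against keys or list positions that may be missing. One way to express the same idea more compactly is a small lookup helper that walks a path and returns a default on failure. `dig` is a hypothetical helper introduced for illustration, and the sample resource is invented.

```python
import math

def dig(obj, path, default=math.nan):
    """Follow a sequence of keys/indexes into nested dicts and lists."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            # Missing key, out-of-range index, or a non-container value
            return default
    return obj

# Hypothetical Patient-like resource without a birthDate
patient = {
    'gender': 'male',
    'identifier': [{'value': 'ABC123'}],
    'name': [{'family': 'Doe', 'given': ['John']}],
}

birth_date = dig(patient, ['birthDate'])
given_name = dig(patient, ['name', 0, 'given', 0])
second_id = dig(patient, ['identifier', 1, 'value'], default=None)

print(birth_date, given_name, second_id)
```

The trade-off is that a helper like this hides which field was missing, so it suits optional fields better than required ones.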

In [None]:
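# Split billablePeriod into start and end.
# json_normalize orders columns by key ('end' precedes 'start' here), so select by name.
period = pd.json_normalize(temp['billablePeriod'].tolist())
temp['billablePeriodStart'] = period['start']
temp['billablePeriodEnd'] = period['end']
temp = temp.drop('billablePeriod', axis=1)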

# Process contained column
patient = []
provider = []
for item in temp['contained']:
  pa, pr = process_contained(item)
  patient.append(pa)
  provider.append(pr)

patient = pd.DataFrame.from_records(patient)
provider = pd.DataFrame.from_records(provider)
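
`pd.json_normalize` names its columns after the dictionary keys, in the order the keys appear. The `billablePeriod` values shown in this notebook list `end` before `start`, so assigning the result positionally to `['billablePeriodStart', 'billablePeriodEnd']` can swap the two. Below is a minimal sketch of selecting by name instead; the two periods are hypothetical and only copy the key order of the real data.

```python
import pandas as pd

# Hypothetical periods with the same key order as the notebook data ('end' first)
periods = pd.Series([
    {'end': '2014-04-02', 'start': '2014-04-01'},
    {'end': '2014-11-18', 'start': '2014-11-17'},
])

flat = pd.json_normalize(periods.tolist())
print(list(flat.columns))

# Select by key name so the column order of `flat` does not matter
out = pd.DataFrame({
    'billablePeriodStart': flat['start'],
    'billablePeriodEnd': flat['end'],
})
print(out)
```

Some rows in the `temp.head()` output further down show a start date later than the end date, which would be consistent with such a swap.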

In [32]:
temp.columns

Index(['contained', 'created', 'diagnosis', 'extension', 'facility', 'id',
       'identifier', 'insurance', 'item', 'meta', 'patient', 'priority',
       'provider', 'resourceType', 'status', 'supportingInfo', 'total', 'type',
       'use', 'procedure', 'billablePeriodStart', 'billablePeriodEnd'],
      dtype='object')

In [11]:
temp.head()

Unnamed: 0,contained,created,diagnosis,extension,facility,id,identifier,insurance,item,meta,...,provider,resourceType,status,supportingInfo,total,type,use,procedure,billablePeriodStart,billablePeriodEnd
0,"[{'birthDate': '1944-05-25', 'extension': [{'u...",2025-08-31T21:23:49+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,f-LTEwMDAwMDAzNTUxNzU5,[{'system': 'https://bluebutton.cms.gov/resour...,[{'coverage': {'identifier': {'system': 'https...,[{'extension': [{'url': 'https://bluebutton.cm...,{'lastUpdated': '2023-05-11T21:17:37.364+00:00'},...,{'reference': '#provider-org'},Claim,active,[{'category': {'coding': [{'code': 'typeofbill...,"{'currency': 'USD', 'value': 119.62}","{'coding': [{'code': 'institutional', 'display...",claim,,2012-09-16,2012-09-16
1,"[{'birthDate': '1944-05-25', 'extension': [{'u...",2025-08-31T21:23:49+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,f-LTEwMDAwMDAzNTUxNzY0,[{'system': 'https://bluebutton.cms.gov/resour...,[{'coverage': {'identifier': {'system': 'https...,[{'extension': [{'url': 'https://bluebutton.cm...,{'lastUpdated': '2023-05-11T21:17:36.876+00:00'},...,{'reference': '#provider-org'},Claim,active,[{'category': {'coding': [{'code': 'typeofbill...,"{'currency': 'USD', 'value': 119.62}","{'coding': [{'code': 'institutional', 'display...",claim,,2013-06-11,2013-06-11
2,"[{'birthDate': '1944-05-25', 'extension': [{'u...",2025-08-31T21:23:49+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,f-LTEwMDAwMDAzNTUxNzY4,[{'system': 'https://bluebutton.cms.gov/resour...,[{'coverage': {'identifier': {'system': 'https...,[{'extension': [{'url': 'https://bluebutton.cm...,{'lastUpdated': '2023-05-11T21:17:37.098+00:00'},...,{'reference': '#provider-org'},Claim,active,[{'category': {'coding': [{'code': 'typeofbill...,"{'currency': 'USD', 'value': 119.62}","{'coding': [{'code': 'institutional', 'display...",claim,,2014-04-02,2014-04-01
3,"[{'birthDate': '1944-05-25', 'extension': [{'u...",2025-08-31T21:23:49+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,f-LTEwMDAwMDAzNTUxNzc2,[{'system': 'https://bluebutton.cms.gov/resour...,[{'coverage': {'identifier': {'system': 'https...,[{'extension': [{'url': 'https://bluebutton.cm...,{'lastUpdated': '2023-05-11T21:17:37.145+00:00'},...,{'reference': '#provider-org'},Claim,active,[{'category': {'coding': [{'code': 'typeofbill...,"{'currency': 'USD', 'value': 119.62}","{'coding': [{'code': 'institutional', 'display...",claim,,2014-11-18,2014-11-17
4,"[{'birthDate': '1944-05-25', 'extension': [{'u...",2025-08-31T21:23:49+00:00,[{'diagnosisCodeableConcept': {'coding': [{'co...,[{'url': 'https://bluebutton.cms.gov/resources...,{'extension': [{'url': 'https://bluebutton.cms...,f-LTEwMDAwMDAzNTUxNzc4,[{'system': 'https://bluebutton.cms.gov/resour...,[{'coverage': {'identifier': {'system': 'https...,[{'extension': [{'url': 'https://bluebutton.cm...,{'lastUpdated': '2023-05-11T21:17:37.099+00:00'},...,{'reference': '#provider-org'},Claim,active,[{'category': {'coding': [{'code': 'typeofbill...,"{'currency': 'USD', 'value': 119.62}","{'coding': [{'code': 'institutional', 'display...",claim,,2016-04-04,2016-04-04


In [3]:
%watermark

Last updated: 2025-09-01T21:36:35.726067+08:00

Python implementation: CPython
Python version       : 3.11.9
IPython version      : 9.5.0

Compiler    : MSC v.1938 64 bit (AMD64)
OS          : Windows
Release     : 10
Machine     : AMD64
Processor   : Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
CPU cores   : 20
Architecture: 64bit

