# Deep Claim: Payer Response Prediction from Claims Data with Deep Learning

```python
from IPython.display import IFrame
IFrame('https://arxiv.org/pdf/2007.06229.pdf', width=1200, height=550)
```

## Abstract
Each year, almost 10% of claims are denied by payers (i.e., health insurance plans). Given the cost of recovering these denials and underpayments, predicting payer response (likelihood of payment) from claims data with a high degree of accuracy and precision is anticipated to improve healthcare staff productivity and drive a better patient financial experience and satisfaction in the revenue cycle (Barkholz, 2017).

However, constructing advanced predictive analytics models has been considered challenging for the last twenty years. To address this, we propose __a (low-level) context-dependent compact representation of patients’ historical claim records by effectively learning complicated dependencies in the (high-level) claim inputs__.

Built on this new latent representation, we demonstrate that __a deep learning-based framework__, Deep Claim, can accurately predict various responses from multiple payers using 2,905,026 de-identified claims from two US health systems. Deep Claim’s improvements over carefully chosen baselines in predicting claim denials are most pronounced as a 22.21% relative recall gain (at 95% precision) on Health System A, which implies that Deep Claim can find 22.21% more denials than the best baseline system.

## Deep Claim
We propose Deep Claim as a neural network-based system to predict whether, when, and how much a payer will pay for each claim. 

Deep Claim takes the claims data composed of 
* demographic information, 
* diagnoses, 
* treatments, and 
* billed amounts

as an input. 

Given that, Deep Claim predicts 
* the first response date, 
* denial probability, 
* denial reason codes with probability, and 
* questionable fields in the claim. 
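To make this input/output contract concrete, here is a minimal Python sketch of the two record types. All class and field names (`ClaimInput`, `PayerResponsePrediction`, `denial_reason_probs`, and so on) are hypothetical illustrations rather than the paper's actual schema, and only a few representative input fields are shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class ClaimInput:
    """Hypothetical subset of a raw claim (not the paper's schema)."""
    patient_age: int
    subscriber_gender: str
    diagnoses: List[str]    # a claim may carry several diagnosis codes
    procedures: List[str]   # ... and several procedure (treatment) codes
    total_charges: float    # billed amount in dollars


@dataclass
class PayerResponsePrediction:
    """Hypothetical container for the four predicted outputs."""
    first_response_date: date
    denial_probability: float
    denial_reason_probs: Dict[str, float]  # denial reason code -> probability
    questionable_fields: List[str]         # names of input fields flagged as suspicious

    def __post_init__(self) -> None:
        # A probability outside [0, 1] indicates a bug upstream.
        if not 0.0 <= self.denial_probability <= 1.0:
            raise ValueError("denial_probability must lie in [0, 1]")
```

Plain dataclasses keep the sketch dependency-free; the multi-valued `diagnoses` and `procedures` fields are what later become sub-context vectors of tokens.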

In this section, we describe the Deep Claim model in detail, with the complete architecture illustrated in Figure 1.

Figure 1. Architecture of Deep Claim for payer response prediction, as described in Section 3.

### 3.1. Claims Input Representation 
The claim vector we create from the raw claim is composed of a large number of variables (i.e., features), including:
* subscriber gender, 
* an individual relationship code, 
* a payer state, 
* the duration of the corresponding service, 
* the subscriber’s age,
* the patient’s age, 
* a payer identifier, 
* the total charges, 
* the service date, and 
* the claim transmission date. 

The claim vector also indicates the procedures performed and the diagnoses received. Each single-valued feature is assigned a single unique token, whereas procedures and diagnoses (which can have multiple values) are tokenized and mapped to sub-context vectors of tokens. Less frequent tokens are mapped to an out-of-vocabulary (OOV) token (for example, a procedure token that appears fewer than 500 times in the dataset).
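The OOV rule can be sketched as a frequency-thresholded vocabulary. This is a minimal illustration assuming that codes with count ≥ `min_count` keep their own token and everything else (including codes never seen during vocabulary building) shares one OOV index; the function names and the `<OOV>` string are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

OOV = "<OOV>"  # hypothetical name for the shared out-of-vocabulary token


def build_vocab(code_lists: Iterable[List[str]], min_count: int = 500) -> Dict[str, int]:
    """Give every code seen at least `min_count` times its own index; index 0 is OOV."""
    counts = Counter(code for codes in code_lists for code in codes)
    vocab = {OOV: 0}
    for code, n in sorted(counts.items()):  # sorted for deterministic indices
        if n >= min_count:
            vocab[code] = len(vocab)
    return vocab


def tokenize(codes: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map a claim's procedure (or diagnosis) codes to a sub-context vector of token ids."""
    return [vocab.get(code, vocab[OOV]) for code in codes]


# Toy usage with a threshold of 2 instead of 500.
claims = [["P001", "P002"], ["P001"], ["P001", "P003"]]
vocab = build_vocab(claims, min_count=2)
print(vocab)                               # {'<OOV>': 0, 'P001': 1}
print(tokenize(["P001", "P002"], vocab))   # [1, 0]
```

Separate vocabularies would be built for procedures and for diagnoses, since they form different contextual categories.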


We also normalize numeric values by discretizing them into tokens:
* The date is mapped to tokens for years, months, and days.  
* The charge amount in dollars is quantized into thousands, hundreds, tens, and ones. 
* The patient’s age is discretized into years.
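A minimal sketch of these three discretizations follows. The token-string format is hypothetical, and the charge rule is one plausible reading of "thousands, hundreds, tens, and ones": it assumes non-negative amounts, drops cents, and lets the thousands token absorb everything at or above $1,000.

```python
from datetime import date
from typing import List


def date_tokens(d: date, prefix: str) -> List[str]:
    # One token each for the year, month, and day of a date field.
    return [f"{prefix}_year_{d.year}", f"{prefix}_month_{d.month}", f"{prefix}_day_{d.day}"]


def charge_tokens(amount: float) -> List[str]:
    # Assumption: non-negative amount; cents are truncated away.
    dollars = int(amount)
    return [
        f"charge_thousands_{dollars // 1000}",    # everything >= $1,000 lands here
        f"charge_hundreds_{dollars // 100 % 10}",
        f"charge_tens_{dollars // 10 % 10}",
        f"charge_ones_{dollars % 10}",
    ]


def age_token(age_years: float) -> str:
    # Discretize age to whole years.
    return f"patient_age_{int(age_years)}"


print(date_tokens(date(2019, 7, 4), "service"))
print(charge_tokens(1234.56))
print(age_token(54.9))
```

Splitting the charge into digit tokens keeps the vocabulary small (at most ten values per place below the thousands) while still letting the model distinguish magnitudes.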

After defining the features, we categorize them into three contextual categories: 
* procedure, 
* diagnosis, and 
* other features regarding the claim, such as demographic patient information.   

Procedure and diagnosis token vectors can be expressed as normalized count vectors (e.g., relative frequencies) $x_c$ and $x_d$, whose lengths equal the number of possible procedure and diagnosis tokens, respectively. All the other single unique feature tokens are collected in $x_o$, a binary vector whose length is the total number of single unique tokens. Piecing these together converts a single claim into the concatenated vector $x = (x_c, x_d, x_o)$. Typically, $x$ has a length in the thousands and is extremely sparse.
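The construction of $x = (x_c, x_d, x_o)$ can be sketched with NumPy. This assumes relative frequency as the normalization (the text gives it only as an example), token ids that are already integer indices below each vocabulary size, and dense arrays for readability; a real pipeline would likely use a sparse format. The helper names are hypothetical.

```python
from typing import List

import numpy as np


def count_vector(token_ids: List[int], size: int) -> np.ndarray:
    # Relative-frequency vector: token counts divided by their total (all zeros if empty).
    v = np.bincount(np.asarray(token_ids, dtype=np.int64), minlength=size).astype(np.float64)
    total = v.sum()
    return v / total if total > 0 else v


def binary_vector(token_ids: List[int], size: int) -> np.ndarray:
    # 1.0 at every single-valued feature token present in the claim.
    v = np.zeros(size, dtype=np.float64)
    v[np.asarray(token_ids, dtype=np.int64)] = 1.0
    return v


def claim_vector(proc_ids: List[int], diag_ids: List[int], other_ids: List[int],
                 n_proc: int, n_diag: int, n_other: int) -> np.ndarray:
    x_c = count_vector(proc_ids, n_proc)    # procedures
    x_d = count_vector(diag_ids, n_diag)    # diagnoses
    x_o = binary_vector(other_ids, n_other) # all other features
    return np.concatenate([x_c, x_d, x_o])


# Procedure token 1 appears twice and token 2 once; one diagnosis; one other feature.
x = claim_vector([1, 1, 2], [0], [3], n_proc=4, n_diag=2, n_other=5)
print(x.shape)  # (11,)
```

With realistic vocabulary sizes in the thousands and only a handful of nonzero entries per claim, this illustrates why $x$ is extremely sparse.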

### 3.2. Claims Embedding Network
Unlike natural language sentences, the extremely sparse vector $x$ is an unordered collection of medical events and aggregations of diverse code types that encapsulates various aspects of complicated dependencies. It is therefore not straightforward to apply off-the-shelf NLP embedding techniques to compress this sparse vector into a fixed-size latent vector $h$ (of size 94 in our experiments). Instead, we leverage the gating mechanism, which is essential for recurrent neural networks (Hochreiter & Schmidhuber, 1997; Cho et al., 2014) and bilinear models (Tenenbaum & Freeman, 2000; Kim et al., 2017) that provide richer representations than linear models. Specifically, we propose the following novel method to learn an effective embedding mapping $H: x \to h$ for each claim: we activate a gate over each context sub-vector to extract inter-component dependencies within each category, and further combine the sub-vectors to learn intra-dependencies among them by taking the pairwise inner product in the latent low-dimensional space.

First, we convert each sub-category vector into a lower-dimensional context vector

$$f^{(0,i)} = \sigma\left(W_f^{(0,i)} x_i + b_f^{(0,i)}\right), \qquad i \in \{c, d, o\},$$

where $W_f^{(0,i)}$ is the low-dimensional embedding matrix and $\sigma$ is the ReLU function (Nair & Hinton, 2010). Then, the context vector modulated by the gates is

$$c^{(0,i)} = f^{(0,i)} \odot g\left(W_g^{(0,i)} x_i + b_g^{(0,i)}\right),$$

where $g$ is the softmax function and $\odot$ denotes element-wise multiplication. These gate activations over each sub-vector can be viewed as dynamic importance scores of the (high-level) input features, enabling learnable feature selection and simultaneous dimensionality reduction while handling the sparsity of each sub-vector. To further deepen the hierarchy of gated layers, like a probabilistic decision tree, we add one more set of gated networks for $c^{(1,c)}$ in the