
jonahwinninghoff/Causal-Analysis-Nursing-Home


Causal Analysis: Effects of Nursing Home Facilities on Health Inspection Rating

Python 3.7 Maintenance License: GPL v3

Overview | Findings | Method | Datasets | Statistics | Wrangling | Modeling | Econometrics | Insights | Future | References

See the presentation (PPT or PDF)

See the complete report

OVERVIEW

Established by the United States Centers for Medicare and Medicaid Services (CMS) in 2009, the star rating is a system that helps family members decide in which nursing home their senior relatives will reside. Choosing the right facility is not a simple decision. During the COVID-19 pandemic, the system largely failed due to a lack of data auditing and self-report bias. For example, hypothesis testing is unable to confirm that the number of COVID-19 deaths at five-star facilities differs from that at one-star facilities at a significance level (Silver-Greenberg and Gebeloff, 2021). The premise of this study is that CMS may face a massive decline in public trust; for that specific reason, causal analysis is vital to addressing the problem.

KEY FINDINGS

The analysis shows that, based on health inspection ratings during the COVID-19 pandemic, licensed practical nurses perform well. However, when licensed practical nurses spend at least one hour per day with residents, the chances of a good health inspection rating decline. Another important finding is that having at least one family member involved in the decision-making process boosts health inspection ratings, yet family involvement is not common.

METHOD

To establish a causal relationship, two different frameworks are used: data science and econometrics. The data science approach begins by locating patterns in the data and then tests the model against held-out data. The role of econometrics is reversed: the econometric approach begins by writing a causal model of economic behavior and its underlying assumptions, and then determines whether the available data fit the causal model.

DATASETS

Two datasets obtained from CMS databases are the Minimum Data Set (MDS) Quality Measures and Provider Information datasets. The MDS dataset covers over 15,000 different providers across all 50 states and the District of Columbia. The target variable is the measure quality score. No variables hold predictive power for the measure quality score, but some features are useful for statistical insights. The second dataset contains more than 80 features with at least 15,000 entities; at least 70 of those features are usable for prediction.

STATISTICAL INFERENCE

The measure quality score in the MDS dataset is used to describe the overall rating of what each facility does well in association with every measure code. Each measure code is paired with a measure description explaining how the score is calculated. As indicated by the complete data quality report, the score is not normally distributed. The Empirical Cumulative Distribution Function (ECDF) is used to compare the empirical distribution with a theoretical Beta distribution and determine whether the empirical distribution is parametric. The alpha and beta parameters of the theoretical distribution are unknown. The identifications for both parameters are:

      α̂ = x̄ [ x̄(1 − x̄) / s² − 1 ],   β̂ = (1 − x̄) [ x̄(1 − x̄) / s² − 1 ],

where x̄ and s² are the sample mean and variance of the rescaled scores.

Because the continuous score lies in (0, 100) and the data are cross-sectional, the Beta distribution, a continuous analogue of the binomial, is used. Both parameters help build the theoretical distribution using a random generator (Sinharay, 2010). As the result shows, the empirical distribution is not consistent with the theoretical distribution, and the shape of the distribution is unlikely to change, owing to the Law of Large Numbers. The adjusted density plot is a demonstration.

[Figure: adjusted density plot of the empirical distribution versus the theoretical Beta distribution]
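A rough sketch of this comparison follows, using synthetic Beta-distributed scores with made-up parameters in place of the actual CMS data. The alpha and beta of the theoretical distribution are identified by the method of moments and checked against the ECDF:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stand-in for the measure quality scores, rescaled to (0, 1).
scores = rng.beta(2.0, 5.0, size=5_000)

# Method-of-moments identification of alpha and beta from the
# sample mean and variance.
mean, var = scores.mean(), scores.var()
common = mean * (1 - mean) / var - 1
alpha_hat, beta_hat = mean * common, (1 - mean) * common

# Compare the empirical CDF against the fitted theoretical Beta CDF.
x = np.sort(scores)
ecdf = np.arange(1, x.size + 1) / x.size
theoretical = stats.beta.cdf(x, alpha_hat, beta_hat)
max_gap = np.max(np.abs(ecdf - theoretical))  # Kolmogorov-style distance
print(f"alpha={alpha_hat:.2f} beta={beta_hat:.2f} max ECDF gap={max_gap:.3f}")
```

On real scores that are not Beta-distributed, the ECDF gap stays large as the sample grows, which is the inconsistency described above.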

The permutation test can be used instead to test the null hypothesis that two groups come from the same distribution. Since this hypothesis is tested against 13 different measure codes, the alpha level with Bonferroni correction is 0.38%. The purpose of the Bonferroni correction is to keep the chance of a Type I error minimal across the multiple comparisons.
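A minimal sketch of such a permutation test, on made-up score samples rather than the actual measure-code data:

```python
import numpy as np

rng = np.random.default_rng(42)

def permutation_test(group_a, group_b, n_perm=10_000, rng=rng):
    """Two-sample permutation test on the difference in means."""
    observed = group_a.mean() - group_b.mean()
    pooled = np.concatenate([group_a, group_b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = perm[: group_a.size].mean() - perm[group_a.size :].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return count / n_perm

# Hypothetical scores with and without one measure code.
with_code = rng.normal(60.0, 10.0, size=200)
without_code = rng.normal(50.0, 10.0, size=200)

p_value = permutation_test(with_code, without_code)
# Bonferroni correction for 13 measure codes: 0.05 / 13 ≈ 0.0038 (0.38%).
alpha = 0.05 / 13
print(p_value, p_value < alpha)
```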

[Figure: expected score differences by measure code, with confidence intervals]

The expected difference is computed by averaging the scores tied to a particular measure code and subtracting the average of the scores without that code. As the plot indicates, the confidence intervals are difficult to see due to small standard errors; the margins of error are between 0.001 and 0.002. The expected difference for the catheter-related measure code is lower than -40, while the expected difference for the depressive-related measure code is higher than 60. Intuitively, facilities are likely to perform poorly when treating residents who have catheters inserted and left in their bladders, and they do better treating residents with depressive symptoms. However, this plot does not establish a causal relationship, due to self-report bias (CMS, 2021).

DATA WRANGLING

As mentioned earlier, the Provider Information dataset has many features. The challenge with this dataset is that several features are redundant (some are perfectly correlated). The wrangling process therefore requires automated techniques to identify redundancies and leakages. Leakage occurs when features used during training will not exist in production, which causes predictive scores to be overestimated; this is a common mistake in the data science field. For example, using the total weighted health survey score as a feature to predict the health inspection rating is a form of leakage. The final step is to investigate whether any leaked features were overlooked. In total, 30 leaked features were found in the dataset, and the number of features was reduced accordingly.
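One simple way to automate the redundancy check is to flag feature pairs whose absolute correlation is (near) 1. The column names below are illustrative, not the dataset's actual fields:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical provider features; names are made up for the sketch.
df = pd.DataFrame({
    "total_nurse_hours": rng.normal(3.0, 0.5, 500),
    "num_certified_beds": rng.integers(20, 200, 500),
})
# A perfectly correlated (redundant) copy and an unrelated survey score.
df["total_nurse_minutes"] = df["total_nurse_hours"] * 60
df["weighted_health_survey_score"] = rng.normal(50, 5, 500)

# Flag pairs of features whose absolute correlation is (near) perfect.
corr = df.corr().abs()
redundant = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if corr.loc[a, b] > 0.999
]
print(redundant)  # -> [('total_nurse_hours', 'total_nurse_minutes')]
```

Leakage, by contrast, is a semantic problem (a feature computed from the outcome), so it needs the kind of manual review described above in addition to automated screening.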

PREDICTIVE MODELING

The objective for predictive modeling is that the model should explain at least 80% of the variance of the target variable, generalize well, and be well calibrated. To determine whether these criteria can be met, the dataset is separated into three sets: a training set, a validation set, and a testing set. The training set is used for the model to learn, the validation set is used for tuning hyperparameters, and the testing set serves as unseen data. Feature and model selection are undertaken to maximize model performance.
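A three-way split can be sketched with two consecutive calls to scikit-learn's `train_test_split`; the 60/20/20 ratio below is an assumption for illustration, not a ratio reported by the study:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

# First carve out the held-out test set, then split the remainder
# into training and validation sets (assumed 60/20/20).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # -> 600 200 200
```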

There are 36 features in total besides the target variable. Selecting features manually is infeasible since there are 2^36 = 68,719,476,736 possible combinations. Two automated methods are therefore used to optimize a model: the least absolute shrinkage and selection operator (lasso) and Bayes optimal feature selection.

Lasso regularization is a popular method that has proven successful in data science. It is robust to outliers, but its penalty is not differentiable everywhere because it is a piecewise function (Boehmke, 2021). The algorithm can identify unimportant predictors through sparsity (their coefficients are set to 0). Bayes optimal feature selection is different: it is an iterative process that balances exploration and exploitation using three functions.

  • Objective function: the true shape of this function is not observable; it only reveals individual data points, which can be expensive to compute.

  • Surrogate function: a probabilistic model built to exploit what is known; it is updated in light of new information.

  • Acquisition function: this function calculates the vector of hyperparameters most likely to yield a higher local maximum of the objective function, using the surrogate function.

One study shows that this approach can compete with many state-of-the-art methods (Ahmed et al., 2015).
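The three functions above can be sketched end to end with a Gaussian-process surrogate and an expected-improvement acquisition function. A toy one-dimensional objective stands in for the real, expensive validation score; none of the names or values below come from the project:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def objective(lam):
    """Stand-in for the expensive validation score as a function of lambda."""
    return -(lam - 0.3) ** 2  # maximized at lam = 0.3

# Start from a few random evaluations of the (expensive) objective.
X = rng.uniform(0, 1, size=(5, 1))
y = np.array([objective(x[0]) for x in X])

grid = np.linspace(0, 1, 200).reshape(-1, 1)
for _ in range(15):
    # Surrogate: a GP fitted to every point observed so far.
    gp = GaussianProcessRegressor(
        kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True
    ).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    # Acquisition: expected improvement over the current best point.
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = grid[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))

best_lam = X[np.argmax(y), 0]
print(f"best lambda ≈ {best_lam:.3f}")
```

Each iteration spends one expensive evaluation where the acquisition function expects the most improvement, which is the exploration/exploitation balance described above.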

As a result, the best lambda for lasso is 0.005, with an R-squared score of 32.19% on the validation set, while the R-squared score for Bayes optimal feature selection is 32.87%. However, lasso retains 16 features in total, fewer than the Bayesian approach (21 features in total).
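The feature counts come from lasso's sparsity: coefficients that do not earn their penalty are driven exactly to zero. A small synthetic sketch (hypothetical features and coefficients, not the project's):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# 36 hypothetical standardized features; only the first 5 truly matter.
X = rng.normal(size=(2000, 36))
true_coef = np.zeros(36)
true_coef[:5] = [2.0, -1.5, 1.0, 0.8, -0.6]
y = X @ true_coef + rng.normal(scale=1.0, size=2000)

# The L1 penalty drives unimportant coefficients exactly to zero.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(f"{selected.size} features kept out of 36")
```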

The best model is light gradient boosting, with an R-squared score of 43.47% on the validation set. This model was optimized using Bayesian search. The R-squared score on the testing set is 41.92%, and the root mean square error (RMSE) is 0.97. The RMSE indicates how large the error is on average across predicted and actual values; for example, if the model predicts a health inspection rating of 3.5, the actual rating may fall somewhere between 2.53 and 4.47.

ECONOMETRIC METHOD

To the extent that a causal relationship fails to be established, the previous discussion indicates the limits of the data science approach. These issues are addressed by adopting the econometric approach. Some suggest that the minimal requirement for this approach is a positive adjusted R-squared score. In a technical sense, the approach must contend with model misspecification. More importantly, the Gauss-Markov assumptions are fundamental to the econometric approach, though the causal model may depart from some of them, as follows:

  • A1: linearity in parameters
  • A2: no perfect collinearity
  • A3: zero conditional mean error
  • A4: homoskedasticity and no serial correlation
  • A5: normality of the error

For this analysis, the causal model does not follow the A1 and A4 assumptions. The endogenous variable (or target variable) is limited to between 1 and 5 points, making it a Limited Dependent Variable (LDV). A popular solution is a logarithmic transformation without Monte Carlo simulation, but this transformation is incorrect; the resulting misspecification is the retransformation bias addressed by Duan's smearing estimator (Goldstein, 2021).

The solution is to use a nonlinear estimator: the Probit model under a Quasi-Maximum Likelihood Estimation (QMLE) condition. This model is more robust under heteroskedasticity, where the variance of the residual term is not constant across the regression. The causal model is consistent and asymptotically normal, with a sandwich covariance in which V is not proportional to A. The assumption for consistency is:

      E(y | x) = Φ(xβ)

Because the Breusch-Pagan null hypothesis that the error variances are all equal is rejected at the significance level, this consistency assumption is relied upon: the error variance depends on at least one regressor. More importantly, the causal model is contemporaneously exogenous, a weaker version of strict exogeneity; serial correlation is not a concern since the Provider Information dataset is cross-sectional. One other point is worth mentioning: the abuse icon is a dummy variable used in the Wald test of exogeneity with IV Probit, and the null hypothesis that endogeneity does not exist fails to be rejected at the 5% alpha level.

The causal model contains Φ(·), the Probit link (the standard normal CDF), applied to cond·β, where cond is a vector of conditions that includes the level of quality care and competency the staff provide, the type of location area, the kind of environment residents live in, and the kind of vulnerability the residents have.

ACTIONABLE INSIGHTS

When the causal relationship is established, each coefficient of the exogenous variables is statistically different from zero at the 5% level using Huber-White (HC0 covariance type) robust standard errors. The pseudo-R-squared score is 32.54%, and the p-value for the log-likelihood ratio (LLR) test is below 5%; its null hypothesis, that the intercept-only model and the causal model fit equally well, is rejected. Every coefficient of the exogenous variables is an important contributor to this model except the intercept, whose p-value is 0.881.

[Figure: coefficient plot for the causal model]

When all variables are controlled and the causal model is specified, the plot shows that registered nurses and licensed practical nurses have significant effects on health inspection ratings, as expected. Interestingly, the positive effect of nurse aides being present in facilities is relatively smaller. When interaction terms are applied, a few interesting results are important to mention.

  • When hi interacts with nur_aidi, an increase in the number of hours nurse aides spend with residents each day has little to no impact on the health inspection rating.

  • Registered nurses interacting with hi have a positive impact on this rating at the significance level.

  • More surprisingly, when hi interacts with lpni, the coefficient is -0.0881 with a margin of error of 0.07; that is, an incremental increase in the number of hours licensed practical nurses spend has a negative impact on this rating at the significance level.

  • Having family members on the council has a positive impact on this rating, but when bedi interacts with this variable, the coefficient is -0.011 with a margin of error of 0.008, meaning that an increase in the number of certified beds leads to a decrease in this rating even with family members on the council.

Tied to the causal analysis, the measured quality for the depressive-related code is higher, while the score for the catheter-related code is lower. There is a potential linkage with the roles of nurses: registered nurses administer medication and treatments, while licensed practical nurses comfort residents and provide basic care, including the insertion of catheters.

FUTURE RESEARCH

  • As mentioned earlier, there is a potential linkage between the roles of nurses and the measure codes. However, this linkage is undetermined since the two datasets are incompatible for record linkage. A future study with a dataset linking the two could establish the causal relationship.

  • A CMS pilot program could be established either to provide financial aid for the LPN-to-RN career pathway or to promote awareness of existing programs. Randomized controlled trials (RCTs) should be used to identify the real impact of either approach on health inspection ratings over time.

  • An alternative study could use existing data to establish a causal relationship by comparing facilities whose staff have undergone LPN-to-RN training with those whose staff have not.

  • A direct solution to inflation in self-reported data is to adopt AI solutions, including Natural Language Processing. For example, if a data audit is undertaken to identify and adjust for inflation, this information can be used to train machine learning or deep learning models to make the data audit cost-effective.

REFERENCES

"Technical Details." Nursing homes including rehab services, the Centers for Medicare and Medicaid Services, Sep. 2021. https://data.cms.gov/provider-data/topics/nursing-homes/technical-details#health-inspections

Ahmed, S., Narasimhan, H., and Agarwal, S. "Bayes Optimal Feature for Supervised Learning with General Performance Measures." arXiv, 2015. http://auai.org/uai2015/proceedings/papers/72.pdf

Boehmke, B. “Regularized Regression.” UC Business Analytics R Programming Guide, University of Cincinnati, 2021. http://uc-r.github.io/regularized_regression#lasso

Datta, A., Fredrikson, M., Ko, G., Mardziel, P., and Sen, S. "Proxy Discrimination in Data-Driven Systems." Theory and Experiments with Learnt Programs, arXiv, 25 Jul. 2017. https://arxiv.org/abs/1707.08120

Freud, RJ., Wilson, WJ., and Mohr, DL. "Inferences for Two or More Means." Statistical Methods, Third Edition, Academic Press, 2010. https://doi.org/10.1016/B978-0-12-374970-3.00006-8

Goldstein, Nathan. "Lecture 1. Foundations of Microeconometrics." Microeconometrics, Zanvyl Krieger School of Arts and Sciences Johns Hopkins University, 2021.

Kaynig-Fattkau, V., Blitzstein, J., and Pfister, H. "CS109 - Data Science." Decision Trees, Harvard University, 2021. https://matterhorn.dce.harvard.edu/engage/player/watch.html?id=c22cbde8-94dd-42ad-86ef-091448ad02e4

Khandelwal, P. "Which algorithm takes the crown: Light GBM vs XGBOOST?" Analytics Vidhya, 12 Jun 2017. https://analyticsvidhya.com/blog/2017/06/which-algorithm-takes-the-crown-light-gbm-vs-xgboost/

Park, Y. and Ho JC. "PaloBoost." An Overfitting-robust TreeBoost with Out-of-Bag Sample Regularization Techniques, Emory University, 22 Jul 2018. http://arxiv-export-lb.library.cornell.edu/pdf/1807.08383

Silver-Greenberg, Jessica, and Robert Gebeloff. “Maggots, Rape and Yet Five Stars: How U.S. Ratings of Nursing Homes Mislead the Public.” How U.S. Ratings of Nursing Homes Mislead the Public, New York Times, 13 Mar. 2021. https://www.nytimes.com/2021/03/13/business/nursing-homes-ratings-medicare-covid.html

Sinharay, S. "Continuous Probability Distributions." The International Encyclopedia of Education, Elsevier Science, 2010. https://doi.org/10.1016/B978-0-08-044894-7.01720-6
