---

## 01. Explanations for an observation with a good (typical) prediction

---

This notebook explains the final model's prediction for an observation on which the model performed well.

---

### Import packages

In [119]:
import functions as fun
import pandas as pd
import numpy as np
import pickle
import matplotlib.pyplot as plt
import dalex as dx
import lime
import lime.lime_tabular

from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.inspection import permutation_importance

### Load the model to be explained

In [120]:
# Raw string keeps the Windows backslashes literal; the context manager closes the file.
with open(r".\..\models\XGB\MEPS_xgb_model_final_v2.pickle", "rb") as pickle_in:
    reg_xgb = pickle.load(pickle_in)


Trying to unpickle estimator OneHotEncoder from version 0.22.2 when using version 0.22.1. This might lead to breaking code or invalid results. Use at your own risk.


Trying to unpickle estimator Pipeline from version 0.22.2 when using version 0.22.1. This might lead to breaking code or invalid results. Use at your own risk.


Trying to unpickle estimator StandardScaler from version 0.22.2 when using version 0.22.1. This might lead to breaking code or invalid results. Use at your own risk.


Trying to unpickle estimator ColumnTransformer from version 0.22.2 when using version 0.22.1. This might lead to breaking code or invalid results. Use at your own risk.


Trying to unpickle estimator DummyRegressor from version 0.22.2 when using version 0.22.1. This might lead to breaking code or invalid results. Use at your own risk.


Trying to unpickle estimator DecisionTreeRegressor from version 0.22.2 when using version 0.22.1. This might lead to breaking code or invalid results. Use at your 
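The warnings above indicate that the pipeline was pickled under a different scikit-learn version than the one used to load it. One possible way to make such a mismatch explicit is to store the library version next to the model and compare it on load. The sketch below uses only the standard library; `dump_with_version` and `load_checked` are hypothetical helpers, not part of this project or of scikit-learn, and in practice the version string could come from `sklearn.__version__`.

```python
import pickle


def dump_with_version(model, library_version):
    """Serialize the model together with the library version used to fit it."""
    return pickle.dumps({"library_version": library_version, "model": model})


def load_checked(payload, runtime_version):
    """Deserialize and fail loudly if the saved and running versions differ."""
    bundle = pickle.loads(payload)
    saved_version = bundle["library_version"]
    if saved_version != runtime_version:
        raise RuntimeError(
            f"Model saved with version {saved_version}, "
            f"but running version is {runtime_version}."
        )
    return bundle["model"]


# A plain dict stands in for a fitted estimator in this illustration.
payload = dump_with_version({"weights": [0.1, 0.2]}, library_version="0.22.2")
try:
    load_checked(payload, runtime_version="0.22.1")
except RuntimeError as err:
    print(err)
```

Raising an error instead of a warning is a design choice: it trades convenience for the guarantee that explanations are never computed on a silently altered model.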

### Read data

In [121]:
path = r".\..\data\MEPS_data_preprocessed"
X_train, y_train = fun.read_x_y(path + "_train.csv", "HEALTHEXP")
X_test, y_test = fun.read_x_y(path + "_test.csv", "HEALTHEXP")

In [122]:
raw_test_data = pd.read_csv(path + "_test.csv")
X_test_raw = raw_test_data.drop("HEALTHEXP", axis = 1)
y_test_raw = raw_test_data["HEALTHEXP"]

### Find and explore the observation with the best prediction

In [123]:
y_pred_test = reg_xgb.predict(X_test)
obs_idx = fun.find_nth_obs_idx(y_test, y_pred_test, 0)


Given feature/column names or counts do not match the ones for the data given during fit. This will fail from v0.24.
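The helper `fun.find_nth_obs_idx` comes from the project's `functions` module, and its source is not shown in this notebook. Since the actual and predicted values printed in the following cells are almost identical, passing `0` appears to select the observation with the smallest error. A minimal sketch of such a helper, under that assumption (the name `find_nth_obs_idx_sketch` and the choice of absolute error are hypothetical), could look like this:

```python
def find_nth_obs_idx_sketch(y_true, y_pred, n):
    """Return the position of the observation with the n-th smallest absolute error."""
    abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    # Positions sorted by ascending absolute error; n = 0 is the best prediction.
    order = sorted(range(len(abs_errors)), key=abs_errors.__getitem__)
    return order[n]


# Toy values for illustration only (not taken from the MEPS data).
y_true = [6.3, 5.1, 7.8, 4.0]
y_pred = [6.0, 5.1001, 9.0, 4.5]
best_idx = find_nth_obs_idx_sketch(y_true, y_pred, 0)
worst_idx = find_nth_obs_idx_sketch(y_true, y_pred, -1)
print(best_idx, worst_idx)
```

With this ordering, a negative `n` counts from the worst prediction, which would make it easy to reuse the same helper for the notebooks on poorly predicted observations.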



In [124]:
# Real value (target transformed with log base 3)
y_test[obs_idx]

6.347592404936993

In [125]:
# Real value (raw)
y_test_raw[obs_idx]

1068

In [126]:
# Predicted value
y_pred_test[obs_idx]

6.34754898371872

In [127]:
# Observation
obs = X_test.iloc[[obs_idx], :]
obs

Unnamed: 0,PANEL,REGION,AGE31X,GENDER,RACE3,MARRY31X,EDRECODE,FTSTU31X,ACTDTY31,HONRDC31,...,ADSMOK42,PCS42,MCS42,K6SUM42,PHQ242,EMPST31,POVCAT15,INSCOV15,INCOME_M,PERSONWT
115,19,3,16,0.0,0.0,5,1,-1,2,3,...,-1,-1.0,-1.0,-1,-1,4,1,1,0.0,15264.069192


**Meaning of variables and their values:** <br/>
* REGION = 3.0 	 - 	 south
* AGE31X = 16.0 	 - 	 16 years old
* GENDER = 0.0 	 - 	 female
* RACE3 = 0.0 	 - 	 
* MARRY31X = 5.0 	 - 	 never married
* EDRECODE = 1.0 	 - 	 8th grade or less
* FTSTU31X = -1.0 	 - 	 student status - inapplicable
* ACTDTY31 = 2.0 	 - 	 not on military full-time active duty
* HONRDC31 = 3.0 	 - 	 honorable discharge from military - 16 or younger - inapplicable
* RTHLTH31 = 5.0 	 - 	 perceived health status - poor
* MNHLTH31 = 4.0 	 - 	 perceived mental health status - fair
* HIBPDX = 1.0 	 - 	 high blood pressure diagnosed
* CHDDX = 1.0 	 - 	 coronary heart disease diagnosed
* ANGIDX = 2.0 	 - 	 angina wasn't diagnosed
* MIDX = 2.0 	 - 	 heart attack wasn't diagnosed
* OHRTDX = 2.0 	 - 	 any other heart diseases weren't diagnosed
* STRKDX = 2.0 	 - 	 stroke wasn't diagnosed
* EMPHDX = 1.0 	 - 	 emphysema diagnosed
* CHBRON31 = 1.0 	 - 	 chronic bronchitis diagnosed
* CHOLDX = 1.0 	 - 	 high cholesterol diagnosed
* CANCERDX = 1.0 	 - 	 cancer diagnosed
* DIABDX = 2.0 	 - 	 diabetes wasn't diagnosed
* JTPAIN31 = 1.0 	 - 	 joint pain last 12 months diagnosed
* ARTHDX = 1.0 	 - 	 arthritis diagnosed
* ARTHTYPE = 1.0 	 - 	 type of arthritis - rheumatoid
* ASTHDX = 2.0 	 - 	 asthma wasn't diagnosed
* ADHDADDX = -1.0 	 - 	 ADHD or ADD diagnosis - inapplicable
* PREGNT31 = -1.0 	 - 	 pregnant - inapplicable
* WLKLIM31 = 1.0 	 - 	 has limitation in physical functioning
* ACTLIM31 = 1.0 	 - 	 has any other limitation work/house work/school
* SOCLIM31 = 1.0 	 - 	 has social limitation
* COGLIM31 = 1.0 	 - 	 has cognitive limitation
* DFHEAR42 = 2.0 	 - 	 no serious difficulty hearing
* DFSEE42 = 1.0 	 - 	 has serious difficulty seeing or wears glasses
* ADSMOK42 = -1.0 	 - 	 currently smokes - inapplicable
* PCS42 = -1.0 	 - 	 saq:phy component summry sf-12v2 imputed - inapplicable
* MCS42 = -1.0 	 - 	 mnt component summry sf-12v2 imputed - inapplicable
* K6SUM42 = -1.0 	 - 	 overall rating of feelings - inapplicable (last 30 days)
* PHQ242 = -1.0 	 - 	 overall rating of feelings - inapplicable (last 2 weeks)
* EMPST31 = 4.0 	 - 	 employment status - not employed
* POVCAT15 = 1.0 	 - 	 family income as % of poverty line - poor/negative
* INSCOV15 = 1.0 	 - 	 health insurance coverage indicator 2015 - any private
* INCOME_M = 0.0 	 - 	 person total income = 0.0

### Explanations

#### 1) Break Down / Shap

In [128]:
import dalex as dx

In [155]:
X_train

Unnamed: 0,PANEL,REGION,AGE31X,GENDER,RACE3,MARRY31X,EDRECODE,FTSTU31X,ACTDTY31,HONRDC31,...,ADSMOK42,PCS42,MCS42,K6SUM42,PHQ242,EMPST31,POVCAT15,INSCOV15,INCOME_M,PERSONWT
0,20,1,49,1.0,0.0,1,16,-1,2,2,...,1,51.29,59.04,1,0,1,3,1,8400.0,6156.790949
1,19,3,43,1.0,0.0,4,14,-1,2,2,...,1,19.36,31.90,14,2,4,1,2,0.0,23114.487222
2,19,1,75,1.0,0.0,1,13,-1,4,2,...,2,25.23,45.46,6,2,4,4,1,22619.0,17966.491961
3,20,1,26,0.0,1.0,5,13,-1,2,2,...,2,49.13,63.97,4,0,1,3,1,20000.0,4175.967957
4,20,3,43,1.0,0.0,9,14,-1,2,2,...,-1,-1.00,-1.00,-1,-1,1,4,1,58000.0,8877.535274
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12840,20,1,46,0.0,0.0,5,15,-1,2,2,...,-1,-1.00,-1.00,-1,-1,1,5,1,204359.0,15195.760703
12841,19,3,7,0.0,0.0,6,1,-1,3,3,...,-1,-1.00,-1.00,-1,-1,-1,4,1,0.0,15065.800371
12842,20,2,73,1.0,0.0,2,15,-1,4,2,...,2,56.95,52.34,3,0,4,5,2,63726.0,22527.032283
12843,20,4,30,0.0,0.0,5,15,-1,2,2,...,2,62.09,50.54,5,0,4,1,3,0.0,6208.758746


In [148]:
# Cast coded columns to strings so dalex can profile them as categorical variables.
for col in ['RTHLTH31', 'ADSMOK42', 'INSCOV15', 'ACTDTY31', 'MNHLTH31', 'PREGNT31']:
    X_train[col] = X_train[col].astype(str)

In [149]:
exp = dx.Explainer(reg_xgb, X_train, y_train, label = "MEPS XGB Pipeline")

Preparation of a new explainer is initiated

  -> label             : MEPS XGB Pipeline
  -> data              : 12845 rows 45 cols
  -> target variable   : Argument 'y' was a pandas.Series. Converted to a numpy.ndarray.
  -> target variable   : 12845 values
  -> predict function  : <function yhat_default at 0x00000184D0023EA0> will be used
  -> predicted values  : min = 1.1638865003531105, mean = 5.758212948154973, max = 10.352030167136506
  -> residual function : difference between y and yhat
  -> residuals         : min = -8.009123652384009, mean = -0.04679928495844871, max = 5.9315396363505295
  -> model_info        : package sklearn

A new explainer has been created!


In [151]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12845 entries, 0 to 12844
Data columns (total 45 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   PANEL     12845 non-null  int64  
 1   REGION    12845 non-null  int64  
 2   AGE31X    12845 non-null  int64  
 3   GENDER    12845 non-null  float64
 4   RACE3     12845 non-null  float64
 5   MARRY31X  12845 non-null  int64  
 6   EDRECODE  12845 non-null  int64  
 7   FTSTU31X  12845 non-null  int64  
 8   ACTDTY31  12845 non-null  object 
 9   HONRDC31  12845 non-null  int64  
 10  RTHLTH31  12845 non-null  object 
 11  MNHLTH31  12845 non-null  object 
 12  HIBPDX    12845 non-null  int64  
 13  CHDDX     12845 non-null  int64  
 14  ANGIDX    12845 non-null  int64  
 15  MIDX      12845 non-null  int64  
 16  OHRTDX    12845 non-null  int64  
 17  STRKDX    12845 non-null  int64  
 18  EMPHDX    12845 non-null  int64  
 19  CHBRON31  12845 non-null  int64  
 20  CHOLDX    12845 non-null  in

In [153]:
pdp_cat = exp.model_profile(type = 'partial', variable_type='categorical',
                            variables = ["ADSMOK42","MNHLTH31","PREGNT31"])
pdp_cat.result['_label_'] = 'pdp'

ale_cat = exp.model_profile(type = 'accumulated', variable_type='categorical', 
                            variables = ["ADSMOK42","MNHLTH31",'PREGNT31'])
ale_cat.result['_label_'] = 'ale'

Calculating ceteris paribus!: 100%|████████████████████████████████████████████████████| 45/45 [00:05<00:00,  8.73it/s]

Calculating accumulated dependency!: 100%|█████

In [154]:
fig5 = ale_cat.plot(pdp_cat, size = 4, facet_ncol=1)
fig5.write_image("06_dataset_ADSMOK42_pdp_ale.png", scale = 2)
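The partial dependence profile requested with `type = 'partial'` can be thought of as follows: for every candidate value of a variable, that value is written into all observations and the model's predictions are averaged. The sketch below illustrates the idea in plain Python on a toy model; `partial_dependence`, `toy_predict` and the toy rows are hypothetical and do not reproduce dalex's implementation or the MEPS data.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    profile = {}
    for value in grid:
        # Copy each row and overwrite only the feature of interest.
        modified = [{**row, feature: value} for row in rows]
        predictions = [predict(row) for row in modified]
        profile[value] = sum(predictions) / len(predictions)
    return profile


def toy_predict(row):
    # Hypothetical model: an age effect plus a fixed bump for category "1".
    return 0.05 * row["age"] + (1.0 if row["smoker"] == "1" else 0.0)


rows = [
    {"age": 20, "smoker": "1"},
    {"age": 40, "smoker": "2"},
    {"age": 60, "smoker": "2"},
]
pdp = partial_dependence(toy_predict, rows, "smoker", ["1", "2"])
print(pdp)
```

Because every row receives every category, the profile can include combinations that never occur in the data, which is one reason to compare it with the accumulated profile.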

In [73]:
pdp_cat = exp.model_profile(type = 'partial', variable_type='numerical',
                            variables = ["AGE31X"])
pdp_cat.result['_label_'] = 'pdp'

ale_cat = exp.model_profile(type = 'accumulated', variable_type='numerical', 
                            variables = ["AGE31X"])
ale_cat.result['_label_'] = 'ale'

Calculating ceteris paribus!: 100%|████████████████████████████████████████████████████| 45/45 [00:05<00:00,  8.09it/s]

Calculating accumulated dependency!: 100%|█████

In [76]:
fig = ale_cat.plot(pdp_cat, size = 4, facet_ncol=1)

In [77]:
fig.write_image("06_AGE_pdp_ale.png", scale = 2)
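For a numerical variable such as `AGE31X`, an accumulated profile differs from partial dependence in that it only moves each observation within its own neighbourhood: the variable's range is split into bins, the average change in prediction across each bin is computed from the observations that fall into it, and these local changes are summed up. The sketch below is a textbook-style first-order ALE on a toy model, written under these assumptions; `accumulated_local_effects` and `toy_predict` are hypothetical names, and dalex's own computation may differ in its details.

```python
def accumulated_local_effects(predict, rows, feature, edges):
    """Centered first-order ALE of `feature` at the ascending bin `edges`."""
    effects = [0.0]  # uncentered effect at the first edge
    counts = []      # number of rows per bin, used for centering
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Rows in (lo, hi]; the first bin also includes its left edge.
        in_bin = [
            r for r in rows
            if lo < r[feature] <= hi or (i == 0 and r[feature] == lo)
        ]
        # Mean prediction change when moving only `feature` from lo to hi.
        diffs = [
            predict({**r, feature: hi}) - predict({**r, feature: lo})
            for r in in_bin
        ]
        local = sum(diffs) / len(diffs) if diffs else 0.0
        effects.append(effects[-1] + local)
        counts.append(len(in_bin))
    # Center so that the count-weighted average effect over the bins is zero.
    mean_effect = sum(
        n * (effects[i] + effects[i + 1]) / 2 for i, n in enumerate(counts)
    ) / sum(counts)
    return [e - mean_effect for e in effects]


def toy_predict(row):
    # Hypothetical additive model with two correlated features.
    return 2.0 * row["x"] + 3.0 * row["z"]


rows = [
    {"x": 0.5, "z": 1.0},
    {"x": 1.5, "z": 2.0},
    {"x": 1.7, "z": 2.0},
    {"x": 2.5, "z": 3.0},
]
ale = accumulated_local_effects(toy_predict, rows, "x", [0.0, 1.0, 2.0, 3.0])
print(ale)
```

In this toy case the profile rises by 2 per unit of `x`, which is the variable's own coefficient: the correlated feature `z` does not leak into the effect because each row keeps its own `z` while `x` is moved across a single bin.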

In [78]:
pdp_cat = exp.model_profile(type = 'partial', variable_type='categorical',
                            variables = ['RTHLTH31'])
pdp_cat.result['_label_'] = 'pdp'

ale_cat = exp.model_profile(type = 'accumulated', variable_type='categorical', 
                            variables = ['RTHLTH31'])
ale_cat.result['_label_'] = 'ale'

Calculating ceteris paribus!: 100%|████████████████████████████████████████████████████| 45/45 [00:05<00:00,  8.57it/s]

Calculating accumulated dependency!: 100%|█████

In [79]:
fig2 = ale_cat.plot(pdp_cat, size = 4, facet_ncol=1)

In [80]:
fig2.write_image("06_RTHLTH31_pdp_ale.png", scale = 2)

In [81]:
pdp_cat = exp.model_profile(type = 'partial', variable_type='numerical',
                            variables = ["PCS42"])
pdp_cat.result['_label_'] = 'pdp'

ale_cat = exp.model_profile(type = 'accumulated', variable_type='numerical', 
                            variables = ["PCS42"])
ale_cat.result['_label_'] = 'ale'

In [82]:
fig3 = ale_cat.plot(pdp_cat, size = 4, facet_ncol=1)
fig3.write_image("06_PCS42_pdp_ale.png", scale = 2)

In [90]:
pdp_cat = exp.model_profile(type = 'partial', variable_type='categorical',
                            variables = ["MNHLTH31"])
pdp_cat.result['_label_'] = 'pdp'

ale_cat = exp.model_profile(type = 'accumulated', variable_type='categorical', 
                            variables = ["MNHLTH31"])
ale_cat.result['_label_'] = 'ale'

Calculating ceteris paribus!: 100%|████████████████████████████████████████████████████| 45/45 [00:05<00:00,  8.55it/s]

Calculating accumulated dependency!: 100%|███████████████████████████████████████████████| 1/1 [00:00<00:00,  3.43it/s]


In [92]:
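# Overlay the partial-dependence profile on the accumulated-local profile for MNHLTH31.
# show=False makes plot() return the plotly Figure (documented dalex behaviour) so it can be saved.
fig4 = ale_cat.plot(pdp_cat, size=4, facet_ncol=1, show=False)
fig4.write_image("06_MNHLTH31_pdp_ale.png", scale=2)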
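
The scikit-learn message "Given feature/column names or counts do not match the ones for the data given during fit" is emitted once per prediction call, so it floods the output of every profile computation. A minimal sketch of one way to hide only that message while the profiles are computed is given below. The helper name `quiet_column_mismatch` and the constant `COLUMN_MISMATCH_PREFIX` are hypothetical and not part of this notebook; the prefix is copied from the warning text in the outputs. This only hides the symptom: the column mismatch between the fitted pipeline and the data passed to it should probably still be investigated.

```python
import warnings
from contextlib import contextmanager

# Assumption: prefix copied from the warning text shown in the cell outputs.
COLUMN_MISMATCH_PREFIX = "Given feature/column names or counts do not match"


@contextmanager
def quiet_column_mismatch():
    """Temporarily ignore warnings whose message starts with the prefix above.

    Every other warning is still reported, and the previous warning filters
    are restored when the block exits.
    """
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings("ignore", message=COLUMN_MISMATCH_PREFIX)
        yield
```

The cell that computes `pdp_cat` and `ale_cat` could then be wrapped in `with quiet_column_mismatch():` so that only the progress bars remain in its output.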

In [None]:
X_train['ADSMOK42']


Given feature/column names or counts do not match the ones for the data given during fit. This will fail from v0.24.



Calculating ceteris paribus!: 100%|████████████████████████████████████████████████████| 45/45 [00:04<00:00,  9.13it/s]

Calculating accumulated dependency!: 100%|███████████████████████████████████████████████| 1/1 [00:00<00:00,  3.87it/s]
