# Shapley calculations: Example II (multiple cores)

In this example, we have already trained a classifier and still have the data used to train it. We now want to compute SHAP explainability scores for this classifier using multiple CPU cores to speed things up. Runtime should decrease roughly linearly with the number of cores used. Because the model has to be pushed to each core, it's advisable to use as slim a model as possible.
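One rough way to judge how slim a model is before pushing it to several cores is to measure its serialized size. The sketch below assumes the model reaches each worker by being pickled, which is how Python's `multiprocessing` passes objects by default. `pickled_size_mb` is a hypothetical helper for illustration and is not part of SimBA.

```python
import pickle


def pickled_size_mb(obj) -> float:
    """Return the size of ``obj`` in megabytes once serialized with pickle."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)) / 1e6


# Stand-in objects for illustration; in this notebook you would pass ``clf``.
small_model = list(range(1_000))
large_model = list(range(100_000))
print(f"small: {pickled_size_mb(small_model):.4f} MB")
print(f"large: {pickled_size_mb(large_model):.4f} MB")
```

A larger serialized size means more data copied per worker, so a model with fewer or shallower trees is cheaper to distribute.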

In [1]:
from simba.mixins.train_model_mixin import TrainModelMixin
from simba.mixins.config_reader import ConfigReader
from simba.utils.read_write import read_config_file, read_pickle
import glob

In [2]:
# DEFINITIONS
CONFIG_PATH = r"C:\troubleshooting\mitra\project_folder\project_config.ini"
CLASSIFIER_PATH = r"C:\troubleshooting\mitra\models\generated_models\grooming.sav"
CLASSIFIER_NAME = 'grooming'
COUNT_PRESENT = 250
COUNT_ABSENT = 250

In [3]:
# READ IN THE CONFIG AND THE CLASSIFIER
config = read_config_file(config_path=CONFIG_PATH)
config_object = ConfigReader(config_path=CONFIG_PATH)
clf = read_pickle(data_path=CLASSIFIER_PATH)

In [4]:
# READ IN THE DATA 

# Collect the paths to all files inside the project_folder/csv/targets_inserted directory
file_paths = glob.glob(config_object.targets_folder + '/*' + config_object.file_type)

# Read in the data held in all files in ``file_paths`` defined above
data, _ = TrainModelMixin().read_all_files_in_folder_mp(file_paths=file_paths, file_type=config.get('General settings', 'workflow_file_type').strip())

# We find all behavior annotations that are NOT the target. E.g., if SHAP values are going to be calculated for Attack, we need to find which other annotations exist in the data, such as Escape and Defensive.
non_target_annotations = TrainModelMixin().read_in_all_model_names_to_remove(config=config, model_cnt=config_object.clf_cnt, clf_name=CLASSIFIER_NAME)

# We remove the body-part coordinate columns and the annotations which are not the target from the data  
data = data.drop(non_target_annotations + config_object.bp_headers, axis=1)

# We place the target data in its own variable
target_df = data.pop(CLASSIFIER_NAME)


Dataset size: 544.10988MB / 0.54411GB
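Before requesting `COUNT_PRESENT` and `COUNT_ABSENT` observations, it can help to confirm that the target column holds at least that many frames of each kind. The following is a minimal sketch, assuming annotations are coded as 1 (behavior present) and 0 (behavior absent). `check_sample_counts` is a hypothetical helper and not a SimBA function.

```python
from collections import Counter


def check_sample_counts(labels, cnt_present: int, cnt_absent: int) -> dict:
    """Count present (1) and absent (0) labels; raise if too few are available."""
    counts = Counter(int(v) for v in labels)
    available = {"present": counts.get(1, 0), "absent": counts.get(0, 0)}
    if available["present"] < cnt_present:
        raise ValueError(
            f"Requested {cnt_present} present frames, found {available['present']}"
        )
    if available["absent"] < cnt_absent:
        raise ValueError(
            f"Requested {cnt_absent} absent frames, found {available['absent']}"
        )
    return available


# Toy labels for illustration; in this notebook you would pass ``target_df``.
toy_labels = [1] * 300 + [0] * 700
print(check_sample_counts(toy_labels, cnt_present=250, cnt_absent=250))
# {'present': 300, 'absent': 700}
```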


In [5]:
TrainModelMixin().create_shap_log_mp(rf_clf=clf,
                                     x=data,
                                     y=target_df,
                                     x_names=list(data.columns),
                                     clf_name=CLASSIFIER_NAME,
                                     cnt_present=COUNT_PRESENT,
                                     cnt_absent=COUNT_ABSENT,
                                     core_cnt=2,
                                     chunk_size=100,
                                     verbose=True,
                                     save_dir=config_object.logs_path,
                                     save_file_suffix=1,
                                     plot=True)

Computing 500 SHAP values (MULTI-CORE BATCH SIZE: 100, FOLLOW PROGRESS IN OS TERMINAL)...
Concatenating multi-processed SHAP data (batch 1/5)


[Parallel(n_jobs=32)]: Using backend ThreadingBackend with 32 concurrent workers.
[Parallel(n_jobs=32)]: Done 136 tasks      | elapsed:    0.0s
[Parallel(n_jobs=32)]: Done 386 tasks      | elapsed:    0.0s
[Parallel(n_jobs=32)]: Done 500 out of 500 | elapsed:    0.0s finished


Concatenating multi-processed SHAP data (batch 2/5)


[Parallel(n_jobs=32)]: Using backend ThreadingBackend with 32 concurrent workers.
[Parallel(n_jobs=32)]: Done 136 tasks      | elapsed:    0.0s
[Parallel(n_jobs=32)]: Done 386 tasks      | elapsed:    0.0s
[Parallel(n_jobs=32)]: Done 500 out of 500 | elapsed:    0.0s finished


Concatenating multi-processed SHAP data (batch 3/5)


[Parallel(n_jobs=32)]: Using backend ThreadingBackend with 32 concurrent workers.
[Parallel(n_jobs=32)]: Done 136 tasks      | elapsed:    0.0s
[Parallel(n_jobs=32)]: Done 386 tasks      | elapsed:    0.0s
[Parallel(n_jobs=32)]: Done 500 out of 500 | elapsed:    0.0s finished


Concatenating multi-processed SHAP data (batch 4/5)


[Parallel(n_jobs=32)]: Using backend ThreadingBackend with 32 concurrent workers.
[Parallel(n_jobs=32)]: Done 136 tasks      | elapsed:    0.0s
[Parallel(n_jobs=32)]: Done 386 tasks      | elapsed:    0.0s
[Parallel(n_jobs=32)]: Done 500 out of 500 | elapsed:    0.0s finished


Concatenating multi-processed SHAP data (batch 5/5)


[Parallel(n_jobs=32)]: Using backend ThreadingBackend with 32 concurrent workers.
[Parallel(n_jobs=32)]: Done 136 tasks      | elapsed:    0.0s
[Parallel(n_jobs=32)]: Done 386 tasks      | elapsed:    0.0s
[Parallel(n_jobs=32)]: Done 500 out of 500 | elapsed:    0.0s finished


SIMBA COMPLETE: SHAP calculations complete (elapsed time: 231.2415s) 	complete
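The log above reports 500 SHAP values (250 present plus 250 absent) processed with a batch size of 100, which gives the 5 batches shown. The arithmetic can be sketched as below, assuming the observations are split into consecutive chunks and the last chunk may be shorter. `split_into_chunks` is a hypothetical helper, and the actual splitting inside SimBA may differ.

```python
import math


def split_into_chunks(n_observations: int, chunk_size: int) -> list:
    """Return (start, stop) index pairs that cover all observations in order."""
    n_chunks = math.ceil(n_observations / chunk_size)
    return [
        (i * chunk_size, min((i + 1) * chunk_size, n_observations))
        for i in range(n_chunks)
    ]


chunks = split_into_chunks(n_observations=250 + 250, chunk_size=100)
print(len(chunks))            # 5, matching "batch 1/5" to "batch 5/5" in the log
print(chunks[0], chunks[-1])  # (0, 100) (400, 500)
```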
