# Model Explanation

In [3]:
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import shap

In [4]:
MODEL_ID = "jeroenvdmbrugge/sp500-predictor-individual-headlines"
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# return_all_scores=True returns a score for every label, so SHAP can attribute each class.
# Newer transformers releases deprecate it in favour of top_k=None.
pred = pipeline("text-classification", model=model, tokenizer=tokenizer, return_all_scores=True)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
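With `return_all_scores=True`, the pipeline returns one list of `{"label", "score"}` dicts per input headline, covering every class. A SHAP explainer works with a function that maps a batch of strings to one row of numbers per string. The sketch below shows that conversion on hand-written output. It is an illustration of the idea, not SHAP's internal wrapper. `scores_to_matrix` and the label names `LABEL_0`/`LABEL_1` are hypothetical; the real labels come from `model.config.id2label`.

```python
def scores_to_matrix(pipeline_output, labels):
    """Turn pipeline output (one list of {"label", "score"} dicts per input) into rows of scores."""
    label_index = {label: j for j, label in enumerate(labels)}
    matrix = []
    for per_input in pipeline_output:
        row = [0.0] * len(labels)
        for entry in per_input:
            # Place each score by label name, so column order never depends on the list order
            row[label_index[entry["label"]]] = entry["score"]
        matrix.append(row)
    return matrix


# Hypothetical output shaped like pred(["headline a", "headline b"]); the scores are made up
fake_output = [
    [{"label": "LABEL_0", "score": 0.3}, {"label": "LABEL_1", "score": 0.7}],
    [{"label": "LABEL_1", "score": 0.1}, {"label": "LABEL_0", "score": 0.9}],
]
print(scores_to_matrix(fake_output, ["LABEL_0", "LABEL_1"]))  # [[0.3, 0.7], [0.9, 0.1]]
```

Indexing by label name rather than by position keeps the columns stable even when the pipeline returns labels sorted by score, which is what `top_k=None` does.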


In [7]:
# Passing the tokenizer as the masker lets SHAP hide individual tokens when probing the model
explainer = shap.Explainer(pred, tokenizer)
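Given the tokenizer, SHAP masks subsets of a headline's tokens, re-runs the model on the partially hidden text, and credits each token with its share of the change in output. The progress output of the next cell shows that SHAP selected `PartitionExplainer` here. It approximates this credit by using a hierarchy of token groups rather than trying every subset. The brute-force sketch below computes exact Shapley values for a tiny made-up scorer, to make the target quantity concrete. `toy_score`, `WEIGHTS` and the interaction bonus are hypothetical stand-ins for the real model, and exhaustive enumeration is only practical for a handful of tokens.

```python
from itertools import combinations
from math import factorial


def shapley_values(tokens, score):
    """Exact Shapley value of each token, where score(kept_tokens) -> float."""
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of `size` other tokens
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [tokens[j] for j in subset]
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                phi[i] += weight * (score(with_i) - score(without_i))
    return phi


# Hypothetical per-word effects on a "market up" score, plus one interaction term
WEIGHTS = {"Oil": 0.1, "rises": 0.4, "on": 0.0, "supply": -0.2}


def toy_score(kept):
    s = sum(WEIGHTS.get(t, 0.0) for t in kept)
    if "Oil" in kept and "rises" in kept:
        s += 0.2  # bonus only when both words survive masking
    return s


tokens = ["Oil", "rises", "on", "supply"]
phi = shapley_values(tokens, toy_score)
print([round(v, 3) for v in phi])  # [0.2, 0.5, 0.0, -0.2]
```

The 0.2 interaction bonus is split evenly between "Oil" and "rises". The attributions sum to `toy_score(tokens) - toy_score([])`. This mirrors how SHAP's base value plus the token attributions reconstructs the model output.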

In [10]:
sample_headlines = pd.read_csv("../data/jvdm_data_sources_trends.csv").sample(3, random_state=42).Headlines.values
sample_headlines

array(['Oil rises 1 percent on tightening crude supply , upbeat economic data',
       "'Crash Monday' is the price we're paying for a decade of cheap money",
       'IMF asks G20 to back doubling of its emergency financing to fight coronavirus'],
      dtype=object)

In [11]:
shap_values = explainer(sample_headlines)


In [12]:
shap.plots.text(shap_values)
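Besides the interactive plot, the attributions can be read directly. For one headline, `shap_values[i].data` holds the tokens, `shap_values[i].values` has one row per token and one column per class, and `shap_values[i].base_values` holds the per-class baseline. The sketch below ranks tokens for one class and checks additivity. Because it uses plain lists, it runs without the model. `top_tokens` and `reconstruct_output` are hypothetical helpers, the numbers are toy values rather than results from this notebook, and which `class_index` means "up" depends on `model.config.id2label`.

```python
def top_tokens(tokens, values, class_index, k=3):
    """Return the k (token, attribution) pairs that push hardest toward class_index."""
    scored = [(token, row[class_index]) for token, row in zip(tokens, values)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def reconstruct_output(base_value, values, class_index):
    """Base value plus summed attributions; by additivity this should approximate the explained output."""
    return base_value + sum(row[class_index] for row in values)


# Toy numbers shaped like one explained headline (n_tokens x n_classes); not real results
tokens = ["Oil ", "rises ", "1 ", "percent"]
values = [[-0.1, 0.1], [-0.3, 0.3], [0.05, -0.05], [0.0, 0.0]]
base_values = [0.4, 0.6]

print(top_tokens(tokens, values, class_index=1, k=2))  # [('rises ', 0.3), ('Oil ', 0.1)]
print(reconstruct_output(base_values[1], values, class_index=1))
```

In the notebook, the same helpers could be called as `top_tokens(shap_values[0].data, shap_values[0].values, class_index=1)`. Comparing `reconstruct_output(...)` with the model's score for that headline is a quick sanity check. The comparison must be made in whatever output space SHAP explained (probabilities or logits).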