# Model explainability

Neural and transformer-based approaches have made rapid advances recently, but it is often difficult to understand how these models make decisions. Understanding this is especially important in sensitive areas where models are used to drive critical decisions.

This notebook covers how to gain a level of understanding of the outputs of complex natural language models.

# Install dependencies

Mount Google Drive to access the dataset, then install `txtai` and all dependencies.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
%%capture
!pip install git+https://github.com/neuml/txtai

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [4]:
data_file = '/content/drive/MyDrive/NLP/NLPClassV1/myntra_products_catalog.csv'

In [5]:
products_df = pd.read_csv(data_file)

In [6]:
products_df.head(10)

Unnamed: 0,ProductID,ProductName,ProductBrand,Gender,Price (INR),NumImages,Description,PrimaryColor
0,10017413,DKNY Unisex Black & Grey Printed Medium Trolle...,DKNY,Unisex,11745,7,"Black and grey printed medium trolley bag, sec...",Black
1,10016283,EthnoVogue Women Beige & Grey Made to Measure ...,EthnoVogue,Women,5810,7,Beige & Grey made to measure kurta with churid...,Beige
2,10009781,SPYKAR Women Pink Alexa Super Skinny Fit High-...,SPYKAR,Women,899,7,Pink coloured wash 5-pocket high-rise cropped ...,Pink
3,10015921,Raymond Men Blue Self-Design Single-Breasted B...,Raymond,Men,5599,5,Blue self-design bandhgala suitBlue self-desig...,Blue
4,10017833,Parx Men Brown & Off-White Slim Fit Printed Ca...,Parx,Men,759,5,"Brown and off-white printed casual shirt, has ...",White
5,10014361,SHOWOFF Men Brown Solid Slim Fit Regular Shorts,SHOWOFF,Men,791,5,"Brown solid low-rise regular shorts, has four ...",Brown
6,10017869,Parx Men Blue Slim Fit Checked Casual Shirt,Parx,Men,719,5,"Blue checked casual shirt, has a spread collar...",Blue
7,10009695,SPYKAR Women Burgundy Alexa Super Skinny Fit H...,SPYKAR,Women,899,7,Burgundy coloured wash 5-pocket high-rise jean...,Burgundy
8,10000571,Parx Men Brown Tapered Fit Solid Regular Trousers,Parx,Men,664,5,Brown solid regular trousers regular trousers,Red
9,10017421,DKNY Unisex Black Large Trolley Bag,DKNY,Unisex,17360,5,"Black solid large trolley bag, secured with a ...",Black


In [81]:
products_df.shape

(12491, 9)

In [8]:
products_df.sample(5)

Unnamed: 0,ProductID,ProductName,ProductBrand,Gender,Price (INR),NumImages,Description,PrimaryColor,product_details
3137,10095267,HERE&NOW Women White Super Skinny Fit Mid-Rise...,HERE&NOW,Women,2399,7,"White light wash 5-pocket mid-rise jeans, clea...",White,HERE&NOW Women White Super Skinny Fit Mid-Rise...
2616,10052313,Fastrack Men Black Leather Analogue Watch 3124...,Fastrack,Men,1960,5,Display: AnalogueMovement: QuartzPower source:...,Black,Fastrack Men Black Leather Analogue Watch 3124...
11735,10248579,Puma Men Grey Progression Duo Idp Running Shoes,Puma,Men,2599,7,Special Technology:Softfoam+ SocklinerProduct ...,Grey,Puma Men Grey Progression Duo Idp Running Shoe...
9574,10216747,Amante Women Multicoloured Printed Bikini Brie...,Amante,Women,346,5,"Multicoloured printed low-rise bikini briefs,...",Red,Amante Women Multicoloured Printed Bikini Brie...
7773,10184313,Just Wow Black Women Casual Jumpsuit,Just Wow,Women,1299,5,"Black solid basic jumpsuit, has a V-neck, shor...",Black,Just Wow Black Women Casual JumpsuitBlack soli...


# Semantic Search

The first example we'll cover is semantic search. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. While this produces higher quality results, one advantage of keyword search is that it's easy to understand why a result was selected: the keyword is there.

Let's see if we can gain a better understanding of semantic search output. 
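To make the contrast concrete, here is a minimal sketch comparing a literal keyword match with a vector similarity score. The three-dimensional vectors are hand-written, hypothetical stand-ins for real embeddings, and `keyword_match` and `cosine` are illustrative helpers, not part of txtai.

```python
import math

def keyword_match(query, text):
    """Return the query words that literally appear in the text."""
    text_words = set(text.lower().split())
    return [word for word in query.lower().split() if word in text_words]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical toy "embeddings", made up for illustration only
toy_vectors = {
    "wear for hot weather": [0.9, 0.1, 0.2],
    "breathable cotton shorts": [0.8, 0.2, 0.3],
    "thermal winter jacket": [0.1, 0.9, 0.2],
}

query = "wear for hot weather"
for text in ["breathable cotton shorts", "thermal winter jacket"]:
    shared = keyword_match(query, text)
    score = cosine(toy_vectors[query], toy_vectors[text])
    print(f"{text}: shared keywords={shared}, similarity={score:.3f}")
```

Neither candidate shares a keyword with the query, so a keyword search has nothing to rank on, while the vector scores still separate the two. The price is that the score alone does not say which words drove it.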

In [59]:
%%capture

from txtai.embeddings import Embeddings

# Create embeddings model, backed by sentence-transformers & transformers
embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2", "content": True})

In [101]:
data_v1 = list(products_df.sample(2000).Description)

In [102]:
%%time
# Index the sampled descriptions. Content storage was enabled when the embeddings
# instance was created; by default only the indexed vectors are stored.
embeddings.index([(uid, text, None) for uid, text in enumerate(data_v1)])

CPU times: user 7.37 s, sys: 108 ms, total: 7.48 s
Wall time: 7.26 s


In [103]:
query = 'shorts for man blue color night wear'

uid = embeddings.similarity(query, data_v1)[0:5]

In [104]:
uid

[(1525, 0.6533572673797607),
 (783, 0.6180694103240967),
 (893, 0.6091852188110352),
 (443, 0.5791661739349365),
 (369, 0.5699485540390015)]

In [105]:
# Each result is an (index, score) pair; the index points into data_v1
for index, _ in uid:
  print(data_v1[index])

Blue and white floral print mid-rise regular shorts with layered and pleated detail, has elasticated waistband with slip-on closure
Navy Blue solid mid-rise regular shorts, has 5 pockets, and zip closure
Blue washed mid-rise denim shorts, has 5 pockets, and button closure
Grey solid mid-rise sports shorts, has two pockets, a drawstring closure
White solid mid-rise regular shorts and zip closure


In [100]:
query = 'wear for hot weather'

uid = embeddings.similarity(query, data_v1)[0:5]

# Each result is an (index, score) pair; the index points into data_v1
for index, _ in uid:
  print(data_v1[index])

Special technologiesBest for training in cold weatherReeboks premium performance Thermowarm materials help you to find the right balance to remain warm and dry in cold weather conditionSpeedwick technology wicks sweat away from the body to help you stay cool and dryProduct design detailsCharcoal grey and purple self design high-rise training tightsHigh Rise ribbed waistbandHeavier weight seamless baselayer long sleeve provides lightweight warmthEngineered mesh provides added breathability and ventilation in key heat zones
Special technologySpeedwick technology wicks sweat away from the body to help you stay cool and dryProduct design detailsBlue and green colourblocked full-coverage Sports brahas regular shoulder strapsopen racerback for mobilityReebok graphic elastic bottom bandLight supportFitted fitAbout Reebok Women Workout Ready Meet You There Sports BraIn the studio or out of it, your style makes a statement. This women's bralette encourages full freedom of movement with a reveal

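The `explain` method runs an embeddings query like `search`, but also analyzes each token of the result to determine term importance. For the query `shorts for man blue color night wear`, the output of the next cell shows that `shorts` is the most important term (score of about 0.22), followed by `Blue` (about 0.12). After running it, let's visualize the result.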

In [106]:
# Run a search
embeddings.explain(query, limit=1)

[{'id': '1525',
  'text': 'Blue and white floral print mid-rise regular shorts with layered and pleated detail, has elasticated waistband with slip-on closure',
  'score': 0.6533572673797607,
  'tokens': [('Blue', 0.1210370659828186),
   ('and', 0.01858687400817871),
   ('white', -0.024537205696105957),
   ('floral', -0.024478375911712646),
   ('print', -0.007701873779296875),
   ('mid-rise', -0.005743801593780518),
   ('regular', 0.017734885215759277),
   ('shorts', 0.21971344947814941),
   ('with', 0.003084421157836914),
   ('layered', -0.003475964069366455),
   ('and', -0.00298994779586792),
   ('pleated', -0.008565008640289307),
   ('detail,', -0.01077413558959961),
   ('has', -0.005133092403411865),
   ('elasticated', -0.013414144515991211),
   ('waistband', -0.007938265800476074),
   ('with', 0.0012300610542297363),
   ('slip-on', 0.015411734580993652),
   ('closure', -0.013718247413635254)]}]
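One common way to produce per-token scores like these is leave-one-out (occlusion): remove each token in turn, re-score the shortened text against the query, and record how much the score changes. The sketch below illustrates that general technique. It is an assumption about how such scores can be derived, not a description of txtai's internals, and `overlap_score` is a toy word-overlap function standing in for an embedding similarity.

```python
def overlap_score(query, text):
    """Toy stand-in for an embedding similarity: Jaccard overlap of lowercase words."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0

def token_importance(query, text, score_fn):
    """Score each token by how much the similarity drops when it is removed."""
    tokens = text.split()
    base = score_fn(query, text)
    importance = []
    for i, token in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        # Positive: removing the token hurts the match. Negative: removing it helps.
        importance.append((token, base - score_fn(query, reduced)))
    return importance

for token, score in token_importance("blue shorts", "Blue solid regular shorts", overlap_score):
    print(f"{token:10s} {score:+.3f}")
```

Under this scheme, tokens that match the query get positive scores and unrelated tokens get negative ones, which is the same sign pattern seen in the `tokens` list above.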

In [87]:
from IPython.display import HTML

def plot(query):
  """Render the top result for query as HTML, highlighting the most important tokens."""
  result = embeddings.explain(query, limit=1)[0]

  output = f"<b>{query}</b><br/>"
  spans = []
  for token, score in result["tokens"]:
    color = None
    if score >= 0.1:
      color = "#fdd835"
    elif score >= 0.075:
      color = "#ffeb3b"
    elif score >= 0.05:
      color = "#ffee58"
    elif score >= 0.02:
      color = "#fff59d"

    spans.append((token, score, color))

  # If the result matched but no token crossed a threshold, highlight the top-scoring token
  if result["score"] >= 0.05 and not [color for _, _, color in spans if color]:
    mscore = max([score for _, score, _ in spans])
    spans = [(token, score, "#fff59d" if score == mscore else color) for token, score, color in spans]

  for token, _, color in spans:
    if color:
      output += f"<span style='background-color: {color}'>{token}</span> "
    else:
      output += f"{token} "

  return output

HTML(plot(query))
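The threshold chain in `plot` can also be written as a lookup table, which keeps the cut-offs and colors in one place. The sketch below reuses the same thresholds and colors and additionally escapes each token before placing it in HTML, since product descriptions may contain characters such as `<` or `&`. `color_for` and `render` are hypothetical helper names, and the fallback that highlights the top-scoring token is omitted for brevity.

```python
from html import escape

# Same thresholds and colors as plot(), ordered from highest to lowest
THRESHOLDS = [(0.1, "#fdd835"), (0.075, "#ffeb3b"), (0.05, "#ffee58"), (0.02, "#fff59d")]

def color_for(score):
    """Return the highlight color for a token score, or None if below all thresholds."""
    for threshold, color in THRESHOLDS:
        if score >= threshold:
            return color
    return None

def render(tokens):
    """Build an HTML string from (token, score) pairs, escaping each token."""
    parts = []
    for token, score in tokens:
        color = color_for(score)
        text = escape(token)
        if color:
            parts.append(f"<span style='background-color: {color}'>{text}</span>")
        else:
            parts.append(text)
    return " ".join(parts)

print(render([("Blue", 0.121), ("and", 0.019), ("shorts", 0.220)]))
```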

Let's try some more queries!

In [96]:
output = ""
for query in ["feeling hot", "wear to protect from cold breeze", 'planning a trip to jungle']:
  output += plot(query) + "<br/><br/>"

HTML(output)

# Wrapping up

This notebook briefly introduced model explainability. There is a lot of work in this area, and a number of different methods are likely to become available. Model explainability helps users gain a level of trust in model predictions. It also helps debug why a model makes a particular decision, which can guide how to fine-tune the model to make better predictions.

Keep an eye on this important area over the coming months!
