In [1]:
%load_ext lab_black
%load_ext autoreload
%autoreload 2

# "Getting Started" with Text Style Transfer Library

This notebook demonstrates how to use the main functionality developed during the _FF24: Text Style Transfer_ research cycle. In particular, we walk through the high-level usage and expose the low-level inner workings of the following classes: `SubjectivityNeutralizer`, `StyleIntensityClassifier`, and `ContentPreservationScorer`.

## `SubjectivityNeutralizer` Walkthrough

<br>
<center><img src="./images/tst_bart.png" /></center>
<br>

The `SubjectivityNeutralizer` class is a sequence-to-sequence model wrapper around a HuggingFace pipeline that generates neutral-toned text from subjective input text. The class must be initialized with a HuggingFace model identifier pointing to the weights of a BART model fine-tuned on the Wiki Neutrality Corpus (WNC). The `.transfer()` method takes a list of strings as input and returns a corresponding list of strings.
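To make the list-in, list-out contract concrete, here is a minimal sketch of the wrapper pattern. It is not the library's implementation: `NeutralizerSketch` and `fake_generator` are hypothetical names, and we assume the wrapped generator returns one `{"generated_text": ...}` dictionary per input string. A toy generator stands in for the fine-tuned BART model so the sketch runs without any model weights.

```python
from typing import Callable, Dict, List

# Assumed shape of the wrapped generator: a list of strings in,
# one {"generated_text": ...} dict per input string out.
Generator = Callable[[List[str]], List[Dict[str, str]]]


class NeutralizerSketch:
    """Hypothetical stand-in for SubjectivityNeutralizer (illustration only)."""

    def __init__(self, generator: Generator):
        self.generator = generator

    def transfer(self, texts: List[str]) -> List[str]:
        # Return one output string per input string, in the same order.
        return [result["generated_text"] for result in self.generator(texts)]


def fake_generator(texts: List[str]) -> List[Dict[str, str]]:
    # Toy substitute for a seq2seq model: drops one subjective adverb.
    return [{"generated_text": t.replace("clearly ", "")} for t in texts]


sketch = NeutralizerSketch(fake_generator)
print(sketch.transfer(["a clearly good idea", "a neutral sentence"]))
```

Injecting the generator keeps the wrapper independent of any particular model, which is also convenient for testing.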

### Usage

In [2]:
from src.inference import SubjectivityNeutralizer

# instantiate class with TST model path
MODEL_PATH = "/home/cdsw/models/bart-tst-full"
sn = SubjectivityNeutralizer(model_identifier=MODEL_PATH)

In [3]:
# generate neutralized text conditioned on the subjective input text
examples = ["Sir Alex Ferguson is the greatest football manager of all time."]
sn.transfer(examples)

['Sir Alex Ferguson is one of the greatest football managers of all time.']

## `StyleIntensityClassifier` Walkthrough

<br>
<center><img src="./images/style_transfer_intensity.png"/></center>
<br>

Evaluating the quality of a text style transfer model is difficult because there is no standard set of evaluation practices or metric definitions. [Existing literature](https://arxiv.org/pdf/1904.02295.pdf) on the topic considers three main aspects of quality: style transfer intensity, content preservation, and fluency. The `StyleIntensityClassifier` defined here draws inspiration from that paper and implements one way to measure the first of those aspects: style transfer intensity (STI).

The STI metric can be explained best by referencing the figure above:
1. A fine-tuned text style transfer model (BART) is used to generate neutralized text ($X_{N}$) from a subjective input ($X_{S}$). These two texts form the pair between which we calculate style transfer intensity.
2. Both texts are passed through a fine-tuned, Transformer-based classification model (BERT) to produce a style distribution for each text ($d_{S}$, $d_{N}$). The BERT model here has been fine-tuned on the same style classification task that the style transfer model targets. In this case, that means classifying a given piece of text as subjective or neutral.
3. Earth mover's distance (EMD), also known as Wasserstein distance, is then calculated between the two distributions to produce a style transfer intensity score. EMD measures the minimum "cost" of turning one distribution into the other, so EMD between style class distributions can be interpreted as how intense the style transfer was between the two texts.
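Step 3 can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the library's code: `emd_1d` and `signed_transfer_intensity` are hypothetical helpers, the classes are assumed to sit one unit apart, and the sign convention (positive when probability mass moves toward the target class) is our guess at what a `target_class_idx` argument would be used for. For distributions on equally spaced bins, EMD reduces to the sum of absolute differences between the two cumulative distributions.

```python
from itertools import accumulate
from typing import Sequence


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """EMD between two discrete distributions over classes spaced one unit apart."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of classes")
    # Sum of absolute gaps between the cumulative distributions.
    # The final term is (numerically close to) zero when both inputs sum to 1.
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def signed_transfer_intensity(
    input_dist: Sequence[float], output_dist: Sequence[float], target_class_idx: int
) -> float:
    # Assumption: report a negative score when mass moves away from the target class.
    moved_toward_target = output_dist[target_class_idx] >= input_dist[target_class_idx]
    direction = 1 if moved_toward_target else -1
    return round(direction * emd_1d(input_dist, output_dist), 4)


# Style distributions of the kind a two-class (subjective vs. neutral) classifier produces.
d_s = [0.9891378283500671, 0.010862216353416443]
d_n = [0.010696199722588062, 0.9893038272857666]
print(signed_transfer_intensity(d_s, d_n, target_class_idx=1))  # 0.9784
```

With only two classes, the score is simply the amount of probability mass that moved from one class to the other.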






### Usage

In [4]:
from src.inference import StyleIntensityClassifier

# instantiate class with classifier model path
MODEL_PATH = "../models/TRIAL-J-shuffle-lr_3en06-epoch_15-wd_.1-bs_32/checkpoint-67466"
sc = StyleIntensityClassifier(model_identifier=MODEL_PATH)

#### Step-by-step

In [5]:
# 1. transfer style using seq2seq model
x_s = [
    """there is an iconic roadhouse, named "spud's roadhouse", which sells fuel and general shop items , has great meals and has accommodation."""
]

x_n = sn.transfer(x_s)
x_n

['there is a roadhouse, named "spud\'s roadhouse", which sells fuel and general shop items and has accommodation.']

In [6]:
# 2. obtain style distributions using BERT classifier
d_s, d_n = sc.score(x_s + x_n)
d_s, d_n

({'label': 'LABEL_0',
  'score': 0.9891378283500671,
  'distribution': [0.9891378283500671, 0.010862216353416443]},
 {'label': 'LABEL_1',
  'score': 0.9893038272857666,
  'distribution': [0.010696199722588062, 0.9893038272857666]})
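In the output above, `score` equals the largest entry of `distribution` and `label` names its index. The sketch below shows one way such a record could be derived from raw classifier logits with a softmax. The helper names and the example logits are hypothetical, and we are assuming (not confirming) that the classifier's distribution comes from a softmax over two logits.

```python
import math
from typing import Dict, List, Union


def softmax(logits: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits: List[float]) -> Dict[str, Union[str, float, List[float]]]:
    distribution = softmax(logits)
    # The predicted class is the index with the highest probability.
    best = max(range(len(distribution)), key=distribution.__getitem__)
    return {
        "label": f"LABEL_{best}",
        "score": distribution[best],
        "distribution": distribution,
    }


# Hypothetical logits for a text the classifier considers strongly class 0.
print(summarize([2.3, -2.2]))
```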

In [7]:
# 3. calculate EMD between d_s and d_n
sc.calculate_emd(
    input_dist=d_s["distribution"], output_dist=d_n["distribution"], target_class_idx=1
)

0.9784

#### High-level API

In [8]:
sc.calculate_transfer_intensity(input_text=x_s, output_text=x_n)

[0.9784]

## `ContentPreservationScorer` Walkthrough

<br>
<center><img src="./images/content_preservation_score.png"/></center>
<br>

Similar to our STI metric, the Content Preservation Score (CPS) metric also draws inspiration from the previously mentioned paper, and aims to quantify the similarity in content (i.e. style-independent semantic meaning) between the input and the output texts. The metric is depicted in the figure above:

1. A fine-tuned text style transfer model (BART) is used to generate neutralized text ($X_{N}$) from a subjective input ($X_{S}$). These two texts form the pair between which we calculate the content preservation score.
2. Style tokens are masked inline in both texts to produce versions of the text that contain only content-related tokens. Style tokens are determined by calculating word attributions for each text on a per-sentence basis using integrated gradients from the trained BERT classification model. Essentially, this method produces per-token feature importances, and tokens that have a high attribution score (i.e. are important in making a style classification) are deemed style-related tokens.
3. The style-masked texts are then passed through a generic, pre-trained (but not fine-tuned) SentenceBERT model to produce sentence embeddings for each text ($e_{S}$, $e_{N}$).
4. We calculate cosine similarity between these content-only embedding representations. Since the style-related tokens have been removed from the text, high cosine similarity between these embeddings indicates a high level of content preservation.
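Step 4 relies on cosine similarity, which a short plain-Python sketch makes concrete. The `cosine_similarity` function below is a hypothetical helper, and the vectors are toy values rather than real SentenceBERT embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy embeddings: identical masked texts map to identical vectors.
e_s = [0.2, 0.5, -0.1]
e_n = [0.2, 0.5, -0.1]
print(cosine_similarity(e_s, e_n))  # approximately 1.0

# Unrelated directions score 0.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Because the measure depends only on direction, two masked texts that embed to the same vector score 1 regardless of the vector's length.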

### Usage

In [8]:
from src.inference import ContentPreservationScorer

# instantiate class with SBERT and classifier model paths
SBERT_MODEL_PATH = "sentence-transformers/all-MiniLM-L6-v2"
CLS_MODEL_PATH = (
    "../models/TRIAL-J-shuffle-lr_3en06-epoch_15-wd_.1-bs_32/checkpoint-67466"
)
cps = ContentPreservationScorer(
    sbert_model_identifier=SBERT_MODEL_PATH, cls_model_identifier=CLS_MODEL_PATH
)

#### Step-by-step

In [30]:
# 1. transfer style using seq2seq model
x_s = ["alexithymia is thought to affect 10% of the overall population."]

x_n = sn.transfer(x_s)
x_n

['alexithymia is claimed to affect 10% of the overall population.']

In [31]:
# 2. mask out style-related tokens from both texts
#
# NOTE: "threshold" indicates the cumulative percentage of style-attributed tokens
# that should be masked out. "mask_type" specifies whether we replace style tokens
# with a "[PAD]" token or remove them outright
x_s_masked = cps.mask_style_tokens(text=x_s[0], threshold=0.1, mask_type="pad")
x_n_masked = cps.mask_style_tokens(text=x_n[0], threshold=0.1, mask_type="pad")

x_s_masked, x_n_masked

('alexithymia is [PAD] to affect 10 % of the overall population.',
 'alexithymia is [PAD] to affect 10 % of the overall population.')

In [35]:
# NOTE: In x_n from the example above, we mask out the "claimed" token because,
# as shown below, it accounts for 46.2% of the style attribution for this sentence.
# With a threshold of 0.1, only this one token is selected for masking.
cps.calculate_feature_attribution_scores(text=x_n[0], as_norm=True)

| | token | score | abs_norm | cumulative |
|---|---|---|---|---|
| 6 | claimed | -0.932072 | 0.462688 | 0.462688 |
| 8 | affect | 0.243048 | 0.120651 | 0.583339 |
| 9 | 10 | -0.125258 | 0.062179 | 0.645517 |
| 13 | overall | -0.116233 | 0.057699 | 0.703217 |
| 5 | is | -0.103953 | 0.051603 | 0.75482 |
| 14 | population | -0.088188 | 0.043777 | 0.798597 |
| 4 | ##ia | -0.088092 | 0.04373 | 0.842326 |
| 7 | to | -0.0714 | 0.035444 | 0.87777 |
| 12 | the | -0.069723 | 0.034611 | 0.912381 |
| 2 | ##ith | -0.047825 | 0.023741 | 0.936121 |
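The threshold rule can be sketched using the `abs_norm` values from the attribution table above. This is our reading of the behavior, not the library's code: `select_style_tokens` and `mask_words` are hypothetical helpers. We assume tokens are taken in descending order of normalized attribution until the running total reaches the threshold, and we split on whitespace instead of using the BERT tokenizer, so the masked string differs slightly in spacing from the real output.

```python
from typing import List, Sequence, Tuple


def select_style_tokens(rows: Sequence[Tuple[str, float]], threshold: float) -> List[str]:
    """Pick the highest-attribution tokens until their cumulative share reaches threshold."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    selected, cumulative = [], 0.0
    for token, share in ranked:
        if cumulative >= threshold:
            break
        selected.append(token)
        cumulative += share
    return selected


def mask_words(words: Sequence[str], style_tokens: Sequence[str], mask_type: str = "pad") -> List[str]:
    if mask_type == "pad":
        return ["[PAD]" if w in style_tokens else w for w in words]
    if mask_type == "remove":
        return [w for w in words if w not in style_tokens]
    raise ValueError("mask_type must be 'pad' or 'remove'")


# (token, abs_norm) pairs copied from the first rows of the attribution table.
rows = [("claimed", 0.462688), ("affect", 0.120651), ("10", 0.062179), ("overall", 0.057699)]

style_tokens = select_style_tokens(rows, threshold=0.1)
print(style_tokens)  # ['claimed']

words = "alexithymia is claimed to affect 10% of the overall population.".split()
print(" ".join(mask_words(words, style_tokens, mask_type="pad")))
# alexithymia is [PAD] to affect 10% of the overall population.
```

Raising the threshold to 0.5 would also select "affect", since "claimed" alone covers only about 46% of the attribution.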


In [36]:
# 3. get sentence embeddings from SBERT for each masked text

e_s = cps.compute_sentence_embeddings(input_text=x_s_masked)
e_n = cps.compute_sentence_embeddings(input_text=x_n_masked)

e_s.shape, e_n.shape

(torch.Size([1, 384]), torch.Size([1, 384]))

In [37]:
# 4. calculate cosine similarity between style-removed embedding representations
cps.cosine_similarity(e_s, e_n)

[1.0]

#### High-level API

In [38]:
cps.calculate_content_preservation_score(
    input_text=x_s, output_text=x_n, threshold=0.1, mask_type="pad", return_all=True
)

{'scores': [1.0],
 'masked_input_text': ['alexithymia is [PAD] to affect 10 % of the overall population.'],
 'masked_output_text': ['alexithymia is [PAD] to affect 10 % of the overall population.']}