# A T5-based Model for Semantic Consistency Checking of IFTTT applets


**Authors:**
<br>[Bernardo Breve](https://orcid.org/0000-0002-3898-7512)<br>
[Gaetano Cimino](https://orcid.org/0000-0001-8061-7104)<br>
[Vincenzo Deufemia](https://orcid.org/0000-0002-6711-3590)<br>
[Annunziata Elefante](https://orcid.org/0009-0001-7141-6105)<br>
**Date created:** 2023/07/25<br>
**Description:** We propose a T5-based model for semantic consistency checking of IFTTT applets. Our model uses pre-trained language representations to learn the semantics of applet components and identifies inconsistencies within the user-defined descriptions associated with applets.

## Introduction

According to the IFTTT creation paradigm, a user who creates a new applet must write a natural language description that summarizes how the applet works. By reading this field, other users can more easily understand what an applet is for and decide whether to activate it on their devices. However, IFTTT does not check the content of the description, so a creator could write anything, including a description that misrepresents the applet's behavior. To address this, we developed a model that checks whether the trigger-action components of an applet are semantically consistent with the natural language description provided by its creator. We fine-tuned a T5-based classification model that takes as input a pattern derived from the applet components together with the user-defined description, and outputs one of four classification labels ('cc', 'ce', 'ec' or 'ee'). Because T5 is a text-to-text model, the notebook maps these labels to the target words 'first', 'second', 'third' and 'fourth', which the model learns to generate.
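
A minimal sketch of this text-to-text framing, using only the standard library. The label-to-word mapping and the `pattern = ...; description = ...` input format are the ones built in the data-preparation cells below; the helper name `to_t5_example` and the sample applet are hypothetical.

```python
# Label-to-target mapping used by the notebook (cc -> first, ..., ee -> fourth).
LABEL_TO_TARGET = {"cc": "first", "ce": "second", "ec": "third", "ee": "fourth"}
TARGET_TO_LABEL = {v: k for k, v in LABEL_TO_TARGET.items()}


def to_t5_example(pattern, desc, label):
    """Build one (input, target) pair in the notebook's text format.

    Hypothetical helper: the notebook builds the same strings inline
    with a loop over DataFrame rows.
    """
    source = "pattern = " + str(pattern) + "; description = " + str(desc)
    target = LABEL_TO_TARGET[label]
    return source, target


src, tgt = to_t5_example("<applet pattern>", "save new photos to my cloud drive", "cc")
print(src)  # pattern = <applet pattern>; description = save new photos to my cloud drive
print(tgt)  # first
```

Later cells lower-case the input and append T5's `</s>` end-of-sequence marker to both strings before tokenization.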

In [None]:
!pip install transformers
!pip install sentencepiece
!pip install optuna

In [None]:
# load packages
import datetime
import random
import re
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sn
import sklearn.metrics as metrics
import torch
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from torch.cuda.amp import autocast, GradScaler
# AdamW is taken from PyTorch: transformers.AdamW is deprecated and may be
# missing from recent transformers releases
from torch.optim import AdamW
from torch.utils.data import TensorDataset, DataLoader
from transformers import T5Tokenizer, T5ForConditionalGeneration, get_linear_schedule_with_warmup

In [None]:
SEED = 15
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

In [5]:
torch.backends.cudnn.deterministic = True

# tell pytorch to use cuda
device = torch.device("cuda")

In [None]:
train_path = 'trainSet.csv'

col_names = ['label','pattern','desc']
train_df = pd.read_csv(train_path,skiprows=1,sep=';',names=col_names,encoding = "ISO-8859-1")

train_df

In [7]:
texts = []

for i in range(len(train_df)):
  text = "pattern = " + str(train_df.iloc[i]['pattern']) + "; description = " + str(train_df.iloc[i]['desc'])
  # Save content.
  texts.append(text)

train_df['text'] = texts
train_df['label'] = train_df['label'].replace('cc','first')
train_df['label'] = train_df['label'].replace('ce','second')
train_df['label'] = train_df['label'].replace('ec','third')
train_df['label'] = train_df['label'].replace('ee','fourth')

In [None]:
val_path = 'devSet.csv'

valid_df = pd.read_csv(val_path,skiprows=1,sep=';',names=col_names,encoding = "ISO-8859-1")

valid_df

In [9]:
texts = []

for i in range(len(valid_df)):
  text = "pattern = " + str(valid_df.iloc[i]['pattern']) + "; description = " + str(valid_df.iloc[i]['desc'])
  # Save content.
  texts.append(text)

valid_df['text'] = texts
valid_df['label'] = valid_df['label'].replace('cc','first')
valid_df['label'] = valid_df['label'].replace('ce','second')
valid_df['label'] = valid_df['label'].replace('ec','third')
valid_df['label'] = valid_df['label'].replace('ee','fourth')

In [None]:
test_path = 'testSet.csv'

test_df = pd.read_csv(test_path,skiprows=1,sep=';',names=col_names,encoding = "ISO-8859-1")

test_df

In [11]:
texts = []

for i in range(len(test_df)):
  text = "pattern = " + str(test_df.iloc[i]['pattern']) + "; description = " + str(test_df.iloc[i]['desc'])
  # Save content.
  texts.append(text)

test_df['text'] = texts
test_df['label'] = test_df['label'].replace('cc','first')
test_df['label'] = test_df['label'].replace('ce','second')
test_df['label'] = test_df['label'].replace('ec','third')
test_df['label'] = test_df['label'].replace('ee','fourth')

In [12]:
# prepare data
def clean_df(df):
    # replace dashes with spaces
    df['text'] = df['text'].str.replace('-', ' ', regex=False)
    # lower case the data
    df['text'] = df['text'].apply(lambda x: x.lower())
    # remove the space before ? . ! or " when that mark ends a word
    df['text'] = df['text'].apply(lambda x: re.sub(r'\s([?.!"](?:\s|$))', r'\1', x))
    # generate a word count for each text
    df['word_count'] = df['text'].apply(lambda x: len(x.split()))
    # collapse repeated white spaces
    df['text'] = df['text'].apply(lambda x: " ".join(x.split()))
    # add " </s>" (T5's end-of-sequence token) to the end of each input text
    df['text'] = df['text'] + " </s>"
    # add " </s>" to the end of each target
    df['label'] = df['label'] + " </s>"
    return df
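
To see what `clean_df` does to a single input, here is a stdlib sketch that applies the same string operations in the same order (the word-count column is omitted). `clean_text` is a hypothetical helper, not part of the notebook.

```python
import re


def clean_text(text):
    """Apply the clean_df steps to one string (hypothetical helper)."""
    text = text.replace("-", " ")  # dashes become spaces
    text = text.lower()            # lower-case everything
    # drop the space before ? . ! or " when that mark ends a word
    text = re.sub(r'\s([?.!"](?:\s|$))', r"\1", text)
    text = " ".join(text.split())  # collapse runs of whitespace
    return text + " </s>"          # explicit end-of-sequence marker


print(clean_text("Turn-On the  Lights !"))  # turn on the lights! </s>
```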

In [13]:
train_df = clean_df(train_df)
valid_df = clean_df(valid_df)
test_df = clean_df(test_df)

In [None]:
# instantiate T5 tokenizer
tokenizer = T5Tokenizer.from_pretrained('t5-small')

In [15]:
# tokenize the main text
def tokenize_corpus(df, tokenizer, max_len):
    # token ID storage
    input_ids = []
    # attension mask storage
    attention_masks = []
    # max len -- 512 is max
    max_len = max_len
    # for every document:
    for doc in df:
        # `encode_plus` will:
        #   (1) Tokenize the sentence.
        #   (2) Prepend the `[CLS]` token to the start.
        #   (3) Append the `[SEP]` token to the end.
        #   (4) Map tokens to their IDs.
        #   (5) Pad or truncate the sentence to `max_length`
        #   (6) Create attention masks for [PAD] tokens.
        encoded_dict = tokenizer.encode_plus(
                            doc,  # document to encode.
                            add_special_tokens=True,  # add tokens relative to model
                            max_length=max_len,  # set max length
                            truncation=True,  # truncate longer messages
                            pad_to_max_length=True,  # add padding
                            return_attention_mask=True,  # create attn. masks
                            return_tensors='pt'  # return pytorch tensors
                       )

        # add the tokenized sentence to the list
        input_ids.append(encoded_dict['input_ids'])

        # and its attention mask (differentiates padding from non-padding)
        attention_masks.append(encoded_dict['attention_mask'])

    return torch.cat(input_ids, dim=0), torch.cat(attention_masks, dim=0)
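
`tokenize_corpus` relies on the tokenizer for truncation, `</s>` handling and padding. The sketch below mimics that behaviour on plain lists of token IDs, assuming T5's conventions (padding ID 0, `</s>` ID 1) and that truncation leaves room for the final `</s>`. `pad_and_mask` is a hypothetical helper and the token IDs in the example are arbitrary.

```python
def pad_and_mask(token_ids, max_len, eos_id=1, pad_id=0):
    """Truncate, append </s>, and pad one sequence to max_len.

    Hypothetical helper that mimics padding='max_length' + truncation=True.
    `token_ids` are the content IDs without the final </s>.
    Assumes T5 conventions: pad ID 0, </s> ID 1.
    """
    ids = list(token_ids)[: max_len - 1] + [eos_id]  # keep room for </s>
    mask = [1] * len(ids)                            # 1 marks a real token
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 marks padding


ids, mask = pad_and_mask([37, 8, 9], max_len=6)
print(ids)   # [37, 8, 9, 1, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```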

In [None]:
# create tokenized data
train_input_ids, train_attention_masks = tokenize_corpus(train_df['text'].values, tokenizer, 70)

In [None]:
# how long are tokenized targets
ls = []
for i in range(train_df.shape[0]):
    ls.append(len(tokenizer.tokenize(train_df.iloc[i]['label'])))

temp_df = pd.DataFrame({'len_tokens': ls})
temp_df['len_tokens'].mean()

In [None]:
temp_df['len_tokens'].median()

In [None]:
temp_df['len_tokens'].max()

In [None]:
train_target_input_ids, train_target_attention_masks = tokenize_corpus(train_df['label'].values, tokenizer, 2)
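
Every target here is one word followed by `</s>`, so a target length of 2 needs no padding (the mean, median and max cells above check this). If longer or variable-length targets were used, padded label positions would normally be set to -100, the `ignore_index` of the cross-entropy loss in Hugging Face seq2seq models such as `T5ForConditionalGeneration`, so that padding does not contribute to the loss. A stdlib sketch of that masking; `mask_label_padding` is a hypothetical helper and the IDs are arbitrary.

```python
def mask_label_padding(label_ids, pad_id=0, ignore_index=-100):
    """Replace padding IDs in a batch of label sequences with ignore_index.

    Hypothetical helper; assumes T5's padding ID 0.
    """
    return [[ignore_index if tok == pad_id else tok for tok in seq] for seq in label_ids]


# the second target is one token shorter and was padded
print(mask_label_padding([[350, 1], [3, 1, 0]]))  # [[350, 1], [3, 1, -100]]
```

With PyTorch tensors the same effect is usually obtained in place with `labels[labels == tokenizer.pad_token_id] = -100`.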

In [None]:
# create tokenized data
val_input_ids, val_attention_masks = tokenize_corpus(valid_df['text'].values, tokenizer, 70)

In [None]:
# how long are tokenized targets
ls = []
for i in range(valid_df.shape[0]):
    ls.append(len(tokenizer.tokenize(valid_df.iloc[i]['label'])))

temp_df = pd.DataFrame({'len_tokens': ls})
temp_df['len_tokens'].mean()

In [None]:
temp_df['len_tokens'].median()

In [None]:
temp_df['len_tokens'].max()

In [None]:
val_target_input_ids, val_target_attention_masks = tokenize_corpus(valid_df['label'].values, tokenizer, 2)

In [None]:
# create tokenized data
test_input_ids, test_attention_masks = tokenize_corpus(test_df['text'].values, tokenizer, 70)

In [None]:
# how long are tokenized targets
ls = []
for i in range(test_df.shape[0]):
    ls.append(len(tokenizer.tokenize(test_df.iloc[i]['label'])))

temp_df = pd.DataFrame({'len_tokens': ls})
temp_df['len_tokens'].mean()

In [None]:
temp_df['len_tokens'].median()

In [None]:
temp_df['len_tokens'].max()

In [None]:
test_target_input_ids, test_target_attention_masks = tokenize_corpus(test_df['label'].values, tokenizer, 2)

In [35]:
# prepare tensor data sets
def prepare_dataset(body_tokens, body_masks, target_token, target_masks):
    return TensorDataset(body_tokens, body_masks, target_token, target_masks)

In [36]:
train_dataset = prepare_dataset(train_input_ids, train_attention_masks, train_target_input_ids, train_target_attention_masks)
val_dataset = prepare_dataset(val_input_ids, val_attention_masks, val_target_input_ids, val_target_attention_masks)
test_dataset = prepare_dataset(test_input_ids, test_attention_masks, test_target_input_ids, test_target_attention_masks)

In [37]:
def train(model, dataloader, optimizer):

    # capture time
    total_t0 = time.time()

    # Perform one full pass over the training set.
    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch + 1, epochs))
    print('Training...')

    # reset total loss for epoch
    train_total_loss = 0

    # put model into training mode
    model.train()

    # for each batch of training data...
    for step, batch in enumerate(dataloader):

        # progress update every 40 batches.
        if step % 40 == 0 and not step == 0:

            # Report progress.
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(dataloader)))

        # Unpack this training batch from our dataloader and copy each
        # tensor to the GPU with `.cuda()`.
        #
        # `batch` contains four pytorch tensors:
        #   [0]: input tokens
        #   [1]: attention masks
        #   [2]: target tokens
        #   [3]: target attention masks
        b_input_ids = batch[0].cuda()
        b_input_mask = batch[1].cuda()
        b_target_ids = batch[2].cuda()
        b_target_mask = batch[3].cuda()

        # clear previously calculated gradients
        optimizer.zero_grad()

        # runs the forward pass with autocasting.
        with autocast():
            # forward propagation (evaluate model on training batch)
            outputs = model(input_ids=b_input_ids,
                            attention_mask=b_input_mask,
                            labels=b_target_ids,
                            decoder_attention_mask=b_target_mask)

            loss, prediction_scores = outputs[:2]

            # sum the training loss over all batches for average loss at end
            # loss is a tensor containing a single value
            train_total_loss += loss.item()

        # Scales loss.  Calls backward() on scaled loss to create scaled gradients.
        # Backward passes under autocast are not recommended.
        # Backward ops run in the same dtype autocast chose for corresponding forward ops.
        scaler.scale(loss).backward()

        # scaler.step() first unscales the gradients of the optimizer's assigned params.
        # If these gradients do not contain infs or NaNs, optimizer.step() is then called,
        # otherwise, optimizer.step() is skipped.
        scaler.step(optimizer)

        # Updates the scale for next iteration.
        scaler.update()

        # update the learning rate
        scheduler.step()

    # calculate the average loss over all of the batches
    avg_train_loss = train_total_loss / len(dataloader)

    # Record all statistics from this epoch.
    training_stats.append(
        {
            'Train Loss': avg_train_loss
        }
    )

    # training time end
    training_time = format_time(time.time() - total_t0)

    # print result summaries
    print("")
    print("summary results")
    print("epoch | trn loss | trn time ")
    print(f"{epoch+1:5d} | {avg_train_loss:.5f} | {training_time:}")

    return training_stats


def validating(model, dataloader):

    # capture validation time
    total_t0 = time.time()

    # After the completion of each training epoch, measure our performance on
    # our validation set.
    print("")
    print("Running Validation...")

    # put the model in evaluation mode
    model.eval()

    # track variables
    total_valid_loss = 0

    # evaluate data for one epoch
    for batch in dataloader:

        # Unpack this validation batch from our dataloader:
        # `batch` contains four pytorch tensors:
        #   [0]: input tokens
        #   [1]: attention masks
        #   [2]: target tokens
        #   [3]: target attention masks
        b_input_ids = batch[0].cuda()
        b_input_mask = batch[1].cuda()
        b_target_ids = batch[2].cuda()
        b_target_mask = batch[3].cuda()

        # tell pytorch not to bother calculating gradients
        # as it's only necessary for training
        with torch.no_grad():

            # forward propagation (evaluate model on validation batch)
            outputs = model(input_ids=b_input_ids,
                            attention_mask=b_input_mask,
                            labels=b_target_ids,
                            decoder_attention_mask=b_target_mask)

            loss, prediction_scores = outputs[:2]

            # sum the validation loss over all batches for average loss at end
            # loss is a tensor containing a single value
            total_valid_loss += loss.item()

    # calculate the average loss over all of the batches.
    global avg_val_loss
    avg_val_loss = total_valid_loss / len(dataloader)

    # Record all statistics from this epoch.
    valid_stats.append(
        {
            'Val Loss': avg_val_loss,
            'Val PPL.': np.exp(avg_val_loss)
        }
    )

    # capture end validation time
    training_time = format_time(time.time() - total_t0)

    # print result summaries
    print("")
    print("summary results")
    print("epoch | val loss | val ppl | val time")
    print(f"{epoch+1:5d} | {avg_val_loss:.5f} | {np.exp(avg_val_loss):.3f} | {training_time:}")

    return valid_stats


def testing(model, dataloader):

    print("")
    print("Running Testing...")

    # measure test time
    t0 = time.time()

    # put the model in evaluation mode
    model.eval()

    # track variables
    total_test_loss = 0
    total_test_acc = 0
    total_test_f1 = 0
    predictions = []
    actuals = []

    # evaluate data for one epoch
    for step, batch in enumerate(dataloader):
        # progress update every 40 batches.
        if step % 40 == 0 and not step == 0:
            # Calculate elapsed time in minutes.
            elapsed = format_time(time.time() - t0)
            # Report progress.
            print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(dataloader), elapsed))

        # Unpack this test batch from our dataloader:
        # `batch` contains four pytorch tensors:
        #   [0]: input tokens
        #   [1]: attention masks
        #   [2]: target tokens
        #   [3]: target attention masks
        b_input_ids = batch[0].cuda()
        b_input_mask = batch[1].cuda()
        b_target_ids = batch[2].cuda()
        b_target_mask = batch[3].cuda()

        # tell pytorch not to bother calculating gradients
        # as it's only necessary for training
        with torch.no_grad():

            # forward propagation (evaluate model on test batch)
            outputs = model(input_ids=b_input_ids,
                            attention_mask=b_input_mask,
                            labels=b_target_ids,
                            decoder_attention_mask=b_target_mask)

            loss, prediction_scores = outputs[:2]

            total_test_loss += loss.item()

            generated_ids = model.generate(
                    input_ids=b_input_ids,
                    attention_mask=b_input_mask,
                    max_length=3
                    )

            preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]
            target = [tokenizer.decode(t, skip_special_tokens=True, clean_up_tokenization_spaces=True) for t in b_target_ids]

            predictions.extend(preds)
            actuals.extend(target)

    # calculate the average loss over all of the batches.
    avg_test_loss = total_test_loss / len(dataloader)

    # accuracy and weighted F1 over the whole test set
    # (sklearn expects y_true first, then y_pred)
    avg_test_acc = accuracy_score(actuals, predictions)

    avg_test_f1 = f1_score(actuals, predictions, average='weighted')

    # Record all statistics from this epoch.
    test_stats.append(
        {
            'Test Loss': avg_test_loss,
            'Test PPL.': np.exp(avg_test_loss),
            'Test Acc.': avg_test_acc,
            'Test F1': avg_test_f1
        }
    )
    global df2
    temp_data = pd.DataFrame({'predicted': predictions, 'actual': actuals})
    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
    df2 = pd.concat([df2, temp_data], ignore_index=True)

    return test_stats


# time function
def format_time(elapsed):
    '''
    Takes a time in seconds and returns a string hh:mm:ss
    '''
    # Round to the nearest second.
    elapsed_rounded = int(round((elapsed)))
    # Format as hh:mm:ss
    return str(datetime.timedelta(seconds=elapsed_rounded))
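
The loss reported by `train`, `validating` and `testing` is the mean of per-batch mean losses, and the perplexity column is `exp` of that value. Since every target has the same length (one word plus `</s>`), the only distortion comes from a smaller final batch, which counts as much as a full batch of 24. The stdlib sketch below contrasts the two averages; the helper names and loss values are hypothetical.

```python
import math


def mean_of_batch_means(batch_losses):
    """What train/validating/testing compute: the unweighted mean of batch losses."""
    return sum(batch_losses) / len(batch_losses)


def size_weighted_mean(batch_losses, batch_sizes):
    """Weight each batch's mean loss by its number of examples."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


def perplexity(avg_loss):
    """Perplexity as reported in the notebook: exp of the average cross-entropy."""
    return math.exp(avg_loss)


# made-up losses for two full batches of 24 and a final batch of 2
losses, sizes = [0.2, 0.2, 0.8], [24, 24, 2]
print(mean_of_batch_means(losses))        # ~0.4
print(size_weighted_mean(losses, sizes))  # ~0.224
```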

In [None]:
# instantiate model T5 transformer with a language modeling head on top
model = T5ForConditionalGeneration.from_pretrained('t5-small').cuda()  # to GPU

In [None]:
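# shuffle the training data every epoch; keep validation and test order fixed
# so that the saved test predictions line up with test_df
train_dataloader = DataLoader(train_dataset, batch_size=24, shuffle=True)

valid_dataloader = DataLoader(val_dataset, batch_size=24, shuffle=False)

test_dataloader = DataLoader(test_dataset, batch_size=24, shuffle=False)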


# AdamW: Adam with decoupled weight decay
# optimize all model parameters
optimizer = AdamW(model.parameters(), lr = 3e-5)

# epochs
epochs = 6

# lr scheduler
total_steps = len(train_dataloader) * epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=total_steps)

# create gradient scaler for mixed precision
scaler = GradScaler()
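
With `num_warmup_steps=0`, `get_linear_schedule_with_warmup` starts at the optimizer's learning rate (3e-5) and decays it linearly to zero after `total_steps` calls to `scheduler.step()`, which `train` makes once per batch. The stdlib sketch below reproduces that learning-rate curve from the function's documented behaviour (linear warmup, then linear decay); it is an illustration rather than the library's code, and `linear_warmup_decay` is a hypothetical name.

```python
def linear_warmup_decay(step, base_lr, num_warmup_steps, num_training_steps):
    """Learning rate after `step` scheduler steps (linear warmup, then linear decay)."""
    if step < num_warmup_steps:
        factor = step / max(1, num_warmup_steps)
    else:
        remaining = num_training_steps - step
        factor = max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))
    return base_lr * factor


total_steps = 100 * 6  # hypothetical: 100 batches per epoch, 6 epochs
for step in (0, 300, 600):
    print(step, linear_warmup_decay(step, 3e-5, 0, total_steps))
```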

In [None]:
# create training result storage
training_stats = []
valid_stats = []
best_valid_loss = float('inf')
output_dir = 'path'

# for each epoch
for epoch in range(epochs):
    # train
    train(model, train_dataloader, optimizer)
    # validate
    validating(model, valid_dataloader)
    # check validation loss
    if valid_stats[epoch]['Val Loss'] < best_valid_loss:
        best_valid_loss = valid_stats[epoch]['Val Loss']
        # save best model for use later
        torch.save(model.state_dict(), 't5-classification.pt')  # torch save
        model_to_save = model.module if hasattr(model, 'module') else model
        model_to_save.save_pretrained(output_dir)  # transformers save
        tokenizer.save_pretrained(output_dir)  # transformers save
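
Because the checkpoint in `output_dir` is overwritten every time the validation loss strictly improves, the model reloaded for testing is the one from the epoch with the lowest validation loss (the earliest such epoch in case of ties). A small sketch of that selection rule with made-up losses; `saved_epochs` is a hypothetical helper.

```python
def saved_epochs(val_losses):
    """Return the 1-based epochs at which the training loop would save a checkpoint."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strict improvement, as in the loop above
            best = loss
            saved.append(epoch)
    return saved


print(saved_epochs([0.90, 0.70, 0.75, 0.60, 0.60, 0.65]))  # [1, 2, 4]
```

The last entry (epoch 4 here) is the model that `from_pretrained(output_dir)` loads for testing.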

In [None]:
# organize results
df_train_stats = pd.DataFrame(data=training_stats)
df_valid_stats = pd.DataFrame(data=valid_stats)
df_stats = pd.concat([df_train_stats, df_valid_stats], axis=1)
df_stats.insert(0, 'Epoch', range(1, len(df_stats)+1))
df_stats = df_stats.set_index('Epoch')
df_stats

In [42]:
# test the model
df2 = pd.DataFrame({'predicted': [], 'actual': []})
test_stats = []

model = T5ForConditionalGeneration.from_pretrained(output_dir).cuda()
tokenizer = T5Tokenizer.from_pretrained(output_dir)

In [None]:
testing(model, test_dataloader)

In [None]:
df_test_stats = pd.DataFrame(data=test_stats)
print(df_test_stats)

In [None]:
confusion_matrix = pd.crosstab(df2['actual'], df2['predicted'], rownames=['Target Class'], colnames=['Output Class'])

sn.set(font_scale=1.1) # for label size
sn.heatmap(confusion_matrix, annot=True, fmt=".0f", annot_kws={"size": 13}, cmap='Blues')

plt.show()
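
`pd.crosstab` builds the confusion matrix by counting (target, output) pairs, with true classes as rows and generated outputs as columns. Because generation is not restricted to the four target words, any unexpected output string would show up as an extra column. A stdlib sketch of the same counting with hypothetical labels; `confusion_rows` is not part of the notebook.

```python
from collections import Counter


def confusion_rows(actual, predicted, labels):
    """Confusion matrix as nested lists: rows = target class, columns = output class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(t, p)] for p in labels] for t in labels]


actual = ["first", "first", "second", "fourth"]
predicted = ["first", "second", "second", "fourth"]
for row in confusion_rows(actual, predicted, ["first", "second", "fourth"]):
    print(row)
# [1, 1, 0]
# [0, 1, 0]
# [0, 0, 1]
```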

In [None]:
test_error = pd.DataFrame({'true_label': df2['actual'], 'result': df2['predicted']})

test_error.to_csv('test_semantic_results_T5.csv', index=False)
!mkdir -p Results
!cp test_semantic_results_T5.csv Results/
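
The saved CSV holds the target words ('first' to 'fourth') rather than the dataset's labels. The sketch below inverts the mapping applied during data preparation and flags any generated string outside the four target words; `to_original_labels` and the placeholder label `'invalid'` are hypothetical. Applied to `df2['actual']` and `df2['predicted']` before saving, it would let the classification report below show 'cc', 'ce', 'ec' and 'ee' directly.

```python
TARGET_TO_LABEL = {"first": "cc", "second": "ce", "third": "ec", "fourth": "ee"}


def to_original_labels(decoded, unknown="invalid"):
    """Map decoded T5 outputs back to the dataset labels.

    Generation is not constrained, so any string other than the four
    target words is mapped to `unknown` (a hypothetical placeholder).
    """
    return [TARGET_TO_LABEL.get(text.strip(), unknown) for text in decoded]


print(to_original_labels(["first", " fourth", "fifth"]))  # ['cc', 'ee', 'invalid']
```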

In [None]:
test_path = 'Results/test_semantic_results_T5.csv'

col_names = ['true_label','result']
test_error = pd.read_csv(test_path,skiprows=1,sep=',',names=col_names,encoding = "ISO-8859-1")

test_error

In [None]:
data = pd.DataFrame({'prediction':test_error['result'], 'true_label':test_error['true_label']})

# precision tp / (tp + fp)
precision = precision_score(data['true_label'], data['prediction'], average = 'macro')
print('Precision: %f' % precision)
# recall: tp / (tp + fn)
recall = recall_score(data['true_label'], data['prediction'], average = 'macro')
print('Recall: %f' % recall)
# f1: 2 tp / (2 tp + fp + fn)
f1 = f1_score(data['true_label'], data['prediction'], average = 'macro')
print('F1 score: %f' % f1)
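
With `average='macro'`, scikit-learn computes precision, recall and F1 separately for each class and then takes their unweighted mean, so every class counts equally regardless of its size (unlike the weighted F1 reported by `testing`). A stdlib sketch of that computation from per-class tp, fp and fn counts, useful for checking the numbers above by hand; `macro_prf` is a hypothetical helper, and a class with a zero denominator scores 0, matching scikit-learn's default `zero_division` behaviour.

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 from per-class tp/fp/fn counts."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        # 2 tp / (2 tp + fp + fn), the same formula as in the comments above
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


print(macro_prf(["cc", "cc", "ce", "ee"], ["cc", "ce", "ce", "ee"]))
```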

In [None]:
print("Classification report for classifier:\n%s\n"
      % (metrics.classification_report(test_error['true_label'], test_error['result'])))