### Flint the text

In [3]:
%load_ext autoreload
%autoreload 2

import openai

import re
import glob
import os
import dotenv

dotenv.load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

from textflint import Engine
from textflint.adapter import Config, auto_config
from textflint.input.dataset import Dataset

In [1]:
# Transformation outputs directory
outdir = './transformed_data'

## Universal Transformations List
Note: TextFlint warns that these universal transformations are not compatible with every task, so we must take care when applying them "universally".

* **BackTrans**: Replaces test data with paraphrases produced by back translation (Trans is short for translation), which helps reveal whether a target model captures only literal features rather than semantic meaning.
* **InsertAdv**: Inserts an adverb before a verb.
* **AppendIrr**: Extends sentences with irrelevant sentences.
* **WordCase**: Converts the input to upper case, lower case, or capitalized case.
* **Contraction**: Replaces phrases like `will not` and `he has` with their contracted forms, `won’t` and `he’s`.
* **Keyboard**: Simulates the mistakes people make while typing by replacing tokens with keyboard-induced errors, such as `word → worf` and `ambiguous → amviguius`.
* **SwapNum**: Replaces the numbers in the input.
* **Ocr**: Simulates OCR errors using random values.
* **Punctuation**: Adds punctuation at the end of a sentence.
* **SpellingError**: Uses a predefined dictionary of spelling mistakes to simulate misspellings.
* **Tense**: Changes the tense of every verb in a sentence.
* **TwitterType**: Rewrites the input with abbreviations common on Twitter.
* **Typos**: Randomly inserts, deletes, swaps, or replaces a single letter within one word (`Ireland → Irland`).
* **SwapSynWordEmbedding**: Replaces words in the input with similar words from GloVe embeddings.
* **SwapSynWordNet**: Replaces words in the input with synonyms from WordNet.
* **Prejudice**: Reverses gender or place names in sentences.
* **MLMSuggestion**: Generates new sentences in which one syntactic category element of the original sentence is replaced by the prediction of a masked language model (MLM).
* **ReverseNeg**: Transforms an affirmative sentence into a negative sentence, or vice versa.
* **SwapAntWordNet**: Replaces words in the input with antonyms from WordNet.
* **SwapNamedEnt**: Swaps named entities with other entities of the same category.
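
To make two of these concrete, here is a minimal pure-Python sketch of what WordCase-style and Typos-style perturbations do to a string. It is an illustration only, not TextFlint's implementation; the function names `word_case` and `typo` and the mode names are hypothetical.

```python
import random

def word_case(text: str, mode: str = "upper") -> str:
    """Toy WordCase: change the casing of the whole input."""
    if mode == "upper":
        return text.upper()
    if mode == "lower":
        return text.lower()
    if mode == "title":
        return text.title()
    raise ValueError(f"unknown mode: {mode}")

def typo(word: str, rng: random.Random) -> str:
    """Toy Typos: insert, delete, swap, or replace a single letter."""
    if len(word) < 2:
        return word  # too short to perturb safely
    letters = "abcdefghijklmnopqrstuvwxyz"
    op = rng.choice(["insert", "delete", "swap", "replace"])
    i = rng.randrange(len(word) - 1)  # leaves room for the swap with i + 1
    if op == "insert":
        return word[:i] + rng.choice(letters) + word[i:]
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    return word[:i] + rng.choice(letters) + word[i + 1:]

rng = random.Random(0)  # fixed seed so the perturbation is repeatable
print(word_case("A great movie", "upper"))
print(typo("Ireland", rng))
```

Seeding the generator keeps a perturbed test set reproducible across runs, which matters when comparing model scores on original and transformed data.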

### Sentiment Analysis

In [6]:
data_path = './data/aclImdb.json'

config = auto_config(task = 'SA')
config.trans_methods = [
    'WordCase',
    'TwitterType',
    'Ocr'
]
config.trans_config = {}
config.sub_methods = []
config.out_dir = f'{outdir}/SA'

engine = Engine()
engine.run(data_path, config)

[34;1mTextFlint[0m: ******Start load!******
100%|██████████| 100/100 [00:00<00:00, 786.04it/s]
TextFlint: 100 in total, 100 were loaded successful.
TextFlint: ******Finish load!******
TextFlint: Downloading http://textflint.oss-cn-beijing.aliyuncs.com/download/UT_DATA/twitter_contraction.json.
100%|██████████| 750/750 [00:00<00:00, 2.66MB/s]
TextFlint: Copying /home/marko/.cache/textflint/tmpnk25hcns to /home/marko/.cache/textflint/UT_DATA/twitter_contraction.json.
TextFlint: Successfully saved UT_DATA/twitter_contraction.json to cache.
TextFlint: ******Start WordCase_upper!******
100%|██████████| 100/100 [00:00<00:00, 257.62it/s]
TextFlint: WordCase_upper, original 100 samples, transform 100 samples!
TextFlint: Save samples to ./transformed_data/SA/ori_WordCase_upper_100.json!
TextFlint: Save samples to ./transformed_data/SA/trans_WordCase_upper_100.json!
TextFlint: ******Fi

### Natural Language Inference

In [58]:
data_path = './data/snli_mini.json' # 1000 samples

config = auto_config(task = 'NLI')
config.trans_methods = [
    'WordCase',
    'TwitterType',
    'Ocr',
    'AppendIrr'
]

config.trans_config = {}
config.sub_methods = []
config.out_dir = f'{outdir}/NLI'

engine = Engine()
engine.run(data_path, config)

# data_path = './data/snli_large.json' # 9000 samples
# config.trans_methods = ['NumWord', 'SwapAnt']
# engine.run(data_path, config)

[34;1mTextFlint[0m: ******Start load!******
100%|██████████| 988/988 [00:00<00:00, 119937.84it/s]
TextFlint: 988 in total, 988 were loaded successful.
TextFlint: ******Finish load!******
TextFlint: ******Start AddSent!******
100%|██████████| 988/988 [00:00<00:00, 2454.91it/s]
TextFlint: AddSent, original 988 samples, transform 988 samples!
TextFlint: Save samples to ./transformed_data/NLI/ori_AddSent_988.json!
TextFlint: Save samples to ./transformed_data/NLI/trans_AddSent_988.json!
TextFlint: ******Finish AddSent!******
TextFlint: ******Start BackTrans!******
100%|██████████| 988/988 [09:06<00:00,  1.81it/s]
TextFlint: BackTrans, original 988 samples, transform 988 samples!
TextFlint: Save samples to ./transformed_data/NLI/ori_BackTrans_988.json!
TextFlint: Save samples to ./transformed_data/NLI/trans_BackTrans_988.json!
TextFlint: ******Finish BackTran

### Machine Reading Comprehension

In [45]:
data_path = './data/squad2.0.json'

config = auto_config(task = 'MRC')
config.trans_methods = [
    'ModifyPos',
    'PerturbAnswer'
]
config.trans_config = {}
config.sub_methods = []
config.out_dir = f'{outdir}/MRC'

engine = Engine()
engine.run(data_path, config)

[34;1mTextFlint[0m: ******Start load!******
  0%|          | 0/1250 [00:00<?, ?it/s]
TextFlint: Data sample {'title': '1973_oil_crisis', 'context': 'Price controls exacerbated the crisis in the US. The system limited the price of "old oil" (that which had already been discovered) while allowing newly discovered oil to be sold at a higher price to encourage investment. Predictably, old oil was withdrawn from the market, creating greater scarcity. The rule also discouraged development of alternative energies. The rule had been intended to promote oil exploration. Scarcity was addressed by rationing (as in many countries). Motorists faced long lines at gas stations beginning in summer 1972 and increasing by summer 1973.', 'question': 'Why was old oil withdrawn from the market?', 'answers': [{'text': 'Price controls', 'answer_start': 0}, {'text': 'Price controls', 'answer_start': 0}, {'text': 'promote oil exploration', 'answer_start': 394}, {'text': 'discouraged development of 

Token ` cannot be found


100%|██████████| 1212/1212 [00:08<00:00, 143.74it/s]
TextFlint: PerturbAnswer, original 1212 samples, transform 539 samples!
TextFlint: Save samples to ./transformed_data/MRC/ori_PerturbAnswer_539.json!
TextFlint: Save samples to ./transformed_data/MRC/trans_PerturbAnswer_539.json!
TextFlint: ******Finish PerturbAnswer!******


Token ` cannot be found


In [None]:
data_path = './data/reclor.json'

config = auto_config(task = 'MRC')
config.trans_methods = [
    'PerturbAnswer'
]
config.trans_config = {}
config.sub_methods = []
config.out_dir = f'{outdir}/MRC_plus'

engine = Engine()
engine.run(data_path, config)

In [31]:
sample = {
    "context": "It takes a particular talent to be a successful business manager. Business courses can help people to solve management problems, but such courses can do so only for those people with managerial talent. Such people should take business courses to acquire ideas that they can subsequently use to good advantage if management problems happen to arise.",
    "question": "If the statements above are true, which of the following must also be true on the basis of them?",
    "answers": [
        "People who are helped by business courses in solving management problems also have managerial talent.",
        "Those people who have never taken business courses are unable to solve management problems when such problems arise.",
        "People who lack managerial talent are more likely to take business courses than are people who have managerial talent.",
        "People who are already skilled at solving management problems are unlikely to benefit from business courses."
    ],
    "label": 0,
    "id_string": "val_1"
}

print("Answers: \n" + '\n'.join(sample['answers']))

Answers: 
People who are helped by business courses in solving management problems also have managerial talent.
Those people who have never taken business courses are unable to solve management problems when such problems arise.
People who lack managerial talent are more likely to take business courses than are people who have managerial talent.
People who are already skilled at solving management problems are unlikely to benefit from business courses.


In [59]:
prompt = {"hypothesis": "The church has cracks in the ceiling.", "premise": "This church choir sings to the masses as they sing joyous songs from the book at a church.", "y": "neutral", "sample_id": 0}

In [60]:
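# Build the request from `sample` (the multiple-choice example defined above):
# the NLI-style `prompt` dict has no 'context', 'question', or answer-choice keys.
answer_choices = '\n'.join(f"{i}: {a}" for i, a in enumerate(sample['answers']))

response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[
        {"role": "system", "content": "You are performing Machine Reading Comprehension. "
                                      "When given a question, respond with the index of the correct answer based on the context."},
        # Adjacent string literals avoid embedding the source indentation in the prompt.
        {"role": "user", "content": f"Context: {sample['context']}\n"
                                    f"Question: {sample['question']}\n"
                                    f"Answers:\n{answer_choices}"},
    ],
    temperature=0
)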

In [61]:
response

<OpenAIObject chat.completion id=chatcmpl-7GJd339j7yNja7fCqpXJ6DHJv0kza at 0x7ff213a5fbf0> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "neutral",
        "role": "assistant"
      }
    }
  ],
  "created": 1684123057,
  "id": "chatcmpl-7GJd339j7yNja7fCqpXJ6DHJv0kza",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 1,
    "prompt_tokens": 82,
    "total_tokens": 83
  }
}
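
The reply text sits at `choices[0].message.content` in the JSON above. Below is a minimal sketch for pulling it out and comparing it with a gold label. It uses a plain dict that mirrors the structure shown rather than a live API call. `extract_reply` and `is_correct` are hypothetical helpers, and the normalization (strip and lowercase) is an assumption about how strictly to match.

```python
def extract_reply(response) -> str:
    """Return the assistant's text from a chat-completion-shaped mapping."""
    return response["choices"][0]["message"]["content"].strip()

def is_correct(response, gold) -> bool:
    """Case-insensitive exact match between the reply and the gold label."""
    return extract_reply(response).lower() == str(gold).strip().lower()

# Plain-dict stand-in with the same shape as the response printed above.
mock_response = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "neutral", "role": "assistant"},
        }
    ]
}

print(extract_reply(mock_response))
print(is_correct(mock_response, "neutral"))
```

Exact matching is brittle when the model adds extra words around the label, so a looser comparison may be needed for longer replies.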

In [44]:
context = "Price controls exacerbated the crisis in the US. The system limited the price of \"old oil\" (that which had already been discovered) while allowing newly discovered oil to be sold at a higher price to encourage investment. Predictably, old oil was withdrawn from the market, creating greater scarcity. The rule also discouraged development of alternative energies. The rule had been intended to promote oil exploration. Scarcity was addressed by rationing (as in many countries). Motorists faced long lines at gas stations beginning in summer 1972 and increasing by summer 1973."

len(context)

577
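
SQuAD-style answers locate their text through a character offset (`answer_start`) into the context, so offsets and context length matter once a transformation edits the passage. Below is a minimal sketch of a consistency check, assuming each answer is a dict with `text` and `answer_start` keys as in the data sample logged earlier. `span_matches` is a hypothetical helper, and the short context is only the first sentence of the passage.

```python
def span_matches(context: str, answer: dict) -> bool:
    """True when the answer text appears in the context at its stated offset."""
    start = answer["answer_start"]
    end = start + len(answer["text"])
    return 0 <= start and end <= len(context) and context[start:end] == answer["text"]

short_context = "Price controls exacerbated the crisis in the US."

# Offset 0 points at the answer text; offset 5 does not.
print(span_matches(short_context, {"text": "Price controls", "answer_start": 0}))
print(span_matches(short_context, {"text": "the crisis", "answer_start": 5}))
```

Running such a check over both the `ori_*` and `trans_*` files is one way to spot samples whose answer spans no longer line up after a transformation.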