# Text Generation: How to run inference on the endpoint you have created?

In [2]:
import json
import boto3

### Query endpoint that you have created
***
The following cell provides a helper function that will be used to query your endpoint with boto3.
***

In [3]:
endpoint_name = 'jumpstart-dft-llama-codellama-7b-instruct'

def query_endpoint(payload):
    client = boto3.client('runtime.sagemaker')
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',
        Body=json.dumps(payload).encode('utf-8'),
        CustomAttributes="accept_eula=true",
    )
    response = response["Body"].read().decode("utf8")
    response = json.loads(response)
    return response


### Supported parameters

***
This model supports many parameters while performing inference. They include:

* **max_length:** Model generates text until the output length (which includes the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **max_new_tokens:** Model generates text until the output length (excluding the input context length) reaches `max_new_tokens`. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in the greedy search. If specified, it must be integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** Model ensures that a sequence of words of `no_repeat_ngram_size` is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness in the output. Higher temperature results in output sequence with low-probability words and lower temperature results in output sequence with high-probability words. If `temperature` -> 0, it results in greedy decoding. If specified, it must be a positive float.
* **early_stopping:** If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be boolean.
* **do_sample:** If True, sample the next word as per the likelihood. If specified, it must be boolean.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, input text will be part of the output generated text. If specified, it must be boolean. The default value for it is False.
* **stop**: If specified, it must a list of strings. Text generation stops if any one of the specified strings is generated.

We may specify any subset of the parameters mentioned above while invoking an endpoint. Next, we show an example of how to invoke endpoint with these arguments.
***

## Code completion
***
The examples in this section demonstrate how to perform code generation where the expected endpoint response is the natural continuation of the prompt.
***

In [4]:
def print_completion(prompt: str, response: str) -> None:
    bold, unbold = '\033[1m', '\033[0m'
    print(f"{bold}> Input{unbold}\n{prompt}{bold}\n> Output{unbold}\n{response['generated_text']}\n")

In [5]:
%%time

prompt = """\
import socket

def ping_exponential_backoff(host: str):\
"""

payload = {"inputs": prompt, "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9}}
response = query_endpoint(payload)
print_completion(prompt, response)

[1m> Input[0m
import socket

def ping_exponential_backoff(host: str):[1m
> Output[0m

    """
    Ping a host with exponential backoff.
    :param host: The host to ping.
    :return: The response time in milliseconds.
    """
    # Setup the socket.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1)

    # Setup the exponential backoff.
    backoff = 1

    # Ping the host.
    while True:
        try:
            start = time.time()
            sock.sendto(b'', (host, 0))
            sock.recv(1024)
            end = time.time()
            return (end - start) * 1000
        except socket.timeout:
            # Wait for the backoff time.
            time.sleep(backoff)

            # Increase the backoff.
            backoff *= 2


def ping_exponential_backoff_with_timeout(host: str, timeout: float):
    """
    Ping

CPU times: user 113 ms, sys: 18.5 ms, total: 132 ms
Wall time: 8.34 s


In [6]:
%%time

prompt = """\
import argparse

def main(string: str):
    print(string)
    print(string[::-1])

if __name__ == "__main__":\
"""

payload = {"inputs": prompt, "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9}}
response = query_endpoint(payload)
print_completion(prompt, response)

[1m> Input[0m
import argparse

def main(string: str):
    print(string)
    print(string[::-1])

if __name__ == "__main__":[1m
> Output[0m

    parser = argparse.ArgumentParser(description='Reverse a string')
    parser.add_argument('string', type=str, help='String to reverse')
    args = parser.parse_args()
    main(args.string)

CPU times: user 20.3 ms, sys: 2.81 ms, total: 23.2 ms
Wall time: 1.84 s


## Code infilling
***
The examples in this section demonstrate how to perform code generation where the expected endpoint response infills text between a prefix and a suffix. Only 7B, 7B-Instruct, 13B, and 13B-Instruct models have this capability, while the non-instruct models have been observed to obtain the best anecdotal performance.
***

In [7]:
def format_infilling(prompt: str) -> str:
    prefix, suffix = prompt.split("<FILL>")
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"


def print_infilling(prompt: str, response: str) -> str:
    green, font_reset = "\x1b[38;5;2m", "\x1b[0m"
    prefix, suffix = prompt.split("<FILL>")
    print(f"{prefix}{green}{response['generated_text']}{font_reset}{suffix}")

In [8]:
%%time

prompt = '''\
def remove_non_ascii(s: str) -> str:
    """<FILL>
    return result
'''
prompt_formatted = format_infilling(prompt)
payload = {
    "inputs": prompt_formatted,
    "parameters": {"max_new_tokens": 256, "temperature": 0.05, "top_p": 0.9}
}
response = query_endpoint(payload)
print_infilling(prompt, response)

def remove_non_ascii(s: str) -> str:
    """[38;5;2m
    Remove non-ASCII characters from a string.
    """
    result = ""
    for c in s:
        if ord(c) < 128:
            result += c
    return result


def remove_non_ascii_and_spaces(s: str) -> str:
    """
    Remove non-ASCII characters and spaces from a string.
    """
    result = ""
    for c in s:
        if ord(c) < 128 and c != " ":
            result += c
    return result


def remove_non_ascii_and_spaces_and_newlines(s: str) -> str:
    """
    Remove non-ASCII characters, spaces and newlines from a string.
    """
    result = ""
    for c in s:
        if ord(c) < 128 and c not in [" ", "\n"]:
            result += c
    return result


def remove_non_ascii_and_spaces_and_newlines_and_tabs(s: str) -> str:
    """[0m
    return result

CPU times: user 18.2 ms, sys: 260 µs, total: 18.5 ms
Wall time: 8.01 s


In [9]:
%%time

prompt = """\
# Installation instructions:
    ```bash
<FILL>
    ```
This downloads the LLaMA inference code and installs the repository as a local pip package.
"""
prompt_formatted = format_infilling(prompt)
payload = {
    "inputs": prompt_formatted,
    "parameters": {"max_new_tokens": 256, "temperature": 0.05, "top_p": 0.9}
}
response = query_endpoint(payload)
print_infilling(prompt, response)

# Installation instructions:
    ```bash
[38;5;2m   git clone https://github.com/facebookresearch/LLaMA.git
    cd LLaMA
    pip install -e .
    ```

# Installation instructions for developers:
    ```bash
    git clone https://github.com/facebookresearch/LLaMA.git
    cd LLaMA
    pip install -e .[dev]
    ```

# Installation instructions for users:
    ```bash
    pip install lamalib
    ```

# Installation instructions for users (with GPU support):
    ```bash
    pip install lamalib[gpu]
    ```

# Installation instructions for users (with TensorFlow support):
    ```bash
    pip install lamalib[tf]
    ```

# Installation instructions for users (with TensorFlow and GPU support):
    ```bash
    pip install lamalib[tf,gpu]
    ```

# Installation instructions for users (with PyTorch support):
    ```bash
    pip install lamalib[pt]
    ```

# Installation instructions for users (with Py[0m
    ```
This downloads the LLaMA inference code and installs the repository as a local pip

In [10]:
%%time

prompt = """\
class InterfaceManagerFactory(AbstractManagerFactory):
    def __init__(<FILL>
def main():
    factory = InterfaceManagerFactory(start=datetime.now())
    managers = []
    for i in range(10):
        managers.append(factory.build(id=i))
"""
prompt_formatted = format_infilling(prompt)
payload = {
    "inputs": prompt_formatted,
    "parameters": {"max_new_tokens": 256, "temperature": 0.05, "top_p": 0.9}
}
response = query_endpoint(payload)
print_infilling(prompt, response)

class InterfaceManagerFactory(AbstractManagerFactory):
    def __init__([38;5;2mself, start=None):
        self.start = start

    def build(self, id):
        return InterfaceManager(id, self.start)


class InterfaceManager(AbstractManager):
    def __init__(self, id, start):
        self.id = id
        self.start = start

    def run(self):
        print("Running interface manager %s" % self.id)


if __name__ == "__main__":
    main()


# ------------------------------------------------------------------------------
#
# TESTS
#
# ------------------------------------------------------------------------------


def test_interface_manager_factory():
    factory = InterfaceManagerFactory(start=datetime.now())
    managers = []
    for i in range(10):
        managers.append(factory.build(id=i))

    for manager in managers:
        manager.run()


# ------------------------------------------------------------------------------
#
# MAIN
#
# ----------------------------------------------

In [11]:
%%time

prompt = """\
/-- A quasi-prefunctoid is 1-connected iff all its etalisations are 1-connected. -/
theorem connected_iff_etalisation [C D : precategoroid] (P : quasi_prefunctoid C D) :
  π₁ P = 0 ↔ <FILL> = 0 :=
begin
  split,
  { intros h f,
    rw pi_1_etalisation at h,
    simp [h],
    refl
  },
  { intro h,
    have := @quasi_adjoint C D P,
    simp [←pi_1_etalisation, this, h],
    refl
  }
end
"""
prompt_formatted = format_infilling(prompt)
payload = {
    "inputs": prompt_formatted,
    "parameters": {"max_new_tokens": 256, "temperature": 0.05, "top_p": 0.9}
}
response = query_endpoint(payload)
print_infilling(prompt, response)

/-- A quasi-prefunctoid is 1-connected iff all its etalisations are 1-connected. -/
theorem connected_iff_etalisation [C D : precategoroid] (P : quasi_prefunctoid C D) :
  π₁ P = 0 ↔ [38;5;2mπ₁ (etalise P) = 0 :=
begin
  split,
  { intros h f,
    rw pi_1_etalisation at h,
    simp [h],
    refl
  },
  { intro h,
    have := @quasi_adjoint C D P,
    simp [←pi_1_etalisation, this, h],
    refl
  }
end

/-- A quasi-prefunctoid is 1-connected iff all its etalisations are 1-connected. -/
theorem connected_iff_etalisation' [C D : precategoroid] (P : quasi_prefunctoid C D) :
  π₁ P = 0 ↔ π₁ (etalise P) = 0 :=
begin
  split,
  { intros h f,
    rw pi_1_etalisation at h,
    simp [h],
    refl
  },
  { intro h,
    have := @quasi_adjoint C D P,
    simp [←pi_1_etalisation, this, h],
    re[0m = 0 :=
begin
  split,
  { intros h f,
    rw pi_1_etalisation at h,
    simp [h],
    refl
  },
  { intro h,
    have := @quasi_adjoint C D P,
    simp [←pi_1_etalisation, this, h],
    refl
  }
end

C

## Code instructions
***
The examples in this section demonstrate how to perform code generation where the expected endpoint response infills text between a prefix and a suffix.
The CodeLlama format for instructions is the same as the Llama-2 Chat prompt format. These code instructions can only be applied to the instruction-tuned CodeLlama models, which are the models with a model ID `instruct` suffix.
***

In [12]:
from typing import Dict, List


def format_instructions(instructions: List[Dict[str, str]]) -> List[str]:
    """Format instructions for CodeLlama.
    
    The model only supports 'system', 'user' and 'assistant' roles, starting with 'system', then 'user' and 
    alternating (u/a/u/a/u...). The last message must be from 'user'.
    """
    prompt: List[str] = []

    if instructions[0]["role"] == "system":
        content = "".join(["<<SYS>>\n", instructions[0]["content"], "\n<</SYS>>\n\n", instructions[1]["content"]])
        instructions = [{"role": instructions[1]["role"], "content": content}] + instructions[2:]

    for user, answer in zip(instructions[::2], instructions[1::2]):
        prompt.extend(["<s>", "[INST] ", (user["content"]).strip(), " [/INST] ", (answer["content"]).strip(), "</s>"])

    prompt.extend(["<s>", "[INST] ", (instructions[-1]["content"]).strip(), " [/INST] "])

    return "".join(prompt)


def print_instructions(prompt: str, response: str) -> None:
    bold, unbold = '\033[1m', '\033[0m'
    print(f"{bold}> Input{unbold}\n{prompt}\n\n{bold}> Output{unbold}\n{response['generated_text']}\n")


In [13]:
%%time

instructions = [
    {
        "role": "user",
        "content": "In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month?",
    }
]
prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9}
}
response = query_endpoint(payload)
print_instructions(prompt, response)


[1m> Input[0m
<s>[INST] In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month? [/INST] 

[1m> Output[0m
 You can use the `find` command in Bash to list all text files in the current directory (excluding subdirectories) that have been modified in the last month. Here's an example command:
```
find . -type f -name "*.txt" -mtime -30
```
Here's what each part of the command does:

* `.`: This is the current directory.
* `-type f`: This option tells `find` to only search for files (not directories).
* `-name "*.txt"`: This option tells `find` to only search for files with the `.txt` extension.
* `-mtime -30`: This option tells `find` to only search for files that have been modified in the last 30 days.

The `-mtime` option takes a numerical argument, which specifies the number of days. In this case, `-30` means "modified in the last 30 days".

You can also use `-mtime +30` to search for files that have been m

In [14]:
%%time

instructions = [
    {
        "role": "user",
        "content": "What is the difference between inorder and preorder traversal? Give an example in Python.",
    }
]
prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9}
}
response = query_endpoint(payload)
print_instructions(prompt, response)

[1m> Input[0m
<s>[INST] What is the difference between inorder and preorder traversal? Give an example in Python. [/INST] 

[1m> Output[0m
 Inorder and preorder traversal are two different ways of traversing a binary tree.

Inorder traversal visits the left subtree, then the root, and finally the right subtree. The order of the traversal is: left, root, right.

Preorder traversal visits the root, then the left subtree, and finally the right subtree. The order of the traversal is: root, left, right.

Here is an example of inorder and preorder traversal in Python:
```
def inorder_traversal(root):
    if root:
        inorder_traversal(root.left)
        print(root.data)
        inorder_traversal(root.right)

def preorder_traversal(root):
    if root:
        print(root.data)
        preorder_traversal(root.left)
        preorder_traversal(root.right)
```
In the example above, `inorder_traversal` visits the left subtree, then the root, and finally the right subtree. `preorder_traversa

In [15]:
%%time

instructions = [
    {
        "role": "system",
        "content": "Provide answers in JavaScript",
    },
    {
        "role": "user",
        "content": "Write a function that computes the set of sums of all contiguous sublists of a given list.",
    }
]
prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9}
}
response = query_endpoint(payload)
print_instructions(prompt, response)

[1m> Input[0m
<s>[INST] <<SYS>>
Provide answers in JavaScript
<</SYS>>

Write a function that computes the set of sums of all contiguous sublists of a given list. [/INST] 

[1m> Output[0m
 Here's a possible implementation of the function in JavaScript:
```
function contiguousSums(list) {
  const sums = [];
  for (let i = 0; i < list.length; i++) {
    let sum = 0;
    for (let j = i; j < list.length; j++) {
      sum += list[j];
    }
    sums.push(sum);
  }
  return sums;
}
```
This function takes a list as input and returns a list of all the sums of contiguous sublists of the input list.

For example, if the input list is `[1, 2, 3, 4, 5]`, the output list would be `[15, 14, 12, 7, 3, 0]`.

Here's an explanation of how the function works:

1. We initialize an empty list `sums` to store the results.
2. We iterate over the input list `list` and for each element `list[i]`, we compute the sum of all elements from `list[i]` to the end of

CPU times: user 19.5 ms, sys: 0 ns, total: 19.

## Importing Dataset

In [16]:
import pandas as pd

bucket='s3://complaints.csv'
data_key = 'complaints.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

data = pd.read_csv(data_location)

In [17]:
data.head(10)

Unnamed: 0,Date received,Product,Sub-product,Issue,Sub-issue,Consumer complaint narrative,Company public response,Company,State,ZIP code,Tags,Consumer consent provided?,Submitted via,Date sent to company,Company response to consumer,Timely response?,Consumer disputed?,Complaint ID
0,4/26/18,Credit card or prepaid card,General-purpose credit card or charge card,Fees or interest,Unexpected increase in interest rate,,,JPMORGAN CHASE & CO.,FL,32225,,Consent not provided,Web,4/26/18,Closed with explanation,Yes,,2887783
1,7/18/23,"Credit reporting, credit repair services, or o...",Credit reporting,Improper use of your report,Credit inquiries on your report that you don't...,,,CAPITAL ONE FINANCIAL CORPORATION,GA,30082,Older American,Consent not provided,Web,7/18/23,Closed with non-monetary relief,Yes,,7276310
2,11/17/17,Checking or savings account,Checking account,Managing an account,Cashing a check,,Company has responded to the consumer and the ...,"BANK OF AMERICA, NATIONAL ASSOCIATION",MA,1118,,,Referral,11/22/17,Closed with explanation,Yes,,2736589
3,1/21/21,"Money transfer, virtual currency, or money ser...",Domestic (US) money transfer,Fraud or scam,,On XX/XX/20 {$33000.00} was wired from my bank...,,JPMORGAN CHASE & CO.,FL,32137,,Consent provided,Web,1/21/21,Closed with explanation,Yes,,4082404
4,9/23/14,Bank account or service,Checking account,"Account opening, closing, or management",,,,WELLS FARGO & COMPANY,TN,37660,Servicemember,,Phone,9/29/14,Closed with explanation,Yes,No,1040743
5,8/23/20,Credit card or prepaid card,General-purpose credit card or charge card,Getting a credit card,Card opened as result of identity theft or fraud,,,CAPITAL ONE FINANCIAL CORPORATION,FL,32607,,Consent not provided,Web,8/23/20,Closed with explanation,Yes,,3809841
6,7/26/22,"Credit reporting, credit repair services, or o...",Credit reporting,Incorrect information on your report,Account status incorrect,,Company has responded to the consumer and the ...,"BANK OF AMERICA, NATIONAL ASSOCIATION",PA,19405,Servicemember,Consent not provided,Web,7/26/22,Closed with explanation,Yes,,5809964
7,2/14/23,"Credit reporting, credit repair services, or o...",Credit reporting,Incorrect information on your report,Account status incorrect,,,CAPITAL ONE FINANCIAL CORPORATION,PA,19380,Servicemember,Consent not provided,Web,2/14/23,Closed with explanation,Yes,,6573129
8,5/22/19,Mortgage,Conventional home mortgage,Trouble during payment process,,"Wells Fargo purchased my mortgage from XXXX, s...",Company has responded to the consumer and the ...,WELLS FARGO & COMPANY,MO,63376,,Consent provided,Web,5/22/19,Closed with explanation,Yes,,3250784
9,5/4/23,Checking or savings account,Checking account,Problem with a lender or other company chargin...,Transaction was not authorized,I am writing regarding unauthorized ACH transa...,Company has responded to the consumer and the ...,"BANK OF AMERICA, NATIONAL ASSOCIATION",PA,152XX,,Consent provided,Web,5/4/23,Closed with explanation,Yes,,6930305


In [18]:
complaints = data.dropna(subset=['Consumer complaint narrative'])

In [19]:
complaints.isnull().sum()

Date received                        0
Product                              0
Sub-product                          0
Issue                                0
Sub-issue                            0
Consumer complaint narrative         0
Company public response              0
Company                              0
State                                0
ZIP code                             0
Tags                                 0
Consumer consent provided?           0
Submitted via                        0
Date sent to company                 0
Company response to consumer         0
Timely response?                     0
Consumer disputed?              148723
Complaint ID                         0
dtype: int64

In [20]:
complaints_revised = complaints[['Company', 'Consumer complaint narrative']]

In [21]:
company_list = ['WELLS FARGO & COMPANY', 'BANK OF AMERICA, NATIONAL ASSOCIATION', 'CAPITAL ONE FINANCIAL CORPORATION', 'JPMORGAN CHASE $ CO.', 'CITIBANK, N.A.']
complaints_revised = complaints_revised[complaints_revised['Company'].isin(company_list)]
complaints_revised.head(10)

Unnamed: 0,Company,Consumer complaint narrative
8,WELLS FARGO & COMPANY,"Wells Fargo purchased my mortgage from XXXX, s..."
9,"BANK OF AMERICA, NATIONAL ASSOCIATION",I am writing regarding unauthorized ACH transa...
10,WELLS FARGO & COMPANY,This happens every time I re-order checks from...
11,"BANK OF AMERICA, NATIONAL ASSOCIATION",In XX/XX/2022 I was charged for a hotel room t...
15,CAPITAL ONE FINANCIAL CORPORATION,Capital one/XXXX XXXX open account XXXX XX/XX...
20,"CITIBANK, N.A.","XXXX XXXX XXXX XXXX XXXX XXXX, PA XXXX XXXX 15..."
32,"BANK OF AMERICA, NATIONAL ASSOCIATION",On XXXX 2023 I XXXX XXXX went into the XXXX XX...
35,CAPITAL ONE FINANCIAL CORPORATION,The presence of a derogatory rating on my acco...
37,"CITIBANK, N.A.",I believe someone cashed a fraudulent check on...
39,WELLS FARGO & COMPANY,Called Wells Fargo regarding fraudulent charge...


In [31]:
import random

# Specify the sample size you want - 50
sample_size = 10

# Extract the 'Consumer complaint narrative' column as a list
narratives = complaints_revised['Consumer complaint narrative'].tolist()

# Check if the number of narratives is greater than the desired sample size
if len(narratives) > sample_size:
    random_samples = random.sample(narratives, sample_size)
else:
    random_samples = narratives

# Create a DataFrame from the sampled narratives
sample = pd.DataFrame({'Consumer complaint narrative': random_samples})

#creating a
sample.to_csv('complaints_generated.csv', index=False)

In [32]:
sample = sample.dropna(subset=['Consumer complaint narrative'])

In [33]:
sample.isnull().sum()

Consumer complaint narrative    0
dtype: int64

## Synthesizing Complaints

In [35]:
random_samples[:5]

['Capital one auto is in Violation of my consumer rights, they have continually disregarded my disputes of inaccuracies along with not complying to my original dispute but have only stated the account has been verified.. although I have paid off the account in full and it is closed they are still showing late payments that are inaccurate and false this is affecting my consumer report they have had more than enough times to fix this false reporting which is playing a huge part in my credit score fluctuations causing issues with promotions at work and trying to buy a home.. I am extremely exhausted with this situation not having any justification or explanations to me the consumer please delete all PAYMENT that are hurting my report The CFPB states if your account is current and you make an agreement to make a partial payment, skip a payment, or other accommodation, then the creditor is to report to credit reporting companies that you are current on your loan or account. \nIf your accoun

### Equal Complaint Synthesis

In [36]:
%%time 

for i in range(len(random_samples[0:5])):
    instructions = [
        {
            "role": "system",
            "content": "Your task is to generate short synthesized customer complaints from banks. \
            Provide modified sample",
        },
        {
            "role": "user",
            "content": f"""
            Generate synthesized complaints by rearranging words and sentence structures while \
            maintaining the same meaning and intensity as the original \
            complaints in at most 50 words. \
            Focus on maintaining equality in the sentiment. 
            
            Complaints: '''{random_samples[i]}'''"""
        }
    ]
    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 260, "temperature": 0.5, "top_p": 0.9}
    }
    response = query_endpoint(payload)
    
    print(i, response, "\n")

0 {'generated_text': " Complaint: '''Capital One Auto is violating my consumer rights by ignoring my disputes of inaccuracies and not complying to my original dispute. They have only stated that the account has been verified, even though I have paid off the account in full and it is closed. They are still showing late payments that are inaccurate and false, which is affecting my credit score fluctuations and causing issues with promotions at work and trying to buy a home. I am exhausted with this situation, and I am extremely frustrated with the lack of justification or explanations. The CFPB states that if your account is current and you make an agreement to make a partial payment, skip a payment, or other accommodation, then the creditor is to report to credit reporting companies that you are current on your loan or account. If your account is already delinquent and you make an agreement, then the creditor can not report you as more delinquent (such as reporting you as 60 days delinq

### Robustness

In [37]:
%%time 

for i in range(len(random_samples[0:5])):
    instructions = [
        {
            "role": "system",
            "content": "Your task is to determine if customer complaints from banks are well written without any issues \
            Provide modified sample",
        },
        {
            "role": "user",
            "content": f"""
            Determine if complaints are well-written (considered desirable), \
            exhibit severe grammar or punctuation issues (considered problematic), \
            or contain irrelevant, frivolous information (also considered problematic). \
            Make your response as short as possible
            
            Complaints: '''{random_samples[i]}'''"""
        }
    ]
    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 260, "temperature": 0.5, "top_p": 0.9}
    }
    response = query_endpoint(payload)
    
    print(i, response, "\n")

0 {'generated_text': " The complaint is well-written and exhibits no severe grammar or punctuation issues. It clearly states the customer's concerns and provides specific examples to support their claims. The complaint also provides evidence to support their claims, such as documents proving the customer's agreements with the bank. The customer is also clear in their request for the bank to delete the late payments from their credit report. Overall, the complaint is well-written and exhibits no problematic issues."} 

1 {'generated_text': " The complaint is well-written, but it contains some minor grammar and punctuation issues. The letter is clear and concise, and the writer provides specific examples to illustrate their points. However, the use of all capital letters and the lack of proper punctuation in some sentences may make the text more difficult to read. Additionally, the writer does not provide any specific details about the customer's experience with the bank or the merchant,

### Invariance

In [38]:
%%time 

for i in range(len(random_samples[0:5])):
    instructions = [
        {
            "role": "system",
            "content": "Your task is to generate short synthesized customer complaints from banks. \
            Provide modified samples",
        },
        {
            "role": "user",
            "content": f"""
            Create synthetic complaints that retain the core message \
            and sentiment of the original complaints but change some words \
            or phrases while ensuring there is no change in the overall meaning \
            in at most 50 words 
            
            Complaints: '''{random_samples[i]}'''"""
        }
    ]
    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 260, "temperature": 0.5, "top_p": 0.9}
    }
    response = query_endpoint(payload)
    
    print(i, response, "\n")

0 {'generated_text': " '''Capital One Auto is in Violation of my Consumer Rights. They have repeatedly ignored my disputes of inaccuracies and failed to comply with my original dispute, stating that the account has been verified despite my payment of the account in full and it being closed. Despite this, they are still reporting late payments that are inaccurate and false, which is affecting my consumer report and causing fluctuations in my credit score. I am extremely frustrated with this situation and am seeking resolution. The CFPB states that if an account is current and you make an agreement to make a partial payment, skip a payment, or other accommodation, then the creditor must report to credit reporting companies that you are current on your loan or account. If an account is already delinquent and you make an agreement, then the creditor can not report you as more delinquent (such as reporting you as 60 days delinquent when you started out 30 days delinquent) during the period 

### Harshness

In [39]:
%%time 

for i in range(len(random_samples[0:5])):
    instructions = [
        {
            "role": "system",
            "content": "Your task is to generate short synthesized customer complaints from banks. \
            Provide modified samples",
        },
        {
            "role": "user",
            "content": f"""
            Develop a method to make complaints more harsh or less harsh \
            while keeping the underlying issue intact in at most 50 words\
            Use a range of intensity levels (Less Harsh or More harsh)) to ensure diversity 
            
            Complaints: '''{random_samples[i]}'''"""
        }
    ]
    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 260, "temperature": 0.5, "top_p": 0.9}
    }
    response = query_endpoint(payload)
    
    print(i, response, "\n")

0 {'generated_text': ' Here are five modified customer complaints with different levels of harshness:\n\n1. Less harsh:\n\n"I\'m frustrated with Capital One Auto because they\'ve ignored my disputes and failed to correct inaccuracies on my account. Despite making payments, they\'re still reporting late payments that are false and harming my credit score. I\'ve tried to resolve the issue multiple times, but they\'ve failed to address it. I\'m very disappointed and hope they take action to fix the problem."\n\n2. Moderately harsh:\n\n"I\'m fed up with Capital One Auto. They\'ve consistently ignored my disputes and failed to correct inaccuracies on my account, even after I\'ve made payments. Despite bringing the account current, they\'re still reporting late payments that are false and harming my credit score. I\'ve tried to resolve the issue multiple times, but they\'ve failed to address it. I\'m extremely frustrated and hope they take action to fix the problem."\n\n3. More harsh:\n\n"I\

## More Harsh Complaints (for Evaluation)

In [42]:
#generated 1
%%time 

for i in range(len(random_samples[0:10])):
    instructions = [
        {
            "role": "system",
            "content": "Your task is to generate short synthesized customer complaints from banks. \
            Provide modified samples",
        },
        {
            "role": "user",
            "content": f"""
            Develop a method to make complaints more harsh \
            while keeping the underlying issue intact in at most 50 words\
            
            Complaints: '''{random_samples[i]}'''"""
        }
    ]
    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_length": 50, "max_new_tokens": 260, "temperature": 0.8, "top_p": 0.9}
    }
    response = query_endpoint(payload)
    
    print(i+1, response, "\n")

1 {'generated_text': ' "Capital One Auto is a total disaster! They have consistently ignored my disputes of inaccuracies, failed to comply with my original disputes, and have only stated that my account has been verified even though I have paid off the account in full and it\'s closed. They\'re still showing late payments that are inaccurate and false, which is causing me a huge problem with my credit score fluctuations. They\'re also playing a huge part in my credit score fluctuations. I\'m so frustrated with the CFPB, it states if your account is current and you make an agreement to make a partial payment, skip a payment, or other accommodation, then the creditor is to report to credit reporting companies that you are current on your loan or account. If your account is already delinquent and you make an agreement, then the creditor can not report you as more delinquent (such as reporting you as 60 days delinquent when you started out 30 days delinquent) during the period of the agree

In [45]:
%%time 

for i in range(len(random_samples[0:10])):
    instructions = [
        {
            "role": "system",
            "content": "Your task is to generate short synthesized customer complaints from banks. \
            Provide modified samples",
        },
        {
            "role": "user",
            "content": f"""
            Develop a method to make complaints more harsh \
            while keeping the underlying issue intact in at most 50 words. 
            Make sure all sentences are complete.
            
            Complaints: '''{random_samples[i]}'''"""
        }
    ]
    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_length": 50, "max_new_tokens": 260, "temperature": 0.8, "top_p": 0.9}
    }
    response = query_endpoint(payload)
    
    print(i+1, response, "\n")

1 {'generated_text': ' "Capital One Auto is in clear violation of my consumer rights by ignoring my disputes of inaccuracies and not complying with my original dispute. Despite paying off the account in full, they are still showing late payments that are false and inaccurate. This is affecting my credit score, causing problems with promotions at work and trying to buy a home. I am extremely frustrated with this situation and am exhausted with their lack of justification or explanations. I have the documents proving that I made payment agreements, but they still reported late payments. The CFPB states that if your account is current and you make an agreement to make a partial payment, skip a payment, or other accommodation, then the creditor is to report to credit reporting companies that you are current on your loan or account. If your account is already delinquent and you make an agreement, then the creditor can not report you as more delinquent (such as reporting you as 60 days delin

In [46]:
%%time 

for i in range(len(random_samples[0:10])):
    instructions = [
        {
            "role": "system",
            "content": "Your task is to generate short synthesized customer complaints from banks. \
            Provide modified samples",
        },
        {
            "role": "user",
            "content": f"""
            Develop a method to make complaints more harsh \
            while keeping the underlying issue intact in at most 50 words. 
            Make sure all sentences are complete.
            
            Complaints: '''{random_samples[i]}'''"""
        }
    ]
    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_length": 50, "max_new_tokens": 260, "temperature": 0.8, "top_p": 0.9}
    }
    response = query_endpoint(payload)
    
    print(i+1, response, "\n")

1 {'generated_text': ' Complaint:\n\n"Bank of America is consistently failing to honor my dispute resolution agreements. Despite my repeated efforts to resolve disputes with inaccurate information, they have repeatedly ignored my requests and reported false late payments. This has had a significant impact on my credit score and my ability to obtain loans and credit. I have provided evidence of my agreements and the false late payments, but the bank has refused to take action. I am extremely frustrated with their lack of commitment to customer satisfaction and their failure to uphold their obligations. I demand that they take immediate action to address this issue and provide me with a satisfactory resolution."'} 

2 {'generated_text': ' Customer Complaint:\n\nDear Citi,\n\nI am writing to express my frustration with the way my credit card charge was handled. I placed an order on your platform using my credit card, and the website stated that the order would ship out by the end of the m

In [47]:
%%time 

for i in range(len(random_samples[0:10])):
    instructions = [
        {
            "role": "system",
            "content": "Your task is to generate short synthesized customer complaints from banks. \
            Provide modified samples",
        },
        {
            "role": "user",
            "content": f"""
            Develop a method to make complaints more harsh \
            while keeping the underlying issue intact in at most 50 words. 
            Make sure all sentences are complete.
            
            Complaints: '''{random_samples[i]}'''"""
        }
    ]
    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_length": 50, "max_new_tokens": 260, "temperature": 0.8, "top_p": 0.9}
    }
    response = query_endpoint(payload)
    
    print(i+1, response, "\n")

1 {'generated_text': " '''Capital One Auto is in complete violation of my consumer rights! They have consistently disregarded my disputes over inaccuracies and not complying with my original dispute. They have stated that my account has been verified, even though I have paid off the account in full and it is closed. They are still showing late payments as false and inaccurate, which is affecting my consumer report and causing fluctuations in my credit score. This situation has been exhausting, with no justification or explanation from them. The CFPB states that if your account is current and you make an agreement to make a partial payment, skip a payment, or other accommodation, then the creditor is to report to credit reporting companies that you are current on your loan or account. If your account is already delinquent and you make an agreement, then the creditor can not report you as more delinquent (such as reporting you as 60 days delinquent when you started out 30 days delinquent

In [48]:
%%time 

for i in range(len(random_samples[0:10])):
    instructions = [
        {
            "role": "system",
            "content": "Your task is to generate short synthesized customer complaints from banks. \
            Provide modified samples",
        },
        {
            "role": "user",
            "content": f"""
            Develop a method to make complaints more harsh \
            while keeping the underlying issue intact in at most 50 words. 
            Make sure all sentences are complete.
            
            Complaints: '''{random_samples[i]}'''"""
        }
    ]
    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_length": 50, "max_new_tokens": 260, "temperature": 0.8, "top_p": 0.9}
    }
    response = query_endpoint(payload)
    
    print(i+1, response, "\n")

1 {'generated_text': ' Dear Capital One Auto,\n\nI am writing to express my extreme dissatisfaction with the way you have handled my account. Despite my repeated attempts to dispute inaccuracies and disputes, you have not complied with my original disputes and have only stated that the account has been verified. This is completely unacceptable and has caused me a lot of stress and frustration.\n\nAdditionally, I have paid off the account in full and it is closed, yet you are still showing late payments that are inaccurate and false. This is affecting my consumer report and is playing a huge part in my credit score fluctuations. I am exhausted with this situation and not having any justification or explanations for the false reporting.\n\nThe Consumer Financial Protection Bureau states that if your account is current and you make an agreement to make a partial payment, skip a payment, or other accommodation, then the creditor is to report to credit reporting companies that you are curre

In [49]:
%%time 

for i in range(len(random_samples[0:10])):
    instructions = [
        {
            "role": "system",
            "content": "Your task is to generate short synthesized customer complaints from banks. \
            Provide modified samples",
        },
        {
            "role": "user",
            "content": f"""
            Develop a method to make complaints more harsh \
            while keeping the underlying issue intact in at most 50 words. 
            Make sure all sentences are complete.
            
            Complaints: '''{random_samples[i]}'''"""
        }
    ]
    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_length": 50, "max_new_tokens": 260, "temperature": 0.8, "top_p": 0.9}
    }
    response = query_endpoint(payload)
    
    print(i+1, response, "\n")

1 {'generated_text': ' Sample Complaint:\n\n"Bank of America Auto is in clear violation of my consumer rights by ignoring my disputes of inaccuracies and not complying to my original dispute. Despite my efforts to resolve the issue, they have only stated that the account has been verified, but I have paid off the account in full and it is closed. They are still showing late payments that are inaccurate and false, which is affecting my consumer report and causing issues with promotions at work and trying to buy a home. I am extremely frustrated with this situation and need the creditor to take immediate action to correct the false reporting. The CFPB states that if your account is current and you make an agreement to make a partial payment, skip a payment, or other accommodation, then the creditor is to report to credit reporting companies that you are current on your loan or account. If your account is already delinquent and you make an agreement, then the creditor can not report you a

In [50]:
%%time 

for i in range(len(random_samples[0:10])):
    instructions = [
        {
            "role": "system",
            "content": "Your task is to generate short synthesized customer complaints from banks. \
            Provide modified samples",
        },
        {
            "role": "user",
            "content": f"""
            Develop a method to make complaints more harsh \
            while keeping the underlying issue intact in at most 50 words. 
            Make sure all sentences are complete.
            
            Complaints: '''{random_samples[i]}'''"""
        }
    ]
    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_length": 50, "max_new_tokens": 260, "temperature": 0.8, "top_p": 0.9}
    }
    response = query_endpoint(payload)
    
    print(i+1, response, "\n")

1 {'generated_text': ' Sample modified complaint:\n\n"Bank of America is violating my consumer rights by disregarding my disputes and not complying to my original dispute. They have reported my account as late when I have paid it off in full and it is closed. They are still showing late payments that are inaccurate and false, which is affecting my credit score and causing problems for me at work and when trying to buy a home. I am extremely frustrated with this situation and am tired of not having any justification or explanation for the false reporting. The CFPB states that if your account is current and you make an agreement to make a partial payment, skip a payment, or other accommodation, then the creditor must report to credit reporting companies that you are current on your loan or account. However, Bank of America has continued to report my account as late despite my agreements, which is a violation of my rights."'} 

2 {'generated_text': ' Here are five modified customer compla

In [51]:
%%time 

for i in range(len(random_samples[0:10])):
    instructions = [
        {
            "role": "system",
            "content": "Your task is to generate short synthesized customer complaints from banks. \
            Provide modified samples",
        },
        {
            "role": "user",
            "content": f"""
            Develop a method to make complaints more harsh \
            while keeping the underlying issue intact in at most 50 words. 
            Make sure all sentences are complete.
            
            Complaints: '''{random_samples[i]}'''"""
        }
    ]
    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_length": 50, "max_new_tokens": 260, "temperature": 0.8, "top_p": 0.9}
    }
    response = query_endpoint(payload)
    
    print(i+1, response, "\n")

1 {'generated_text': " Capital One Auto is a complete disgrace to the banking industry. They have consistently disregarded my disputes regarding inaccuracies in their reports, and have failed to comply with my original disputes. Despite being fully paid off and closed, they continue to report late payments that are false and inaccurate. This is causing significant damage to my credit score, impacting my ability to obtain credit, and making it difficult for me to buy a home. I am extremely frustrated with their lack of transparency and accountability. They have repeatedly broken the CFPB's rules by reporting late payments that I have already made arrangements to make. I demand they take immediate action to correct this false reporting and delete all late payments from their report. I am extremely disappointed with the quality of service I have received from Capital One Auto."} 

2 {'generated_text': ' On 01/01/2021, I placed a $660.00 order on XYZ using my Citi credit card. The website 