<p> <center> <a href="../../LLM-Application.ipynb">Home Page</a> </center> </p>

 
<div>
    <span style="float: left; width: 33%; text-align: left;"><a href="triton-llama.ipynb">Previous Notebook</a></span>
    <span style="float: left; width: 33%; text-align: center;">
        <a href="llama-chat-finetune.ipynb">1</a>
        <a href="trt-llama-chat.ipynb">2</a>
        <a href="triton-llama.ipynb">3</a>
        <a>4</a>
    </span>
</div>

# Challenge

---

This challenge notebook tests your understanding of the previous notebooks and helps you consolidate what you have learned. Provide a solution to the problem statement below by following the same workflow: run the finetuning process, build an optimized TensorRT engine, and deploy the model.

## Problem Statement

eCommerce websites receive large volumes of customer feedback through reviews and comments, and customers sometimes have long chats with eCommerce agents. Analyzing this feedback manually is time-consuming, tedious, and error-prone. The goal is to develop a generative AI solution that efficiently analyzes and summarizes customer feedback and agent chats.

- **Use Case 1**: Customer feedback and Agent chat summarization
- **Domain**: E-commerce
- **Dataset**: Salesforce dialogstudio


## Instructions

You are to reproduce the End-to-End approach using the `Salesforce dialogstudio` dataset. To complete the challenge, you are to implement the following steps:

- Perform the finetuning process
    - import needed libraries
    - preprocess the dialogstudio TweetSumm (train, validation, test)
    - set the path of your base model (Llama-2-7b)
    - convert the model to Hugging Face Transformers Format
    - initialize paths to the base model, tokenizer, and output checkpoint (`new_model='../../model/Llama-2-7b-hf-finetune'`)
    - load tokenizer
    - set the training parameter (TrainingArguments object)
    - configure PEFT With LoRA (LoraConfig())
    - 4-bit Quantization Configuration (BitsAndBytesConfig())
    - load base model using AutoModelForCausalLM.from_pretrained()
    - set the Trainer Parameter using SFTTrainer()
    - run trainer.train() and save your new model to: `../../model/Llama-2-7b-hf-finetune`
    - run inference on your model with the provided test data
    - merge the base model with the finetuned checkpoint and save it as `../../model/Llama-2-7b-hf-merged`
- Build a TensorRT engine for a single GPU with FP16 precision
    - execute the build.py script to build the TensorRT engine
    - execute the run.py script to run inference (use the flag `--input_text` or `--input_file` for the prompt text)
- Deploy model on Triton server
    - follow the instructions as given in the triton-llama.ipynb notebook

The data preprocessing part of the solution code is written for you. Complete the rest by adding new cells below.



### Challenge Duration

The challenge is expected to take `3 hrs 30 mins`.

## Solution


### Data Preprocessing

In [None]:
import json
import re
from pprint import pprint
import pandas as pd
import torch
from datasets import Dataset, load_dataset
from huggingface_hub import notebook_login
from peft import LoraConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
)
from trl import SFTTrainer


#### Load dataset

In [None]:
dataset = load_dataset("Salesforce/dialogstudio","TweetSumm")
dataset

#### Preprocess dataset

In [None]:
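def format_text(text):
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace into one space
    text = re.sub(r"\^[^ ]+", "", text)   # remove "^XX" style agent signatures
    text = re.sub(r"http\S+", "", text)   # remove URLs
    text = re.sub(r"@[^\s]+", "", text)   # remove @mentions
    return text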

def transform_conversation(data):
    transformed_text = ""
    for row in data["log"]:
        user = format_text(row["user utterance"])
        transformed_text += f"user: {user.strip()}\n"
        agent = format_text(row["system response"])
        transformed_text += f"agent: {agent.strip()}\n"
    return transformed_text


def format_training_prompt(conversation, summary):
    return f"""### Instruction: Write  a summary of the conversation below. ### Input: {conversation.strip()} ### Response: {summary} """.strip()


def generate_conversation(data):
    summaries = json.loads(data["original dialog info"])["summaries"]["abstractive_summaries"]
    summary = summaries[0]
    summary = " ".join(summary)
    
    transformed_text = transform_conversation(data)
    return {
        "conversation": transformed_text,
        "summary": summary,
        "text": format_training_prompt(transformed_text, summary),
    }



In [None]:
train_example = generate_conversation(dataset["train"][0])
print("summary:\n",train_example["summary"], "\n")
print("conversation:\n",train_example["conversation"], "\n")
print("text:\n",train_example["text"], "\n")

**Expected Output:**

```text
summary:
 Customer enquired about his Iphone and Apple watch which is not showing his any steps/activity and health activities. Agent is asking to move to DM and look into it. 

conversation:
 user: So neither my iPhone nor my Apple Watch are recording my steps/activity, and Health doesn’t recognise either source anymore for some reason. Any ideas?   please read the above.
agent: Let’s investigate this together. To start, can you tell us the software versions your iPhone and Apple Watch are running currently?
user: My iPhone is on 11.1.2, and my watch is on 4.1.
agent: Thank you. Have you tried restarting both devices since this started happening?
user: I’ve restarted both, also un-paired then re-paired the watch.
agent: Got it. When did you first notice that the two devices were not talking to each other. Do the two devices communicate through other apps such as Messages?
user: Yes, everything seems fine, it’s just Health and activity.
agent: Let’s move to DM and look into this a bit more. When reaching out in DM, let us know when this first started happening please. For example, did it start after an update or after installing a certain app?
 

text:
 ### Instruction: Write  a summary of the conversation below. ### Input: user: So neither my iPhone nor my Apple Watch are recording my steps/activity, and Health doesn’t recognise either source anymore for some reason. Any ideas?   please read the above.
agent: Let’s investigate this together. To start, can you tell us the software versions your iPhone and Apple Watch are running currently?
user: My iPhone is on 11.1.2, and my watch is on 4.1.
agent: Thank you. Have you tried restarting both devices since this started happening?
user: I’ve restarted both, also un-paired then re-paired the watch.
agent: Got it. When did you first notice that the two devices were not talking to each other. Do the two devices communicate through other apps such as Messages?
user: Yes, everything seems fine, it’s just Health and activity.
agent: Let’s move to DM and look into this a bit more. When reaching out in DM, let us know when this first started happening please. For example, did it start after an update or after installing a certain app? ### Response: Customer enquired about his Iphone and Apple watch which is not showing his any steps/activity and health activities. Agent is asking to move to DM and look into it. 

```
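
The cleaning rules in `format_text` are easier to follow on a single short input. The sketch below repeats the same four regular expressions from the preprocessing cell and applies them to one tweet. The sample tweet, including its handle, link, and signature, is invented for illustration and does not come from the dataset.

```python
import re


def format_text(text):
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace
    text = re.sub(r"\^[^ ]+", "", text)   # drop "^XX" agent signatures
    text = re.sub(r"http\S+", "", text)   # drop URLs
    text = re.sub(r"@[^\s]+", "", text)   # drop @mentions
    return text


# Hypothetical tweet, invented for illustration only
sample = "@AppleSupport my   watch is stuck https://t.co/abc123 ^JK"
cleaned = format_text(sample).strip()
print(repr(cleaned))  # 'my watch is stuck'
```

Removing a token leaves the spaces around it in place, which is why `transform_conversation` calls `.strip()` on each utterance and why some double spaces remain inside the expected output above.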

In [None]:
def preprocess_dataset(data: Dataset):
    # shuffle, build the prompt fields, then drop the raw columns
    return (
        data.shuffle(seed=42)
        .map(generate_conversation)
        .remove_columns(
            [
                "original dialog id",
                "new dialog id",
                "dialog index",
                "original dialog info",
                "log",
                "prompt",
            ]
        )
    )

dataset["train"] = preprocess_dataset(dataset["train"])
dataset["validation"] = preprocess_dataset(dataset["validation"])
dataset["test"] = preprocess_dataset(dataset["test"])

In [None]:
dataset

#### Download Llama-2-7b

In [None]:
# download Llama-2-7b model
!python3 ../../source_code/Llama2/download-llama2.py

print("extracting files......")
!tar -xf ../../model/Llama-2-7b.tar  -C ../../model

print("files extraction done! removing tar file......")
!rm -rf ../../model/Llama-2-7b.tar
print("All done!!!")

### Complete the Remaining Tasks in the Cells Below

###  Run Inference

The function below formats the test data as prompts for running inference.

In [None]:
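# Redefines format_training_prompt for inference: the response is left empty
# so that the model generates it. `summary` is unused here and is kept only so
# the signature matches the training version.
def format_training_prompt(conversation, summary):
    return f"""### Instruction: Write  a summary of the conversation below. ### Input: {conversation.strip()} ### Response: """.strip()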
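
The training and inference prompts differ only in whether the summary follows `### Response:`. As a minimal sketch of that relationship, the hypothetical helper `build_prompt` below (not part of the notebook) produces both variants from one template, so the two cannot drift apart. The instruction text is copied verbatim from the notebook, including its double space, on the assumption that inference prompts should match the training prompts exactly.

```python
from typing import Optional

# Copied verbatim from the notebook's template (double space included)
INSTRUCTION = "Write  a summary of the conversation below."


def build_prompt(conversation: str, summary: Optional[str] = None) -> str:
    """Build a training prompt (summary given) or an inference prompt (no summary)."""
    prompt = f"### Instruction: {INSTRUCTION} ### Input: {conversation.strip()} ### Response:"
    if summary is not None:
        # Training case: append the target summary after the response marker
        prompt += f" {summary.strip()}"
    return prompt


# Invented two-turn conversation, for illustration only
conversation = "user: my order is late\nagent: sorry, let me check\n"
inference_prompt = build_prompt(conversation)
training_prompt = build_prompt(conversation, "Customer reports a late order.")
print(inference_prompt)
print(training_prompt)
```

With this shape, the training prompt is always the inference prompt plus a space and the summary, which makes it easy to check that the model sees the same prefix at inference time as during finetuning.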


In [None]:
prompt_examples = []
for row in dataset["test"].select(range(3)):
    prompt_examples.append(
        {
            "summary": row["summary"],
            "conversation": row["conversation"],
            "prompt": format_training_prompt(row["conversation"], row["summary"]),
        }
    )
    
test_df = pd.DataFrame(prompt_examples)
test_df

#### Create Custom Function to Summarize Conversation

In [None]:
DEVICE = "cuda:0"

def summarize(model, text):
    # relies on the `tokenizer` loaded in an earlier cell
    inputs = tokenizer(text, return_tensors="pt").to(DEVICE)
    inputs_length = len(inputs["input_ids"][0])
    with torch.inference_mode():
        outputs = model.generate(**inputs, max_new_tokens=256, temperature=1)
    # decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(outputs[0][inputs_length:], skip_special_tokens=True)

In [None]:
%%time

# `model` (the base model) and `new_model` (the checkpoint path) come from the cells you added above
model = PeftModel.from_pretrained(model, new_model)

example = test_df.iloc[0]
print("Ground truth : \n", example.summary)

print("Conversation : \n", example.conversation)


summary = summarize(model, example.prompt)
print("Full generated output: \n", summary,"\n")

print("Summary Generated : \n",summary.strip().split("\n")[0])
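
The last line of the cell above keeps only the first line of the generated text. The sketch below wraps that step in a hypothetical helper, `clean_summary`, which also cuts the text at a `###` marker. That extra cut rests on an assumption: because the prompt template uses `###` section markers, a finetuned model may sometimes continue with a new `### Instruction` section on the same line. The sample output is invented for illustration and is not real model output.

```python
def clean_summary(generated: str, stop_marker: str = "###") -> str:
    """Keep the first line of the generated text and cut it at the stop marker."""
    text = generated.strip()
    first_line = text.split("\n")[0] if text else ""
    # Assumption: anything after a "###" marker is a new prompt section, not summary text
    return first_line.split(stop_marker)[0].strip()


# Invented model output, for illustration only
raw_output = "\nCustomer asks about sync. Agent moves to DM. ### Instruction: Write\nmore text"
print(clean_summary(raw_output))
```

If your finetuned model stops cleanly after the summary, the simpler `summary.strip().split("\n")[0]` used in the cell above is sufficient.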

---
## Switch to TensorRT-LLM Container to Continue the Challenge

---

### Build TensorRT engine with single GPU and FP16

### Deploy Model on Triton Server

---
## Licensing

Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.

<div>
    <span style="float: left; width: 33%; text-align: left;"><a href="triton-llama.ipynb">Previous Notebook</a></span>
    <span style="float: left; width: 33%; text-align: center;">
        <a href="llama-chat-finetune.ipynb">1</a>
        <a href="trt-llama-chat.ipynb">2</a>
        <a href="triton-llama.ipynb">3</a>
        <a>4</a>
    </span>
</div>

<p> <center> <a href="../../LLM-Application.ipynb">Home Page</a> </center> </p>