### Step 1: Install necessary libraries

In [14]:
pip install transformers datasets


In [15]:
pip install torch

Note: you may need to restart the kernel to use updated packages.


In [16]:
pip install tensorflow

Note: you may need to restart the kernel to use updated packages.


### Step 2: Load Pretrained GPT Model

Use AutoModelForCausalLM and AutoTokenizer to load the pretrained model and its associated tokenizer (more on AutoClass below):

#### AutoClass

Under the hood, the AutoModelForCausalLM and AutoTokenizer classes work together to power the pipeline() used later in this step. An AutoClass is a shortcut that automatically retrieves the architecture of a pretrained model from its name or path. You only need to select the appropriate AutoClass for your task and its associated preprocessing class.
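
To make the "name in, architecture out" idea concrete, here is a minimal toy sketch of that kind of dispatch. It does not use or reproduce the transformers source: `ToyAutoModel`, `ToyGPT2Model`, `ToyT5Model`, `TOY_CONFIGS`, and `MODEL_REGISTRY` are all hypothetical names invented for this illustration, and the lookup is only a loose analogy to resolving an architecture from a checkpoint's configuration.

```python
# Toy illustration of AutoClass-style dispatch. Every name here is hypothetical.

class ToyGPT2Model:
    def __init__(self, name):
        self.name = name


class ToyT5Model:
    def __init__(self, name):
        self.name = name


# Stand-in for the configuration stored with each checkpoint (assumed layout).
TOY_CONFIGS = {
    "gpt2": {"model_type": "gpt2"},
    "t5-small": {"model_type": "t5"},
}

# Maps a model type to the class that implements that architecture.
MODEL_REGISTRY = {
    "gpt2": ToyGPT2Model,
    "t5": ToyT5Model,
}


class ToyAutoModel:
    @classmethod
    def from_pretrained(cls, model_name):
        # Look up the checkpoint's config, then pick the matching class.
        config = TOY_CONFIGS[model_name]
        model_class = MODEL_REGISTRY[config["model_type"]]
        return model_class(model_name)


gpt_model = ToyAutoModel.from_pretrained("gpt2")
t5_model = ToyAutoModel.from_pretrained("t5-small")
print(type(gpt_model).__name__)
print(type(t5_model).__name__)
```

The caller never names a concrete class; only the checkpoint name changes, which is the convenience an AutoClass provides.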

#### AutoTokenizer

A tokenizer preprocesses text into an array of numbers that serve as inputs to a model. Several rules govern the tokenization process, including how to split a word and at what level words should be split (learn more about tokenization in the tokenizer summary). Most importantly, instantiate the tokenizer with the same model name as the model, so that you use the same tokenization rules the model was pretrained with.
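
The sketch below illustrates why the tokenizer must match the model, using a deliberately simplified whitespace tokenizer. It is not how AutoTokenizer works internally (real tokenizers use subword rules); `build_vocab` and `encode` are hypothetical helpers written for this example.

```python
def build_vocab(corpus):
    """Give each distinct whitespace-separated word an id, in order of first appearance."""
    vocab = {"<unk>": 0}  # id 0 is reserved for unknown words
    for word in corpus.lower().split():
        vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Map each word to its id, falling back to the <unk> id."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


# Two "tokenizers" built from the same words seen in a different order.
vocab_a = build_vocab("the phone is good")
vocab_b = build_vocab("good is the phone")

text = "the phone is good"
ids_a = encode(text, vocab_a)
ids_b = encode(text, vocab_b)
print(ids_a)  # [1, 2, 3, 4]
print(ids_b)  # [3, 4, 2, 1]
```

The same sentence yields different ids under each vocabulary, so a model trained on the ids from `vocab_a` would receive scrambled input if the text were encoded with `vocab_b`.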

In [11]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

#### Pipeline

The pipeline() can accommodate any model from the Hub, which makes it easy to adapt to other use cases. Pass the model and tokenizer loaded above to pipeline():

In [12]:
from transformers import pipeline

# This is a text-generation pipeline, so name it accordingly
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

### Step 3: Prepare input

In [3]:
import pandas as pd
df = pd.read_csv("dummy_data_new.csv")

In [4]:
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Initialize the T5 model and tokenizer
# (this rebinds model_name, model, and tokenizer, which previously referred to GPT-2)
model_name = "t5-small"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

# Example input
product = "iPhone 15"
criteria_ratings = {
    "1": "very poor",
    "2": "poor",
    "3": "acceptable",
    "4": "good",
    "5": "excellent"
}

# Generate a custom template based on the ratings
template = f"The product '{product}' is rated for its {criteria_ratings['5']} quality. However, its performance is rated only {criteria_ratings['4']}, and the design is rather {criteria_ratings['2']}."

# Generate the review by incorporating the template and ratings into the input text
input_text = f"generate a review: {template}"
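
The template above hard-codes which rating label goes with which aspect. As a sketch of one way to generalize it, the hypothetical helper below builds the same kind of prompt from numeric scores per aspect. `build_review_prompt`, `RATING_LABELS`, and the exact wording of the sentence are assumptions made for illustration; only the `generate a review:` prefix and the five labels come from the cell above.

```python
# Same five labels as criteria_ratings above, keyed by integer score.
RATING_LABELS = {
    1: "very poor",
    2: "poor",
    3: "acceptable",
    4: "good",
    5: "excellent",
}


def build_review_prompt(product, aspect_scores):
    """Build a 'generate a review: ...' prompt from {aspect: score} pairs."""
    described = [
        f"{aspect} is {RATING_LABELS[score]}"
        for aspect, score in aspect_scores.items()
    ]
    return (
        f"generate a review: The product '{product}' is rated as follows: "
        + ", ".join(described)
        + "."
    )


prompt = build_review_prompt(
    "iPhone 15", {"quality": 5, "performance": 4, "design": 2}
)
print(prompt)
# generate a review: The product 'iPhone 15' is rated as follows: quality is excellent, performance is good, design is poor.
```

A helper like this could be applied to each row of a ratings table, provided the column names are mapped to aspects first.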

### Step 4: Generate Review

In [13]:
# Tokenize and encode the input text
input_ids = tokenizer.encode(input_text, return_tensors="pt", max_length=512, truncation=True)

# Generate the review
output = model.generate(input_ids, max_length=150, num_beams=4, early_stopping=True)
review = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the generated review
print(review)

'iPhone 15' is rated for its excellent quality, but its performance is only good, and the design is rather poor.
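
The `num_beams=4` argument in the generate() call selects beam search, which keeps several candidate sequences alive at each step instead of committing to the single most likely next token. The toy sketch below shows the idea on a tiny hand-written probability table. It is not the transformers implementation: `NEXT_TOKEN_PROBS` and `beam_search` are hypothetical, and unlike a real model the table conditions only on the previous token.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token only.
NEXT_TOKEN_PROBS = {
    "<s>": {"good": 0.6, "great": 0.4},
    "good": {"phone": 0.55, "design": 0.45},
    "great": {"phone": 0.9, "design": 0.1},
    "phone": {"</s>": 1.0},
    "design": {"</s>": 1.0},
}


def beam_search(num_beams, max_length=5):
    """Return the highest-scoring token sequence found with the given beam width."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens so far)
    for _ in range(max_length):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == "</s>":
                # Finished sequences are carried over unchanged.
                candidates.append((score, tokens))
                continue
            for token, prob in NEXT_TOKEN_PROBS[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the num_beams best candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:num_beams]
        if all(tokens[-1] == "</s>" for _, tokens in beams):
            break
    return beams[0][1]


greedy = beam_search(num_beams=1)
wider = beam_search(num_beams=2)
print(greedy)  # ['<s>', 'good', 'phone', '</s>']
print(wider)   # ['<s>', 'great', 'phone', '</s>']
```

With one beam the search commits to "good" (0.6) and ends with total probability 0.33. With two beams it also keeps "great", whose continuation "phone" (0.9) gives a higher total of 0.36.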
