## Baseline Model

Define a function that builds a simple product description by combining the product title, product type ID, and bullet points parsed from the input text.

In [None]:
def generate_baseline_description(row):
    try:
        # Extract title, product_type_id, and bullet_points from the input_text
        input_text = row['input_text']

        # parsing fields out from the combined input_text
        title = input_text.split("TITLE: ")[-1].split(" PRODUCT_TYPE_ID: ")[0].strip()
        product_type = input_text.split("PRODUCT_TYPE_ID: ")[-1].split(" BULLET_POINTS: ")[0].strip()
        bullet_points = input_text.split("BULLET_POINTS: ")[-1].strip()

        description = (
            f"This {product_type} titled '{title}' features the following: {bullet_points}"
        )
        return description
    except Exception as e:
        print(f"Error processing row: {e}")
        return "Description unavailable"


# Apply the baseline generator to every row and store the result in a new column
df_llm_input['generated_baseline'] = df_llm_input.apply(generate_baseline_description, axis=1)
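To make the parsing step concrete, here is a minimal sketch that runs the same `split` logic on a single made-up record. The sample values and the helper name `parse_input_text` are hypothetical and not part of the dataset or the notebook. They only illustrate the `TITLE: ... PRODUCT_TYPE_ID: ... BULLET_POINTS: ...` layout that the baseline function assumes.

```python
# Hypothetical sample row; the field values are invented for illustration.
sample_row = {
    "input_text": (
        "TITLE: Stainless Steel Water Bottle "
        "PRODUCT_TYPE_ID: 1234 "
        "BULLET_POINTS: [Keeps drinks cold, Leak-proof lid]"
    )
}


def parse_input_text(input_text):
    """Split the combined string into (title, product_type, bullet_points)."""
    title = input_text.split("TITLE: ")[-1].split(" PRODUCT_TYPE_ID: ")[0].strip()
    product_type = input_text.split("PRODUCT_TYPE_ID: ")[-1].split(" BULLET_POINTS: ")[0].strip()
    bullet_points = input_text.split("BULLET_POINTS: ")[-1].strip()
    return title, product_type, bullet_points


title, product_type, bullet_points = parse_input_text(sample_row["input_text"])

# Same template as generate_baseline_description
description = f"This {product_type} titled '{title}' features the following: {bullet_points}"
print(description)
```

If the product type ID is numeric, as assumed in this sample, the template produces phrasing such as "This 1234 titled ...", which is one reason the baseline text reads unnaturally.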

Define a function that computes several metrics for the generated descriptions. BLEU, ROUGE-1, ROUGE-2, and BERTScore-F1 compare each generated description against its target description; Flesch Reading Ease is computed on the generated text alone.

In [None]:
# Evaluation function
# rouge_score is required by the `rouge` metric in `evaluate`
!pip install -q textstat evaluate bert_score rouge_score

from evaluate import load
import numpy as np
from textstat import flesch_reading_ease

# Load evaluation metrics
bleu = load("bleu")
rouge = load("rouge")
bertscore = load("bertscore")

def evaluate_model(df, gen_col, ref_col="description", lang="en"):
    refs = df[ref_col].fillna("").tolist()
    gens = df[gen_col].fillna("").tolist()

    # BLEU
    bleu_score = bleu.compute(predictions=gens, references=[[r] for r in refs])['bleu']

    # ROUGE
    rouge_scores = rouge.compute(predictions=gens, references=refs)
    rouge1 = rouge_scores['rouge1']
    rouge2 = rouge_scores['rouge2']

    # BERTScore
    bert = bertscore.compute(predictions=gens, references=refs, lang=lang)
    bert_f1 = np.mean(bert['f1'])

    # Flesch Reading Ease
    flesch_scores = [flesch_reading_ease(text) for text in gens]
    flesch_avg = np.mean(flesch_scores)

    # Display
    print(f"\nEvaluation for Model: {gen_col}")
    print(f"BLEU Score         : {bleu_score:.4f}")
    print(f"ROUGE-1 Score      : {rouge1:.4f}")
    print(f"ROUGE-2 Score      : {rouge2:.4f}")
    print(f"BERTScore-F1       : {bert_f1:.4f}")
    print(f"Flesch Reading Ease: {flesch_avg:.2f}")

    return {
        "BLEU": bleu_score,
        "ROUGE-1": rouge1,
        "ROUGE-2": rouge2,
        "BERTScore-F1": bert_f1,
        "Flesch": flesch_avg
    }
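To show what the overlap metrics measure, here is a simplified sketch of a ROUGE-1 style F1 score. It is not the implementation used by `evaluate`: it tokenizes on whitespace, lowercases, and skips stemming, so its numbers will not match the library exactly. The function name `unigram_f1` and the example strings are illustrative assumptions.

```python
from collections import Counter


def unigram_f1(prediction, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap on lowercased whitespace tokens."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count per token (clipped overlap)
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# 3 shared tokens out of 5 predicted and 6 reference tokens
score = unigram_f1("bright led lights for planes", "full set of bright led lights")
print(f"Simplified ROUGE-1 F1: {score:.4f}")
```

ROUGE-2 applies the same idea to bigrams, and BLEU combines clipped n-gram precisions with a brevity penalty, so all three reward literal word overlap rather than meaning.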

In [None]:
# baseline metrics
results_baseline = evaluate_model(df_llm_input, gen_col='generated_baseline', ref_col='target_description')

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Evaluation for Model: generated_baseline
BLEU Score         : 0.1006
ROUGE-1 Score      : 0.2889
ROUGE-2 Score      : 0.1271
BERTScore-F1       : 0.8336
Flesch Reading Ease: 33.60


In [None]:
import re
import pandas as pd

# Extract original metadata from input_text column
def extract_fields(text):
    pattern = r"TITLE: (.*?) PRODUCT_TYPE_ID: (.*?) BULLET_POINTS: (.*)"
    match = re.match(pattern, text)
    if match:
        return match.groups()
    return ("", "", "")

extracted = df_llm_input['input_text'].apply(extract_fields)
df_llm_input[['title', 'product_type_id', 'bullet_points']] = pd.DataFrame(extracted.tolist(), index=df_llm_input.index)

# original vs generated descriptions
for i in range(2):
    print(f"Product Title:\n{df_llm_input.loc[i, 'title']}")
    print(f"\n* Original Description:\n{df_llm_input.loc[i, 'target_description']}")
    print(f"\n🤖 Generated Description (Baseline model):\n{df_llm_input.loc[i, 'generated_baseline']}")
    print('---' * 20)

Product Title:
Plane Light System, Plastic + Metal Taxi Lights Airplane LED Light, for Model Plane

* Original Description:
Features:&nbsp;<br> Full set of bright LED lights for your model plane or glider. <br> Use the power supplied by the receiver and don't require its own battery. <br> The circuitry has been specially designed so that the landing lights and taxi lights can be switched on and off directly. <br> Either use a Y-harness and link it to another channel such as the gear or assign a separate channel to it. <br> It includes red beacon light, white taxi light, white strobe light, white landing light, green navigation light, red navigation light. <br> <br>Specification:&nbsp;<br>Material: Plastic + Metal<br>Color: Shown As Pictures<br>Weight: 111g<br>Type: RC Part &amp; Accessory<br>Control Board Size: Approx. 54 * 35 * 13mm / 2.1 * 1.4 * 0.5inch<br> Cable Size: Approx. OD 0.8mm*0.8m <br> LED Quantity: 14pcs <br>Lighting System Specification:<br>2 x Red Beacon Lights (#11, #12
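One caveat about the regular expression in `extract_fields`: by default `.` does not match a newline, and `re.match` only anchors at the start of the string. The sketch below uses made-up strings to show what that means if a field contains a line break. Whether the real `input_text` values contain newlines is not known from this section, so treat `re.DOTALL` as an option to consider rather than a required fix.

```python
import re

PATTERN = r"TITLE: (.*?) PRODUCT_TYPE_ID: (.*?) BULLET_POINTS: (.*)"

# Hypothetical inputs, invented for illustration
single_line = "TITLE: Desk Lamp PRODUCT_TYPE_ID: 42 BULLET_POINTS: [Dimmable, USB powered]"
multi_line = "TITLE: Desk Lamp PRODUCT_TYPE_ID: 42 BULLET_POINTS: [Dimmable,\nUSB powered]"

m_single = re.match(PATTERN, single_line)
m_default = re.match(PATTERN, multi_line)                 # stops at the newline
m_dotall = re.match(PATTERN, multi_line, flags=re.DOTALL) # spans the newline

print(m_single.groups())
print(repr(m_default.group(3)))
print(repr(m_dotall.group(3)))
```

Without `re.DOTALL` the bullet points are silently cut at the first line break instead of raising an error, so a quick check of a few extracted rows is worthwhile.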

Analysis:
* The BLEU (0.1006), ROUGE-1 (0.2889), and ROUGE-2 (0.1271) scores are low, which is expected for a baseline that concatenates input fields into a fixed template rather than generating natural language. They indicate limited n-gram overlap between the generated and target descriptions: unigrams for ROUGE-1, bigrams for ROUGE-2, and n-grams up to length 4 for BLEU.
* The BERTScore-F1 (0.8336) is much higher, which suggests the generated descriptions capture some of the semantic content of the target descriptions. The value should be read with caution, because it was computed without baseline rescaling, and unrescaled BERTScore tends to fall in a narrow, high range even for loosely related text.
* The Flesch Reading Ease (33.60) falls in the 30 to 50 band that is usually labeled "difficult", so the generated descriptions are hard to read. This is probably due to the template, which appends the full bullet list to a single sentence.
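To see why long, concatenated sentences lower the readability score, here is a rough sketch of the Flesch Reading Ease formula, 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The syllable counter is a naive vowel-group heuristic and the function names are illustrative, so the results will differ from what `textstat` reports. The long example string loosely imitates the baseline template and is not taken from the dataset.

```python
import re


def count_syllables(word):
    """Naive heuristic: count runs of consecutive vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease_sketch(text):
    """Approximate Flesch Reading Ease; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


short_score = flesch_reading_ease_sketch("The cat sat. The dog ran.")
long_score = flesch_reading_ease_sketch(
    "This accessory titled 'Plane Light System' features the following: "
    "full set of bright LED lights, red beacon light, white taxi light, white strobe light"
)
print(f"Short sentences: {short_score:.2f}")
print(f"One long sentence: {long_score:.2f}")
```

Both terms of the formula penalize the template: a single sentence with many words raises words per sentence, and product vocabulary tends to raise syllables per word.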