**AI & Machine Learning (KAN-CINTO4003U) - Copenhagen Business School | Spring 2025**

***


# Part III: LLM

Please see the description of the assignment in the README file (section 3) <br>
**Guide notebook**: [guides/llm_guide.ipynb](guides/llm_guide.ipynb)


***

<br>

* Note that you should report results using a classification report. 

* Also, remember to include some reflections on your results: how do they compare with the results from Part I, BoW?, and part II, BERT? Are there any hyperparameters or prompting techniques that are particularly important?

* You should follow the steps given in the `llm_guide` notebook

<br>


***

In [1]:
# imports for the project

import pandas as pd
from decouple import config
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference


### 1. Load the data

We can load this data directly from [Hugging Face Datasets](https://huggingface.co/docs/datasets/) - The HuggingFace Hub- into a Pandas DataFrame. Pretty neat!

**Note**: This cell will download the dataset and keep it in memory. If you run this cell multiple times, it will download the dataset multiple times.

You are welcome to increase the `frac` parameter to load more data.

In [2]:

splits = {'train': 'data/train-00000-of-00001.parquet', 'test': 'data/test-00000-of-00001.parquet'}
# train = pd.read_parquet("hf://datasets/fancyzhx/ag_news/" + splits["train"])
test = pd.read_parquet("hf://datasets/fancyzhx/ag_news/" + splits["test"])

In [3]:
from decouple import Config, RepositoryEnv
import os
print(os.getcwd())

config = Config(RepositoryEnv(".env"))  # Adjust path if needed
WX_API_KEY = config('WX_API_KEY')

/Users/jkatz/git/aiml25/mas/ma2


In [4]:
label_map = {
    0: 'World',
    1: 'Sports',
    2: 'Business',
    3: 'Sci/Tech'
}

def preprocess(df: pd.DataFrame, frac = 1e-2, label_map = label_map, seed=42) -> pd.DataFrame:
    return  (
        df
        .assign(label=lambda x: x['label'].map(label_map))
        [lambda df: df['label'].isin(label_map.values())]
        .groupby('label')
        .apply(lambda x: x.sample(frac=frac, random_state=seed))
        .reset_index(drop=True)

    )

# train_df = preprocess(train, frac=0.01)
test_df = preprocess(test, frac=0.1)

# clear up some memory by deleting the original dataframes
# del train
del test

test_df.shape, # train_df.shape, 

  .apply(lambda x: x.sample(frac=frac, random_state=seed))


((760, 2),)

In [5]:
from ibm_watsonx_ai.foundation_models.schema import TextGenParameters

TextGenParameters.show()

+-----------------------+----------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| PARAMETER             | TYPE                                   | EXAMPLE VALUE                                                                                                                             |
| decoding_method       | str, TextGenDecodingMethod, NoneType   | sample                                                                                                                                    |
+-----------------------+----------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| length_penalty        | dict, TextGenLengthPenalty, NoneType   | {'decay_factor': 2.5, 'start_index': 5}                                                                  

In [None]:
credentials = Credentials(
    url = "https://us-south.ml.cloud.ibm.com",
    api_key = WX_API_KEY
)

client = APIClient(
    credentials=credentials, 
    project_id="15ea7f54-fa96-4600-a1c2-05b96934de0e"
)

In [7]:
import pandas as pd
from sklearn.metrics import classification_report 
from tqdm import tqdm

In [8]:
examples = test_df.groupby("label").sample(n=1, random_state=42).reset_index(drop=True)

for idx, row in examples.iterrows():
    print(f"Label: {row['label']}")
    print(f"Text: {row['text']}")
    print("-" * 80)

Label: Business
Text: Poor nations seek WTO textile aid With 40 years of textile quotas about to be abolished in a move to help developing nations, a group of the world #39;s poorest countries are asking for a different approach: special trade deals to protect them from a free-for-all.
--------------------------------------------------------------------------------
Label: Sci/Tech
Text: RIM takes new BlackBerry design overseas Revamped keyboard is key feature of the 7100v, which is headed for European and Asian shores.\
--------------------------------------------------------------------------------
Label: Sports
Text: Novak Captures First Indoor Title BASEL, Switzerland Oct 31, 2004 - Jiri Novak of the Czech Republic won the Swiss Indoors for his first indoor title, defeating David Nalbandian in five sets Sunday in a final in which the Argentine smashed two rackets.
--------------------------------------------------------------------------------
Label: World
Text: German investor conf

In [9]:
PARAMS = TextGenParameters(
    temperature=0,              # Higher temperature means more randomness - In this case we don't want randomness
    max_new_tokens=10,          # Maximum number of tokens to generate
    stop_sequences=[".", "\n"], # Stop generating text when these sequences are encountered
)

model = ModelInference(
    api_client=client,
    model_id="ibm/granite-13b-instruct-v2",
)


Failure during Get available foundation models. (GET https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2025-03-20&project_id=15ea7f54-fa96-4600-a1c2-05b96934de0e&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200)
Status code: 403, body: {"errors":[{"code":"no_associated_service_instance_error","message":"project_id 15ea7f54-fa96-4600-a1c2-05b96934de0e is not associated with a WML instance","more_info":"https://cloud.ibm.com/apidocs/watsonx-ai"}],"trace":"80133198ac69465005b8f76e5b818ac7","status_code":403}
Unable to get model specifications from url: https://us-south.ml.cloud.ibm.com
Reason: Failure during Get available foundation models. (GET https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2025-03-20&project_id=15ea7f54-fa96-4600-a1c2-05b96934de0e&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200)
Status code: 403, body: {"errors":[{"code":"no_associated_service_instance_error","message":"proje

WMLClientError: Unable to get model specifications from url: https://us-south.ml.cloud.ibm.com
Reason: Failure during Get available foundation models. (GET https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2025-03-20&project_id=15ea7f54-fa96-4600-a1c2-05b96934de0e&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200)
Status code: 403, body: {"errors":[{"code":"no_associated_service_instance_error","message":"project_id 15ea7f54-fa96-4600-a1c2-05b96934de0e is not associated with a WML instance","more_info":"https://cloud.ibm.com/apidocs/watsonx-ai"}],"trace":"80133198ac69465005b8f76e5b818ac7","status_code":403}

In [None]:
SYSTEM_PROMPT = """You task is to classify news stories into one of four categories

CATEGORIES:
{categories}
Ensure that your answer is specifically the same as one the categories!


TEXT:
{text}

Please assign the correct category to the text. Answer with the correct category and nothing else.


Examples to help you determine the category:
Example 1:
Label: Business
Text: Poor nations seek WTO textile aid With 40 years of textile quotas about to be abolished in a move to help developing nations, a group of the world #39;s poorest countries are asking for a different approach: special trade deals to protect them from a free-for-all.

Example 2:
Label: Sci/Tech
Text: RIM takes new BlackBerry design overseas Revamped keyboard is key feature of the 7100v, which is headed for European and Asian shores.\

Category:
"""

In [None]:
CATEGORIES = "- " + "\n- ".join(test_df["label"].unique())  # Create a string with all categories

predictions = []

for text in tqdm(test_df["text"]):

    # format the prompt with the categories and the text
    prompt = SYSTEM_PROMPT.format(categories=CATEGORIES, text=text)
    
    # generate the response from the model
    response = model.generate(prompt)

    # extract the generated text from the response
    prediction = response["results"][0]["generated_text"].strip()

    # append the prediction to the list of predictions
    predictions.append(prediction)

In [None]:
print(classification_report(test_df.label, predictions))


The large language model (LLM) didn’t get as high of a score as BERT, which actually makes sense when you think about it. LLMs are designed to do a bunch of different things like answering questions, writing stories, and generating text, so they’re more general-purpose. On the other hand, BERT is specifically built for encoding tasks, so it’s kind of specialized and better at that one thing. I also tried giving the LLM a few example inputs in the system prompt to help guide it, and that did make the performance a little bit better, but not by a whole lot. It still didn’t catch up to BERT, but it showed some small improvement. Overall, it seems like BERT just has the advantage when it comes to tasks that need really strong encoding skills.