# Generating Structured Functional Representations for Video Game Opinions (Viggo Dataset)

Here, I generate structured functional representations of video game opinions using three LLMs:
1. [mistralai/mixtral-8x7b-instruct-v0.1](https://replicate.com/mistralai/mixtral-8x7b-instruct-v0.1)
2. [meta/meta-llama-3-70b-instruct](https://replicate.com/meta/meta-llama-3-70b-instruct)
3. [meta/meta-llama-3-8b-instruct](https://replicate.com/meta/meta-llama-3-8b-instruct)

I use the [Viggo dataset](https://huggingface.co/datasets/GEM/viggo) and call the [Replicate](https://replicate.com/) API to generate responses from these models.


In [None]:
import os
from datasets import load_dataset, Dataset
import json
import replicate

## Load and preprocess the Viggo dataset

In [None]:
# Load the dataset
print("Loading and preprocessing the dataset...")

# Load the Viggo dataset from the GEM benchmark
dataset = load_dataset("GEM/viggo")

# Get the validation dataset
val_dataset = dataset["validation"]

# Rename columns for consistency
val_dataset = val_dataset.rename_columns({
    "meaning_representation": "attributes", 
    "target": "text"
})

# Remove unnecessary columns
val_dataset = val_dataset.remove_columns(["gem_id", "references"])

# Delete the original full dataset to save memory
del dataset

print("Dataset loaded and preprocessed.")


In [None]:
# Print a few examples
print("\nExamples:")
for i in range(2):
    print(f"\nExample {i}")
    for key in ['text', 'attributes']: 
        print(f"{key:12s}: {val_dataset[i][key]}")

## Define the prompt template


In [None]:
PROMPT_TEMPLATE = """
Given a target sentence construct the underlying meaning representation of the input sentence as a single function with attributes and attribute values. 
This function should describe the target string accurately and the function must be one of the following ['inform', 'request', 'give_opinion', 'confirm', 'verify_attribute', 'suggest', 'request_explanation', 'recommend', 'request_attribute'].

The attributes must be one of the following: ['name', 'exp_release_date', 'release_year', 'developer', 'esrb', 'rating', 'genres', 'player_perspective', 'has_multiplayer', 'platforms', 'available_on_steam', 'has_linux_release', 'has_mac_release', 'specifier']. The order you list the attributes within the function must follow the order listed above. For example the 'name' attribute must always come before the 'exp_release_date' attribute, and so forth.

For each attribute, fill in the corresponding value of the attribute within brackets. A couple of examples are below.

Example 1)
Sentence: Dirt: Showdown from 2012 is a sport racing game for the PlayStation, Xbox, PC rated E 10+ (for Everyone 10 and Older). It's not available on Steam, Linux, or Mac.
Output: inform(name[Dirt: Showdown], release_year[2012], esrb[E 10+ (for Everyone 10 and Older)], genres[driving/racing, sport], platforms[PlayStation, Xbox, PC], available_on_steam[no], has_linux_release[no], has_mac_release[no])

Example 2) 
Sentence: Were there even any terrible games in 2014?
Output: request(release_year[2014], specifier[terrible])

Example 3)
Sentence: Adventure games that combine platforming and puzzles  can be frustrating to play, but the side view perspective is perfect for them. That's why I enjoyed playing Little Nightmares.
Output: give_opinion(name[Little Nightmares], rating[good], genres[adventure, platformer, puzzle], player_perspective[side view])

Example 4)
Sentence: Since we're on the subject of games developed by Telltale Games, I'm wondering, have you played The Wolf Among Us?
Output: recommend(name[The Wolf Among Us], developer[Telltale Games])

Example 5) 
Sentence: Layers of Fear, the indie first person point-and-click adventure game?
Output: confirm(name[Layers of Fear], genres[adventure, indie, point-and-click], player_perspective[first person])	

Example 6) 
Sentence: I bet you like it when you can play games on Steam, like Worms: Reloaded, right?	
Output: suggest(name[Worms: Reloaded], available_on_steam[yes])

Example 7)
Sentence: I recall you saying that you really enjoyed The Legend of Zelda: Ocarina of Time. Are you typically a big fan of games on Nintendo rated E (for Everyone)?	
Output: verify_attribute(name[The Legend of Zelda: Ocarina of Time], esrb[E (for Everyone)], rating[excellent], platforms[Nintendo])

Example 8)
Sentence: So what is it about the games that were released in 2005 that you find so excellent?	
Output: request_explanation(release_year[2005], rating[excellent])

Example 9)
Sentence: Do you think Mac is a better gaming platform than others?
Output: request_attribute(has_mac_release[])

Note: you are to output the string after "Output: ". Do not include "Output: " in your answer.

Give the output for the following sentence:
{input}
"""

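The examples in the prompt all share the shape `function(attribute[value], ...)`, so a model's output can be split back into a function name and an attribute dictionary. The following is a minimal sketch of such a parser. `parse_mr` and both regular expressions are hypothetical helpers that are not part of the original notebook, and the sketch assumes attribute values never contain square brackets, which holds for the examples above.

```python
import re

# Hypothetical helper (not in the original notebook): splits a meaning
# representation such as "request(release_year[2014], specifier[terrible])"
# into its function name and an ordered attribute dictionary.
_MR_PATTERN = re.compile(r"^\s*(\w+)\((.*)\)\s*$", re.DOTALL)
_ATTR_PATTERN = re.compile(r"(\w+)\[([^\]]*)\]")


def parse_mr(mr: str):
    """Return (function_name, attributes) or None if the string is malformed."""
    match = _MR_PATTERN.match(mr)
    if match is None:
        return None
    function_name, body = match.groups()
    # Assumption: values contain no square brackets, so a simple regex suffices.
    attributes = {name: value.strip() for name, value in _ATTR_PATTERN.findall(body)}
    return function_name, attributes


example = (
    "inform(name[Dirt: Showdown], release_year[2012], "
    "esrb[E 10+ (for Everyone 10 and Older)], available_on_steam[no])"
)
parsed = parse_mr(example)
print(parsed)
```

Parsing the output this way makes it possible to compare function names and attributes separately instead of relying only on string equality.
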
## Set up the Replicate client
   


In [None]:
# Set up the Replicate client
print("Setting up the Replicate client...")
replicate_client = replicate.Client(api_token=os.environ["REPLICATE_API_TOKEN"])
print("Replicate client set up successfully.")

## Function to make predictions

In [None]:
# Function to make predictions using Replicate
def make_predictions(model_name: str, input_dict: dict, in_dataset: Dataset, prefix: str, out_dir: str, save_intermediate: bool = False):
    """
    Generate predictions using a specified model via Replicate API.
    
    Args:
    - model_name (str): Name of the model on Replicate
    - input_dict (dict): Dictionary containing model input parameters
    - in_dataset (Dataset): Input dataset
    - prefix (str): Prefix for output files
    - out_dir (str): Directory to save output files
    - save_intermediate (bool): Whether to save intermediate results
    
    Returns:
    - dict: Dictionary containing all responses
    """
    
    print(f"Starting predictions for {model_name}")
    print(f"Total examples: {len(in_dataset)}")
    responses_dict = {}
    for i in range(len(in_dataset)):
        response = ""
        if (i % 50) == 0:
            print(f"Processing example {i} of {len(in_dataset)}")
            # Checkpoint the responses collected so far (nothing to save at i == 0)
            if save_intermediate and responses_dict:
                with open(f"{out_dir}/{prefix}_responses_dict_{i}.json", "w") as f:
                    json.dump(responses_dict, f)
    
        # Get the text and ground truth for the example
        example = in_dataset[i]
        text = example["text"]
        ground_truth = example["attributes"]
        
        # Fill the prompt into a copy so the caller's input_dict is not mutated
        model_input = {**input_dict, "prompt": PROMPT_TEMPLATE.format(input=text)}

        # Stream the response and concatenate the emitted chunks
        for event in replicate_client.stream(model_name, input=model_input):
            response += str(event)
        
        # Store the response in the dictionary
        responses_dict[i] = {"text": text, "ground_truth": ground_truth, "output": response}
        
    return responses_dict
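
When `save_intermediate` is enabled, the function writes files named `{prefix}_responses_dict_{i}.json` into `out_dir`. A minimal sketch of reading the most recent of these files back is shown here. `load_latest_checkpoint` is a hypothetical helper that is not part of the original notebook, and the demo writes made-up records to a temporary directory. Note that `json.dump` turns the integer keys into strings, so the sketch converts them back.

```python
import glob
import json
import os
import re
import tempfile


def load_latest_checkpoint(out_dir: str, prefix: str) -> dict:
    """Return the most recent intermediate responses, or {} if none exist."""
    name_pattern = re.compile(rf"{re.escape(prefix)}_responses_dict_(\d+)\.json$")
    search = os.path.join(
        glob.escape(out_dir), f"{glob.escape(prefix)}_responses_dict_*.json"
    )
    candidates = []
    for path in glob.glob(search):
        match = name_pattern.search(os.path.basename(path))
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return {}
    # The checkpoint with the highest example index is the most complete one.
    _, latest_path = max(candidates)
    with open(latest_path) as f:
        loaded = json.load(f)
    # JSON object keys are always strings; restore the integer example indices.
    return {int(key): value for key, value in loaded.items()}


# Demo with made-up records and a hypothetical "demo" prefix.
with tempfile.TemporaryDirectory() as tmp_dir:
    for step in (50, 100):
        records = {
            i: {"text": f"sentence {i}", "ground_truth": "", "output": ""}
            for i in range(step)
        }
        with open(os.path.join(tmp_dir, f"demo_responses_dict_{step}.json"), "w") as f:
            json.dump(records, f)
    restored = load_latest_checkpoint(tmp_dir, "demo")

print(len(restored), min(restored), max(restored))
```
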


## Generate and save responses for the validation set


In [None]:
# Set up output directory
out_dir = "responses"
os.makedirs(out_dir, exist_ok=True)


In [None]:
# Model configuration
# Uncomment the desired model configuration

# Mixtral 8x7B configuration
input_dict = {
    "top_p": 0.95,
    "prompt": "",
    "max_tokens": 256,
    "temperature": 0.7,
    "prompt_template": "<s>[INST] {prompt} [/INST] ",
}

model_name = "mistralai/mixtral-8x7b-instruct-v0.1"
prefix = "mistral-8x7B"

# # Llama 3 configuration (same for both 8B and 70B models)
# input_dict = {
#     "top_p": 0.9,
#     "prompt": "",
#     "max_new_tokens": 256,
#     "temperature": 0.75,
#     "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
# }

# # Llama 3 70B
# model_name = "meta/meta-llama-3-70b-instruct"
# prefix = "llama3-70B"

# # Llama 3 8B
# model_name = "meta/meta-llama-3-8b-instruct"
# prefix = "llama3-8B"


# Make predictions
print("Generating predictions...")
responses_dict = make_predictions(
    model_name,
    input_dict,
    val_dataset,
    prefix,
    out_dir,
    save_intermediate=False,
)

# Save the responses to a file for future reference
with open(f"{out_dir}/{prefix}_responses_viggo_val.json", "w") as f:
    json.dump(responses_dict, f)
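
Each saved record keeps the model output next to its ground truth, so a simple first check is the fraction of outputs that match the reference exactly. The following is a rough sketch under that assumption. `exact_match_accuracy` is a hypothetical helper that is not part of the original notebook, and the `output` values in the sample records are invented for illustration rather than taken from a real model run.

```python
def exact_match_accuracy(responses: dict) -> float:
    """Fraction of records whose output equals the ground truth after trimming whitespace."""
    if not responses:
        return 0.0
    hits = sum(
        1
        for record in responses.values()
        if record["output"].strip() == record["ground_truth"].strip()
    )
    return hits / len(responses)


# Records in the shape stored by make_predictions; the outputs are made up.
sample_responses = {
    "0": {
        "text": "Were there even any terrible games in 2014?",
        "ground_truth": "request(release_year[2014], specifier[terrible])",
        "output": " request(release_year[2014], specifier[terrible])\n",
    },
    "1": {
        "text": "Do you think Mac is a better gaming platform than others?",
        "ground_truth": "request_attribute(has_mac_release[])",
        "output": "request_attribute(has_linux_release[])",
    },
}

accuracy = exact_match_accuracy(sample_responses)
print(f"Exact match: {accuracy:.2f}")
```

Exact match is strict: an output with the right function and attributes in a different order counts as a miss, so it is best read as a lower bound on quality.
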