## Generating Characters and Quests
 
For this in-class exercise, we will use the LIGHT data to try to generate interesting Non-Player Characters.  We'll do the following generation exercises:
1. Given a location, generate possible characters who might be found there.
2. For each character, we'll generate a name, a persona, and a quest.

To generate quests, we'll use a new set of data from the Facebook AI Group that developed the LIGHT data that we used in previous assignments.

You can find a description of their quest data in the paper [How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds](https://arxiv.org/abs/2010.00685).  Here is the abstract for the paper.

> We seek to create agents that both act and communicate with other agents in pursuit of a goal. Towards this end, we extend LIGHT (Urbanek et al., 2019)–a large-scale crowd-sourced fantasy text-game—with a dataset of “quests”. These contain natural language motivations paired with in-game goals and human demonstrations; completing a quest might require dialogue or actions (or both). We introduce a reinforcement learning system that (1) incorporates large-scale language modeling-based and commonsense reasoning-based pre-training to imbue the agent with relevant priors; and (2) leverages a factorized action space of action commands and dialogue, balancing between the two. We conduct zero-shot evaluations using held-out human expert demonstrations, showing that our agents are able to act consistently and talk naturally with respect to their motivations.

The thing that I find exciting about this work as compared to the in-class exercise we did last week is that it will potentially allow us to add **goals** to characters to help guide their dialogue, rather than just have them perform chit-chat.


## Load the data

The LIGHT data was released as part of Facebook's ParlAI system. I extracted the data into several JSON files:
* ```light_environment_train.json``` contains information about the locations, objects, and characters in the text-adventure games.  
* ```light_dialogue_data_train.json``` contains sample conversations between pairs of characters.  
* ```light_quests.jsonl``` contains quest data (one quest per line in JSON format).



## Load the LIGHT Quest Data

In [None]:
!wget https://raw.githubusercontent.com/interactive-fiction-class/interactive-fiction-class-data/master/light_dialogue/light_quests.jsonl

In [27]:
import sys
import os
import json
from collections import defaultdict

jsonl_filename = 'light_quests.jsonl'

quests = []
with open(jsonl_filename) as f:
    for line in f:
        quest = json.loads(line)
        quests.append(quest)

# Print out an example quest
print(json.dumps(quests[10], indent=4))


{
    "character": "The King Visiting The Shipyard",
    "persona": "I am the King. I rule this land, and all power is mine to hold.  My kingship is a divine right passed down from my father to me, and it will be passed down to my son someday. I live in luxury, but I am also at risk from other rulers who may want to take over my kingdom. A king must be a man of war, always prepared to defend his land.",
    "description": "You are in the Royal Shipyard.\nA massive shipyard with different Five dry docks. each dry dock has several wooden cranes and rope works. The dry docks are made of stone and the water gate is all harden wood.\nThere's a dock, a water gate is all harden wood, a rope work, a water gate, two cranes, a Fishing ships, and a rope here.\nThe thief is here.\n\nYou are carrying nothing.",
    "goal": "get rope work",
    "short_motivation": "I plan to inspect the rope work",
    "mid_motivation": "I plan to instruct the Chief Naval Engineer to build me a new warship",
    "lo

# Load the LIGHT Environment Data

In [None]:
!wget https://raw.githubusercontent.com/interactive-fiction-class/interactive-fiction-class-data/master/light_dialogue/light_environment_train.json

In [17]:
import sys
import os
import json
from collections import defaultdict


json_filename = 'light_environment_train.json'

# Use a context manager so the file is closed after loading
with open(json_filename) as f:
  light_environment = json.load(f)

def get_categories(light_environment):
  return light_environment['categories'].values()
categories = get_categories(light_environment)

def get_room_name(room_id, rooms_by_id):
  return rooms_by_id[room_id]['setting']

def print_rooms_for_category(category, rooms_by_category, rooms_by_id):
  rooms = rooms_by_category[category]
  print(category.capitalize())
  for room_id in rooms:
    print('\t', room_id, '-', get_room_name(room_id, rooms_by_id))


def sort_objects_by_property(objects_by_id):
  objects_by_property = defaultdict(set)
  for object_id, obj in objects_by_id.items(): 
    name = obj['name']
    for label, value in obj.items():
      if label.startswith('is_') and value == 1:
        objects_by_property[label].add(object_id)
  return objects_by_property


rooms_by_id = light_environment['rooms']
rooms_by_category = defaultdict(set)
for room_id in rooms_by_id:
  category = light_environment['rooms'][room_id]['category']
  rooms_by_category[category].add(room_id)
objects_by_id = light_environment['objects']
objects_by_property = sort_objects_by_property(objects_by_id)




### Characters in LIGHT 


Characters have a description, a persona (a first-person description of who they are and what their motivations might be), a character type (person, creature, or object), a location (```in_room_ids```), and an inventory (```carrying_objects```).

The Gravedigger character is listed in the Unfinished Mausoleum's ``in_characters`` variable.  The ``in_characters`` are characters that are explicitly mentioned in the location's ``description`` or ``background`` variables. 
```
light_environment['characters']['203']

{'base_form': ['gravedigger'],
 'carrying_objects': [890],
 'char_type': 'person',
 'character_id': 203,
 'corrected_name': 'gravedigger',
 'desc': 'You might want to talk to the gravedigger, specially if your looking for a friend, he might be odd but you will find a friend in him.',
 'ex_room_ids': [100, 349],
 'in_room_ids': [62],
 'is_plural': 0,
 'name': 'gravedigger',
 'orig_room_id': 349,
 'personas': ["I am low paid labor in this town. I do a job that many people shun because of my contact with death. I am very lonely and wish I had someone to talk to who isn't dead."],
 'wearing_objects': [],
 'wielding_objects': []}
 ```
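
As a small illustration of how ``in_characters`` ties rooms to characters, the sketch below looks up the names of the characters listed for a room. The helper name `characters_in_room` and the two tiny sample dictionaries are made up for this example; they only mirror the fields shown above. Note that room records store character IDs as integers, while the keys of `light_environment['characters']` are strings.

```python
def characters_in_room(room, characters_by_id):
    """Return the names of a room's in_characters, without repeats."""
    names = []
    seen = set()
    for character_id in room.get("in_characters", []):
        key = str(character_id)  # characters are keyed by string IDs
        if key in seen or key not in characters_by_id:
            continue
        seen.add(key)
        names.append(characters_by_id[key]["corrected_name"])
    return names


# Tiny stand-in data that mirrors the structure of the LIGHT environment file
sample_characters = {"203": {"corrected_name": "gravedigger"}}
sample_room = {"setting": "An Unfinished Mausoleum", "in_characters": [203, 203]}

print(characters_in_room(sample_room, sample_characters))
```

With the real data loaded, the equivalent call would be `characters_in_room(rooms_by_id['62'], light_environment['characters'])`.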


In [18]:
light_environment['characters']['203']

{'base_form': ['gravedigger'],
 'carrying_objects': [890],
 'char_type': 'person',
 'character_id': 203,
 'corrected_name': 'gravedigger',
 'desc': 'You might want to talk to the gravedigger, specially if your looking for a friend, he might be odd but you will find a friend in him.',
 'ex_room_ids': [100, 349],
 'in_room_ids': [62],
 'is_plural': 0,
 'name': 'gravedigger',
 'orig_room_id': 349,
 'personas': ["I am low paid labor in this town. I do a job that many people shun because of my contact with death. I am very lonely and wish I had someone to talk to who isn't dead."],
 'wearing_objects': [],
 'wielding_objects': []}

In [20]:
room_id = light_environment['characters']['203']['in_room_ids'][0]
room = rooms_by_id[str(room_id)]
# Print out the room JSON
print(json.dumps(room, indent=4))

{
    "category": "Graveyard",
    "setting": "An Unfinished Mausoleum",
    "description": "Two-and-a-half walls of the finest, whitest stone stand here, weathered by the passing of countless seasons. There is no roof, nor sign that there ever was one. All indications are that the work was abruptly abandoned. There is no door, nor markings on the walls. Nor is there any indication that any coffin has ever lain here... yet.",
    "background": "Bright white stone was all the fad for funerary architecture, once upon a time. It's difficult to understand why someone would abandon such a large and expensive undertaking. If they didn't have the money to finish it, they could have sold the stone, surely - or the mausoleum itself. Maybe they just haven't needed it yet? A bit odd, though, given how old it is. Maybe the gravedigger remembers... if he's sober.",
    "neighbors": [
        108,
        109
    ],
    "in_characters": [
        203,
        203
    ],
    "ex_characters": [
      

Here are some examples of characters’ names and their personas.



In [None]:
for character_id in list(light_environment['characters'])[10:20]:
  character = light_environment['characters'][character_id]
  name = character['corrected_name']
  persona = character['personas'][0]
  
  print(name.title(), '-', persona)


Witches - I only mastered one spell in witch school. I can speak with inanimate objects. I use this spell in espionage. I work for the government.
Queen - I am second in command under the king. I have a great power of authority. I am worshiped and seen as a wise and beautiful leader.
King - I am a king of the whole empire. I give rules and pursuit them. I am brave and fearless.
Dragon - I am a dragon living in the mountains. I enjoy hoarding treasure. I terrorize the local populace for fun.
Knight - I am a knight. I come from a lower-ranking noble family. I serve under the king, as my father did before me. In times of war, I fight on horseback.
Faeries - I giggle as I toss about my hair.  Some of the male faeries take notice and give chase.  How I love to tease them!  For they will never catch me.
Talking Cat - I am a talking cat. I can speak to humans. I have scared many, many children.
A Rat - I stick to the edge, nose up and ready for any morsels that may drop my way. Or sometimes t

# Dialogue Data in LIGHT


Here is how to access the dialogues in the LIGHT dataset.

In [None]:
!wget https://raw.githubusercontent.com/interactive-fiction-class/interactive-fiction-class-data/master/light_dialogue/light_dialogue_data_train.json.gz
!gunzip light_dialogue_data_train.json.gz

--2022-03-24 16:39:07--  https://raw.githubusercontent.com/interactive-fiction-class/interactive-fiction-class-data/master/light_dialogue/light_dialogue_data_train.json.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15425057 (15M) [application/octet-stream]
Saving to: ‘light_dialogue_data_train.json.gz’


2022-03-24 16:39:07 (136 MB/s) - ‘light_dialogue_data_train.json.gz’ saved [15425057/15425057]



In [None]:
import json
light_dialogue_json_filename = 'light_dialogue_data_train.json'
# Use a context manager so the file is closed after loading
with open(light_dialogue_json_filename) as f:
  light_dialogues = json.load(f)

In [None]:
def get_dialogue_description(dialogue):
  """
  Constructs a string representation of the dialogue.
  """
  agents = dialogue["agents"] # A list of dictionaries with keys "name" and "persona"
  setting = dialogue["setting"] # A dictionary with keys "name", "category", "description", "background"
  context = dialogue["context"][0] # A second-person description of the set-up (maybe presented to Turkers?)
  object_descriptions = dialogue["all_descriptions"]

  # These lists comprise the turns of the conversation
  character_order = dialogue["character"]
  speech = dialogue["speech"]
  emotes = dialogue["emote"]
  actions = dialogue["action"]

  turns = []
  for i, _ in enumerate(character_order):
    turns.append((character_order[i], speech[i], emotes[i], actions[i]))

  # Setting description
  setting_str = "{setting} - {description}\n".format(setting=setting["name"], description=setting["description"])
  # Name and personas of the characters
  characters = []
  for agent in agents:
    name = agent["name"].title()
    persona = agent["persona"]
    characters.append((name, persona))
  # Conversation 
  dialogue_str = ""
  for character, line, emote, action in turns:
    dialogue_str += '{character}: "{line}"\n'.format(character=character.capitalize(), line=line.capitalize().strip())
    if emote:
      dialogue_str += "{character}: Gestures - {emote}\n".format(character=character.capitalize(), emote=emote.capitalize().strip())
    if action:
      dialogue_str += "{character}: Stage Direction - {action}\n".format(character=character.capitalize(), action=action.capitalize().strip())
  return setting_str, characters, dialogue_str


# TODO: Character Generation



Let's start today by generating characters for a location.

Given a location, generate a character that could be at that location.  The character should have
1. A name
2. A persona written in the first person

In [None]:
%%capture
!pip install --upgrade openai
!pip install jsonlines

You can find your OpenAI API key [here](https://beta.openai.com/account/api-keys).


In [None]:
import os
import openai

print('Enter OpenAI API key:')
openai.api_key = input()

os.environ['OPENAI_API_KEY']=openai.api_key

# TODO: Format Data for Fine-Tuning 

Below, I show how to create data to fine-tune an OpenAI model.  The OpenAI API documentation has a [guide to fine-tuning models](https://beta.openai.com/docs/guides/fine-tuning) that you should read.   The basic format of fine-tuning data is a JSONL file (one JSON object per line) where each object has two keys: `prompt` and `completion`.

```
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...
```
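
To make the format concrete, here is a minimal sketch that writes a few prompt/completion pairs to a JSONL file and reads them back to check that each line parses and has exactly the two expected keys. The helper names, the file name `example_fine_tune.jsonl`, and the placeholder records are assumptions for illustration.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")


def read_and_check_jsonl(path):
    """Read a JSONL file and verify each record has only prompt and completion."""
    records = []
    with open(path) as f:
        for line_number, line in enumerate(f, start=1):
            record = json.loads(line)
            if set(record) != {"prompt", "completion"}:
                raise ValueError(
                    "line {} has unexpected keys: {}".format(line_number, sorted(record))
                )
            records.append(record)
    return records


# Placeholder records, not real training data
example_records = [
    {"prompt": "<prompt text 1>", "completion": "<ideal generated text 1>"},
    {"prompt": "<prompt text 2>", "completion": "<ideal generated text 2>"},
]
example_path = os.path.join(tempfile.gettempdir(), "example_fine_tune.jsonl")
write_jsonl(example_records, example_path)
checked_records = read_and_check_jsonl(example_path)
print(len(checked_records), "records look well-formed")
```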

In the code below, each character is paired with the first room listed in its `in_room_ids`. Fill in the two `TODO`s so that the prompt is built from the room (for example, its `category`, `setting`, and `description`) and the completion contains the character's name and persona.

In [None]:

# Print out the room JSON
print(json.dumps(room, indent=4))

def create_character_finetuning_data(filename, light_environment, max_characters=100):
  fine_tuning_data = []
  counter = 0
  for character_id in light_environment['characters']:
    counter += 1
    if counter > max_characters:
      break
      
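    character = light_environment['characters'][character_id]
    # Skip characters that lack a persona or a room to pair them with
    if not character['personas'] or not character['in_room_ids']:
      continue
    name = character['corrected_name']
    persona = character['personas'][0]
    room_json = rooms_by_id[str(character['in_room_ids'][0])]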

    data = {}
    data['prompt'] = TODO
    data['completion'] = TODO
    fine_tuning_data.append(data)

  with open(filename, 'w') as out:
    for data in fine_tuning_data:
        out.write(json.dumps(data))
        out.write('\n')

jsonl_filename='fine_tune_LIGHT_characters.jsonl'
create_character_finetuning_data(jsonl_filename, light_environment)
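
One possible way to fill in the two `TODO`s is sketched below, as a starting point rather than the intended answer. The helper name `format_character_example`, the `\n\n###\n\n` separator that ends the prompt, and the ` END` marker that ends the completion are all assumptions; any fixed markers should work as long as you use the same ones when you generate from the model. The sample room uses a shortened version of the mausoleum description.

```python
PROMPT_SEPARATOR = "\n\n###\n\n"  # assumed marker for the end of the prompt
COMPLETION_END = " END"           # assumed marker for the end of the completion


def format_character_example(room_json, name, persona):
    """Build one prompt/completion pair: room in the prompt, character in the completion."""
    prompt = "Category: {category}\nSetting: {setting}\nDescription: {description}".format(
        category=room_json["category"],
        setting=room_json["setting"],
        description=room_json["description"],
    ) + PROMPT_SEPARATOR
    # The completion starts with a space and ends with the fixed end marker
    completion = " Name: {name}\nPersona: {persona}".format(
        name=name.title(), persona=persona
    ) + COMPLETION_END
    return {"prompt": prompt, "completion": completion}


# Shortened sample values, for illustration only
sample_room = {
    "category": "Graveyard",
    "setting": "An Unfinished Mausoleum",
    "description": "Two-and-a-half walls of the finest, whitest stone stand here.",
}
example = format_character_example(
    sample_room, "gravedigger", "I am low paid labor in this town."
)
print(example["prompt"])
print(example["completion"])
```

Inside `create_character_finetuning_data`, the two `TODO` lines could then be replaced by `data = format_character_example(room_json, name, persona)`.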

### Fine-tune GPT3 with the OpenAI API

Next, we'll perform fine-tuning with this data using OpenAI. 

In [None]:
!head '{jsonl_filename}'
!wc -lw '{jsonl_filename}'

Next, we'll make the fine-tuning API call via the command line.  Here the `-m` argument gives the model.  There are 4 sizes of GPT3 models.  They go in alphabetical order from smallest to largest.
* Ada 
* Babbage
* Curie
* Davinci

As the model sizes increase, so do their quality and their cost.  Davinci is the highest quality and highest cost model.  I recommend starting by fine-tuning smaller models to debug your code first so that you don't rack up costs.

Fine-tuning curie on 1000 dialogues costs about $6.50.


In [None]:
!openai api fine_tunes.create -t '{jsonl_filename}' -m curie
#!openai api fine_tunes.create -t '{jsonl_filename}' -m davinci


In [None]:
#!openai api fine_tunes.cancel -i ft-NwXfffYxfrc3BIqYACBSSDFG

You should copy down the fine-tune ID and the model name, which look like this:

```
Created fine-tune: ft-VzQpTwfnWAzDXNKgPTFtiZg2

[2022-01-21 23:22:47] Uploaded model: curie:ft-ccb-lab-members-2022-01-21-23-22-46
```

If you forget to write them down, you can list your fine-tuning runs and models this way. These model names aren't mnemonic, so it is probably a good idea to make a note of what your model's inputs and outputs are. 

In [None]:
!openai api fine_tunes.list

You can run your fine-tuned model in the OpenAI Playground.  After the model has finished fine-tuning, you'll find it in the Engine dropdown menu.  


# TODO - Generate Motivations for Characters

Given a character (name and persona) and their current location description from the Quest File, generate a list of 3 motivations for the character (short-term, mid-term, and long-term motivation).

Example input:
```
    "character": "The King Visiting The Shipyard",
    "persona": "I am the King. I rule this land, and all power is mine to hold.  My kingship is a divine right passed down from my father to me, and it will be passed down to my son someday. I live in luxury, but I am also at risk from other rulers who may want to take over my kingdom. A king must be a man of war, always prepared to defend his land.",
    "description": "You are in the Royal Shipyard.\nA massive shipyard with different Five dry docks. each dry dock has several wooden cranes and rope works. The dry docks are made of stone and the water gate is all harden wood.\nThere's a dock, a water gate is all harden wood, a rope work, a water gate, two cranes, a Fishing ships, and a rope here.\nThe thief is here.\n\nYou are carrying nothing.",
```

Example output: 
```
    "short_motivation": "I plan to inspect the rope work",
    "mid_motivation": "I plan to instruct the Chief Naval Engineer to build me a new warship",
    "long_motivation": "I plan to attack an enemy kingdom with my new warship",
```
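
A minimal sketch of how a quest record could be turned into a fine-tuning pair for this task: the prompt is built from `character`, `persona`, and `description`, and the completion lists the three motivations. The function name `format_motivation_example`, the field labels, and the separator and end markers are assumptions, and the sample record is shortened from the quest shown above.

```python
PROMPT_SEPARATOR = "\n\n###\n\n"  # assumed marker for the end of the prompt
COMPLETION_END = " END"           # assumed marker for the end of the completion


def format_motivation_example(quest):
    """Turn one quest record into a prompt/completion pair for motivation generation."""
    prompt = "Character: {character}\nPersona: {persona}\nLocation: {description}".format(
        character=quest["character"],
        persona=quest["persona"],
        description=quest["description"],
    ) + PROMPT_SEPARATOR
    completion = " Short-term: {short}\nMid-term: {mid}\nLong-term: {long}".format(
        short=quest["short_motivation"],
        mid=quest["mid_motivation"],
        long=quest["long_motivation"],
    ) + COMPLETION_END
    return {"prompt": prompt, "completion": completion}


# Shortened version of the example quest, for illustration only
sample_quest = {
    "character": "The King Visiting The Shipyard",
    "persona": "I am the King. I rule this land, and all power is mine to hold.",
    "description": "You are in the Royal Shipyard.",
    "short_motivation": "I plan to inspect the rope work",
    "mid_motivation": "I plan to instruct the Chief Naval Engineer to build me a new warship",
    "long_motivation": "I plan to attack an enemy kingdom with my new warship",
}
example = format_motivation_example(sample_quest)
print(example["prompt"])
print(example["completion"])
```

Applying this function to every record in `quests` and writing the results one per line gives a JSONL file in the same format used for character generation.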

In [None]:
#TODO - fine tune a model here

# TODO - Conversations with Motivations

Use your code from last week on generating conversations.  Now, instead of inputting only the setting, the characters' names, and their personas, incorporate a motivation for each character too.  

There are several ways that you could do this.  For instance, you could append their motivation to their persona. 

Does the conversation change with the motivation?  How could you automatically detect whether the character reached their goal? 
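
As one concrete reading of these suggestions, the sketch below appends a motivation to a persona and pairs it with a deliberately naive check for goal completion: it looks for the quest's `goal` string among the actions taken so far. Both helper names are hypothetical, and exact string matching is an assumption that will miss paraphrased or differently formatted actions.

```python
def persona_with_motivation(persona, motivation):
    """Append a motivation sentence to the end of a persona."""
    return "{} {}.".format(persona.strip(), motivation.strip().rstrip("."))


def goal_reached(goal, actions):
    """Naive check: did any action exactly match the goal (ignoring case)?"""
    target = goal.strip().lower()
    for action in actions:
        # Turns without an action may be empty or None, so skip them
        if action and action.strip().lower() == target:
            return True
    return False


persona = "I am the King. I rule this land, and all power is mine to hold."
motivation = "I plan to inspect the rope work"
print(persona_with_motivation(persona, motivation))

# Hypothetical action sequences, for illustration only
print(goal_reached("get rope work", [None, "", "get rope work"]))
print(goal_reached("get rope work", ["get rope"]))
```

A stronger detector would need to handle goals that are satisfied through dialogue rather than through a single action.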

# Optional TODO

There are lots of other cool things you could try with the quest data:
* Given a short-term motivation, predict the immediate goal from a set of possible goals. Input: `"short_motivation": "I plan to inspect the rope work"`. Output: pick `"goal": "get rope work"` out of a list of several possible goals. 
* Given an unordered set of motivations, put them in order: short-term, mid-term, long-term.
* Generate a story from the timeline data for each quest. 
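
For the first idea, a multiple-choice example can be assembled by mixing the quest's true goal with goals sampled from other quests. The sketch below is one hypothetical way to do it; the function name, the default number of choices, and the seeded `random.Random` are illustrative choices. Only the first sample quest comes from the data shown earlier; the other three are invented placeholders.

```python
import random


def build_goal_choice_example(quests, index, num_choices=3, seed=0):
    """Pair a short-term motivation with its true goal plus sampled distractor goals."""
    rng = random.Random(seed)
    target = quests[index]
    # Candidate distractors: every distinct goal that differs from the true one
    other_goals = sorted({q["goal"] for q in quests if q["goal"] != target["goal"]})
    distractors = rng.sample(other_goals, min(num_choices - 1, len(other_goals)))
    choices = distractors + [target["goal"]]
    rng.shuffle(choices)
    return {
        "short_motivation": target["short_motivation"],
        "choices": choices,
        "answer": choices.index(target["goal"]),  # position of the true goal
    }


sample_quests = [
    {"goal": "get rope work", "short_motivation": "I plan to inspect the rope work"},
    # The records below are made-up placeholders, not taken from the LIGHT data
    {"goal": "get sword", "short_motivation": "I want to arm myself"},
    {"goal": "eat bread", "short_motivation": "I am hungry"},
    {"goal": "give coin to beggar", "short_motivation": "I want to be charitable"},
]
example = build_goal_choice_example(sample_quests, index=0)
print(example)
```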