PhantomWiki is at its core an on-demand random generator of fictional worlds. Similarly to the wiki hosting services popular in film, video games, and literature[1], we represent these fictional worlds through Wikipedia-like biographical entries about their characters. We then test the model’s retrieval skills and its understanding of the fictional world through an accompanying set of automatically generated question-answer pairs.

### 1. Generate a PhantomWiki Instance

The first step of the PhantomWiki pipeline is to generate a random universe of n characters as well as the document corpus describing it. 


Each character in a PhantomWiki universe is described through its social relationships and personal facts. For the social relationships,
we first generate family trees, following the family tree generator of Hohenecker & Lukasiewicz (2020)[2].


This family tree generation algorithm exposes several parameters. The main ones are listed here (for the full list, please refer to our source code):
<br>
<br>
`num-family-trees`: the number of family trees in a PhantomWiki universe 
<br>
`max-family-tree-size`: the maximum number of people in one family tree 
<br>
`max-branching-factor`: the maximum number of children that any person in a family tree may have

In [None]:
num_family_trees = 1
max_family_tree_size = 25
max_branching_factor = 5

Specify the folder, and the subfolder for each split, where you want to store your PhantomWiki instance (including the facts and QA pairs).

In [None]:
import os

output_dir = "out"
split = "split0"
split_dir = os.path.join(output_dir, split)

Now we can run the following command to generate a PhantomWiki universe. 

In [None]:
!python -m phantom_wiki --num-family-trees $num_family_trees --max-family-tree-size $max_family_tree_size --max-branching-factor $max_branching_factor --output-dir $split_dir --article-format json --question-format json  --valid-only --debug
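
When generating several splits, it can be convenient to assemble the same command programmatically. The following is a minimal sketch that only reuses the flags from the command above; `build_generate_command` is a hypothetical helper introduced here for illustration and is not part of PhantomWiki.

```python
import shlex


def build_generate_command(split_dir, num_family_trees=1,
                           max_family_tree_size=25, max_branching_factor=5):
    """Return the phantom_wiki invocation shown above as an argument list."""
    return [
        "python", "-m", "phantom_wiki",
        "--num-family-trees", str(num_family_trees),
        "--max-family-tree-size", str(max_family_tree_size),
        "--max-branching-factor", str(max_branching_factor),
        "--output-dir", split_dir,
        "--article-format", "json",
        "--question-format", "json",
        "--valid-only",
        "--debug",
    ]


cmd = build_generate_command("out/split0")
# Print the command as a single shell-quoted string for inspection
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to run the generation from Python instead of a `!` shell cell.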

### 2. Visualization of the PhantomWiki universe

We have now generated a PhantomWiki universe, stored in the `$split_dir` folder.
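
To check what the generation step produced, the contents of the folder can be listed. This is a minimal sketch, assuming the `out/split0` folder used in this tutorial; `list_generated_files` is a hypothetical helper, not part of PhantomWiki, and the exact set of files depends on the options passed to the generator.

```python
from pathlib import Path


def list_generated_files(folder):
    """Return the relative paths of all files under `folder`, sorted."""
    root = Path(folder)
    if not root.is_dir():
        # Nothing has been generated at this location yet
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


for path in list_generated_files("out/split0"):
    print(path)
```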

#### 2.1 Visualization of family trees

We can first take a look at the family trees we generated. 

In [None]:
from IPython.display import Image

# By default we show the first family tree, although more may have been generated in the previous step.
# The generation command above wrote its output to split_dir.
family_tree_file = f"{split_dir}/family_tree_1.png"

Image(filename=family_tree_file)

Every person in PhantomWiki has a first name and a last name. Colors indicate the gender of people in the PhantomWiki universe. Arrows indicate parent-child relationships.

#### 2.2 Generated Articles

Besides the family relationships, the generated facts include friend relationships, hobbies, and occupations of the people in the universe. These facts are stored in `facts.pl` and are the source from which the articles are produced.

The facts are converted into one article per person using pre-defined templates.
Generated articles are saved in the `articles` folder, where each person has an associated `$name.txt` file listing the facts related to that person. With `--article-format json`, as in the command above, the articles are instead read from a single `articles.json` file. (For family relationships, only parent and sibling information is reflected in the articles.)

Here we can take a look at an example of generated articles:


In [None]:
# input the name of the person you want to read the article about
import json

name = "Aida Wang"
article_file = f"{split_dir}/articles.json"
with open(article_file, "r") as f:
    articles = json.load(f)

# print the article whose title matches the chosen name
for entry in articles:
    if entry["title"] == name:
        print(entry["article"])
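
If several articles need to be looked up, scanning the list each time becomes repetitive. A minimal sketch of an alternative is to index the entries by title once, assuming each entry has the `title` and `article` keys used above. `index_articles` is a hypothetical helper, and the sample entries are placeholders rather than real generated output.

```python
def index_articles(entries):
    """Map each article title to its text."""
    return {entry["title"]: entry["article"] for entry in entries}


# Hypothetical sample entries, for illustration only (not generated output)
sample_entries = [
    {"title": "Aida Wang", "article": "Placeholder text for Aida Wang."},
    {"title": "Example Person", "article": "Placeholder text for Example Person."},
]

articles_by_title = index_articles(sample_entries)
# dict.get avoids a KeyError when the name is not in the universe
print(articles_by_title.get("Aida Wang", "No article found."))
```

In the notebook, the list loaded from `articles.json` would take the place of `sample_entries`.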

#### 2.3 Generated QA pairs

Each PhantomWiki instance also contains Question-Answer pairs that are consistent with the generated facts. 

The difficulty of the generated questions is tunable via the `--depth` option of the `python -m phantom_wiki` command above. By default, using `--depth 10` gives us `8` types of question templates. The number of questions generated from each type of template can be specified via `--num-questions-per-type` (default is 10). These questions are stored in the `questions` folder, arranged by type. With `--question-format json`, as in the command above, they are instead read from a single `questions.json` file.

Let's now look at some of the questions: 

In [None]:
import json

# specify the type of question you want to look at
# (named question_type to avoid shadowing Python's built-in `type`)
question_type = 0
question_file = f"{split_dir}/questions.json"

with open(question_file, "r") as f:
    questions = json.load(f)
We can look at a sampled question and its answer, along with the corresponding Prolog query:

In [None]:
question = questions[0]
print("Question: ", question["question"])
print("Answer: ", question["answer"])
print("Prolog:", question["prolog"])

The `prolog` key shows the Prolog query needed to obtain the answer to the question. Readers interested in Prolog may refer to [3].
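
To skim more than one question at a time, the same three keys can be printed for the first few records. This is a minimal sketch, assuming every record has the `question`, `answer`, and `prolog` keys shown above. `preview_questions` is a hypothetical helper, and the sample records are placeholders rather than real generated questions.

```python
def preview_questions(questions, limit=3):
    """Return display lines for the first `limit` question records."""
    lines = []
    for record in questions[:limit]:
        lines.append(f"Question: {record['question']}")
        lines.append(f"Answer: {record['answer']}")
        lines.append(f"Prolog: {record['prolog']}")
    return lines


# Hypothetical placeholder records, for illustration only
sample_questions = [
    {"question": "<question 1>", "answer": "<answer 1>", "prolog": "<query 1>"},
    {"question": "<question 2>", "answer": "<answer 2>", "prolog": "<query 2>"},
]

for line in preview_questions(sample_questions, limit=2):
    print(line)
```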

### 3. Evaluation on a PhantomWiki instance

#### 3.1 Run evaluation with specific model and method

Loading a dataset from HuggingFace or from a local folder: 

In [None]:
from phantom_eval.utils import load_data

# The dataset we just generated can be loaded by
# dataset = load_data(dataset=output_dir, split=split, from_local=True)

# If you want to load a HuggingFace dataset
HF_version = "kilian-group/phantom-wiki-v050"
# Note that the HuggingFace splits follow the following format
HF_split = "depth_20_size_50_seed_1"
dataset = load_data(dataset=HF_version, split=HF_split, from_local=False)

Now we can finally run evaluation on the generated PhantomWiki dataset. As an example, we test the `zeroshot` method using a GPT model.

In [None]:
method = "zeroshot"
model = "gpt-4o-mini-2024-07-18"
preds_dir = "preds"

In [None]:
# to run evaluation on the generated PhantomWiki instance
!python -m phantom_eval --method $method -od $preds_dir -m $model --from_local --dataset $output_dir --split_list $split

# to run evaluation on a HuggingFace Dataset
!python -m phantom_eval --method $method -od $preds_dir -m $model --dataset $HF_version

#### 3.2 Visualize the results

Now we can use the evaluation utilities to plot the performance of the tested method on a PhantomWiki instance.

In [None]:
# plot the accuracy vs. reasoning steps for the predictions
!python eval/plot_difficulty_accuracy.py -od $preds_dir --method $method --depth

[1] For example, see stardewvalley.fandom.com or harrypotter.fandom.com. \
[2] Hohenecker, Patrick, and Thomas Lukasiewicz. "Ontology reasoning with deep neural networks." Journal of Artificial Intelligence Research 68 (2020): 503-540. \
[3] Sterling, Leon, and Ehud Y. Shapiro. The Art of Prolog: Advanced Programming Techniques. MIT Press, 1994.
