Tutorial: Creating a simple visualization input file

jgroschwitz edited this page Dec 6, 2023 · 2 revisions

This tutorial shows how to create a VULCAN visualization input file for the Little Prince AMR corpus, so that its sentences (as strings) and AMR graphs can be visualized.

Setup

Download the Little Prince AMR corpus from https://amr.isi.edu/download/amr-bank-struct-v3.0.txt; we will need it below.

You also need the penman package. It is part of the default VULCAN setup, but if you don't have it installed yet, run

pip install penman

Code

Loading the corpus

We use the penman package to load the AMR corpus (let's say its location is stored in the variable little_prince_path):

from penman import load

amrs = load(little_prince_path)

This gives us a list of all the AMRs in the corpus, over which we can now iterate.

Building the VULCAN visualization input file

First, we create a PickleBuilder object (the class lives in the module vulcan.pickle_builder.pickle_builder). We want to visualize graphs and sentences, so we simply title them "Graph" and "Sentence". For each of them we must specify its format, choosing one of the format names VULCAN supports. VULCAN accepts penman Graph objects under the format name "graph". Our sentences are plain (untokenized) strings, so we use the "string" format, which applies simple whitespace-based tokenization. We pass all of this to the PickleBuilder constructor as a dictionary that maps each object name/title to its format name:

pb = PickleBuilder({"Graph": "graph", "Sentence": "string"})

We now iterate over the corpus, adding instances to the PickleBuilder as we go:

for amr in amrs:
    sentence = amr.metadata["snt"]
    pb.add_instance_by_name("Graph", amr)
    pb.add_instance_by_name("Sentence", sentence)
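
The "snt" key used in the loop comes from the "# ::snt" comment line that precedes each graph in the corpus file. penman parses these lines for you; the following is only a minimal standard-library sketch of how such header lines map to key/value pairs. The function name parse_metadata and the example entry are made up for illustration, and the sketch is not penman's actual implementation (for instance, it would mis-split a sentence that itself contains "::").

```python
def parse_metadata(entry: str) -> dict:
    """Collect '::key value' pairs from the '#' comment lines of one AMR entry.

    Hypothetical helper for illustration only; penman does this internally.
    """
    metadata = {}
    for line in entry.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue  # graph lines carry no metadata
        # Text before the first "::" is ignored; each later piece is "key value".
        for piece in line.split("::")[1:]:
            key, _, value = piece.strip().partition(" ")
            metadata[key] = value.strip()
    return metadata


# Made-up entry in the layout of the corpus file (not copied from the corpus).
example_entry = """# ::id example.1 ::annotator nobody
# ::snt The boy wants to go .
(w / want-01
   :ARG0 (b / boy))"""

metadata = parse_metadata(example_entry)
print(metadata["snt"])
```

With penman, the same information is available as amr.metadata, which is why the loop can look up amr.metadata["snt"] directly.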

Finally, we write the data to a pickle file (assuming the file path is in the variable output_path):

pb.write(output_path)

Putting it all together

Using the above code, we can write the following script converting the AMR corpus into a pickle file:

import sys

from penman import load

from vulcan.pickle_builder.pickle_builder import PickleBuilder


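def main():
    # Command-line arguments: corpus path first, output path second.
    little_prince_path = sys.argv[1]
    output_path = sys.argv[2]
    amrs = load(little_prince_path)  # list of penman Graph objects

    # Map each object name/title to its VULCAN format name.
    pb = PickleBuilder({"Graph": "graph", "Sentence": "string"})
    for amr in amrs:
        sentence = amr.metadata["snt"]  # sentence text from the "# ::snt" line
        pb.add_instance_by_name("Graph", amr)
        pb.add_instance_by_name("Sentence", sentence)

    pb.write(output_path)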


if __name__ == "__main__":
    main()

Try it yourself! The above code is also available in the VULCAN repo.
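
The script above reads its two paths directly from sys.argv, so calling it without arguments fails with an IndexError. If you prefer a usage message instead, one possible variant uses argparse from the standard library. This is a sketch, not part of the VULCAN code: the function name parse_args, the argument names, and the file name little_prince.pickle are assumptions chosen for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the corpus path and output path (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Convert an AMR corpus into a VULCAN visualization input file."
    )
    parser.add_argument("corpus_path", help="path to the AMR corpus file")
    parser.add_argument("output_path", help="where to write the VULCAN input file")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)


# Example call with an explicit argument list (file names are placeholders).
args = parse_args(["amr-bank-struct-v3.0.txt", "little_prince.pickle"])
print(args.corpus_path, args.output_path)
```

Inside main(), args.corpus_path and args.output_path would then take the place of sys.argv[1] and sys.argv[2].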

A JSON version

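You might want to store the data in a JSON file instead of a pickle file, for example because JSON files are (to some extent) human-readable. For JSON, everything must be converted into strings first: the following code encodes each graph with penman's encode function, declares it with the "graph_string" format instead of "graph", and writes the file with write_as_json instead of write: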

import sys

from penman import load, encode

from vulcan.pickle_builder.pickle_builder import PickleBuilder


def main():
    little_prince_path = sys.argv[1]
    output_path = sys.argv[2]
    amrs = load(little_prince_path)

    # Graphs are passed as PENMAN strings here, hence "graph_string".
    pb = PickleBuilder({"Graph": "graph_string", "Sentence": "string"})
    for amr in amrs:
        sentence = amr.metadata["snt"]
        amr_string = encode(amr)  # serialize the Graph object to PENMAN notation
        pb.add_instance_by_name("Graph", amr_string)
        pb.add_instance_by_name("Sentence", sentence)

    # Write JSON instead of a pickle file.
    pb.write_as_json(output_path)


if __name__ == "__main__":
    main()
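
Why the graphs must become strings can be seen with Python's json module alone: it serializes strings, but raises a TypeError for arbitrary objects. The sketch below uses a made-up stand-in class, FakeGraph, rather than the real penman Graph, and the helper is_json_serializable is likewise hypothetical; it only illustrates the general behaviour of json.dumps.

```python
import json


class FakeGraph:
    """Hypothetical stand-in for an in-memory graph object (not the penman class)."""

    def __init__(self, text):
        self.text = text


def is_json_serializable(obj) -> bool:
    """Return True if json.dumps accepts obj, False if it raises TypeError."""
    try:
        json.dumps(obj)
        return True
    except TypeError:
        return False


graph_object = FakeGraph("(b / boy)")
graph_string = "(b / boy)"

print(is_json_serializable(graph_object))  # an arbitrary object is rejected
print(is_json_serializable(graph_string))  # a plain string is accepted
```

This is why the JSON script encodes each graph to a string before adding it to the PickleBuilder.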