This resource contains data and code for the paper
- Ida Szubert, Adam Lopez, and Nathan Schneider (2018). A structured syntax-semantics interface for English-AMR alignment. Proceedings of NAACL-HLT. http://people.cs.georgetown.edu/nschneid/p/amr2dep.pdf
which describes a representation, dataset, and algorithms for aligning nodes and subgraphs of Abstract Meaning Representation (AMR) semantic structures with nodes and subgraphs of syntactic parses in the Universal Dependencies (UD) framework.
To recover our manual alignments, you need access to AMR Release 1.0. Unzip the release files and cd to corpus/release1/unsplit. From there, run:
cat amr-release-1.0-bolt.txt amr-release-1.0-consensus.txt amr-release-1.0-dfa.txt amr-release-1.0-mt09sdl.txt amr-release-1.0-xinhua.txt amr-release-1.0-proxy.txt > ~/YOUR_PATH/amr_ud/data/amr-release-1.0-all.txt
to concatenate all release files into one.
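
If cat is not available (for example on Windows), the same concatenation can be sketched in Python. This is a minimal sketch, not part of the released code: it assumes Python 3, and the function name concatenate_release is hypothetical. The file names and their order are copied from the cat command above.

```python
import shutil
from pathlib import Path

# File names and order copied from the cat command above.
RELEASE_FILES = [
    "amr-release-1.0-bolt.txt",
    "amr-release-1.0-consensus.txt",
    "amr-release-1.0-dfa.txt",
    "amr-release-1.0-mt09sdl.txt",
    "amr-release-1.0-xinhua.txt",
    "amr-release-1.0-proxy.txt",
]


def concatenate_release(src_dir, dest_file, names=RELEASE_FILES):
    """Write the files in `names`, in order, into dest_file (hypothetical helper).

    Bytes are copied unchanged, which mirrors what cat does.
    """
    src_dir = Path(src_dir)
    # Fail before writing anything if a release file is absent.
    missing = [name for name in names if not (src_dir / name).is_file()]
    if missing:
        raise FileNotFoundError("missing release files: " + ", ".join(missing))
    with open(dest_file, "wb") as out:
        for name in names:
            with open(src_dir / name, "rb") as part:
                shutil.copyfileobj(part, out)
```

Called from any directory, concatenate_release("corpus/release1/unsplit", "YOUR_PATH/amr_ud/data/amr-release-1.0-all.txt") would produce the same file as the cat command, with YOUR_PATH replaced as above.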
Then cd to amr_ud and run:
python2 dp1.py
patch ./alignments/aligned_amrs_reconstructed.txt -i ./alignments/amrs.patch -o ./alignments/aligned_amrs.txt
python2 dp2.py
patch ./alignments/ud_parses_ldc_reconstructed.txt -i ./alignments/ud_parses.patch -o ./alignments/ud_parses_patched.txt
tr '\n%@' '\t\n\n' <./alignments/ud_parses_patched.txt >./alignments/ud_parses.txt
patch ./alignments/amr_ud_alignments_ldc_reconstructed.txt -i ./alignments/alignments.patch -o ./alignments/amr_ud_alignments.txt
rm ./alignments/aligned_amrs_reconstructed.txt ./alignments/ud_parses_patched.txt ./alignments/ud_parses_ldc_reconstructed.txt ./alignments/amr_ud_alignments_ldc_reconstructed.txt
The following files should be created in the amr_ud/alignments directory:
- aligned_amrs.txt: contains all AMRs for which hand alignments were created
- ud_parses.txt: contains hand-corrected UD parses for those AMRs
- amr_ud_alignments.txt: contains manual alignments
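
A quick way to confirm that the reconstruction succeeded is to check that the output files exist and are non-empty. The sketch below assumes Python 3; the helper name missing_outputs is hypothetical, and the file names are taken from the -o and redirect targets of the commands above.

```python
from pathlib import Path

# Output names taken from the patch/tr commands above.
EXPECTED_OUTPUTS = [
    "aligned_amrs.txt",
    "ud_parses.txt",
    "amr_ud_alignments.txt",
]


def missing_outputs(alignments_dir, names=EXPECTED_OUTPUTS):
    """Return the expected files that are absent or empty (hypothetical helper)."""
    directory = Path(alignments_dir)
    problems = []
    for name in names:
        path = directory / name
        # An empty file usually means a patch step failed part-way.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems
```

Running missing_outputs("./alignments") from the amr_ud directory should return an empty list if every step completed.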
The format of the alignments is as follows:
AMR subgraph # UD subgraph
A subgraph might be a single node, or it might contain nodes and edges:
| Notation | Meaning |
|---|---|
| x/word1 | node x, whose label is word1 |
| x/word1 :rel y/word2 | a subgraph consisting of x, its child y, and the edge connecting them labeled rel |
| x/word1 ( :rel1 y/word2 ) \| :rel2 z/word3 | a subgraph consisting of x, two of its children, and the connecting edges |
Parentheses ( ) group subgraphs, and | separates the children of one parent node.
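
The notation above can be read with a few lines of Python. This is only a sketch under stated assumptions, not the parser used for the paper: it assumes each alignment sits on one line, that the two sides are separated by " # " with surrounding spaces, that nodes are written variable/label, and that edge labels are whitespace-delimited tokens starting with a colon. The function names are hypothetical, and the sketch does not reconstruct the tree structure implied by ( ) and |.

```python
import re

# A node is variable/label; neither part contains whitespace, parentheses, or |.
NODE = re.compile(r"([^\s/()|]+)/([^\s()|]+)")


def split_alignment(line):
    """Split 'AMR subgraph # UD subgraph' into its two sides."""
    amr, separator, ud = line.partition(" # ")
    if not separator:
        raise ValueError("no ' # ' separator in line: %r" % line)
    return amr.strip(), ud.strip()


def nodes(subgraph):
    """Return (variable, label) pairs in the order they appear."""
    return NODE.findall(subgraph)


def relations(subgraph):
    """Return edge labels, i.e. tokens that start with a colon."""
    return [token for token in subgraph.split() if token.startswith(":")]


# Example line built from the notation table, not taken from the data.
amr_side, ud_side = split_alignment("x/word1 :rel y/word2 # a/word3")
print(nodes(amr_side), relations(amr_side), nodes(ud_side))
```

Keeping the split and the node extraction separate makes it easy to adjust either assumption if the actual files differ, for instance if the UD side labels its edges differently.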
The file 'amr_ud_alignment_nonldc.txt' contains alignments and UD parses for AMRs which were not included in the LDC AMR release, and which can be freely shared. All those AMRs, parses, and alignments are also included in the full alignment corpus reconstructed according to the instructions above.