AMR Annotations Supervised Project - NLP Master - Université de Lorraine
Authors
- Kelvin Han
- Siyana Pavlova
Supervisors
- Bruno Guillaume
- Maxime Amblard
We present a system which takes sentences in natural language, parses them in the Universal Dependencies (UD) syntactic framework, and applies a set of rewrite rules, using the GREW graph rewriting system, to the UD parses to produce semantic Abstract Meaning Representation (AMR) graphs of the sentences.
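To illustrate the target formalism (this example is the standard one from the AMR literature, not from the project's data, and the PropBank sense indices are illustrative), the sentence "The boy wants to go" has the AMR:

```
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))
```

The reuse of the variable `b` shows how AMR captures that the boy is both the wanter and the goer.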
Motivation, background, details on the design decisions and implementation, experiments and results can be found in the project report.
An outline of all of the above can be found in the project poster.
- Install Grew (make sure the `grewpy` command is available; this is the executable called by the `grew` Python lib)
- Python 3.x
- Python packages:
  - `grew`
  - `smatch`
  - `ufal.udpipe`
  - `numpy` [for `pattern_identification.py`]
  - `sklearn` [for `pattern_identification.py`]
The script `ud_to_amr.py` contains the code to do the full transformation on one UD graph. The input file is given at line 60, and the script outputs the list of structures produced by the transformation on stdout.
The script `main.py` runs the main pipeline of the system and produces results to analyse.

Parameters for the `collect_scores()` function:
- `sentence_nums` - a list of sentence numbers (corresponding to the trailing digits in any of the files in the dataset)
- `n` - the number of times to perform the test
- `folder` - the path to the folder where the results should be saved
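For illustration, the sentence numbers select dataset files by their trailing digits; a minimal sketch (the base name `sentence` and the `.txt` extension are hypothetical examples, not the project's actual filenames):

```python
# sentence_nums correspond to the trailing digits of the dataset filenames
# (the base name "sentence" and ".txt" extension are hypothetical examples)
sentence_nums = [1, 2, 42]
filenames = [f"sentence{n:04d}.txt" for n in sentence_nums]
print(filenames)
```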
Data
- UD parses. In our experiments we used these parses. UD parses can be produced by calling the `parse_files_in_folder(raw_sentences_folder, ud_save_folder)` function of the `parser.py` module, where `raw_sentences_folder` is the path to the folder where the raw sentences are stored (one sentence per file) and `ud_save_folder` is the path to the folder where the UD parses (in CoNLL-U format) should be stored. Example: `parse_files_in_folder('./data/amr_bank_data/sentences/', './data/amr_bank_data/ud/')`
- Gold AMR parses. We used parses for The Little Prince, as outlined in the report. Available here.
Running the script
- Initialise Grew: `grew.init()`
- Run `collect_scores(sentence_nums, n, folder)`. This will do the following:
  - Call `calculate_scores()` on the given sentences. This, in turn, will:
    - Call `run_pipeline()` on each sentence. This will:
      - Load the UD graph for the given sentence.
      - Build all possible AMR graphs from the UD graph following the specified rewrite rules and compute their scores.
      - Select the best-scoring AMR graph and return its score.
    - Compute precision, recall and F1 score for each sentence.
    - Print the max, min and average values for each measure.
    - Return the measures.
  - Write the computed scores to files.
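The per-sentence scores are smatch-style precision, recall and F1 over matching AMR triples. A minimal sketch of the arithmetic (illustrative only; the project's actual scoring relies on the `smatch` package):

```python
def prf(num_matching, num_candidate, num_gold):
    """Precision, recall and F1 from AMR triple counts.

    num_matching: triples shared by the candidate and gold AMRs
    num_candidate: total triples in the candidate AMR
    num_gold: total triples in the gold AMR
    """
    precision = num_matching / num_candidate
    recall = num_matching / num_gold
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 6 matching triples, 8 candidate triples, 10 gold triples
p, r, f1 = prf(6, 8, 10)
```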
Example
```python
# initialise Grew
grew.init()
# generate the numbers of the sentences to be processed
sentence_nums = list(range(1, 101))
# process the sentences and collect scores
collect_scores(sentence_nums, 1, './results/final_100')
```
`Result_Visualisations.ipynb` is a Jupyter notebook that can be used to analyse and visualise the stored scores.
Here we outline how to run the system on a different dataset, and how to extend the coverage of the rewriting rules and the lexicon, which can significantly improve the AMR parsing performance.
- To run the code on your own dataset, starting from raw sentences:
  - Add your raw sentences (one sentence per file) to a folder and number them accordingly (with 4 digits at the end of each filename).
  - Call the `parse_files_in_folder(raw_sentences_folder, ud_save_folder)` function of the `parser.py` module, as outlined above.
  - Add gold AMR parses for your dataset to a folder.
  - Run `main.py`, specifying `sentence_nums`, `n` and `folder`.
- To use a different model for the UD parser:
  - Save your model to `./models/`.
  - Change the model path on line 37 of `parser.py`.
- To extend the GRS rulebase, you can add rules to the main grs file. Please refer to Application of Graph Rewriting to Natural Language Processing and the GREW documentation for a guide on how to do so.
- You can extend the lexicons by manually annotating the PropBank predicates with their proto-roles. The lexicon we used can be found here. An annotation tool for adding these proto-roles can be found here.
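As a purely hypothetical sketch of what a proto-role lexicon entry can look like (the data structure and role glosses below are illustrative; the project's actual lexicon format is the one linked above), PropBank predicates map to their numbered arguments:

```python
# hypothetical proto-role lexicon: PropBank predicate -> argument glosses
# (illustrative structure only; see the linked lexicon for the real format)
lexicon = {
    "want-01": {"ARG0": "wanter", "ARG1": "thing wanted"},
    "give-01": {"ARG0": "giver", "ARG1": "thing given", "ARG2": "entity given to"},
}
```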