AMR_annotations

AMR Annotations Supervised Project - NLP Master - Université de Lorraine

Authors

  • Kelvin Han
  • Siyana Pavlova

Supervisors

  • Bruno Guillaume
  • Maxime Amblard

We present a system that takes sentences in natural language, parses them into the Universal Dependencies (UD) syntactic framework, and applies a set of rewrite rules, using the GREW graph rewriting system, to the UD parses to produce semantic Abstract Meaning Representation (AMR) graphs of the sentences.


Motivation, background, details on the design decisions and implementation, experiments and results can be found in the project report.

An outline of all of the above can be found in the project poster.

Requirements

  • Install Grew (make sure the grewpy command is available; this is the executable called by the grew Python library)
  • Python 3.x
  • Python packages
    • grew
    • smatch
    • ufal.udpipe
    • numpy [for pattern_identification.py]
    • sklearn [for pattern_identification.py]
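
A quick way to check that the Python dependencies are importable before running the system (a sketch; the package names are taken from the list above):

```python
# Check that the required Python packages are importable before running the system.
# Package names taken from the Requirements list above.
import importlib.util

required = ["grew", "smatch", "ufal.udpipe", "numpy", "sklearn"]
missing = []
for name in required:
    try:
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    except ModuleNotFoundError:  # the parent package of a dotted name is absent
        missing.append(name)

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```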

Running the System

Apply the transformation from UD to AMR

The script ud_to_amr.py contains the code to run the full transformation on one UD graph. The input file is given at line 60, and the script outputs the list of structures produced by the transformation on stdout.

The full pipeline

The script main.py runs the main pipeline of the system and produces results for analysis.

Parameters for the collect_scores() function

  • sentence_nums - a list of sentence numbers (corresponding to the trailing digits in any of the files in the dataset)
  • n - number of times to perform the test
  • folder - path to the folder where the results should be saved

Data

  • UD parses. In our experiments we used these parses. UD parses can be produced by calling the parse_files_in_folder(raw_sentences_folder, ud_save_folder) function of the parser.py module, where raw_sentences_folder is the path to the folder where raw sentences are stored (one sentence per file) and ud_save_folder is the path to the folder where UD parses (in CoNLL-U format) should be stored. Example: parse_files_in_folder('./data/amr_bank_data/sentences/', './data/amr_bank_data/ud/')
  • Gold AMR parses. We used parses for The Little Prince, as outlined in the report. Available here

Running the script

  1. Initialise Grew. grew.init()
  2. Run collect_scores(sentence_nums, n, folder). This will do the following:
    1. Call calculate_scores() on the given sentences. This, in turn, will:
      1. Call run_pipeline() on each sentence. This will:
        1. Load the UD graph for the given sentence
        2. Build all possible AMR graphs from the UD graph following the specified rewrite rules and compute their scores.
        3. Select the best scoring AMR graph and return its score
      2. Compute precision, recall and f1 score for each sentence.
      3. Print the max, min and average values for each measure.
      4. Return the measures.
    2. Write the computed score to files.
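
The nested calls above can be sketched as follows. This is only an illustration of the control flow described in the steps; all function bodies here are stand-in stubs (the candidate scores and the file writing are assumptions), not the project's actual implementation:

```python
# Illustrative sketch of the collect_scores control flow described above.
# All helper bodies are stand-in stubs; only the call structure mirrors the text.

def run_pipeline(sentence_num):
    """Load the UD graph, build candidate AMR graphs, return the best-scoring one."""
    # Stub: pretend the rewrite rules produced two candidate graphs,
    # each with a (precision, recall) score against the gold AMR.
    candidates = [(0.8, 0.7), (0.9, 0.6)]

    def f1(p, r):
        return 2 * p * r / (p + r) if (p + r) else 0.0

    # Select the candidate with the best F1 and return its score.
    return max(candidates, key=lambda pr: f1(*pr))

def calculate_scores(sentence_nums):
    """Compute precision, recall and F1 per sentence; print max/min/average."""
    measures = []
    for num in sentence_nums:
        p, r = run_pipeline(num)
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        measures.append((p, r, f))
    for i, name in enumerate(("precision", "recall", "f1")):
        vals = [m[i] for m in measures]
        print(f"{name}: max={max(vals):.2f} min={min(vals):.2f} "
              f"avg={sum(vals) / len(vals):.2f}")
    return measures

def collect_scores(sentence_nums, n, folder):
    """Run the test n times; the real system writes the scores under `folder`."""
    for _ in range(n):
        measures = calculate_scores(sentence_nums)
        # Stub: file writing omitted; the real collect_scores saves to `folder`.
    return measures
```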

Example

```python
# initialise Grew
grew.init()

# generate the numbers of the sentences to be processed
sentence_nums = list(range(1, 101))

# process the sentences and collect scores
collect_scores(sentence_nums, 1, './results/final_100')
```

Visualisations

Result_Visualisations.ipynb is a Jupyter notebook that can be used to analyse and visualise the stored scores.

Possible alterations

Here we outline how to run the system on a different dataset, and how the coverage of the rewrite rules and the lexicon can be extended to improve AMR parsing performance.

  1. To run the code on your own dataset, starting from raw sentences:
    1. Add your raw sentences (one sentence per file) to a folder and number them accordingly (with 4 digits at the end of the filename).
    2. Call the parse_files_in_folder(raw_sentences_folder, ud_save_folder) function of the parser.py module, as outlined above.
    3. Add Gold AMR parses for your dataset to a folder.
    4. Run main.py, specifying sentence_nums, n and folder.
  2. To use a different model for the UD parser:
    1. Save your model to ./models/
    2. Change the model path in line 37 in parser.py.
  3. To extend the GRS rulebase, you can add rules to the main grs file. Please refer to Application of Graph Rewriting to Natural Language Processing and the GREW documentation for a guide on how to do so.
  4. You can extend the lexicons by manually annotating PropBank predicates with their proto-roles. The lexicon we used can be found here. An annotation tool for adding these proto-roles can be found here.
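
As an illustration of the 4-digit numbering convention in step 1 above, filenames for a new dataset could look like this (the `sentence_` prefix and `.txt` extension are assumptions for the example; only the 4-digit suffix is prescribed):

```python
# Build hypothetical filenames following the 4-digit suffix convention
# described in step 1 of "Possible alterations".
sentence_nums = [1, 2, 42]
filenames = [f"sentence_{num:04d}.txt" for num in sentence_nums]
print(filenames)  # → ['sentence_0001.txt', 'sentence_0002.txt', 'sentence_0042.txt']
```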