Evaluation script that unpacks lextag into remaining STREUSLE columns #41

nschneid · 2019-06-20T22:13:38Z

Re: #40, we need a script that takes lextags (full tags, one per token) output by a system and parses them to extract MWE groupings.

Lextags are the 19th and final column in the .conllulex format. Columns 1-10 are UD. Columns 11-18 can be filled in based on UD+lextags.
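To make the column layout concrete, here is a minimal sketch of splitting one token row into its three parts. It assumes rows are tab-separated, as in CoNLL-U; `split_conllulex_row` and `N_COLUMNS` are hypothetical names for illustration, not part of the STREUSLE scripts.

```python
from typing import List, Tuple

N_COLUMNS = 19  # per the description above: 10 UD + 8 lexical + 1 lextag


def split_conllulex_row(line: str) -> Tuple[List[str], List[str], str]:
    """Split one token row into (UD columns 1-10, columns 11-18, lextag)."""
    # Assumption: fields are separated by single tabs, as in CoNLL-U.
    cols = line.rstrip("\n").split("\t")
    if len(cols) != N_COLUMNS:
        raise ValueError(f"expected {N_COLUMNS} columns, got {len(cols)}")
    return cols[:10], cols[10:18], cols[18]
```

Only the first and last slices are needed as input to the unpacking step; the middle slice is what has to be reconstructed.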

nschneid · 2019-06-20T22:26:02Z

Input: .conllulex format, except that columns 11-18 are blank (completely empty, not underscores).
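As a rough illustration of that input format, the sketch below empties columns 11-18 of a full .conllulex row while keeping the UD columns and the lextag. It assumes tab-separated rows, `#` comment lines, and blank lines between sentences (CoNLL-U conventions); `blank_lexical_columns` is a hypothetical helper, not the actual conversion script.

```python
def blank_lexical_columns(line: str) -> str:
    """Empty columns 11-18 of a token row, keeping UD columns and the lextag."""
    stripped = line.rstrip("\n")
    # Assumption: comment lines start with '#' and blank lines separate sentences.
    if not stripped or stripped.startswith("#"):
        return stripped
    cols = stripped.split("\t")
    if len(cols) != 19:
        raise ValueError(f"expected 19 columns, got {len(cols)}")
    cols[10:18] = [""] * 8  # completely blank, not underscores
    return "\t".join(cols)
```

A blanked row still has 19 tab-separated fields, so a reader can tell the difference between "not yet filled in" (empty string) and "no value" (underscore).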

I think the easiest way to implement this will be to adapt streuseval.py so that instead of VERIFYING that lextags are consistent with columns 11-18, it parses lextags and then populates columns 11-18 in JSON.

Specifically, it needs to:

- parse each lextag into mwetag + lexcat + supersenses
- parse mwetag sequences into links
- form strong and weak groups (token sets) out of links
- number the groups (first strong, then weak) and the tokens within the groups
- look up lemmas for the tokens in each group
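The steps above can be sketched roughly as follows. This is not the streuseval.py code; it is an illustration under several assumptions: a lextag has the shape `MWETAG[-LEXCAT[-SS[|SS2]]]`, the MWE tags are `O B I_ I~` plus lowercase variants inside a gap, and strong and weak groups share one running number. All function names are hypothetical, and the example sentence in the usage note is made up.

```python
from typing import Dict, List, Optional, Tuple

Link = Tuple[int, int, str]  # (earlier token, later token, "_" or "~"), 0-based


def parse_lextag(lextag: str) -> Tuple[str, Optional[str], List[str]]:
    """Split a lextag into (mwetag, lexcat, supersenses)."""
    # Assumed shape: MWETAG[-LEXCAT[-SS[|SS2]]]
    parts = lextag.split("-", 2)
    lexcat = parts[1] if len(parts) > 1 else None
    supersenses = parts[2].split("|") if len(parts) > 2 else []
    return parts[0], lexcat, supersenses


def links_from_mwetags(mwetags: List[str]) -> List[Link]:
    """Link every I_/I~ (or i_/i~ inside a gap) token to its predecessor."""
    links: List[Link] = []
    last: Dict[str, Optional[int]] = {"upper": None, "lower": None}
    for j, tag in enumerate(mwetags):
        level = "upper" if tag[:1].isupper() else "lower"
        if level == "upper":
            last["lower"] = None  # an uppercase tag closes any gap
        if tag[:1].upper() == "I" and tag[1:] in ("_", "~"):
            if last[level] is None:
                raise ValueError(f"{tag} at offset {j} has nothing to attach to")
            links.append((last[level], j, tag[1:]))
            last[level] = j
        elif tag in ("B", "b"):
            last[level] = j
        elif tag == "O":
            last["upper"] = None
        elif tag == "o":
            last["lower"] = None
        else:
            raise ValueError(f"unknown MWE tag {tag!r}")
    return links


def groups_from_links(links: List[Link], strengths: Tuple[str, ...]) -> List[List[int]]:
    """Connected components over links of the given strengths, by first token."""
    group_of: Dict[int, List[int]] = {}
    for a, b, strength in links:
        if strength not in strengths:
            continue
        ga = group_of.setdefault(a, [a])
        gb = group_of.setdefault(b, [b])
        if ga is not gb:  # merge the two token sets
            ga.extend(gb)
            for tok in gb:
                group_of[tok] = ga
    unique = {id(g): sorted(g) for g in group_of.values()}
    return sorted(unique.values())


def unpack(lextags: List[str], lemmas: List[str]):
    """Return (smwe, wmwe, lexlemmas) reconstructed from lextags and lemmas."""
    mwetags = [parse_lextag(t)[0] for t in lextags]
    links = links_from_mwetags(mwetags)
    strong = groups_from_links(links, ("_",))
    either = groups_from_links(links, ("_", "~"))
    weak = [g for g in either if g not in strong]  # needed at least one weak link
    smwe: Dict[int, str] = {}
    wmwe: Dict[int, str] = {}
    lexlemmas: Dict[int, str] = {}
    num = 0
    for column, groups in ((smwe, strong), (wmwe, weak)):
        for group in groups:
            num += 1  # strong groups are numbered first, then weak ones
            lexlemmas[num] = " ".join(lemmas[tok] for tok in group)
            for pos, tok in enumerate(group, 1):
                column[tok] = f"{num}:{pos}"
    return smwe, wmwe, lexlemmas
```

For a made-up sentence with lextags `O-PRON`, `B-V.VPC.full-v.cognition`, `o-PRON`, `I_`, `O-PUNCT` and lemmas `I figure it out .`, `unpack` yields one strong group covering the second and fourth tokens with the lemma string `figure out`, and no weak groups. Numbering groups by the offset of their first token is one arbitrary but deterministic choice; other orderings would be equally valid.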

If we want the output as .conllulex, converting JSON to .conllulex could be a separate script.

nschneid · 2019-06-21T03:03:21Z

@danielhers I believe I have this working on the lextag-unpack branch. When reconstructing from the gold lextags I can't 100% match the original data file due to an arbitrary numbering issue (#42), but the streuseval score of the original vs. reconstructed is 100%, so there should not be any errors in the reconstruction. Hopefully this means the script is bug-free.

nschneid self-assigned this Jun 20, 2019

nschneid added a commit that referenced this issue Jun 21, 2019

streuseval: refactor part of the logic from eval_sent_links() into parse_mwe_links(), which can be imported separately for lextag unpacking (#41)

37f00a2

nschneid added a commit that referenced this issue Jun 21, 2019

conllulex2UDlextag.py and UDlextag2json.py - reconstruct lexical expression annotations from sequence of lextags (#41)

09014b4

nschneid mentioned this issue Jun 21, 2019

MWE numbering within sentence is inconsistent #42

Closed

This was referenced Jun 22, 2019

Add json2conllulex conversion script #44

Closed

Scripts to support evaluation of automatic lextag prediction (#40) #46

Merged

nschneid closed this as completed in #46 Jun 22, 2019
