Layout engine may introduce some diffs #25

goodmami · 2018-08-09T17:55:40Z

One goal of the project is to model the PENMAN structure as graphs but to retain enough information from their serialization so the tree structure doesn't change on reserialization. Here is an example from the Bio-AMR corpus where a diff is introduced:

(e / enhance-01~e.11 :li 2~e.0 
      :ARG1 (a3 / and~e.6 
            :op1 (n6 / nucleic-acid 
                  :name (n / name :op1 "mRNA"~e.5) 
                  :ARG0-of (e2 / encode-01 
                        :ARG1 p)) 
            :op2 (p / protein~e.7 
                  :name (n2 / name :op1 "serpinE2"~e.4))) 
      :manner~e.10 (m / marked~e.10) 
      :mod (a2 / also~e.9) 
      :location~e.12 (c / cell~e.15 
            :ARG0-of (e3 / exhibit-01~e.16 
                  :ARG1 (m2 / mutate-01~e.17 
                        :ARG1 (a4 / and~e.22 
                              :op1 (g / gene 
                                    :name (n4 / name :op1 "KRAS"~e.20)) 
                              :op2 (g2 / gene 
                                    :name (n5 / name :op1 "BRAF"~e.24))))) 
            :mod (h / human~e.13) 
            :mod (d / disease 
                  :name (n3 / name :op1 "CRC"~e.14))) 
      :manner~e.2 (i / interesting~e.2))

Here is what is produced (with whitespace differences normalized):

(e / enhance-01~e.11 :li 2~e.0
      :ARG1 (a3 / and~e.6
            :op1 (n6 / nucleic-acid
                  :name (n / name :op1 "mRNA"~e.5)
                  :ARG0-of (e2 / encode-01
                        :ARG1 (p / protein~e.7
                              :name (n2 / name :op1 "serpinE2"~e.4))))
            :op2 p)
      :manner~e.10 (m / marked~e.10)
      :mod (a2 / also~e.9)
      :location~e.12 (c / cell~e.15
            :ARG0-of (e3 / exhibit-01~e.16
                  :ARG1 (m2 / mutate-01~e.17
                        :ARG1 (a4 / and~e.22
                              :op1 (g / gene
                                    :name (n4 / name :op1 "KRAS"~e.20))
                              :op2 (g2 / gene
                                    :name (n5 / name :op1 "BRAF"~e.24)))))
            :mod (h / human~e.13)
            :mod (d / disease
                  :name (n3 / name :op1 "CRC"~e.14)))
      :manner~e.2 (i / interesting~e.2))

Note how the reentrancy of the p node is reversed. The layout engine prefers edges to appear in their original orientation, but in this case they do in both layouts, so that preference doesn't settle where p gets expanded. I could possibly prefer reentrancies to start from deeper nestings, or maybe I could embed some info about reentrancy in the triple (as I do with inversion).
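To make the "reentrancies start from deeper nestings" idea concrete, here is a minimal sketch in plain Python. It is not how the Penman module works; the function names (`choose_expansion_sites`, `serialize`) and the data layout are hypothetical, alignments and string constants are left out, and edges are assumed to be given in tree orientation, so an inverted role such as `ARG0-of` stays as written. A breadth-first walk from the top picks the shallowest incoming edge of each variable as the place to expand it, which leaves any deeper mention as a bare reentrancy.

```python
from collections import deque

def choose_expansion_sites(top, edges):
    """Map each variable to the index of the edge under which it is expanded.

    A breadth-first walk reaches every variable first through its shallowest
    incoming edge, so deeper mentions are left as reentrancies.
    """
    outgoing = {}
    for i, (src, _role, tgt) in enumerate(edges):
        outgoing.setdefault(src, []).append((i, tgt))
    site = {top: None}  # the top node is expanded at the root
    queue = deque([top])
    while queue:
        node = queue.popleft()
        for i, tgt in outgoing.get(node, []):
            if tgt not in site:
                site[tgt] = i
                queue.append(tgt)
    return site

def serialize(top, concepts, edges):
    """Write a single-line PENMAN string using the chosen expansion sites."""
    site = choose_expansion_sites(top, edges)

    def render(var):
        out = f"({var} / {concepts[var]}"
        for i, (src, role, tgt) in enumerate(edges):
            if src != var:
                continue
            if site.get(tgt) == i:
                out += f" :{role} {render(tgt)}"  # expand the node here
            else:
                out += f" :{role} {tgt}"          # bare variable (reentrancy)
        return out + ")"

    return render(top)

# Simplified fragment of the example above (alignments and names omitted).
concepts = {"e": "enhance-01", "a3": "and", "n6": "nucleic-acid",
            "e2": "encode-01", "p": "protein"}
edges = [
    ("e", "ARG1", "a3"),
    ("a3", "op1", "n6"),
    ("n6", "ARG0-of", "e2"),
    ("e2", "ARG1", "p"),  # deeper mention of p
    ("a3", "op2", "p"),   # shallower mention of p
]
print(serialize("e", concepts, edges))
```

On this fragment the sketch expands p under `:op2` and leaves `:ARG1 p` as the reentrancy, as in the original annotation. It is only a heuristic: an annotation that expands a node at its deeper mention would still come back changed.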

danielhers · 2018-08-10T05:40:16Z

Note that in general, I think a rule of thumb is that in coreference or predicate conjunction or gapping, a variable is expanded where it is mentioned explicitly, and appears as a reentrancy where a pronoun is used or the argument is elided. So in the case of

In the panel of six CRC cell lines , all of them harboured a <i> KRAS </i> gene mutation that was located in codon 12 or 13 .

it makes sense to expand the variable c3 where it refers to the cell lines in the panel, and to use a reentrancy where it is referred to as "them", the argument of "harboured".

As another example from the guidelines, in

The boy arrived and left on Tuesday.

boy is expanded as an argument of arrived but used as reentrancy as an argument of left (where it is elided).

I don't know how easy it is to take these issues into account, though.

goodmami · 2018-08-10T17:06:54Z

Thanks for explaining. Those are good guidelines for hand-annotation, but I don't think they would help when serializing from triples, since we don't know the surface form. It may be possible to use the alignments and the ::tok annotation when they are available, but as they are optional meta info and not part of the graph, it seems like a bad direction to go.

As an aside, for the harboured example, it sounds like you're arguing for the rearranged output of the Penman module rather than the original annotation, but if the module's output is better it's surely just by chance. I would, however, like Penman to allow deterministic restructuring for normalization, which could help ML models learned from AMR by reducing unnecessary (?) variation.
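As a rough illustration of what deterministic restructuring could mean, the sketch below (plain Python, not the Penman API; `normalize` and its data layout are hypothetical) sorts each node's outgoing edges by role and expands every variable the first time a depth-first walk reaches it. Any two edge orderings of the same graph then serialize to the same string. Edges are again assumed to be in tree orientation, and attributes and alignments are omitted.

```python
def normalize(top, concepts, edges):
    """Serialize to one canonical single-line string, whatever the edge order."""
    children = {}
    for src, role, tgt in edges:
        children.setdefault(src, []).append((role, tgt))
    for branches in children.values():
        # Plain string sort; note that "op10" sorts before "op2" this way.
        branches.sort()

    expanded = set()

    def render(var):
        expanded.add(var)
        out = f"({var} / {concepts[var]}"
        for role, tgt in children.get(var, []):
            if tgt in expanded:
                out += f" :{role} {tgt}"          # already written: reentrancy
            else:
                out += f" :{role} {render(tgt)}"  # first visit: expand here
        return out + ")"

    return render(top)

concepts = {"e": "enhance-01", "a3": "and", "n6": "nucleic-acid",
            "e2": "encode-01", "p": "protein"}
edges_a = [
    ("e", "ARG1", "a3"),
    ("a3", "op1", "n6"),
    ("n6", "ARG0-of", "e2"),
    ("e2", "ARG1", "p"),
    ("a3", "op2", "p"),
]
edges_b = list(reversed(edges_a))  # same graph, different triple order

print(normalize("e", concepts, edges_a))
print(normalize("e", concepts, edges_a) == normalize("e", concepts, edges_b))
```

The result is stable but makes no attempt to match the annotator's choice of where a node is expanded, so normalization of this kind trades fidelity to the original layout for consistency.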

danielhers · 2018-08-10T17:16:29Z

I'm "arguing" for the original annotation, actually. In a_pmid_2256_9000.150 the ARG0 of harbor-01 is a reentrancy (corresponding to them, the subject of harboured).

goodmami · 2018-08-10T17:28:45Z

Oh, my mistake. I was misreading the graph. Thanks for pointing that out.

goodmami mentioned this issue Aug 9, 2018

Support for alignment to text tokens #19

Closed

goodmami added this to the v0.7.0 milestone Nov 5, 2019

goodmami closed this as completed in aabc021 Nov 21, 2019
