Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Position of a node #74

Closed
rafaelanchieta opened this issue Apr 19, 2020 · 15 comments
Closed

Position of a node #74

rafaelanchieta opened this issue Apr 19, 2020 · 15 comments
Labels
enhancement New feature or request

Comments

@rafaelanchieta
Copy link

Hi Goodman.

Does the penman library have a method to return the position of each node in the graph? Ex.
(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 : ARG0 b))
want-01 - 0
boy - 0.0
go-01 - 0.1

@goodmami
Copy link
Owner

There isn't currently a function to do this, but it wouldn't be hard to do, and it would probably make a good addition to the library so it could work with JAMR-style alignment metadata. But you would want to target the tree and not the graph (e.g., from PENMANCodec.parse(), also see the penman.tree module), as those are tree positions. The trouble is that if the tree gets restructured from the pure graph, those alignments no longer make sense.

Depending on what you want, it would probably be as simple as the following (not tested):

def positions(node):
    variable, branches = node
    pos = []
    for i, (role, target) in enumerate(branches):
        pos.append((i, target))
        if not tree.is_atomic(target):
            pos.extend((f'{i}.{j}', tgt) for j, tgt in positions(target))
    return pos

I would welcome a PR with a Tree method that does something similar to the above, or maybe I'll have a chance to do it for the next release.

@rafaelanchieta
Copy link
Author

Thank you for the quick reply. I'm trying to fix the code.

@goodmami
Copy link
Owner

Sounds good. There's some useful documentation of the numbering here, and #19 has further investigation and discussion of the numbering and alignment formats. I'm on the fence as to whether we should have the position start at 0 or 1 or make it a parameter. I kinda like 0-based and in depth-first order so they can be zipped with the output of Tree.nodes(). Users who want 1-based can always add 1 when creating alignment metadata, I suppose?

@rafaelanchieta
Copy link
Author

I believe that the position should start at 0 to keep the compatibility with the JARM-alignment and with parsers based on it. I'll see the documentation.

@goodmami
Copy link
Owner

Also see this document from the 3.0 release of the AMR data. It uses 1-based indices.

https://catalog.ldc.upenn.edu/docs/LDC2020T02/AMR-alignment-format.txt

@rafaelanchieta
Copy link
Author

Interesting. Are there more examples of this alignment format?

@goodmami
Copy link
Owner

I'm not sure. In this comment of #19 I investigated some sources and found 3 different in-situ formats (~N, ~eN, ~e.N) and two metadata formats (M-N|i+i.j and M-i.j). Penman already supports all the in-situ formats (shallowly, it just stores whatever comes between ~ and the first index as a prefix, and there's no interpretation of the indices yet, so 0-based and 1-based are both ok).

As mentioned, I'm happy if as a first pass the Tree.positions() method (or whatever it's called) only returns 0-based indices because we can add 1 when creating metadata. If we start interpreting the ::alignment metadata lines or in-situ alignments (e.g., for extracting token sequences from the ::tok line) we'll need to make up a story for interpreting the values.

@goodmami
Copy link
Owner

@rafaelanchieta I don't see a fork of Penman on your GitHub page so am I correct in assuming you're not working on this feature (at least not for submitting a PR)?

goodmami added a commit that referenced this issue Apr 27, 2020
@goodmami
Copy link
Owner

I went ahead and pushed a new branch with the Tree.positions() method. I went with the starting number of 1 to follow the AMR annotation guidelines, but it takes an argument to adjust this. Here's how it works:

>>> from penman.codec import PENMANCodec
>>> c = PENMANCodec()
>>> t = c.parse('(w / want-01 :polarity - :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))')
>>> print(c.format(t))
(w / want-01
   :polarity -
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))
>>> t.positions()
['1', '1.2', '1.3']
>>> for node, pos in zip(t.nodes(), t.positions()):
...   print(pos, node[0])
... 
1 w
1.2 b
1.3 g
>>> t.positions(0)
['0', '0.1', '0.2']

Note that the positions are only returned for actual nodes, so the (w :polarity -) triple increments the counter but does not end up in the list of positions. This is so the result of Tree.positions() can be zipped with Tree.nodes().

@rafaelanchieta, does this suit your needs?

@rafaelanchieta
Copy link
Author

Hi @goodmami .

I downloaded the Penman project and was working on it.
For that example, the (w :polarity -) triple should be counted.
The method should returns ['0', '0.0, '0.1', '0.2']'

@goodmami
Copy link
Owner

Thanks for the feedback. If you want to submit a PR I can still merge it in (even if it gets some further adjustments) so your contribution is recognized by GitHub.

Regarding the (w :polarity -) triple, I chose to count but not return a position for those triples because if the list of positions could not zip with the nodes they would be rather useless. You would have to do something complicated to figure out which node each position went with, and that defeats the purpose of computing them in the first place.

But I just now realized that tree nodes and positions won't help with the main use case of computing the positions, which is to create or interpret alignment metadata, because when a tree is configured the alignment data is already embedded in the tree and thus not easily inspected:

>>> from penman.codec import PENMANCodec
>>> from penman import layout
>>> from penman.surface import alignments, role_alignments
>>> c = PENMANCodec()
>>> g = c.decode('''
... # ::tok I swam in the lake .
... (s / swim~e.1
...    :ARG0 (i / i~e.0)
...    :location~e.2 (l / lake~e.4))''')
>>> alignments(g)
{('s', ':instance', 'swim'): Alignment((1,), prefix='e.'), ('i', ':instance', 'i'): Alignment((0,), prefix='e.'), ('l', ':instance', 'lake'): Alignment((4,), prefix='e.')}
>>> role_alignments(g)
{('s', ':location', 'l'): RoleAlignment((2,), prefix='e.')}
>>> layout.configure(g)
Tree(('s', [('/', 'swim~e.1'), (':ARG0', ('i', [('/', 'i~e.0')])), (':location~e.2', ('l', [('/', 'lake~e.4')]))]))

So perhaps what is needed instead is a function that maps triples to tree positions. I'm not sure if this would be a Tree method or a function of penman.layout, since it is strongly related to penman.layout.configure().

@rafaelanchieta
Copy link
Author

I'm not sure but, checking if the attribute-is in the attribute list could solve this issue.

@goodmami
Copy link
Owner

Thanks, but I don't think that's the issue. It's that alignments are associated with graph triples, but the positions are based on the tree structure. The graph, tree, and PENMAN string are all separate representations of an AMR (see here if you find that confusing). Associating alignments with tree positions thus requires some function that straddles both the graph and tree.

goodmami added a commit that referenced this issue May 6, 2020
Now the function yields pairs of (path, branch) where path is a
sequence of branch indices to get to the paired branch.

Resolves #74, probably
@goodmami
Copy link
Owner

goodmami commented May 6, 2020

Ok, I've reconsidered this and changed Tree.positions() to Tree.walk() which yields paths to branches. That is, it should yield one pair for every branch in the tree structure. This is still not exactly the same as the alignment annotations, but this is more general and should be able to be applied to alignment annotation conversion. Here's an example:

>>> import penman
>>> t = penman.parse('''
... # ::tok I swam in the lake .
... (s / swim-01~e.1
...    :ARG0 (i / i~e.0)
...    :location~e.2 (l / lake~e.4))''')
>>> for path, branch in t.walk():
...     print(path, branch)
... 
(0,) ('/', 'swim-01~e.1')
(1,) (':ARG0', ('i', [('/', 'i~e.0')]))
(1, 0) ('/', 'i~e.0')
(2,) (':location~e.2', ('l', [('/', 'lake~e.4')]))
(2, 0) ('/', 'lake~e.4')

Note that the top node doesn't have an index, because these are indices of branches. They are 0-based but the concept is included as a branch, so it looks like 1-based indices. Here's an example of how you could get most of the way to the alignment metadata format:

>>> from penman.tree import is_atomic
>>> for path, branch in t.walk():
...     if is_atomic(branch[1]):
...         if path[-1] == 0:
...             path = path[:-1]
...         path = '.'.join(map(str, (1,) + path))
...         print(path, branch[1])
... 
1 swim-01~e.1
1.1 i~e.0
1.2 lake~e.4

To get the rest of the way would require parsing the alignment info from the roles or concepts. The AlignmentMarker.from_string() function would help here. Also one would need to consider role alignments on branches whose target is not atomic.

@rafaelanchieta does this look better?

@goodmami goodmami added the enhancement New feature or request label May 7, 2020
@rafaelanchieta
Copy link
Author

It's great. I'll close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants