-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lots of errors when looking at model.errors(graph) #88
Comments
Thanks, @bjascob! I don't currently have access to the LDC data so I appreciate your bug reports for things I can't easily test. As to your problems, there are a few things going on, detailed below. Also, there are several documents I refer to: The roles defined in penman.models.amr come from the role inventory.
Nope, you're using it correctly. The 2/3 are mostly the
You mean where it checks for errors as it decodes instead of as a separate step? No, that is not provided.
This means that Penman tried to deinvert the relation but it could not because it would make a constant (not a node) into the source of the triple, which is not allowed. It is a warning and not an error because you can still proceed, but the triple is then |
Thanks for all the info. FYI... for my use case, I'm looking for "bad" errors so I probably don't care whether it's conforming to a specific revision of the specs / LDC releases. As long as it's a reasonable role, I'm OK with keeping it. What I'm saying is, don't make a bunch of models specific to the LDC releases / specs for me. For now, I'll just create exceptions for the items listed above. To clarify on the last point... if the parser is producing attributes with |
Got it. But also I think this library could at least document better which version of AMR it supports.
Good idea. Note that you cannot just do: from penman.models.amr import model
model.roles.update({':wiki': {'type': 'wiki'}}) # don't do this ... because when the model is created it creates a regular-expression based index for checking if a role exists. Rather, rebuild the model with the expanded roles. Something like this: from penman.models import amr
from penman.model import Model
roles = dict(amr.roles)
roles.update({'wiki': {'type': 'wiki'}})
model = Model(
roles=roles,
normalizations=amr.normalizations,
reifications=amr.reifications,
)
I do, yes. The official documentation is lacking on this point, but this is my conclusion based on what I've seen. For instance, if I were to treat constants like
... and most people would find that surprising. So I now have the notion of "attribute" edges which are those where the pre-normalized triple (or tree branch) has a value that is not the source variable of some node. Note that, hypothetically, if we had a role whose primary form has Whether or not you include these in your model is your choice. If you are training an AMR to text generator and the input AMRs from JAMR might contain these bad relations, you might choose to not filter them to increase robustness. But if you only want "proper" data (more relevant if you're training a parser, perhaps?) then you might avoid them. |
I think your conclusion on the Again, thanks for all the great info. |
There were several points raised in this issue, so I gave the issue the "question" label and created #89 specifically for the missing roles. I'll close this now. Thanks, again! |
I'm trying to use the library to weed out any bad graphs that get produced by my AMR parser but I'm finding that even on standard AMR training data (LDC2015E86) this is giving me a lot of errors. Here's what I'm doing...
If I accumulate some stats on LDC2015E86 here's the counts / errors...
About 2/3 of the graphs have some type of error. Am I using the library wrong?
Also, since decoding uses the model, is there a simple way to get the errors after calling
decode
(or possiblyload
) instead of passing the graph back intomodel.errors(graph)
?One more thing, when decoding I routinely see warnings such as...
WARNING:penman.layout:cannot deinvert attribute: ('l', ':poss-of', '700')
. Is this an issue? I would think that this would be somewhat typical for an attribute. I notice this only comes up with the output of my parser. I don't see it on the LDC data so maybe thex-of
is not legal on an attrib?The text was updated successfully, but these errors were encountered: