Self-loops appearing in collapsed-ccprocessed-dependencies #81

Closed
stephenroller opened this issue Jul 16, 2015 · 2 comments

Comments

@stephenroller

Many sentences seem to produce dependency graphs with self-loops. From my understanding of the documentation, this should never occur. I found this issue in the 3.5.2 release, and the web demo appears to show it as well. I have NOT tried compiling and testing on HEAD.

This example sentence from UkWaC reproduces the error. I found countless other examples, but this was one of the simplest. Most of the faulty sentences seemed to have the issue with conj:and and conj:or relations, but I also found it with case. Later in my pipeline, it seems to appear with nsubj and other relations, but I cannot find a sentence that produces this at the moment.

Example Input:
We have course participants from Britain , Europe and from the rest of the world - usually in roughly equal proportions .

Relevant example output:

    <dep type="conj:and">
    <governor idx="4">participants</governor>
    <dependent idx="4" copy="1">participants</dependent>
    </dep>

Full example output:

<sentence id="327" line="327">
<tokens>
    <token id="1">
    <word>We</word>
    <lemma>we</lemma>
    <CharacterOffsetBegin>48047</CharacterOffsetBegin>
    <CharacterOffsetEnd>48049</CharacterOffsetEnd>
    <POS>PRP</POS>
    <NER>O</NER>
    </token>
    <token id="2">
    <word>have</word>
    <lemma>have</lemma>
    <CharacterOffsetBegin>48050</CharacterOffsetBegin>
    <CharacterOffsetEnd>48054</CharacterOffsetEnd>
    <POS>VBP</POS>
    <NER>O</NER>
    </token>
    <token id="3">
    <word>course</word>
    <lemma>course</lemma>
    <CharacterOffsetBegin>48055</CharacterOffsetBegin>
    <CharacterOffsetEnd>48061</CharacterOffsetEnd>
    <POS>NN</POS>
    <NER>O</NER>
    </token>
    <token id="4">
    <word>participants</word>
    <lemma>participant</lemma>
    <CharacterOffsetBegin>48062</CharacterOffsetBegin>
    <CharacterOffsetEnd>48074</CharacterOffsetEnd>
    <POS>NNS</POS>
    <NER>O</NER>
    </token>
    <token id="5">
    <word>from</word>
    <lemma>from</lemma>
    <CharacterOffsetBegin>48075</CharacterOffsetBegin>
    <CharacterOffsetEnd>48079</CharacterOffsetEnd>
    <POS>IN</POS>
    <NER>O</NER>
    </token>
    <token id="6">
    <word>Britain</word>
    <lemma>Britain</lemma>
    <CharacterOffsetBegin>48080</CharacterOffsetBegin>
    <CharacterOffsetEnd>48087</CharacterOffsetEnd>
    <POS>NNP</POS>
    <NER>LOCATION</NER>
    </token>
    <token id="7">
    <word>,</word>
    <lemma>,</lemma>
    <CharacterOffsetBegin>48088</CharacterOffsetBegin>
    <CharacterOffsetEnd>48089</CharacterOffsetEnd>
    <POS>,</POS>
    <NER>O</NER>
    </token>
    <token id="8">
    <word>Europe</word>
    <lemma>Europe</lemma>
    <CharacterOffsetBegin>48090</CharacterOffsetBegin>
    <CharacterOffsetEnd>48096</CharacterOffsetEnd>
    <POS>NNP</POS>
    <NER>LOCATION</NER>
    </token>
    <token id="9">
    <word>and</word>
    <lemma>and</lemma>
    <CharacterOffsetBegin>48097</CharacterOffsetBegin>
    <CharacterOffsetEnd>48100</CharacterOffsetEnd>
    <POS>CC</POS>
    <NER>O</NER>
    </token>
    <token id="10">
    <word>from</word>
    <lemma>from</lemma>
    <CharacterOffsetBegin>48101</CharacterOffsetBegin>
    <CharacterOffsetEnd>48105</CharacterOffsetEnd>
    <POS>IN</POS>
    <NER>O</NER>
    </token>
    <token id="11">
    <word>the</word>
    <lemma>the</lemma>
    <CharacterOffsetBegin>48106</CharacterOffsetBegin>
    <CharacterOffsetEnd>48109</CharacterOffsetEnd>
    <POS>DT</POS>
    <NER>O</NER>
    </token>
    <token id="12">
    <word>rest</word>
    <lemma>rest</lemma>
    <CharacterOffsetBegin>48110</CharacterOffsetBegin>
    <CharacterOffsetEnd>48114</CharacterOffsetEnd>
    <POS>NN</POS>
    <NER>O</NER>
    </token>
    <token id="13">
    <word>of</word>
    <lemma>of</lemma>
    <CharacterOffsetBegin>48115</CharacterOffsetBegin>
    <CharacterOffsetEnd>48117</CharacterOffsetEnd>
    <POS>IN</POS>
    <NER>O</NER>
    </token>
    <token id="14">
    <word>the</word>
    <lemma>the</lemma>
    <CharacterOffsetBegin>48118</CharacterOffsetBegin>
    <CharacterOffsetEnd>48121</CharacterOffsetEnd>
    <POS>DT</POS>
    <NER>O</NER>
    </token>
    <token id="15">
    <word>world</word>
    <lemma>world</lemma>
    <CharacterOffsetBegin>48122</CharacterOffsetBegin>
    <CharacterOffsetEnd>48127</CharacterOffsetEnd>
    <POS>NN</POS>
    <NER>O</NER>
    </token>
    <token id="16">
    <word>-</word>
    <lemma>-</lemma>
    <CharacterOffsetBegin>48128</CharacterOffsetBegin>
    <CharacterOffsetEnd>48129</CharacterOffsetEnd>
    <POS>:</POS>
    <NER>O</NER>
    </token>
    <token id="17">
    <word>usually</word>
    <lemma>usually</lemma>
    <CharacterOffsetBegin>48130</CharacterOffsetBegin>
    <CharacterOffsetEnd>48137</CharacterOffsetEnd>
    <POS>RB</POS>
    <NER>O</NER>
    </token>
    <token id="18">
    <word>in</word>
    <lemma>in</lemma>
    <CharacterOffsetBegin>48138</CharacterOffsetBegin>
    <CharacterOffsetEnd>48140</CharacterOffsetEnd>
    <POS>IN</POS>
    <NER>O</NER>
    </token>
    <token id="19">
    <word>roughly</word>
    <lemma>roughly</lemma>
    <CharacterOffsetBegin>48141</CharacterOffsetBegin>
    <CharacterOffsetEnd>48148</CharacterOffsetEnd>
    <POS>RB</POS>
    <NER>O</NER>
    </token>
    <token id="20">
    <word>equal</word>
    <lemma>equal</lemma>
    <CharacterOffsetBegin>48149</CharacterOffsetBegin>
    <CharacterOffsetEnd>48154</CharacterOffsetEnd>
    <POS>JJ</POS>
    <NER>O</NER>
    </token>
    <token id="21">
    <word>proportions</word>
    <lemma>proportion</lemma>
    <CharacterOffsetBegin>48155</CharacterOffsetBegin>
    <CharacterOffsetEnd>48166</CharacterOffsetEnd>
    <POS>NNS</POS>
    <NER>O</NER>
    </token>
    <token id="22">
    <word>.</word>
    <lemma>.</lemma>
    <CharacterOffsetBegin>48167</CharacterOffsetBegin>
    <CharacterOffsetEnd>48168</CharacterOffsetEnd>
    <POS>.</POS>
    <NER>O</NER>
    </token>
</tokens>
<parse>(ROOT (S (NP (PRP We)) (VP (VBP have) (NP (NP (NN course) (NNS participants)) (PP (PP (PP (IN from) (NP (NNP Britain) (, ,) (NNP Europe))) (CC and) (PP (IN from) (NP (NP (DT the) (NN rest)) (PP (IN of) (NP (DT the) (NN world)))))) (: -) (RB usually) (PP (IN in) (NP (ADJP (RB roughly) (JJ equal)) (NNS proportions)))))) (. .))) </parse>
<dependencies type="basic-dependencies">
    <dep type="root">
    <governor idx="0">ROOT</governor>
    <dependent idx="2">have</dependent>
    </dep>
    <dep type="nsubj">
    <governor idx="2">have</governor>
    <dependent idx="1">We</dependent>
    </dep>
    <dep type="compound">
    <governor idx="4">participants</governor>
    <dependent idx="3">course</dependent>
    </dep>
    <dep type="dobj">
    <governor idx="2">have</governor>
    <dependent idx="4">participants</dependent>
    </dep>
    <dep type="case">
    <governor idx="8">Europe</governor>
    <dependent idx="5">from</dependent>
    </dep>
    <dep type="compound">
    <governor idx="8">Europe</governor>
    <dependent idx="6">Britain</dependent>
    </dep>
    <dep type="acl">
    <governor idx="4">participants</governor>
    <dependent idx="8">Europe</dependent>
    </dep>
    <dep type="cc">
    <governor idx="8">Europe</governor>
    <dependent idx="9">and</dependent>
    </dep>
    <dep type="case">
    <governor idx="12">rest</governor>
    <dependent idx="10">from</dependent>
    </dep>
    <dep type="det">
    <governor idx="12">rest</governor>
    <dependent idx="11">the</dependent>
    </dep>
    <dep type="conj">
    <governor idx="8">Europe</governor>
    <dependent idx="12">rest</dependent>
    </dep>
    <dep type="case">
    <governor idx="15">world</governor>
    <dependent idx="13">of</dependent>
    </dep>
    <dep type="det">
    <governor idx="15">world</governor>
    <dependent idx="14">the</dependent>
    </dep>
    <dep type="nmod">
    <governor idx="12">rest</governor>
    <dependent idx="15">world</dependent>
    </dep>
    <dep type="dep">
    <governor idx="8">Europe</governor>
    <dependent idx="17">usually</dependent>
    </dep>
    <dep type="case">
    <governor idx="21">proportions</governor>
    <dependent idx="18">in</dependent>
    </dep>
    <dep type="advmod">
    <governor idx="20">equal</governor>
    <dependent idx="19">roughly</dependent>
    </dep>
    <dep type="amod">
    <governor idx="21">proportions</governor>
    <dependent idx="20">equal</dependent>
    </dep>
    <dep type="nmod">
    <governor idx="8">Europe</governor>
    <dependent idx="21">proportions</dependent>
    </dep>
</dependencies>
<dependencies type="collapsed-dependencies">
    <dep type="root">
    <governor idx="0">ROOT</governor>
    <dependent idx="2">have</dependent>
    </dep>
    <dep type="nsubj">
    <governor idx="2">have</governor>
    <dependent idx="1">We</dependent>
    </dep>
    <dep type="compound">
    <governor idx="4">participants</governor>
    <dependent idx="3">course</dependent>
    </dep>
    <dep type="dobj">
    <governor idx="2">have</governor>
    <dependent idx="4">participants</dependent>
    </dep>
    <dep type="conj:and">
    <governor idx="4">participants</governor>
    <dependent idx="4" copy="1">participants</dependent>
    </dep>
    <dep type="case">
    <governor idx="8">Europe</governor>
    <dependent idx="5">from</dependent>
    </dep>
    <dep type="compound">
    <governor idx="8">Europe</governor>
    <dependent idx="6">Britain</dependent>
    </dep>
    <dep type="acl:from">
    <governor idx="4">participants</governor>
    <dependent idx="8">Europe</dependent>
    </dep>
    <dep type="cc">
    <governor idx="4">participants</governor>
    <dependent idx="9">and</dependent>
    </dep>
    <dep type="case">
    <governor idx="12">rest</governor>
    <dependent idx="10">from</dependent>
    </dep>
    <dep type="det">
    <governor idx="12">rest</governor>
    <dependent idx="11">the</dependent>
    </dep>
    <dep type="acl:from">
    <governor idx="4" copy="1">participants</governor>
    <dependent idx="12">rest</dependent>
    </dep>
    <dep type="case">
    <governor idx="15">world</governor>
    <dependent idx="13">of</dependent>
    </dep>
    <dep type="det">
    <governor idx="15">world</governor>
    <dependent idx="14">the</dependent>
    </dep>
    <dep type="nmod:of">
    <governor idx="12">rest</governor>
    <dependent idx="15">world</dependent>
    </dep>
    <dep type="dep">
    <governor idx="8">Europe</governor>
    <dependent idx="17">usually</dependent>
    </dep>
    <dep type="case">
    <governor idx="21">proportions</governor>
    <dependent idx="18">in</dependent>
    </dep>
    <dep type="advmod">
    <governor idx="20">equal</governor>
    <dependent idx="19">roughly</dependent>
    </dep>
    <dep type="amod">
    <governor idx="21">proportions</governor>
    <dependent idx="20">equal</dependent>
    </dep>
    <dep type="nmod:in">
    <governor idx="8">Europe</governor>
    <dependent idx="21">proportions</dependent>
    </dep>
</dependencies>
<dependencies type="collapsed-ccprocessed-dependencies">
    <dep type="root">
    <governor idx="0">ROOT</governor>
    <dependent idx="2">have</dependent>
    </dep>
    <dep type="nsubj">
    <governor idx="2">have</governor>
    <dependent idx="1">We</dependent>
    </dep>
    <dep type="compound">
    <governor idx="4">participants</governor>
    <dependent idx="3">course</dependent>
    </dep>
    <dep type="dobj">
    <governor idx="2">have</governor>
    <dependent idx="4">participants</dependent>
    </dep>
    <dep type="dobj" extra="true">
    <governor idx="2">have</governor>
    <dependent idx="4" copy="1">participants</dependent>
    </dep>
    <dep type="conj:and">
    <governor idx="4">participants</governor>
    <dependent idx="4" copy="1">participants</dependent>
    </dep>
    <dep type="case">
    <governor idx="8">Europe</governor>
    <dependent idx="5">from</dependent>
    </dep>
    <dep type="compound">
    <governor idx="8">Europe</governor>
    <dependent idx="6">Britain</dependent>
    </dep>
    <dep type="acl:from">
    <governor idx="4">participants</governor>
    <dependent idx="8">Europe</dependent>
    </dep>
    <dep type="cc">
    <governor idx="4">participants</governor>
    <dependent idx="9">and</dependent>
    </dep>
    <dep type="case">
    <governor idx="12">rest</governor>
    <dependent idx="10">from</dependent>
    </dep>
    <dep type="det">
    <governor idx="12">rest</governor>
    <dependent idx="11">the</dependent>
    </dep>
    <dep type="acl:from">
    <governor idx="4" copy="1">participants</governor>
    <dependent idx="12">rest</dependent>
    </dep>
    <dep type="case">
    <governor idx="15">world</governor>
    <dependent idx="13">of</dependent>
    </dep>
    <dep type="det">
    <governor idx="15">world</governor>
    <dependent idx="14">the</dependent>
    </dep>
    <dep type="nmod:of">
    <governor idx="12">rest</governor>
    <dependent idx="15">world</dependent>
    </dep>
    <dep type="dep">
    <governor idx="8">Europe</governor>
    <dependent idx="17">usually</dependent>
    </dep>
    <dep type="case">
    <governor idx="21">proportions</governor>
    <dependent idx="18">in</dependent>
    </dep>
    <dep type="advmod">
    <governor idx="20">equal</governor>
    <dependent idx="19">roughly</dependent>
    </dep>
    <dep type="amod">
    <governor idx="21">proportions</governor>
    <dependent idx="20">equal</dependent>
    </dep>
    <dep type="nmod:in">
    <governor idx="8">Europe</governor>
    <dependent idx="21">proportions</dependent>
    </dep>
</dependencies>
</sentence>
@sebschu
Member

sebschu commented Jul 16, 2015

This is not a bug - it's a feature ;)

If you look carefully, these are not self-loops: the dependent is a copy of the original node (it has the attribute copy="1").

In sentences such as

United flies to and from Serbia.

we create a copy of flies (flies') in the CCprocessed dependencies, so that we end up with the following dependency graph:

 nsubj(flies, United)
 nsubj(flies', United)
 root(ROOT, flies)
 conj:and(flies, flies')
 case(Serbia, to)
 cc(to, and)
 conj:and(to, from)
 nmod:to(flies, Serbia)
 nmod:from(flies', Serbia)

The reason for copying this node is that when we add the preposition to the relation name, we would lose information if we added only "to" to the relation name, and we wanted to avoid complex relation names such as nmod:to_and_from. Also, the meaning of the sentence is roughly "United flies from Serbia and United flies to Serbia", which is what this graph encodes.
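
For anyone reading the XML output programmatically, here is a minimal sketch of how the copy attribute could be taken into account. It assumes nodes are identified by the pair (idx, copy), with copy defaulting to 0 when the attribute is absent. The helper names `node_key`, `read_edges`, and `self_loops` are hypothetical and not part of CoreNLP; the sample is trimmed from the output posted above.

```python
import xml.etree.ElementTree as ET

# Three edges trimmed from the collapsed-ccprocessed output in this issue.
SAMPLE = """
<dependencies type="collapsed-ccprocessed-dependencies">
    <dep type="dobj">
    <governor idx="2">have</governor>
    <dependent idx="4">participants</dependent>
    </dep>
    <dep type="conj:and">
    <governor idx="4">participants</governor>
    <dependent idx="4" copy="1">participants</dependent>
    </dep>
    <dep type="acl:from">
    <governor idx="4" copy="1">participants</governor>
    <dependent idx="12">rest</dependent>
    </dep>
</dependencies>
"""


def node_key(elem):
    """Identify a node by (token index, copy count).

    Assumption: a missing copy attribute means the original node (copy 0).
    """
    return (int(elem.get("idx")), int(elem.get("copy", "0")))


def read_edges(xml_text):
    """Return a list of (relation, governor_key, dependent_key) tuples."""
    root = ET.fromstring(xml_text.strip())
    edges = []
    for dep in root.iter("dep"):
        gov = node_key(dep.find("governor"))
        child = node_key(dep.find("dependent"))
        edges.append((dep.get("type"), gov, child))
    return edges


def self_loops(edges, use_copy=True):
    """List edges whose governor and dependent are the same node.

    With use_copy=False only idx is compared, which reproduces the
    apparent self-loops reported in this issue.
    """
    loops = []
    for rel, gov, child in edges:
        same = gov == child if use_copy else gov[0] == child[0]
        if same:
            loops.append((rel, gov, child))
    return loops


edges = read_edges(SAMPLE)
print(self_loops(edges))                  # copy-aware comparison
print(self_loops(edges, use_copy=False))  # idx-only comparison
```

Comparing on idx alone flags the conj:and edge as a loop, while the copy-aware comparison treats participants and its copy as two distinct nodes.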

@stephenroller
Author

I understand. Thanks for your explanation and time. I'll have to figure out how I want to handle this downstream, but this makes sense.
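
One possible way to handle this downstream, sketched here only as an illustration: fold every copy node back into its original, then drop the loops and duplicate edges that the merge creates. The edge list is the "United flies to and from Serbia." graph from the comment above, with flies' written as ("flies", 1). Words stand in for token indices here, and `merge_copies` is a hypothetical helper, not a CoreNLP function. Note that merging discards the "two flying events" reading that the copy was introduced to encode.

```python
# Each edge is (relation, governor, dependent); a node is (word, copy count).
# Real data would key nodes on the token idx rather than the word.
EDGES = [
    ("nsubj", ("flies", 0), ("United", 0)),
    ("nsubj", ("flies", 1), ("United", 0)),
    ("root", ("ROOT", 0), ("flies", 0)),
    ("conj:and", ("flies", 0), ("flies", 1)),
    ("case", ("Serbia", 0), ("to", 0)),
    ("cc", ("to", 0), ("and", 0)),
    ("conj:and", ("to", 0), ("from", 0)),
    ("nmod:to", ("flies", 0), ("Serbia", 0)),
    ("nmod:from", ("flies", 1), ("Serbia", 0)),
]


def merge_copies(edges):
    """Collapse copy nodes onto their originals.

    Drops edges that become self-loops after merging and removes
    duplicate edges, keeping the first occurrence of each.
    """
    merged = []
    seen = set()
    for rel, (gov, _gov_copy), (dep, _dep_copy) in edges:
        if gov == dep:
            continue  # e.g. conj:and(flies, flies') turns into a loop
        edge = (rel, gov, dep)
        if edge not in seen:
            seen.add(edge)
            merged.append(edge)
    return merged


for rel, gov, dep in merge_copies(EDGES):
    print(f"{rel}({gov}, {dep})")
```

After merging, flies governs Serbia through both nmod:to and nmod:from, and the second nsubj edge disappears as a duplicate.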
