### NER Coref

Goals:
1. Implement the NER Coref framework. Input: text. Output: FewRel queries.
2. Test on re3d.

Note: install flair from the GitHub repository:
```
    pip install --upgrade git+https://github.com/flairNLP/flair.git
```

In [1]:
from flair.data import Sentence
from flair.models import SequenceTagger
import spacy

I0318 13:03:44.651339 140669276460864 file_utils.py:41] PyTorch version 1.4.0+cu92 available.


In [2]:
parser = spacy.load("en_core_web_sm", disable=['ner'])

# load the NER tagger
tagger = SequenceTagger.load('ner')

2020-03-18 13:04:04,682 loading file /home/akvallapuram/.flair/models/en-ner-conll03-v0.4.pt


In [3]:
sent = Sentence('It is being conducted in full coordination and support of the intra-Syrian talks that will be held by the UN Special Envoy de Mistura in Geneva in February, and in view of the Brussels Conference on Syria and the region which will be hosted by the EU later in the spring. They were able to come to a resolution.')

In [4]:
# run NER over sentence
tagger.predict(sent)

[Sentence: "It is being conducted in full coordination and support of the intra-Syrian talks that will be held by the UN Special Envoy de Mistura in Geneva in February, and in view of the Brussels Conference on Syria and the region which will be hosted by the EU later in the spring. They were able to come to a resolution."   [− Tokens: 59  − Token-Labels: "It is being conducted in full coordination and support of the intra-Syrian <S-MISC> talks that will be held by the UN <S-ORG> Special Envoy <B-PER> de <I-PER> Mistura <E-PER> in Geneva <S-LOC> in February, and in view of the Brussels <B-MISC> Conference <I-MISC> on <I-MISC> Syria <E-MISC> and the region which will be hosted by the EU <S-ORG> later in the spring. They were able to come to a resolution."]]

In [107]:
for entity in sent.get_spans('ner'):
    # tuple membership instead of a substring test against "PER/ORG"
    print(entity.start_pos, entity.tag, entity.tag in ("PER", "ORG"))

62 MISC False
106 ORG True
117 PER True
137 LOC False
176 MISC False
248 ORG True


In [6]:
from itertools import combinations

In [7]:
ent_pairs = combinations(sent.get_spans('ner'), 2)

In [11]:
list(ent_pairs)

[(<MISC-span (12): "intra-Syrian">, <ORG-span (20): "UN">),
 (<MISC-span (12): "intra-Syrian">, <PER-span (22,23,24): "Envoy de Mistura">),
 (<MISC-span (12): "intra-Syrian">, <LOC-span (26): "Geneva">),
 (<MISC-span (12): "intra-Syrian">,
  <MISC-span (34,35,36,37): "Brussels Conference on Syria">),
 (<MISC-span (12): "intra-Syrian">, <ORG-span (47): "EU">),
 (<ORG-span (20): "UN">, <PER-span (22,23,24): "Envoy de Mistura">),
 (<ORG-span (20): "UN">, <LOC-span (26): "Geneva">),
 (<ORG-span (20): "UN">,
  <MISC-span (34,35,36,37): "Brussels Conference on Syria">),
 (<ORG-span (20): "UN">, <ORG-span (47): "EU">),
 (<PER-span (22,23,24): "Envoy de Mistura">, <LOC-span (26): "Geneva">),
 (<PER-span (22,23,24): "Envoy de Mistura">,
  <MISC-span (34,35,36,37): "Brussels Conference on Syria">),
 (<PER-span (22,23,24): "Envoy de Mistura">, <ORG-span (47): "EU">),
 (<LOC-span (26): "Geneva">,
  <MISC-span (34,35,36,37): "Brussels Conference on Syria">),
 (<LOC-span (26): "Geneva">, <ORG-span (
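
The enumeration above pairs every entity regardless of type. The following is a minimal sketch of restricting candidate pairs to PER/ORG entities, assuming that only those types are of interest for relation queries. It uses plain tuples copied from the tagger output above rather than flair spans, so `Mention` and `candidate_pairs` are hypothetical names and not part of flair.

```python
from itertools import combinations
from typing import NamedTuple


class Mention(NamedTuple):
    """Hypothetical stand-in for a flair span: text, NER tag, character offset."""
    text: str
    tag: str
    start_pos: int


# Entities copied from the tagger output shown earlier in this notebook.
MENTIONS = [
    Mention("intra-Syrian", "MISC", 62),
    Mention("UN", "ORG", 106),
    Mention("Envoy de Mistura", "PER", 117),
    Mention("Geneva", "LOC", 137),
    Mention("Brussels Conference on Syria", "MISC", 176),
    Mention("EU", "ORG", 248),
]


def candidate_pairs(mentions, keep_tags=("PER", "ORG")):
    """Return all unordered pairs of mentions whose tag is in keep_tags."""
    kept = [m for m in mentions if m.tag in keep_tags]
    # Materialize the pairs: combinations() is a one-shot iterator.
    return list(combinations(kept, 2))


pairs = candidate_pairs(MENTIONS)
for head, tail in pairs:
    print(head.text, "|", tail.text)
```

Returning a list rather than the raw `combinations` iterator means the pairs can be inspected and then reused, whereas the iterator is exhausted after the first pass.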

In [103]:
# article = 'the US Republican presidential candidate Donald TRUMP'
# article = 'The Russian President Vladimir PUTIN'
article = 'there is nothing'
doc = parser(article)  # spaCy pipeline loaded in In [2]; pos_tagger is not defined in this notebook

In [104]:
for token in doc:
    print(token.text, token.dep_)

there expl
is ROOT
nothing attr


In [105]:
phrase = Sentence(article)
tagger.predict(phrase)
phrase.get_spans('ner')

[]

In [87]:
# Executed earlier (before In [103]) with article = 'The Russian President Vladimir PUTIN'
dep = next(doc.sents)
doc[dep.root.i]

PUTIN

In [90]:
phrase.get_spans('ner')[1].to_dict()

{'text': 'Vladimir PUTIN',
 'start_pos': 22,
 'end_pos': 36,
 'labels': [PER (0.9984)]}

In [101]:
help(type(dep.root))

Help on class Token in module spacy.tokens.token:

class Token(builtins.object)
 |  An individual token – i.e. a word, punctuation symbol, whitespace,
 |  etc.
 |  
 |  DOCS: https://spacy.io/api/token
 |  
 |  Methods defined here:
 |  
 |  __bytes__(...)
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __len__(...)
 |      The number of unicode characters in the token, i.e. `token.text`.
 |      
 |      RETURNS (int): The number of unicode characters in the token.
 |      
 |      DOCS: https://spacy.io/api/token#len
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  __ne__(self, value, /)
 |      Return self!=value.
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help

In [102]:
dep.root.idx

31

In [21]:
line = parser("Trump' s aim")
for tok in line:
    print(tok.text, tok.pos_, tok.dep_)

Trump PROPN amod
' PUNCT dep
s NOUN nsubj
aim NOUN ROOT


In [48]:
class LazyStringSet:
    """
    Set of strings with a lazy (substring-based) accessor.

    Attributes:
        set (set): stores only string-type elements.
    """
    def __init__(self):
        self.set = set()

    def get(self, x):
        """
        If the string x is a substring of any element
        present in the set, that element is returned;
        otherwise None is returned.
        """
        assert isinstance(x, str)
        for i in self.set:
            if x in i:
                return i
        return None

    def put(self, x):
        """
        Inserts new string element x into the set.
        Returns True if inserted, otherwise False.
        """
        assert isinstance(x, str)
        # x is skipped when it is already contained in a stored element
        if self.get(x) is None:
            self.set.add(x)
            return True
        return False

In [49]:
lss = LazyStringSet()

In [50]:
print(lss.put("Donald Trump"))
lss.get("Trump")

True


'Donald Trump'
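
The substring lookup depends on insertion order: a short mention is merged into a longer one only if the longer one was stored first. The sketch below illustrates this with `SubstringSet`, a hypothetical, simplified restatement of the class above that makes the block self-contained.

```python
class SubstringSet:
    """Hypothetical, simplified restatement of LazyStringSet."""

    def __init__(self):
        self.items = set()

    def get(self, x):
        # Return a stored string that contains x, if any.
        for item in self.items:
            if x in item:
                return item
        return None

    def put(self, x):
        # Insert x only if no stored string already contains it.
        if self.get(x) is None:
            self.items.add(x)
            return True
        return False


# Full name first: the later short mention is absorbed.
full_first = SubstringSet()
full_first.put("Donald Trump")
short_merged = full_first.put("Trump")

# Short mention first: the full name is not a substring of "Trump",
# so both strings end up stored as separate elements.
short_first = SubstringSet()
short_first.put("Trump")
full_added = short_first.put("Donald Trump")

print(short_merged, full_added, len(short_first.items))
```

In the second case `get("Trump")` may return either stored string, because Python sets have no guaranteed iteration order. If first mentions can be shorter than later ones, the stored element would need to be replaced by the longer string, which this sketch does not attempt.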

### Possessive improvements for First Mentions

In [63]:
parser("her")[0].dep_

'ROOT'
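
A possessive mention such as "Trump's" or "Trump' s" is not a substring of a stored first mention like "Donald Trump", so it would not be resolved by a substring lookup. The following is a minimal sketch of one possible normalization step, assuming a regex-based cleanup is acceptable; `strip_possessive` is a hypothetical helper and not part of spaCy or flair.

```python
import re

# Trailing possessive marker: optional whitespace, an apostrophe
# (straight or curly), optional whitespace, and an optional "s".
_POSSESSIVE = re.compile(r"\s*['’]\s*s?$")


def strip_possessive(mention: str) -> str:
    """Remove a trailing possessive marker from a mention, if present."""
    return _POSSESSIVE.sub("", mention.strip())


for raw in ["Trump's", "Trump' s", "James'", "Donald Trump"]:
    print(repr(raw), "->", repr(strip_possessive(raw)))
```

This only handles possessive markers attached to names. Possessive pronouns such as "her" carry no marker to strip and would need a separate resolution step.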