### Project Kojak

** The problem  **

We will attempt to identify amboguously defined words - words that are homographs (spelled the same, but with multiple meanings) and determine the exact meaning of the word from a context window.

Here we attempt to do this in a few stages
1. train a word embedding on some training corpus using skip-gram (Here we use 1000 sholarly research papers) 
2. identify common homographs and extract the various context windows
3. interpret the context windows as vectors in the embedding space and appy a clustering algorith (DBSCAN). Each cluster is interpreted as a distinct definition of the homograph. Each cluster then is representative vector.
4. apply to a test corpus - match context of given homograph to most similar group.


### This notebook

Loads a pre-trained word embedding model, uses DBSCAN clustering to identify several 'definitions' of a set of homographs, and saves those definitions for later use.

In [2]:
import gensim
import json
import os
import re
import time
from nltk.corpus import stopwords
from nltk import tokenize
from nltk import pos_tag
from pprint import pprint



In [3]:
# Declare stopwords, preprocess the data from source file

stop = stopwords.words('english')
stop += ['?','!',':',';','[',']','[]','“' ]
stop += ['.', ',', '(', ')', "'", '"',"''",'""',"``",'”', '“', '?', '!', '’', 'et', 'al', 'al.']
stop = set(stop)

class MyPapers(object):
    # a memory-friendly way to load a large corpora
     def __init__(self, dirname):
            self.dirname = dirname
 
     def __iter__(self):
        with open(self.dirname) as data_file:    
            data = json.load(data_file)
        # iterate through all file names in our directory
        for paper in data:
            sentences = tokenize.sent_tokenize(paper['full_text'])
            for sentence in sentences:
                try:
                    line = re.sub(r'[?\.,!:;\(\)“\[\]]',' ',sentence)
                    line = [word for word in line.lower().split() if word not in stop]
                    yield line
                except:
                    print("Empty line found")
                    continue
                

In [4]:
#Instantiate iterable on the data

#papers is an iterable of scholarly papers, tokenized for prcessing
papers = MyPapers('data/train_data.json') 


In [123]:
def find_sentence(json_file, word_list):
    words = []
    for w in word_list:
        for _ in w.split('_'):
            words.append(_)
    for paper in json_file:
        for sentence in tokenize.sent_tokenize(paper['full_text']):
            if all(word in sentence.lower() for word in words):
                return sentence

## Word embeddings

Import word2vec word embeddings trained on 2848 scholarly journal articles

In [5]:
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.cluster import AgglomerativeClustering
from functools import reduce

In [6]:
model = gensim.models.word2vec.Word2Vec.load("data/journal.txt")

In [7]:
model.corpus_count

182533

In [8]:
vectors = model.wv

In [62]:
vocab = vectors.vocab

In [9]:
len(vectors.vocab)

127777

** contexts to vectors **

In [80]:
from collections import Counter, defaultdict
# The function takes as arguments a list of tokenized documents and a window size
# and returns each word in the document along with its window context as a tuple

def generate_word_counts(documents):
    counts = Counter()
    
    for document in documents:        
        for word in set(document):                    
            counts[word] += 1
            
    return counts

# Takes list of word tokens as arguments
# Returns a list of vectors whose components are the arithmetic mean of the 
# corresponding component of all of the input vectors

def get_vectors(word_list):
    vecs = []
    for word in word_list:
        try:
            vecs.append(vectors[word])
        except:
            print("{} missing from vocabulary".format(word))
            #continue
    return vecs

# Takes list of words as arguments
# Returns a single vector whose components are the arithmetic mean of the 
# corresponding component of all of the input vectors

def vector_average(vector_list):
    #words = [x for x in word_list if x in vocab]
    #vector_list = get_vectors(words)
    A = np.array(vector_list)
    dim = A.shape[0]
    ones = np.ones(dim)
    return ones.dot(A)/len(vector_list)

# Takes list of tokenized documents, target word and window size as arguments
# Returns list of vectors where each vector represents the context window 
# of the target word in the word embedding space

def context2vectors(documents,target):

    context_vectors = []

    for document in documents:
        sentence = document
        if target in sentence:
            #str_sentence = streamlined_sentence(sentence)
            context_vectors.append(vector_average(get_vectors(sentence)))
                    
    return list(context_vectors)


# Takes list of vectors as arguments
# Returns a single vector whose components are the arithmetic mean of the 
# corresponding component of all of the input vectors weighted by Inverse Document Frecuency

def vector_average2(words): #, word_counts, vectors):
    
    total = sum(list(word_counts.values()))
    vocab = set(vectors.vocab.keys())
    words = [x for x in words if x in vocab]
    vector_list = list(map((lambda x: vectors[x]*np.log((1 + total)/(1 + word_counts[x]))),words))
    
    if len(vector_list) == 0:
        return 0
    elif len(vector_list) == 1:
        vector_sum = vector_list[0]
    else:
        vector_sum = reduce((lambda x,y: np.add(x,y)),vector_list)
        
    weighted_average = (1.0/np.linalg.norm(vector_sum))*vector_sum
    
    return weighted_average

# Takes list of tokenized documents, target word and window size as arguments
# Returns list of vectors where each vector represents the context window 
# of the target word in the word embedding space

def context2vectors2(documents,target):

    context_vectors = []

    for document in documents:
        sentence = document
        if target in sentence:
            sentence.remove(target)
            #str_sentence = streamlined_sentence(sentence)
            context_vectors.append(vector_average2(sentence))
                    
    return context_vectors

In [53]:
def MyPapers_plus(papers):
    
    phrases = gensim.models.phrases.Phrases(sentences = papers, min_count = 5, threshold = 150)
    bigram = gensim.models.phrases.Phraser(phrases)
    phrases2 = gensim.models.phrases.Phrases(sentences = bigram[papers], min_count = 5, threshold = 300)
    trigram = gensim.models.phrases.Phraser(phrases2)
    
    return trigram[bigram[papers]]

In [12]:
word_counts = generate_word_counts(MyPapers_plus(papers))

In [13]:
word_counts['new_york_city']

14

In [14]:
#dictionary = gensim.corpora.dictionary.Dictionary(MyPapers_plus(papers))
#text = [dictionary.doc2bow(c) for c in MyPapers_plus(papers)]

** Clustering with DBSAN **

Use DBSCAN to determine similar usages of the target homographs. Each of these similar usages will be combined to a representative vector in the embedding space and constitute a "definition" of that word.

In [54]:
def streamlined_sentence(sentence):
    POS = {'JJ','JJR','JJS','NN','NNS','NNP','NNPS','VB','VBD','VBG','VBN','VBD','VBZ'}
    st_sent = [word[0] for word in pos_tag(sentence) if word[1] in POS]
    return st_sent

In [57]:
#Arguments: The desired cluster number, a list of documents making up the corpus, the target homograph
#           a list of labels for the conext sentences indicating the homographs usage, and window size
# The function prints the representative context windows for the target word within the desired cluster   

def print_cluster_context(cluster_number, documents, target, labels):
    
    context_vectors = []

    for document in documents:
        sentence = document
        if target in sentence:
            str_sentence = streamlined_sentence(sentence)
            context_vectors.append(str_sentence)
            
    for i, label in enumerate(labels):
        if label == cluster_number:
            print(context_vectors[i])
            
#Arguments: The desired cluster number, a list of documents making up the corpus, the target homograph
#           a list of labels for the conext sentences indicating the homographs usage, and window size
# The function prints the representative context windows for the target word within the desired cluster   

def cluster_context(documents, target, labels):
    
    context_sentences = []

    for document in documents:
        sentence = document
        if target in sentence:
            str_sentence = streamlined_sentence(sentence)
            context_sentences.append(str_sentence)
                     
    clustered_sentences = defaultdict(list)                
    for i, label in enumerate(labels):
        clustered_sentences[label].append(context_sentences[i])
    
    return clustered_sentences

# Arguments: List of vectors, each representing a context sentence and a list of labels 
#            corresponding to the context vectors
# Returns:   A dictionary where keys are the identified labels from clustering and the value is a single 
#            representing the cluster

def identify_definition(context_vectors, labels):
    
    cluster_numbers = set(labels)
    definitions = dict()
    cluster_vectors = defaultdict(list)
    
    if len(set(labels)) == 1:
        print("No consistent definition found")
        definitions[0] = np.zeros(len(context_vectors[0]))
        return definitions
    
    for i, label in enumerate(labels):
        cluster_vectors[label].append(context_vectors[i])
    
    for key in cluster_vectors.keys():
        if key < 0:
            continue
        else:
            v = vector_average(cluster_vectors[key])
            definitions[key] = v/np.linalg.norm(v)
                    
    return definitions


In [17]:
test = MyPapers('data/testing_data.json')

In [91]:
target = u'train'
context_vectors = context2vectors2(MyPapers_plus(papers), target)

In [92]:
epsilon = .1

In [93]:
dbscan = DBSCAN(eps = epsilon, metric = 'cosine', algorithm = 'brute', min_samples = 3)
dbscan.fit(context_vectors)

DBSCAN(algorithm='brute', eps=0.1, leaf_size=30, metric='cosine',
    min_samples=3, n_jobs=1, p=None)

In [94]:
labels = dbscan.labels_
n_clusters = len(set(labels)) # - (1 if -1 in labels else 0)
print(n_clusters)

3


In [95]:
labels

array([-1, -1, -1,  1, -1,  0, -1,  1, -1, -1, -1, -1, -1, -1, -1,  1, -1,
        1, -1, -1, -1, -1, -1,  1, -1,  1, -1, -1,  0, -1, -1, -1, -1,  0,
       -1, -1, -1, -1, -1,  0,  0,  0, -1,  0, -1,  0, -1, -1, -1, -1, -1,
       -1, -1, -1,  1, -1,  1, -1,  1,  1, -1,  0, -1, -1, -1, -1, -1, -1,
        0, -1, -1, -1, -1,  0,  0,  0,  0, -1, -1, -1, -1,  0, -1, -1, -1,
        0, -1,  0,  0,  0,  0,  0, -1, -1, -1, -1, -1, -1,  1, -1, -1, -1,
       -1, -1,  0, -1, -1, -1, -1, -1])

In [96]:
target_definitions = identify_definition(context_vectors, labels)

In [97]:
def read_glossary(glossary):
    
    vector_glossary = dict()
    
    for k, v in glossary.items():
        vector_glossary[k] = {key:vector_average2(tokenize.word_tokenize(value)) for (key,value) in v.items()}
    
    return vector_glossary

def get_target_sentences(documents, target):
    
    context_sentences = []

    for document in documents:
        #print(document[:15])
        sentence = document
        if target in sentence:
            #str_sentence = streamlined_sentence(sentence)
            #print[str_sentence]
            sentence.remove(target)
            context_sentences.append(sentence)
            
    return context_sentences


In [98]:
target_sentences = get_target_sentences(MyPapers_plus(test), target)

Empty line found
Empty line found


Exception ignored in: <generator object MyPapers.__iter__ at 0x7f001b136fc0>
RuntimeError: generator ignored GeneratorExit
Exception ignored in: <generator object MyPapers.__iter__ at 0x7f0041f0be08>
RuntimeError: generator ignored GeneratorExit


In [100]:
for s in target_sentences:
    d = define_target_from_sentence(s, target_definitions)
    print("{}\nDefined as {} \n".format(s, d))

[(0.11450983228614364, 1), (0.1952935527438393, 0)]
['military', 'fire', 'fighters', 'labourers', 'rely', 'upon', 'physical', 'mental', 'proficiencies', 'compete', 'elite', 'levels', 'remain', 'productive', 'workforce']
Defined as 1 

[(0.063228821455403539, 1), (0.12762305452685241, 0)]
['help', 'clarification', 'process', 'face-to-face-communication', 'would', 'always', 'preferable', 'e-mails', 'webinars', 'mere', 'technical', 'means', 'one', 'main', 'rules', 'double', 'bind-free', 'organisation', 'talk', 'instead', 'staff_members', 'level', 'therefore', 'maintain', 'neutral', 'attitude', 'towards', 'colleagues', 'supervisors', 'co-workers']
Defined as 1 

[(0.036080638950056865, 0), (0.075199066907859713, 1)]
['proposed', 'extracting', 'method', 'improves', '1', 'extraction', 'quality', 'system', 'increasing', 'precision', 'entity_extraction', 'result', 'initial', 'extracted', 'entities', 'method', 'highly', 'precised', 'entities', 'used', 'seed', 'training', 'machine_learning', 'ml

# Recording Definitions

Use the DBSCAN clusters to determine the various definitions of a word, then create a dictionary for the word

In [103]:
def define_target_from_sentence(sentence, dictionary):
    cosine_dists = []
    sv = vector_average2(sentence)
    sentence_vector = sv/np.linalg.norm(sv)
    for k,v in dictionary.items():
        cosine_dists.append((1 - np.dot(sentence_vector, v),k))
    cosine_dists.sort()
    print(cosine_dists)
    return cosine_dists[0][1]
    
def extract_dictionary(papers, homographs):
    dictionary = dict()
    for word in homographs:
        print("Calculating context vectors for \"{}\"".format(word))
        context_vectors = context2vectors2(papers, word)
        print("Clustering...")
        if len(context_vectors)<1:
            print("no example for \"{}\" found.".format(word))
            continue
        dbscan = DBSCAN(eps = epsilon, metric = 'cosine', algorithm = 'brute', min_samples = 3)
        dbscan.fit(context_vectors)
        labels = dbscan.labels_
        print("found {} distinct definitions".format(len(set(labels))))
        print("Building definitions for \"{}\"".format(word))
        dictionary[word] = identify_definition(context_vectors, labels)
        
    print("Dictionary complete")    
    return dictionary    

In [104]:
homographs = ['attribute', 'bank', 'charge', 'train']

dictionary = extract_dictionary(papers, homographs)

Calculating context vectors for "attribute"
Clustering...
found 2 distinct definitions
Building definitions for "attribute"
Calculating context vectors for "bank"
Clustering...
found 5 distinct definitions
Building definitions for "bank"
Calculating context vectors for "charge"
Clustering...
found 6 distinct definitions
Building definitions for "charge"
Calculating context vectors for "train"
Clustering...
found 3 distinct definitions
Building definitions for "train"
Dictionary complete


In [138]:
dictionary['charge'].keys()

dict_keys([0, 1, 2, 3, 4])

## Testing

In [None]:
# TEST WINDOW #

target = u'charge'
with open('data/testing_data.json') as f:
    file = json.load(f)
    
context_vectors = context2vectors2(MyPapers_plus(papers), target)
dbscan = DBSCAN(eps = epsilon, metric = 'cosine', algorithm = 'brute', min_samples = 4)
dbscan.fit(context_vectors)
labels = dbscan.labels_
#target_definitions = identify_definition(context_vectors, labels)
target_definitions = dictionary[target]


contexts = cluster_context(MyPapers_plus(papers), target, labels)
correct = 0
wrong = 0
for c in contexts:
    for window in contexts[c]:
        d = define_target_from_sentence(window, target_definitions)
        if c == d:
            correct += 1
        else:
            print("Cluster {}, defined as {} \n".format(c, d), find_sentence(file,window), '\n')
            if c != -1:
                wrong += 1
                
print("Number correct: {}\nNumber wrong: {}".format(correct, wrong))

In [137]:
target = u'charge'
with open('data/testing_data.json') as f:
    file = json.load(f)
context_sentences = context(MyPapers_plus(test), target)
#print(context_sentences)

for c in context_sentences:
    d = define_target_from_sentence(c, dictionary[target])
    #print(c)
    print("{}\nDefined as {} \n".format(find_sentence(file,c),d))



[(0.055669052328530477, 0), (0.16997656549947227, 3), (0.18869912453325421, 1), (0.20350545291678956, 4), (0.22427200788766244, 2)]
But the most significant implication is that the way that H is incorporated in olivine may vary with the fugacity of HO, depending on whether the H atoms needed to achieve charge balance are completely associated with the point defect by being bonded to the oxygen atoms surrounding it in specific locations, or are disordered over the lattice by being bonded to oxygen atoms without regard to location.
Defined as 0 

[(0.070765776459314367, 0), (0.18354397038542425, 1), (0.20159771967971685, 3), (0.26746119951088776, 2), (0.28529326483576622, 4)]
For example, if the four H atoms of the [Si] mechanism are bonded to oxygen atoms surrounding the Si site vacancy to produce local charge balance (that is short-range order), as recently shown by Xue et al.
Defined as 0 

[(0.11366215858995266, 0), (0.16091510984935953, 1), (0.19528214471626981, 3), (0.2607369904631

After this, a training set of 1249 samples (423 positive samples and 826 negative samples) and a test set of 312 samples (105 positive samples and 207 negative samples) were obtained.In the feature calculation step, 203 descriptor were calculated including 30 constitution descriptors, 44 connectivity indices, 7 kappa indices, 32 Moran auto-correction descriptors, 5 molecular properties, 25 charge descriptors and 60 MOE-Type descriptors.
Defined as 0 

[(0.10224518372763458, 0), (0.16258850563945493, 4), (0.16728005781188271, 3), (0.18051267370454949, 2), (0.30600007017572417, 1)]
This has resulted in significant new functionality in core application programming interfaces (APIs) while maintaining the quality of code depending on those core APIs.Examples of new features supported by the improved development model include InChI functionality [], greatly improved ring detection algorithms [], improvements to the core atom type perception module that now covers a much more comprehensive se

In [134]:
charge_def = {1:"(criminal law) a pleading describing some wrong or offense",
              2:"(explosive) a quantity of explosive to be set off at one time",
              3:"(physics) the quantity of unbalanced electricity in a body (either positive or negative) and construed as an excess or deficiency of electrons",
              4:"(finance) request for payment or fee in exchange for a good or service",
              5:"the responsibility of taking care or control of someone or something."}

state_def = {1:"(physics) the condition of matter with respect to structure, form, constitution, phase, or the like",
            2:"a nation or territory considered as an organized political community under one government",
            3:"express something definitely or clearly in speech or writing"}

train_def = {1:"teach (a person or animal) a particular skill or type of behavior through practice and instruction over a period of time.",
            2: "a series of railroad cars moved as a unit by a locomotive or by integral motors."}

attribute_def = {1:"a quality or feature that is as a characteristic of a person or thing",
                2:"regard something as being caused by someone or something"}

glossary = { 'charge':charge_def, 'state':state_def, 'train':train_def, 'attribute':attribute_def}

In [135]:
g = read_glossary(glossary)

In [124]:
target = u'train'
with open('data/testing_data.json') as f:
    file = json.load(f)
context_sentences = context(MyPapers_plus(test), target)
#print(context_sentences)

for c in context_sentences:
    d = define_target_from_sentence(c, g[target])
    #print(c)
    print("{}\nDefined as {} \n".format(find_sentence(file,c),d))



[(0.20578956604003906, 1), (0.29383689165115356, 2)]
military, fire fighters and labourers) [], who rely upon physical and mental proficiencies to compete or train at elite levels and remain productive in the workforce.
Defined as 1 

[(0.17394262552261353, 1), (0.24395477771759033, 2)]
To help the clarification process, face-to-face-communication would always be preferable over e-mails, webinars or other mere technical means and one of the main rules for a double bind-free organisation should be: Talk  each other instead of  each other!All staff members on each level should therefore train and maintain a neutral attitude towards all their colleagues, supervisors and co-workers.
Defined as 1 

[(0.18678128719329834, 1), (0.22531867027282715, 2)]
The proposed extracting method improves (1) the extraction quality of the system by increasing the precision of entity extraction as a result of the initial extracted entities from our method; our highly precised entities will be used as a seed

In [125]:
target = u'charge'
with open('data/testing_data.json') as f:
    file = json.load(f)
context_sentences = context(MyPapers_plus(test), target)
#print(context_sentences)

for c in context_sentences:
    d = define_target_from_sentence(c, g[target])
    #print(c)
    print("{}\nDefined as {} \n".format(find_sentence(file,c),d))



[(0.14442133903503418, 3), (0.22563308477401733, 2), (0.40358829498291016, 1), (0.41298401355743408, 5), (0.43839478492736816, 4)]
But the most significant implication is that the way that H is incorporated in olivine may vary with the fugacity of HO, depending on whether the H atoms needed to achieve charge balance are completely associated with the point defect by being bonded to the oxygen atoms surrounding it in specific locations, or are disordered over the lattice by being bonded to oxygen atoms without regard to location.
Defined as 3 

[(0.17687559127807617, 3), (0.25559866428375244, 2), (0.45852863788604736, 4), (0.47920215129852295, 1), (0.49328607320785522, 5)]
For example, if the four H atoms of the [Si] mechanism are bonded to oxygen atoms surrounding the Si site vacancy to produce local charge balance (that is short-range order), as recently shown by Xue et al.
Defined as 3 

[(0.19387054443359375, 3), (0.26459795236587524, 2), (0.43715059757232666, 1), (0.448050141334533

After this, a training set of 1249 samples (423 positive samples and 826 negative samples) and a test set of 312 samples (105 positive samples and 207 negative samples) were obtained.In the feature calculation step, 203 descriptor were calculated including 30 constitution descriptors, 44 connectivity indices, 7 kappa indices, 32 Moran auto-correction descriptors, 5 molecular properties, 25 charge descriptors and 60 MOE-Type descriptors.
Defined as 3 

[(0.20881187915802002, 3), (0.24343907833099365, 2), (0.34048378467559814, 1), (0.36043977737426758, 4), (0.36821633577346802, 5)]
This has resulted in significant new functionality in core application programming interfaces (APIs) while maintaining the quality of code depending on those core APIs.Examples of new features supported by the improved development model include InChI functionality [], greatly improved ring detection algorithms [], improvements to the core atom type perception module that now covers a much more comprehensive se

In [127]:
target = u'state'
with open('data/testing_data.json') as f:
    file = json.load(f)
context_sentences = context(MyPapers_plus(test), target)
#print(context_sentences)

for c in context_sentences:
    d = define_target_from_sentence(c, g[target])
    #print(c)
    print("{}\nDefined as {} \n".format(find_sentence(file,c),d))



[(0.15812051296234131, 2), (0.27128320932388306, 3), (0.28722202777862549, 1)]
Although it was evident that communist East Germany was an unjust state, many people consider the prosecution and condemnation of leading individuals in the GDR to be a form of victor’s justice.
Defined as 2 

[(0.20961058139801025, 3), (0.25187826156616211, 2), (0.2820887565612793, 1)]
The researchers believe that it will take time for CHC teachers to change their habits of traditional teaching when they interact with a newly designed social constructivism-based science curriculum.The challenges that the Vietnamese CHC teachers need deep understanding of scientific content knowledge and that the Vietnamese teachers find teaching and learning scientific argumentation difficult align with concerns of researchers and educators about the state of science teaching in many primary classrooms recently.
Defined as 3 

[(0.18396443128585815, 2), (0.22507059574127197, 3), (0.2410435676574707, 1)]
Specifically, this s

The state of major agent follows a linear  stochastic differential equation (BSDE) and the states of minor agents are governed by linear  stochastic differential equations (SDEs).
Defined as 1 

[(0.32707482576370239, 1), (0.42387086153030396, 2), (0.46314132213592529, 3)]
The major agent is dominating as its state enters those of minor agents.
Defined as 1 

[(0.26997500658035278, 1), (0.36079239845275879, 3), (0.41409283876419067, 2)]
() Solve the above auxiliary stochastic control problem to obtain the decentralized optimal state  (which should depend on the undetermined process , hence denoted by ).
Defined as 1 

[(0.26025474071502686, 1), (0.35258305072784424, 3), (0.38438940048217773, 2)]
() Determine  by the fixed-point argument: .As to the MFG with major-minor agent ,  can be further divided into:() First, solve the decentralized control problem for  by replacing 
                         using  The related decentralized optimal state is denoted by  and optimal control by 
   

Instead of producing healthy, safe and productive welfare state citizens, which is the intention of BMI regulation, the regulation may also turn fishers who have few other work opportunities into unproductive individuals dependent upon welfare.
Defined as 2 

[(0.29269766807556152, 1), (0.30543988943099976, 3), (0.35310035943984985, 2)]
).To facilitate analyses of the competence data and the repeated measures, plausible values, which are the state of the art in analysis techniques for large-scale assessment data, will be provided, adopting the procedures from PIAAC as far as possible.
Defined as 1 

[(0.29873096942901611, 2), (0.31645524501800537, 3), (0.32303857803344727, 1)]
The same is true for model .The results then suggest that investment is more responsive to cash flow and asset sales in the financially constrained state as was signaled in .
Defined as 2 

[(0.27984082698822021, 1), (0.30380988121032715, 2), (0.3147960901260376, 3)]
Since the standard deviation of investment is 

Our results may provide an alternative explanation to the observations and help in resolving the controversy between geodetic and gravity observations as a volcano moves from rest to unrest state.However, there are some limitations of the model presented in this paper that must be considered.
Defined as 1 

[(0.23438555002212524, 1), (0.33717530965805054, 3), (0.3836175799369812, 2)]
These results confirm our hypothesis that the peak pressure inside the PFJ would change more dramatically at higher degrees of knee flexion because the dynamic brace is designed to impart a larger anteriorly directed force on the tibia in that state.
Defined as 1 

[(0.30991506576538086, 1), (0.46815931797027588, 3), (0.47737765312194824, 2)]
reported increased PFJ contact forces in PCL- and PCL/PLC-deficient knees when compared to the intact state under simulated muscle loads at all knee flexion angles.
Defined as 1 

[(0.27759325504302979, 1), (0.37633210420608521, 3), (0.4049830436706543, 2)]
), Jacobi 

As they are  by nature, they exist in the state of  or divine grace (2251).
Defined as 3 

[(0.29536008834838867, 3), (0.32637161016464233, 1), (0.32657700777053833, 2)]
By divine grace they leave their attachment with the innate impurities, reach the state of  and remain as  or  (v2233).The ,  and  are .
Defined as 3 

[(0.22720551490783691, 1), (0.26884377002716064, 2), (0.27538156509399414, 3)]
When it realizes the truth about its nature, the  and  states are burnt and it attains the  state (v2409).
Defined as 1 

[(0.22755861282348633, 3), (0.25051480531692505, 2), (0.26482319831848145, 1)]
The five states of consciousness that the soul experiences are created by the number of  or principles functioning in them.The specific state of consciousness experienced by a soul is in accordance with the fruits of its previous action.
Defined as 3 

[(0.26227593421936035, 3), (0.26645433902740479, 1), (0.29488635063171387, 2)]
A  or a supreme soul in the state of silence takes a body of  whic

  after removing the cwd from sys.path.


ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [136]:
target = u'attribute'
with open('data/testing_data.json') as f:
    file = json.load(f)
context_sentences = context(MyPapers_plus(test), target)
#print(context_sentences)

for c in context_sentences:
    d = define_target_from_sentence(c, g[target])
    #print(c)
    print("{}\nDefined as {} \n".format(find_sentence(file,c),d))



[(0.33354330062866211, 1), (0.37769865989685059, 2)]
Here, we focus on models explaining linkage between dyads beyond structure by incorporating node attribute information.
Defined as 1 

[(0.38523983955383301, 1), (0.52475601434707642, 2)]
The  () models network structure and node attributes by learning the attribute correlations in the observed network.
Defined as 1 

[(0.41379630565643311, 1), (0.5419037938117981, 2)]
Furthermore, the  () takes into account attribute information from nodes to model network structure.
Defined as 1 

[(0.31510353088378906, 1), (0.39183962345123291, 2)]
This model defines the probability of an edge as the product of individual attribute link formation affinities.
Defined as 1 

[(0.35475432872772217, 1), (0.53952652215957642, 2)]
Each feature vector 
                        =(
                        [1],...,
                        []) maps a node 
                         to  (numeric or categorical) attribute values.
Defined as 1 

[(0.3240596055984