# S-MatCNGenPy

Este é um passo-a-passo da implementação em python do método **S-MatCNGenPy**, desenvolvido no trabalho \[1\]. O seu principal objetivo é garantir o suporte a referência ao esquema de dados em busca por palavras-chave em banco de dados. Observe que algumas consultas, como visto abaixo, não estão relacionadas apenas a valores do banco de dados, mas a própria estrutura do esquema.

```
    filmes do Will Smith
```
- **`filmes`** : relação Movie
- **`Will`, `Smith`** : instâncias da tabela Person(Name) 


#### Leituras Importantes

> [\[1\]](https://drive.google.com/file/d/1ZnljlKss9a8M_RDqseTYfZbQCjDhcJkk/view) MARTINS, Paulo Rodrigo O.; DA SILVA, Altigran Soares. *Uma Abordagem para Suporte a Referências ao Esquema em Consultas por Palavras-Chave em Bancos de Dados Relacionais*. Trabalho de Conclusão de Curso (Ciência da Computação), Universidade Federal do Amazonas, 2017. 

> [\[2\]]() DE OLIVEIRA, Pericles; DA SILVA, Altigran; DE MOURA, Edleno. *Match-Based Candidate Network Generation for Keyword Queries over Relational Databases*. In: Data Engineering (ICDE), 2018 IEEE 34st International Conference on. IEEE, 2016. Aceito pra Pubicação

> [\[3\]](https://dl.acm.org/citation.cfm?id=1989383) BERGAMASCHI, Sonia et al. *Keyword search over relational databases: a metadata approach*. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data. ACM, 2011. p. 565-576.

In [1]:
import psycopg2
from psycopg2 import sql
from pprint import pprint as pp
from collections import defaultdict
import string
import itertools
import copy
from math import log1p
from queue import deque
import ast
import gc
from queue import deque

import nltk 
#nltk.download('wordnet')
#nltk.download('omw')
#nltk.download('stopwords')

from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer
wnl = WordNetLemmatizer()

import gensim.models.keyedvectors as word2vec
from gensim.models import KeyedVectors


stw_set = set(stopwords.words('english')) - {'will'}

# Connect to an existing database
conn = psycopg2.connect("dbname=imdb user=imdb password=imdb")

# Open a cursor to perform database operations
cur = conn.cursor()



## Pré-processamento

Antes mesmo de receber os querysets, o sistema passa por um pré-processamento, que é responsavél pela criação de dois índices invertidos:

* **wordHash**: tabela que associa cada termo do banco de dados com o seu **IAF (Inverse Attribute Frequency)** e também referencia todas Tabelas, Colunas e CTIDs em que a palavra ocorre. Nota: o CTID é o endereço físico de uma linha em uma tabela, utilizado para encontrar rapidamente uma tupla.
```python
wordHash['term'] = ( IAF , { 'table': { 'column' : [ctid] } } )
```
* **attributeHash**: tabela que para cada atributo (documento), armazena a sua norma e o número de palavras distintas.
```python
attributeHash['table']['column'] = ( norm , num_distinct_words )
```

### Criação dos Índices Invertidos

O processo de criação é realizado em três etapas. Primeiramente, o procedimento ```createInvertedIndex()``` faz uma varredura no banco de dados e preenche parcialmente o ```wordHash```, faltando apenas calcular os IAFs para cada termo. Além disso, este procedimento também ele também armazena no ```attributeHash``` o total de palavras distintas para cada atributo.

Em seguida, os IAFs de cada termo são processados através do método ```processIAF(wordHash,attributeHash)```. Por último, as normas dos atributos (documentos) são calculadas no método ```processNormsOfAttributes(wordHash,attributeHash)```.

In [2]:
#Word2Vec
def loadWordEmbeddingsModel(filename = "word_embeddings/word2vec/GoogleNews-vectors-negative300.bin"):
    model = KeyedVectors.load_word2vec_format(filename,
                                                       binary=True, limit=500000)
    return model


#GloVe
#def loadWordEmbeddingsModel(filename = "word_embeddings/word2vec/GoogleNews-vectors-negative300.bin"):
#    model = KeyedVectors.load_word2vec_format(filename, limit=500000)
#    return model

In [3]:
# wordEmbeddingsModel=loadWordEmbeddingsModel(emb_model)

In [4]:
#Apesar de ID está no word embedding model, sabemos que esse campo não deve ser indexado
#'id' in embeddingModel

In [5]:
def createInvertedIndex(embeddingModel):
    #Output: wordHash (Term Index) with this structure below
    #map['word'] = [ 'table': ( {column} , ['ctid'] ) ]

    '''
    The Term Index is built in a preprocessing step that scans only
    once all the relations over which the queries will be issued.
    '''
    
    wordHash = {}
    attributeHash = {}
    
    
    # Get list of tablenames
    cur.execute("SELECT DISTINCT tablename FROM pg_tables WHERE schemaname!='pg_catalog' AND schemaname !='information_schema';")
    for table in cur.fetchall():
        table_name = table[0]
        
        if table_name not in embeddingModel:
            print('TABLE ',table_name, 'SKIPPED')
            continue
        
        print('INDEXING TABLE ',table_name)
        
        attributeHash[table_name] = {}
        
        #Get all tuples for this tablename
        cur.execute(
            sql.SQL("SELECT ctid, * FROM {};").format(sql.Identifier(table_name))
            #NOTE: sql.SQL is needed to specify this parameter as table name (can't be passed as execute second parameter)
        )
        printSkippedColumns = True
        for row in cur.fetchall(): 
            for column in range(1,len(row)):
                column_name = cur.description[column][0] 
                
                if column_name not in embeddingModel or column_name=='id':
                    if printSkippedColumns:
                        print('\tCOLUMN ',column_name,' SKIPPED')
                    continue
                
                ctid = row[0]

                for word in [word.strip(string.punctuation) for word in str(row[column]).lower().split()]:
                    
                    #Ignoring STOPWORDS
                    if word in stw_set:
                        continue

                    #If word entry doesn't exists, it will be inicialized (setdefault method),
                    #Append the location for this word
                    wordHash.setdefault(word, {})                    
                    wordHash[word].setdefault( table_name , {} )
                    wordHash[word][table_name].setdefault( column_name , [] ).append(ctid)
                    
                    attributeHash[table_name].setdefault(column_name,(0,set()))
                    attributeHash[table_name][column_name][1].add(word)
            printSkippedColumns=False
        
        #Count words
        
        for (column_name,(norm,wordSet)) in attributeHash[table_name].items():
            num_distinct_words = len(wordSet)
            wordSet.clear()
            attributeHash[table_name][column_name] = (norm,num_distinct_words)
        

    print ('INVERTED INDEX CREATED')
    return (wordHash,attributeHash)

In [6]:
# (wordHash,attributeHash) = createInvertedIndex(wordEmbeddingsModel)

In [7]:
#pp(wordHash['denzel'])

In [8]:
#pp(attributeHash)

In [9]:
def processIAF(wordHash,attributeHash):
    
    total_attributes = sum([len(attribute) for attribute in attributeHash.values()])
    
    for (term, values) in wordHash.items():
        
        attributes_with_this_term = sum([len(attribute) for attribute in wordHash[term].values()])
        
        IAF = log1p(total_attributes/attributes_with_this_term)
                
        wordHash[term] = (IAF,values)
    print('IAF PROCESSED')

In [10]:
# processIAF(wordHash,attributeHash)

In [11]:
#pp(wordHash['denzel'])

In [12]:
def processNormsOfAttributes(wordHash,attributeHash,embeddingModel):
  
    # Get list of tablenames
    cur.execute("SELECT DISTINCT tablename FROM pg_tables WHERE schemaname!='pg_catalog' AND schemaname !='information_schema';")
    for table in cur.fetchall():
        table_name = table[0]
        
        if table_name not in embeddingModel:
            print('TABLE ',table_name, 'SKIPPED')
            continue
        
        print('PROCESSING TABLE ',table_name)
        
        #Get all tuples for this tablename
        cur.execute(
            sql.SQL("SELECT ctid, * FROM {};").format(sql.Identifier(table_name))
            #NOTE: sql.SQL is needed to specify this parameter as table name (can't be passed as execute second parameter)
        )
        
        printSkippedColumns = False
        for row in cur.fetchall():
            for column in range(1,len(row)):
                column_name = cur.description[column][0]  
                
                if column_name not in embeddingModel or column_name=='id':
                    if printSkippedColumns:
                        print('\tCOLUMN ',column_name,' SKIPPED')
                    continue
                
                ctid = row[0]

                for word in [word.strip(string.punctuation) for word in str(row[column]).lower().split()]:
                    
                    #Ignoring STOPWORDS
                    if word in stw_set:
                        continue
                    
                    (prevNorm,num_distinct_words)=attributeHash[table_name][column_name]
                    
                    IAF = wordHash[word][0]
                    
                    Norm = prevNorm + IAF
                    
                    attributeHash[table_name][column_name]=(Norm,num_distinct_words)
            printSkippedColumns = False

    print ('NORMS OF ATTRIBUTES PROCESSED')

In [13]:
# processNormsOfAttributes(wordHash,attributeHash,wordEmbeddingsModel)

In [14]:
#pp(wordHash['denzel'])

In [15]:
#pp(attributeHash)

## Main

O processamento das consultas é realizado em 

In [16]:
def getQuerySets(filename='querysets/queryset_imdb_martins.txt'):
    QuerySet = []
    with open(filename,encoding='utf-8-sig') as f:
        for line in f.readlines():
            
            #The line bellow Remove words not in OLIVEIRA experiments
            #Q = [word.strip(string.punctuation) for word in line.split() if word not in ['title','dr.',"here's",'char','name'] and word not in stw_set]  
            
            Q = tuple([word.strip(string.punctuation) for word in line.lower().split() if word not in stw_set])
            
            QuerySet.append(Q)
    return QuerySet

In [17]:
# QuerySet = getQuerySets()
# QuerySet

### Recuperação de Tuple-sets
Esta etapa consiste em recuperar conjuntos de tuplas que contém cada palavra-chave, chamados de tuple-sets. O algoritmo `TSFind`, que realiza esse processo, pode ser é divido em três partes: 
* **Recuperação de tuplas:** Essa parte consiste em encontrar os conjuntos de tuplas que contém cada uma das palavras do Queryset. Essas informações já foram pré-processadas no índice invertido `wordHash`.
* **Interseção de tuplas:** Esta parte acontece no algoritmo `TSInter` e é responsável por encontrar tuplas que contém mais de uma das palavras-chave. Além disso, esta etapa irá garantir que os tuple-sets `TABLE{word}` contenham apenas a palavra `word` e nenhuma outra palavra do queryset. Esta propriedade é necessária para encontrar a cobertura mínima (etapa de criação de query matches). 
* **Criação de tuple-sets:** Esta parte irá condensar os resultados. Em vez de listar todas as tuplas que contenham as palavras-chave, precisamos apenas saber quais colunas possuem cada uma das palavras. Por isso, os tuple-sets terão a estrutura (o primeiro atributo refere-se a *value* ou *schema*):
```python
TupleSet = ('table','column', frozenset({schemaWords}), frozenset({valueWords}))
```

In [18]:
def TSFind(Q):
    #Input:  A keyword query Q=[k1, k2, . . . , km]
    #Output: Set of non-free and non-empty tuple-sets Rq

    '''
    The tuple-set Rki contains the tuples of Ri that contain all
    terms of K and no other keywords from Q
    '''
    
    #Part 1: Find sets of tuples containing each keyword
    global P
    P = {}
    for keyword in Q:
        tupleset = set()
        
        if keyword not in wordHash:
            continue
        
        for (table,attributes) in wordHash.get(keyword)[1].items():
            for (attribute,ctids) in attributes.items():
                for ctid in ctids:
                    tupleset.add( (table,attribute,ctid) )
        P[frozenset([keyword])] = tupleset
    
    #Part 2: Find sets of tuples containing larger termsets
    P = TSInterMartins(P)
    
    #Part 3:Build tuple-sets
    Rq = set()
    
    schemaWords = frozenset()
    for valueWords , tuples in P.items():
        for (table,attribute,ctid) in tuples:
            Rq.add( (table,attribute,schemaWords,valueWords) )
    #print ('TUPLE SETS CREATED')
    return Rq


def TSInter(P):
    #Input: A Set of non-empty tuple-sets for each keyword alone P 
    #Output: The Set P, but now including larger termsets (process Intersections)

    '''
    Termset is any non-empty subset K of the terms of a query Q        
    '''
    
    Pprev = {}
    Pprev=copy.deepcopy(P)
    Pcurr = {}

    combinations = [x for x in itertools.combinations(Pprev.keys(),2)]
    for ( Ki , Kj ) in combinations:
        Tki = Pprev[Ki]
        Tkj = Pprev[Kj]
        
        X = Ki | Kj
        Tx = Tki & Tkj        
        
        if len(Tx) > 0:            
            Pcurr[X]  = Tx            
            Pprev[Ki] = Tki - Tx         
            Pprev[Kj] = Tkj - Tx
            
    if Pcurr != {}:
        Pcurr = copy.deepcopy(TSInter(Pcurr))
        
    #Pprev = Pprev U Pcurr
    Pprev.update(Pcurr)     
    return Pprev   


def TSInterMartins(P):
    #Input: A Set of non-empty tuple-sets for each keyword alone P 
    #Output: The Set P, but now including larger termsets (process Intersections)

    '''
    Termset is any non-empty subset K of the terms of a query Q        
    '''
    somethingChanged = False
    
    combinations = [x for x in itertools.combinations(P.keys(),2)]
    for ( Ki , Kj ) in combinations:
        Tki = P[Ki]
        Tkj = P[Kj]
        
        X = Ki | Kj
        Tx = Tki & Tkj        
        
        if len(Tx) > 0:            
            P[X]  = Tx            
            P[Ki] = Tki - Tx         
            P[Kj] = Tkj - Tx
            somethingChanged = True
            
    if somethingChanged:
        TSInterMartins(P)   
    return P

In [19]:
# Q = ['actor', 'james', 'bond']
# Rq = TSFind(Q)
# pp(Rq)

### Criação Schema-sets

Esta etapa consiste na criação dos Schema-sets, que é uma estrutura análoga aos tuple-sets vistos na etapa anterior. Aqui, o processo também é divido em três partes: 
* **Mapeamento de Elementos do Esquema (*Schema Matching*):** Essa parte consiste em analisar a similaridade entre as palavras do querysets e elementos do esquema (nomes de relações e atributos).
* **Análise de Termos Adjacentes:** Esta parte verifica as relações entre as palavras chave, muitas vezes uma palavras-chave relacioada a elemento do esquema delimita o domínio das palavras-chave adjacentes. Ex: Actor James Bond delimita a palavra James para nome de Pessoa, em vez de nome de Filme.
* **Criação de Schema-sets:** Esta parte irá formatar os resultados para ficarem semelhantes à estrutura de tuple-sets, seguindo a estrutura a seguir (o primeiro atributo refere-se a *value* ou *schema*):
```python
SchemaSet = ('s','table','column', frozenset({words}))
```

#### Similaridades para o Schema-Matching

Para o mapeamento de palavras para elementos do esquema, foram utilizadas métricas de similaridade de escrita e semântica.
O Coeficiente de Jaccard é uma métrica que avalia a interseção entre duas palavras, sendo ideal para similaridades de escrita, como abreviações ou erros de digitação. 

Por outro lado, as métricas semânticas utilizam o dicionário léxico WordNet para encontrar similaridades de sentido. O pacote de ferramentas NLTK disponibiliza uma série de métricas semânticas [aqui](http://www.nltk.org/howto/wordnet.html "WordNet Interface"). Entre elas, as principais são a Path Similarity e a Wu-Palmer Similarity. A primeira métrica procura encontrar a menor distância entre duas palavras, no grafo de relações do WordNet, enquanto a segunda analisa o ancestral comum mais próximo entre duas palavras.

In [20]:
def wordnet_similarity(wordA,wordB):
    
    A = set(wn.synsets(wordA))
    B = set(wn.synsets(wordB))
    
    wupSimilarities = [0]
    pathSimilarities = [0]
    for (sense1,sense2) in itertools.product(A,B):        
        wupSimilarities.append(wn.wup_similarity(sense1,sense2) or 0)
        pathSimilarities.append(wn.path_similarity(sense1,sense2) or 0)
    return max(max(wupSimilarities),max(pathSimilarities))

def jaccard_similarity(wordA,wordB):
    
    A = set(wordA)
    B = set(wordB)
    
    return len(A & B ) / len(A | B)

In [21]:
def getSchemaGraph():
    #Output: A Schema Graph G  with the structure below:
    # G['node'] = edges
    # G['table'] = { 'foreign_table' : (direction, column, foreign_column) }
    
    
    G = {} 
    cur.execute("SELECT tablename FROM pg_tables WHERE schemaname!='pg_catalog' AND schemaname !='information_schema';")
    for table in cur.fetchall():
        G.setdefault(table[0],{})
    
    sql = "SELECT DISTINCT                 tc.table_name, kcu.column_name,                 ccu.table_name AS foreign_table_name, ccu.column_name AS foreign_column_name             FROM information_schema.table_constraints AS tc              JOIN information_schema.key_column_usage AS kcu                 ON tc.constraint_name = kcu.constraint_name             JOIN information_schema.constraint_column_usage AS ccu                 ON ccu.constraint_name = tc.constraint_name             WHERE constraint_type = 'FOREIGN KEY'"
    cur.execute(sql)
    relations = cur.fetchall()
    
    for (table,column,foreign_table,foreign_column) in relations:
        G[table][foreign_table] = (1,column, foreign_column)
        G[foreign_table][table] = (-1,foreign_column,column)
    print ('SCHEMA CREATED')
    return G

In [22]:
# G = getSchemaGraph()
# G

In [23]:
def createEmbeddingsHash(model,attributeHash,weight=0.5):
    
    wordEmbeddingsHashA = {}
    
    for table in attributeHash:
        
        if table not in model:
            continue
        
        wordEmbeddingsHashA[table]={word.lower() for word,sim in model.most_similar(table)}
        #wordEmbeddingsHashA[table]={wnl.lemmatize(word).lower() for word,sim in model.most_similar(table)}
            
        for column in attributeHash[table]:
            if column not in model or column=='id':
                continue
            wordEmbeddingsHashA[column]={wnl.lemmatize(word).lower() for word,sim in model.most_similar(column)}
            #wordEmbeddingsHashA[column]={wnl.lemmatize(word).lower() for word,sim in model.most_similar(column)}
    
    wordEmbeddingsHashB = copy.deepcopy(wordEmbeddingsHashA)
    
    for table in attributeHash:
        
        if table not in model:
            continue
        
        for column in attributeHash[table]:
            
            if column not in model or column=='id':
                continue
            
            similarSet = { wnl.lemmatize(word).lower() for word,sim in model.most_similar(positive=(table,column))}
            wordEmbeddingsHashB[column].update(similarSet)
            
    G = getSchemaGraph()
    for tableA in G:
        
        if tableA not in model:
            continue
        
        for tableB in G[tableA]:
            
            if tableB not in model:
                continue
            
            similarSet = { wnl.lemmatize(word).lower() for word,sim in model.most_similar(positive=(tableA,tableB))}
            wordEmbeddingsHashB[tableA].update(similarSet)
            wordEmbeddingsHashB[tableB].update(similarSet)
            
            
            
    wordEmbeddingsHashC = copy.deepcopy(wordEmbeddingsHashA)
    
    for table in attributeHash:
        
        if table not in model:
            continue
        
        for column in attributeHash[table]:
            
            if column not in model or column=='id':
                continue
            
            avg_vec = (model[table]*weight + model[column]*(1-weight))   
            similarSet = { wnl.lemmatize(word).lower() 
                          for word,sim in model.similar_by_vector(avg_vec)}
            wordEmbeddingsHashC[column].update(similarSet)
            
    G = getSchemaGraph()
    for tableA in G:
        
        if tableA not in model:
            continue
        
        for tableB in G[tableA]:
            
            if tableB not in model:
                continue
            
            avg_vec = (model[table]*weight + model[column]*(1-weight))
            similarSet = { wnl.lemmatize(word).lower() 
                          for word,sim in model.similar_by_vector(avg_vec)}
            wordEmbeddingsHashC[tableA].update(similarSet)            
    
    return wordEmbeddingsHashA,wordEmbeddingsHashB,wordEmbeddingsHashC

In [24]:
def embedding10_similarity(schema,word,wordEmbeddingsHash):
    if schema not in wordEmbeddingsHash:
        return 0
    
    #lemmatize is used to remove plural form   wnl.lemmatize('wolves')='wolf'
    if wnl.lemmatize(word) in wordEmbeddingsHash[schema]:
        return 1
    else:
        return 0        

In [25]:
def embedding_similarity(wordA,wordB,model):
    if wordA not in model or wordB not in model:
        return 0
    return model.similarity(wordA,wordB)

#### Algoritmo para Criação dos Schema-Sets

In [26]:
def word_similarity(schema_term,word,
                    wn_sim=True, jaccard_sim=True,
                    emb_sim=False,  emb_model=None,
                    emb10_sim=False, emb10_hash=None):
    
    
    sim_list=[0]
    
    if wn_sim:
        sim_list.append( wordnet_similarity(schema_term,word) )

    if jaccard_sim:
        sim_list.append( jaccard_similarity(schema_term,word) )

    if emb_sim and emb_model is not None:
        sim_list.append( embedding_similarity(schema_term,word,emb_model) )

    sim = max(sim_list) 

    if emb10_sim and emb10_hash is not None:
        if embedding10_similarity(schema_term,word,emb10_hash) == 0:
            sim=0
        else:
            if len(sim_list)==1:
                sim=1

    return sim

In [27]:
def SchSFind(Q,threshold=0.8, 
             sim_args={}):    
    S = []
    for keyword in Q:
        for (table,values) in attributeHash.items():
            
            sim = word_similarity(table,keyword,**sim_args)
            
            if sim >= threshold:
                S.append( (table,'*',{keyword},sim) )
            
            for attribute in values.keys():
                
                if(attribute=='id'):
                    continue
                
                sim = word_similarity(attribute,keyword,**sim_args)
                
                if sim >= threshold:
                    S.append( (table,attribute,{keyword},sim) )
    #S = SchSInter(S)

    #print ('SCHEMA SETS CREATED')
    valueWords = frozenset()
    Sq = {(table,attribute,frozenset(schemaWords),valueWords) for (table,attribute,schemaWords,sim) in S}
        
    return Sq

In [28]:
# wordEmbeddingsModel=loadWordEmbeddingsModel()
# (wordEmbeddingsHashA,wordEmbeddingsHashB,wordEmbeddingsHashC) = createEmbeddingsHash(wordEmbeddingsModel,attributeHash,weight=0.5)

In [29]:
# Q = ['actor', 'james', 'bond']
# SimilarityCoeficient = 0.799999999999
# Sq = SchSFind(Q,SimilarityCoeficient,{'emb10_sim':True,'emb10_hash':wordEmbeddingsHashB})
# Sq

### Criação de Query Matches

As etapas anteriores, de criação de schema-sets e tuple-sets, foram responsáveis por identificar quais relações possuem alguma informação sobre as palavras-chave. Nesta etapa de criação de full matches, o objetivo é combinar esses tuple-sets e schema-sets para se obter uma resposta completa, mínima e relevante para o usuário. 

O algoritmo `QMGen` é responsável por encontrar combinações de tuple-sets/schema-sets que compõem uma cobertura mínima (`MinimalCover`) sobre o queryset.
- **Total**: Cada palavra-chave deve estar presente em ao menos uma das tuplas da query-match.
- **Mínima**: Não é possível remover nenhum tuple-set/schema-set da query-match e manter a cobertura total sobre o queryset.

In [30]:
def MinimalCover(MC, Q):
    #Input:  A subset MC (Match Candidate) to be checked as total and minimal cover
    #Output: If the match candidate is a TOTAL and MINIMAL cover

    Subset = [schemaWords|valueWords for table,attribute,schemaWords,valueWords in MC]
    u = set().union(*Subset)    
    
    isTotal = (u == set(Q))
    for element in Subset:
        
        new_u = list(Subset)
        new_u.remove(element)
        
        new_u = set().union(*new_u)
        
        if new_u == set(Q):
            return False
    
    return isTotal

In [31]:
def QMGen(Q,Rq):
    #Input:  A keyword query Q, The set of non-empty non-free tuple-sets Rq
    #Output: The set Mq of query matches for Q
    
    '''
    Query match is a set of tuple-sets that, if properly joined,
    can produce networks of tuples that fulfill the query. They
    can be thought as the leaves of a Candidate Network.
    
    '''
    
    Mq = []
    for i in range(1,len(Q)+1):
        for subset in itertools.combinations(Rq,i):            
            if(MinimalCover(subset,Q)):
                #print('----------------------------------------------\nM')
                #pp(set(subset))
                #print('\n')
                M = MInter(set(subset))
                #pp(M)
                Mq.append(M)
                
                
    return Mq

def MInter(M):
    #print('M',M)
    Mprev = copy.deepcopy(M)
    Mcurr = set()

    combinations = [x for x in itertools.combinations(Mprev,2)]

    
    for ( (tableA,attributeA,schemaWordsA,valueWordsA) , (tableB,attributeB,schemaWordsB,valueWordsB) ) in combinations:
          
        #se  forem tabelas diferentes ou não tiverem value words mapeadas em ambos os tuplesets
        if (tableA!=tableB) or (len(valueWordsA)>0 and len(valueWordsB)>0):
            continue             
        
        tableC=tableA
        
        if len(valueWordsA)>0:
            attributeC=attributeA
        else:
            attributeC=attributeB
        
        schemaWordsC = schemaWordsA|schemaWordsB
        valueWordsC  = valueWordsA | valueWordsB #levando em consideração que um deles é vazio
        
        Mcurr.add( (tableC,attributeC,frozenset(schemaWordsC),frozenset(valueWordsC)) )
        
        Mprev = Mprev - {(tableA,attributeA,schemaWordsA,valueWordsA)}
        Mprev = Mprev - {(tableB,attributeB,schemaWordsB,valueWordsB)}
            
    if len(Mcurr)>0:
        Mcurr = copy.deepcopy(MInter(Mcurr))
        
    Mprev.update(Mcurr)     
    return Mprev   

In [32]:
# Q = ['actor', 'james', 'bond']

# Rq = TSFind(Q)

# SimilarityCoeficient = 0.799999999999
# Sq = SchSFind(Q,SimilarityCoeficient,{'emb10_sim':True,'emb10_hash':wordEmbeddingsHashB})

# Mq= QMGen(Q,Rq|Sq)

# for element in Mq:
#     pp(element)
#     print()

In [33]:
def QMRank(Mq,mi,smi,sim_args={}):
    Ranking = []
    for M in Mq:
        cosprod = schemaprod = 1
        thereIsValueTerms = thereIsSchemaTerms = False
        
        for (table,attribute,schemaWords,valueWords) in M:           
            
            if (len(valueWords)>0):
                
                thereIsValueTerms=True
                
                (norm_attribute,distinct_terms) = attributeHash[table][attribute]

                wsum = 0

                for term in valueWords:

                    IAF = wordHash[term][0]

                    ctids = wordHash[term][1][table][attribute]
                    fkj = len(ctids)

                    if fkj>0:

                        TF = log1p(fkj) / log1p(distinct_terms)

                        wsum = wsum + TF*IAF
                
                cos = wsum/norm_attribute
                cosprod *= cos
                
            if (len(schemaWords)>0):
                
                thereIsSchemaTerms=True
                
                if(attribute == '*'):
                    schemaElement = table
                else:
                    schemaElement = attribute
                
                schemasum = 0
                
                for term in schemaWords:
                    schemasum+=word_similarity(schemaElement,term,sim_args)
                
                schemaprod *= schemasum
                
        valuescore = schemascore = 0
        
        # O tamanho da query match não está sendo considerado no ranking, mas será analisado no ranking de Cns.
        #score = 1/len(M)
        score = 1.0
        
        if thereIsValueTerms:
            valuescore = mi * cosprod 
            score*=valuescore
        
        if thereIsSchemaTerms:
           
            schemascore = smi * schemaprod
            score*=schemascore
            
        Ranking.append( (M,score,schemascore,valuescore) )
    return sorted(Ranking,key=lambda x: x[1],reverse=True)

In [34]:
# mi = 46457610.86662768
# smi = 1

# RankedMq = QMRank(Mq,mi,smi)


# for (j, (M,score,schemascore,valuescore) ) in enumerate(RankedMq):
#     if j>10:
#         break
#     print(j+1,'ª QM')
#     print('Schema Score:',"%.8f" % schemascore,
#           '\nValue Score: ',"%.8f" % valuescore,
#           '\n|M|: ',"%02d (Não considerado para calcular o total score)" % len(M),
#           '\nTotal Score: ',"%.8f" % score)
#     pp(M)
#     print('----------------------------------------------------------------------\n')

### Criação e Ranking de Candidate Networks

Na etapa anterior, obteve-se as full matches, que compreendem todas as informações necessárias para o usuário. O próximo passo é encontrar maneiras de conectar estas informações para formar uma resposta para o usuário. Estas conexões, chamadas de candidate networks, são derivadas das restrições de integridade referencial do banco de dados, também conhecidas como chaves estrangeiras.

A criação de candidate networks utiliza dois grafos:
- **Schema Graph**: vértice que representa o banco de dados e é utilizado como base para o match graph. Ele contém como vértices os free tuple-sets associados a cada relação do banco de dados e como arestas as restrições de integridade referencial.

    O Schema Graph foi implementado como um dicionário, no qual cada vértice aponta para um outro vértice. Além disso, também é armazenada informações sobre as arestas, como direção e quais atributos entre as tabelas tem a relação de restrição referencial. A estrutura do Schema Graph pode ser observada a seguir:
   
```python
    G['table'] = { 'foreign_table' : (direction, column, foreign_column) }
```

Como existem diferentes maneiras de se conectar as informações associadas as palavras-chave, várias candidate networks serão geradas. Entretanto, na maioria das vezes, apenas uma delas contém uma resposta relevante para o usuário. Por este motivo, esta esta etapa irá ranquear as candidate networks por relevância.

In [35]:
def getSchemaGraph():
    #Output: A Schema Graph G  with the structure below:
    # G['node'] = edges
    # G['table'] = { 'foreign_table' : (direction, column, foreign_column) }
    
    
    G = {} 
    cur.execute("SELECT tablename FROM pg_tables WHERE schemaname!='pg_catalog' AND schemaname !='information_schema';")
    for table in cur.fetchall():
        G.setdefault(table[0],{})
    
    sql = '''
        SELECT DISTINCT
            tc.table_name, kcu.column_name,
            ccu.table_name AS foreign_table_name, ccu.column_name AS foreign_column_name             
        FROM
            information_schema.table_constraints AS tc
            JOIN information_schema.key_column_usage AS kcu 
                ON tc.constraint_name = kcu.constraint_name
            JOIN information_schema.constraint_column_usage AS ccu 
                ON ccu.constraint_name = tc.constraint_name
        WHERE constraint_type = 'FOREIGN KEY'
    '''
    cur.execute(sql)
    relations = cur.fetchall()
    
    for (table,column,foreign_table,foreign_column) in relations:
        G[table][foreign_table] = (1,column, foreign_column)
        G[foreign_table][table] = (-1,foreign_column,column)
    return G

In [36]:
# G = getSchemaGraph()
# G


- **Match Graph**: grafo gerado a partir de uma query match e o schema graph. No entanto, no match graph tuple-sets/schema-sets também são modelados como vértices. Para criá-lo, adiciona-se ao schema graph os tuple-sets/schema-sets presentes na query match. Um tuple-set de uma tabela x terá os mesmos relacionamentos (arestas) que o vértice x.

```python
    Gts['table'] = { 'foreign_table' : (direction, column, foreign_column) }

    Gts[('s','table','column', frozenset({words}))] = { 'foreign_table' : (direction, column, foreign_column) }
```

In [37]:
def MatchGraph(Rq, G, M):
    #Input:  The set of non-empty non-free tuple-sets Rq,
    #        The Schema Graph G,
    #        A Query Match M
    #Output: A Schema Graph Gts  with the structure below:
    # G['node'] = edges
    # G['table'] = { 'foreign_table' : (direction, column, foreign_column) }

    '''
    A Match Subgraph Gts[M] is a subgraph of G that contains:
        The set of free tuple-sets of G
        The query match M
    '''
    
    Gts = copy.deepcopy(G)
    
    #Insert non-free nodes
    for (table ,attribute, schemaWords, valueWords) in M:
        Gts[(table ,attribute, schemaWords, valueWords)]=copy.deepcopy(Gts[table])
        for foreign_table , (direction,column,foreign_column) in Gts[(table ,attribute, schemaWords, valueWords)].items():
            Gts[foreign_table][(table ,attribute, schemaWords, valueWords)] = (direction*(-1),foreign_column,column)
    return Gts 

In [38]:
# M = RankedMq[0][0]
# Gts = MatchGraph(Rq|Sq, G, M)

# print('QM:')
# pp(M)
# print('\nGts:')
# pp(Gts)

#### Algoritmo para Criação e Ranking de Candidate Networks

Para criar uma candidate network, o algoritmo `SingleCN` procura um caminho mínimo no match graph que visite todas os non-free tuple-sets/schema-sets da query match. 

Este caminho deve ser:
- **Mínimo:** garantido através do algoritmo de caminho mínima baseado em busca por largura (BFS).
- **Total:** a função `containsMatch` garante que todos os tuple-sets/schema-sets da query match sejam visitados.
- **Seguro (*Sound*):** uma joining networks of tuple-sets é considerado sound se ela não contém uma subárvore na forma $R^K - S^L - R^M $, na qual $R$ e $S$ são relações e o schema graph tem uma aresta $R \rightarrow S$.

O ranking das Candidate Networks agora é feito parcialmente na etapa de ranking de Query Matches. Restando apenas penalizar Candidate Networks grandes, dividindo o score pelo seu tamanho.

In [39]:
def containsMatch(Ji,M):
    for relation in M:
        if relation not in Ji:
            return False
    return True

def isJNTSound(Gts,Ji):
    if len(Ji)<3:
        return True
    
    for i in range(len(Ji)-2):
        
        if type(Ji[i]) is str:
            tableA = Ji[i]
        else:
            (tableA,attributeA,schemaWordsA,valueWordsA) = Ji[i]
            
        if type(Ji[i+2]) is str:
            tableB = Ji[i+2]
        else:
            (tableB,attributeB,schemaWordsB,valueWordsB) = Ji[i+2]         
            
        if tableA==tableB:
            edge_info = Gts[Ji[i]][Ji[i+1]]
            if(edge_info[0] == -1):
                return False
    return True

In [40]:
def SingleCN(FM,Gts,TMax=5,showLog=False):  
  
    if showLog:
        print('================================================================================\nSINGLE CN')
        print('Tmax ',Tmax)
        print('FM')
        pp(FM)

        print('\n\nGts')
        pp(Gts)
        print('\n\n')
    
    F = deque()

    first_element = list(FM)[0]
    J = [first_element]
    
    if len(FM)==1:
        return J
    
    F.append(J)
    
    while F:
        J = F.popleft()           
        u = J[-1]
        
        sortedAdjacents = sorted(Gts[u].items(),key=lambda x : type(x[0]) is str)
        
        if showLog:
            print('--------------------------------------------\nParctial CN')
            print('J ',J,'\n')

            print('\nAdjacents:')
            pp(Gts[u].items())
            
            print('\nSorted Adjacents:')
            pp(sortedAdjacents)
            
            print('F:')
            pp(F)
    
        for (adjacent,edge_info) in sortedAdjacents:
            if showLog:
                pp(adjacent)
                print('is str',(type(adjacent) is str),'notinJ',(adjacent not in J))
            if (type(adjacent) is str) or (adjacent not in J):
                Ji = J + [adjacent]
                
                
                if (Ji not in F) and (len(Ji)<TMax) and (isJNTSound(Gts,Ji)):
                    
                    if showLog:
                        print('isSound:')
                    
                    if(containsMatch(Ji,FM)):
                        
                        if showLog:
                            print('--------------------------------------------\nGenerated CN')
                            print('J ',Ji,'\n')
                        
                        return Ji
                    else:
                        F.append(Ji)

In [41]:
# SingleCN(M,Gts,10)

In [42]:
def MatchCN(G,Sq,Rq,RankedMq,TMax=5):    
    Cns = []                        
    for  (M,score,schemascore,valuescore) in RankedMq:
        Gts = MatchGraph(Rq|Sq, G, M)
        Cn = SingleCN(M,Gts,TMax=TMax)
        if(Cn is not None):
            
            
            #Dividindo score pelo tamanho da cn (SEGUNDA PARTE DO RANKING)
            
            CnScore = score/len(Cn)
            
            Cns.append( (Cn,Gts,M,CnScore,schemascore,valuescore) )
    
    #Ordena CNs pelo CnScore
    RankedCns=sorted(Cns,key=lambda x: x[3],reverse=True)
    
    return RankedCns

In [43]:
# RankedCns = MatchCN(G,Sq,Rq,RankedMq)
# for (j, (Cn,Gts,M,score,schemascore,valuescore) ) in enumerate(RankedCns):
#     if j>10:
#         break
#     print(j+1,'ª CN')
#     print('Schema Score:',"%.8f" % schemascore,
#           '\nValue Score: ',"%.8f" % valuescore,
#           '\n|Cn|: ',"%02d (Considerado para o Total Score)" % len(Cn),
#           '\nTotal Score: ',"%.8f" % score)
#     pp(Cn)
#     print('----------------------------------------------------------------------\n')

In [44]:
def getSQLfromCN(Gts,Cn,contract=True):
    selected_attributes = [] 
    hashTables = {}
    conditions=[]
    relationships = set()
    
    tables_id=[]
    tables=[]
    joincondiditions=[]
    
    for i in range(len(Cn)):
        
        if(type(Cn[i]) is str):
            tableA = Cn[i]
            attrA=''
            valueWords=[]
        else:
            (tableA,attrA, _ ,valueWords) = Cn[i]             
                
        A = 't' + str(i)
        
        if contract and type(Cn[i]) is str:
            A = hashTables.setdefault(tableA, [A])[0]
        else:
            hashTables.setdefault(tableA, []).append(A)            
        
        if(attrA != ''):
            selected_attributes.append(A +'.'+ attrA)
                    
        #tratamento de keywords
        for term in valueWords:
            condition = 'CAST('+A +'.'+ attrA + ' AS VARCHAR) ILIKE \'%' + term + '%\''
            conditions.append(condition)
        
        
        #tratamento de join paths
        if (i>0):
            # B se refere ao tupleset anterior
            if(type(Cn[i-1]) is str):
                tableB = Cn[i-1]
            else:
                (tableB,attrB, _ , _ )=Cn[i-1]
            
            # B vai receber o último valor de tx adicionado em hashTables[tableB]
            B = hashTables[tableB][-1]
            
            edge_info = Gts[Cn[i]][Cn[i-1]]
            (direction,joining_attrA,joining_attrB) = edge_info
            
            joincondiditions.append(A + '.' + joining_attrA + ' = ' + B + '.' + joining_attrB)
            
            relationships.add( frozenset([B,A]) ) 
    
    for tableX in hashTables.keys():
        for tx in hashTables[tableX]:
            tables_id.append(tx+'.__search_id')
            tables.append(tableX+' '+tx)
            
        
    relationshipsText = ['('+a+'.__search_id'+','+b+'.__search_id'+')' for (a,b) in relationships]
    
    sqlText = 'SELECT \n'
    sqlText +=' ('+', '.join(tables_id)+') AS Tuples,\n '
    if len(relationships)>0:
        sqlText +='('+', '.join(relationshipsText)+') AS Relationships,\n '
        
    sqlText += ' ,\n '.join(selected_attributes)
    
    sqlText +='\nFROM\n ' + ',\n '.join(tables)
    
    sqlText +='\nWHERE\n '
    
    # Considerando que todas as pequisas tem ao menos um value term
    if  len(conditions)==0:
        sqlText+= ' 1=2'
        return sqlText
    
    sqlText +='\n AND '.join(joincondiditions)
    sqlText +='\n'
    if len(joincondiditions)>0:
        sqlText +='\n AND '
    sqlText +='\n AND '.join(conditions)
    
    
    #Considerando que nenhuma consulta tem mais de 1000 linhas no resultado
    sqlText += '\n LIMIT 1000'
    '''
    print('SELECT:\n',selected_attributes)
    print('TABLES:\n',hashTables)
    print('CONDITIONS:')
    pp(conditions)
    print('RELATIONSHIPS:')
    pp(relationships)
    '''    
    #print('SQL:\n',sql)
    return sqlText

In [45]:
# cn=[('person', '*', frozenset({'actor'}), frozenset()), 'casting', ('character', 'name', frozenset(), frozenset({'draco'})), 'casting', ('movie', 'title', frozenset(), frozenset({'potter', 'harry'}))]
# gts={'movie_info': {'movie': (1, 'movie_id', 'id'), ('movie', 'title', frozenset(), frozenset({'potter', 'harry'})): (1, 'movie_id', 'id')}, 'person': {'casting': (-1, 'id', 'person_id')}, 'movie': {'casting': (-1, 'id', 'movie_id'), 'movie_info': (-1, 'id', 'movie_id')}, 'role': {'casting': (-1, 'id', 'role_id')}, 'character': {'casting': (-1, 'id', 'person_role_id')}, 'casting': {'movie': (1, 'movie_id', 'id'), 'person': (1, 'person_id', 'id'), 'character': (1, 'person_role_id', 'id'), 'role': (1, 'role_id', 'id'), ('person', '*', frozenset({'actor'}), frozenset()): (1, 'person_id', 'id'), ('character', 'name', frozenset(), frozenset({'draco'})): (1, 'person_role_id', 'id'), ('movie', 'title', frozenset(), frozenset({'potter', 'harry'})): (1, 'movie_id', 'id')}, ('person', '*', frozenset({'actor'}), frozenset()): {'casting': (-1, 'id', 'person_id')}, ('character', 'name', frozenset(), frozenset({'draco'})): {'casting': (-1, 'id', 'person_role_id')}, ('movie', 'title', frozenset(), frozenset({'potter', 'harry'})): {'casting': (-1, 'id', 'movie_id'), 'movie_info': (-1, 'id', 'movie_id')}} 

# print(getSQLfromCN(gts,cn,contract=True))

In [46]:
# for (j, (Cn,Gts,M,score,schemascore,valuescore) ) in enumerate(RankedCns):
#     pp(Cn)
#     print('\n',getSQLfromCN(Gts,Cn))
#     print('\n--------------------------------------------')

In [47]:
[i for i in range(1,5)]

[1, 2, 3, 4]

In [48]:
def getGoldenStandards(goldenStandardsFileName='golden_standards/imdb_martins',numQueries=11):
    goldenStandards = {}
    for i in range(1,numQueries+1):
        filename = goldenStandardsFileName+'/'+str(i).zfill(3) +'.txt'
        with open(filename) as f:

            listOfTuples = []
            Q = ()
            for j, line in enumerate(f.readlines()):
                
                splitedLine = line.split('#')
                
                line_without_comment=splitedLine[0]
                
                if len(splitedLine)>1:
                    comment_of_line=splitedLine[1]
                
                    if(j==2):
                        query = comment_of_line
                        Q = tuple([word.strip(string.punctuation) for word in query.lower().split() if word not in stw_set])
                    
                if line_without_comment:                    
                    
                    relevantResult = eval(line_without_comment)
                    listOfTuples.append( relevantResult )
            
            goldenStandards[Q]=listOfTuples
            
    return goldenStandards


In [49]:
goldenStandards = getGoldenStandards()
goldenStandards.keys()

dict_keys([('actor', 'draco', 'harry', 'potter'), ('johnny', 'depp', 'movies'), ('movie', 'steven', 'spielberg'), ('james', 'bond', 'actor', 'name'), ('protagonist', 'sound', 'music'), ('character', 'forrest', 'gump'), ('movie', 'social', 'network'), ('king', 'kong', 'actor', 'jack', 'black'), ('actor', 'fellowship', 'ring', 'return', 'king'), ('cast', 'star', 'wars'), ('actors', 'x-men')])

In [50]:
def getGoldenMappings(goldenMappingsFileName='golden_mappings/golden_mappings_imdb_martins.txt'):
    
    goldenMappings = []
    with open(goldenMappingsFileName) as f:
        for j, line in enumerate(f.readlines()):

            splitedLine = line.split('#')

            line_without_comment=splitedLine[0]

            if len(splitedLine)>1:
                comment_of_line=splitedLine[1]

            if line_without_comment:                    
                tupleset = eval(line_without_comment)
                goldenMappings.append(tupleset)

    return goldenMappings

    
    
goldenMappingsFileName='golden_mappings/golden_mappings_imdb_martins.txt'
getGoldenMappings()

[('person', '*', frozenset({'actor'}), frozenset()),
 ('movie', '*', frozenset({'movies'}), frozenset()),
 ('movie', '*', frozenset({'movie'}), frozenset()),
 ('character', 'name', frozenset({'name'}), frozenset()),
 ('person', '*', frozenset({'actor'}), frozenset()),
 ('person', 'name', frozenset({'name'}), frozenset()),
 ('character', '*', frozenset({'protagonist'}), frozenset()),
 ('character', '*', frozenset({'character'}), frozenset()),
 ('movie', '*', frozenset({'movie'}), frozenset()),
 ('person', '*', frozenset({'actor'}), frozenset()),
 ('person', '*', frozenset({'actor'}), frozenset()),
 ('casting', '*', frozenset({'cast'}), frozenset()),
 ('person', '*', frozenset({'actors'}), frozenset())]

In [51]:
def evaluateCN(CnResult,goldenStandard):
    '''
    print('Verificar se são iguais:\n')
    print('Result: \n',CnResult)
    print('Golden Result: \n',goldenStandard)
    '''
    
    tuplesOfCNResult =  set(CnResult[0])
    
    tuplesOfStandard =  set(goldenStandard[0])
        
    #Check if the CN result have all tuples in golden standard
    if tuplesOfCNResult.issuperset(tuplesOfStandard) == False:
        return False
    
    
    relationshipsOfCNResult = CnResult[1]
    
    relationshipsOfStandard = goldenStandard[1]
    
    if len(relationshipsOfCNResult)!=len(relationshipsOfStandard):
        #print('TAM OF JOIN PATHS DIFFERENT')
        
        #print('relationshipsOfCNResult')
        #pp(relationshipsOfCNResult)
        
        #print('\relationshipsOfStandard')
        #pp(relationshipsOfStandard)
        
        return False
    
    for goldenRelationship in relationshipsOfStandard:
        
        (A,B) = goldenRelationship
        
        if (A,B) not in relationshipsOfCNResult and (B,A) not in relationshipsOfCNResult:
            return False
        
    return True


def evaluanteResult(Result,Query):
    
    goldenStandard = goldenStandards[tuple(Query)]
    
    #print('RESULT')
    #pp(Result)
    
    #print('STANDARD')
    #pp(goldenStandard)
    
    for goldenRow in goldenStandard:

        found = False

        for row in Result:
            if evaluateCN(row,goldenRow):
                found = True

        if not found:
            return False
        
    return True
            

def normalizeResult(ResultFromDatabase,Description):
    normalizedResult = []
    
    if Description[1].name=='relationships':
        hasRelationships = True
    else:
        hasRelationships = False
    
    for row in ResultFromDatabase:       
        if type(row[0]) == int:
            tuples = [row[0]]
        else:
            tuples = eval(str(row[0]))
        
        if hasRelationships:
            relationships = eval(row[1])
            #print('RELATIONSHIPS')
            #pp(relationships)
            if type(relationships[0]) != int:
                relationships = [eval(element) for element in relationships]
            else:
                relationships = [relationships]
        else:
            relationships=[]
        
        normalizedResult.append( (tuples,relationships) )
    return normalizedResult

In [52]:
def getRelevantPosition(RankedCns,Q):
    
    position=0
    nonEmptyPosition=0
    
    print(Q,'\n')
    
    for (Cn,Gts,M,score,schemascore,valuescore) in RankedCns:
        
        print('*',end='')
        
        #print('CN:\n')
        #pp(Cn)
        SQL1 = getSQLfromCN(Gts,Cn,contract=True)
        SQL2 = getSQLfromCN(Gts,Cn,contract=False)
        #print('\nSQL1\n')
        #print(SQL1)
        #print('\nSQL2\n')
        #print(SQL2)
        
        def getRelevanceFromSQL(SQL):
            #print('RELAVANCE OF SQL:\n')
            #print(SQL)
            
            cur.execute(SQL)
            
            Results = cur.fetchall()
            Description = cur.description
            
            isEmpty = (len(Results)==0)

            NResults = normalizeResult(Results, Description)

            Relevance = evaluanteResult(NResults,Q)

            return (Relevance, isEmpty)
        
        (Relevance, isEmpty)=getRelevanceFromSQL(SQL1)
        if Relevance==False:
            (Relevance, isEmpty)=getRelevanceFromSQL(SQL2)
    
        position+=1
        if not isEmpty:
            nonEmptyPosition+=1
        
        if Relevance:
            print()
            return (position,nonEmptyPosition)
    print()
    return (-1,-1)

### Mais abaixo tem a execução para outras CNS (querysets)

# Execução

In [53]:
def preProcessing(emb_model="word_embeddings/word2vec/GoogleNews-vectors-negative300.bin"):
    global wordHash
    global attributeHash
    global wordEmbeddingsModel
    global wordEmbeddingsHashA
    global wordEmbeddingsHashB
    global wordEmbeddingsHashC
    
    wordEmbeddingsModel=loadWordEmbeddingsModel(emb_model)
    
    (wordHash,attributeHash) = createInvertedIndex(wordEmbeddingsModel)
    processIAF(wordHash,attributeHash)
    processNormsOfAttributes(wordHash,attributeHash,wordEmbeddingsModel)
    
    (wordEmbeddingsHashA,wordEmbeddingsHashB,wordEmbeddingsHashC) = createEmbeddingsHash(wordEmbeddingsModel,attributeHash,weight=0.5)
    
    print('PRE-PROCESSING STAGE FINISHED')

In [54]:
def main(mi,smi,sim_args={},
         showLog=False,
         SimilarityThreshold=0.9,
         querySetFileName='querysets/queryset_imdb_martins.txt',
         goldenStandardsFileName='golden_standards/imdb_martins', numQueries=11,
         goldenMappingsFileName='golden_mappings/golden_mappings_imdb_martins.txt',
         topK=10,
         TMax=10):
    QuerySets = getQuerySets(querySetFileName)
    
    global goldenStandards
    goldenStandards = getGoldenStandards(goldenStandardsFileName=goldenStandardsFileName,numQueries=numQueries)
    
    global goldenMappings
    goldenMappings = getGoldenMappings(goldenMappingsFileName=goldenMappingsFileName)
    
    TP=[]
    FP=[]
    FN=[]
    
    listSkippedCN=[]
    
    relevantPositions = []
    nonEmptyRelevantPositions = []
    
    for (i,Q) in enumerate(QuerySets):
       
        print('QUERY-SET ',Q,'\n')
        
        print('FINDING TUPLE-SETS')
        Rq = TSFind(Q)
        print(len(Rq),'TUPLE-SETS CREATED\n')
        
        print('FINDING SCHEMA-SETS')
        Sq = SchSFind(Q,SimilarityThreshold,sim_args)

        print(len(Sq),' SCHEMA-SETS CREATED\n')

        for schema_mapping in Sq:

            if schema_mapping in goldenMappings:
                TP.append(schema_mapping)
                goldenMappings.remove(schema_mapping)
            else:
                FP.append(schema_mapping)       
        
        print('GENERATING QUERY MATCHES')
        Mq = QMGen(Q,Sq|Rq)
        print (len(Mq),'QUERY MATCHES CREATED\n')
        
        print('RANKING QUERY MATCHES')
        RankedMq = QMRank(Mq,mi,smi)   
        
        if showLog:
            for (j, (M,score,schemascore,valuescore) ) in enumerate(RankedMq[:topK]):
                print(j+1,'ª QM')
                print('Schema Score:',"%.8f" % schemascore,
                      '\nValue Score: ',"%.8f" % valuescore,
                      '\n|M|: ',"%02d (Não considerado para calcular o total score)" % len(M),
                      '\nTotal Score: ',"%.8f" % score)
                pp(M)
                print('----------------------------------------------------------------------\n')
        
        
        if topK<=0:
            topKMq=RankedMq
        else:
            topKMq=RankedMq[:topK]
        
        numSkippedCNs = len(RankedMq)-topK
        
        
        RankedMq=[]
        gc.collect()
        
        if numSkippedCNs>0:
            print(numSkippedCNs,' QUERY MATCHES SKIPPED (due to low score)')
        else:
            numSkippedCNs=0
            
        
        print('GENERATING CANDIDATE NETWORKS')
        G = getSchemaGraph()        
        
        RankedCns = MatchCN(G,Sq,Rq,topKMq,TMax=TMax)
        
        listSkippedCN.append(numSkippedCNs)
        
        print (len(RankedCns),'CANDIDATE NETWORKS CREATED AND RANKED\n')
        
        if showLog:
            for (j, (Cn,Gts,M,score,schemascore,valuescore) ) in enumerate(RankedCns):
                print(j+1,'ª CN')
                print('Schema Score:',"%.8f" % schemascore,
                      '\nValue Score: ',"%.8f" % valuescore,
                      '\n|Cn|: ',"%02d (Considerado para o Total Score)" % len(Cn),
                      '\nTotal Score: ',"%.8f" % score)
                pp(Cn)
                print()
                print(getSQLfromCN(Gts,Cn))
                print('\nsem arvore')
                print(getSQLfromCN(Gts,Cn,contract=False))
                print('----------------------------------------------------------------------\n')
        
        print('CHECKING RELEVANCE')
        
        (pos,nonEmptyPos)=getRelevantPosition(RankedCns,Q)
        
        if pos<0:
            print('NO RELEVANT CN FOUND')
        else:
            (Cn,_,_,_,_,_) = RankedCns[pos-1]
            print('RELEVANT CN IN %d POSITION'%(pos))
            pp(Cn)
                        
        relevantPositions.append(pos)
        nonEmptyRelevantPositions.append(nonEmptyPos)
        
        print('==========================================================================\
==========================================================================\
==========================================================================\
==========================================================================\
==========================================================================\
==========================================================================')
    FN=goldenMappings
    return (relevantPositions,nonEmptyRelevantPositions,listSkippedCN,TP,FP,FN)

In [55]:
#pp(wordHash['denzel'])

In [56]:
#pp(attributeHash)

In [57]:
#pp(wordEmbeddingsHashC)

In [58]:
#pp(wordEmbeddingsHashB)

In [59]:
mi = 0.90/1.9372498568291752e-06
mi

464576.1086662768

In [60]:
preProcessing()

INDEXING TABLE  casting
	COLUMN  id  SKIPPED
	COLUMN  person_id  SKIPPED
	COLUMN  movie_id  SKIPPED
	COLUMN  person_role_id  SKIPPED
	COLUMN  nr_order  SKIPPED
	COLUMN  role_id  SKIPPED
	COLUMN  __search_id  SKIPPED
INDEXING TABLE  role
	COLUMN  id  SKIPPED
	COLUMN  __search_id  SKIPPED
INDEXING TABLE  person
	COLUMN  id  SKIPPED
	COLUMN  imdb_index  SKIPPED
	COLUMN  imdb_id  SKIPPED
	COLUMN  name_pcode_cf  SKIPPED
	COLUMN  name_pcode_nf  SKIPPED
	COLUMN  surname_pcode  SKIPPED
	COLUMN  __search_id  SKIPPED
INDEXING TABLE  info
	COLUMN  id  SKIPPED
	COLUMN  movie_id  SKIPPED
	COLUMN  info_type_id  SKIPPED
	COLUMN  __search_id  SKIPPED
INDEXING TABLE  movie
	COLUMN  id  SKIPPED
	COLUMN  imdb_index  SKIPPED
	COLUMN  kind_id  SKIPPED
	COLUMN  production_year  SKIPPED
	COLUMN  imdb_id  SKIPPED
	COLUMN  phonetic_code  SKIPPED
	COLUMN  episode_of_id  SKIPPED
	COLUMN  season_nr  SKIPPED
	COLUMN  episode_nr  SKIPPED
	COLUMN  series_years  SKIPPED
	COLUMN  __search_id  SKIPPED
INDEXING TABLE  c

  if np.issubdtype(vec.dtype, np.int):


PRE-PROCESSING STAGE FINISHED


In [61]:
# mi = 0.90/1.9372498568291752e-06
# smi = 1

# sim_args = {
#     'wn_sim':True,
#     'jaccard_sim':True,
#     'emb_sim':False,
#     'emb_model':None,
#     'emb10_sim':True,
#     'emb10_hash':wordEmbeddingsHashB}

# (relevantPositions,nonEmptyRelevantPositions,listSkippedCN,TP,FP,FN) = main(mi,smi,showLog=True,
#      topK=10,sim_args=sim_args,
#      querySetFileName='querysets/queryset_imdb_joint.txt',
#      goldenStandardsFileName='golden_standards/imdb_joint', numQueries=61,
#      goldenMappingsFileName='golden_mappings/golden_mappings_imdb_joint.txt',
#      SimilarityThreshold=0.8,)

# print(relevantPositions)

In [62]:
# print('relevantPosition')
# print(relevantPosition)

# print('\nTP')
# pp(TP)

# print('\nFP')
# pp(FP)

# print('\nFN')
# pp(FN)
# mi = 0.90/1.9372498568291752e-06
# smi = 1

# sim_args = {
#     'wn_sim':True,
#     'jaccard_sim':True,
#     'emb_sim':False,
#     'emb_model':None,
#     'emb10_sim':True,
#     'emb10_hash':wordEmbeddingsHashB}

# (relevantPosition,TP,FP,FN) = main(mi,smi,showLog=False,
#      topK=10,sim_args=sim_args,
#      querySetFileName='querysets/queryset_imdb_joint.txt',
#      goldenStandardsFileName='golden_standards/imdb_joint', numQueries=61,
#      goldenMappingsFileName='golden_mappings/golden_mappings_imdb_joint.txt',)


## Experimentos

## Encontrar threshold ideal para similaridade do WordNet

In [79]:
mi = 0.90/1.9372498568291752e-06
smi = 1

#sim_args={}
#sim_args={'wn_sim':False,'jaccard_sim':False,'emb10_sim':True,'emb10_hash':wordEmbeddingsHashC}
#sim_args={'emb10_sim':True,'emb10_hash':wordEmbeddingsHashC}
sim_args={'wn_sim':False,'jaccard_sim':False,'emb_sim':True,'emb_model':wordEmbeddingsModel}

with  open('plots/EmbSim.csv', 'w') as f:
    print('Threshold,Precision,Recall,F1,TP,FP,FN,MaxSkippedCN,AvgSkippedCN,MRR,Precision1,Precision2,Precision3,MRR-nonempty,Precision1-nonempty,Precision2-nonempty,Precision3-nonempty',file=f)
    for threshold in [x/100 for x in range(65,101)][::-5]:

        print('THRESHOLD: ',threshold)

        (relevantPositions,nonEmptyRelevantPositions,listSkippedCN,TP,FP,FN) = main(mi,smi,showLog=False,
                                           topK=10,sim_args=sim_args,
                                           querySetFileName='querysets/queryset_imdb_joint.txt',
                                           goldenStandardsFileName='golden_standards/imdb_joint', numQueries=61,
                                           goldenMappingsFileName='golden_mappings/golden_mappings_imdb_joint.txt',
                                           SimilarityThreshold=threshold)



        #Mapping Experiment
        tp=len(TP)
        fp=len(FP)
        fn=len(FN)


        def div(a,b,errorResult=0):
            try:
                return a/b
            except ZeroDivisionError:
                return errorResult

        precision = div(tp, tp+fp)
        recall = div(tp, tp+fn )
        f1 = 2 * div( precision*recall , precision+recall )

        print("%.2f,%.2f,%.2f,%.2f,%d,%d,%d" %  (threshold,precision,recall,f1,tp,fp,fn),file=f,end=',')

        #CN Experiment

        def positionMetrics(positionList):

            invertedValuesList = [1/element if element>0 else 0 for element in positionList]
            mrr = div( sum(invertedValuesList) , len(invertedValuesList) )

            def precisionAt(pos, positionList):
                precision = div (sum([1 if element <= pos and element>0 else 0 for element in positionList]), len(positionList))

                return precision

            precision1 = precisionAt(1,positionList)
            precision2 = precisionAt(2,positionList)
            precision3 = precisionAt(3,positionList)

            return (mrr,precision1,precision2,precision3)


        print("%.2f,%.2f" %  (max(listSkippedCN), sum(listSkippedCN)/len(listSkippedCN) ),file=f, end=',')

        print("%.2f,%.2f,%.2f,%.2f" %  positionMetrics(relevantPositions),file=f, end=',')

        print("%.2f,%.2f,%.2f,%.2f" %  positionMetrics(nonEmptyRelevantPositions),file=f)

    #     print('positions')
    #     pp(relevantPositions)

    #     print('---------------\nnon empty:')
    #     pp(nonEmptyRelevantPositions)


THRESHOLD:  1.0
QUERY-SET  ('denzel', 'washington') 

FINDING TUPLE-SETS
12 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
23 QUERY MATCHES CREATED

RANKING QUERY MATCHES
13  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('denzel', 'washington') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'washington', 'denzel'}))]
QUERY-SET  ('clint', 'eastwood') 

FINDING TUPLE-SETS
11 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
19 QUERY MATCHES CREATED

RANKING QUERY MATCHES
9  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('clint', 'eastwood') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'clint', 'eastwood'}))]
QUERY-SET  ('john', 'wayne') 

FINDING TUPLE-SETS
15 TUPLE-SETS CREATED

FIN

19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('morgan', 'freeman') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'morgan', 'freeman'}))]
QUERY-SET  ('gone', 'wind') 

FINDING TUPLE-SETS
12 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
27 QUERY MATCHES CREATED

RANKING QUERY MATCHES
17  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('gone', 'wind') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'gone', 'wind'}))]
QUERY-SET  ('star', 'wars') 

FINDING TUPLE-SETS
10 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
17 QUERY MATCHES CREATED

RANKING QUERY MATCHES
7  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHEC

5 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('godfather',) 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'godfather'}))]
QUERY-SET  ('title', 'atticus', 'finch') 

FINDING TUPLE-SETS
15 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
102 QUERY MATCHES CREATED

RANKING QUERY MATCHES
92  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'atticus', 'finch') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'finch', 'atticus'}))]
QUERY-SET  ('title', 'indiana', 'jones') 

FINDING TUPLE-SETS
21 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
194 QUERY MATCHES CREATED

RANKING QUERY MATCHES
184  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWOR

28 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
804 QUERY MATCHES CREATED

RANKING QUERY MATCHES
794  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'wicked', 'witch', 'west') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'wicked', 'west', 'witch'}))]
QUERY-SET  ('title', 'nurse', 'ratched') 

FINDING TUPLE-SETS
13 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
42 QUERY MATCHES CREATED

RANKING QUERY MATCHES
32  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'nurse', 'ratched') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozense

5336 QUERY MATCHES CREATED

RANKING QUERY MATCHES
5326  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
9 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('henry', 'fonda', 'mine', 'character', 'name') 

*********
NO RELEVANT CN FOUND
QUERY-SET  ('russell', 'crowe', 'gladiator', 'character', 'name') 

FINDING TUPLE-SETS
29 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
3290 QUERY MATCHES CREATED

RANKING QUERY MATCHES
3280  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
9 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('russell', 'crowe', 'gladiator', 'character', 'name') 

*********
NO RELEVANT CN FOUND
QUERY-SET  ('brent', 'spiner', 'star', 'trek') 

FINDING TUPLE-SETS
21 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
174 QUERY MATCHES CREATED

RANKING QUERY MATCHES
164  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NE

10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('sean', 'connery', 'fleming') 

*******
RELEVANT CN IN 7 POSITION
[('person', 'name', frozenset(), frozenset({'fleming'})),
 'casting',
 'movie',
 'casting',
 ('person', 'name', frozenset(), frozenset({'connery', 'sean'}))]
QUERY-SET  ('reeves', 'wachowski') 

FINDING TUPLE-SETS
8 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
15 QUERY MATCHES CREATED

RANKING QUERY MATCHES
5  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('reeves', 'wachowski') 

********
RELEVANT CN IN 8 POSITION
[('person', 'name', frozenset(), frozenset({'wachowski'})),
 'casting',
 'movie',
 'casting',
 ('person', 'name', frozenset(), frozenset({'reeves'}))]
QUERY-SET  ('dean', 'jones', 'herbie') 

FINDING TUPLE-SETS
17 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
135 QUERY MATCHES CR

10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('character', 'forrest', 'gump') 

*
RELEVANT CN IN 1 POSITION
[('character',
  'name',
  frozenset({'character'}),
  frozenset({'gump', 'forrest'}))]
QUERY-SET  ('movie', 'social', 'network') 

FINDING TUPLE-SETS
14 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
73 QUERY MATCHES CREATED

RANKING QUERY MATCHES
63  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('movie', 'social', 'network') 

**********
NO RELEVANT CN FOUND
QUERY-SET  ('king', 'kong', 'actor', 'jack', 'black') 

FINDING TUPLE-SETS
52 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
14272 QUERY MATCHES CREATED

RANKING QUERY MATCHES
14262  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('king', 'kong', 'actor', '

10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('harrison', 'ford') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'harrison', 'ford'}))]
QUERY-SET  ('julia', 'roberts') 

FINDING TUPLE-SETS
12 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
27 QUERY MATCHES CREATED

RANKING QUERY MATCHES
17  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('julia', 'roberts') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'roberts', 'julia'}))]
QUERY-SET  ('tom', 'hanks') 

FINDING TUPLE-SETS
14 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
29 QUERY MATCHES CREATED

RANKING QUERY MATCHES
19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('tom', 'hanks') 

**
RELEVANT CN IN 2 POSITION
[

13 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
28 QUERY MATCHES CREATED

RANKING QUERY MATCHES
18  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('sound', 'music') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'sound', 'music'}))]
QUERY-SET  ('wizard', 'oz') 

FINDING TUPLE-SETS
14 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
29 QUERY MATCHES CREATED

RANKING QUERY MATCHES
19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('wizard', 'oz') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'wizard', 'oz'}))]
QUERY-SET  ('notebook',) 

FINDING TUPLE-SETS
3 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
3 QUERY MATCHES CREATED

RANKING QUERY

**
RELEVANT CN IN 2 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'rick', 'blaine'}))]
QUERY-SET  ('title', 'will', 'kane') 

FINDING TUPLE-SETS
20 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
184 QUERY MATCHES CREATED

RANKING QUERY MATCHES
174  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'will', 'kane') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'kane', 'will'}))]
QUERY-SET  ('title', 'dr', 'hannibal', 'lecter') 

FINDING TUPLE-SETS
24 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
466 QUERY MATCHES CREATED

RANKING QUERY MATCHES
456  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS 

4338 QUERY MATCHES CREATED

RANKING QUERY MATCHES
4328  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
8 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'coulda', 'class', 'contender', 'somebody', 'bum') 

********
NO RELEVANT CN FOUND
QUERY-SET  ('title', 'toto', 'feeling', 'kansas') 

FINDING TUPLE-SETS
23 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
578 QUERY MATCHES CREATED

RANKING QUERY MATCHES
568  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'toto', 'feeling', 'kansas') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 ('info', 'info', frozenset(), frozenset({'feeling', 'toto', 'kansas'}))]
QUERY-SET  ('title', 'looking', 'kid') 

FINDING TUPLE-SETS
18 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
169 QUERY MATCHES CR

86  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('name', 'jacques', 'clouseau') 

**
RELEVANT CN IN 2 POSITION
[('person', 'name', frozenset({'name'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'jacques', 'clouseau'}))]
QUERY-SET  ('name', 'jack', 'ryan') 

FINDING TUPLE-SETS
25 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
2  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
255 QUERY MATCHES CREATED

RANKING QUERY MATCHES
245  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('name', 'jack', 'ryan') 

***
RELEVANT CN IN 3 POSITION
[('person', 'name', frozenset({'name'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'ryan', 'jack'}))]
QUERY-SET  ('rocky', 'stallone') 

FINDING TUPLE-SETS
11 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUER

10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('actor', 'draco', 'harry', 'potter') 

**********
NO RELEVANT CN FOUND
QUERY-SET  ('johnny', 'depp', 'movies') 

FINDING TUPLE-SETS
16 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
94 QUERY MATCHES CREATED

RANKING QUERY MATCHES
84  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('johnny', 'depp', 'movies') 

**********
NO RELEVANT CN FOUND
QUERY-SET  ('movie', 'steven', 'spielberg') 

FINDING TUPLE-SETS
20 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
137 QUERY MATCHES CREATED

RANKING QUERY MATCHES
127  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('movie', 'steven', 'spielberg') 

**
RELEVANT CN IN 2 POSITION
[('person', 'name', frozenset(), frozenset({'spielberg', 'st

8 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('actors', 'x-men') 

********
NO RELEVANT CN FOUND
THRESHOLD:  0.9
QUERY-SET  ('denzel', 'washington') 

FINDING TUPLE-SETS
12 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
23 QUERY MATCHES CREATED

RANKING QUERY MATCHES
13  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('denzel', 'washington') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'washington', 'denzel'}))]
QUERY-SET  ('clint', 'eastwood') 

FINDING TUPLE-SETS
11 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
19 QUERY MATCHES CREATED

RANKING QUERY MATCHES
9  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('clint', 'eastwood') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(),

19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('morgan', 'freeman') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'morgan', 'freeman'}))]
QUERY-SET  ('gone', 'wind') 

FINDING TUPLE-SETS
12 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
27 QUERY MATCHES CREATED

RANKING QUERY MATCHES
17  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('gone', 'wind') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'gone', 'wind'}))]
QUERY-SET  ('star', 'wars') 

FINDING TUPLE-SETS
10 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
17 QUERY MATCHES CREATED

RANKING QUERY MATCHES
7  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHEC

5 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('godfather',) 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'godfather'}))]
QUERY-SET  ('title', 'atticus', 'finch') 

FINDING TUPLE-SETS
15 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
102 QUERY MATCHES CREATED

RANKING QUERY MATCHES
92  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'atticus', 'finch') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'finch', 'atticus'}))]
QUERY-SET  ('title', 'indiana', 'jones') 

FINDING TUPLE-SETS
21 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
194 QUERY MATCHES CREATED

RANKING QUERY MATCHES
184  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWOR

28 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
804 QUERY MATCHES CREATED

RANKING QUERY MATCHES
794  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'wicked', 'witch', 'west') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'wicked', 'west', 'witch'}))]
QUERY-SET  ('title', 'nurse', 'ratched') 

FINDING TUPLE-SETS
13 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
42 QUERY MATCHES CREATED

RANKING QUERY MATCHES
32  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'nurse', 'ratched') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozense

7094 QUERY MATCHES CREATED

RANKING QUERY MATCHES
7084  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('henry', 'fonda', 'mine', 'character', 'name') 

**
RELEVANT CN IN 2 POSITION
[('person', 'name', frozenset(), frozenset({'fonda', 'henry'})),
 'casting',
 ('movie', 'title', frozenset(), frozenset({'mine'})),
 'casting',
 ('character', 'name', frozenset({'name', 'character'}), frozenset())]
QUERY-SET  ('russell', 'crowe', 'gladiator', 'character', 'name') 

FINDING TUPLE-SETS
29 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
3  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
4410 QUERY MATCHES CREATED

RANKING QUERY MATCHES
4400  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('russell', 'crowe', 'gladiator', 'character', 'name') 

***
RELEVANT CN IN 3 POSITION
[('character', 'name', frozenset({'name', 'character'}), frozens

154  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('sean', 'connery', 'fleming') 

*******
RELEVANT CN IN 7 POSITION
[('person', 'name', frozenset(), frozenset({'fleming'})),
 'casting',
 'movie',
 'casting',
 ('person', 'name', frozenset(), frozenset({'connery', 'sean'}))]
QUERY-SET  ('reeves', 'wachowski') 

FINDING TUPLE-SETS
8 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
15 QUERY MATCHES CREATED

RANKING QUERY MATCHES
5  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('reeves', 'wachowski') 

********
RELEVANT CN IN 8 POSITION
[('person', 'name', frozenset(), frozenset({'wachowski'})),
 'casting',
 'movie',
 'casting',
 ('person', 'name', frozenset(), frozenset({'reeves'}))]
QUERY-SET  ('dean', 'jones', 'herbie') 

FINDING TUPLE-SETS
17 TUPLE-SETS CREATED

FINDING SCHEM

10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('character', 'forrest', 'gump') 

*
RELEVANT CN IN 1 POSITION
[('character',
  'name',
  frozenset({'character'}),
  frozenset({'gump', 'forrest'}))]
QUERY-SET  ('movie', 'social', 'network') 

FINDING TUPLE-SETS
14 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
90 QUERY MATCHES CREATED

RANKING QUERY MATCHES
80  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('movie', 'social', 'network') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'movie'}), frozenset({'network', 'social'}))]
QUERY-SET  ('king', 'kong', 'actor', 'jack', 'black') 

FINDING TUPLE-SETS
52 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
14272 QUERY MATCHES CREATED

RANKING QUERY MATCHES
14262  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NE

19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('harrison', 'ford') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'harrison', 'ford'}))]
QUERY-SET  ('julia', 'roberts') 

FINDING TUPLE-SETS
12 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
27 QUERY MATCHES CREATED

RANKING QUERY MATCHES
17  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('julia', 'roberts') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'roberts', 'julia'}))]
QUERY-SET  ('tom', 'hanks') 

FINDING TUPLE-SETS
14 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
29 QUERY MATCHES CREATED

RANKING QUERY MATCHES
19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND R

13 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
28 QUERY MATCHES CREATED

RANKING QUERY MATCHES
18  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('sound', 'music') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'sound', 'music'}))]
QUERY-SET  ('wizard', 'oz') 

FINDING TUPLE-SETS
14 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
29 QUERY MATCHES CREATED

RANKING QUERY MATCHES
19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('wizard', 'oz') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'wizard', 'oz'}))]
QUERY-SET  ('notebook',) 

FINDING TUPLE-SETS
3 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
3 QUERY MATCHES CREATED

RANKING QUERY

173  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'rick', 'blaine') 

**
RELEVANT CN IN 2 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'rick', 'blaine'}))]
QUERY-SET  ('title', 'will', 'kane') 

FINDING TUPLE-SETS
20 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
184 QUERY MATCHES CREATED

RANKING QUERY MATCHES
174  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'will', 'kane') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'kane', 'will'}))]
QUERY-SET  ('title', 'dr', 'hannibal', 'lecter') 

FINDING TUPLE-SETS
24 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENER

4338 QUERY MATCHES CREATED

RANKING QUERY MATCHES
4328  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
8 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'coulda', 'class', 'contender', 'somebody', 'bum') 

********
NO RELEVANT CN FOUND
QUERY-SET  ('title', 'toto', 'feeling', 'kansas') 

FINDING TUPLE-SETS
23 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
578 QUERY MATCHES CREATED

RANKING QUERY MATCHES
568  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'toto', 'feeling', 'kansas') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 ('info', 'info', frozenset(), frozenset({'feeling', 'toto', 'kansas'}))]
QUERY-SET  ('title', 'looking', 'kid') 

FINDING TUPLE-SETS
18 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
169 QUERY MATCHES CR

86  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('name', 'jacques', 'clouseau') 

**
RELEVANT CN IN 2 POSITION
[('person', 'name', frozenset({'name'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'jacques', 'clouseau'}))]
QUERY-SET  ('name', 'jack', 'ryan') 

FINDING TUPLE-SETS
25 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
2  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
255 QUERY MATCHES CREATED

RANKING QUERY MATCHES
245  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('name', 'jack', 'ryan') 

***
RELEVANT CN IN 3 POSITION
[('person', 'name', frozenset({'name'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'ryan', 'jack'}))]
QUERY-SET  ('rocky', 'stallone') 

FINDING TUPLE-SETS
11 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUER

10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('actor', 'draco', 'harry', 'potter') 

**********
NO RELEVANT CN FOUND
QUERY-SET  ('johnny', 'depp', 'movies') 

FINDING TUPLE-SETS
16 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
94 QUERY MATCHES CREATED

RANKING QUERY MATCHES
84  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('johnny', 'depp', 'movies') 

**********
NO RELEVANT CN FOUND
QUERY-SET  ('movie', 'steven', 'spielberg') 

FINDING TUPLE-SETS
20 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
137 QUERY MATCHES CREATED

RANKING QUERY MATCHES
127  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('movie', 'steven', 'spielberg') 

**
RELEVANT CN IN 2 POSITION
[('person', 'name', frozenset(), frozenset({'spielberg', 'st

8 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('actors', 'x-men') 

********
NO RELEVANT CN FOUND
THRESHOLD:  0.8
QUERY-SET  ('denzel', 'washington') 

FINDING TUPLE-SETS
12 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
23 QUERY MATCHES CREATED

RANKING QUERY MATCHES
13  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('denzel', 'washington') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'washington', 'denzel'}))]
QUERY-SET  ('clint', 'eastwood') 

FINDING TUPLE-SETS
11 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
19 QUERY MATCHES CREATED

RANKING QUERY MATCHES
9  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('clint', 'eastwood') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(),

19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('morgan', 'freeman') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'morgan', 'freeman'}))]
QUERY-SET  ('gone', 'wind') 

FINDING TUPLE-SETS
12 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
27 QUERY MATCHES CREATED

RANKING QUERY MATCHES
17  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('gone', 'wind') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'gone', 'wind'}))]
QUERY-SET  ('star', 'wars') 

FINDING TUPLE-SETS
10 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
17 QUERY MATCHES CREATED

RANKING QUERY MATCHES
7  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHEC

0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
5 QUERY MATCHES CREATED

RANKING QUERY MATCHES
GENERATING CANDIDATE NETWORKS
5 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('godfather',) 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'godfather'}))]
QUERY-SET  ('title', 'atticus', 'finch') 

FINDING TUPLE-SETS
15 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
102 QUERY MATCHES CREATED

RANKING QUERY MATCHES
92  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'atticus', 'finch') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'finch', 'atticus'}))]
QUERY-SET  ('title', 'indiana', 'jones') 

FINDING TUPLE-SETS
21 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
194 QUERY MATCHES 

28 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
804 QUERY MATCHES CREATED

RANKING QUERY MATCHES
794  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'wicked', 'witch', 'west') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'wicked', 'west', 'witch'}))]
QUERY-SET  ('title', 'nurse', 'ratched') 

FINDING TUPLE-SETS
13 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
42 QUERY MATCHES CREATED

RANKING QUERY MATCHES
32  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'nurse', 'ratched') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozense

7094 QUERY MATCHES CREATED

RANKING QUERY MATCHES
7084  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('henry', 'fonda', 'mine', 'character', 'name') 

**
RELEVANT CN IN 2 POSITION
[('person', 'name', frozenset(), frozenset({'fonda', 'henry'})),
 'casting',
 ('movie', 'title', frozenset(), frozenset({'mine'})),
 'casting',
 ('character', 'name', frozenset({'name', 'character'}), frozenset())]
QUERY-SET  ('russell', 'crowe', 'gladiator', 'character', 'name') 

FINDING TUPLE-SETS
29 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
3  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
4410 QUERY MATCHES CREATED

RANKING QUERY MATCHES
4400  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('russell', 'crowe', 'gladiator', 'character', 'name') 

***
RELEVANT CN IN 3 POSITION
[('character', 'name', frozenset({'name', 'character'}), frozens

154  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('sean', 'connery', 'fleming') 

*******
RELEVANT CN IN 7 POSITION
[('person', 'name', frozenset(), frozenset({'fleming'})),
 'casting',
 'movie',
 'casting',
 ('person', 'name', frozenset(), frozenset({'connery', 'sean'}))]
QUERY-SET  ('reeves', 'wachowski') 

FINDING TUPLE-SETS
8 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
15 QUERY MATCHES CREATED

RANKING QUERY MATCHES
5  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('reeves', 'wachowski') 

********
RELEVANT CN IN 8 POSITION
[('person', 'name', frozenset(), frozenset({'wachowski'})),
 'casting',
 'movie',
 'casting',
 ('person', 'name', frozenset(), frozenset({'reeves'}))]
QUERY-SET  ('dean', 'jones', 'herbie') 

FINDING TUPLE-SETS
17 TUPLE-SETS CREATED

FINDING SCHEM

10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('character', 'forrest', 'gump') 

*
RELEVANT CN IN 1 POSITION
[('character',
  'name',
  frozenset({'character'}),
  frozenset({'gump', 'forrest'}))]
QUERY-SET  ('movie', 'social', 'network') 

FINDING TUPLE-SETS
14 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
90 QUERY MATCHES CREATED

RANKING QUERY MATCHES
80  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('movie', 'social', 'network') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'movie'}), frozenset({'network', 'social'}))]
QUERY-SET  ('king', 'kong', 'actor', 'jack', 'black') 

FINDING TUPLE-SETS
52 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
14272 QUERY MATCHES CREATED

RANKING QUERY MATCHES
14262  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NE

19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('harrison', 'ford') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'harrison', 'ford'}))]
QUERY-SET  ('julia', 'roberts') 

FINDING TUPLE-SETS
12 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
27 QUERY MATCHES CREATED

RANKING QUERY MATCHES
17  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('julia', 'roberts') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'roberts', 'julia'}))]
QUERY-SET  ('tom', 'hanks') 

FINDING TUPLE-SETS
14 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
29 QUERY MATCHES CREATED

RANKING QUERY MATCHES
19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND R

18  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('sound', 'music') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'sound', 'music'}))]
QUERY-SET  ('wizard', 'oz') 

FINDING TUPLE-SETS
14 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
29 QUERY MATCHES CREATED

RANKING QUERY MATCHES
19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('wizard', 'oz') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'wizard', 'oz'}))]
QUERY-SET  ('notebook',) 

FINDING TUPLE-SETS
3 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
3 QUERY MATCHES CREATED

RANKING QUERY MATCHES
GENERATING CANDIDATE NETWORKS
3 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('notebook',) 

*
RELEVANT CN IN 1 POSITI

**
RELEVANT CN IN 2 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'rick', 'blaine'}))]
QUERY-SET  ('title', 'will', 'kane') 

FINDING TUPLE-SETS
20 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
184 QUERY MATCHES CREATED

RANKING QUERY MATCHES
174  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'will', 'kane') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'kane', 'will'}))]
QUERY-SET  ('title', 'dr', 'hannibal', 'lecter') 

FINDING TUPLE-SETS
24 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
466 QUERY MATCHES CREATED

RANKING QUERY MATCHES
456  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS 

4338 QUERY MATCHES CREATED

RANKING QUERY MATCHES
4328  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
8 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'coulda', 'class', 'contender', 'somebody', 'bum') 

********
NO RELEVANT CN FOUND
QUERY-SET  ('title', 'toto', 'feeling', 'kansas') 

FINDING TUPLE-SETS
23 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
578 QUERY MATCHES CREATED

RANKING QUERY MATCHES
568  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'toto', 'feeling', 'kansas') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 ('info', 'info', frozenset(), frozenset({'feeling', 'toto', 'kansas'}))]
QUERY-SET  ('title', 'looking', 'kid') 

FINDING TUPLE-SETS
18 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
169 QUERY MATCHES CR

86  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('name', 'jacques', 'clouseau') 

**
RELEVANT CN IN 2 POSITION
[('person', 'name', frozenset({'name'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'jacques', 'clouseau'}))]
QUERY-SET  ('name', 'jack', 'ryan') 

FINDING TUPLE-SETS
25 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
2  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
255 QUERY MATCHES CREATED

RANKING QUERY MATCHES
245  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('name', 'jack', 'ryan') 

***
RELEVANT CN IN 3 POSITION
[('person', 'name', frozenset({'name'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'ryan', 'jack'}))]
QUERY-SET  ('rocky', 'stallone') 

FINDING TUPLE-SETS
11 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUER

10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('actor', 'draco', 'harry', 'potter') 

**********
NO RELEVANT CN FOUND
QUERY-SET  ('johnny', 'depp', 'movies') 

FINDING TUPLE-SETS
16 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
116 QUERY MATCHES CREATED

RANKING QUERY MATCHES
106  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('johnny', 'depp', 'movies') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'johnny', 'depp'})),
 'casting',
 ('movie', '*', frozenset({'movies'}), frozenset())]
QUERY-SET  ('movie', 'steven', 'spielberg') 

FINDING TUPLE-SETS
20 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
137 QUERY MATCHES CREATED

RANKING QUERY MATCHES
127  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVA

**********
NO RELEVANT CN FOUND
QUERY-SET  ('actors', 'x-men') 

FINDING TUPLE-SETS
6 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
8 QUERY MATCHES CREATED

RANKING QUERY MATCHES
GENERATING CANDIDATE NETWORKS
8 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('actors', 'x-men') 

********
NO RELEVANT CN FOUND
THRESHOLD:  0.7
QUERY-SET  ('denzel', 'washington') 

FINDING TUPLE-SETS
12 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
23 QUERY MATCHES CREATED

RANKING QUERY MATCHES
13  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('denzel', 'washington') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'washington', 'denzel'}))]
QUERY-SET  ('clint', 'eastwood') 

FINDING TUPLE-SETS
11 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
19 QUERY MATCHE

17  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('angelina', 'jolie') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'jolie', 'angelina'}))]
QUERY-SET  ('morgan', 'freeman') 

FINDING TUPLE-SETS
14 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
29 QUERY MATCHES CREATED

RANKING QUERY MATCHES
19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('morgan', 'freeman') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'morgan', 'freeman'}))]
QUERY-SET  ('gone', 'wind') 

FINDING TUPLE-SETS
12 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
27 QUERY MATCHES CREATED

RANKING QUERY MATCHES
17  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED 

17  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('princess', 'bride') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'bride', 'princess'}))]
QUERY-SET  ('godfather',) 

FINDING TUPLE-SETS
5 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
5 QUERY MATCHES CREATED

RANKING QUERY MATCHES
GENERATING CANDIDATE NETWORKS
5 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('godfather',) 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'godfather'}))]
QUERY-SET  ('title', 'atticus', 'finch') 

FINDING TUPLE-SETS
15 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
102 QUERY MATCHES CREATED

RANKING QUERY MATCHES
92  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'atticus', 'fin

15 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
85 QUERY MATCHES CREATED

RANKING QUERY MATCHES
75  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'darth', 'vader') 

*
RELEVANT CN IN 1 POSITION
[('character', 'name', frozenset(), frozenset({'vader', 'darth'})),
 'casting',
 ('movie', 'title', frozenset({'title'}), frozenset())]
QUERY-SET  ('title', 'wicked', 'witch', 'west') 

FINDING TUPLE-SETS
28 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
804 QUERY MATCHES CREATED

RANKING QUERY MATCHES
794  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'wicked', 'witch', 'west') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'

20 QUERY MATCHES CREATED

RANKING QUERY MATCHES
10  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('hanks', '2004') 

**********
NO RELEVANT CN FOUND
QUERY-SET  ('henry', 'fonda', 'mine', 'character', 'name') 

FINDING TUPLE-SETS
35 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
3  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
7094 QUERY MATCHES CREATED

RANKING QUERY MATCHES
7084  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('henry', 'fonda', 'mine', 'character', 'name') 

**
RELEVANT CN IN 2 POSITION
[('person', 'name', frozenset(), frozenset({'fonda', 'henry'})),
 'casting',
 ('movie', 'title', frozenset(), frozenset({'mine'})),
 'casting',
 ('character', 'name', frozenset({'name', 'character'}), frozenset())]
QUERY-SET  ('russell', 'crowe', 'gladiator', 'character', 'name') 

FINDING TUPLE-SETS
29 TUPLE-SETS CREATED



37 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
1553 QUERY MATCHES CREATED

RANKING QUERY MATCHES
1543  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('harrison', 'ford', 'george', 'lucas') 

*********
RELEVANT CN IN 9 POSITION
[('person', 'name', frozenset(), frozenset({'harrison', 'ford'})),
 'casting',
 'movie',
 'casting',
 ('person', 'name', frozenset(), frozenset({'lucas', 'george'}))]
QUERY-SET  ('sean', 'connery', 'fleming') 

FINDING TUPLE-SETS
21 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
164 QUERY MATCHES CREATED

RANKING QUERY MATCHES
154  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('sean', 'connery', 'fleming') 

*******
RELEVANT CN IN 7 POSITION
[('person', 'name', frozenset(), frozenset({'fleming'})),
 'casting

**********
NO RELEVANT CN FOUND
QUERY-SET  ('protagonist', 'sound', 'music') 

FINDING TUPLE-SETS
16 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
84 QUERY MATCHES CREATED

RANKING QUERY MATCHES
74  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('protagonist', 'sound', 'music') 

**********
NO RELEVANT CN FOUND
QUERY-SET  ('character', 'forrest', 'gump') 

FINDING TUPLE-SETS
17 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
140 QUERY MATCHES CREATED

RANKING QUERY MATCHES
130  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('character', 'forrest', 'gump') 

*
RELEVANT CN IN 1 POSITION
[('character',
  'name',
  frozenset({'character'}),
  frozenset({'gump', 'forrest'}))]
QUERY-SET  ('movie', 'social', 'network') 

FINDING TUPLE-SETS
1

20  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('john', 'wayne') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'john', 'wayne'}))]
QUERY-SET  ('will', 'smith') 

FINDING TUPLE-SETS
15 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
30 QUERY MATCHES CREATED

RANKING QUERY MATCHES
20  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('will', 'smith') 

**
RELEVANT CN IN 2 POSITION
[('person', 'name', frozenset(), frozenset({'will', 'smith'}))]
QUERY-SET  ('harrison', 'ford') 

FINDING TUPLE-SETS
14 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
29 QUERY MATCHES CREATED

RANKING QUERY MATCHES
19  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHE

GENERATING CANDIDATE NETWORKS
4 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('casablanca',) 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'casablanca'}))]
QUERY-SET  ('lord', 'rings') 

FINDING TUPLE-SETS
12 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
27 QUERY MATCHES CREATED

RANKING QUERY MATCHES
17  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('lord', 'rings') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'rings', 'lord'}))]
QUERY-SET  ('sound', 'music') 

FINDING TUPLE-SETS
13 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
28 QUERY MATCHES CREATED

RANKING QUERY MATCHES
18  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('sound', 'music') 

*
RELEVANT CN IN 

22 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
201 QUERY MATCHES CREATED

RANKING QUERY MATCHES
191  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'james', 'bond') 

***
RELEVANT CN IN 3 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'bond', 'james'}))]
QUERY-SET  ('title', 'rick', 'blaine') 

FINDING TUPLE-SETS
20 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
183 QUERY MATCHES CREATED

RANKING QUERY MATCHES
173  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'rick', 'blaine') 

**
RELEVANT CN IN 2 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 'casting',
 ('character', 'name', frozenset(), frozenset({'rick', 'blaine'

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 ('info', 'info', frozenset(), frozenset({'damn', 'frankly', 'give'}))]
QUERY-SET  ('title', 'make', 'offer', 'refuse') 

FINDING TUPLE-SETS
27 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
634 QUERY MATCHES CREATED

RANKING QUERY MATCHES
624  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('title', 'make', 'offer', 'refuse') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset({'title'}), frozenset()),
 ('info', 'info', frozenset(), frozenset({'offer', 'refuse', 'make'}))]
QUERY-SET  ('title', 'coulda', 'class', 'contender', 'somebody', 'bum') 

FINDING TUPLE-SETS
31 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
4338 QUERY MATCHES CREATED

RANKING QUERY MATCHES
4328  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE

21 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
174 QUERY MATCHES CREATED

RANKING QUERY MATCHES
164  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('brent', 'spiner', 'star', 'trek') 

*
RELEVANT CN IN 1 POSITION
[('person', 'name', frozenset(), frozenset({'brent', 'spiner'})),
 'casting',
 ('movie', 'title', frozenset(), frozenset({'trek', 'star'}))]
QUERY-SET  ('audrey', 'hepburn', '1951') 

FINDING TUPLE-SETS
15 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
69 QUERY MATCHES CREATED

RANKING QUERY MATCHES
59  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('audrey', 'hepburn', '1951') 

**********
NO RELEVANT CN FOUND
QUERY-SET  ('name', 'jacques', 'clouseau') 

FINDING TUPLE-SETS
16 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
2  SCH

125  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('dean', 'jones', 'herbie') 

******
RELEVANT CN IN 6 POSITION
[('person', 'name', frozenset(), frozenset({'herbie'})),
 'casting',
 'movie',
 'casting',
 ('person', 'name', frozenset(), frozenset({'dean', 'jones'}))]
QUERY-SET  ('indiana', 'jones', 'last', 'crusade', 'lost', 'ark') 

FINDING TUPLE-SETS
48 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
22308 QUERY MATCHES CREATED

RANKING QUERY MATCHES
22298  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('indiana', 'jones', 'last', 'crusade', 'lost', 'ark') 

*
RELEVANT CN IN 1 POSITION
[('movie', 'title', frozenset(), frozenset({'ark', 'lost'})),
 'casting',
 'person',
 'casting',
 ('movie',
  'title',
  frozenset(),
  frozenset({'crusade', 'last', 'indiana', 'jones'}))]
QU

52 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
14272 QUERY MATCHES CREATED

RANKING QUERY MATCHES
14262  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('king', 'kong', 'actor', 'jack', 'black') 

**********
NO RELEVANT CN FOUND
QUERY-SET  ('actor', 'fellowship', 'ring', 'return', 'king') 

FINDING TUPLE-SETS
34 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
0  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
3387 QUERY MATCHES CREATED

RANKING QUERY MATCHES
3377  QUERY MATCHES SKIPPED (due to low score)
GENERATING CANDIDATE NETWORKS
10 CANDIDATE NETWORKS CREATED AND RANKED

CHECKING RELEVANCE
('actor', 'fellowship', 'ring', 'return', 'king') 

**********
NO RELEVANT CN FOUND
QUERY-SET  ('cast', 'star', 'wars') 

FINDING TUPLE-SETS
16 TUPLE-SETS CREATED

FINDING SCHEMA-SETS
1  SCHEMA-SETS CREATED

GENERATING QUERY MATCHES
107 QUERY MATCHES CREATED

RANKING QUERY MATCH