## Question Generation Script for miniKGraph.rdf

Generate questions from the Knowledge Graph (PetroKGraph) and build the relevant context for each question.

To begin, we need to import the necessary modules and set up the environment variables.

### Requirements

```bash
pip install networkx
pip install node2vec
pip install rdflib
```

In [1]:
import os
import json
import rdflib
import networkx as nx
from node2vec import Node2Vec



### **✅ Load the PetroKGraph**

In [2]:
from rdflib import Graph, URIRef, Namespace, Literal

g = rdflib.Graph()
# Raw string so the backslashes in the Windows path are not read as escape sequences
g.parse(r"C:\ICA\Projetos Git\knowledge-graph-question-answering\Dataset MiniKGraph\miniOntoGeoLogicaInstanciasRelacoes.rdf", format="xml")

G = nx.Graph()

# Add the nodes and edges of the RDF graph to the NetworkX graph
for subj, pred, obj in g:
    G.add_edge(str(subj), str(obj), label=str(pred))

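One property of the conversion above is worth keeping in mind: `nx.Graph` is an undirected simple graph, so it stores at most one edge per node pair, and calling `add_edge` again for the same pair updates the attributes of the existing edge. The sketch below mimics that behaviour with a plain dictionary (a stand-in, not NetworkX itself) to show how a second predicate between the same two nodes would replace the first `label`. The triples and the helper name `collapse_to_simple_graph` are made up for illustration.

```python
# Stand-in for nx.Graph edge storage: one entry per unordered node pair.
def collapse_to_simple_graph(triples):
    edges = {}
    for subj, pred, obj in triples:
        pair = frozenset((subj, obj))      # undirected: (a, b) == (b, a)
        edges[pair] = {"label": pred}      # a repeated pair overwrites the label
    return edges

# Hypothetical triples: two different predicates link the same pair of nodes.
triples = [
    ("ex:well_1", "ex:located_in", "ex:basin_1"),
    ("ex:well_1", "ex:crosses", "ex:formation_1"),
    ("ex:basin_1", "ex:contains", "ex:well_1"),
]

edges = collapse_to_simple_graph(triples)
print(len(triples), "triples ->", len(edges), "edges")
print(edges[frozenset(("ex:well_1", "ex:basin_1"))])
```

If every predicate has to survive the conversion, `nx.MultiDiGraph` keeps both direction and parallel edges, at the cost of a slightly different edge-access API.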


### **✅ Inspect the graph with networkx**

In [3]:
print("* * * * * * * * *")
# Number of nodes, number of edges, and average degree of the graph
nodos = G.number_of_nodes()
ejes = G.number_of_edges()
k = ejes*2/nodos
densidad = nx.density(G)
print(f"Graph with {nodos} nodes, {ejes} edges, density {densidad} and average degree {k}")

# Node degrees: the number of connections of each node
nx.degree(G)

* * * * * * * * *
Graph with 4472 nodes, 11915 edges, density 0.0011918389589999395 and average degree 5.32871198568873


DegreeView({'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#Eoburacica_Subage_quality': 7, 'Subidade Eoburacica': 2, 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#earth_fluid': 8, 'http://www.w3.org/2002/07/owl#Class': 606, 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#POCO_CD_POCO_023059': 15, 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#formacao_141': 91, 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#POCO_CD_POCO_022557': 17, 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#well': 348, 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#diabase_grupo_002': 6, 'http://www.w3.org/2002/07/owl#NamedIndividual': 1395, 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#hypabyssal_rock': 6, 'hipabissal': 1, 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#POCO_CD_POCO_023023': 5, 'http:/
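The two summary figures follow directly from the node and edge counts. As a quick cross-check, this sketch recomputes them from the counts reported for this graph, using the standard formulas for an undirected simple graph: average degree 2E/N and density 2E/(N(N-1)). The function names are illustrative.

```python
def average_degree(n_nodes, n_edges):
    # Each undirected edge contributes to the degree of two nodes.
    return 2 * n_edges / n_nodes

def density(n_nodes, n_edges):
    # Fraction of the N*(N-1)/2 possible undirected edges that are present.
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

n_nodes, n_edges = 4472, 11915  # counts reported for this graph
print(round(average_degree(n_nodes, n_edges), 4))  # 5.3287
print(round(density(n_nodes, n_edges), 6))         # 0.001192
```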

In [4]:
# Print the nodes
print("Graph nodes:")
print(G.nodes())

Graph nodes:
['http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#Eoburacica_Subage_quality', 'Subidade Eoburacica', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#earth_fluid', 'http://www.w3.org/2002/07/owl#Class', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#POCO_CD_POCO_023059', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#formacao_141', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#POCO_CD_POCO_022557', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#well', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#diabase_grupo_002', 'http://www.w3.org/2002/07/owl#NamedIndividual', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#hypabyssal_rock', 'hipabissal', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#POCO_CD_POCO_023023', 'http://www.semanticweb.org/bg40/ontologies/2022/5

In [5]:
print("\nGraph edges with labels:")
for edge in G.edges(data=True):
    print(edge)


Graph edges with labels:
('http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#Eoburacica_Subage_quality', 'Subidade Eoburacica', {'label': 'http://www.w3.org/2000/01/rdf-schema#label'})
('http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#Eoburacica_Subage_quality', 'Subandar Buracica Inferior', {'label': 'http://www.w3.org/2000/01/rdf-schema#label'})
('http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#Eoburacica_Subage_quality', 'Buracica Inferior', {'label': 'http://www.w3.org/2000/01/rdf-schema#label'})
('http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#Eoburacica_Subage_quality', 'Eoburacica', {'label': 'http://www.w3.org/2000/01/rdf-schema#label'})
('http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#Eoburacica_Subage_quality', 'Subidade Buracica Inferior', {'label': 'http://www.w3.org/2000/01/rdf-schema#label'})
('http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-on

### ✅ Check the graph's labels

In [6]:
def verificar_etiquetas(grafo):
  """
  Walk the graph and return the nodes that have no "label" attribute.
  """
  nodos_sin_etiqueta = []
  for nodo in grafo.nodes:
    # Labels are looked up as node attributes; edge attributes are not considered
    etiqueta = grafo.nodes[nodo].get("label", None)
    if not etiqueta:
      nodos_sin_etiqueta.append(nodo)

  return nodos_sin_etiqueta

grafo = G

nodos_sin_etiqueta = verificar_etiquetas(grafo)
if nodos_sin_etiqueta:
  print(f"Found {len(nodos_sin_etiqueta)} nodes without a label:")
  print(nodos_sin_etiqueta)
else:
  print("All nodes have a label.")

Found 4472 nodes without a label:
['http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#Eoburacica_Subage_quality', 'Subidade Eoburacica', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#earth_fluid', 'http://www.w3.org/2002/07/owl#Class', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#POCO_CD_POCO_023059', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#formacao_141', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#POCO_CD_POCO_022557', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#well', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#diabase_grupo_002', 'http://www.w3.org/2002/07/owl#NamedIndividual', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#hypabyssal_rock', 'hipabissal', 'http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#POCO_CD_POCO_023023', 'http://www.semanticweb.org

In [7]:
def verificar_etiquetas_con_etiqueta(grafo):
  """
  Report how many nodes have no "label" attribute,
  then print every node that does have one together with its label.
  """
  nodos_sin_etiqueta = verificar_etiquetas(grafo)
  if nodos_sin_etiqueta:
    print(f"Found {len(nodos_sin_etiqueta)} nodes without a label.")
  else:
    print("All nodes have a label.")
  # Iterate over all nodes: the unlabeled list can never contain a labeled node
  for nodo, datos in grafo.nodes(data=True):
    etiqueta = datos.get("label", None)
    if etiqueta:
      print(f"- {nodo}: {etiqueta}")

# Example usage
grafo = G
# Add nodes to the graph...

verificar_etiquetas_con_etiqueta(grafo)


Found 4472 nodes without a label.
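Every node is reported as unlabeled because the conversion loop stores each predicate as an edge attribute and never sets node attributes; `rdfs:label` values end up as neighbouring literal nodes instead. A minimal sketch of one way to recover the labels per subject follows, using plain tuples as stand-ins for the rdflib triples. The sample data and the helper name `collect_labels` are illustrative assumptions.

```python
from collections import defaultdict

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def collect_labels(triples):
    """Map each subject to the list of its rdfs:label values."""
    labels = defaultdict(list)
    for subj, pred, obj in triples:
        if pred == RDFS_LABEL:
            labels[subj].append(obj)
    return dict(labels)

# Hypothetical triples shaped like the edges printed earlier.
triples = [
    ("ex:Eoburacica_Subage_quality", RDFS_LABEL, "Subidade Eoburacica"),
    ("ex:Eoburacica_Subage_quality", RDFS_LABEL, "Eoburacica"),
    ("ex:well", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://www.w3.org/2002/07/owl#Class"),
]

labels = collect_labels(triples)
print(labels)
```

A mapping like this, keyed by the same identifiers used as graph nodes, could then be attached with `nx.set_node_attributes(G, labels, "label")`.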


### ✅ **Inspecting the triples**


First, inspect the triples to see how many elements each one contains. The "triples" reported below appear to have far more elements than expected. Note, however, that iterating over the NetworkX graph `G` yields node identifiers (strings), so `len(triple)` is the character length of each identifier (95 for the first URI, 19 for `Subidade Eoburacica`), not the size of an RDF triple. The large counts therefore reflect long URIs rather than richer entity descriptions.

In [8]:
# Inspect the triples (iterating over G yields node identifiers)
for triple in G:
    print(triple)
    print(f"Number of elements in triple: {len(triple)}")
    if len(triple) <= 3:
        print("Found a non-triple element")


http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#Eoburacica_Subage_quality
Number of elements in triple: 95
Subidade Eoburacica
Number of elements in triple: 19
http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#earth_fluid
Number of elements in triple: 81
http://www.w3.org/2002/07/owl#Class
Number of elements in triple: 35
http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#POCO_CD_POCO_023059
Number of elements in triple: 89
http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#formacao_141
Number of elements in triple: 82
http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#POCO_CD_POCO_022557
Number of elements in triple: 89
http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#well
Number of elements in triple: 74
http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#diabase_grupo_002
Number of elements in triple: 87
http://www.w3.org/2002/07/owl#NamedIndividu

The original reading of these results was that the "triples" hold more detailed entity descriptions, that discarding them would lose data, and that the code should therefore handle triples with a variable number of elements instead of assuming exactly three. Because the values printed above are node identifiers, nothing needs to be discarded: actual subject, predicate, object triples are obtained by iterating over the rdflib graph `g`, as the next section does.
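To make the distinction concrete: a subject, predicate, object triple always unpacks into exactly three terms, whereas `len()` of a node identifier counts characters. This small check uses two identifiers taken from the output above; the sample triple itself is illustrative.

```python
BASE = "http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#"

# len() of a node identifier is its character count, not a triple size.
node_uri = BASE + "Eoburacica_Subage_quality"
node_literal = "Subidade Eoburacica"
print(len(node_uri), len(node_literal))  # 95 19

# A triple is a 3-tuple and unpacks into subject, predicate, object.
triple = (node_uri, "http://www.w3.org/2000/01/rdf-schema#label", node_literal)
subj, pred, obj = triple
print(len(triple))  # 3
```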

### ✅ **Query the updated PetroKGraph for relevant information**

In [139]:
from rdflib import Graph, URIRef, Namespace 

fields = {}
basins = {}
wells  = {}
formations = {}
geological_structures = {}

for s, p, o in g:
    if isinstance(s, URIRef) and "CAMP_CD_CAMPO" in s:
        campo_id = s.split("#")[1]
        if campo_id not in fields:
            fields[campo_id] = {"types": [], "located_in": [], "labels": [], "related": []}
        if "type" in p:
            fields[campo_id]["types"].append(o)
        elif "located_in" in p:
            fields[campo_id]["located_in"].append(o)
        elif "label" in p:
            fields[campo_id]["labels"].append(str(o))
        else:
            fields[campo_id]["related"].append((p, o))

    if isinstance(s, URIRef) and "BASE_CD_BACIA" in s:
        bacia_id = s.split("#")[1]
        if bacia_id not in basins:
            basins[bacia_id] = {"types": [], "labels": []}
        if "type" in p:
            basins[bacia_id]["types"].append(o)
        elif "label" in p:
            basins[bacia_id]["labels"].append(str(o))
            
    if isinstance(s, URIRef) and "POCO_CD_POCO" in s:
        poco_id = s.split("#")[1]
        if poco_id not in wells:
            wells[poco_id] = {"types": [], "located_in": [], "labels": [], "crosses": []}
        if "type" in p:
            wells[poco_id]["types"].append(o)
        elif "located_in" in p:
            wells[poco_id]["located_in"].append(o)
        elif "crosses" in p:
            wells[poco_id]["crosses"].append(str(o))
        elif "label" in p:
            wells[poco_id]["labels"].append(str(o))

    # Formations, groups, and members share one record layout, so the same
    # handling runs once for each keyword found in the subject URI.
    for keyword in ("formacao", "grupo", "membro"):
        if isinstance(s, URIRef) and keyword in s:
            unidade_lito_id = s.split("#")[1]
            if unidade_lito_id not in formations:
                formations[unidade_lito_id] = {"types": [], "located_in": [], "has_age": [], "part_of": [], "carrier_of": [], "constituted_by": [], "crosses": [], "labels": []}
            if "type" in p:
                formations[unidade_lito_id]["types"].append(o)
            elif "located_in" in p:
                formations[unidade_lito_id]["located_in"].append(str(o))
            elif "constituted_by" in p:
                formations[unidade_lito_id]["constituted_by"].append(o)
            elif "has_age" in p:
                formations[unidade_lito_id]["has_age"].append(str(o))
            elif "part_of" in p:
                formations[unidade_lito_id]["part_of"].append(o)
            elif "carrier_of" in p:
                formations[unidade_lito_id]["carrier_of"].append(o)
            elif "crosses" in p:
                formations[unidade_lito_id]["crosses"].append(str(o))
            elif "label" in p:
                formations[unidade_lito_id]["labels"].append(str(o))
            
    if isinstance(s, URIRef) and "TEFR_CD_TIPO_EST_FISICA" in s:
        estrutura_id = s.split("#")[1]
        partes = estrutura_id.split('_')
        estrutura_fisica_id = '_'.join(partes[:7])
        
        if estrutura_fisica_id not in geological_structures:
            geological_structures[estrutura_fisica_id] = {"types": [], "labels": [] }
        if "type" in p:
            geological_structures[estrutura_fisica_id]["types"].append(o)
        elif "label" in p:
            geological_structures[estrutura_fisica_id]["labels"].append(str(o))
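The extraction loop classifies predicates with substring tests such as `"type" in p`, which also match any other predicate whose URI happens to contain that word. One stricter alternative, sketched here with plain strings, compares the fragment after `#` exactly. The helper name `local_name` and the predicate `has_prototype` are assumptions for illustration and are not part of the notebook or the ontology.

```python
def local_name(uri):
    """Return the part after the last '#', or the whole string if there is none."""
    return uri.rsplit("#", 1)[-1]

BASE = "http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#"

located_in = BASE + "located_in"
hypothetical = BASE + "has_prototype"   # made-up predicate containing "type"

# Substring test: the made-up predicate also matches "type".
print("type" in hypothetical)                  # True
# Exact comparison on the local name only matches the intended predicate.
print(local_name(hypothetical) == "type")      # False
print(local_name(located_in) == "located_in")  # True
```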
          

In [140]:
print("Basins:", basins.keys())
print("Count Basins:", len(basins.keys()) )
print("* * * " )
print("Fields:", fields.keys())
print("Count Fields:", len(fields.keys()))
print("* * * " )
print("Wells:", wells.keys())
print("Count Wells:", len(wells.keys()))
print("* * * " )
print("Geological Structures:", geological_structures.keys())
print("Count Geological Structures:", len(geological_structures.keys()))
print("* * * " )
print("Formations:", formations.keys())
print("Count Formations:", len(formations.keys()))

Basins: dict_keys(['BASE_CD_BACIA_266', 'BASE_CD_BACIA_030'])
Count Basins: 2
* * * 
Fields: dict_keys(['CAMP_CD_CAMPO_0109', 'CAMP_CD_CAMPO_0633', 'CAMP_CD_CAMPO_0890', 'CAMP_CD_CAMPO_0888', 'CAMP_CD_CAMPO_0108', 'CAMP_CD_CAMPO_0131', 'CAMP_CD_CAMPO_0572', 'CAMP_CD_CAMPO_0721'])
Count Fields: 8
* * * 
Wells: dict_keys(['POCO_CD_POCO_023059', 'POCO_CD_POCO_022557', 'POCO_CD_POCO_023023', 'POCO_CD_POCO_022685', 'POCO_CD_POCO_023017', 'POCO_CD_POCO_022651', 'POCO_CD_POCO_022683', 'POCO_CD_POCO_022825', 'POCO_CD_POCO_005659', 'POCO_CD_POCO_023011', 'POCO_CD_POCO_023026', 'POCO_CD_POCO_022807', 'POCO_CD_POCO_022810', 'POCO_CD_POCO_022733', 'POCO_CD_POCO_023058', 'POCO_CD_POCO_005653', 'POCO_CD_POCO_022984', 'POCO_CD_POCO_022903', 'POCO_CD_POCO_022988', 'POCO_CD_POCO_005656', 'POCO_CD_POCO_022709', 'POCO_CD_POCO_022485', 'POCO_CD_POCO_022767', 'POCO_CD_POCO_022707', 'POCO_CD_POCO_005745', 'POCO_CD_POCO_028879', 'POCO_CD_POCO_022770', 'POCO_CD_POCO_005654', 'POCO_CD_POCO_005753', 'POCO_CD_PO

### ✅ **Verify the extracted information**

In [77]:
# Extracted formations
for forma, info in formations.items():
    print(f"** Lithostratigraphic unit: {forma}")
    print(f"  Part of: {info['part_of']}")
    print(f"  Constituted by: {info['constituted_by']}")
    print(f"  Crosses: {info['crosses']}")
    print(f"  Label: {info['labels']}")
    print(f"  Has age: {info['has_age']}")
    print(f"  Carrier of: {info['carrier_of']}")

for estr, info in geological_structures.items():
    print(f"** Physical structure: {estr}")
    print(f"  Label: {info['labels']}")

** Lithostratigraphic unit: diabase_grupo_002
  Part of: []
  Constituted by: []
  Crosses: []
  Label: ['dolerito', 'diabásio', 'microgabro']
  Has age: []
  Carrier of: []
** Lithostratigraphic unit: mudstone_grupo_011
  Part of: []
  Constituted by: []
  Crosses: []
  Label: ['calcilutito', 'mudstone']
  Has age: []
  Carrier of: []
** Lithostratigraphic unit: sandstone_membro_073
  Part of: []
  Constituted by: []
  Crosses: []
  Label: ['arenito', 'psamito']
  Has age: []
  Carrier of: []
** Lithostratigraphic unit: formacao_006
  Part of: [rdflib.term.URIRef('http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#grupo_073')]
  Constituted by: [rdflib.term.URIRef('http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#sandstone_formacao_006'), rdflib.term.URIRef('http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#shale_formacao_006'), rdflib.term.URIRef('http://www.semanticweb.org/bg40/on

In [78]:
# Function to get the name of a basin
def obtener_nombre_bacia(uri):
    bacia_id = uri.split("#")[1]
    if bacia_id in basins:
        return basins[bacia_id]["labels"]
    return [str(uri)]

# Extracted fields
for campo, info in fields.items():
    print("** ** ** **")
    print(f"  Field: {campo}")
    print(f"  Names: {info['labels']}")
    localizacion = info['located_in']
    nombres_bacia = [obtener_nombre_bacia(uri) for uri in localizacion]
    print(f"  Located in basin: {nombres_bacia}")

** ** ** **
  Field: CAMP_CD_CAMPO_0109
  Names: ['SARDINHA', 'ÁREA DO B097']
  Located in basin: [['BACIA DE CAMAMU-ALMADA', 'CAMAMU MAR', 'CAMAMU TERRA', 'ALMADA TERRA', 'CAMAMU-ALMADA', 'ALMADA MAR']]
** ** ** **
  Field: CAMP_CD_CAMPO_0633
  Names: ['IGARAPÉ MARIPÁ', 'JAPIIM']
  Located in basin: [['BACIA DE AMAZONAS', 'AMAZONAS']]
** ** ** **
  Field: CAMP_CD_CAMPO_0890
  Names: ['CAMARÃO NORTE', 'ÁREA BAS-131', 'CAMARAO NORTE']
  Located in basin: [['BACIA DE CAMAMU-ALMADA', 'CAMAMU MAR', 'CAMAMU TERRA', 'ALMADA TERRA', 'CAMAMU-ALMADA', 'ALMADA MAR']]
** ** ** **
  Field: CAMP_CD_CAMPO_0888
  Names: ['ÁREA DO B128', 'MANATI']
  Located in basin: [['BACIA DE CAMAMU-ALMADA', 'CAMAMU MAR', 'CAMAMU TERRA', 'ALMADA TERRA', 'CAMAMU-ALMADA', 'ALMADA MAR']]
** ** ** **
  Field: CAMP_CD_CAMPO_0108
  Names: ['ÁREA BAS-064', 'AREA DO BAS-64']
  Located in basin: [['BACIA DE CAMAMU-ALMADA', 'CAMAMU MAR', 'CAMAMU TERRA', 'ALMADA TERRA', 'CAMAMU-ALMADA', 'ALMADA MAR']]

In [80]:
# Extracted basins
for basin, info in basins.items():
    print("** ** ** **")
    print(f"  Basin: {basin}")
    print(f"  Names: {info['labels']}")

** ** ** **
  Basin: BASE_CD_BACIA_266
  Names: ['BACIA DE CAMAMU-ALMADA', 'CAMAMU MAR', 'CAMAMU TERRA', 'ALMADA TERRA', 'CAMAMU-ALMADA', 'ALMADA MAR']
** ** ** **
  Basin: BASE_CD_BACIA_030
  Names: ['BACIA DE AMAZONAS', 'AMAZONAS']


In [82]:
# Extracted wells
counter = 0
for well, info in wells.items():
    counter += 1
    print(f"Well: {well}")
    #print(f"  Types: {info['types']}")
    print(f"  Well name: {info['labels']}")
    localizacion = info['located_in']
    nombres_bacia = [obtener_nombre_bacia(uri) for uri in localizacion]
    print(f"  Located in basin: {nombres_bacia}")

# Total number of wells
print(f"Total wells: {counter}")

Well: POCO_CD_POCO_023059
  Well name: ['2-NAST-1-PA']
  Located in basin: [['BACIA DE AMAZONAS', 'AMAZONAS']]
Well: POCO_CD_POCO_022557
  Well name: ['1-BRSA-9-AM', '1-RPV-1-AM']
  Located in basin: [['http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#CAMP_CD_CAMPO_0131'], ['BACIA DE AMAZONAS', 'AMAZONAS']]
Well: POCO_CD_POCO_023023
  Well name: ['2-IBST-1-PA']
  Located in basin: [['BACIA DE AMAZONAS', 'AMAZONAS']]
Well: POCO_CD_POCO_022685
  Well name: ['1-BL-1-PA']
  Located in basin: [['BACIA DE AMAZONAS', 'AMAZONAS']]
Well: POCO_CD_POCO_023017
  Well name: ['2-RPST-1-PA']
  Located in basin: [['BACIA DE AMAZONAS', 'AMAZONAS']]
Well: POCO_CD_POCO_022651
  Well name: ['2-CAST-2-AM']
  Located in basin: [['BACIA DE AMAZONAS', 'AMAZONAS']]
Well: POCO_CD_POCO_022683
  Well name: ['1-MD-1-AM']
  Located in basin: [['BACIA DE AMAZONAS', 'AMAZONAS']]
Well: POCO_CD_POCO_022825
  Well name: ['2-BUST-1-

#### ✅ **Graph visualization**

Each node in the graph can be visualized with [WebVOWL](https://service.tib.eu/webvowl/).

![image.png](attachment:image.png)

### 🔎 **Generate questions based on the miniKGraph**
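Each generated item is a dictionary with `id`, `question`, `answer`, and `context` keys. As a rough sketch of that record shape, and of how a list of such records could be serialized, the snippet below builds one entry from the basin labels shown earlier and dumps it with `json`. The helper `make_question` and the choice of `ensure_ascii=False` are assumptions for illustration, not part of the notebook.

```python
import json

def make_question(question_id, question, answer, context):
    # Same four keys as the records appended in gerar_perguntas.
    return {"id": question_id, "question": question, "answer": answer, "context": context}

names = ["Bacia De Amazonas", "Amazonas"]
record = make_question(
    1,
    "Qual é o nome da bacia BASE_CD_BACIA_030?",
    names,
    f"O nome da bacia BASE_CD_BACIA_030 é {', '.join(names)}.",
)

# ensure_ascii=False keeps accented Portuguese characters readable in the output.
print(json.dumps([record], ensure_ascii=False, indent=2))
```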

In [215]:
namespace_base = rdflib.Namespace("http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#")
rdfs = Namespace("http://www.w3.org/2000/01/rdf-schema#")
rdf= Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#")
import uuid

def gerar_perguntas(g):
    questions = []
    id_counter = 1  # Counter for the question IDs

# ******** QUESTIONS ABOUT THE NAMES LISTED IN THE LABELS ******** 

    for bacia_URI, info in basins.items():
        
        if info["labels"]:
            basins_names = []
            
            for bacia_name in g.objects(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{bacia_URI}"), rdfs.label):
                bas = str(bacia_name).title()
                basins_names.append(str(bas))
                
            if basins_names:
                all_names_basin = ", ".join(basins_names)
                id_counter += 1
                questions.append({
                    "id": id_counter,
                    "question": f"Qual é o nome da bacia {bacia_URI}?",
                    "answer": basins_names,
                    "context": f"O nome da bacia {bacia_URI} é {all_names_basin}."
                })  

    for campo, info in fields.items():
        if info["labels"]:
            fields_names = []

            for campo_name in g.objects(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{campo}"), rdfs.label):
                camp = str(campo_name).title()
                fields_names.append(str(camp))

            if fields_names:
                all_fields_names = ", ".join(fields_names)
                id_counter += 1

                questions.append({
                    "id": id_counter,
                    "question": f"Qual é o nome do campo {campo}?",
                    "answer": fields_names,
                    "context": f"O nome do campo {campo} é {all_fields_names}."
                })

    for formacoes, info in formations.items():
        if info["labels"]:
            all_names = []

            for formacao_name in g.objects(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{formacoes}"), rdfs.label):
                all_names.append(str(formacao_name))

            if formacoes.startswith('formacao'):
                all_names_ = ", ".join(all_names)
                id_counter += 1
                questions.append({
                    "id": id_counter,
                    "question": f"Qual é o nome da {formacoes}?", # name of the formation
                    "answer": all_names,
                    "context": f"O nome da formação {formacoes} é {all_names_}."
                })  

                
# ******** ******** ******** ******** ******** ******** ******** 
  
    for campo, info in fields.items():
        if info["located_in"]:
            location = info["located_in"][0].split("#")[1]
            campoID= campo
            
            campo_name = info["labels"][0] if info["labels"] else campo
            location_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{location}"), rdfs.label).title()

            all_campo_names = []
            # Separate loop variable so campo_name (used in the question below) is not overwritten
            for nombre in g.objects(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{campo}"), rdfs.label):
                all_campo_names.append(str(nombre))
            

            if all_campo_names:
                all_names = ", ".join(all_campo_names)
                id_counter += 1
                questions.append({
                    "id": id_counter,
                    "question": f"Em que BACIA está localizado o CAMPO {campo_name}?",
                    "answer": str(location_name),
                    "context": f"O campo {all_names} está localizado na bacia {location_name}."
                })
                id_counter += 1
                questions.append({
                    "id": id_counter,
                    "question": f"Em que BACIA está localizado o CAMPO {campoID}?",
                    "answer": str(location_name),
                    "context": f"O campo {campoID} está localizado na bacia {location_name}."
                })
                
            
 # ******** ******** ******** ******** ******** ******** ********                
    total_pozos = 0
    for well, info in wells.items():
        total_pozos += 1
        if info["located_in"]:
            location = info["located_in"][0].split("#")[1]
            well_ID = well
            
            well_name = info["labels"][0] if info["labels"] else well
            location_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{location}"), rdfs.label).title()
           
            if location.startswith('BASE'):
                id_counter += 1
                questions.append({
                    "id": id_counter,
                    "question": f"Em que BACIA está localizado o POÇO {well_name}?",
                    "answer": str(location_name),
                    "context": f"O poço {well_name} está localizado na bacia {location_name}, identificada com a URI {location} ."
                })
                id_counter += 1
                questions.append({
                        "id": id_counter,
                        "question": f"Em que BACIA está localizado o POÇO {well_ID}?",
                        "answer": str(location_name),
                        "context": f"O poço {well_ID} está localizado na bacia {location_name}, identificada com a URI {location} ."
                })
    #print(f"Total de poços: {total_pozos}")
       
# ******** ******** ******** ******** ******** ******** ******** 

    for formacoes, info in formations.items():
        if info["part_of"]:
            part_of = info["part_of"][0].split("#")[1]
            formation_name = info["labels"][0] if info["labels"] else formacoes
            part_of_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{part_of}"), rdfs.label)
            id_counter += 1
            questions.append({
                "id": id_counter,
                "question": f"De qual entidade {formation_name} faz parte?",
                "answer": str(part_of_name),
                "context": f"A formação/grupo/membro {formation_name} faz parte de {part_of_name}."
            })

    for formacoes, info in formations.items():
            
            if info["located_in"]:
                location = info["located_in"][0].split("#")[1]
                if location != "basin" : 
                    formation_name = info["labels"][0] if info["labels"] else formacoes
                    location_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{location}"), rdfs.label)
                    id_counter += 1

                    questions.append({
                        "id": id_counter,
                        "question": f"Qual é a localização de {formation_name}?",
                        "answer": str(location_name),
                        "context": f"A {formation_name} está localizada na bacia {location_name}."
                    })
                
# ******** ******** ******** ******** ******** ******** ******** 

    for formacoes, info in formations.items():
        if info["constituted_by"]:
            constituted_by_names = []
            formation_name = info["labels"][0] if info["labels"] else formacoes

            for material in info["constituted_by"]:
                material_id = material.split("#")[1]
                #print(material_id)
                material_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{material_id}"), rdfs.label)
                #print(material_name)
                if material_name:
                    constituted_by_names.append(str(material_name))
        
            if constituted_by_names:
                constituted_by_materials = ", ".join(constituted_by_names)
                id_counter += 1

                questions.append({
                    "id": id_counter,
                    "question": f"Qual é o material da terra que está constituida a {formation_name}?",
                    "answer": constituted_by_names,
                    "context": f"A {formation_name} está constituida por {constituted_by_materials}."
                })
                
# ******** ******** ******** ******** ******** ******** ******** 

    for formacoes, info in formations.items():
        if info["has_age"]:
            age_list_names = []
            formation_name = info["labels"][0] if info["labels"] else formacoes

            for has_age in info["has_age"]:
                has_age_URI = has_age.split("#")[1]
                has_age_URI_ = has_age.split('_')
                URI='_'.join(has_age_URI_[-2:])
                #print(has_age_URI)
                has_age_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{has_age_URI}"), rdfs.label)
                #print(has_age_name)
                for has_age_name in g.objects(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{has_age_URI}"), rdfs.label):
                        age_list_names.append(str(has_age_name.title()))


            if age_list_names:
                all_names_age = ", ".join(age_list_names)
                id_counter += 1
                questions.append({
                    "id": id_counter,   
                    "question": f"Qual é a idade geológica de {formation_name}?",
                    "answer": age_list_names,
                    "context": f"A idade geológica de {formation_name} é {all_names_age.title()}."
                })
                id_counter += 1
                questions.append({
                    "id": id_counter,
                    "question": f"Qual é a idade geológica de {URI.title()}?",
                    "answer": age_list_names,
                    "context": f"A idade geológica de {URI} é {all_names_age.title()}."
                })
                
# ******** ******** ******** ******** ******** ******** ******** 
           
    for formacoes, info in formations.items():
        if info["carrier_of"]:
            carrier_of_names = []
            
            formation_name = info["labels"][0] if info["labels"] else formacoes

            for carrier_of in info["carrier_of"]:
                carrier_of_URI = carrier_of.split("#")[1]
                # Collect every label of the structure; a structure without labels adds nothing
                for carrier_of_name in g.objects(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{carrier_of_URI}"), rdfs.label):
                    carrier_of_names.append(str(carrier_of_name.lower()))

            if carrier_of_names:
                all_names_carrier_of = ", ".join(carrier_of_names)
                id_counter += 1
                questions.append({
                    "id": id_counter,
                    "question": f"Que ESTRUTURAS GEOLÓGICAS são apresentadas pela {formation_name}?",
                    "answer": carrier_of_names,
                    "context": f"A {formation_name} apresenta as seguintes ESTRUTURAS GEOLÓGICAS: {all_names_carrier_of.lower()}."
                })
                
# ******** ******** ******** ******** ******** ******** ********     
    query = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ont: <http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#>

    SELECT ?field (COUNT(?well) AS ?pozos) (GROUP_CONCAT(?wellName; separator=", ") AS ?wellNames) (GROUP_CONCAT(?well; separator=", ") AS ?wellURIs)
    WHERE {
        ?well rdf:type ont:well .
        ?field rdf:type ont:field .
        ?well ont:located_in ?field .
        {
            SELECT ?well (SAMPLE(?name) AS ?wellName)
            WHERE {
            ?well rdfs:label ?name .
            }
            GROUP BY ?well
        }
    }
    GROUP BY ?field
    """

    results = g.query(query)
    fields_data = []   
    for row in results:
        campo_uri = row.field
        fieldURI = campo_uri.split("#")[1]

        # Number of wells in the field, as a plain Python int
        count = row.pozos.toPython()
        campo_name = g.value(campo_uri, rdfs.label)

        # GROUP_CONCAT returns one string; split it back into a list of well names
        wellNames = str(row.wellNames).split(", ")
        field_data = {
            "field": fieldURI,
            "pozos": count,
            "wellNames": wellNames
        }
        fields_data.append(field_data)

        # print(f"Field: {field_data['field']}, Pozos: {field_data['pozos']}, Well Names: {field_data['wellNames']}")
        id_counter += 1         
        questions.append({
            "id": id_counter,
            "question": f"Quantos POCOS estão localizados no CAMPO {campo_name}?",
            "answer": str(count),
            "context": f"No campo {campo_name} estão localizados no total {count} poços."
        })
        id_counter += 1         
        questions.append({
            "id": id_counter,
            "question": f"Quantos POCOS estão localizados no CAMPO {fieldURI}?",
            "answer": str(count),
            "context": f"No campo {fieldURI} estão localizados no total {count} poços."
        })
        id_counter += 1         
        questions.append({
            "id": id_counter,
            "question": f"Quais POCOS estão localizados no CAMPO {campo_name}?",
            "answer": wellNames,
            "context": f"No campo {campo_name} estão localizados os poços: {', '.join(wellNames)}."
        })


# ******** ******** ******** ******** ******** ******** ********   
    query2 = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ont: <http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#>

    SELECT DISTINCT ?basin
    WHERE {
        ?basin rdf:type ont:basin .
        ?lithostratigraphic_unit rdf:type ont:lithostratigraphic_unit .
        ?lithostratigraphic_unit ont:_id "formacao_320" .
        ?well ont:crosses ?lithostratigraphic_unit .
        ?well ont:located_in ?basin .
     
        }
    """
    results2 = g.query(query2)

    for row in results2:
        basin_uri = row.basin
        basin_name = g.value(basin_uri, rdfs.label)
        print(basin_name)
        id_counter += 1         
        questions.append({
            "id": id_counter,
            "question": f"Em que BACIA o poço atravessa a formação 320?",
            "answer": str(basin_name),
            "context": f"O poço atravessa a formação 320 na bacia {basin_name}."
        })

# ************ ********** ******* *********** ******* **********
    count_campos = {}
    campos_por_bacia = {}
    campos_por_bacia_URI = {}
    for campo, info in fields.items():
        if info["located_in"]:
            
            location = info["located_in"][0].split("#")[1]
            bacia_URI = location
            location_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{location}"), rdfs.label).title()
            campo_nome = info["labels"][0].title() if info["labels"] else campo
            if location_name not in count_campos:
                count_campos[location_name] = 0
            count_campos[location_name] += 1

            if location_name not in campos_por_bacia:
                campos_por_bacia[location_name] = []
            campos_por_bacia[location_name].append(campo_nome)

            if location not in campos_por_bacia_URI:
                campos_por_bacia_URI[location] = []
            campos_por_bacia_URI[location].append(campo)

                  
    for bacia, count in count_campos.items():
        id_counter += 1         
        questions.append({
            "id": id_counter,
            "question": f"Quantos campos estão localizados na bacia {bacia}?",
            "answer": str(count),
            "context": f"Existem {count} campos localizados na bacia {bacia}."
        })
        
    for bacia, campos in campos_por_bacia.items():
        campos_list_ = ", ".join(campos)
        campos_list = [str(campo) for campo in campos]
        id_counter += 1         
        questions.append({
            "id": id_counter,
            "question": f"Quais CAMPOS estão localizados na bacia {bacia}?",
            "answer": campos_list,
            "context": f"Os seguintes campos estão localizados na bacia {bacia}: {campos_list_}."
        })
    
    for bacia_URI, campos in campos_por_bacia_URI.items():
        
        camposURI_list_ = ", ".join(campos)
        camposURI_list = [str(campo) for campo in campos]
        id_counter += 1         
        questions.append({
            "id": id_counter,
            "question": f"Quais CAMPOS estão localizados na bacia {bacia_URI}?",
            "answer": camposURI_list,
            "context": f"Os seguintes campos estão localizados na bacia {bacia_URI}: {camposURI_list_}."
        })


 # ******** **************** ************ ********** **********   
    count_wells = {}
    for well, info in wells.items():
        if info["located_in"] :
            location = info["located_in"][0].split("#")[1]
            if location.startswith('BASE_CD_BACIA'): 
                location_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{location}"), rdfs.label).title()
                #print(f"Location well: {location}")
                if location_name not in count_wells:
                    count_wells[location_name] = 0

                count_wells[location_name] += 1
            
    if count_wells:
        for bacia, count in count_wells.items():
            id_counter += 1
            questions.append({
                "id": id_counter,
                "question": f"Quantos poços estão localizados na bacia {bacia}?",
                "answer": str(count),
                "context": f"Existem {count} poços localizados na bacia {bacia}."
            })

 # ******** **************** ************ ********** **********  
    
    for well_URI, info in wells.items():
        litog_names = []
        litog_URIs = []
        if info["crosses"]:
            well_name = info["labels"][0] if info["labels"] else well_URI  # Get the well name

            for cross in info["crosses"]:
                litog_URI = cross.split("#")[1]
                litog_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{litog_URI}"), rdfs.label)

                if litog_name: litog_names.append(str(litog_name))

                if litog_URI:  litog_URIs.append(str(litog_URI))

            if litog_names:
                all_litog_names = ", ".join(litog_names)
                id_counter += 1
                questions.append({
                    "id": id_counter,
                    "question": f"Que UNIDADES LITOESTRATIGRAFICAS o poco {well_name} atravessa?",
                    "answer": litog_names,
                    "context": f"O poço {well_name} atravessa as seguintes unidades litoestratigráficas: {all_litog_names}."
                })

            if litog_URIs:
                all_litog_URIs = ", ".join(litog_URIs)
                id_counter += 1
                questions.append({
                    "id": id_counter,
                    "question": f"Que UNIDADES LITOESTRATIGRAFICAS o poco {well_URI} atravessa?",
                    "answer": litog_URIs,
                    "context": f"O poço {well_URI} atravessa as seguintes unidades litoestratigráficas: {all_litog_URIs}."
                })

# ******** ******** ******** ******** ******** ******** ******* 
# ********* materials_to_formations[material_name] ************
# ******** ******** ******** ******** ******** ******** ******* 

    materials_to_formations = {}
    for formacoes, info in formations.items():
        
        if info["constituted_by"]:
            
            formation_name = info["labels"][0] if info["labels"] else formacoes

            for material in info["constituted_by"]:
                material_id = material.split("#")[1]
                material_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{material_id}"), rdfs.label)
                material_name = str(material_name)
                
                if material_name not in materials_to_formations:
                        materials_to_formations[material_name] = []

                materials_to_formations[material_name].append(formation_name)
            
                
    for material_name, formation_name in materials_to_formations.items():

        all_forma_names = ", ".join(formation_name)
        formations_list = [str(formacao) for formacao in formation_name]
        id_counter += 1
        
        questions.append({
            "id": id_counter,
            "question": f"Que UNIDADES LITOESTRATIGRÁFICAS são constituídas por {material_name}?",
            "answer": formations_list,
            "context": f"As unidades litoestratigráficas constituídas por {material_name} são: {all_forma_names}."
        })
# ******** ******** ******** *********** ******* ******** ******** ********             
# ******** ******** ******** MULTI - HOP ******* ******** ******** ******** 
# QUESTION 1 - MULTI-HOP
# Q: Which LITHOSTRATIGRAPHIC UNITS crossed by the WELL are made of ROCKS of the EARTH MATERIAL?

    query3 = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ont: <http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#>
                 

    SELECT ?well (GROUP_CONCAT(DISTINCT ?lithostratigraphic_unit; separator=", ") AS ?lithostratigraphic_units)
    
    WHERE {
            ?well rdf:type ont:well .
            ?dolomite rdf:type ont:dolomite .
            ?lithostratigraphic_unit rdf:type ont:lithostratigraphic_unit .
            
            ?well ont:crosses ?lithostratigraphic_unit .
            ?lithostratigraphic_unit ont:constituted_by ?dolomite .
        
            }
    
    GROUP BY ?well

    
    """ 
       

    results3 = g.query(query3)
    
    
    for row in results3:
        lithostratigraphic_units = row.lithostratigraphic_units

        well = row.well
        well_URI = well.split("#")[1]
        well_name = g.value(well, rdfs.label)

        lithostratigraphic_units_names = []
        for lithostratigraphic_unit in lithostratigraphic_units.split(", "):
            # GROUP_CONCAT yields full URIs, so look each one up directly
            # instead of appending it to the ontology prefix again
            lithostratigraphic_unit_name = g.value(URIRef(lithostratigraphic_unit), rdfs.label)

            if lithostratigraphic_unit_name is not None:
                lithostratigraphic_units_names.append(str(lithostratigraphic_unit_name))
            else:
                # No label available: fall back to the fragment after '#'
                lithostratigraphic_units_names.append(lithostratigraphic_unit.split("#")[1])
        
        if lithostratigraphic_units_names:
            lithostratigraphic_units_str = ", ".join(lithostratigraphic_units_names)

            #print(f"Lithostratigraphic Unit: {lithostratigraphic_units_str}, Well: {well_URI}")
            id_counter += 1
            
            questions.append({
                "id": id_counter,
                "question": f"Que UNIDADES LITOESTRATIGRÁFICAS o poco {well_name} / {well_URI} atravessa que são constituídas por ROCHAS do tipo dolomito?",
                "answer": lithostratigraphic_units_names,
                "context": f"O poço {well_URI} atravessa as seguintes unidades litoestratigráficas que são constituídas por rochas do tipo dolomito: {lithostratigraphic_units_str}."
            })

# ******** ******** MULTI - HOP ******** ******** ******** ******** ********
# QUESTION 2 - MULTI-HOP
# Q: Which LITHOSTRATIGRAPHIC UNITS crossed by the WELL are made of FLUID of type dry gas?

    materials_to_formations = {}
    
    for formacoes, info in formations.items():
        
        if info["constituted_by"]:
            formation_name = info["labels"][0] if info["labels"] else formacoes
            for material in info["constituted_by"]:
                material_id = material.split("#")[1]
                material_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{material_id}"), rdfs.label)
                material_name = str(material_name)
                if material_name not in materials_to_formations:
                        materials_to_formations[material_name] = []
                materials_to_formations[material_name].append(formation_name)

                #print(f"Material: {material_name}, Formação: {formation_name}")
    # Second part of the question

    for well_URI, well_info in wells.items():
        if well_info["crosses"]:
            well_name = well_info["labels"][0] if well_info["labels"] else well_URI  # Get the well name

            for material_name, formation_name in materials_to_formations.items():
                crossed_formations = []
                for cross in well_info["crosses"]:
                    litog_URI = cross.split("#")[1]
                    litog_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{litog_URI}"), rdfs.label)
                    litog_name = str(litog_name) if litog_name else None
                    #print(f"Litog: {litog_name}, Material: {material_name}, Formação: {formation_name}, poço: {well_name}")
                    if litog_name in formation_name:
                        crossed_formations.append(litog_name)
                    
                if crossed_formations:
                    all_crossed_formations = ", ".join(crossed_formations)
                    id_counter += 1
                                            
                    questions.append({
                        "id": id_counter,
                        "question": f"Que UNIDADES LITOESTRATIGRÁFICAS o poço {well_name} / {well_URI} atravessa que são constituídas por MATERIAL do tipo {material_name}?",
                        "answer": crossed_formations, #materials_to_formations[material_name],
                        "context": f"O poço {well_name} atravessa as seguintes unidades litoestratigráficas constituídas pelo MATERIAL do tipo {material_name}: {all_crossed_formations}."
                    })
 
# ******** ******** MULTI - HOP ******** ******** ******** ******** ********
# QUESTION 3 - MULTI-HOP
# Q: What is the GEOLOGICAL AGE of the LITHOSTRATIGRAPHIC UNITS made of FLUID of type dry gas / dry_gas?
    # first part: materials_to_formations
    # second part: the "has_age" entries of formations

    for material_name, formations_list in materials_to_formations.items():
        combined_formations = []
        combined_ages = set()
        for formation in formations_list:

            for key in formations.keys():
                if formation in formations[key]["labels"]:
                    formation_info = formations[key]
                    if formation_info["has_age"]:

                        for has_age in formation_info["has_age"]:
                            has_age_name = g.value(URIRef(has_age), rdfs.label)
                            if has_age_name is None:
                                continue  # skip ages without a label

                            # Store plain strings so that membership tests and
                            # de-duplication do not depend on rdflib Literal comparison
                            if str(formation) not in combined_formations:
                                combined_formations.append(str(formation))
                            combined_ages.add(str(has_age_name))
        # print(f"Material: {material_name}, Formação: {all_formations}, Idades: {all_ages}")
        if combined_formations and combined_ages:
            all_formations = ", ".join(combined_formations)
            all_ages = ", ".join(sorted(combined_ages))
            combined_ages_list = list(sorted(combined_ages))
            id_counter += 1
            questions.append({
                "id": id_counter,
                "question": f"Que IDADE GEOLOGICA das UNIDADES LITOESTRATIGRAFICAS constituídas por MATERIAL do tipo {material_name}?",
                "answer": combined_ages_list,
                "context": f"A idade geológica das formações {all_formations} constituídas por material do tipo {material_name} é {all_ages}."
            })
   
# *********************************************************************************** 
# ******** ******** ******** MULTI - HOP ******** ******** ******** ******** ********
# QUESTION 4 - MULTI-HOP
    count_pozos_per_material = {material: 0 for material in materials_to_formations}

    for material, formations_list in materials_to_formations.items():
        for well_URI, well_info in wells.items():
            if well_info["crosses"]:
                for cross in well_info["crosses"]:
                    lithostratigraphic_unit = cross.split("#")[1]
                    lithog_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{lithostratigraphic_unit}"), rdfs.label)
                    lithog_name = str(lithog_name) if lithog_name else None

                    if lithog_name and lithog_name in formations_list:
                        # Count each well once, even if it crosses several matching units
                        count_pozos_per_material[material] += 1
                        break

        #print(f" Material: {material}, Num. Pozos: {count_pozos_per_material[material]}")

    for material, count in count_pozos_per_material.items():
        if count > 0:
            # Increment only when a question is emitted, so IDs stay consecutive
            id_counter += 1
            questions.append({
                "id": id_counter,
                "question": f"Quantos POÇOS atravessam UNIDADES LITOESTRATIGRÁFICAS constituídas por MATERIAL do tipo {material}?",
                "answer": str(count),
                "context": f"Total de poços que atravessam unidades litoestratigráficas constituídas por material do tipo {material}: {count}"
            })

# ******** ******** MULTI - HOP ******** ******** ******** ******** ********
# QUESTION 5 - MULTI-HOP
#  ***************************************************************************************
# Q: In which BASINS are the LITHOSTRATIGRAPHIC UNITS made of FLUID of type dry gas / dry_gas?
    questions_by_material = {}
    for bacia_URI, info in basins.items():
        bacia_name_ = info["labels"][0] if info["labels"] else bacia_URI
        bacia_name = str(bacia_name_.title())
        for material_name, formations_list in materials_to_formations.items():
            if material_name not in questions_by_material:
                questions_by_material[material_name] = []

            for formation in formations_list:
                for key in formations.keys():
                    if formation in formations[key]["labels"]:
                        formation_info = formations[key]
                        if formation_info["located_in"]:
                            location = formation_info["located_in"][0].split("#")[1]
                            if location == bacia_URI and bacia_name not in questions_by_material[material_name]:
                                                        
                                questions_by_material[material_name].append(bacia_name)

    for material_name, bacias in questions_by_material.items():
        if not bacias:
            continue  # no basin found for this material: do not emit a question with an empty answer
        id_counter += 1
        bacias_list = ", ".join(bacias)
        #print(f"Material: {material_name}, Bacias: {bacias_list}")
        questions.append({
            "id": id_counter,
            "question": f"Em quais BACIAS estão as UNIDADES LITOESTRATIGRÁFICAS constituídas por MATERIAL do tipo {material_name}?",
            "answer": bacias,
            "context": f"As unidades litoestratigráficas constituídas por material do tipo {material_name} estão localizadas na {bacias_list}."
        })
        
# ******** ******** MULTI - HOP ******** ******** ******** ******** ******** ************ 
# QUESTION 6 - MULTI-HOP
#  ***************************************************************************************
# Q: In which FIELDS are the LITHOSTRATIGRAPHIC UNITS made of FLUID of type dry gas / dry_gas?
    material_to_fields = {}
    # Associate the formations with the fields
    for well_URI, well_info in wells.items():
        if well_info["crosses"]:
            for material_name, formation_name_list in materials_to_formations.items():
                for cross in well_info["crosses"]:
                    litog_URI = cross.split("#")[1]
                    litog_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{litog_URI}"), rdfs.label)
                    litog_name = str(litog_name) if litog_name else None
                    
                    if litog_name in formation_name_list:
                        if "located_in" in well_info:
                            for field_URI in well_info["located_in"]:
                                field_id = field_URI.split("#")[1]
                                
                                if field_id.startswith("CAMP_CD_CAMPO"):
                                    field_info = fields.get(field_id, {})
                                    # Fall back to the identifier when the field is unknown or has no label
                                    field_name = field_info["labels"][0] if field_info.get("labels") else field_id

                                    # A set ignores duplicates, so no membership check is needed
                                    material_to_fields.setdefault(material_name, set()).add(field_name.title())
    
    # Generate the questions from the collected information
    for material_name, field_names in material_to_fields.items():
        id_counter += 1
        sorted_fields = sorted(field_names)  # sets are unordered; sort for reproducible output
        all_fields = ", ".join(sorted_fields)
        questions.append({
            "id": id_counter,
            "question": f"Em quais CAMPOS estão as UNIDADES LITOESTRATIGRÁFICAS constituídas por MATERIAL do tipo {material_name}?",
            "answer": sorted_fields,
            "context": f"As UNIDADES LITOESTRATIGRÁFICAS constituídas por {material_name} estão nos seguintes CAMPOS: {all_fields}."
        })
# ******** ******** ******** ******** ******** ******** ********    
# ******** ******** ******** ******** ******** ******** ********    
# Names of the geological structures

    for struc, info in geological_structures.items():
        if info["labels"]:
            all_names = []

            for struc_name in g.objects(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{struc}"), rdfs.label):
                all_names.append(str(struc_name))

            if struc.startswith('TEFR_CD_TIPO_EST_FISICA'):
                               
                all_names_ = ", ".join(all_names)
                id_counter += 1
                questions.append({
                    "id": id_counter,
                    "question": f"Qual é o nome da estrutura geologica {struc}?",
                    "answer": all_names,
                    "context": f"O nome da estrutura geologica {struc} é {all_names_}."
                })
            
 

    return questions
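
The multi-hop count of wells per material can also be expressed with sets, which makes "each well counted once" explicit. The following is a minimal sketch, not the notebook's implementation: `wells_demo` and `materials_demo` are made-up stand-ins for `wells` and `materials_to_formations`, and, as a simplifying assumption, `crosses` already holds formation labels rather than URIs.

```python
# Toy stand-ins (hypothetical data) for the notebook's dictionaries.
wells_demo = {
    "POCO_A": {"crosses": ["Formacao X", "Formacao Y"]},
    "POCO_B": {"crosses": ["Formacao X"]},
    "POCO_C": {"crosses": []},
}
materials_demo = {
    "dolomito": ["Formacao X", "Formacao Y"],
    "arenito": ["Formacao Z"],
}


def count_wells_per_material(wells, materials_to_formations):
    """Return {material: number of distinct wells crossing at least one of its formations}."""
    counts = {}
    for material, formation_names in materials_to_formations.items():
        names = set(formation_names)
        # A well belongs to the set once, however many matching formations it crosses.
        crossing = {w for w, info in wells.items() if names & set(info["crosses"])}
        counts[material] = len(crossing)
    return counts


well_counts = count_wells_per_material(wells_demo, materials_demo)
print(well_counts)
```

Here `POCO_A` crosses two dolomite formations but contributes a single count, which is the behavior the question "how many wells" calls for.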



In [213]:
def gerar_perguntas(g):
    questions = []
    id_counter = 0  # Question ID counter (incremented before each use, so IDs start at 1)

    # Iterate over each well
    for well_URI, well_info in wells.items():
        if well_info["crosses"]:
            well_name = well_info["labels"][0] if well_info["labels"] else well_URI  # Well name

            # Dictionary of crossed formations, keyed by material
            materials_to_crossed_formations = {}

            for cross in well_info["crosses"]:
                litog_URI = cross.split("#")[1]
                litog_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{litog_URI}"), rdfs.label)
                litog_name = str(litog_name) if litog_name else None

                if litog_name:
                    # Find the material associated with this lithostratigraphic unit
                    for formacoes, info in formations.items():
                        formation_name = info["labels"][0] if info["labels"] else formacoes
                        
                        if formation_name == litog_name and info["constituted_by"]:
                            for material in info["constituted_by"]:
                                material_id = material.split("#")[1]
                                material_name = g.value(URIRef(f"http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#{material_id}"), rdfs.label)
                                material_name = str(material_name)

                                # Add the formation to the corresponding material
                                if material_name not in materials_to_crossed_formations:
                                    materials_to_crossed_formations[material_name] = []

                                if litog_name not in materials_to_crossed_formations[material_name]:
                                    materials_to_crossed_formations[material_name].append(litog_name)
                                else:
                                    print(f"Warning: {litog_name} was already in the list for {material_name} in well {well_name}")

            # Generate questions for each material with its crossed formations
            for material_name, crossed_formations in materials_to_crossed_formations.items():
                all_crossed_formations = ", ".join(sorted(crossed_formations))
                id_counter += 1
                
                questions.append({
                    "id": id_counter,
                    "question": f"Que UNIDADES LITOESTRATIGRÁFICAS o poço {well_name} / {well_URI} atravessa que são constituídas por MATERIAL do tipo {material_name}?",
                    "answer": crossed_formations,
                    "context": f"O poço {well_name} atravessa as seguintes unidades litoestratigráficas constituídas pelo MATERIAL do tipo {material_name}: {all_crossed_formations}."
                })

    return questions
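
The code above extracts identifiers with `split("#")[1]`, which raises `IndexError` for any value that contains no `#`. A small helper can make that step tolerant. This is a sketch only; `local_name` is a hypothetical name that does not appear in the notebook.

```python
def local_name(uri) -> str:
    """Return the part of `uri` after the last '#', or the whole string if there is no '#'."""
    return str(uri).rsplit("#", 1)[-1]


example = local_name(
    "http://www.semanticweb.org/bg40/ontologies/2022/5/untitled-ontology-2#formacao_141"
)
print(example)
```

Because the argument is passed through `str()`, the helper accepts plain strings as well as `rdflib` terms such as `URIRef`.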


#### ✅ **Save the questions to a JSON file**

In [216]:
# Generate the question-answer pairs
dados_treinamento = gerar_perguntas(g)

with open('dataset_miniKGraph_09_agosto.json', 'w', encoding='utf-8') as file:
    json.dump(dados_treinamento, file, ensure_ascii=False, indent=4)
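
After saving, it can be useful to check the file before using it downstream. The sketch below assumes only what the code above shows, namely that each record carries the keys `id`, `question`, `answer` and `context`. The function `validate_dataset` and the sample records are hypothetical, and the demo writes to a temporary file rather than to the real dataset.

```python
import json
import os
import tempfile

REQUIRED_KEYS = {"id", "question", "answer", "context"}


def validate_dataset(path):
    """Load a question file and return a list of problems found (empty if none)."""
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)

    problems = []
    seen_ids = set()
    for pos, rec in enumerate(records):
        missing = REQUIRED_KEYS - set(rec)
        if missing:
            problems.append(f"record {pos}: missing keys {sorted(missing)}")
        if rec.get("id") in seen_ids:
            problems.append(f"record {pos}: duplicate id {rec.get('id')}")
        seen_ids.add(rec.get("id"))
    return problems


# Demo on made-up records: the second one lacks "context" and repeats id 1.
sample = [
    {"id": 1, "question": "Q1?", "answer": ["a"], "context": "c1"},
    {"id": 1, "question": "Q2?", "answer": "b"},
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.json")
    with open(demo_path, "w", encoding="utf-8") as fh:
        json.dump(sample, fh, ensure_ascii=False)
    demo_problems = validate_dataset(demo_path)

print(demo_problems)
```

To check the real output, the same function could be called with the path used in the `open(...)` call above.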