### ArmandoAI - Building Neo4j knowledge graphs for hybrid-search chat

#### Install Python packages

In [None]:
!pip install langchain-neo4j python-dotenv neo4j neo4j-graphrag ollama pandas yfiles-jupyter-graphs yfiles-jupyter-graphs-for-neo4j

#### Have Neo4j installed locally or via Docker, configured with the APOC and Graph Data Science plugins
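
One way to confirm that both plugins are loaded is to call their version functions from Cypher. The sketch below assumes that `apoc.version()` and `gds.version()` are callable once the plugins are installed; `check_plugins` and its `run_query` argument are hypothetical names introduced here for illustration and are not part of the notebook.

```python
# Cypher version functions exposed by each plugin once it is loaded.
PLUGIN_CHECKS = {
    "APOC": "RETURN apoc.version() AS version",
    "Graph Data Science": "RETURN gds.version() AS version",
}


def check_plugins(run_query):
    """Return {plugin name: version string, or None if the call failed}.

    `run_query` is any callable that takes a Cypher string and returns the
    single `version` value (hypothetical helper supplied by the caller).
    """
    report = {}
    for name, cypher in PLUGIN_CHECKS.items():
        try:
            report[name] = run_query(cypher)
        except Exception:  # e.g. "unknown function" when the plugin is missing
            report[name] = None
    return report
```

With the official driver, `run_query` could be something like `lambda q: driver.execute_query(q).records[0]["version"]`; a `None` entry in the report suggests that the corresponding plugin is not available.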

#### Using Ollama version 0.13.2 for compatibility with the NVIDIA Nemotron models

Linux install:

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.13.2 sh
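
To check which Ollama version ended up installed, the output of `ollama --version` can be parsed from Python. This is a minimal sketch: the helper names are hypothetical, and the exact wording of the CLI output (assumed here to look like `ollama version is 0.13.2`) may differ, which is why the regex only looks for the first `x.y.z` pattern.

```python
import re
import subprocess


def parse_ollama_version(text):
    """Extract the first x.y.z version found in `text` as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_ollama_version():
    """Run `ollama --version` and parse its output (None if not installed)."""
    try:
        out = subprocess.run(
            ["ollama", "--version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    return parse_ollama_version(out.stdout + out.stderr)
```

Comparing `installed_ollama_version()` with `(0, 13, 2)` then tells whether the pinned version is the one in use.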

### Models tested:

gemma3:12b

hf.co/bartowski/nvidia_NVIDIA-Nemotron-Nano-12B-v2-GGUF:Q4_K_M

rwxproject/nemotron-nano-9b-v2-q4_k_m

llama3.1:8b

DavidLanz-text2cypher-gemma-2-9b-it-finetuned-2024v1-Q4_0

qwen2.5-coder:14b



#### Text to analyze

In [1]:
einstein_text = """
Albert Einstein[a] (14 March 1879 – 18 April 1955) was a German-born theoretical physicist who is best known for developing the theory of relativity. Einstein also made important contributions to quantum mechanics.[1][5] His mass–energy equivalence formula E = mc2, which arises from special relativity, has been called "the world's most famous equation".[6] He received the 1921 Nobel Prize in Physics for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect.[7]

Born in the German Empire, Einstein moved to Switzerland in 1895, forsaking his German citizenship (as a subject of the Kingdom of Württemberg)[note 1] the following year. In 1897, at the age of seventeen, he enrolled in the mathematics and physics teaching diploma program at the Swiss federal polytechnic school in Zurich, graduating in 1900. He acquired Swiss citizenship a year later, which he kept for the rest of his life, and afterwards secured a permanent position at the Swiss Patent Office in Bern. In 1905, he submitted a successful PhD dissertation to the University of Zurich. In 1914, he moved to Berlin to join the Prussian Academy of Sciences and the Humboldt University of Berlin, becoming director of the Kaiser Wilhelm Institute for Physics in 1917; he also became a German citizen again, this time as a subject of the Kingdom of Prussia.[note 1] In 1933, while Einstein was visiting the United States, Adolf Hitler came to power in Germany. Horrified by the Nazi persecution of his fellow Jews,[8] he decided to remain in the US, and was granted American citizenship in 1940.[9] On the eve of World War II, he endorsed a letter to President Franklin D. Roosevelt alerting him to the potential German nuclear weapons program and recommending that the US begin similar research.

In 1905, sometimes described as his annus mirabilis (miracle year), he published four groundbreaking papers.[10] In them, he outlined a theory of the photoelectric effect, explained Brownian motion, introduced his special theory of relativity, and demonstrated that if the special theory is correct, mass and energy are equivalent to each other. In 1915, he proposed a general theory of relativity that extended his system of mechanics to incorporate gravitation. A cosmological paper that he published the following year laid out the implications of general relativity for the modeling of the structure and evolution of the universe as a whole.[11][12] In 1917, Einstein wrote a paper which introduced the concepts of spontaneous emission and stimulated emission, the latter of which is the core mechanism behind the laser and maser, and which contained a trove of information that would be beneficial to developments in physics later on, such as quantum electrodynamics and quantum optics.[13]

In the middle part of his career, Einstein made important contributions to statistical mechanics and quantum theory. Especially notable was his work on the quantum physics of radiation, in which light consists of particles, subsequently called photons. With physicist Satyendra Nath Bose, he laid the groundwork for Bose–Einstein statistics. For much of the last phase of his academic life, Einstein worked on two endeavors that ultimately proved unsuccessful. First, he advocated against quantum theory's introduction of fundamental randomness into science's picture of the world, objecting that God does not play dice.[14] Second, he attempted to devise a unified field theory by generalizing his geometric theory of gravitation to include electromagnetism. As a result, he became increasingly isolated from mainstream modern physics.
"""

In [22]:
import ollama
import json
import pandas as pd
import re


def extrair_triplas_ollama(text):
    # Define a JSON Schema so that Ollama constrains decoding with a grammar (GBNF).
    # This keeps the model from emitting </think> or any extra text around the JSON.
    json_schema = {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "subject": {"type": "string"},
                "predicate": {"type": "string"},
                "object": {"type": "string"}
            },
            "required": ["subject", "predicate", "object"]
        }
    }

    prompt = f"""
    Act as an Information Extraction and Ontology Engineering specialist. Your task is to convert the provided text into a list of semantic triples following the format: [Subject, Predicate, Object].

    Guidelines:

    Atomic: Each triple must represent a single fact.

    Normalization: Use verbs in the infinitive or clear relations for the predicates (e.g., "born_in", "developed", "known_for").

    Entities: Ensure that the subject and object are clear entities or concepts extracted from the text.
    
        Example Input Text:

    "Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist who is best known for developing the theory of relativity. Einstein also made important contributions to quantum mechanics. His mass–energy equivalence formula E = mc2, which arises from special relativity, has been called 'the world's most famous equation'." Expected Output (Example):

    [Albert Einstein, date_of_birth, 14 March 1879]

    [Albert Einstein, nationality, German]

    [Albert Einstein, profession, theoretical physicist]

    [Albert Einstein, developed, Theory of Relativity]

    [Albert Einstein, contributed_to, Quantum Mechanics]

    [Albert Einstein, move_to, Switzerland in 1895]
    
    Rules:

    1. Replace pronouns with the entity name.

    2. Return the result in JSON format: [{{"subject": "...", "predicate": "...", "object": "..."}}]

    Extract semantic triples from the text below in JSON format.

       
    Respond ONLY with the JSON array, without explanations, introductions, or markdown.

    TEXT: {text}

    Try to extract as much information as possible, even if it seems trivial. Be detailed and identify specific relationships. Consider the context to determine the correct meaning of the relationships.   
    
    Include dates if available.    
    """

    response = ollama.chat(
        model='gemma3:12b',
        # Passing the schema as `format` is what enables structured output
        format=json_schema,
        messages=[
            {
                "role": "system",
                "content": "/no_think"  # Specific directive for Nemotron-Nano-V2 to disable reasoning
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
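        options={
            # Intended to disable reasoning traces. Note: in the ollama Python
            # client `think` is a keyword argument of chat(), not a model
            # option, so placed here it may simply be ignored.
            "think": False
        }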
    )
    conteudo = response['message']['content']
    # Clean the response so that only the JSON remains
    try:
        # Look for a JSON list pattern [ {...} ]
        match = re.search(r'\[\s*\{.*\}\s*\]', conteudo, re.DOTALL)
        if match:
            json_str = match.group(0)
            return json.loads(json_str)
        else:
            print("Error: JSON not found in the response.")
            return []
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON: {e}")
        print(f"Problematic content: {conteudo}")
        return []


# Run
triplas = extrair_triplas_ollama(einstein_text)
df = pd.DataFrame(triplas)
df.to_csv('einstein_neo4j_ollama-gemma3-12b.csv', index=False)

print(f"Total facts: {len(df)}")
df.head(10)

Total facts: 97


| | subject | predicate | object |
|---|---|---|---|
| 0 | Albert Einstein | date_of_birth | 14 March 1879 |
| 1 | Albert Einstein | date_of_death | 18 April 1955 |
| 2 | Albert Einstein | nationality | German |
| 3 | Albert Einstein | profession | theoretical physicist |
| 4 | Albert Einstein | developed | Theory of Relativity |
| 5 | Albert Einstein | contributed_to | Quantum Mechanics |
| 6 | Albert Einstein | known_for | mass–energy equivalence formula |
| 7 | mass–energy equivalence formula | arises_from | special relativity |
| 8 | mass–energy equivalence formula | called | the world's most famous equation |
| 9 | Albert Einstein | received | 1921 Nobel Prize in Physics |
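
When `format` carries a JSON schema, the reply should already be plain JSON, so `json.loads` can be tried before any regex. The regex used in the cell also requires at least one `{...}` object, so an empty array `[]` would be reported as "JSON not found". The sketch below is one possible parser that tries `json.loads` first and drops malformed items; `parse_triples` and `REQUIRED_KEYS` are hypothetical names, not part of the notebook.

```python
import json
import re

REQUIRED_KEYS = ("subject", "predicate", "object")


def parse_triples(content):
    """Parse a model reply into a list of clean triple dicts."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        # Fallback: pull out the outermost [...] span, if there is one.
        match = re.search(r"\[.*\]", content, re.DOTALL)
        if match is None:
            return []
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            return []
    if not isinstance(data, list):
        return []
    triples = []
    for item in data:
        if not isinstance(item, dict):
            continue
        values = [item.get(key) for key in REQUIRED_KEYS]
        # Keep only items whose three fields are non-empty strings.
        if all(isinstance(v, str) and v.strip() for v in values):
            triples.append({k: v.strip() for k, v in zip(REQUIRED_KEYS, values)})
    return triples
```

The returned list can be passed to `pd.DataFrame` exactly like the output of `extrair_triplas_ollama`.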


To resolve pronoun and duplicate-name problems in pandas before generating embeddings or loading the data into Neo4j, three cleanup steps are needed: unify the name of the main entity, fill in missing objects, and shorten long predicates. The duplicates that the unification produces are dropped at the end.

In [23]:
import pandas as pd

# 1. Unify subjects
# Map the variants 'Einstein', 'He' and 'einstein' to 'Albert Einstein'
df['subject'] = df['subject'].replace(
    ['Einstein', 'He', 'einstein'], 'Albert Einstein')

# 2. Handle missing objects
# Treat '-' and empty strings as missing. pd.NA is used instead of None
# because some pandas versions interpret replace(..., None) as a forward fill.
df['object'] = df['object'].replace(['-', ''], pd.NA)
# Optional: fill missing objects with the last word of the predicate so the row is kept
df['object'] = df['object'].fillna(
    df['predicate'].apply(lambda x: x.split()[-1]))

# 3. Clean up predicates (turn phrases into short verbs)
# In Neo4j a relationship type reads best as a short verb (e.g. RECEIVED), not a long phrase.


def simplificar_predicado(texto):
    palavras = texto.lower().split()
    if not palavras:
        return texto  # Nothing to simplify
    # Return the first known verb found in the predicate
    # e.g. "received the 1921 Nobel..." -> "received"
    verbos_comuns = ['was', 'made', 'received', 'born', 'acquired', 'submitted',
                     'became', 'decided', 'endorsed', 'published', 'proposed', 'wrote']
    for p in palavras:
        if p in verbos_comuns:
            return p
    return palavras[0]  # Fallback: return the first word


df['predicate'] = df['predicate'].apply(simplificar_predicado)

# 4. Drop the duplicates created by the unification
df = df.drop_duplicates().reset_index(drop=True)

df

| | subject | predicate | object |
|---|---|---|---|
| 0 | Albert Einstein | date_of_birth | 14 March 1879 |
| 1 | Albert Einstein | date_of_death | 18 April 1955 |
| 2 | Albert Einstein | nationality | German |
| 3 | Albert Einstein | profession | theoretical physicist |
| 4 | Albert Einstein | developed | Theory of Relativity |
| ... | ... | ... | ... |
| 91 | Albert Einstein | advocating | against quantum theory's introduction |
| 92 | God | does | not play dice |
| 93 | Albert Einstein | attempting_to_devise | a unified field theory |
| 94 | Albert Einstein | generalizing | his geometric theory of gravitation |
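
The cleaned predicates are still lowercase (`date_of_birth`, `attempting_to_devise`), while Neo4j relationship types are conventionally written in upper snake case. A minimal sketch of that normalization follows; `to_relationship_type` and the `RELATED_TO` fallback are hypothetical names chosen for illustration, and the function deliberately keeps only ASCII letters and digits.

```python
import re


def to_relationship_type(predicate):
    """Turn a free-text predicate into an UPPER_SNAKE_CASE type name."""
    # Replace every run of characters other than ASCII letters/digits
    # with a single underscore, then trim underscores at both ends.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", predicate).strip("_")
    return cleaned.upper() or "RELATED_TO"  # hypothetical fallback name
```

It could be applied with `df['predicate'].apply(to_relationship_type)` before the import, if upper-case types are preferred in the graph.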


### Connecting to and clearing the database

This cell connects to the database and removes the old data to guarantee a clean import.


In [24]:
from neo4j import GraphDatabase

# Connection settings (adjust to your environment)
URI = "bolt://localhost:7687"
AUTH = ("neo4j", "password")


def limpar_e_importar(df):
    # The context managers close the session and the driver even if a query fails
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        with driver.session() as session:
            # 1. Clear the database
            print("Clearing database...")
            session.run("MATCH (n) DETACH DELETE n")

            # 2. Create constraints (guarantee uniqueness)
            session.run(
                "CREATE CONSTRAINT IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE")
            session.run(
                "CREATE CONSTRAINT IF NOT EXISTS FOR (c:Concept) REQUIRE c.name IS UNIQUE")

            # 3. Import the DataFrame rows
            print("Importing new triples...")
            for _, row in df.iterrows():
                # Every subject becomes a :Person and every object a :Concept;
                # APOC creates the relationship with a dynamic type
                query = """
                MERGE (s:Person {name: $subject})
                MERGE (o:Concept {name: $object})
                WITH s, o
                CALL apoc.create.relationship(s, $predicate, {year: $year}, o) YIELD rel
                RETURN rel
                """
                session.run(query,
                            subject=row['subject'],
                            predicate=row['predicate'],
                            object=row['object'],
                            year=row.get('year'))  # .get returns None when there is no 'year' column
    print("Done!")


# Run the function with the cleaned DataFrame
limpar_e_importar(df)

Clearing database...
Importing new triples...
Done!


Why use APOC in this cell?
The code above calls apoc.create.relationship because Cypher's MERGE does not accept a plain parameter as the relationship type (e.g. [:$predicate]). The APOC library (available as a plugin in Neo4j Desktop and, as a core subset, in Aura) works around this, so each row of the CSV can create a different relationship type (RECEIVED, BORN_IN, etc.).
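
The import above sends one query per row. For larger DataFrames, a common alternative is to send rows in batches with `UNWIND`, still relying on `apoc.create.relationship` for the dynamic type. The following is a sketch under stated assumptions: `BATCH_QUERY`, `chunked`, `import_in_batches` and the batch size of 500 are hypothetical names and values, and `run_query` stands for any callable that accepts the Cypher text plus a `rows` parameter.

```python
# Cypher that processes a whole list of rows in a single query.
BATCH_QUERY = """
UNWIND $rows AS row
MERGE (s:Person {name: row.subject})
MERGE (o:Concept {name: row.object})
WITH s, o, row
CALL apoc.create.relationship(s, row.predicate, {}, o) YIELD rel
RETURN count(rel) AS created
"""


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def import_in_batches(run_query, rows, size=500):
    """Send `rows` (a list of dicts) in batches; return the number of batches."""
    batches = 0
    for batch in chunked(rows, size):
        run_query(BATCH_QUERY, rows=batch)
        batches += 1
    return batches
```

Inside the session of `limpar_e_importar`, this could be called as `import_in_batches(session.run, df.to_dict("records"))`.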

Querying the graph back into pandas

After the import, the result can be validated in a new cell:

In [25]:
def consultar_grafo(query):
    driver = GraphDatabase.driver(URI, auth=AUTH)
    df_resultado = driver.execute_query(
        query, result_transformer_=lambda r: r.to_df())
    driver.close()
    return df_resultado


# Example: list all of Einstein's relationships
query_teste = "MATCH (p:Person {name: 'Albert Einstein'})-[r]->(c) RETURN p.name, type(r), c.name"
df_conferencia = consultar_grafo(query_teste)
df_conferencia.head()

| | p.name | type(r) | c.name |
|---|---|---|---|
| 0 | Albert Einstein | date_of_birth | 14 March 1879 |
| 1 | Albert Einstein | date_of_death | 18 April 1955 |
| 2 | Albert Einstein | nationality | German |
| 3 | Albert Einstein | profession | theoretical physicist |
| 4 | Albert Einstein | developed | Theory of Relativity |
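
Beyond eyeballing the first rows, the round trip can be checked by comparing the triples that were sent with the triples read back. The sketch below works on plain tuples; `compare_triples` is a hypothetical helper, not part of the notebook.

```python
def compare_triples(expected, actual):
    """Compare two iterables of (subject, predicate, object) tuples."""
    expected_set, actual_set = set(expected), set(actual)
    return {
        # Triples that were sent but are not in the graph result.
        "missing": sorted(expected_set - actual_set),
        # Triples found in the graph result that were never sent.
        "unexpected": sorted(actual_set - expected_set),
    }
```

Assuming a query that returns every subject, relationship type and object (not only Einstein's), both DataFrames can be turned into tuples with `itertuples(index=False, name=None)` and passed to this function; two empty lists suggest that the import preserved all triples.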


In [26]:
from IPython.display import display, HTML
import os
from dotenv import load_dotenv
from yfiles_jupyter_graphs_for_neo4j import Neo4jGraphWidget
from neo4j import GraphDatabase
# load_dotenv(override=True)  # if using a .env file

# # Access the variables
# NEO4J_URI = os.getenv('NEO4J_URI')
# NEO4J_USERNAME = os.getenv('NEO4J_USERNAME')
# NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')

# NEO4J_URI      = "neo4j+ssc://demo.neo4jlabs.com"
# NEO4J_USERNAME = "movies"
# NEO4J_PASSWORD = "movies"
# driver = GraphDatabase.driver(uri=NEO4J_URI, auth=(
#     NEO4J_USERNAME, NEO4J_PASSWORD), database='neo4j')

driver = GraphDatabase.driver(URI, auth=AUTH, database='neo4j')
g = Neo4jGraphWidget(driver)

g.show_cypher("MATCH (s)-[r]->(t) RETURN s,r,t LIMIT 120")
# To save the output programmatically in some environments:
# rendering the widget produces an HTML object that can be captured.
# The most common way, however, is the Jupyter menu:
# File -> Save and Export As -> HTML

GraphWidget(layout=Layout(height='800px', width='100%'))