<a href="https://colab.research.google.com/github/mdrk300902/demo-repo/blob/main/knowledge_graph_qa_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# **Hybrid Knowledge Retrieval and Question Answering System using LangChain, Neo4j, and Flan-T5**

# Introduction

This project demonstrates how to build a knowledge retrieval and question answering system using LangChain, the Neo4j graph database, and Google's Flan-T5 large language model (LLM).

The system loads content from Wikipedia related to "Large language model," processes and splits the documents, and transforms them into a knowledge graph stored in Neo4j. Using vector embeddings and full-text search on Neo4j, the system retrieves relevant information in response to user questions.

The Flan-T5 language model then combines and summarizes the retrieved content into concise natural-language answers, integrating structured (graph) and unstructured (text) knowledge.

This project showcases:

- Integration of language models with graph databases for enhanced information retrieval
- Techniques for document chunking and embedding to handle large data efficiently
- The use of advanced prompt engineering and token length management for large model inference

It serves as a foundation for building scalable, hybrid retrieval-augmented generation (RAG) applications that combine structured data querying with deep language understanding.


Install the necessary libraries


In [1]:
!pip install -q langchain langchain-neo4j langchain-experimental transformers sentencepiece neo4j wikipedia tiktoken json-repair langchain-huggingface sentence-transformers



Next, import all necessary libraries and modules used throughout the project:



In [2]:
import os
import logging
import warnings

# Hugging Face model pipeline and its LangChain wrappers
from transformers import pipeline
from transformers.utils import logging as transformers_logging
from langchain_huggingface import HuggingFacePipeline, HuggingFaceEmbeddings

# Document loading, splitting, and Neo4j integration
from langchain_community.document_loaders import WikipediaLoader
from langchain_text_splitters import TokenTextSplitter
from langchain_neo4j import Neo4jGraph, Neo4jVector
from langchain_experimental.graph_transformers import LLMGraphTransformer

from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field




Suppress warnings

In [3]:
warnings.filterwarnings("ignore")                        # Suppress all Python warnings
logging.getLogger().setLevel(logging.ERROR)              # Show only log messages at ERROR level or above
transformers_logging.set_verbosity_error()               # Suppress HuggingFace Transformers warnings
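
Blanket suppression also hides warnings that may matter, such as deprecation notices about imports. As a minimal sketch using only the standard library, the hypothetical helper `call_quietly` below ignores a single warning category for one call and reports everything else. The function names and messages are made up for illustration.

```python
import warnings


def call_quietly(func, category=DeprecationWarning):
    """Run func while ignoring one warning category.

    Returns the result and the messages of all other warnings raised.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")                       # record everything...
        warnings.filterwarnings("ignore", category=category)  # ...except this category
        result = func()
    return result, [str(w.message) for w in caught]


def noisy():
    # Hypothetical function that emits two kinds of warnings.
    warnings.warn("old API", DeprecationWarning)
    warnings.warn("check your input", UserWarning)
    return "result"


value, other_warnings = call_quietly(noisy)
print(value, other_warnings)
```

The filters are restored when the `with` block exits, so the rest of the notebook keeps its normal warning behaviour.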

The following environment variables hold the Neo4j connection details; the Neo4j client reads them later to establish a connection. Replace the placeholder values with your own credentials.



In [4]:
os.environ["NEO4J_URI"] = "neo4j_uri"
os.environ["NEO4J_USERNAME"] = "neo4j_username"
os.environ["NEO4J_PASSWORD"] = "neo4j_password"
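
Because the values above are placeholders, it can help to check them before connecting. The sketch below is one possible guard, not part of the original notebook: `missing_neo4j_settings` is a hypothetical helper, and the example URI is invented.

```python
import os

REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
# Placeholder values used in this notebook; real credentials must replace them.
PLACEHOLDERS = {"neo4j_uri", "neo4j_username", "neo4j_password"}


def missing_neo4j_settings(env=None):
    """Return the names of variables that are unset or still placeholders."""
    env = os.environ if env is None else env
    return [
        name for name in REQUIRED_VARS
        if not env.get(name) or env[name] in PLACEHOLDERS
    ]


# Example with an explicit mapping instead of the real environment.
example_env = {
    "NEO4J_URI": "neo4j+s://example.databases.neo4j.io",  # hypothetical URI
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "neo4j_password",  # still the placeholder
}
print(missing_neo4j_settings(example_env))
```

Calling `missing_neo4j_settings()` with no argument checks `os.environ`; a non-empty result means the connection step would fail or use dummy values.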

Set up the HuggingFace pipeline on GPU device 0 and load the embedding model


In [None]:
pipe = pipeline(
    "text2text-generation",
    model="google/flan-t5-large",
    tokenizer="google/flan-t5-large",
    device=0,  # Run on the first GPU (device index 0)
)
llm = HuggingFacePipeline(pipeline=pipe)

embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
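
The embedding model turns each text chunk into a numeric vector (384 dimensions for `all-MiniLM-L6-v2`), and vector search ranks chunks by how close their vectors are to the question's vector. As a rough illustration, the sketch below ranks two toy 3-dimensional vectors by cosine similarity. The vectors are made up, and whether a given index uses cosine similarity depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (values invented for illustration).
question_vec = [1.0, 0.0, 1.0]
chunk_vecs = {"chunk_a": [1.0, 0.0, 0.9], "chunk_b": [0.0, 1.0, 0.0]}

# Rank chunk names from most to least similar to the question.
ranked = sorted(
    chunk_vecs,
    key=lambda name: cosine_similarity(question_vec, chunk_vecs[name]),
    reverse=True,
)
print(ranked)
```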

Load the Wikipedia documents and split the first three into chunks of 256 tokens with a 50-token overlap

In [6]:
wiki_loader = WikipediaLoader(query="Large language model")
docs = wiki_loader.load()
text_splitter = TokenTextSplitter(chunk_size=256, chunk_overlap=50)
documents = text_splitter.split_documents(docs[:3])
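
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal sliding-window sketch over a plain list of tokens. It illustrates the idea only: `split_tokens` is a hypothetical helper, and `TokenTextSplitter` counts tokens with a tiktoken encoding, so its chunk boundaries (and its token counts compared with the Flan-T5 tokenizer) will differ.

```python
def split_tokens(tokens, chunk_size=256, chunk_overlap=50):
    """Split a token list into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Integers stand in for tokens so the window boundaries are easy to read.
tokens = list(range(600))
chunks = split_tokens(tokens)
print([(c[0], c[-1], len(c)) for c in chunks])
```

Each window starts 206 tokens after the previous one (256 minus 50), so the last 50 tokens of one chunk reappear at the start of the next. This keeps sentences that straddle a boundary visible in both chunks.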

Initialize Neo4j graph client

In [7]:
graph = Neo4jGraph(
    url=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
)

Create knowledge graph from documents

In [8]:
llm_transformer = LLMGraphTransformer(llm=llm)
graph_documents = llm_transformer.convert_to_graph_documents(documents)
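
Conceptually, each graph document pairs a source chunk with the nodes and relationships extracted from it. The sketch below uses simplified stand-in dataclasses (`Node`, `Relationship`, and `GraphDoc` are hypothetical names defined here, not the LangChain classes), and the single triple is hand-written for illustration rather than actual model output.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str


@dataclass
class GraphDoc:
    source_text: str
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


def to_triples(doc):
    """Flatten a graph document into (subject, relation, object) triples."""
    return [(r.source.id, r.type, r.target.id) for r in doc.relationships]


# Hand-written example of what an extraction step might produce for one chunk.
llm_node = Node("Large language model", "Concept")
nlp_node = Node("Natural language processing", "Field")
doc = GraphDoc(
    source_text="A large language model is designed for natural language processing tasks.",
    nodes=[llm_node, nlp_node],
    relationships=[Relationship(llm_node, nlp_node, "DESIGNED_FOR")],
)
print(to_triples(doc))
```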

Add graph documents to Neo4j

In [9]:
graph.add_graph_documents(graph_documents, include_source=True)

Set up Neo4jVector for hybrid similarity search, plus a full-text retriever for entities

In [10]:


vector_index = Neo4jVector.from_existing_graph(
    embedding=embedding_model,
    search_type="hybrid",
    node_label="Document",
    text_node_properties=["text"],
    embedding_node_property="embedding",
)

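class Entities(BaseModel):
    """Schema for extracted entities (not used by the placeholder extractor below)."""
    names: list[str] = Field(..., description="Extracted entities")

def simple_entity_extractor(text: str) -> list[str]:
    # Placeholder: ignores the input and always returns the same two entities.
    return ["Large language model", "AI"]

def structured_retriever(question: str) -> str:
    """Query the 'keyword' full-text index for each extracted entity."""
    entities = simple_entity_extractor(question)
    # Pass the entity as a query parameter instead of formatting it into the
    # Cypher string, so quotes in an entity name cannot break the query.
    query = "CALL db.index.fulltext.queryNodes('keyword', $entity, {limit: 5}) YIELD node RETURN node"
    result = ""
    for entity in entities:
        res = graph.query(query, params={"entity": entity})
        result += f"Data for entity '{entity}': {res}\n"
    return result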
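
Neo4j full-text indexes interpret the search string as Lucene query syntax, so characters such as `-`, `(`, `:` or `/` in an entity name can change the meaning of the search or cause a parse error. A minimal sketch of one way to guard against this follows. `escape_lucene` and `build_fulltext_query` are hypothetical helpers, and the index name `keyword` is taken from the query used in this notebook.

```python
import re

# Characters and operators with special meaning in Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_lucene(text):
    """Backslash-escape Lucene special characters so text is matched literally."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def build_fulltext_query(entity, limit=5):
    """Return a parameterized Cypher query and its parameters for one entity."""
    cypher = (
        "CALL db.index.fulltext.queryNodes('keyword', $query, {limit: $limit}) "
        "YIELD node, score RETURN node, score"
    )
    return cypher, {"query": escape_lucene(entity), "limit": limit}


cypher, params = build_fulltext_query("GPT-3 (2020)")
print(params["query"])
```

The returned pair could then be passed to a Neo4j client as the query text and its parameter map, which keeps user-supplied text out of the Cypher string itself.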



Build the prompt with a strict limit of 512 total tokens, the most that the free Colab T4 GPU supports in this setup

In [11]:

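def build_limited_prompt(context, question, tokenizer, max_total_tokens=512):
    prompt_template = """Answer concisely based on context:

{context}

Question:
{question}
"""
    # Count the tokens used by everything except the context.
    overhead = len(tokenizer.tokenize(prompt_template.format(context="", question=question)))
    # Truncate only the context: the question sits at the end of the template,
    # so cutting the full prompt from the end would remove the question first.
    budget = max(max_total_tokens - overhead, 0)
    context_tokens = tokenizer.tokenize(context)[:budget]
    truncated_context = tokenizer.convert_tokens_to_string(context_tokens)
    # The final count is approximate, since re-tokenizing the assembled prompt
    # can differ slightly from tokenizing its parts separately.
    return prompt_template.format(context=truncated_context, question=question)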

def retriever(question: str) -> str:
    # Combine full-text (structured) results with hybrid vector search results.
    structured = structured_retriever(question)
    unstructured_docs = vector_index.similarity_search(question)

    context = "\n".join(doc.page_content for doc in unstructured_docs)
    return f"Structured:\n{structured}\n\nUnstructured:\n{context}"

def answer_question(question: str) -> str:
    # Retrieve context, cap the prompt length, and generate an answer.
    context = retriever(question)
    truncated_prompt = build_limited_prompt(context, question, pipe.tokenizer, max_total_tokens=512)
    return llm.invoke(truncated_prompt)

if __name__ == "__main__":
    print("Q: What is a large language model?")
    print("A:", answer_question("What is a large language model?"))

    print("\nQ: What is the name of the first large language model?")
    print("A:", answer_question("What is the name of the first large language model?"))

Q: What is a large language model?
A: ['node': 'summary': 'A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.nnThis page lists notable large language modelsnnFor the training cost column, 1 petaFLOP-day = 1 petaFLOP/sec  1 day = 8.64E19 FLOP. Also, only the largest model's cost is written.nnn== See also ==nList of chatbotsnList of language model benchmarksnn]

Q: What is the name of the first large language model?
A: ['node': 'summary': 'A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.nnThis page lists notable large language modelsnnFor the training cost column, 1 petaFLOP-
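
One thing to watch when enforcing a token limit: the question sits at the end of the prompt template, so a prompt that is cut from the end loses the question before it loses any context. The sketch below shows the alternative of trimming only the context. It uses whitespace splitting as a stand-in for a real tokenizer, and `count_tokens` and `fit_context` are hypothetical helpers, so the counts are illustrative only.

```python
TEMPLATE = "Answer concisely based on context:\n\n{context}\n\nQuestion:\n{question}\n"


def count_tokens(text):
    """Stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_context(context, question, max_total_tokens):
    """Trim the context (not the question) so the whole prompt fits the budget."""
    overhead = count_tokens(TEMPLATE.format(context="", question=question))
    budget = max(max_total_tokens - overhead, 0)
    trimmed = " ".join(context.split()[:budget])
    return TEMPLATE.format(context=trimmed, question=question)


# A 100-word dummy context that is far larger than the 20-token budget.
context = " ".join(f"w{i}" for i in range(100))
question = "What is a large language model?"
prompt = fit_context(context, question, max_total_tokens=20)
print(count_tokens(prompt))
print(prompt)
```

With a real model tokenizer the same budgeting applies, but counts should come from that tokenizer rather than from word splitting.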

# Conclusion

This project successfully combines Neo4j graph database technology with large language models to build an efficient knowledge retrieval and question answering system.

By chunking, embedding, and indexing Wikipedia content, the system handles both structured graph queries and unstructured text search. The Flan-T5 model then synthesizes concise answers within token limits.

This architecture demonstrates a hybrid approach to scalable, retrieval-augmented language understanding, which can be extended to many domains.
