# NetworkX ATLAS KG construction and RAG example
This notebook demonstrates the full streamlined process of creating a knowledge graph (KG) with the atlas-rag package and performing retrieval-augmented generation (RAG) with the RAG methods it provides.

## ATLAS KG Construction
We suggest running the KG construction code with a local Hugging Face model. LLM API service providers often serve optimized, lightweight variants of a model to reduce costs (for example, quantized from fp16 to fp8), which may sacrifice quality and makes performance hard to guarantee.

ATLAS KG construction consists of 5 steps:
- Triples JSON generation (base KG JSON)
- Convert triples JSON to triples CSV
- Conceptualize entities in the triples CSV
- Merge the concept CSV into the triples CSV
- Convert the CSV to GraphML for NetworkX-based RAG, or to Neo4j dumps for billion-scale KG RAG
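
To make the second step more concrete, here is a minimal sketch of flattening extracted triples into node and edge rows. It assumes each triple is a JSON object with `subject`, `relation`, and `object` keys (as in the extraction prompt used later in this notebook). The column names and the helpers `triples_to_rows` and `rows_to_csv` are hypothetical and do not reflect the CSV layout that atlas-rag actually writes; the triples are toy data.

```python
import csv
import io


def triples_to_rows(triples):
    """Split triples into a de-duplicated node list and an edge list."""
    nodes, edges = [], []
    seen = set()
    for t in triples:
        # Every subject and object becomes a node, recorded once.
        for name in (t["subject"], t["object"]):
            if name not in seen:
                seen.add(name)
                nodes.append({"name": name})
        edges.append(
            {"start": t["subject"], "relation": t["relation"], "end": t["object"]}
        )
    return nodes, edges


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Toy triples, for illustration only.
triples = [
    {"subject": "Radio City", "relation": "is a", "object": "radio station"},
    {"subject": "Radio City", "relation": "started on", "object": "3 July 2001"},
]
nodes, edges = triples_to_rows(triples)
print(rows_to_csv(nodes, ["name"]))
print(rows_to_csv(edges, ["start", "relation", "end"]))
```

In the notebook itself this conversion is handled by `kg_extractor.convert_json_to_csv()`; the sketch only illustrates the idea of separating nodes from edges.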

In [1]:
import os 
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
from atlas_rag.kg_construction.triple_extraction import KnowledgeGraphExtractor
from atlas_rag.kg_construction.triple_config import ProcessingConfig
from atlas_rag.llm_generator import LLMGenerator
from openai import OpenAI
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from transformers import pipeline
from configparser import ConfigParser
# Load settings from config.ini (only needed for the commented-out Azure client below)
config = ConfigParser()
config.read('config.ini')
# model_name = "meta-llama/Llama-3.3-70B-Instruct"
# connection = AIProjectClient(
#     endpoint=config["urls"]["AZURE_URL"],
#     credential=DefaultAzureCredential(),
# )
# client = connection.inference.get_azure_openai_client(api_version="2024-12-01-preview")
client = OpenAI(base_url="http://0.0.0.0:8129/v1", api_key="EMPTY")
triple_generator = LLMGenerator(client=client, model_name="Qwen/Qwen2.5-7B-Instruct")

# model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
# model_name = "meta-llama/Llama-3.2-3B-Instruct"
# client = pipeline(
#     "text-generation",
#     model=model_name,
#     device_map="auto",
# )
filename_pattern = 'test_data'
output_directory = 'benchmark_data/autograph/test_data'
# triple_generator = LLMGenerator(client, model_name=model_name)
model_name = "Qwen/Qwen2.5-7B-Instruct"



In [2]:
kg_extraction_config = ProcessingConfig(
      model_path=model_name,
      data_directory=f'benchmark_data/autograph/{filename_pattern}',
      filename_pattern=filename_pattern,
      batch_size_triple=16,
      batch_size_concept=16,
      output_directory=f"{output_directory}",
      max_new_tokens=2048,
      max_workers=3,
      remove_doc_spaces=True, # For removing duplicated spaces in the document text
      include_concept=False, # Whether to include concept nodes and edges in the knowledge graph
      triple_extraction_prompt_path='benchmark_data/autograph/custom_prompt.json',
      triple_extraction_schema_path='benchmark_data/autograph/custom_schema.json',
      record=True, # Whether to record the results in a JSON file
)
kg_extractor = KnowledgeGraphExtractor(model=triple_generator, config=kg_extraction_config)

Using custom kg extraction prompt:
{'en': {'system': 'You are a helpful assistant', 'triple_extraction': 'You are an expert knowledge graph constructor. Your task is to extract factual information from the provided text and represent it as a list of knowledge graph triples.\nEach triple should be a JSON object with three keys:\n1.  `subject`: The main entity, concept, event, or attribute of the triple.\n2.  `relation`: The relationship between the subject and the object.\n3.  `object`: The entity, concept, value, event, or attribute that the subject has a relationship with.\nConstraints:\n- Extract all possible and relevant triples.\n- The `subject` and `object` can be specific entities (e.g., "Radio City", "Football in Albania", "Echosmith") or specific values (e.g., "3 July 2001", "1,310,696").\n- The `relation` should be a concise, descriptive phrase or verb that accurately describes the relationship (e.g., "founded by", "started on", "is a", "has circulation of").\n- Ensure the tri
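
The printed prompt above suggests that the custom prompt file is a JSON object keyed by language code, with `system` and `triple_extraction` entries. The following is a minimal sketch of writing such a file, under that assumption. The prompt wording and the temporary output path are placeholders, and any further keys the package may expect are not shown.

```python
import json
import os
import tempfile

# Structure inferred from the printed prompt: language code -> prompt entries.
custom_prompt = {
    "en": {
        "system": "You are a helpful assistant",
        "triple_extraction": (
            "Extract factual triples from the text as a JSON list. "
            "Each item has the keys `subject`, `relation`, and `object`."
        ),
    }
}

# Placeholder location; the notebook reads
# benchmark_data/autograph/custom_prompt.json instead.
prompt_path = os.path.join(tempfile.mkdtemp(), "custom_prompt.json")
with open(prompt_path, "w", encoding="utf-8") as f:
    json.dump(custom_prompt, f, ensure_ascii=False, indent=2)

# Read the file back to confirm the structure round-trips.
with open(prompt_path, encoding="utf-8") as f:
    loaded = json.load(f)
print(sorted(loaded["en"].keys()))
```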

### Triples Generation

In [3]:
# construct entity&event graph
kg_extractor.run_extraction()

Found data files: ['test_data.json']
Processing shard 1/1 (texts 0-0 of 1, 1 documents)
Generated 1 chunks for shard 1/1
Model: Qwen/Qwen2.5-7B-Instruct


100%|██████████| 1/1 [00:09<00:00,  9.26s/it]

Processed 1 batches (16 chunks)





In [5]:
# Convert Triples Json to CSV
kg_extractor.convert_json_to_csv()

Loading data from the json files
Number of files:  1


100%|██████████| 1/1 [00:00<00:00, 880.60it/s]

Processing file for file ids:  Qwen_Qwen2.5-7B-Instruct_test_data_output_20250810150302_1_in_1.json





In [8]:
# Concept Generation
kg_extractor.generate_concept_csv_temp()

all_batches 3


Shard_0:   0%|          | 0/3 [00:00<?, ?it/s]2025-08-10 14:57:49,132 - INFO - HTTP Request: POST http://0.0.0.0:8129/v1/chat/completions "HTTP/1.1 200 OK"
2025-08-10 14:57:49,156 - INFO - HTTP Request: POST http://0.0.0.0:8129/v1/chat/completions "HTTP/1.1 200 OK"
2025-08-10 14:57:49,179 - INFO - HTTP Request: POST http://0.0.0.0:8129/v1/chat/completions "HTTP/1.1 200 OK"
2025-08-10 14:57:49,342 - INFO - HTTP Request: POST http://0.0.0.0:8129/v1/chat/completions "HTTP/1.1 200 OK"
2025-08-10 14:57:49,343 - INFO - HTTP Request: POST http://0.0.0.0:8129/v1/chat/completions "HTTP/1.1 200 OK"
2025-08-10 14:57:49,363 - INFO - HTTP Request: POST http://0.0.0.0:8129/v1/chat/completions "HTTP/1.1 200 OK"
2025-08-10 14:57:49,504 - INFO - HTTP Request: POST http://0.0.0.0:8129/v1/chat/completions "HTTP/1.1 200 OK"
2025-08-10 14:57:49,527 - INFO - HTTP Request: POST http://0.0.0.0:8129/v1/chat/completions "HTTP/1.1 200 OK"
2025-08-10 14:57:49,552 - INFO - HTTP Request: POST http://0.0.0.0:8129/v1

Number of unique conceptualized nodes: 113
Number of unique conceptualized events: 0
Number of unique conceptualized entities: 64
Number of unique conceptualized relations: 51





In [9]:
kg_extractor.create_concept_csv()

Loading concepts...


37it [00:00, 46422.15it/s]


Loading concepts done.
Relation to concepts: 15
Node to concepts: 22
Processing triple nodes...


22it [00:00, 34302.86it/s]


Processing concept nodes...


100%|██████████| 64/64 [00:00<00:00, 222953.04it/s]


Processing triple edges...


15it [00:00, 72232.56it/s]


# Choice 1: Convert to GraphML for NetworkX RAG

In [6]:
# convert csv to graphml for networkx
kg_extractor.convert_to_graphml()
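
After conversion it can be useful to sanity-check the GraphML file. GraphML is plain XML, so the sketch below counts nodes and edges with the standard library only. `count_graphml` is a hypothetical helper and the sample document is toy data, not output from this notebook. With NetworkX installed, `networkx.read_graphml(path)` loads the same kind of file as a graph object.

```python
import xml.etree.ElementTree as ET

# Namespace used by the GraphML format, in ElementTree's {uri}tag notation.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graphml(xml_text):
    """Return (node_count, edge_count) for a GraphML document given as text."""
    root = ET.fromstring(xml_text)
    nodes = root.findall(f".//{GRAPHML_NS}node")
    edges = root.findall(f".//{GRAPHML_NS}edge")
    return len(nodes), len(edges)


# Toy document, for illustration only.
sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="n0"/>
    <node id="n1"/>
    <edge source="n0" target="n1"/>
  </graph>
</graphml>"""

print(count_graphml(sample))
```

For a real file, read its contents with `open(path, encoding="utf-8").read()` and pass the text to the helper; the actual output path depends on the `output_directory` configured earlier.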

## ATLAS Multihop QA

To perform RAG, first create embeddings and a FAISS index for the constructed KG.

[There may be a performance difference between using AutoModel and Sentence Transformer for NV-Embed-v2.]

In [None]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
from sentence_transformers import SentenceTransformer
from atlas_rag.vectorstore.embedding_model import NvEmbed, SentenceEmbedding
from transformers import AutoModel
# Load the SentenceTransformer model
encoder_model_name = "sentence-transformers/all-MiniLM-L6-v2"
sentence_model = SentenceTransformer(encoder_model_name, trust_remote_code=True, model_kwargs={'device_map': "auto"})
sentence_encoder = SentenceEmbedding(sentence_model)
# sentence_model.max_seq_length = 32768
# sentence_model.tokenizer.padding_side="right"
# sentence_model = AutoModel.from_pretrained(encoder_model_name, trust_remote_code=True, device_map="auto")
# sentence_encoder = NvEmbed(sentence_model)

In [None]:
from openai import OpenAI
from atlas_rag.llm_generator import LLMGenerator
from configparser import ConfigParser
# Load the DeepInfra API key from config.ini
config = ConfigParser()
config.read('config.ini')
# reader_model_name = "meta-llama/llama-3.3-70b-instruct"
reader_model_name = "meta-llama/Llama-3.3-70B-Instruct"
client = OpenAI(
  # base_url="https://openrouter.ai/api/v1",
  # api_key=config['settings']['OPENROUTER_API_KEY'],
  base_url="https://api.deepinfra.com/v1/openai",
  api_key=config['settings']['DEEPINFRA_API_KEY'],
)
llm_generator = LLMGenerator(client=client, model_name=reader_model_name)

In [None]:
from atlas_rag.vectorstore import create_embeddings_and_index
keyword = 'CICGPC_Glazing_ver1.0a'
working_directory = f'import/{keyword}'
data = create_embeddings_and_index(
    sentence_encoder=sentence_encoder,
    model_name = encoder_model_name,
    working_directory=working_directory,
    keyword=keyword,
    include_concept=True,
    include_events=True,
    normalize_embeddings= True,
    text_batch_size=64,
    node_and_edge_batch_size=64,
)
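
The `normalize_embeddings=True` option is worth a note: for unit-length vectors, the inner product equals cosine similarity, which is why normalization is commonly paired with inner-product search. The sketch below illustrates that idea in plain Python with toy vectors. It is not the package's indexing code, and `normalize` and `top_k` are hypothetical helpers.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query, passages, k=3):
    """Rank passages by inner product of normalized vectors (cosine similarity)."""
    q = normalize(query)
    scored = []
    for idx, vec in enumerate(passages):
        p = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, p)), idx))
    # Highest similarity first.
    scored.sort(reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy 2-D embeddings, for illustration only.
passages = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
query = [2.0, 0.1]
for idx, score in top_k(query, passages, k=2):
    print(idx, round(score, 3))
```

Because every vector is normalized before scoring, the vector lengths in `passages` do not affect the ranking; only direction does.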

In [None]:
# Initialize desired RAG method for benchmarking
from atlas_rag.retriever import HippoRAG2Retriever
from atlas_rag import setup_logger

hipporag2_retriever = HippoRAG2Retriever(
    llm_generator=llm_generator,
    sentence_encoder=sentence_encoder,
    data = data,
)

In [None]:
# perform retrieval
content, sorted_context_ids = hipporag2_retriever.retrieve("How is the U-value relevant to thermal insulation performance in glazing products?", topN=3)
print(f"Retrieved content: {content}")

In [None]:
# generate an answer from the retrieved context
sorted_context = "\n".join(content)
llm_generator.generate_with_context("How is the U-value relevant to thermal insulation performance in glazing products?", sorted_context, max_new_tokens=2048, temperature=0.5)

# Choice 2: Convert to neo4j dumps

In [None]:
from sentence_transformers import SentenceTransformer
from atlas_rag.vectorstore.embedding_model import SentenceEmbedding
# use sentence embedding if you want to use sentence transformer
# use NvEmbed if you want to use NvEmbed-v2 model
sentence_model = SentenceTransformer('all-MiniLM-L6-v2')
sentence_encoder = SentenceEmbedding(sentence_model)

In [None]:
# add numeric id to the csv so that we can use vector indices
kg_extractor.add_numeric_id()

# compute embedding
kg_extractor.compute_kg_embedding(sentence_encoder) # default encoder_model_name="all-MiniLM-L12-v2"; computes all embeddings except concept-related ones
# kg_extractor.compute_embedding(encoder_model_name="all-MiniLM-L12-v2")
# kg_extractor.compute_embedding(encoder_model_name="nvidia/NV-Embed-v2")

# create faiss index
kg_extractor.create_faiss_index(faiss_gpu=False) # default index_type="HNSW,Flat", other options: "IVF65536_HNSW32,Flat" for large KG
# kg_extractor.create_faiss_index(index_type="HNSW,Flat")
# kg_extractor.create_faiss_index(index_type="IVF65536_HNSW32,Flat")


## Install Neo4j Server

Go to the AutoschemaKG/neo4j_scripts directory

```sh get_neo4j_demo.sh```

This installs a Neo4j server in the directory neo4j-server-dulce.

Start the newly installed, empty Neo4j server for testing:

```sh start_neo4j_demo.sh```



## Config Neo4j Server

Stop the server before configuring it and importing data:

```sh stop_neo4j_demo.sh```


Copy the ```AutoschemaKG/neo4j_scripts/neo4j.conf``` file to the conf directory of the Neo4j server (```neo4j-server-dulce/conf```). Then update the following settings as needed:

1. Set dbms.default_database to the desired dataset name, such as ```wiki-csv-json-text```, ```pes2o-csv-json-text```, or ```cc-csv-json-text```. In this example it is ```dulce-csv-json-text```.
2. Configure the Bolt, HTTP, and HTTPS connectors according to your requirements.

In this example, the connector ports in ```neo4j-server-dulce/conf/neo4j.conf``` are set to non-default values to avoid port conflicts:

 
``` 
# Bolt connector
server.bolt.enabled=true
#server.bolt.tls_level=DISABLED
server.bolt.listen_address=0.0.0.0:8612
server.bolt.advertised_address=:8612

# HTTP Connector. There can be zero or one HTTP connectors.
server.http.enabled=true
server.http.listen_address=0.0.0.0:7612
server.http.advertised_address=:7612

# HTTPS Connector. There can be zero or one HTTPS connectors.
server.https.enabled=false
server.https.listen_address=0.0.0.0:7781
server.https.advertised_address=:7781
```
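
Before starting the server, it can help to confirm that the chosen ports are actually free. The following sketch tries to bind each port with the standard `socket` module. `port_is_free` is a hypothetical helper, and the port numbers are the Bolt and HTTP ports from the config above.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
    return True


for name, port in {"bolt": 8612, "http": 7612}.items():
    state = "free" if port_is_free(port) else "in use"
    print(f"{name} port {port}: {state}")
```

The check is conservative: a port that was released very recently may still be reported as in use for a short time.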


## Import Data
We use the admin import method to import data, which is the fastest way. Other methods are too slow for large graphs.


## Load the CSV files into Neo4j

We import data from the previously constructed CSV files with numeric IDs. All the CSV files are in ```import/Dulce```: six files in total, covering the nodes and edges of triples, text chunks, and concepts.
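
Since the import expects six CSV files, a missing or misnamed file is an easy mistake to make. The sketch below checks that every expected file exists before the import is launched. `missing_files` is a hypothetical helper, and the relative paths are copied from the import command that follows.

```python
from pathlib import Path

NODE_FILES = [
    "triples_csv/triple_nodes_Dulce_from_json_without_emb_with_numeric_id.csv",
    "triples_csv/text_nodes_Dulce_from_json_with_numeric_id.csv",
    "concept_csv/concept_nodes_Dulce_from_json_with_concept.csv",
]
EDGE_FILES = [
    "triples_csv/triple_edges_Dulce_from_json_without_emb_with_numeric_id.csv",
    "triples_csv/text_edges_Dulce_from_json.csv",
    "concept_csv/concept_edges_Dulce_from_json_with_concept.csv",
]


def missing_files(import_dir, relative_paths):
    """Return the relative paths that do not exist under import_dir."""
    base = Path(import_dir)
    return [p for p in relative_paths if not (base / p).is_file()]


missing = missing_files("import/Dulce", NODE_FILES + EDGE_FILES)
if missing:
    print("Missing CSV files:")
    for path in missing:
        print("  ", path)
else:
    print("All six CSV files are present.")
```

Once the check passes, the import itself can be run: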

``` shell
./neo4j-server-dulce/bin/neo4j-admin database import full dulce-csv-json-text \
    --nodes ./import/Dulce/triples_csv/triple_nodes_Dulce_from_json_without_emb_with_numeric_id.csv \
    --nodes ./import/Dulce/triples_csv/text_nodes_Dulce_from_json_with_numeric_id.csv \
    --nodes ./import/Dulce/concept_csv/concept_nodes_Dulce_from_json_with_concept.csv \
    --relationships ./import/Dulce/triples_csv/triple_edges_Dulce_from_json_without_emb_with_numeric_id.csv \
    --relationships ./import/Dulce/triples_csv/text_edges_Dulce_from_json.csv \
    --relationships ./import/Dulce/concept_csv/concept_edges_Dulce_from_json_with_concept.csv  \
    --overwrite-destination \
    --multiline-fields=true \
    --id-type=string \
    --verbose --skip-bad-relationships=true
```

When the import finishes, you should see output like the following:

```shell
IMPORT DONE in 2s 475ms. 
Imported:
  1183 nodes
  2519 relationships
  6743 properties
Peak memory usage: 1.032GiB
```

Then start hosting the database by running the following in ```./neo4j_scripts```:

```sh start_neo4j_demo.sh```

When you see the following line, the server is running:


```Started neo4j (pid:742490). It is available at http://0.0.0.0:7612```



To connect with the Python driver, use the Bolt port 8612. You can also open http://0.0.0.0:7612 in a browser to use the Neo4j GUI.

The default user is ```neo4j``` with password ```admin2024```. 
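
A minimal connectivity check with the official `neo4j` Python driver might look like the sketch below. It assumes the driver is installed (`pip install neo4j`), that the server from the previous steps is running locally, and that the database is named `dulce-csv-json-text` as configured earlier. `bolt_uri` and `count_nodes` are hypothetical helpers, not part of atlas-rag.

```python
def bolt_uri(host="localhost", port=8612):
    """Build the Bolt URI for the port set in neo4j.conf."""
    return f"bolt://{host}:{port}"


def count_nodes(uri, user="neo4j", password="admin2024",
                database="dulce-csv-json-text"):
    """Open a session and return the number of nodes in the database."""
    # Imported lazily so the URI helper works without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        # Raises an exception if the server cannot be reached.
        driver.verify_connectivity()
        with driver.session(database=database) as session:
            record = session.run("MATCH (n) RETURN count(n) AS n").single()
            return record["n"]


print(bolt_uri())
# Requires a running server; uncomment to query it.
# print(count_nodes(bolt_uri()))
```

If the import succeeded, the node count returned by `count_nodes` should match the number of nodes reported by the import step.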


## ATLAS Billion Level RAG
The LargeKGRetriever is designed to perform retrieval on a billion-level graph. 

There is a trade-off between retrieval performance and speed; this serves as a proof of concept for a billion-level knowledge graph.

After successfully hosting the Neo4j database, you can run the provided Python script to host the RAG API:
```shell
python neo4j_api_host/atlas_api_demo.py 
```

During the first startup of the API, it creates the necessary indexes and projection graphs in the Neo4j database for faster queries and computations. The time required varies with the size of the database. You can monitor their creation in the Neo4j browser at http://localhost:7612 with the following commands:

To view the projected graphs:
```cypher
CALL gds.graph.list()
```
To view the indexes:
```cypher
SHOW INDEXES
```

The projected graph will be deleted after the database is shut down, while the indexes will not be removed.

Once the following log lines appear: \
Index NodeNumericIDIndex created in 0.09 seconds \
Index TextNumericIDIndex created in 0.11 seconds \
Index EntityEventEdgeNumericIDIndex created in 0.02 seconds \
Projection graph largekgrag_graph created in 5.42 seconds

you can perform RAG as follows:

In [None]:
from openai import OpenAI

base_url = "http://0.0.0.0:10085/v1/"
client = OpenAI(api_key="EMPTY", base_url=base_url)

# knowledge graph en_simple_wiki_v0
message = [
    {
        "role": "system",
        "content": "You are a helpful assistant that answers questions based on the knowledge graph.",
    },
    {
        "role": "user",
        "content": "Question: Who is Alex Mercer?",
    }
]
response = client.chat.completions.create(
    model="llama",
    messages=message,
    max_tokens=2048,
    temperature=0.5
)
print(response.choices[0].message.content)