
Commit d9c25f6

Update.
1 parent b59d082 commit d9c25f6

1 file changed

+151
-109
lines changed


memgraph-graphRAG/graphRAG.ipynb

Lines changed: 151 additions & 109 deletions
@@ -338,6 +338,30 @@
338338
" return response.choices[0].message.content\n"
339339
]
340340
},
341+
{
342+
"cell_type": "markdown",
343+
"metadata": {},
344+
"source": [
345+
"## Running the graph RAG\n",
346+
"\n",
347+
"Now, it all comes together in the `main` function: \n",
348+
"\n",
349+
"1. Connect to the database \n",
350+
"2. Load the `.env` file that defines `OPENAI_API_KEY`\n",
351+
"3. Compute and store the node embeddings \n",
352+
"4. Compute the question embedding based on key information \n",
353+
"5. Perform the vector search to find the most semantically similar node\n",
354+
"6. Get the relevant data that is a few hops away from the pivot node\n",
355+
"7. Ask the LLM the question, passing the relevant data as context"
356+
]
357+
},
358+
{
359+
"cell_type": "code",
360+
"execution_count": null,
361+
"metadata": {},
362+
"outputs": [],
363+
"source": []
364+
},
341365
{
342366
"cell_type": "code",
343367
"execution_count": null,
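Step 5 above is a nearest-neighbour lookup. The pivot node is the node whose embedding points in the direction most similar to the question's embedding. The sketch below shows this idea in plain Python using cosine similarity. `cosine_similarity`, `find_pivot_node`, and the three-dimensional toy embeddings are hypothetical stand-ins. This diff does not show the notebook's actual vector search.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_pivot_node(question_embedding: list[float], node_embeddings: dict[str, list[float]]) -> str:
    """Return the id of the node whose embedding is most similar to the question embedding."""
    return max(
        node_embeddings,
        key=lambda node_id: cosine_similarity(question_embedding, node_embeddings[node_id]),
    )


# Toy embeddings for illustration only; real ones come from an embedding model
node_embeddings = {
    "Viserys Targaryen": [0.9, 0.1, 0.0],
    "Khal Drogo": [0.1, 0.9, 0.2],
    "House Targaryen": [0.5, 0.0, 0.8],
}
question_embedding = [0.8, 0.2, 0.1]
print(find_pivot_node(question_embedding, node_embeddings))
```

A real graph has many more nodes, so a vector index would replace the linear `max` scan. The ranking criterion stays the same.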
@@ -402,6 +426,45 @@
402426
" main()\n"
403427
]
404428
},
429+
{
430+
"cell_type": "markdown",
431+
"metadata": {},
432+
"source": [
433+
"\n",
434+
"Here are a few examples of questions and answers: \n",
435+
"\n",
436+
"`To whom was Viserys Targaryen loyal to?`\n",
437+
"\n",
438+
"The response is:\n",
439+
"**Based on the provided data, Viserys Targaryen was loyal to House Targaryen. This information is derived from the relationships indicating that Viserys Targaryen was loyal to House Targaryen and the connections between them**\n",
440+
"\n",
441+
"`Who killed Viserys Targaryen in Game of thrones?`\n",
442+
"\n",
443+
"The response is:\n",
444+
"\n",
445+
"**Based on the provided relevance expansion data, Khal Drogo killed Viserys Targaryen in \"Game of Thrones.\" This information is inferred from the relationship where Khal Drogo is linked to Viserys Targaryen with the action of being \"KILLED\" by Khal Drogo. The data does not show any other character directly killing Viserys Targaryen.**\n",
446+
"\n",
447+
"\n",
448+
"`\"What was the weapon used to kill Viserys Targaryen in Game of Thrones?\"`\n",
449+
"\n",
450+
"The response is: \n",
451+
"\n",
452+
"**Based on the provided data, the weapon used to kill Viserys Targaryen in Game of Thrones was not explicitly mentioned. The data only shows that Khal Drogo was involved in the killing of Viserys Targaryen. There is no specific mention of the weapon used in the relevance expansion data. Therefore, I do not have enough information to answer the question about the weapon used to kill Viserys Targaryen.**\n",
453+
"\n",
454+
"This response is wrong: the data does describe how Viserys Targaryen was killed, but as a method rather than a weapon, and the LLM did not make the connection because of the different naming. \n",
455+
"\n",
456+
"`\"Who betrayed Viserys Targaryen in Game of Thrones?\"`\n",
457+
"\n",
458+
"Based on the provided data, Khal Drogo betrayed Viserys Targaryen in Game of Thrones by killing him. This conclusion is drawn from the relationship between Khal Drogo and Viserys Targaryen where it is stated that Khal Drogo killed Viserys Targaryen.\n",
459+
"\n",
460+
"This response is inferred from the killing relationship, but being killed by someone is not the same as being betrayed by them. \n",
461+
"\n",
462+
"Let's expand this knowledge. \n",
463+
"\n",
464+
"`\"Who betrayed Viserys Targaryen in Game of Thrones?\"`\n",
465+
"Based on the provided data, Khal Drogo betrayed Viserys Targaryen in Game of Thrones. This conclusion is derived from the relationship between Viserys Targaryen and Khal Drogo, where Khal Drogo is connected to Viserys Targaryen through the 'BETRAYED_BY' relationship, indicating that Khal Drogo betrayed Viserys Targaryen."
466+
]
467+
},
405468
{
406469
"cell_type": "markdown",
407470
"metadata": {},
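The answers above depend entirely on which facts reach the prompt in step 6, the expansion to data a few hops away from the pivot node. Below is a minimal breadth-first sketch of that expansion. It assumes the graph is available as a list of `(source, relationship, target)` triples. `expand_neighborhood` and the sample triples are hypothetical illustrations, not the notebook's query.

```python
from collections import deque


def expand_neighborhood(edges, pivot, max_hops=2):
    """Collect (source, relationship, target) triples reachable within max_hops of pivot.

    Edges are followed in both directions: a fact like "Khal Drogo KILLED
    Viserys Targaryen" matters for a question about Viserys even though he is
    the target of that edge.
    """
    adjacency = {}
    for source, rel, target in edges:
        adjacency.setdefault(source, []).append(((source, rel, target), target))
        adjacency.setdefault(target, []).append(((source, rel, target), source))

    visited = {pivot}
    collected = []
    queue = deque([(pivot, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for triple, neighbor in adjacency.get(node, []):
            if triple not in collected:
                collected.append(triple)
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return collected


# Hypothetical triples in the spirit of the Game of Thrones example
edges = [
    ("Viserys Targaryen", "LOYAL_TO", "House Targaryen"),
    ("Khal Drogo", "KILLED", "Viserys Targaryen"),
    ("Khal Drogo", "MARRIED_TO", "Daenerys Targaryen"),
    ("Daenerys Targaryen", "SIBLING_OF", "Viserys Targaryen"),
]
triples = expand_neighborhood(edges, "Viserys Targaryen", max_hops=1)
context = "\n".join(f"{s} -[{r}]-> {t}" for s, r, t in triples)
print(context)
```

Raising `max_hops` adds more distant facts, such as the marriage between Khal Drogo and Daenerys. However, a relationship that was never extracted cannot appear in the context at any depth. This is why the `BETRAYED_BY` question only changes once the graph itself is expanded.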
@@ -419,12 +482,20 @@
419482
},
420483
{
421484
"cell_type": "code",
422-
"execution_count": 9,
485+
"execution_count": null,
423486
"metadata": {},
424487
"outputs": [],
425488
"source": [
426489
"# Sample text summary for processing\n",
427-
"summary=\"Viserys Targaryen is the last living son of the former king, Aerys II Targaryen (the 'Mad King'). As one of the last known Targaryen heirs, Viserys Targaryen is obsessed with reclaiming the Iron Throne and restoring his family’s rule over Westeros. Ambitious and arrogant, he often treats his younger sister, Daenerys Targaryen, as a pawn, seeing her only as a means to gain power. His ruthless ambition leads him to make a marriage alliance with Khal Drogo, a powerful Dothraki warlord, hoping Khal Drogo will give him the army he needs. However, Viserys Targaryen’s impatience and disrespect toward the Dothraki culture lead to his downfall; he is ultimately killed by Khal Drogo in a brutal display of 'a crown for a king' – having molten gold poured over his head. Khal Drogo is a prominent warlord and leader of the Dothraki people, known for his fearsome reputation and formidable combat skills. He enters into a marriage with Princess Daenerys Targaryen as part of an alliance orchestrated by her brother, Prince Viserys Targaryen. Though initially aloof and intimidating to Daenerys Targaryen, Khal Drogo grows to care deeply for her, and their relationship evolves into one of mutual respect and love. Khal Drogo becomes devoted to Daenerys Targaryen and her dreams, including her goal of reclaiming the Iron Throne for House Targaryen. However, Khal Drogo’s fate takes a tragic turn after he sustains a serious wound in a skirmish, which becomes infected. A healer named Mirri Maz Duur performs a ritual that leaves Khal Drogo in a vegetative state, robbing him of his strength and dignity. Heartbroken, Daenerys Targaryen ends Khal Drogo’s life mercifully, signaling both the end of her first love and a significant turning point in her journey. 
King Joffrey Baratheon is the eldest son of Queen Cersei Lannister and, officially, King Robert Baratheon, though he is actually the result of an incestuous relationship between Queen Cersei and her twin brother, Ser Jaime Lannister. Joffrey Baratheon's personality is marked by sadism, cruelty, and impulsive behavior, traits that make him a despised ruler and widely loathed by the people of Westeros. Following King Robert Baratheon’s death, Joffrey Baratheon takes the throne and quickly reveals himself as a tyrannical ruler, prone to rash decisions and heedless of the advice given by those around him. His cruelty is particularly directed at Lady Sansa Stark, his former fiancée, whom he torments and humiliates on multiple occasions. As king, he alienates his allies and fosters unrest with his reckless brutality. Joffrey Baratheon’s reign ends abruptly when he is poisoned at his wedding feast to Margaery Tyrell, an event known as the 'Purple Wedding' which brings relief to many who suffered under his rule. His death marks a turning point in the power struggles of King’s Landing.\""
490+
"summary = \"\"\"\n",
491+
" Viserys Targaryen is the last living son of the former king, Aerys II Targaryen (the 'Mad King').\n",
492+
" As one of the last known Targaryen heirs, Viserys Targaryen is obsessed with reclaiming the Iron Throne and \n",
493+
" restoring his family’s rule over Westeros. Ambitious and arrogant, he often treats his younger sister, Daenerys Targaryen, \n",
494+
" as a pawn, seeing her only as a means to gain power. His ruthless ambition leads him to make a marriage alliance with \n",
495+
" Khal Drogo, a powerful Dothraki warlord, hoping Khal Drogo will give him the army he needs. \n",
496+
" However, Viserys Targaryen’s impatience and disrespect toward the Dothraki culture lead to his downfall;\n",
497+
" he is ultimately killed by Khal Drogo in a brutal display of 'a crown for a king' – having molten gold poured over his head. \n",
498+
" \"\"\""
428499
]
429500
},
430501
{
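The new triple-quoted `summary` keeps its line breaks and leading indentation. spaCy may turn that extra whitespace into separate whitespace tokens inside sentences. One option is to collapse the whitespace before running the pipeline. This is an assumption, not something the notebook does, and `normalize_whitespace` is a hypothetical helper.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse newlines, indentation and repeated spaces into single spaces."""
    # str.split() with no arguments splits on any run of whitespace and drops empty strings
    return " ".join(text.split())


# Shortened excerpt of the notebook's summary, indented the same way
excerpt = """
    Viserys Targaryen is the last living son of the former king, Aerys II Targaryen.
    As one of the last known Targaryen heirs, Viserys Targaryen is obsessed with reclaiming the Iron Throne and
    restoring his family's rule over Westeros.
    """
print(normalize_whitespace(excerpt))
```

The cleaned text could then be passed on as `extract_entities(normalize_whitespace(summary), nlp)`.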
@@ -436,10 +507,7 @@
436507
"TODO: add links \n",
437508
"\n",
438509
"The first step in the process is to extract entities from the summary using\n",
439-
"SpaCy’s large language model. SpaCy is an advanced NLP (natural language\n",
440-
"processing) library in Python, designed for tasks such as entity recognition,\n",
441-
"part-of-speech tagging, and dependency parsing. It’s widely used for its speed\n",
442-
"and accuracy in processing text.\n",
510+
"spaCy’s `en_core_web_md` pipeline.\n",
443511
"\n",
444512
"To begin, we need to install spaCy and the specific model we will be using."
445513
]
@@ -455,29 +523,6 @@
455523
"!python -m spacy download en_core_web_md"
456524
]
457525
},
458-
{
459-
"cell_type": "markdown",
460-
"metadata": {},
461-
"source": [
462-
"Next, set up your OpenAI API key."
463-
]
464-
},
465-
{
466-
"cell_type": "code",
467-
"execution_count": 10,
468-
"metadata": {},
469-
"outputs": [],
470-
"source": [
471-
"import os\n",
472-
"from wasabi import msg\n",
473-
"\n",
474-
"os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_OPENAI_KEY>\"\n",
475-
"\n",
476-
"# Check for OpenAI API key\n",
477-
"if not os.getenv(\"OPENAI_API_KEY\"):\n",
478-
" msg.fail(\"OPENAI_API_KEY environment variable not set. Please set it to proceed.\", exits=1)"
479-
]
480-
},
481526
{
482527
"cell_type": "markdown",
483528
"metadata": {},
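This commit removes the cell that hard-coded `OPENAI_API_KEY`. Step 2 of `main` now reads the key from a `.env` file. The usual way to do this is `load_dotenv()` from the `python-dotenv` package. The stdlib-only sketch below shows roughly what such loading involves for simple `KEY=VALUE` lines. `load_env_file` and `require_api_key` are hypothetical helpers. The parser ignores features such as `export` prefixes, escapes, and multi-line values.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read simple KEY=VALUE lines into os.environ (hypothetical helper).

    Blank lines, comments and lines without '=' are skipped. Variables that are
    already set in the environment are left unchanged.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        # Drop optional surrounding quotes, e.g. OPENAI_API_KEY="<key>"
        value = value.strip().strip('"').strip("'")
        values[key] = value
        os.environ.setdefault(key, value)
    return values


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Fail early with a clear message instead of at the first API call."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to .env or the environment.")
    return key
```

Keeping the key in `.env`, and that file out of version control, avoids committing secrets the way the removed cell's placeholder invited.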
@@ -501,54 +546,47 @@
501546
"metadata": {},
502547
"outputs": [],
503548
"source": [
504-
"\n",
549+
"import os\n",
550+
"import spacy\n",
551+
"from spacy_llm.util import assemble\n",
505552
"import json\n",
506553
"from collections import Counter\n",
507554
"from pathlib import Path\n",
508555
"\n",
509-
"import spacy\n",
510-
"from spacy_llm.util import assemble\n",
511-
"\n",
512-
"# Load the spaCy model\n",
513-
"nlp = spacy.load(\"en_core_web_md\")\n",
514-
"\n",
515556
"# Split document into sentences\n",
516-
"def split_document_sent(text):\n",
557+
"def split_document_sent(text, nlp):\n",
517558
" doc = nlp(text)\n",
518559
" return [sent.text.strip() for sent in doc.sents]\n",
519560
"\n",
520-
"def process_text(text, verbose=False):\n",
561+
"\n",
562+
"def process_text(text, nlp, verbose=False):\n",
521563
" doc = nlp(text)\n",
522564
" if verbose:\n",
523-
" msg.text(f\"Text: {doc.text}\")\n",
524-
" msg.text(f\"Entities: {[(ent.text, ent.label_) for ent in doc.ents]}\")\n",
565+
" print(f\"Text: {doc.text}\")\n",
566+
" print(f\"Entities: {[(ent.text, ent.label_) for ent in doc.ents]}\")\n",
525567
" return doc\n",
526568
"\n",
569+
"\n",
527570
"# Pipeline to run entity extraction\n",
528-
"def extract_entities(text, verbose=False):\n",
571+
"def extract_entities(text, nlp, verbose=False):\n",
529572
" processed_data = []\n",
530573
" entity_counts = Counter()\n",
531574
"\n",
532-
" sentences = split_document_sent(text)\n",
575+
" sentences = split_document_sent(text, nlp)\n",
533576
" for sent in sentences:\n",
534-
" doc = process_text(sent, verbose)\n",
577+
" doc = process_text(sent, nlp, verbose)\n",
535578
" entities = [(ent.text, ent.label_) for ent in doc.ents]\n",
536579
"\n",
537580
" # Store processed data for each sentence\n",
538-
" processed_data.append({'text': doc.text, 'entities': entities})\n",
581+
" processed_data.append({\"text\": doc.text, \"entities\": entities})\n",
539582
"\n",
540583
" # Update counters\n",
541584
" entity_counts.update([ent[1] for ent in entities])\n",
542585
"\n",
543586
" # Export to JSON\n",
544-
" with open('processed_data.json', 'w') as f:\n",
587+
" with open(\"processed_data.json\", \"w\") as f:\n",
545588
" json.dump(processed_data, f)\n",
546-
"\n",
547-
" msg.text(f\"Entity counts: {entity_counts}\")\n",
548-
"\n",
549-
"# Run the pipeline on the summary text\n",
550-
"verbose = True\n",
551-
"extract_entities(summary, verbose)\n"
589+
"\n"
552590
]
553591
},
554592
{
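`extract_entities` still builds `entity_counts`, but after this change it no longer reports them. The per-sentence results are only written to `processed_data.json`. The sketch below recomputes the label counts from that JSON structure. `count_entity_labels` is a hypothetical helper. The sample entities are illustrative values, not actual `en_core_web_md` output.

```python
import json
from collections import Counter


def count_entity_labels(processed_data: list[dict]) -> Counter:
    """Count entity labels across all sentences in the processed_data structure."""
    # Each entity is a (text, label) pair; after a JSON round trip it becomes a
    # two-element list, which unpacks the same way.
    return Counter(label for entry in processed_data for _, label in entry["entities"])


processed_data = [
    {
        "text": "Viserys Targaryen is the last living son of the former king, Aerys II Targaryen.",
        "entities": [("Viserys Targaryen", "PERSON"), ("Aerys II Targaryen", "PERSON")],
    },
    {
        "text": "He makes a marriage alliance with Khal Drogo, a powerful Dothraki warlord.",
        "entities": [("Khal Drogo", "PERSON"), ("Dothraki", "NORP")],
    },
]

# Simulate writing to and reading back from processed_data.json
restored = json.loads(json.dumps(processed_data))
print(count_entity_labels(restored))
```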
@@ -572,45 +610,39 @@
572610
"metadata": {},
573611
"outputs": [],
574612
"source": [
575-
"import json\n",
576-
"import openai\n",
577-
"from pathlib import Path\n",
613+
"\n",
614+
"# Load the spaCy model\n",
615+
"\n",
616+
"nlp = spacy.load(\"en_core_web_md\")\n",
617+
"extract_entities(summary, nlp)\n",
618+
"\n",
619+
"\n",
620+
"# Load processed data from JSON\n",
621+
"json_path = Path(\"processed_data.json\")\n",
622+
"with open(json_path, \"r\") as f:\n",
623+
"    processed_data = json.load(f)\n",
624+
"\n",
625+
"# Prepare nodes and relationships\n",
626+
"nodes = []\n",
627+
"relationships = []\n",
628+
"\n",
629+
"# Formulate a prompt for GPT-4\n",
630+
"prompt = (\n",
631+
"    \"Extract entities and relationships from the following JSON data. For each entry in data['entities'], \"\n",
632+
"    \"create a 'node' dictionary with fields 'id' (unique identifier), 'name' (entity text), and 'type' (entity label). \"\n",
633+
"    \"For entities that have meaningful connections, define 'relationships' as dictionaries with 'source' (source node id), \"\n",
634+
"    \"'target' (target node id), and 'relationship' (type of connection). Create at most 30 nodes. Format relationship types in capital letters with underscores between words, and format the entire response as JSON containing only the keys nodes and relationships, with no other text. Use the following labels for nodes: Character, Title, Location, House, Death, Event, Allegiance, and the following relationship types: HAPPENED_IN, SIBLING_OF, PARENT_OF, MARRIED_TO, HEALED_BY, RULES, KILLED, LOYAL_TO, BETRAYED_BY. Make sure the entire JSON fits in the output. \"\n",
635+
"    \"JSON data:\\n\"\n",
636+
"    f\"{json.dumps(processed_data)}\"\n",
637+
")\n",
578638
"\n",
579-
"# Load processed data from JSON\n",
580-
"json_path = Path(\"processed_data.json\")\n",
581-
"with open(json_path, \"r\") as f:\n",
582-
" processed_data = json.load(f)\n",
583-
"\n",
584-
"# Prepare nodes and relationships\n",
585-
"nodes = []\n",
586-
"relationships = []\n",
587-
"\n",
588-
"# Formulate a prompt for GPT-4\n",
589-
"prompt = (\n",
590-
" \"Extract entities and relationships from the following JSON data. For each entry in data['entities'], \"\n",
591-
" \"create a 'node' dictionary with fields 'id' (unique identifier), 'name' (entity text), and 'type' (entity label). \"\n",
592-
" \"For entities that have meaningful connections, define 'relationships' as dictionaries with 'source' (source node id), \"\n",
593-
" \"'target' (target node id), and 'relationship' (type of connection). Create max 30 nodes, format relationships in the format of capital letters and _ inbetween words and format the entire response in the JSON output containing only variables nodes and relationships without any text inbetween. Use following labels for nodes: Character, Title, Location, House, Death, Event, Allegiance and following relationship types: HAPPENED_IN, SIBLING_OF, PARENT_OF, MARRIED_TO, HEALED_BY, RULES, KILLED, LOYAL_TO, BETRAYED_BY. Make sure the entire JSON file fits in the output\" \n",
594-
" \"JSON data:\\n\"\n",
595-
" f\"{json.dumps(processed_data)}\"\n",
596-
")\n",
597-
"\n",
598-
"# Call GPT-4 to analyze the JSON and extract structured nodes and relationships\n",
599-
"response = openai.ChatCompletion.create(\n",
600-
" model=\"gpt-4\",\n",
601-
" messages=[{\"role\": \"system\", \"content\": \"You are a helpful assistant that structures data into nodes and relationships.\"},\n",
602-
" {\"role\": \"user\", \"content\": prompt}],\n",
603-
" max_tokens=2000\n",
604-
")\n",
605-
"\n",
606-
"# Parse GPT-4 response and add to nodes and relationships lists\n",
607-
"output = response['choices'][0]['message']['content']\n",
608-
"print(output)\n",
609-
"structured_data = json.loads(output) # Assuming GPT-4 outputs structured JSON\n",
610-
"\n",
611-
"# Populate nodes and relationships lists\n",
612-
"nodes.extend(structured_data.get(\"nodes\", []))\n",
613-
"relationships.extend(structured_data.get(\"relationships\", []))\n"
639+
"response = await get_response(client, prompt)  # asyncio.run() fails inside Jupyter's already-running event loop\n",
640+
"structured_data = json.loads(response)  # Assuming GPT-4 outputs structured JSON\n",
641+
"\n",
642+
"# Populate nodes and relationships lists\n",
643+
"nodes.extend(structured_data.get(\"nodes\", []))\n",
644+
"relationships.extend(structured_data.get(\"relationships\", []))\n",
645+
"\n"
614646
]
615647
},
616648
{
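`json.loads(response)` assumes the model returns bare JSON, as the prompt asks. In practice, chat models sometimes wrap the JSON in a Markdown code fence or refer to node ids that do not exist. A defensive parsing sketch follows. `parse_graph_response`, `strip_code_fence`, and the filtering rules are assumptions layered on top of the notebook. The allowed relationship types are copied from the prompt.

```python
import json

# Relationship types the extraction prompt allows
ALLOWED_RELATIONSHIPS = {
    "HAPPENED_IN", "SIBLING_OF", "PARENT_OF", "MARRIED_TO", "HEALED_BY",
    "RULES", "KILLED", "LOYAL_TO", "BETRAYED_BY",
}
FENCE = "`" * 3  # a Markdown code fence, built this way to avoid a literal fence here


def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence (with or without a language tag)."""
    text = text.strip()
    if text.startswith(FENCE) and text.endswith(FENCE) and len(text) > len(FENCE):
        first_newline = text.find("\n")
        if first_newline == -1:
            text = text[len(FENCE):-len(FENCE)]
        else:
            # Skip the opening fence line, which may carry a tag such as "json"
            text = text[first_newline + 1:-len(FENCE)]
    return text.strip()


def parse_graph_response(response: str) -> dict:
    """Parse the model's JSON and keep only allowed relationships between known nodes."""
    data = json.loads(strip_code_fence(response))
    nodes = data.get("nodes", [])
    node_ids = {node["id"] for node in nodes}
    relationships = [
        rel
        for rel in data.get("relationships", [])
        if rel.get("relationship") in ALLOWED_RELATIONSHIPS
        and rel.get("source") in node_ids
        and rel.get("target") in node_ids
    ]
    return {"nodes": nodes, "relationships": relationships}
```

In the cell above, `structured_data = parse_graph_response(response)` would replace the bare `json.loads` call. Dropping invalid relationships silently is one design choice. Raising an error instead would make prompt problems easier to notice.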
@@ -669,26 +701,36 @@
669701
"metadata": {},
670702
"outputs": [],
671703
"source": [
672-
"from neo4j import GraphDatabase\n",
673-
"\n",
674-
"# Initialize the Neo4j driver for Memgraph (modify the URI if necessary)\n",
675-
"uri = \"bolt://localhost:7687\"\n",
676-
"user = \"\"\n",
677-
"password = \"\"\n",
678-
"driver = GraphDatabase.driver(uri, auth=(user, password))\n",
679-
"\n",
680-
"# Function to execute Cypher queries in Memgraph\n",
681-
"def execute_cypher_queries(queries):\n",
682-
" with driver.session() as session:\n",
683-
" for query in queries:\n",
684-
" try:\n",
685-
" session.run(query)\n",
686-
" msg.good(f\"Executed query: {query}\")\n",
687-
" except Exception as e:\n",
688-
" msg.fail(f\"Error executing query: {query}. Error: {e}\")\n",
689-
"\n",
690-
"# Execute the generated Cypher queries\n",
691-
"execute_cypher_queries(cypher_queries)\n",
704+
"with driver.session() as session:\n",
705+
" for query in cypher_queries:\n",
706+
" try:\n",
707+
" session.run(query)\n",
708+
" print(f\"Executed query: {query}\")\n",
709+
" except Exception as e:\n",
710+
" print(f\"Error executing query: {query}. Error: {e}\")"
711+
]
712+
},
713+
{
714+
"cell_type": "markdown",
715+
"metadata": {},
716+
"source": [
717+
"The dataset now has additional knowledge: \n",
718+
"```\n",
719+
"Executed query: MATCH (a {id: 1}), (b {id: 2}) CREATE (a)-[:PARENT_OF]->(b)\n",
720+
"Executed query: MATCH (a {id: 1}), (b {id: 3}) CREATE (a)-[:SIBLING_OF]->(b)\n",
721+
"Executed query: MATCH (a {id: 1}), (b {id: 4}) CREATE (a)-[:LOYAL_TO]->(b)\n",
722+
"Executed query: MATCH (a {id: 1}), (b {id: 5}) CREATE (a)-[:HAPPENED_IN]->(b)\n",
723+
"Executed query: MATCH (a {id: 1}), (b {id: 6}) CREATE (a)-[:HAPPENED_IN]->(b)\n",
724+
"Executed query: MATCH (a {id: 1}), (b {id: 7}) CREATE (a)-[:SIBLING_OF]->(b)\n",
725+
"Executed query: MATCH (a {id: 1}), (b {id: 8}) CREATE (a)-[:MARRIED_TO]->(b)\n",
726+
"Executed query: MATCH (a {id: 1}), (b {id: 9}) CREATE (a)-[:HEALED_BY]->(b)\n",
727+
"Executed query: MATCH (a {id: 8}), (b {id: 9}) CREATE (a)-[:RULES]->(b)\n",
728+
"Executed query: MATCH (a {id: 8}), (b {id: 1}) CREATE (a)-[:BETRAYED_BY]->(b)\n",
729+
"```\n",
730+
"\n",
731+
"Asking the same question again now yields the correct answer: `\"Who betrayed Viserys Targaryen in Game of Thrones?\"`\n",
732+
"\n",
733+
"Based on the provided data, Khal Drogo betrayed Viserys Targaryen in Game of Thrones. This conclusion is derived from the relationship between Viserys Targaryen and Khal Drogo, where Khal Drogo is connected to Viserys Targaryen through the 'BETRAYED_BY' relationship, indicating that Khal Drogo betrayed Viserys Targaryen.\n",
692734
"\n"
693735
]
694736
}
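The executed queries shown above insert node ids directly into the Cypher text. This diff does not show how the notebook builds `cypher_queries`, so what follows is only a sketch. It assumes the `source`/`target`/`relationship` dictionaries that the prompt asks for. The ids are passed as query parameters. The relationship type cannot be a Cypher parameter, so it is checked against the allowed list first. `relationship_queries` is a hypothetical helper.

```python
# Relationship types the extraction prompt allows; Cypher cannot parameterize
# relationship types, so they are checked against this set instead.
ALLOWED_RELATIONSHIPS = {
    "HAPPENED_IN", "SIBLING_OF", "PARENT_OF", "MARRIED_TO", "HEALED_BY",
    "RULES", "KILLED", "LOYAL_TO", "BETRAYED_BY",
}


def relationship_queries(relationships: list[dict]) -> list[tuple[str, dict]]:
    """Build one parameterized CREATE query per relationship (hypothetical helper)."""
    queries = []
    for rel in relationships:
        rel_type = rel["relationship"]
        if rel_type not in ALLOWED_RELATIONSHIPS:
            raise ValueError(f"Unexpected relationship type: {rel_type!r}")
        # Double braces produce literal { } in the f-string; $source and $target are Cypher parameters
        query = f"MATCH (a {{id: $source}}), (b {{id: $target}}) CREATE (a)-[:{rel_type}]->(b)"
        queries.append((query, {"source": rel["source"], "target": rel["target"]}))
    return queries


for query, params in relationship_queries([{"source": 8, "target": 1, "relationship": "BETRAYED_BY"}]):
    print(query, params)
```

With the Neo4j Python driver, each pair could be executed as `session.run(query, params)`. The ids then reach Memgraph as values rather than as part of the query text.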
