<a href="https://colab.research.google.com/github/sidagarwal04/graph-powered-nlp-workshop/blob/main/graph_powered_nlp.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Graph Powered NLP Workshop Python Notebook

### Installing necessary drivers - Google Generative AI, Neo4j and Gradio.

**Instruction:** Don't forget to restart the runtime after running the cell below.

In [2]:
!pip install google-generativeai
!pip install -q neo4j
!pip install -q gradio



### Import necessary libraries from installed packages/drivers - Google Generative AI, base64, json, gradio, GraphDatabase and Colab userdata

In [3]:
import google.generativeai as genai
import base64
import json
import gradio as gr
from neo4j import GraphDatabase
from google.colab import userdata

### Add Gemini API Key from Google AI Studio

**Instruction:** Save the API key copied from Google AI Studio, as mentioned in Step #6 of Part #2 of the [Step-by-Step Guide](https://github.com/sidagarwal04/graph-powered-nlp-workshop/blob/main/step-by-step-guide.md#part-2-create-google-makersuite-account-train--test-prompt-in-google-makersuite-and-get-google-palm-2-api-key), as a Colab secret named `apiKey` and give this notebook access to it. The cell below reads the key with `userdata.get('apiKey')`, so the key does not have to be pasted into the notebook.

In [4]:
genai.configure(api_key = userdata.get('apiKey'))
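If the notebook is also run outside Colab, the key lookup could fall back to an environment variable. The following is a minimal sketch of that idea; the helper name `load_secret` and the environment-variable fallback are assumptions for illustration and are not part of the workshop code.

```python
import os

def load_secret(name):
    """Return a secret from Colab's secret store, or from the environment as a fallback."""
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import userdata
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook
        pass
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} not found in Colab secrets or the environment")
    return value
```

With this helper, the configuration call would become `genai.configure(api_key=load_secret('apiKey'))`.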

### Include the generated prompt from Google AI Studio.

**Instruction:** Paste the code generated by Google AI Studio, but remove the initial part that installs the drivers and configures the API key, as that has already been done in the previous steps. Also, wrap the remaining code in a function that returns the output instead of printing it.

In [None]:
def get_answer(input):

  generation_config = {
    "temperature": 0.9,
    "top_p": 1,
    "top_k": 1,
    "max_output_tokens": 2048,
  }

  safety_settings = [
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
  ]

  model = genai.GenerativeModel(model_name="gemini-1.0-pro",
                                generation_config=generation_config,
                                safety_settings=safety_settings)

  # prompt = f"""You are an expert in converting English questions to Neo4j Cypher Graph code! The Graph has following Node Labels - Movie, Person! the Movie Node has the following properties released, tagline, title. The Person node has properties such as name & born. The Neo4j Graph has the following Relationship types ACTED_IN, DIRECTED, FOLLOWS, PRODUCED, REVIEWED, WROTE!

  # All relationships ACTED_IN, DIRECTED, PRODUCED, REVIEWED, WROTE start from Person node to Movie node and not the other way around except for FOLLOWS relationship which starts from Person node to Person node.

  # For example,
  # Example 1 - List down 5 movies that released after the year 2000, the Cypher command will be something like this
  # ``` MATCH (m:Movie)
  # WHERE m.released > 2000
  # RETURN m LIMIT 5
  # ```

  # Example 2 - Get all the people who acted in a movie that was released after 2010.
  # ```
  # MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
  # WHERE m.released > 2010
  # RETURN p,r,m
  # ```

  # Example 3 - Name the Director of the movie Apollo 13?
  # ```
  # MATCH (m:Movie)<-[:DIRECTED]-(p:Person)
  # WHERE m.title = 'Apollo 13'
  # RETURN p.name
  # ```

  # Do not include ``` and \n in the output

  # {input}"""
  # response = genai.generate_text(**defaults,prompt=prompt)

  prompt_parts = [
    "You are an expert in converting English questions to Neo4j Cypher Graph code! The Graph has following Node Labels - Movie, Person! the Movie Node has the following properties released, tagline, title. The Person node has properties such as name & born. The Neo4j Graph has the following Relationship types ACTED_IN, DIRECTED, FOLLOWS, PRODUCED, REVIEWED, WROTE!  All relationships ACTED_IN, DIRECTED, PRODUCED, REVIEWED, WROTE start from Person to Movie and not the other way around except for FOLLOWS relationship which starts from a Person node and ends on a Person node.\n\nDo not include ``` and \\n in the output",
    "input: List down 5 movies that released after the year 2000",
    "output: MATCH (m:Movie)  WHERE m.released > 2000  RETURN m LIMIT 5",
    "input: Get all the people who acted in a movie that was released after 2010",
    "output: MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)  WHERE m.released > 2010  RETURN p,r,m",
    "input: Name the Director of the movie Apollo 13?",
    "output: MATCH (m:Movie)<-[:DIRECTED]-(p:Person)  WHERE m.title = 'Apollo 13'  RETURN p.name",
    # The final input is the user's question; the empty output is left for the model to complete
    f"input: {input}",
    "output: ",
  ]

  response = model.generate_content(prompt_parts)
  return response.text
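The prompt above follows a few-shot pattern: one instruction string, then alternating `input:` and `output:` strings, and finally an `input:` with an empty `output:` for the model to complete. A minimal sketch of building such a list from example pairs could look like this; `build_prompt_parts` is a hypothetical helper, not something generated by Google AI Studio.

```python
def build_prompt_parts(instructions, examples, question):
    """Assemble a few-shot prompt list in the input/output format used by get_answer()."""
    parts = [instructions]
    for example_question, example_cypher in examples:
        parts.append(f"input: {example_question}")
        parts.append(f"output: {example_cypher}")
    # The final pair leaves the output empty so the model completes it
    parts.append(f"input: {question}")
    parts.append("output: ")
    return parts

examples = [
    ("List down 5 movies that released after the year 2000",
     "MATCH (m:Movie)  WHERE m.released > 2000  RETURN m LIMIT 5"),
]
parts = build_prompt_parts("Convert English questions to Cypher.",
                           examples,
                           "Who were the actors in the movie V for Vendetta")
```

Keeping the examples in a list of pairs makes it easier to add or correct an example without editing the prompt by hand.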

### Testing the output of get_answer() function with a test input

In [8]:
get_answer("Who were the actors in the movie V for Vendetta")

"MATCH (m:Movie)  WHERE m.title = 'V for Vendetta';  MATCH (p:Person)-[r:ACTED_IN]->(m)  RETURN p.name"

### Initialize GraphDatabase driver

**Instruction:** Before running the cell, save the URI and password from the txt file downloaded when creating the Neo4j AuraDB instance in Step #4 of Part #1 of the [Step By Step Guide](https://github.com/sidagarwal04/graph-powered-nlp-workshop/blob/main/step-by-step-guide.md#part-1-create-and-load-a-neo4j-instance) of this workshop as Colab secrets named `NEO4J_URI` and `NEO4J_PASSWORD`. The username is set to `neo4j` in the code; change it if your instance uses a different one.

In [25]:
uri=userdata.get('NEO4J_URI')
password=userdata.get('NEO4J_PASSWORD')
driver = GraphDatabase.driver(uri,
                              auth=("neo4j",
                                    password))
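Creating the driver does not by itself prove that the URI and password are correct. As a hedged sketch, the driver's `verify_connectivity()` method can be wrapped in a small check before moving on; the helper name `check_connection` is an assumption for illustration.

```python
def check_connection(driver):
    """Return True if the driver can reach the database, False otherwise."""
    try:
        # verify_connectivity() raises an exception if the database cannot be reached
        driver.verify_connectivity()
        return True
    except Exception as exc:
        print(f"Could not connect to Neo4j: {exc}")
        return False
```

Calling `check_connection(driver)` right after the cell above would surface a wrong URI or password early, rather than at the first chatbot question.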

### Import required library for processing regular expressions

In [26]:
import re

### Function to clean the query returned by get_answer() by collapsing newlines (\n) and repeated spaces into a single space. It also extracts the text after the RETURN keyword in the Cypher query, which later steps use as the key for reading the results shown in the chatbot

In [27]:
def extract_query_and_return_key(input_query_result):
    # Matches runs of spaces and newlines
    slash_n_pattern = r'[ \n]+'
    # Captures everything that follows the RETURN keyword
    ret_pattern = r'RETURN\s+(.*)'
    replacement = ' '

    cleaned_query = re.sub(slash_n_pattern, replacement, input_query_result)
    # Default to an empty key so the return statement never sees an undefined name
    extracted_string = ""
    match = re.search(ret_pattern, cleaned_query)
    if match:
        extracted_string = match.group(1)
    return cleaned_query, extracted_string

### Testing the extract_query_and_return_key() function with a test input in natural language

In [28]:
extract_query_and_return_key(get_answer("Who were the actors in the movie V for Vendetta"))

("MATCH (m:Movie)<-[:ACTED_IN]-(p:Person) WHERE m.title = 'V for Vendetta'; RETURN p.name",
 'p.name')
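Note that the pattern `RETURN\s+(.*)` captures everything after RETURN, so a query ending in `RETURN m LIMIT 5` would give the key `m LIMIT 5`, which is not a valid record key. The sketch below shows one possible way to trim trailing clauses, extra return items and aliases; `extract_return_key` is a hypothetical helper and it only covers the simple cases listed in its comments.

```python
import re

def extract_return_key(query):
    """Return the first returned item, without ORDER BY / SKIP / LIMIT or a trailing semicolon."""
    match = re.search(r'RETURN\s+(.*)', query, flags=re.IGNORECASE)
    if not match:
        return ""
    tail = match.group(1)
    # Cut the tail at the first trailing clause, if any
    tail = re.split(r'\s+(?:ORDER\s+BY|SKIP|LIMIT)\b', tail,
                    maxsplit=1, flags=re.IGNORECASE)[0]
    # Keep only the first returned item and drop a trailing semicolon
    first_item = tail.split(',')[0].strip().rstrip(';').strip()
    # With "expr AS alias", the record key is the alias
    pieces = re.split(r'\s+AS\s+', first_item, maxsplit=1, flags=re.IGNORECASE)
    return pieces[-1].strip()
```

This version returns only the first item, so a query such as `RETURN p,r,m` would still need extra handling if all three values are wanted.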

### format_names_with_ampersand() to return a list of values as a comma-separated string, with '&' (ampersand) before the last value.

In [30]:
def format_names_with_ampersand(names):
    if len(names) == 0:
        return ""
    elif len(names) == 1:
        return names[0]
    else:
        formatted_names = ", ".join(names[:-1]) + " & " + names[-1]
        return formatted_names

### Testing format_names_with_ampersand() with sample input having list of values

In [31]:
format_names_with_ampersand(["Sachin","Virat","Rahul"])

'Sachin, Virat & Rahul'
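`", ".join(...)` only accepts strings, so a list of numbers (for example release years) would raise a `TypeError`. A minimal variant that converts every value to text first could look like the sketch below; the name `format_values_with_ampersand` is an assumption for illustration.

```python
def format_values_with_ampersand(values):
    """Like format_names_with_ampersand(), but also accepts non-string values such as integers."""
    texts = [str(value) for value in values]
    if not texts:
        return ""
    if len(texts) == 1:
        return texts[0]
    return ", ".join(texts[:-1]) + " & " + texts[-1]

format_values_with_ampersand([1999, 2003, 2008])  # returns '1999, 2003 & 2008'
```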

### run_cypher_on_neo4j() to run the query generated by get_answer() on the Neo4j database. If the output list has more than one value, format_names_with_ampersand() formats it; if it has exactly one value, that value is returned as it is. If the output list is empty, an empty string is returned

In [32]:
def run_cypher_on_neo4j(inp_query, inp_key):
    out_list = []
    # The session is closed automatically at the end of the with block
    with driver.session() as session:
        result = session.run(inp_query)
        for record in result:
            out_list.append(record[inp_key])
    # The driver is left open so that later chatbot questions can reuse it
    if len(out_list) > 1:
        return format_names_with_ampersand(out_list)
    elif len(out_list) == 1:
        return out_list[0]
    else:
        return ""
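Because the Cypher text comes from a language model, it may be worth checking that a query only reads data before sending it to the database. The following is a rough sketch based on keyword matching; `WRITE_CLAUSES` and `is_read_only` are hypothetical names, and the check is deliberately conservative rather than a full Cypher parser.

```python
import re

# Clauses that modify the graph; treated here as unsafe for a question-answering bot
WRITE_CLAUSES = re.compile(
    r'\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b',
    flags=re.IGNORECASE,
)

def is_read_only(query):
    """Return True if the query contains none of the write clauses listed above."""
    return WRITE_CLAUSES.search(query) is None
```

A keyword check like this can also reject harmless queries, for example one that filters on a movie title containing the word "Set", so it trades some false positives for safety.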

### Additional generate_and_exec_cypher() to parse and format the output of get_answer() and pass it to run_cypher_on_neo4j()

In [33]:
def generate_and_exec_cypher(input_query):
    gen_query, gen_key = extract_query_and_return_key(get_answer(input_query))
    return run_cypher_on_neo4j(gen_query, gen_key)

### chatbot() to initialize the chatbot and pass the output of generate_and_exec_cypher() to be displayed in the chatbot

In [34]:
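def chatbot(input, history=None):
    # Start a fresh history instead of sharing one mutable default list across calls
    history = history or []
    output = str(generate_and_exec_cypher(input))
    history.append((input, output))
    return history, history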
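A generated query can be invalid Cypher, or the database can be unreachable, and either case raises an exception inside the chat function. As a hedged sketch, the chat function could be built around any answer function and turn failures into a readable reply; `make_chatbot` and the reply texts are assumptions for illustration.

```python
def make_chatbot(answer_fn):
    """Build a Gradio-style chat function around answer_fn (for example generate_and_exec_cypher)."""
    def chatbot(message, history=None):
        history = history or []
        try:
            output = str(answer_fn(message))
        except Exception as exc:
            # Invalid Cypher or connection problems end up here instead of crashing the UI
            output = f"Sorry, I could not answer that ({type(exc).__name__})."
        # An empty result is replaced by a short message
        history.append((message, output or "No results found."))
        return history, history
    return chatbot
```

The returned function has the same `(input, state)` signature as `chatbot()` above, so `make_chatbot(generate_and_exec_cypher)` could be passed as `fn` to `gr.Interface`.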

### Initializing Gradio interface to run the chatbot.

**Instruction:** Open the chatbot at the URL generated after running the cell and play around with input and output in natural language, fetching the results from the Neo4j database while the Gemini API converts the input text into Cypher code.

In [None]:
gr.Interface(fn = chatbot,
             inputs = ["text",'state'],
             outputs = ["chatbot",'state']).launch(debug = True)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://eb4b137911df76b229.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
