Install the Neo4j `graphdatascience` library ([documentation](https://neo4j.com/docs/graph-data-science/current/)) and the `openai` library

In [1]:
%%capture
try:
    from graphdatascience import GraphDataScience
except ImportError:
    !pip install graphdatascience
    from graphdatascience import GraphDataScience

try:
    import openai
except ImportError:
    # The functions below use openai.ChatCompletion, which was removed in openai 1.0
    !pip install "openai<1"
    import openai

Import modules

In [2]:
import pandas as pd
import os
import getpass

Register for a [Neo4j sandbox](https://sandbox.neo4j.com) and create a movies project. Then register an account with [OpenAI](https://openai.com/) and get an API key

In [None]:
connectionUrl = input("Neo4j Database Url: ")
username = 'neo4j'
password = getpass.getpass(prompt='Password: ')  # hide the password while typing
os.environ["OPENAI_API_KEY"] = getpass.getpass(prompt='OpenAI API key: ')
openai.api_key = os.getenv('OPENAI_API_KEY')

Verify the database connection and return the Graph Data Science version

In [None]:
gds = GraphDataScience(connectionUrl, auth=(username, password))
gds.set_database('neo4j')
print(gds.version())

Declare OpenAI API functions for later use

In [5]:
def get_actor_birthplace(actor):
    completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Return the city and the country the actor " + actor + " was born in. I don't want any information other than city and country and no punctuation at the end. If the country is the United States then return USA as country. If you don't know which country the actor is born in then answer: n/a, n/a. Use this format: City, Country"}])
    return completion.choices[0].message.content

In [6]:
def get_actor_bio(actor):
    completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Return a short biography of the actor " + actor + ". A maximum of 150 characters"}])
    return completion.choices[0].message.content
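
Each of these functions makes one network request per actor, and such requests can fail transiently (for example on rate limits or timeouts). As a minimal sketch that is not part of the original notebook, a small retry wrapper could be placed around them. The name `with_retries` and its parameters are hypothetical, and it catches a generic `Exception` because the exact exception types depend on the installed openai version.

```python
import time
from functools import wraps


def with_retries(max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Hypothetical helper: retry a function with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Give up after the last attempt and re-raise the error.
                    if attempt == max_attempts:
                        raise
                    # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

It could then be applied with, for example, `get_actor_bio = with_retries()(get_actor_bio)` before the functions are used on the dataframe.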

Match persons who acted in a movie in the movies graph and have a look at the resulting pandas dataframe. For this example, the query returns Emil Eifrem plus at most 10 actors, excluding Hugo Weaving

In [None]:
actors_df = gds.run_cypher("""
    MATCH (p:Person{name: 'Emil Eifrem'})
    RETURN p.name AS actor
    UNION
    MATCH (n:Person WHERE n.name <> 'Hugo Weaving')
    WHERE (n)-[:ACTED_IN]->(:Movie)
    RETURN n.name AS actor LIMIT 10
""")
actors_df.head(10)

Apply the OpenAI functions for each row in the pandas dataframe

In [8]:
actors_df['birthplace'] = actors_df['actor'].apply(get_actor_birthplace)

In [9]:
actors_df['biography'] = actors_df['actor'].apply(get_actor_bio)

The dataset now has two new columns. <u>**Note that the results will vary between runs and will sometimes be incorrect.**</u>



In [None]:
actors_df.head(10)
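
Because the model's answers are free text, it may be worth checking the `birthplace` strings before they are written to the graph. The following is a sketch under the assumption that a usable answer has exactly the `City, Country` form requested in the prompt and that `n/a, n/a` marks an unknown birthplace. `parse_birthplace` is a hypothetical helper, not part of the notebook, and the example records are made up for illustration.

```python
def parse_birthplace(text):
    """Return (city, country), or None if the answer is unusable.

    Assumes the 'City, Country' format requested in the prompt.
    """
    if not isinstance(text, str):
        return None
    # Drop surrounding whitespace and a stray trailing period.
    parts = [part.strip() for part in text.strip().rstrip(".").split(",")]
    if len(parts) != 2 or not all(parts):
        return None
    city, country = parts
    # The prompt asks for 'n/a, n/a' when the birthplace is unknown.
    if city.lower() == "n/a" or country.lower() == "n/a":
        return None
    return city, country


# Records in the shape produced by actors_df.to_dict(orient='records');
# the birthplace values are illustrative model outputs, not real results.
records = [
    {"actor": "Keanu Reeves", "birthplace": "Beirut, Lebanon"},
    {"actor": "Emil Eifrem", "birthplace": "n/a, n/a"},
    {"actor": "Some Actor", "birthplace": "I do not know."},
]
valid = [r for r in records if parse_birthplace(r["birthplace"]) is not None]
print([r["actor"] for r in valid])
```

Only the first record passes the check here; the other two would be skipped rather than turned into `City` and `Country` nodes.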


Enrich the graph by adding a biography property to the person nodes

In [11]:
gds.run_cypher(
    """
    UNWIND $actors AS actor
    MATCH (p:Person{name: actor['actor']})
      SET p.biography = actor['biography']
    """,
    params = { 'actors': actors_df.to_dict(orient='records') }
)

Create new nodes and relationships representing the cities and countries the persons were born in

In [12]:
gds.run_cypher(
    """
    UNWIND $actors AS row
    MATCH (p:Person{name: row['actor']})
    WITH p, split(row['birthplace'], ', ') AS birthplace
    // Skip answers that are not 'City, Country' or that are 'n/a, n/a'
    WHERE size(birthplace) = 2 AND birthplace[0] <> 'n/a' AND birthplace[1] <> 'n/a'
    MERGE (city:City{name: birthplace[0]})
    MERGE (country:Country{name: birthplace[1]})
    MERGE (p)-[:BORN_IN]->(city)
    MERGE (city)-[:IN_COUNTRY]->(country)
    """,
    params = { 'actors': actors_df.to_dict(orient='records') }
)
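
To check the result, the new nodes can be read back from the graph. This is a minimal sketch, assuming the `BORN_IN` and `IN_COUNTRY` relationships created above and a `gds` object set up as earlier in the notebook. The function name `fetch_birthplaces` is hypothetical.

```python
BIRTHPLACE_QUERY = """
    MATCH (p:Person)-[:BORN_IN]->(city:City)-[:IN_COUNTRY]->(country:Country)
    RETURN p.name AS actor, city.name AS city, country.name AS country
    ORDER BY actor
"""


def fetch_birthplaces(gds):
    """Run the read-back query through the client's run_cypher method."""
    return gds.run_cypher(BIRTHPLACE_QUERY)
```

In the notebook this could be called as `fetch_birthplaces(gds).head(10)`. Since cities are merged by name only, a city name shared by several countries would appear once per linked country.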