# Neo4j



## Deploy

```
docker run \
    --publish=7474:7474 --publish=7687:7687 \
    --env NEO4J_AUTH=none \
    --volume=./data:/data \
    neo4j:2025.08.0
```
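
After the container starts, Neo4j may need a few seconds before the Bolt port (7687, as published by the command above) accepts connections. A minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper name, not part of any Neo4j tooling, and an open port does not guarantee that the database has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect only proves the port is open,
            # not that Neo4j is ready to serve queries.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_port("localhost", 7687)` before creating the SparkSession or the Python driver.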


## Community Edition limitations:


* Does not support multiple graph environments (databases);
* Does not scale out across multiple server nodes;
* Role-based access control (RBAC) and advanced security features, such as LDAP integration, are exclusive to the Enterprise Edition;
* Node limit: 34 billion;
* The Spark API works, but is limited to Spark 3.0+;
* Other comparisons: https://neo4j.com/pricing/
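
The 34 billion figure is consistent with a 35-bit node identifier space. The bit width is an assumption here, so check the store format documentation for your version. A quick check of the arithmetic:

```python
# Assumption: node IDs occupy 35 bits in the store format used by Community Edition.
NODE_ID_BITS = 35
max_nodes = 2 ** NODE_ID_BITS

print(f"{max_nodes:,}")  # 34,359,738,368
```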


# Experiments with Neo4j + Spark + Python

Open `http://0.0.0.0:7474/browser/` and run `:play movie-graph` (to create the initial graph).



About the Spark connector:
* Docs: `https://neo4j.com/docs/spark/current/`
* Compatibility: the connector supports Spark 3.0+


About the Python connector (driver):
* Docs: `https://neo4j.com/docs/python-manual/current/connect/`
* Does not allow multiple Cypher statements in a single `run` call;
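
One way around the single-statement restriction is to split a script and issue one `run` call per statement. A minimal sketch: `split_statements` and `run_script` are hypothetical helpers, and the splitting is naive (it assumes no `;` appears inside a string literal or comment).

```python
def split_statements(script: str) -> list[str]:
    """Naively split a Cypher script on ';' and drop empty pieces.

    Assumption: no statement contains a ';' inside a string literal or comment.
    """
    return [part.strip() for part in script.split(";") if part.strip()]


def run_script(session, script: str) -> int:
    """Run each statement with its own session.run() call; return how many ran."""
    statements = split_statements(script)
    for statement in statements:
        # consume() discards the result stream before the next statement starts
        session.run(statement).consume()
    return len(statements)
```

Here `session` is expected to be a session obtained from `driver.session()`.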

Example queries for the "Movies" sample graph:

Find the actor named "Tom Hanks":
    
    `MATCH (tom {name: "Tom Hanks"}) RETURN tom`


Find the movie with title "Cloud Atlas":

    `MATCH (cloudAtlas {title: "Cloud Atlas"}) RETURN cloudAtlas`


Find 10 people:

    `MATCH (people:Person) RETURN people.name LIMIT 10`
    
Find movies released in the 1990s:

    `MATCH (nineties:Movie) WHERE nineties.released >= 1990 AND nineties.released < 2000 RETURN nineties.title`


Movies and actors up to 4 "hops" away from Kevin Bacon:

    `MATCH (bacon:Person {name:"Kevin Bacon"})-[*1..4]-(hollywood) RETURN DISTINCT hollywood`
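
The queries above inline their literals. With the Python driver the same lookups can pass values as Cypher parameters (`$name`, `$start`, `$end`) instead. A minimal sketch, assuming a `session` from `driver.session()`; the function names are hypothetical, and the `:Person` label is added here although the original Tom Hanks query matches on any label.

```python
def find_by_name(session, name: str) -> list[dict]:
    """Look up Person nodes by name, passing the value as a Cypher parameter."""
    query = "MATCH (p:Person {name: $name}) RETURN p.name AS name, p.born AS born"
    # session.run accepts query parameters as keyword arguments
    result = session.run(query, name=name)
    return [record.data() for record in result]


def movies_in_decade(session, start_year: int) -> list[str]:
    """Return titles of movies released in [start_year, start_year + 10)."""
    query = (
        "MATCH (m:Movie) "
        "WHERE m.released >= $start AND m.released < $end "
        "RETURN m.title AS title"
    )
    result = session.run(query, start=start_year, end=start_year + 10)
    return [record["title"] for record in result]
```

For instance, `movies_in_decade(session, 1990)` corresponds to the "released in the 1990s" query.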

In [87]:

from pyspark.sql import SparkSession

# Neo4j connection settings
url = "neo4j://localhost:7687"
# username = "neo4j"
# password = "password"
dbname = "neo4j"

# Create the SparkSession with global settings
spark = (
    SparkSession.builder
    .config("neo4j.url", url)
    .config('spark.jars.packages', 'org.neo4j:neo4j-connector-apache-spark_2.12:5.1.0_for_spark_3')
    #.config("neo4j.authentication.basic.username", username)
    #.config("neo4j.authentication.basic.password", password)
    .config("neo4j.database", dbname)
    .getOrCreate()
)
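
The session above leaves the username and password options commented out because the container runs with `NEO4J_AUTH=none`. A minimal sketch of building the same settings as a dictionary, adding basic auth only when credentials are supplied; `neo4j_spark_conf` and `apply_conf` are hypothetical helpers, and the option keys are the ones already used above.

```python
from typing import Optional

CONNECTOR_PACKAGE = "org.neo4j:neo4j-connector-apache-spark_2.12:5.1.0_for_spark_3"


def neo4j_spark_conf(url: str, database: str,
                     username: Optional[str] = None,
                     password: Optional[str] = None) -> dict:
    """Build the session-level options; add basic auth only if both values are given."""
    conf = {
        "spark.jars.packages": CONNECTOR_PACKAGE,
        "neo4j.url": url,
        "neo4j.database": database,
    }
    if username is not None and password is not None:
        conf["neo4j.authentication.basic.username"] = username
        conf["neo4j.authentication.basic.password"] = password
    return conf


def apply_conf(builder, conf: dict):
    """Chain .config(key, value) calls onto a SparkSession.builder-like object."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

Usage would look like `apply_conf(SparkSession.builder, neo4j_spark_conf("neo4j://localhost:7687", "neo4j")).getOrCreate()`.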

Find the actor named "Tom Hanks"...

In [103]:
query = """
  MATCH (tom {name: "Tom Hanks"})
  RETURN tom
"""

df = spark.read.format("org.neo4j.spark.DataSource")\
  .option("query", query)\
  .load()\
  .select("tom.*")

df.show(truncate=False)


+----+--------+---------+----+
|<id>|<labels>|name     |born|
+----+--------+---------+----+
|692 |[Person]|Tom Hanks|1956|
+----+--------+---------+----+
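
The read pattern (`format` + `option("query", ...)` + `load`) repeats in every cell of this section, so it can be wrapped once. A minimal sketch; `read_cypher` is a hypothetical helper and assumes a SparkSession configured as above.

```python
NEO4J_FORMAT = "org.neo4j.spark.DataSource"


def read_cypher(spark, query: str):
    """Run a read-only Cypher query through the connector and return a DataFrame."""
    return (
        spark.read.format(NEO4J_FORMAT)
        .option("query", query)
        .load()
    )
```

The cell above would then become `read_cypher(spark, query).select("tom.*").show(truncate=False)`.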



Find movies released in the 1990s...

In [104]:
query = """
  MATCH (nineties:Movie) WHERE nineties.released >= 1990 AND nineties.released < 2000 RETURN nineties.title
"""

df = spark.read.format("org.neo4j.spark.DataSource")\
  .option("query", query)\
  .load()

df.show(truncate=False)

+----------------------+
|nineties.title        |
+----------------------+
|The Matrix            |
|The Devil's Advocate  |
|A Few Good Men        |
|As Good as It Gets    |
|What Dreams May Come  |
|Snow Falling on Cedars|
|You've Got Mail       |
|Sleepless in Seattle  |
|Joe Versus the Volcano|
|When Harry Met Sally  |
|That Thing You Do     |
|The Birdcage          |
|Unforgiven            |
|Johnny Mnemonic       |
|The Green Mile        |
|Hoffa                 |
|Apollo 13             |
|Twister               |
|Bicentennial Man      |
|A League of Their Own |
+----------------------+



Movies and actors up to 4 "hops" away from Kevin Bacon

In [105]:
query = """
MATCH (bacon:Person {name:"Kevin Bacon"})-[*1..4]-(hollywood) RETURN DISTINCT hollywood
"""

df = spark.read.format("org.neo4j.spark.DataSource")\
  .option("query", query)\
  .load()

df.show(truncate=False) #.select("hollywood.*")

+----------------------------------------+
|hollywood                               |
+----------------------------------------+
|{765, [Movie], null, null}              |
|{758, [Movie], null, null}              |
|{636, [Movie], null, null}              |
|{754, [Person], Sam Rockwell, 1968}     |
|{760, [Person], Michael Sheen, 1969}    |
|{736, [Person], Ron Howard, 1954}       |
|{759, [Person], Frank Langella, 1938}   |
|{761, [Person], Oliver Platt, 1960}     |
|{692, [Person], Tom Hanks, 1956}        |
|{766, [Person], Ed Harris, 1950}        |
|{755, [Person], Gary Sinise, 1955}      |
|{767, [Person], Bill Paxton, 1955}      |
|{646, [Person], James Marshall, 1967}   |
|{644, [Person], Kevin Pollak, 1957}     |
|{645, [Person], J.T. Walsh, 1943}       |
|{649, [Person], Aaron Sorkin, 1961}     |
|{643, [Person], Cuba Gooding Jr., 1968} |
|{647, [Person], Christopher Guest, 1948}|
|{648, [Person], Rob Reiner, 1947}       |
|{642, [Person], Noah Wyle, 1971}        |
+----------------------------------------+

## Creating a new actor via Spark

In [106]:
import pandas as pd
data = spark.createDataFrame(pd.DataFrame([["Lucas Ponce",1991]], columns=[ "name","born"]))
data.show()


+-----------+----+
|       name|born|
+-----------+----+
|Lucas Ponce|1991|
+-----------+----+



In [107]:
(
    data.write.format("org.neo4j.spark.DataSource")
    .mode("Overwrite")
    .option("labels", "Person")
    .option("node.keys", "name,born")
    .save()
)

ds = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("labels", "Person")
    .load()
)

25/09/05 11:27:15 WARN SchemaService: Switching to query schema resolution
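
As far as I understand the connector documentation, `Overwrite` combined with `node.keys` makes the connector match existing nodes on those key columns instead of always creating new ones, so the choice of keys decides what counts as "the same" node. A minimal sketch of the write above as a reusable function; `write_nodes` is a hypothetical helper, and the guard on empty keys is an assumption of this sketch, not connector behavior quoted from the docs.

```python
def write_nodes(df, labels: str, keys: list[str], mode: str = "Overwrite") -> None:
    """Write DataFrame rows as nodes with the given label(s), keyed on `keys`."""
    if mode.lower() == "overwrite" and not keys:
        # Guard added for this sketch: Overwrite needs key columns to match existing nodes.
        raise ValueError("Overwrite mode needs at least one key column")
    writer = (
        df.write.format("org.neo4j.spark.DataSource")
        .mode(mode)
        .option("labels", labels)
    )
    if keys:
        # The connector expects a comma-separated list of column names
        writer = writer.option("node.keys", ",".join(keys))
    writer.save()
```

The cell above is equivalent to `write_nodes(data, "Person", ["name", "born"])`. Keying only on `name` would instead treat two people with the same name and different birth years as one node.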


In [108]:
ds.filter("name == 'Lucas Ponce'").show()

+----+--------+-----------+----+
|<id>|<labels>|       name|born|
+----+--------+-----------+----+
| 792|[Person]|Lucas Ponce|1991|
+----+--------+-----------+----+



In [109]:
ds.filter("name == 'Lucas Ponce'").explain()

== Physical Plan ==
*(1) Project [<id>#594L, <labels>#595, name#596, born#597L]
+- *(1) Filter (isnotnull(name#596) AND (name#596 = Lucas Ponce))
   +- BatchScan[<id>#594L, <labels>#595, name#596, born#597L] class org.neo4j.spark.reader.Neo4jScan RuntimeFilters: []




## Adding movies to the actor:

In [110]:
# DataFrame with multiple relationships
relationships_data = pd.DataFrame([
    ("Lucas Ponce", "The Matrix", "ACTED_IN"),
    ("Lucas Ponce", "The Matrix Reloaded", "ACTED_IN")
], columns=["name", "title", "relationship_type"])

df_multiple = spark.createDataFrame(relationships_data)

df_multiple.show()


+-----------+-------------------+-----------------+
|       name|              title|relationship_type|
+-----------+-------------------+-----------------+
|Lucas Ponce|         The Matrix|         ACTED_IN|
|Lucas Ponce|The Matrix Reloaded|         ACTED_IN|
+-----------+-------------------+-----------------+



In [111]:
df_multiple.write \
    .format("org.neo4j.spark.DataSource") \
    .mode("append") \
    .option("relationship", "ACTED_IN") \
    .option("relationship.save.strategy", "keys") \
    .option("relationship.source.labels", ":Person") \
    .option("relationship.source.save.mode", "Match") \
    .option("relationship.source.node.keys", "name:name") \
    .option("relationship.target.labels", ":Movie") \
    .option("relationship.target.save.mode", "Match") \
    .option("relationship.target.node.keys", "title:title") \
    .save()
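
The relationship write above uses eight options that follow a regular pattern: a `keys` strategy, with source and target nodes matched (not created) by one key column each. A minimal sketch that builds them as a dictionary; `relationship_options` is a hypothetical helper. The optional `properties` argument relies on the connector's `relationship.properties` option, which the original cell does not use, so treat that part as an assumption.

```python
from typing import Optional


def relationship_options(rel_type: str,
                         source_label: str, source_key: str,
                         target_label: str, target_key: str,
                         properties: Optional[str] = None) -> dict:
    """Build the connector options for linking existing nodes by key columns."""
    options = {
        "relationship": rel_type,
        "relationship.save.strategy": "keys",
        "relationship.source.labels": f":{source_label}",
        "relationship.source.save.mode": "Match",
        # "column:property" maps a DataFrame column to a node property
        "relationship.source.node.keys": f"{source_key}:{source_key}",
        "relationship.target.labels": f":{target_label}",
        "relationship.target.save.mode": "Match",
        "relationship.target.node.keys": f"{target_key}:{target_key}",
    }
    if properties:
        # Assumption: maps DataFrame columns onto relationship properties.
        options["relationship.properties"] = properties
    return options
```

With it, the write becomes `df_multiple.write.format("org.neo4j.spark.DataSource").mode("append").options(**relationship_options("ACTED_IN", "Person", "name", "Movie", "title")).save()`.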

In [113]:
query = """
MATCH (lucas:Person {name: "Lucas Ponce"})-[rel:ACTED_IN]->(movies:Movie)
RETURN lucas.name AS ator, 
       movies.title AS filme, 
       movies.released AS ano_lancamento,
       rel.roles AS papeis
ORDER BY movies.released
"""

df = spark.read.format("org.neo4j.spark.DataSource")\
  .option("query", query)\
  .load()

df.show(truncate=False)

+-----------+-------------------+--------------+------+
|ator       |filme              |ano_lancamento|papeis|
+-----------+-------------------+--------------+------+
|Lucas Ponce|The Matrix         |1999          |null  |
|Lucas Ponce|The Matrix Reloaded|2003          |null  |
+-----------+-------------------+--------------+------+



In [115]:
spark.stop()

# Python

In [36]:
! pip install neo4j

Defaulting to user installation because normal site-packages is not writeable
Collecting neo4j
  Downloading neo4j-5.28.2-py3-none-any.whl.metadata (5.9 kB)
Downloading neo4j-5.28.2-py3-none-any.whl (313 kB)
Installing collected packages: neo4j
Successfully installed neo4j-5.28.2

In [114]:
from neo4j import GraphDatabase


driver = GraphDatabase.driver("bolt://localhost:7687", 
                             #auth=("neo4j", "password")
                             )

with driver.session() as session:
        
    # Count all nodes
    result_nos = session.run("MATCH () RETURN count(*) as total_nos")
    total_nos = result_nos.single()["total_nos"]

    # Count distinct label types
    result_labels = session.run("""
        MATCH (n)
        UNWIND labels(n) as label
        RETURN count(DISTINCT label) as total_labels
    """)
    total_labels = result_labels.single()["total_labels"]

    print(f"Total nodes: {total_nos}")
    print(f"Total label types: {total_labels}")

    # Remove all nodes (destructive: this wipes the whole graph)
    session.run("MATCH (n) DETACH DELETE n")

    new_total_nos = session.run("MATCH () RETURN count(*) as total_nos").single()["total_nos"]
    print(f"Total nodes (updated): {new_total_nos}")

driver.close()

Total nodes: 172
Total label types: 2
Total nodes (updated): 0
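
Because the cell above deletes the whole graph as a side effect of a counting example, it may be safer to keep the two concerns apart. A minimal sketch reusing the same queries; `graph_summary` and `wipe_graph` are hypothetical helpers, and the `confirm` flag is a design choice of this sketch, not a driver feature.

```python
def graph_summary(session) -> dict:
    """Return node and distinct-label counts using the same queries as above."""
    total_nodes = session.run(
        "MATCH () RETURN count(*) AS total_nos"
    ).single()["total_nos"]
    total_labels = session.run(
        """
        MATCH (n)
        UNWIND labels(n) AS label
        RETURN count(DISTINCT label) AS total_labels
        """
    ).single()["total_labels"]
    return {"nodes": total_nodes, "labels": total_labels}


def wipe_graph(session, confirm: bool = False) -> dict:
    """Delete every node and relationship, but only when confirm=True.

    Returns the summary taken just before the deletion.
    """
    if not confirm:
        raise RuntimeError("Refusing to delete the graph without confirm=True")
    before = graph_summary(session)
    session.run("MATCH (n) DETACH DELETE n").consume()
    return before
```

Both functions take a session from `driver.session()`, so counting never deletes anything unless `wipe_graph(session, confirm=True)` is called explicitly.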


Other examples: https://github.com/neo4j/neo4j-spark-connector/blob/5.0/examples/neo4j_data_engineering.ipynb