# Memgraph


## Deploy:

`docker compose up`
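After `docker compose up`, the container may need a few seconds before it accepts connections. Below is a minimal sketch of a readiness check, assuming the Bolt endpoint is exposed at `localhost:7687` (the address the client further down uses). The helper name `wait_for_port` and the timeout values are illustrative assumptions, not part of Memgraph or Docker.

```python
import socket
import time


def wait_for_port(host="localhost", port=7687, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that the port accepts TCP connections,
    not that the database has finished initializing.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_port()` before creating the client would avoid a connection error when the notebook runs right after the container starts.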


## Notes:

* The Spark integration only supports writing data to Memgraph: https://memgraph.com/docs/data-migration/migrate-with-apache-spark
 * It relies on the Neo4j connector, so Spark 3.x is required.
 * Reading can be hacked together through the Neo4j connector, but it is quite limited and raises internal errors: https://stackoverflow.com/questions/75244143/is-there-something-in-memgraph-that-has-the-same-function-as-apoc-in-neo4j (example below).
 
    
    
## Python 

In [1]:
! pip install pymgclient neo4j  # the neo4j driver is compatible with Memgraph

Defaulting to user installation because normal site-packages is not writeable
Collecting pymgclient
  Downloading pymgclient-1.5.1.tar.gz (129 kB)
  Preparing metadata (setup.py) ... done
Collecting pyopenssl (from pymgclient)
  Downloading pyopenssl-25.1.0-py3-none-any.whl.metadata (17 kB)
Downloading pyopenssl-25.1.0-py3-none-any.whl (56 kB)
Building wheels for collected packages: pymgclient
  Building wheel for pymgclient (setup.py) ... done
  Created wheel for pymgclient: filename=pymgclient-1.5.1-cp310-cp310-linux_x86_64.whl size=2270835 sha256=0848d7c405683be2b848a654852391fbf2dcb3109cde7c7978f393a36a1de95d
  Stored in directory: /home/lucasmsp/.cache/pip/wheels/3e/cb/ee/1e117a9b25d585950a95260942b90ff24b9a2b560c462db23b
Successfully built pymgclient
Installing collected packages: pyopenssl, pymgclient
Successfully installed pymgclient-1.5.1 pyopenssl-25.1.0

In [2]:
from neo4j import GraphDatabase


class MemgraphClient:
    """Thin wrapper around the neo4j Bolt driver, pointed at Memgraph."""

    def __init__(self, uri="bolt://localhost:7687", user="", password=""):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def execute_query(self, query, parameters=None):
        # Materialize the records before the session is closed.
        with self.driver.session() as session:
            result = session.run(query, parameters)
            return list(result)

    def create_graph_example(self):
        """Create an example graph."""

        # Clear existing data
        self.execute_query("MATCH (n) DETACH DELETE n")

        # Create nodes
        queries = [
            "CREATE (a:Person {name: 'Alice', age: 30})",
            "CREATE (b:Person {name: 'Bob', age: 25})",
            "CREATE (c:Person {name: 'Charlie', age: 35})",
            "CREATE (d:City {name: 'São Paulo', country: 'Brazil'})",
            "CREATE (e:City {name: 'Rio de Janeiro', country: 'Brazil'})"
        ]

        for query in queries:
            self.execute_query(query)

        # Create relationships
        relationships = [
            "MATCH (a:Person {name: 'Alice'}), (d:City {name: 'São Paulo'}) CREATE (a)-[:LIVES_IN]->(d)",
            "MATCH (b:Person {name: 'Bob'}), (e:City {name: 'Rio de Janeiro'}) CREATE (b)-[:LIVES_IN]->(e)",
            "MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'}) CREATE (a)-[:KNOWS {since: 2020}]->(b)",
            "MATCH (b:Person {name: 'Bob'}), (c:Person {name: 'Charlie'}) CREATE (b)-[:KNOWS {since: 2021}]->(c)"
        ]

        for query in relationships:
            self.execute_query(query)

        print("Example graph created!")

    def query_examples(self):
        """Example queries."""
        
        



In [3]:
client = MemgraphClient()

# Check that the connection works
result = client.execute_query("RETURN 'Hello Memgraph!' as message")
print(f"✅ Connected! {result[0]['message']}")


✅ Connected! Hello Memgraph!


In [None]:
client.create_graph_example()
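
The queries in `create_graph_example` embed the property values directly in the Cypher text. Since `execute_query` already accepts a `parameters` argument, the same node inserts could instead be expressed as a single parameterized `UNWIND` query. The sketch below assumes the `Person` label and the `name`/`age` properties from the example. `build_person_batch` is a hypothetical helper, not part of any library.

```python
def build_person_batch(people):
    """Build a (query, parameters) pair that creates one Person node per row.

    `people` is an iterable of (name, age) tuples. Values travel as query
    parameters, so they are never concatenated into the Cypher string.
    """
    query = (
        "UNWIND $rows AS row "
        "CREATE (p:Person {name: row.name, age: row.age})"
    )
    parameters = {"rows": [{"name": name, "age": age} for name, age in people]}
    return query, parameters


query, parameters = build_person_batch([("Alice", 30), ("Bob", 25)])
# With a running Memgraph instance, the pair could be passed to the client:
# client.execute_query(query, parameters)
```

Besides avoiding quoting problems in names such as `São Paulo`, this sends one query per batch instead of one per node.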

In [22]:
print("\n=== Example Queries ===")

# 1. List all people
print("\n1. All people:")
result = client.execute_query("MATCH (p:Person) RETURN p.name, p.age")
for record in result:
    print(f"  - {record['p.name']}, {record['p.age']} years old")


=== Example Queries ===

1. All people:
  - Alice, 30 years old
  - Bob, 25 years old
  - Charlie, 35 years old
  - Lucas Ponce, 34 years old


In [8]:
# 2. People and their cities
print("\n2. People and where they live:")
result = client.execute_query("""
    MATCH (p:Person)-[:LIVES_IN]->(c:City)
    RETURN p.name, c.name
""")
for record in result:
    print(f"  - {record['p.name']} lives in {record['c.name']}")


2. People and where they live:
  - Alice lives in São Paulo
  - Bob lives in Rio de Janeiro


In [9]:
# 3. Friendship network
print("\n3. Friendship network:")
result = client.execute_query("""
    MATCH (p1:Person)-[r:KNOWS]->(p2:Person)
    RETURN p1.name, p2.name, r.since
""")
for record in result:
    print(f"  - {record['p1.name']} knows {record['p2.name']} since {record['r.since']}")


3. Friendship network:
  - Alice knows Bob since 2020
  - Bob knows Charlie since 2021


In [10]:
# 4. Statistics (node count per first label)
print("\n4. Statistics:")
result = client.execute_query("MATCH (n) RETURN labels(n)[0] as tipo, count(n) as quantidade")
for record in result:
    print(f"  - {record['tipo']}: {record['quantidade']}")


4. Statistics:
  - City: 2
  - Person: 3


## PySpark

In [1]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = (
    SparkSession.builder.config("neo4j.url", "bolt://localhost:7687")
    .config("neo4j.authentication.basic.username", "")
    .config("neo4j.authentication.basic.password", "")
    .config(
        "spark.jars.packages",
        "org.neo4j:neo4j-connector-apache-spark_2.12:5.1.0_for_spark_3",
    )
    .config("neo4j.database", "memgraph")
    .getOrCreate()
)



25/09/05 18:31:57 WARN Utils: Your hostname, lucasmsp-Inspiron-7580 resolves to a loopback address: 127.0.1.1; using 192.168.15.13 instead (on interface wlp3s0)
25/09/05 18:31:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
:: loading settings :: url = jar:file:/opt/spark-3.3.0/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml


Ivy Default Cache set to: /home/lucasmsp/.ivy2/cache
The jars for the packages stored in: /home/lucasmsp/.ivy2/jars
org.neo4j#neo4j-connector-apache-spark_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-be37abdf-f62a-4810-af07-7984e237f774;1.0
	confs: [default]
	found org.neo4j#neo4j-connector-apache-spark_2.12;5.1.0_for_spark_3 in central
	found org.neo4j#neo4j-connector-apache-spark_2.12_common;5.1.0 in central
	found org.neo4j.driver#neo4j-java-driver;4.4.12 in central
	found org.reactivestreams#reactive-streams;1.0.4 in local-m2-cache
	found org.apache.xbean#xbean-asm6-shaded;4.10 in central
	found org.neo4j#neo4j-cypher-dsl;2022.9.0 in central
	found org.apiguardian#apiguardian-api;1.1.2 in central
:: resolution report :: resolve 267ms :: artifacts dl 10ms
	:: modules in use:
	org.apache.xbean#xbean-asm6-shaded;4.10 from central in [default]
	org.apiguardian#apiguardian-api;1.1.2 from central in [default]
	org.neo4j#neo4j-connector-apac

25/09/05 18:31:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


25/09/05 18:31:59 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.


In [12]:
# Schema for the people
pessoas_schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# People data
pessoas_data = [
    ("Lucas Ponce", 34),
]


# Create the DataFrame
df_pessoas = spark.createDataFrame(pessoas_data, pessoas_schema)

# Write the nodes to Memgraph through the Neo4j connector
df_pessoas.write \
    .format("org.neo4j.spark.DataSource") \
    .mode("append") \
    .option("labels", ":Person") \
    .option("node.keys", "name") \
    .option("batch.size", 1000) \
    .option("numPartitions", 8) \
    .save()

                                                                                

In [15]:
# Relationship data: (source person, target person, year the relationship started)
empresas_data = [
    ("Lucas Ponce", "Alice", 2021)
]

# `since` is an integer, matching the value above and the existing KNOWS relationships
empresas_schema = StructType([
    StructField("name1", StringType(), True),
    StructField("name2", StringType(), True),
    StructField("since", IntegerType(), True)
])


df_empresas = spark.createDataFrame(empresas_data, empresas_schema)

# Match existing Person nodes by name and create a KNOWS relationship between them
df_empresas.write \
    .format("org.neo4j.spark.DataSource") \
    .mode("append") \
    .option("relationship", "KNOWS") \
    .option("relationship.save.strategy", "keys") \
    .option("relationship.source.labels", ":Person") \
    .option("relationship.source.save.mode", "Match") \
    .option("relationship.source.node.keys", "name1:name") \
    .option("relationship.target.labels", ":Person") \
    .option("relationship.target.save.mode", "Match") \
    .option("relationship.target.node.keys", "name2:name") \
    .option("relationship.properties", "since") \
    .save()


In [3]:
spark.read \
    .format("org.neo4j.spark.DataSource") \
    .option("neo4j.database", "memgraph")\
    .option("query", "MATCH (p1:Person)-[r:KNOWS]->(p2:Person)  RETURN p1.name, p2.name, r.since") \
    .load().show()

IllegalArgumentException: Please provide a valid READ query

In [51]:
spark.stop()

In [46]:
import pandas as pd
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))
with driver.session() as session:
    result = session.run("MATCH (p1:Person)-[r:KNOWS]->(p2:Person)  RETURN p1.name, p2.name, r.since")
    data = [record.data() for record in result]
driver.close()
    
pd.DataFrame(data)


|   | p1.name | p2.name | r.since |
|---|---|---|---|
| 0 | Alice | Bob | 2020 |
| 1 | Bob | Charlie | 2021 |
| 2 | Lucas Ponce | Alice | 2021 |
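
Since the connector rejected the custom `query` read above, rows fetched through the Bolt driver can be handed to Spark instead. Column names such as `p1.name` contain dots, which Spark treats as nested-field access unless the name is backtick-quoted, so it may help to rename the keys first. This is a minimal sketch. `sanitize_records` is a hypothetical helper and the `_` separator is an arbitrary choice.

```python
def sanitize_records(records, separator="_"):
    """Return copies of the record dicts with dots in keys replaced.

    `records` is a list of dicts such as those produced by `record.data()`.
    """
    return [
        {key.replace(".", separator): value for key, value in record.items()}
        for record in records
    ]


rows = sanitize_records([{"p1.name": "Alice", "p2.name": "Bob", "r.since": 2020}])
print(rows)  # [{'p1_name': 'Alice', 'p2_name': 'Bob', 'r_since': 2020}]
# With an active SparkSession, `spark.createDataFrame(pd.DataFrame(rows))`
# would then build a Spark DataFrame from these rows (not executed here).
```

This pulls every row through the driver, so it only suits results that fit in the driver's memory.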


In [2]:
spark.read\
    .format("org.neo4j.spark.DataSource")\
    .option("labels", ":Person")\
    .load()\
    .show()

25/09/05 18:32:08 WARN SchemaService: Switching to query schema resolution
25/09/05 18:32:08 WARN SchemaService: For the following exception
org.neo4j.driver.exceptions.ClientException: There is no procedure named 'apoc.meta.nodeTypeProperties'.
	at org.neo4j.driver.internal.util.Futures.blockingGet(Futures.java:111)
	at org.neo4j.driver.internal.InternalSession.run(InternalSession.java:62)
	at org.neo4j.driver.internal.InternalSession.run(InternalSession.java:47)
	at org.neo4j.driver.internal.AbstractQueryRunner.run(AbstractQueryRunner.java:34)
	at org.neo4j.driver.internal.AbstractQueryRunner.run(AbstractQueryRunner.java:39)
	at org.neo4j.spark.service.SchemaService.retrieveSchemaFromApoc(SchemaService.scala:69)
	at org.neo4j.spark.service.SchemaService.liftedTree1$1(SchemaService.scala:47)
	at org.neo4j.spark.service.SchemaService.structForNode(SchemaService.scala:36)
	at org.neo4j.spark.service.SchemaService.struct(SchemaService.scala:332)
	at org.neo4j.spark.DataSource.$anonfun$in

                                                                                

+----+--------+-----------+---+
|<id>|<labels>|       name|age|
+----+--------+-----------+---+
|2677|[Person]|      Alice| 30|
|2678|[Person]|        Bob| 25|
|2679|[Person]|    Charlie| 35|
|2682|[Person]|Lucas Ponce| 34|
+----+--------+-----------+---+
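
The warning above shows the connector first calling `apoc.meta.nodeTypeProperties`, which Memgraph does not provide, and then switching to resolving the schema from query results. As a rough illustration of that idea only (not the connector's actual implementation), a schema can be inferred by sampling a few records and noting the type of each property. `infer_schema` is a hypothetical helper.

```python
def infer_schema(records):
    """Map each property name to the set of Python type names seen in the sample."""
    schema = {}
    for record in records:
        for key, value in record.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


sample = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
print(infer_schema(sample))  # {'name': {'str'}, 'age': {'int'}}
```

A property that maps to more than one type name in the sample signals inconsistent data, which is one reason sample-based resolution is less reliable than a metadata procedure.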



In [50]:
spark.read\
    .format("org.neo4j.spark.DataSource")\
    .option("relationship", "KNOWS")\
    .option("relationship.source.labels", ":Person")\
    .option("relationship.target.labels", ":Person")\
    .load()\
    .show()

25/09/05 18:28:10 WARN SchemaService: Switching to query schema resolution
25/09/05 18:28:10 WARN SchemaService: For the following exception
org.neo4j.driver.exceptions.ClientException: There is no procedure named 'apoc.meta.nodeTypeProperties'.
	at org.neo4j.driver.internal.util.Futures.blockingGet(Futures.java:111)
	at org.neo4j.driver.internal.InternalSession.run(InternalSession.java:62)
	at org.neo4j.driver.internal.InternalSession.run(InternalSession.java:47)
	at org.neo4j.driver.internal.AbstractQueryRunner.run(AbstractQueryRunner.java:34)
	at org.neo4j.driver.internal.AbstractQueryRunner.run(AbstractQueryRunner.java:39)
	at org.neo4j.spark.service.SchemaService.retrieveSchemaFromApoc(SchemaService.scala:69)
	at org.neo4j.spark.service.SchemaService.liftedTree1$1(SchemaService.scala:47)
	at org.neo4j.spark.service.SchemaService.structForNode(SchemaService.scala:36)
	at org.neo4j.spark.service.SchemaService.structForRelationship(SchemaService.scala:155)
	at org.neo4j.spark.service

In [6]:
spark.read\
    .format("org.neo4j.spark.DataSource")\
    .option("relationship", "KNOWS")\
    .option("relationship.source.labels", ":Person")\
    .option("relationship.target.labels", ":Person")\
    .load().toPandas()

25/09/05 18:39:25 WARN SchemaService: Switching to query schema resolution
25/09/05 18:39:25 WARN SchemaService: For the following exception
org.neo4j.driver.exceptions.ClientException: There is no procedure named 'apoc.meta.nodeTypeProperties'.
	at org.neo4j.driver.internal.util.Futures.blockingGet(Futures.java:111)
	at org.neo4j.driver.internal.InternalSession.run(InternalSession.java:62)
	at org.neo4j.driver.internal.InternalSession.run(InternalSession.java:47)
	at org.neo4j.driver.internal.AbstractQueryRunner.run(AbstractQueryRunner.java:34)
	at org.neo4j.driver.internal.AbstractQueryRunner.run(AbstractQueryRunner.java:39)
	at org.neo4j.spark.service.SchemaService.retrieveSchemaFromApoc(SchemaService.scala:69)
	at org.neo4j.spark.service.SchemaService.liftedTree1$1(SchemaService.scala:47)
	at org.neo4j.spark.service.SchemaService.structForNode(SchemaService.scala:36)
	at org.neo4j.spark.service.SchemaService.structForRelationship(SchemaService.scala:155)
	at org.neo4j.spark.service

|   | `<rel.id>` | `<rel.type>` | `<source.id>` | `<source.labels>` | source.name | source.age | `<target.id>` | `<target.labels>` | target.name | target.age | rel.since |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11969 | KNOWS | 2677 | [Person] | Alice | 30 | 2678 | [Person] | Bob | 25 | 2020 |
| 1 | 11970 | KNOWS | 2678 | [Person] | Bob | 25 | 2679 | [Person] | Charlie | 35 | 2021 |
| 2 | 11971 | KNOWS | 2682 | [Person] | Lucas Ponce | 34 | 2677 | [Person] | Alice | 30 | 2021 |
