Add memgraph tutorial #507

Merged: 6 commits, Sep 16, 2023

karmenrabar
Contributor

I've created a demo Jupyter notebook that demonstrates how to generate PyGraphistry visualizations using the Python driver for Neo4j, while working with data in Memgraph. Additionally, I've included a README file to illustrate the process of connecting to Memgraph.
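
For readers skimming the thread, the setup described above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the environment variable names (MEMGRAPH_HOST, MEMGRAPH_PORT, MEMGRAPH_USER, MEMGRAPH_PASSWORD, GRAPHISTRY_USERNAME, GRAPHISTRY_PASSWORD) and both helper functions are hypothetical, and it assumes Memgraph is listening on the default Bolt port 7687. Reading credentials from the environment keeps them out of the committed notebook.

```python
import os


def bolt_config(env=os.environ):
    """Build Bolt connection settings from environment variables.

    The variable names are hypothetical; adapt them to your own setup.
    """
    host = env.get("MEMGRAPH_HOST", "localhost")
    port = int(env.get("MEMGRAPH_PORT", "7687"))  # assumed default Bolt port
    user = env.get("MEMGRAPH_USER", "")
    password = env.get("MEMGRAPH_PASSWORD", "")
    return {"uri": f"bolt://{host}:{port}", "auth": (user, password)}


def plot_query(query, env=os.environ):
    """Run a Cypher query against Memgraph and plot the result with PyGraphistry."""
    # Imported here so bolt_config() stays usable without these packages installed.
    import graphistry
    from neo4j import GraphDatabase

    # Graphistry account credentials also come from the environment.
    graphistry.register(
        api=3,
        username=env["GRAPHISTRY_USERNAME"],
        password=env["GRAPHISTRY_PASSWORD"],
    )
    # Memgraph speaks Bolt, so the Neo4j Python driver can connect to it.
    driver = GraphDatabase.driver(**bolt_config(env))
    graphistry.register(bolt=driver)
    return graphistry.cypher(query).plot()
```

In a notebook, a call such as plot_query("MATCH (a)-[r]->(b) RETURN a, r, b LIMIT 100") would then render the query result, assuming both services are reachable.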

@lmeyerov lmeyerov self-requested a review September 14, 2023 08:09
Contributor

@lmeyerov lmeyerov left a comment

Super cool!

  • Can you scrub & rotate the user/pass, and maybe switch to API tokens?
  • The GitHub preview wasn't showing the screenshots here for some reason, maybe check?
  • This file is 0.5MB because of the screenshots; maybe there is a way to host the images outside of the code repo? Not sure of a good pattern there.

@karmenrabar
Contributor Author

Thank you, @lmeyerov, for reviewing the PR and providing constructive comments! I've made the changes; hopefully, it's better now. Please let me know 😊 I've hosted the images in my public repo, outside the code repo.

@lmeyerov
Contributor

@karmenrabar I dug into the text and am enjoying this tutorial; this should be quite helpful for folks!

  • I made a pass smoothing some prose + clarifying text on the Graphistry side (for new users). Feel free to tweak further if you think it would help.

  • For the schema viz step, can you switch it to graphistry.cypher("CALL db.schema()").plot()? If that doesn't work in Memgraph, no worries, we can land as is, let me know.

@karmenrabar
Contributor Author

Amazing, thank you for the additional text, I appreciate it @lmeyerov! I'm super glad it could be helpful.

Unfortunately, it appears that graphistry.cypher("CALL db.schema()").plot() is not compatible with Memgraph, since Memgraph has a different method for retrieving schema information. So it would be great if we could land it as is. Still, it's a good idea to explore alternatives to that query for use with Graphistry!

@lmeyerov lmeyerov merged commit db31c0a into graphistry:master Sep 16, 2023
6 checks passed
@lmeyerov
Contributor

Thanks @karmenrabar, merged!

RE: CALL db.schema(), I'm curious if there's another Bolt-standardized Cypher command here, or maybe Memgraph has a proprietary Cypher extension that can be used instead?

@karmenrabar
Contributor Author

@lmeyerov amazing, thank you !

Similarly to CALL db.schema(), Memgraph does provide a meta_util.schema procedure that can be used to get the graph schema in Memgraph Lab. More about it can be found here. If include_properties is set to true, the graph schema will contain additional information about properties.
For example:

CALL meta_util.schema(true) 
YIELD nodes, relationships 
RETURN nodes, relationships;

You can also generate graph schema using Memgraph Lab, which provides a visual user interface for managing and interacting with your graph data (source).
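
To connect this back to the earlier schema-visualization question, here is a rough sketch of how the two lists returned by meta_util.schema might be reshaped into node and edge tables for Graphistry. It is an illustration under assumptions, not a tested recipe: it assumes each node map carries `id` and `labels`, and each relationship map carries `start`, `end`, and `label`, so check those keys against the actual output of your Memgraph version. Both helper functions are hypothetical.

```python
def schema_to_tables(nodes, relationships):
    """Flatten meta_util.schema output into plain node and edge rows.

    Assumed (unverified) record shape: node maps have "id" and "labels",
    relationship maps have "start", "end", and "label".
    """
    node_rows = [
        {"id": n["id"], "label": ":".join(n.get("labels", []))}
        for n in nodes
    ]
    edge_rows = [
        {"src": r["start"], "dst": r["end"], "type": r.get("label", "")}
        for r in relationships
    ]
    return node_rows, edge_rows


def plot_schema(node_rows, edge_rows):
    """Plot the schema graph; requires pandas, graphistry, and a prior graphistry.register()."""
    import pandas as pd
    import graphistry

    g = (
        graphistry
        .edges(pd.DataFrame(edge_rows), "src", "dst")
        .nodes(pd.DataFrame(node_rows), "id")
    )
    # Show the label and relationship type as hover titles.
    return g.bind(point_title="label", edge_title="type").plot()
```

The `nodes` and `relationships` arguments would come from the single record the query above returns when run through the Bolt driver.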

@lmeyerov
Contributor

lmeyerov commented Sep 17, 2023

Awesome - I think it'd help to update the tutorial to that, or a sample.

Thinking about making this useful for our community: can the data creation step switch to a pandas -> Apache Arrow upload, and the fetch step to an Apache Arrow download? A lot of our users like to work with hundreds of thousands or millions of events & entities, and assuming speed on the Memgraph side, we find this keeps interactions subsecond.

@karmenrabar
Contributor Author

Memgraph does offer data loading with PyArrow, via GQLAlchemy. However, it uses a different driver, and to get the fastest query performance it's best used with predefined indexes. Also, the data format suited to PyArrow differs from the one used here. But it's a good idea to explore for a next project!

@lmeyerov
Contributor

Oh super interesting, thanks!

Just to make sure I understand GQL Alchemy right:

  • Will the data transferred over the network to memgraph/neo4j be transferred in arrow format, or is it a client-side ORM that will translate arrow to local objects and then construct regular bolt-protocol messages?

  • Any sense of expected speedups and why?

When I was looking at the repo, I think it still transmits over Bolt, but maybe instead of doing a client-side ORM, it uses a server-side bulk CSV load, which may help? I couldn't tell, however.

@karmenrabar
Contributor Author

It is a client-side ORM (OGM) that translates tables into a graph, given a proper configuration. It does that with the GQLAlchemy query builder, which builds a Cypher query that is then run over Bolt. So, you are right, I wouldn't expect any speedups, since it's not using the LOAD CSV clause with preset indexes (which is the best way to import). But it should still be as fast as running simple Cypher queries like I did.
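
As a rough illustration of the LOAD CSV route mentioned above, the statements could be generated and run along these lines. This is a sketch under assumptions: the helper names are hypothetical, the CSV path must be readable by the Memgraph server itself (not just the client), and the label and key are interpolated without escaping, so they should come from trusted input only.

```python
def bulk_import_statements(csv_path, label, key):
    """Return Cypher statements that create an index, then bulk-load nodes from a CSV.

    `label` and `key` are inserted verbatim, so pass trusted values only.
    """
    return [
        # Create the index first so lookups on the key property stay fast.
        f"CREATE INDEX ON :{label}({key});",
        # The path is resolved on the server side, with the header row as field names.
        f'LOAD CSV FROM "{csv_path}" WITH HEADER AS row '
        f"CREATE (:{label} {{{key}: row.{key}}});",
    ]


def run_statements(session, statements):
    """Run each statement in its own auto-commit transaction on a Bolt session."""
    for statement in statements:
        session.run(statement).consume()
```

With the Neo4j Python driver, `session` would be the object returned by `driver.session()`; for example, run_statements(session, bulk_import_statements("/import/people.csv", "Person", "id")), where the path, label, and key are placeholders.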

:)
