Add SQL database to graph conversion tool (Db2Graph) #99
- delta change to avoid a division-by-zero error
- limit size to a maximum of 1 billion, as too-large values lead to a Postgres error
First pass, will need more iterations.
- Renamed the file to marius_db2graph to align with commands like marius_preprocess
- Created two new functions for get_fetch_size to avoid duplicate code
- Added the marius_db2graph command to setup.cfg (but haven't tested it because I'm not sure if pip installing it right now would work)
- Added 'my-sql' as an option to use mysql-connector because this wasn't added previously
Will do another pass once the feature edges are removed and we make a determination on whether generate_uuid is necessary
Renamed edge_entity_entity_queries to edge_queries, edge_entity_entity_queries_list to edge_queries_list, and edge_entity_entity_rel_list to edge_rel_list
The example needs some work.
There is a lot of setup that is not relevant to the tool and my attempt to run the example as written failed. Also, the fact that the dataset requires creating a Kaggle account and does not have a simple download link is a problem.
Main todos:
- Create a dockerfile which performs all setup (Postgres setup, database download and setup, marius install with pip) up to the creation of the conf/config.yaml and conf/edges_queries.txt files
- Find a new dataset (or a version of this dataset) which can be downloaded easily without any account requirements.
Once this is fixed I will do another attempt
Added dockerfile, run.sh, and sakila.yaml. Modified basicConfig in marius_db2graph.py to avoid a Python version issue.
…e to install from marius main
Introducing a new feature to Marius: Db2Graph, a SQL database to graph conversion tool. Db2Graph converts relational databases into graphs as sets of triples which can be used as input datasets for Marius, allowing streamlined preprocessing from database to Marius.
Db2Graph is contained in Marius but can be used as a standalone tool. Db2Graph currently supports graph conversion from three relational database management systems: MySQL, MariaDB, and PostgreSQL. Conversion with Db2Graph is achieved in the following steps:
This pull request adds the source file `src/python/tools/db2graph/db2graph.py` and a documentation page `docs/db2graph/db2graph.rst`, which describes the requirements, definitions, and steps for using Db2Graph, and a real example use case. Testing is provided using pytest and GitHub Actions to validate the correctness of the db2graph functions.
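
To make the idea of converting a relational database into a set of triples concrete, here is a minimal sketch. It is not the Db2Graph implementation: the function names `build_demo_db` and `query_to_triples`, the toy actor/film schema, and the `acted_in` relation label are all hypothetical, and an in-memory SQLite database stands in for MySQL, MariaDB, or PostgreSQL so the example runs without any setup. It assumes each edge query returns exactly two columns, a source entity and a destination entity.

```python
import sqlite3

# Hypothetical edge query: one row per edge, two columns (source, destination).
# Entity names are prefixed with their table name so they stay unique in the graph.
ACTED_IN_QUERY = """
    SELECT 'actor_' || a.name, 'film_' || f.title
    FROM film_actor fa
    JOIN actor a ON a.actor_id = fa.actor_id
    JOIN film f ON f.film_id = fa.film_id
    ORDER BY a.actor_id, f.film_id
"""


def build_demo_db():
    """Create a tiny in-memory database (a stand-in for a real RDBMS)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
        INSERT INTO actor VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO film VALUES (10, 'Dune'), (11, 'Heat');
        INSERT INTO film_actor VALUES (1, 10), (2, 10), (2, 11);
        """
    )
    return conn


def query_to_triples(conn, query, relation):
    """Run a two-column SELECT and return (source, relation, destination) triples."""
    triples = []
    for src, dst in conn.execute(query):
        triples.append((str(src), relation, str(dst)))
    return triples


if __name__ == "__main__":
    conn = build_demo_db()
    for triple in query_to_triples(conn, ACTED_IN_QUERY, "acted_in"):
        print("\t".join(triple))
```

Each join row becomes one edge, so the three rows of `film_actor` yield three triples. A real run against a large database would likely fetch rows in batches rather than all at once.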