Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[loader] Integrate generic SQL loader #729

Closed
amotl opened this issue Oct 14, 2020 · 11 comments
Closed

[loader] Integrate generic SQL loader #729

amotl opened this issue Oct 14, 2020 · 11 comments
Labels

Comments

@amotl
Copy link
Contributor

amotl commented Oct 14, 2020

Dear @saulpw and @anjakefala,

SQLAlchemy does not only provide the best-in-class ORM for Python, its underlying engine is also a gateway to many different databases on the SQL level. While the list of databases supported by SQLAlchemy enumerates all of the builtin backends, there are more database drivers out there which plug in to that architecture, for example CrateDB.

@aswan89 probably has been aware of that, so I was happy to find his genericSQL plugin for VisiData [1].

So, because I believe SQL support should be made a first citizen for VisiData, I would like to propose to integrate this plugin into VisiData core. It will probably also make ongoing maintenance easier.

Now that VisiData 2.x changed its license to GPL3, I am hereby humbly asking @aswan89 whether you would agree to follow this adjustment.

At the same time, I would like to propose to fold the threefold entrypoints openurl_oracle, openurl_mysql and openurl_mssql into a single entrypoint open_sql, if that would even be possible in some way [2]. This would definitively be cool so VisiData will not have to know about all possible databases supported by SQLAlchemy.

With kind regards,
Andreas.

P.S.: These days, as you might know, there are not only SQL databases around. Supporting loading data from other databases like MongoDB or EdgeDB in the long run would also be awesome.

[1] However, I have to admit it was difficult to find without being aware of the plugin subsystem at first. Luckily, I have been able to find it within plugins.jsonl by searching for SQLAlchemy.
[2] Disclaimer: I don't know enough of the code base in detail yet. Currently, it looks like VisiData will invoke open_"scheme" automagically when it obtains an URI on the command line like vd mysql://abc:def@localhost:1234/abc. Thus, generalizing that interface might not be that easy, but maybe @saulpw and @anjakefala have some ideas how to move into that direction.

@saulpw
Copy link
Owner

saulpw commented Oct 15, 2020

Hi @amotl, thanks for your comments and suggestion! I've thought about using SQLAlchemy for the reasons you've indicated, but if I remember correctly, it seems you have to describe your schema using it, in order for it to work. This is contrary to the usage of VisiData, which should be able to connect to an arbitrary database and allow you to interact with it without knowing the schema in advance. If there's some way to do this using SQLAlchemy, that could be a very useful integration, I agree.

If you know how to use SQLAlchemy to do this, please attach a small working Python script that does it, and that should be a good starting place for us to see how it might integrate into VisiData.

@amotl
Copy link
Contributor Author

amotl commented Oct 15, 2020

Hi Saul,

If I remember correctly, it seems you have to describe your schema using it, in order for it to work.

For using the full ORM functionality, working with declarative schemas defined in Python is the way to go, right. However, the ORM layer can also reflect the current schema from the database.

[However, we want to] interact with [the database] without knowing the schema in advance. If there's some way to do this using SQLAlchemy, that could be a very useful integration, I agree.

It should work by just using the SQLAlchemy dialect engine under the hood, completely skipping the ORM layer, see https://docs.sqlalchemy.org/en/13/core/engines.html.

If you know how to use SQLAlchemy to do this, please attach a small working Python script that does it.

Sure. Does generic_sql.py currently need a schema to work? I haven't scanned it thoroughly.

With kind regards,
Andreas.

@amotl
Copy link
Contributor Author

amotl commented Oct 15, 2020

Basics

  • Connect to database
  • Invoke SQL expressions
from sqlalchemy import create_engine
engine = create_engine('mysql://john:doe@localhost/test')
with engine.connect() as connection:
    result = connection.execute("SELECT * FROM tablename")
    for row in result:
        pass

-- https://docs.sqlalchemy.org/en/13/core/connections.html

List of tables

  • Reflect list of tables
from sqlalchemy import create_engine
from sqlalchemy.engine import reflection
engine = create_engine('...')
inspector = reflection.Inspector.from_engine(engine)
table_names = inspector.get_table_names()

-- https://docs.sqlalchemy.org/en/13/core/reflection.html#fine-grained-reflection-with-inspector

List of columns

  • Reflect list of columns
from sqlalchemy import create_engine
from sqlalchemy.engine import reflection
engine = create_engine('...')
inspector = reflection.Inspector.from_engine(engine)
column_names = map(lambda x: x.name, inspector.get_columns("tablename"))

-- https://docs.sqlalchemy.org/en/13/core/reflection.html#sqlalchemy.engine.reflection.Inspector.get_columns

List of databases

This is not quite straight-forward, SQLAlchemy apparently doesn't provide that feature in a database-agnostic manner. So, we should probably only support full qualified connection strings including the database name when invoking VisiData.

However, if we want to go the extra mile and even present the user a list of databases, we will probably have to implement it specific to each database, see [3] for getting a rough idea.


Everything should work out of the box for all databases listed at [1]. Obviously, the specific database drivers will still have to be installed. Expanding from here, it will also be possible to talk to many additional databases, see [2].

[1] https://docs.sqlalchemy.org/en/13/core/engines.html#supported-databases
[2] https://docs.sqlalchemy.org/en/13/dialects/
[3] https://stackoverflow.com/questions/22689895/list-of-databases-in-sqlalchemy

@amotl
Copy link
Contributor Author

amotl commented Oct 15, 2020

Obviously, the specific database drivers will still have to be installed.

Just to get you an idea. Please adjust to your operating system.

# Install "mysql_config" (for MySQL).
brew install mysql-client

# Make "mysql_config" available on path.
export PATH=$PATH:/usr/local/Cellar/mysql-client/8.0.21/bin

# Install a bunch of database drivers.
pip install mysqlclient psycopg2-binary cx-Oracle crate[sqlalchemy]
pip install sqlalchemy-access sqlalchemy-pytds sqlalchemy-firebird sqlalchemy-sybase https://github.com/aadel/sqlalchemy-solr/archive/master.zip

Alternatively, install the pure-Python MySQL driver.

pip install PyMySQL

Alternatively, use the MariaDB client.

brew install mariadb-connector-c
pip install mariadb

@aswan89
Copy link
Contributor

aswan89 commented Oct 15, 2020

The generic_sql plugin does not require foreknowledge of a target database schema to function.
SQLAlchemy can use a DB URL to package a SQL dialect, connection credentials, and target database address in a single string. Right now my plugin exposes 3 openurl handler functions (Oracle, MySQL, and MSSQL) since they seemed to be most popular systems that weren't covered by other handlers already. The actual processing functions are all generic enough that any SQLAlchemy compatible DB URL could be exposed with minimal effort.

The plugin currently summarizes all available schemas within the database for presentation and subsequent selection by the user. My assumption is that this should play nicely with anything that is compatible with SQL Alchemy but I haven't tested on anything but Oracle.

I have no problem with integrating my code into visidata proper and would definitely appreciate not needing to consider possible friction between the plugin system and having the code held natively within Visidata.

@amotl
Copy link
Contributor Author

amotl commented Oct 15, 2020

Oh, and because I just found @Mytherin has also starred this repository (maybe also @hannesmuehleisen?), we might think about also supporting DuckDB through duckdb_engine by @Mause, see also duckdb/duckdb#305.

@Mause
Copy link

Mause commented Oct 15, 2020

Please be aware that it is very very very basic 😅 the python package for duckdb doesn't really expose enough information in results to be able to build a decent implementation

@saulpw
Copy link
Owner

saulpw commented Oct 15, 2020

Thanks for chiming in, @aswan89, and for your work on the genericSQL plugin. I forgot that you used SQLAlchemy (of course!)

And I'm sorry @amotl, I missed the main point of your original issue and responded as though it were just another loader or feature request.

So, because I believe SQL support should be made a first citizen for VisiData, I would like to propose to integrate this plugin into VisiData core. It will probably also make ongoing maintenance easier.

The real request here, is for me to take on the maintenance of the SQL plugin. I don't have ready access to test databases and it's a pain to set them up, so it would make ongoing maintenance more work for us. I appreciate that @aswan89 is willing to do this work (and already has at least one test environment that he's using with VisiData on a regular basis). The whole point of the plugin system is to distribute the maintenance effort to people who have a vested interest in the feature working. If genericSQL were incorporated into core VisiData, it would just rot anyway (see how the Postgresql loader is broken right now, even :)

That said, I have been thinking about making a more fully featured SQL explorer/editor based on VisiData, but it would be "sold separately", as it's a different internal architecture to work seamlessly with offline data. You and anyone who would be interested in this, put some 👀 on this comment and I'll ping you if it's what I decide to work on.

@saulpw
Copy link
Owner

saulpw commented Nov 30, 2020

Kondo'ed for now.

@saulpw saulpw closed this as completed Nov 30, 2020
@dufferzafar
Copy link
Contributor

Hi @saulpw, just wanted to know if there's been any updates on Visdata based SQL db explorers?

@saulpw
Copy link
Owner

saulpw commented May 18, 2022

Hi @dufferzafar, good question. There's a new plugin for Ibis in vdplus. Only a few commands are implemented and it only works with sqlite currently, but in a month or two it should support all the backends that Ibis supports (read-only for the time being).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants