diff --git a/demos/demos_databases_apis/sql/postgres.ipynb b/demos/demos_databases_apis/sql/postgres.ipynb new file mode 100644 index 0000000000..b529b58b54 --- /dev/null +++ b/demos/demos_databases_apis/sql/postgres.ipynb @@ -0,0 +1,390 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# SQL Example\n", + "\n", + "**Get & plot data from the first table in a SQL DB**\n", + "\n", + "* Uses SQL ODBC drivers prepacked with Graphistry\n", + "* Default: visualizes the schema migration table used within Graphistry\n", + "* Shows several viz modes + a convenience function for sql->interactive viz\n", + "* Try: Modify the indicated lines to change to visualize any other table\n", + "\n", + "Further docs\n", + " - [UI Guide](https://labs.graphistry.com/graphistry/ui.html)\n", + " - [More demos: database connectors, ...](/notebook/tree/demos/demos_databases_apis)\n", + " - [CSV upload notebook app](/notebook/tree/demos/upload_csv_miniapp.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "#### Graphistry" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import graphistry\n", + "\n", + "#pip install graphistry -q\n", + "#graphistry.register(protocol='http', server='my.server.com', key='MY_API_KEY')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### SQL connection string\n", + "* Modify with your own db connection strig\n", + "* For heavier use and sharing, see sample for hiding creds from the notebook while still reusing them across sessions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "user = \"graphistry\"\n", + "pwd = \"password\"\n", + "server = \"postgres:5432\"\n", + "\n", + "##OPTIONAL: Mount in installation's ${PWD}/.notebooks/db_secrets.json and read in\n", + "#import json\n", + "#with open('/home/graphistry/notebooks/db_secrets.json') as json_file:\n", + "# cfg = json.load(json_file)\n", + "# user = cfg['user']\n", + "# pwd = cfg['pwd']\n", + "# server = cfg['server']\n", + "#\n", + "## .. The first time you run this notebook, save the secret cfg to the system's persistent notebook folder:\n", + "#import json\n", + "#with open('/home/graphistry/notebooks/db_secrets.json', 'w') as outfile:\n", + "# json.dump({\n", + "# \"user\": \"graphistry\",\n", + "# \"pwd\": \"password\",\n", + "# \"server\": \"postgres:5432\"\n", + "# }, outfile)\n", + "## Delete ^^^ after use\n", + "\n", + "db_string = \"postgres://\" + user + \":\" + pwd + \"@\" + server\n", + "### Take care not to save a print of the result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# OPTIONAL: Install ODBC drivers in other environments:\n", + "# ! apt-get update\n", + "# ! apt-get install -y g++ unixodbc unixodbc-dev\n", + "# ! conda install -c anaconda pyodbc=4.0.26 sqlalchemy=1.3.5" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Connect to DB" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import sqlalchemy as db\n", + "from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Float, String\n", + "from sqlalchemy.orm import sessionmaker" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "engine = create_engine(db_string)\n", + "Session = sessionmaker(bind=engine)\n", + "session = Session()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Inspect available tables" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "table_names = engine.table_names()\n", + "\", \".join(table_names)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Optional: Modify to pick your own table!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if 'django_migrations' in table_names:\n", + " table = 'django_migrations'\n", + "else:\n", + " table = table_names[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize viz: Get data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "result = engine.execute(\"SELECT * FROM \\\"\" + table + \"\\\" LIMIT 1000\")\n", + "df = pd.DataFrame(result.fetchall(), columns=result.keys())\n", + "print(\"table\", table, '# rows', len(df))\n", + "df.sample(min(3, len(df)))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Plot\n", + "\n", + "Several variants:\n", + "1. Treat each row & cell value as a node, and connect row<>cell values\n", + "2. Treat each cell value as a node, and connect all cell values together when they occur on the same row\n", + "3. Treat each cell value as an edge, and specify which columns to to connect values together on\n", + "4. Use explict node/edge tables\n", + "\n", + "#### 1. Treat each row & cell value as a node, and connect row<>cell values" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "graphistry.hypergraph(df)['graph'].plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 2. Treat each cell value as a node, and connect all cell values together when they occur on the same row" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "graphistry.hypergraph(df, direct=True)['graph'].plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 3. Treat each cell value as an edge, and specify which columns to to connect values together on" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "graphistry.hypergraph(df, direct=True,\n", + " opts={\n", + " 'EDGES': {\n", + " 'id': ['name'],\n", + " 'applied': ['name'],\n", + " 'name': ['app']\n", + " }\n", + " })['graph'].plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 4. Use explict node/edge tables" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "g = graphistry.bind(source='name', destination='app').edges(df.assign(name=df['name'].apply(lambda x: 'id_' + x)))\n", + "g.plot()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Add node bindings..\n", + "nodes_df = pd.concat([\n", + " df[['name', 'id', 'applied']],\n", + " df[['app']].drop_duplicates().assign(\\\n", + " id = df[['app']].drop_duplicates()['app'], \\\n", + " name = df[['app']].drop_duplicates()['app'])\n", + "], ignore_index=True, sort=False)\n", + "\n", + "g = g.bind(node='id', point_title='name').nodes(nodes_df)\n", + "\n", + "g.plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Convenience function" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def explore(sql, *args, **kvargs):\n", + " result = engine.execute(sql)\n", + " df = pd.DataFrame(result.fetchall(), columns=result.keys())\n", + " print('# rows', len(df))\n", + " g = graphistry.hypergraph(df, *args, **kvargs)['graph']\n", + " return g" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Simple use" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "explore(\"SELECT * FROM django_migrations LIMIT 1000\").plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Pass in graphistry.hypergraph() options" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "explore(\"SELECT * FROM django_migrations LIMIT 1000\", direct=True).plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Get data back" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "explore(\"SELECT * FROM django_migrations LIMIT 1000\")._nodes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Further docs\n", + " - [UI Guide](https://labs.graphistry.com/graphistry/ui.html)\n", + " - [More demos: database connectors, ...](/notebook/tree/demos/demos_databases_apis)\n", + " - [CSV upload notebook app](/notebook/tree/demos/upload_csv_miniapp.ipynb)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/demos/demos_databases_apis/sql/postgres_alchemy_pypyodbc.ipynb b/demos/demos_databases_apis/sql/postgres_alchemy_pypyodbc.ipynb deleted file mode 100644 index 80133ec1df..0000000000 --- a/demos/demos_databases_apis/sql/postgres_alchemy_pypyodbc.ipynb +++ /dev/null @@ -1,414 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# SQL Example\n", - "\n", - "Get & plot data from a SQL DB.\n", - "\n", - "* For drivers, this notebook uses pure Python ODBC (`pypyodbc`) and connects to Postgres\n", - "* Creates a table in the DB if does not exist" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Creds\n", - "\n", - "* Create ODBC connection string\n", - "* We recommend changing `user`/`pwd` to ENV_VARS or files rather than visible hard-coded strings" - ] - }, - { - "cell_type": "code", - "execution_count": 56, - "metadata": {}, - "outputs": [], - "source": [ - "user = \"myadmin\"\n", - "pwd = \"mypassword\"\n", - "server = \"rds-test.zzz.us-west-2.rds.amazonaws.com:5432/postgres\"\n", - "#import os\n", - "#user = os.environ['MY_USER']\n", - "#pwd = os.environ['MY_PWD']\n", - "#server = os.environ['MY_SERVER']\n", - "\n", - "db_string = \"postgres://\" + user + \":\" + pwd + \"@\" + server" - ] - }, - { - "cell_type": "code", - "execution_count": 57, - "metadata": {}, - "outputs": [], - "source": [ - "import graphistry\n", - "#graphistry.register(key='MY_API_KEY', server='my.server.com')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Optional: Install ODBC drivers for Postgres" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Optional: Install a faster custom ODBC driver by logging in to jupyter docker as root:\n", - "\n", - "# $ ssh -i key.pem ubuntu@my_public_ip\n", - "# ubuntu@my_private_ip $ docker exec -it -u root graphistry_notebook_1 bash\n", - "# root@601a8af7ea4c $ apt-get update\n", - "# root@601a8af7ea4c $ apt-get install unixodbc\n", - "# root@601a8af7ea4c $ pip3 install pyodbc" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# If not already exists...\n", - "\n", - "!pip3 install wheel -q\n", - "!pip3 install pypyodbc -q\n", - "!pip3 install sqlalchemy -q" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Connect to DB" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "import sqlalchemy as db\n", - "from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Float, String\n", - "from sqlalchemy.orm import sessionmaker" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [], - "source": [ - "engine = create_engine(db_string)\n", - "Session = sessionmaker(bind=engine)\n", - "session = Session()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Optional: Prepopulate table \"my_table\"" - ] - }, - { - "cell_type": "code", - "execution_count": 58, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['my_table']" - ] - }, - "execution_count": 58, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "engine.table_names()" - ] - }, - { - "cell_type": "code", - "execution_count": 59, - "metadata": {}, - "outputs": [], - "source": [ - "if not engine.dialect.has_table(engine, 'my_table'): # If table don't exist, Create.\n", - " metadata = MetaData(engine)\n", - " mytable = Table('my_table', metadata,\n", - " Column('Id', Integer, primary_key=True, nullable=False), \n", - " Column('Country', String),\n", - " Column('Brand', String),\n", - " Column('Price', Float))\n", - " metadata.create_all()\n", - " engine.execute(\n", - " mytable.insert(),\n", - " [{\"Country\": \"c\" + str(x % 2), \"Brand\": \"b_\" + str(x), \"Price\": x * 2} for x in range(0, 10)]) " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Get data" - ] - }, - { - "cell_type": "code", - "execution_count": 54, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "# rows 10\n" - ] - }, - { - "data": { - "text/html": [ - "
| \n", - " | Id | \n", - "Country | \n", - "Brand | \n", - "Price | \n", - "
|---|---|---|---|---|
| 6 | \n", - "7 | \n", - "c0 | \n", - "b_6 | \n", - "12.0 | \n", - "
| 5 | \n", - "6 | \n", - "c1 | \n", - "b_5 | \n", - "10.0 | \n", - "
| 7 | \n", - "8 | \n", - "c1 | \n", - "b_7 | \n", - "14.0 | \n", - "