This notebook shows how to query Cloud SQL through the Cloud SQL proxy and load the result into a pandas DataFrame.

You can start the proxy from Bash (with [this script](./Cloud%20SQL%20Proxy%20%28Command%20Line%29.ipynb)) or from Python (see below).

Run ps to see if cloud_sql_proxy is running.

In [1]:
%%bash
ps -f

UID        PID  PPID  C STIME TTY          TIME CMD
root      1060    67  0 23:05 ?        00:00:06 /usr/bin/python -m ipykernel -f /root/.local/share/jupyter/runtime/kernel-abdfb292-efe5-42db-aaf2-6215ccb8e6a6.json
root      1368    67  0 23:06 ?        00:00:07 /usr/bin/python -m ipykernel -f /root/.local/share/jupyter/runtime/kernel-4a53b4f4-dc40-4630-ab95-fcf90e4f8cec.json
root      1606    67  0 23:28 ?        00:00:05 /usr/bin/python -m ipykernel -f /root/.local/share/jupyter/runtime/kernel-1c1aa837-846f-4af2-bf6d-ce8897af5f22.json
root      1830    67  1 23:33 ?        00:00:06 /usr/bin/python -m ipykernel -f /root/.local/share/jupyter/runtime/kernel-fd7764c8-4b17-429c-9264-23b25803f864.json
root      1914    67  1 23:34 ?        00:00:06 /usr/bin/python -m ipykernel -f /root/.local/share/jupyter/runtime/kernel-67457076-7b92-4207-a24e-99a80d8acb96.json
root      2058    67  1 23:36 ?        00:00:06 /usr/bin/python -m ipykernel -f /root/.local/share/jupyter/runtime/kernel-aa2ad3

In [2]:
%%bash
pgrep cloud_sql_proxy

If the proxy is started from Bash, it blocks further input in that notebook, so run kill from here to stop it if needed. Replace 880 with the PID reported by pgrep.

In [85]:
%%bash
kill -9 880

## Python code to start a Cloud SQL proxy

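There appears to be a bug in the Go OAuth client that the proxy uses, so run the following command as a workaround. It copies the application default credentials into the standard gcloud configuration directory.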

In [3]:
!cp /content/datalab/.config/application_default_credentials.json ~/.config/gcloud/

In [4]:
import subprocess

In [11]:
class Connection(object):
  def __init__(self, instance_name, port, process):
    self._instance_name = instance_name
    self._port = port
    self._pid = process.pid
    self._process = process
    
  def __repr__(self):
    return "Name:{}, Port:{}, pid:{}, returncode:{}".format(
    self._instance_name, self._port, self._pid, self._process.returncode)
  
  def get_port(self):
    return self._port

# The connections "library". Currently just a class.
# TODO(oemilyo): Make this into a proper library.
import UserDict

class Connections(UserDict.IterableUserDict):

  class Error(Exception):
    """Base class for exceptions in this module."""
    pass
  
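  class ConnectionAlreadyExisted(Error):
    """Raised when a connection already exists for an instance."""
    def __init__(self, instance_name, port):
      # A nested class name is not in scope inside its own methods,
      # so it has to be reached through the enclosing class.
      super(Connections.ConnectionAlreadyExisted, self).__init__(
        "Connection already existed for {} at {}".format(instance_name, port))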
        
  # TODO(oemilyo): A way to close connections.
  
  def get_or_create_port_for_instance(self, instance_name):
    if instance_name not in self.data:
      self.create_connection(instance_name)
    return self.data[instance_name].get_port()
  
  def get_port_for_instance(self, instance_name):
    """Returns port for instance, or None if no connection exists."""
    if instance_name not in self.data:
      return None
    # TODO(oemilyo): Check if the port is usable?
    return self.data[instance_name].get_port()

  def create_connection(self, instance_name, port=None):
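    # TODO: Pick the next available port instead of always defaulting to 3307.
    chosen_port = port or 3307
    if instance_name in self.data:
      # The exception class is nested, so reference it through self.
      raise self.ConnectionAlreadyExisted(
        instance_name, self.data[instance_name].get_port())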
    else:
      # Run the proxy in the background.
      # For windows, use creationflags=CREATE_NEW_CONSOLE(0x00000010)
      # pid = subprocess.Popen(["../../cloud_sql_proxy -instances=datalab-deploy-test1:us-central1:cloudsql=tcp:3307"],
      #                        creationflags=CREATE_NEW_CONSOLE).pid
      # If we want the process to keep running when this python process is dead (when?), 
      # then give it its own STDIN/STDOUT: stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE
      print "Starting a new process"
      process = subprocess.Popen([
        "/content/cloud_sql_proxy", 
        "-instances={}=tcp:{}".format(instance_name, chosen_port)])
      # TODO(oemilyo): Handle errors!
      # TODO(oemilyo): It seems if the user hasn't signed in (aka no cloud credentials),
      # the process dies and becomes <defunct> (zombie) without setting a returncode.
      # Figure out how to handle that.
      self.data[instance_name] = Connection(instance_name, chosen_port, process)
      
  def remove_connection(self, instance_name):
    """Cleans the connection map."""
    # Doesn't kill the process.
    # TODO(oemilyo): Something to kill the process as well.
    try: 
      del self.data[instance_name]
    except KeyError:
      pass

In [12]:
connections = Connections()  # This needs to be a global singleton.
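
The comment in `create_connection` notes that the next available port still has to be worked out. One possible approach, sketched here as an assumption rather than something this notebook does, is to bind a socket to port 0 and let the operating system pick an unused port. `find_free_port` is a hypothetical helper name.

```python
import socket

def find_free_port(host="127.0.0.1"):
  """Asks the OS for an unused TCP port by binding to port 0."""
  sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  try:
    sock.bind((host, 0))  # Port 0 means "any free port".
    return sock.getsockname()[1]
  finally:
    sock.close()

free_port = find_free_port()
print(free_port)
```

The result could be passed as the `port` argument of `create_connection`. Another process can still claim the port between this check and the proxy start, so the proxy may fail to bind even to a port found this way.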

## Start or connect to the proxy

The mysql command-line client cannot be used here because it is not installed.

In [2]:
%%bash
mysql

bash: line 1: mysql: command not found


Run the following to use Python to start a proxy.

In [7]:
instance_name = "datalab-deploy-test1:us-central1:cloudsql"  # provided by user
input_args = {
  'instance_name': instance_name,  # User will give us the name
  'db': 'test_db',
  'user': 'root'
}

In [15]:
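# Check whether other cloud_sql_proxy processes are running.
# Unfortunately, we don't know which databases they are connected to.
try:
  cloud_proxies_pid = subprocess.check_output(["pgrep", "cloud_sql_proxy"])
except subprocess.CalledProcessError:
  # pgrep exits with a non-zero status when no process matches.
  cloud_proxies_pid = ''
print cloud_proxies_pid.split('\n')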

['2290', '']


In [14]:
# Checks the port.
print connections.get_port_for_instance(input_args['instance_name'])
print connections

3307
{'datalab-deploy-test1:us-central1:cloudsql': Name:datalab-deploy-test1:us-central1:cloudsql, Port:3307, pid:2290, returncode:None}
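
`get_port_for_instance` carries a TODO asking whether the port is usable. A minimal sketch of such a check, assuming that a successful TCP connect to the local port is enough to say something is listening there: `port_is_open` is a hypothetical helper, and the listener below only stands in for a running proxy.

```python
import socket

def port_is_open(port, host="127.0.0.1", timeout=1.0):
  """Returns True if a TCP connection to host:port succeeds."""
  try:
    sock = socket.create_connection((host, port), timeout)
  except (socket.error, socket.timeout):
    return False
  sock.close()
  return True

# Stand-in for the proxy: a local listener on an OS-chosen port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]

open_while_listening = port_is_open(demo_port)
listener.close()
open_after_close = port_is_open(demo_port)
print(open_while_listening)
print(open_after_close)
```

This only shows that some process accepts connections on the port. It does not confirm that the listener is cloud_sql_proxy or that it points at the expected instance.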


In [56]:
# If for some reason the existing connection is bad, remove it and start anew.
# Note that this does not kill the proxy process.
connections.remove_connection(input_args['instance_name'])
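
`remove_connection` only cleans the map and leaves the proxy process running. A possible way to stop the process as well, sketched under the assumption that the `Popen` object kept in `Connection` is still at hand: send SIGTERM first and fall back to SIGKILL only if the process does not exit in time. `stop_process` is a hypothetical helper, and the sleeping child below stands in for the proxy.

```python
import subprocess
import sys
import time

def stop_process(process, timeout_seconds=5.0):
  """Terminates process politely, then forcibly; returns its exit code."""
  if process.poll() is None:
    process.terminate()  # SIGTERM on POSIX.
    deadline = time.time() + timeout_seconds
    while process.poll() is None and time.time() < deadline:
      time.sleep(0.05)
    if process.poll() is None:
      process.kill()  # SIGKILL on POSIX, like `kill -9`.
  return process.wait()  # Reaps the child so it does not linger as a zombie.

# Stand-in for the proxy: a child process that would sleep for a minute.
sleeper = subprocess.Popen(
  [sys.executable, "-c", "import time; time.sleep(60)"])
stop_code = stop_process(sleeper)
print(stop_code)
```

Trying SIGTERM first gives the process a chance to shut down cleanly, which `kill -9` does not.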

In [13]:
# Run this if using Python-started proxies.
port = connections.get_or_create_port_for_instance(input_args['instance_name'])

Starting a new process
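
The TODO in `create_connection` mentions that a proxy which dies right after start shows up as defunct with no returncode. A likely explanation is that `Popen.returncode` is only filled in once `poll()`, `wait()` or `communicate()` is called, and that an exited child stays a zombie until one of them reaps it. The sketch below polls for a short grace period after start. `exited_early` is a hypothetical helper, and the failing child stands in for a proxy without credentials.

```python
import subprocess
import sys
import time

def exited_early(process, grace_seconds=1.0, interval=0.05):
  """Returns the exit code if process exits within grace_seconds, else None."""
  deadline = time.time() + grace_seconds
  while time.time() < deadline:
    returncode = process.poll()  # Updates returncode and reaps the child.
    if returncode is not None:
      return returncode
    time.sleep(interval)
  return process.poll()

# Stand-in for a proxy that fails at startup with exit code 3.
failing = subprocess.Popen(
  [sys.executable, "-c", "import sys; sys.exit(3)"])
failing_code = exited_early(failing, grace_seconds=5.0)
print(failing_code)
```

A `None` result means the process was still running when the grace period ended, which is the expected outcome for a healthy proxy.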


In [None]:
# Otherwise, run this if proxy is started via bash with the chosen port.
port = 3307

In [21]:
import MySQLdb
import MySQLdb.cursors

In [22]:
# TODO: Handle error, e.g. OperationalError == cannot connect.
db = MySQLdb.connect(
  host='127.0.0.1', port=port, db=input_args['db'], user=input_args['user'], 
  charset='utf8', cursorclass=MySQLdb.cursors.SSDictCursor)

In [91]:
# We could also connect with Connector/Python.
# TODO: Doesn't work currently because the mysql.connector module is not installed.
import mysql.connector
cnx = mysql.connector.connect(host='127.0.0.1', port=port, db=input_args['db'], user=input_args['user'], charset='utf8')

ImportError: No module named mysql

## Try to access the Cloud SQL tables.

In [23]:
user_query = "Select * from foobar"

In [24]:
# Run the query with a MySQLdb cursor.
c = db.cursor()
c.execute(user_query)
result = c.fetchall()
c.close()

print result

({'id': 1L, 'value': 3L}, {'id': 2L, 'value': 7L})
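
In the cell above, the cursor is not closed if `execute` raises. A small sketch of one way to guarantee the close, using `contextlib.closing` around any DB-API cursor: `sqlite3` is used here only as a stand-in so the example runs without a proxy, and `fetch_all` is a hypothetical helper.

```python
import contextlib
import sqlite3

def fetch_all(connection, query):
  """Runs query and returns all rows, closing the cursor even on errors."""
  with contextlib.closing(connection.cursor()) as cursor:
    cursor.execute(query)
    return cursor.fetchall()

# Stand-in database with the same shape as the foobar table above.
demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE foobar (id INTEGER, value INTEGER)")
demo_db.executemany("INSERT INTO foobar VALUES (?, ?)", [(1, 3), (2, 7)])
rows = fetch_all(demo_db, "SELECT * FROM foobar ORDER BY id")
demo_db.close()
print(rows)
```

With sqlite3 the rows come back as plain tuples. The MySQLdb connection above returns dicts because it was opened with `SSDictCursor`.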


In [None]:
# Do some query with connector cursor.
# TODO: Doesn't work.
cursor = cnx.cursor()

cursor.execute(user_query)

print cursor

cursor.close()

## Wrap the query result in a DataFrame.

In [25]:
import pandas

In [27]:
results_dataframe = None

if results_dataframe is None: 
  # I forgot why I added this check...
  results_dataframe = pandas.DataFrame(list(result))

print results_dataframe

   id  value
0   1      3
1   2      7
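
When building the DataFrame from a sequence of dicts, passing `columns` explicitly is one way to pin the column order and to keep the expected columns even when the query returns no rows. This is a sketch with hard-coded sample rows that mirror the output above. In practice the column names would have to come from the query or the cursor, which is an assumption here.

```python
import pandas

# Sample rows shaped like the SSDictCursor result above.
sample_rows = ({'id': 1, 'value': 3}, {'id': 2, 'value': 7})

frame = pandas.DataFrame(list(sample_rows), columns=['id', 'value'])
empty_frame = pandas.DataFrame([], columns=['id', 'value'])

total = int(frame['value'].sum())
print(frame)
print(total)
```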


## If the connection was made with the Connector library, close it.

In [None]:
# Close the connector connection.
cnx.close()

## Debug stuff

In [52]:
import datalab
datalab.context.Context.default().credentials


<oauth2client.client.GoogleCredentials at 0x7fc2b5283fd0>