- Date
2013-07-11
- Author
Marco Massenzio
- Version
0.3
- Updated
2013-10-03
This allows one to explore (or modify) the state of some of the Tables in the Postgres DB by running an arbitrary query against a dev (or Prod) instance.
There may also be a set of predefined queries that may be run automatically; further, these predefined queries may have a number of parameters that will have to be substituted before executing them.
The output of the script must be suitable for both human and machine (eg, piping) consumption
This is a python script that takes a number of command-line arguments, constructs the query and then executes it against a specified VM instance.
The output format will initially be JSON, but it will eventually be possible for the user to choose from a predefined set (eg, CSV, HTML table, etc.)
These can be seen using the --help
option:
$ python snooper.py --help
usage: snooper.py [-h] [--queries QUERIES] [--out OUT] [--host HOST]
[--format FORMAT] [--query QUERY] [--list] [--env ENV]
[--debug] --conf CONF [--coupons COUPONS]
[--provider PROVIDER] [--cloud CLOUD]
[param [param ...]]
SQL command line execution tool
positional arguments:
param positional parameters for the query
optional arguments:
-h, --help show this help message and exit
--queries QUERIES an optional input file (JSON) defining a set of SQL
queries
--out FILE an optional output file
--host HOST the host to run the query against (must be running the
Postgres server and have the external port accessible
--format FORMAT the format for the output
--query QUERY, -q QUERY
used in conjunction with --queries, specifies the
named query to run
--list, -l Lists all available queries in the files specified
with the --queries flag and exits
--env ENV the section in the {'const': None, 'help': 'the
section in the %s configuration file, from which to
take the connection configuration parameters',
'option_strings': ['--env'], 'dest': 'env',
'required': False, 'nargs': None, 'choices': None,
'default': 'dev', 'prog': 'snooper.py', 'container':
<argparse._ArgumentGroup object at 0x106e7ced0>,
'type': None, 'metavar': None} configuration file,
from which to take the connection configuration
parameters
--debug If set, the server will be run in debug mode: do NOT
use in production
--conf CONF The location of the configuration file which contains
the connection parameters
--coupons COUPONS This will generate the requested number of coupons,
insert in the database and return in the --out file
the list of codes. MUST specify --provider and --cloud
UUIDs
--provider PROVIDER The name of the Service Provider for the coupons
--cloud CLOUD The name of the Cloud Target for the promotion codes
If the query to run has named arguments, then those should be passed in (for now, I will only support positional arguments).
For example, say you want to run the get_user
query from the queries.json
file:
"queries": {
"get_user_by_id": {
"sql": "SELECT username, first_name, last_name FROM USER WHERE ID=%(id)s",
"drill_down": {
"username": "/api/1/query/get_user_by_username/username/$"
}
},
...
}
then one would invoke the script as follows:
snooper --queries queries.json --host 10.10.121.99 --query get_user id 99
the script would then run the query as follows:
SELECT USERNAME, PASSWORD FROM USER WHERE ID=99
The (name, value) pair must always be passed like shown above (with a space, no =
): this would be an error:
snooper --queries queries.json --host 10.10.121.99 --query get_user id=99 # WRONG
Alternatively, the script can be used to run an arbitrary SQL query from the command line:
snooper --host 10.10.121.99 'SELECT USERNAME, PASSWORD FROM USER WHERE ID=99'
Please note it's an error to pass both the --queries
argument and a query (in this case, the query literal would be incorrectly interpreted as one of the query's positional parameters).
These are taken from a configuration file (defined by the --conf
option) and grouped by environments, as in:
# Connection configuration for Snooper
[dev]
db = mydb
user = uzer
password = duba
host = 10.10.121.99
[prod]
db = pencloud
user = zooz
password = blaaaz
Use the --env
command-line arg to specify a given environment (dev
is used by default).
Note the hostname
can be specified via the configuration file, or via the --host
command-line argument (if specified in both, CLI option takes precedence).
This is mostly useful for UI clients taking advantage of the REST API, and allows one to execute a chain of drill-down queries.
Taking as an example this query:
"queries": {
"get_user_by_role": {
"sql": "SELECT uuid,email_address,first_name,last_name FROM users WHERE role=%(uuid)s",
"drill_down": {
"uuid": "/api/1/query/get_user_by_id/id/$"
}
}
}
if executed, it may return something like:
{
"connection": "dbname=pencloud user=rmview host=localhost",
"query": "SELECT uuid,email_address,first_name,last_name FROM users WHERE role='ProviderAdmin'",
"results": [
{
"email_address": "pappleton@rivermeadow.com",
"first_name": "",
"last_name": "SPAdmin",
"uuid": "06afdd16-319f-481e-b2ed-33944bf7227c"
},
{
"email_address": "kenny@rivermeadow.com",
"first_name": "Kenneth",
"last_name": "Keppler",
"uuid": "f8e3bf70-2817-4dd5-8533-6e79f685434d"
},
...
then, a drill-down on the first returned user could be run by executing a call to:
GET /api/1/query/get_user_by_id/id/06afdd16-319f-481e-b2ed-33944bf7227c
Note the $
placeholder that needs to be replaced with the actual value of the returned column, and the fact that the actual URL path element (id
) does not necessarily match the column's name (uuid
).
The drill-down key will always match (case, spaces, underscores, etc.) the actual name of the returned element, regardless of case, etc. for the DB schema.
Hence, given a query such as:
SELECT uuid "Source UUID" , HOSTNAME hostname FROM SOURCE WHERE SOURCE.UUID=?
the drill-down map would looks something like:
"drill_down": {
"Source UUID": "/api/1/query/get_source_by_id/id/$",
"hostname": "/api/1/query/get_hostname/hostname/$"
}
Knwon limitation
It is currently not possible to substitute multiple values in a drill-down query, something like:
"drill_down": {
"Source UUID": "/api/1/query/get_source_by_id/id/$",
"hostname": "/api/1/query/get_hostname/hostname/$/username/$user/organization/$org"
}
although it would be perfectly valid to use something like:
"drill_down": {
"Source UUID": "/api/1/query/get_source_by_id/id/$",
"hostname": "/api/1/query/get_hostname/hostname/$/username/a_user/organization/my_org"
}
note in this case, the additional arguments are not taken from the query result, but are fixed: all the client needs to do, is to substitute the $
placeholder with the column value.
The server will provide a full API wrapper around the script functionality, returning the response in JSON:
GET /api/1/query
Response:
{
"queries": [
"get_user", "get_migration", "get_success_duration"
]
}
query/<query_name/<param_1>/<value_1>/<param_2>/<value_2>
There can be an arbitrary number of {param, value} pairs, following the query's name: the parameters' values will be substituted in the query according to the %(param)s
patterns.
Given:
"get_user_by_role": {
"sql": "SELECT uuid,email_address,first_name,last_name FROM users WHERE role=%(role)s",
"params": [{
"name": "role",
"label": "Role"
}]
}
We can execute the following request:
GET /api/1/query/get_user_by_role/role/ProviderAdmin
Response:
{
"connection": "dbname=pencloud user=rmview host=localhost",
"query": "SELECT uuid,email_address,first_name,last_name FROM users WHERE role='ProviderAdmin'",
"results": [
{
"email_address": "pappleton@rivermeadow.com",
"first_name": "",
"last_name": "SPAdmin",
"uuid": "06afdd16-319f-481e-b2ed-33944bf7227c"
},
{
"email_address": "kenny@rivermeadow.com",
"first_name": "Kenneth",
"last_name": "Keppler",
"uuid": "f8e3bf70-2817-4dd5-8533-6e79f685434d"
},
{
"email_address": "rtsai@rivermeadow.com",
"first_name": "Robert",
"last_name": "Tsai",
"uuid": "ca043832-c2b6-45f8-b0ad-3ea416336e39"
},
{
"email_address": "rheaton@rivermeadow.com",
"first_name": "Rich",
"last_name": "Heaton",
"uuid": "5a24565a-571c-48a0-b205-43291121d7c3"
},
{
"email_address": "eric.culp@poweredbypeak.com",
"first_name": "None",
"last_name": "None",
"uuid": "b43627bd-8a3c-45c0-8666-520ac4d758f5"
}
],
"rowcount": 5,
"timestamp": "2013-07-24T16:24:24.777920",
drill_down: {
"email_address": "/api/1/query/get_user_by_email/email/$"
}
}
POST /api/1/query/my_get_user
{
"sql": "SELECT last_name, email_address FROM USERS WHERE UUID=%(id)s AND first_name=%(name)s",
"drill_down": {
"email_address": "/api/1/query/get_user_by_email/email/$"
},
"params": [
{
"label": "User UUID",
"name": "id"
},
{
"label": "First Name",
"name": "name"
}
]
}
Response:
201 CREATED
<no body>
PUT /api/1/query/my_get_user
{
"sql": "SELECT FIRST_NAME, LAST_NAME FROM USER WHERE ID=%(id)s",
"params": [
{
"label": "User UUID",
"name": "id"
}
]
}
Response:
200 OK
<no body>
This will return a 404 NOT FOUND
if the named query does not exist or a 406 BAD REQUEST
if the format of the body does not meet the specification.
DELETE /api/1/query/query_to_delete
Response:
205 RESET CONTENT
<empty body>
GET /api/1/info/<query>
Response:
200 OK
JSON full body of the query
For example, given this entry in the queries.json
file:
{
"get_user": {
"sql": "SELECT FIRST_NAME, LAST_NAME FROM USER WHERE ID=%(id)s",
"params": [
{
"label": "User UUID",
"name": "id"
}
]
"drill_down": {
"FIRST_NAME": "/api/1/query/get_user_by_name/name/$"
}
},
... other queries
}
The following request:
GET /api/1/info/get_user
would get back:
{
"name": "get_user",
"sql": "SELECT FIRST_NAME, LAST_NAME FROM USER WHERE ID=%(id)s",
"params": [
{
"label": "User UUID",
"name": "id"
}
]
"drill_down": {
"FIRST_NAME": "/api/1/query/get_user_by_name/name/$"
}
}
GET /api/1/info
Response:
200 OK
JSON full bodies of all the queries
See above for an example.
Snooper is packaged as an EGG, to build:
python setup.py bdist_egg
this will generate a snooper-x.x.xxx-py2.7.egg
where x.x.xxx
is the release version no. (set in src/snooper.py
VERSION
).
We recommend the use of virtualenv
and once activated, it's just a matter of running:
pip install ./snooper-x.x.xxx-py2.7.egg
NOTE this currently does not work due to some mysterious version error with pip
Do this instead
- build a virtual environment and activate it
- obtain the latest
requirements.txt
from the github repo - run
pip install -r requirements.txt
- download the
snooper-x.x.xxx-py2.7.egg
file from our repo - run
easy_install snooper-x.x.xxx-py2.7.egg
It's ugly, but no one ever said that Python had to be as good as Java, after all.
# TODO: update the above with actual locations once I finalize them
Follow the instructions here
But essentially:
edit the configuration file:
# vim /var/lib/pgsql/9.2/datapg_hba.conf
add this line:
host all all 10.10.0.0/16 trust
ensure the server is listening on all ports:
# vim /var/lib/pgsql/9.2/postgresql.conf
ensure this line is present:
listen_addresses = '*' # what IP address(es) to listen on;
restart Postgres:
# service postgresql-9.2 restart