# Intake-Postgres Plugin: Benchmarking

This notebook benchmarks different backends for the _intake-postgres_ plugin.
The backends currently benchmarked are pandas (`pd.read_sql_query()`) and PostgresAdapter.

### Setup
1. Start a PostgreSQL server. If Docker is installed, an easy way to do this is with the following command:
    ```
    docker run --rm -p 5432:5432 mdillon/postgis:9.6-alpine
    ```
    Wait until the line _"LOG:  database system is ready to accept connections"_ appears.
1. In the same conda environment as this notebook, install `pandas`, `sqlalchemy`, and `psycopg2`. Optionally, `postgresql` can also be installed (this is only the client library, not the database server):
    ```
    conda install pandas sqlalchemy psycopg2 postgresql
    ```
1. Install the _intake-postgres_ plugin:
    ```
    conda install -c intake intake-postgres
    ```
1. To benchmark *PostgresAdapter*, clone its Git repository:
    ```
    git clone https://github.com/ContinuumIO/PostgresAdapter.git
    ```
1. Build and install the *PostgresAdapter* conda package from the root of the cloned repository:
    ```
    conda build buildscripts/condarecipe --python 3.6
    conda install --use-local postgresadapter
    ```
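
Once the steps above are done, a quick check from Python can confirm that the packages are visible in the notebook's environment. This is a minimal sketch: `check_installed` is a hypothetical helper, and the import names `intake_postgres` and `postgresadapter` are assumed to match the packages installed above.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the packages installed above.
REQUIRED = ['pandas', 'sqlalchemy', 'psycopg2', 'intake_postgres', 'postgresadapter']

def check_installed(modules):
    """Map each top-level module name to True if it can be found in this environment."""
    return {name: find_spec(name) is not None for name in modules}

status = check_installed(REQUIRED)
for name, found in status.items():
    print(f"{name:16s} {'ok' if found else 'MISSING'}")
```

Any module reported as `MISSING` points back to the corresponding install step.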

### Basic Usage

First, import the necessary modules:

In [None]:
## For inserting test data
import pandas as pd
from sqlalchemy import create_engine

## For benchmarking
import numpy as np
import uuid
import postgresadapter
from postgresadapter import PostgresAdapter

print('postgresadapter version:', postgresadapter.__version__)
URI = 'postgresql://postgres@localhost:5432/postgres'
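
The connection URI encodes the dialect, user, host, port, and database in one string. As a rough illustration, the standard library can split it into its parts, which makes it easier to see that the port matches the one published by the `docker run -p 5432:5432` command. The `connection` dictionary and its key names are illustrative only.

```python
from urllib.parse import urlsplit

# Same connection string as the notebook's URI.
URI = 'postgresql://postgres@localhost:5432/postgres'

parts = urlsplit(URI)
connection = {
    'dialect': parts.scheme,             # SQLAlchemy dialect name
    'user': parts.username,              # database role
    'host': parts.hostname,
    'port': parts.port,                  # host port published by Docker
    'database': parts.path.lstrip('/'),
}
print(connection)
```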

Insert 100,000 rows of test data into a database table named `integer`:

In [None]:
engine = create_engine(URI)

N_ROWS = 100000

# One random 64-bit integer and one unique UUID string per row.
# (np.fromfunction calls its function only once, so it cannot generate per-row UUIDs.)
df = pd.DataFrame({'random': np.random.randint(np.iinfo(np.int64).min, np.iinfo(np.int64).max,
                                               size=N_ROWS, dtype=np.int64),
                   'uuid': [str(uuid.uuid4()) for _ in range(N_ROWS)]},
                  index=np.arange(N_ROWS))
df.to_sql('integer', engine, if_exists='replace')
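
Before timing reads, it may help to confirm that a generated frame has the intended shape: a 64-bit integer column and a distinct UUID string in every row. A minimal sketch follows, where `make_test_frame` is a hypothetical helper that is not part of the notebook, and a smaller row count keeps the check quick:

```python
import uuid
import numpy as np
import pandas as pd

def make_test_frame(n_rows):
    """Build a frame with one random int64 and one UUID string per row (hypothetical helper)."""
    info = np.iinfo(np.int64)
    return pd.DataFrame({
        'random': np.random.randint(info.min, info.max, size=n_rows, dtype=np.int64),
        'uuid': [str(uuid.uuid4()) for _ in range(n_rows)],
    }, index=np.arange(n_rows))

sample = make_test_frame(1000)
assert sample['random'].dtype == np.int64   # integer column is stored as 64-bit
assert sample['uuid'].is_unique             # every row has its own UUID
print(sample.dtypes)
```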

Verify that the data was written by connecting to the database directly with the `psql` command-line tool:

In [None]:
# Verify the data was written
!psql -h localhost -U postgres -qt -c 'select * from integer offset 99990;'

In [None]:
#engine = create_engine(URI)
#df = pd.read_sql_table('integer', engine)
%timeit pd.read_sql_query('select * from integer', engine)
pd.read_sql_query('select * from integer', engine).tail()
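
The `%timeit` magic is only available inside IPython or Jupyter. For running the same comparison from a plain script, a rough equivalent can be built on the standard library's `timeit` module. This is a sketch under assumptions: `best_time` is a hypothetical helper, it reports the fastest run rather than the mean and standard deviation that IPython prints, and the workload is a stand-in so that the example runs without a database.

```python
import timeit

def best_time(func, repeat=5, number=1):
    """Return the best per-call wall-clock time in seconds over `repeat` runs."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number

# Stand-in workload; in the notebook this would be, for example,
# lambda: pd.read_sql_query('select * from integer', engine)
elapsed = best_time(lambda: sum(range(100000)))
print(f'best of 5: {elapsed * 1e3:.3f} ms')
```

Passing each backend's read call as `func` gives numbers that can be compared on the same footing.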

In [None]:
# Create the adapter used by the timing and preview calls below
adapter = PostgresAdapter(URI, table='integer')
%timeit PostgresAdapter(URI, dataframe=True, query='select * from integer')
%timeit adapter._to_dataframe()
#print(adapter.field_names)
#print(adapter.get_field_types())
#print(adapter.num_records)
adapter._to_dataframe().tail()