This notebook demonstrates connecting from a Jupyter notebook running in one container to a Postgres database running in another container.

## Postgres Installation

The Postgres database runs in a separate linked container, which is named `birth_db` by default. This container uses the `birth_db` image, which is built from `birth_db/Dockerfile` and is based on the latest [postgres](https://hub.docker.com/_/postgres/) image. The image also creates the `birth_db` database with a single table named `birth_data`, and populates that table with the rows in `births2012_downsampled.csv`.


The default postgres user in Docker is `postgres`, but that can be changed with the `POSTGRES_USER` environment variable. 

The password for the postgres user is set by the environment variable `POSTGRES_PASSWORD`, with the default set in `birth_db/Dockerfile`. Obviously, hard-coding the password the way I am doing it here is insecure, so I need to figure out how people handle this in production. The Docker [Postgres](https://hub.docker.com/_/postgres/) docs describe `_FILE` variants of some environment variables (for example `POSTGRES_PASSWORD_FILE`), which read the value from a file and are intended for use with Docker secrets.
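On the notebook side, one way to avoid typing the password into a cell is to mirror that file-based convention. This is only a minimal sketch: it assumes the notebook container is also given `POSTGRES_PASSWORD` or `POSTGRES_PASSWORD_FILE` (the postgres image reads these for itself; passing them to the notebook container is my assumption), and `read_postgres_password` is a hypothetical helper, not part of any library.

```python
import os
from pathlib import Path


def read_postgres_password(env=os.environ):
    """Return the password from POSTGRES_PASSWORD_FILE if set, else POSTGRES_PASSWORD.

    Hypothetical helper: it mimics the image's `_FILE` convention on the client side.
    """
    file_path = env.get("POSTGRES_PASSWORD_FILE")
    if file_path:
        # Secret files commonly end with a newline; drop only the line ending.
        return Path(file_path).read_text().rstrip("\r\n")
    password = env.get("POSTGRES_PASSWORD")
    if password is None:
        raise RuntimeError("Set POSTGRES_PASSWORD or POSTGRES_PASSWORD_FILE")
    return password


# Usage with an explicit mapping, so the example does not depend on the real environment.
print(read_postgres_password({"POSTGRES_PASSWORD": "mysecretpassword"}))
```

In the notebook itself, calling `read_postgres_password()` with no arguments would read from the real environment instead of the literal shown in the cell below.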

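It took me a while to figure out how to reference the hostnames of linked containers; I finally found the docs [here](https://docs.docker.com/compose/networking/). In short, containers on the same Compose network can reach each other by using the service name as the hostname.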

In [10]:
from sqlalchemy import create_engine

In [11]:
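# Connection settings for the Postgres container
dbname = 'birth_db'
username = 'postgres'
password = 'mysecretpassword'
hostname = 'birth_db'  # hostname of the linked database container
port = '5432'
# SQLAlchemy 1.4+ only accepts the "postgresql://" scheme, not "postgres://"
db_uri = f"postgresql://{username}:{password}@{hostname}:{port}/{dbname}"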
engine = create_engine(db_uri)
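One caveat with assembling the URI from an f-string: a password containing characters such as `@`, `:` or `/` would break the URL unless it is percent-encoded first. A minimal sketch using only the standard library; `build_postgres_uri` is a hypothetical helper and the password shown is invented for illustration. It uses the `postgresql://` scheme, which current SQLAlchemy releases expect.

```python
from urllib.parse import quote


def build_postgres_uri(username, password, hostname, port, dbname):
    """Build a postgresql:// URI with percent-encoded credentials (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including "/".
    user = quote(username, safe="")
    pw = quote(password, safe="")
    return f"postgresql://{user}:{pw}@{hostname}:{port}/{dbname}"


# Invented password with characters that would otherwise confuse URL parsing.
uri = build_postgres_uri("postgres", "p@ss:word/1", "birth_db", 5432, "birth_db")
print(uri)
```

The resulting string can be passed to `create_engine` in the same way as `db_uri` above.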

The Postgres container has already created and populated the `birth_data` table.

In [15]:
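# Engine.table_names() was deprecated in SQLAlchemy 1.4; the inspector is the current API
from sqlalchemy import inspect
inspect(engine).get_table_names()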

OperationalError: (psycopg2.OperationalError) could not translate host name "birth_db" to address: Name or service not known
 (Background on this error at: http://sqlalche.me/e/e3q8)
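
The `could not translate host name` message is a name-resolution failure: `birth_db` did not resolve from wherever this kernel was running. One possible cause (an assumption, since the notebook does not record how it was started) is that the notebook was not attached to the same Docker network as the database container. A minimal standard-library sketch for checking resolution before involving SQLAlchemy; `can_resolve` is a hypothetical helper:

```python
import socket


def can_resolve(hostname, port=5432):
    """Return True if the hostname resolves to at least one address (hypothetical helper)."""
    try:
        socket.getaddrinfo(hostname, port)
    except socket.gaierror:
        # Same class of failure that psycopg2 reports as "could not translate host name".
        return False
    return True


# A numeric address always resolves; "birth_db" only resolves where Docker's DNS knows it.
for host in ("127.0.0.1", "birth_db"):
    print(host, can_resolve(host))
```

If `birth_db` prints `False`, the problem lies with the container networking rather than with the credentials or the database itself.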

## Pandas and SQL

This is a cool trick I want to remember: I used SQLAlchemy and pandas to write the `CREATE TABLE` statement. Pandas does this in the background in `to_sql`, but the methods it uses are private, so finding them took a little digging.
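
For comparison, `pandas.io.sql` also has a non-underscored helper, `get_schema`, that returns similar DDL without touching the private classes. The sketch below is self-contained, so it makes two assumptions: it uses an in-memory SQLite connection instead of the Postgres engine (the column types in the output are therefore SQLite's, not Postgres's), and `sample` is an invented three-column stand-in for the real `birth_data` frame.

```python
import sqlite3

import pandas as pd
from pandas.io.sql import get_schema

# Invented stand-in for birth_data; reset_index() adds an "index" column to use as the key.
sample = pd.DataFrame(
    {
        "birth_year": [2012, 2012],
        "birth_weight": [3.2, 2.9],
        "infant_sex": ["F", "M"],
    }
).reset_index()

con = sqlite3.connect(":memory:")
try:
    # keys names the column(s) to declare as the primary key in the generated DDL.
    ddl = get_schema(sample, "birth_data_table", keys="index", con=con)
finally:
    con.close()

print(ddl)
```

Passing the SQLAlchemy `engine` as `con` instead should make the helper emit DDL in the Postgres dialect, though I have not verified that against this database.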

In [12]:
from pandas.io.sql import SQLDatabase, SQLTable

In [13]:
db = SQLDatabase(engine)

In [22]:
# Move the DataFrame's index into a regular 'index' column so it can be used as the key
birth_data = birth_data.reset_index()

In [23]:
# _create_sql_schema is a private pandas method, so it may change between versions
print(db._create_sql_schema(table_name='birth_data_table', frame=birth_data, keys=['index']))


CREATE TABLE birth_data_table (
	index BIGSERIAL NOT NULL, 
	alcohol_use FLOAT(53), 
	anencephaly FLOAT(53), 
	attendant TEXT, 
	birth_loc_type FLOAT(53), 
	birth_month TEXT, 
	birth_state FLOAT(53), 
	birth_weight FLOAT(53), 
	birth_year BIGINT, 
	cigarette_use FLOAT(53), 
	cigarettes_per_day FLOAT(53), 
	cigarettes_trimester1 FLOAT(53), 
	cigarettes_trimester2 FLOAT(53), 
	cigarettes_trimester3 FLOAT(53), 
	day TEXT, 
	delivery_method TEXT, 
	"downs syndrome" FLOAT(53), 
	drinks_per_week FLOAT(53), 
	father_age FLOAT(53), 
	father_race TEXT, 
	gestation_weeks FLOAT(53), 
	infant_sex TEXT, 
	mother_age BIGINT, 
	mother_birth_country FLOAT(53), 
	mother_birth_state FLOAT(53), 
	mother_education TEXT, 
	mother_marital_status TEXT, 
	mother_race TEXT, 
	mother_state FLOAT(53), 
	population FLOAT(53), 
	pregnancy_weight FLOAT(53), 
	resident TEXT, 
	revision TEXT, 
	spina_bifida FLOAT(53), 
	"table" TEXT, 
	timestamp BIGINT, 
	uses_tobacco FLOAT(53), 
	weight_gain FLOAT(53), 
	CONSTRAINT