# Database Recap
-----


In [None]:
%pylab inline
import pandas as pd
import psycopg2
import sklearn
import seaborn as sns
from sqlalchemy import create_engine
sns.set_style("white")

### Connect to the database

In [None]:
db_name = "appliedda"
hostname = "10.10.2.10"
conn = psycopg2.connect(database=db_name, host=hostname)  # database connection

The database connection allows us to send queries to the database from Python and read the results back.
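To illustrate the query-through-a-connection pattern without access to the course database, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for PostgreSQL. The `demo_conn` connection and the `jobs` table are hypothetical and exist only for this example.

```python
import sqlite3
import pandas as pd

# Stand-in for the PostgreSQL connection: an in-memory SQLite database
# (an assumption for illustration, not the course database).
demo_conn = sqlite3.connect(":memory:")

# Hypothetical table with two rows, so there is something to query.
demo_conn.execute("CREATE TABLE jobs (id INTEGER, title TEXT)")
demo_conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [(1, "analyst"), (2, "engineer")],
)
demo_conn.commit()

# pd.read_sql sends the query through the connection and
# returns the result set as a DataFrame.
demo_df = pd.read_sql("SELECT id, title FROM jobs ORDER BY id;", demo_conn)
print(demo_df)

demo_conn.close()
```

The call to `pd.read_sql` has the same shape as the one used with `conn` in this notebook: a SQL string followed by a connection object.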

In [None]:
df_tables = pd.read_sql("""SELECT table_schema, table_name
                          FROM information_schema.tables
                          ORDER BY table_schema, table_name;""", conn)

In [None]:
df_tables.head()

The table `information_schema.tables` contains information about all tables in the database. This comes in handy when you forget a table name.

# Exercise: Display all tables that are NOT in the information schema or pg_catalog

First, we'll create a *mask*: a Boolean (True/False) Series with one entry per row of `df_tables`. For each row, the code below checks whether `table_schema` is either `information_schema` or `pg_catalog`, and we store the result in a variable called `mask`.

In [None]:
mask = df_tables['table_schema'].isin(['information_schema', 'pg_catalog'])
mask[0:5]

We can use this mask to select rows from `df_tables`, keeping only the rows where `mask` is `True`, that is, the tables that ARE in `information_schema` or `pg_catalog`:

In [None]:
df_tables[mask]
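The masking steps above can be tried on a small DataFrame that needs no database. This is a minimal sketch: `toy_tables` is a hypothetical stand-in for `df_tables`, and its rows are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_tables (rows invented for illustration).
toy_tables = pd.DataFrame({
    "table_schema": ["information_schema", "pg_catalog", "public", "public"],
    "table_name": ["tables", "pg_class", "jobs", "wages"],
})

# isin returns a Boolean Series with one entry per row of toy_tables.
toy_mask = toy_tables["table_schema"].isin(["information_schema", "pg_catalog"])
print(toy_mask.tolist())

# Indexing with the mask keeps only the rows where the mask is True.
system_tables = toy_tables[toy_mask]
print(system_tables)
```

Because the mask lines up with the DataFrame row by row, it can be built once and reused for several selections.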

If we want to display all the tables that are NOT in `information_schema` or `pg_catalog`, we need exactly the opposite of our mask. The `~` operator negates a Boolean Series element by element, turning each `True` into `False` and vice versa:

In [None]:
df_tables[~mask]
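The same filter could also be applied in the query itself with a `WHERE ... NOT IN` clause, so that only the wanted rows are returned by the database. The sketch below compares both approaches, again using an in-memory SQLite database as a stand-in. The `demo_conn` connection, the `tables` table, and its rows are assumptions for illustration; against PostgreSQL the query would read from `information_schema.tables` instead.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stand-in (an assumption, not the course database).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE tables (table_schema TEXT, table_name TEXT)")
demo_conn.executemany(
    "INSERT INTO tables VALUES (?, ?)",
    [
        ("information_schema", "tables"),
        ("pg_catalog", "pg_class"),
        ("public", "jobs"),
        ("public", "wages"),
    ],
)
demo_conn.commit()

# Approach 1: load every row, then filter in pandas with a negated mask.
all_tables = pd.read_sql(
    "SELECT table_schema, table_name FROM tables "
    "ORDER BY table_schema, table_name;",
    demo_conn,
)
mask = all_tables["table_schema"].isin(["information_schema", "pg_catalog"])
via_pandas = all_tables[~mask].reset_index(drop=True)

# Approach 2: let the database filter the rows with NOT IN.
via_sql = pd.read_sql(
    """SELECT table_schema, table_name
       FROM tables
       WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
       ORDER BY table_schema, table_name;""",
    demo_conn,
)
demo_conn.close()

# Check whether both approaches select the same rows.
print(via_pandas.equals(via_sql))
```

Filtering in SQL avoids transferring rows that will be discarded, while filtering in pandas keeps the full table available for further exploration.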