# SQL Querying

This notebook can be used to query tables in the Congressional Data database. To use it, you need to set the environment variable `CD_DWH` to the database connection string. If you do not have the credentials, please message us in the #datasci-congressdata Slack channel and/or talk to a project lead.

**Do not hard-code database URI strings in notebooks or code: once we push to GitHub, those credentials become public for anyone to see.**
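If you want to confirm that `CD_DWH` is set without exposing the credentials in the notebook output, one option is to print the URI with its password masked. The sketch below is a minimal, hypothetical helper (`mask_password` is not part of this project); it assumes the connection string is a standard URL of the form `scheme://user:password@host:port/dbname` and uses only the standard library.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def mask_password(uri: str) -> str:
    """Return `uri` with any password replaced by '***', so it is safe to print or log."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri
    # netloc looks like "user:password@host:port"; keep only the user and host parts
    userinfo, host = parts.netloc.rsplit("@", 1)
    user = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{user}:***@{host}"))


uri = os.getenv("CD_DWH")
if uri is None:
    print("CD_DWH is not set; ask in #datasci-congressdata for the connection string.")
else:
    print(mask_password(uri))
```

Splitting on the last `@` keeps the host and port exactly as written, rather than rebuilding them from the parsed pieces.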

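```python
import os
import sys

import pandas as pd
from sqlalchemy import create_engine

# Show every column when displaying wide DataFrames
pd.options.display.max_columns = 999

# Read the connection string from the environment instead of hard-coding it
DB_URI = os.getenv('CD_DWH')
if DB_URI is None:
    raise RuntimeError("Set the CD_DWH environment variable to the database connection string.")
engine = create_engine(DB_URI)
```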

```python
# Check that the kernel is using the Conda environment datasci-congressional-data.
# You should see something like '/Users/Username/anaconda3/envs/datasci-congressional-data/bin/python'.
# If you do NOT see "datasci-congressional-data", you are not in the right Python environment;
# please go through the onboarding docs and/or talk to a project lead.
sys.executable
```

'/Users/VincentLa/venv/data_science/bin/python'
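Rather than reading the path by eye, the check can be made explicit. The following is a minimal sketch with a hypothetical helper, `in_expected_env`; it assumes that a Conda environment's interpreter lives under a directory named after the environment (for example `.../envs/datasci-congressional-data/bin/python`), which is Conda's usual layout. Note that the recorded output above is a `venv` path, which this check would flag.

```python
import sys
from pathlib import Path

# Environment name taken from the onboarding comments in this notebook
EXPECTED_ENV = "datasci-congressional-data"


def in_expected_env(executable: str, env_name: str = EXPECTED_ENV) -> bool:
    """True if `env_name` appears as a directory component of the interpreter path."""
    # Conda usually places each environment under .../envs/<env_name>/bin/python
    return env_name in Path(executable).parts


if not in_expected_env(sys.executable):
    print(f"Warning: {sys.executable} is not inside the {EXPECTED_ENV!r} environment.")
```

Comparing path components, rather than doing a substring search, avoids false matches on directories that merely contain the environment name as part of a longer name.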

The following query lists the tables that currently exist in the database, skipping the system schemas (`information_schema`, `pg_catalog`) and `public`:

```python
# List every table outside the system schemas and the public schema
QUERY = """
select *
from information_schema.tables
where table_schema not in ('information_schema', 'pg_catalog', 'public')
"""
with engine.begin() as conn:
    results = pd.read_sql(QUERY, conn)
results.head(100)
```

| | table_catalog | table_schema | table_name | table_type | self_referencing_column_name | reference_generation | user_defined_type_catalog | user_defined_type_schema | user_defined_type_name | is_insertable_into | is_typed | commit_action |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | datascicongressionaldata | data_ingest | casos__california_candidate_statewide_election... | BASE TABLE | | | | | | YES | NO | |
| 1 | datascicongressionaldata | data_ingest | ceda__california_candidate_local_election_results | BASE TABLE | | | | | | YES | NO | |
| 2 | datascicongressionaldata | stg_analytics | stg_candidate_contributions | BASE TABLE | | | | | | YES | NO | |
| 3 | datascicongressionaldata | data_ingest | sfdata__campaign_finance_form460_schedulea | BASE TABLE | | | | | | YES | NO | |
| 4 | datascicongressionaldata | data_ingest | maplight__california_candidate | BASE TABLE | | | | | | YES | NO | |
| 5 | datascicongressionaldata | data_ingest | maplight__california_other | BASE TABLE | | | | | | YES | NO | |
| 6 | datascicongressionaldata | trg_analytics | candidate_contributions | BASE TABLE | | | | | | YES | NO | |
| 7 | datascicongressionaldata | stg_analytics | stg_candidate_election_results | BASE TABLE | | | | | | YES | NO | |
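An alternative to querying `information_schema` by hand is SQLAlchemy's inspector, which returns table names per schema. The sketch below uses a hypothetical helper, `list_tables`, and demonstrates it on an in-memory SQLite database as a stand-in (SQLite's default schema is called `main`). Against this notebook's `engine`, you would pass the schema names seen in the output above, e.g. `list_tables(engine, ["data_ingest", "stg_analytics", "trg_analytics"])`.

```python
from sqlalchemy import create_engine, inspect, text


def list_tables(engine, schemas):
    """Return {schema: sorted list of table names} for each schema in `schemas`."""
    inspector = inspect(engine)
    return {schema: sorted(inspector.get_table_names(schema=schema)) for schema in schemas}


if __name__ == "__main__":
    # Stand-in database for illustration only: in-memory SQLite
    demo_engine = create_engine("sqlite://")
    with demo_engine.begin() as conn:
        conn.execute(text("create table candidate_demo (id integer)"))
    print(list_tables(demo_engine, ["main"]))
```

`get_table_names` returns only base tables; views are listed separately by the inspector's `get_view_names`.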


## Query Example

```python
# Pull the first 100 rows of the analytics-ready contributions table
QUERY = """
select
  *
from trg_analytics.candidate_contributions
limit 100
"""
with engine.begin() as conn:
    results = pd.read_sql(QUERY, conn)
results.head(100)
```
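When a query needs values that change (a row limit, a filter value), passing them as bind parameters is safer than formatting them into the SQL string, because the database driver handles the quoting. The following is a minimal sketch using a hypothetical helper, `query_df`, built on `sqlalchemy.text` and the `params` argument of `pd.read_sql`; it is demonstrated on an in-memory SQLite database as a stand-in. With this notebook's `engine`, a call might look like `query_df(engine, "select * from trg_analytics.candidate_contributions limit :n", {"n": 100})`.

```python
import pandas as pd
from sqlalchemy import create_engine, text


def query_df(engine, sql: str, params=None) -> pd.DataFrame:
    """Run a read-only query and return the result as a DataFrame.

    Values are supplied as bind parameters (":name" placeholders) instead of
    being pasted into the SQL string, so the driver handles the quoting.
    """
    with engine.connect() as conn:
        return pd.read_sql(text(sql), conn, params=params)


if __name__ == "__main__":
    # Stand-in in-memory SQLite database, for illustration only
    demo_engine = create_engine("sqlite://")
    with demo_engine.begin() as conn:
        conn.execute(text("create table demo (x integer)"))
        conn.execute(text("insert into demo (x) values (1), (2), (3)"))
    print(query_df(demo_engine, "select x from demo order by x limit :n", {"n": 2}))
```

`engine.connect()` is used here instead of `engine.begin()` because a plain read does not need to commit a transaction.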