# SQL Querying

This notebook can be used to query tables in the Congressional Data database. To use it, set the environment variable `CD_DWH` to the database connection string. The geocoding section at the end also reads a Google Places API key from the environment variable `GOOGLE_PLACES_API`. If you do not have the credentials, please message us in the #datasci-congressdata Slack channel and/or talk to a project lead.

**It is best practice not to hard-code database URI strings in notebooks or code: once we push to GitHub, the credentials would be public for anyone to see.**

In [None]:
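import os
import sys
import math

import pandas as pd
import sqlalchemy as sqla
from sqlalchemy import create_engine
from sqlalchemy.engine import reflection

from googleplaces import GooglePlaces, types, lang

# Show all columns when displaying wide DataFrames
pd.options.display.max_columns = 999

# Read secrets from environment variables instead of hard-coding them
DB_URI = os.getenv('CD_DWH')
if not DB_URI:
    # Fail with a readable message instead of an opaque error from create_engine(None)
    raise RuntimeError("Environment variable 'CD_DWH' is not set; see the instructions above.")
engine = create_engine(DB_URI)
YOUR_API_KEY = os.getenv('GOOGLE_PLACES_API')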

In [None]:
# Check that the kernel is using the conda environment datasci-congressional-data.
# Below you should see something like '/Users/Username/anaconda3/envs/datasci-congressional-data/bin/python'
# If you do NOT see "datasci-congressional-data", you are not in the right Python environment.
# Please make sure you have gone through the onboarding docs and/or talk to a project lead.
sys.executable
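The same check can be done in code instead of by eye. This is a minimal sketch, assuming conda's usual layout where each environment's interpreter lives under `<conda root>/envs/<env name>/`; the helper name `in_expected_env` is hypothetical and not part of the project.

```python
import sys
from pathlib import PurePath


def in_expected_env(executable: str, env_name: str = "datasci-congressional-data") -> bool:
    """Return True if `executable` lives inside the conda environment `env_name`.

    Assumes conda's layout: <conda root>/envs/<env_name>/... (hypothetical helper).
    """
    parts = PurePath(executable).parts
    # Look for an "envs" directory immediately followed by the environment name
    return any(a == "envs" and b == env_name for a, b in zip(parts, parts[1:]))


if not in_expected_env(sys.executable):
    print(f"Warning: kernel is running {sys.executable}, not datasci-congressional-data")
```

Matching on whole path components, rather than a substring search, avoids false positives from environments with similar names.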

Below are the tables that currently exist in the database!

## Query for geocodable columns

https://modeanalytics.com/code_for_san_francisco/reports/14ee2086d1e7

In [None]:
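insp = sqla.inspect(engine)  # Inspector.from_engine() is deprecated since SQLAlchemy 1.4
print(insp.get_table_names())  # default schema only
print(insp.get_table_names(schema='trg_analytics'))  # schema used by the queries below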
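Called without a `schema` argument, `get_table_names()` only reports the connection's default schema, while the queries in this notebook read from `trg_analytics`. A minimal sketch that walks every schema the inspector can see; it is demonstrated against a throwaway in-memory SQLite database so it runs anywhere, and the table name `demo` is made up. With the warehouse you would pass `engine` from the setup cell instead.

```python
import sqlalchemy as sqla


def list_tables_by_schema(engine):
    """Return {schema_name: [table names]} for every schema visible to `engine`."""
    insp = sqla.inspect(engine)
    return {
        schema: insp.get_table_names(schema=schema)
        for schema in insp.get_schema_names()
    }


# Demo on an in-memory SQLite database (SQLite exposes a single schema, "main")
demo_engine = sqla.create_engine("sqlite://")
with demo_engine.begin() as conn:
    conn.execute(sqla.text("CREATE TABLE demo (id INTEGER PRIMARY KEY)"))

print(list_tables_by_schema(demo_engine))
```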

In [None]:
QUERRY = """
SELECT
    *
  FROM trg_analytics.candidate_contributions
  LIMIT 10"""
with engine.begin() as conn:
    candidate_contributions_sample = pd.read_sql(QUERRY, conn)

In [None]:
candidate_contributions_sample.head()

In [None]:
QUERRY = """
SELECT
    transaction_id
    , transaction_amount
    , donor_zip_code
    , recipient_candidate_district
  FROM trg_analytics.candidate_contributions
  WHERE
     transaction_type = 'Monetary Contribution' """
with engine.begin() as conn:
    candidate_contributions_geo = pd.read_sql(QUERRY, conn)
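The filter value above is written directly into the SQL string. When the value comes from a variable, binding it as a parameter avoids quoting mistakes and SQL injection. A minimal sketch using SQLAlchemy's `text()` with a named bind parameter; it runs against an in-memory SQLite table called `candidate_contributions` with invented rows (including a made-up `'Loan'` type), since SQLite has no `trg_analytics` schema. Against the warehouse you would keep the schema-qualified table name and use `engine`.

```python
import pandas as pd
import sqlalchemy as sqla

# Throwaway SQLite database standing in for the warehouse (rows are invented)
demo_engine = sqla.create_engine("sqlite://")
with demo_engine.begin() as conn:
    conn.execute(sqla.text(
        "CREATE TABLE candidate_contributions ("
        "transaction_id TEXT, transaction_amount REAL, transaction_type TEXT)"
    ))
    conn.execute(
        sqla.text("INSERT INTO candidate_contributions VALUES (:id, :amt, :type)"),
        [
            {"id": "t1", "amt": 100.0, "type": "Monetary Contribution"},
            {"id": "t2", "amt": 50.0, "type": "Loan"},
        ],
    )

# :transaction_type is a named bind parameter; the driver handles quoting
PARAM_QUERY = sqla.text("""
SELECT
    transaction_id
    , transaction_amount
  FROM candidate_contributions
  WHERE
     transaction_type = :transaction_type""")

with demo_engine.connect() as conn:
    monetary = pd.read_sql(PARAM_QUERY, conn, params={"transaction_type": "Monetary Contribution"})

print(monetary)
```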

In [None]:
candidate_contributions_geo.head()
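The `geocode` function below expects an `address` column, while this query returns only `donor_zip_code`. One possible bridge, assuming ZIP-level precision is acceptable and that ZIP codes may arrive as ZIP+4 strings or as numbers that lost their leading zeros: normalize each value to a 5-digit string and append the country. The helper `zip_to_address` and the sample values are hypothetical.

```python
import pandas as pd


def zip_to_address(zip_codes: pd.Series) -> pd.Series:
    """Turn raw ZIP codes into simple address strings such as '02139, USA'.

    Hypothetical cleanup: keeps the first 5 digits (drops any ZIP+4 suffix) and
    left-pads with zeros; values with no digits at all become missing.
    """
    # First run of digits in each value, e.g. "94103-1234" -> "94103", 2139 -> "2139"
    digits = zip_codes.astype(str).str.extract(r"(\d+)", expand=False)
    five = digits.str[:5].str.zfill(5)
    # Missing values stay missing after the concatenation
    return five + ", USA"


sample = pd.DataFrame({"donor_zip_code": ["94103-1234", 2139, None, "95814"]})
sample["address"] = zip_to_address(sample["donor_zip_code"])
print(sample)
```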

- http://congressional-district.insidegov.com/d/d/California
- https://developers.google.com/maps/documentation/geocoding/intro
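`geocode` below uses the Places nearby search, which returns places within a radius of a location rather than the coordinates of the address itself. The Geocoding API linked above returns coordinates directly. A minimal sketch of building such a request with only the standard library; the endpoint and the `address`/`key` parameters follow the Geocoding API documentation, but the helper name `geocoding_url` is hypothetical, and actually sending the request (which needs a valid key and network access) is left out.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocoding_url(address: str, api_key: str) -> str:
    """Build a Geocoding API request URL for a free-text address (hypothetical helper)."""
    # urlencode escapes spaces and commas so the address is safe in a query string
    return f"{GEOCODE_ENDPOINT}?{urlencode({'address': address, 'key': api_key})}"


url = geocoding_url("Sacramento, CA", "YOUR_KEY")
print(url)
# To send it: urllib.request.urlopen(url), then read
# results[0]["geometry"]["location"] (lat/lng) from the JSON response.
```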

In [None]:
def geocode(df, api_key=YOUR_API_KEY, max_requests=10):
    """Add a 'matches' column of Google Places nearby-search results to df.

    Modifies df in place and returns it.

    Args:
        df: DataFrame which must have an "address" column with a clean address.
        api_key: Google Places API key.
        max_requests: Number of rows to query. This cap is for debugging, so we
            don't hit the API threshold; rows past it get None.
    """
    google_places = GooglePlaces(api_key)
    matches = []

    for i, place in enumerate(df['address']):
        if i >= max_requests:
            break
        print(i, place)
        # location accepts a human-readable address; radius is in meters
        query_result = google_places.nearby_search(location=place, radius=100)
        print(query_result.places)
        matches.append(query_result.places)

    # Pad the rows that were not queried so the column length matches df
    matches.extend([None] * (len(df) - len(matches)))
    df['matches'] = matches

    return df
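Because `geocode` calls the live API, its request cap and padding logic are hard to check without spending quota. One way to make that logic testable offline is to pass the search function in as a parameter. A sketch under that assumption: `lookup_addresses` and `fake_search` are hypothetical names, the default cap of 10 mirrors the debugging limit above, and with the real client you could pass something like `lambda a: google_places.nearby_search(location=a, radius=100).places`.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def lookup_addresses(
    addresses: Sequence[str],
    search: Callable[[str], T],
    max_requests: int = 10,
) -> List[Optional[T]]:
    """Call `search` on at most `max_requests` addresses; pad the rest with None.

    The result has one entry per address, so it can be assigned as a DataFrame column.
    """
    results: List[Optional[T]] = [search(a) for a in addresses[:max_requests]]
    # Rows beyond the cap were not queried, so they get None
    results.extend([None] * (len(addresses) - len(results)))
    return results


calls = []


def fake_search(address: str) -> str:
    # Stand-in for the Places API: records the call and returns a dummy result
    calls.append(address)
    return f"result for {address}"


out = lookup_addresses(["a", "b", "c"], fake_search, max_requests=2)
print(out, calls)
```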