#Census Data Correlation
Correlate another table with US Census data. Expands a data set's dimensions by finding population segments that correlate with the master table.


#License

Copyright 2020 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

  https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.



#Disclaimer
This is not an officially supported Google product. It is a reference implementation. There is absolutely NO WARRANTY provided for using this code. The code is Apache Licensed and CAN BE fully modified, white labeled, and disassembled by your team.

This code was generated (see starthinker/scripts for possible source):
  - **Command**: "python starthinker_ui/manage.py colab"
  - **Command**: "python starthinker/tools/colab.py [JSON RECIPE]"



#1. Install Dependencies
First, install the libraries needed to execute recipes. This only needs to be done once. Then click play.


In [None]:
!pip install git+https://github.com/google/starthinker


#2. Set Configuration

This code is required to initialize the project. Fill in required fields and press play.

1. If the recipe uses a Google Cloud Project:
  - Set the configuration **project** value to the project identifier from [these instructions](https://github.com/google/starthinker/blob/master/tutorials/cloud_project.md).

1. If the recipe has **auth** set to **user**:
  - If you have user credentials:
    - Set the configuration **user** value to your user credentials JSON.
  - If you DO NOT have user credentials:
    - Set the configuration **client** value to [downloaded client credentials](https://github.com/google/starthinker/blob/master/tutorials/cloud_client_installed.md).

1. If the recipe has **auth** set to **service**:
  - Set the configuration **service** value to [downloaded service credentials](https://github.com/google/starthinker/blob/master/tutorials/cloud_service.md).



In [None]:
from starthinker.util.configuration import Configuration


CONFIG = Configuration(
  project="",
  client={},
  service={},
  user="/content/user.json",
  verbose=True
)
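
The credential rules in this step can be sketched as a small pre-flight check. This is only an illustration: `missing_credentials` and its `uses_cloud_project` flag are hypothetical names, not part of StarThinker, and the check treats any empty value as "not set".

```python
def missing_credentials(auth, project="", user="", client=None, service=None,
                        uses_cloud_project=True):
    """List the configuration values that still need to be filled in.

    Hypothetical helper, not a StarThinker function.
    """
    missing = []
    # A recipe that uses a Google Cloud Project needs the project identifier.
    if uses_cloud_project and not project:
        missing.append("project")
    # auth == 'user': either user credentials or client credentials are needed.
    if auth == "user" and not user and not client:
        missing.append("user or client")
    # auth == 'service': service credentials are needed.
    if auth == "service" and not service:
        missing.append("service")
    return missing


# With the defaults from the cell above and auth set to 'service',
# both the project and the service credentials are still empty.
print(missing_credentials("service", project="", service={}))
```

An empty list means every value required by the chosen **auth** mode has been provided.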



#3. Enter Census Data Correlation Recipe Parameters
 1. Census Normalize is a prerequisite; run it at least once.
 1. Specify JOIN, PASS, SUM, and CORRELATE columns to build the correlation query.
 1. Define the DATASET and TABLE for the joinable source. The source can be a view.
 1. Choose the significance level. A higher significance level usually means more NULL results, so use this value to balance quantity against quality.
 1. Specify where to write the results.
 1. **IMPORTANT:** If you use VIEWS, you will have to delete them manually if the recipe changes.
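
To illustrate why a higher significance level tends to produce more NULL results, here is a minimal sketch of a correlation filter. The statistical test the recipe actually applies is not shown in this notebook, so everything here is an assumption: the sketch uses a two-sided test based on Fisher's z-transformation, treats the significance value as a confidence percentage, and the function name and sample data are hypothetical.

```python
import math
from statistics import NormalDist


def significant_correlation(xs, ys, level="80"):
    """Return Pearson's r if it passes the test at `level` percent, else None.

    Illustrative only; not the query the recipe generates.
    """
    n = len(xs)
    if n != len(ys) or n < 4:
        return None  # the Fisher approximation needs at least 4 pairs
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant column has no defined correlation
    r = max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))
    if abs(r) == 1.0:
        return r  # perfect correlation, atanh would be infinite
    # Fisher z-transformation: z is approximately standard normal when the
    # true correlation is zero.
    z = math.atanh(r) * math.sqrt(n - 3)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    alpha = 1.0 - float(level) / 100.0
    return r if p_value < alpha else None


# Hypothetical sample: a moderate correlation over ten rows.
xs = list(range(1, 11))
ys = [3, 2, 5, 3, 6, 4, 8, 4, 6, 7]
for level in ["80", "90", "99"]:
    print(level, significant_correlation(xs, ys, level))
```

Under these assumptions the same moderate correlation is kept at the lower levels and dropped (returned as `None`, the analogue of NULL) at the strictest one.
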

Modify the values below for your use case. This can be done multiple times. Then click play.


In [None]:
FIELDS = {
  'auth':'service',  # Credentials used for writing data.
  'join':'',  # Name of column to join on, must match Census Geo_Id column.
  'pass':[],  # Comma separated list of columns to pass through.
  'sum':[],  # Comma separated list of columns to sum, optional.
  'correlate':[],  # Comma separated list of percentage columns to correlate.
  'from_dataset':'',  # Existing BigQuery dataset.
  'from_table':'',  # Table to use as join data.
  'significance':'80',  # Select level of significance to test.
  'to_dataset':'',  # Existing BigQuery dataset.
  'type':'table',  # Write Census_Percent as table or view.
}

print("Parameters Set To: %s" % FIELDS)
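
Before executing, it can help to sanity-check the parameters. The sketch below is a hypothetical helper, not part of StarThinker. The allowed values for `significance` and `type` are copied from the recipe definition in the next step; treating every field except `pass` and `sum` as required is an assumption, and the column, dataset, and table names in the example are made up.

```python
SIGNIFICANCE_CHOICES = ['80', '90', '98', '99', '99.5', '99.95']
TYPE_CHOICES = ['table', 'view']


def check_fields(fields):
    """Return a list of human-readable problems found in a FIELDS dict."""
    errors = []
    # Assumption: these string fields must be non-empty for the query to run.
    for key in ('join', 'from_dataset', 'from_table', 'to_dataset'):
        if not fields.get(key):
            errors.append("'%s' is empty" % key)
    # Assumption: at least one column to correlate is needed.
    if not fields.get('correlate'):
        errors.append("'correlate' lists no columns")
    if fields.get('significance') not in SIGNIFICANCE_CHOICES:
        errors.append("'significance' must be one of %s" % SIGNIFICANCE_CHOICES)
    if fields.get('type') not in TYPE_CHOICES:
        errors.append("'type' must be one of %s" % TYPE_CHOICES)
    return errors


# Hypothetical values, for illustration only.
EXAMPLE_FIELDS = {
  'auth': 'service',
  'join': 'zip_code',
  'pass': ['store_name'],
  'sum': ['revenue'],
  'correlate': ['conversion_percent'],
  'from_dataset': 'my_dataset',
  'from_table': 'my_sales_table',
  'significance': '90',
  'to_dataset': 'my_dataset',
  'type': 'view',
}

print(check_fields(EXAMPLE_FIELDS))
```

Note that `significance` is a string, so `'90'` passes the check while the number `90` does not.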


#4. Execute Census Data Correlation
This does NOT need to be modified unless you are changing the recipe. Click play.


In [None]:
from starthinker.util.configuration import execute
from starthinker.util.recipe import json_set_fields

TASKS = [
  {
    'census':{
      'auth':{'field':{'name':'auth','kind':'authentication','order':0,'default':'service','description':'Credentials used for writing data.'}},
      'correlate':{
        'join':{'field':{'name':'join','kind':'string','order':1,'default':'','description':'Name of column to join on, must match Census Geo_Id column.'}},
        'pass':{'field':{'name':'pass','kind':'string_list','order':2,'default':[],'description':'Comma separated list of columns to pass through.'}},
        'sum':{'field':{'name':'sum','kind':'string_list','order':3,'default':[],'description':'Comma separated list of columns to sum, optional.'}},
        'correlate':{'field':{'name':'correlate','kind':'string_list','order':4,'default':[],'description':'Comma separated list of percentage columns to correlate.'}},
        'dataset':{'field':{'name':'from_dataset','kind':'string','order':5,'default':'','description':'Existing BigQuery dataset.'}},
        'table':{'field':{'name':'from_table','kind':'string','order':6,'default':'','description':'Table to use as join data.'}},
        'significance':{'field':{'name':'significance','kind':'choice','order':7,'default':'80','description':'Select level of significance to test.','choices':['80','90','98','99','99.5','99.95']}}
      },
      'to':{
        'dataset':{'field':{'name':'to_dataset','kind':'string','order':9,'default':'','description':'Existing BigQuery dataset.'}},
        'type':{'field':{'name':'type','kind':'choice','order':10,'default':'table','description':'Write Census_Percent as table or view.','choices':['table','view']}}
      }
    }
  }
]

json_set_fields(TASKS, FIELDS)

execute(CONFIG, TASKS, force=True)
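
The cell above relies on each `{'field': {...}}` placeholder in `TASKS` being replaced by the matching value from `FIELDS`. The sketch below shows that idea in isolation. It is not StarThinker's `json_set_fields` implementation: `set_fields_sketch` is a hypothetical name, it returns a new structure rather than modifying its input, and falling back to the placeholder's `default` when a name is missing is an assumption.

```python
def set_fields_sketch(node, values):
    """Return a copy of `node` with every {'field': {...}} placeholder resolved."""
    if isinstance(node, dict):
        # A placeholder is a dict whose only key is 'field'.
        if set(node) == {'field'}:
            spec = node['field']
            return values.get(spec['name'], spec.get('default'))
        return {key: set_fields_sketch(value, values) for key, value in node.items()}
    if isinstance(node, list):
        return [set_fields_sketch(item, values) for item in node]
    return node  # plain values are kept as they are


# A trimmed-down task with two placeholders; 'geo_id' is a made-up column name.
tasks = [{
  'census': {
    'auth': {'field': {'name': 'auth', 'kind': 'authentication', 'default': 'service'}},
    'correlate': {
      'join': {'field': {'name': 'join', 'kind': 'string', 'default': ''}},
    },
  }
}]

print(set_fields_sketch(tasks, {'join': 'geo_id'}))
```

In this example `join` takes the supplied value, while `auth` falls back to its default because no value was given for it.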
