Create infrastructure for sample code #331

Merged
merged 3 commits into from Aug 7, 2020
21 changes: 21 additions & 0 deletions useful_resources/sample_code/README.md
@@ -0,0 +1,21 @@
# Sample Scripts

The scripts in this folder perform data operations, typically moving data from one connector to another. To run the same operation yourself, you should only need to change the configuration variables in the `Configuration Variables` section of each script. You can also use these scripts as a starting point for new scripts that perform similar operations.
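
As a minimal sketch of how that configuration section behaves, the snippet below mirrors the setup loop used in the scripts: any value filled in is exported as an environment variable, and any value left empty falls back to whatever the environment already provides. The variable names and values are placeholders, and `apply_config` is a hypothetical helper written for illustration; the scripts themselves run the loop inline.

```python
import os

# Placeholder names and values for illustration only.
config_vars = {
    "EXAMPLE_VARIABLE_NAME": "example-value",  # filled in, so it is exported
    "ANOTHER_EXAMPLE_VARIABLE_NAME": "",       # left empty, so the environment is used
}


def apply_config(config_vars, environ=os.environ):
    """Copy every non-empty value into environ and return the names that were set."""
    applied = []
    for name, value in config_vars.items():
        if value.strip() != "":
            environ[name] = value
            applied.append(name)
    return applied


applied = apply_config(config_vars)
```

Connectors created after this step can then read their credentials from the environment, which is why the scripts instantiate them only once the loop has run.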

If you can't find the script you want here, you can create an issue in the tracker with the label [script request](https://github.com/move-coop/parsons/labels/script%20request). Please describe in as much detail as possible what you would like the script to do.

If you have a script you'd like to add, you have two options. You can create an issue in the tracker with the contents of your script, and add the label [sample script to add](https://github.com/move-coop/parsons/labels/script%20to%20add). Or you can add the script yourself.

If you wish to add the script yourself, please use the `template_script.py` file so that your contribution's structure is consistent with the other scripts. Please keep most comments, as they are directed at the eventual user of your script, but delete any comments labeled "//To Script Writer//" once you understand their advice.

Please also add your new script to the table below.

# Existing Scripts

| File Name | Brief Description | Connectors Used | Written For Parsons Version |
| ----------- | ----------- | ----------- | ----------- |
| apply_activist_code.py | Gets activist codes stored in Redshift and applies them to users in VAN | Redshift, VAN | unknown |
| s3_to_redshift.py | Moves files from S3 to Redshift | Redshift, S3 | unknown |
| s3_to_s3.py | Gets files from a vendor S3 bucket and moves them to your own S3 bucket | S3 | unknown |
| update_user_in_actionkit.py | Adds a voterbase_id (the Targetsmart ID) to users in ActionKit | Redshift, ActionKit | unknown |

56 changes: 33 additions & 23 deletions useful_resources/sample_code/apply_activist_code.py
@@ -1,36 +1,46 @@
from parsons import Table, Redshift, VAN
import logging
import os
### METADATA

# Connectors: Redshift, VAN
# Description: Gets activist codes stored in Redshift and applies them to users in VAN
# Parsons Version: unknown


### CONFIGURATION

# Redshift setup - this assumes a Civis Platform parameter called "REDSHIFT"
# Set the configuration variables below or set environmental variables of the same name and leave these
# with empty strings. We recommend using environmental variables if possible.

set_env_var(os.environ['REDSHIFT_PORT'])
set_env_var(os.environ['REDSHIFT_DB'])
set_env_var(os.environ['REDSHIFT_HOST'])
set_env_var(os.environ['REDSHIFT_CREDENTIAL_USERNAME'])
set_env_var(os.environ['REDSHIFT_CREDENTIAL_PASSWORD'])
rs = Redshift()
config_vars = {
    # Redshift
    "REDSHIFT_PORT": "",
    "REDSHIFT_DB": "",
    "REDSHIFT_HOST": "",
    "REDSHIFT_CREDENTIAL_USERNAME": "",
    "REDSHIFT_CREDENTIAL_PASSWORD": "",
    # Van
    "VAN_PASSWORD": "",
    "VAN_DB_NAME": ""
}

# AWS setup - this assumes a Civis Platform parameter called "AWS"

set_env_var('S3_TEMP_BUCKET', 'parsons-tmc')
set_env_var('AWS_ACCESS_KEY_ID', os.environ['AWS_ACCESS_KEY_ID'])
set_env_var('AWS_SECRET_ACCESS_KEY', os.environ['AWS_SECRET_ACCESS_KEY'])
s3 = S3()
### CODE

from parsons import Table, Redshift, VAN
from parsons import logger
import os

# Setup

# Logging
for name, value in config_vars.items():  # sets variables if provided in this script
    if value.strip() != "":
        os.environ[name] = value

logger = logging.getLogger(__name__)
_handler = logging.StreamHandler()
_formatter = logging.Formatter('%(levelname)s %(message)s')
_handler.setFormatter(_formatter)
logger.addHandler(_handler)
logger.setLevel('INFO')
rs = Redshift() # just create Redshift() - VAN connector is created dynamically below

# Create dictionary of VAN states and API keys from multiline Civis credential

myv_states = {x.split(",")[0]: x.split(",")[1] for x in os.environ['VAN_PASSWORD'].split("\r\n")}
myv_keys = {k: VAN(api_key=v, db='MyVoters') for k,v in myv_states.items()}
myv_keys = {k: VAN(api_key=v, db=os.environ['VAN_DB_NAME']) for k,v in myv_states.items()}

# Create simple set of states for insertion into SQL
states = "','".join([s for s in myv_keys])
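
The credential parsing above expects `VAN_PASSWORD` to hold one `STATE,api_key` pair per line. A minimal standalone sketch of that format follows, using made-up states and keys and a hypothetical helper named `parse_state_keys`. Note that it uses `str.splitlines()` instead of `split("\r\n")` so that either line ending works; that is a variation for illustration, not what the script does.

```python
def parse_state_keys(raw):
    """Parse 'STATE,api_key' lines into a dict, skipping blank lines."""
    pairs = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        # Split on the first comma only, in case a key contains commas.
        state, key = line.split(",", 1)
        pairs[state.strip()] = key.strip()
    return pairs


# Made-up credential contents for illustration.
example = "PA,hypothetical-key-1\r\nOH,hypothetical-key-2"
myv_states = parse_state_keys(example)

# Same join as in the script: the caller supplies the outer quotes in SQL.
states = "','".join(myv_states)
```

Wrapping `states` in single quotes then gives a comma-separated list of quoted state codes suitable for a SQL `IN (...)` clause.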
59 changes: 35 additions & 24 deletions useful_resources/sample_code/s3_to_redshift.py
@@ -1,36 +1,47 @@
import os
import logging
from parsons import Redshift, S3, utilities
### METADATA

# Redshift setup - this assumes a Civis Platform parameter called "REDSHIFT"
# Connectors: Redshift, S3
# Description: Moves files from S3 to Redshift
# Parsons Version: unknown

set_env_var(os.environ['REDSHIFT_PORT'])
set_env_var(os.environ['REDSHIFT_DB'])
set_env_var(os.environ['REDSHIFT_HOST'])
set_env_var(os.environ['REDSHIFT_CREDENTIAL_USERNAME'])
set_env_var(os.environ['REDSHIFT_CREDENTIAL_PASSWORD'])
rs = Redshift()

# AWS setup - this assumes a Civis Platform parameter called "AWS"
### CONFIGURATION

set_env_var('S3_TEMP_BUCKET', 'parsons-tmc')
set_env_var('AWS_ACCESS_KEY_ID', os.environ['AWS_ACCESS_KEY_ID'])
set_env_var('AWS_SECRET_ACCESS_KEY', os.environ['AWS_SECRET_ACCESS_KEY'])
s3 = S3()
# Set the configuration variables below or set environmental variables of the same name and leave these
# with empty strings. We recommend using environmental variables if possible.

# Logging
config_vars = {
    # S3
    "AWS_ACCESS_KEY_ID": "",
    "AWS_SECRET_ACCESS_KEY": "",
    "BUCKET": "",
    # Redshift
    "REDSHIFT_PORT": "",
    "REDSHIFT_DB": "",
    "REDSHIFT_HOST": "",
    "REDSHIFT_CREDENTIAL_USERNAME": "",
    "REDSHIFT_CREDENTIAL_PASSWORD": "",
    "S3_TEMP_BUCKET": "",
}

logger = logging.getLogger(__name__)
_handler = logging.StreamHandler()
_formatter = logging.Formatter('%(levelname)s %(message)s')
_handler.setFormatter(_formatter)
logger.addHandler(_handler)
logger.setLevel('INFO')

### CODE

bucket = os.environ['BUCKET']
schema = os.environ['SCHEMA']
import os
from parsons import Redshift, S3, utilities, logger

# Setup

for name, value in config_vars.items():  # sets variables if provided in this script
    if value.strip() != "":
        os.environ[name] = value

s3 = S3()
rs = Redshift()

# Code

bucket = os.environ['BUCKET']
keys = s3.list_keys(bucket)
files = keys.keys()

71 changes: 36 additions & 35 deletions useful_resources/sample_code/s3_to_s3.py
@@ -1,53 +1,54 @@
import os
import logging
from parsons import Redshift, S3, utilities
### METADATA

# Connectors: S3
# Description: Gets files from source S3 bucket and moves them to destination S3 bucket
# Parsons Version: unknown


### CONFIGURATION

# Redshift setup - this assumes a Civis Platform parameter called "REDSHIFT"
# Set the configuration variables below or set environmental variables of the same name and leave these
# with empty strings. We recommend using environmental variables if possible.

set_env_var(os.environ['REDSHIFT_PORT'])
set_env_var(os.environ['REDSHIFT_DB'])
set_env_var(os.environ['REDSHIFT_HOST'])
set_env_var(os.environ['REDSHIFT_CREDENTIAL_USERNAME'])
set_env_var(os.environ['REDSHIFT_CREDENTIAL_PASSWORD'])
rs = Redshift()
config_vars = {
    # S3 (source)
    "AWS_SOURCE_ACCESS_KEY_ID": "",
    "AWS_SOURCE_SECRET_ACCESS_KEY": "",
    # S3 (destination)
    'AWS_DESTINATION_SECRET_ACCESS_KEY': "",
    'AWS_DESTINATION_ACCESS_KEY_ID': ""
}

# AWS setup - this assumes a Civis Platform parameter called "AWS"
DESTINATION_BUCKET = None  # set this to the name of the destination bucket before running

set_env_var('S3_TEMP_BUCKET', 'parsons-tmc')
set_env_var('AWS_ACCESS_KEY_ID', os.environ['AWS_USERNAME'])
set_env_var('AWS_SECRET_ACCESS_KEY', os.environ['AWS_PASSWORD'])
s3 = S3()

# 2nd AWS setup - this assumes a Civis Platform parameter called "AWS_VENDOR"
s3_vendor = S3(os.environ['AWS_VENDOR_ACCESS_KEY_ID'],
os.environ['AWS_VENDOR_SECRET_ACCESS_KEY'])
### CODE

# Logging
import os
from parsons import Redshift, S3, utilities, logger

# Setup

logger = logging.getLogger(__name__)
_handler = logging.StreamHandler()
_formatter = logging.Formatter('%(levelname)s %(message)s')
_handler.setFormatter(_formatter)
logger.addHandler(_handler)
logger.setLevel('INFO')
for name, value in config_vars.items():  # sets variables if provided in this script
    if value.strip() != "":
        os.environ[name] = value

s3_source = S3(os.environ['AWS_SOURCE_ACCESS_KEY_ID'], os.environ['AWS_SOURCE_SECRET_ACCESS_KEY'])
s3_destination = S3(os.environ['AWS_DESTINATION_ACCESS_KEY_ID'], os.environ['AWS_DESTINATION_SECRET_ACCESS_KEY'])

# Let's write some code!

# Get Vendor Bucket Information
bucket_guide = s3_vendor.list_buckets()
# Get Source Bucket Information
bucket_guide = s3_source.list_buckets()
logger.info(f"We will be getting data from {len(bucket_guide)} buckets...")

# Define Destination Bucket
dest_bucket = 'tmc-data'

# Moving Files from Vendor s3 Bucket to Destination s3 Bucket
# Moving Files from Source s3 Bucket to Destination s3 Bucket
for bucket in bucket_guide:

    logger.info(f"Working on files for {bucket}...")
    keys = s3_vendor.list_keys(bucket)
    keys = s3_source.list_keys(bucket)
    logger.info(f"Found {len(keys)}.")
    for key in keys:
        temp_file = s3_vendor.get_file(bucket, key)
        tmc_key = f"vendor_exports/{bucket}/{key}"
        s3.put_file(dest_bucket, tmc_key, temp_file)
        temp_file = s3_source.get_file(bucket, key)
        s3_destination.put_file(DESTINATION_BUCKET, key, temp_file)
        utilities.files.close_temp_file(temp_file)
35 changes: 35 additions & 0 deletions useful_resources/sample_code/template_script.py
@@ -0,0 +1,35 @@
### METADATA

# Connectors:
# Description:

### CONFIGURATION

# Set the configuration variables below or set environmental variables of the same name and leave these
# with empty strings. We recommend using environmental variables if possible.

# //To Script Writer// : add the environmental variable name but not the value
# //To Script Writer// : separate environmental variables by connector

config_vars = {
    # Connector 1:
    "EXAMPLE_VARIABLE_NAME": "",
    # Connector 2:
    "ANOTHER_EXAMPLE_VARIABLE_NAME": ""
}


### CODE

import os # //To Script Writer//: import any other packages your script uses
from parsons import utilities, logger # //To Script Writer//: import any connectors your script uses

# Setup

for name, value in config_vars.items():  # if variables specified above, sets them as environmental variables
    if value.strip() != "":
        os.environ[name] = value

# //To Script Writer// : instantiate connectors here, eg: rs = Redshift().

# Code
71 changes: 36 additions & 35 deletions useful_resources/sample_code/update_user_in_actionkit.py
@@ -1,45 +1,46 @@
# Civis Container Link: https://platform.civisanalytics.com/spa/#/scripts/containers/33553735
import sys
import os
import datetime
import logging
from parsons import Redshift, Table, ActionKit

# Redshift setup - this assumes a Civis Platform parameter called "REDSHIFT"

set_env_var(os.environ['REDSHIFT_PORT'])
set_env_var(os.environ['REDSHIFT_DB'])
set_env_var(os.environ['REDSHIFT_HOST'])
set_env_var(os.environ['REDSHIFT_CREDENTIAL_USERNAME'])
set_env_var(os.environ['REDSHIFT_CREDENTIAL_PASSWORD'])
rs = Redshift()
### METADATA

# Connectors: Redshift, ActionKit
# Description: Adds a voterbase_id (the Targetsmart ID) to users in ActionKit
# Parsons Version: unknown


### CONFIGURATION

# AWS setup - this assumes a Civis Platform parameter called "AWS"
# Set the configuration variables below or set environmental variables of the same name and leave these
# with empty strings. We recommend using environmental variables if possible.

set_env_var('S3_TEMP_BUCKET', 'parsons-tmc')
set_env_var('AWS_ACCESS_KEY_ID', os.environ['AWS_ACCESS_KEY_ID'])
set_env_var('AWS_SECRET_ACCESS_KEY', os.environ['AWS_SECRET_ACCESS_KEY'])
s3 = S3()
config_vars = {
    # Redshift
    "REDSHIFT_PORT": "",
    "REDSHIFT_DB": "",
    "REDSHIFT_HOST": "",
    "REDSHIFT_CREDENTIAL_USERNAME": "",
    "REDSHIFT_CREDENTIAL_PASSWORD": "",
    # ActionKit
    "AK_USERNAME": "",
    "AK_PASSWORD": "",
    "AK_DOMAIN": ""
}

# ActionKit Setup - this assumes a Civis Platform Custom Credential parameter called "AK"
username = os.environ['AK_USERNAME']
password = os.environ['AK_PASSWORD']
domain = os.environ['AK_DOMAIN']
ak = ActionKit(domain=domain, username=username, password=password)
### CODE

# Logging
import sys, os, datetime
from parsons import Redshift, Table, ActionKit, logger

logger = logging.getLogger(__name__)
_handler = logging.StreamHandler()
_formatter = logging.Formatter('%(levelname)s %(message)s')
_handler.setFormatter(_formatter)
logger.addHandler(_handler)
logger.setLevel('INFO')
# Setup

for name, value in config_vars.items():  # sets variables if provided in this script
    if value.strip() != "":
        os.environ[name] = value

rs = Redshift()
ak = ActionKit()

# This example involves adding a voterbase_id (the Targetsmart ID) to a user in ActionKit

timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") #timestamp to be used for log table
loaded = [['id','voterbase_id','date_updated']] #column names for log table
timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") # timestamp to be used for log table
loaded = [['id','voterbase_id','date_updated']] # column names for log table

source_table = 'schema.table' # this is the table with the information I'm pushing to ActionKit
log_table = 'schema.table' # this is where we will log every user id that gets marked with a voterbase_id
@@ -65,7 +66,7 @@
loaded.append([row['id'], row['voterbase_id'],timestamp])

logger.info("Done with loop! Loading into log table...")
Table(loaded).to_redshift(log_table,if_exists='append')
Table(loaded).to_redshift(log_table, if_exists='append')

else:
    logger.info("No one to update today...")