# Participant Management - Translation & Communication
## Large Scale Computing for the Social Sciences, Final Project
### Max Kramer
---
It is recommended to run this notebook on an EMR cluster, though it can also be run locally or on the Midway cluster. `awscli` must be installed, and your AWS credentials file (typically `~/.aws/credentials`) must be updated with your AWS information. If you intend to process more than a few hundred participants, running on an EMR cluster is **strongly** recommended.

### Import libraries and initialize AWS clients

In [1]:
import csv
import boto3
import awscli
import pandas as pd

In [2]:
translate = boto3.client('translate') # initialize translation client

s3 = boto3.client('s3') # initialize s3 client for data storage

sns = boto3.client('sns') # initialize sns client for communication

### Read in data

In [3]:
response = s3.list_objects(Bucket='lcssfinal')
print(response)

{'ResponseMetadata': {'RequestId': 'MMN29DDCHQASV1GB', 'HostId': 'MaZEZ6H6yGnN0s9mB3WwOhE1M0tgirVJBnuTvs3W/dLCS91MgoVsKBgeuW/8ozT3lU58GcJdnoA=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'MaZEZ6H6yGnN0s9mB3WwOhE1M0tgirVJBnuTvs3W/dLCS91MgoVsKBgeuW/8ozT3lU58GcJdnoA=', 'x-amz-request-id': 'MMN29DDCHQASV1GB', 'date': 'Sat, 05 Jun 2021 00:15:10 GMT', 'x-amz-bucket-region': 'us-east-1', 'content-type': 'application/xml', 'transfer-encoding': 'chunked', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'IsTruncated': False, 'Marker': '', 'Contents': [{'Key': 'filter_output.csv', 'LastModified': datetime.datetime(2021, 6, 5, 0, 14, 32, tzinfo=tzutc()), 'ETag': '"1a3cc4ad12991ab24763671e919c430d"', 'Size': 748, 'StorageClass': 'STANDARD', 'Owner': {'DisplayName': 'awslabsc0w2139152t1616996578', 'ID': 'cb76d247bbf1e7440d5f28ff9ce5451937b64f8c71e297b2842c864829666f61'}}, {'Key': 'ingest_output.csv', 'LastModified': datetime.datetime(2021, 6, 4, 23, 35, 58, tzinfo=tzutc()), 'ETag': '"f4b1e13f
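The raw response above is verbose, and usually only the object keys matter for the next step. As a minimal sketch (the helper name `object_keys` is hypothetical, and `sample` is a trimmed-down stand-in for the printed response), the keys could be pulled out like this:

```python
def object_keys(response):
    """Return the object keys from an S3 list_objects-style response dict."""
    # 'Contents' is absent when the bucket (or prefix) holds no objects
    return [item["Key"] for item in response.get("Contents", [])]


# Trimmed-down stand-in for the response printed above
sample = {
    "IsTruncated": False,
    "Contents": [
        {"Key": "filter_output.csv", "Size": 748},
        {"Key": "ingest_output.csv"},
    ],
}
print(object_keys(sample))  # ['filter_output.csv', 'ingest_output.csv']
```

In the notebook this would be called as `object_keys(response)`. If `IsTruncated` were `True`, further requests would be needed to see the remaining keys.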

In [4]:
obj = s3.get_object(Bucket='lcssfinal', Key='filter_output.csv') # fetch the filtered participant file
df = pd.read_csv(obj['Body']) # the response body is file-like, so pandas can read it directly

df

Unnamed: 0,First Name,Last Name,Age,Height,Weight,Gender Identity,Handedness,Email Address,Cell Phone Number,Preferred Language
0,Coen,Needell,23,66,140,male,right,mkramer1@uchicago.edu,7733185225,en
1,Leon,Zhou,19,73,140,male,right,mkramer1@uchicago.edu,7733185225,es
2,Trent,Davis,20,59,125,male,right,mkramer1@uchicago.edu,7733185225,fr
3,Coen,Needell,23,66,140,male,right,mkramer1@uchicago.edu,7733185225,en
4,Leon,Zhou,19,73,140,male,right,mkramer1@uchicago.edu,7733185225,es
5,Trent,Davis,20,59,125,male,right,mkramer1@uchicago.edu,7733185225,fr
6,Coen,Needell,23,66,140,male,right,mkramer1@uchicago.edu,7733185225,en
7,Leon,Zhou,19,73,140,male,right,mkramer1@uchicago.edu,7733185225,es
8,Trent,Davis,20,59,125,male,right,mkramer1@uchicago.edu,7733185225,fr


### Generate SNS Topic & Subscribe Participants

In [5]:
ARNs = {} # map language code -> SNS TopicArn

for index, row in df.iterrows():
    cell_number = str(row['Cell Phone Number']) # phone number as a string of digits
    email = row['Email Address'] # email address
    lang = row['Preferred Language'] # language code, e.g. 'en'

    # create_topic is idempotent: an existing topic with this name is returned, not duplicated
    topic = sns.create_topic(Name="IRB_compliance_{}".format(lang)) # SNS topic for IRB forms
    IRB_ARN = topic['TopicArn'] # get TopicArn
    ARNs[lang] = IRB_ARN # save ARN under its language code

    resp_email = sns.subscribe( # subscribe email
        TopicArn=IRB_ARN,
        Protocol='email',
        Endpoint=email)

    resp_cell = sns.subscribe( # subscribe SMS; the "+1" prefix assumes US numbers
        TopicArn=IRB_ARN,
        Protocol='sms',
        Endpoint="+1" + cell_number)
    # TEST ONLY: stop after the first participant to avoid hundreds of emails.
    # With this break in place, ARNs holds only the first row's language; remove it for a full run.
    break
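The loop above requests a topic on every row and builds the SMS endpoint by prepending "+1". As a minimal sketch of one alternative, the version below creates each language's topic once and validates phone numbers before subscribing. The names `to_e164_us` and `subscribe_by_language` are hypothetical, the phone handling assumes 10-digit US numbers (as the "+1" prefix does), and `sns` is assumed to be any object exposing `create_topic` and `subscribe` the way the boto3 SNS client does.

```python
def to_e164_us(number):
    """Format a US phone number as E.164, e.g. 7735550100 -> '+17735550100'."""
    digits = "".join(ch for ch in str(number) if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code if already present
    if len(digits) != 10:
        raise ValueError("expected a 10-digit US number, got {!r}".format(number))
    return "+1" + digits


def subscribe_by_language(sns, participants):
    """Create one topic per language and subscribe each participant to it.

    `participants` is an iterable of dicts keyed by the CSV column names.
    Returns a dict mapping language code -> TopicArn.
    """
    arns = {}
    for person in participants:
        lang = person["Preferred Language"]
        if lang not in arns:  # only call create_topic the first time a language appears
            topic = sns.create_topic(Name="IRB_compliance_{}".format(lang))
            arns[lang] = topic["TopicArn"]
        sns.subscribe(TopicArn=arns[lang], Protocol="email",
                      Endpoint=person["Email Address"])
        sns.subscribe(TopicArn=arns[lang], Protocol="sms",
                      Endpoint=to_e164_us(person["Cell Phone Number"]))
    return arns
```

In the notebook this could be called as `ARNs = subscribe_by_language(sns, df.to_dict('records'))`. Taking the client as an argument also makes it possible to pass in a stand-in object during testing, so no messages are sent.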

### Load in IRB Form

In [6]:
with open("IRB.txt", "r") as f: # read in IRB form; the file is closed automatically
    IRB_form = f.read()
print(IRB_form)

University of Chicago Online Consent Form for Research Participation

Study Number: IRB19-1395
Study Title: Human memory & cognition
Researcher: Wilma A. Bainbridge, PhD

Description: We are researchers at the University of Chicago doing a research study about how human sensation (vision, audition) and memory interact. During this study, you will see visual stimuli (e.g., images, videos, text) presented on a computer monitor, and/or hear sounds (e.g., music, tones, voices, sounds from a video) through headphones or speakers. You will respond to the task instructions with a button press, typing, mouse movement, drawing, or speaking through a microphone. You may also be asked to complete questionnaires that assess cognitive abilities. Participation will take less than 1 hour. Your participation is voluntary and you can withdraw at any time. Depending on the experiment, you may receive between $6-$10/hour for your participation.

Risks and Benefits: Taking part in this research study may 
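The form runs to several paragraphs, and translation services generally cap how much text a single request may carry. As a hedged sketch, the helper below splits text on blank lines into chunks that stay under a byte budget. The name `split_by_bytes` is hypothetical, and the 5000-byte default is a conservative placeholder rather than a documented value; the actual limit should be checked against the Amazon Translate documentation.

```python
def split_by_bytes(text, max_bytes=5000):
    """Group paragraphs into chunks whose UTF-8 size is at most max_bytes.

    Paragraphs are never split, so a single paragraph larger than
    max_bytes is returned as its own (oversized) chunk.
    """
    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate  # paragraph still fits in the current chunk
        else:
            if current:
                chunks.append(current)
            current = paragraph  # start a new chunk with this paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be translated separately and the results rejoined with `"\n\n".join(...)`. Splitting on paragraph boundaries keeps sentences intact, which matters because translation quality depends on sentence context.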

### Translate Message and Send

In [None]:
for val in df['Preferred Language'].unique(): # iterate over languages
    languageCode = val # preferred language code for this group of participants
    ARN = ARNs[languageCode] # TopicArn for this language (must have been created in the subscribe step)

    subject = translate.translate_text( # translate message header
        Text="BrainBridge Lab IRB Compliance Form",
        SourceLanguageCode="auto",
        TargetLanguageCode=languageCode)

    message = translate.translate_text( # translate IRB form
        Text=IRB_form,
        SourceLanguageCode="auto",
        TargetLanguageCode=languageCode)

    # Get contents for IRB emails
    body = message['TranslatedText']
    headline = subject['TranslatedText']

    # Publish to corresponding TopicArn
    sns.publish(TopicArn=ARN,
                Message=body,
                Subject=headline)

# NOTE: This cell was not run here, to avoid sending hundreds of emails. It was confirmed to work on smaller data; see the Git repo for the resulting email.
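One caveat when publishing translated headlines: SNS documents that an email `Subject` must be ASCII text without line breaks and under 100 characters, while Spanish or French translations can contain accented characters. A minimal sketch of a sanitizing step follows. The name `ascii_subject` and its `fallback` parameter are hypothetical, and stripping accents is only one possible policy; keeping the subject in English would be another.

```python
import unicodedata


def ascii_subject(text, max_len=99, fallback="IRB Compliance Form"):
    """Reduce text to a single-line ASCII string of at most max_len characters."""
    # NFKD splits accented letters into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII bytes removes the combining marks (and any other non-ASCII characters)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse line breaks and repeated whitespace into single spaces
    single_line = " ".join(ascii_text.split())
    # Scripts with no ASCII base letters reduce to an empty string, so fall back
    return single_line[:max_len] or fallback
```

In the publish call above, this would be used as `Subject=ascii_subject(headline)`; the message body needs no such treatment.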