# Participant Management - Translation & Communication
## Large Scale Computing for the Social Sciences, Final Project
### Max Kramer
---
This notebook can be run locally or on the Midway cluster, but an EMR cluster is the recommended environment, and it is **strongly** recommended if you intend to process more than a few hundred participants. You must have `awscli` installed and your AWS credentials file (typically `~/.aws/credentials`) updated with your AWS information.

### Import libraries and initialize AWS clients

In [41]:
import csv
import boto3
import awscli
import pandas as pd

In [18]:
translate = boto3.client('translate') # initialize translation client

s3 = boto3.client('s3') # initialize s3 client for data storage

sns = boto3.client('sns') # initialize sns client for communication

### Read in data

In [101]:
response = s3.list_objects(Bucket='lcssfinal')
print(response)

{'ResponseMetadata': {'RequestId': 'PC3RC39X8EXWRWGX', 'HostId': 'djI9O5U5/h46Fqn2i2B9O1h6Sr1Z0p90pG1Wz4BuGKGSPnZnOc4mMy1PILMH7k7VPKml6K1fR6Y=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'djI9O5U5/h46Fqn2i2B9O1h6Sr1Z0p90pG1Wz4BuGKGSPnZnOc4mMy1PILMH7k7VPKml6K1fR6Y=', 'x-amz-request-id': 'PC3RC39X8EXWRWGX', 'date': 'Fri, 04 Jun 2021 07:18:18 GMT', 'x-amz-bucket-region': 'us-east-1', 'content-type': 'application/xml', 'transfer-encoding': 'chunked', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'IsTruncated': False, 'Marker': '', 'Contents': [{'Key': 'ingest_output.csv', 'LastModified': datetime.datetime(2021, 6, 4, 6, 17, 28, tzinfo=tzutc()), 'ETag': '"8e84f64ca943e32bf81f94d9e23c9ead"', 'Size': 838, 'StorageClass': 'STANDARD', 'Owner': {'DisplayName': 'awslabsc0w2139152t1616996578', 'ID': 'cb76d247bbf1e7440d5f28ff9ce5451937b64f8c71e297b2842c864829666f61'}}], 'Name': 'lcssfinal', 'Prefix': '', 'MaxKeys': 1000, 'EncodingType': 'url'}


In [87]:
df_list = []

for file in response['Contents']:
    obj = s3.get_object(Bucket='lcssfinal', Key=file['Key'])
    obj_df = pd.read_csv(obj['Body'])
    df_list.append(obj_df)
    
df = pd.concat(df_list)

df

Unnamed: 0,First Name,Last Name,Date of Birth,Height,Weight,Gender Identity,Handedness,Email Address,Cell Phone Number,Preferred Language
0,Max,Kramer,07/25/1997,"6'0""",215lbs,Male,Right,mkramer1@uchicago.edu,7733185225,en
1,Coen,Needell,07/22/1994,"5'10""",140lbs,Male,Right,mkramer1@uchicago.edu,7733185225,en
2,Deepa,Prasad,04/02/1993,"5'7""",120lbs,Female,Left,mkramer1@uchicago.edu,7733185225,en
3,Wilma,Bainbridge,10/12/1998,"5'4""",125lbs,Female,Left,mkramer1@uchicago.edu,7733185225,es
4,Leon,Zhou,07/25/1997,"6'1""",140lbs,Male,Right,mkramer1@uchicago.edu,7733185225,es
5,Madeline,Gedvila,07/22/1994,"5'5""",155lbs,Female,Right,mkramer1@uchicago.edu,7733185225,zh
6,Rebecca,Greenberg,04/02/1993,"5'6""",110lbs,Female,Left,mkramer1@uchicago.edu,7733185225,zh
7,Trent,Davis,10/12/1998,"5'7""",125lbs,Male,Right,mkramer1@uchicago.edu,7733185225,fr
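The `Unnamed: 0` column in the table above is most likely the row index that was written out when the CSV was saved. A minimal sketch of one way to avoid carrying it along, assuming that interpretation is correct: read the first column as the index and renumber rows when concatenating. The sample CSV strings and the `load_participants` helper are hypothetical stand-ins for the S3 objects, so the example runs without AWS access.

```python
import io
import pandas as pd

# Two small in-memory CSVs stand in for S3 objects (hypothetical sample data).
csv_a = ",First Name,Preferred Language\n0,Ada,en\n1,Luis,es\n"
csv_b = ",First Name,Preferred Language\n0,Mei,zh\n"

def load_participants(csv_bodies):
    """Read each CSV, treating its first column as the saved index, then stack them."""
    frames = [pd.read_csv(body, index_col=0) for body in csv_bodies]
    # ignore_index=True renumbers rows 0..n-1 so labels stay unique across files
    return pd.concat(frames, ignore_index=True)

participants = load_participants([io.StringIO(csv_a), io.StringIO(csv_b)])
print(participants)
```

With real data, the `obj['Body']` streams returned by `s3.get_object` could be passed in place of the `io.StringIO` buffers.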


### Generate SNS Topic & Subscribe Participants

In [116]:
ARNs = {} # Store ARNs for SNS

for index, row in df.iterrows():
    cell_number = str(row['Cell Phone Number']) # get phone number
    email = row['Email Address'] # get email address
    lang = row['Preferred Language'] # get preferred language code
    # create_topic is idempotent: an existing topic with this name is returned, not duplicated
    topic = sns.create_topic(Name="IRB_compliance_{}".format(lang)) # SNS topic for IRB forms in this language
    IRB_ARN = topic['TopicArn'] # get TopicARN
    ARNs[lang] = IRB_ARN # save ARN with languageCode

    resp_email = sns.subscribe( # Subscribe Email
        TopicArn=IRB_ARN,
        Protocol='email',
        Endpoint=email)

    resp_cell = sns.subscribe( # Subscribe SMS
        TopicArn=IRB_ARN,
        Protocol='sms',
        Endpoint="+1" + cell_number) # "+1" assumes US phone numbers
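The loop above calls `create_topic` once per participant, which works because the call returns the existing topic when the name is already taken. As a hedged alternative, the sketch below creates each language's topic only the first time that language is seen. `subscribe_by_language` and `RecordingSNS` are hypothetical names introduced for illustration; `RecordingSNS` is a stand-in that records calls instead of contacting AWS, and the sample participants are made up.

```python
class RecordingSNS:
    """Hypothetical stand-in for the boto3 SNS client; records calls, sends nothing."""

    def __init__(self):
        self.topics = []
        self.subscriptions = []

    def create_topic(self, Name):
        self.topics.append(Name)
        return {"TopicArn": "arn:aws:sns:us-east-1:000000000000:" + Name}

    def subscribe(self, TopicArn, Protocol, Endpoint):
        self.subscriptions.append((TopicArn, Protocol, Endpoint))
        return {"SubscriptionArn": "pending confirmation"}


def subscribe_by_language(sns_client, participants, country_code="+1"):
    """Create one topic per language and subscribe each participant to it.

    `participants` is an iterable of dicts keyed by the column names used above.
    The "+1" default assumes US phone numbers, as in the original loop.
    """
    arns = {}
    for person in participants:
        lang = person["Preferred Language"]
        if lang not in arns:  # only create the topic on first sight of a language
            topic = sns_client.create_topic(Name="IRB_compliance_{}".format(lang))
            arns[lang] = topic["TopicArn"]
        sns_client.subscribe(TopicArn=arns[lang], Protocol="email",
                             Endpoint=person["Email Address"])
        sns_client.subscribe(TopicArn=arns[lang], Protocol="sms",
                             Endpoint=country_code + str(person["Cell Phone Number"]))
    return arns


# Made-up participants for a dry run
sample = [
    {"Email Address": "a@example.com", "Cell Phone Number": 5550100001, "Preferred Language": "en"},
    {"Email Address": "b@example.com", "Cell Phone Number": 5550100002, "Preferred Language": "en"},
    {"Email Address": "c@example.com", "Cell Phone Number": 5550100003, "Preferred Language": "es"},
]
fake_sns = RecordingSNS()
topic_arns = subscribe_by_language(fake_sns, sample)
print(sorted(topic_arns))
```

Against the real service, this could be called as `subscribe_by_language(sns, df.to_dict("records"))`, which should produce the same `ARNs` mapping as the loop above.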

### Load in IRB Form

In [96]:
with open("IRB.txt", "r") as f: # Read in IRB Form; the file is closed automatically
    IRB_form = f.read()
print(IRB_form)

University of Chicago Online Consent Form for Research Participation

Study Number: IRB19-1395
Study Title: Human memory & cognition
Researcher: Wilma A. Bainbridge, PhD

Description: We are researchers at the University of Chicago doing a research study about how human sensation (vision, audition) and memory interact. During this study, you will see visual stimuli (e.g., images, videos, text) presented on a computer monitor, and/or hear sounds (e.g., music, tones, voices, sounds from a video) through headphones or speakers. You will respond to the task instructions with a button press, typing, mouse movement, drawing, or speaking through a microphone. You may also be asked to complete questionnaires that assess cognitive abilities. Participation will take less than 1 hour. Your participation is voluntary and you can withdraw at any time. Depending on the experiment, you may receive between $6-$10/hour for your participation.

Risks and Benefits: Taking part in this research study may 

### Translate Message and Send

In [119]:
for languageCode in df['Preferred Language'].unique(): # iterate over preferred languages
    ARN = ARNs[languageCode] # get TopicARN for language

    subject = translate.translate_text( # Translate message header
        Text="BrainBridge Lab IRB Compliance Form",
        SourceLanguageCode="auto",
        TargetLanguageCode=languageCode)

    message = translate.translate_text( # Translate IRB form
        Text=IRB_form,
        SourceLanguageCode="auto",
        TargetLanguageCode=languageCode)

    # Get contents for IRB emails
    body = message['TranslatedText']
    headline = subject['TranslatedText']

    # Publish to corresponding TopicARN
    sns.publish(TopicArn=ARN,
                Message=body,
                Subject=headline)
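The cell above sends the whole IRB form to `translate_text` in a single request. Amazon Translate limits the size of the text in each request, so a longer form may need to be split first. The sketch below shows one possible approach: packing whole lines into chunks that stay under a byte budget. `chunk_by_bytes` is a hypothetical helper, and `max_bytes` is left as a parameter because the actual limit should be taken from the current Amazon Translate quotas rather than from this example.

```python
def chunk_by_bytes(text, max_bytes):
    """Greedily pack whole lines into chunks of at most max_bytes UTF-8 bytes."""
    chunks, current, current_size = [], [], 0
    for line in text.splitlines(keepends=True):
        size = len(line.encode("utf-8"))
        if size > max_bytes:
            # This sketch does not split inside a line
            raise ValueError("a single line exceeds max_bytes; split it further")
        if current and current_size + size > max_bytes:
            chunks.append("".join(current))  # close the chunk before it overflows
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        chunks.append("".join(current))
    return chunks


# Tiny made-up text and budget, just to show the splitting behaviour
sample_text = "Title\n\nFirst paragraph.\nSecond paragraph.\n"
pieces = chunk_by_bytes(sample_text, 20)
print(pieces)
```

Each piece could then be passed to `translate.translate_text` in turn and the `TranslatedText` values joined before publishing. Splitting on line boundaries keeps sentences intact, which is likely to matter for translation quality.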