# Save Worker Answers from MTurk

## Pre-requisites
If you haven't already, you'll need to set up MTurk and AWS accounts and link them together to use MTurk with Python. The MTurk account is used to post tasks to the MTurk crowd, and the AWS account is used to connect to MTurk via the API and to provide access to any additional AWS resources your task needs.

1. If you don't have an AWS account already, visit https://aws.amazon.com and create an account you can use for your project.
2. If you don't have an MTurk Requester account already, visit https://requester.mturk.com and create a new account.

After you've set up your accounts, you will need to link them. While logged in to both the root user of your AWS account and your MTurk account, visit https://requester.mturk.com/developer to link them.

From your AWS console, create a new AWS IAM user or select an existing one you plan to use. Add the AmazonMechanicalTurkFullAccess policy to that user. Then select the Security Credentials tab and create a new access key. Copy the Access Key ID and Secret Access Key for future use.

If you haven't installed the AWS CLI yet, install it with pip (pip install awscli) and configure a profile using the access key and secret key above (aws configure --profile mturk).
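
Once the `mturk` profile exists, the client can be built from that profile instead of from keys pasted into the notebook. The following is a minimal sketch under that assumption: the profile name `mturk` comes from the command above, the endpoints are the ones used later in this notebook, and `endpoint_for` and `make_client` are hypothetical helper names, not part of boto3.

```python
# Endpoints as used later in this notebook.
ENDPOINTS = {
    "production": "https://mturk-requester.us-east-1.amazonaws.com",
    "sandbox": "https://mturk-requester-sandbox.us-east-1.amazonaws.com",
}


def endpoint_for(use_production):
    """Return the MTurk endpoint URL for the chosen environment."""
    return ENDPOINTS["production" if use_production else "sandbox"]


def make_client(profile_name="mturk", use_production=False):
    """Build an MTurk client from a named AWS CLI profile (hypothetical helper)."""
    import boto3  # imported lazily so endpoint_for works without boto3 installed

    # The session reads the access key and secret key stored by `aws configure`.
    session = boto3.Session(profile_name=profile_name, region_name="us-east-1")
    return session.client("mturk", endpoint_url=endpoint_for(use_production))
```

Calling `make_client()` with the defaults would connect to the Sandbox, which is usually the safer choice while testing.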

We also recommend installing xmltodict as shown below.

In [None]:
!pip install xmltodict

In [None]:
import boto3
import xmltodict
import json
import sys
# add parent directory so that config can be imported
sys.path.append('..')
import config
import os
from datetime import datetime
from pprint import pprint

In [None]:
create_hits_in_production = False
environments = {
    "production": {        
        "endpoint": "https://mturk-requester.us-east-1.amazonaws.com",
        "preview": "https://www.mturk.com/mturk/preview"
        
    },
    "sandbox": {
        "endpoint": "https://mturk-requester-sandbox.us-east-1.amazonaws.com",
        "preview": "https://workersandbox.mturk.com/mturk/preview"
    },
}
mturk_environment = environments["production"] if create_hits_in_production else environments["sandbox"]

# provide the AWS key id and the access key
client = boto3.client(
    service_name='mturk',
    aws_access_key_id='',
    aws_secret_access_key='',
    region_name='us-east-1',
    endpoint_url=mturk_environment['endpoint'],
)

In [None]:
# This will return your current MTurk balance if you are connected to Production.
# If you are connected to the Sandbox it will return $10,000.
print(client.get_account_balance()['AvailableBalance'])

## Get Results
Depending on the task, results will be available anywhere from a few minutes to a few hours. Here we retrieve the status of each HIT and the responses that have been provided by Workers.

First define the ID of the HIT whose answers you want to retrieve. Then load all the data from this HIT and save it to a dictionary.
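
A single call to `list_assignments_for_hit` returns at most `MaxResults` assignments. When more are available, the response includes a `NextToken` that can be passed to the next call. For HITs with many assignments, a loop along these lines may help. It is a sketch only, and `list_all_assignments` is a hypothetical helper name.

```python
def list_all_assignments(client, hit_id,
                         statuses=("Submitted", "Approved", "Rejected")):
    """Collect every assignment of a HIT by following NextToken."""
    assignments = []
    kwargs = {
        "HITId": hit_id,
        "AssignmentStatuses": list(statuses),
        "MaxResults": 100,
    }
    while True:
        response = client.list_assignments_for_hit(**kwargs)
        assignments.extend(response.get("Assignments", []))
        next_token = response.get("NextToken")
        if not next_token:
            # No token means this was the last page.
            break
        kwargs["NextToken"] = next_token
    return assignments
```

With the `client` created earlier, `list_all_assignments(client, hit_id)` could stand in for the single call when a HIT has more than 100 assignments.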

In [None]:
# define the id of the hit whose answers you want to retrieve
hit_id = '3RBI0I35YIOUY8RL0HC2XX8GSEFY3Y'
#hit_type = 'componentAnnotation'
hit_type = 'relationAnnotation'
#hit_type = 'payment_for_non_submitted_HIT'

hit_answers = {}

# Get a list of the Assignments that have been submitted by Workers
assignmentsList = client.list_assignments_for_hit(
    HITId=hit_id,
    AssignmentStatuses=['Submitted', 'Approved', 'Rejected'],
    MaxResults=100
)
all_assignments = assignmentsList['Assignments']

for assignment in all_assignments:
    #print("assignment")
    #print(assignment)

    assignment_answer = {}

    # Retrieve the attributes for each Assignment
    assignment_answer['AssignmentId'] = assignment['AssignmentId']
    assignment_answer['WorkerId'] = assignment['WorkerId']
    assignment_answer['HITId'] = assignment['HITId']
    assignment_answer['AssignmentStatus'] = assignment['AssignmentStatus']
    
    assignment_answer['AutoApprovalTime'] = (assignment['AutoApprovalTime']).isoformat()
    assignment_answer['AcceptTime'] = (assignment['AcceptTime']).isoformat()
    assignment_answer['SubmitTime'] = (assignment['SubmitTime']).isoformat()
    # ApprovalTime is only present once the assignment has been approved
    try:
        assignment_answer['ApprovalTime'] = assignment['ApprovalTime'].isoformat()
    except KeyError:
        assignment_answer['ApprovalTime'] = "NOT YET APPROVED"


    # Retrieve the value submitted by the Worker from the XML
    answer_dict = xmltodict.parse(assignment['Answer'])

    worker_answer = {}

    if hit_type in ['componentAnnotation', 'relationAnnotation']:
            
        for answer in answer_dict['QuestionFormAnswers']['Answer']:
            worker_answer[answer['QuestionIdentifier']] = answer['FreeText']
    
    elif hit_type == 'payment_for_non_submitted_HIT':
        print("")
        worker_answer = answer_dict['QuestionFormAnswers']['Answer']["FreeText"]
        print("worker input, which should be their Worker ID: ", worker_answer)
        print("")

    assignment_answer['worker_answer'] = worker_answer
    hit_answers[assignment['AssignmentId']] = assignment_answer


    print("assignment_answer: ", assignment_answer)
    print("")
print("number of assignments: ", len(hit_answers))
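
One detail of the parsing step is worth noting: `xmltodict` returns a single dictionary when the XML holds one `<Answer>` element and a list when it holds several, so iterating directly over `Answer` only works for forms with more than one field. The sketch below normalizes both cases. `extract_worker_answer` is a hypothetical helper name, and it assumes the parsed structure shown in the cell above.

```python
def extract_worker_answer(answer_dict):
    """Map QuestionIdentifier -> FreeText from a parsed QuestionFormAnswers dict."""
    answers = answer_dict["QuestionFormAnswers"].get("Answer") or []
    # xmltodict yields a dict for a single <Answer> and a list for several.
    if isinstance(answers, dict):
        answers = [answers]
    return {a["QuestionIdentifier"]: a.get("FreeText") for a in answers}
```

It takes the output of `xmltodict.parse(assignment['Answer'])` and always returns a dictionary, whatever the number of form fields.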

Then save the resulting dictionary as JSON in a text file whose name contains the ID of the HIT.

In [None]:
# save the answers in a txt file
filename = 'H1a_worker_answers.txt'
environment_name = 'production' if create_hits_in_production else 'sandbox'

output_dir = os.path.join('WorkerAnswers', environment_name, hit_type)
os.makedirs(output_dir, exist_ok=True)  # create the folder if it is missing
with open(os.path.join(output_dir, filename), "w") as f:
    json.dump(hit_answers, f, indent=2)
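
If several HITs are processed, deriving the file name from the HIT ID avoids overwriting earlier results. This is a sketch of one possible layout, not the notebook's own convention: `save_hit_answers` is a hypothetical helper and the `<hit_id>_worker_answers.txt` naming pattern is an assumption.

```python
import json
import os


def save_hit_answers(hit_answers, hit_id, environment_name, hit_type,
                     base_dir="WorkerAnswers"):
    """Write answers as JSON to <base_dir>/<environment>/<hit_type>/<hit_id>_worker_answers.txt."""
    output_dir = os.path.join(base_dir, environment_name, hit_type)
    os.makedirs(output_dir, exist_ok=True)  # create missing folders
    path = os.path.join(output_dir, f"{hit_id}_worker_answers.txt")
    with open(path, "w") as f:
        json.dump(hit_answers, f, indent=2)
    return path
```

The returned path can be passed to `json.load` later to read the answers back.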
