This repository has been archived by the owner on May 24, 2019. It is now read-only.

BulkCreate

Thomas J. Leeper edited this page Jun 8, 2015 · 6 revisions

Bulk HIT Creation

The CreateHIT function allows a requester to create a single HIT. As of MTurkR v0.6.4, it is also possible to create multiple HITs in a single function call using BulkCreate. This function takes multiple question values and creates one HIT for each value, using a fixed set of other parameters. While this does not create a "batch" in the sense used by the Requester User Interface, BulkCreate requires an annotation argument so that all of the HITs can easily be operated on using functions such as ExpireHIT, ExtendHIT, etc.

In addition to BulkCreate, three bulk creation wrapper functions are available:

  1. BulkCreateFromURLs is an easy way to create ExternalQuestion HITs, by simply supplying a vector of HIT URLs. This will create one HIT for each URL, and group them under a common title, description, etc. The annotation field is required:

    BulkCreateFromURLs(url = paste0("https://www.example.com/",1:3,".html"),
                       frame.height = 400,
                       annotation = paste("Bulk From URLs", Sys.Date()),
                       title = "Categorize an image",
                       description = "Categorize this image",
                       reward = ".05",
                       expiration = seconds(days = 4),
                       duration = seconds(minutes = 5),
                       auto.approval.delay = seconds(days = 1),
                       keywords = "categorization, image, moderation, category")
  2. BulkCreateFromTemplate can be used to create a set of HITs from a template HTML file, in the style of the Requester User Interface (i.e., the CSV upload feature of the RUI). If you (a) create an HTML file with placeholders for a set of variables (e.g., ${varname}) and (b) create a data.frame of variable values, this function will create a HIT structure from the template for each row of the data.frame and then create a HIT from each of those completed templates. Here's an example of an HTML template:

    <!DOCTYPE html>
    <html>
     <head>
      <meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/>
      <script type='text/javascript' src='https://s3.amazonaws.com/mturk-public/externalHIT_v1.js'></script>
     </head>
     <body>
      <form name='mturk_form' method='post' id='mturk_form' action='https://www.mturk.com/mturk/externalSubmit'>
      <input type='hidden' value='' name='assignmentId' id='assignmentId'/>
      <h1>${hittitle}</h1>
      <p>${hitvariable}</p>
      <p>What do you think?</p>
      <p><textarea name='comment' cols='80' rows='3'></textarea></p>
      <p><input type='submit' id='submitButton' value='Submit' /></p></form>
      <script type='text/javascript'>turkSetAssignmentID();</script>
     </body>
    </html>

    And here's the corresponding MTurkR code:

    temp <- system.file("template.html", package = "MTurkR")
    a <- data.frame(hittitle = c("HIT title 1", "HIT title 2", "HIT title 3"),
                    hitvariable = c("HIT text 1", "HIT text 2", "HIT text 3"), 
                    stringsAsFactors = FALSE)
    BulkCreateFromTemplate(template = temp,
                           input = a,
                           annotation = paste("Bulk From Template", Sys.Date()),
                           title = "Categorize an image",
                           description = "Categorize this image",
                           reward = ".05",
                           expiration = seconds(days = 4),
                           duration = seconds(minutes = 5),
                           auto.approval.delay = seconds(days = 1),
                           keywords = "categorization, image, moderation, category")
  3. The final function, BulkCreateFromHITLayout, uses the same logic of a template and an input data.frame. In this workflow, however, the template is created in the Requester User Interface (RUI), the "HITLayoutId" for that template is retrieved from the RUI, and the variable values are passed to the BulkCreateFromHITLayout function. You can find an example of this workflow here; it closely mirrors the previous example.
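
The placeholder substitution that BulkCreateFromTemplate is described as performing (one completed template per row of the data.frame) can be illustrated in base R. This is only a minimal sketch of the idea, not MTurkR's actual implementation: `fill_template` is a hypothetical helper, and the short template string stands in for a full HTML file.

```r
# Hypothetical helper (not part of MTurkR): fill ${name} placeholders in a
# template string once for each row of a data.frame.
fill_template <- function(template, input) {
  vapply(seq_len(nrow(input)), function(i) {
    filled <- template
    for (name in names(input)) {
      placeholder <- paste0("${", name, "}")
      # fixed = TRUE matches the placeholder literally rather than as a regex
      filled <- gsub(placeholder, as.character(input[[name]][i]), filled,
                     fixed = TRUE)
    }
    filled
  }, character(1))
}

# A shortened stand-in for the HTML template shown above
template <- "<h1>${hittitle}</h1><p>${hitvariable}</p>"
input <- data.frame(hittitle = c("HIT title 1", "HIT title 2"),
                    hitvariable = c("HIT text 1", "HIT text 2"),
                    stringsAsFactors = FALSE)

# One completed template per row of the input
filled <- fill_template(template, input)
print(filled)
```

Each column name in the data.frame must match a placeholder name in the template; in this sketch, placeholders without a matching column are left in the output unchanged.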

Some Tips for Bulk HIT Creation

  1. Test out your batch in the sandbox first using a small number of input values to make sure the code and the HITs themselves work.

  2. Use SufficientFunds() to estimate the cost of your project. Because HITs created in bulk are likely to be low-paying, it can be hard to estimate the cost of a project yourself due to Amazon's $0.005 minimum per-assignment commission.

  3. All the bulk creation functions require an annotation argument. This makes it easy to perform operations on the full set of HITs (e.g., ExtendHIT, ExpireHIT, GetAssignments, etc.) using just a single function call (as opposed to calling each function on each individual HIT).

  4. Specify the auto.approval.delay argument. By default, this is set to 30 days (i.e., seconds(days = 30)). Approving assignments yourself after bulk creation is time-consuming because approving each assignment requires a separate API call. Specifying a shorter approval delay lets the MTurk system approve the work automatically, without the need to call ApproveAssignment yourself.

  5. Performing operations on a bulk creation batch involves a (potentially very large) number of separate API calls. By default, MTurkR records all API calls in a local file (MTurkRlog.tsv) in the working directory. You can save some time and avoid this record-keeping by using the global option options(MTurkR.log = FALSE) or by passing MTurkR.log = FALSE in your bulk creation function. This has the potential to speed up the HIT creation process and other operations performed in bulk.

  6. Set aside enough time for code to run. Again, because MTurkR has to make a separate API call for each HIT being created, bulk creation can be time-consuming, and any subsequent operation on a batch (such as assignment approval) even more so.
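
To see why the minimum commission mentioned in tip 2 makes low-paying batches hard to cost by hand, the arithmetic can be sketched in base R. This is a rough illustration independent of SufficientFunds(): `estimate_batch_cost` is a hypothetical helper, and the `fee_rate` of 10% is an assumption rather than a confirmed MTurk rate, so check the current pricing before relying on it.

```r
# Hypothetical helper (not part of MTurkR): rough batch cost in US dollars.
# fee_rate is an ASSUMED commission rate; min_fee is the $0.005 minimum
# per-assignment commission mentioned in tip 2.
estimate_batch_cost <- function(n_hits, assignments, reward,
                                fee_rate = 0.10, min_fee = 0.005) {
  # The commission on each assignment is never lower than the minimum fee
  fee_per_assignment <- max(reward * fee_rate, min_fee)
  n_hits * assignments * (reward + fee_per_assignment)
}

# 1,000 single-assignment HITs at two different reward levels
cost_low  <- estimate_batch_cost(n_hits = 1000, assignments = 1, reward = 0.01)
cost_high <- estimate_batch_cost(n_hits = 1000, assignments = 1, reward = 0.05)
print(c(low = cost_low, high = cost_high))
```

Under these assumptions, the minimum fee rather than the percentage rate applies at a $0.01 reward, so the effective commission is 50% of the reward instead of the assumed 10%.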