Django-based clone of Amazon's Mechanical Turk service running in your local environment.

Run a clone of Amazon's Mechanical Turk service in your local environment.

This tool is meant to be used as a web service running locally on your network or personal machine. It loads HIT template files generated by the Amazon Mechanical Turk web GUI that requesters use to create HITs. You also upload an input CSV file, and Turkle creates one HIT from the template for each row of values in that file.
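
To make the relationship between a template and its CSV file concrete, here is a minimal sketch of the idea. It is not Turkle's actual implementation. It assumes the template marks its fields with `${name}` placeholders, as Mechanical Turk templates do, and that the CSV header row supplies the matching names. The helper `render_hits` and the sample data are hypothetical.

```python
import csv
import io
from string import Template


def render_hits(template_html, csv_text):
    """Return one rendered HTML string per CSV data row.

    Hypothetical helper: illustrates the idea only, not Turkle's own code.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    template = Template(template_html)
    # safe_substitute leaves unknown ${placeholders} untouched instead of raising
    return [template.safe_substitute(row) for row in reader]


# Made-up template and data, for illustration only
template_html = '<p>Is this a photo of a ${animal}?</p><img src="${image_url}">'
csv_text = "animal,image_url\ncat,cat.jpg\ndog,dog.jpg\n"

hits = render_hits(template_html, csv_text)
for hit in hits:
    print(hit)
```

Each data row yields one rendered page, which is why a CSV file with N data rows produces N HITs.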

The results of the HITs completed by the workers can be exported in CSV files.

Installation

Initial setup

git clone https://github.com/hltcoe/turkle.git
cd turkle

Make sure that the dependencies listed below are met, and then run the commands:

python manage.py migrate
python manage.py runserver

TODO: instructions for installing from an extracted bundle that is distributed along with the required eggs.

Dependencies

  • Turkle depends on the packages listed in requirements.txt. If the packages are not already installed in your environment, and you have an internet connection, then you can run the following command to install the required Python packages:

    pip install -r requirements.txt

    or with virtualenv:

    virtualenv venv
    source venv/bin/activate
    pip install -r requirements.txt

Usage

Example HITs

Example HIT HTML templates and corresponding CSV data files are provided under examples/.

Worker instructions

Load the URL of the tool (by default http://localhost:8000) in your browser. Click on List of HITs, then start completing the HITs listed under Unfinished HITs.

Requester instructions

Publish HITs

To publish new HITs, cd to the root directory of this server's code repository and run the command:

python manage.py publish_hits <template_file_path> <csv_file_path>

with <template_file_path> replaced with the absolute path to the HIT template file and <csv_file_path> replaced with the path to the CSV file containing the data for the individual HITs.
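
If you publish HITs from a script, the absolute-path requirement is easy to get wrong. The sketch below builds the command line with the template path made absolute. The helper name `publish_hits_command` and the file names are hypothetical; only the command name and argument order come from the instructions above.

```python
import os
import sys


def publish_hits_command(template_path, csv_path):
    """Build the argv list for `manage.py publish_hits`.

    Hypothetical helper; uses the current Python interpreter.
    """
    return [
        sys.executable, "manage.py", "publish_hits",
        os.path.abspath(template_path),  # the template path must be absolute
        csv_path,
    ]


# Placeholder file names; substitute your own template and data files
command = publish_hits_command("examples/template.html", "examples/data.csv")
print(command)
# From the repository root, this could then be run with:
#   subprocess.run(command, check=True)
```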

Get results

To get the results of the completed HITs, cd to the root directory of this server's code repository and run the command:

python manage.py dump_results <template_file_path> <results_csv_file_path>

with:

  • <template_file_path> replaced with the absolute path to where the template file was located when the HITs were published. This argument acts as a filter so that only completed HITs from the same template are dumped.
  • <results_csv_file_path> replaced with the desired path to where the results will be saved. The format is:
      • UTF-8 encoding
      • a header row for the first line
      • one HIT result per line
      • values in each line are comma-delimited in the Excel style.
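
Given the format described above (UTF-8, a header row, Excel-style comma delimiting), the results file can be read with Python's standard `csv` module. The following is a minimal sketch; `load_results` is a hypothetical helper, and the column names in the demo are made up, since the real ones depend on your template and data.

```python
import csv
import os
import tempfile


def load_results(results_csv_path):
    """Read a dumped results file into a list of dicts keyed by the header row."""
    # newline="" lets the csv module handle line endings inside quoted fields
    with open(results_csv_path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f, dialect="excel"))


# Demo with made-up columns written to a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write('animal,answer\r\ncat,"yes, definitely"\r\n')
    results = load_results(path)

print(results)
```

The quoted field in the demo shows why a CSV parser is preferable to splitting each line on commas.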

Configuration

To streamline worker task completion, submitting a HIT can automatically load the next unfinished HIT. To enable this behavior, set NEXT_HIT_ON_SUBMIT to True at the bottom of settings.py.

After changing settings, the web server may need to be restarted for changes to take effect.
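
With automatic loading of the next HIT enabled, the relevant line in settings.py would read as follows. This is a fragment, not the whole file, and the comment is added here for illustration.

```python
# settings.py (near the bottom of the file)
# Load the next unfinished HIT automatically after a worker submits one.
NEXT_HIT_ON_SUBMIT = True
```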

Docker usage

Instead of installing Turkle and dependencies directly, you can run Turkle as a Docker container, using scripts to manage your HIT templates and data. Either build a Turkle image:

docker build --force-rm -t hltcoe/turkle .

or pull the latest from the Docker registry:

docker pull hltcoe/turkle

and start a container with an easy-to-remember name, mapping container port 8080 to a free port on the Docker host (e.g. 18080):

docker run -d --name container_name -p 18080:8080 hltcoe/turkle

Your annotator can now browse to that port on the Docker host. To give them something to do, upload an Amazon Mechanical Turk HIT template and data:

scripts/upload_hit.sh container_name data.csv template.html

At any point, you can download the current state of annotations:

scripts/download_annotations.sh container_name annotation_state.csv
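
Because the download can be taken at any point, one option is to keep timestamped snapshots rather than overwriting a single file. The sketch below only builds the command line. `snapshot_command` is a hypothetical helper, and the file-name pattern is an arbitrary choice, not something the script requires.

```python
from datetime import datetime


def snapshot_command(container_name, when=None):
    """Build the argv for a timestamped call to scripts/download_annotations.sh.

    Hypothetical helper; the file-name pattern is an arbitrary choice.
    """
    when = when or datetime.now()
    filename = "annotations_{}.csv".format(when.strftime("%Y%m%d_%H%M%S"))
    return ["scripts/download_annotations.sh", container_name, filename]


# A fixed timestamp is used here so the result is reproducible
command = snapshot_command("container_name", datetime(2018, 9, 5, 12, 0, 0))
print(command)
# From the repository root, this could then be run with:
#   subprocess.run(command, check=True)
```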

You can upload new data to be annotated, without changing the template:

scripts/upload_hit.sh container_name new_data.csv

Or replace both:

scripts/upload_hit.sh container_name new_data.csv new_template.html