Run a clone of Amazon's Mechanical Turk service in your local environment.
This tool is meant to be used as a web service running locally on your network or personal machine. It loads HIT template files generated by the Amazon Mechanical Turk web GUI that is provided to requesters for creating HITs. Input CSV files are also uploaded, and each row of values in a CSV file is used to create one HIT based on the template.
The results of the HITs completed by the workers can be exported in CSV files.
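The template-plus-CSV model described above can be illustrated with a short sketch. It assumes MTurk-style ${column_name} placeholders in the template, fills them from one CSV row per HIT, and HTML-escapes each value; the function names are hypothetical, and this is not Turkle's own rendering code.

```python
import csv
import html
import io
import re

# Assumed MTurk-style placeholder syntax: ${column_name}.
# A bare "$" (e.g. jQuery's $(...)) does not match and is left untouched.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render_hit(template_html: str, row: dict) -> str:
    """Replace each ${name} with the HTML-escaped value of column `name` in one CSV row.

    Raises KeyError if the template uses a name that the row does not have.
    """
    return PLACEHOLDER.sub(lambda m: html.escape(row[m.group(1)]), template_html)


def render_all(template_html: str, csv_text: str) -> list:
    """Render one HIT per data row; the CSV header row supplies the column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [render_hit(template_html, row) for row in reader]


if __name__ == "__main__":
    template = '<p>Is this a cat?</p><img src="${image_url}">'
    data = "image_url\nhttp://example.com/1.jpg\nhttp://example.com/2.jpg\n"
    for page in render_all(template, data):
        print(page)
```

With a two-row CSV, the sketch yields two rendered pages, one per row, which mirrors the one-HIT-per-row behavior described above.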
git clone https://github.com/hltcoe/turkle.git
cd turkle
Make sure that the dependencies listed below are met, and then run the commands:
python manage.py migrate
python manage.py runserver
TODO: instructions for installing from an extracted bundle that is distributed along with the required eggs.
Turkle depends on the packages listed in requirements.txt. If the packages are not already installed in your environment, and you have an internet connection, then you can run the following command to install the required Python packages:
pip install -r requirements.txt
or with virtualenv:
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
Example HIT HTML templates and corresponding CSV data files are
Load the URL of the tool (by default http://localhost:8000) in your browser. Click on "List of HITs", and then start completing the HITs listed under "Unfinished HITs".
To publish new HITs, cd to the root directory of this server's code repository and run the command:
python manage.py publish_hits <template_file_path> <csv_file_path>
Replace <template_file_path> with the absolute path to the HIT template file, and <csv_file_path> with the path to the CSV file containing the data for the individual HITs.
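The CSV file's header row names the template variables, and each following row becomes one HIT. A minimal sketch for generating such a file with Python's csv module; the column name image_url, the file name hits.csv, and writing UTF-8 are assumptions for illustration.

```python
import csv
import io


def hit_csv_text(fieldnames: list, rows: list) -> str:
    """Build CSV text: a header row naming the template variables, then one row per HIT."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def write_hit_csv(path: str, fieldnames: list, rows: list) -> None:
    # UTF-8 output; newline="" stops Python from translating the csv module's \r\n endings.
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(hit_csv_text(fieldnames, rows))


# Example (hypothetical column and file names):
# write_hit_csv("hits.csv", ["image_url"],
#               [{"image_url": "http://example.com/1.jpg"},
#                {"image_url": "http://example.com/2.jpg"}])
```

The resulting file could then be passed as <csv_file_path> to publish_hits. Values containing commas or quotes are quoted automatically by the csv module.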
To get the results of the completed HITs, cd to the root directory of this server's code repository and run the command:
python manage.py dump_results <template_file_path> <results_csv_file_path>
Replace <template_file_path> with the absolute path to where the template file was located when the HITs were published. This argument acts as a filter, so only completed HITs from the same template are dumped.
Replace <results_csv_file_path> with the path where the results should be saved. The format is:
- UTF-8 encoding
- a header row for the first line
- one HIT result per line
- values in each line are comma-delimited in the Excel style.
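Because the results file is UTF-8, Excel-style CSV with a header row, Python's csv module can read it directly. A sketch with hypothetical function names; the actual column names depend on your template and on Turkle's output, and none are assumed here.

```python
import csv
import io


def parse_results(text: str) -> list:
    """Parse Excel-style CSV text: a header row, then one HIT result per line."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


def load_results(path: str) -> list:
    # UTF-8 per the format above; newline="" lets csv handle quoted line breaks itself.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Example (hypothetical file name):
# results = load_results("results.csv")
# print(len(results), "HIT results")
```

Each returned dict maps a header name to that HIT's value, so a quoted value containing a comma comes back as a single field.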
To streamline worker task completion, submission of a HIT can automatically load the next unfinished HIT. To enable this setting
True at the bottom of
After changing settings, the web server may need to be restarted for changes to take effect.
Instead of installing Turkle and its dependencies directly, you can run Turkle as a Docker container, using scripts to manage your HIT templates and data. Either build a Turkle image:
docker build --force-rm -t hltcoe/turkle .
or pull the latest from the Docker registry:
docker pull hltcoe/turkle
Then start a container with an easy-to-remember name, mapping container port 8080 to a port on the Docker host (e.g. 18080):
docker run -d --name container_name -p 18080:8080 hltcoe/turkle
Your annotators can now browse to that port on the Docker host. To give them something to do, upload an Amazon Mechanical Turk HIT template and data:
scripts/upload_hit.sh container_name data.csv template.html
At any point, you can download the current state of annotations:
scripts/download_annotations.sh container_name annotation_state.csv
You can upload new data to be annotated, without changing the template:
scripts/upload_hit.sh container_name new_data.csv
Or replace both:
scripts/upload_hit.sh container_name new_data.csv new_template.html
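Whether you keep the old template or upload a new one, the CSV header presumably still needs a column for every ${...} placeholder in the template, as on Mechanical Turk itself. A hypothetical pre-upload check (not part of the Turkle scripts; the function names and the ${...} syntax are assumptions):

```python
import csv
import io
import re

# Assumed MTurk-style placeholder syntax: ${column_name}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def template_variables(template_html: str) -> set:
    """Return the names used in ${...} placeholders in a HIT template."""
    return set(PLACEHOLDER.findall(template_html))


def missing_columns(template_html: str, csv_text: str) -> set:
    """Return template variables that have no matching column in the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text, newline="")), [])
    return template_variables(template_html) - set(header)


def check_files(template_path: str, csv_path: str) -> set:
    # utf-8-sig also accepts a byte-order mark, which Excel often writes at the start of a CSV.
    with open(template_path, encoding="utf-8") as t:
        template_html = t.read()
    with open(csv_path, newline="", encoding="utf-8-sig") as d:
        csv_text = d.read()
    return missing_columns(template_html, csv_text)


# Example (hypothetical file names):
# missing = check_files("template.html", "new_data.csv")
# if missing: fix the CSV header before running scripts/upload_hit.sh
```

An empty result means every placeholder has a matching column; extra CSV columns are not reported.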