Create a CSV file with title and text in plaintext for Amazon Mechanical Turk with 2000 unique support questions #92
Comments
Do you think we need to include non-English tickets for this case?
The CSV should probably include the ticket ID as well, and if we need a version without it we can easily remove that column.
@rtanglao Just to clarify the HTML bit, would this
turn into this?
This is how many tickets we have with X annotations, not including SUMO. We'll definitely include the 100 tickets with 7-9 annotations. Should we then include all of the tickets with 2 or 3 annotations, for a total of 300 human-tagged tickets? Or add 100 tickets with 3 annotations to round out the 200?
@willfenton Let's include all tickets with any number (greater than 1) of non-SUMO annotations.
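A minimal sketch of that selection rule, assuming each annotation is a dict with hypothetical `ticket_id` and `source` keys and that SUMO annotations are marked with `source == "sumo"`. None of these names come from the actual data; adjust them to whatever the annotation export uses.

```python
from collections import Counter


def tickets_with_multiple_annotations(annotations, min_count=2):
    """Return sorted ticket IDs that have at least `min_count` non-SUMO annotations.

    `annotations` is an iterable of dicts with the assumed keys
    "ticket_id" and "source" (hypothetical field names).
    """
    counts = Counter(
        a["ticket_id"] for a in annotations if a["source"] != "sumo"
    )
    return sorted(tid for tid, n in counts.items() if n >= min_count)


# Illustrative data only, not real tickets.
sample = [
    {"ticket_id": 1, "source": "human"},
    {"ticket_id": 1, "source": "human"},
    {"ticket_id": 2, "source": "human"},
    {"ticket_id": 2, "source": "sumo"},
    {"ticket_id": 3, "source": "human"},
    {"ticket_id": 3, "source": "human"},
    {"ticket_id": 3, "source": "human"},
]
selected = tickets_with_multiple_annotations(sample)
```

Ticket 2 is dropped here because only one of its two annotations is non-SUMO.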
Keep in mind that each of these 2000 tickets will be tagged in triplicate, by 3 different workers.
Ok, sounds good. Just need some clarification on the HTML preprocessing now.
Yes, you can work under that assumption. I have confirmed that the annotations output includes the original raw texts for both fields, so we will be able to rejoin the annotations against our own data and re-associate with a ticket-id after tagging is complete.
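The rejoin could look roughly like this. It is a sketch under assumptions: the annotation rows are taken to carry the two uploaded columns verbatim (`sumo-ticket-title`, `sumo-ticket-text`), our own records are assumed to be dicts with hypothetical `ticket_id`, `title` and `text` keys, and `title`/`text` must hold exactly the preprocessed strings that were uploaded.

```python
def rejoin_ticket_ids(tickets, annotations):
    """Attach a ticket_id to each annotation row by matching (title, text).

    `tickets`: dicts with assumed keys "ticket_id", "title", "text".
    `annotations`: dicts holding the uploaded columns
    "sumo-ticket-title" and "sumo-ticket-text" plus any worker output.
    Rows with no matching ticket get ticket_id None.
    """
    lookup = {(t["title"], t["text"]): t["ticket_id"] for t in tickets}
    joined = []
    for row in annotations:
        key = (row["sumo-ticket-title"], row["sumo-ticket-text"])
        joined.append({**row, "ticket_id": lookup.get(key)})
    return joined


# Illustrative data only; the "label" column is a made-up worker answer.
tickets = [
    {"ticket_id": 101, "title": "Crash on startup", "text": "Firefox closes at once."},
    {"ticket_id": 102, "title": "Lost bookmarks", "text": "They vanished after update."},
]
annotations = [
    {"sumo-ticket-title": "Lost bookmarks",
     "sumo-ticket-text": "They vanished after update.",
     "label": "bookmarks"},
    {"sumo-ticket-title": "Unknown", "sumo-ticket-text": "No match.", "label": "other"},
]
joined = rejoin_ticket_ids(tickets, annotations)
```

This only works if the (title, text) pairs are unique, which the "2000 unique support questions" requirement should guarantee.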
Sample output from my script, imported into Google Sheets. My preprocessing is removing newlines and carriage returns (
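For reference, one way to do that line-break removal is sketched below. This is not the script mentioned above. Replacing each run of line breaks with a single space (rather than deleting it outright) is an assumption made here so that adjacent words do not get glued together, and `flatten_line_breaks` is a hypothetical name.

```python
import re


def flatten_line_breaks(text):
    """Collapse every run of CR/LF characters into one space and trim the ends."""
    return re.sub(r"[\r\n]+", " ", text).strip()


flattened = flatten_line_breaks("Firefox crashes\r\non startup\n")
```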
Requirements:
sumo-ticket-title,sumo-ticket-text
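A minimal sketch of writing a file with that header using Python's standard `csv` module. The input shape (dicts with hypothetical `title` and `text` keys), the function name `write_mturk_csv`, and the choice to deduplicate on the exact (title, text) pair are all assumptions for illustration.

```python
import csv
import io

FIELDNAMES = ["sumo-ticket-title", "sumo-ticket-text"]


def write_mturk_csv(tickets, out, limit=2000):
    """Write up to `limit` unique (title, text) rows to `out`; return the row count.

    `tickets` is an iterable of dicts with assumed keys "title" and "text".
    `out` is any text file-like object opened for writing.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    seen = set()
    for ticket in tickets:
        if len(seen) >= limit:
            break
        key = (ticket["title"], ticket["text"])
        if key in seen:
            continue  # skip exact duplicates so every question is unique
        seen.add(key)
        writer.writerow(dict(zip(FIELDNAMES, key)))
    return len(seen)


# Illustrative data only.
buffer = io.StringIO()
count = write_mturk_csv(
    [
        {"title": "Crash", "text": "Closes at once"},
        {"title": "Crash", "text": "Closes at once"},
        {"title": "Sync", "text": "Tabs, bookmarks missing"},
    ],
    buffer,
)
```

The `csv` module quotes any field that contains a comma or a quote character, so ticket text does not need manual escaping. When writing to a real file, open it with `newline=""` as the `csv` documentation recommends.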