Henare Degan edited this page Mar 10, 2011 · 1 revision

Our design goals for the process and supporting tool were to:

  • Ensure the accuracy of the published transcription;
  • Minimise harm should the transcription be inaccurate; and
  • Rapidly complete the transcription.
Accuracy trumps speed.

The basic process we came up with had this ideal flow:

  • Scan all 1500 pages, which should take about an hour at my office
  • Upload the scanned images to the transcription engine on the web site
  • Crowd-sourceable task #1: identify the start pages to break the batch into sets of 5-10 pages, each set specific to a particular MP
  • Crowd-sourceable task #2: fill in as much of the ROMI form as possible from a set of images
  • Crowd-sourceable task #3: verify that the transcription is accurate, using the same editing interface as task #2
  • Notify the MP's email account that their ROMI transcription is ready for review and correction
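As a rough illustration of task #1, here is a minimal Python sketch of turning the identified start pages into per-MP page sets. The function name `split_into_sets` and the 1-based page numbering are assumptions made for this example, not part of any existing tool.

```python
from typing import List


def split_into_sets(total_pages: int, start_pages: List[int]) -> List[range]:
    """Split pages 1..total_pages into consecutive sets, one per start page.

    Hypothetical helper: each start page marked by a volunteer opens a new
    set, which runs up to the page before the next start page.
    """
    starts = sorted(set(start_pages))
    if not starts or starts[0] != 1:
        raise ValueError("the first set must start at page 1")
    if starts[-1] > total_pages:
        raise ValueError("a start page lies beyond the end of the batch")
    # Each set ends just before the next start; the last ends with the batch.
    ends = starts[1:] + [total_pages + 1]
    return [range(start, end) for start, end in zip(starts, ends)]
```

For example, `split_into_sets(20, [1, 8, 14])` gives three sets covering pages 1-7, 8-13 and 14-20. Sets that come out much longer than 10 pages would probably be worth flagging as a missed start page.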
At some later stage, perhaps common for all transcriptions belonging to a single session or ROMI volume:
  • Notify the MP's email account that their ROMI transcription will be published on a particular date
At that date:
  • Publish the ROMI transcription
  • Notify users subscribed to the MP's feed
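The flow above could be modelled as a simple linear progression per MP's set of pages. This is only a sketch under assumptions: the stage names, `NOTIFY_ON_ENTRY` and `advance` are invented for illustration, and scanning and uploading are left out because they apply to the whole batch rather than to one set.

```python
from enum import Enum
from typing import Optional, Tuple


class Stage(Enum):
    AWAITING_TRANSCRIPTION = "awaiting transcription"  # set identified (task #1)
    TRANSCRIBED = "transcribed"                        # form filled in (task #2)
    VERIFIED = "verified"                              # checked (task #3)
    MP_REVIEW = "mp review"                            # MP told it is ready for review
    SCHEDULED = "scheduled"                            # MP told the publication date
    PUBLISHED = "published"


# Enum members iterate in definition order, which is the order of the flow.
ORDER = list(Stage)

# Who should be notified when a transcription enters a stage (assumed labels).
NOTIFY_ON_ENTRY = {
    Stage.MP_REVIEW: "mp",
    Stage.SCHEDULED: "mp",
    Stage.PUBLISHED: "subscribers",
}


def advance(stage: Stage) -> Tuple[Stage, Optional[str]]:
    """Return the next stage and whom to notify on entering it, if anyone."""
    position = ORDER.index(stage)
    if position == len(ORDER) - 1:
        raise ValueError("already published")
    following = ORDER[position + 1]
    return following, NOTIFY_ON_ENTRY.get(following)
```

Keeping the notification targets in one table means the "who gets told what, and when" rules stay visible in a single place if the flow changes.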
To keep things moving, I proposed an objection handling process which flags rather than removes the transcription:
  • If someone hits the objection button, the transcription's page is decorated with an unmissable banner telling page visitors about the objection and urging them to refer to the original image.
  • Even if nobody has objected, the page should have an obvious notice that the transcription is there to make it easy for them to search for information, but that they should refer to the original image before publishing anything based on the ROMI entry.
  • Subscribers to an MP will be notified of objections to that MP's ROMI transcription. Perhaps we have a separate user preference for this level of detail.
  • Objections can be cleared as easily as they're set. Subscribers are alerted when an objection is cleared, via the same means as when it was raised.
  • We'll need a "locked" flag to prevent modification to transcriptions when things get silly. The flag will prevent non-staff from modifying the transcription, changing the objection state, and perhaps even commenting. This roughly matches Wikipedia's approach; from what I've read, they have found that locking a page for a couple of weeks straightens out most situations.
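A minimal sketch of how the objection and "locked" flags might interact, assuming a plain in-memory record. The names `Transcription`, `set_objection` and `page_notice`, and the notice wording, are hypothetical.

```python
from dataclasses import dataclass

# Placeholder wording; the real banner text is still to be decided.
OBJECTION_BANNER = "An objection has been raised. Please refer to the original image."
DEFAULT_NOTICE = "Check the original image before publishing anything based on this entry."


@dataclass
class Transcription:
    objected: bool = False
    locked: bool = False


def can_change(transcription: Transcription, is_staff: bool) -> bool:
    """Non-staff may not change anything while the transcription is locked."""
    return is_staff or not transcription.locked


def set_objection(transcription: Transcription, objected: bool,
                  is_staff: bool = False) -> bool:
    """Set or clear the objection flag; return True if the state changed.

    The caller would notify subscribers whenever this returns True, so
    setting and clearing are alerted through the same path.
    """
    if not can_change(transcription, is_staff):
        raise PermissionError("transcription is locked")
    changed = transcription.objected != objected
    transcription.objected = objected
    return changed


def page_notice(transcription: Transcription) -> str:
    """The page always carries a notice; an objection swaps in the banner."""
    return OBJECTION_BANNER if transcription.objected else DEFAULT_NOTICE
```

Note that the transcription stays visible throughout: an objection only changes the notice shown, which is the point of flagging rather than removing.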
From my scan of the sample (email me if you want a copy), transcribers will encounter many ambiguous and illegible items. They'll want to discuss them, and we should let them chat as close as possible to the content itself.

To keep it simple, I figure each transcription should have a single stream of items including both user-posted notes and system-posted notes (modifications, objection status, publication status, etc). We can integrate the comment stream with the form editing interface.
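One way the single stream might look as a data structure, sketched in Python. The class and field names are assumptions for illustration; the only idea taken from the paragraph above is that user notes and system notes share one chronologically ordered list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class StreamItem:
    kind: str               # e.g. "note", "modification", "objection", "publication"
    body: str
    author: Optional[str]   # None marks a system-posted item
    posted_at: datetime

    @property
    def is_system(self) -> bool:
        return self.author is None


@dataclass
class Stream:
    items: List[StreamItem] = field(default_factory=list)

    def post_note(self, author: str, body: str,
                  when: Optional[datetime] = None) -> None:
        """Add a user-posted note."""
        when = when or datetime.now(timezone.utc)
        self.items.append(StreamItem("note", body, author, when))

    def post_system(self, kind: str, body: str,
                    when: Optional[datetime] = None) -> None:
        """Add a system-posted item such as a modification or status change."""
        when = when or datetime.now(timezone.utc)
        self.items.append(StreamItem(kind, body, None, when))

    def in_order(self) -> List[StreamItem]:
        """All items, user and system alike, oldest first."""
        return sorted(self.items, key=lambda item: item.posted_at)
```

If the "what does this say?" and "what does this mean?" conversations end up being separated, a `kind` value per conversation type would let one stream be filtered into two views without changing the storage.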

I'm not sure whether the conversation about the transcription ("what does this say?") should be in the same thread as conversation about the content ("what does this mean?") or not.

Matt volunteered to carve that mess into distinct tickets so we can keep track of where we are.

I've played with rough layouts and researched JavaScript-driven image zooming. Do I have to restrict myself to 1024x768 screens (960x600 effective browser pane area)?
