Our design goals for the process and supporting tool were to:
- Ensure the accuracy of the published transcription;
- Minimise harm should the transcription be inaccurate; and
- Rapidly complete the transcription.
The basic process we came up with had this ideal flow:
- Scan all 1500 pages, which should take about an hour at my office
- Upload the scanned images to the transcription engine on the web site
- Crowd-sourceable task #1: identify the start page of each MP's form, so the batch can be broken into sets of 5-10 pages, one set per MP
- Crowd-sourceable task #2: fill in the ROMI form as completely as possible from a set of images
- Crowd-sourceable task #3: verify that the transcription is accurate, using the same interface as task #2 for any edits
- Notify the MP's email account that their ROMI transcription is ready for review and correction
- Notify the MP's email account that their ROMI transcription will be published on a particular date
- Publish the ROMI transcription
- Notify users subscribed to the MP's feed
- If someone hits the objection button, the transcription's page is decorated with an unmissable banner telling page visitors about the objection and urging them to refer to the original image.
- Even if nobody has objected, the page should carry an obvious notice. It should explain that the transcription exists to make the information easy to search, and that visitors should refer to the original image before publishing anything based on the ROMI entry.
- Subscribers to an MP will be notified of objections to that MP's ROMI transcription. Perhaps this level of detail should be a separate user preference.
- Objections can be cleared as easily as they're set. Clearing an objection triggers an alert through the same channels as the original objection.
- We'll need a "locked" flag to prevent modification of transcriptions when things get silly. The flag will prevent non-staff from modifying the transcription or changing the objection state, and perhaps even from commenting. This roughly matches Wikipedia's approach. From what I've read, a lock lasting a couple of weeks straightens out most situations there.
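Task #1 (splitting the scanned batch at each MP's start page) could be handled roughly as follows. This is a minimal sketch that assumes volunteers record 0-based page indices for each start page. The function names `split_batch` and `suspicious_sets` are hypothetical. The 5-10 page range is used only to flag sets for a second look, on the assumption that a set outside it probably means a start page was missed or added by mistake.

```python
from typing import List


def split_batch(total_pages: int, start_pages: List[int]) -> List[range]:
    """Split a scanned batch into per-MP page sets (hypothetical helper).

    start_pages holds the 0-based index of the first page of each MP's form,
    as identified in task #1. Each set runs from one start page up to, but not
    including, the next; the last set runs to the end of the batch.
    """
    starts = sorted(set(start_pages))
    if not starts or starts[0] != 0:
        raise ValueError("the first page of the batch must be a start page")
    if starts[-1] >= total_pages:
        raise ValueError("start page beyond the end of the batch")
    ends = starts[1:] + [total_pages]
    return [range(s, e) for s, e in zip(starts, ends)]


def suspicious_sets(sets: List[range], lo: int = 5, hi: int = 10) -> List[range]:
    """Return sets outside the expected 5-10 pages, which may indicate a
    missed or spurious start page (assumed heuristic, not a hard rule)."""
    return [s for s in sets if not lo <= len(s) <= hi]
```

For example, `split_batch(20, [0, 6, 13])` gives three sets of 6, 7 and 7 pages, none of which would be flagged.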
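The publication flow and the objection and lock rules above could be modelled along these lines. This is a rough sketch under assumptions. The `Stage` names, the `Transcription` class and its methods are all hypothetical, and notifications are left out. The sketch treats the lock as blocking non-staff edits and objection changes only, and leaves the open question about commenting aside.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Hypothetical stages mirroring the ideal flow in the list above."""
    SCANNED = 1
    SPLIT = 2          # task #1: start pages identified, batch split per MP
    TRANSCRIBED = 3    # task #2: ROMI form filled in from the images
    VERIFIED = 4       # task #3: accuracy checked
    MP_REVIEW = 5      # MP emailed that the transcription is ready for review
    SCHEDULED = 6      # MP emailed the publication date
    PUBLISHED = 7      # published; subscribers to the MP's feed notified


@dataclass
class Transcription:
    mp_name: str
    stage: Stage = Stage.SCANNED
    objected: bool = False
    locked: bool = False

    def advance(self) -> Stage:
        """Move to the next stage; stays at PUBLISHED once there."""
        if self.stage is not Stage.PUBLISHED:
            self.stage = Stage(self.stage.value + 1)
        return self.stage

    def can_modify(self, is_staff: bool) -> bool:
        """While locked, only staff may edit or change the objection state."""
        return is_staff or not self.locked

    def set_objection(self, objected: bool, is_staff: bool = False) -> None:
        """Set or clear an objection; both directions use the same call."""
        if not self.can_modify(is_staff):
            raise PermissionError("transcription is locked")
        self.objected = objected

    def banner(self) -> str:
        """Notice shown on the transcription page, with or without an objection."""
        if self.objected:
            return "OBJECTION RAISED: refer to the original image."
        return ("Transcription provided to help searching; check the original "
                "image before publishing anything based on it.")
```

Keeping setting and clearing behind one method makes it easy to give both the same alerting path later.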
To keep it simple, I figure each transcription should have a single stream of items including both user-posted notes and system-posted notes (modifications, objection status, publication status, etc.). We can integrate the comment stream with the form editing interface.
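A single stream mixing user and system notes could be as simple as the following. This is a minimal sketch in which `StreamItem`, `TranscriptionStream` and the `"user"`/`"system"` kinds are hypothetical names. It keeps items in insertion order, which is assumed to be chronological. It does not try to separate "what does this say?" from "what does this mean?" threads.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class StreamItem:
    kind: str      # "user" for posted comments, "system" for automatic notes
    author: str
    text: str
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class TranscriptionStream:
    items: List[StreamItem] = field(default_factory=list)

    def post_note(self, user: str, text: str) -> None:
        """Append a user-posted comment."""
        self.items.append(StreamItem("user", user, text))

    def record_event(self, text: str) -> None:
        """Append a system note: modification, objection or publication status."""
        self.items.append(StreamItem("system", "system", text))

    def render(self) -> List[str]:
        """All items in insertion order, as they might appear beside the form."""
        return [f"[{i.kind}] {i.author}: {i.text}" for i in self.items]
```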
I'm not sure whether the conversation about the transcription ("what does this say?") should be in the same thread as conversation about the content ("what does this mean?") or not.
Matt volunteered to carve that mess into distinct tickets so we can keep track of where we are.