<div class="row-fluid">
<div class="span12">
<p>
PDF Table Transcribe is a <strong>demo application</strong> for PyBossa that shows how you can
crowdsource a PDF table transcription problem.
</p>
<p>
This application uses the <a target="_blank" href="http://mozilla.github.com/pdf.js">Mozilla PDF.js</a> library to load
an <a href="https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#wiki-faq-xhr">external PDF file</a> and render it directly in the web browser <strong>without using any third-party plugin</strong>.
</p>
<p>
With PDF.js, we can render almost any PDF that is hosted on an <a href="https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#wiki-faq-xhr">HTTP server</a> and then use a customized form to capture the data that we want to extract from it.
</p>
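<p>
As a rough illustration of the loading and rendering step described above, the following sketch renders one page of a remote PDF into a canvas. It is not the demo's own code: <code>renderPdfPage</code> is a hypothetical helper, the library object is passed in as <code>pdfjsLib</code>, and the call signatures assume a recent PDF.js release (older releases used different ones).
</p>

```javascript
// Sketch only: render one page of a remote PDF into a canvas with PDF.js.
// `renderPdfPage` is a hypothetical helper name. `pdfjsLib` is passed in so
// the snippet makes no assumption about how the library was loaded.
function renderPdfPage(pdfjsLib, pdfUrl, pageNumber, canvas, scale) {
  // getDocument() starts loading the file. The PDF must be reachable over
  // HTTP from the same origin or be served with suitable CORS headers.
  return pdfjsLib.getDocument(pdfUrl).promise
    .then(function (pdf) {
      // Page numbers start at 1.
      return pdf.getPage(pageNumber);
    })
    .then(function (page) {
      // Assumed default scale of 1.5 when none is given.
      var viewport = page.getViewport({ scale: scale || 1.5 });
      canvas.width = viewport.width;
      canvas.height = viewport.height;
      return page.render({
        canvasContext: canvas.getContext("2d"),
        viewport: viewport
      }).promise.then(function () {
        return canvas;
      });
    });
}
```

<p>
In a page, this could be called with a <code>canvas</code> element and the task's PDF URL; the returned promise resolves once the page has been drawn.
</p>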
</div>
</div>
<div class="row-fluid">
<div class="span12">
<p>
In this <strong>simple demo application</strong>, we <strong>load a PDF file</strong> on one side of the page and show <strong>a table</strong> on the other. When the PDF page contains a table, the volunteer transcribes its content by typing the text into the table cells. Although this example is simple, the template is easy to adapt to extract specific pieces of information from the PDF: you only need to add more HTML input fields with instructions about what you want to extract. For example, you could extract captions, tabular data, authorship or institutions from the documents.
</p>
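<p>
To give an idea of how the typed cells might be turned into an answer, here is a minimal sketch. <code>collectTable</code> is a hypothetical helper, not part of the demo template; it accepts any array of rows whose cells expose a <code>value</code> property, such as text inputs.
</p>

```javascript
// Sketch only: turn a grid of input-like objects into an array of text rows.
// `collectTable` is a hypothetical helper. `rows` is an array of arrays of
// objects with a `value` property (for example text inputs in table cells).
function collectTable(rows) {
  return rows
    .map(function (cells) {
      return cells.map(function (cell) {
        return String(cell.value).trim();
      });
    })
    // Drop rows that the volunteer left completely empty.
    .filter(function (row) {
      return row.some(function (text) {
        return text !== "";
      });
    });
}
```

<p>
Extra input fields added to the template could be collected the same way and stored alongside the table rows in the answer.
</p>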
<p>
<img class="img-polaroid span12" src="http://i.imgur.com/MrYT6oO.png">
</p>
<p>The provided script for creating the tasks is very simple: you only need to give it the URL where the PDF file is hosted and the pages you want to turn into tasks. By default, this demo uses the 14 pages of the example PDF file.</p>
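<p>
The idea of one task per page can be sketched as follows. This is not the provided script: <code>buildPageTasks</code>, the <code>pdf_url</code> and <code>page</code> field names, and the example URL are all assumptions made for illustration.
</p>

```javascript
// Sketch only: build one task description per PDF page.
// `buildPageTasks` and the `pdf_url` / `page` field names are hypothetical.
function buildPageTasks(pdfUrl, firstPage, lastPage) {
  if (!Number.isInteger(firstPage) || !Number.isInteger(lastPage) ||
      firstPage < 1 || lastPage < firstPage) {
    throw new RangeError("expected 1 <= firstPage <= lastPage");
  }
  var tasks = [];
  for (var page = firstPage; page <= lastPage; page++) {
    tasks.push({ pdf_url: pdfUrl, page: page });
  }
  return tasks;
}

// With a 14-page document, as in the demo, this yields 14 task descriptions.
// The URL is a placeholder, not the demo's PDF.
var demoTasks = buildPageTasks("https://example.org/sample.pdf", 1, 14);
```

<p>
Each resulting object would then be submitted to the PyBossa server as the information attached to one task.
</p>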
<p><span class="label label-info">Info</span> You can download the PDF file used in this demo <a href="https://dl.dropboxusercontent.com/u/27667029/pdftabletranscribe.pdf">here</a>. As you can see, you can use your Dropbox Public folder to store the PDF files and transcribe them!</p>
</div>
</div>
<div class="row-fluid">
<div class="span12">
<p>
Based on the users' answers, we will be able to transcribe the pages,
distributing the tasks (thanks to PyBossa) among different users and volunteers.
</p>
<p>
<span class="label label-warning">
<i class="icon icon-white icon-bullhorn"></i>
Note</span> If you want to learn more about how to use this application as a template,
check the following resources:
</p>
<ul>
<li><a href="http://github.com/PyBossa/pdftabletranscribe">the source code</a></li>
<li><a href="https://docs.google.com/spreadsheet/ccc?key=0AsNlt0WgPAHwdHdJZ0RZLXB1TjYxeU9rLVNGY1F4VkE&usp=sharing">the Google Docs Spreadsheet Task template for the application</a></li>
<li><a href="http://docs.pybossa.com/">the official documentation of PyBossa</a></li>
<li><a href="http://docs.pybossa.com/en/latest/user/tutorial.html">the step-by-step tutorial</a></li>
</ul>
<p>
Logo image courtesy of <a href="http://www.flickr.com/photos/mrmorodo/8174824430/">TempusVolat</a>
</p>
<hr>
<iframe src="http://ghbtns.com/github-btn.html?user=PyBossa&repo=pdftabletranscribe&type=watch&count=true&size=large" allowtransparency="true" frameborder="0" scrolling="0" width="170" height="30"></iframe>
<iframe src="http://ghbtns.com/github-btn.html?user=PyBossa&repo=pdftabletranscribe&type=fork&count=true&size=large" allowtransparency="true" frameborder="0" scrolling="0" width="170" height="30"></iframe>
</div>
</div>
<script type="text/javascript">
$("[rel=tooltip]").tooltip();
</script>