<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>genCollectionInterface Documentation</title>
<meta name="author" content="R. Scotty Auble">
<meta name="date" content="11-7-2009">
</head>
<body>
<h1>genCollectionInterface</h1>
<h2>1.0 Introduction:</h2>
<p>genCollectionInterface (gCI) is a set of templates and HTML
generation tools, written in Python, that produce a web browser-based
interface to a book collection. These tools were created during the
summer and fall of 2009 as part of the Rural Design Collective's Summer
Mentoring Program. The goal of the project was to enable or enhance
access to the Children's Book Collection of the Internet Archive on the
OLPC/XO laptop platform. The chosen solution was based on consideration
of the state of the OLPC platform and XO hardware, as well as usability
and "fun factor" for the end users (presumably children aged 5-15).</p>
<h2>2.0 Interface design:</h2>
<p>The collection interface is designed to be very accessible to
children at low reading levels while also providing rich features for
readers. Books are organized and presented in topical "categories" so
that it is easy to find material on a topic of interest. The tools
support any number of categories, and the initial page of the interface
displays them (see fig. 1). Note: the initial category list page is NOT
generated by the tools; it is hand-edited to reflect what the tool
generates. However, the tool DOES output a category list that references
the generated pages and is a good starting point for creating the
category page. The icons displayed with each category are part of the
template design and must be developed for each new category added. The
RDC design team spent considerable effort crafting the icons to be
intuitive and to emulate the "look and feel" of the XO/Sugar OS
interface, so that it would be familiar and friendly to the users. The
sidebar widget is also hand-coded to reflect the tool output.</p>
<p>When a category is browsed, the titles are presented as icons derived
from a scan of the book cover or title page (see fig. 2). This reflects
the way children are attracted to books by colorful covers and
illustrations. The titles are displayed below the icons, and the author,
date, description and other "meta data" are displayed as a "tool tip"
when hovering over the icon. Clicking the icon opens the book for
reading, by various means depending on whether the laptop is being used
in an internet-connected or stand-alone environment. See "Interface
adaptability" below. While browsing a category, the user can select
another category using a navigation widget at the upper right of the
page.</p>
<h3>2.1 Interface adaptability:</h3>
<p>The interface adapts depending on whether the laptop it is used on
is connected to the internet. On a connected laptop, the user may read a
book by either a) using an embedded book reader, which displays the book
by downloading one JPG image of a page at a time, or b) downloading a
DJVU copy of the book in its entirety, which is then stored on the
system for reading at any later time, including when the laptop is not
connected. With option b) the Read activity displays the book.</p>
<P>When no internet connection is available, an alternative
interface is provided in which the books are stored on an attached
device such as a memory drive. In this case the interface allows the
books to be copied from the attached device and stored on the system as
DJVU files, which are read with the Read activity. Additional tools are
supplied to download the book and cover files for the collection, after
which they are copied to the attached storage device. See section 5.1
for details on how to create the attached storage solution. NOTE: DJVU
file support on the XO platform and in the Read activity is not fully
tested, and problems were observed with versions of Sugar earlier than
0.84. The attached storage solution was lightly tested using Sugar 0.84
running in a virtual machine in Sun VirtualBox and worked correctly.</P>
<p>Because of the adaptable interface, the collection is fully
accessible on any platform with a web browser. Furthermore, if a DJVU
reader such as Evince is installed, the books can be read offline on any
platform as well. Finally, although the current version requires some
minor code modification to support it, any book format can be supported
for download by changing filenames and links either at generation time
or afterward.</p>
<h2>3.0 gCI Design and Architecture:</h2>
<p>gCI is designed as a single Python program which uses a category
description file, a CSV file obtained from a search on an Internet
Archive collection, and an ID mapping file, along with several
templates, to generate an HTML interface file and a JavaScript file for
each category. These may be accessed from a server over the internet or
locally, depending on connectivity and/or attached storage availability.
The templates and data files are described below.</p>

<h3>3.1 Template Design</h3>
<p>The template consists of several html fragment files and included
javascript libraries. There are three "permanent" parts to the template:
headtmpl.html.tmpl, divtmpl.html.tmpl, and foottmpl.html.tmpl. These
files must be present in the directory where the tool is run.</p>
<h4>3.1.1 Header Template File</h4>
<p>The header template file consists of the HTML code for the final
output down to the div container for the book icons. The necessary
JavaScript libraries and CSS files are all specified in this section.
When the interface is installed on a server, the libraries and CSS files
must all be in the correct location. If the interface is installed on an
attached storage device, the libraries must be installed in the same
relative path structure. When the interface is generated, the $header
keyword in the template is replaced by the proper category name.</p>
<h4>3.1.2 Div Template File</h4>
<p>The divtmpl.html.tmpl file consists of a single div which will
contain the icon, the link to the book or bookreader, the title, and the
javascript code to invoke the tooltip when the icon is hovered over. For
each book in the category the keywords in the template are replaced with
the appropriate data which is taken from the input csv file. These are
as follows:</p>
<ul>
	<li>$idstr: the book id, used to complete the link to the book</li>
	<li>$iaclid: the IACL id, used as the DOM id of the link and to
	attach the tooltip</li>
	<li>$coverstr: the cover file name, used to display the cover image</li>
	<li>$title: the book title, displayed below the icon</li>
</ul>
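<p>The keyword replacement described above can be sketched with Python's
standard string.Template class, whose $name placeholder syntax matches
the keywords in the template. This is only an illustration: whether gCI
itself uses string.Template is an assumption, the template text below is
a plain-text stand-in for the real markup in divtmpl.html.tmpl, and the
book record is made up.</p>
<pre>
```python
from string import Template

# Stand-in for divtmpl.html.tmpl: the real file holds HTML markup, which is
# omitted here. Only the four keyword names come from the documentation.
DIV_TEMPLATE = Template(
    "link=$idstr id=$iaclid cover=$coverstr title=$title\n"
)


def render_book(idstr, iaclid, coverstr, title):
    """Fill the four template keywords for one book."""
    return DIV_TEMPLATE.substitute(
        idstr=idstr, iaclid=iaclid, coverstr=coverstr, title=title
    )


# Hypothetical book record, for illustration only.
entry = render_book("examplebook00auth", "iacl42", "cover42.jpg", "An Example Title")
print(entry)
```
</pre>
<p>The same approach would apply to the $header keyword in the header
template. Template.substitute raises KeyError when a keyword has no
value, which makes a missing field visible at generation time.</p>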
<h4>3.1.3 Footer Template File</h4>
<p>The footer template contains the code for the sidebar, and
terminating tags for the document.</p>
<h3>3.2 CSV Meta data search result file</h3>
<p>The input csv file supplies the metadata used for text
replacement of keywords in the div template to create links to the books
and the cover files, and also to sort the search results into categories
for the interface. In addition, tooltips describing the books are built
using the csv information and javascript.</p>
<P>The CSV file may be produced with the Advanced Search page supplied
by the Internet Archive. The search results used for the IACL collection
are included with the generator, but any other search results may be
used: to generate a different collection interface, run a different
search (ours was "collection=iacl"). When you set up a search, you
specify which fields to include in the output. When producing the CSV,
the fields listed below should be included IN THE ORDER SPECIFIED by
their field numbers (unused fields should be present but may be blank in
the CSV), or there will be problems when the interface is generated. The
numbering may seem strange: our search used more fields than the
generator needs, and this is the order in which they came out. You can
edit the result in a spreadsheet program to remove unneeded fields or
add missing ones.</P>
<ol>
	<li>Id (field 6): this field is assigned as the div id and is
	used to attach the tooltip; it must be unique.</li>
	<li>Title (field 8): this field is used as the title. It is
	displayed in full in the tooltip and truncated to eight words in the
	category display.</li>
	<li>Author (field 2): this field is displayed only in the tooltip.
	When setting up the IA search, specify "creator".</li>
	<li>Description (field 4): this field is displayed as "details" in
	the tooltip only.</li>
	<li>Subjects (field 7): this field is displayed in the tooltip. It
	is also searched for a match to an entry in the categories.txt input
	file during interface generation, and if it matches, the book is
	entered in that category's output file.</li>
</ol>
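<p>As a rough sketch of how the fields above might be read, the
following Python picks the five fields out of a CSV row by position and
truncates the title to eight words. Treating the field numbers as
1-based column positions is an assumption, the function names are
hypothetical, and the sample row is made up.</p>
<pre>
```python
import csv
import io

# Column positions from the list above, read here as 1-based (an assumption).
FIELDS = {"author": 2, "description": 4, "id": 6, "subjects": 7, "title": 8}


def parse_row(row):
    """Pick the five documented fields out of one CSV row."""
    return {name: row[pos - 1].strip() for name, pos in FIELDS.items()}


def short_title(title, max_words=8):
    """Truncate a title to eight words for the category display."""
    return " ".join(title.split()[:max_words])


# Made-up sample row; unused columns are present but blank.
SAMPLE = ',Jane Doe,,A short tale,,examplebook00doe,Animals; Fiction,"The very long title of a book about one small brave animal"\n'

books = [parse_row(row) for row in csv.reader(io.StringIO(SAMPLE))]
print(books[0]["id"], "|", short_title(books[0]["title"]))
```
</pre>
<p>Using the csv module rather than splitting on commas keeps quoted
fields intact, which matters for descriptions and subject lists that
contain commas themselves.</p>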
<h3>3.3 Categories file</h3>
<p>The category description file lists the categories to be
generated as HTML files for the collection interface. It is up to the
user to determine the list of categories and create the categories.txt
file. You must have a categories file even if it is empty. The format is
one category per line. We made ours by skimming through the subjects
field of the search output and listing most of what we saw there.
Anything not matched from categories.txt goes into a catchall file,
"other.html", so you will get some output no matter what. For every
category you need to create an icon to display in the sidebar widget,
and another to display on the opening page's list of categories.</p>
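<p>The sorting of books into categories can be illustrated with a short
Python sketch. The exact matching rule used by gCI is not documented
here, so case-insensitive substring matching is an assumption, as are
the function names and the sample data.</p>
<pre>
```python
def load_categories(text):
    """One category per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def categorize(subjects, categories):
    """Return every category whose name occurs in the subjects field.

    Case-insensitive substring matching is an assumption; books that match
    nothing fall into the catchall category.
    """
    lowered = subjects.lower()
    hits = [cat for cat in categories if cat.lower() in lowered]
    return hits or ["Other"]


categories = load_categories("Animals\nFairy tales\n\nPoetry\n")
print(categorize("Animals -- Juvenile fiction; Poetry", categories))
print(categorize("Arithmetic", categories))
```
</pre>
<p>With an empty categories file every book falls through to the
catchall, which is consistent with the note that some output is always
produced.</p>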

<h3>3.4 ID Mapping file</h3>
<p>The ID mapping file maps each Internet Archive identifier (IAID) to
the corresponding Open Library identifier (OLID). The mapping is needed
to display the book covers, which are retrieved through the Open Library
covers API: the IA identifier in the search result CSV file is mapped to
an OLID, and the OLID is used to build the link to the cover. The IACL
collection mapping was supplied to the project by sources at the Open
Library Project. To determine how to create such a mapping, refer to the
Open Library API documentation. The mapping file is formatted as
follows: &lt;OLID field&gt; &lt;IAID field&gt;, one entry per line. This
file should be named iaclBookList.txt and is included in the run
directory.</p>
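<p>A minimal sketch of the lookup, assuming the two fields on each line
are separated by whitespace. The URL pattern follows the publicly
documented form of the Open Library covers API; whether gCI builds its
links in exactly this form is an assumption, and the identifiers below
are made up.</p>
<pre>
```python
def load_id_map(text):
    """Map IA identifier to OLID from 'OLID IAID' lines."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            olid, iaid = parts
            mapping[iaid] = olid
    return mapping


def cover_url(iaid, mapping, size="M"):
    """Build a covers URL for a book, or None if the book is unmapped."""
    olid = mapping.get(iaid)
    if olid is None:
        return None
    return "https://covers.openlibrary.org/b/olid/%s-%s.jpg" % (olid, size)


# Made-up identifiers in the documented field order.
SAMPLE = "OL123M examplebook00doe\nOL456M anotherbook00roe\n"
id_map = load_id_map(SAMPLE)
print(cover_url("examplebook00doe", id_map))
```
</pre>
<p>Returning None for an unmapped book lets the caller decide whether to
skip the book or fall back to a placeholder image.</p>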

<h2>4.0 Installation</h2>
<p>Install the tools by downloading and unzipping the distribution
archive. It's recommended you do this in a dedicated directory, as many
files are output without checking whether they will clobber
anything.</p>
<h2>5.0 Usage</h2>
<p>As packaged, the tool generates the collection interface for
web-based access. To generate an attached storage-based collection
interface, see section 5.1 below. Before each run, delete the *.html and
*.js files left in the run directory by earlier runs, because the tool
appends to output files that already exist. The usage is: <br>
<br>
python genCollectionInterface.py &lt;csvfile&gt; &lt;categoryfile&gt;<br>
<br>
To check that the tool is working, you can make a run using our data as
follows: <br>
python genCollectionInterface.py search.csv categories.txt</p>
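<p>The cleanup step can be scripted. The sketch below is not part of the
gCI distribution; the function name is hypothetical, and the demo runs
in a scratch directory with stand-in file names so that nothing real is
deleted.</p>
<pre>
```python
import glob
import os
import tempfile


def clean_run_dir(run_dir):
    """Delete *.html and *.js files in run_dir; return the removed names."""
    removed = []
    for pattern in ("*.html", "*.js"):
        for path in glob.glob(os.path.join(run_dir, pattern)):
            os.remove(path)
            removed.append(os.path.basename(path))
    return sorted(removed)


# Demonstration in a scratch directory, so nothing real is deleted.
# The file names are made-up stand-ins for output and template files.
scratch = tempfile.mkdtemp()
for name in ("PoetryCategory.html", "PoetryToolTip.js", "headtmpl.html.tmpl"):
    open(os.path.join(scratch, name), "w").close()

removed = clean_run_dir(scratch)
remaining = sorted(os.listdir(scratch))
print(removed, remaining)
```
</pre>
<p>The template files survive because their names end in .tmpl, so the
*.html pattern does not match them. Any hand-edited HTML page kept in
the run directory would be deleted as well, so keep such pages
elsewhere.</p>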
<p>You'll see a series of messages as the files are parsed and output
is generated. When it's done, there will be a number of HTML and JS
files in the directory. The ones used for the interface are called
&lt;Category&gt;Category.html, for example "Adventure and
AdventurersCategory.html", or for JavaScript files,
&lt;Category&gt;ToolTip.js, for example "Adventure and
AdventurersToolTip.js". There's also a "catchall" category, "Other",
where anything not in categories.txt will go. Lastly, there's a
"categories.html", which is simply a list of links to the individual
category files and may be used as a starting point for building a
"category list" page, as that is not autogenerated.</p>

<h3>5.1 Generating the attached storage interface</h3>
<p>To generate the attached storage solution so that the collection
may be accessed without a web connection, run the tool with the
"-attached-storage" option. This generates output files whose links load
the books from the attached storage, which must be mounted on
/media/LIB. On the XO, this means that you are using a USB flash drive
for the attached storage and named the volume LIB when you created it.
The DJVU files must reside in the /djvu subdirectory on the attached
storage. The distribution archive includes a downloader tool, dl.py,
that downloads all the books and cover image files in the collection.
Once the downloads are complete, copy all the books to /media/LIB/djvu
and the covers to /media/LIB/covers.</p>
<p>This solution would probably work on many installations of Linux,
but you would very likely have to change the code to reflect the storage
naming conventions of other platforms, especially Windows, where paths
have drive letters.</p>
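<p>One way to make the storage location easier to change is to keep the
mount point in a single place. The sketch below is not taken from the
gCI source: the function names are hypothetical, naming each book file
after its identifier with a .djvu extension is an assumption, and the
covers directory is assumed to sit on the same volume as the books.</p>
<pre>
```python
import posixpath

MOUNT_POINT = "/media/LIB"  # mount point named in the documentation


def book_path(book_id, mount=MOUNT_POINT):
    """Path of a book on the attached storage (file naming is assumed)."""
    return posixpath.join(mount, "djvu", book_id + ".djvu")


def cover_path(cover_name, mount=MOUNT_POINT):
    """Path of a cover image on the attached storage."""
    return posixpath.join(mount, "covers", cover_name)


print(book_path("examplebook00doe"))
print(book_path("examplebook00doe", mount="/mnt/usb"))
```
</pre>
<p>Passing a different mount argument covers Linux systems that mount
removable media elsewhere. A Windows port would need drive-letter
handling beyond what this sketch shows.</p>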

</body>
</html>