first PRS T1 -> Okular bridge

1 parent d7eaa95 commit bef56e1779dcf53e8a26b3c0b03b426e3ce87a74 @nichtich committed Jan 2, 2012
Showing with 281 additions and 99 deletions.
  1. +80 −0 README.md
  2. +0 −32 directories.md
  3. +118 −65 prst1.pl
  4. +82 −0 share/notepad2okular.xsl
  5. +1 −2 share/notepad2svg.xsl
README.md
@@ -0,0 +1,80 @@
+# Introduction
+
+This code repository contains some documentation and scripts to make use of the
+Sony PRS T1 eBook reader on Linux. The device runs Android and it can be rooted,
+but you can also just modify its local storage, which is visible via USB.
+
+# Motivation
+
+First, I wanted to understand how my eReader manages eBooks and annotations.
+Second, I want to read and annotate (!) books on the device and later export
+the annotations in an open format. Unfortunately the situation for annotations
+is even worse than for eBooks. The de-facto standards for eBooks are EPUB and
+PDF (plus Amazon's own prison format, which I prefer to ignore). For
+annotations *there is no standard format*.
+
+# Annotation formats
+
+Some software, such as Okular and Mendeley, stores annotations in separate files.
+Other software stores annotations inside the book files themselves. The PRS T1
+stores annotations as SVG with metadata in SQLite3, so they can be extracted and
+transformed into other formats.
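+
+For example, once the database has been copied to a local directory (see below),
+the markup metadata can be listed with sqlite3. The table and column names here
+are the ones used by `prst1.pl`; the exact layout may differ between firmware
+versions:
+
+    $ sqlite3 database/books.db \
+        "SELECT content_id, page, file1 FROM markups WHERE markup_type = 20"
+
+Each `file1` value is a path, relative to the device root, to one markup file.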
+
+# Current state of this software
+
+Don't expect this project to become a real application. Right now there is only
+one command-line script, `prst1.pl`, to copy and transform books, notes and
+annotations from the device to a local directory. You may first need to create
+some directories (`database`, `notepads`, `download`, `markup`, ...); see the
+example below.
+
+ $ ./prst1.pl -?
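+
+A minimal session might look like this (just a sketch; the directory names are
+the ones the script writes to, and the `-from` path depends on where the reader
+is mounted):
+
+    $ mkdir database notepads books download markup
+    $ ./prst1.pl -from /media/READER database notepads books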
+
+A future idea would be to hook into calibre, but I prefer not to stick to one
+particular piece of software and rather work with general data formats.
+
+# Visible directory structure of SONY PRS-T1
+
+ READER
+ |-- Sony_Reader
+ | |-- database
+ | | |-- cache
+ | | | |-- books
+ | | | |-- x (numeric book identifier)
+ | | | |-- thumbnail
+ | | | |-- markup
+ | | |-- sync
+ | |-- media
+ | | |-- audio
+ | | |-- images
+ | | |-- notepads
+ | | |-- books
+ | |-- data
+ | |-- albumthumbs
+ |-- download
+
+The `database` directory contains several SQLite3 files (`.db`) and
+a cache.
+
+The `cache` directory contains one folder for each book, numbered
+(1, 2, 3, ...). For each book there is a `thumbnail` directory with
+images, at least the cover image `main_thumbnail.jpg`. The
+`markup` directory contains a `.png` file and a `.svg` file for
+each markup in a book (not including highlights).
+
+The `media` directory contains the actual eBooks, notes, audio files,
+and images. Media files are referenced by filename in the SQLite3
+databases, so don't just rename them!
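+
+For example, the `books` table stores both the file name and the path relative
+to the device root (column names as used by `prst1.pl`; treat this as a sketch):
+
+    $ sqlite3 database/books.db "SELECT _id, file_name, file_path FROM books"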
+
+# Database structure
+
+Have a look at the `dbschemas` directory or browse around the database with
+SQLite3:
+
+ $ ./prst1.pl database
+ $ sqlite3 database/books.db
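+
+Inside the sqlite3 shell, the usual dot commands give a quick overview of the
+schema (a generic sqlite3 session, nothing specific to this project):
+
+    sqlite> .tables
+    sqlite> .schema markups
+    sqlite> SELECT * FROM books LIMIT 1;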
+
+# License
+
+Right now everything in this repository is public domain. Feel free to fork it
+and publish under other licenses as well.
+
directories.md
@@ -1,32 +0,0 @@
-# Visbile directory structure of SONY PRS-T1
-
- READER
- |-- Sony_Reader
- | |-- database
- | | |-- cache
- | | | |-- books
- | | | |-- xxx
- | | | |-- thumbnail
- | | | |-- markup
- | | |-- sync
- | |-- media
- | | |-- audio
- | | |-- images
- | | |-- notepads
- | | |-- books
- | |-- data
- | |-- albumthumbs
- |-- download
-
-The `database` directory contains several SQLite3 files (`.db`) and
-a cache.
-
-The `cache` directory contains one folder for each book, numbered
-(1,2,3...). For each book there is a `thumbnail` directory with
-images, at least with the cover image `main_thumbnail.jpg`. The
-`markup` directory contains a `.png` file and a `.svg` file for
-each markup in a book (not including highlightings).
-
-The `media` directory contains actual ebooks, notes, audio-files,
-and images. Media files are referenced by filename in the sqlite3
-databases, so don't just rename them!
prst1.pl
@@ -10,28 +10,27 @@
use Getopt::Long;
use File::Basename;
use DBI;
+use Cwd qw(abs_path);
use Data::Dumper;
use Data::Tabular::Dumper;
use File::ShareDir;
+use IPC::Run qw(run);
# Get command line options and check environment
-my ($help,$man,$from,$to,$opt_database,$opt_notepads,$opt_books);
+my ($help,$man,$from,$to);
GetOptions(
'from:s' => \$from,
'to:s' => \$to,
- 'notepads' => \$opt_notepads,
- 'books' => \$opt_books,
- 'database' => \$opt_database,
'help|?' => \$help,
'man' => \$man,
) or pod2usage(2);
pod2usage(1) if $help;
pod2usage(-verbose => 2) if $man;
-if (@ARGV) {
- $opt_notepads = 1 if ( grep { $_ =~ /^notepads$/ } @ARGV );
- $opt_books = 1 if ( grep { $_ =~ /^books$/ } @ARGV );
- $opt_database = 1 if ( grep { $_ =~ /^database$/ } @ARGV );
+my %cmd = map { $_ => undef } qw(notepads books markups database okular);
+$cmd{database} = 1; # always needed
+foreach (@ARGV) {
+ $cmd{$_} = 1 if exists($cmd{$_});
}
checkdir( $from => '/media/READER' );
@@ -40,9 +39,11 @@
my $xsltproc = `which xsltproc` or fail('missing xsltproc');
my $notepad2svg = $App::Marginalia::SHAREDIR."/notepad2svg.xsl";
-r $notepad2svg or die "Missing $notepad2svg";
+my $notepad2okular = $App::Marginalia::SHAREDIR."/notepad2okular.xsl";
+-r $notepad2okular or die "Missing $notepad2okular";
# download and convert notepads
-if ($opt_notepads) {
+if ($cmd{notepads}) {
my $note_from = "$from/Sony_Reader/media/notepads";
my $note_to = "$to/notepads";
checkdir( $note_from );
@@ -51,6 +52,7 @@
print "notepads...\n";
foreach my $note_file (<$note_from/*.note>) {
my $id = basename($note_file,'.note');
+ system('cp', $note_file, "$note_to/" );
if ( system("xsltproc", "-o", "$note_to/$id.svg", $notepad2svg, "$note_file") ) {
print "processing $note_file with xslt failed: $?\n";
exit 2;
@@ -65,65 +67,116 @@
my $db_to = "$to/database";
checkdir($db_from);
-if ($opt_database) {
- print "databases...\n";
- `cp $db_from/*.db $db_to`;
- # `cp $db_from/sync/*.db $db_to/sync`;
+if ($cmd{database}) {
+ print "databases...\n";
+ `cp $db_from/*.db $db_to`;
+ # `cp $db_from/sync/*.db $db_to/sync`;
}
+# we always at least need some information about books
print "books...\n";
my %books;
my $dbh = dbconnect('books.db');
my $res = $dbh->selectall_hashref("SELECT * FROM books", 1);
foreach my $id ( sort { $a <=> $b } keys %$res ) {
- my $row = $res->{$id};
+ my $row = $res->{$id};
# print Dumper($row);
- my $from_file = $row->{file_path};
- my $filename = basename($from_file); # == $row->{file_name};
- my $to_file;
- if ($from_file =~ qr{^Sony_Reader/media/books/}) {
- $to_file = "books/$filename";
- } elsif ($from_file =~ qr{^download/} ) {
- $to_file = "download/$filename";
- } else {
- print STDERR "Skipping book $id with unknown location $from_file\n";
- }
-
- if ( $opt_books ) {
- $from_file = "$from/$from_file";
- system('cp',$from_file,$to_file);
- }
- print "$id,$filename\n"; # TODO: books.csv
-
- $books{$id} = {
- to_file => $to_file,
- };
- #print join(",", map { $row->{$_} } qw(_id author title file_path), ) . "\n";
- # thumbnail may be interesting too
+ my $from_file = $row->{file_path};
+ my $filename = basename($from_file); # == $row->{file_name};
+ my $to_file;
+ if ($from_file =~ qr{^Sony_Reader/media/books/}) {
+ $to_file = "books/$filename";
+ } elsif ($from_file =~ qr{^download/} ) {
+ $to_file = "download/$filename";
+ } else {
+ print STDERR "Skipping book $id with unknown location $from_file\n";
+ }
+
+ if ( $cmd{books} ) {
+ $from_file = "$from/$from_file";
+ system('cp',$from_file,$to_file);
+ }
+ # TODO: save to books.csv
+ print "$id,$filename\n";
+
+ $books{$id} = {
+ to_file => $to_file,
+ # TODO: thumbnail may be interesting too
+ };
}
-print "markups...\n";
-$res = $dbh->selectall_hashref("SELECT * FROM markups", 1);
-foreach my $id ( sort { $a <=> $b } keys %$res ) {
- my $row = $res->{$id};
-# print Dumper($row);
-
- my $book_id = $row->{content_id};
- my $page = int($row->{page} + 0.5);
- my $file = $row->{file1};
- my $type = $row->{markup_type};
- my $name = $row->{name};
- if ($type == 20) {
- print "book $book_id page $page\n";
- -d "$to/markup/$book_id" or `mkdir -p $to/markup/$book_id`;
- system('cp',"$from/$file","markup/$book_id/");
- my $filename = basename($file);
- print "book $book_id page $page: $filename\n";
- } else {
- print "ignoring markup type $type for book $book_id page $page\n";
- }
+if ($cmd{markups}) {
+ print "markups...\n";
+ $res = $dbh->selectall_hashref("SELECT * FROM markups", 1);
+ foreach my $id ( sort { $a <=> $b } keys %$res ) {
+ my $row = $res->{$id};
+ # print Dumper($row);
+
+ my $book_id = $row->{content_id};
+ my $page = int($row->{page} + 0.5);
+ my $file = $row->{file1};
+ my $type = $row->{markup_type};
+ my $name = $row->{name};
+ if ($type == 20) {
+ -d "$to/markup/$book_id" or `mkdir -p $to/markup/$book_id`;
+ system('cp',"$from/$file","markup/$book_id/");
+ my $filename = basename($file);
+ print "book $book_id page $page: $filename\n";
+ } else {
+ print "ignoring markup type $type for book $book_id page $page\n";
+ }
+ }
+}
+
+if ( $cmd{okular} ) {
+ print "okular...\n";
+ my $kdeprefix = `kde4-config --localprefix`;
+ $kdeprefix =~ s{/?\n$}{}m;
+ my $docdata = "$kdeprefix/share/apps/okular/docdata";
+ checkdir($docdata);
+
+ my $book_ids = $dbh->selectall_arrayref("SELECT DISTINCT content_id FROM markups WHERE markup_type = 20");
+ $book_ids = [ map { 1*$_->[0] } @$book_ids ];
+ foreach my $book_id (@$book_ids) {
+ my $book_url = abs_path($books{$book_id}->{to_file});
+ my $book_size = -s $book_url;
+ my $book_filename = basename($book_url); # TODO: distinguish download/books/other
+ my $outfile = "$docdata/$book_size.$book_filename.xml";
+
+ open (OKFILE, ">", $outfile);
+ print "$outfile\n";
+
+ my $sql = "SELECT page, added_date, file1 FROM markups WHERE markup_type=20 AND content_id=$book_id ORDER BY page";
+ $res = $dbh->selectall_arrayref($sql);
+
+ my $cur_page;
+print OKFILE <<XML;
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE documentInfo>
+<documentInfo url="$book_url">
+ <pageList>
+XML
+ foreach my $row (@$res) {
+ my $page = int($row->[0]+0.5);
+ if (!defined $cur_page or $cur_page ne $page) {
+ print OKFILE " </page>\n" if defined $cur_page;
+ print OKFILE " <page number='".($page-1)."'>\n";
+ $cur_page = $page;
+ }
+ my $created = $row->[1];
+ my $file = "$from/$row->[2]";
+ # TODO: catch xslt errors
+ # TODO: stringparam created $created
+ #
+ run ["xsltproc", $notepad2okular, $file], ">>", \*OKFILE;
+ }
+ # additional information (history and current viewport) omitted (TODO)
+ print OKFILE " </page>\n" if defined $cur_page;
+ print OKFILE " </pageList>\n</documentInfo>\n";
+ close OKFILE;
+ }
}
### some handy functions
@@ -160,19 +213,21 @@ =head1 SYNOPSIS
Options [and default values]:
-from DIR base directory of eReader [/media/READER]
-to DIR base directory of target [current directory]
- -notepads convert and copy all notepads to target
- -books copy all boks to target
-help|-? brief help message
-man full documentation
Commands:
- notepads
- books
+ books copy all books to target
+ database copy all SQLite3 databases to target
+ notepads convert and copy all notepads to target
+ markups copy all markups to target
+ okular add annotations file for okular PDF reader
=head1 DESCRIPTION
This command line script loads some information from a Sony PRS T1 eReader
-device.
+device for further processing. See L<https://github.com/nichtich/sony-prs-t1>
+for more documentation and source code.
=head1 OPTIONS
@@ -184,10 +239,8 @@ =head1 OPTIONS
=item B<-to>
-Base directory to write information to. Defaults to current directory.
-
-=item B<-notepads>
+Base directory to write information to. Defaults to the current directory.
-Get notepads.
+=back
=cut