chokitto

Chokitto (チョキっと) is a minimal Python library for extracting highlights and annotations from your Kindle eReader.

  • Create a neat overview of all your notes and highlights in Markdown or JSON
  • Export annotations from side-loaded documents and library books
  • Use filters to extract only the information you need (e.g. title('Book No \d+', 'regex'))
  • Deduplicate entries and merge matching highlights and notes

Store your annotations in a unified and cleaner way for future reference, book clubs and literature reviews. 📚


Installation

Chokitto is written in Python 3 and only uses the standard library. The commands below assume that python invokes a Python 3 interpreter. Installation merely involves cloning this repository:

git clone https://github.com/personads/chokitto
cd chokitto

Usage

By default, chokitto requires the path to the clippings file (e.g. /documents/My Clippings.txt on Kindle). It is then parsed, optionally filtered and exported (default: Markdown) to standard output:

python chokitto.py path/to/clippings
# for instance, a Kindle connected to a Mac
python chokitto.py "/Volumes/Kindle/documents/My Clippings.txt"

This will produce a Markdown document with all documents and their clippings sorted by type and location. The output can be written to a file by using a pipe or the -o / --output argument:

python chokitto.py path/to/clippings > path/to/output.md
python chokitto.py path/to/clippings -o path/to/output.md

The -v / --verbose option can be used to print additional parsing and filtering information. It is best used together with a pre-specified output file.

python chokitto.py path/to/clippings -o path/to/output.md -v

If you just want to take a quick look at which documents the clippings file contains, use the -ls / --list option.

python chokitto.py path/to/clippings -ls

Chokitto will then parse the data and output an alphabetically sorted list of documents before exiting.

Documents (42 total):
  <Document: "A Great Book" by "Lastname, Name", 6 clippings>
  <Document: "Another Great Book" by "Lastname, Name", 12 clippings>
  <Document: "Unauthored Document", 5 clippings>
  ...

For additional information regarding basic usage, please refer to the help text which can be accessed using the -h / --help flag.

python chokitto.py -h

Parsers

Currently, only the KindleParser is available and enabled by default. It processes the My Clippings.txt file which contains the (slightly chaotic) highlights, annotations and bookmarks made in eBooks, PDFs and other documents on the eReader.

The parser can be explicitly specified by using the -p / --parsers argument:

python chokitto.py path/to/clippings -p "kindle"

The library itself is written to accommodate any kind of parser which returns documents and clippings, so we hope to extend it in the future.
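
To give an idea of what a parser has to untangle, here is a minimal sketch that is not chokitto's actual KindleParser. It assumes the commonly seen My Clippings.txt layout: entries separated by a line of ten equals signs, a "Title (Author)" line, a metadata line starting with "- Your ...", a blank line and then the content. The function name parse_clippings and the dictionary keys are hypothetical.

```python
import re

SEPARATOR = "=========="  # assumed delimiter between entries


def parse_clippings(text):
    """Split raw clippings text into a list of plain dictionaries."""
    entries = []
    for block in text.lstrip("\ufeff").split(SEPARATOR):
        lines = [line.strip() for line in block.strip().splitlines()]
        if len(lines) < 2:
            continue  # skip empty or truncated blocks
        # first line: "Title (Author)", where the author part is optional
        head = re.match(r"^(?P<title>.*?)(?: \((?P<author>[^()]*)\))?$", lines[0])
        # second line (assumed): "- Your Highlight on page 25 | Location 1602-1603 | Added on ..."
        kind = re.match(r"- Your (\w+)", lines[1])
        if kind is None:
            continue  # not a metadata line we recognise
        page = re.search(r"page (\d+)", lines[1])
        location = re.search(r"Location (\d+)(?:-(\d+))?", lines[1])
        entries.append({
            "title": head.group("title"),
            "author": head.group("author"),
            "type": kind.group(1).lower(),
            "page": int(page.group(1)) if page else None,
            "location": tuple(int(n) for n in location.groups() if n) if location else None,
            "content": "\n".join(lines[2:]).strip() or None,
        })
    return entries
```

Any parser that yields documents and clippings in a comparable shape could be handled by the same filtering and exporting steps.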

Merging

Kindle's default behavior is to write every clipping action to the My Clippings.txt file. This means that changing the span of a highlight will produce two entries in the file with different lengths. Furthermore, notes which are added to a highlighted section are stored as separate entries and can be difficult to match and find.

By using the -m / --merge option, chokitto can attempt to remove duplicate entries and reconnect separated highlights and notes:

python chokitto.py path/to/clippings -m

This will produce merged clippings such as highlight+note which appear in the output as follows:

### Page 42, Location 4649-4650

>[Highlight] We are making a point here.

>[Note] They have a point.

Added around 2020-01-01 10:13:17.
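
The matching rules chokitto applies are not spelled out here, so the following is only a rough sketch of the idea: a note is attached to the first highlight whose location range contains the note's location. The function name merge_highlights_and_notes and the dictionary layout (a "location" given as a (start, end) tuple) are assumptions made for illustration.

```python
def merge_highlights_and_notes(clippings):
    """Attach each note to the first highlight whose location range contains it."""
    notes = [c for c in clippings if c["type"] == "note"]
    matched = set()  # indices of notes already attached to a highlight
    result = []
    for clip in clippings:
        if clip["type"] == "note":
            continue  # notes are handled via their highlights
        if clip["type"] == "highlight":
            start, end = clip["location"]
            for i, note in enumerate(notes):
                if i not in matched and start <= note["location"][0] <= end:
                    matched.add(i)
                    # build a new merged clipping instead of mutating the input
                    clip = {
                        **clip,
                        "type": "highlight+note",
                        "content": [clip["content"], note["content"]],
                    }
                    break
        result.append(clip)
    # notes without a matching highlight are kept as they are
    result.extend(n for i, n in enumerate(notes) if i not in matched)
    return result
```

Because Kindle locations are coarse, a real implementation would likely need extra checks (for instance on timestamps) to avoid false matches.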

Filters

Filters can be used to specify which documents and clippings to include in the output. They are specified using the filter('arg', 'arg') syntax or simply as filter if there are no arguments or if they are left at their default values. Any number of them can be combined using the -f / --filters option:

python chokitto.py path/to/clippings -f \
"title('One Great Book')" \
"type('highlight')" \
"after('2020-01-01 00:00:00')"

This will produce output which only includes highlights from "One Great Book" which were made after the beginning of 2020.
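
For the curious, a specification such as title('One Great Book', 'regex') can be split into a name and its arguments with a few lines of Python. This is a simplified sketch and not necessarily how chokitto parses its arguments: the helper name parse_spec is hypothetical, and it only understands single-quoted arguments without embedded single quotes.

```python
import re


def parse_spec(spec):
    """Split "name('arg', 'arg')" or a bare "name" into (name, args)."""
    match = re.fullmatch(r"\s*(\w+)\s*(?:\((.*)\))?\s*", spec, re.DOTALL)
    if match is None:
        raise ValueError(f"malformed specification: {spec!r}")
    name, raw_args = match.groups()
    if raw_args is None:
        return name, ()  # bare name, all arguments stay at their defaults
    # collect every single-quoted argument; backslashes are kept verbatim,
    # which is convenient for regular expressions
    return name, tuple(re.findall(r"'([^']*)'", raw_args))
```

The same shape would also cover exporter specifications such as markdown('') or a bare json.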

Filter by String

String filters can be applied to document titles and authors as well as to clipping types. They follow the syntax filter('Exact Match') and can be used together with regular expressions such as filter('One Great (Book|Document) \d+', 'regex').

Filtering by Document Title is done using title('Title'), e.g.:

python chokitto.py path/to/clippings -f "title('One Great Book')"
# filter for an entire series
python chokitto.py path/to/clippings -f "title('^Book No\. \d+', 'regex')"

Filtering by Document Author is done using author('Author'), e.g.:

python chokitto.py path/to/clippings -f "author('That Author')"
# filter for a family of authors
python chokitto.py path/to/clippings -f "author('Lastname, .+', 'regex')"

Filtering by Clipping Type is done using type('type'), e.g.:

python chokitto.py path/to/clippings -f "type('highlight')"
# use '+' to filter for merged types (remember to merge!)
python chokitto.py path/to/clippings -m -f "type('highlight+note')"
# use regular expressions to filter for multiple types
python chokitto.py path/to/clippings -f "type('(bookmark|note)', 'regex')"
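
The difference between the two matching modes can be sketched as follows. The function name matches is hypothetical, and the use of re.search for the regex mode is an assumption (it would explain why the series example above anchors its pattern with ^); chokitto's own implementation may differ.

```python
import re


def matches(value, pattern, mode="exact"):
    """Return True if value satisfies pattern under the given mode."""
    if mode == "regex":
        # assumed semantics: the pattern may match anywhere unless anchored
        return re.search(pattern, value) is not None
    return value == pattern


titles = ["Book No. 1", "Book No. 2", "One Great Book"]
series = [t for t in titles if matches(t, r"^Book No\. \d+", "regex")]
```

Here series ends up holding the two "Book No." titles, while an exact filter for 'One Great Book' would only keep the third.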

Filter by Date and Time

Date filters can be useful for exporting more recent or older clippings depending on the time and date they were created. They follow the syntax filter('yyyy-mm-dd hh:mm:ss').

# only return clippings created after a certain date
python chokitto.py path/to/clippings -f "after('2020-01-01 00:00:00')"
# only return clippings created before a certain date
python chokitto.py path/to/clippings -f "before('2020-01-01 00:00:00')"
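
In Python terms, the timestamp syntax above corresponds to the strptime format "%Y-%m-%d %H:%M:%S". The sketch below shows how such a comparison could look; the helper names are hypothetical, and whether chokitto treats the boundary itself as included is not stated here, so strict comparisons are an assumption.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd hh:mm:ss


def is_after(created, threshold):
    """True if the clipping was created strictly after the threshold string."""
    return created > datetime.strptime(threshold, DATE_FORMAT)


def is_before(created, threshold):
    """True if the clipping was created strictly before the threshold string."""
    return created < datetime.strptime(threshold, DATE_FORMAT)
```

A malformed timestamp such as '2020-01-01' without a time part raises a ValueError under this format, so the full form is needed in this sketch.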

Exporters

Exporters handle the formatting of the output. They are specified using the syntax exporter or exporter('arg', 'arg') if you want to change the default arguments. The default exporter is Markdown and it can be changed using the -e / --exporter option:

python chokitto.py path/to/clippings -e "markdown"

Markdown

The Markdown exporter will produce a document split into "# root → ## document → ### clipping type → #### clipping" sorted by location. If the output contains only a single document, the hierarchy shifts up one heading.

# One Great Book

Lastname, Name

## Bookmarks

### Page 11, Location 48

## Highlights

### Page 25, Location 1602-1603

> This part was especially interesting.

Added on 2020-01-01 9:41:53.

## Highlights + Notes

### Page 42, Location 4649-4650

>[Highlight] We are making a point here.

>[Note] They have a point.

Added around 2020-01-01 10:13:17.

If you would like to change the date formatting or omit it entirely, there's an argument for that:

python chokitto.py path/to/clippings -e "markdown('%m.%d at %H:%M')"
# omit the timestamp entirely
python chokitto.py path/to/clippings -e "markdown('')"
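
The format argument uses directives that look like those of Python's datetime.strftime. Assuming that is how it is interpreted, you can preview a format string before running an export. The helper name format_timestamp is hypothetical:

```python
from datetime import datetime


def format_timestamp(created, date_format):
    """Render a timestamp, or return None when the format is empty."""
    if not date_format:
        return None  # an empty format means: omit the timestamp
    return created.strftime(date_format)


preview = format_timestamp(datetime(2020, 1, 1, 10, 13, 17), "%m.%d at %H:%M")
# preview == "01.01 at 10:13"
```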

JSON

The JSON exporter will produce a list of document objects containing a list of clipping objects. If the output contains only a single document, the document object is returned directly.

python chokitto.py path/to/clippings -e "json"

This produces an output akin to:

[
    {
        "title": "One Great Book",
        "author": "Lastname, Name",
        "clippings": [
            {
                "type": "bookmark",
                "page": 11,
                "location": 48,
                "datetime": "2020-01-01 8:20:12",
                "content": null
            },
            {
                "type": "highlight",
                "page": 25,
                "location": [1602, 1603],
                "datetime": "2020-01-01 9:41:53",
                "content": "This part was especially interesting."
            }
        ]
    },
    {
        "title": "One More Great Book",
        "author": "Lastname, Name",
        "clippings": [
            {
                "type": "highlight+note",
                "page": 42,
                "location": [4649, 4650],
                "datetime": "2020-01-01 10:13:17",
                "content": [
                    "[highlight] We are making a point here.",
                    "[note] They have a point."
                ]
            }
        ]
    }
]

Similarly to the Markdown exporter, it is possible to change the date formatting or omit it entirely:

python chokitto.py path/to/clippings -e "json('%m.%d at %H:%M')"
# omit the timestamp entirely
python chokitto.py path/to/clippings -e "json('')"
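
Since the JSON output is either a list of documents or a single document object, code that consumes it should handle both shapes. A minimal sketch using only the standard library follows; the helper names are hypothetical, and the keys are the ones shown in the example output above.

```python
import json


def load_documents(raw_json):
    """Parse exported JSON and always return a list of document objects."""
    data = json.loads(raw_json)
    # a single document is exported as a bare object, so wrap it
    return [data] if isinstance(data, dict) else list(data)


def count_by_type(documents):
    """Count clippings per type across all documents."""
    counts = {}
    for document in documents:
        for clipping in document.get("clippings", []):
            counts[clipping["type"]] = counts.get(clipping["type"], 0) + 1
    return counts
```

For example, after exporting with -o path/to/output.json, the file contents can be read and passed to load_documents, and count_by_type then gives a quick overview of how many bookmarks, highlights and notes were exported.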

PDFMerger (Experimental)

The PDFMergeExporter attempts to merge highlights and notes with a corresponding PDF document. This is especially useful for research papers.

As this involves some more advanced PDF parsing, it requires the installation of the MuPDF toolkit as well as its Python bindings.

# install MuPDF using your package manager of choice, e.g.:
brew install mupdf
# then install the python binding using pip
pip install PyMuPDF

Experimental Caveats

  • Only one document at a time can be merged, so please find it first using -ls and specify a filter -f which will only return the document in question.
  • Clippings from PDFs only provide vague locations, so matching highlights to the original document will work better the more specific the text is.
  • For the same reason it is difficult to match notes with the correct highlights. Chokitto will merge all potential matches so please remove the incorrect ones from the final output document.

To use the PDFMerger, specify pdfmerge as the exporter -e along with the path to the original PDF document (e.g. the file on Kindle). Use filters -f to retrieve this single PDF's clippings. The output will be printed to standard output or to the file specified in -o (recommended).

# find the unique document title using -ls
python chokitto.py path/to/clippings.txt -ls
# the recommended method for merging clippings and PDFs
python chokitto.py path/to/clippings.txt -v -m -f "title('pdf-title')" -e "pdfmerge('path/to/pdf-title.pdf')" -o path/to/output.pdf
# for example, use the data from a connected Kindle
python chokitto.py "/Volumes/Kindle/documents/My Clippings.txt" -v -m -f "title('pdf-title')" -e "pdfmerge('/Volumes/Kindle/documents/pdf-title.pdf')" -o path/to/output.pdf
# or pipe the output directly to disk
python chokitto.py path/to/clippings.txt -m -f "title('pdf-title')" -e "pdfmerge('path/to/pdf-title.pdf')" > path/to/output.pdf

The output will be the original PDF document plus highlights in yellow and corresponding text bubble style annotations.
