Use Skim to extract notes and highlighted text #77

Open
jlegewie opened this issue Feb 20, 2013 · 7 comments
Comments

@jlegewie
Owner

Skim has command line tools that a) convert embedded PDF annotations to Skim notes and b) write Skim notes from a PDF file to a .skim file. Calling these two command line tools should make it relatively easy to extract PDF annotations and highlighted text using Skim. However, this solution works only on Mac.

http://sourceforge.net/apps/mediawiki/skim-app/index.php?title=Interaction_with_Skim#SkimNotes_Command_Line_Tool

The popplerExtractorCall function in zotfile.js calls an external binary.

@jacob-long

Is there anything I can do to help you with this, short of learning js myself?

@jlegewie
Owner Author

jlegewie commented Apr 9, 2015

I am not going to work on this anytime soon, but you can give it a shot if you want. I would start by checking out the Skim command line tools. What are the steps to extract annotations with the Skim command line tools (ignoring zotfile for now)? Once that works, I can give you some hints about adding it to zotfile.

@jacob-long

Okay, I finally found some time to tinker. This method requires running Terminal from the directory where the skimnotes folder is located, which, as far as I can tell, has to be downloaded separately from Skim.app. Link to download page

Using Skim's wiki as a guide, here is how I was able to get text extracted. The following are shell commands.

./skimnotes -format text example.pdf

This creates a text file in the same folder as example.pdf with notes/highlights. The *.txt file is formatted like this:

* Highlight, page 4
Some highlighted text.

* Highlight, page 8
Some other highlighted text.
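
For anyone picking this up on the zotfile side, here is a minimal JavaScript sketch of how that text output could be parsed. It assumes every note starts with a header line of the form `* <Type>, page <N>`, as in the sample above; other note types or layouts have not been checked. The function name `parseSkimNotesText` is made up for illustration and is not part of zotfile or Skim.

```javascript
// Hypothetical helper: turn the text produced by `skimnotes -format text`
// into an array of { type, page, text } objects.
// Assumption: each note begins with a "* <Type>, page <N>" header line.
function parseSkimNotesText(text) {
  const notes = [];
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    const header = /^\* (.+), page (\d+)\s*$/.exec(line);
    if (header) {
      // A header line starts a new note.
      current = { type: header[1], page: Number(header[2]), lines: [] };
      notes.push(current);
    } else if (current && line.trim() !== "") {
      // Non-empty lines after a header belong to that note.
      current.lines.push(line.trim());
    }
  }
  // Join multi-line note bodies with single spaces.
  return notes.map(({ type, page, lines }) => ({
    type,
    page,
    text: lines.join(" "),
  }));
}

const sample = [
  "* Highlight, page 4",
  "Some highlighted text.",
  "",
  "* Highlight, page 8",
  "Some other highlighted text.",
].join("\n");

console.log(parseSkimNotesText(sample));
```

One limitation of this sketch: a highlighted line that itself looks like a header (for example, a bullet ending in ", page 3") would be misread as the start of a new note.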

Alternately, it can output to rtf:

./skimnotes -format rtf example.pdf 

Another option is to specify the output file name and location after the source file:

./skimnotes -format text example.pdf /users/johndoe/desktop/output.txt

An entirely different approach, probably less ideal (setting aside whatever advantages it may or may not offer for implementation), is possible via skimpdf. It comes with skimnotes, so both tools are always available together. One function of skimpdf is embed, which takes the annotations from the source file and converts the file to a "normal" PDF with annotations.

Example:

./skimpdf embed example.pdf example-embed.pdf

After performing this operation, ZotFile was able to extract annotations from example-embed.pdf. With that said, example-embed.pdf is not an ideal file if the user plans to use Skim to work with it again, since the Skim notes are no longer present. If this approach is easier to implement, though, you could potentially have ZotFile create a temporary embedded PDF, extract annotations, then discard the embedded PDF. The end user wouldn't see any of this happening.
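
A rough JavaScript sketch of that temporary-file flow, to make the idea concrete. Everything passed in through `tools` is hypothetical: `run`, `extract`, and `remove` are placeholders for whatever zotfile would actually use to call a binary, extract annotations, and delete a file, and they are assumed here to be blocking calls. The default `./skimpdf` path and the `-embed.pdf` suffix simply mirror the shell example above.

```javascript
// Sketch: embed Skim notes into a temporary PDF, extract, then discard it.
// `tools` is a hypothetical object supplied by the caller (not a ZotFile API):
//   tools.run(command, args) runs an external binary and blocks until done
//   tools.extract(pdfPath)   returns the annotations found in a PDF
//   tools.remove(path)       deletes a file
function extractViaTempEmbed(pdfPath, tools, skimpdfPath = "./skimpdf") {
  // example.pdf -> example-embed.pdf, as in the shell example.
  const tempPath = pdfPath.replace(/\.pdf$/i, "") + "-embed.pdf";
  tools.run(skimpdfPath, ["embed", pdfPath, tempPath]);
  try {
    return tools.extract(tempPath);
  } finally {
    // Discard the embedded copy even if extraction throws,
    // so the user never sees the temporary file.
    tools.remove(tempPath);
  }
}

// Usage with stand-in tools that only log what they would do.
const loggingTools = {
  run: (command, args) => console.log("run:", command, args.join(" ")),
  extract: (path) => {
    console.log("extract from:", path);
    return [];
  },
  remove: (path) => console.log("remove:", path),
};
extractViaTempEmbed("example.pdf", loggingTools);
```

The original PDF is never modified in this flow, so the Skim notes stay intact for later editing in Skim.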

@jlegewie
Owner Author

Looks like good progress. Implementing the whole thing in zotfile would require quite some work in JavaScript. Is that something you want to look at?

@shippy

shippy commented Aug 5, 2015

I have lots of Skim annotations, a fondness for Zotero, and JS knowledge sitting idle. I can't seem to find any guide for Zotfile contributors / desired PR format -- @jlegewie, do you think you could point me where I should start, please?

@jlegewie
Owner Author

jlegewie commented Aug 5, 2015

I don't really have a desired PR format, but I try to guide contributors through the code so that their contributions fit in. jlongrc made some progress on using Skim to extract annotations (see above). You should take a look at the pdfAnnotations class for some guidance on the zotfile side. Currently, the extensions.zotfile.pdfExtraction.UsePDFJS option determines whether to use the poppler or the pdf.js method. But I would start by implementing this in the Firefox JavaScript Scratchpad. Set the environment to "Browser" (Environment -> Browser) and you have access to the Zotero and zotfile functions. So you can use Zotero.ZotFile.runProcess() to run an external process such as skimpdf. Does that help you get started?

@bjohas
Contributor

bjohas commented Jul 18, 2018

Hello all,
just wondering whether anybody has worked on Skim/zotfile integration?
Many thanks!
