Import/export annotations to org file. #133

Closed
wants to merge 4 commits into
from

5 participants
machc commented Nov 1, 2015

This allows exporting/importing PDF annotations to/from an org file.
Some information (such as created/modified time) may be missing after
an import, but the contents, type, color, and positions are working.

The motivation is to be able to keep my notes apart from the PDFs, in order to easily back them up and put them under version control. Besides, I'm using org-ref to manage my bibliography; it allows quick access to org note files from any source, so it makes sense to use pdf-tools to generate these files.

Known issues:

  • the importer seems to merge multiple markup annotations into one when they're close to each other.

Future work (I plan to work on these when I find the time):

  • make the annotation type a tag in the org file
  • capture text of markup annotations and add it to the org file
Import/export annotations to org file.

pinguim06 added some commits Nov 10, 2015

Fix double lines in highlight importing.
Annotation edges are saved in the PDF, but we need to use the
corresponding region for some tasks. This implements a little hack to
guess the region from the edges.
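
One way such a guess could work is sketched below. This is not the code from the commit: it assumes each edge is a `(LEFT TOP RIGHT BOTTOM)` box in relative coordinates, one per highlighted line, and both the function name `my-edges-to-region` and the quarter-line inset are hypothetical choices made for illustration.

```elisp
;; Hypothetical sketch, not the commit's implementation.
;; Assumes EDGES is a list of (LEFT TOP RIGHT BOTTOM) boxes, one per
;; highlighted line, ordered from the first line to the last.
(defun my-edges-to-region (edges)
  "Guess a single (LEFT TOP RIGHT BOTTOM) region spanning EDGES.
The region starts at the left of the first box and ends at the right
of the last one.  Top and bottom are moved inward by a quarter of the
line height so that neighbouring lines are less likely to be included."
  (let* ((first-box (car edges))
         (last-box (car (last edges)))
         ;; Line heights of the first and last box, scaled down to an inset.
         (inset-first (/ (- (nth 3 first-box) (nth 1 first-box)) 4.0))
         (inset-last (/ (- (nth 3 last-box) (nth 1 last-box)) 4.0)))
    (list (nth 0 first-box)
          (+ (nth 1 first-box) inset-first)
          (nth 2 last-box)
          (- (nth 3 last-box) inset-last))))
```

Shrinking the region vertically is one plausible way to avoid the "double lines" the commit title mentions; the actual heuristic in the commit may differ.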
Owner

politza commented Nov 11, 2015

Carlos Pinguim writes:

This allows to export/import the PDF annotations to/from an org file.

I tend to think that this kind of code should live in its own package.

Known issues:

  • What happens if annotations are imported multiple times?

Btw, you could use `pdf-annot-activate-handler-functions' in order to
pop to the org buffer.
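
A minimal sketch of that suggestion follows. The hook variable is the one named above; everything else (`my-pdf-notes-file`, `my-pdf-annot-pop-to-notes`, the placeholder path) is hypothetical. It also assumes that handlers on this hook receive the activated annotation as their single argument and signal "handled" by returning non-nil.

```elisp
;; Sketch only: `my-pdf-notes-file' and `my-pdf-annot-pop-to-notes' are
;; hypothetical names; the hook variable itself comes from pdf-tools.
(defvar my-pdf-notes-file "~/notes/annotations.org"
  "Org file holding the exported annotations (placeholder path).")

(defun my-pdf-annot-pop-to-notes (_annotation)
  "Show `my-pdf-notes-file' when an annotation is activated.
Returns non-nil, which is assumed to mean the annotation was handled."
  (pop-to-buffer (find-file-noselect my-pdf-notes-file))
  t)

;; Register the handler on the hook mentioned in the comment above.
(add-hook 'pdf-annot-activate-handler-functions #'my-pdf-annot-pop-to-notes)
```

A fuller version would presumably search the org buffer for the heading that corresponds to the activated annotation rather than just opening the file.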

machc commented Nov 16, 2015

  • What happens if annotations are imported multiple times?

I think it breaks. I was focusing more on the exporting, sorry.

Anyway, I agree it's probably best to create a package just for this org exporting thing. Will do that when I find some time. For the time being, I've made some improvements, if anybody is interested.

@machc machc closed this Nov 16, 2015

Contributor

titaniumbones commented Nov 17, 2015

Your pdf-annot-edges-to-region does an excellent job of guessing the content
of the highlighted region for me. @politza, have you looked at @pinguim's
last 2 commits? I think they may be worth including.

On 15/11/15 10:58 PM, Carlos Pinguim wrote:

  * What happens if annotations are imported multiple times ?

I think it breaks. I was focusing more on the exporting, sorry.

Anyway, I agree it's probably best to create a package just for this
org exporting thing. Will do that when I find some time. For the time
being, I've made some improvements
https://github.com/pinguim06/pdf-tools, if anybody is interested.



myrjola commented Nov 22, 2015

I made some changes to support my workflow in https://gist.github.com/myrjola/15585e3461b4d3178953. Most notably, I made the export insert the PDF outline as org headings, extract images of square annotations, and inline them. I very often want to grab certain graphs or images for my notes.

I'm looking forward to the package you were speaking about @pinguim06. Maybe that can be made to support different export formats.

There's still much room for improvement. Here's an example of my output format:

* Annotations from stallings_operating_systems

**** 3.1 What Is a Process?, 130

[[file:annot-130-3.png]]

([[pdfview:/Users/martin/Dropbox/bibliography/bibtex-pdfs/stallings_operating_systems.pdf::130++0.41333][stallings_operating_systems]], 130)

It is inefficient for applications to be written directly for a given hardware
([[pdfview:/Users/martin/Dropbox/bibliography/bibtex-pdfs/stallings_operating_systems.pdf::130++0.532129][stallings_operating_systems]], 130)

The processor itself provides only limited support for multiprogramming.
Software is needed to manage the sharing of the processor and other
resources by multiple applications at the same time.
c. When multiple applications are active at the same time, it is necessary to
([[pdfview:/Users/martin/Dropbox/bibliography/bibtex-pdfs/stallings_operating_systems.pdf::130++0.635509][stallings_operating_systems]], 130)

**** 3.2 Process States, 132

which the execution of an application corresponds to the existence of one or more
([[pdfview:/Users/martin/Dropbox/bibliography/bibtex-pdfs/stallings_operating_systems.pdf::131++0.243713][stallings_operating_systems]], 131)

Process Control Blocks
([[pdfview:/Users/martin/Dropbox/bibliography/bibtex-pdfs/stallings_operating_systems.pdf::131++0.312762][stallings_operating_systems]], 131)

now begins to execute.
Thus, we can say that a process consists of program code and associated data
plus a process control block. For a single-processor computer, at any given time, at
([[pdfview:/Users/martin/Dropbox/bibliography/bibtex-pdfs/stallings_operating_systems.pdf::132++0.756452][stallings_operating_systems]], 132)

etc...
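
The reference lines in the sample above all follow one pattern: a `pdfview:` link to the file, the page, and a fractional offset, followed by the page number. A small sketch of a formatter for that pattern is shown here; the function name `my-format-pdfview-ref` is hypothetical, and the reading of the number after `++` as a relative position on the page is an assumption based on the sample values.

```elisp
;; Hypothetical helper reproducing the reference format of the sample output.
(defun my-format-pdfview-ref (file page offset)
  "Return an org reference to PAGE of FILE at OFFSET.
OFFSET is assumed to be a relative position on the page (0.0 to 1.0).
The link description is the file name without directory and extension."
  (let ((name (file-name-base file)))
    (format "([[pdfview:%s::%d++%s][%s]], %d)" file page offset name page)))

;; Example call with a placeholder path:
(my-format-pdfview-ref "/tmp/book.pdf" 130 0.5)
```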
Owner

politza commented Nov 22, 2015

Martin Yrjölä writes:

I did some changes to make this support my workflow [...]

Btw, there is no need to go through pdf-view-mode etc. if all you
(people) want is to extract some information from a PDF.

Use pdf-info-* instead. All functions representing server commands
accept a filename as final argument.

Contributor

titaniumbones commented Nov 22, 2015

That's really helpful @politza. I notice that pdf-info-getannots does not accept an optional list of annotation types, the way pdf-annot-getannots does. No big deal, just have to filter results with an if.
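
The filtering step could look roughly like the sketch below. It assumes that each annotation is an alist with a `type` entry holding a symbol; the function name `my-filter-annots-by-type` and the sample data are hypothetical, and the sample stands in for a real result so the snippet runs without pdf-tools loaded.

```elisp
(require 'seq)

;; Hypothetical helper; assumes each annotation is an alist with a `type' key.
(defun my-filter-annots-by-type (annots types)
  "Return the members of ANNOTS whose `type' entry is one of TYPES.
ANNOTS is a list of alists and TYPES a list of symbols."
  (seq-filter (lambda (annot)
                (memq (cdr (assq 'type annot)) types))
              annots))

;; Hand-written sample data standing in for real results, which would
;; come from a call such as (pdf-info-getannots nil "/path/to/file.pdf").
(defvar my-sample-annots
  '(((type . highlight) (page . 1) (contents . "first"))
    ((type . link)      (page . 1))
    ((type . text)      (page . 2) (contents . "note"))))

;; Keep only highlight and text annotations.
(my-filter-annots-by-type my-sample-annots '(highlight text))
```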

machc commented Jan 9, 2016

The package is here: https://github.com/pinguim06/pdf-tools-org

@myrjola, I like your idea of using the PDF outline as org headings and including images; perhaps we could work on a merge.

@machc machc referenced this pull request Jan 16, 2016

Closed

Importation #1
