Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

start-end highlight positions get messed up when long documents are edited #50

Closed
sati-bodhi opened this issue Jan 7, 2023 · 22 comments

Comments

@sati-bodhi
Copy link
Contributor

Though I am unable to come up with a reproducible scenario for this bug upfront, I've had experience of highlights being displaced when a relatively long (>600 lines) document is edited.

I have worked on similar issues before and the solution (or hack) I came up with is to perform a search for the precise string in question on the line or relevant region itself to double-check start-end positions whenever the document is edited.

This would probably take up (significantly) more computing resources, especially when highlights are many. But it fixes the problem.

I have not figured out why relative regions run awry easily, even when changes are traceable and meticulously synchronized.

@sati-bodhi
Copy link
Contributor Author

Here's a sample file to play with.

sample.txt

Change extension to .org

@nobiot
Copy link
Owner

nobiot commented Jan 8, 2023

I am working on a collective fix, which you can see in PR #51. Perhaps we can look at this issue you report here after we have merged it to the main branch.

You could help me by using the branch named dev/round-off and see if it changed anything for this problem. But you don't have to. I will leave the branch for a couple days for now to see if there is any glaring issue I have missed.

@sati-bodhi
Copy link
Contributor Author

@nobiot I've forked your repo but can't find a way to access the dev/round-off branch with it.

@sati-bodhi
Copy link
Contributor Author

It happens with static ebook buffers which don't allow editing as well.

@nobiot
Copy link
Owner

nobiot commented Jan 9, 2023

Have you managed to use dev/round-off branch?
I have a suspicion about what might be the cause; if it is correct, I have fixed it. The problem was the org-remark-save terminates in the middle of dolist with an error, and thus the highlights after the error won't be updated correctly.

@nobiot
Copy link
Owner

nobiot commented Jan 9, 2023

I'm testing the file you provided above (#50 (comment)). I cannot reproduce the issue.

I'm also testing with a larger text file (Alice in Wonderland, text version, https://www.gutenberg.org/files/11/11-h/11-h.htm). It has 2698 lines. I have no problem.

Is there anything you do to make the problem happen?

@sati-bodhi
Copy link
Contributor Author

Have you managed to use dev/round-off branch?

I wasn't able to test the new branch - git clone downloads just the main branch for testing. I've forked your repo without unchecking the "main branch only" option.

@sati-bodhi
Copy link
Contributor Author

I have yet to discover a systematic cause for the problem; but it happens to Chapter 11 of this ebook as well.

You can read it with nov using my pull request #54 version.
Here are my config.el settings that work with the pull request commit.

  ;; compatibility with org-noter
  (defun org-noter-get-epub-source ()
    "Returns the path of the epub source from which the present session is initiated."
    (if org-noter--session
        (concat
         (org-noter--session-property-text org-noter--session)
         "/"
         (file-name-base (cdr (aref nov-documents nov-documents-index))))
      nil))

  (add-to-list 'org-remark-source-find-file-name-functions 'org-noter-get-epub-source)

Highlights beyond "Infant Coping Behaviors" org-remark-beg: 5239 are consistently displaced by 40 points upon visiting a new chapter, adding notes and returning to it.

marginalia.txt

@sati-bodhi
Copy link
Contributor Author

I have yet to discover a systematic cause for the problem; but it happens to Chapter 11 of this ebook as well.

You can read it with nov using my pull request #54 version. Here are my config.el settings that work with the pull request commit.

  ;; compatibility with org-noter
  (defun org-noter-get-epub-source ()
    "Returns the path of the epub source from which the present session is initiated."
    (if org-noter--session
        (concat
         (org-noter--session-property-text org-noter--session)
         "/"
         (file-name-base (cdr (aref nov-documents nov-documents-index))))
      nil))

  (add-to-list 'org-remark-source-find-file-name-functions 'org-noter-get-epub-source)

Highlights beyond "Infant Coping Behaviors" org-remark-beg: 5239 are consistently displaced by 40 points upon visiting a new chapter, adding notes and returning to it.

marginalia.txt

This epub compatibility issue could be related, but could also be a different issue altogether. At least I am finally getting a (somewhat) consistent error at my side that is worth investigating.

@sati-bodhi
Copy link
Contributor Author

@nobiot, I've tested out the new dev/round-off branch with my work. The highlights have been stable thus far.

Kudos for your work!

@nobiot
Copy link
Owner

nobiot commented Jan 10, 2023

Great. Thanks for reporting back and good to hear it has been stable for real use like yours. I will first merge the dev/round-off later this week without the nov.el support first. As noted, Then I think we can work on yet another branch I just created dev/nov.el (branching off of the latest dev/round-off) just to contain the development efforts of nov.el.

Sorry to move you from one to another; I appreciate the fact you are new to Git and GitHub -- I'm also not used to using it to work with others at such a high pace like this. But I think nov.el support is not too far off after round-off.

@nobiot
Copy link
Owner

nobiot commented Jan 11, 2023

@sati-bodhi
I think I now know highlights are displaced only after "Infant Coping Behaviors" org-remark-beg: 5239.

It's the <table> tag that underlying HTML uses for TABLE 11.1 Infant Coping Behaviors.

Try having a narrow frame of Emacs and then open the same document in nov.el.

The program pads spaces when it renders a table in different ways according to the size of the frame, so the positions beyond the place where a table is rendered is not consistent if you change the viewing size...

Compare the two images below. I have turned on whitespace-mode to see the impact of padding but probably unnecessary. Just look at the line number displayed on the left fringe for the first raw of the table where you see "1. SIGNAL: Infant acts in a way that function to modify mother’s behavior".

Line number 68 vs 113.
The points at the end of the raw are different; this should be obvious as the narrow screen has more lines.

This explains the reason why only the highlights after this table gets displaced.

I am taking these screen shots on the latest dev/nov.el branch, which incorporates all the fixes in the dev/round-off branch.

I am not sure what would be the best course of action for this. Perhaps we might need to resort to storing and checking the beginning and ending strings to automatically adjust the points as you suggested before. At the moment, my capacity is running out.

For now, you would need to read epub books in a consistent Emacs frame size to keep the highlights consistent -- I believe that's what you have been doing as you say highlights have been consistently displayed thus far.

Screenshot from 2023-01-11 17-50-27

Screenshot from 2023-01-11 17-51-57

@nobiot
Copy link
Owner

nobiot commented Jan 11, 2023

As nov.el uses shr.el to render HTML, I believe this is the relevant function:

;; Table rendering is the only complicated thing here.  We do this by
;; first counting how many TDs there are in each TR, and registering
;; how wide they think they should be ("width=45%", etc).  Then we
;; render each TD separately (this is done in temporary buffers, so
;; that we can use all the rendering machinery as if we were in the
;; main buffer).  Now we know how much space each TD really takes, so
;; we then render everything again with the new widths, and finally
;; insert all these boxes into the main buffer.
(defun shr-tag-table-1 (dom)

@sati-bodhi
Copy link
Contributor Author

I think I have now know out highlights are displaced only after "Infant Coping Behaviors"

Yes, I've recently noticed this as well.

When I set nov-text-width to 80 and below, it slightly displaces the highlights under that table by 1 point - nothing as drastic as before, so I guess we can make-do with the present set up at it is.

Wonder how Calibre marks and keep record of its highlights with epubcfi behind the scene. Here's the a single record in the annot_data column of the annotations table in metadata.db.

annot_data      | {"end_cfi": "/2/4/224/2/1:29", "highlighted_text": "Beyond the Pleasure Principle", "notes": "Beyond the Pleasure Principle lies the Death Instinct.", "spine_index": 11, "spine_name": "text/part0010.html", "start_cfi": "/2/4/224/2/1:0", "style": {"kind": "color", "type": "builtin", "which": "yellow"}, "timestamp": "2022-10-05T06:22:08.295Z", "toc_family_titles": ["Part One: Origins", "2 Sigmund Freud: The Drive/Structure Model"], "type": "highlight", "uuid": "0AMX_848HXmFJRNANJGFew"}

My guess is the last 2 digits in start_cfi and end_cfi represent the y (i.e. line number) and x (i.e. column number, a single character each) positions in the rendered epub document, which should be affected by the above situation as well, without some form of remedy with the code that does the processing.

Curiously, besides highlighted_text above, there is a separate searchable text column which documents the same string.

searchable_text | Beyond the Pleasure Principle

This design seem to fit my conjecture about

storing and checking the beginning and ending strings to automatically adjust the points

over here #50 (comment)

@nobiot
Copy link
Owner

nobiot commented Jan 14, 2023

You can use the current dev/nov.el branch and try out the new feature to let Org-remark automatically adjust the position of highlights based on the new "org-remark-original-text" property.

You'd need to do two things:

  • Enable org-remark-nov-mode minor mode
  • Manually add "org-remark-original-text" property to the notes

First, I have introduced the minor mode and all the technical set up is hidden away from the end user within it. Loading the file does not change the behaviour of the system. I do the following; you can do something similar

(add-to-list 'load-path (expand-file-name "org-remark" my/repos))
(require 'org-remark-global-tracking)
(org-remark-global-tracking-mode +1)

(with-eval-after-load 'nov
  (require 'org-remark-nov)
  (org-remark-nov-mode +1))

Second, "org-remark-original-text" property is added to the notes only when you create a new notes. For your existing notes in Chapter 11 after the table, you'd need to manually add the original text prop.

image

The logic should work even when a text spreads across two lines.

But it does not smartly look at different styles of quotation mark. In your notes that you gave me, you have "infant’s coping behavior" highlighted, but the heading text has "Infant's Coping Behaviour" with a different apostrophe; you would need to have the exact match to the body of the text in the nov-mode buffer

image

@sati-bodhi
Copy link
Contributor Author

Second, "org-remark-original-text" property is added to the notes only when you create a new notes. For your existing notes in Chapter 11 after the table, you'd need to manually add the original text prop.

image

Wonder if it is feasible to combine org-remark-link and org-remark-original-text into a single property which holds the values from both as a single link: [[nov:/home/sati/Calibre Library/Ed Tronick/The Neurobehavioral and Social-Emotional Development of Infants and Children (Norton Series on (1040)/The Neurobehavioral and Social-Emotional D - Ed Tronick.epub::17:58794][Infant's coping behavior]].

The link-label (original value of org-remark-original-text) can be extracted with:

(let ((s "[[link-target][link-label]]"))
  (string-match org-bracket-link-regexp s)
   (match-string 3 s))

Presently, the default label given by org-insert-link is a very long string that does not provide any new or critical information we need: EPUB file at /path/to/file.epub.

Replacing it with the highlighted search text would make the properties drawer less cluttered and more concise - assuming users don't tend to make highlights that are too long; long strings do not look good on headings also.

To keep things clean, we could truncate the search string to no more than, say, 10 words for exceptional cases. But this may complicate matters, especially when working with CJK languages where word boundaries are not marked by spaces.

@nobiot
Copy link
Owner

nobiot commented Jan 22, 2023

Wonder if it is feasible to combine org-remark-link and org-remark-original-text into a single property which holds the values from both as a single link:

My current hunch is that they should be separate properties. Link descriptions can be overridden by the user, and probably should be. I think having a separate, dedicated original-text property makes the purpose explicit and clear (after documentation).

Presently, the default label given by org-insert-link is a very long string that does not provide any new or critical information we need

Replacing it with the highlighted search text would make the properties drawer less cluttered and more concise

This is a good point and I will consider the suggestion of truncating the string. I think this can be done independent of the original-text discussion above.

I personally have many highlights with a long string (a whole paragraph, for instance). This hasn't bothered me but your suggestion makes sense.

Usually, I hide the property drawers so "clutter" does not really bother me when I'm working with the marginal notes... But that's just my way. I am happy to see a real use case that's different from mine.

@sati-bodhi
Copy link
Contributor Author

You can use the current dev/nov.el branch and try out the new feature to let Org-remark automatically adjust the position of highlights based on the new "org-remark-original-text" property.

You'd need to do two things:

* Enable `org-remark-nov-mode` minor mode

* Manually add "org-remark-original-text" property to the notes

First, I have introduced the minor mode and all the technical set up is hidden away from the end user within it. Loading the file does not change the behaviour of the system. I do the following; you can do something similar

(add-to-list 'load-path (expand-file-name "org-remark" my/repos))
(require 'org-remark-global-tracking)
(org-remark-global-tracking-mode +1)

(with-eval-after-load 'nov
  (require 'org-remark-nov)
  (org-remark-nov-mode +1))

Second, "org-remark-original-text" property is added to the notes only when you create a new notes. For your existing notes in Chapter 11 after the table, you'd need to manually add the original text prop.

image

The logic should work even when a text spreads across two lines.

But it does not smartly look at different styles of quotation mark. In your notes that you gave me, you have "infant’s coping behavior" highlighted, but the heading text has "Infant's Coping Behaviour" with a different apostrophe; you would need to have the exact match to the body of the text in the nov-mode buffer

image

It happened that the text after the table somehow got displaced again, this time by 70 points. Is it sufficient to just set org-remark-original-text to the target, or do I have to manually adjust org-remark-beg and org-remark-end as well?

@nobiot
Copy link
Owner

nobiot commented Jan 23, 2023

If it is temporary adjustment, original-text should be enough. Org-remark looks for the string two paragraphs before and after to adjust the location.

Changing beg+end is a “permanent” move — moving the base location, if you will.

@sati-bodhi
Copy link
Contributor Author

If it is temporary adjustment, original-text should be enough. Org-remark looks for the string two paragraphs before and after to adjust the location.

Changing beg+end is a “permanent” move — moving the base location, if you will.

What happens if there are multiple instances of the same string in that region?

@nobiot
Copy link
Owner

nobiot commented Jan 24, 2023

It will temporarily move to the first match found.

@nobiot
Copy link
Owner

nobiot commented Aug 20, 2023

The position adjustment as described above has been implemented and released on v1.2.1. Closing this issue.

@nobiot nobiot closed this as completed Aug 20, 2023
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants