Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Click on a faulty link generates some error and tracebacks #2256

Closed
dodrg opened this issue Dec 18, 2022 · 8 comments
Closed

Click on a faulty link generates some error and tracebacks #2256

dodrg opened this issue Dec 18, 2022 · 8 comments
Labels
Milestone

Comments

@dodrg
Copy link

dodrg commented Dec 18, 2022

Description
Somehow it happend that Zim's automatic link-generation produced a faulty link with the resulting source code:
[[ #zim-pageview| ''#zim-pageview'']]

Simplifying the faulty link:
[[ #test]]

Please note the leading space before the hash in the link part of the code.

Reproduction
Copy this to the source code of a Zim page to reproduce the problem.

Annotation
I know this is not a valid link and should not appear. But I managed it somehow that the auto-linking of Zim produced the first link while copy/pasting the verbatim marked text. This shall be an issue for another bug report, as currently I cannot reproduce it.

The reported and reproducible issue is that the handling of the invalid link generates an error dialog with a traceback.

Commented Debug Information

Handling the faulty link resulted in more errors and tracebacks as it has been visible to me at the first glance:

  1. When just hovering over the link, the typical bubble note with the link destination does not appear.

This is the debug output of the hover action:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 153, in makeValidPageName
    Path.assertValidPageName(newname)
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 139, in assertValidPageName
    raise AssertionError('Not a valid page name: %s' % name)
AssertionError: Not a valid page name: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6222, in on_link_enter
    href = HRef.new_from_wiki_link(link['href'])
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 375, in new_from_wiki_link
    names = Path.makeValidPageName(href.lstrip('+')) if href else ""
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 155, in makeValidPageName
    raise ValueError('Not a valid page name: %s (was: %s)' % (newname, name))
ValueError: Not a valid page name:  (was:  )
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 153, in apport_excepthook
    with os.fdopen(os.open(pr_filename,
FileNotFoundError: [Errno 2] No such file or directory: '/var/crash/_usr_bin_zim.1000.crash'

Original exception was:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 153, in makeValidPageName
    Path.assertValidPageName(newname)
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 139, in assertValidPageName
    raise AssertionError('Not a valid page name: %s' % name)
AssertionError: Not a valid page name: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6222, in on_link_enter
    href = HRef.new_from_wiki_link(link['href'])
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 375, in new_from_wiki_link
    names = Path.makeValidPageName(href.lstrip('+')) if href else ""
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 155, in makeValidPageName
    raise ValueError('Not a valid page name: %s (was: %s)' % (newname, name))
ValueError: Not a valid page name:  (was:  )
DEBUG: Activate link:  #text
ERROR: Exception during activate-link((' #text', {'new_window': False}))
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 153, in makeValidPageName
    Path.assertValidPageName(newname)
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 139, in assertValidPageName
    raise AssertionError('Not a valid page name: %s' % name)
AssertionError: Not a valid page name: 
  1. When doing a click with the left mouse button on the link, an error dialog appears but no change in the page presentation.

The debug output for the left-click:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6610, in do_activate_link
    self._do_activate_link(link, hints)
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6619, in _do_activate_link
    href = HRef.new_from_wiki_link(link)
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 375, in new_from_wiki_link
    names = Path.makeValidPageName(href.lstrip('+')) if href else ""
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 155, in makeValidPageName
    raise ValueError('Not a valid page name: %s (was: %s)' % (newname, name))
ValueError: Not a valid page name:  (was:  )
DEBUG: Running ErrorDialog

The error dialog contained the identical traceback and additionally these details:

This is zim 0.75.1
Platform: posix
Locale: en_US UTF-8
FS encoding: utf-8
Python: (3, 8, 10, 'final', 0)
PyGObject: (3, 36, 0)

Some more details to the system with the error:

Zim package: the Debian package of Zim from your website (zim_0.75.1_all.deb).
The system is an Ubuntu 20.4 LTS based LinuxMint 20.3
  1. When doing a click with the right mouse button on that link the object window appears with the choice for actions reduced to a minimum:
    [Cut|Copy|Paste|Paste As Verbatim|Delete|<Separator>|Select all|Insert Emoji]
    Missing are all the link-specific entries:
    [Open|Open in a new Window|<Separator>|Cut Link|Copy Link|Edit Link|Remove Link]
    Also some entries are missing that are available for regular text and links:
    [Copy as ...|Copy link to this location|Move selected Text ...]

This debug output appears when doing the right-click:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 153, in makeValidPageName
    Path.assertValidPageName(newname)
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 139, in assertValidPageName
    raise AssertionError('Not a valid page name: %s' % name)
AssertionError: Not a valid page name: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6679, in do_populate_popup
    self._default_do_populate_popup(menu)
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6828, in _default_do_populate_popup
    href = HRef.new_from_wiki_link(link['href'])
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 375, in new_from_wiki_link
    names = Path.makeValidPageName(href.lstrip('+')) if href else ""
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 155, in makeValidPageName
    raise ValueError('Not a valid page name: %s (was: %s)' % (newname, name))
ValueError: Not a valid page name:  (was:  )
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 153, in apport_excepthook
    with os.fdopen(os.open(pr_filename,
FileNotFoundError: [Errno 2] No such file or directory: '/var/crash/_usr_bin_zim.1000.crash'

Original exception was:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 153, in makeValidPageName
    Path.assertValidPageName(newname)
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 139, in assertValidPageName
    raise AssertionError('Not a valid page name: %s' % name)
AssertionError: Not a valid page name: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6679, in do_populate_popup
    self._default_do_populate_popup(menu)
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6828, in _default_do_populate_popup
    href = HRef.new_from_wiki_link(link['href'])
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 375, in new_from_wiki_link
    names = Path.makeValidPageName(href.lstrip('+')) if href else ""
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 155, in makeValidPageName
    raise ValueError('Not a valid page name: %s (was: %s)' % (newname, name))
ValueError: Not a valid page name:  (was:  )

Comments

I would not have expected that an invalid link results in an error but Zim will correct the code smoothly as also done in other parts of the software.

I would see the algorithm that decides to automatically convert a string into a link item in the task to also check the validity of a link item (assuring a consistent handling of links).

In contrast to the faulty anchor link a http link with the same leading space will not fail:
[[ http://test.net/path|http://test.net/path]]

Instead of opening the web browser this faulty link will create the new Zim pages

http
      test.netpath

But it will not raise an error. So there the code is correct, even though I'm not happy with the result.

@dodrg
Copy link
Author

dodrg commented Dec 18, 2022

First Analysis

Obvously Zim splits the link at the '#':
./zim/notebook/page.py Line 371

if '#' in href:
	href, anchor = href.split('#', 1)
	anchor = zim.formats.heading_to_anchor(anchor) # make valid achor string

In the case of the problematic anchor link 'href' = ' '
As this is not an "http://..." or similar, it should be assumed to be a local Zim page reference.
Regarding the method makeValidPageName
./zim/notebook/page.py Line 142 cont.

newname = _pagename_reduce_colon_re.sub(':', name.strip(':'))
newname = _pagename_invalid_char_re.sub('', newname)
newname = newname.replace('_', ' ')

Zim removes all unwanted characters, including '_' by _pagename_invalid_char_re
And substitutes ' ' with '_'
The final result of our scenario would be a new page named '_'.
But the test Path.assertValidPageName(newname) in Line 153 fails due to it's subsequent call of _pagename_invalid_char_re, not allowing '_' as valid character. So the result is an empty string.
=> This fails immediately.

The fault seems to be the disregard of the fact, that an anchor part of the link exists, that would result on an anchor link on the current page.

This rises the question how to handle faulty links inserted by purpose like

[[_*?=#test|TEST]]

This type of link has the same evidence as my original issue.

I personally would not call the test Path.assertValidPageName(newname) in Line 153 of page.py, but first rejoin the cleaned href with the cleaned anchor and then check for the result being a single # (or empty '' depending on the details while the join).

@introt
Copy link
Collaborator

introt commented Dec 19, 2022

Thanks for the very thorough bug analysis! I encountered this issue before in #1794, where I suggested the following patch to cover up the ValueError, but as you've pointed out, this doesn't cover the hover action.

diff --git a/zim/gui/pageview/__init__.py b/zim/gui/pageview/__init__.py
index 83c63c42..57a67568 100644
--- a/zim/gui/pageview/__init__.py
+++ b/zim/gui/pageview/__init__.py
@@ -6511,6 +6511,8 @@ class PageView(GSignalEmitterMixin, Gtk.VBox):
        def do_activate_link(self, link, hints):
                try:
                        self._do_activate_link(link, hints)
+               except ValueError:
+                       ErrorDialog(self, _('Invalid link: "%s"') % link).run() # T: error when invalid link clicked
                except:
                        zim.errors.exception_handler(
                                'Exception during activate-link(%r)' % ((link, hints),))

I personally would not call the test Path.assertValidPageName(newname) in Line 153 of page.py, but first rejoin the cleaned href with the cleaned anchor and then check for the result being a single # (or empty '' depending on the details while the join).

See #1942 - "We need something akin to a HRef.makeValidHRef method as well as a validator for paths."

Related issue: #965

@dodrg
Copy link
Author

dodrg commented Dec 21, 2022

Hi introt,

Exchanging the error dialog with traceback by an error dialog might look nicer, but the problem persists.

I agree with the #1942 that the whole path/url recognition is involved. So just making a quick'n dirty patch for this one point could break even more at the other hand. To get a new validator for this would be great and. By the way, I would be happy if things like "file:///my/path/with spaces.txt" and similar would autolink correctly...

Is there any dev info besides the source code?

@introt
Copy link
Collaborator

introt commented Dec 21, 2022

Hi,

Is there any dev info besides the source code?

Besides what's mentioned in the README, not much I'm afraid.

You can build the API docs with pydoctor --project-name=zim-desktop-wiki --project-version=0.75.1 --project-url=https://zim-wiki.org/ --html-viewsource-base=https://github.com/zim-desktop-wiki/zim-desktop-wiki/tree/develop --make-html --html-output=docs/api --project-base-dir="$(pwd)" --docformat=epytext --intersphinx=https://docs.python.org/3/objects.inv ./zim (we should probs add a GitHub Action for this)

The Tags and Anchors blueprint might have some relevant info: https://github.com/zim-desktop-wiki/zim-desktop-wiki/wiki/BlueprintTagsAndAnchors

As well as the comments on this PR: #1847

Hope this helps, happy hacking!

@jaap-karssenberg
Copy link
Member

Agree that in this case the whitespace should be ignored. Tempted to just put a strip() in new_from_wiki_link() to remove any preceding / trailing whitespace. But doubting whether this is the right thing to do. Could there be any use case where a link actually starts or ends with whitespace? Starting for sure not, as it makes the pagename invalid. Doubting the end - could anchors realistically end with whitespace? Maybe put lstrip() just to be sure.

More complex fix would be to put the same logic in the wiki parser and in relevant parts of the PageView such that the whitespace is never handed to the new_from_wiki_link() method in the first place.

Notes: also put dialog wrapper in place as suggested above for other cases where the name really is invalid

@jaap-karssenberg
Copy link
Member

Found a precedent in the commonmark spec, so leaning towards just applying strip(). Still think this should also be part of the parser, but no harm in also doing it in the new_from_wiki_link() as catch all.

See https://spec.commonmark.org/0.30/#example-509

@dodrg
Copy link
Author

dodrg commented Jan 11, 2023

Could there be any use case where a link actually starts or ends with whitespace?

Yes.
In Linux spaces, new lines and arbitary signs like *?+ are allowed as filenames.
Doing a drag 'n drop in Linux from the file manager to Zim will create a valid link.

Create the file in /TEMP/ :

echo "My new file" > ' leading space, some specials *?+; and a
new line with trailing space '

Doing a drag'n drop to Zim results in this view:

file:///srv/data/TEMP/ leading space%2C some specials %2A%3F%2B%3B and a%0Anew line with trailing space 

Please note that the space (front and end) is shown correctly. Unfortunatelly some special characters are substituted in the user's editor view.

Doing a copy/paste from Zim to elsewhere:

file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20

The source of the Zim page:

[[file:///TEMP/%20leading%20space%0Aand%20new%20line%20%2B%20trailing%20space%20|''file:///TEMP/ leading space%0Aand new line %2B trailing space'' ]]
file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20

A doubleclick to this link in Zim will try to open it:

  • If the content is a text file, it will be opened in the text editor.
  • If the content is some HTML, it will be opened by the web browser.
  • If the content is an executable (and the x-bit is set) it will fail.
DEBUG: Activate link: file:///TEMP/ leading space%2C some specials %2A%3F%2B%3B and a%0Anew line with trailing space 
DEBUG: open_file(/TEMP/ leading space, some specials *?+; and a
new line with trailing space , None)
INFO: Spawning: ('xdg-open', 'file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20') (cwd: /home/drg)
DEBUG: Process started with PID: 940378
gio: file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20: No application is registered as handling this file
DEBUG: Running ErrorDialog
DEBUG: Could not open: file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20
NoneType: None
ERROR: Could not open: file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20

NOTE: In Linux, the extension of the filename is not relevant in most cases. These examples have been tested without extension.

Comment

Usually I handle Zim notebook content in the editor view (almost never I'm using the HTML-server). So the limitations of local links in a web view are not relevant for my usage case. But having an easy way to integrate files is relevant.

  • It is nice to have almost full support of the filesystem's possibilities (spaces, etc.)
  • It would be nice to have an even better "pretty view" by having no substitution signs if not necessary of for Zim.
  • Having the choice to opt for copying the file locally to the page into the structure of the notebook would be great (but is beyond the scope of this bug report).

=> Using strip() would reduce the filesystem support.

My suggestion is still

  • to query the content of the anchor if the URL results to an empty string.
  • This should also be the behavior with a not empty URL.
  • If URL and anchor are empty the link attempt should finish silently with an unlinked result (and tell "nothing to link" in the debug log)

@jaap-karssenberg
Copy link
Member

jaap-karssenberg commented Jan 11, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants