Click on a faulty link generates some error and tracebacks #2256

dodrg · 2022-12-18T04:30:08Z

Description
Somehow it happend that Zim's automatic link-generation produced a faulty link with the resulting source code:
[[ #zim-pageview| ''#zim-pageview'']]

Simplifying the faulty link:
[[ #test]]

Please note the leading space before the hash in the link part of the code.

Reproduction
Copy this to the source code of a Zim page to reproduce the problem.

Annotation
I know this is not a valid link and should not appear. But I managed it somehow that the auto-linking of Zim produced the first link while copy/pasting the verbatim marked text. This shall be an issue for another bug report, as currently I cannot reproduce it.

The reported and reproducible issue is that the handling of the invalid link generates an error dialog with a traceback.

Commented Debug Information

Handling the faulty link resulted in more errors and tracebacks as it has been visible to me at the first glance:

When just hovering over the link, the typical bubble note with the link destination does not appear.

This is the debug output of the hover action:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 153, in makeValidPageName
    Path.assertValidPageName(newname)
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 139, in assertValidPageName
    raise AssertionError('Not a valid page name: %s' % name)
AssertionError: Not a valid page name: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6222, in on_link_enter
    href = HRef.new_from_wiki_link(link['href'])
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 375, in new_from_wiki_link
    names = Path.makeValidPageName(href.lstrip('+')) if href else ""
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 155, in makeValidPageName
    raise ValueError('Not a valid page name: %s (was: %s)' % (newname, name))
ValueError: Not a valid page name:  (was:  )
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 153, in apport_excepthook
    with os.fdopen(os.open(pr_filename,
FileNotFoundError: [Errno 2] No such file or directory: '/var/crash/_usr_bin_zim.1000.crash'

Original exception was:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 153, in makeValidPageName
    Path.assertValidPageName(newname)
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 139, in assertValidPageName
    raise AssertionError('Not a valid page name: %s' % name)
AssertionError: Not a valid page name: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6222, in on_link_enter
    href = HRef.new_from_wiki_link(link['href'])
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 375, in new_from_wiki_link
    names = Path.makeValidPageName(href.lstrip('+')) if href else ""
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 155, in makeValidPageName
    raise ValueError('Not a valid page name: %s (was: %s)' % (newname, name))
ValueError: Not a valid page name:  (was:  )
DEBUG: Activate link:  #text
ERROR: Exception during activate-link((' #text', {'new_window': False}))
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 153, in makeValidPageName
    Path.assertValidPageName(newname)
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 139, in assertValidPageName
    raise AssertionError('Not a valid page name: %s' % name)
AssertionError: Not a valid page name:

When doing a click with the left mouse button on the link, an error dialog appears but no change in the page presentation.

The debug output for the left-click:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6610, in do_activate_link
    self._do_activate_link(link, hints)
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6619, in _do_activate_link
    href = HRef.new_from_wiki_link(link)
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 375, in new_from_wiki_link
    names = Path.makeValidPageName(href.lstrip('+')) if href else ""
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 155, in makeValidPageName
    raise ValueError('Not a valid page name: %s (was: %s)' % (newname, name))
ValueError: Not a valid page name:  (was:  )
DEBUG: Running ErrorDialog

The error dialog contained the identical traceback and additionally these details:

This is zim 0.75.1
Platform: posix
Locale: en_US UTF-8
FS encoding: utf-8
Python: (3, 8, 10, 'final', 0)
PyGObject: (3, 36, 0)

Some more details to the system with the error:

Zim package: the Debian package of Zim from your website (zim_0.75.1_all.deb).
The system is an Ubuntu 20.4 LTS based LinuxMint 20.3

When doing a click with the right mouse button on that link the object window appears with the choice for actions reduced to a minimum:
[Cut|Copy|Paste|Paste As Verbatim|Delete|<Separator>|Select all|Insert Emoji]
Missing are all the link-specific entries:
[Open|Open in a new Window|<Separator>|Cut Link|Copy Link|Edit Link|Remove Link]
Also some entries are missing that are available for regular text and links:
[Copy as ...|Copy link to this location|Move selected Text ...]

This debug output appears when doing the right-click:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 153, in makeValidPageName
    Path.assertValidPageName(newname)
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 139, in assertValidPageName
    raise AssertionError('Not a valid page name: %s' % name)
AssertionError: Not a valid page name: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6679, in do_populate_popup
    self._default_do_populate_popup(menu)
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6828, in _default_do_populate_popup
    href = HRef.new_from_wiki_link(link['href'])
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 375, in new_from_wiki_link
    names = Path.makeValidPageName(href.lstrip('+')) if href else ""
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 155, in makeValidPageName
    raise ValueError('Not a valid page name: %s (was: %s)' % (newname, name))
ValueError: Not a valid page name:  (was:  )
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 153, in apport_excepthook
    with os.fdopen(os.open(pr_filename,
FileNotFoundError: [Errno 2] No such file or directory: '/var/crash/_usr_bin_zim.1000.crash'

Original exception was:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 153, in makeValidPageName
    Path.assertValidPageName(newname)
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 139, in assertValidPageName
    raise AssertionError('Not a valid page name: %s' % name)
AssertionError: Not a valid page name: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6679, in do_populate_popup
    self._default_do_populate_popup(menu)
  File "/usr/lib/python3/dist-packages/zim/gui/pageview/__init__.py", line 6828, in _default_do_populate_popup
    href = HRef.new_from_wiki_link(link['href'])
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 375, in new_from_wiki_link
    names = Path.makeValidPageName(href.lstrip('+')) if href else ""
  File "/usr/lib/python3/dist-packages/zim/notebook/page.py", line 155, in makeValidPageName
    raise ValueError('Not a valid page name: %s (was: %s)' % (newname, name))
ValueError: Not a valid page name:  (was:  )

Comments

I would not have expected that an invalid link results in an error but Zim will correct the code smoothly as also done in other parts of the software.

I would see the algorithm that decides to automatically convert a string into a link item in the task to also check the validity of a link item (assuring a consistent handling of links).

In contrast to the faulty anchor link a http link with the same leading space will not fail:
[[ http://test.net/path|http://test.net/path]]

Instead of opening the web browser this faulty link will create the new Zim pages

http
      test.netpath

But it will not raise an error. So there the code is correct, even though I'm not happy with the result.

The text was updated successfully, but these errors were encountered:

dodrg · 2022-12-18T04:36:05Z

First Analysis

Obvously Zim splits the link at the '#':
./zim/notebook/page.py Line 371

if '#' in href:
	href, anchor = href.split('#', 1)
	anchor = zim.formats.heading_to_anchor(anchor) # make valid achor string

In the case of the problematic anchor link 'href' = ' '
As this is not an "http://..." or similar, it should be assumed to be a local Zim page reference.
Regarding the method makeValidPageName
./zim/notebook/page.py Line 142 cont.

newname = _pagename_reduce_colon_re.sub(':', name.strip(':'))
newname = _pagename_invalid_char_re.sub('', newname)
newname = newname.replace('_', ' ')

Zim removes all unwanted characters, including '_' by _pagename_invalid_char_re
And substitutes ' ' with '_'
The final result of our scenario would be a new page named '_'.
But the test Path.assertValidPageName(newname) in Line 153 fails due to it's subsequent call of _pagename_invalid_char_re, not allowing '_' as valid character. So the result is an empty string.
=> This fails immediately.

The fault seems to be the disregard of the fact, that an anchor part of the link exists, that would result on an anchor link on the current page.

This rises the question how to handle faulty links inserted by purpose like

[[_*?=#test|TEST]]

This type of link has the same evidence as my original issue.

I personally would not call the test Path.assertValidPageName(newname) in Line 153 of page.py, but first rejoin the cleaned href with the cleaned anchor and then check for the result being a single # (or empty '' depending on the details while the join).

introt · 2022-12-19T18:54:55Z

Thanks for the very thorough bug analysis! I encountered this issue before in #1794, where I suggested the following patch to cover up the ValueError, but as you've pointed out, this doesn't cover the hover action.

diff --git a/zim/gui/pageview/__init__.py b/zim/gui/pageview/__init__.py
index 83c63c42..57a67568 100644
--- a/zim/gui/pageview/__init__.py
+++ b/zim/gui/pageview/__init__.py
@@ -6511,6 +6511,8 @@ class PageView(GSignalEmitterMixin, Gtk.VBox):
        def do_activate_link(self, link, hints):
                try:
                        self._do_activate_link(link, hints)
+               except ValueError:
+                       ErrorDialog(self, _('Invalid link: "%s"') % link).run() # T: error when invalid link clicked
                except:
                        zim.errors.exception_handler(
                                'Exception during activate-link(%r)' % ((link, hints),))

I personally would not call the test Path.assertValidPageName(newname) in Line 153 of page.py, but first rejoin the cleaned href with the cleaned anchor and then check for the result being a single # (or empty '' depending on the details while the join).

See #1942 - "We need something akin to a HRef.makeValidHRef method as well as a validator for paths."

Related issue: #965

dodrg · 2022-12-21T00:33:47Z

Hi introt,

Exchanging the error dialog with traceback by an error dialog might look nicer, but the problem persists.

I agree with the #1942 that the whole path/url recognition is involved. So just making a quick'n dirty patch for this one point could break even more at the other hand. To get a new validator for this would be great and. By the way, I would be happy if things like "file:///my/path/with spaces.txt" and similar would autolink correctly...

Is there any dev info besides the source code?

introt · 2022-12-21T17:19:26Z

Hi,

Is there any dev info besides the source code?

Besides what's mentioned in the README, not much I'm afraid.

You can build the API docs with pydoctor --project-name=zim-desktop-wiki --project-version=0.75.1 --project-url=https://zim-wiki.org/ --html-viewsource-base=https://github.com/zim-desktop-wiki/zim-desktop-wiki/tree/develop --make-html --html-output=docs/api --project-base-dir="$(pwd)" --docformat=epytext --intersphinx=https://docs.python.org/3/objects.inv ./zim (we should probs add a GitHub Action for this)

The Tags and Anchors blueprint might have some relevant info: https://github.com/zim-desktop-wiki/zim-desktop-wiki/wiki/BlueprintTagsAndAnchors

As well as the comments on this PR: #1847

Hope this helps, happy hacking!

jaap-karssenberg · 2023-01-10T09:39:05Z

Agree that in this case the whitespace should be ignored. Tempted to just put a strip() in new_from_wiki_link() to remove any preceding / trailing whitespace. But doubting whether this is the right thing to do. Could there be any use case where a link actually starts or ends with whitespace? Starting for sure not, as it makes the pagename invalid. Doubting the end - could anchors realistically end with whitespace? Maybe put lstrip() just to be sure.

More complex fix would be to put the same logic in the wiki parser and in relevant parts of the PageView such that the whitespace is never handed to the new_from_wiki_link() method in the first place.

Notes: also put dialog wrapper in place as suggested above for other cases where the name really is invalid

jaap-karssenberg · 2023-01-10T15:20:25Z

Found a precedent in the commonmark spec, so leaning towards just applying strip(). Still think this should also be part of the parser, but no harm in also doing it in the new_from_wiki_link() as catch all.

See https://spec.commonmark.org/0.30/#example-509

dodrg · 2023-01-11T01:45:33Z

Could there be any use case where a link actually starts or ends with whitespace?

Yes.
In Linux spaces, new lines and arbitary signs like *?+ are allowed as filenames.
Doing a drag 'n drop in Linux from the file manager to Zim will create a valid link.

Create the file in /TEMP/ :

echo "My new file" > ' leading space, some specials *?+; and a
new line with trailing space '

Doing a drag'n drop to Zim results in this view:

file:///srv/data/TEMP/ leading space%2C some specials %2A%3F%2B%3B and a%0Anew line with trailing space

Please note that the space (front and end) is shown correctly. Unfortunatelly some special characters are substituted in the user's editor view.

Doing a copy/paste from Zim to elsewhere:

file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20

The source of the Zim page:

[[file:///TEMP/%20leading%20space%0Aand%20new%20line%20%2B%20trailing%20space%20|''file:///TEMP/ leading space%0Aand new line %2B trailing space'' ]]
file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20

A doubleclick to this link in Zim will try to open it:

If the content is a text file, it will be opened in the text editor.
If the content is some HTML, it will be opened by the web browser.
If the content is an executable (and the x-bit is set) it will fail.

DEBUG: Activate link: file:///TEMP/ leading space%2C some specials %2A%3F%2B%3B and a%0Anew line with trailing space 
DEBUG: open_file(/TEMP/ leading space, some specials *?+; and a
new line with trailing space , None)
INFO: Spawning: ('xdg-open', 'file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20') (cwd: /home/drg)
DEBUG: Process started with PID: 940378
gio: file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20: No application is registered as handling this file
DEBUG: Running ErrorDialog
DEBUG: Could not open: file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20
NoneType: None
ERROR: Could not open: file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20

NOTE: In Linux, the extension of the filename is not relevant in most cases. These examples have been tested without extension.

Comment

Usually I handle Zim notebook content in the editor view (almost never I'm using the HTML-server). So the limitations of local links in a web view are not relevant for my usage case. But having an easy way to integrate files is relevant.

It is nice to have almost full support of the filesystem's possibilities (spaces, etc.)
It would be nice to have an even better "pretty view" by having no substitution signs if not necessary of for Zim.
Having the choice to opt for copying the file locally to the page into the structure of the notebook would be great (but is beyond the scope of this bug report).

=> Using strip() would reduce the filesystem support.

My suggestion is still

to query the content of the anchor if the URL results to an empty string.
This should also be the behavior with a not empty URL.
If URL and anchor are empty the link attempt should finish silently with an unlinked result (and tell "nothing to link" in the debug log)

jaap-karssenberg · 2023-01-11T08:10:22Z

To start with the `strip()` should happen when parsing wiki links, not necessarily for all paths / URIs. Whitespace in the middle of a filename is a very common case indeed. However, a file path starting or ending with withspace is a special case in my mind that may require some escaping or character encoding to get it to work. By comparison, markdown parsing will also drop whitespace at the start or end of a URI.

…

-- Jaap

On Wed, Jan 11, 2023 at 2:45 AM dodrg ***@***.***> wrote: Could there be any use case where a link actually starts or ends with whitespace? Yes. In Linux spaces, new lines and arbitary signs like *?+ are allowed as filenames. Doing a drag 'n drop in Linux from the file manager to Zim will create a valid link. Create the file in /TEMP/ : echo "My new file" > ' leading space, some specials *?+; and a new line with trailing space ' Doing a drag'n drop to Zim results in this view: file:///srv/data/TEMP/ leading space%2C some specials %2A%3F%2B%3B and a%0Anew line with trailing space Please note that the space (front and end) is shown correctly. Unfortunatelly some special characters are substituted in the user's editor view. Doing a copy/paste from Zim to elsewhere: file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20 The source of the Zim page: [[file:///TEMP/%20leading%20space%0Aand%20new%20line%20%2B%20trailing%20space%20|''file:///TEMP/ leading space%0Aand new line %2B trailing space'' ]] file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20 A doubleclick to this link in Zim will try to open it: - If the content is a text file, it will be opened in the text editor. - If the content is some HTML, it will be opened by the web browser. - If the content is an executable (and the x-bit is set) it will fail. DEBUG: Activate link: file:///TEMP/ leading space%2C some specials %2A%3F%2B%3B and a%0Anew line with trailing space DEBUG: open_file(/TEMP/ leading space, some specials *?+; and a new line with trailing space , None) INFO: Spawning: ('xdg-open', 'file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20') (cwd: /home/drg) DEBUG: Process started with PID: 940378 gio: file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20: No application is registered as handling this file DEBUG: Running ErrorDialog DEBUG: Could not open: file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20 NoneType: None ERROR: Could not open: file:///TEMP/%20leading%20space%2C%20some%20specials%20%2A%3F%2B%3B%20and%20a%0Anew%20line%20with%20trailing%20space%20 NOTE: In Linux, the extension of the filename is not relevant in most cases. These examples have been tested without extension. *Comment* Usually I handle Zim notebook content in the editor view (almost never I'm using the HTML-server). So the limitations of local links in a web view are not relevant for my usage case. But having an easy way to integrate files is relevant. - It is nice to have almost full support of the filesystem's possibilities (spaces, etc.) - It would be nice to have an even better "pretty view" by having no substitution signs if not necessary of for Zim. - Having the choice to opt for copying the file locally to the page into the structure of the notebook would be great (but is beyond the scope of this bug report). => Using strip() would reduce the filesystem support. My suggestion is still - to query the content of the anchor if the URL results to an empty string. - This should also be the behavior with a not empty URL. - If URL and anchor are empty the link attempt should finish silently with an unlinked result (and tell "nothing to link" in the debug log) If the user makes nonsense with special characters, IMHO this should be the users responsibility. In doubt, as the export filter to html / markdown has to check the code at conversion time, a restrictive link verification at edit time is to early in the production process, as all tests can be circumvented by source editing and Zim cannot know the final aims of the user. So only to warrant the save operation of Zim itself should preferably set the limits for the content. — Reply to this email directly, view it on GitHub <#2256 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAJQYHRKXX3OHWZCFA7YSJTWRYGEPANCNFSM6AAAAAATCJ4F2Q> . You are receiving this because you commented.Message ID: ***@***.***>

dodrg added bug needs triage labels Dec 18, 2022

introt mentioned this issue Dec 24, 2022

Links to headings and anchors lost upon export #2265

Closed

introt mentioned this issue Jan 7, 2023

Changing line with /// #2283

Closed

jaap-karssenberg removed the needs triage label Jan 10, 2023

jaap-karssenberg added this to the 0.75.2 milestone Jan 10, 2023

jaap-karssenberg closed this as completed in f4a33c7 May 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Click on a faulty link generates some error and tracebacks #2256

Click on a faulty link generates some error and tracebacks #2256

dodrg commented Dec 18, 2022 •

edited

dodrg commented Dec 18, 2022 •

edited

introt commented Dec 19, 2022

dodrg commented Dec 21, 2022

introt commented Dec 21, 2022

jaap-karssenberg commented Jan 10, 2023

jaap-karssenberg commented Jan 10, 2023

dodrg commented Jan 11, 2023 •

edited

jaap-karssenberg commented Jan 11, 2023 via email

Click on a faulty link generates some error and tracebacks #2256

Click on a faulty link generates some error and tracebacks #2256

Comments

dodrg commented Dec 18, 2022 • edited

dodrg commented Dec 18, 2022 • edited

introt commented Dec 19, 2022

dodrg commented Dec 21, 2022

introt commented Dec 21, 2022

jaap-karssenberg commented Jan 10, 2023

jaap-karssenberg commented Jan 10, 2023

dodrg commented Jan 11, 2023 • edited

jaap-karssenberg commented Jan 11, 2023 via email

dodrg commented Dec 18, 2022 •

edited

dodrg commented Dec 18, 2022 •

edited

dodrg commented Jan 11, 2023 •

edited