A web link is not processed if it contains Cyrillic #8634
Comments
URL processing has been undergoing change in recent versions of N++. |
@sasumner I give an example of how to handle web links AkelPad editor |
NPP uses the following RegEx which needs to be changed: BTW, a This is a task for @guy038 the RegEx master. :) |
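To illustrate the kind of failure reported here, the following is a minimal sketch in Python (not the C++ used by Notepad++). Both patterns are hypothetical stand-ins, since the actual Notepad++ expression is not reproduced in this thread. It only shows how an ASCII-only character class stops at the first Cyrillic letter, while a Unicode-aware class keeps going.

```python
import re

# Hypothetical ASCII-only pattern (an assumption, NOT the real Notepad++ regex).
ASCII_URL = re.compile(r"https?://[A-Za-z0-9._~:/?#@!$&'()*+,;=%-]+")

# Hypothetical Unicode-aware variant: for str patterns, \w also matches
# letters from non-Latin scripts such as Cyrillic.
UNICODE_URL = re.compile(r"https?://[\w.~:/?#@!$&'()*+,;=%-]+")


def first_url(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None


sample = "see https://ru.wikipedia.org/wiki/Россия for details"
print(first_url(ASCII_URL, sample))    # link is cut off before the Cyrillic part
print(first_url(UNICODE_URL, sample))  # link includes the Cyrillic part
```

With the ASCII-only class the match ends right after `/wiki/`, which is the behavior described in this issue.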
So, not any kind of new problem relating to the change to URL displaying, but rather a long-standing thing...that there are probably other issues already open for. |
@sasumner I noticed this error only today |
Not related to the new
Possibly. :) |
I would vote for making the URL regex changeable via settings dialog. |
I don't know enough about the issue. I do know that Visual Studio, Gmail and other editors interpret URLs correctly even if they contain non-Latin characters and/or certain characters (e.g. |
Replacing https://github.com/notepad-plus-plus/notepad-plus-plus. Feel free to test, improve and submit a PR. RegEx source: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string |
As I wrote: I don't know enough about the issue. |
Which may be why we currently have:
Of course in the IANA link I also see some numbers, |
@Yaron10 :-) and we've come full circle - I vote for ... :-) |
Well, I vote for it too. :) |
Just an opinion thrown in here, because I have been observing URL problems for a while now. There are two parts of the problem, completely independent of each other:
Until now, I have only been involved in the second part of the problem. I never cared about the detection. But whenever I read about problems with it, the question occurs to me: Is a regular expression really the right way to detect URLs? From inside C++ program code? Since this has not been solved for years, I would say: no. Is it so difficult to write a simple parser for it? I wonder why no one ever seems to have had the idea to do this. And before you make this regex configurable, you should consider that there may be other solutions than a regular expression for it. See #3353, #5029 and #7791 also. Edit: and #3912. |
@Yaron10 |
You're using your modified This may explain the significant size difference between the original DLL and yours. Can you build your modified
Searching for a better RegEx the other day, I've come across this idea. |
@Yaron10, With the Appveyor build, it doesn't. Here, only the original regular expression works. So there is an unnecessary difference between the Appveyor build and the original build, which should be fixed someday. Thank you, now I can run your regular expression against my parser. Edit: My own build of |
Please share if you figure it out. Thank you. |
Ok, did some research and found this community discussion from 2017 which links to this GitHub issue. And a new regex But then it was closed with comment: requirement could not be fulfilled. I did a test and it looks like it finds what needs to be found. |
I'm not one to play regex golf, but the regex @Ekopalypse mentions just above seems a bit, well, obfuscated. This would seem to cover it, and be simpler (although I did no testing):
Normally, it doesn't matter, but if changing something this "important", it should be done so that someone in the future doesn't have to puzzle about why it was made "overcomplicated". Note also that neither of these two regexes handles the case of a URL wrapped in parentheses/brackets/etc. correctly; examples:
People (including me!) like to put URLs in parentheses; currently I have to remember to do it this way:
Perhaps Another point: Is there no Unicode possibility before the |
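One possible way to deal with URLs wrapped in parentheses or brackets is a post-processing step after the match, sketched here in Python. This is only an assumed heuristic for illustration, not what Notepad++ does: the function name `trim_wrapping` and the balance rule are hypothetical. A trailing closing bracket is dropped only when the candidate contains fewer matching openers than closers.

```python
# Closing brackets mapped to their opening counterparts.
PAIRS = {")": "(", "]": "[", "}": "{", ">": "<"}


def trim_wrapping(candidate):
    """Drop trailing closing brackets that have no matching opener in the candidate."""
    while candidate and candidate[-1] in PAIRS:
        closer = candidate[-1]
        opener = PAIRS[closer]
        if candidate.count(opener) >= candidate.count(closer):
            break  # balanced, so the bracket probably belongs to the URL
        candidate = candidate[:-1]
    return candidate


print(trim_wrapping("https://example.com/page)"))
print(trim_wrapping("https://en.wikipedia.org/wiki/Foo_(bar)"))
```

The first call loses the stray `)`, while the second keeps its balanced `(bar)` suffix.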
The |
That means you would NOT want them included in the URL, correct? |
Correct.
I hope that is NOT correct. :) |
No, according to https://tools.ietf.org/html/rfc3986#section-3.1, to quote 3.1. Scheme and according to https://tools.ietf.org/html/rfc2234#section-6.1 ... ... |
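For reference, RFC 3986 section 3.1 defines the scheme as `ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`, and the ALPHA core rule covers only `A-Z` and `a-z`. A minimal Python sketch of that rule follows; the helper name `is_valid_scheme` is made up for this example.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# ALPHA is ASCII-only, so non-Latin letters are not allowed in the scheme.
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme):
    """Return True if the whole string is a syntactically valid URI scheme."""
    return SCHEME.fullmatch(scheme) is not None


print(is_valid_scheme("https"))    # ASCII letters only
print(is_valid_scheme("svn+ssh"))  # "+" is allowed after the first letter
print(is_valid_scheme("хттп"))     # Cyrillic letters are not ALPHA
```

So Unicode handling is only needed for the part after the scheme.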
@Yaron10, I can now build a The mechanism is that Scintilla provides a class for regex operations, which can be left as it is or be overridden with your own regex functions. The code to do this is contained in To add the Boost regex stuff to the project, you need to have the Boost library installed on your computer. I downloaded a precompiled version from
After you can build it, you have to turn it on by defining the preprocessor symbol By the way, I'm happy that the standard regular expression used for URLs here gets by with the regex syntax naturally provided by the original |
Thank you for the detailed guide. |
@Yaron10 look at these (they don't get through your regular expression): |
Yes, we're looking forward to your parser. :) BTW, the second link is not interpreted correctly here (in your post) either. |
I did a test to see if there is a performance impact and it seems that this
is ~3% faster than the one currently used and has the advantage to find unicode letters as well. |
@Uhf7, |
@Yaron10, @Ekopalypse I revised my parser so that your nice example is no longer detected entirely as a URL. What I changed: I allow quotes only in the query part of the URL. And the query part is scrutinized a little more closely and has to be at least similar to the "official" query format (which does not seem to be complied with in every case). If you still want to test, here is my current source file again. |
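The rule "quotes only in the query part" can be pictured with a small hand-written scanner. The following Python sketch is not the parser from this comment (that source file is not reproduced here). The function `scan_url` and its stop characters are assumptions for illustration, and the closer check of the query format is left out.

```python
def scan_url(text, start):
    """Return the end index (exclusive) of a URL candidate starting at text[start]."""
    i = start
    in_query = False
    while i < len(text):
        ch = text[i]
        if ch.isspace() or ch in "<>":
            break  # whitespace and angle brackets always end the candidate
        if ch == "?" and not in_query:
            in_query = True  # everything after the first "?" is the query part
        elif ch in "\"'" and not in_query:
            break  # quotes are accepted only inside the query part
        i += 1
    return i


attr = 'href="https://example.com/path" rest'
begin = attr.index("https")
print(attr[begin:scan_url(attr, begin)])

query = 'https://example.com/s?q="abc"&x=1 tail'
print(query[:scan_url(query, 0)])
```

In the first sample the closing quote of the attribute ends the link, while in the second the quotes survive because they appear after the `?`.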
👍 Interestingly, Firefox opens |
Minor: is |
Yes, |
Just to make sure there's no misunderstanding here: I meant Thank you. |
Now I see it. Thank you again. I'm going to exclude this too. |
👍 |
I did the following modifications:
|
Thanks for that too. 👍 |
Does anyone have new strange text passages for me, which should or should not be detected as URLs? |
I'm done. :) It took some time but the result is great. |
Thank you for testing. I will open a PR for the new URL detection method soon. |
Yaron10, this is because standard indicators are supposed to complement syntax highlighting, not to replace it. So I find it correct that the link becomes multi-colored while not hovered. That only a part of the link is displayed blue while hovered seems to be another Scintilla problem. I will look for it as soon as I can. It was there before my modifications, in the But, damage control: On double click, the whole URL is used, and the full box style is not affected by this. And if the URL is syntactically correct in the document, it should be single color anyway. I bet that Notepad++ is really the first application that uses the standard indicators to highlight URLs, although exactly this is recommended in ScintillaDoc.html (search "URL " in it). |
👍
Thank you. But I'm not sure it should be worth it (mini-minor issue). You didn't refer to the first case which is slightly more significant: Comment a link in a C++ file. |
I did mean exactly the C++ example too in the 1st part of my answer: That's syntax highlighting, and the part of the line after And I see that MS VS does not do it this way; it prioritizes the URL, but this does not seem worth imitating to me. |
No need to apologize. It was so clear to me that the underline should not be green that it didn't occur to me to mention it :-). The underline is always in the default text color. Technically, it is not possible to make the underline color follow the syntax highlighting color. When not hovered, we have two options:
And: If the underline color followed the current syntax highlighting color, it wouldn't unite/embrace the link anymore; it would fall apart optically even more in situations like the one shown in your XML example. |
But then you came up with
This is a very good point. Just theoretically. Thanks again for your work. I appreciate it. |
I've just encountered a slightly more serious issue with Word-Wrap: ON. Should I open a new issue or you'd rather continue the discussion here? Thank you. |
Thank you for testing thoroughly, another interesting effect. But this is basically the same Scintilla problem as with the hovered, syntax-highlighted links, which become blue only partially. The detection of whether a character belongs to a hovered indicator or not doesn't work. I have filed a way to solve this as https://sourceforge.net/p/scintilla/bugs/2199/. Actually, I wanted to wait for an answer to this before I create the PR to integrate it into the Notepad++ SciLexer.dll. But I will create it without an answer, after an adequate waiting time. Edit: This is the 100th comment on this issue. We shouldn't lose focus here: the whole thing was actually about the Cyrillic characters in a link, and I have to PR my parser suggestion someday. |
Thank you for the quick fix. 👍 Hopefully, last post here. |
Fix inaccurate URL detection and enhance URL detection for non-English character. Fix notepad-plus-plus#3912, fix notepad-plus-plus#3353, fix notepad-plus-plus#4643, fix notepad-plus-plus#5029, fix notepad-plus-plus#6155, fix notepad-plus-plus#7791, fix notepad-plus-plus#8634, close notepad-plus-plus#8921
A web link is not processed if it contains Cyrillic.
Notepad++ v7.8.6 (32-bit)
Build time : Apr 21 2020 - 15:17:06
Path : I:\Tools_Servis\TextCode\NPP++\npp.7.8.6.bin\notepad++.exe
Admin mode : ON
Local Conf mode : ON
OS Name : Windows 7 Home Premium (64-bit)
OS Build : 7601.0
Plugins : ComparePlugin.dll DSpellCheck.dll Explorer.dll HexEditor.dll HTMLTag_unicode.dll ImgTag.dll JSMinNPP.dll LocationNavigate.dll mimeTools.dll MultiClipboard.dll NativeLang.dll NppConverter.dll NppExport.dll NppMarkdownPanel.dll NppSnippets.dll PreviewHTML.dll ShtirlitzNppPlugin.dll Tidy2.dll VisualStudioLineCopy.dll WebEdit.dll WindowManager.dll XMLTools.dll _CustomizeToolbar.dll