accented characters not displaying correctly #268

Closed
manprost opened this issue Aug 24, 2020 · 13 comments

@manprost

manprost commented Aug 24, 2020

Brief description of the issue.

Accented characters are displayed garbled in the message list and are not matched by the filters (maybe related to #267).

How to reproduce the bug?

  1. Subscribe to a feed which contains messages with accented characters in the title (such as http://feeds.weblogssl.com/vayatele2).
  2. Update

What was the expected result?

The messages show the full title in the message list, for example: Las nueve mejores películas... If there's any filter expecting, for example, "películas" in the title, the message gets correctly filtered.

What actually happened?

The messages show an erroneous title: Las nueve mejores películas...

image

The messages don't get filtered.

Other information (logs, see Wiki)

Thank you.

@martinrotter
Owner

Can you please test this development build and check whether the bug is still present?

https://bintray.com/martinrotter/rssguard/download_file?file_path=rssguard-3.7.1-b36e84b-win64.7z

@martinrotter
Owner

image

@manprost
Author

Can you please test this development build and check bug is still present?

https://bintray.com/martinrotter/rssguard/download_file?file_path=rssguard-3.7.1-b36e84b-win64.7z

Hmmmm I've tried with rssguard-3.7.1-b36e84b-nowebengine-win64.7z but the issue is still present:

image

And I see that rssguard-3.7.1-b36e84b-win64.7z also has a build date of 8/21/20 9:50AM:

image

Thank you!

@martinrotter martinrotter self-assigned this Aug 24, 2020
@martinrotter martinrotter added Component-Message-List Status-Needs-Help Someone else must provide better info, testing or PR. Type-Defect This is BUG!!! labels Aug 24, 2020
@martinrotter
Owner

Well, this is weird. I just tried the exact same build and the issue does not happen here. I am unable to reproduce it. :(

@martinrotter
Owner

Did you try to start that development version with a clean data folder, to make sure that the messages are "fresh"? Simply unpack it to a different folder, make sure that the database is empty on start, add the feed again and test.

@manprost
Author

Hmmm, this is strange, this is what I see with the empty database:

image

Which is the same as what I see in my database:

image

As you can see, there's a message that's been correctly received ("El festival de Berlín"), but the other ones haven't ("La maldición de Bly").

Is the build date (21 Aug, 3 days ago) correct? The modified date for rssguard.exe is also 21 Aug 2020...

Thank you!

@martinrotter
Owner

@mpr0st Is the encoding of your feed set to UTF-8?

@martinrotter
Owner

OK, I just pushed commit 56c44a2, which should force HTML unescaping of the TITLE/AUTHOR of messages. Please download the newest build with 56c44a2 in its version/name once it is compiled (about 20 minutes from now), test it with totally clean data, and let me know if it helped.

I still couldn't reproduce the issue on my side, so I tried to fix it blindly. Let me know. :)
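
To illustrate what unescaping only the title and author means, here is a minimal Python sketch. It is not the code from 56c44a2 (RSS Guard itself is written in C++). The `unescape_title_author` helper, the dictionary keys and the escaped form `&iacute;` are assumptions made for this example.

```python
import html


def unescape_title_author(message: dict) -> dict:
    """Return a copy of `message` with HTML entities decoded in title/author only."""
    cleaned = dict(message)
    for field in ("title", "author"):
        value = cleaned.get(field)
        if value is not None:
            cleaned[field] = html.unescape(value)
    return cleaned


# Hypothetical message as it might arrive with escaped entities.
raw = {
    "title": "Las nueve mejores pel&iacute;culas",
    "author": "Jos&eacute;",
    "contents": "<p>Texto &amp; m&aacute;s</p>",
}

fixed = unescape_title_author(raw)
print(fixed["title"])     # title with the entity decoded
print(fixed["contents"])  # body is deliberately left untouched
```

The body is left alone on purpose, so only the two short text fields change.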

@manprost
Author

Sorry, I think I know why you can't reproduce the issue. Completely my bad: I forgot to tell you an important detail. I'm accessing the feed via Inoreader.

If I directly add the feed, it works perfectly:

image

But, via Inoreader the issue persists:

image

Can't tell you how sorry I am, rookie mistake.

Thank you.

@martinrotter
Owner

Holy sh. :D That is in fact important. Anyway, I will finalize it and check Inoreader too. :) I thought that these online service APIs sanitize/decode all strings by themselves, but it seems they do not.
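
A small Python sketch of why an undecoded title also slips past a filter, as reported at the top of this issue. The `title_matches` function is a hypothetical stand-in for a title filter, not RSS Guard's filter API, and the escaped form `&iacute;` is an assumption about what the service returns.

```python
import html


def title_matches(title: str, keyword: str) -> bool:
    """Case-insensitive substring check, standing in for a title filter."""
    return keyword.casefold() in title.casefold()


# Assumed form of the title when the API does not decode entities.
escaped_title = "Las nueve mejores pel&iacute;culas"

before = title_matches(escaped_title, "películas")
after = title_matches(html.unescape(escaped_title), "películas")
print(before)  # False: the filter sees "pel&iacute;culas"
print(after)   # True: the decoded title contains "películas"
```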

@martinrotter martinrotter added this to the 3.7.1 milestone Aug 25, 2020
martinrotter pushed a commit that referenced this issue Aug 25, 2020
@martinrotter
Owner

OK, I wrote a completely new de-escaping mechanism which supports all HTML symbols.

I also enabled the mechanism for the TITLE/AUTHOR fields of Inoreader messages. Please test dev. build f1195f7 when it compiles. Thank you and let me know.

PS: The contents (body) of the message are not "de-escaped", primarily because de-escaping complex HTML code can sometimes lead to broken HTML display/artifacts, particularly if the feed is semi-broken.

If you wish to enable de-escaping in the message's contents/body too, let me know as well.
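
A minimal Python sketch of the risk with de-escaping the body. The sample body and the `TagCollector` helper are hypothetical and only show the general effect; they are not taken from RSS Guard or from any real feed.

```python
import html
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def start_tags(markup: str) -> list:
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


# Hypothetical body that talks *about* a tag, so the tag is escaped.
body = "<p>Wrap text in &lt;b&gt; to make it bold.</p>"
unescaped = html.unescape(body)

print(start_tags(body))       # only the real <p> tag
print(start_tags(unescaped))  # the escaped text has become a real, unclosed <b>
```

In the title and author fields there is no markup to preserve, which is why decoding them is the safer default.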

@martinrotter martinrotter removed the Status-Needs-Help Someone else must provide better info, testing or PR. label Aug 25, 2020
@manprost
Author

Now it works like a charm.

Regarding the body, I've seen the accented characters correctly at all times, so I guess there's nothing to adjust here.

Thank you very much.

@martinrotter
Owner

@mpr0st Yes, I would bet that the Inoreader API sanitizes the content. OK, in that case this ticket is solved. I will probably release the bug-fix version 3.7.1 soon. It will not introduce any drastically new features, but it already contains quite a lot of bug fixes.
