Add dash(-) numeric character references #4502

Merged
merged 3 commits into from Jan 24, 2019

Contributor

sonix-github commented Jan 22, 2019

Add dash(-) numeric character references according to https://en.wikipedia.org/wiki/Dash

Member

Frenzie commented Jan 22, 2019

Could you please describe the specific problem this fixes? I ask specifically because of this utility function we have. :-)

koreader/frontend/util.lua

Lines 557 to 573 in b629881

--- Convert a Unicode codepoint (number) to UTF8 char
--
--- @int c Unicode codepoint
--- @treturn string UTF8 char
function util.unicodeCodepointToUtf8(c)
    if c < 128 then
        return string.char(c)
    elseif c < 2048 then
        return string.char(192 + c/64, 128 + c%64)
    elseif c < 55296 or 57343 < c and c < 65536 then
        return string.char(224 + c/4096, 128 + c/64%64, 128 + c%64)
    elseif c < 1114112 then
        return string.char(240 + c/262144, 128 + c/4096%64, 128 + c/64%64, 128 + c%64)
    else
        return util.unicodeCodepointToUtf8(65533) -- U+FFFD REPLACEMENT CHARACTER
    end
end
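
As a worked example of what the quoted function computes, here is a minimal standalone sketch. It is not the KOReader code itself: the local name `codepointToUtf8` is made up for illustration, and `math.floor` is added on the assumption that the snippet might be run on Lua 5.3 or later, where `/` produces floats. It encodes codepoint 8211 (U+2013 EN DASH), the value that shows up later in this thread.

```lua
-- Standalone restatement of the quoted function (illustrative name only).
-- math.floor makes the integer division explicit so this also runs on Lua 5.3+.
local function codepointToUtf8(c)
    local floor = math.floor
    if c < 128 then
        -- 1 byte: plain ASCII
        return string.char(c)
    elseif c < 2048 then
        -- 2 bytes: 110xxxxx 10xxxxxx
        return string.char(192 + floor(c / 64), 128 + c % 64)
    elseif c < 55296 or (57343 < c and c < 65536) then
        -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return string.char(224 + floor(c / 4096),
                           128 + floor(c / 64) % 64,
                           128 + c % 64)
    elseif c < 1114112 then
        -- 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return string.char(240 + floor(c / 262144),
                           128 + floor(c / 4096) % 64,
                           128 + floor(c / 64) % 64,
                           128 + c % 64)
    else
        return codepointToUtf8(65533) -- U+FFFD REPLACEMENT CHARACTER
    end
end

-- 8211 = 0x2013 falls in the 3-byte branch:
--   224 + floor(8211 / 4096)    = 226 (0xE2)
--   128 + floor(8211 / 64) % 64 = 128 (0x80)
--   128 + 8211 % 64             = 147 (0x93)
local en_dash = codepointToUtf8(8211)
print(en_dash:byte(1, -1)) -- 226 128 147
```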

Btw, unless it's a GH font issue, it looks like you used the same dash for all of them?

PS I don't know where this library came from specifically, but there are updates available too, e.g., here.

["&#8212;"] = "-",
["&#x2014;"] = "-",
["&#8213;"] = "-",
["&#x2015;"] = "-",
["&#(%d+);"] = function (x)


Frenzie Jan 22, 2019

Member

Btw, I think it might be better (as in clearer) to replace this with a reference to our utility function where it's used so that the library remains vanilla.

Contributor Author

sonix-github commented Jan 24, 2019

> Could you please describe the specific problem this fixes? I ask specifically because of this utility function we have. :-)

I am using the RSS feed from https://www.kosmonautix.cz/rubrika/micro/feed/, and its XML contains a title with the value "&#8211". Unfortunately the XML parser does not recognize this value and returns the original text unchanged, so in KOReader the title in the table of feed titles shows the string "&#8211" instead of "-".
So I looked for the place where the "recognition" takes place and found it in xml.lua. I then changed the code on my PocketBook PB626 to see whether it helps, and that fixed it. If there is a better solution, I am not against it :-)

Content of whole rss feed:
RSS_www.kosmonautix.cz.txt

> Btw, unless it's a GH font issue, it looks like you used the same dash for all of them?

Yes, I used the same dash for all of them.

Member

Frenzie commented Jan 24, 2019

In that case it's the title that should be fixed before it's used as a filename and the library should definitely be left as it is. You can use util.htmlEntitiesToUtf8.

At a quick skim it looks like that should be here:

-- if a title looks like <title>blabla</title> it'll just be feed.title
-- if a title looks like <title attr="alb">blabla</title> then we get a table
-- where [1] is the title string and the attributes are also available
local function getFeedTitle(possible_title)
    if type(possible_title) == "string" then
        return possible_title
    elseif possible_title[1] and type(possible_title[1]) == "string" then
        return possible_title[1]
    end
end
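
One way this suggestion could look is sketched below. It is a rough illustration, not the change that was merged: `decodeNumericEntities` is a hypothetical stand-in for `util.htmlEntitiesToUtf8` (whose actual behavior is not shown in this thread), `codepointToUtf8` is a local copy of the encoder so the snippet runs on its own, and the sketch assumes the references in the feed end with a semicolon.

```lua
-- Local UTF-8 encoder so the sketch is self-contained (illustrative name).
local function codepointToUtf8(c)
    local floor = math.floor
    if c < 128 then
        return string.char(c)
    elseif c < 2048 then
        return string.char(192 + floor(c / 64), 128 + c % 64)
    elseif c < 55296 or (57343 < c and c < 65536) then
        return string.char(224 + floor(c / 4096), 128 + floor(c / 64) % 64, 128 + c % 64)
    elseif c < 1114112 then
        return string.char(240 + floor(c / 262144), 128 + floor(c / 4096) % 64,
                           128 + floor(c / 64) % 64, 128 + c % 64)
    else
        return codepointToUtf8(65533) -- U+FFFD REPLACEMENT CHARACTER
    end
end

-- Hypothetical stand-in for util.htmlEntitiesToUtf8: decodes only numeric
-- character references (decimal "&#8211;" and hexadecimal "&#x2013;").
local function decodeNumericEntities(s)
    s = s:gsub("&#(%d+);", function(dec)
        return codepointToUtf8(tonumber(dec))
    end)
    s = s:gsub("&#[xX](%x+);", function(hex)
        return codepointToUtf8(tonumber(hex, 16))
    end)
    return s
end

-- Same shape as getFeedTitle above, but the title is decoded before it is
-- returned, so the XML library itself stays untouched.
local function getFeedTitle(possible_title)
    if type(possible_title) == "string" then
        return decodeNumericEntities(possible_title)
    elseif type(possible_title) == "table" and type(possible_title[1]) == "string" then
        return decodeNumericEntities(possible_title[1])
    end
end

print(getFeedTitle("Micro &#8211; news"))              -- title with a real en dash
print(getFeedTitle({ "A &#x2014; B", attr = "alb" }))  -- table form, hex reference
```

Decoding at the point where the title is consumed keeps the fix local to the news downloader, which is the design choice argued for in this comment.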

Contributor Author

sonix-github commented Jan 24, 2019

@Frenzie
Thanks for the tip 👍
I found that the string (channel title) must be fixed before feeds.rss.channel.title is processed, so I reverted all changes in the xml.lua library and made changes only in main.lua.
It was successfully tested on my PB626 :-)

@@ -1,4 +1,4 @@
-local DataStorage = require("datastorage")
+local DataStorage = require("datastorage")


Frenzie Jan 24, 2019

Member

Did you perhaps accidentally change the line endings? (Sorry, I'm just on the GH website atm, so I can't check.) Looks good besides that. :-)


sonix-github Jan 24, 2019

Author Contributor

Hmm, that's strange. I checked the raw file and realized that it had been saved as UTF-8 with a BOM. I have corrected it.
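
For reference, a UTF-8 byte order mark is the three bytes EF BB BF at the very start of a file, which is why a diff can show the first line as changed even though the visible text is identical. A minimal sketch of detecting and stripping it follows; the helper name `stripUtf8Bom` is made up for illustration and is not part of KOReader.

```lua
-- UTF-8 BOM: bytes 0xEF 0xBB 0xBF (decimal 239 187 191).
local UTF8_BOM = "\239\187\191"

-- Returns the text without a leading BOM, plus a flag telling whether
-- one was present. Hypothetical helper for illustration.
local function stripUtf8Bom(text)
    if text:sub(1, 3) == UTF8_BOM then
        return text:sub(4), true
    end
    return text, false
end

local first_line = UTF8_BOM .. 'local DataStorage = require("datastorage")'
local cleaned, had_bom = stripUtf8Bom(first_line)
print(had_bom, #first_line - #cleaned) -- true  3
```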

@poire-z poire-z merged commit 4d15058 into koreader:master Jan 24, 2019

1 check passed

ci/circleci Your tests passed on CircleCI!