-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NewsDownloader externalize download engine #3618
NewsDownloader externalize download engine #3618
Conversation
Ok. so I'll add lua-http as backend next time. This time I'll only focus on interfacing backend (already done) |
Can someone take a look at my code? |
return processDownloadAndReturnSink(redirected_url) | ||
end | ||
else | ||
logger.warn("translator HTTP status not okay:", status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not translator here ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
|
||
function InternalDownloadBackend:download(url, path) | ||
local sink = processDownloadAndReturnSink(url) | ||
ltn12.sink.file(io.open(path, 'w')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure what's going on here.
sink
is for use as an argument to LuaSocket: http://w3.impa.br/~diego/software/luasocket/ltn12.html#sink.table
You can return the table that sink created for you, not the sink. You're doing that of course, but you're naming the table itselfsink
, which I find confusing at best. It's not a sink. It's just a results table for the sink.- Regardless of all that, the
ltn12.sink
here is unrelated to thelocal sink
right above it, isn't it? - Anyway, so you'd probably want to
table.concat(table_name)
(please name it something notsink
) just like you're doing in that getResponseAsString() function. But that probably makes more sense asreturn table.concat(table_name)
on line 40.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks I fixed it.
|
||
local InternalDownloadBackend = {} | ||
|
||
local function processDownloadAndReturnSink(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just call it something simple like get
or getURL
. Skip the getResponseAsString
since this function should just do return table.concat(response)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. I replaced processDownloadAndReturnSink with getResponseAsString as this ways' better.
|
||
if status ~= "HTTP/1.1 200 OK" then | ||
logger.dbg("InternalDownloadBackend: HTTP response code <> 200. Response code: ", status) | ||
if status and string.sub(status, 1, 1) ~= "3" then -- handle 301, 302... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm misreading this makes no sense as written. It should be == 3
, not not ~= 3
. But since the status code is a number you can just do code > 300 and code < 400
. Just use it instead of ignoring it up above on line 24.
Also are you sure that this will only apply to HTTPS: that is, regular HTTP only returns something after all the internal redirections have already been taken care of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right copy-paste from pseudocode.
actually as status is something like:
"HTTP/1.1 200 OK"
"HTTP/1.1 400 Bad Request"
it should be :
string.sub(status, 10, 1) == "3"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something wrong with that. I give extra debug, substring without second arg:
logger.dbg("InternalDownloadBackend: HTTP response code <> 200. Response code: ", status)
local code = string.sub(status, 10)
logger.dbg("RESPO:", code)
gives:
01/29/18-22:09:24 DEBUG InternalDownloadBackend: HTTP response code <> 200. Response code: HTTP/1.1 301 Moved Permanently
01/29/18-22:09:24 DEBUG RESPO: 301 Moved Permanently
adding second arg to substring:
logger.dbg("InternalDownloadBackend: HTTP response code <> 200. Response code: ", status)
local code = string.sub(status, 10, 1)
logger.dbg("RESPO:", code)
gives:
01/29/18-22:12:09 DEBUG InternalDownloadBackend: HTTP response code <> 200. Response code: HTTP/1.1 301 Moved Permanently
01/29/18-22:12:09 DEBUG RESPO:
./luajit: plugins/newsdownloader.koplugin/internaldownloadbackend.lua:38: InternalDownloadBackend: Don't know how to handle HTTP status:
see here 01/29/18-22:12:09 DEBUG RESPO: <- it's blank!!! empty null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you should use the number code, not the human readable string status. Up above where you're explicitly dismissing it in the dummy variable _
, replace that with something like code
and use it the way I said: code > 300 and code < 400
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But status returns me full string status:
"HTTP/1.1 200 OK"
"HTTP/1.1 400 Bad Request"
I don't have number (code only) status:(
found ugly but working workaround.
local socket_url = require("socket.url") | ||
|
||
local InternalDownloadBackend = {} | ||
local max_redirects = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably doesn't really matter but most browsers default to 20.
|
||
function InternalDownloadBackend:download(url, path) | ||
local response = self:getResponseAsString(url) | ||
local file = assert(io.open(path, 'w')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that an assert means insta-crash in case it fails. Which is excellent during development, but not so much for regular use. ;-)
There are a couple of methods we've added to get around this issue:
local dbg = require("dbg")
if dbg.is_on then
-- do some asserting or erroring
end
Example use:
Lines 75 to 94 in 43a6cf4
if dbg.is_on then | |
local mt = { | |
__index = function(t, k) | |
local prop_value = rawget(t, k) | |
local prop_exists = prop_value ~= nil | |
if not prop_exists then | |
local warning = rawget(t, "_name") and string.format("Size.%s.%s", rawget(t, "_name"), k) | |
or string.format("Size.%s", k) | |
error("Size: this property does not exist: " .. warning) | |
end | |
assert(prop_exists) | |
return prop_value | |
end | |
} | |
setmetatable(Size, mt) | |
for el, table in pairs(Size) do | |
table._name = el | |
setmetatable(Size[el], mt) | |
end | |
end |
Then there's also dbg.guard
and dbg.dassert
.
dbg.dassert(file ~= nil)
koreader/frontend/ui/uimanager.lua
Lines 298 to 302 in 775b7a3
dbg:guard(UIManager, 'schedule', | |
function(self, time, action) | |
assert(time[1] >= 0 and time[2] >= 0, "Only positive time allowed") | |
assert(action ~= nil) | |
end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I'll remove assert
Codecov Report
@@ Coverage Diff @@
## master #3618 +/- ##
==========================================
- Coverage 53.22% 53.09% -0.13%
==========================================
Files 200 201 +1
Lines 26446 26693 +247
==========================================
+ Hits 14075 14173 +98
- Misses 12371 12520 +149
Continue to review full report at Codecov.
|
can I retrigger CI?
this doesn't seem to be something I should be responsible for. |
|
||
local httpRequest = parsed.scheme == 'http' and http.request or https.request | ||
-- first argument returned by skip is code | ||
local _, headers, status = socket.skip(1, httpRequest(request)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have number (code only) status:(
This line is where you're explicitly ignoring it.
local code, headers, status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks:)
Is the redirect handling working as intended on Reuters? If so I'll merge this. |
Reuters works (fixed). PC world full download still doesn't (so no changes here). Added better exception handling. |
@@ -212,28 +212,38 @@ end | |||
|
|||
function NewsDownloader:processFeedSource(url, limit, unsupported_feeds_urls, download_full_article) | |||
|
|||
local response = DownloadBackend:getResponseAsString(url) | |||
local ok, response = pcall(function() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pcall(DownloadBackend.getResponseAsString, url)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replacing
local ok, response = pcall(function()
return DownloadBackend:getResponseAsString(url)
end)
with local ok, response = pcall(DownloadBackend.getResponseAsString, url)
brokes code.
It displays error for all sources after <1 sec.
Oh, I just fixed both of those in #3644 :-P |
ok, should I revert internaldownloadbackend to inital extract: |
Sure! |
@Frenzie I suspect your commit didn't fix any of the problems. What I did: I merge upstream master to my branch latest develop:
build koreader. run download in news donwloader. Can you check if it works for you ? |
As always, |
Ok. If it's fine then please merge |
Wait, is that file even used? |
local LuaHttpDownloadBackend = {} | ||
|
||
function LuaHttpDownloadBackend:getResponseAsString(url) | ||
local _, stream = assert(http_request.new_from_uri(url):go()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the asserts had been replaced by something more user-friendly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, you stuck it behind a pcall.
I don't particularly want any dead code lying around in master. That's why Git has branches. |
Oh, no please don't remove it. It's already prepared for external lib. Please don't make me additional work, once I'm back to the topic |
Thanks |
No problem. |
@Frenzie I tested again (submodule updated). And again it seems it doesn't work for full download Reuters and full download PCWorld. Maybe I'm doing something wrong - I did fetch upstream(oryginal koreader), merge latest koreader master to my local master and update submodules? @Frenzie could you check it? It does create files with correct names (it did so before, as we've title in feeds first xml we download), but when you open it you have "301 Moved" (for Reuters) and "400 bad request" (for PC World). |
Oh, my bad. That's a bug in this version of NewsDownloader. It should only create files when the status returned is 200. I believe that was fixed in the code as it was? |
But the code was to handle prepare for change download backend, and in addition I added http 301 handling, and just at the end I added better exception handling. But I thought you mean that after submodule bump it should also handle 301 and other (non 200, e.g 400) statuses, in old code without my extra changes for 301? That's what i assumed from :
I thought that's why you wanted to revert my changes handling 301 statuses?
Ok so I'll pr for handling 301 again |
Done please merge . #3652 |
I thought it was fixed because like you said it wrote the files. It didn't write any files previously for me; it just said there was a problem with the feed. |
appreciated help to build :
https://github.com/daurnimator/lua-http as lua library
I stuck on segfault :(