Add LIFO queue option for recursive download #1
Conversation
Force-pushed from c201df5 to 2d37a1a
Force-pushed from c1db620 to f1dc95b
Nice, could you just add these two little changes:

diff --git a/doc/wget.texi b/doc/wget.texi
-@itemx --queue-type=@var{queuetype}

diff --git a/src/init.c b/src/init.c
static bool
How do I generate the docs so that this error is detected? This command doesn't show any error about it.
Normally, this will be done automatically by 'make'.
Is there a make target for that? Like …
There's nothing about texinfo in config.log. This is the makeinfo and pod2man output:

Am I supposed to run that command to get more info?

makeinfo is 4.13
'textinfo' is a typo, should be 'texinfo' ;-)

test -z "wget.dvi wget.pdf wget.ps wget.html"

So maybe it is this ./texi2pod.pl working differently here (or for you)?
I get the error now. Don't know why I didn't get it before; maybe because I didn't run …
Force-pushed from 5ace8f0 to 0f94726
OK, email sent: feedback wanted for this patch #1
Force-pushed from 35485b0 to 3fbfdba
Force-pushed from 0b1f1d9 to 6170341
basic problem
Yes, depth doesn't matter.

alternative solution: enqueue html last isn't enough
Yup, depth doesn't matter. Keeping FIFO and enqueuing html links last (with a sort) isn't enough, because all depth-n links are still downloaded before any depth-n+1 links. FIFO with html enqueued last ≠ LIFO with html enqueued first.

enqueue html last isn't enough
Show it with code, because I don't understand.

The current FIFO code and the LIFO solution are sketched below.
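A minimal sketch of the URL queue from src/recur.c, trimmed to the enqueue logic (the real source has a single url_enqueue; the _fifo/_lifo names and the reduced field set here are illustrative):

```c
/* Simplified queue from src/recur.c; fields trimmed for brevity. */
struct queue_element {
  const char *url;             /* URL to retrieve */
  int depth;                   /* recursion depth */
  struct queue_element *next;
};

struct url_queue {
  struct queue_element *head;  /* url_dequeue always pops here */
  struct queue_element *tail;
};

/* Current behavior: append at the tail (FIFO), so every depth-n URL
   is dequeued before any depth-n+1 URL. */
static void
url_enqueue_fifo (struct url_queue *queue, struct queue_element *qel)
{
  qel->next = NULL;
  if (queue->tail)
    queue->tail->next = qel;
  queue->tail = qel;
  if (!queue->head)
    queue->head = queue->tail;
}

/* LIFO solution: push at the head, so the links of the page that was
   just downloaded are dequeued before anything enqueued earlier. */
static void
url_enqueue_lifo (struct url_queue *queue, struct queue_element *qel)
{
  qel->next = queue->head;
  queue->head = qel;
  if (!queue->tail)
    queue->tail = qel;
}
```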
This can fix a problem where links expire before they're dequeued for download. The result of using LIFO instead of FIFO is that links are downloaded immediately after the page they're in, instead of after all the other links already in the queue, which can take a considerable time.

Test case: the download targets are all at the deepest level (depth 2). They expire a while after their parent depth-1 page is downloaded. The FIFO queue downloads all depth-1 pages before downloading any depth-2 links. This takes so long that the depth-2 links expire before they're dequeued for download.
Closed in favor of #2.
basic problem
The basic problem is that the FIFO queue can leave a long time between downloading a page and downloading its links. This is different from the browser experience the page is designed for, resulting in wget failures that a browser user doesn't experience.
savannah link
this patch is also posted at https://savannah.gnu.org/bugs/?37581
making it optional
OK, the patch is changed here: #1
the patch file is https://github.com/mirror/wget/pull/1.patch
reason to place html pages at the top of the queue
If ll_bubblesort isn't used, only the deepest-level links are downloaded directly after their parent page, despite using LIFO: an html link enqueued after a page's other resources is dequeued first, and its whole subtree is downloaded before those resources.
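A hypothetical reconstruction of the patch's ll_bubblesort (the real implementation is in the patch itself): it reorders the list of links extracted from a page so that entries with the existing link_expect_html flag come first. With LIFO head-insertion, links enqueued first end up deepest in the queue, so a page's non-html resources are dequeued, and downloaded, immediately after the page itself.

```c
#include <stdbool.h>

static void
ll_bubblesort (struct urlpos *list)
{
  bool swapped;
  do
    {
      swapped = false;
      for (struct urlpos *p = list; p && p->next; p = p->next)
        if (!p->link_expect_html && p->next->link_expect_html)
          {
            /* Swap the contents of the two adjacent nodes, then
               restore the next pointers so the chain stays intact. */
            struct urlpos *b = p->next;
            struct urlpos tmp = *p;
            *p = *b;
            *b = tmp;
            b->next = p->next;  /* p now carries b's old next */
            p->next = b;
            swapped = true;
          }
    }
  while (swapped);
}
```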
alternative solution
enqueue child directly after parent seems difficult
Another solution is to enqueue the depth-n+1 links directly after enqueuing their parent depth-n link, instead of continuing to enqueue depth-n links.
This requires interrupting the depth-n enqueue at each html link, dequeuing everything (including the html link), enqueuing the depth-n+1 links, and then continuing the depth-n enqueue. This requires a big reorganization, or doesn't make sense.
A way to do this could be to store the not-yet-enqueued links in a temporary queue and enqueue them after everything else.
The LIFO solution is better than this one because it achieves the same download order without that reorganization.
enqueue html last doesn't work
Keeping FIFO and enqueuing html links last (with a sort) doesn't solve the problem, because all depth-n links are still downloaded before any depth-n+1 links. A short worked trace follows.
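To see why, consider a root page linking to two html pages A and B, each containing one image. With FIFO, even with html sorted last, the dequeue order is: root, A, B, a-img, b-img. Both depth-1 pages still come before any depth-2 image, so a-img is fetched long after A. With LIFO the order is: root, B, b-img, A, a-img (sibling order may vary), with each image directly after its page.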
test case description
I don't mean that all resources can be downloaded fast; I just mean that they are downloaded directly after the page that contains them.
The example is an image hosting site (imagevenue.com) where every image has its own html page (imagevenue.com/img.php) with a generated image link that expires a while after the html page is generated, to prevent direct links to image files.
All links can be downloaded with LIFO because each branch page has only one link in this example, and there's more than enough time to download that one link if the download begins directly after the link is generated.
If a branch page (e.g. imagevenue.com/img.php) had many images (links), there could still be a problem. But the problem would be the same for regular users (browsers) that download the resources directly after the page is loaded, so the fault would be the site's rather than wget's.
test
imagevenue fail
FIFO fails to download the imagevenue.com/img.php images because it downloads all the img.php pages before the temporary image links inside them; by the time it gets to the links, they have expired.
LIFO downloads each image directly after its img.php page is downloaded, so the links don't have time to expire.
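For illustration, the two runs would look roughly like this; the lifo value follows the patch's --queue-type option, and the URL and depth are placeholders:

```
# default FIFO queue: every img.php page is fetched before any image,
# so the temporary image links expire before they are dequeued
wget --recursive --level=2 http://imagevenue.com/...

# LIFO queue from this patch: each image is fetched directly after
# its img.php page, before its link can expire
wget --recursive --level=2 --queue-type=lifo http://imagevenue.com/...
```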
invalid input
Invalid input to --queue-type is rejected.
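A minimal sketch of how the value could be validated in src/init.c, in the style of wget's other option handlers (the enum values, function name, and error message here are assumptions, not the patch's actual identifiers):

```c
#include <stdbool.h>
#include <stdio.h>
#include <strings.h>   /* strcasecmp */

enum queue_type { queue_type_fifo, queue_type_lifo };

/* Hypothetical validator: accept only "fifo" or "lifo" for
   --queue-type and reject anything else. */
static bool
cmd_spec_queue_type (const char *com, const char *val, enum queue_type *place)
{
  if (strcasecmp (val, "fifo") == 0)
    *place = queue_type_fifo;
  else if (strcasecmp (val, "lifo") == 0)
    *place = queue_type_lifo;
  else
    {
      fprintf (stderr, "%s: Invalid value '%s'.\n", com, val);
      return false;  /* invalid input is rejected */
    }
  return true;
}
```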
download order
This test shows the FIFO and LIFO download order.
I created this local site:
i.html
  a.html
    a-a.html
    a-b.html
  b.html
    b-a.html
    b-b.html
FIFO downloads links long after their parent page, especially the deepest-level links.
LIFO downloads links directly after their parent page.
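Concretely, assuming each page links to its children in alphabetical order (sibling order depends on enqueue order, so it may differ):

FIFO order: i.html, a.html, b.html, a-a.html, a-b.html, b-a.html, b-b.html
LIFO order: i.html, b.html, b-b.html, b-a.html, a.html, a-b.html, a-a.html

In the FIFO run, every depth-1 page is fetched before any depth-2 page; in the LIFO run, each page's children are fetched immediately after it.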