
Unable to retrieve the full page of URL #1142

Closed
peterpt opened this issue Sep 6, 2021 · 7 comments

@peterpt

peterpt commented Sep 6, 2021

I am using HTTPie 0.9.8, and when I try to download a Dropbox web page, I don't get the full page, only part of it.

Using this Dropbox link:
https://www.dropbox.com/sh/erv1tycztizfvyd/AADeXwemV9sK37MSHqxmYz_5a?dl=0

and this command:
http https://www.dropbox.com/sh/erv1tycztizfvyd/AADeXwemV9sK37MSHqxmYz_5a?dl=0 -o page.html

I get links to 30 files, whereas the folder actually contains 50 files when opened in a normal web browser such as Firefox or Chrome.

Any ideas?


@peterpt added the labels bug (Something isn't working) and new (Needs triage. Comments are welcome!) on Sep 6, 2021
@BoboTiG
Contributor

BoboTiG commented Sep 7, 2021

Could you try the latest HTTPie version (2.5.0 at the time)?

@peterpt
Author

peterpt commented Sep 7, 2021

Attachments: httpie, test.zip, grabdrop
I have tried it with 2.5.0 and still have the same issue.

I created a simple script that uses HTTPie to fetch the page and then grep to parse the HTML file and retrieve only the links. It should make testing easier for you, so I am sending it as an attachment in a zip file.
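The attached script is not reproduced in this thread. As a hypothetical stand-in for its grep step, here is a minimal Python sketch that pulls link URLs out of a saved page such as page.html. The URL pattern (Dropbox /sh/ links ending in .wav) and the function name extract_wav_links are assumptions for illustration, not taken from the attachment.

```python
import re

# Hypothetical pattern: Dropbox shared-folder links whose path ends in ".wav".
# The stop characters (quotes, whitespace, angle brackets, "?") end a URL so
# query strings such as "?dl=0" are not included. Adjust to the real markup.
WAV_LINK = re.compile(r"""https://www\.dropbox\.com/sh/[^"'\s<>?]+\.wav""")


def extract_wav_links(html: str) -> list[str]:
    """Return matching URLs without duplicates, in first-seen order.

    Roughly what `grep -o PATTERN page.html | sort -u` gives, except
    that the original order is preserved.
    """
    unique: dict[str, None] = {}  # dicts keep insertion order (Python 3.7+)
    for url in WAV_LINK.findall(html):
        unique.setdefault(url, None)
    return list(unique)
```

For example, `len(extract_wav_links(open("page.html", encoding="utf-8").read()))` gives the link count being compared in this thread. Like grep, it can only see what is present in the downloaded HTML.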

@Bhavye-Malhotra

I have tested the script and it works perfectly fine. The issue seems to be fixed in version 2.6.0.

[screenshot attached]

@peterpt
Author

peterpt commented Nov 20, 2021

There are 50 links to WAV files in that Dropbox folder, but HTTPie only gets 31. Your output was the same as what I got with 2.5.0.

@Bhavye-Malhotra

Ohh, let me recheck and try to fix it.

@Bhavye-Malhotra

Bhavye-Malhotra commented Nov 20, 2021

@peterpt Hey, I tried extracting the URLs manually: I downloaded the page's HTML source locally and ran the script on that file. It still shows 31 links, not 48 (the total number of files), so I think this comes from Dropbox's side and not from HTTPie 😅
Hope it helps.
[screenshot attached]

@isidentical
Contributor

isidentical commented Nov 20, 2021

The reason there are only 30-something entries with HTTPie and 50-something in the browser is that once the page has fully loaded, the browser starts executing JavaScript that does lazy pagination to load more entries. If you want to test this yourself, block the https://www.dropbox.com/list_shared_link_folder_entries API in your browser's developer tools and try again. This time, you'll also see only 30-something entries.

I'm afraid this is something HTTPie does not support, but it can be achieved by other means (e.g. using a headless browser via Selenium to fully load the page and then do the extraction).
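The pagination behaviour described above can be modelled with a small, hypothetical sketch: the initial HTML carries only the first page of entries, and the rest arrive through follow-up requests that follow a cursor, which on the real page are issued by JavaScript. `fetch_page`, the integer cursor, and the page size of 30 are invented for illustration; they do not describe the actual request or response format of Dropbox's list_shared_link_folder_entries endpoint.

```python
from typing import Optional

# Hypothetical in-memory "server": 50 entries served 30 at a time.
ALL_ENTRIES = [f"file{i:02d}.wav" for i in range(1, 51)]
PAGE_SIZE = 30  # assumption, chosen to mirror the numbers in this thread


def fetch_page(cursor: Optional[int]) -> tuple[list[str], Optional[int]]:
    """Return one page of entries and the cursor for the next page (None when done)."""
    start = cursor or 0
    end = start + PAGE_SIZE
    next_cursor = end if end < len(ALL_ENTRIES) else None
    return ALL_ENTRIES[start:end], next_cursor


def initial_html_entries() -> list[str]:
    """A plain HTTP client: one request, so only the first page is seen."""
    entries, _ = fetch_page(None)
    return entries


def browser_entries() -> list[str]:
    """A browser running the page's JavaScript: keep following the cursor."""
    entries: list[str] = []
    cursor: Optional[int] = None
    while True:
        page, cursor = fetch_page(cursor)
        entries.extend(page)
        if cursor is None:
            return entries
```

A client like HTTPie corresponds to `initial_html_entries()`: it makes the one request and stops, so it never follows the cursor. Getting the `browser_entries()` result requires running the page's JavaScript (e.g. via a headless browser, as suggested) or, in principle, replaying the follow-up requests yourself.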

@isidentical added the wontfix label and removed the labels new (Needs triage. Comments are welcome!) and bug (Something isn't working) on Nov 20, 2021
Projects: None yet
Development: No branches or pull requests
4 participants