Scrapy doesn't detect an HTML element that is visible in the page source #2109

Closed
mayouf opened this issue Jul 9, 2016 · 1 comment


mayouf commented Jul 9, 2016

Hello,

Could someone help with a website where the information appears in the page source and in the browser, but vanishes as soon as I use "scrapy shell" or "scrapy crawl"?
I am sure I am not banned: the whole page comes back correctly, except for the div containing the data I need.
Here is a screenshot of the link below (a French property auction website) in a regular browser such as Mozilla Firefox:
http://www.licitor.com/ventes-judiciaires-immobilieres/tgi-fontainebleau/mercredi-15-juin-2016.html

[Screenshot: issue licitor 1]

I circled the data I am targeting. Now let us do the same with view(response) from the scrapy shell:

scrapy shell http://www.licitor.com/ventes-judiciaires-immobilieres/tgi-fontainebleau/mercredi-15-juin-2016.html

[Screenshot from 2016-07-09 17:30:13]

And here is the result: a whole div has disappeared!

I also get some errors in my terminal:

[Screenshot from 2016-07-09 17:32:52]

For more context: I first tried this in a Scrapy project, where I had to add DOWNLOAD_HANDLERS = {'s3': None} to my settings to get rid of an ERROR message. The crawl still did not return the data, so while debugging I tried the shell and noticed that the response body was missing part of the HTML.

I am running Ubuntu 14 with Anaconda installed, and Scrapy 1.03.

What am I missing here?

Thanks in advance for your help; I have spent the whole afternoon on this!

kmike (Member) commented Jul 10, 2016

Hey @mayouf,

The difference is likely caused either by JavaScript or by different request headers. We use the GitHub issue tracker for Scrapy bug reports; for support questions please use http://stackoverflow.com (with the 'Scrapy' tag).
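
One way to tell the two causes apart is to fetch the raw HTML twice, once with default headers and once with browser-like headers, and check whether a marker string from the missing div shows up in either. The following is only a minimal sketch using the Python standard library rather than Scrapy itself; the helper names (`build_request`, `fetch_html`, `diagnose`) and the header values in `BROWSER_HEADERS` are assumptions made for illustration and do not come from the report above.

```python
import urllib.request

# Hypothetical browser-like headers; the values are illustrative only.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) "
                  "Gecko/20100101 Firefox/47.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "fr,en;q=0.5",
}


def build_request(url, headers=None):
    """Build a GET request, optionally carrying browser-like headers."""
    return urllib.request.Request(url, headers=headers or {})


def fetch_html(request, timeout=10):
    """Download the raw HTML as the server sends it (no JavaScript is executed)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def diagnose(marker, plain_html, browser_like_html):
    """Guess why `marker` is missing, given HTML fetched without and with browser headers."""
    if marker in plain_html:
        return "present without special headers"
    if marker in browser_like_html:
        return "depends on request headers"
    return "probably added by JavaScript (or tied to cookies/session)"
```

To use it, pick a marker such as a class name or a piece of text copied from the target div in the browser's page source, then call `diagnose(marker, fetch_html(build_request(url)), fetch_html(build_request(url, BROWSER_HEADERS)))`. If the headers turn out to matter, the same headers can be passed to Scrapy through the `headers` argument of `scrapy.Request`, or the user agent alone can be overridden in the shell with `scrapy shell -s USER_AGENT="..." <url>`.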

kmike closed this as completed Jul 10, 2016