Immoscout crawler doesn't work #10
Comments
With python2, one should use `from simplejson import JSONDecodeError` instead. Let's focus on python3. Could you paste the content of the file href.json here? |
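For readers hitting the same ImportError, here is a minimal sketch of an import that works on both interpreters and of a loader that tolerates an empty href.json. It is not taken from the repository: the function name `load_hrefs` and the choice to return an empty list are assumptions, and the sketch relies only on the fact that Python 2's `json` module raises a plain `ValueError` on invalid input.

```python
import json

try:
    # Available in the standard library since Python 3.5.
    from json import JSONDecodeError
except ImportError:
    # Python 2: json.load raises a plain ValueError on invalid input.
    JSONDecodeError = ValueError


def load_hrefs(path):
    """Return the parsed content of a JSON file, or [] if it is missing or invalid.

    Hypothetical helper: returning a list assumes href.json holds a list.
    """
    try:
        with open(path) as data_file:
            return json.load(data_file)
    except (IOError, JSONDecodeError):
        # An empty file (the "Expecting value: line 1 column 1" case) ends up here.
        return []
```

With a loader like this, an empty href.json would yield an empty list instead of a traceback, which makes it easier to see that the crawler itself returned nothing.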
Hi @nickirk I got the same error. The content of
I tried checking if the url you provide as an example worked (maybe mine was broken), but it fails the same way. |
Looks like immobilienscout24.de has put a restriction on spiders: when I use scrapy to fetch the content, I get a 405 error (Method Not Allowed). I am looking for a way to evade this using scrapy. If you guys have found a way, please comment here. |
Maybe change the user agent when doing the request?
Rodrigo Oliveira
|
I tried simply replacing the user agent with a value I found online, and it didn't work. |
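For reference, a minimal sketch of what changing the user agent looks like with the standard library. The user agent string and the helper name `build_request` are illustrative assumptions, and as reported above this alone was not enough to get past the site's restrictions. In Scrapy, the equivalent knob is the `USER_AGENT` setting.

```python
import urllib.request

# Illustrative browser-like value; any real browser string could be used here.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build a GET request that carries a custom User-Agent header."""
    headers = {
        "User-Agent": user_agent,
        "Accept-Language": "de-DE,de;q=0.9",
    }
    return urllib.request.Request(url, headers=headers)
```

The request would then be sent with `urllib.request.urlopen(build_request(url), timeout=10)`, which raises `urllib.error.HTTPError` on a 405 response.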
@nickirk Have you found a working solution for this issue yet? Thanks. |
Sorry guys, I have been busy with my thesis and have no time to look into this issue. I encourage you to follow the discussions here and try something by yourselves. I personally recommend using the script on wg-gesucht.de (there is also a minor issue regarding applying the filters on wg-gesucht.de, but other than that, the script should work). |
anyone found a solution for this issue? |
Tried to fix this issue by proxy rotation. |
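The comment above does not say how the rotation was done or whether it helped, so the following is only a sketch of the general idea using the standard library. The proxy addresses are placeholders from a documentation-only IP range, and `proxy_openers` is a hypothetical helper name.

```python
import itertools
import urllib.request

# Placeholder addresses (documentation-only range); replace with real proxies.
PROXIES = ["http://192.0.2.10:8080", "http://192.0.2.11:8080"]


def proxy_openers(proxies):
    """Yield (proxy, opener) pairs forever, cycling through the given proxies."""
    for proxy in itertools.cycle(proxies):
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        yield proxy, urllib.request.build_opener(handler)
```

Each request would take the next pair, for example `proxy, opener = next(rotation)` followed by `opener.open(url, timeout=10)`, so that consecutive requests leave through different proxies.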
Hi, in the case of your scripts you just need to change submit.py to the following, so that instead of submitting the application it sends a Telegram message:

import requests

def submit_app(bot_message):
    # Placeholders: fill in your own bot token and chat ID.
    bot_token = '<bot token>'
    bot_chatID = '<chat ID>'
    # bot_message is the path of the listing; build the link to its contact form.
    # '%23' is the URL-encoded '#', so it survives inside the query string below.
    link = 'https://www.immobilienscout24.de' + \
        bot_message + '%23/basicContact/email'
    send_text = 'https://api.telegram.org/bot' + bot_token + \
        '/sendMessage?chat_id=' + bot_chatID + \
        '&parse_mode=Markdown&text=' + link
    response = requests.get(send_text)
    return response.json()

Unfortunately the crawler doesn't work anymore :/ |
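As a variation on the snippet above, the query string can be built with `urlencode` so that the whole link is escaped rather than only the `#`. This is a sketch under assumptions: the helper names are hypothetical, `/expose/123` in the usage note is a made-up path, and `parse_mode` is left out so the link is sent as plain text.

```python
from urllib.parse import urlencode

TELEGRAM_API = "https://api.telegram.org"
IMMOSCOUT = "https://www.immobilienscout24.de"


def listing_contact_link(listing_path):
    """Build the contact-form link for a listing path (hypothetical helper)."""
    return IMMOSCOUT + listing_path + "#/basicContact/email"


def build_send_message_url(bot_token, chat_id, text):
    """Build a Bot API sendMessage URL with a properly escaped query string."""
    query = urlencode({"chat_id": chat_id, "text": text})
    return "{}/bot{}/sendMessage?{}".format(TELEGRAM_API, bot_token, query)
```

For example, `build_send_message_url('<bot token>', '<chat ID>', listing_contact_link('/expose/123'))` produces a URL that can be passed to `requests.get` exactly as in the comment above.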
Immoscout uses some kind of bot protection and redirects to reCAPTCHA :/ |
Is this still not working? Thought about giving it a try, but reading these comments it doesn't look too promising |
Nope, unfortunately now there is no way to go around it. They heavily protect themselves against webscraping |
Well that's unfortunate, thanks for the quick reply tho |
They are using a certain service for reCAPTCHA against bots. All the puzzles used can be solved programmatically with a certain probability, but only with a lot of effort. The question, imo, is whether the anti-captcha logic can be good enough and whether there is somebody who wants to invest that time. |
Hi there,
I can't get the immoscout bot to work. WG-Gesucht works fine.
Here's the error, which differs depending on whether I use python3 or python2. I assume the crawler is broken, as the href file never gets any data. Any ideas how to fix it?
Janns-MBP:immobot jann$ python immo.py
Traceback (most recent call last):
  File "immo.py", line 8, in <module>
    from json import JSONDecodeError
ImportError: cannot import name JSONDecodeError

Janns-MBP:immobot jann$ python3 immo.py
There was a problem with reading a json formatted object
Traceback (most recent call last):
  File "immo.py", line 17, in <module>
    data = json.load(data_file)
  File "/usr/local/Cellar/python@3.8/3.8.4/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/local/Cellar/python@3.8/3.8.4/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python@3.8/3.8.4/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python@3.8/3.8.4/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Time: 2020-08-17 22:43:03.966169
^CTraceback (most recent call last):
  File "immo.py", line 74, in <module>
    time.sleep(60)