
Immoscout crawler doesn't work #10

Closed
jann-klaas opened this issue Aug 17, 2020 · 16 comments

Comments

@jann-klaas

jann-klaas commented Aug 17, 2020

Hi there,

I can't get the Immoscout bot to work. WG-Gesucht works fine.

Here are the errors I get with Python 2 and with Python 3. I assume the crawler is broken, since the href file never gets any data. Any ideas how to fix it?

Janns-MBP:immobot jann$ python immo.py
Traceback (most recent call last):
  File "immo.py", line 8, in <module>
    from json import JSONDecodeError
ImportError: cannot import name JSONDecodeError

Janns-MBP:immobot jann$ python3 immo.py
There was a problem with reading a json formatted object
Traceback (most recent call last):
  File "immo.py", line 17, in <module>
    data = json.load(data_file)
  File "/usr/local/Cellar/python@3.8/3.8.4/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/local/Cellar/python@3.8/3.8.4/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python@3.8/3.8.4/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python@3.8/3.8.4/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Time: 2020-08-17 22:43:03.966169
^CTraceback (most recent call last):
  File "immo.py", line 74, in <module>
    time.sleep(60)

@nickirk
Owner

nickirk commented Aug 20, 2020

With Python 2, you should use

from simplejson import JSONDecodeError
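If you would rather not install simplejson, a fallback import is another option. This is a minimal sketch, not the project's code: it relies on the fact that Python 2's stdlib json raises plain ValueError, and that Python 3's JSONDecodeError subclasses ValueError. The helper parse_or_none is hypothetical and only there to show the except clause working on both versions.

```python
import json

# Python 3.5+ has json.JSONDecodeError. Python 2's json module lacks that name
# and raises plain ValueError instead; since JSONDecodeError subclasses
# ValueError, falling back to ValueError catches the same errors on both.
try:
    from json import JSONDecodeError
except ImportError:
    JSONDecodeError = ValueError


def parse_or_none(text):
    """Hypothetical helper: return the decoded JSON value, or None if invalid."""
    try:
        return json.loads(text)
    except JSONDecodeError:
        return None
```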

Let's focus on python3.

Could you paste the content of the file href.json here?

@rodrigodealer

rodrigodealer commented Aug 20, 2020

Hi @nickirk, I got the same error.

The content of href.json is empty:

cat href.json | wc -l
0

I also tried the example URL you provide (in case mine was broken), but it fails the same way.
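An empty href.json is consistent with the Python 3 traceback in the first comment: json.load raises JSONDecodeError ("Expecting value: line 1 column 1 (char 0)") when given empty input. Below is a minimal sketch of a guard that reports this instead of crashing. The name load_hrefs and its default parameter are hypothetical; immo.py's actual loading code may look different.

```python
import json
import os


def load_hrefs(path="href.json", default=None):
    """Hypothetical helper: return the parsed contents of path, or default
    if the file is missing or empty (i.e. the crawler wrote nothing)."""
    # json.load on an empty file fails with
    # "Expecting value: line 1 column 1 (char 0)", so check the size first.
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        print(f"{path} is missing or empty; the crawler probably fetched nothing.")
        return default
    with open(path) as data_file:
        return json.load(data_file)
```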

@nickirk
Owner

nickirk commented Aug 20, 2020

It looks like immobilienscout24.de has put a restriction on spiders: when I use Scrapy to fetch the content, I get a 405 error (Method Not Allowed). I am looking for a way around this with Scrapy. If you have found a way, please comment here.

@rodrigodealer

rodrigodealer commented Aug 20, 2020 via email

@nickirk
Owner

nickirk commented Aug 28, 2020

I tried simply replacing the user agent with a value I found online, and it didn't work.
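To compare responses for different User-Agent headers outside Scrapy, here is a minimal standard-library sketch. The function name fetch_status and the placeholder BROWSER_UA string are assumptions for illustration and are not part of immobot. The function reports the HTTP status instead of raising, because urlopen turns a 405 into an HTTPError.

```python
import urllib.error
import urllib.request

# Placeholder browser-like User-Agent (an assumption, not a value from this thread).
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def fetch_status(url, user_agent=BROWSER_UA, timeout=10):
    """GET url with the given User-Agent and return (status_code, reason).

    urlopen raises HTTPError for 4xx/5xx responses such as 405, so that
    case is caught and returned rather than crashing the crawler.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.reason
    except urllib.error.HTTPError as err:
        return err.code, err.reason
```

Calling it once with the default user_agent and once with another value shows whether the header changes the response at all; a 405 in both cases would match what is reported above.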

@krassle

krassle commented Mar 29, 2021

@nickirk Have you found a working solution for this issue yet? Thanks.

@francisjo

@krassle @nickirk, did you find a solution?
Thanks.

@nickirk
Owner

nickirk commented Apr 5, 2021

Sorry guys, I have been busy with my thesis and haven't had time to look into this issue. I encourage you to follow the discussion here and try things yourselves. I personally recommend using the script for wg-gesucht.de (there is a minor issue with applying the filters on wg-gesucht.de, but otherwise the script should work).

@Alnik89

Alnik89 commented Jun 9, 2021

Has anyone found a solution for this issue?
I guess if not, this entire bot is useless and a waste of time for someone who is not a programmer.

@Alnik89

Alnik89 commented Jun 12, 2021

I tried to fix this issue with proxy rotation.
Still not working.
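For readers unfamiliar with the term, proxy rotation means sending successive requests through different proxies so the site sees them coming from different addresses. This is a minimal standard-library sketch, assuming you already have a list of working proxy URLs (the ones in the usage line are placeholders); rotating_openers is a hypothetical name. As reported just above, rotating proxies alone did not get past Immoscout's protection.

```python
import itertools
import urllib.request


def rotating_openers(proxy_urls):
    """Yield (proxy_url, opener) pairs, cycling through proxy_urls forever.

    Each opener routes both http and https traffic through one proxy, so
    each call to next() moves on to the next proxy in the list.
    """
    for proxy_url in itertools.cycle(proxy_urls):
        handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        yield proxy_url, urllib.request.build_opener(handler)
```

Usage: `openers = rotating_openers(["http://proxy-a:8080", "http://proxy-b:8080"])`, then `proxy, opener = next(openers)` and `opener.open(url, timeout=10)` before each request.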

@jjanczur

jjanczur commented Jul 29, 2021

Hi,
I have the same issue.
Instead of submitting an application automatically, I extended your scripts to send myself a Telegram message, so I can check manually whether the apartment is OK.

With your scripts, you just need to change submit.py to the following, which sends a Telegram message instead of submitting:

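import requests

# Fill in your own Telegram bot token and chat ID.
BOT_TOKEN = '<bot token>'
CHAT_ID = '<chat ID>'


def submit_app(bot_message):
    """Send the listing link to yourself on Telegram instead of applying.

    bot_message is the listing path that gets appended to the Immoscout domain.
    """
    link = 'https://www.immobilienscout24.de' + bot_message + '#/basicContact/email'
    # Passing the text via params lets requests URL-encode it ('#' becomes %23),
    # so the fragment is not cut off from the message text.
    response = requests.get(
        'https://api.telegram.org/bot' + BOT_TOKEN + '/sendMessage',
        params={'chat_id': CHAT_ID, 'parse_mode': 'Markdown', 'text': link},
        timeout=10,
    )
    return response.json()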

Unfortunately, the crawler doesn't work anymore :/

@jjanczur

jjanczur commented Jul 29, 2021

Immoscout uses some kind of bot protection and redirects to reCAPTCHA :/
I guess that's the end of automatic apartment hunting :p
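Because the failure was silent here (href.json simply stayed empty), a crawler could at least check whether a listing request was redirected before parsing it. This is a minimal standard-library sketch; redirected_elsewhere is a hypothetical helper, and it only detects that a redirect to a different host or path happened, not that the target is a reCAPTCHA page.

```python
import urllib.parse
import urllib.request


def redirected_elsewhere(url, timeout=10):
    """Return the final URL if fetching url ended on a different host or path,
    otherwise None.

    urlopen follows redirects automatically; geturl() reports where it ended.
    Note that urlopen still raises HTTPError if the final response is 4xx/5xx.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        final_url = response.geturl()
    requested = urllib.parse.urlsplit(url)
    final = urllib.parse.urlsplit(final_url)
    if (final.netloc, final.path) != (requested.netloc, requested.path):
        return final_url
    return None
```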

@xabirizar9

Is this still not working? I thought about giving it a try, but reading these comments, it doesn't look too promising.

@jjanczur

Nope, unfortunately there is now no way around it. They heavily protect themselves against web scraping.

@xabirizar9

Well, that's unfortunate. Thanks for the quick reply, though.

@enthusiasmus

They use a dedicated CAPTCHA service against bots. All of its puzzles can be solved programmatically with some probability, but only with a lot of effort. The question, in my opinion, is whether the anti-CAPTCHA logic can be good enough and whether anybody wants to invest that time.

nickirk closed this as completed Oct 11, 2023
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants