immoscout24 broken somehow #45
Comments
Hey there, from your logs it looks like Immoscout has detected that your flathunter is a bot and has blocked it. Can you browse the site normally in a web browser? I've not encountered bot detection on Immoscout before. It would be interesting to know if other users report the same thing. |
Hey, thanks for your fast reply. I can browse the website normally via Firefox. And I had the same problem at work as at home. So I assume it is not IP related. Does flathunter use some cookies or so? |
It seems like Immoscout is complaining that cookies are not used and JavaScript is disabled. |
Okay. I see the same thing on my machine. So it looks like they've upgraded their bot detection. I just tried here with a fake user agent (so it looks like Firefox instead of a Python script), but that doesn't help. I also tried here adding cookies support, and that doesn't fix it. I'll need to take a deeper look at what they're detecting, and I don't have any time to do that in the coming weeks I'm afraid. But thanks for the report - this is something that will be affecting all users. |
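For reference, the two things tried above (a browser-like User-Agent plus cookie support) can be sketched with only the Python standard library. This is a minimal illustration, not flathunter's actual code: the function name `build_browser_like_opener` and the User-Agent string are assumptions, and, as the comment above reports, this combination was not enough to get past Immoscout's detection.

```python
import urllib.request
from http.cookiejar import CookieJar

# Assumed example value; any current browser User-Agent string could be used.
FIREFOX_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0"
)


def build_browser_like_opener(user_agent=FIREFOX_USER_AGENT):
    """Return (opener, cookie_jar): an opener that keeps cookies between
    requests and sends a browser-like User-Agent header."""
    cookie_jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    # Replace the default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, cookie_jar
```

A page would then be fetched with `opener.open(url).read()`; cookies set by the server are kept in `cookie_jar` and sent again on later requests made through the same opener.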
I also got the same error page as the crawler in Firefox after doing some manual refreshes this morning. So it is not purely flathunter related. After a Google captcha I was able to continue using Immoscout with Firefox. The Firefox I used has no plugins or ad blockers. |
Sounds like they just have some new aggressive filtering in place then. There are a couple of Python projects that offer to solve captchas. That would be an option, but it's also not free (though it is very cheap: 1 EUR per 1000 fetches).
Another option would be to implement the Immoscout API. That would mean every flathunter user has to register with them.
(Email reply to choeffer's comment of Wed, 5 Aug 2020, 09:50, shown above.)
|
The same happens to my setup running on a Raspberry Pi. It also crashes my script every time 🤔
I can still access Immoscout without problems via Chromium on the Pi though... 🤔 |
I haven't had a deeper look into the implementation of the crawler, but maybe Selenium would help with the bot detection. This is really sad; I was so excited when I came across this tool and wanted to use it for my personal flat hunt ;) |
Same experience here. My first thought was that they block an IP that has made too many requests, but I can access ImmoScout as usual with a browser. I suppose it has to do with the request headers or the lack of cookie and JavaScript support, as was mentioned above. |
Hmm... so sad :/ same problem here... Cheers |
I have tried http://html.python-requests.org/ and https://selenium-python.readthedocs.io/, but I am still getting the Google captcha on immoscout24. At least it is fairly easy to replace how the HTML content is retrieved. After digging through the code, I was able to replace the Python requests package with the ones mentioned above just by applying changes in
from https://github.com/flathunters/flathunter/blob/main/flathunter/abstract_crawler.py for selenium with Chrome
for requests_html
With both changes, at least Ebay Kleinanzeigen is still working fine. |
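The snippets referred to in the comment above are not reproduced here. As a rough sketch of the general idea (swapping how the HTML is fetched while the rest of the crawler stays unchanged), something like the following could work. All names (`fetch_with_selenium`, `fetch_with_requests_html`, `SketchCrawler`) are hypothetical and are not taken from flathunter's `abstract_crawler.py`; it assumes the `selenium` and `requests_html` packages plus a Chrome driver are installed. As reported above, neither backend avoided the captcha on immoscout24.

```python
from typing import Callable


def fetch_with_selenium(url: str) -> str:
    """Fetch a page through headless Chrome driven by Selenium."""
    # Imported lazily so this module still loads without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def fetch_with_requests_html(url: str) -> str:
    """Fetch a page with requests_html and let it execute JavaScript."""
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    response.html.render()  # runs the page's JavaScript in headless Chromium
    return response.html.html


class SketchCrawler:
    """Hypothetical crawler whose HTML source is a swappable function."""

    def __init__(self, fetch: Callable[[str], str]):
        self.fetch = fetch

    def get_page(self, url: str) -> str:
        return self.fetch(url)
```

Passing the fetch function in makes it easy to try one backend after the other, for example `SketchCrawler(fetch_with_selenium)` versus `SketchCrawler(fetch_with_requests_html)`, without touching the parsing code.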
With the help of https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth and the code

    // puppeteer-extra is a drop-in replacement for puppeteer;
    // it augments the installed puppeteer with plugin functionality
    const puppeteer = require('puppeteer-extra')
    const fs = require('fs')
    // add stealth plugin and use defaults (all evasion techniques)
    const StealthPlugin = require('puppeteer-extra-plugin-stealth')
    puppeteer.use(StealthPlugin())
    // puppeteer usage as normal
    puppeteer.launch({ headless: false }).then(async browser => {
      console.log('Running tests..')
      const page = await browser.newPage()
      await page.goto('https://www.immobilienscout24.de/Suche/de/nordrhein-westfalen/koeln/wohnung-mieten?sorting=2')
      // wait 5 s for the page to settle (page.waitFor is deprecated in newer Puppeteer releases)
      await new Promise(resolve => setTimeout(resolve, 5000))
      // save the rendered HTML for inspection
      const html = await page.content()
      fs.writeFileSync('index.html', html)
      // await page.screenshot({ path: 'testresult.png', fullPage: true })
      await browser.close()
      console.log('All done, check index.html. ✨')
    })

I was able to bypass the bot protection. Right now this is more of a proof of concept: the website loads fine at first but keeps loading until only an ad is shown as the final content. Still, this could be a starting point for bypassing the immoscout24 bot protection. |
Too bad, I'm having the same issue and can't run flathunter on ImmoScout. |
Just merged a fix; use the latest code from the … Please let me know if it works now. |
Thank you! Seems to run fine now. Do you know how I can check whether the program loops after 5 minutes? For me, nothing happens at the moment after waiting for the loop time configured in the config file. |
It works now for me. Thanks for the patch. |
Set the logs to verbose and check the output. I guess this is related to issue #50? Let's move the discussion there.
Ok, closing the ticket. |
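To illustrate the advice about verbose logs: a polling loop that logs every iteration with a timestamp makes it easy to see whether it really wakes up after the configured period. This is a generic sketch, not flathunter's implementation; `run_loop`, `crawl_once`, and `loop_period_seconds` are hypothetical names.

```python
import logging
import time

logger = logging.getLogger("loop_sketch")


def run_loop(crawl_once, loop_period_seconds, max_iterations, sleep=time.sleep):
    """Call crawl_once() max_iterations times, sleeping between runs.

    Each iteration is logged at DEBUG level, so verbose logs show whether
    the loop keeps running. Returns the number of completed iterations.
    """
    for iteration in range(1, max_iterations + 1):
        logger.debug("Starting iteration %d", iteration)
        crawl_once()
        logger.debug(
            "Iteration %d done; sleeping for %s seconds",
            iteration,
            loop_period_seconds,
        )
        sleep(loop_period_seconds)
    return max_iterations


if __name__ == "__main__":
    # Timestamps in the log format show the gap between iterations.
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    run_loop(lambda: None, loop_period_seconds=1, max_iterations=2)
```

The `sleep` parameter is injectable only so the loop can be exercised without actually waiting.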
It does not seem to be solved. It worked properly a few times, but now I can see new offers on immoscout24 via Firefox which are not listed by flathunter. Maybe the response status is still 200 and it seems to work fine, but I do not think the requested content is delivered. |
A print() of the HTML content reveals that the output is the same as the |
Yes, they just rolled out a new version. It seems like they added cookies to their headers. It's another issue; I created a follow-up: #51 |
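The concern above (a 200 response that does not actually contain listings) could be checked with a sketch like the following. The marker strings and the function name `response_is_suspicious` are assumptions chosen for illustration; the actual wording of Immoscout's block page is not confirmed here and would need to be taken from a real response.

```python
# Hypothetical marker strings; adjust after inspecting a real block page.
DEFAULT_BLOCK_MARKERS = ("captcha", "robot")


def response_is_suspicious(status_code, html, markers=DEFAULT_BLOCK_MARKERS):
    """Return True if the response is probably not a real results page.

    A non-200 status is always treated as suspicious. For a 200 status,
    the HTML is searched case-insensitively for any of the marker strings.
    """
    if status_code != 200:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in markers)
```

A crawler could log a warning whenever this returns True instead of silently reporting zero new listings.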
I am using the URL
https://www.immobilienscout24.de/Suche/de/nordrhein-westfalen/koeln/wohnung-mieten?sorting=2
and for a few hours now I have been getting just a long printout but no results sent via the Telegram bot anymore. Ebay Kleinanzeigen is still working fine. Log: file.log
If I can provide more info, please let me know.