Can WebsiteAgent support anti DDoS website #2658
Comments
A temporary solution is to get the cloudflare cookie from the browser and put it into the WebsiteAgent configuration similar to this:
|
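The same idea can be tried outside Huginn before touching the agent configuration. Below is a minimal Python sketch that only builds the request and does not send it. The URL, the cookie value, and the user agent string are placeholders; `cf_clearance` is assumed to be the relevant Cloudflare cookie, and the User-Agent is included on the assumption that it should match the browser the cookie was copied from.

```python
from urllib.request import Request


def build_request(url, cookie, user_agent):
    """Build (but do not send) a GET request carrying browser-copied headers."""
    return Request(url, headers={"Cookie": cookie, "User-Agent": user_agent})


# All three values are placeholders; copy the real ones from the browser's
# developer tools after it has passed the "Checking your browser" page.
req = build_request(
    "https://example.com/",
    "cf_clearance=PLACEHOLDER",
    "Mozilla/5.0 (placeholder)",
)
print(req.header_items())
```

If a request built this way gets the real page, the same two header values are the ones to carry over into the agent's headers option. The cookie expires, so this stays a temporary workaround.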
@dsander Thanks a lot for your help! I am very new to Huginn and web programming, but I am willing to learn! I registered on PhantomJSCloud and am still trying to figure out how to use it. The headers/cookie option seems interesting, but I find very little help in the WebsiteAgent documentation. I found |
I installed "phantomjs-2.1.1-windows". Now this is interesting. I created a simple test.js file: var page = require('webpage').create(); It works and the output test.png shows the screenshot of the webpage! I don't even need to change the default user_agent. However, if I change the timeout from 10 seconds to 1 second, it shows the "Checking your browser" message. This makes me believe the browser just has to wait for 5 seconds to get past the DDoS protection page and be redirected to the real page. Huginn's Phantom Js Cloud Agent does give me the option to set the timeout, but no matter what I change it to, it always gives me the 404 error. So is it a bug in the Agent or in PhantomJSCloud? |
I played with the PhantomJSCloudAgent a little bit and it seems this is a bug in the Agent. It gives me this URL:
The
|
Thank you for the troubleshooting! The 404 error is gone now. More questions:
PhantomJS was discontinued in 2018. I don't know how long PhantomJSCloud will be alive. However, PhantomJS 2.1.1 can be downloaded and installed locally on any Linux or Windows box. So,
|
Yeah we can do that.
From their docs it kind of reads like they have already moved away from PhantomJS, but they could also be maintaining a fork of it.
You can if you expose it via an HTTP interface so that Huginn can call it. There is also fulldom-server. |
fulldom-server runs PhantomJS, and it also seems to be discontinued. Then I tried to run fulldom in a Docker container and I found this:
After the container was running on port 3600, I tried to build a URL that the fulldom container can process, as per the wiki here:
2nd try: running PhantomJS in a Docker container and exposing the port.
So it looks like PhantomJS is working on port 8910. Since I am very new to PhantomJS and fulldom, could you please shed some light on how to make use of them in Huginn? |
I have no problems installing it in an Ubuntu 18.04 docker image. Your error shows a permission error, which is very odd when running the command as root.
fulldom-server works similarly to PhantomJSCloud: you send it an HTTP request as documented on the website, it waits until the given selector is found, and then it returns the HTML. The docker image you used isn't running fulldom-server, it is "just" phantomjs, which needs to be called using the WebDriver protocol https://w3c.github.io/webdriver/. See the Docker Hub documentation https://hub.docker.com/r/wernight/phantomjs/. There seem to be two fulldom docker images, but I have never used them: |
About Fulldom docker:
About phantomjs docker: |
Sorry I missed that part, this seems to work for me:
I don't think you can at the moment. It would require an Agent that can use the protocol. |
Looks like fulldom is my only hope. If I put this URL, (selector=body) If I put the URL like yours (selector=%23jsddm) |
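In the second URL the selector is percent-encoded: %23 is the encoded form of #, so selector=%23jsddm asks for the element with id jsddm. A minimal Python sketch of building such a URL follows. The base address (localhost on port 3600) and the query parameter name `url` are assumptions for illustration, and `fulldom_url` is a hypothetical helper; check the fulldom-server wiki for the actual request format.

```python
from urllib.parse import urlencode


def fulldom_url(base, target, selector):
    """Return a request URL with the target page and CSS selector percent-encoded.

    The parameter names "url" and "selector" are assumed, not taken from
    the fulldom-server documentation.
    """
    query = urlencode({"url": target, "selector": selector})
    return f"{base}/?{query}"


print(fulldom_url("http://localhost:3600", "https://example.com/", "#jsddm"))
# prints http://localhost:3600/?url=https%3A%2F%2Fexample.com%2F&selector=%23jsddm
```

Encoding matters here because an unencoded # starts the URL fragment, which is never sent to the server, so the selector would be silently dropped.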
That makes sense,
You probably can filter out the error Huginn a Liquid I think your best bet is still
WebsiteAgent has trouble accessing websites that use Cloudflare's DDoS protection
Example website (adult): https://www.inthecrack.com/
The curl command gives me the "Checking your browser" message with a 503 error.
curl -Lv https://www.inthecrack.com
HTTP/1.1 503 Service Temporarily Unavailable
The error log of a dry run is also attached.
itc.error.log
Any browser such as Firefox, IE, or Chrome can open the website with no problem, but has to wait a few seconds to get past the “Checking your browser” page.
So, is there a way for WebsiteAgent to act like a browser?
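The symptom described above (a 503 status together with the "Checking your browser" text) can be checked for programmatically, which helps when testing whether a workaround actually gets past the protection page. This is a minimal Python sketch; `looks_like_challenge` is a hypothetical helper, and matching on that exact phrase is an assumption based only on the response reported in this issue.

```python
def looks_like_challenge(status: int, body: str) -> bool:
    """Heuristic: True when a response resembles the interstitial described above."""
    return status == 503 and "checking your browser" in body.lower()


# The first call mirrors the curl result reported in this issue.
print(looks_like_challenge(503, "<title>Checking your browser</title>"))
print(looks_like_challenge(200, "<html>real page</html>"))
```

The check is case-insensitive and requires both conditions, so an ordinary 503 from an overloaded server is not mistaken for the protection page.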