Mongo, improved proxies, and updated scraping logic #49
SW stopped flying to MEX. This was causing false negatives.
Run with `node console.js`
I'm not sure what a 1 second cooldown was supposed to prevent. Duplicate messages from the same flight?
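If the cooldown was meant to stop duplicate messages for the same flight, that intent reads more clearly as an explicit per-flight window. A minimal sketch, assuming alerts can be keyed by a flight identifier plus a price. `AlertDeduper`, `shouldSend`, and the `WN1234` flight id are hypothetical, not names from this repo.

```javascript
// Hypothetical helper (not from the bot's codebase): suppress repeat alerts
// for the same flight and price inside a cooldown window.
class AlertDeduper {
  constructor(cooldownMs) {
    this.cooldownMs = cooldownMs;
    this.lastSent = new Map(); // key -> timestamp (ms) of the last alert sent
  }

  // Returns true (and records the send) only if no alert with this key
  // went out within the last `cooldownMs` milliseconds.
  shouldSend(flightId, price, now = Date.now()) {
    const key = `${flightId}:${price}`;
    const last = this.lastSent.get(key);
    if (last !== undefined && now - last < this.cooldownMs) {
      return false;
    }
    this.lastSent.set(key, now);
    return true;
  }
}

// Usage with a 1 second window, matching the cooldown discussed above.
// 'WN1234' is a made-up flight id.
const deduper = new AlertDeduper(1000);
console.log(deduper.shouldSend('WN1234', 89, 0));    // true
console.log(deduper.shouldSend('WN1234', 89, 500));  // false (inside the window)
console.log(deduper.shouldSend('WN1234', 89, 1500)); // true
```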
This isn't Ruby
Awesome! I’ll look through this and merge it in by tonight.
puppeteer-extra has the same workarounds wrapped in a package
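For reference, wiring up puppeteer-extra with its stealth plugin looks roughly like the sketch below. It assumes the `puppeteer-extra` and `puppeteer-extra-plugin-stealth` packages are installed (puppeteer-extra also needs `puppeteer` itself). `launchStealthBrowser` is a hypothetical name, and the `deps` parameter exists only so the function can be exercised without launching Chromium.

```javascript
// Sketch: launch Chromium through puppeteer-extra with the stealth plugin,
// which bundles common headless-detection evasions.
// Intended to be called once per process; `deps` lets tests inject stand-ins.
function launchStealthBrowser(launchOptions = {}, deps = {}) {
  const puppeteer = deps.puppeteer || require('puppeteer-extra');
  const StealthPlugin =
    deps.StealthPlugin || require('puppeteer-extra-plugin-stealth');

  puppeteer.use(StealthPlugin()); // register the evasions before launching
  return puppeteer.launch(launchOptions); // resolves to a Browser
}

// Usage (requires the packages and a Chromium download):
// const browser = await launchStealthBrowser({ headless: true });
```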
This looks great. I have tried a previously working proxy setup (with both hostname and port) and one through illuminati.io, and I'm getting the following errors, along with the price not updating:
Jun 11 13:41:15 swacheck2 app/scheduler.9385: > southwest-price-drop-bot@3.1.4 task:check /app
I was able to get the proxy working by including http:// in front of the URL. However, now it's having issues scraping. See logs:
Jun 11 16:12:28 swacheck2 app/scheduler.5302: mongo successfully connected!
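Since adding `http://` is what made the proxy work, it may be worth normalizing the proxy setting before it reaches Chromium. A minimal sketch, assuming the proxy arrives as either a `host:port` string or a full URL (for example from an environment variable). `normalizeProxyUrl` is a hypothetical helper, and defaulting to `http://` is an assumption that suits HTTP proxies, not SOCKS ones.

```javascript
// Hypothetical helper: make sure a proxy setting carries a scheme.
// "proxy.example.com:8080"        -> "http://proxy.example.com:8080"
// "http://proxy.example.com:8080" -> unchanged
// "socks5://127.0.0.1:1080"       -> unchanged
function normalizeProxyUrl(raw) {
  const trimmed = (raw || '').trim();
  if (!trimmed) return null; // no proxy configured
  // Already has a scheme such as http://, https:// or socks5://
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed)) return trimmed;
  return `http://${trimmed}`;
}

// The normalized value can then be passed to Chromium's --proxy-server flag.
const proxy = normalizeProxyUrl('proxy.example.com:8080');
const args = proxy ? [`--proxy-server=${proxy}`] : [];
console.log(args); // [ '--proxy-server=http://proxy.example.com:8080' ]
```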
How recently was this working? If you revert to your previous setup, are you able to scrape successfully? Since I posted this PR, it looks like SW is blocking requests (whether from a proxy or my local connection). It looks like they've updated their bot detection system, and it's gotten much better.
@iloveitaly It had been working until their bot detection was first implemented. I was able to get past the error from my first post by including "http://" in the proxy var. However, the app is now having trouble scraping the price. I was initially searching an international flight booked with points, so to test I tried a US flight booked with cash, and it's still having issues.
I'm seeing the same thing - looks like an Akamai block.
@samyun and @iloveitaly - I set up a proxy server in my homelab and still run into the issues, even though I have no problems accessing the Southwest site through a browser. Not sure if it's Akamai in this case.
https://github.com/pyro2927/SouthwestCheckin/ <-- This is working as of now. I wonder if we can borrow some of the techniques it uses. It uses the mobile API.
We don't want to load all of the ad stuff
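One way to skip the ad and tracking requests is Puppeteer's request interception, where each intercepted request must be either continued or aborted. A minimal sketch: the host list and blocked resource types are illustrative assumptions rather than what the bot actually filters, and `shouldBlock` / `attachRequestFilter` are hypothetical names.

```javascript
// Illustrative block list: these hosts and resource types are assumptions.
const BLOCKED_HOSTS = ['doubleclick.net', 'googlesyndication.com', 'google-analytics.com'];
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);

// Decide whether a request should be dropped.
function shouldBlock(url, resourceType) {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch (err) {
    return false; // leave unparseable URLs alone
  }
  // Match the host itself or any subdomain of it.
  return BLOCKED_HOSTS.some(
    (host) => hostname === host || hostname.endsWith(`.${host}`)
  );
}

// Attach the filter to a Puppeteer page; call this before page.goto().
async function attachRequestFilter(page) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (shouldBlock(request.url(), request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```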
@razzamatazm ah, interesting! I didn't realize there was a mobile API. Looks like the flight cost endpoint hasn't been figured out yet. Any ideas on how to hit it? @samyun I'm pretty sure it's not an Akamai block. Here's why:
I went ahead and tried this one last time and realized the flags I had set to disable the WebGL/GPU stuff were causing the issue. This is now working again!
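Since the WebGL/GPU-disabling flags turned out to be the problem, one option is to build the launch arguments in one place and filter those flags out. A minimal sketch: the exact flags the bot passed aren't shown in this thread, so `--disable-gpu` and `--disable-webgl` are assumed examples of the kind of flag involved, and `buildLaunchArgs` is a hypothetical helper.

```javascript
// Flags that switch off GPU/WebGL rendering. Flags like these are what broke
// scraping in this thread; the exact list is an assumption.
const GPU_DISABLING_FLAGS = new Set(['--disable-gpu', '--disable-webgl']);

// Hypothetical helper: combine extra args and an optional proxy, dropping
// any GPU/WebGL-disabling flag that slipped in.
function buildLaunchArgs({ extraArgs = [], proxyServer } = {}) {
  const args = [...extraArgs];
  if (proxyServer) args.push(`--proxy-server=${proxyServer}`);
  return args.filter((arg) => !GPU_DISABLING_FLAGS.has(arg));
}

// Usage with Puppeteer:
// const browser = await puppeteer.launch({ args: buildLaunchArgs({ proxyServer }) });
console.log(buildLaunchArgs({ extraArgs: ['--disable-gpu'], proxyServer: 'http://1.2.3.4:8080' }));
// [ '--proxy-server=http://1.2.3.4:8080' ]
```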
Hmm, now it's not working for me. No idea why. Can you guys try HEAD and see if it works for you?
I'm getting build errors on Heroku:
info fsevents@1.2.9: The platform "linux" is incompatible with this module.
info "fsevents@1.2.9" is an optional dependency and failed compatibility check. Excluding it from installation.
error fsevents@2.0.7: The platform "linux" is incompatible with this module.
error Found incompatible module.
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
-----> Build failed
We're sorry this build is failing! You can troubleshoot common issues here:
https://devcenter.heroku.com/articles/troubleshooting-node-deploys
Some possible problems:
- Dangerous semver range (>) in engines.node
https://devcenter.heroku.com/articles/nodejs-support#specifying-a-node-js-version
Love,
Heroku
! Push rejected, failed to compile Node.js app.
! Push failed
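The yarn output suggests `fsevents@2.0.7` (a macOS-only package) is being installed as a required dependency, while `fsevents@1.2.9` is skipped because it is optional; Heroku also flags the `>` range in `engines.node`. A minimal sketch of a pre-deploy check for both, assuming `fsevents` was added directly to `package.json` (if it comes in transitively, the fix belongs with the parent package or the lockfile instead). `checkPackageJson` is a hypothetical name, and `"10.x"` is only an example version.

```javascript
// Hypothetical pre-deploy check for the two problems in the Heroku log.
function checkPackageJson(pkg) {
  const problems = [];

  // fsevents only installs on macOS, so on Linux it must be optional.
  for (const field of ['dependencies', 'devDependencies']) {
    if (pkg[field] && pkg[field].fsevents) {
      problems.push(`fsevents is listed in ${field}; move it to optionalDependencies or remove it`);
    }
  }

  // Heroku warns about open-ended ranges (anything containing ">") in engines.node.
  const nodeRange = pkg.engines && pkg.engines.node;
  if (typeof nodeRange === 'string' && nodeRange.includes('>')) {
    problems.push(`engines.node is "${nodeRange}"; pin a major version such as "10.x"`);
  }

  return problems;
}

// Usage (e.g. in a script run before `git push heroku`):
// const problems = checkPackageJson(require('./package.json'));
// problems.forEach((p) => console.error(p));
```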
New, different, exciting errors :)
Jun 13 11:52:55 swacheck3 heroku/router: at=info method=GET path="/style.css" host=swacheck3.herokuapp.com request_id=1fad41d7-311b-41c0-9876-8bd743dc5526 fwd="67.53.122.46" dyno=web.1 connect=0ms service=6ms status=304 bytes=269 protocol=https
@razzamatazm yup, the 403 is SW blocking us. No idea how to get around this. I think it has something to do with the IP used, but I can't be sure.
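Whatever the cause turns out to be, it helps to surface a 403 as a distinct "blocked" outcome rather than a generic scraping failure, so the logs make it obvious. In Puppeteer, `page.goto()` resolves to the main response (or `null`), and the response's `status()` method returns the HTTP status code. A minimal sketch; `BlockedError` and `assertNotBlocked` are hypothetical names.

```javascript
// Hypothetical error type for "Southwest refused the request".
class BlockedError extends Error {
  constructor(url, status) {
    super(`Blocked (HTTP ${status}) while loading ${url}`);
    this.name = 'BlockedError';
    this.status = status;
  }
}

// Check the response returned by page.goto(); throw if it looks like a block.
function assertNotBlocked(response, url) {
  if (!response) return; // page.goto() can resolve to null
  const status = response.status();
  if (status === 403) throw new BlockedError(url, status);
}

// Usage inside the scraper:
// const response = await page.goto(url);
// assertNotBlocked(response, url);
```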
I can reach the site using Chrome at my home, via the same proxy. So strange.
@razzamatazm is that using this repo, or manually accessing it via standard Chrome? I think what's going on is that SW is associating a browser fingerprint with an IP and then blocking that IP. I know somewhere in the SW code they are checking the… I think the best option is to use the mobile API, but it doesn't look like the price check endpoint has been figured out yet (and I don't have the time to tinker with it). In any case, this is a huge improvement over what was there, although it doesn't actually work :(
I went ahead and merged this in - I found some other evasion repos I'm going to try to work in. Thanks for your help!
@samyun awesome! It's worth noting that this is now working locally again. I think there is some sort of IP block triggered by repeated requests for the same flight (or something along those lines... just guessing really). Keep us posted on what you find!
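If repeated requests for the same flight are what trips the block (still a guess, as the comment says), spacing out checks per flight with some random jitter is a cheap mitigation to try. A minimal sketch; `FlightThrottle`, the interval and jitter values, and the `flight.id` / `checkPrice` names in the usage comment are all hypothetical, not values or functions from the bot.

```javascript
// Hypothetical per-flight throttle: a flight may be re-checked only after
// `minIntervalMs` plus a random jitter of up to `jitterMs` has passed.
class FlightThrottle {
  constructor(minIntervalMs, jitterMs, random = Math.random) {
    this.minIntervalMs = minIntervalMs;
    this.jitterMs = jitterMs;
    this.random = random;         // injectable for deterministic tests
    this.nextAllowed = new Map(); // flightKey -> earliest next check (ms)
  }

  // Returns true and schedules the next window if the flight may be checked now.
  tryAcquire(flightKey, now = Date.now()) {
    const next = this.nextAllowed.get(flightKey);
    if (next !== undefined && now < next) return false;
    const delay = this.minIntervalMs + Math.floor(this.random() * this.jitterMs);
    this.nextAllowed.set(flightKey, now + delay);
    return true;
  }
}

// Usage: roughly one check per flight every 30 to 40 minutes.
// const throttle = new FlightThrottle(30 * 60 * 1000, 10 * 60 * 1000);
// if (throttle.tryAcquire(flight.id)) await checkPrice(flight); // hypothetical names
```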
Lots of improvements!