This repository has been archived by the owner on Jul 5, 2022. It is now read-only.

CVS thinks we're a bot #1

Open
estiens opened this issue Feb 9, 2021 · 7 comments

Comments

@estiens

estiens commented Feb 9, 2021

Thanks for this! I was well on my way to building a Ruby scraper that could start hitting places like CVS, Costco, Walmart, etc.

Playing with this a bit, CVS always sends me back a Service Denied error and says I seem like a bot. Have you seen this? I'm wondering whether we just need to rotate through proxies, or whether the problem is hitting that internal API directly instead of going through the screening process.

@jwoglom
Owner

jwoglom commented Feb 10, 2021

Hi Eric,

I also noticed this problem, and I haven't found a solution yet. I tried sending some additional requests beyond just the API endpoints to see if that helped, but it didn't seem to make a difference. For now I've just disabled the CVS backend in my own instance.

My guess is that some combination of spacing the requests further apart, using Selenium or similar to complete a full user session rather than hitting API endpoints directly, and switching between proxies might help, but I haven't actively worked on any of those approaches yet.

@estiens
Author

estiens commented Feb 10, 2021

OK, as long as it's not just me missing an easy workaround :D I'm going the Selenium route now to see if it can make headway. If anyone knows of a centralized place where folks are working on scraping the big chains (Walmart, CVS, Kroger, Publix, Costco, etc.), please let me know. I'm also tracking this in the issue linked here; you might want to open a PR adding this repo to their README if you feel like sharing at some point:

usdigitalresponse/vaccine-finder-tools#1

@zzany

zzany commented Feb 11, 2021

Let me know if you figure this out - I was using Selenium and tried the likely suspects (changing headers, etc.) and still had difficulty. Possible that my browser was somehow fingerprinted before I tried every trick in the book, but switching networks hasn't helped me either.

@estiens
Author

estiens commented Feb 11, 2021 via email

@zzany

zzany commented Feb 11, 2021

Happy to chat more about this offline (I think members of our team, including John Deyrup, have been in touch), but I suspect that CVS and Walgreens will stay independent for some time. Notably, the states get vaccines from Operation Warp Speed and have some control over their distribution, but Walgreens and CVS have a separate system and therefore different policies.

@MoralCode

Have you tried visiting the site in a browser and exporting your cookies? That seems to work for me on other web-scraping projects.

@jwoglom
Owner

jwoglom commented Feb 17, 2021

Just wanted to share a brief update on scraping CVS that I received over email from a user of this script who has been able to get this working at scale:

  • After some trial and error, the bot detection seems to be based on timing, IP, user agent, and cookies
  • They are using GitHub Actions, which provides the benefit of randomized IPs in the Azure range, which likely helps with not being IP-blocked
  • They are hitting the getIMZStore endpoint, with a user agent randomized across 10 entries and no cookies, approximately every 10 minutes
  • They are querying the availabletimeslots endpoint with specific location IDs only if the getIMZStore endpoint returns data, and capping these requests to every 5-10 seconds.

I'll be making some updates in this repo soon to try to take these observations into consideration.
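
The request pattern in the bullets above could be sketched roughly as follows. This is only an illustration under assumptions: the real request code is passed in as two callables (`fetch_stores` standing in for the getIMZStore call, `fetch_timeslots` for availabletimeslots), the `USER_AGENTS` pool is a placeholder because the actual 10 entries were not shared, and the function names are hypothetical rather than anything in this repo.

```python
import random
import time

# Placeholder pool: the thread mentions 10 user agents but does not list them.
USER_AGENTS = [f"example-agent/{i}" for i in range(10)]


def poll_once(fetch_stores, fetch_timeslots, sleep=time.sleep, rng=random):
    """One cycle: store lookup first, per-location slot queries only if it returns data."""
    # Fresh random user agent for this cycle; no Cookie header is sent.
    headers = {"User-Agent": rng.choice(USER_AGENTS)}
    location_ids = fetch_stores(headers)
    results = {}
    for index, location_id in enumerate(location_ids):
        if index > 0:
            # Space follow-up requests 5-10 seconds apart.
            sleep(rng.uniform(5, 10))
        results[location_id] = fetch_timeslots(location_id, headers)
    return results


def poll_forever(fetch_stores, fetch_timeslots, handle, interval=600):
    """Repeat poll_once roughly every `interval` seconds (10 minutes by default)."""
    while True:
        handle(poll_once(fetch_stores, fetch_timeslots))
        # A little jitter so the cadence is approximate rather than exact.
        time.sleep(interval + random.uniform(-30, 30))
```

Passing the fetchers in as arguments keeps the timing logic separate from the HTTP details, so the pacing can be checked with fake fetchers before pointing it at a real endpoint.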
