Run in weekly, monthly intervals #357
Comments
👍 I've a whole bunch of scrapers where running once a week or so would be sufficient, but currently I either need to run them daily, or remember to come back and run them all manually.
Yes please. This would be a huge help.
Thanks @Charrod
Could you please share the problem this would solve for you?
I am currently running a scraper that takes approximately 8 hours to finish on the morph.io hardware. It scrapes team and player stats for the game CS:GO. To reduce load on the host servers, and due to the plasticity of the stats, I really only need to update this database once every week or so. It would be much easier if I could have this run automatically every week. Not to mention the reduced load on morph.io's servers!
Thanks @Charrod, that's really helpful 👍 One suggestion in the meantime: you could add code to your scraper so it only does its scraping actions on a specific day of the week. It's a hack, but if it solves a problem for you now, it could be worth a go.
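A minimal sketch of that workaround in Python, assuming Monday is the chosen day and that the scraper's real work lives in a hypothetical `run_scraper()` function. It only helps if the scraper is actually started at some point on that day.

```python
import datetime

SCRAPE_WEEKDAY = 0  # Monday; date.weekday() counts Monday as 0


def is_scrape_day(today, scrape_weekday=SCRAPE_WEEKDAY):
    """Return True when `today` falls on the chosen weekday."""
    return today.weekday() == scrape_weekday


def run_scraper():
    # Hypothetical placeholder for the scraper's real work.
    print("Scraping...")


if __name__ == "__main__":
    if is_scrape_day(datetime.date.today()):
        run_scraper()
    else:
        print("Not the scheduled day, skipping.")
```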
@equivalentideas Does Morph guarantee that the scraper will actually fire at least once every day? i.e. if there was an 'exit unless today == Monday' at the start of a script, would it definitely continue beyond that every week, or is there a scenario where it runs at, say, 23:55 on a Sunday and then doesn't run again until 00:34 on the Tuesday, skipping the Monday entirely for a week?
Good question, @tmtmtmtm. Looking through some of my scrapers, it appears not. I've got one that ran on 2016-02-03 23:07, then not again until 2016-02-05 10:19. If the precise day that the scraper runs on is not important, could you try only scraping if 7 days have passed since the last scrape?
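One way the "7 days since the last scrape" check might look in Python. This is a sketch under assumptions: the `scrape_meta` table, the `run_scraper()` placeholder, and keeping the timestamp in the scraper's own `data.sqlite` are all illustrative choices, and it only works if that file persists between runs.

```python
import datetime
import sqlite3

MIN_INTERVAL = datetime.timedelta(days=7)


def is_due(last_run, now, min_interval=MIN_INTERVAL):
    """True if no run is recorded yet or at least `min_interval` has passed."""
    return last_run is None or now - last_run >= min_interval


def read_last_run(conn):
    """Return the stored timestamp of the last scrape, or None."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scrape_meta (key TEXT PRIMARY KEY, value TEXT)"
    )
    row = conn.execute(
        "SELECT value FROM scrape_meta WHERE key = 'last_run'"
    ).fetchone()
    return datetime.datetime.fromisoformat(row[0]) if row else None


def write_last_run(conn, when):
    """Record `when` as the time of the last scrape."""
    conn.execute(
        "INSERT OR REPLACE INTO scrape_meta (key, value) VALUES ('last_run', ?)",
        (when.isoformat(),),
    )
    conn.commit()


def run_scraper():
    # Hypothetical placeholder for the scraper's real work.
    print("Scraping...")


if __name__ == "__main__":
    conn = sqlite3.connect("data.sqlite")  # assumed to survive between runs
    now = datetime.datetime.now()
    if is_due(read_last_run(conn), now):
        run_scraper()
        write_last_run(conn, now)
    else:
        print("Last scrape was less than 7 days ago, skipping.")
    conn.close()
```

Because run times vary from day to day, a threshold of exactly 7 days can push the next scrape to the eighth day; a slightly shorter `MIN_INTERVAL` (say 6.5 days) would keep it roughly weekly.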
After looking at the codebase a bit, I can see why this could be a hassle to implement. Currently the DB has a column titled "Auto-Run" which stores a Boolean value. I can see a few ways around this. One is to migrate this column to an integer column where 0 = False, 1 = True, and 2 means a weekly run. Beyond that, it just seems like a matter of fixing some variable references to use the new system, which would be pretty easy, and I would be happy to contribute. But as it stands, that DB column needs some restructuring before this feature is possible.
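To make the proposed encoding concrete, here is a rough Python illustration (not morph.io's actual code, which this sketch does not try to reproduce). The names `RunSchedule`, `migrate_flag`, and `is_due` are hypothetical; only the 0/1/2 values come from the comment above.

```python
import datetime
import enum


class RunSchedule(enum.IntEnum):
    OFF = 0     # was: auto-run False
    DAILY = 1   # was: auto-run True
    WEEKLY = 2  # new value proposed above


# Minimum time between automatic runs for each enabled schedule.
INTERVALS = {
    RunSchedule.DAILY: datetime.timedelta(days=1),
    RunSchedule.WEEKLY: datetime.timedelta(days=7),
}


def migrate_flag(auto_run):
    """Map the existing Boolean flag onto the integer encoding."""
    return RunSchedule.DAILY if auto_run else RunSchedule.OFF


def is_due(schedule, last_run, now):
    """Decide whether a scraper with this schedule should be queued."""
    schedule = RunSchedule(schedule)
    if schedule is RunSchedule.OFF:
        return False
    return last_run is None or now - last_run >= INTERVALS[schedule]
```

Storing an integer leaves room for further values later (for example a monthly schedule) without another column change.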
Some data sources don't require daily updates, and some server admins might be irked by daily scraping. It would be great to be able to set longer intervals for the automated scraper run.