Run in weekly, monthly intervals #357

Open
rlafuente opened this issue May 17, 2014 · 8 comments

Comments

@rlafuente

Some data sources don't require daily updates, and some server admins might be irked by daily scraping. It would be great to be able to set longer intervals for the automated scraper run.

@tmtmtmtm

tmtmtmtm commented Aug 7, 2015

👍

I've a whole bunch of scrapers where running once a week or so would be sufficient, but currently I either need to run them daily, or remember to come back and run them all manually.

@CharlieIO

Yes please. This would be a huge help.

@equivalentideas
Collaborator

Thanks @Charrod

> Yes please. This would be a huge help.

Could you please share the problem this would solve for you?

@CharlieIO

I am currently running a scraper that takes approximately 8 hours to finish on the morph.io hardware. It scrapes team and player stats for the game CS:GO. In order to reduce load on the host servers, and due to the plasticity of the stats, I really only need to update this database once every week or so. It would be much easier if I could have this run automatically every week. Not to mention the reduced load on morph.io's servers!

@equivalentideas
Collaborator

Thanks @Charrod that's really helpful 👍

One suggestion in the meantime: you could add code to your scraper so it only does its scraping actions on a specific day of the week. It's a hack, but if it solves a problem for you now, it could be worth a go?
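For illustration, a minimal sketch of that day-of-week guard in Python. The names `RUN_WEEKDAY`, `should_scrape_today` and `run_scraper` are made up for this example, and it assumes the scraper is actually started at some point on the chosen day.

```python
import datetime

# Hypothetical choice: only scrape on Mondays (Monday == 0 in date.weekday()).
RUN_WEEKDAY = 0


def should_scrape_today(today: datetime.date, run_weekday: int = RUN_WEEKDAY) -> bool:
    """Return True only when `today` falls on the chosen weekday."""
    return today.weekday() == run_weekday


def run_scraper() -> None:
    """Placeholder for the real scraping work."""


def main() -> None:
    if not should_scrape_today(datetime.date.today()):
        print("Not the scheduled weekday, skipping this run.")
        return
    run_scraper()


if __name__ == "__main__":
    main()
```

On every other day the run exits almost immediately, so the daily trigger stays in place but the target site is only hit once a week.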

@tmtmtmtm

@equivalentideas Does Morph guarantee that the scraper will actually fire at least once every day? i.e. if there was an 'exit unless today == Monday' at the start of a script, would it definitely continue beyond that every week, or is there a scenario where it runs at, say, 23:55 on a Sunday, and then doesn't run again until 00:34 on the Tuesday, skipping the Monday entirely for a week?

@equivalentideas
Collaborator

> @equivalentideas Does Morph guarantee that the scraper will actually fire at least once every day?

Good question, @tmtmtmtm. Looking through some of my scrapers, it appears not. I've got one that ran on 2016-02-03 23:07, then not again until 2016-02-05 10:19.

If the precise day that the scraper runs on is not important, could you try only scraping if 7 days have passed since the last scrape?
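A rough sketch of that elapsed-time check in Python, assuming the scraper can keep a small bookkeeping table in its own SQLite database. The table name `run_log`, the file name `data.sqlite` in the usage note, and the function names are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Optional

# Assumed minimum gap between two real scrapes.
MIN_INTERVAL = timedelta(days=7)


def _ensure_table(conn: sqlite3.Connection) -> None:
    # Hypothetical bookkeeping table, one row per completed scrape.
    conn.execute("CREATE TABLE IF NOT EXISTS run_log (finished_at TEXT NOT NULL)")


def last_run(conn: sqlite3.Connection) -> Optional[datetime]:
    """Return the timestamp of the most recent recorded scrape, if any."""
    _ensure_table(conn)
    row = conn.execute("SELECT MAX(finished_at) FROM run_log").fetchone()
    return datetime.fromisoformat(row[0]) if row[0] else None


def is_due(conn: sqlite3.Connection, now: datetime,
           min_interval: timedelta = MIN_INTERVAL) -> bool:
    """True if no scrape is recorded yet or the last one is old enough."""
    previous = last_run(conn)
    return previous is None or now - previous >= min_interval


def record_run(conn: sqlite3.Connection, now: datetime) -> None:
    """Store the completion time as ISO 8601 text so MAX() sorts correctly."""
    _ensure_table(conn)
    conn.execute("INSERT INTO run_log (finished_at) VALUES (?)",
                 (now.isoformat(timespec="seconds"),))
    conn.commit()
```

A scraper would open its database (for example `sqlite3.connect("data.sqlite")`), return early when `is_due` is false, and call `record_run` after a successful scrape. Unlike a fixed-weekday check, this does not depend on a run happening on any particular day.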

@CharlieIO

After looking at the codebase a bit, I can see why this could be a hassle to implement. Currently the DB has a column titled "Auto-Run" which stores a Boolean value. I can see a few ways around this. One is to migrate this column to an integer column where 0 = False, 1 = True, and 2 means a weekly run. Beyond that, it seems to be a matter of updating some variable references to use the new system, which would be pretty easy, and I would be happy to contribute. But as it stands, that DB column needs some restructuring for this feature to be possible.
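To make the idea concrete, here is a small Python sketch of the 0/1/2 encoding and how a scheduler might use it. This is not morph.io's actual code. `RunSchedule`, `from_auto_run` and `is_due` are hypothetical names, and the interval lengths are assumed.

```python
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Optional


class RunSchedule(IntEnum):
    """Hypothetical integer codes replacing the Boolean auto-run flag."""
    NEVER = 0   # old False
    DAILY = 1   # old True
    WEEKLY = 2  # new option


# Assumed minimum gap between runs for each schedule.
INTERVALS = {
    RunSchedule.DAILY: timedelta(days=1),
    RunSchedule.WEEKLY: timedelta(days=7),
}


def from_auto_run(auto_run: bool) -> RunSchedule:
    """Map an existing Boolean value onto the new integer codes."""
    return RunSchedule.DAILY if auto_run else RunSchedule.NEVER


def is_due(schedule: RunSchedule, last_run: Optional[datetime], now: datetime) -> bool:
    """Decide whether a scraper with this schedule should be queued now."""
    if schedule == RunSchedule.NEVER:
        return False
    if last_run is None:
        return True
    return now - last_run >= INTERVALS[schedule]
```

Because the old True and False map onto 1 and 0, existing rows would keep their current behaviour after such a migration, and a monthly option could later be added as another code.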
