Switch From Ping Mechanism To Heroku Scheduler #38

Closed
thealphadollar opened this issue Nov 4, 2020 · 15 comments
Assignees
Labels
enhancement Hacktoberfest Issues that can be solved for Hacktoberfest help wanted research

Comments

@thealphadollar
Contributor

Currently, we have a cron job on the metakgp digital ocean server that pings the URL https://mftp.herokuapp.com/ and hence fetches the new notices. This approach has been working, but it sometimes leads to Heroku idling the application instance, either because no ping is able to reach it or because the metakgp cron job somehow fails.

Solution

It would be better to use the Heroku scheduler add-on and modify the script to work with it and set a periodic frequency for the script (or function) to run and send new notices. This would make us totally independent of the cron job as well.

Ref:

@thealphadollar
Contributor Author

@iakshat I believe this issue may interest you.

@icyflame @amrav @ghostwriternr Please provide your suggestions and ideas from experience on how we can tackle this issue in a better way and then we will let @iakshat take the decision as to how he wants to solve.

@icyflame
Member

icyflame commented Nov 4, 2020

@thealphadollar

Currently, we have a cron job on the metakgp digital ocean server that pings the URL https://mftp.herokuapp.com/ and hence fetches the new notices

The purpose of the ping is to keep the dyno running constantly. As you can see, the /ping handler doesn't do anything 🙆‍♂️

mftp/main.py

Lines 40 to 45 in 99aed4a

class PingHandler(tornado.web.RequestHandler):
    def head(self):
        return
    def get(self):
        return

It doesn't have anything to do with fetching the notices. The notices are fetched using the ioloop:

mftp/main.py

Lines 53 to 55 in 99aed4a

tornado.ioloop.PeriodicCallback(run_updates,
                                UPDATE_PERIOD).start()
ioloop.start()
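To make the distinction concrete, here is a minimal stdlib-only analogue of that pattern. It is a sketch using asyncio rather than Tornado, and `run_periodically`, `demo`, and the stand-in `run_updates` below are hypothetical names, not code from mftp. The point it illustrates is that the periodic work lives inside the long-running process, so it only happens while that process is up:

```python
import asyncio


async def run_periodically(callback, period_seconds, max_runs):
    """Call `callback` every `period_seconds`, `max_runs` times in total."""
    for run in range(max_runs):
        callback()
        # Sleep between runs, but not after the final one.
        if run < max_runs - 1:
            await asyncio.sleep(period_seconds)


def demo():
    calls = []

    def run_updates():
        # Hypothetical stand-in for the real scraper function.
        calls.append("checked notices")

    # The loop only makes progress while this process is alive.
    asyncio.run(run_periodically(run_updates, period_seconds=0.01, max_runs=3))
    return calls
```

If the process is stopped (for example, when the dyno is idled), the loop stops with it, which is why the ping matters even though its handler does nothing.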

@icyflame
Member

icyflame commented Nov 4, 2020

no ping is able to reach or somehow the metakgp cron job fails

Oh, is this the reason for the alerts that we see from Papertrail frequently? If this is the root cause, can you put the steps you took for the investigation and how you arrived at this conclusion in this issue? 🙇‍♂️

@iakshat
Member

iakshat commented Nov 4, 2020

I think @thealphadollar may have misunderstood the scraper part, but the problem is correctly pointed out by him that the cron is failing to send the ping. I verified it with the logs. I don't know what's wrong with the metakgp server, but the cron jobs have failed many times in the last two days, and thus the Heroku instance goes to sleep.

Surprisingly, the script wasn't running from 5am to around 12 today and we didn't notice.
Logs:
Nov 04 05:13:59 mftp heroku/web.1 Idling
Nov 04 05:13:59 mftp heroku/web.1 State changed from up to down
Nov 04 05:14:00 mftp heroku/web.1 Stopping all processes with SIGTERM
Nov 04 05:14:00 mftp heroku/web.1 Process exited with status 143
Nov 04 11:50:47 mftp heroku/web.1 Unidling

@icyflame I think we at least need some sort of heartbeat ping monitor that can help check the script status.
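As a rough illustration of what such a monitor could check, the sketch below parses log lines in the format shown above and reports how long the dyno stayed idle. The function names are hypothetical, and the year 2020 is an assumption taken from the date of this issue, since the log lines themselves carry no year:

```python
from datetime import datetime, timedelta

LOG_LINES = [
    "Nov 04 05:13:59 mftp heroku/web.1 Idling",
    "Nov 04 05:13:59 mftp heroku/web.1 State changed from up to down",
    "Nov 04 05:14:00 mftp heroku/web.1 Stopping all processes with SIGTERM",
    "Nov 04 05:14:00 mftp heroku/web.1 Process exited with status 143",
    "Nov 04 11:50:47 mftp heroku/web.1 Unidling",
]


def parse_timestamp(line, year=2020):
    """Parse the leading 'Nov 04 05:13:59' part of a log line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def idle_windows(lines, year=2020):
    """Return (idled_at, unidled_at) pairs found in the log lines."""
    windows = []
    idled_at = None
    for line in lines:
        # Fields: month, day, time, app, process, then the message.
        message = line.split(" ", 5)[-1]
        if message == "Idling":
            idled_at = parse_timestamp(line, year)
        elif message == "Unidling" and idled_at is not None:
            windows.append((idled_at, parse_timestamp(line, year)))
            idled_at = None
    return windows


def total_downtime(lines, year=2020):
    """Sum the length of all idle windows."""
    return sum(
        (end - start for start, end in idle_windows(lines, year)),
        timedelta(),
    )
```

For the log excerpt above, `total_downtime(LOG_LINES)` comes to 6 hours, 36 minutes and 48 seconds.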

@icyflame
Member

icyflame commented Nov 5, 2020

the problem is correctly pointed out by him that the cron is failing to send the ping

😱 Oh, interesting! Did you check the cron logs? I believe you can find them in /var/log/cron.log or something like that. (I will check in a few hours too, if you can't find anything)

My understanding is that the cron shouldn't fail and we shouldn't need to move from the existing Free dyno as long as the ping works.

Although, if we want to move to Heroku Scheduler just to improve our system, I will leave that up to you 🙆‍♂️

@iakshat
Member

iakshat commented Nov 5, 2020

Did you check the cron logs? I believe you can find them in /var/log/cron.log

Sorry I don't have access to those rn.

we shouldn't need to move from the existing Free dyno

Yes! I think so too. It's not worth paying $7 a month just to keep the server alive. I googled and found some services available specifically for this 🤩.

@icyflame
Member

I set up the cron on my personal Metakgp server account for the time being 🙆‍♂️

icyflame@metakgp-blr:~$ crontab -l | grep -v '^#'
*/5 * * * * curl mftp.herokuapp.com -o NUL

This should ensure that MFTP won't go down 😌 Meanwhile, we can move this into a docker container that's part of metakgp-wiki or mftp.

@thealphadollar
Contributor Author

@icyflame I think keeping it separate and using Heroku scheduler would be a better option as that would keep everything else the same - automatic deployments, no hassle from our side - push and forget mechanism.

Using the method described in the Heroku Scheduler docs, we will get the following benefits while staying on the Heroku free tier with a serverless-style approach.

  1. Use fewer dyno hours than the current setup, which keeps the application online all the time.
  2. Specify a more suitable interval, say 10 minutes, in the scheduler itself.
  3. Remove the REST endpoint clutter and keep a single function that can be triggered by running python main.py
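A minimal sketch of what such a run-once entry point might look like, assuming the scheduler simply invokes the script at each interval. Every name here (`run_once`, `fetch_notices`, `send_notice`, `load_seen_ids`, `save_seen_ids`) is hypothetical and not taken from the mftp codebase. The state helpers are passed in because a dyno's local filesystem does not persist between runs, so the list of already-sent notices would have to live in some external store:

```python
def run_once(fetch_notices, send_notice, load_seen_ids, save_seen_ids):
    """Fetch notices, send the ones not seen before, then exit.

    All four arguments are hypothetical callables supplied by the caller:
    - fetch_notices() returns a list of dicts, each with an "id" key
    - send_notice(notice) delivers a single notice
    - load_seen_ids() returns the ids that were already sent
    - save_seen_ids(ids) persists the updated ids to an external store
    """
    seen_ids = set(load_seen_ids())
    sent = 0
    for notice in fetch_notices():
        if notice["id"] in seen_ids:
            continue  # Already delivered on an earlier run.
        send_notice(notice)
        seen_ids.add(notice["id"])
        sent += 1
    save_seen_ids(seen_ids)
    return sent
```

With this shape there is no web server or ping handler left to keep alive: each scheduled run starts, does one pass, and exits.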

@icyflame What do you say? Should we go with scheduler or move to a separate instance on the digital ocean server?

@iakshat Can you research more on this as you get time and let us know the exact changes that may be required to make this migration?

Ref: https://devcenter.heroku.com/articles/scheduler

@icyflame
Member

image

Sure, Scheduler looks good ✅ Being able to remove the REST API clutter would be a major plus 👍 This caveat seems unlikely, it says "very rare" 😅 🙆‍♂️

@nilesh05apr

@thealphadollar Can we use GitHub Actions for this?

@thealphadollar
Copy link
Contributor Author

We should be able to replace it with a cron schedule inside a GitHub Action that does the same thing we currently do from our server. If you are interested, please take it up and ping me for any help you need.

@nilesh05apr

@thealphadollar I am working on it. Just a few questions: we need to execute only update.py every nth minute, right? Also, what should the update interval be? It's 2 minutes currently; should we keep it the same or change it? GitHub provides 2000 minutes/month of free workflows. Can you please confirm the details?
Thank you.

@thealphadollar
Contributor Author

@nilesh05apr If we are referring to the Usage page, the execution would simply send a curl to remind the worker of the task and then fire and forget. So, on the execution side, we will not reach 2000 minutes. However, we can increase the interval to 10 minutes; the rationale is that most CDC notifications are not so time-sensitive that they expire or become urgent within such a short time.

Thanks for the research, Nilesh.
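For a back-of-the-envelope check of the minutes budget, the sketch below counts scheduled runs per month. It assumes a 30-day month and, more importantly, assumes that each run is billed as one whole minute; that billing rule is an assumption not confirmed in this thread, so it is worth verifying against the Usage page. The function names are hypothetical:

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_month(interval_minutes, days=30):
    """Number of scheduled runs in a month at the given interval."""
    return (days * MINUTES_PER_DAY) // interval_minutes


def billed_minutes(interval_minutes, minutes_per_run=1, days=30):
    """Estimated billed minutes, assuming a fixed cost per run (an assumption)."""
    return runs_per_month(interval_minutes, days) * minutes_per_run
```

Under those assumptions, a 10-minute interval gives 4320 runs a month and a 2-minute interval gives 21600, so whether the 2000-minute figure matters depends on how short runs are actually counted.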

@nilesh05apr

I have made a PR; please review the changes.
Thanks

@proffapt
Member

Closing this, since it won't be possible to host this on Heroku anymore because of the discontinuation of Heroku's free plan. The workflow will also be different, and it is already covered in the revamp todo list: #55


5 participants