
Prevent running again already running cron group #12497

Merged

Conversation

paveq
Contributor

@paveq paveq commented Nov 30, 2017

Description

Prevents running the same cron group concurrently. Alternative implementation to #11465

Fixed Issues (if relevant)

  1. #10650: Cron starts when it's already running

Manual testing scenarios

  1. Define a cron job that runs for a while, or add a sleep() call in the _runJob() method (see the sketch after this list)
  2. Run the cron:run entry point multiple times
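For illustration, here is a minimal sketch of step 1: a hypothetical test-only cron job (the module, namespace and class name are made up for this example) that sleeps long enough for a second cron:run invocation to overlap with the first.

<?php
// Hypothetical test-only cron job; register it in a custom module's crontab.xml.
declare(strict_types=1);

namespace Vendor\CronTest\Cron;

class SlowJob
{
    /**
     * Keep the cron group busy long enough for a second "bin/magento cron:run"
     * to overlap with the first one.
     */
    public function execute()
    {
        sleep(120);
    }
}

With the group lock in place, the overlapping run should skip the group instead of executing it again (logging a "Could not acquire lock for cron group" warning, as quoted later in this thread).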

Contribution checklist

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All new or changed code is covered with unit/integration tests (if applicable)
  • All automated tests passed successfully (all builds on Travis CI are green)

@paveq
Contributor Author

paveq commented Nov 30, 2017

@dmanners @ihor-sviziev I've created this MVP implementation of cron group based locking. This is now being tested in Vaimo internally too.

@magento-engcom-team added the bugfix, Reproduced on 2.1.x, Reproduced on 2.2.x and SQUASHTOBERFEST labels Nov 30, 2017
Contributor

@ihor-sviziev ihor-sviziev left a comment

Hi @paveq,
Thank you so much for such a good PR!
I think a few fixes are needed. Could you review my comments?

* @param string $name lock name
* @return bool
*/
public function setLock($name, $timeout = -1)
Contributor

As Magento 2.2 and later support PHP 7+ only, could you specify types for these variables? The same goes for the other methods in this file and the interface.

It would also be great to add a declare(strict_types=1) statement to the new file.
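For reference, a rough sketch of what the requested change could look like; the method name is taken from the diff above, while the docblock wording and placeholder body are only illustrative:

<?php
declare(strict_types=1);

namespace Magento\Framework\Lock\Backend;

class Database
{
    /**
     * Sets a lock for name
     *
     * @param string $name lock name
     * @param int $timeout how long to wait for the lock, in seconds (negative means wait without a timeout)
     * @return bool
     */
    public function setLock(string $name, int $timeout = -1): bool
    {
        // acquisition via MySQL GET_LOCK() would go here
        return false;
    }
}

The same pattern (scalar parameter types plus a return type) would apply to releaseLock() and the other methods in the file and the interface.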

Contributor

Another thing:

MySQL 5.7.5 and later enforces a maximum length on lock names of 64 characters. Previously, no limit was enforced.

I think this should be checked in order to prevent MySQL errors there.
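For illustration, the guard could be as simple as the following sketch; the function name and exception type are placeholders, not the final implementation:

<?php
declare(strict_types=1);

/**
 * MySQL 5.7.5 and later reject GET_LOCK() names longer than 64 characters,
 * so fail fast with a clear message instead of a MySQL error.
 */
function checkLockNameLength(string $name)
{
    if (strlen($name) > 64) {
        throw new \InvalidArgumentException(
            sprintf('Lock name "%s" exceeds the 64 character limit', $name)
        );
    }
}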

Contributor Author

Implemented scalar type hints.

private $resource;

public function __construct(
\Magento\Framework\App\ResourceConnection $resource
Contributor

Could you import this resource connection? It would be more readable.

Contributor Author

Done.


namespace Magento\Framework\lock\Backend;

class Database implements \Magento\Framework\Lock\LockManagerInterface
Contributor

It would be great to cover this class with unit and integration tests. Could you add them?
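As a starting point, an integration-style test could look roughly like the sketch below. The setLock() and releaseLock() names are visible elsewhere in this diff; isLocked() is assumed from the IS_USED_LOCK() query, and the exact assertions are up to the final implementation.

<?php
declare(strict_types=1);

namespace Magento\Framework\Lock\Backend;

use Magento\TestFramework\Helper\Bootstrap;

class DatabaseTest extends \PHPUnit\Framework\TestCase
{
    /** @var Database */
    private $model;

    protected function setUp()
    {
        $this->model = Bootstrap::getObjectManager()->create(Database::class);
    }

    public function testLockIsAcquiredAndReleased()
    {
        $this->assertTrue($this->model->setLock('test_lock'));
        $this->assertTrue($this->model->isLocked('test_lock'));
        $this->assertTrue($this->model->releaseLock('test_lock'));
        $this->assertFalse($this->model->isLocked('test_lock'));
    }
}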

Contributor Author

Sure, I'll work on implementing some test cases for this.

Contributor Author

Test cases added.

@ihor-sviziev added the Reproduced on 2.3.x label Dec 1, 2017
@paveq
Contributor Author

paveq commented Dec 1, 2017

In order for locks not to conflict in setups where multiple Magento installations run in a single database, we need some kind of installation-specific unique prefix for the lock.

Framework\Cache\Core::_id() and Framework\App\Cache\Frontend\Factory already contain a similar mechanism that is used by the cache and could be used by locks too. However, that method is protected, and injecting the cache core into this code does not seem like a correct solution anyway.

Would it make sense to move this unique, installation-specific id generation out of Cache Core and make it available in a generic way? (I'm suggesting that within the scope of this PR we make such a mechanism/API interface available, but don't refactor Cache Core yet; otherwise the scope might grow quite big.)
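To make the idea concrete, here is a rough sketch of how such a prefix could be derived and applied; the inputs (database name and table prefix) and the helper name are only an illustration, not a proposal for the final API:

<?php
declare(strict_types=1);

/**
 * Derive a short installation-specific prefix so that two Magento
 * installations sharing one MySQL server cannot collide on lock names.
 * A short hash also helps stay under MySQL's 64 character lock name limit.
 */
function prefixedLockName(string $dbName, string $tablePrefix, string $lockName): string
{
    $prefix = substr(md5($dbName . '|' . $tablePrefix), 0, 8);
    return $prefix . '|' . $lockName;
}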

@paveq
Contributor Author

paveq commented Dec 1, 2017

Before MySQL 5.7.5, only a single simultaneous lock (per session) can be acquired and GET_LOCK() releases any existing lock (on the same connection).

In MySQL 5.7.5, GET_LOCK() was reimplemented using the metadata locking (MDL) subsystem and its capabilities were extended. Multiple simultaneous locks can be acquired and GET_LOCK() does not release any existing locks.

Does this pose an issue? Magento's system requirements support MySQL 5.6 onwards.

Until the system requirements have been raised to MySQL 5.7.5, perhaps we can check that only one lock at a time can be acquired through LockManager? Ideally we would support multiple locks from MySQL 5.7.5 upwards, but at the moment that feature is not needed (YAGNI), and there appears to be no easy way to detect the MySQL version from the connection.
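As an illustration of that interim check, the lock backend could simply remember which lock it currently holds and refuse to take a second one. The class, property, and exception below are hypothetical, not part of this PR:

<?php
declare(strict_types=1);

class SingleLockPerSessionGuard
{
    /** @var string|null Name of the lock currently held on this connection */
    private $currentLock;

    public function beforeAcquire(string $name)
    {
        // On MySQL before 5.7.5, a second GET_LOCK() silently releases the first
        // lock, so refuse to acquire more than one lock at a time.
        if ($this->currentLock !== null && $this->currentLock !== $name) {
            throw new \RuntimeException(
                sprintf('Lock "%s" is already held by this session', $this->currentLock)
            );
        }
        $this->currentLock = $name;
    }

    public function afterRelease()
    {
        $this->currentLock = null;
    }
}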

Contributor

@ihor-sviziev ihor-sviziev left a comment

Hi,

I don't think we need to check the MySQL version there. The current implementation looks good.

One thing: Magento\Cron\Test\Unit\Observer\ProcessCronQueueObserverTest now fails; could you update it?

/**
* @var \Magento\Framework\Lock\Backend\Database
*/
protected $model;
Contributor

@ihor-sviziev ihor-sviziev Dec 1, 2017

Could you use private visibility instead of protected (for all methods in this file)?

* @param string $name lock name
* @return bool
*/
public function releaseLock(string $name): bool
Contributor

Could you add @throws to the phpdoc block there?

return (bool)$this->resource->getConnection()->query("SELECT IS_USED_LOCK(?);", array((string)$name))->fetchColumn();
}

private function checkLength(string $name)
Contributor

Could you add a phpdoc block there?

Contributor

@ihor-sviziev ihor-sviziev left a comment

Looks really good!

\Magento\Framework\App\State $state
\Magento\Framework\App\State $state,
StatFactory $statFactory,
\Magento\Framework\Lock\LockManagerInterface $lockManager


Shouldn't BC be preserved?

Contributor

@adrian-martinez-interactiv4, We treat Observers and Plugins as internal implementation and don't try to preserve BC for them

@erikhansen
Contributor

I've applied this patch to two production sites. While it does seem to resolve the duplicate CRON issue, it seems to have introduced a regression: the cron_schedule table does not seem to be getting cleaned according to the "History Lifetime" settings. I have set the lifetime value to 180 (3 hours); however, there are many records in the cron_schedule table from the previous couple of weeks.

Has anyone else seen this issue? I didn't open a separate Github issue for this, as this code isn't in a GA release yet, so I can't imagine many have run into it. In the #11002 (comment) issue, someone suggested a solution to this problem; however, it seems like that should not be necessary if the success lifetime is set to 180 for all CRON groups.

@kandy
Contributor

kandy commented Jun 4, 2018

Hi @erikhansen.

Looks like the problem is introduced by this code: https://github.com/paveq/magento2/blob/175229e23b5df6473bebf7acd24356a38ca6c9e3/app/code/Magento/Cron/Observer/ProcessCronQueueObserver.php#L502

It needs to be changed to:

foreach ($historyLifetimes as $status => $time) {
    $count += $connection->delete(
        $scheduleResource->getMainTable(),
        [
            'status = ?' => $status,
            'job_code in (?)' => array_keys($jobs),
            'created_at < ?' => $connection->formatDate($currentTime - $time)
        ]
    );
}

Can I ask you to create a separate issue for this?

@Ian410
Member

Ian410 commented Jun 4, 2018

I implemented this on a 2.2.4 production site. It definitely prevents duplicates from running, but in our case it didn't run jobs at all if one of the jobs ended up taking a long time. Since it's locking groups, I think that if a job in the default group takes a long time, it won't schedule or launch other groups; at least that's my working theory right now. I had to revert this set of changes.

@ihor-sviziev
Contributor

ihor-sviziev commented Jun 5, 2018

@kandy feel free to create a PR for this, even without reporting an issue first

@erikhansen
Contributor

@kandy Can you expound on the change you mentioned in your comment from a couple of weeks ago? The only difference I see in your code block is that this line:

'status = ?' => Schedule::STATUS_PENDING,

should be changed to:

'status = ?' => $status,

However if you look at the source file, there is no $status variable defined.

Can you let me know what this should be changed to, and I'll change it and test it on a stage environment? I can then create a PR for this, expounding on the issue along with your suggested fix.

@hostep
Contributor

hostep commented Jun 18, 2018

@erikhansen, look at his updated foreach loop, you'll see the variable in there ;)

foreach ($historyLifetimes as $status => $time) {

@erikhansen
Contributor

@hostep Thanks. I just realized that independently and was coming back here to post when I saw your comment. I'm deploying this to a stage environment that had 1.2M rows in cron_schedule and once I confirm this fixes the issue, I'll submit a PR to resolve this.

@georgebotley

@erikhansen Can you confirm whether this has fixed the issues for you? We've got a few servers with production Magento sites on them, and this brings MySQL to its knees. It'd be great if this workaround fixes things.

@erikhansen
Contributor

erikhansen commented Jun 29, 2018

@torindul Applying the contents of this PR as a patch to a Magento 2.2.3 site fixes the issue of the same CRON job running multiple times concurrently. By the way, this Gist contains a query that will show you whether you have a problem with this.

However, applying the patch introduced an issue where CRON jobs are not getting cleaned up, as I explained in my comment from 27 days ago. I made the change that @kandy suggested, but that did not solve my issue. So until I make time to investigate a proper solution for it, I've implemented a custom CRON job as mentioned here. UPDATE: See my comment from July 16th, 2018 below.

For anyone wanting to apply this PR as a patch to their environment, here are patches that can be applied via composer-patches (you'll need to remove .txt from the filenames). However, be aware that it introduces a fairly serious regression of CRONs not being cleaned up. UPDATE: This was not actually accurate.

@erikhansen
Contributor

I'm currently going through the process of upgrading a site from 2.2.3 > 2.2.5. In the process, I realized that the changes in this PR are included in 2.2.5. I expect that anyone upgrading to 2.2.5 will experience the issue with CRONs not getting cleaned up.

I will soon be upgrading a vanilla Magento site to 2.2.5 and if I'm able to reproduce the CRON issue, I'll open a Github issue with full details.

@kandy
Contributor

kandy commented Jul 2, 2018

Hi @erikhansen, thanks for such a deep investigation.
Please note that the fix for the job cleanup is already included in 2.2.5:
https://github.com/magento/magento2/blob/2.2.5/app/code/Magento/Cron/Observer/ProcessCronQueueObserver.php#L506

@erikhansen
Contributor

@kandy Thanks. I was aware that the fix you referenced was included in 2.2.5. However I have confirmed that this is still an issue. I will be opening a new Github issue this morning and will post a link here once I've done that.

@kandy
Contributor

kandy commented Jul 3, 2018

@erikhansen, also note that the cron history cleanup time was changed to 7 days in commit ba63d94.

@erikhansen
Contributor

@kandy I finally made time to look into this. I was wrong: 2.2.5 does not have an issue with CRONs that never get deleted. What I was seeing was exactly what you pointed out: the cron_schedule table gets filled with so many rows because the default is now for Magento to store 7 days of history. This means that a vanilla M2 install will have 10K+ rows in the cron_schedule table on a regular basis.

In 2.2.4 and prior, having a large row count in the cron_schedule table occasionally caused issues, since the entries in that table were deleted one by one; that is why I was initially concerned about 2.2.5. However, now that the DELETE query has been optimized, this should no longer be an issue.

For the record, here's what a DELETE statement looks like on 2.2.5:

DELETE FROM `cron_schedule` 
    WHERE (status = 'pending') 
        AND (job_code in ('staging_apply_version', 'staging_remove_updates', 'staging_synchronize_entities_period')) 
        AND (created_at < '2018-05-31 17:55:04')

Compared to 2.2.4 and before:

DELETE FROM `cron_schedule` WHERE (schedule_id = '6594032')

@csdougliss
Contributor

csdougliss commented Jul 8, 2019

@paveq @hostep @sidolov @magento-engcom-team @nmalevanec

I am getting:

[2019-07-08 16:11:08] main.WARNING: Could not acquire lock for cron group: index, skipping run [] []
[2019-07-08 16:11:08] main.WARNING: Could not acquire lock for cron group: ddg_automation, skipping run [] []
[2019-07-08 16:11:08] main.WARNING: Could not acquire lock for cron group: consumers, skipping run [] []

I have emptied cron_schedule, but I still have the same issue. How do I clear the locks down?

This is how my cron runs:

flock -w 1 /var/www/vhosts/xx/current/var/cron.lock php -f /var/www/vhosts/xx/current/bin/magento cron:run | grep -v "Ran jobs by schedule" >> /var/www/vhosts/xx/current/var/log/magento.cron.log

@kandy
Contributor

kandy commented Jul 8, 2019

@craigcarnell Can I ask you to create a separate issue for this? Also, can you describe your environment?

@hostep
Contributor

hostep commented Jul 8, 2019

@craigcarnell: it might have something to do with your cron.lock file existing in the current directory, which means you could potentially have two or more cron executions still running next to each other just after you deploy a new release. I'd advise you to put the cron.lock file in a directory that doesn't change between deploys.
Also: we no longer use the flock "fix"; Magento 2.2.5 and later handle locking at the database level since this PR got merged.

@mowny

mowny commented Jul 23, 2020

We're still running 2.2.8. It doesn't run very fast, though. A big part of this was cron jobs piling up because they took longer than a minute to run due to MySQL congestion.

Examining cron_schedule, I found:
291: if ($scheduledTime < $currentTime - $scheduleLifetime) {
This should move planned entries from pending to missed after "Missed if not Run Within" has expired, correct? Then why do I have thousands of entries in cron_schedule that are still pending and are only purged after the "Failure History Lifetime"? (The oldest I see match the 4320-minute "Failure History Lifetime".)

IMO there should only ever be one scheduled job of a given type in the "pending" state that is past its scheduled time, as all others that were scheduled even earlier can be considered "missed"; there is no benefit in keeping them pending, as multiple runs will not "catch up" any better than a single run.

@dmanners dmanners removed their assignment Jul 24, 2020
@ghost ghost assigned dmanners Jul 24, 2020
@ghost

ghost commented Jul 24, 2020

@dmanners unfortunately, only members of the maintainers team are allowed to unassign developers from the pull request
