Varnish "Connection reset by peer" error when large catalog is reindexed on schedule #8815
My solution to this problem was to add two plugins to Magento. Both check whether the `X-Magento-Tags` header exists and, if so, truncate it to 1000 characters. The obvious implication is that if you do any operation that would have produced such a long tags header, you must manually clear your full page cache to make sure no stale content is improperly stored. Here is a snippet from my Layout plugin:
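(The snippet itself did not survive in this thread. What follows is a minimal sketch of what such a Layout plugin might look like; the class name, the injected response object, and the `afterGetOutput` intercept point are assumptions, not the author's actual code. It would be registered in `di.xml` as a plugin on `Magento\Framework\View\Layout`, sorted to run after the core `Magento\PageCache\Model\Layout\LayoutPlugin` that sets the header.)

```php
<?php
namespace Vendor\Fixers\Plugin;

use Magento\Framework\App\Response\Http as HttpResponse;
use Magento\Framework\View\Layout;

/**
 * Hypothetical sketch: truncate the X-Magento-Tags response header
 * once the page cache module has populated it.
 */
class LayoutPlugin
{
    /** @var HttpResponse */
    private $response;

    public function __construct(HttpResponse $response)
    {
        $this->response = $response;
    }

    public function afterGetOutput(Layout $subject, $result)
    {
        $header = $this->response->getHeader('X-Magento-Tags');
        if ($header && strlen($header->getFieldValue()) > 1000) {
            // Cut back to the last complete tag before the limit so a
            // half-written tag never ends up in the header.
            $tags = substr($header->getFieldValue(), 0, 1000);
            $tags = substr($tags, 0, (int) strrpos($tags, ','));
            $this->response->setHeader('X-Magento-Tags', $tags, true);
        }
        return $result;
    }
}
```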
and here is a snippet from my PurgeCache plugin:
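(Likewise a sketch only; it assumes the 2.1/2.2-era `Magento\CacheInvalidate\Model\PurgeCache::sendPurgeRequest()`, which receives the assembled tag pattern as a string.)

```php
<?php
namespace Vendor\Fixers\Plugin;

use Magento\CacheInvalidate\Model\PurgeCache;

/**
 * Hypothetical sketch: truncate the '|'-separated tag pattern before it
 * is sent as the X-Magento-Tags-Pattern header of the PURGE request.
 */
class PurgeCachePlugin
{
    public function beforeSendPurgeRequest(PurgeCache $subject, $tagsPattern)
    {
        if (is_string($tagsPattern) && strlen($tagsPattern) > 1000) {
            // Cut back to the last complete alternative so the regular
            // expression remains valid after truncation.
            $tagsPattern = substr($tagsPattern, 0, 1000);
            $tagsPattern = substr($tagsPattern, 0, (int) strrpos($tagsPattern, '|'));
        }
        return [$tagsPattern];
    }
}
```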
I realize that this may not be a great solution for the Magento core code, but I think it's fine for anyone who understands the implications. Personally, I disable all cache flushing short of the "everything" (.*) value; Magento invalidates too much too often, IMO. This makes me responsible for clearing caches after any changes, and I'm fine with that since we only make changes once a day. Hope this helps.
In my opinion the best approach is to send the purge request in fixed-size batches, based on the number of tags or the request length. This can be achieved by replacing the observer Magento\CacheInvalidate\Observer\InvalidateVarnishObserver and adding a batching mechanism. I might as well do a PR if I have the time in the next few days.
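(A sketch of the idea, assuming the 2.1-era observer API; the batch size, class name, and namespace are illustrative, and this is not the actual PR. It would be wired in via `events.xml` in place of the core observer.)

```php
<?php
namespace Vendor\Fixers\Observer;

use Magento\CacheInvalidate\Model\PurgeCache;
use Magento\Framework\DataObject\IdentityInterface;
use Magento\Framework\Event\Observer;
use Magento\Framework\Event\ObserverInterface;
use Magento\PageCache\Model\Config;

/**
 * Hypothetical replacement for InvalidateVarnishObserver that sends the
 * purge pattern in fixed-size batches of tags, so no single PURGE
 * request's header exceeds Varnish's limits.
 */
class BatchedInvalidateVarnishObserver implements ObserverInterface
{
    const BATCH_SIZE = 100; // tags per PURGE request; tune for your catalog

    /** @var Config */
    private $config;

    /** @var PurgeCache */
    private $purgeCache;

    public function __construct(Config $config, PurgeCache $purgeCache)
    {
        $this->config = $config;
        $this->purgeCache = $purgeCache;
    }

    public function execute(Observer $observer)
    {
        $object = $observer->getEvent()->getObject();
        if (!$object instanceof IdentityInterface
            || $this->config->getType() != Config::VARNISH
            || !$this->config->isEnabled()
        ) {
            return;
        }
        // Same per-tag pattern the core observer builds...
        $tags = [];
        foreach ($object->getIdentities() as $tag) {
            $tags[] = sprintf('((^|,)%s(,|$))', $tag);
        }
        // ...but sent as several small requests instead of one huge one.
        foreach (array_chunk(array_unique($tags), self::BATCH_SIZE) as $batch) {
            $this->purgeCache->sendPurgeRequest(implode('|', $batch));
        }
    }
}
```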
@AirmanAJK Thanks for sharing your workaround, but as you mentioned, it is not a good fit for the core code. @Vedrillan A batching approach sounds like a good solution to this problem.
@AirmanAJK Thanks, but here are the results of implementing it on Magento 2.1.4 Enterprise: `exception(s): Exception #0 (Exception): Notice: Undefined property: MageXo\Fixers\Plugin\Layout\Interceptor::$response in /app/app/code/MageXo/Fixers/Plugin/Layout.php on line 8`
Any update on this issue? I can report the same issue on Magento 2 EE 2.0.7 with > 10K products.
@erikhansen @AirmanAJK @Vedrillan Have you guys found a solution to this problem? Do you have any code I could try and implement? Thanks.
@denisrpriebe No, I don't have any patch or solution to share.
@denisrpriebe Actually, I do have a patch you could try, although I never tested it. If you want it, email me at
Update: I was incorrect; the patch I had was for a different issue.
@erikhansen Cool, just shot you an email.
Hi, any news on this patch? Experiencing the same issue on Magento 2.1.8.
I did a PR for this, #8919, but they couldn't reproduce the issue and closed the PR as a result x) You can adapt the patch from https://patch-diff.githubusercontent.com/raw/magento/magento2/pull/8919.diff and apply it to your project.
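(If it helps anyone, a rough sketch of applying the raw diff; this assumes a git checkout of the magento2 repository, so Composer-based installs need the `app/code/Magento/...` paths in the diff rewritten to the matching `vendor/magento/module-*` directories.)

```sh
# Fetch the raw diff for PR #8919 and apply it from the repo root.
curl -sLO https://patch-diff.githubusercontent.com/raw/magento/magento2/pull/8919.diff
git apply --check 8919.diff   # dry run: report problems without changing files
git apply 8919.diff
```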
For anyone else experiencing this, I had to disable "Full Page Caching" within Cache Management in the admin... or you can do it through the command line. Once I did that, I no longer received the notice.
@denisrpriebe Simply not using the full page cache is hardly a solution. If CMS pages weren't loading, you surely wouldn't suggest deleting them all to fix the problem, right?
@AirmanAJK I understand your point. I'm not proposing a solution but rather an alternative that worked for me. I think some documentation is better than none when it comes to a Magento issue.
@Vedrillan Thanks! I have been having this problem forever and still have it with 2.1.9. Your patch seems like the obvious fix: no tags are lost, and they are simply sent in batches. Hopefully it will be added to 2.2.
@Vedrillan Thank you for your pull request. It seems this is still not handled correctly in 2.2 either... They may need to improve their test coverage of the limits in this file:
Any movement on this issue? We had this happening and then implemented a patch which sent the requests over in smaller batches; however, this sometimes resulted in far too many requests being sent to Varnish, which exhausted the ephemeral ports and, once processing slowed, eventually created 500 errors on the site for users. Our solution will most likely have to be to increase the
@J-Fricke We also received a patch that separated the invalidation request into smaller batches, and quickly discovered that every time the cron jobs ran, a PURGE request containing almost every active product in the catalog was sent to the Varnish server for processing. This happens even when there are no updates happening in the admin or via the API. Separating the large PURGE requests into batches caused too many PURGE requests to the Varnish server each minute and caused memory issues with the Varnish process because of the constant requests to remove things from cache. I expect that increasing the
@jdubbya We are currently testing removing the chunking patch and increasing the values Varnish can send/receive, based on this: https://devdocs.magento.com/guides/v2.2/config-guide/varnish/tshoot-varnish-503.html - That guide only discusses http_resp_hdr_len & http_resp_size, and increasing only those values did not fix our issue. We have a test running over the next couple of days that also increases http_req_hdr_len & http_req_size to values matching the response ones. I'll update this thread once we are confident in the testing. As an aside, I'm curious what users whose catalog count * 21 exceeds the Varnish maximums would do to solve this issue? Perhaps a combination of chunking and increasing the values to the max. I was pointed to a solution Fast.ly used for their extension which also leveraged some chunking, so that's probably the case.
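(For reference, a sketch of how those limits get raised: all four are runtime `-p` parameters to varnishd, set wherever your distribution keeps the startup options, e.g. the systemd unit or `/etc/default/varnish`. The values below are illustrative only, not recommendations; keep each `*_size` at least as large as its matching `*_hdr_len`, and remember the memory-footprint caveats discussed above.)

```sh
# Illustrative varnishd startup parameters (example values, not recommendations)
varnishd ... \
    -p http_req_hdr_len=500k \
    -p http_req_size=2m \
    -p http_resp_hdr_len=500k \
    -p http_resp_size=2m
```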
Hi @engcom-Delta. Thank you for working on this issue.
@joaolisboa @gigadesign1 @pczerkas Unfortunately, I cannot reproduce the issue on 2.3-develop CE or 2.3-develop EE.
Result:
Result:
Could you take a look and check whether my steps are correct?
From the 1st update
@engcom-Delta,
Hi @erikhansen. Thank you for your report.
The fix will be available with the upcoming 2.4.1 release. |
Additional information for the reopen

The issue was reopened based on several complaints that it was not properly fixed and can still be reproduced on 2.2.8 and 2.3.1. For more details and explanation, see the next comments:
Preconditions

`http_req_hdr_len` and `http_req_size` configured at their default values (per the documentation).

Steps to reproduce
Expected result
The import will be processed without any errors.
Actual result
The cron will fail when reindexing and will send an error like this to the cron email address:
The error above is triggered by line 375 of `vendor/zendframework/zend-http/src/Client/Adapter/Socket.php`.

In order to determine what request was causing the error, I edited the `Socket.php` file to add logging code at line 375 to log large requests, along with a backtrace. The requests that were causing errors were ~1.1MB in size, which exceeded the default `http_req_size` Varnish setting (which is 32k). While we could crank up that setting to allow the massive requests to be sent to Varnish, this would increase the Varnish memory footprint and is not an acceptable solution. Read this for context on the memory implications: https://www.varnish-cache.org/docs/4.0/reference/varnishd.html#http-req-size

Here is a preview of what the 1.1MB request contained:

In case it is useful for troubleshooting this issue, here is the backtrace logged by the logging code I added to the `Socket.php` file:
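(The request preview and backtrace were not captured in this thread. For illustration only, the logging described above might look something like the following; this is a hypothetical sketch, not the reporter's actual code, and it assumes `$request` holds the fully assembled request string at that point in `Socket.php`.)

```php
// Inserted near line 375 of Socket.php, just before the request is
// written to the socket: log oversized requests with a backtrace.
if (strlen($request) > 32 * 1024) { // beyond Varnish's default http_req_size
    error_log(sprintf(
        "[varnish-purge] %d-byte request, first 1KB:\n%s\n%s",
        strlen($request),
        substr($request, 0, 1024),
        (new \Exception())->getTraceAsString()
    ));
}
```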