
Data transfer between web server and redis cache is too high #33685

Open
onlinebizsoft opened this issue Aug 4, 2021 · 54 comments
Assignees
Labels
  • Issue: needs update (additional information is required; waiting for a response)
  • not-confirmed (used for issues that were closed during confirmation)
  • Reported on 2.4.x (indicates the original Magento version for the issue report)
  • Triage: Dev.Experience (issue related to Developer Experience; needs triage to confirm or reject it)

Comments

@onlinebizsoft

onlinebizsoft commented Aug 4, 2021

Summary (*)

We run a Magento site on multiple servers across multiple AWS regions. The website has multiple domains and multiple languages, and the catalog has many products as well.

We realized that the data transfer cost on the AWS bill is much higher than normal; it accounts for up to 50-60% of the total.

We found that most of the data transfer was from ElastiCache (in one region) to EC2.

Examples (*)

Proposed solution

From our side, we are looking at two things:

  1. Set up multiple regions for ElastiCache:
    https://aws.amazon.com/blogs/database/reduce-cost-and-boost-throughput-with-global-datastore-for-amazon-elasticache-for-redis/
  2. Enable L2 caching (see the sketch after this list):
    https://devdocs.magento.com/guides/v2.4/config-guide/cache/two-level-cache.html
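For reference, the L2 configuration on the linked devdocs page amounts to wrapping the remote Redis backend in a synchronized local backend in env.php. A minimal sketch adapted from that page (the server address and cache_dir are placeholders; details vary by Magento version):

```php
'cache' => [
    'frontend' => [
        'default' => [
            'backend' => '\\Magento\\Framework\\Cache\\Backend\\RemoteSynchronizedCache',
            'backend_options' => [
                // remote (shared) cache in Redis
                'remote_backend' => '\\Magento\\Framework\\Cache\\Backend\\Redis',
                'remote_backend_options' => [
                    'server' => 'your-redis-endpoint',
                    'port' => '6379',
                    'database' => '0',
                    'persistent' => 0,
                    'compress_data' => '1',
                ],
                // local (per-web-node) copy, e.g. on a tmpfs mount
                'local_backend' => 'Cm_Cache_Backend_File',
                'local_backend_options' => [
                    'cache_dir' => '/dev/shm/',
                ],
            ],
        ],
    ],
],
```

With this layout, each web node serves repeat cache reads from its local copy and only hits Redis to check whether the local copy is still current, which is exactly the data-transfer pattern this issue is about.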

I think the core code needs to be reworked.


Please provide a Severity assessment for the issue as the Reporter. This information will help during the confirmation and issue triage processes.

  • Severity: S0 - Affects critical data or functionality and leaves users with no workaround.
  • Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.
@onlinebizsoft onlinebizsoft added the Triage: Dev.Experience label Aug 4, 2021
@m2-assistant

m2-assistant bot commented Aug 4, 2021

Hi @onlinebizsoft. Thank you for your report.
To help us process this issue, please make sure that you have provided the following information:

  • Summary of the issue
  • Information on your environment
  • Steps to reproduce
  • Expected and actual results

Please make sure that the issue is reproducible on a vanilla Magento instance by following the Steps to reproduce. To deploy a vanilla Magento instance on our environment, add a comment to the issue:

@magento give me 2.4-develop instance - upcoming 2.4.x release

For more details, please, review the Magento Contributor Assistant documentation.

Please, add a comment to assign the issue: @magento I am working on this


⚠️ According to the Magento Contribution requirements, all issues must go through the Community Contributions Triage process. Community Contributions Triage is a public meeting.

🕙 You can find the schedule on the Magento Community Calendar page.

📞 The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, please join the Community Contributions Triage session to discuss the appropriate ticket.

🎥 You can find the recording of the previous Community Contributions Triage on the Magento Youtube Channel

✏️ Feel free to post questions/proposals/feedback related to the Community Contributions Triage process to the corresponding Slack Channel

@onlinebizsoft
Author

P.S. I'm on 2.4.2, so it is not the same as #32118

@pmonosolo

P.S. I'm on 2.4.2, so it is not the same as #32118

^^^ That issue is still present on 2.4.2-p1


@mrtuvn
Contributor

mrtuvn commented Aug 7, 2021

This seems more complex with the multi-domain use case? Sorry, but I don't have much experience with this scenario! You should add some details: which version are you experiencing this problem on, and are any customized modules in use?

@onlinebizsoft
Author

@mrtuvn yes, multiple domains, multiple websites.

I'm on 2.4.2 with many customizations; however, I believe none of them is causing this. Our Magento installation is around 20GB (without media and without databases).

@IbrahimS2

@onlinebizsoft Have you configured FPC to utilise Redis?

@onlinebizsoft
Author

@IbrahimS2 no, we ended up using a single zone for the system, so all EC2 and ElastiCache instances are now in the same zone (this cut the bill down by 50-60%).

@IbrahimS2

@onlinebizsoft Please share the Redis configuration section from your env.php.

@onlinebizsoft
Author

@IbrahimS2

```php
............
'x-frame-options' => 'SAMEORIGIN',
'MAGE_MODE' => 'production',
'cache' => [
    'frontend' => [
        'default' => [
            'backend' => 'Cm_Cache_Backend_Redis',
            'backend_options' => [
                'server' => 'xxxxxxxxxxxxxxxxxxxxxxxxx',
                'port' => '6379',
                'persistent' => '',
                'database' => 1,
                'password' => '',
                'force_standalone' => 0,
                'connect_retries' => 2,
                'read_timeout' => 10,
                'automatic_cleaning_factor' => 0,
                'compress_tags' => 1,
                'compress_data' => 1,
                'compress_threshold' => 20480,
                'compression_lib' => 'gzip'
            ]
        ]
    ]
],
'session' => [
    'save' => 'redis',
    'redis' => [
        'host' => 'xxxxxxxxxxxxxxxx',
        'port' => '6379',
        'password' => '',
        'timeout' => '2.5',
        'persistent_identifier' => '',
        'database' => '0',
        'compression_threshold' => '2048',
        'compression_library' => 'gzip',
        'log_level' => '1',
        'max_concurrency' => '16',
        'break_after_frontend' => '5',
        'break_after_adminhtml' => '30',
        'first_lifetime' => '600',
        'bot_first_lifetime' => '60',
        'bot_lifetime' => '7200',
        'disable_locking' => '1',
        'min_lifetime' => '60',
        'max_lifetime' => '2592000'
    ]
],
'cache_types' => [
    'config' => 1,
    'layout' => 1,
    'block_html' => 1,
    'collections' => 1,
    'reflection' => 1,
    'db_ddl' => 1,
    'compiled_config' => 1,
    'eav' => 1,
    'customer_notification' => 1,
    'config_integration' => 1,
    'config_integration_api' => 1,
    'full_page' => 1,
    'config_webservice' => 1,
    'translate' => 1,
    'vertex' => 1
],
..........................
```

@mrtuvn
Contributor

mrtuvn commented Aug 12, 2021

cc: @vzabaznov, who may know this area better

@onlinebizsoft
Author

Please keep in mind that the data transfer between EC2 and Redis is much, much bigger than the response data served to traffic (from both nginx and Varnish).

I'm thinking it is possible that some private-data Ajax actions cause this data transfer? These Ajax actions might serve very small responses but still request and transfer big payloads from Redis.

@mrtuvn
Contributor

mrtuvn commented Aug 13, 2021

Regarding the Ajax requests, it seems we already fixed that case here:
#31933, landed in 2.4.3

@onlinebizsoft
Author

onlinebizsoft commented Aug 13, 2021

@mrtuvn not really; it may help a bit, but it doesn't fix the whole problem. Also, there are more Ajax cases on any customized system. The root of the problem is how (and what) Magento 2 caches and fetches from the cache
(in my case, it is a big system with multiple stores, multiple languages, and many products; I'm not sure whether every cache component is split correctly per store. Our Redis instance reaches 60GB after around 2 days and 100GB after around 6 days.)

Does anyone have experience with an L2 caching setup? I'm not sure whether it is effective, because each web server will only have a very small memory store.

@mrtuvn
Contributor

mrtuvn commented Aug 13, 2021

Yep, that's why I tagged someone in a previous reply. He is the performance team lead.

@onlinebizsoft
Author

@vzabaznov

Setup multiple regions for elasticache
https://aws.amazon.com/blogs/database/reduce-cost-and-boost-throughput-with-global-datastore-for-amazon-elasticache-for-redis/

This approach doesn't work, because writes go only to the primary instance while all instances serve reads.

So I'm thinking we could have a workaround if we separated the read and write Redis connections in env.php, along the lines of the sketch below.

What do you think?
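A hypothetical sketch of that split (endpoint names are placeholders, and it assumes a colinmollenhour/cache-backend-redis version that supports the load_from_slave option; check the README of your installed version before relying on it):

```php
'cache' => [
    'frontend' => [
        'default' => [
            'backend' => 'Cm_Cache_Backend_Redis',
            'backend_options' => [
                // writes go to the primary endpoint (possibly cross-region)
                'server' => 'primary.xxxxxx.cache.amazonaws.com',
                'port' => '6379',
                // reads are served by a replica close to this web node
                'load_from_slave' => 'replica.xxxxxx.cache.amazonaws.com:6379',
                // if a read misses on the replica, retry it on the primary
                'retry_reads_on_master' => 1,
            ]
        ]
    ]
],
```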

@Gelmo
Member

Gelmo commented Aug 31, 2021

It's a bit concerning that the Magento team has not made an announcement related to this issue.

We are seeing a significant increase in outgoing network usage from the Redis instance used for cache in most of our client environments running 2.3.7 or 2.4.2. Sites that previously had a maximum output of 200 Mbps from the cache Redis instance now experience over 1 Gbps at times since upgrading, and these are relatively small sites. One of our larger clients now goes above 5 Gbps on most days. We have been able to reduce the impact by disabling the Magento_CSP module in some cases; however, the overall outgoing throughput is still significantly higher than before the upgrades.

It would be great if someone from Magento/Adobe could acknowledge this issue and confirm that it is being worked on. While this may not have a major impact on Adobe Cloud customers, the impact is significant for AWS clients due to the increased billing associated with network usage. I can only imagine how many bare-metal 2.4.2 environments are in the wild with a NIC that only supports 1 Gbps.

@jonathanribas

Hi there, we finally don't feel alone in this scenario anymore! Thanks for opening an issue!

Since we have decided to be resilient, we run Adobe Commerce on AWS EC2 across several availability zones in a region. We have seen a huge impact on the infamous Data Transfer (intra-regional) AWS cost line.

We have 20 store views (and growing) and run FPC with Redis.

We decided to dig into the topic, checking for improvements:

  • As a basic step, we enabled gzip compression. We haven't tried other algorithms. It reduced Data Transfer a bit.
  • Removed some unnecessary Ajax calls to /customer/section/load: we had some bad practices in custom code hitting our backend hard. It was the most used route, even more than the Page Cache one!
  • Played with Redis preload keys (see the sketch after this list): I checked the most common keys loaded on our busiest pages and set them as preload keys: CMS / PLP / PDP. Difficult to see an improvement...
  • Played with L2 cache: it was a nightmare, and we had to roll back. PHP-FPM memory was growing so fast that it seems to have introduced a memory leak, and we also lost connectivity to backoffice sessions for some users. After rolling back only this change, everything was fine.
  • Checked that we are not setting big custom keys in Redis from our custom code
  • Checked for optimizations on native keys: still in progress, not an easy one
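For anyone unfamiliar with the preload experiment mentioned above: it is configured in env.php, and the listed keys are fetched from Redis in a single round trip at bootstrap. A sketch along the lines of the devdocs example (the id_prefix and key names vary per installation and are illustrative here):

```php
'cache' => [
    'frontend' => [
        'default' => [
            'id_prefix' => '061_', // your installation's cache prefix
            'backend' => 'Cm_Cache_Backend_Redis',
            'backend_options' => [
                // keys fetched together at bootstrap; names vary per install
                'preload_keys' => [
                    '061_EAV_ENTITY_TYPES:hash',
                    '061_GLOBAL_PLUGIN_LIST:hash',
                    '061_DB_IS_UP_TO_DATE:hash',
                    '061_SYSTEM_DEFAULT:hash',
                ],
            ],
        ],
    ],
],
```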

Maybe some improvements for the future of Adobe Commerce?

  • Adobe Commerce should stop writing the system config of all store views into the Redis cache at init. If you open the English store view of the US website, it should only cache that store view's config, and on subsequent page views it should fetch only that store view's own system config from the cache.
  • Rely less on Redis: translation files are all stored in Redis with a TTL of only 7200 seconds (we wrote a patch to set an unlimited cache key lifetime). All Zend locale currencies are stored in Redis too. Having OPcache enabled should do the job, don't you think?
  • There is still an open issue about Redis cache load under concurrent requests with the current sleep-based locking in place. While trying to improve this part, we set an unlimited cache key lifetime for translations, as there are several translation keys and they weigh some MB in the locale folder for multi-language stores.
  • FPC still chats a lot with Redis for pages that are already cached. It should render such pages directly, without all those preload-key Redis calls and the rest.

I hope you guys won't tell us to use Varnish in order to decrease this Data Transfer chat between Adobe Commerce and Redis.

@mrtuvn
Contributor

mrtuvn commented Sep 10, 2021

Not sure, but Magento has already updated the Redis dependencies in composer.json (latest code):

"colinmollenhour/cache-backend-file": "~1.4.1",
"colinmollenhour/cache-backend-redis": "^1.14",
"colinmollenhour/credis": "1.12.1",
"colinmollenhour/php-redis-session-abstract": "~1.4.0",

Not sure how much this affects or relates to this issue.

Version 2.4.3:

"colinmollenhour/cache-backend-file": "~1.4.1",
"colinmollenhour/cache-backend-redis": "1.11.0",
"colinmollenhour/credis": "1.11.1",
"colinmollenhour/php-redis-session-abstract": "~1.4.0",

https://github.com/magento/magento2/blob/2.4-develop/composer.json
If not, I think we are still open to pull requests for such a case.

@magenx

magenx commented Sep 10, 2021

Is this issue related to a single Redis instance or a cluster?
Has anyone tried connecting a Redis auditor/profiler to see what it's actually doing inside?

https://devdocs.magento.com/guides/v2.3/release-notes/release-notes-2-3-5-open-source.html#performance-boosts
https://devdocs.magento.com/guides/v2.4/release-notes/release-notes-2-4-0-open-source.html#performance-improvements
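Even without a full profiler, a first step is to sample Redis's own counters (the MONITOR command and redis-cli --bigkeys are heavier options). A minimal sketch using the phpredis extension; the endpoint and sampling interval are placeholders:

```php
<?php
// Sample the INFO 'stats' section twice to estimate outgoing traffic.
$redis = new \Redis();
$redis->connect('your-elasticache-endpoint', 6379);

$before = $redis->info('stats');
sleep(10);
$after = $redis->info('stats');

$mb = ($after['total_net_output_bytes'] - $before['total_net_output_bytes']) / 1048576;
printf("Redis sent %.1f MB in 10s (%.2f MB/s)\n", $mb, $mb / 10);
```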

@jonathanribas

I have also noticed that the SYSTEM Redis cache key for 16 websites weighs 10MB. After serializing and encrypting it (M2 core), it weighs 14MB! Core source code for this: https://github.com/magento/magento2/blob/2.4-develop/app/code/Magento/Config/App/Config/Type/System.php#L338

If we assume custom configuration secrets (payments, ...) are already encrypted, I don't understand why we encrypt the whole thing again. We lose precious time here serializing/encrypting, and later decrypting, these keys all the time... Do you guys know why this SYSTEM Redis cache key is encrypted?

With a key of such a size, it may explain issues with parallel generation...

@vzabaznov
Contributor

Hey guys, thank you for reporting. Please consider using the L2 cache: https://devdocs.magento.com/guides/v2.4/config-guide/cache/two-level-cache.html

@jonathanribas

The L2 cache was a disaster on our Kubernetes cluster: really bad performance.
We will give it another try at some point.

@theozzz

theozzz commented Apr 4, 2022

@jonathanribas any update on your issue / pain point? Thanks for your answer

@jonathanribas

jonathanribas commented Apr 4, 2022

Hi @theozzz, unfortunately we haven't had time to test the L2 cache again.
Are you experiencing the same issue with high data transfer?

@theozzz

theozzz commented Apr 4, 2022

@jonathanribas thanks for your answer.

We are also experiencing some Redis transfer slowness (we noticed it in New Relic), especially when traffic is high. The platform has 32 stores and 22 websites.

Preload keys seem not to have any "big" impact for us.

@engcom-Lima engcom-Lima removed their assignment Aug 22, 2022
@engcom-Hotel engcom-Hotel self-assigned this Jan 18, 2023
@m2-assistant

m2-assistant bot commented Jan 18, 2023

Hi @engcom-Hotel. Thank you for working on this issue.
In order to make sure that the issue has enough information and is ready for development, please read and check the following instructions: 👇

  • 1. Verify that the issue has all the required information (Preconditions, Steps to reproduce, Expected result, Actual result).

    Details: If the issue has a valid description, the label Issue: Format is valid will be added to the issue automatically. Please edit the issue description if needed, until the label Issue: Format is valid appears.

  • 2. Verify that the issue has a meaningful description and provides enough information to reproduce it. If the report is valid, add the Issue: Clear Description label to the issue yourself.

  • 3. Add Component: XXXXX label(s) to the ticket, indicating the components it may be related to.

  • 4. Verify that the issue is reproducible on the 2.4-develop branch.

    Details:
    - Add the comment @magento give me 2.4-develop instance to deploy a test instance on Magento infrastructure.
    - If the issue is reproducible on the 2.4-develop branch, please add the label Reproduced on 2.4.x.
    - If the issue is not reproducible, add a comment that the issue is not reproducible, close it, and stop the verification process here!

  • 5. Add the label Issue: Confirmed once verification is complete.

  • 6. Make sure that the automatic system confirms that the report has been added to the backlog.

@engcom-Hotel
Contributor

Hello @onlinebizsoft,

Are you still facing this issue? Could you please try to reproduce it on the latest 2.4-develop branch and let us know whether it is still reproducible for you?

Thanks

@engcom-Hotel
Contributor

Dear @onlinebizsoft,

We have noticed that this issue has not been updated for 14 days. Hence we assume that it is fixed now, and we are closing it. Please raise a fresh ticket or reopen this one if you need more assistance.

Regards

@onlinebizsoft
Author

@mrtuvn @engcom-Hotel @vzabaznov can you reopen this?

@mrtuvn mrtuvn reopened this Feb 22, 2023
@m2-community-project m2-community-project bot removed the Issue: needs update label Feb 22, 2023
@mrtuvn
Contributor

mrtuvn commented Feb 22, 2023

Reopened since the ticket author responded. Can you confirm whether the problem is still reproducible, @onlinebizsoft?
I'm not sure what clear reproduction steps QC could follow to reproduce such a case in automated infrastructure tests.

@mrtuvn mrtuvn added the not-confirmed and Reported on 2.4.x labels Feb 22, 2023
@engcom-Hotel engcom-Hotel added the Issue: needs update label Feb 23, 2023
@onlinebizsoft
Author

The problem still exists, but we don't have any deeper information. It has been confirmed by quite a few users:
https://community.magento.com/t5/Magento-2-x-Technical-Issues/Significant-increase-in-outgoing-network-usage-from-Redis-cache/td-p/481801

Remember that in our case we have up to 100 stores (but the traffic is not that of 100 busy websites); here is NetworkBytesOut from Redis:

[Screenshot: Redis NetworkBytesOut metric]

P.S.: Again, we are always on the latest Magento version, and we use Varnish for full-page cache.

@onlinebizsoft
Author

Another related issue #21334

@jonathanribas

I have also noticed that the SYSTEM Redis cache key for 16 websites weighs 10MB. After serializing and encrypting it (M2 core), it weighs 14MB! Core source code for this: https://github.com/magento/magento2/blob/2.4-develop/app/code/Magento/Config/App/Config/Type/System.php#L338

If we assume custom configuration secrets (payments, ...) are already encrypted, I don't understand why we encrypt the whole thing again. We lose precious time here serializing/encrypting, and later decrypting, these keys all the time... Do you guys know why this SYSTEM Redis cache key is encrypted?

With a key of such a size, it may explain issues with parallel generation...

On our side, we have removed the encryption/decryption of the cached config, and the results are really good! We have reduced our Data Transfer bill by around 30 to 40%!
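For readers who want to explore the same direction: the change amounts to bypassing the encrypt/decrypt round-trip on the cached config blob. A purely hypothetical sketch of one way to do it is to subclass the framework encryptor as a pass-through and inject it only into Magento\Config\App\Config\Type\System via a di.xml argument override (the vendor/class names below are illustrative, and the constructor argument name and exact encryptor methods should be verified against your Magento version):

```php
<?php
// Hypothetical pass-through encryptor: the cached config is then stored
// as plain serialized data. Inject it only into the System config type,
// not globally, or you will break real secret encryption.
declare(strict_types=1);

namespace Vendor\FastConfigCache\Encryption;

use Magento\Framework\Encryption\Encryptor;

class PassThroughEncryptor extends Encryptor
{
    public function encryptWithFastestAvailableAlgorithm($data)
    {
        return $data; // skip encrypting the serialized config blob
    }

    public function decrypt($data)
    {
        return $data; // the blob was stored unencrypted
    }
}
```

The obvious trade-off: any sensitive values inside the cached configuration then sit unencrypted in Redis, so make sure that is acceptable for your setup.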

@onlinebizsoft
Author

@jonathanribas so it looks like most of the data transfer is due to the SYSTEM Redis cache key?

@jonathanribas

@onlinebizsoft yes, it is.
There is a lot of duplication inside the saved keys; it's hard to touch anything here without breaking something, as the whole of Adobe Commerce / Magento relies on this caching system.

@onlinebizsoft
Author

@igorwulff did you take any separate measurements for the preload keys in Redis? From what I can see, they bring no improvement for me, and it seems rather pointless to collect small Redis keys one by one just to save a few calls out of the 100-200 Redis calls per page.

What do you think? Or is there something I don't understand about this feature?

@jonathanribas

@onlinebizsoft, if you use AWS with more than one availability zone in your region, take a look at this AWS notification. It should help reduce Data Transfer between zones of the same region.

We have observed that your Amazon VPC resources are using a shared NAT Gateway across multiple Availability Zones (AZ). To ensure high availability and minimize inter-AZ data transfer costs, we recommend utilizing separate NAT Gateways in each AZ and routing traffic locally within the same AZ.

Each NAT Gateway operates within a designated AZ and is built with redundancy in that zone only. As a result, if the NAT Gateway or AZ experiences failure, resources utilizing that NAT Gateway in other AZ(s) also get impacted. Additionally, routing traffic from one AZ to a NAT Gateway in a different AZ incurs additional inter-AZ data transfer charges. We recommend choosing a maintenance window for architecture changes in your Amazon VPC.

@max-grosch

I use this and it works well:
https://github.com/Genaker/FastFPC

@onlinebizsoft
Author

@Nuranto @jonathanribas @igorwulff Just FYI, I just realized that Magento loads the config cache for all stores, and the config cache is one of the biggest entries in cache storage, which means the total transfer size is multiplied many times over what it should be.

@jonathanribas

@onlinebizsoft, yes, I know about this.
The cache design should be completely changed, as it doesn't make sense in some contexts to load the configuration of all stores.

@IvanChepurnyi
Contributor

IvanChepurnyi commented Nov 10, 2023

@onlinebizsoft I would also look into block caches, as each review and price block in Magento has its own cache entry. By using https://github.com/EcomDev/magento2-product-preloader and disabling the cache for those blocks via a plugin, I usually drop HMGET calls from 2000+ to 110 max.

[Screenshot: HMGET call count]
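Ivan mentions a plugin; as a rough illustration of the same idea, here is a hypothetical observer-based variant that strips the cache lifetime from price box blocks so they render fresh instead of producing one Redis entry per product. The event name, target class, and the effect of a null cache_lifetime should all be verified against your Magento version before use:

```php
<?php
// Hypothetical sketch: mark price box blocks as uncacheable so the layout
// cache stops storing/fetching one Redis entry per rendered price.
declare(strict_types=1);

namespace Vendor\CacheTuning\Observer;

use Magento\Framework\Event\Observer;
use Magento\Framework\Event\ObserverInterface;
use Magento\Framework\Pricing\Render\PriceBox;

class DisablePriceBoxCache implements ObserverInterface
{
    // Assumed to be wired to the 'view_block_abstract_to_html_before' event.
    public function execute(Observer $observer): void
    {
        $block = $observer->getEvent()->getBlock();
        if ($block instanceof PriceBox) {
            // a null lifetime makes the block skip the cache entirely
            $block->setData('cache_lifetime', null);
        }
    }
}
```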

@duydo278

We are facing the same issue with the Redis backend cache (not the page cache).

The key size of zc:k:69d_SYSTEM is approximately 80MB in memory, and the network output is about 0.5GB/s.

[Screenshots: Redis network output in Grafana; key sizes in RedisInsight]
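A quick way to check the size of such an entry, assuming the standard Cm_Cache_Backend_Redis layout where the payload lives in hash field 'd' (the endpoint, database number, and key prefix are placeholders for your setup):

```php
<?php
// Measure the stored size of the SYSTEM config cache entry via HSTRLEN.
$redis = new \Redis();
$redis->connect('your-elasticache-endpoint', 6379);
$redis->select(1); // the cache database from env.php

$key = 'zc:k:69d_SYSTEM'; // the '69d' prefix varies per installation
printf("SYSTEM entry: %.1f MB\n", $redis->hStrLen($key, 'd') / 1048576);
```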
