
Lower the default value of the `cacheCapacity` ctor parameter of `PaddleDevice.Mkldnn()` from 10 to 1 #46

Merged · 2 commits merged on Apr 24, 2023

Conversation

n0099 (Contributor) commented Apr 23, 2023

As shown in the charts below, when calling `PaddleOCRAll.Run()` successively on completely different images (fed in the same order on every run), the long-run memory usage differs drastically depending on the value passed to the `cacheCapacity` ctor parameter (default 10) of `PaddleDevice.Mkldnn()` when constructing `PaddleOCRAll`. That parameter assigns `PaddleConfig.MkldnnCacheCapacity`, which in turn calls paddle_inference's `PD_ConfigSetMkldnnCacheCapacity()`:
(charts: memory usage over elapsed time for different `cacheCapacity` values)
Notably, with the value set to 0 the `'dnnl:error' could not execute a primitive` failure described in #44 is almost guaranteed to occur, making long runs impossible. With the default of 10, memory might perhaps keep growing and stabilize after peaking, but I had neither enough memory nor the patience to wait for that result.

Revisiting the paddle docs:
https://github.com/PaddlePaddle/docs/blob/63362b7443c77a324f58a045bcc8d03bb59637fa/docs/design/mkldnn/caching/caching.md?plain=1#L70

The design of MKL-DNN cache is to support dynamic shapes scenario (BERT model for example). Since MKLDNN primitives are sensitive to src/dst shape, if new input shape comes, new primitive needs to be created. That means there will be many primitives cached and MKL-DNN cache would consume lots of memory. By introducing second level cache, we can consider these kind of primitive as a group, once reach memory limitation, we can clear a whole group instead of just one primitive, and release more memory.



  1. Store once created MKL-DNN objects in order To avoid MKL-DNN recreation
    While MKL-DNN computational algorithms are fast to be executed, preparing to execution e.g. Creation of computational primitives and its primitive descriptors takes significant time (From 2% up to 40% for latency mode inference, depends on Platform instruction sets and MKL-DNN version). We can save some time on recreation of computational MKL-DNN primitives and its primitive descriptors, by storing once created MKL-DNN objects in a cache and refer to them in subsequent iterations when needed.



So paddle's cache mechanism, which applies only under mkldnn, exists to amortize (in paddleocr's context) the cost of creating mkldnn objects for every differently shaped input image. This also explains why memory usage stays perfectly stable, regardless of `MkldnnCacheCapacity`, when the exact same set of images is fed in over and over.
Although the cache is meant to improve performance, the charts above show that cap=10 actually runs slower than cap=1. This is likely because the system was short on memory, so pages got swapped out and dragged down overall performance.
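The shape-sensitivity described in the quoted doc can be sketched as a toy model (this is hypothetical illustrative code, not paddle's actual implementation): primitives are cached per input shape, so repeating the same shapes never grows the cache, while a stream of new shapes grows it without bound.

```python
# Toy model of a shape-keyed primitive cache like the one the MKL-DNN
# doc describes. Not paddle's real code; names are invented.

class PrimitiveCache:
    def __init__(self):
        self.cache = {}      # input shape -> "primitive"
        self.creations = 0   # how many expensive creations happened

    def get_primitive(self, shape):
        if shape not in self.cache:
            self.creations += 1  # expensive: primitive + descriptor creation
            self.cache[shape] = ("primitive", shape)
        return self.cache[shape]

cache = PrimitiveCache()

# Feeding the same image set repeatedly: cache size (memory) stays flat.
for _ in range(100):
    for shape in [(1, 3, 48, 320), (1, 3, 48, 640)]:
        cache.get_primitive(shape)
print(len(cache.cache), cache.creations)  # 2 2 — each shape created only once

# Feeding images of ever-new shapes: the cache grows with every new shape.
for width in range(100):
    cache.get_primitive((1, 3, 48, width))
print(len(cache.cache))  # 102 — unbounded growth without clearing
```

This matches the observation above: with identical input images the shapes repeat, so the cache stops growing no matter what the capacity is.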


c. Cache clearing mode

Another situation is when clearing of cache does happen is cache clearing mode: platform::kMKLDNNSessionID_CacheClearing. At that mode when new Entry to be added to cache then size of cache is compared with given capacity and once no space for next objects is in a cache , then MKL-DNN cache is partially cleared. By default cache is NOT working in clearing mode e.g. cache will store all objects it was given. To enable MKL-DNN cache clearing mode one needs to set capacity of MKL-DNN cache with SetMkldnnCacheCapacity (by default capacity is set to 0, meaning no clearing depending on size of cache, any non-negative value is allowed and its meaning is: size of second level cache e.g. number of different input shapes cached groups that can be cached).


This further confirms that the documentation of paddle_inference's `PD_ConfigSetMkldnnCacheCapacity()` misdescribes the behavior of the value 0:
https://github.com/PaddlePaddle/Paddle/blob/040f8aa50191b81313237e39d510dfb7b531cda9/paddle/fluid/inference/capi_exp/pd_config.h#L553

/// Default value 0 means not caching any shape.

(as well as the copies pasted into this library:)

/// <summary>Set the cache capacity of different input shapes for MKLDNN. Default value 0 means not caching any shape. Please see MKL-DNN Data Caching Design Document: https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/mkldnn/caching/caching.md</summary>

/// <summary>Set the cache capacity of different input shapes for MKLDNN. Default value 0 means not caching any shape.</summary>

In practice, since paddle exposes no API for it, there is currently no way to disable this cache mechanism entirely. Setting the capacity to 0 instead completely disables the periodic cache clearing, so as long as differently shaped images keep arriving, memory is bound to grow without limit.
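The capacity semantics quoted from the doc above can be sketched as a small simulation (again hypothetical code, not paddle's implementation): capacity == 0 means clearing never runs, while capacity >= 1 bounds the number of cached shape groups, clearing a whole group at a time.

```python
# Toy model of the second-level (per-input-shape group) cache clearing
# that SetMkldnnCacheCapacity controls, per the quoted design doc:
# capacity 0 = never clear; capacity N >= 1 = keep at most N shape groups.

from collections import OrderedDict

class ShapeGroupCache:
    def __init__(self, capacity):
        self.capacity = capacity     # 0 means "no clearing at all"
        self.groups = OrderedDict()  # shape -> group of primitives

    def on_input_shape(self, shape):
        if shape in self.groups:
            self.groups.move_to_end(shape)  # group reused, no growth
            return
        if self.capacity > 0 and len(self.groups) >= self.capacity:
            # Clear a whole shape group at once, releasing more memory
            # than evicting a single primitive would.
            self.groups.popitem(last=False)
        self.groups[shape] = ["primitives for", shape]

unbounded = ShapeGroupCache(capacity=0)  # the misdocumented "default 0"
bounded = ShapeGroupCache(capacity=1)    # this PR's new default
for width in range(1000):                # 1000 differently shaped images
    unbounded.on_input_shape((1, 3, 48, width))
    bounded.on_input_shape((1, 3, 48, width))

print(len(unbounded.groups))  # 1000 — capacity 0 never clears: unbounded growth
print(len(bounded.groups))    # 1 — memory peaks early, then stays flat
```

This is exactly the behavior users mistake for a memory leak: with capacity 0 (or a large capacity and varied inputs), the group count only ever goes up.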

This PR therefore lowers the default value in the `PaddleDevice.Mkldnn()` ctor from 10 to 1, to mitigate as much as possible this caching behavior that users easily mistake for a memory leak (#17 #27 #37), i.e. memory peaks sooner and then stabilizes.
It also documents this conservative default in ocr.md and advises users whose environments have enough memory to consider raising the value.

n0099 (Contributor, Author) commented Apr 23, 2023

Moreover, even executing `PD_PredictorDestroy` (which `PaddlePredictor.Dispose()` in this library wraps, and which is in turn called by `PaddleOcrDetector`/`Recognizer`/`Classifier`) does not release these caches; the process has to exit entirely before paddle is truly torn down.

sdcb (Owner) commented Apr 24, 2023

Thank you very much!

@sdcb sdcb merged commit f088848 into sdcb:master Apr 24, 2023
n0099 added a commit to n0099/open-tbm that referenced this pull request Apr 25, 2023
… @ `PaddleOcrRecognizerAndDetector.GetModelFactory()`

* merge field `_percentageThresholdOfIntersectionAreaToConsiderAs(Same|New)TextBox` and their config into a single tuple/config section `_intersectionAreaThresholds` @ ImageOcrConsumer.cs
* clarify reloadability for each config in section `ImageOcrPipeline` @ appsettings.json
* rename all param with type `KeyValurPair<,>` to `pair` @ PaddleOcrRecognizerAndDetector.cs & ClientRequester.cs
@ crawler
n0099 added a commit to n0099/open-tbm that referenced this pull request May 4, 2023
…95/LINQKit#168

- remove unused method `GetValuesByKeys()`
* move method `NanToZero()` and `RoundToUshort()` to project `tbm.Shared` since they are also used by project `tbm.ImagePipeline`
@ crawler/ExtensionMethods.cs

* remove the value to argument `cacheCapacity` of method `PaddleDevice.Mkldnn()` since it's now the default value: sdcb/PaddleSharp#46 @ `PaddleOcrRecognizerAndDetector.GetPaddleOcrFactory()`
* update NuGet packages
@ imagePipeline
@ c#