
[CORE][VL] Add option to limit the memory Gluten can use for each task to N = (memory / task slots) #3101

Merged
merged 39 commits into apache:main on Sep 14, 2023

Conversation

zhztheplayer
Member

@zhztheplayer zhztheplayer commented Sep 11, 2023

Add new option spark.gluten.memory.isolation (by default false):

Description of the option:

Enable isolated memory mode. If true, Gluten limits the maximum off-heap memory each task can use to X, where X = executor memory / max task slots. Setting it to true is recommended when Gluten serves concurrent queries within a single session, since not all memory Gluten allocates is guaranteed to be spillable; in that case, the feature should be enabled to avoid OOMs.

The implementation inserts a complete memory management layer, TreeMemoryConsumer, between Spark's memory manager and Gluten. Once the task memory limit is hit, TreeMemoryConsumer first tries calling the child spillers inside its own scope, without notifying Spark. After freeing some space, TreeMemoryConsumer continues to acquire memory from Spark.

Users are expected to use this feature to get rid of OOMs caused by pinned, non-spillable Velox memory (#3030) held by an earlier task that is commanded to spill when other new tasks arrive. This typically happens when the session is shared by several concurrent queries.
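
For example, the option can be enabled through the Spark configuration. A minimal sketch, assuming a plain SparkConf setup; the off-heap settings and size below are illustrative, not recommendations:

import org.apache.spark.SparkConf;

public class IsolationConfigExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        // The new option added by this PR (default: false).
        .set("spark.gluten.memory.isolation", "true")
        // Gluten's native memory is off-heap; the size here is illustrative.
        .set("spark.memory.offHeap.enabled", "true")
        .set("spark.memory.offHeap.size", "8g");
  }
}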

@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title to the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}


@github-actions

Run Gluten Clickhouse CI

Comment on lines 23 to 27
// A decorator to a task memory target, to restrict memory usage of the delegated
// memory target to X, X = free executor memory / task slots.
// Using this to prevent OOMs if the delegated memory target could possibly
// hold large memory blocks that are not spillable.
// See https://github.com/oap-project/gluten/issues/3030
Contributor

How do we restrict a specific memory consumer's usage to X? Return zero from acquireMemory, or just OOM?

Member Author

More design work is needed here. But overall, the consumer with the decorator should behave like a consumer registered to a task memory manager with a fixed limit (which is X).
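
To illustrate the idea, here is a minimal sketch of such a decorator, assuming a simplified interface; FixedLimitTarget and its methods are hypothetical names, not Gluten's actual API:

// A hypothetical decorator that caps acquisitions at a fixed limit X.
final class FixedLimitTarget {
  private final long limit; // X = executor memory / task slots
  private long used = 0L;

  FixedLimitTarget(long limit) {
    this.limit = limit;
  }

  // Grant at most the bytes remaining under the fixed limit. A caller that
  // receives less than requested is expected to spill, just like a consumer
  // registered to a task memory manager whose limit is X.
  long acquire(long bytes) {
    long granted = Math.min(bytes, limit - used);
    used += granted;
    return granted;
  }

  long release(long bytes) {
    long freed = Math.min(bytes, used);
    used -= freed;
    return freed;
  }
}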

Contributor

Great. I hope the upcoming design answers my question: what behavior is expected when a consumer hits its limit?

@winningsix winningsix mentioned this pull request Sep 12, 2023
import org.apache.spark.memory.TaskMemoryManager;

// A decorator to a task memory target, to restrict memory usage of the delegated
// memory target to X, X = free executor memory / task slots.
Contributor

Does the task slot count equal the configured CPU cores of the current Spark executor?

Member Author

Yes. There is a calculation:

def getTaskSlots(conf: SparkConf): Int = {
  val executorCores = SparkResourceUtil.getExecutorCores(conf)
  val taskCores = conf.getInt("spark.task.cpus", 1)
  // e.g. 8 executor cores / 1 CPU per task = 8 task slots
  executorCores / taskCores
}


private final TaskMemoryTarget delegated;

public IsolatedByTaskSlot(TaskMemoryTarget delegated) {
Contributor

I didn't find any usage of this class. Inferring from the name, does this mean we want to introduce the concept of a per-task memory pool?

Member Author

The PR was not ready at that time; the class was just a placeholder for my initial thoughts on the design.

@github-actions

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer marked this pull request as ready for review September 13, 2023 01:39
@zhztheplayer zhztheplayer changed the title WIP: [CORE][VL] Add option to limit the memory Gluten can use for each task to N = (memory / task slots) [CORE][VL] Add option to limit the memory Gluten can use for each task to N = (memory / task slots) Sep 13, 2023
@github-actions

Run Gluten Clickhouse CI

Comment on lines +128 to +132
while (q.peek() != null && remainingBytes > 0) {
TreeMemoryConsumerNode head = q.remove();
long spilled = spillTree(head, remainingBytes);
remainingBytes -= spilled;
}
Contributor

This logic intends to invoke spill from the smallest consumer to the largest, right?

Member Author

It's from largest to smallest.

In the future we may want to follow vanilla Spark's rule, which picks the smallest consumer among those larger than the target size.

Contributor

Still not fully understanding...

We sort children in descending order and invoke spillTree on the peek element's children recursively, which means we pick the largest consumer and then pass smaller consumers into spillTree; when a node has no children, we spill it and return.
I think the spillTree loop goes from largest to smallest, but the actual spills happen from smallest to largest. Please correct me if I'm wrong, thanks!

Another thing I don't understand: doesn't sorting at the root sort all of its children? It seems we re-sort a node's children on every spillTree invocation.

Member Author

Hi, thanks for helping check this code. I haven't verified it carefully, but let's take a simple example:

a 200 (self 80)
|-b 70
|-c 50 (self 10)
   |- d 30
   |- e 10

The code implements a post-order traversal of this tree (children first, self last), which means the visiting order is supposed to be

b (70) -> d (30) -> e (10) -> c (10) -> a (80)

This seems to be aligned with my initial assumption: largest to smallest, but self last.
Is that the same as your thoughts?

Another thing I don't understand: doesn't sorting at the root sort all of its children? It seems we re-sort a node's children on every spillTree invocation.

Yes, the data structure is not efficient, since we only sort the children of the current node (see the code q.addAll(node.children().values());). We can probably move to a TreeMap/TreeSet later on.
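
To make the traversal concrete, here is a minimal sketch of the scheme discussed above. TreeNode, usedBytes() and spillSelf() are hypothetical names standing in for the actual TreeMemoryConsumer types:

import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class SpillTreeSketch {
  interface TreeNode {
    List<TreeNode> children();
    long usedBytes();
    long spillSelf(long bytes); // returns the bytes actually freed
  }

  // Post-order, largest-first spill: children first (largest used size
  // first), then the node itself.
  static long spillTree(TreeNode node, long bytes) {
    long remainingBytes = bytes;
    PriorityQueue<TreeNode> q =
        new PriorityQueue<>(Comparator.comparingLong(TreeNode::usedBytes).reversed());
    q.addAll(node.children());
    while (q.peek() != null && remainingBytes > 0) {
      remainingBytes -= spillTree(q.remove(), remainingBytes);
    }
    if (remainingBytes > 0) {
      remainingBytes -= node.spillSelf(remainingBytes);
    }
    return bytes - remainingBytes; // total bytes freed in this subtree
  }
}

On the example tree above, this yields exactly the order b -> d -> e -> c -> a.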

Contributor

This seems to be aligned with my initial assumption: largest to smallest, but self last.
Is that the same as your thoughts?

Thanks, it's the same.

Comment on lines +195 to +200
def conservativeOffHeapMemorySize: Long =
conf.getConf(COLUMNAR_CONSERVATIVE_OFFHEAP_SIZE_IN_BYTES)

def conservativeTaskOffHeapMemorySize: Long =
conf.getConf(COLUMNAR_CONSERVATIVE_TASK_OFFHEAP_SIZE_IN_BYTES)

Contributor

Some questions here:

  1. What's the difference between these two configs?
  2. What does "conservative" mean?
  3. It seems conservativeOffHeapMemorySize is not used?

Member Author

@zhztheplayer zhztheplayer Sep 13, 2023

"Conservative" means the max size Gluten can consider is "safe" to use. The two new options are set in GlutenPlugin.scala using the following code:

// Pessimistic off-heap sizes, with the assumption that all non-borrowable storage memory
// determined by spark.memory.storageFraction was used.
val fraction = 1.0d - conf.getDouble("spark.memory.storageFraction", 0.5d)
val conservativeOffHeapSize = (offHeapSize * fraction).toLong
conf.set(
  GlutenConfig.GLUTEN_CONSERVATIVE_OFFHEAP_SIZE_IN_BYTES_KEY,
  conservativeOffHeapSize.toString)
val conservativeOffHeapPerTask = conservativeOffHeapSize / taskSlots
conf.set(
  GlutenConfig.GLUTEN_CONSERVATIVE_TASK_OFFHEAP_SIZE_IN_BYTES_KEY,
  conservativeOffHeapPerTask.toString)

The difference from the "non-conservative" options is that the "conservative" ones take storage memory into account. Assuming Spark has used 30% of off-heap memory for the storage memory pool, that memory would not be evicted even when the execution memory pool requests to borrow it.

I think for stability we may need to use the "conservative" options by default, since they're safer in most cases. I left the "non-conservative" options unchanged for compatibility reasons.
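
As a worked example of the calculation above (all numbers are illustrative, not taken from this PR):

public class ConservativeSizeExample {
  public static void main(String[] args) {
    // Illustrative inputs: 8 GiB off-heap, default storageFraction, 4 task slots.
    long offHeapSize = 8L << 30;
    double storageFraction = 0.5; // spark.memory.storageFraction default
    int taskSlots = 4;

    double fraction = 1.0d - storageFraction;
    long conservativeOffHeapSize = (long) (offHeapSize * fraction);
    long conservativeOffHeapPerTask = conservativeOffHeapSize / taskSlots;

    System.out.println(conservativeOffHeapSize);    // 4294967296 (4 GiB)
    System.out.println(conservativeOffHeapPerTask); // 1073741824 (1 GiB)
  }
}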

Member Author

Besides, I don't want users to set these "auto-generated" options, but we haven't developed a general way to guard against that yet.

Member Author

It seems conservativeOffHeapMemorySize is not used?

Yes, but it's worth keeping for future use.

Contributor

Thanks for the detailed explanation! I now understand what "conservative" means.

we may need to use the "conservative" options by default, since they're safer in most cases.

Please keep a switch config for that. Spark already has maybeGrowExecutionPool to shrink the storage pool and grow the execution pool; I'd prefer to let Spark control this logic. If we use the "conservative" option by default, we may not fully utilize the unused storage memory.

Member Author

I meant to use the "conservative" options only in Gluten's own code that needs to read the off-heap size, for example the shuffle writer and partial aggregation. It's not a goal to touch vanilla Spark's memory management.

Member Author

It's worth noting that Spark's "storage region size", determined by spark.memory.storageFraction, is not evictable while in use. That's why we added the conservative options.

@github-actions

Run Gluten Clickhouse CI

@zhztheplayer
Member Author

/Benchmark Velox

@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_3101_time.csv | log/native_master_09_12_2023_892535655_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1 | 43.68 | 44.37 | 0.692 | 101.58% |
| q2 | 24.44 | 23.17 | -1.271 | 94.80% |
| q3 | 35.85 | 36.71 | 0.865 | 102.41% |
| q4 | 41.31 | 41.27 | -0.040 | 99.90% |
| q5 | 69.59 | 69.43 | -0.163 | 99.77% |
| q6 | 6.93 | 5.00 | -1.934 | 72.09% |
| q7 | 85.07 | 83.94 | -1.134 | 98.67% |
| q8 | 81.76 | 82.77 | 1.018 | 101.25% |
| q9 | 116.53 | 115.52 | -1.016 | 99.13% |
| q10 | 47.72 | 46.05 | -1.674 | 96.49% |
| q11 | 19.41 | 19.04 | -0.364 | 98.12% |
| q12 | 27.89 | 25.94 | -1.947 | 93.02% |
| q13 | 52.37 | 51.71 | -0.664 | 98.73% |
| q14 | 19.38 | 13.72 | -5.662 | 70.79% |
| q15 | 28.27 | 27.98 | -0.290 | 98.97% |
| q16 | 15.88 | 15.60 | -0.280 | 98.24% |
| q17 | 120.52 | 119.89 | -0.631 | 99.48% |
| q18 | 162.48 | 163.11 | 0.633 | 100.39% |
| q19 | 12.43 | 12.03 | -0.405 | 96.74% |
| q20 | 29.19 | 38.93 | 9.745 | 133.39% |
| q21 | 237.04 | 248.23 | 11.197 | 104.72% |
| q22 | 15.43 | 15.56 | 0.122 | 100.79% |
| total | 1293.19 | 1299.99 | 6.800 | 100.53% |

@github-actions

Run Gluten Clickhouse CI

Contributor

@zhouyuan zhouyuan left a comment

👍
Another big change to the memory component!

@zhztheplayer zhztheplayer merged commit 7311703 into apache:main Sep 14, 2023
28 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_3101_time.csv | log/native_master_09_13_2023_1dc14126a_time.csv | difference | percentage |
|-------|--------------------------|--------------------------------------------------|------------|------------|
| q1 | 43.59 | 43.54 | -0.052 | 99.88% |
| q2 | 24.71 | 24.58 | -0.128 | 99.48% |
| q3 | 37.38 | 37.58 | 0.202 | 100.54% |
| q4 | 41.46 | 41.77 | 0.306 | 100.74% |
| q5 | 70.47 | 69.36 | -1.105 | 98.43% |
| q6 | 6.66 | 6.35 | -0.305 | 95.42% |
| q7 | 85.08 | 85.84 | 0.753 | 100.88% |
| q8 | 80.21 | 79.11 | -1.106 | 98.62% |
| q9 | 116.00 | 118.08 | 2.079 | 101.79% |
| q10 | 46.87 | 45.32 | -1.551 | 96.69% |
| q11 | 19.98 | 19.61 | -0.371 | 98.14% |
| q12 | 24.00 | 26.16 | 2.162 | 109.01% |
| q13 | 48.93 | 50.61 | 1.677 | 103.43% |
| q14 | 16.24 | 16.64 | 0.401 | 102.47% |
| q15 | 31.32 | 26.80 | -4.521 | 85.56% |
| q16 | 15.94 | 15.83 | -0.112 | 99.30% |
| q17 | 119.27 | 120.95 | 1.677 | 101.41% |
| q18 | 160.27 | 162.53 | 2.262 | 101.41% |
| q19 | 12.56 | 13.26 | 0.705 | 105.62% |
| q20 | 30.09 | 29.37 | -0.722 | 97.60% |
| q21 | 236.86 | 235.96 | -0.900 | 99.62% |
| q22 | 15.58 | 15.92 | 0.341 | 102.19% |
| total | 1283.46 | 1285.15 | 1.692 | 100.13% |
