Large Task Deserialization Time during Optimization #333

Jiaweihu08 · 2024-06-13T09:28:02Z

What went wrong?

Task deserialization time is very large during optimization, specifically in the last collect from `RollupDataWriter.compact()`.

`IndexStatus.cubeStatuses` is packaged within each task, so the size of every task grows as the metadata size increases.
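To illustrate why packaging the map into every task hurts, here is a minimal, Spark-free sketch. It uses `pickle` as a stand-in for the task serializer and a plain dict as a stand-in for `IndexStatus.cubeStatuses`. All function names and sizes are hypothetical, and the numbers it prints are not measurements from the project. It only shows that a payload embedding the map grows with the metadata, while a payload carrying a broadcast-style handle stays constant.

```python
import pickle


def make_cube_statuses(n_cubes):
    # Hypothetical stand-in for IndexStatus.cubeStatuses: cube id -> weight.
    return {f"cube-{i}": i / n_cubes for i in range(n_cubes)}


def task_payload_with_capture(partition_id, cube_statuses):
    # The whole map is serialized inside every task.
    return pickle.dumps((partition_id, cube_statuses))


def task_payload_with_handle(partition_id, broadcast_id):
    # Only a small handle travels with the task; the map itself would be
    # shipped once per executor by a broadcast mechanism (not modelled here).
    return pickle.dumps((partition_id, broadcast_id))


small = make_cube_statuses(1_000)
large = make_cube_statuses(100_000)

captured_small = len(task_payload_with_capture(0, small))
captured_large = len(task_payload_with_capture(0, large))
handle_size = len(task_payload_with_handle(0, 0))

print(f"captured, 1k cubes:   {captured_small} bytes per task")
print(f"captured, 100k cubes: {captured_large} bytes per task")
print(f"handle only:          {handle_size} bytes per task")
```

With many tasks per stage, the per-task cost in the captured case is paid once for every task, which is consistent with the deserialization time scaling with the metadata.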

How to reproduce?

1. Optimize a relatively large table and compare the Task Deserialization Time of the second collect with that of the execute. The values from the execute should be an order of magnitude smaller.

2. Branch and commit id: `main`, `b7f1906`

3. Spark version: `3.5.0`

4. Hadoop version: `3.3.4`

5. How are you running Spark? `locally`, `distributed`
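The comparison described above can also be made from a Spark event log rather than by reading the UI. The sketch below is a rough helper, assuming event logging is enabled (`spark.eventLog.enabled`) and that the log is in the usual JSON-lines form, where `SparkListenerTaskEnd` events carry `Stage ID` and `Task Metrics` with an `Executor Deserialize Time` in milliseconds. The sample lines and stage IDs are made up for illustration. Mapping a stage ID back to the collect or the execute still has to be done by hand.

```python
import json
from collections import defaultdict


def deserialize_ms_by_stage(event_log_lines):
    """Sum executor deserialize time (ms) per stage from event log lines."""
    totals = defaultdict(int)
    for line in event_log_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        # Failed tasks may lack metrics, so fall back to an empty dict.
        metrics = event.get("Task Metrics") or {}
        totals[event["Stage ID"]] += metrics.get("Executor Deserialize Time", 0)
    return dict(totals)


# Made-up sample lines; a real log would be read from the event log file.
sample_lines = [
    '{"Event": "SparkListenerTaskEnd", "Stage ID": 3, "Task Metrics": {"Executor Deserialize Time": 10}}',
    '{"Event": "SparkListenerTaskEnd", "Stage ID": 3, "Task Metrics": {"Executor Deserialize Time": 20}}',
    '{"Event": "SparkListenerTaskEnd", "Stage ID": 7, "Task Metrics": {"Executor Deserialize Time": 900}}',
    '{"Event": "SparkListenerTaskEnd", "Stage ID": 7, "Task Metrics": {"Executor Deserialize Time": 800}}',
    '{"Event": "SparkListenerJobEnd", "Job ID": 1}',
]

for stage_id, total_ms in sorted(deserialize_ms_by_stage(sample_lines).items()):
    print(f"stage {stage_id}: {total_ms} ms total task deserialization")
```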


Jiaweihu08 added the `type: bug` label Jun 13, 2024

Jiaweihu08 mentioned this issue Jun 13, 2024

Issue 333: Broadcast cube weights during optimization file writing #334

Merged

6 tasks

cugni mentioned this issue Jun 14, 2024

Issue 343: Reduce metadata memory footprint #335

Merged

3 tasks

cdelfosse assigned Jiaweihu08 Jun 17, 2024

osopardo1 closed this as completed in #334 Jul 1, 2024

osopardo1 mentioned this issue Jul 12, 2024

Parallelize metadata processing and reduce metadata footprint #343

Closed

