## Spark Memory Management
Spark memory management is crucial for ensuring that your Spark applications run efficiently. Spark uses a unified memory management model that handles both execution and storage memory. 

### Driver out of memory
- The driver is a JVM process that hosts the `main()` method of your application (Java, Scala, or Python).
- It owns the SparkContext, the entry point for creating DataFrames, Datasets, and RDDs and for running SQL queries, transformations, and actions.
- The driver maintains metadata, schedules tasks, and collects results, so insufficient memory for any of these operations can cause an out-of-memory (OOM) error.


#### Total Driver Memory
The total memory available to the driver is determined by the combination of:

**Heap memory** (`spark.driver.memory`): 
- The driver uses this memory to store metadata, maintain DAGs, manage tasks, and process data collected to the driver (e.g., via `collect()`).
- Default = 1 GB.
- How to set: `--conf spark.driver.memory=4g`


**Overhead memory** (`spark.driver.memoryOverhead`):
- Specifies the amount of non-heap memory allocated to the driver for:
  - JVM overhead, such as native threads and direct memory buffers.
  - Other native allocations made by Spark internals, such as buffers used for shuffling and network communication.
- Default = 10% of `spark.driver.memory` or 384 MB, whichever is larger.
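
The arithmetic behind the two settings can be sketched in plain Python. This is an illustration only: the helper names are hypothetical, and the 10% factor and 384 MB floor are taken from the defaults quoted above rather than read from a running cluster.

```python
MIB_PER_GIB = 1024


def driver_overhead_mib(heap_mib, overhead_factor=0.10, min_overhead_mib=384):
    """Default overhead: a fraction of the heap, but never below the floor."""
    return max(int(heap_mib * overhead_factor), min_overhead_mib)


def total_driver_memory_mib(heap_mib, overhead_mib=None):
    """Heap plus overhead; pass overhead_mib to model an explicit setting."""
    if overhead_mib is None:
        overhead_mib = driver_overhead_mib(heap_mib)
    return heap_mib + overhead_mib


for heap_gib in (1, 4, 8):
    heap = heap_gib * MIB_PER_GIB
    print(
        f"{heap_gib}g heap -> {driver_overhead_mib(heap)} MiB overhead, "
        f"{total_driver_memory_mib(heap)} MiB total"
    )
```

With the default 1 GB heap, the 384 MB floor applies, because 10% of the heap is smaller than the floor. The percentage only takes over once the heap exceeds roughly 3.75 GB.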


#### Why do we get Driver OOM?
A driver OOM occurs when the Spark driver process runs out of memory. Common causes include:

**1. `collect()` Operation:**
- The collect operation in Spark retrieves data from distributed workers and consolidates it on the driver. 
- This can lead to OOM errors if the collected data is too large to fit into the driver's memory.

**2. Broadcast Join:**
- Broadcast joins are useful for optimizing joins when one side of the join is small enough to fit in memory.
- However, if the broadcasted data is too large, it can exhaust driver memory, because the driver gathers the data before sending it to the executors.

**3. Excessive Metadata:**
- The driver stores metadata about RDDs, DataFrames, tasks, and jobs. If the application creates too many objects or stages without proper garbage collection, the metadata can overwhelm the driver's memory.

**4. Improper Memory Allocation:**
- Insufficient memory allocation for the driver (`spark.driver.memory` or `spark.driver.memoryOverhead`) can lead to OOM, especially for complex operations.

#### What is driver overhead memory?
- Driver overhead memory is the additional non-heap memory allocated to the driver process for needs such as:
  - JVM overhead
  - Native memory used by garbage collection
  - Internal Spark operations (network communication, buffering)
- It is controlled by the configuration parameter `spark.driver.memoryOverhead`.
- The default value is 10% of `spark.driver.memory` or 384 MB, whichever is greater.

#### How to Handle Driver OOM?

**Avoid Collecting Large Data:**
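- Minimize the use of `collect()`, `toPandas()`, and `take()` with a large `n`, since each returns its result to the driver. Keep processing on the executors with distributed transformations such as `map` and `filter`, and aggregate (for example with `reduce`) before bringing results back.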
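
A minimal sketch of three common alternatives to a full `collect()`, assuming `df` is a PySpark DataFrame. The helper names are hypothetical; the DataFrame methods used (`limit`, `collect`, `toLocalIterator`, `write.mode(...).parquet(...)`) are standard PySpark calls.

```python
def preview(df, n=20):
    """Return at most n rows to the driver instead of the full dataset."""
    return df.limit(n).collect()


def stream_rows(df, handle_row):
    """Process rows on the driver incrementally.

    toLocalIterator() fetches the result partition by partition, so the
    driver holds roughly one partition at a time rather than everything.
    """
    count = 0
    for row in df.toLocalIterator():
        handle_row(row)
        count += 1
    return count


def save_result(df, path):
    """Let the executors write the result; nothing is gathered on the driver."""
    df.write.mode("overwrite").parquet(path)
```

Which option fits depends on the goal: `preview` for inspection, `stream_rows` when the driver really must see every row, and `save_result` when the output is only needed downstream.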

**Broadcast Wisely:**
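- Use `broadcast()` sparingly and only for small datasets and variables. Spark also broadcasts a join side automatically when its estimated size is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default), so keep that threshold in line with the driver's memory.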
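
One way to make the "only when small" rule explicit is to guard the broadcast hint with a size check. The sketch below assumes PySpark DataFrames and a caller-supplied size estimate. The function name, the `small_side_bytes` parameter, and the 10 MB limit (chosen to mirror the default of `spark.sql.autoBroadcastJoinThreshold`) are illustrative assumptions.

```python
# Assumption: mirrors the 10 MB default of spark.sql.autoBroadcastJoinThreshold.
DEFAULT_BROADCAST_LIMIT_BYTES = 10 * 1024 * 1024


def join_small_side(large_df, small_df, key, small_side_bytes,
                    limit_bytes=DEFAULT_BROADCAST_LIMIT_BYTES):
    """Join two DataFrames, hinting a broadcast only when the small side fits.

    small_side_bytes is the caller's own estimate of the small side's size.
    """
    if small_side_bytes <= limit_bytes:
        # DataFrame.hint("broadcast") asks the planner for a broadcast join.
        return large_df.join(small_df.hint("broadcast"), on=key)
    # No hint: leave the join strategy to the planner.
    return large_df.join(small_df, on=key)
```

Omitting the hint does not forbid a broadcast: the planner may still pick one from its own size estimates. Setting `spark.sql.autoBroadcastJoinThreshold` to `-1` disables automatic broadcasting entirely.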

**Increase Driver Memory:**
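- Allocate more memory to the driver when submitting the application (the last line is a placeholder for your own JAR or script):
  ```bash
  spark-submit \
    --conf spark.driver.memory=4g \
    --conf spark.driver.memoryOverhead=1g \
    <application-jar-or-script>
  ```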

**Cache and Persist Effectively:**
- Cache only the required intermediate results to reduce memory usage.
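
A small sketch of scoping a cache so it is always released, assuming `df` is a PySpark DataFrame. The `cached` helper is a hypothetical name; `DataFrame.cache()` and `DataFrame.unpersist()` are the standard calls it wraps.

```python
from contextlib import contextmanager


@contextmanager
def cached(df):
    """Cache df for the duration of a block, then release it."""
    cached_df = df.cache()
    try:
        yield cached_df
    finally:
        # Runs even if the block raises, so the cache does not linger.
        cached_df.unpersist()


# Usage (assuming expensive_df is reused by several actions):
# with cached(expensive_df) as df:
#     total = df.count()
#     sample = df.limit(10).collect()
```

Tying the cache to a `with` block keeps its lifetime visible in the code and avoids leaving results cached after their last use.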

**Efficient Query Design:**
- Use narrow transformations where possible.
- Avoid generating excessive stages and shuffle operations.