## Airflow DAG Parallelism

Parallelism in Airflow means running **multiple tasks at the same time**, **as long as they don’t depend on each other**.


### When Does Parallelism Happen?

- A task linked downstream of another with `>>` waits for the upstream task to finish, so the two run **sequentially**
- Tasks with **no dependency path between them** (direct or indirect) can run **in parallel**
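
To make the rule concrete, here is a minimal sketch in plain Python (not Airflow's scheduler) that groups tasks into "waves" that could run at the same time. The task names and the `deps` mapping are hypothetical, and the sketch assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: task -> set of upstream tasks it waits for.
deps = {
    "extract": set(),
    "clean_users": {"extract"},
    "clean_sales": {"extract"},
    "merge": {"clean_users", "clean_sales"},
}


def parallel_waves(deps):
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstreams are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished, unlocking the next one
    return waves


waves = parallel_waves(deps)
print(waves)  # [['extract'], ['clean_sales', 'clean_users'], ['merge']]
```

The two cleaning tasks land in the same wave because neither waits on the other; `merge` comes later because it waits on both.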


### Example Workflows That Use Parallelism

| Example Workflow                              | Where Parallelism Happens               |
| --------------------------------------------- | --------------------------------------- |
| Ingest from 5 different sources               | Run 5 ingestion tasks in parallel       |
| Clean multiple tables at once                 | Run 3–4 cleaning jobs after extract     |
| Run multiple SQL transformations              | Parallel SQL tasks after a staging step |
| Export data to multiple destinations (S3, BQ) | Run all export tasks in parallel        |


### Code Example (Parallel Tasks)

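```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path

def run_step(name):  # placeholder callable (illustrative); swap in real logic
    print(f"running {name}")

with DAG(dag_id="parallel_cleaning", start_date=datetime(2024, 1, 1), catchup=False):
    extract = PythonOperator(task_id="extract", python_callable=run_step, op_args=["extract"])
    clean_a = PythonOperator(task_id="clean_user_data", python_callable=run_step, op_args=["users"])
    clean_b = PythonOperator(task_id="clean_sales_data", python_callable=run_step, op_args=["sales"])
    clean_c = PythonOperator(task_id="clean_product_data", python_callable=run_step, op_args=["products"])
    join = PythonOperator(task_id="merge_clean_data", python_callable=run_step, op_args=["merge"])
    extract >> [clean_a, clean_b, clean_c] >> join  # the three clean tasks can run in parallel
```

### How Executors Affect Parallelism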
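- `SequentialExecutor`: no parallelism (one task at a time)
- `LocalExecutor`, `CeleryExecutor`, `KubernetesExecutor`: parallel execution supported

Settings that cap parallelism:

- `parallelism`: global limit on concurrently running task instances (per scheduler)
- `max_active_tasks_per_dag` (named `dag_concurrency` before Airflow 2.2): default per-DAG limit, which a single DAG can override with its `max_active_tasks` argument
- `pool`: limit on the slots shared by all tasks assigned to that pool

**Note:** Executors enable parallelism, but the DAG structure and the configuration must also allow it.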
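
As a rough mental model (an assumption for illustration, not Airflow's actual scheduling code), the number of tasks from one DAG that can run at once is bounded by the tightest applicable limit. The function name and all numbers below are hypothetical, and the sketch assumes every task occupies exactly one pool slot.

```python
def max_concurrent_tasks(runnable, parallelism, max_active_tasks, pool_slots):
    """Simplified model: the tightest limit wins.

    runnable         -- tasks whose upstream dependencies are already met
    parallelism      -- global cap on running task instances
    max_active_tasks -- per-DAG cap
    pool_slots       -- free slots in the pool the tasks are assigned to
    """
    return min(runnable, parallelism, max_active_tasks, pool_slots)


# Five independent ingestion tasks, but their pool only has 2 slots.
limited_by_pool = max_concurrent_tasks(
    runnable=5, parallelism=32, max_active_tasks=16, pool_slots=2
)
# Same tasks with a roomy pool: now the DAG structure (5 runnable tasks) is the limit.
limited_by_dag = max_concurrent_tasks(
    runnable=5, parallelism=32, max_active_tasks=16, pool_slots=128
)
print(limited_by_pool, limited_by_dag)  # 2 5
```

Raising one limit only helps if it was the smallest one; otherwise another setting, or the DAG's own dependencies, still holds the tasks back.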

---



## Executor and Parallelism

Airflow executors determine how tasks are executed. Some executors, like `SequentialExecutor`, do **not** support parallelism.

### DAG-Level Execution Controls

You can control how tasks run per DAG using:

| Feature                   | What It Does                                                             |
|---------------------------|--------------------------------------------------------------------------|
| `max_active_tasks` in DAG | Limits how many task instances of the DAG run at the same time           |
| `pool`                    | Caps concurrent tasks that share a limited resource, across DAGs         |
| `priority_weight`         | Ranks tasks so higher-weight ones are scheduled first when slots are scarce |
| `queue`                   | (CeleryExecutor only) Sends tasks to different worker queues             |

### Limitation with SequentialExecutor

- These concurrency limits have **no practical effect** with `SequentialExecutor`, because it never runs more than one task at a time
- Only **one task runs at a time** globally, regardless of configuration
- To enable parallelism and make use of these features, use:
  - `LocalExecutor`
  - `CeleryExecutor`
  - `KubernetesExecutor`
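
The difference can be illustrated outside Airflow with Python's standard thread pool, used here only as a loose analogy (Airflow executors are not implemented this way). A pool with one worker behaves like `SequentialExecutor`; a pool with several workers behaves like a parallel executor. The helper name and timings are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(workers, n_tasks=3, task_seconds=0.2):
    """Run n_tasks on `workers` threads; return the most tasks seen running at once."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def task():
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(task_seconds)  # stand-in for real work
        with lock:
            state["running"] -= 1

    # Leaving the `with` block waits for all submitted tasks to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n_tasks):
            pool.submit(task)
    return state["peak"]


sequential_peak = peak_concurrency(workers=1)  # analogous to SequentialExecutor
parallel_peak = peak_concurrency(workers=3)    # analogous to an executor with 3 slots
print(sequential_peak, parallel_peak)
```

With one worker the peak is always 1, no matter how many independent tasks are submitted; with three workers the tasks overlap.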
