# Dask Futures

## What Are Dask Futures?
One feature of the client is that we can submit Python functions to it (`client.submit(func_a)`). The call returns immediately with a *future* (a pointer to remote data that will exist at some point). You can also chain functions by passing a future as an argument to another call (`client.submit(func_b, future_a)`).
Futures are objects that represent a computation that may still be running.
The client can later retrieve the result, cancel the task, or chain further computations.
This model mirrors the `concurrent.futures` API in the Python standard library, making it easy to adopt.
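
For comparison, here is a minimal sketch of the same pattern in the standard library, using `concurrent.futures.ThreadPoolExecutor`. The `double` helper is hypothetical and only there for illustration. The `submit` and `result` calls have the same shape as the Dask client calls described in this section.

```python
from concurrent.futures import ThreadPoolExecutor


def double(x):
    """Illustrative helper (not part of any library): multiply by two."""
    return 2 * x


with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(double, 10)       # returns a Future immediately
    value = future.result()                # blocks until the task has finished

    # Submit several tasks first, then collect their results
    futures = [pool.submit(double, x) for x in range(3)]
    values = [f.result() for f in futures]

print(value, values)
```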


## Core API (Client‑side)
| Method                              | Purpose                                                                                     |
|-------------------------------------|---------------------------------------------------------------------------------------------|
| `client = Client()`                 | Start a local cluster (no arguments) or connect to an existing scheduler (`Client(address)`) |
| `future = client.submit(fn, *args, **kwargs)` | Schedule `fn(*args, **kwargs)` and get a **Future**                                         |
| `future.result()`                   | Block until the computation finishes and return the value                                  |
| `future.cancel()`                   | Attempt to stop the task (if it hasn’t started)                                            |
| `client.gather([f1, f2, …])`        | Retrieve results from many futures at once                                                  |
| `client.map(fn, iterable)`          | Convenience: submit `fn` for each element, returns a list of futures                       |
| `wait(futures)`                     | Wait for completion without pulling results (useful for side‑effects); `wait` is imported from `dask.distributed` |
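
The calls in the table can be combined as in the sketch below. It assumes `dask.distributed` is installed. The `double` helper is hypothetical, and `processes=False` is a choice made here only to keep the demo inside a single process. Note that `wait` is used as a function imported from `dask.distributed`.

```python
from dask.distributed import Client, wait


def double(x):
    """Illustrative helper (not part of Dask): multiply by two."""
    return 2 * x


# processes=False runs the scheduler and worker as threads in this process
# (an assumption made for a lightweight demo, not a general recommendation)
with Client(processes=False) as client:
    single = client.submit(double, 21)       # one task, one Future
    many = client.map(double, range(4))      # one Future per element
    wait(many)                               # block until done, without fetching values
    single_value = single.result()           # fetch a single result
    many_values = client.gather(many)        # fetch many results at once

print(single_value, many_values)
```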





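You can submit tasks in a loop:

`futures = [client.submit(some_heavy_func, arg) for arg in args]   # tasks start running concurrently`

`results = client.gather(futures)   # blocks until all results are available`

Passing a future as an argument to another `submit` call chains the tasks: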

`first = client.submit(step_one, data)`

`second = client.submit(step_two, first)`

`third = client.submit(step_three, second)`

In [1]:
# Simple Code Example
# Set up a Dask client (starts a local cluster when called without arguments)
from dask.distributed import Client
client = Client() 


In [2]:
import time

def inc(x):
    time.sleep(1)
    return x + 1

def dec(x):
    time.sleep(1)
    return x - 1

def add(x, y):
    time.sleep(1)
    return x + y



In [3]:
%%time 
inc(10)

CPU times: user 35.3 ms, sys: 14 ms, total: 49.3 ms
Wall time: 1 s


11

In [4]:
%%time

results = []
for x in range(10):
    result = inc(x)
    result = dec(result)
    results.append(result)

CPU times: user 771 ms, sys: 277 ms, total: 1.05 s
Wall time: 20.1 s


In [5]:
%%time

results = []
for x in range(10):
    result = client.submit(inc, x)
    result = client.submit(dec, result)
    results.append(result)
    
final_results = client.gather(results)

CPU times: user 166 ms, sys: 51.4 ms, total: 218 ms
Wall time: 3.05 s


In [6]:
import time
def timed(fn):
    """Return a new function that logs execution time for *fn*."""
    def wrapper(*args, **kwargs):
        start = time.time()
        print(f"[{time.strftime('%H:%M:%S', time.localtime(start))}] START {fn.__name__}{args}")
        result = fn(*args, **kwargs)          # <-- actual work
        end = time.time()
        elapsed = end - start
        print(
            f"[{time.strftime('%H:%M:%S', time.localtime(end))}] "
            f"END   {fn.__name__} – took {elapsed:.3f}s"
        )
        return result
    return wrapper

In [7]:

# Define a regular Python function
def slow_square(x):
    """Pretend this is a heavy computation."""
    import time
    time.sleep(1)               # simulate work
    return x * x


timed_slow_square = timed(slow_square)

# Submit a few tasks and get futures
futures = [client.submit(timed_slow_square, i) for i in range(5)]




[20:48:23] START slow_square(0,)


In [8]:
# Do something else while tasks run 
print("Tasks submitted - doing other very important work...")

# Gather results (blocks until all are done)
results = client.gather(futures)
print("Results:", results)  

Tasks submitted - doing other very important work...
[20:48:23] START slow_square(1,)
[20:48:23] START slow_square(2,)
[20:48:23] START slow_square(3,)
[20:48:23] START slow_square(4,)
[20:48:24] END   slow_square – took 1.005s
[20:48:24] END   slow_square – took 1.001s
[20:48:24] END   slow_square – took 1.005s
[20:48:24] END   slow_square – took 1.001s
[20:48:24] END   slow_square – took 1.003s
Results: [0, 1, 4, 9, 16]



## What Happens Under the Hood?
`client.submit` serialises the function and its arguments and sends them to the scheduler.
The scheduler assigns each task to a worker thread or process.
Each Future tracks the task's status (`pending` while it is queued or running, then `finished`, or `error` if it failed).
`client.gather` pulls the results back to the driver process.
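
The status tracking described above can be observed directly on a future. This is a small sketch assuming `dask.distributed` is installed. The `square` helper is hypothetical, and `processes=False` is used only to keep the demo in one process.

```python
from dask.distributed import Client, wait


def square(x):
    """Illustrative helper (not part of Dask)."""
    return x * x


with Client(processes=False) as client:
    future = client.submit(square, 7)
    # Right after submission the status is usually "pending",
    # although a very fast task may already be "finished"
    status_after_submit = future.status

    wait([future])                        # block until the task has completed
    status_after_wait = future.status     # "finished" once the task succeeded
    is_done = future.done()
    value = future.result()

print(status_after_submit, status_after_wait, is_done, value)
```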

In [9]:
## Chaining Futures (Simple Pipeline)
def add_one(x):
    return x + 1

timed_add_one = timed(add_one)

# Submit first stage
f0 = client.submit(timed_slow_square, 3)          # → 9 after ~1 s

# Chain a second stage that runs after f0 completes
f1 = client.submit(timed_add_one, f0)             # Dask knows to wait for f0

print(f1.result())  # prints 10

[20:48:24] START add_one(9,)
[20:48:24] END   add_one – took 0.000s
10


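Note: passing a future (`f0`) as an argument to another `submit` call makes Dask record the dependency, so the second task starts only after the first has finished. The log above shows only `add_one`, most likely because `timed_slow_square(3)` was already computed by the earlier batch of futures and Dask reused that result.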

## Tips
| Issue                                                                                                 | Recommendation                                                                                           |
|-------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| Blocking too early – calling `.result()` on every future defeats parallelism                         | Submit all tasks first, then gather or wait                                                             |
| Large data movement – returning huge objects can overwhelm the driver                                 | Keep data on workers; use `persist()` or write to disk if needed                                          |
| Forgot to close the client – stray processes linger                                                    | Use a context manager: `with Client() as client:`                                                       |
| Debugging failures – exceptions are re‑raised when you call `.result()`                               | Inspect `future.exception()` or view the dashboard for stack traces                                      |
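
The last two rows can be illustrated together. This is a sketch assuming `dask.distributed` is installed. The `fail` function is hypothetical and raises on purpose, and `processes=False` is used only to keep the demo in one process.

```python
from dask.distributed import Client, wait


def fail(x):
    """Illustrative task that always raises."""
    raise ValueError(f"bad input: {x}")


# The context manager closes the client (and its local cluster) on exit
with Client(processes=False) as client:
    future = client.submit(fail, 3)
    wait([future])                   # waiting does not raise for a failed task
    status = future.status           # "error" for a task that raised
    error = future.exception()       # the exception object, without re-raising

    try:
        future.result()              # re-raises the remote exception locally
    except ValueError as exc:
        message = str(exc)

print(status, repr(error), message)
```

The worker may also log the failure, so expect some extra log output alongside the printed line.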

## Quick Recap
Futures give you asynchronous, fine‑grained control over Dask tasks.
Use `client.submit` for individual jobs, `client.map` for bulk submission.
Chain futures to build dynamic pipelines without pre‑defining a full DAG.
Always submit first, then retrieve results (or wait) to maximise parallelism.
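
The "submit first, then retrieve" advice can be checked with a rough timing comparison. This sketch assumes `dask.distributed` is installed. The `slow_inc` helper is hypothetical. `pure=False` is passed so that Dask does not reuse results between the two runs, and `processes=False` keeps the demo in one process. Actual timings depend on the machine and on the number of worker threads.

```python
import time

from dask.distributed import Client


def slow_inc(x):
    """Illustrative helper: pretend to work for 0.2 s, then add one."""
    time.sleep(0.2)
    return x + 1


with Client(processes=False) as client:
    # Anti-pattern: .result() blocks before the next task is even submitted
    start = time.perf_counter()
    serial = [client.submit(slow_inc, x, pure=False).result() for x in range(4)]
    serial_time = time.perf_counter() - start

    # Preferred: submit everything, then gather once
    start = time.perf_counter()
    futures = [client.submit(slow_inc, x, pure=False) for x in range(4)]
    parallel = client.gather(futures)
    parallel_time = time.perf_counter() - start

print(f"blocking early: {serial_time:.2f}s, submit then gather: {parallel_time:.2f}s")
```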



In [17]:
client.close()