# Performance optimizations

This notebook describes two available performance optimizations for instances of the `Potential` class:

* **Autobatching**: Batches concurrent requests to the `complete`, `prefix` and `logw_next` methods.
* **Multiprocessing**: Runs multiple instances of a `Potential` in parallel across different CPU cores.


In [1]:
import time
import asyncio
import numpy as np
from arsenal.timer import timeit

## Autobatching concurrent requests

Autobatching improves performance when a `Potential`'s batch methods (`batch_complete` and `batch_prefix`) are much faster than calling the corresponding instance methods one at a time.

Consider the following `Potential` class, in which the `complete` and `prefix` methods each sleep for 0.5 seconds, while the batch methods each sleep for 0.55 seconds regardless of batch size.


In [2]:
from genlm_control.potential import Potential


class TimedPotential(Potential):
    async def complete(self, context):
        time.sleep(0.5)  # Blocking sleep simulates a slow call.
        return len(context)

    async def prefix(self, context):
        time.sleep(0.5)
        return len(context)

    # Batched methods are much quicker than sequentially
    # calling the instance methods.

    async def batch_complete(self, contexts):
        time.sleep(0.55)
        return [len(context) for context in contexts]

    async def batch_prefix(self, contexts):
        time.sleep(0.55)
        return [len(context) for context in contexts]

    def __repr__(self):
        return "TimedPotential()"


potential = TimedPotential(list(range(256)))  # Vocabulary of bytes.



The `to_autobatched()` method wraps any `Potential` so that concurrent requests are batched automatically. When multiple requests are made concurrently (for example, with `asyncio.gather`), the wrapper collects them in the background and processes them together using the potential's batch methods. This happens transparently: the calling code stays the same, and you only need to wrap your potential with `to_autobatched()`.
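The collection step described above can be illustrated with a minimal sketch. This is not the implementation behind `to_autobatched()`; `TinyAutoBatcher`, `batch_len`, and `main` are hypothetical names, and the sketch assumes that all requests issued in the same round of the event loop should be served by a single batch call.

```python
import asyncio


class TinyAutoBatcher:
    """Hypothetical wrapper: serves concurrent `complete` calls with one batch call."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # async callable: list of contexts -> list of results
        self.pending = []  # (context, future) pairs waiting for the next flush
        self.flush_task = None

    async def complete(self, context):
        future = asyncio.get_running_loop().create_future()
        self.pending.append((context, future))
        if self.flush_task is None:
            # The first caller schedules a single flush for everyone queued behind it.
            self.flush_task = asyncio.create_task(self._flush())
        return await future

    async def _flush(self):
        await asyncio.sleep(0)  # yield once so other concurrent callers can enqueue
        batch, self.pending = self.pending, []
        self.flush_task = None
        try:
            results = await self.batch_fn([context for context, _ in batch])
        except Exception as exc:
            # Propagate a batch failure to every waiting caller.
            for _, future in batch:
                future.set_exception(exc)
            return
        for (_, future), result in zip(batch, results):
            future.set_result(result)


batch_calls = []  # records the contexts passed to each batch call


async def batch_len(contexts):
    batch_calls.append(list(contexts))
    return [len(context) for context in contexts]


async def main():
    batcher = TinyAutoBatcher(batch_len)
    return await asyncio.gather(
        *(batcher.complete(seq) for seq in [b"hello", b"cats", b"foo", b"fy"])
    )


# In a notebook, which already runs an event loop, use `await main()` instead.
results = asyncio.run(main())
```

In this sketch, `batch_len` is called once with all four contexts, and each caller still receives only its own result.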

In [3]:
autobatched = potential.to_autobatched()
autobatched

AutoBatchedPotential(TimedPotential())

In [4]:
sequences = [b"hello", b"cats", b"foo", b"fy"]

# Concurrent requests to complete will be automatically batched
# and processed by the batch_complete method.

with timeit("without autobatching"):
    results = await asyncio.gather(*(potential.complete(seq) for seq in sequences))

with timeit("with autobatching"):
    results_autobatched = await asyncio.gather(
        *(autobatched.complete(seq) for seq in sequences)
    )

In [None]:
# Results are the same whether we use autobatching or not.
results, results_autobatched

([5, 4, 3, 2], [5, 4, 3, 2])
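As a rough guide to what the timing comparison above should show, the expected durations can be estimated from the sleep times in `TimedPotential`. This is a back-of-the-envelope estimate rather than a measurement; it assumes the wrapper gathers all four requests into a single `batch_complete` call and ignores scheduling overhead.

```python
n_requests = 4  # len(sequences) in the timing cell
single_call_seconds = 0.5  # sleep in `complete`
batch_call_seconds = 0.55  # sleep in `batch_complete`

# time.sleep blocks the event loop, so the four `complete` calls run one after another.
expected_without = n_requests * single_call_seconds

# Assumption: all four requests end up in one `batch_complete` call.
expected_with = batch_call_seconds

print(f"without autobatching: ~{expected_without:.2f} sec")
print(f"with autobatching:    ~{expected_with:.2f} sec")
print(f"estimated speedup:    ~{expected_without / expected_with:.1f}x")
```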

## CPU parallelization

CPU parallelization can be used to improve the performance of a `Potential` class whose methods are compute-intensive. For example, if your potential performs heavy computation for each request to `complete`, `prefix` or `logw_next`, running it across multiple cores can significantly reduce processing time.

The following `Potential` class simulates this case: its `complete` and `prefix` methods each block for 1 second.

In [None]:
class TimedPotential(Potential):
    async def complete(self, context):
        time.sleep(1)
        return len(context)

    async def prefix(self, context):
        time.sleep(1)
        return len(context)

    # These are the default implementations of batch_complete and batch_prefix
    # which subclasses inherit. We repeat them here for clarity.
    async def batch_complete(self, contexts):
        results = await asyncio.gather(
            *(self.complete(context) for context in contexts)
        )
        return np.array(results)

    async def batch_prefix(self, contexts):
        results = await asyncio.gather(*(self.prefix(context) for context in contexts))
        return np.array(results)

    def spawn(self):
        return TimedPotential(self.decode)


potential = TimedPotential(list(range(256)))

The `to_multiprocess()` method creates a wrapper that runs multiple instances of a potential in parallel across different CPU cores. When you call this method with a specified number of workers, it creates a process pool where each worker contains its own instance of the potential. Requests are then automatically distributed across these workers, allowing for parallel processing.

Note that for multiprocessing to work, the potential must implement a picklable `spawn` method, which creates a new instance of the potential. Only some of the built-in `Potential` classes implement it, so if you are using a custom potential you will need to implement this method yourself.

In [None]:
mp_potential = potential.to_multiprocess(num_workers=2)
mp_potential

MultiProcPotential(self.num_workers=2)

Multiprocessing improves performance for both the batched (`batch_complete`, `batch_prefix`, `batch_logw_next`) and unbatched (`complete`, `prefix`, `logw_next`) methods. In the batched case, each request in the batch is processed in parallel across different workers. For individual method calls (like `complete` or `prefix`), each request is sent to an available worker process and executed independently, allowing multiple requests to run in parallel without blocking each other.

Here we compare the performance of the batched methods.

In [None]:
with timeit("without multiprocessing"):
    results = await potential.batch_complete(sequences)

with timeit("with multiprocessing"):
    results_mp = await mp_potential.batch_complete(sequences)

without multiprocessing (4.0025 sec)
with multiprocessing (2.0028 sec)
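
The timings above are consistent with a simple model in which requests are processed in rounds of `num_workers` at a time. The sketch below is an estimate, not part of the library: `estimated_seconds` is a hypothetical helper, and it assumes requests are spread evenly across workers, each worker handles one request at a time, and inter-process overhead is negligible.

```python
import math


def estimated_seconds(n_requests, seconds_per_request, num_workers):
    """Estimate wall-clock time when requests run in rounds of `num_workers`."""
    rounds = math.ceil(n_requests / num_workers)
    return rounds * seconds_per_request


# Four 1-second requests, as in the comparison above.
one_worker = estimated_seconds(4, 1.0, num_workers=1)
two_workers = estimated_seconds(4, 1.0, num_workers=2)

print(f"1 worker:  ~{one_worker:.1f} sec")
print(f"2 workers: ~{two_workers:.1f} sec")
```

Under these assumptions the model gives about 4 seconds for one worker and about 2 seconds for two, in line with the measured 4.0025 and 2.0028 seconds.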


In [None]:
results, results_mp

([5, 4, 3, 2], array([5, 4, 3, 2]))

And here we compare the performance of the unbatched methods.

In [None]:
with timeit("without multiprocessing"):
    results = await asyncio.gather(*(potential.complete(seq) for seq in sequences))

with timeit("with multiprocessing"):
    results_mp = await asyncio.gather(
        *(mp_potential.complete(seq) for seq in sequences)
    )

without multiprocessing (4.0008 sec)
with multiprocessing (2.0021 sec)


In [None]:
results, results_mp

([5, 4, 3, 2], [5, 4, 3, 2])